Doc-transform: Implement document transformation #306

@kartpop

Is your feature request related to a problem? Please describe.
Partners often work with scanned PDFs or other document formats that are poorly suited to AI services such as OpenAI’s vector stores. This increases the chance of poor RAG performance and may limit the platform’s utility.

Describe the solution you'd like
Build the foundational document transformation pipeline for the AI Platform. The primary goal is to enhance the /documents/upload endpoint to support on-demand, pluggable document conversion. This will let users prepare documents for optimal RAG performance directly within the platform by simplifying conversion to LLM-friendly formats, e.g. PDF (including scanned images embedded in a PDF, which require OCR) → Markdown.
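To make "pluggable" concrete, here is a minimal sketch of one way the conversion step could be organized: a registry that maps a (source format, target format) pair to a converter function. Every name here (`register`, `transform`, `text_to_markdown`, the format strings) is hypothetical and not taken from the platform's codebase, and the converter is a stand-in for a real PDF/OCR one.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical converter signature: raw file bytes in, converted text out.
Transformer = Callable[[bytes], str]

# Hypothetical registry keyed by (source_format, target_format).
_REGISTRY: Dict[Tuple[str, str], Transformer] = {}


def register(source: str, target: str) -> Callable[[Transformer], Transformer]:
    """Decorator that plugs a converter into the registry."""
    def decorator(fn: Transformer) -> Transformer:
        _REGISTRY[(source, target)] = fn
        return fn
    return decorator


@register("txt", "markdown")
def text_to_markdown(data: bytes) -> str:
    # Stand-in converter for illustration only; a PDF -> Markdown
    # converter would call an OCR or parsing backend here instead.
    return data.decode("utf-8").strip() + "\n"


def transform(data: bytes, source: str, target: Optional[str] = None) -> Optional[str]:
    """Convert on demand; return None when no target format is requested."""
    if target is None:
        return None
    try:
        converter = _REGISTRY[(source, target)]
    except KeyError:
        raise ValueError(f"no transformer registered for {source} -> {target}")
    return converter(data)
```

Under this assumption, the upload handler would call `transform` only when the caller asks for a target format, and adding a new conversion means registering one more function rather than changing the endpoint.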

Reference doc - Document management and transformation

Metadata

Labels

enhancement (New feature or request)

Projects

Status

Closed
