
Feature/doc transform#332

Closed
kartpop wants to merge 15 commits into main from feature/doc-transform

Conversation

@kartpop
Collaborator

@kartpop kartpop commented Aug 13, 2025

Summary

Target issue is #PLEASE_TYPE_ISSUE_NUMBER
Explain the motivation for making this change. What existing problem does the pull request solve?

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested the changes.
  • If you've fixed a bug or added code, ensure the change is covered by test cases.

Notes

Add here any other information the reviewer may need.

Summary by CodeRabbit

  • New Features

    • Upload documents with optional format conversion; transformation requests return 202 with job details and status URL.
    • Background processing for document transformations with progress tracking.
    • Transformed documents are linked to their original source.
  • API

    • Added endpoints to fetch one or multiple transformation jobs by ID.
    • Enhanced document upload to accept target_format and optional transformer.
  • Chores

    • Increased minimum Python version to 3.11.

@coderabbitai

coderabbitai Bot commented Aug 13, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds document transformation capability: new models and migrations, CRUD helpers, FastAPI routes for starting and querying jobs, a background service to execute transformations, and a transformer framework (abstract base, registry, test and Zerox-backed implementations). Upload endpoint now optionally triggers async transformation jobs. Dependency and Python version updated.
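
The transformer framework described above (abstract base, registry, test implementation) can be sketched roughly as follows. The class names and the `transform(input_path) -> str` signature mirror the files reviewed below; the exact registry contents and dispatch helper are simplified assumptions, not the PR's verbatim code.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Transformer(ABC):
    """Abstract base: converts a source file into target-format text."""

    @abstractmethod
    def transform(self, input_path: Path) -> str:
        """Return the transformed document content as a string."""


class TestTransformer(Transformer):
    """Trivial implementation used to exercise the pipeline end to end."""

    def transform(self, input_path: Path) -> str:
        return "Lorem ipsum dolor sit amet."


# Registry mapping transformer names to classes (mirrors registry.py;
# the real registry also maps "default" and "zerox" to ZeroxTransformer)
TRANSFORMERS: dict[str, type[Transformer]] = {
    "test": TestTransformer,
}


def convert_document(input_path: Path, transformer_name: str) -> str:
    """Look up the transformer by name and run the conversion."""
    transformer_cls = TRANSFORMERS[transformer_name]
    return transformer_cls().transform(input_path)
```

A Zerox-backed transformer plugs into the same registry by subclassing `Transformer` and registering under its own name.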

Changes

Cohort / File(s) Summary
Database migrations
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py, backend/app/alembic/versions/93b86c1246b1_create_doc_transformation_job_table.py
Add document.source_document_id (self-FK); create doc_transformation_job table with status enum and FKs; adjust/downgrade related FKs.
Models
backend/app/models/document.py, backend/app/models/doc_transformation_job.py, backend/app/models/__init__.py
Add Document.source_document_id; introduce DocTransformationJob and TransformationStatus; export model via package init.
CRUD
backend/app/crud/doc_transformation_job.py
CRUD for transformation jobs: create, read, update status, list.
API routes – transformations
backend/app/api/routes/doc_transformation_job.py
New endpoints to fetch one or multiple transformation jobs by ID.
API routes – documents
backend/app/api/routes/documents.py
Upload route becomes async; accepts target_format/transformer; starts background transformation job; returns 202 with job info when applicable.
Doctransform core
backend/app/core/doctransform/transformer.py, .../zerox_transformer.py, .../test_transformer.py, .../registry.py, .../service.py
Define Transformer base; add Zerox and test transformers; registry for formats/transformers and conversion; background job orchestration with retries, storage I/O, and record updates.
Project config
backend/pyproject.toml
Require Python >=3.11; add py-zerox dependency.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant API as Documents API
  participant Service as Transform Service
  participant DB
  participant Storage

  Client->>API: POST /documents/upload (file, target_format, transformer?)
  API->>DB: Create DocTransformationJob (PENDING)
  API->>Service: Schedule execute_job(user_id, job_id, transformer, target_format)
  API-->>Client: 202 Accepted (job_id, status_check_url, source doc info)

  Service->>DB: Mark job PROCESSING
  Service->>DB: Load source Document
  Service->>Storage: Download source file
  Service->>Service: convert_document(source, transformer)
  Service->>Storage: Upload transformed file
  Service->>DB: Create transformed Document (source_document_id set)
  Service->>DB: Mark job COMPLETED (transformed_document_id)
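The job lifecycle in the diagram above (PENDING on creation, PROCESSING while the worker runs, COMPLETED or FAILED at the end) can be sketched as below. The in-memory dict stands in for the doc_transformation_job table, and the helper names are illustrative rather than the PR's exact service API.

```python
import uuid

# Hypothetical in-memory job store standing in for the DB table
JOBS: dict[str, dict] = {}


def start_job(source_document_id: str) -> str:
    """Create a PENDING job record and return its id (sketch of service.start_job)."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "PENDING", "source_document_id": source_document_id}
    return job_id


def execute_job(job_id: str) -> None:
    """Background worker: mark PROCESSING, convert, then COMPLETED (errors -> FAILED)."""
    job = JOBS[job_id]
    job["status"] = "PROCESSING"
    try:
        # ... download source, run transformer, upload result ...
        job["transformed_document_id"] = str(uuid.uuid4())
        job["status"] = "COMPLETED"
    except Exception as e:
        job["status"] = "FAILED"
        job["error_message"] = str(e)
```

In the real route, `execute_job` is scheduled via FastAPI's background-task mechanism so the upload request can return 202 immediately with the job id and a status URL.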
sequenceDiagram
  participant Client
  participant API as Transform Jobs API
  participant DB

  Client->>API: GET /documents/transformations/{job_id}
  API->>DB: Read job by id
  DB-->>API: Job record
  API-->>Client: 200 OK (job)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

enhancement

Suggested reviewers

  • AkhileshNegi
  • avirajsingh7

Poem

I thump my paws on server logs,
A hop, a skip—convert the docs!
From PDF to text I leap,
While background tasks quietly creep.
Job IDs sprout like clover green,
Transform complete—so crisp, so clean! 🥕🐇


@kartpop kartpop marked this pull request as draft August 13, 2025 01:23

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 16

🔭 Outside diff range comments (1)
backend/app/core/doctransform/zerox_transformer.py (1)

40-40: Remove trailing empty line at end of file.

The pipeline failure indicates that the end-of-file-fixer modified this file. There's an unnecessary empty line at the end.

-            ) from e
-
+            ) from e
🧹 Nitpick comments (24)
backend/app/core/doctransform/test_transformer.py (1)

9-13: Silence Ruff ARG001 by marking the parameter intentionally unused.

Ruff has ARG001 enabled; the method parameter isn’t used. Rename it to underscore-prefixed to indicate intent.

-    def transform(self, input_path: Path) -> str:
+    def transform(self, _input_path: Path) -> str:
         return (
             "Lorem ipsum dolor sit amet, consectetur adipiscing elit, "
             "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
         )
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py (2)

10-10: Remove unused import (sqlmodel.sql.sqltypes).

This import isn’t used.

-import sqlmodel.sql.sqltypes

22-23: Optional: add an index on source_document_id to improve query performance.

If you plan to query by source_document_id (e.g., fetching derived documents), an index will help.

 def upgrade():
@@
-    op.add_column('document', sa.Column('source_document_id', sa.Uuid(), nullable=True))
-    op.create_foreign_key(
+    op.add_column('document', sa.Column('source_document_id', sa.Uuid(), nullable=True))
+    op.create_index('ix_document_source_document_id', 'document', ['source_document_id'])
+    op.create_foreign_key(
         'document_source_document_id_fkey',
         'document',
         'document',
         ['source_document_id'],
         ['id'],
         ondelete='SET NULL',
     )
@@
 def downgrade():
@@
-    op.drop_constraint('document_source_document_id_fkey', 'document', type_='foreignkey')
-    op.drop_column('document', 'source_document_id')
+    op.drop_constraint('document_source_document_id_fkey', 'document', type_='foreignkey')
+    op.drop_index('ix_document_source_document_id', table_name='document')
+    op.drop_column('document', 'source_document_id')

Also applies to: 31-34

backend/app/models/__init__.py (1)

6-6: Export DocTransformationJob in __all__ to clarify public API.

The static analysis tool correctly identifies that DocTransformationJob is imported but not explicitly used in this module. Since this is an __init__.py file that serves as a public API interface, you should either add it to an __all__ list to make the export explicit, or use it somewhere in the module.

Consider adding an __all__ list at the end of the file to explicitly define the public API:

__all__ = [
    # Auth
    "Token", "TokenPayload",
    # Collections
    "Collection", "DocumentCollection",
    # Documents
    "Document", "DocTransformationJob",
    # ... (other exports)
]
backend/app/core/doctransform/zerox_transformer.py (1)

1-5: Fix import order according to PEP 8.

The imports should be ordered according to PEP 8: standard library imports first, then third-party imports, then local imports.

-from asyncio import Runner
-import logging
-from pathlib import Path
-from .transformer import Transformer
-from pyzerox import zerox
+import logging
+from asyncio import Runner
+from pathlib import Path
+
+from pyzerox import zerox
+
+from .transformer import Transformer
backend/app/api/routes/doc_transformation_job.py (2)

3-6: Use modern type hints instead of deprecated typing.List.

The static analysis correctly identifies that typing.List is deprecated in favor of the built-in list type (available from Python 3.9+).

-from typing import List
-from fastapi import APIRouter
-from fastapi import Path as FastPath
-from fastapi import Query
+from fastapi import APIRouter, Query
+from fastapi import Path as FastPath

Then update Line 40:

-job_id_list: List[UUID] = [UUID(jid.strip()) for jid in job_ids.split(",") if jid.strip()]
+job_id_list: list[UUID] = [UUID(jid.strip()) for jid in job_ids.split(",") if jid.strip()]

41-43: Import HTTPException at module level.

HTTPException is imported inside the exception handler. This should be imported at the module level for consistency and better readability.

Add to imports at the top:

 from uuid import UUID
-from fastapi import APIRouter
+from fastapi import APIRouter, HTTPException
 from fastapi import Path as FastPath

Then remove the import from Line 42:

 except Exception:
-    from fastapi import HTTPException
     raise HTTPException(status_code=400, detail="Invalid job_ids format. Must be comma-separated UUIDs.")
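
The comma-separated id parsing this comment touches can be isolated as a small helper; this is a sketch, with the function name assumed, and the route maps the ValueError to the HTTP 400 shown above.

```python
from uuid import UUID


def parse_job_ids(job_ids: str) -> list[UUID]:
    """Parse a comma-separated UUID string, skipping empty segments.

    Raises ValueError on malformed input (the route translates this
    into an HTTPException with status 400).
    """
    return [UUID(jid.strip()) for jid in job_ids.split(",") if jid.strip()]
```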
backend/app/api/routes/documents.py (3)

3-3: Use modern type hints instead of deprecated typing imports.

Replace deprecated typing.List with built-in list type.

-from typing import List, Optional
+from typing import Optional

And update Line 32:

-    response_model=APIResponse[List[Document]],
+    response_model=APIResponse[list[Document]],

54-56: Use modern union type syntax for optional parameters.

Use X | None syntax instead of Optional[X] for Python 3.10+.

-    target_format: Optional[str] = Form(None),
-    transformer: Optional[str] = Form(None),
+    target_format: str | None = Form(None),
+    transformer: str | None = Form(None),

110-123: Consider returning job status information in a standardized format.

The response structure mixes document metadata with transformation job details. Consider using a more structured response format that clearly separates these concerns.

 # Compose response with full document metadata and job info
 response_data = {
-    "message": f"Document accepted for transformation from {source_format} to {target_format}.",
-    "original_document": APIResponse.success_response(source_document).data,
-    "transformation_job_id": str(job_id),
-    "source_format": source_format,
-    "target_format": target_format,
-    "transformer": actual_transformer,
-    "status_check_url": f"/documents/transformations/{job_id}"
+    "document": source_document,
+    "transformation": {
+        "job_id": str(job_id),
+        "status": "PENDING",
+        "source_format": source_format,
+        "target_format": target_format,
+        "transformer": actual_transformer,
+        "status_url": f"/api/v1/documents/transformations/{job_id}"
+    },
+    "message": f"Document uploaded successfully. Transformation from {source_format} to {target_format} has been queued."
 }
backend/app/alembic/versions/93b86c1246b1_create_doc_transformation_job_table.py (1)

22-33: Consider adding indexes for foreign key columns.

The foreign key columns source_document_id and transformed_document_id will likely be used in queries. Adding indexes would improve query performance.

Consider adding indexes after creating the table:

 def upgrade():
     # ### commands auto generated by Alembic - please adjust! ###
     op.create_table('doc_transformation_job',
     sa.Column('id', sa.Uuid(), nullable=False),
     sa.Column('source_document_id', sa.Uuid(), nullable=False),
     sa.Column('transformed_document_id', sa.Uuid(), nullable=True),
     sa.Column('status', sa.Enum('PENDING', 'PROCESSING', 'COMPLETED', 'FAILED', name='transformationstatus'), nullable=False),
     sa.Column('error_message', sqlmodel.sql.sqltypes.AutoString(), nullable=True),
     sa.Column('created_at', sa.DateTime(), nullable=False),
     sa.Column('updated_at', sa.DateTime(), nullable=False),
     sa.ForeignKeyConstraint(['source_document_id'], ['document.id'], ),
     sa.ForeignKeyConstraint(['transformed_document_id'], ['document.id'], ),
     sa.PrimaryKeyConstraint('id')
     )
+    # Add indexes for foreign keys and status for better query performance
+    op.create_index('ix_doc_transformation_job_source_document_id', 'doc_transformation_job', ['source_document_id'])
+    op.create_index('ix_doc_transformation_job_status', 'doc_transformation_job', ['status'])
     # ### end Alembic commands ###

And update downgrade accordingly:

 def downgrade():
     # ### commands auto generated by Alembic - please adjust! ###
+    op.drop_index('ix_doc_transformation_job_status', 'doc_transformation_job')
+    op.drop_index('ix_doc_transformation_job_source_document_id', 'doc_transformation_job')
     op.drop_table('doc_transformation_job')
+    op.execute("DROP TYPE IF EXISTS transformationstatus")
     # ### end Alembic commands ###
backend/app/models/doc_transformation_job.py (1)

14-23: Consider using modern Python type hints and apply formatting fixes.

The model structure is well-designed with appropriate field defaults and foreign key relationships. However, there are formatting issues and opportunities to modernize the type hints.

Apply these formatting and type annotation updates:

 class DocTransformationJob(SQLModel, table=True):
     __tablename__ = "doc_transformation_job"
 
     id: UUID = Field(default_factory=uuid4, primary_key=True)
     source_document_id: UUID = Field(foreign_key="document.id")
-    transformed_document_id: Optional[UUID] = Field(default=None, foreign_key="document.id")
+    transformed_document_id: UUID | None = Field(default=None, foreign_key="document.id")
     status: TransformationStatus = Field(default=TransformationStatus.PENDING)
-    error_message: Optional[str] = Field(default=None)
+    error_message: str | None = Field(default=None)
     created_at: datetime = Field(default_factory=now)
     updated_at: datetime = Field(default_factory=now)

Also ensure the file ends with a newline character to comply with the formatting standards detected by the CI pipeline.

backend/app/crud/doc_transformation_job.py (3)

1-10: Use modern Python type hints.

The imports are appropriate and the logger setup follows best practices. Consider modernizing the type hints.

Apply this diff to modernize type hints:

 import logging
 from uuid import UUID
-from typing import List, Optional
 from sqlmodel import Session, select
 from app.models.doc_transformation_job import DocTransformationJob, TransformationStatus
 from app.core.util import now
 from app.core.exception_handlers import HTTPException

29-48: Update type hints and consider simplifying the update logic.

The method correctly updates job status with optional fields. Consider modernizing type hints and a minor simplification.

Apply this diff to modernize type hints and simplify:

     def update_status(
         self,
         job_id: UUID,
         status: TransformationStatus,
         *,
-        error_message: Optional[str] = None,
-        transformed_document_id: Optional[UUID] = None,
+        error_message: str | None = None,
+        transformed_document_id: UUID | None = None,
     ) -> DocTransformationJob:
         job = self.read_one(job_id)
         job.status = status
         job.updated_at = now()
         if error_message is not None:
             job.error_message = error_message
         if transformed_document_id is not None:
             job.transformed_document_id = transformed_document_id
 
         self.session.add(job)
         self.session.commit()
         self.session.refresh(job)
         return job

50-52: Update return type annotation and fix formatting.

The pagination logic is correct. Update the type hint to use modern Python syntax.

Apply this diff to modernize the type hint:

-    def read_many(self, skip: int = 0, limit: int = 100) -> List[DocTransformationJob]:
+    def read_many(self, skip: int = 0, limit: int = 100) -> list[DocTransformationJob]:
         statement = select(DocTransformationJob).offset(skip).limit(limit)
         return self.session.exec(statement).all()

Also ensure the file ends with a newline character.

backend/app/core/doctransform/service.py (2)

23-40: Fix whitespace issues but the implementation is solid.

The function correctly creates a job and schedules the background task. Good practice extracting the user ID before passing to the background task.

Remove trailing whitespace from line 35:

     job = job_crud.create(source_document_id=source_document_id)
     logger.debug(f"Job created | job_id={job.id}")
-    
+
     # Extract the user ID before passing to background task

101-111: Consider using a dataclass or named tuple for FileUpload.

The inline class definition works but could be better organized.

Consider moving this to a module-level definition or using a dataclass:

from dataclasses import dataclass
from typing import BinaryIO

@dataclass
class FileUpload:
    filename: str
    file: BinaryIO
    content_type: str
backend/app/core/doctransform/registry.py (7)

1-6: Modernize type hints.

Update the import statements to use modern Python type hints.

Apply this diff:

 from pathlib import Path
-from typing import Type, Dict, Set, Tuple, Optional
+from typing import Optional

Then update all type annotations throughout the file to use built-in types (e.g., dict, set, tuple, type) instead of their typing module equivalents.


12-16: Update type annotations.

The transformer registry is well-structured with sensible defaults.

Update the type annotation:

 # Map transformer names to their classes
-TRANSFORMERS: Dict[str, Type[Transformer]] = {
+TRANSFORMERS: dict[str, type[Transformer]] = {
     "default": ZeroxTransformer,
     "test": TestTransformer,
     "zerox": ZeroxTransformer,
 }

19-27: Update type annotations and fix trailing whitespace.

The transformation mapping structure is logical and extensible.

Apply these fixes:

 # Define supported transformations: (source_format, target_format) -> [available_transformers]
-SUPPORTED_TRANSFORMATIONS: Dict[Tuple[str, str], Dict[str, str]] = {
+SUPPORTED_TRANSFORMATIONS: dict[tuple[str, str], dict[str, str]] = {
     ("pdf", "markdown"): {
         "default": "zerox",
         "zerox": "zerox",
     },
     # Future transformations can be added here
     # ("docx", "markdown"): {"default": "pandoc", "pandoc": "pandoc"},
     # ("html", "markdown"): {"default": "pandoc", "pandoc": "pandoc"},
 }

30-39: Fix trailing whitespace.

The extension mapping is comprehensive.

Remove trailing whitespace from line 33:

     ".pdf": "pdf",
     ".docx": "docx",
-    ".doc": "doc", 
+    ".doc": "doc",
     ".html": "html",

59-64: Update type annotations and fix trailing whitespace.

The function correctly transforms the data structure.

Apply these fixes:

-def get_supported_transformations() -> Dict[Tuple[str, str], Set[str]]:
+def get_supported_transformations() -> dict[tuple[str, str], set[str]]:
     """Get all supported transformation combinations."""
     return {
-        key: set(transformers.keys()) 
+        key: set(transformers.keys())
         for key, transformers in SUPPORTED_TRANSFORMATIONS.items()
     }

70-72: Update type annotation.

Simple and effective lookup function.

-def get_available_transformers(source_format: str, target_format: str) -> Dict[str, str]:
+def get_available_transformers(source_format: str, target_format: str) -> dict[str, str]:
     """Get available transformers for a specific transformation."""
     return SUPPORTED_TRANSFORMATIONS.get((source_format, target_format), {})

74-96: Fix whitespace issues and update type annotation.

Good validation logic with helpful error messages.

Apply these fixes:

-def resolve_transformer(source_format: str, target_format: str, transformer_name: Optional[str] = None) -> str:
+def resolve_transformer(source_format: str, target_format: str, transformer_name: str | None = None) -> str:
     """
     Resolve the actual transformer to use for a transformation.
     Returns the transformer name to use.
     """
     available_transformers = get_available_transformers(source_format, target_format)
-    
+
     if not available_transformers:
         raise ValueError(
             f"Transformation from {source_format} to {target_format} is not supported"
         )
-    
+
     if transformer_name is None:
         transformer_name = "default"
-    
+
     if transformer_name not in available_transformers:
         available = ", ".join(available_transformers.keys())
         raise ValueError(
             f"Transformer '{transformer_name}' not available for {source_format} to {target_format}. "
             f"Available: {available}"
         )
-    
+
     return available_transformers[transformer_name]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db872b9 and 7d78cdb.

⛔ Files ignored due to path filters (1)
  • backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py (1 hunks)
  • backend/app/alembic/versions/93b86c1246b1_create_doc_transformation_job_table.py (1 hunks)
  • backend/app/api/routes/doc_transformation_job.py (1 hunks)
  • backend/app/api/routes/documents.py (2 hunks)
  • backend/app/core/doctransform/registry.py (1 hunks)
  • backend/app/core/doctransform/service.py (1 hunks)
  • backend/app/core/doctransform/test_transformer.py (1 hunks)
  • backend/app/core/doctransform/transformer.py (1 hunks)
  • backend/app/core/doctransform/zerox_transformer.py (1 hunks)
  • backend/app/crud/doc_transformation_job.py (1 hunks)
  • backend/app/models/__init__.py (1 hunks)
  • backend/app/models/doc_transformation_job.py (1 hunks)
  • backend/app/models/document.py (2 hunks)
  • backend/pyproject.toml (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (13)
backend/app/core/doctransform/transformer.py (1)
backend/app/core/doctransform/zerox_transformer.py (1)
  • transform (15-38)
backend/app/alembic/versions/93b86c1246b1_create_doc_transformation_job_table.py (2)
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py (2)
  • upgrade (20-26)
  • downgrade (29-35)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
  • upgrade (20-36)
backend/app/models/document.py (3)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
  • upgrade (20-36)
backend/app/models/document_collection.py (1)
  • DocumentCollection (9-23)
backend/app/models/user.py (1)
  • User (48-60)
backend/app/core/doctransform/test_transformer.py (2)
backend/app/core/doctransform/transformer.py (1)
  • Transformer (4-12)
backend/app/core/doctransform/zerox_transformer.py (1)
  • transform (15-38)
backend/app/core/doctransform/zerox_transformer.py (2)
backend/app/core/doctransform/transformer.py (2)
  • Transformer (4-12)
  • transform (8-12)
backend/app/core/doctransform/test_transformer.py (1)
  • transform (9-13)
backend/app/api/routes/doc_transformation_job.py (3)
backend/app/crud/doc_transformation_job.py (1)
  • DocTransformationJobCrud (11-52)
backend/app/utils.py (2)
  • APIResponse (27-48)
  • success_response (34-37)
backend/app/api/routes/assistants.py (1)
  • update_assistant_route (74-93)
backend/app/crud/doc_transformation_job.py (1)
backend/app/models/doc_transformation_job.py (2)
  • DocTransformationJob (14-23)
  • TransformationStatus (8-12)
backend/app/models/doc_transformation_job.py (2)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
  • upgrade (20-36)
backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py (1)
  • upgrade (21-73)
backend/app/models/__init__.py (1)
backend/app/models/doc_transformation_job.py (1)
  • DocTransformationJob (14-23)
backend/app/core/doctransform/service.py (6)
backend/app/crud/doc_transformation_job.py (4)
  • DocTransformationJobCrud (11-52)
  • create (15-20)
  • update_status (29-48)
  • read_one (22-27)
backend/app/crud/document.py (1)
  • DocumentCrud (14-133)
backend/app/models/document.py (1)
  • Document (11-36)
backend/app/models/doc_transformation_job.py (1)
  • TransformationStatus (8-12)
backend/app/core/cloud/storage.py (1)
  • AmazonCloudStorage (121-206)
backend/app/core/doctransform/registry.py (1)
  • convert_document (98-114)
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py (3)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
  • upgrade (20-36)
backend/app/alembic/versions/e9dd35eff62c_add_openai_conversation_table.py (1)
  • upgrade (20-70)
backend/app/alembic/versions/66abc97f3782_user_id_from_uuid_to_int.py (1)
  • upgrade (20-152)
backend/app/core/doctransform/registry.py (3)
backend/app/core/doctransform/transformer.py (2)
  • Transformer (4-12)
  • transform (8-12)
backend/app/core/doctransform/test_transformer.py (2)
  • TestTransformer (4-13)
  • transform (9-13)
backend/app/core/doctransform/zerox_transformer.py (2)
  • ZeroxTransformer (7-38)
  • transform (15-38)
backend/app/api/routes/documents.py (5)
backend/app/crud/document.py (2)
  • DocumentCrud (14-133)
  • update (97-122)
backend/app/models/document.py (1)
  • Document (11-36)
backend/app/core/cloud/storage.py (3)
  • AmazonCloudStorage (121-206)
  • put (114-115)
  • put (126-151)
backend/app/core/doctransform/registry.py (4)
  • get_file_format (51-57)
  • is_transformation_supported (66-68)
  • get_available_transformers (70-72)
  • resolve_transformer (74-96)
backend/app/core/doctransform/service.py (1)
  • start_job (23-40)
🪛 GitHub Actions: AI Platform CI
backend/app/core/doctransform/transformer.py

[error] 1-1: Black formatting applied. Reformatted this file.

backend/app/alembic/versions/93b86c1246b1_create_doc_transformation_job_table.py

[error] 1-1: Black formatting applied. Reformatted this file.

backend/app/core/doctransform/test_transformer.py

[error] 1-1: Black formatting applied. Reformatted this file.

backend/app/core/doctransform/zerox_transformer.py

[error] 1-1: End-of-file-fixer modified this file.


[error] 1-1: Black formatting applied. Reformatted this file.

backend/app/api/routes/doc_transformation_job.py

[error] 1-1: Black formatting applied. Reformatted this file.

backend/app/crud/doc_transformation_job.py

[error] 1-1: Black formatting applied. Reformatted this file.

backend/app/models/doc_transformation_job.py

[error] 1-1: Black formatting applied. Reformatted this file.

backend/app/core/doctransform/service.py

[error] 1-1: End-of-file-fixer modified this file.


[error] 1-1: Trailing whitespace removed (hook: trailing-whitespace).


[error] 1-1: Black formatting applied. Reformatted this file.

backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py

[error] 1-1: Black formatting applied. Reformatted this file.

backend/app/core/doctransform/registry.py

[error] 1-1: Trailing whitespace removed (hook: trailing-whitespace).


[error] 1-1: Black formatting applied. Reformatted this file.

backend/app/api/routes/documents.py

[error] 1-1: Trailing whitespace removed (hook: trailing-whitespace).


[error] 1-1: Black formatting applied. Reformatted this file.

🪛 Ruff (0.12.2)
backend/app/models/document.py

30-30: Use X | None for type annotations

Convert to X | None

(UP045)

backend/app/api/routes/doc_transformation_job.py

6-6: typing.List is deprecated, use list instead

(UP035)


21-21: Unused function argument: current_user

(ARG001)


35-35: Unused function argument: current_user

(ARG001)


40-40: Use list instead of List for type annotation

Replace with list

(UP006)

backend/app/crud/doc_transformation_job.py

3-3: typing.List is deprecated, use list instead

(UP035)


34-34: Use X | None for type annotations

Convert to X | None

(UP045)


35-35: Use X | None for type annotations

Convert to X | None

(UP045)


50-50: Use list instead of List for type annotation

Replace with list

(UP006)

backend/app/models/doc_transformation_job.py

19-19: Use X | None for type annotations

Convert to X | None

(UP045)


21-21: Use X | None for type annotations

Convert to X | None

(UP045)

backend/app/models/__init__.py

6-6: .doc_transformation_job.DocTransformationJob imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

backend/app/core/doctransform/service.py

7-7: fastapi.UploadFile imported but unused

Remove unused import: fastapi.UploadFile

(F401)


9-9: sqlmodel.select imported but unused

Remove unused import: sqlmodel.select

(F401)


15-15: app.core.util.now imported but unused

Remove unused import: app.core.util.now

(F401)


35-35: Blank line contains whitespace

Remove whitespace from blank line

(W293)


67-67: Blank line contains whitespace

Remove whitespace from blank line

(W293)


138-138: No newline at end of file

Add trailing newline

(W292)

backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py

10-10: sqlmodel.sql.sqltypes imported but unused

Remove unused import: sqlmodel.sql.sqltypes

(F401)

backend/app/core/doctransform/registry.py

2-2: typing.Type is deprecated, use type instead

(UP035)


2-2: typing.Dict is deprecated, use dict instead

(UP035)


2-2: typing.Set is deprecated, use set instead

(UP035)


2-2: typing.Tuple is deprecated, use tuple instead

(UP035)


12-12: Use dict instead of Dict for type annotation

Replace with dict

(UP006)


12-12: Use type instead of Type for type annotation

Replace with type

(UP006)


19-19: Use dict instead of Dict for type annotation

Replace with dict

(UP006)


19-19: Use tuple instead of Tuple for type annotation

Replace with tuple

(UP006)


19-19: Use dict instead of Dict for type annotation

Replace with dict

(UP006)


30-30: Use dict instead of Dict for type annotation

Replace with dict

(UP006)


33-33: Trailing whitespace

Remove trailing whitespace

(W291)


42-42: Use dict instead of Dict for type annotation

Replace with dict

(UP006)


59-59: Use dict instead of Dict for type annotation

Replace with dict

(UP006)


59-59: Use tuple instead of Tuple for type annotation

Replace with tuple

(UP006)


59-59: Use set instead of Set for type annotation

Replace with set

(UP006)


62-62: Trailing whitespace

Remove trailing whitespace

(W291)


70-70: Use dict instead of Dict for type annotation

Replace with dict

(UP006)


74-74: Use X | None for type annotations

Convert to X | None

(UP045)


80-80: Blank line contains whitespace

Remove whitespace from blank line

(W293)


85-85: Blank line contains whitespace

Remove whitespace from blank line

(W293)


88-88: Blank line contains whitespace

Remove whitespace from blank line

(W293)


95-95: Blank line contains whitespace

Remove whitespace from blank line

(W293)

backend/app/api/routes/documents.py

3-3: typing.List is deprecated, use list instead

(UP035)


19-19: Trailing whitespace

Remove trailing whitespace

(W291)


20-20: Trailing whitespace

Remove trailing whitespace

(W291)


55-55: Use X | None for type annotations

Convert to X | None

(UP045)


56-56: Use X | None for type annotations

Convert to X | None

(UP045)

🔇 Additional comments (11)
backend/app/core/doctransform/transformer.py (1)

4-12: Solid abstraction for the transformer interface.

The ABC is clean and sets a clear contract. No issues spotted.

backend/app/models/doc_transformation_job.py (2)

1-6: LGTM!

The imports are well-organized and all are used in the code. The use of UUID for primary keys and the `now` utility function for timestamps are good practices.


8-12: LGTM!

The enum definition follows Python best practices by inheriting from both str and enum.Enum, providing both type safety and string compatibility. The status values comprehensively cover the job lifecycle.
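The str-plus-Enum pattern praised above can be sketched as follows (the member names here are illustrative, not taken from the PR):

```python
import enum


class TransformationStatus(str, enum.Enum):
    # Illustrative lifecycle values; the PR's actual members aren't shown here.
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
```

Because each member is also a `str`, values compare equal to plain strings and serialize cleanly in JSON responses, while lookup by value (`TransformationStatus("completed")`) keeps type safety.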

backend/app/crud/doc_transformation_job.py (3)

11-14: LGTM!

Clean class initialization that properly stores the session for use in CRUD operations.


15-20: LGTM!

The create method correctly creates and persists a new job with proper transaction handling.


22-27: LGTM!

Good error handling with appropriate logging and HTTP exception raising when a job is not found.

backend/app/core/doctransform/service.py (2)

42-48: LGTM with retry strategy!

Excellent use of the retry decorator with exponential backoff for handling transient failures. The retry parameters (3 attempts, 5-10 second waits) are reasonable for document transformation operations.
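The retry-with-backoff pattern described above can be sketched with a stdlib-only decorator (the PR's actual decorator isn't shown here, so the name and signature below are assumptions):

```python
import functools
import logging
import time


def retry(attempts: int = 3, wait_min: float = 5.0, wait_max: float = 10.0):
    """Retry a function on failure with exponential backoff clamped to [wait_min, wait_max]."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    if attempt == attempts:
                        raise  # out of attempts: propagate the last error
                    # Double the wait each attempt, but stay inside the configured window.
                    delay = min(max(wait_min * 2 ** (attempt - 1), wait_min), wait_max)
                    logging.warning(
                        "attempt %d/%d failed: %s; retrying in %.1fs",
                        attempt, attempts, exc, delay,
                    )
                    time.sleep(delay)
        return wrapper
    return decorator
```

Libraries such as tenacity provide the same behavior off the shelf; the sketch only illustrates why 3 attempts with 5-10 second waits is a sane window for transformation work.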


130-138: Good error handling pattern!

Excellent error handling with proper logging and status updates. The nested try-except ensures job status is updated even if there are database errors.

Add a newline at the end of the file to comply with formatting standards.

backend/app/core/doctransform/registry.py (3)

8-10: LGTM!

Good custom exception class for transformation-specific errors.


51-57: LGTM!

Good error handling with a clear error message for unsupported file extensions.


98-114: Excellent error handling pattern!

The function properly wraps exceptions with context and provides helpful error messages. The error chaining with from e preserves the original stack trace.

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column('document', sa.Column('source_document_id', sa.Uuid(), nullable=True))
    op.create_foreign_key(None, 'document', 'document', ['source_document_id'], ['id'])

⚠️ Potential issue

Name the FK and fix downgrade: dropping a FK with name=None will fail.

Alembic can’t drop a constraint by passing None. Provide an explicit name when creating the FK and use the same name when dropping it. Also consider ondelete="SET NULL" for a nullable self-reference.

-    op.create_foreign_key(None, 'document', 'document', ['source_document_id'], ['id'])
+    op.create_foreign_key(
+        'document_source_document_id_fkey',
+        'document',
+        'document',
+        ['source_document_id'],
+        ['id'],
+        ondelete='SET NULL',
+    )
@@
-    op.drop_constraint(None, 'document', type_='foreignkey')
+    op.drop_constraint('document_source_document_id_fkey', 'document', type_='foreignkey')

Also applies to: 33-34

🤖 Prompt for AI Agents
In
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py
around lines 23 and 33-34, the migration creates a foreign key with name=None
which cannot be dropped in downgrade; update the op.create_foreign_key call to
give the FK an explicit name (e.g. "fk_document_source_document_id"), add
ondelete="SET NULL" since source_document_id is nullable, and update the
downgrade to call op.drop_constraint with that explicit name and
constraint_type="foreignkey" so the constraint can be reliably removed.

Comment on lines +24 to +25
op.drop_constraint('openai_conversation_organization_id_fkey1', 'openai_conversation', type_='foreignkey')
op.drop_constraint('openai_conversation_project_id_fkey1', 'openai_conversation', type_='foreignkey')

⚠️ Potential issue

Do not drop unrelated foreign keys on openai_conversation in this migration.

These drops are unrelated and will remove referential integrity for openai_conversation. They also aren’t re-created in this upgrade. This is likely an autogenerate artifact and should be removed.

Apply this diff:

-    op.drop_constraint('openai_conversation_organization_id_fkey1', 'openai_conversation', type_='foreignkey')
-    op.drop_constraint('openai_conversation_project_id_fkey1', 'openai_conversation', type_='foreignkey')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
op.drop_constraint('openai_conversation_organization_id_fkey1', 'openai_conversation', type_='foreignkey')
op.drop_constraint('openai_conversation_project_id_fkey1', 'openai_conversation', type_='foreignkey')
🤖 Prompt for AI Agents
In
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py
around lines 24-25, the migration is dropping openai_conversation foreign key
constraints 'openai_conversation_organization_id_fkey1' and
'openai_conversation_project_id_fkey1' which are unrelated to adding
source_document_id; remove those op.drop_constraint lines from the upgrade (and
any matching re-create in downgrade if present) so the migration only modifies
constraints/tables related to the document/source_document_id change, preserving
referential integrity for openai_conversation.

Comment on lines +37 to +40
def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('doc_transformation_job')
    # ### end Alembic commands ###

⚠️ Potential issue

Add proper cleanup of enum type in downgrade.

The downgrade function drops the table but doesn't clean up the transformationstatus enum type that was created. This could cause issues if the migration is rolled back and then re-applied.

 def downgrade():
     # ### commands auto generated by Alembic - please adjust! ###
     op.drop_table('doc_transformation_job')
+    # Drop the enum type
+    op.execute("DROP TYPE IF EXISTS transformationstatus")
     # ### end Alembic commands ###
📝 Committable suggestion


Suggested change
def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('doc_transformation_job')
    # ### end Alembic commands ###

def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('doc_transformation_job')
    # Drop the enum type
    op.execute("DROP TYPE IF EXISTS transformationstatus")
    # ### end Alembic commands ###
🤖 Prompt for AI Agents
In
backend/app/alembic/versions/93b86c1246b1_create_doc_transformation_job_table.py
around lines 37 to 40, the downgrade drops the doc_transformation_job table but
does not remove the Postgres enum type `transformationstatus`; after
op.drop_table('doc_transformation_job') add a command to drop the enum (for
example: op.execute("DROP TYPE IF EXISTS transformationstatus")) so the enum is
cleaned up on downgrade and re-creation of the migration will not fail.

Comment on lines +19 to +26
def get_transformation_job(
    session: SessionDep,
    current_user: CurrentUser,
    job_id: UUID = FastPath(description="Transformation job ID"),
):
    crud = DocTransformationJobCrud(session)
    job = crud.read_one(job_id)
    return APIResponse.success_response(job)

⚠️ Potential issue

Remove unused current_user parameter or add authorization logic.

The current_user parameter is not used in this function. Either remove it if authorization is not needed, or add proper authorization checks to ensure users can only access their own transformation jobs.

If authorization is intended:

 def get_transformation_job(
     session: SessionDep,
     current_user: CurrentUser,
     job_id: UUID = FastPath(description="Transformation job ID"),
 ):
     crud = DocTransformationJobCrud(session)
     job = crud.read_one(job_id)
+    # TODO: Verify the user has permission to view this job
+    # This would require checking if the job's source_document belongs to current_user
     return APIResponse.success_response(job)

Or remove if not needed:

 def get_transformation_job(
     session: SessionDep,
-    current_user: CurrentUser,
     job_id: UUID = FastPath(description="Transformation job ID"),
 ):
📝 Committable suggestion


Suggested change
def get_transformation_job(
    session: SessionDep,
    current_user: CurrentUser,
    job_id: UUID = FastPath(description="Transformation job ID"),
):
    crud = DocTransformationJobCrud(session)
    job = crud.read_one(job_id)
    return APIResponse.success_response(job)

def get_transformation_job(
    session: SessionDep,
    job_id: UUID = FastPath(description="Transformation job ID"),
):
    crud = DocTransformationJobCrud(session)
    job = crud.read_one(job_id)
    return APIResponse.success_response(job)
🧰 Tools
🪛 Ruff (0.12.2)

21-21: Unused function argument: current_user

(ARG001)

Comment on lines +28 to +45
@router.get(
    "/",
    description="Get the status and details of multiple document transformation jobs by IDs.",
    response_model=APIResponse,
)
def get_multiple_transformation_jobs(
    session: SessionDep,
    current_user: CurrentUser,
    job_ids: str = Query(..., description="Comma-separated list of transformation job IDs"),
):
    crud = DocTransformationJobCrud(session)
    try:
        job_id_list: List[UUID] = [UUID(jid.strip()) for jid in job_ids.split(",") if jid.strip()]
    except Exception:
        from fastapi import HTTPException
        raise HTTPException(status_code=400, detail="Invalid job_ids format. Must be comma-separated UUIDs.")
    jobs = [crud.read_one(job_id) for job_id in job_id_list]
    return APIResponse.success_response(jobs)

⚠️ Potential issue

Add authorization check and improve error handling for batch retrieval.

Similar to the single job endpoint, the current_user parameter is unused. Additionally, the batch retrieval could fail if any individual job is not found, which would cause the entire request to fail.

 def get_multiple_transformation_jobs(
     session: SessionDep,
     current_user: CurrentUser,
     job_ids: str = Query(..., description="Comma-separated list of transformation job IDs"),
 ):
     crud = DocTransformationJobCrud(session)
     try:
         job_id_list: list[UUID] = [UUID(jid.strip()) for jid in job_ids.split(",") if jid.strip()]
-    except Exception:
-        from fastapi import HTTPException
+    except (ValueError, AttributeError) as e:
         raise HTTPException(status_code=400, detail="Invalid job_ids format. Must be comma-separated UUIDs.")
-    jobs = [crud.read_one(job_id) for job_id in job_id_list]
+    
+    jobs = []
+    for job_id in job_id_list:
+        try:
+            job = crud.read_one(job_id)
+            # TODO: Verify user has permission to view this job
+            jobs.append(job)
+        except HTTPException as e:
+            # Log the error but continue processing other jobs
+            logger.warning(f"Failed to fetch job {job_id}: {e.detail}")
+            # Optionally include an error placeholder in the response
+            
     return APIResponse.success_response(jobs)

Would you like me to implement proper authorization checks that verify document ownership?

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.12.2)

35-35: Unused function argument: current_user

(ARG001)


40-40: Use list instead of List for type annotation

Replace with list

(UP006)
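The hardened parsing suggested above can also be factored into a small helper; a minimal stdlib sketch (the helper name is illustrative, not part of the PR):

```python
from uuid import UUID


def parse_uuid_list(raw: str) -> list[UUID]:
    """Parse a comma-separated string of UUIDs, rejecting any invalid entry."""
    try:
        return [UUID(part.strip()) for part in raw.split(",") if part.strip()]
    except ValueError as exc:
        raise ValueError("Invalid job_ids format. Must be comma-separated UUIDs.") from exc
```

As a design alternative, FastAPI can parse repeated query parameters directly with `job_ids: list[UUID] = Query(...)` (i.e. `?job_ids=<id>&job_ids=<id>`), which avoids manual splitting and returns a 422 with field-level detail on bad input.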

fname_no_ext = Path(source_doc.fname).stem
target_extension = FORMAT_TO_EXTENSION.get(target_format, f".{target_format}")
transformed_doc_id = uuid4()
tmp_out = tmp_dir / f"<transformed>{fname_no_ext}{target_extension}"

⚠️ Potential issue

Security concern: Avoid angle brackets in filenames.

Using <transformed> in the filename could cause issues with file systems or tools that interpret angle brackets specially.

Replace the angle brackets with a safer prefix:

-            tmp_out = tmp_dir / f"<transformed>{fname_no_ext}{target_extension}"
+            tmp_out = tmp_dir / f"transformed_{fname_no_ext}{target_extension}"
📝 Committable suggestion


Suggested change
tmp_out = tmp_dir / f"<transformed>{fname_no_ext}{target_extension}"
tmp_out = tmp_dir / f"transformed_{fname_no_ext}{target_extension}"
🤖 Prompt for AI Agents
In backend/app/core/doctransform/service.py around line 87, the code embeds
"<transformed>" in the temporary filename which uses angle brackets that can
break filesystems or tools; change the prefix to a safe string (e.g.,
"transformed_" or "transformed-"), avoid special characters, and if appropriate
run a small sanitization on fname_no_ext to ensure the resulting filename
contains only cross-platform-safe characters before joining with
target_extension.
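The sanitization the prompt suggests can be a one-line helper; a minimal sketch (the function name is illustrative):

```python
import re


def safe_transformed_name(stem: str, extension: str) -> str:
    """Build a cross-platform-safe output filename with a plain prefix."""
    # Keep only letters, digits, dot, dash, and underscore in the stem.
    clean_stem = re.sub(r"[^A-Za-z0-9._-]", "_", stem)
    return f"transformed_{clean_stem}{extension}"
```

This replaces the angle-bracket marker with an underscore prefix and strips any other characters that shells or file systems treat specially.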

Comment on lines +15 to +28
def transform(self, input_path: Path) -> str:
    logging.info(f"ZeroxTransformer: {input_path} (model={self.model})")
    try:
        with Runner() as runner:
            result = runner.run(zerox(
                file_path=str(input_path),
                model=self.model,
            ))
        if result is None or not hasattr(result, "pages") or result.pages is None:
            raise RuntimeError("Zerox returned no pages. This may indicate a PDF/image conversion failure (is Poppler installed and in PATH?)")
        output = '\n\n'.join(x.content for x in result.pages)
        if not output:
            raise ValueError('Empty output from zerox')
        return output

🛠️ Refactor suggestion

Add input validation and improve error context.

The function should validate that the input file exists before attempting transformation. Also, the error handling could be more specific about different failure scenarios.

 def transform(self, input_path: Path) -> str:
+    if not input_path.exists():
+        raise FileNotFoundError(f"Input file does not exist: {input_path}")
+    if not input_path.is_file():
+        raise ValueError(f"Input path is not a file: {input_path}")
+        
     logging.info(f"ZeroxTransformer: {input_path} (model={self.model})")
     try:
         with Runner() as runner:
             result = runner.run(zerox(
                 file_path=str(input_path),
                 model=self.model,
             ))
         if result is None or not hasattr(result, "pages") or result.pages is None:
             raise RuntimeError("Zerox returned no pages. This may indicate a PDF/image conversion failure (is Poppler installed and in PATH?)")
         output = '\n\n'.join(x.content for x in result.pages)
         if not output:
             raise ValueError('Empty output from zerox')
         return output
📝 Committable suggestion


Suggested change
def transform(self, input_path: Path) -> str:
    logging.info(f"ZeroxTransformer: {input_path} (model={self.model})")
    try:
        with Runner() as runner:
            result = runner.run(zerox(
                file_path=str(input_path),
                model=self.model,
            ))
        if result is None or not hasattr(result, "pages") or result.pages is None:
            raise RuntimeError("Zerox returned no pages. This may indicate a PDF/image conversion failure (is Poppler installed and in PATH?)")
        output = '\n\n'.join(x.content for x in result.pages)
        if not output:
            raise ValueError('Empty output from zerox')
        return output

def transform(self, input_path: Path) -> str:
    if not input_path.exists():
        raise FileNotFoundError(f"Input file does not exist: {input_path}")
    if not input_path.is_file():
        raise ValueError(f"Input path is not a file: {input_path}")
    logging.info(f"ZeroxTransformer: {input_path} (model={self.model})")
    try:
        with Runner() as runner:
            result = runner.run(zerox(
                file_path=str(input_path),
                model=self.model,
            ))
        if result is None or not hasattr(result, "pages") or result.pages is None:
            raise RuntimeError("Zerox returned no pages. This may indicate a PDF/image conversion failure (is Poppler installed and in PATH?)")
        output = '\n\n'.join(x.content for x in result.pages)
        if not output:
            raise ValueError('Empty output from zerox')
        return output
🤖 Prompt for AI Agents
In backend/app/core/doctransform/zerox_transformer.py around lines 15 to 28, add
input validation to check that input_path.exists() and input_path.is_file() (and
optionally os.access for readability) and raise a clear FileNotFoundError or
PermissionError if the checks fail before creating the Runner; keep the existing
with Runner() block but narrow the try/except to only wrap runner.run(...) and
catch specific exceptions (e.g., RuntimeError/ValueError/Exception from zerox or
Runner), then re-raise or raise a new RuntimeError that includes contextual
details (input_path, model, and the original exception message) so callers get
actionable info about whether the failure was due to a missing file, unreadable
file, empty pages, or a conversion error.

@@ -1,5 +1,6 @@
from uuid import UUID, uuid4
from datetime import datetime
from typing import Optional

🛠️ Refactor suggestion

Use modern union types and add safe FK behavior (SET NULL) for self-reference.

  • Be consistent with Python 3.11 unions (you already use datetime | None at Line 29). This also satisfies Ruff’s UP045 hint.
  • Consider setting ondelete="SET NULL" to avoid cascaded deletions wiping derived documents if the source is removed (safer default for a nullable self-reference).
  • Optional: add an index on source_document_id for lookup performance.

Apply this diff:

-from typing import Optional
+from typing import Optional  # will be removed below if using PEP 604 unions

@@
-    source_document_id: Optional[UUID] = Field(
-        default=None,
-        foreign_key="document.id",
-        nullable=True,
-    )
+    source_document_id: UUID | None = Field(
+        default=None,
+        foreign_key="document.id",
+        nullable=True,
+        ondelete="SET NULL",
+        index=True,
+    )

If you adopt the PEP 604 union above, remove the now-unused Optional import:

-from typing import Optional

Also applies to: 30-34

🤖 Prompt for AI Agents
In backend/app/models/document.py around lines 3 and 30-34, the file imports
Optional but elsewhere uses PEP 604 unions; replace Optional usage with the
modern X | None annotation, remove the now-unused Optional import at line 3,
update the self-referential ForeignKey for source_document_id to include
ondelete="SET NULL" (and ensure the column is nullable), and add an index on
source_document_id (e.g., sa.Index or index=True on the Column) to improve
lookup performance; keep type annotations consistent and run linters to confirm
UP045 is resolved.

Comment thread backend/pyproject.toml
version = "0.1.0"
description = ""
requires-python = ">=3.10,<4.0"
requires-python = ">=3.11,<4.0"

🛠️ Refactor suggestion

Align Ruff target version with the new Python requirement (3.11).

You bumped requires-python to >=3.11, but Ruff still targets py310. This mismatch can lead to missed modernization fixes (e.g., PEP 604 unions) and inconsistent linting behavior.

Apply this diff to align Ruff with Python 3.11:

 [project]
-requires-python = ">=3.11,<4.0"
+requires-python = ">=3.11,<4.0"

 [tool.ruff]
-target-version = "py310"
+target-version = "py311"

Also applies to: 57-57

🤖 Prompt for AI Agents
In backend/pyproject.toml around lines 5 and 57, the project sets
requires-python = ">=3.11,<4.0" but Ruff is still targeting py310; update the
Ruff configuration by replacing any target-version entries (e.g., "py310") with
"py311" so the linter target matches the declared Python requirement, and ensure
all occurrences of "py310" in the file are changed to "py311".

Comment thread backend/pyproject.toml
"openai_responses",
"langfuse>=2.60.3",
"asgi-correlation-id>=4.3.4",
"py-zerox>=0.0.7,<1.0.0"

💡 Verification agent

🧩 Analysis chain

Document and provision OS dependencies for py-zerox (Poppler) and verify image compatibility.

py-zerox typically requires Poppler to be installed on the host/container. Without it, runtime will fail (as also hinted by ZeroxTransformer’s error messaging). Ensure:

  • Dockerfile(s) install Poppler (e.g., poppler-utils on Debian/Ubuntu).
  • CI runners and dev instructions are updated.

Run this script to check if your repo’s Dockerfiles already install Poppler and if a Python 3.11 base image is used:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for Dockerfiles..."
fd -i --glob "Dockerfile*" || true

echo -e "\nChecking for Poppler installation mentions..."
rg -n "(poppler|poppler-utils|libpoppler|poppler-data)" -S || true

echo -e "\nChecking for Python 3.11 base images in Dockerfiles..."
rg -n "python:3\.11" -S || true

echo -e "\nChecking whether declared dependency versions exist on PyPI..."
echo "py-zerox latest:"
curl -s https://pypi.org/pypi/py-zerox/json | jq -r '.info.version'
echo "boto3 latest:"
curl -s https://pypi.org/pypi/boto3/json | jq -r '.info.version'

Length of output: 1318


Install Poppler in Dockerfile & update docs
We ran a quick check and found that:

  • backend/Dockerfile exists but doesn’t install any Poppler packages
  • it isn’t using a Python 3.11 base image (if you intend to target 3.11)
  • PyPI shows py-zerox 0.0.7, matching your >=0.0.7,<1.0.0 declaration

Action items:

  • In backend/Dockerfile, after apt-get update install Poppler tools. For example:
     FROM python:3.11-slim            # or your current base
    +RUN apt-get update \
    +    && apt-get install -y poppler-utils libpoppler-dev poppler-data \
    +    && rm -rf /var/lib/apt/lists/*
  • If you’re standardizing on Python 3.11, bump the FROM line to python:3.11-slim (or equivalent).
  • Update CI definitions and local‐dev docs (README/CONTRIBUTING.md) to call out the OS dependency on Poppler (and how to install it on macOS/Linux).
  • Optionally, add a quick smoke test in CI to verify that pdfimages (from poppler-utils) is in PATH.
🤖 Prompt for AI Agents
In backend/pyproject.toml around line 32, the dependency on py-zerox is fine but
the Dockerfile and docs must be updated: modify backend/Dockerfile to use a
Python 3.11 base (e.g. change FROM to python:3.11-slim if targeting 3.11) and,
after the apt-get update step, install Poppler (e.g. poppler-utils or
distro-equivalent) so pdf tools like pdfimages are available; update CI
definitions and README/CONTRIBUTING.md to document the OS dependency and
installation steps for macOS/Linux; and optionally add a small CI smoke test
that checks pdfimages is in PATH to prevent regressions.
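The optional smoke test suggested above can be a thin wrapper over `shutil.which`; a minimal sketch:

```python
import shutil


def has_tool(name: str) -> bool:
    """Return True if an executable with this name is on PATH."""
    return shutil.which(name) is not None
```

In CI, the check would fail the build when `has_tool("pdfimages")` is False, surfacing a missing poppler-utils install before it breaks a transformation at runtime.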

@avirajsingh7 avirajsingh7 force-pushed the feature/doc-transform branch from f0b60b8 to 2dd5009 Compare August 25, 2025 08:05
@avirajsingh7 avirajsingh7 changed the base branch from main to update/document_module August 29, 2025 10:23
Base automatically changed from update/document_module to main September 1, 2025 05:57
…d Poppler installation (#342)

* add doc transformation endpoint to router main.py

* Handle response of upload document and validate request first than start upload

* Modify docker file to install poppler

* fix alembic migration order

* use fastapi UploadFile

* use model_validate in response

* Option to include signed url in get document endpoint

* speciify response type in doc transformation route

* logs reviewed and added

* logs modified

* resolve alembic head and remove duplicate get_signed url from cloud

* update background job to use project id instead of user id

* fix DocTransformationJobCrud and routes

* unit test for doc transformation crud

* extend unit test for upload document endpoint

* fix background job to use different session

* delete temp directories after job completion

* test for background job for transformer

* configure test_service properly

* fix migrations head

* return signed url in upload response

* fix testcases

* pass filepath to storage

* Instead of string return path of output file after transformation
@avirajsingh7
Collaborator

Closing this one, created new PR #363

@AkhileshNegi AkhileshNegi deleted the feature/doc-transform branch April 2, 2026 04:02