Feature/doc transform #332
- …document_id field
- …h; implement document transformation job handling
- …date target format and return job info
- …former for testing purposes
- …etection, validate transformations, and improve transformer resolution logic.
Walkthrough

Adds document transformation capability: new models and migrations, CRUD helpers, FastAPI routes for starting and querying jobs, a background service to execute transformations, and a transformer framework (abstract base, registry, and test and Zerox-backed implementations). The upload endpoint now optionally triggers async transformation jobs. The dependency set and Python version are updated.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as Documents API
    participant Service as Transform Service
    participant DB
    participant Storage
    Client->>API: POST /documents/upload (file, target_format, transformer?)
    API->>DB: Create DocTransformationJob (PENDING)
    API->>Service: Schedule execute_job(user_id, job_id, transformer, target_format)
    API-->>Client: 202 Accepted (job_id, status_check_url, source doc info)
    Service->>DB: Mark job PROCESSING
    Service->>DB: Load source Document
    Service->>Storage: Download source file
    Service->>Service: convert_document(source, transformer)
    Service->>Storage: Upload transformed file
    Service->>DB: Create transformed Document (source_document_id set)
    Service->>DB: Mark job COMPLETED (transformed_document_id)
```

```mermaid
sequenceDiagram
    participant Client
    participant API as Transform Jobs API
    participant DB
    Client->>API: GET /documents/transformations/{job_id}
    API->>DB: Read job by id
    DB-->>API: Job record
    API-->>Client: 200 OK (job)
```
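The transformer framework the walkthrough describes can be sketched minimally. The class and registry names below mirror the PR, but the signatures and the `convert_document` lookup helper are illustrative assumptions, not the actual implementation:

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Transformer(ABC):
    """Abstract base: subclasses convert a source file to output text."""

    @abstractmethod
    def transform(self, input_path: Path) -> str:
        """Return the transformed document content as a string."""


class TestTransformer(Transformer):
    """Stub transformer that ignores its input and returns fixed text."""

    def transform(self, input_path: Path) -> str:
        return "Lorem ipsum dolor sit amet, consectetur adipiscing elit."


# Registry mapping transformer names to classes, as in registry.py.
TRANSFORMERS: dict[str, type[Transformer]] = {
    "test": TestTransformer,
}


def convert_document(input_path: Path, transformer_name: str) -> str:
    # Resolve the transformer class by name, then run it.
    cls = TRANSFORMERS.get(transformer_name)
    if cls is None:
        raise ValueError(f"Unknown transformer: {transformer_name}")
    return cls().transform(input_path)
```

The registry indirection is what lets the upload endpoint accept a `transformer` form field and defer the choice of implementation to the background service.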
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 16
🔭 Outside diff range comments (1)
backend/app/core/doctransform/zerox_transformer.py (1)
40-40: Remove trailing empty line at end of file.

The pipeline failure indicates that the end-of-file-fixer modified this file. There is an unnecessary empty line at the end.
```diff
-    ) from e
-
+    ) from e
```
🧹 Nitpick comments (24)
backend/app/core/doctransform/test_transformer.py (1)
9-13: Silence Ruff ARG001 by marking the parameter intentionally unused.

Ruff has ARG001 enabled and the method parameter isn't used. Rename it with an underscore prefix to indicate intent.
```diff
-    def transform(self, input_path: Path) -> str:
+    def transform(self, _input_path: Path) -> str:
         return (
             "Lorem ipsum dolor sit amet, consectetur adipiscing elit, "
             "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
         )
```

backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py (2)
10-10: Remove unused import (sqlmodel.sql.sqltypes).

This import isn't used.
```diff
-import sqlmodel.sql.sqltypes
```
22-23: Optional: add an index on source_document_id to improve query performance.

If you plan to query by source_document_id (e.g., fetching derived documents), an index will help.
```diff
 def upgrade():
@@
-    op.add_column('document', sa.Column('source_document_id', sa.Uuid(), nullable=True))
-    op.create_foreign_key(
+    op.add_column('document', sa.Column('source_document_id', sa.Uuid(), nullable=True))
+    op.create_index('ix_document_source_document_id', 'document', ['source_document_id'])
+    op.create_foreign_key(
         'document_source_document_id_fkey',
         'document',
         'document',
         ['source_document_id'],
         ['id'],
         ondelete='SET NULL',
     )
@@
 def downgrade():
@@
-    op.drop_constraint('document_source_document_id_fkey', 'document', type_='foreignkey')
-    op.drop_column('document', 'source_document_id')
+    op.drop_constraint('document_source_document_id_fkey', 'document', type_='foreignkey')
+    op.drop_index('ix_document_source_document_id', table_name='document')
+    op.drop_column('document', 'source_document_id')
```

Also applies to: 31-34
backend/app/models/__init__.py (1)
6-6: Export `DocTransformationJob` in `__all__` to clarify the public API.

The static analysis tool correctly identifies that `DocTransformationJob` is imported but not explicitly used in this module. Since this is an `__init__.py` file that serves as a public API interface, either add it to an `__all__` list to make the export explicit, or use it somewhere in the module.

Consider adding an `__all__` list at the end of the file to explicitly define the public API:

```python
__all__ = [
    # Auth
    "Token",
    "TokenPayload",
    # Collections
    "Collection",
    "DocumentCollection",
    # Documents
    "Document",
    "DocTransformationJob",
    # ... (other exports)
]
```

backend/app/core/doctransform/zerox_transformer.py (1)
1-5: Fix import order according to PEP 8.

The imports should be ordered per PEP 8: standard library imports first, then third-party imports, then local imports.
```diff
-from asyncio import Runner
-import logging
-from pathlib import Path
-from .transformer import Transformer
-from pyzerox import zerox
+import logging
+from asyncio import Runner
+from pathlib import Path
+
+from pyzerox import zerox
+
+from .transformer import Transformer
```

backend/app/api/routes/doc_transformation_job.py (2)
3-6: Use modern type hints instead of deprecated `typing.List`.

The static analysis correctly identifies that `typing.List` is deprecated in favor of the built-in `list` type (available from Python 3.9+).

```diff
-from typing import List
-from fastapi import APIRouter
-from fastapi import Path as FastPath
-from fastapi import Query
+from fastapi import APIRouter, Query
+from fastapi import Path as FastPath
```

Then update line 40:

```diff
-job_id_list: List[UUID] = [UUID(jid.strip()) for jid in job_ids.split(",") if jid.strip()]
+job_id_list: list[UUID] = [UUID(jid.strip()) for jid in job_ids.split(",") if jid.strip()]
```
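The comma-separated `job_ids` parsing discussed above can be sketched as a standalone helper; the function name is hypothetical, but the split/strip/`UUID` logic follows the route code quoted in the review:

```python
from uuid import UUID


def parse_job_ids(job_ids: str) -> list[UUID]:
    """Parse a comma-separated string of UUIDs, skipping empty segments."""
    try:
        return [UUID(jid.strip()) for jid in job_ids.split(",") if jid.strip()]
    except ValueError as e:
        # Re-raise with the route's user-facing message, preserving the cause.
        raise ValueError(
            "Invalid job_ids format. Must be comma-separated UUIDs."
        ) from e
```

Factoring the parsing out like this also makes the 400-on-bad-input behavior unit-testable without spinning up the FastAPI app.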
41-43: Import HTTPException at module level.

HTTPException is imported inside the exception handler; import it at module level for consistency and better readability.
Add to imports at the top:
```diff
 from uuid import UUID

-from fastapi import APIRouter
+from fastapi import APIRouter, HTTPException
 from fastapi import Path as FastPath
```

Then remove the import from line 42:

```diff
 except Exception:
-    from fastapi import HTTPException
     raise HTTPException(status_code=400, detail="Invalid job_ids format. Must be comma-separated UUIDs.")
```

backend/app/api/routes/documents.py (3)
3-3: Use modern type hints instead of deprecated `typing` imports.

Replace deprecated `typing.List` with the built-in `list` type.

```diff
-from typing import List, Optional
+from typing import Optional
```

And update line 32:

```diff
-    response_model=APIResponse[List[Document]],
+    response_model=APIResponse[list[Document]],
```
54-56: Use modern union type syntax for optional parameters.

Use `X | None` syntax instead of `Optional[X]` for Python 3.10+.

```diff
-    target_format: Optional[str] = Form(None),
-    transformer: Optional[str] = Form(None),
+    target_format: str | None = Form(None),
+    transformer: str | None = Form(None),
```
110-123: Consider returning job status information in a standardized format.

The response structure mixes document metadata with transformation job details. Consider a more structured response format that clearly separates these concerns.

```diff
 # Compose response with full document metadata and job info
 response_data = {
-    "message": f"Document accepted for transformation from {source_format} to {target_format}.",
-    "original_document": APIResponse.success_response(source_document).data,
-    "transformation_job_id": str(job_id),
-    "source_format": source_format,
-    "target_format": target_format,
-    "transformer": actual_transformer,
-    "status_check_url": f"/documents/transformations/{job_id}"
+    "document": source_document,
+    "transformation": {
+        "job_id": str(job_id),
+        "status": "PENDING",
+        "source_format": source_format,
+        "target_format": target_format,
+        "transformer": actual_transformer,
+        "status_url": f"/api/v1/documents/transformations/{job_id}"
+    },
+    "message": f"Document uploaded successfully. Transformation from {source_format} to {target_format} has been queued."
 }
```

backend/app/alembic/versions/93b86c1246b1_create_doc_transformation_job_table.py (1)
22-33: Consider adding indexes for foreign key columns.

The foreign key columns `source_document_id` and `transformed_document_id` will likely be used in queries; adding indexes would improve query performance.

Consider adding indexes after creating the table:

```diff
 def upgrade():
     # ### commands auto generated by Alembic - please adjust! ###
     op.create_table('doc_transformation_job',
         sa.Column('id', sa.Uuid(), nullable=False),
         sa.Column('source_document_id', sa.Uuid(), nullable=False),
         sa.Column('transformed_document_id', sa.Uuid(), nullable=True),
         sa.Column('status', sa.Enum('PENDING', 'PROCESSING', 'COMPLETED', 'FAILED', name='transformationstatus'), nullable=False),
         sa.Column('error_message', sqlmodel.sql.sqltypes.AutoString(), nullable=True),
         sa.Column('created_at', sa.DateTime(), nullable=False),
         sa.Column('updated_at', sa.DateTime(), nullable=False),
         sa.ForeignKeyConstraint(['source_document_id'], ['document.id'], ),
         sa.ForeignKeyConstraint(['transformed_document_id'], ['document.id'], ),
         sa.PrimaryKeyConstraint('id')
     )
+    # Add indexes for foreign keys and status for better query performance
+    op.create_index('ix_doc_transformation_job_source_document_id', 'doc_transformation_job', ['source_document_id'])
+    op.create_index('ix_doc_transformation_job_status', 'doc_transformation_job', ['status'])
     # ### end Alembic commands ###
```

And update downgrade accordingly:

```diff
 def downgrade():
     # ### commands auto generated by Alembic - please adjust! ###
+    op.drop_index('ix_doc_transformation_job_status', 'doc_transformation_job')
+    op.drop_index('ix_doc_transformation_job_source_document_id', 'doc_transformation_job')
     op.drop_table('doc_transformation_job')
+    op.execute("DROP TYPE IF EXISTS transformationstatus")
     # ### end Alembic commands ###
```

backend/app/models/doc_transformation_job.py (1)
14-23: Consider using modern Python type hints and apply formatting fixes.

The model structure is well-designed, with appropriate field defaults and foreign key relationships. However, there are formatting issues and opportunities to modernize the type hints.

Apply these formatting and type annotation updates:

```diff
 class DocTransformationJob(SQLModel, table=True):
     __tablename__ = "doc_transformation_job"

     id: UUID = Field(default_factory=uuid4, primary_key=True)
     source_document_id: UUID = Field(foreign_key="document.id")
-    transformed_document_id: Optional[UUID] = Field(default=None, foreign_key="document.id")
+    transformed_document_id: UUID | None = Field(default=None, foreign_key="document.id")
     status: TransformationStatus = Field(default=TransformationStatus.PENDING)
-    error_message: Optional[str] = Field(default=None)
+    error_message: str | None = Field(default=None)
     created_at: datetime = Field(default_factory=now)
     updated_at: datetime = Field(default_factory=now)
```

Also ensure the file ends with a newline character to comply with the formatting standards detected by the CI pipeline.
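For context on the model's `status` field, a sketch of a `str`-backed status enum: the member names match the migration's `sa.Enum('PENDING', 'PROCESSING', 'COMPLETED', 'FAILED', ...)`, while the member values are assumed here for illustration:

```python
import enum
import json


class TransformationStatus(str, enum.Enum):
    # Values assumed equal to the names; the PR may differ.
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Because the enum also subclasses str, members compare against and
# serialize as plain strings -- convenient for API responses and DB storage.
assert TransformationStatus.PENDING == "PENDING"
assert json.dumps(TransformationStatus.FAILED) == '"FAILED"'
```

This dual inheritance is why the job status can flow through JSON responses and SQL enum columns without explicit `.value` calls.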
backend/app/crud/doc_transformation_job.py (3)
1-10: Use modern Python type hints.

The imports are appropriate and the logger setup follows best practices. Consider modernizing the type hints.

Apply this diff:

```diff
 import logging
 from uuid import UUID
-from typing import List, Optional

 from sqlmodel import Session, select

 from app.models.doc_transformation_job import DocTransformationJob, TransformationStatus
 from app.core.util import now
 from app.core.exception_handlers import HTTPException
```
29-48: Update type hints and consider simplifying the update logic.

The method correctly updates job status with optional fields. Consider modernizing the type hints.

Apply this diff:

```diff
     def update_status(
         self,
         job_id: UUID,
         status: TransformationStatus,
         *,
-        error_message: Optional[str] = None,
-        transformed_document_id: Optional[UUID] = None,
+        error_message: str | None = None,
+        transformed_document_id: UUID | None = None,
     ) -> DocTransformationJob:
         job = self.read_one(job_id)
         job.status = status
         job.updated_at = now()
         if error_message is not None:
             job.error_message = error_message
         if transformed_document_id is not None:
             job.transformed_document_id = transformed_document_id
         self.session.add(job)
         self.session.commit()
         self.session.refresh(job)
         return job
```
50-52: Update the return type annotation and fix formatting.

The pagination logic is correct. Update the type hint to modern Python syntax:

```diff
-    def read_many(self, skip: int = 0, limit: int = 100) -> List[DocTransformationJob]:
+    def read_many(self, skip: int = 0, limit: int = 100) -> list[DocTransformationJob]:
         statement = select(DocTransformationJob).offset(skip).limit(limit)
         return self.session.exec(statement).all()
```

Also ensure the file ends with a newline character.
backend/app/core/doctransform/service.py (2)
23-40: Fix whitespace issues; the implementation is solid.

The function correctly creates a job and schedules the background task. Extracting the user ID before passing it to the background task is good practice.

Remove trailing whitespace from line 35:

```diff
     job = job_crud.create(source_document_id=source_document_id)
     logger.debug(f"Job created | job_id={job.id}")
-    
+
     # Extract the user ID before passing to background task
```
101-111: Consider using a dataclass or named tuple for FileUpload.

The inline class definition works but could be better organized. Consider moving this to a module-level definition or using a dataclass:

```python
from dataclasses import dataclass
from typing import BinaryIO


@dataclass
class FileUpload:
    filename: str
    file: BinaryIO
    content_type: str
```

backend/app/core/doctransform/registry.py (7)
1-6: Modernize type hints.

Update the import statements to use modern Python type hints:

```diff
 from pathlib import Path
-from typing import Type, Dict, Set, Tuple, Optional
+from typing import Optional
```

Then update all type annotations throughout the file to use built-in types (`dict`, `set`, `tuple`, `type`) instead of their `typing` module equivalents.
12-16: Update type annotations.

The transformer registry is well-structured with sensible defaults. Update the type annotation:

```diff
 # Map transformer names to their classes
-TRANSFORMERS: Dict[str, Type[Transformer]] = {
+TRANSFORMERS: dict[str, type[Transformer]] = {
     "default": ZeroxTransformer,
     "test": TestTransformer,
     "zerox": ZeroxTransformer,
 }
```
19-27: Update type annotations and fix trailing whitespace.

The transformation mapping structure is logical and extensible. Apply these fixes:

```diff
 # Define supported transformations: (source_format, target_format) -> [available_transformers]
-SUPPORTED_TRANSFORMATIONS: Dict[Tuple[str, str], Dict[str, str]] = {
+SUPPORTED_TRANSFORMATIONS: dict[tuple[str, str], dict[str, str]] = {
     ("pdf", "markdown"): {
         "default": "zerox",
         "zerox": "zerox",
     },
     # Future transformations can be added here
     # ("docx", "markdown"): {"default": "pandoc", "pandoc": "pandoc"},
     # ("html", "markdown"): {"default": "pandoc", "pandoc": "pandoc"},
 }
```
30-39: Fix trailing whitespace.

The extension mapping is comprehensive. Remove trailing whitespace from line 33:

```diff
     ".pdf": "pdf",
     ".docx": "docx",
-    ".doc": "doc", 
+    ".doc": "doc",
     ".html": "html",
```
59-64: Update type annotations and fix trailing whitespace.

The function correctly transforms the data structure. Apply these fixes:

```diff
-def get_supported_transformations() -> Dict[Tuple[str, str], Set[str]]:
+def get_supported_transformations() -> dict[tuple[str, str], set[str]]:
     """Get all supported transformation combinations."""
     return {
-        key: set(transformers.keys()) 
+        key: set(transformers.keys())
         for key, transformers in SUPPORTED_TRANSFORMATIONS.items()
     }
```
70-72: Update the type annotation.

Simple and effective lookup function.

```diff
-def get_available_transformers(source_format: str, target_format: str) -> Dict[str, str]:
+def get_available_transformers(source_format: str, target_format: str) -> dict[str, str]:
     """Get available transformers for a specific transformation."""
     return SUPPORTED_TRANSFORMATIONS.get((source_format, target_format), {})
```
74-96: Fix whitespace issues and update the type annotation.

Good validation logic with helpful error messages. Apply these fixes:

```diff
-def resolve_transformer(source_format: str, target_format: str, transformer_name: Optional[str] = None) -> str:
+def resolve_transformer(source_format: str, target_format: str, transformer_name: str | None = None) -> str:
     """
     Resolve the actual transformer to use for a transformation.
     Returns the transformer name to use.
     """
     available_transformers = get_available_transformers(source_format, target_format)
-    
+
     if not available_transformers:
         raise ValueError(
             f"Transformation from {source_format} to {target_format} is not supported"
         )
-    
+
     if transformer_name is None:
         transformer_name = "default"
-    
+
     if transformer_name not in available_transformers:
         available = ", ".join(available_transformers.keys())
         raise ValueError(
             f"Transformer '{transformer_name}' not available for {source_format} to {target_format}. "
             f"Available: {available}"
         )
-    
+
     return available_transformers[transformer_name]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`backend/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (14)
- backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py (1 hunk)
- backend/app/alembic/versions/93b86c1246b1_create_doc_transformation_job_table.py (1 hunk)
- backend/app/api/routes/doc_transformation_job.py (1 hunk)
- backend/app/api/routes/documents.py (2 hunks)
- backend/app/core/doctransform/registry.py (1 hunk)
- backend/app/core/doctransform/service.py (1 hunk)
- backend/app/core/doctransform/test_transformer.py (1 hunk)
- backend/app/core/doctransform/transformer.py (1 hunk)
- backend/app/core/doctransform/zerox_transformer.py (1 hunk)
- backend/app/crud/doc_transformation_job.py (1 hunk)
- backend/app/models/__init__.py (1 hunk)
- backend/app/models/doc_transformation_job.py (1 hunk)
- backend/app/models/document.py (2 hunks)
- backend/pyproject.toml (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (13)
backend/app/core/doctransform/transformer.py (1)
backend/app/core/doctransform/zerox_transformer.py (1)
`transform` (15-38)
backend/app/alembic/versions/93b86c1246b1_create_doc_transformation_job_table.py (2)
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py (2)
`upgrade` (20-26), `downgrade` (29-35)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
`upgrade` (20-36)
backend/app/models/document.py (3)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
`upgrade` (20-36)
backend/app/models/document_collection.py (1)
`DocumentCollection` (9-23)
backend/app/models/user.py (1)
`User` (48-60)
backend/app/core/doctransform/test_transformer.py (2)
backend/app/core/doctransform/transformer.py (1)
`Transformer` (4-12)
backend/app/core/doctransform/zerox_transformer.py (1)
`transform` (15-38)
backend/app/core/doctransform/zerox_transformer.py (2)
backend/app/core/doctransform/transformer.py (2)
`Transformer` (4-12), `transform` (8-12)
backend/app/core/doctransform/test_transformer.py (1)
`transform` (9-13)
backend/app/api/routes/doc_transformation_job.py (3)
backend/app/crud/doc_transformation_job.py (1)
`DocTransformationJobCrud` (11-52)
backend/app/utils.py (2)
`APIResponse` (27-48), `success_response` (34-37)
backend/app/api/routes/assistants.py (1)
`update_assistant_route` (74-93)
backend/app/crud/doc_transformation_job.py (1)
backend/app/models/doc_transformation_job.py (2)
`DocTransformationJob` (14-23), `TransformationStatus` (8-12)
backend/app/models/doc_transformation_job.py (2)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
`upgrade` (20-36)
backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py (1)
`upgrade` (21-73)
backend/app/models/__init__.py (1)
backend/app/models/doc_transformation_job.py (1)
`DocTransformationJob` (14-23)
backend/app/core/doctransform/service.py (6)
backend/app/crud/doc_transformation_job.py (4)
`DocTransformationJobCrud` (11-52), `create` (15-20), `update_status` (29-48), `read_one` (22-27)
backend/app/crud/document.py (1)
`DocumentCrud` (14-133)
backend/app/models/document.py (1)
`Document` (11-36)
backend/app/models/doc_transformation_job.py (1)
`TransformationStatus` (8-12)
backend/app/core/cloud/storage.py (1)
`AmazonCloudStorage` (121-206)
backend/app/core/doctransform/registry.py (1)
`convert_document` (98-114)
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py (3)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
`upgrade` (20-36)
backend/app/alembic/versions/e9dd35eff62c_add_openai_conversation_table.py (1)
`upgrade` (20-70)
backend/app/alembic/versions/66abc97f3782_user_id_from_uuid_to_int.py (1)
`upgrade` (20-152)
backend/app/core/doctransform/registry.py (3)
backend/app/core/doctransform/transformer.py (2)
`Transformer` (4-12), `transform` (8-12)
backend/app/core/doctransform/test_transformer.py (2)
`TestTransformer` (4-13), `transform` (9-13)
backend/app/core/doctransform/zerox_transformer.py (2)
`ZeroxTransformer` (7-38), `transform` (15-38)
backend/app/api/routes/documents.py (5)
backend/app/crud/document.py (2)
`DocumentCrud` (14-133), `update` (97-122)
backend/app/models/document.py (1)
`Document` (11-36)
backend/app/core/cloud/storage.py (3)
`AmazonCloudStorage` (121-206), `put` (114-115), `put` (126-151)
backend/app/core/doctransform/registry.py (4)
`get_file_format` (51-57), `is_transformation_supported` (66-68), `get_available_transformers` (70-72), `resolve_transformer` (74-96)
backend/app/core/doctransform/service.py (1)
`start_job` (23-40)
🪛 GitHub Actions: AI Platform CI
backend/app/core/doctransform/transformer.py
[error] 1-1: Black formatting applied. Reformatted this file.
backend/app/alembic/versions/93b86c1246b1_create_doc_transformation_job_table.py
[error] 1-1: Black formatting applied. Reformatted this file.
backend/app/core/doctransform/test_transformer.py
[error] 1-1: Black formatting applied. Reformatted this file.
backend/app/core/doctransform/zerox_transformer.py
[error] 1-1: End-of-file-fixer modified this file.
[error] 1-1: Black formatting applied. Reformatted this file.
backend/app/api/routes/doc_transformation_job.py
[error] 1-1: Black formatting applied. Reformatted this file.
backend/app/crud/doc_transformation_job.py
[error] 1-1: Black formatting applied. Reformatted this file.
backend/app/models/doc_transformation_job.py
[error] 1-1: Black formatting applied. Reformatted this file.
backend/app/core/doctransform/service.py
[error] 1-1: End-of-file-fixer modified this file.
[error] 1-1: Trailing whitespace removed (hook: trailing-whitespace).
[error] 1-1: Black formatting applied. Reformatted this file.
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py
[error] 1-1: Black formatting applied. Reformatted this file.
backend/app/core/doctransform/registry.py
[error] 1-1: Trailing whitespace removed (hook: trailing-whitespace).
[error] 1-1: Black formatting applied. Reformatted this file.
backend/app/api/routes/documents.py
[error] 1-1: Trailing whitespace removed (hook: trailing-whitespace).
[error] 1-1: Black formatting applied. Reformatted this file.
🪛 Ruff (0.12.2)
backend/app/models/document.py
30-30: Use X | None for type annotations
Convert to X | None
(UP045)
backend/app/api/routes/doc_transformation_job.py
6-6: typing.List is deprecated, use list instead
(UP035)
21-21: Unused function argument: current_user
(ARG001)
35-35: Unused function argument: current_user
(ARG001)
40-40: Use list instead of List for type annotation
Replace with list
(UP006)
backend/app/crud/doc_transformation_job.py
3-3: typing.List is deprecated, use list instead
(UP035)
34-34: Use X | None for type annotations
Convert to X | None
(UP045)
35-35: Use X | None for type annotations
Convert to X | None
(UP045)
50-50: Use list instead of List for type annotation
Replace with list
(UP006)
backend/app/models/doc_transformation_job.py
19-19: Use X | None for type annotations
Convert to X | None
(UP045)
21-21: Use X | None for type annotations
Convert to X | None
(UP045)
backend/app/models/__init__.py
6-6: .doc_transformation_job.DocTransformationJob imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
backend/app/core/doctransform/service.py
7-7: fastapi.UploadFile imported but unused
Remove unused import: fastapi.UploadFile
(F401)
9-9: sqlmodel.select imported but unused
Remove unused import: sqlmodel.select
(F401)
15-15: app.core.util.now imported but unused
Remove unused import: app.core.util.now
(F401)
35-35: Blank line contains whitespace
Remove whitespace from blank line
(W293)
67-67: Blank line contains whitespace
Remove whitespace from blank line
(W293)
138-138: No newline at end of file
Add trailing newline
(W292)
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py
10-10: sqlmodel.sql.sqltypes imported but unused
Remove unused import: sqlmodel.sql.sqltypes
(F401)
backend/app/core/doctransform/registry.py
2-2: typing.Type is deprecated, use type instead
(UP035)
2-2: typing.Dict is deprecated, use dict instead
(UP035)
2-2: typing.Set is deprecated, use set instead
(UP035)
2-2: typing.Tuple is deprecated, use tuple instead
(UP035)
12-12: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
12-12: Use type instead of Type for type annotation
Replace with type
(UP006)
19-19: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
19-19: Use tuple instead of Tuple for type annotation
Replace with tuple
(UP006)
19-19: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
30-30: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
33-33: Trailing whitespace
Remove trailing whitespace
(W291)
42-42: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
59-59: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
59-59: Use tuple instead of Tuple for type annotation
Replace with tuple
(UP006)
59-59: Use set instead of Set for type annotation
Replace with set
(UP006)
62-62: Trailing whitespace
Remove trailing whitespace
(W291)
70-70: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
74-74: Use X | None for type annotations
Convert to X | None
(UP045)
80-80: Blank line contains whitespace
Remove whitespace from blank line
(W293)
85-85: Blank line contains whitespace
Remove whitespace from blank line
(W293)
88-88: Blank line contains whitespace
Remove whitespace from blank line
(W293)
95-95: Blank line contains whitespace
Remove whitespace from blank line
(W293)
backend/app/api/routes/documents.py
3-3: typing.List is deprecated, use list instead
(UP035)
19-19: Trailing whitespace
Remove trailing whitespace
(W291)
20-20: Trailing whitespace
Remove trailing whitespace
(W291)
55-55: Use X | None for type annotations
Convert to X | None
(UP045)
56-56: Use X | None for type annotations
Convert to X | None
(UP045)
🔇 Additional comments (11)
backend/app/core/doctransform/transformer.py (1)
4-12: Solid abstraction for the transformer interface.

The ABC is clean and sets a clear contract. No issues spotted.
backend/app/models/doc_transformation_job.py (2)
1-6: LGTM!

The imports are well-organized and all are used in the code. The use of UUID primary keys and the `now` utility function for timestamps are good practices.
8-12: LGTM!

The enum definition follows Python best practices by inheriting from both `str` and `enum.Enum`, providing both type safety and string compatibility. The status values comprehensively cover the job lifecycle.

backend/app/crud/doc_transformation_job.py (3)
11-14: LGTM!

Clean class initialization that properly stores the session for use in CRUD operations.
15-20: LGTM!

The create method correctly creates and persists a new job with proper transaction handling.
22-27: LGTM!

Good error handling, with appropriate logging and an HTTP exception raised when a job is not found.
backend/app/core/doctransform/service.py (2)
42-48: LGTM with retry strategy!

Excellent use of the retry decorator with exponential backoff for handling transient failures. The retry parameters (3 attempts, 5-10 second waits) are reasonable for document transformation operations.
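For illustration only: the service presumably uses a retry library (e.g. tenacity), so this stdlib sketch merely shows the shape of a 3-attempt decorator with 5-10 second jittered waits. The injectable `sleep` parameter is an assumption added for testability, not part of the reviewed code:

```python
import functools
import random
import time


def retry(attempts: int = 3, wait_min: float = 5.0, wait_max: float = 10.0,
          sleep=time.sleep):
    """Retry a callable up to `attempts` times with jittered waits between tries."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: propagate the last error
                    # Random wait in [wait_min, wait_max] to avoid thundering herds.
                    sleep(random.uniform(wait_min, wait_max))
        return wrapper
    return decorator
```

The key property the reviewer is praising is that transient storage or DB hiccups get absorbed here instead of failing the transformation job outright.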
130-138: Good error handling pattern!

Excellent error handling with proper logging and status updates. The nested try-except ensures job status is updated even if there are database errors.

Add a newline at the end of the file to comply with formatting standards.
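The nested try-except pattern praised above can be reduced to a generic sketch; the callable parameters and names here are hypothetical stand-ins for the real transform and CRUD layers:

```python
import logging

logger = logging.getLogger(__name__)


def execute_job(job_id, do_transform, update_status):
    """Run a transformation and record its outcome, never letting the
    failure path itself crash the worker."""
    try:
        result = do_transform()
        update_status(job_id, "COMPLETED", result=result)
    except Exception as exc:
        try:
            update_status(job_id, "FAILED", error=str(exc))
        except Exception:
            # Even recording the failure failed (e.g. DB down):
            # log with traceback instead of raising from the background task.
            logger.exception("Could not mark job %s as FAILED", job_id)
```

The inner try-except is the important part: without it, a database error while recording a failure would leave the job stuck in PROCESSING forever.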
backend/app/core/doctransform/registry.py (3)
8-10: LGTM!

Good custom exception class for transformation-specific errors.
51-57: LGTM!

Good error handling with a clear error message for unsupported file extensions.
98-114: Excellent error handling pattern!

The function properly wraps exceptions with context and provides helpful error messages. The error chaining with `from e` preserves the original stack trace.
```python
def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column('document', sa.Column('source_document_id', sa.Uuid(), nullable=True))
    op.create_foreign_key(None, 'document', 'document', ['source_document_id'], ['id'])
```
Name the FK and fix downgrade: dropping a FK with name=None will fail.
Alembic can’t drop a constraint by passing None. Provide an explicit name when creating the FK and use the same name when dropping it. Also consider ondelete="SET NULL" for a nullable self-reference.
```diff
-    op.create_foreign_key(None, 'document', 'document', ['source_document_id'], ['id'])
+    op.create_foreign_key(
+        'document_source_document_id_fkey',
+        'document',
+        'document',
+        ['source_document_id'],
+        ['id'],
+        ondelete='SET NULL',
+    )
@@
-    op.drop_constraint(None, 'document', type_='foreignkey')
+    op.drop_constraint('document_source_document_id_fkey', 'document', type_='foreignkey')
```

Also applies to: 33-34
🤖 Prompt for AI Agents
In
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py
around lines 23 and 33-34, the migration creates a foreign key with name=None
which cannot be dropped in downgrade; update the op.create_foreign_key call to
give the FK an explicit name (e.g. "fk_document_source_document_id"), add
ondelete="SET NULL" since source_document_id is nullable, and update the
downgrade to call op.drop_constraint with that explicit name and
constraint_type="foreignkey" so the constraint can be reliably removed.
```python
op.drop_constraint('openai_conversation_organization_id_fkey1', 'openai_conversation', type_='foreignkey')
op.drop_constraint('openai_conversation_project_id_fkey1', 'openai_conversation', type_='foreignkey')
```
Do not drop unrelated foreign keys on openai_conversation in this migration.
These drops are unrelated and will remove referential integrity for openai_conversation. They also aren’t re-created in this upgrade. This is likely an autogenerate artifact and should be removed.
Apply this diff:
```diff
-    op.drop_constraint('openai_conversation_organization_id_fkey1', 'openai_conversation', type_='foreignkey')
-    op.drop_constraint('openai_conversation_project_id_fkey1', 'openai_conversation', type_='foreignkey')
```

📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no indentation issues. Thoroughly test the change to ensure it meets the requirements.
🤖 Prompt for AI Agents
In
backend/app/alembic/versions/269675883ecf_add_source_document_id_to_document.py
around lines 24-25, the migration is dropping openai_conversation foreign key
constraints 'openai_conversation_organization_id_fkey1' and
'openai_conversation_project_id_fkey1' which are unrelated to adding
source_document_id; remove those op.drop_constraint lines from the upgrade (and
any matching re-create in downgrade if present) so the migration only modifies
constraints/tables related to the document/source_document_id change, preserving
referential integrity for openai_conversation.
```python
def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('doc_transformation_job')
    # ### end Alembic commands ###
```
Add proper cleanup of enum type in downgrade.
The downgrade function drops the table but doesn't clean up the transformationstatus enum type that was created. This could cause issues if the migration is rolled back and then re-applied.
```diff
 def downgrade():
     # ### commands auto generated by Alembic - please adjust! ###
     op.drop_table('doc_transformation_job')
+    # Drop the enum type
+    op.execute("DROP TYPE IF EXISTS transformationstatus")
     # ### end Alembic commands ###
```
# ### end Alembic commands ###📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def downgrade(): | |
| # ### commands auto generated by Alembic - please adjust! ### | |
| op.drop_table('doc_transformation_job') | |
| # ### end Alembic commands ### | |
| def downgrade(): | |
| # ### commands auto generated by Alembic - please adjust! ### | |
| op.drop_table('doc_transformation_job') | |
| # Drop the enum type | |
| op.execute("DROP TYPE IF EXISTS transformationstatus") | |
| # ### end Alembic commands ### |
🤖 Prompt for AI Agents
In
backend/app/alembic/versions/93b86c1246b1_create_doc_transformation_job_table.py
around lines 37 to 40, the downgrade drops the doc_transformation_job table but
does not remove the Postgres enum type `transformationstatus`; after
op.drop_table('doc_transformation_job') add a command to drop the enum (for
example: op.execute("DROP TYPE IF EXISTS transformationstatus")) so the enum is
cleaned up on downgrade and re-creation of the migration will not fail.
```python
def get_transformation_job(
    session: SessionDep,
    current_user: CurrentUser,
    job_id: UUID = FastPath(description="Transformation job ID"),
):
    crud = DocTransformationJobCrud(session)
    job = crud.read_one(job_id)
    return APIResponse.success_response(job)
```
Remove unused current_user parameter or add authorization logic.
The current_user parameter is not used in this function. Either remove it if authorization is not needed, or add proper authorization checks to ensure users can only access their own transformation jobs.
If authorization is intended:
```diff
 def get_transformation_job(
     session: SessionDep,
     current_user: CurrentUser,
     job_id: UUID = FastPath(description="Transformation job ID"),
 ):
     crud = DocTransformationJobCrud(session)
     job = crud.read_one(job_id)
+    # TODO: Verify the user has permission to view this job
+    # This would require checking if the job's source_document belongs to current_user
     return APIResponse.success_response(job)
```

Or remove if not needed:

```diff
 def get_transformation_job(
     session: SessionDep,
-    current_user: CurrentUser,
     job_id: UUID = FastPath(description="Transformation job ID"),
 ):
```
🧰 Tools
🪛 Ruff (0.12.2)
21-21: Unused function argument: current_user
(ARG001)
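The ownership check behind that TODO can be expressed as a small, testable predicate. This is only a sketch with simplified stand-in models — the real `Document`/`DocTransformationJob` fields may differ:

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


# Hypothetical, simplified stand-ins for the real SQLModel entities.
@dataclass
class Document:
    id: UUID
    owner_id: UUID


@dataclass
class DocTransformationJob:
    id: UUID
    source_document: Document


def user_can_view_job(
    job: DocTransformationJob, user_id: UUID, is_superuser: bool = False
) -> bool:
    """Allow access only to the owner of the job's source document (or a superuser)."""
    return is_superuser or job.source_document.owner_id == user_id
```

The endpoint could then raise a 403 (or a 404, to avoid leaking job existence) when the predicate is false.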
```python
@router.get(
    "/",
    description="Get the status and details of multiple document transformation jobs by IDs.",
    response_model=APIResponse,
)
def get_multiple_transformation_jobs(
    session: SessionDep,
    current_user: CurrentUser,
    job_ids: str = Query(..., description="Comma-separated list of transformation job IDs"),
):
    crud = DocTransformationJobCrud(session)
    try:
        job_id_list: List[UUID] = [UUID(jid.strip()) for jid in job_ids.split(",") if jid.strip()]
    except Exception:
        from fastapi import HTTPException
        raise HTTPException(status_code=400, detail="Invalid job_ids format. Must be comma-separated UUIDs.")
    jobs = [crud.read_one(job_id) for job_id in job_id_list]
    return APIResponse.success_response(jobs)
```
Add authorization check and improve error handling for batch retrieval.
Similar to the single job endpoint, the current_user parameter is unused. Additionally, the batch retrieval could fail if any individual job is not found, which would cause the entire request to fail.
```diff
 def get_multiple_transformation_jobs(
     session: SessionDep,
     current_user: CurrentUser,
     job_ids: str = Query(..., description="Comma-separated list of transformation job IDs"),
 ):
     crud = DocTransformationJobCrud(session)
     try:
         job_id_list: list[UUID] = [UUID(jid.strip()) for jid in job_ids.split(",") if jid.strip()]
-    except Exception:
-        from fastapi import HTTPException
+    except (ValueError, AttributeError):
         raise HTTPException(status_code=400, detail="Invalid job_ids format. Must be comma-separated UUIDs.")
-    jobs = [crud.read_one(job_id) for job_id in job_id_list]
+
+    jobs = []
+    for job_id in job_id_list:
+        try:
+            job = crud.read_one(job_id)
+            # TODO: Verify user has permission to view this job
+            jobs.append(job)
+        except HTTPException as e:
+            # Log the error but continue processing other jobs
+            logger.warning(f"Failed to fetch job {job_id}: {e.detail}")
+            # Optionally include an error placeholder in the response
+
     return APIResponse.success_response(jobs)
```

Would you like me to implement proper authorization checks that verify document ownership?
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.12.2)
35-35: Unused function argument: current_user
(ARG001)
40-40: Use list instead of List for type annotation
Replace with list
(UP006)
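The ID parsing above can also be factored into a standalone helper that separates valid UUIDs from bad tokens before touching the database (a sketch; names are illustrative):

```python
from uuid import UUID


def parse_job_ids(raw: str) -> tuple[list[UUID], list[str]]:
    """Split a comma-separated string into parsed UUIDs and the tokens that failed."""
    valid: list[UUID] = []
    invalid: list[str] = []
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty segments such as trailing commas
        try:
            valid.append(UUID(token))
        except ValueError:
            invalid.append(token)
    return valid, invalid
```

The endpoint could then return a 400 naming exactly which tokens were invalid, instead of rejecting the whole request with a generic message.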
```python
fname_no_ext = Path(source_doc.fname).stem
target_extension = FORMAT_TO_EXTENSION.get(target_format, f".{target_format}")
transformed_doc_id = uuid4()
tmp_out = tmp_dir / f"<transformed>{fname_no_ext}{target_extension}"
```
Security concern: Avoid angle brackets in filenames.
Using <transformed> in the filename could cause issues with file systems or tools that interpret angle brackets specially.
Replace the angle brackets with a safer prefix:
```diff
-    tmp_out = tmp_dir / f"<transformed>{fname_no_ext}{target_extension}"
+    tmp_out = tmp_dir / f"transformed_{fname_no_ext}{target_extension}"
```
🤖 Prompt for AI Agents
In backend/app/core/doctransform/service.py around line 87, the code embeds
"<transformed>" in the temporary filename which uses angle brackets that can
break filesystems or tools; change the prefix to a safe string (e.g.,
"transformed_" or "transformed-"), avoid special characters, and if appropriate
run a small sanitization on fname_no_ext to ensure the resulting filename
contains only cross-platform-safe characters before joining with
target_extension.
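The sanitization the prompt suggests could look like this (a sketch; the character whitelist and length cap are assumptions to adjust for your storage backends):

```python
import re


def sanitize_filename_stem(stem: str, max_length: int = 100) -> str:
    """Replace characters outside a cross-platform-safe set with underscores."""
    # Keep letters, digits, dots, dashes, and underscores; replace everything else.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", stem)
    # Guard against an empty or overly long result.
    return safe[:max_length] or "file"
```

It would slot in as `tmp_out = tmp_dir / f"transformed_{sanitize_filename_stem(fname_no_ext)}{target_extension}"`.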
```python
def transform(self, input_path: Path) -> str:
    logging.info(f"ZeroxTransformer: {input_path} (model={self.model})")
    try:
        with Runner() as runner:
            result = runner.run(zerox(
                file_path=str(input_path),
                model=self.model,
            ))
        if result is None or not hasattr(result, "pages") or result.pages is None:
            raise RuntimeError("Zerox returned no pages. This may indicate a PDF/image conversion failure (is Poppler installed and in PATH?)")
        output = '\n\n'.join(x.content for x in result.pages)
        if not output:
            raise ValueError('Empty output from zerox')
        return output
```
🛠️ Refactor suggestion
Add input validation and improve error context.
The function should validate that the input file exists before attempting transformation. Also, the error handling could be more specific about different failure scenarios.
```diff
 def transform(self, input_path: Path) -> str:
+    if not input_path.exists():
+        raise FileNotFoundError(f"Input file does not exist: {input_path}")
+    if not input_path.is_file():
+        raise ValueError(f"Input path is not a file: {input_path}")
+
     logging.info(f"ZeroxTransformer: {input_path} (model={self.model})")
     try:
         with Runner() as runner:
             result = runner.run(zerox(
                 file_path=str(input_path),
                 model=self.model,
             ))
         if result is None or not hasattr(result, "pages") or result.pages is None:
             raise RuntimeError("Zerox returned no pages. This may indicate a PDF/image conversion failure (is Poppler installed and in PATH?)")
         output = '\n\n'.join(x.content for x in result.pages)
         if not output:
             raise ValueError('Empty output from zerox')
         return output
```
🤖 Prompt for AI Agents
In backend/app/core/doctransform/zerox_transformer.py around lines 15 to 28, add
input validation to check that input_path.exists() and input_path.is_file() (and
optionally os.access for readability) and raise a clear FileNotFoundError or
PermissionError if the checks fail before creating the Runner; keep the existing
with Runner() block but narrow the try/except to only wrap runner.run(...) and
catch specific exceptions (e.g., RuntimeError/ValueError/Exception from zerox or
Runner), then re-raise or raise a new RuntimeError that includes contextual
details (input_path, model, and the original exception message) so callers get
actionable info about whether the failure was due to a missing file, unreadable
file, empty pages, or a conversion error.
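Those pre-checks can be pulled into a helper that is testable without invoking Zerox at all (a sketch; the error messages are illustrative):

```python
import os
from pathlib import Path


def validate_input_path(input_path: Path) -> None:
    """Raise a specific, actionable error if the input file cannot be processed."""
    if not input_path.exists():
        raise FileNotFoundError(f"Input file does not exist: {input_path}")
    if not input_path.is_file():
        raise ValueError(f"Input path is not a file: {input_path}")
    if not os.access(input_path, os.R_OK):
        raise PermissionError(f"Input file is not readable: {input_path}")
```

`transform()` would call this first, leaving the try/except to wrap only the `runner.run(...)` call as the prompt suggests.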
```diff
@@ -1,5 +1,6 @@
 from uuid import UUID, uuid4
 from datetime import datetime
+from typing import Optional
```
🛠️ Refactor suggestion
Use modern union types and add safe FK behavior (SET NULL) for self-reference.
- Be consistent with Python 3.11 unions (you already use `datetime | None` at line 29). This also satisfies Ruff's UP045 hint.
- Consider setting `ondelete="SET NULL"` to avoid cascaded deletions wiping derived documents if the source is removed (a safer default for a nullable self-reference).
- Optional: add an index on `source_document_id` for lookup performance.
Apply this diff:
```diff
-from typing import Optional
+from typing import Optional  # will be removed below if using PEP 604 unions
@@
-    source_document_id: Optional[UUID] = Field(
-        default=None,
-        foreign_key="document.id",
-        nullable=True,
-    )
+    source_document_id: UUID | None = Field(
+        default=None,
+        foreign_key="document.id",
+        nullable=True,
+        ondelete="SET NULL",
+        index=True,
+    )
```

If you adopt the PEP 604 union above, remove the now-unused `Optional` import:

```diff
-from typing import Optional
```

Also applies to: 30-34
🤖 Prompt for AI Agents
In backend/app/models/document.py around lines 3 and 30-34, the file imports
Optional but elsewhere uses PEP 604 unions; replace Optional usage with the
modern X | None annotation, remove the now-unused Optional import at line 3,
update the self-referential ForeignKey for source_document_id to include
ondelete="SET NULL" (and ensure the column is nullable), and add an index on
source_document_id (e.g., sa.Index or index=True on the Column) to improve
lookup performance; keep type annotations consistent and run linters to confirm
UP045 is resolved.
```diff
 version = "0.1.0"
 description = ""
-requires-python = ">=3.10,<4.0"
+requires-python = ">=3.11,<4.0"
```
🛠️ Refactor suggestion
Align Ruff target version with the new Python requirement (3.11).
You bumped requires-python to >=3.11, but Ruff still targets py310. This mismatch can cause misleading/later fixes (e.g., PEP 604 unions) and inconsistent linting behavior.
Apply this diff to align Ruff with Python 3.11:
```diff
 [project]
 requires-python = ">=3.11,<4.0"

 [tool.ruff]
-target-version = "py310"
+target-version = "py311"
```

Also applies to: 57-57
🤖 Prompt for AI Agents
In backend/pyproject.toml around lines 5 and 57, the project sets
requires-python = ">=3.11,<4.0" but Ruff is still targeting py310; update the
Ruff configuration by replacing any target-version entries (e.g., "py310") with
"py311" so the linter target matches the declared Python requirement, and ensure
all occurrences of "py310" in the file are changed to "py311".
```toml
"openai_responses",
"langfuse>=2.60.3",
"asgi-correlation-id>=4.3.4",
"py-zerox>=0.0.7,<1.0.0"
```
💡 Verification agent
🧩 Analysis chain
Document and provision OS dependencies for py-zerox (Poppler) and verify image compatibility.
py-zerox typically requires Poppler to be installed on the host/container. Without it, runtime will fail (as also hinted by ZeroxTransformer’s error messaging). Ensure:
- Dockerfile(s) install Poppler (e.g., poppler-utils on Debian/Ubuntu).
- CI runners and dev instructions are updated.
Run this script to check if your repo’s Dockerfiles already install Poppler and if a Python 3.11 base image is used:
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail

echo "Searching for Dockerfiles..."
fd -i --glob "Dockerfile*" || true

echo -e "\nChecking for Poppler installation mentions..."
rg -n "(poppler|poppler-utils|libpoppler|poppler-data)" -S || true

echo -e "\nChecking for Python 3.11 base images in Dockerfiles..."
rg -n "python:3\.11" -S || true

echo -e "\nChecking whether declared dependency versions exist on PyPI..."
echo "py-zerox latest:"
curl -s https://pypi.org/pypi/py-zerox/json | jq -r '.info.version'
echo "boto3 latest:"
curl -s https://pypi.org/pypi/boto3/json | jq -r '.info.version'
```

Length of output: 1318
Install Poppler in Dockerfile & update docs

We ran a quick check and found that:
- backend/Dockerfile exists but doesn't install any Poppler packages
- it isn't using a Python 3.11 base image (if you intend to target 3.11)
- PyPI shows py-zerox 0.0.7, matching your `>=0.0.7,<1.0.0` declaration

Action items:
- In backend/Dockerfile, install the Poppler tools after the `apt-get update` step. For example:

  ```diff
   FROM python:3.11-slim  # or your current base
  +RUN apt-get update \
  +    && apt-get install -y poppler-utils libpoppler-dev poppler-data \
  +    && rm -rf /var/lib/apt/lists/*
  ```

- If you're standardizing on Python 3.11, bump the `FROM` line to `python:3.11-slim` (or equivalent).
- Update CI definitions and local-dev docs (README/CONTRIBUTING.md) to call out the OS dependency on Poppler (and how to install it on macOS/Linux).
- Optionally, add a quick smoke test in CI to verify that `pdfimages` (from poppler-utils) is in `PATH`.
🤖 Prompt for AI Agents
In backend/pyproject.toml around line 32, the dependency on py-zerox is fine but
the Dockerfile and docs must be updated: modify backend/Dockerfile to use a
Python 3.11 base (e.g. change FROM to python:3.11-slim if targeting 3.11) and,
after the apt-get update step, install Poppler (e.g. poppler-utils or
distro-equivalent) so pdf tools like pdfimages are available; update CI
definitions and README/CONTRIBUTING.md to document the OS dependency and
installation steps for macOS/Linux; and optionally add a small CI smoke test
that checks pdfimages is in PATH to prevent regressions.
f0b60b8 to 2dd5009
…d Poppler installation (#342)
- add doc transformation endpoint to router main.py
- Handle response of upload document and validate request first than start upload
- Modify docker file to install poppler
- fix alembic migration order
- use fastapi UploadFile
- use model_validate in response
- Option to include signed url in get document endpoint
- speciify response type in doc transformation route
- logs reviewed and added
- logs modified
- resolve alembic head and remove duplicate get_signed url from cloud
- update background job to use project id instead of user id
- fix DocTransformationJobCrud and routes
- unit test for doc transformation crud
- extend unit test for upload document endpoint
- fix background job to use different session
- delete temp directories after job completion
- test for background job for transformer
- configure test_service properly
- fix migrations head
- return signed url in upload response
- fix testcases
- pass filepath to storage
- Instead of string return path of output file after transformation
Closing this one, created new PR #363.
Summary
Target issue is #PLEASE_TYPE_ISSUE_NUMBER
Explain the motivation for making this change. What existing problem does the pull request solve?
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
- Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
Please add here if any other information is required for the reviewer.
Summary by CodeRabbit
New Features
API
Chores