11 changes: 0 additions & 11 deletions .env.example
Original file line number Diff line number Diff line change
@@ -44,14 +44,3 @@ AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=ap-south-1
AWS_S3_BUCKET_PREFIX="bucket-prefix-name"

# OpenAI

OPENAI_API_KEY="this_is_not_a_secret"
LANGFUSE_PUBLIC_KEY="this_is_not_a_secret"
LANGFUSE_SECRET_KEY="this_is_not_a_secret"
LANGFUSE_HOST="this_is_not_a_secret"

# Misc

CI=""
30 changes: 30 additions & 0 deletions .env.test.example
@@ -0,0 +1,30 @@
ENVIRONMENT=testing

PROJECT_NAME="AI Platform"
STACK_NAME=ai-platform

# Backend
SECRET_KEY=changethis
FIRST_SUPERUSER=superuser@example.com
FIRST_SUPERUSER_PASSWORD=changethis
EMAIL_TEST_USER="test@example.com"

# Postgres

POSTGRES_SERVER=localhost
POSTGRES_PORT=5432
POSTGRES_DB=ai_platform_test
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres

# Configure these with your own Docker registry images

DOCKER_IMAGE_BACKEND=backend
DOCKER_IMAGE_FRONTEND=frontend

# AWS

AWS_ACCESS_KEY_ID=this_is_a_test_key
AWS_SECRET_ACCESS_KEY=this_is_a_test_key
AWS_DEFAULT_REGION=ap-south-1
AWS_S3_BUCKET_PREFIX="bucket-prefix-name"
6 changes: 1 addition & 5 deletions .github/workflows/benchmark.yml
@@ -18,16 +18,12 @@ jobs:
count: [100]

env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LANGFUSE_PUBLIC_KEY: ${{ secrets.LANGFUSE_PUBLIC_KEY }}
LANGFUSE_SECRET_KEY: ${{ secrets.LANGFUSE_SECRET_KEY }}
LANGFUSE_HOST: ${{ secrets.LANGFUSE_HOST }}
LOCAL_CREDENTIALS_ORG_OPENAI_API_KEY: ${{ secrets.LOCAL_CREDENTIALS_ORG_OPENAI_API_KEY }}
LOCAL_CREDENTIALS_API_KEY: ${{ secrets.LOCAL_CREDENTIALS_API_KEY }}

steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- run: |
cp .env.example .env
2 changes: 1 addition & 1 deletion .github/workflows/cd-production.yml
@@ -17,7 +17,7 @@ jobs:

steps:
- name: Checkout the repo
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4 # More information on this action can be found below in the 'AWS Credentials' section
2 changes: 1 addition & 1 deletion .github/workflows/cd-staging.yml
@@ -18,7 +18,7 @@ jobs:

steps:
- name: checkout the repo
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4 # More information on this action can be found below in the 'AWS Credentials' section
10 changes: 6 additions & 4 deletions .github/workflows/continuous_integration.yml
@@ -15,7 +15,7 @@ jobs:
env:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: ai_platform
POSTGRES_DB: ai_platform_test
ports:
- 5432:5432
options: --health-cmd "pg_isready -U postgres" --health-interval 10s --health-timeout 5s --health-retries 5
@@ -26,15 +26,17 @@
redis-version: [6]

steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v5

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Making env file
run: cp .env.example .env
run: |
cp .env.test.example .env
cp .env.test.example .env.test

- name: Install uv
uses: astral-sh/setup-uv@v6
@@ -63,7 +65,7 @@
working-directory: backend

- name: Upload coverage reports to codecov
uses: codecov/codecov-action@v5.4.3
uses: codecov/codecov-action@v5.5.0
with:
token: ${{ secrets.CODECOV_TOKEN }}
fail_ci_if_error: true
2 changes: 1 addition & 1 deletion .gitignore
@@ -6,7 +6,7 @@ node_modules/
/playwright/.cache/

# Environments
.env
.env*
.venv
env/
venv/
1 change: 1 addition & 0 deletions README.md
@@ -11,6 +11,7 @@

- [docker](https://docs.docker.com/get-started/get-docker/) Docker
- [uv](https://docs.astral.sh/uv/) for Python package and environment management.
- **Poppler** – required for PDF processing.

## Project Setup

7 changes: 5 additions & 2 deletions backend/Dockerfile
@@ -7,8 +7,11 @@ ENV PYTHONUNBUFFERED=1
# Set working directory
WORKDIR /app/

# Install system dependencies
RUN apt-get update && apt-get install -y curl
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
poppler-utils \
&& rm -rf /var/lib/apt/lists/*

# Install uv package manager
COPY --from=ghcr.io/astral-sh/uv:0.5.11 /uv /uvx /bin/
@@ -0,0 +1,108 @@
"""add storage_path to project and project_id to document table

Revision ID: 40307ab77e9f
Revises: 8725df286943
Create Date: 2025-08-28 10:54:30.712627

"""
from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = "40307ab77e9f"
down_revision = "8725df286943"
branch_labels = None
depends_on = None


def upgrade():
# ### commands auto generated by Alembic - please adjust! ###

op.add_column("project", sa.Column("storage_path", sa.Uuid(), nullable=True))

conn = op.get_bind()
conn.execute(sa.text("UPDATE project SET storage_path = gen_random_uuid()"))

op.alter_column("project", "storage_path", nullable=False)
op.create_unique_constraint("uq_project_storage_path", "project", ["storage_path"])

op.add_column("document", sa.Column("project_id", sa.Integer(), nullable=True))
op.add_column("document", sa.Column("is_deleted", sa.Boolean(), nullable=True))

conn.execute(
sa.text(
"""
UPDATE document
SET is_deleted = CASE
WHEN deleted_at IS NULL THEN false
ELSE true
END
"""
)
)
conn.execute(
sa.text(
"""
UPDATE document
SET project_id = (
SELECT project_id FROM apikey
WHERE apikey.user_id = document.owner_id
LIMIT 1
)
"""
)
)

op.alter_column("document", "is_deleted", nullable=False)
op.alter_column("document", "project_id", nullable=False)

op.drop_constraint("document_owner_id_fkey", "document", type_="foreignkey")
op.create_foreign_key(
None, "document", "project", ["project_id"], ["id"], ondelete="CASCADE"
)
op.drop_column("document", "owner_id")

# ### end Alembic commands ###


def downgrade():
# ### commands auto generated by Alembic - please adjust! ###
op.drop_constraint("uq_project_storage_path", "project", type_="unique")
op.drop_column("project", "storage_path")

op.add_column(
"document",
sa.Column("owner_id", sa.Integer(), autoincrement=False, nullable=True),
)

conn = op.get_bind()
# Backfill owner_id from project_id using the apikey mapping
conn.execute(
sa.text(
"""
UPDATE document d
SET owner_id = (
SELECT user_id
FROM apikey a
WHERE a.project_id = d.project_id
LIMIT 1
)
"""
)
)

op.alter_column("document", "owner_id", nullable=False)

op.drop_constraint("document_project_id_fkey", "document", type_="foreignkey")
op.create_foreign_key(
"document_owner_id_fkey",
"document",
"user",
["owner_id"],
["id"],
ondelete="CASCADE",
)
op.drop_column("document", "is_deleted")
op.drop_column("document", "project_id")
# ### end Alembic commands ###
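The `is_deleted` backfill above can be exercised in isolation. Below is a minimal sketch against an in-memory SQLite database standing in for Postgres — the table and column names mirror the migration, but this is illustrative only, not the project's test suite:

```python
import sqlalchemy as sa

# In-memory SQLite stand-in for the real Postgres database.
engine = sa.create_engine("sqlite://")

with engine.begin() as conn:
    conn.execute(sa.text(
        "CREATE TABLE document (id INTEGER PRIMARY KEY, deleted_at TEXT, is_deleted BOOLEAN)"
    ))
    # One live document, one soft-deleted document.
    conn.execute(sa.text(
        "INSERT INTO document (id, deleted_at) VALUES (1, NULL), (2, '2025-01-01')"
    ))
    # Same CASE expression the migration uses to derive is_deleted.
    conn.execute(sa.text(
        "UPDATE document SET is_deleted = CASE WHEN deleted_at IS NULL THEN 0 ELSE 1 END"
    ))
    rows = conn.execute(
        sa.text("SELECT id, is_deleted FROM document ORDER BY id")
    ).fetchall()

print(rows)  # [(1, 0), (2, 1)]
```

The live row (NULL `deleted_at`) gets `is_deleted = 0` and the soft-deleted row gets `1`, matching the values the migration writes before `is_deleted` is made non-nullable.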
@@ -0,0 +1,31 @@
"""unique constraint on project_name and org id

Revision ID: 8725df286943
Revises: 38f0e8c8dc92
Create Date: 2025-08-27 12:22:36.633904

"""
from alembic import op
import sqlalchemy as sa
import sqlmodel.sql.sqltypes


# revision identifiers, used by Alembic.
revision = "8725df286943"
down_revision = "38f0e8c8dc92"
branch_labels = None
depends_on = None


def upgrade():
# ### commands auto generated by Alembic - please adjust! ###
op.create_unique_constraint(
"uq_project_name_org_id", "project", ["name", "organization_id"]
)
# ### end Alembic commands ###


def downgrade():
# ### commands auto generated by Alembic - please adjust! ###
op.drop_constraint("uq_project_name_org_id", "project", type_="unique")
# ### end Alembic commands ###
@@ -0,0 +1,40 @@
"""create doc transformation job table

Revision ID: 9f8a4af9d6fd
Revises: b5b9412d3d2a
Create Date: 2025-08-29 16:00:47.848950

"""
from alembic import op
import sqlalchemy as sa
import sqlmodel.sql.sqltypes


# revision identifiers, used by Alembic.
revision = '9f8a4af9d6fd'
down_revision = 'b5b9412d3d2a'
branch_labels = None
depends_on = None


def upgrade():
# ### commands auto generated by Alembic - please adjust! ###
op.create_table('doc_transformation_job',
sa.Column('id', sa.Uuid(), nullable=False),
sa.Column('source_document_id', sa.Uuid(), nullable=False),
sa.Column('transformed_document_id', sa.Uuid(), nullable=True),
sa.Column('status', sa.Enum('PENDING', 'PROCESSING', 'COMPLETED', 'FAILED', name='transformationstatus'), nullable=False),
sa.Column('error_message', sqlmodel.sql.sqltypes.AutoString(), nullable=True),
sa.Column('created_at', sa.DateTime(), nullable=False),
sa.Column('updated_at', sa.DateTime(), nullable=False),
sa.ForeignKeyConstraint(['source_document_id'], ['document.id'], ),
sa.ForeignKeyConstraint(['transformed_document_id'], ['document.id'], ),
sa.PrimaryKeyConstraint('id')
)
# ### end Alembic commands ###


def downgrade():
# ### commands auto generated by Alembic - please adjust! ###
op.drop_table('doc_transformation_job')
# ### end Alembic commands ###
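The `transformationstatus` enum created above maps naturally onto a small Python enum. A hedged sketch of what the model-side type might look like (the class name is an assumption; the member values come straight from the migration):

```python
from enum import Enum

class TransformationStatus(str, Enum):
    # Values mirror the Postgres enum created by this migration.
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

# A job starts PENDING; a worker moves it through PROCESSING to
# COMPLETED, or to FAILED with error_message populated.
print(TransformationStatus.PENDING.value)  # PENDING
```

Subclassing `str` lets the value serialize directly in API responses and compare equal to the raw database string.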
@@ -0,0 +1,31 @@
"""add source document id to document table

Revision ID: b5b9412d3d2a
Revises: 40307ab77e9f
Create Date: 2025-08-29 15:59:34.347031

"""
from alembic import op
import sqlalchemy as sa
import sqlmodel.sql.sqltypes


# revision identifiers, used by Alembic.
revision = 'b5b9412d3d2a'
down_revision = '40307ab77e9f'
branch_labels = None
depends_on = None


def upgrade():
# ### commands auto generated by Alembic - please adjust! ###
op.add_column('document', sa.Column('source_document_id', sa.Uuid(), nullable=True))
op.create_foreign_key('document_source_document_id_fkey', 'document', 'document', ['source_document_id'], ['id'])
# ### end Alembic commands ###


def downgrade():
# ### commands auto generated by Alembic - please adjust! ###
op.drop_constraint('document_source_document_id_fkey', 'document', type_='foreignkey')
op.drop_column('document', 'source_document_id')
# ### end Alembic commands ###
19 changes: 17 additions & 2 deletions backend/app/api/docs/documents/upload.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,17 @@
Upload a document to the AI platform. The response will contain an ID,
which is the document ID required by other routes.
Upload a document to the AI platform.

- If only a file is provided, the document will be uploaded and stored, and its ID will be returned.
- If a target format is specified, a transformation job will also be created to transform the document into the target format in the background. The response will include both the uploaded document details and information about the transformation job.

### Supported Transformations

The following (source_format → target_format) transformations are supported:

- pdf → markdown (handled by the `zerox` transformer)

### Transformers

Available transformer names and their implementations (the default transformer is `zerox`):

- `zerox`