Kaapi: V1.0: Database optimization #817
Conversation
📝 Walkthrough: This PR implements a comprehensive soft-delete semantic migration across the backend, replacing boolean `is_deleted` flags with `deleted_at` timestamps.
Changes: Soft-Delete Schema & Logic Migration
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Codecov Report: ❌ Patch coverage is …
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/app/models/evaluation.py (1)
357-364: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add `onupdate` hook to `EvaluationRun.updated_at` and `EvaluationDataset.updated_at`.

Without SQLAlchemy's `onupdate`, the `updated_at` field only gets set on INSERT. Updates at lines 208–210 in `batch.py` (status, batch_job_id, total_items), at lines 230–231 (error_message, status), and in `processing.py` (object_store_url) all modify `eval_run` without explicitly setting `updated_at`, leaving the timestamp stale.

The codebase pattern (e.g., `user.py`, `tts_evaluation.py`) uses `"onupdate": now` in `sa_column_kwargs`. Apply the same fix:

```diff
-    updated_at: datetime = SQLField(
-        default_factory=now,
-        nullable=False,
-        description="The timestamp when the evaluation run was last updated",
-        sa_column_kwargs={
-            "comment": "Timestamp when the evaluation run was last updated"
-        },
-    )
+    updated_at: datetime = SQLField(
+        default_factory=now,
+        nullable=False,
+        description="The timestamp when the evaluation run was last updated",
+        sa_column_kwargs={
+            "comment": "Timestamp when the evaluation run was last updated",
+            "onupdate": now,
+        },
+    )
```

Apply the same fix to `EvaluationDataset.updated_at` (lines 172–178) for consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/models/evaluation.py` around lines 357 - 364, The updated_at SQLField for EvaluationRun and EvaluationDataset currently only sets a default timestamp on INSERT; add an SQLAlchemy onupdate hook so it auto-updates on UPDATEs by adding "onupdate": now to the sa_column_kwargs for the SQLField definitions named EvaluationRun.updated_at and EvaluationDataset.updated_at (the SQLField instances using default_factory=now and sa_column_kwargs={"comment": ...}); ensure the sa_column_kwargs now contains both "comment" and "onupdate": now consistent with other models like user.py and tts_evaluation.py.

backend/app/crud/assistants.py (1)
274-278: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set `updated_at` when soft-deleting an assistant for consistency with other delete operations in this PR.

The `Assistant.updated_at` field lacks a SQL-level `onupdate` default and only sets a `default_factory`. This means the soft-delete operation at line 275 does not update the timestamp — unlike the `APIKeyCrud.delete` and `DocumentCrud.delete` implementations elsewhere in the PR that update both `deleted_at` and `updated_at`. This breaks the "last modified" semantics for clients relying on `updated_at` to track when records were last changed.

♻️ Proposed fix

```diff
     existing_assistant.deleted_at = now()
+    existing_assistant.updated_at = now()
     session.add(existing_assistant)
     session.commit()
     session.refresh(existing_assistant)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/crud/assistants.py` around lines 274 - 278, The soft-delete path for Assistant doesn't update the Assistant.updated_at timestamp; before committing the change in the block that sets existing_assistant.deleted_at = now(), also set existing_assistant.updated_at = now() (mirroring APIKeyCrud.delete and DocumentCrud.delete behavior) so the record's "last modified" time is updated, then session.add, session.commit, session.refresh as currently done.

backend/app/api/routes/user_project.py (1)
65-67: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate the project/org pairing before creating memberships.

These checks are independent right now. If `body.project_id` belongs to a different organization, the code below will persist `UserProject.organization_id = body.organization_id` and generate invite tokens for the wrong org/project combination.

🛠️ Suggested fix

```diff
-    validate_organization(session=session, org_id=body.organization_id)
-    validate_project(session=session, project_id=body.project_id)
+    validate_organization(session=session, org_id=body.organization_id)
+    project = get_project_by_id(session=session, project_id=body.project_id)
+    if project is None or project.organization_id != body.organization_id:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail="Project does not belong to the given organization",
+        )
```

Also applies to: 114-118
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/api/routes/user_project.py` around lines 65 - 67, The current independent checks (validate_organization and validate_project) allow mismatched org/project combos; update validation so the project is confirmed to belong to the requested organization before creating memberships or invite tokens—either extend validate_project or add a new function (e.g., validate_project_belongs_to_org) to fetch the Project by body.project_id and assert project.organization_id == body.organization_id, and call this check in place of or after validate_project (also apply the same fix for the block referenced at lines 114-118); ensure the code paths that set UserProject.organization_id and generate invite tokens only run after this combined validation passes.
🧹 Nitpick comments (3)
backend/app/crud/thread_results.py (1)
24-24: ⚡ Quick win

Replace deprecated `data.dict()` with `data.model_dump()`.

Pydantic v2 deprecated `.dict()` in favour of `.model_dump()`. Since line 24 is already being touched, this is the right moment to fix it.

🔧 Proposed fix

```diff
-    new_thread = OpenAIThread(**data.dict())
+    new_thread = OpenAIThread(**data.model_dump())
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/crud/thread_results.py` at line 24, Replace the deprecated Pydantic v2 call data.dict() with data.model_dump() where the OpenAIThread is constructed: change the OpenAIThread(**data.dict()) call to use OpenAIThread(**data.model_dump()) so the created_thread/new_thread instantiation uses the v2-compatible model dump; update any other occurrences in the same module that still call data.dict() (search for data.dict and replace with data.model_dump).

backend/app/alembic/versions/058_v1_query_optimization.py (1)
182-225: ⚡ Quick win

Add explicit `-> None` annotations to the Alembic entrypoints.

Both changed functions are missing return annotations.

As per coding guidelines, `**/*.py`: Always add type hints to all function parameters and return values in Python code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/alembic/versions/058_v1_query_optimization.py` around lines 182 - 225, The upgrade and downgrade Alembic entrypoint functions (upgrade and downgrade) lack return type hints; modify their definitions to include explicit return annotations by changing their signatures to include "-> None" (i.e., def upgrade() -> None: and def downgrade() -> None:) so both functions are fully type-annotated per the project's guidelines.

backend/app/alembic/versions/060_v1_assorted_cleanups.py (1)
47-163: ⚡ Quick win

Add explicit `-> None` annotations to `upgrade()` and `downgrade()`.

These changed entrypoints are missing return annotations.

As per coding guidelines, `**/*.py`: Always add type hints to all function parameters and return values in Python code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/alembic/versions/060_v1_assorted_cleanups.py` around lines 47 - 163, Add explicit return type annotations -> None to the Alembic entrypoint functions upgrade() and downgrade(): update the function definitions for upgrade and downgrade in this diff to include a return annotation of None (i.e., def upgrade() -> None: and def downgrade() -> None:) so they comply with the project's typing guideline; no other behavioral changes are needed.
ℹ️ Review info

⚙️ Run configuration
- Configuration used: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: e5424384-f9f4-48de-b522-a08cf8402648
📒 Files selected for processing (56)
- backend/app/alembic/versions/058_v1_query_optimization.py
- backend/app/alembic/versions/059_drop_redundant_indexes.py
- backend/app/alembic/versions/060_v1_assorted_cleanups.py
- backend/app/api/deps.py
- backend/app/api/routes/llm.py
- backend/app/api/routes/user_project.py
- backend/app/core/security.py
- backend/app/crud/api_key.py
- backend/app/crud/assistants.py
- backend/app/crud/auth.py
- backend/app/crud/document/doc_transformation_job.py
- backend/app/crud/document/document.py
- backend/app/crud/fine_tuning.py
- backend/app/crud/llm.py
- backend/app/crud/model_evaluation.py
- backend/app/crud/openai_conversation.py
- backend/app/crud/thread_results.py
- backend/app/models/__init__.py
- backend/app/models/api_key.py
- backend/app/models/assistants.py
- backend/app/models/document.py
- backend/app/models/document_collection.py
- backend/app/models/evaluation.py
- backend/app/models/fine_tuning.py
- backend/app/models/job.py
- backend/app/models/llm/request.py
- backend/app/models/model_evaluation.py
- backend/app/models/openai_conversation.py
- backend/app/models/project.py
- backend/app/models/threads.py
- backend/app/models/user.py
- backend/app/models/user_project.py
- backend/app/seed_data/seed_data.json
- backend/app/seed_data/seed_data.py
- backend/app/tests/api/routes/documents/test_route_document_remove.py
- backend/app/tests/api/routes/test_fine_tuning.py
- backend/app/tests/api/routes/test_threads.py
- backend/app/tests/api/test_user_project.py
- backend/app/tests/core/test_security.py
- backend/app/tests/crud/collections/collection/test_crud_collection_delete.py
- backend/app/tests/crud/documents/documents/test_crud_document_delete.py
- backend/app/tests/crud/documents/documents/test_crud_document_read_many.py
- backend/app/tests/crud/documents/documents/test_crud_document_update.py
- backend/app/tests/crud/documents/test_doc_transformation_job.py
- backend/app/tests/crud/test_api_key.py
- backend/app/tests/crud/test_assistants.py
- backend/app/tests/crud/test_fine_tuning.py
- backend/app/tests/crud/test_model_evaluation.py
- backend/app/tests/crud/test_onboarding.py
- backend/app/tests/crud/test_openai_conversation.py
- backend/app/tests/seed_data/seed_data.json
- backend/app/tests/seed_data/seed_data.py
- backend/app/tests/utils/auth.py
- backend/app/tests/utils/document.py
- backend/app/tests/utils/test_data.py
- backend/app/tests/utils/utils.py
💤 Files with no reviewable changes (9)
- backend/app/tests/seed_data/seed_data.json
- backend/app/seed_data/seed_data.json
- backend/app/seed_data/seed_data.py
- backend/app/models/document.py
- backend/app/models/api_key.py
- backend/app/models/openai_conversation.py
- backend/app/models/assistants.py
- backend/app/tests/seed_data/seed_data.py
- backend/app/tests/utils/document.py
```python
def upgrade():
    # Phase A (transactional): preserve audit timestamp, drop redundant column.
    for table in IS_DELETED_TABLES:
        op.execute(
            f"UPDATE {table} "
            f"SET deleted_at = NOW() "
            f"WHERE is_deleted = TRUE AND deleted_at IS NULL"
        )
        op.drop_column(table, "is_deleted")

    # Phase B (autocommit): CONCURRENTLY index creation. Each statement
    # runs in its own implicit transaction, required by the CONCURRENTLY
    # variant.
    with op.get_context().autocommit_block():
        for table, column, index in FK_INDEXES:
            op.execute(
                f'CREATE INDEX CONCURRENTLY IF NOT EXISTS "{index}" '
                f'ON "{table}" ("{column}")'
            )
        for index, body, _schema in COMPOSITE_INDEXES:
            op.execute(f'CREATE INDEX CONCURRENTLY IF NOT EXISTS "{index}" {body}')
```
🧩 Analysis chain

🏁 Script executed:

```shell
find backend/app/alembic/versions -name "058_v1_query_optimization.py" -type f
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 133

🏁 Script executed:

```shell
wc -l backend/app/alembic/versions/058_v1_query_optimization.py
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 137

🏁 Script executed:

```shell
cat -n backend/app/alembic/versions/058_v1_query_optimization.py
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 10237
Add type hints to upgrade() and downgrade() functions, and redesign migration phases to prevent unrecoverable partial failures.
The upgrade() and downgrade() functions are missing return type annotations (-> None), violating the type hints guideline.
More critically, autocommit_block() at line 195 commits Phase A changes (UPDATE + drop_column) before Phase B index creation begins. If any CREATE INDEX CONCURRENTLY fails, the revision is not recorded as complete but the is_deleted columns are already dropped. A rerun fails on drop_column() with a missing column error, leaving the migration unrecoverable without manual intervention. Wrap Phase A operations inside conditional checks (e.g., column_exists() before drop) or consolidate transaction semantics so either all changes commit together or all roll back on any failure.
🧰 Tools
🪛 OpenGrep (1.20.0)
[ERROR] 202-202: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.
(coderabbit.sql-injection.python-fstring-execute)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/app/alembic/versions/058_v1_query_optimization.py` around lines 182 -
202, Add explicit return type annotations "-> None" to the upgrade() and
downgrade() functions, and prevent unrecoverable partial commits by ensuring
Phase A (the UPDATE and drop_column on IS_DELETED_TABLES) cannot be committed
before Phase B index creation fails: either run Phase A inside the same
transactional boundary as Phase B (avoid using
op.get_context().autocommit_block() for Phase A) or keep autocommit for Phase B
only and guard each drop with a runtime existence check (use a helper like
column_exists(table, "is_deleted") before calling op.drop_column) so a retry
won't error on missing columns; locate these changes around upgrade(),
autocommit_block(), IS_DELETED_TABLES, FK_INDEXES, and COMPOSITE_INDEXES and
apply the same pattern to downgrade().
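To make the rerun-safety suggestion for 058 concrete, here is a minimal sketch of a guarded Phase A. It assumes Alembic's `op.get_bind()` and SQLAlchemy's runtime inspector; the `column_exists` helper is the name floated in the prompt above, not existing project code.

```python
import sqlalchemy as sa
from alembic import op


def column_exists(table: str, column: str) -> bool:
    # Inspect the live connection's catalog (not model metadata) so a retry
    # after a partial failure sees the schema as it actually is.
    inspector = sa.inspect(op.get_bind())
    return any(col["name"] == column for col in inspector.get_columns(table))


def upgrade() -> None:
    # Phase A, now idempotent: skip tables where a previous partial run
    # already dropped the is_deleted column.
    for table in IS_DELETED_TABLES:  # assumed module-level constant from 058
        if column_exists(table, "is_deleted"):
            op.execute(
                f"UPDATE {table} SET deleted_at = NOW() "
                f"WHERE is_deleted = TRUE AND deleted_at IS NULL"
            )
            op.drop_column(table, "is_deleted")
```

With this guard, a failed CONCURRENTLY build in Phase B no longer strands the migration: rerunning simply skips the already-dropped columns and resumes index creation.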
```python
def upgrade():
    with op.get_context().autocommit_block():
        for index_name, _body in INDEXES_TO_DROP:
            op.execute(f'DROP INDEX CONCURRENTLY IF EXISTS "{index_name}"')


def downgrade():
    with op.get_context().autocommit_block():
        for index_name, body in INDEXES_TO_DROP:
            op.execute(f'CREATE INDEX CONCURRENTLY IF NOT EXISTS "{index_name}" {body}')
```
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
python - <<'PY'
import ast
from pathlib import Path

path = Path("backend/app/alembic/versions/059_drop_redundant_indexes.py")
tree = ast.parse(path.read_text())
for node in tree.body:
    if isinstance(node, ast.FunctionDef):
        if node.returns is None:
            print(f"{node.name}: missing return type annotation")
PY
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 157
Add explicit return type hints to migration functions.
The upgrade() (line 89) and downgrade() (line 95) functions must declare -> None to comply with repository typing guidelines for Python files.
♻️ Suggested patch

```diff
-def upgrade():
+def upgrade() -> None:
     with op.get_context().autocommit_block():
         for index_name, _body in INDEXES_TO_DROP:
             op.execute(f'DROP INDEX CONCURRENTLY IF EXISTS "{index_name}"')

-def downgrade():
+def downgrade() -> None:
     with op.get_context().autocommit_block():
         for index_name, body in INDEXES_TO_DROP:
             op.execute(f'CREATE INDEX CONCURRENTLY IF NOT EXISTS "{index_name}" {body}')
```

🧰 Tools
🪛 OpenGrep (1.20.0)
[ERROR] 92-92: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.
(coderabbit.sql-injection.python-fstring-execute)
[ERROR] 98-98: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.
(coderabbit.sql-injection.python-fstring-execute)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/app/alembic/versions/059_drop_redundant_indexes.py` around lines 89 -
98, The migration functions lack explicit return type hints; update the function
signatures for upgrade and downgrade to declare return type None (i.e., change
upgrade() and downgrade() to upgrade() -> None and downgrade() -> None) so they
comply with the repository typing guidelines while keeping the existing bodies
(including op.get_context().autocommit_block() and the loops over
INDEXES_TO_DROP) unchanged.
```python
def upgrade():
    # 1. Backfill missing timestamp columns. Each column is created with
    #    a NOW() server default so existing rows are populated atomically;
    #    the default is then dropped so future inserts get their value
    #    from the model's `default_factory=now`.
    op.add_column(
        "user_project",
        sa.Column(
            "updated_at",
            sa.DateTime(),
            nullable=False,
            server_default=sa.text("NOW()"),
            comment="Timestamp when the mapping was last updated",
        ),
    )
    op.alter_column("user_project", "updated_at", server_default=None)

    op.add_column(
        "user",
        sa.Column(
            "inserted_at",
            sa.DateTime(),
            nullable=False,
            server_default=sa.text("NOW()"),
            comment="Timestamp when the user was created",
        ),
    )
    op.alter_column("user", "inserted_at", server_default=None)

    op.add_column(
        "user",
        sa.Column(
            "updated_at",
            sa.DateTime(),
            nullable=False,
            server_default=sa.text("NOW()"),
            comment="Timestamp when the user was last updated",
        ),
    )
    op.alter_column("user", "updated_at", server_default=None)

    # 2. job.project_id foreign key (with orphan backfill)
    op.execute(
        """
        UPDATE job
        SET project_id = NULL
        WHERE project_id IS NOT NULL
          AND project_id NOT IN (SELECT id FROM project)
        """
    )
    op.create_foreign_key(
        "job_project_id_fkey",
        "job",
        "project",
        ["project_id"],
        ["id"],
        ondelete="CASCADE",
    )

    # 3. Rename created_at → inserted_at on job and llm_call, then create
    #    the llm_call hot-path index using the new column name. Index
    #    creation is CONCURRENTLY and must run outside a transaction.
    op.alter_column("job", "created_at", new_column_name="inserted_at")
    op.alter_column("llm_call", "created_at", new_column_name="inserted_at")
    with op.get_context().autocommit_block():
        op.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS "
            '"ix_llm_call_job_inserted_at_active" '
            'ON "llm_call" ("job_id", "inserted_at" DESC) '
            'WHERE "deleted_at" IS NULL'
        )

    # 4. documentcollection unique constraint (with dedupe).
    #    Build the underlying unique index CONCURRENTLY (no AccessExclusive
    #    on the table during the scan/build), then attach it as a constraint
    #    via ADD CONSTRAINT ... USING INDEX (catalog-only, brief lock).
    op.execute(
        """
        DELETE FROM documentcollection
        WHERE id NOT IN (
            SELECT MIN(id)
            FROM documentcollection
            GROUP BY document_id, collection_id
        )
        """
    )
    with op.get_context().autocommit_block():
        op.execute(
            "CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS "
            '"uq_document_collection" '
            'ON "documentcollection" ("document_id", "collection_id")'
        )
    op.execute(
        'ALTER TABLE "documentcollection" '
        'ADD CONSTRAINT "uq_document_collection" '
        'UNIQUE USING INDEX "uq_document_collection"'
    )

    # 5. Align global.languages.id to INTEGER
    op.execute("ALTER TABLE global.languages ALTER COLUMN id SET DATA TYPE INTEGER")
```
🧩 Analysis chain

🏁 Script executed:

```shell
# First, let's find and read the full migration file
find backend/app/alembic/versions -name "060*.py" -type f
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 132

🏁 Script executed:

```shell
# Read the full migration file to see function signatures and complete context
cat -n backend/app/alembic/versions/060_v1_assorted_cleanups.py
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 7160

🏁 Script executed:

```shell
# Check if there are similar migration patterns in the codebase
fd -e py "versions" backend/app/alembic/ -x wc -l {} \; | sort -n | tail -5
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 57

🏁 Script executed:

```shell
# Look at the function signatures in a few other migration files to understand the pattern
fd "versions.*\.py$" backend/app/alembic/ -x head -20 {} \;
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 57

🏁 Script executed:

```shell
# Check a few other migration files to see their function signatures
fd "versions.*\.py$" backend/app/alembic/versions -type f | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && head -50 {}'
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 246

🏁 Script executed:

```shell
# Search specifically for upgrade/downgrade function signatures across migrations
rg "^def (upgrade|downgrade)" backend/app/alembic/versions/ -A 1
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 24506
Add return type hints and refactor migration to prevent partial application on failure.
This migration lacks return type annotations on upgrade() and downgrade() functions (see recent migrations 047–048 using -> None).
More critically, the destructive operations (column adds, renames, FK creation, row deletion) run within transaction blocks that precede two autocommit_block() calls. If execution fails after line 111's index creation but before the DELETE completes, or if the second autocommit block at line 133 fails, the migration leaves the database partially applied with committed DDL but rolled-back constraints. Future reruns become unsafe because op.add_column() is not idempotent.
Split concurrent-index work (lines 111–117 and 133–138) into a separate migration to complete all transactional schema changes atomically first, then handle index creation outside the transaction boundary.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/app/alembic/versions/060_v1_assorted_cleanups.py` around lines 47 -
147, Add explicit return type hints (-> None) to both upgrade() and downgrade(),
and refactor this migration so all transactional schema changes (op.add_column,
op.alter_column, op.create_foreign_key, DELETEs, ALTER TABLE ... ADD CONSTRAINT,
ALTER COLUMN type) occur inside the single migration transaction and do not
include CONCURRENTLY/index-creation autocommit blocks; extract the two
autocommit index creations that use op.get_context().autocommit_block() (the
CREATE INDEX CONCURRENTLY for "ix_llm_call_job_inserted_at_active" and the
CREATE UNIQUE INDEX CONCURRENTLY for "uq_document_collection") into a new,
separate migration file that only runs those concurrent, non-transactional
statements, leaving this file to perform only transactional changes and making
upgrade()/downgrade() idempotent and safe to rerun.
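A rough sketch of what the extracted concurrent-index migration could look like, per the suggestion above. The revision identifiers are placeholders, the index definitions are copied from this migration, and the `uq_document_collection` index would still be attached as a constraint by the transactional migration afterwards.

```python
"""Concurrent index builds split out of 060 (sketch; placeholder revision ids)."""

from alembic import op

revision = "061_concurrent_indexes"  # placeholder
down_revision = "060_v1_assorted_cleanups"  # placeholder


def upgrade() -> None:
    # CONCURRENTLY must run outside a transaction; keeping this migration
    # free of transactional DDL means a failed run can simply be retried.
    with op.get_context().autocommit_block():
        op.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS "
            '"ix_llm_call_job_inserted_at_active" '
            'ON "llm_call" ("job_id", "inserted_at" DESC) '
            'WHERE "deleted_at" IS NULL'
        )
        op.execute(
            "CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS "
            '"uq_document_collection" '
            'ON "documentcollection" ("document_id", "collection_id")'
        )


def downgrade() -> None:
    with op.get_context().autocommit_block():
        op.execute(
            'DROP INDEX CONCURRENTLY IF EXISTS "ix_llm_call_job_inserted_at_active"'
        )
        # Only valid while the index has not yet been attached as a constraint.
        op.execute('DROP INDEX CONCURRENTLY IF EXISTS "uq_document_collection"')
```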
```python
    __table_args__ = (
        UniqueConstraint("document_id", "collection_id", name="uq_document_collection"),
    )
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Find all places that insert/add DocumentCollection rows to confirm duplicate-insert handling.
rg -nP --type=py -C4 '\bDocumentCollection\s*\('
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 1448

🏁 Script executed:

```shell
cat -n backend/app/crud/document_collection.py
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 1866

🏁 Script executed:

```shell
rg -nP --type=py "IntegrityError|DocumentCollection" backend/app/crud/document_collection.py
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 391

🏁 Script executed:

```shell
rg -nP --type=py "session\.(add|commit)|flush\|execute" backend/app/crud/document_collection.py -A 2 -B 2
```

Repository: ProjectTech4DevAI/kaapi-backend · Length of output: 223
Add IntegrityError handling to DocumentCollectionCrud.create() to gracefully handle duplicate document-collection pairs.
The composite unique constraint is well-designed, but the upstream insertion code at backend/app/crud/document_collection.py lines 21–22 does not catch IntegrityError. When bulk_save_objects() and commit() are called, a duplicate (document_id, collection_id) pair will raise an unhandled IntegrityError, resulting in a 500 error instead of a clean validation response.
Wrap the commit in a try/except block to catch IntegrityError and either:
- Skip duplicate pairs silently (if re-linking is idempotent), or
- Raise a user-friendly validation error.
Additionally, the method is missing a return type hint (-> None) and violates the logging guideline (no [function_name] prefix on any errors).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/app/models/document_collection.py` around lines 9 - 11,
DocumentCollectionCrud.create() currently commits without catching SQLAlchemy
IntegrityError from the composite UniqueConstraint, causing 500s on duplicates;
wrap the bulk_save_objects()/commit() in a try/except that catches
sqlalchemy.exc.IntegrityError, calls session.rollback(), and handle duplicates
by either skipping them (idempotent re-linking) or raising a user-friendly
validation error (e.g., a ValidationError/HTTPException with a clear message)
instead of letting the exception propagate; also add a return type hint "->
None" to create() and make any error logs include the "[create]" prefix per
logging guidelines, and import IntegrityError where needed.
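A hedged sketch of the suggested handling. The class shape and method signature approximate the CRUD code described above rather than the actual repository source, and the 409 response is one reasonable choice for the user-facing error.

```python
import logging

from fastapi import HTTPException, status
from sqlalchemy.exc import IntegrityError
from sqlmodel import Session

logger = logging.getLogger(__name__)


class DocumentCollectionCrud:  # shape assumed from the review, not copied from source
    def __init__(self, session: Session) -> None:
        self.session = session

    def create(self, links: list) -> None:
        try:
            self.session.bulk_save_objects(links)
            self.session.commit()
        except IntegrityError as e:
            # Roll back so the session stays usable, then surface a clean
            # validation error instead of an unhandled 500.
            self.session.rollback()
            logger.error(f"[create] Duplicate document-collection link: {e}")
            raise HTTPException(
                status_code=status.HTTP_409_CONFLICT,
                detail="Document is already linked to this collection",
            )
```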
```python
        resp = client.post(
            f"{USER_PROJECTS_URL}/",
            json={
```
Trailing slash in POST URLs is inconsistent with the rest of the test class.
Lines 221 and 252 use f"{USER_PROJECTS_URL}/", while every other POST in TestAddProjectUsers uses the bare USER_PROJECTS_URL. With FastAPI's default redirect_slashes=True, this causes a 307 redirect before the actual handler runs — so the tests still pass, but they're not hitting the canonical route directly. If redirect_slashes is ever disabled, these tests will break with a 404/405.
🔧 Proposed fix

```diff
-        resp = client.post(
-            f"{USER_PROJECTS_URL}/",
+        resp = client.post(
+            USER_PROJECTS_URL,
```

(Apply to both occurrences at lines 221 and 252.)
Also applies to: 251-253
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/app/tests/api/test_user_project.py` around lines 220 - 222, The POST
requests in TestAddProjectUsers are using a trailing slash
(f"{USER_PROJECTS_URL}/") which triggers a 307 redirect; update both occurrences
to use the canonical route without the slash (use USER_PROJECTS_URL or
f"{USER_PROJECTS_URL}") so the POST hits the intended handler directly (search
for f"{USER_PROJECTS_URL}/" inside TestAddProjectUsers and replace those two
occurrences).
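For context, a tiny self-contained illustration of the failure mode the comment warns about (hypothetical app, not project code): with slash redirects disabled, the slashed POST stops being silently redirected and fails outright.

```python
from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI(redirect_slashes=False)  # redirects disabled, as the comment hypothesizes


@app.post("/user-projects")
def add_users() -> dict:
    return {"ok": True}


client = TestClient(app)
assert client.post("/user-projects").status_code == 200  # canonical route works
# No 307 is issued, so the trailing-slash variant is simply not found.
assert client.post("/user-projects/").status_code == 404
```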
Summary
Target issue is #808
Explain the motivation for making this change. What existing problem does the pull request solve?
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
- `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
Please add here if any other information is required for the reviewer.
Summary by CodeRabbit
Release Notes
Performance Improvements
Data Integrity
Schema Updates