
Kaapi: V1.0: Database optimization#817

Open
AkhileshNegi wants to merge 13 commits into main from enhancement/database-cleanups

Conversation

@AkhileshNegi (Collaborator) commented May 7, 2026

Summary

Target issue is #808
Explain the motivation for making this change. What existing problem does the pull request solve?

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

Release Notes

  • Performance Improvements

    • Added database indexes to optimize query performance for frequently accessed data paths.
  • Data Integrity

    • Improved soft-deletion mechanism using timestamp tracking.
    • Added unique constraint on document collections to prevent duplicate entries.
  • Schema Updates

    • Enhanced timestamp tracking across records for better audit trails.
    • Refined database constraints and relationships for data consistency.

@coderabbitai Bot commented May 7, 2026

📝 Walkthrough

This PR implements a comprehensive migration of soft-delete semantics across the backend, replacing boolean is_deleted columns with timestamp-based deleted_at tracking, delivered via three database migrations, ORM model schema updates, CRUD/query refactoring, API layer changes, and full test coverage, spanning 149 files.

Changes

Soft-Delete Schema & Logic Migration

Layer / File(s) | Summary

Database Migrations
backend/app/alembic/versions/058_v1_query_optimization.py, 059_drop_redundant_indexes.py, 060_v1_assorted_cleanups.py
Three coordinated migrations: 058 backfills deleted_at from is_deleted, drops the boolean column, and creates tenant-scoped/query-optimized indexes; 059 drops redundant indexes; 060 adds missing timestamp columns (user.inserted_at, user.updated_at, user_project.updated_at), renames created_at → inserted_at on job and llm_call, deduplicates documentcollection, and aligns type definitions.

ORM Model Schema
backend/app/models/api_key.py, assistants.py, document.py, document_collection.py, evaluation.py, fine_tuning.py, job.py, llm/request.py, model_evaluation.py, openai_conversation.py, threads.py, user.py, user_project.py, project.py, __init__.py
Remove is_deleted: bool fields from APIKey, Assistant, Document, ModelEvaluation, OpenAIConversation. Add timestamps: User.inserted_at/updated_at, UserProject.updated_at. Rename timestamp fields: Job.created_at → inserted_at, LlmCall.created_at → inserted_at. Rename model classes for consistency: Fine_Tuning → FineTuning, OpenAI_Thread → OpenAIThread (with explicit __tablename__ mappings). Update relationship type annotations and add a DocumentCollection uniqueness constraint.

CRUD & Query Layer
backend/app/crud/api_key.py, assistants.py, auth.py, document/doc_transformation_job.py, document/document.py, fine_tuning.py, llm.py, model_evaluation.py, openai_conversation.py, thread_results.py
Update read operations to filter deleted_at.is_(None) instead of is_deleted.is_(False). Soft-delete writes now set deleted_at = now() instead of toggling is_deleted (see the sketch after this table). Rename model references: OpenAI_Thread → OpenAIThread in thread_results; Fine_Tuning → FineTuning across fine_tuning CRUD. Update query ordering: LlmCall ordered by inserted_at DESC. Optimize fetch_top_model_by_doc_id to sort by numeric mcc_score at query time rather than in Python.

API Routes & Authentication
backend/app/api/deps.py, core/security.py, api/routes/llm.py, api/routes/user_project.py
JWT/API key verification checks deleted_at.is_(None) for validity. LLM endpoint response maps job.inserted_at instead of job.created_at. User-project bulk-add refactored from per-user CRUD delegation to atomic prefetch + in-memory conflict detection with transactional rollback on constraint violations; adds user creation with generated hashed_password and is_active=False for new invitees.

Seed Data & Configuration
backend/app/seed_data/seed_data.json, seed_data.py, backend/app/tests/seed_data/seed_data.json, seed_data.py
Remove is_deleted field from API key seed records; the APIKeyData model drops its is_deleted attribute and relies on an optional deleted_at timestamp for soft-delete state.

Tests
backend/app/tests/api/routes/documents/test_route_document_remove.py, test_fine_tuning.py, test_threads.py, test_user_project.py, core/test_security.py, crud/collections/collection/test_crud_collection_delete.py, crud/documents/documents/test_crud_document_delete.py, test_crud_document_read_many.py, test_crud_document_update.py, test_doc_transformation_job.py, test_api_key.py, test_assistants.py, test_fine_tuning.py, test_model_evaluation.py, test_onboarding.py, test_openai_conversation.py, seed_data/..., utils/auth.py, utils/document.py, utils/test_data.py, utils/utils.py
Soft-delete assertions updated: is_deleted is True → deleted_at is not None, is_deleted is False → deleted_at is None. Model rename updates: Fine_Tuning → FineTuning, OpenAI_Thread → OpenAIThread in queries and fixtures. New tests for bulk-add conflict detection and soft-delete filtering in model evaluation/fine-tuning. Seed data tests updated for timestamp-based deletion.
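
To make the core change concrete: a minimal sketch of the new read/write pattern, assuming the repo's APIKey model (path per the PR's file list) and a now() helper returning the current UTC time. The function names here are illustrative, not the actual CRUD method signatures.

from datetime import datetime, timezone

from sqlmodel import Session, select

from app.models.api_key import APIKey  # path assumed from the PR's file list


def now() -> datetime:
    return datetime.now(timezone.utc)


def get_active_api_keys(session: Session, project_id: int) -> list[APIKey]:
    # Read path: a row is live while deleted_at is still NULL.
    # This replaces the old filter APIKey.is_deleted.is_(False).
    statement = select(APIKey).where(
        APIKey.project_id == project_id,
        APIKey.deleted_at.is_(None),
    )
    return list(session.exec(statement))


def soft_delete_api_key(session: Session, api_key: APIKey) -> None:
    # Write path: instead of toggling a boolean, record when the row was
    # removed; the timestamp doubles as an audit-trail entry.
    api_key.deleted_at = now()
    session.add(api_key)
    session.commit()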

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • kartpop
  • Prajna1999

Poem

🐰 The schema hops with gentle grace,
From is_deleted to timestamp's embrace,
Each soft-delete now marks its end time,
As queries bloom where conditions align,
A cleaner schema, timestamp-signed and refined!

🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage | ⚠️ Warning: docstring coverage is 49.40%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4)

  • Description Check | ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check | ✅ Passed: the title 'Kaapi: V1.0: Database optimization' directly and clearly summarizes the main changes in the PR, which involve significant database schema optimization including migrations, soft-delete consolidation, and index management.
  • Linked Issues Check | ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check | ✅ Passed: check skipped because no linked issues were found for this pull request.



AkhileshNegi self-assigned this May 7, 2026
AkhileshNegi added the enhancement (New feature or request) label May 7, 2026
AkhileshNegi marked this pull request as ready for review May 7, 2026 04:25
@codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 98.48485% with 2 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Missing
  • backend/app/api/routes/user_project.py | 96.77% | 1 line ⚠️
  • backend/app/crud/fine_tuning.py | 87.50% | 1 line ⚠️


@coderabbitai Bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
backend/app/models/evaluation.py (1)

357-364: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add onupdate hook to EvaluationRun.updated_at and EvaluationDataset.updated_at.

Without SQLAlchemy's onupdate, the updated_at field only gets set on INSERT. Updates at lines 208–210 in batch.py (status, batch_job_id, total_items), at lines 230–231 (error_message, status), and in processing.py (object_store_url) all modify eval_run without explicitly setting updated_at, leaving the timestamp stale.

The codebase pattern (e.g., user.py, tts_evaluation.py) uses "onupdate": now in sa_column_kwargs. Apply the same fix:

-    updated_at: datetime = SQLField(
-        default_factory=now,
-        nullable=False,
-        description="The timestamp when the evaluation run was last updated",
-        sa_column_kwargs={
-            "comment": "Timestamp when the evaluation run was last updated"
-        },
-    )
+    updated_at: datetime = SQLField(
+        default_factory=now,
+        nullable=False,
+        description="The timestamp when the evaluation run was last updated",
+        sa_column_kwargs={
+            "comment": "Timestamp when the evaluation run was last updated",
+            "onupdate": now,
+        },
+    )

Apply the same fix to EvaluationDataset.updated_at (lines 172–178) for consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/models/evaluation.py` around lines 357 - 364, The updated_at
SQLField for EvaluationRun and EvaluationDataset currently only sets a default
timestamp on INSERT; add an SQLAlchemy onupdate hook so it auto-updates on
UPDATEs by adding "onupdate": now to the sa_column_kwargs for the SQLField
definitions named EvaluationRun.updated_at and EvaluationDataset.updated_at (the
SQLField instances using default_factory=now and sa_column_kwargs={"comment":
...}); ensure the sa_column_kwargs now contains both "comment" and "onupdate":
now consistent with other models like user.py and tts_evaluation.py.
backend/app/crud/assistants.py (1)

274-278: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set updated_at when soft-deleting an assistant for consistency with other delete operations in this PR.

The Assistant.updated_at field lacks a SQL-level onupdate= default and only sets a default_factory. This means the soft-delete operation at line 275 does not update the timestamp — unlike the APIKeyCrud.delete and DocumentCrud.delete implementations elsewhere in the PR that update both deleted_at and updated_at. This breaks the "last modified" semantics for clients relying on updated_at to track when records were last changed.

♻️ Proposed fix
    existing_assistant.deleted_at = now()
+   existing_assistant.updated_at = now()
    session.add(existing_assistant)
    session.commit()
    session.refresh(existing_assistant)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/crud/assistants.py` around lines 274 - 278, The soft-delete path
for Assistant doesn't update the Assistant.updated_at timestamp; before
committing the change in the block that sets existing_assistant.deleted_at =
now(), also set existing_assistant.updated_at = now() (mirroring
APIKeyCrud.delete and DocumentCrud.delete behavior) so the record's "last
modified" time is updated, then session.add, session.commit, session.refresh as
currently done.
backend/app/api/routes/user_project.py (1)

65-67: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate the project/org pairing before creating memberships.

These checks are independent right now. If body.project_id belongs to a different organization, the code below will persist UserProject.organization_id = body.organization_id and generate invite tokens for the wrong org/project combination.

🛠️ Suggested fix
-    validate_organization(session=session, org_id=body.organization_id)
-    validate_project(session=session, project_id=body.project_id)
+    validate_organization(session=session, org_id=body.organization_id)
+    project = get_project_by_id(session=session, project_id=body.project_id)
+    if project is None or project.organization_id != body.organization_id:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail="Project does not belong to the given organization",
+        )

Also applies to: 114-118

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/api/routes/user_project.py` around lines 65 - 67, The current
independent checks (validate_organization and validate_project) allow mismatched
org/project combos; update validation so the project is confirmed to belong to
the requested organization before creating memberships or invite tokens—either
extend validate_project or add a new function (e.g.,
validate_project_belongs_to_org) to fetch the Project by body.project_id and
assert project.organization_id == body.organization_id, and call this check in
place of or after validate_project (also apply the same fix for the block
referenced at lines 114-118); ensure the code paths that set
UserProject.organization_id and generate invite tokens only run after this
combined validation passes.
🧹 Nitpick comments (3)
backend/app/crud/thread_results.py (1)

24-24: ⚡ Quick win

Replace deprecated data.dict() with data.model_dump().

Pydantic v2 deprecated .dict() in favour of .model_dump(). Since line 24 is already being touched, this is the right moment to fix it.

🔧 Proposed fix
-        new_thread = OpenAIThread(**data.dict())
+        new_thread = OpenAIThread(**data.model_dump())
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/crud/thread_results.py` at line 24, Replace the deprecated
Pydantic v2 call data.dict() with data.model_dump() where the OpenAIThread is
constructed: change the OpenAIThread(**data.dict()) call to use
OpenAIThread(**data.model_dump()) so the created_thread/ new_thread
instantiation uses the v2-compatible model dump; update any other occurrences in
the same module that still call data.dict() (search for data.dict and replace
with data.model_dump).
backend/app/alembic/versions/058_v1_query_optimization.py (1)

182-225: ⚡ Quick win

Add explicit -> None annotations to the Alembic entrypoints.

Both changed functions are missing return annotations.

As per coding guidelines, **/*.py: Always add type hints to all function parameters and return values in Python code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/alembic/versions/058_v1_query_optimization.py` around lines 182 -
225, The upgrade and downgrade Alembic entrypoint functions (upgrade and
downgrade) lack return type hints; modify their definitions to include explicit
return annotations by changing their signatures to include "-> None" (i.e., def
upgrade() -> None: and def downgrade() -> None:) so both functions are fully
type-annotated per the project's guidelines.
backend/app/alembic/versions/060_v1_assorted_cleanups.py (1)

47-163: ⚡ Quick win

Add explicit -> None annotations to upgrade() and downgrade().

These changed entrypoints are missing return annotations.

As per coding guidelines, **/*.py: Always add type hints to all function parameters and return values in Python code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/alembic/versions/060_v1_assorted_cleanups.py` around lines 47 -
163, Add explicit return type annotations -> None to the Alembic entrypoint
functions upgrade() and downgrade(): update the function definitions for upgrade
and downgrade in this diff to include a return annotation of None (i.e., def
upgrade() -> None: and def downgrade() -> None:) so they comply with the
project's typing guideline; no other behavioral changes are needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e5424384-f9f4-48de-b522-a08cf8402648

📥 Commits

Reviewing files that changed from the base of the PR and between e863cf4 and 914ff2c.

📒 Files selected for processing (56)
  • backend/app/alembic/versions/058_v1_query_optimization.py
  • backend/app/alembic/versions/059_drop_redundant_indexes.py
  • backend/app/alembic/versions/060_v1_assorted_cleanups.py
  • backend/app/api/deps.py
  • backend/app/api/routes/llm.py
  • backend/app/api/routes/user_project.py
  • backend/app/core/security.py
  • backend/app/crud/api_key.py
  • backend/app/crud/assistants.py
  • backend/app/crud/auth.py
  • backend/app/crud/document/doc_transformation_job.py
  • backend/app/crud/document/document.py
  • backend/app/crud/fine_tuning.py
  • backend/app/crud/llm.py
  • backend/app/crud/model_evaluation.py
  • backend/app/crud/openai_conversation.py
  • backend/app/crud/thread_results.py
  • backend/app/models/__init__.py
  • backend/app/models/api_key.py
  • backend/app/models/assistants.py
  • backend/app/models/document.py
  • backend/app/models/document_collection.py
  • backend/app/models/evaluation.py
  • backend/app/models/fine_tuning.py
  • backend/app/models/job.py
  • backend/app/models/llm/request.py
  • backend/app/models/model_evaluation.py
  • backend/app/models/openai_conversation.py
  • backend/app/models/project.py
  • backend/app/models/threads.py
  • backend/app/models/user.py
  • backend/app/models/user_project.py
  • backend/app/seed_data/seed_data.json
  • backend/app/seed_data/seed_data.py
  • backend/app/tests/api/routes/documents/test_route_document_remove.py
  • backend/app/tests/api/routes/test_fine_tuning.py
  • backend/app/tests/api/routes/test_threads.py
  • backend/app/tests/api/test_user_project.py
  • backend/app/tests/core/test_security.py
  • backend/app/tests/crud/collections/collection/test_crud_collection_delete.py
  • backend/app/tests/crud/documents/documents/test_crud_document_delete.py
  • backend/app/tests/crud/documents/documents/test_crud_document_read_many.py
  • backend/app/tests/crud/documents/documents/test_crud_document_update.py
  • backend/app/tests/crud/documents/test_doc_transformation_job.py
  • backend/app/tests/crud/test_api_key.py
  • backend/app/tests/crud/test_assistants.py
  • backend/app/tests/crud/test_fine_tuning.py
  • backend/app/tests/crud/test_model_evaluation.py
  • backend/app/tests/crud/test_onboarding.py
  • backend/app/tests/crud/test_openai_conversation.py
  • backend/app/tests/seed_data/seed_data.json
  • backend/app/tests/seed_data/seed_data.py
  • backend/app/tests/utils/auth.py
  • backend/app/tests/utils/document.py
  • backend/app/tests/utils/test_data.py
  • backend/app/tests/utils/utils.py
💤 Files with no reviewable changes (9)
  • backend/app/tests/seed_data/seed_data.json
  • backend/app/seed_data/seed_data.json
  • backend/app/seed_data/seed_data.py
  • backend/app/models/document.py
  • backend/app/models/api_key.py
  • backend/app/models/openai_conversation.py
  • backend/app/models/assistants.py
  • backend/app/tests/seed_data/seed_data.py
  • backend/app/tests/utils/document.py

Comment on lines +182 to +202
def upgrade():
    # Phase A (transactional): preserve audit timestamp, drop redundant column.
    for table in IS_DELETED_TABLES:
        op.execute(
            f"UPDATE {table} "
            f"SET deleted_at = NOW() "
            f"WHERE is_deleted = TRUE AND deleted_at IS NULL"
        )
        op.drop_column(table, "is_deleted")

    # Phase B (autocommit): CONCURRENTLY index creation. Each statement
    # runs in its own implicit transaction, required by the CONCURRENTLY
    # variant.
    with op.get_context().autocommit_block():
        for table, column, index in FK_INDEXES:
            op.execute(
                f'CREATE INDEX CONCURRENTLY IF NOT EXISTS "{index}" '
                f'ON "{table}" ("{column}")'
            )
        for index, body, _schema in COMPOSITE_INDEXES:
            op.execute(f'CREATE INDEX CONCURRENTLY IF NOT EXISTS "{index}" {body}')

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift


Add type hints to upgrade() and downgrade() functions, and redesign migration phases to prevent unrecoverable partial failures.

The upgrade() and downgrade() functions are missing return type annotations (-> None), violating the type hints guideline.

More critically, autocommit_block() at line 195 commits Phase A changes (UPDATE + drop_column) before Phase B index creation begins. If any CREATE INDEX CONCURRENTLY fails, the revision is not recorded as complete but the is_deleted columns are already dropped. A rerun fails on drop_column() with a missing column error, leaving the migration unrecoverable without manual intervention. Wrap Phase A operations inside conditional checks (e.g., column_exists() before drop) or consolidate transaction semantics so either all changes commit together or all roll back on any failure.
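
A minimal sketch of the guarded variant of Phase A, under the assumption that IS_DELETED_TABLES is the constant defined earlier in this revision; the column_exists helper is hypothetical and would live in the migration file:

import sqlalchemy as sa
from alembic import op


def column_exists(table: str, column: str) -> bool:
    # Inspect the live database rather than model metadata, so a rerun
    # after a partial failure sees the actual schema state.
    inspector = sa.inspect(op.get_bind())
    return any(col["name"] == column for col in inspector.get_columns(table))


def upgrade() -> None:
    # Phase A becomes idempotent: a retry skips tables whose is_deleted
    # column was already dropped by a previous partial run.
    for table in IS_DELETED_TABLES:
        if column_exists(table, "is_deleted"):
            op.execute(
                f"UPDATE {table} SET deleted_at = NOW() "
                f"WHERE is_deleted = TRUE AND deleted_at IS NULL"
            )
            op.drop_column(table, "is_deleted")
    # Phase B (CONCURRENTLY index creation) stays in the autocommit block
    # exactly as in the original revision.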

🧰 Tools
🪛 OpenGrep (1.20.0)

[ERROR] 202-202: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/alembic/versions/058_v1_query_optimization.py` around lines 182 -
202, Add explicit return type annotations "-> None" to the upgrade() and
downgrade() functions, and prevent unrecoverable partial commits by ensuring
Phase A (the UPDATE and drop_column on IS_DELETED_TABLES) cannot be committed
before Phase B index creation fails: either run Phase A inside the same
transactional boundary as Phase B (avoid using
op.get_context().autocommit_block() for Phase A) or keep autocommit for Phase B
only and guard each drop with a runtime existence check (use a helper like
column_exists(table, "is_deleted") before calling op.drop_column) so a retry
won't error on missing columns; locate these changes around upgrade(),
autocommit_block(), IS_DELETED_TABLES, FK_INDEXES, and COMPOSITE_INDEXES and
apply the same pattern to downgrade().

Comment on lines +89 to +98
def upgrade():
    with op.get_context().autocommit_block():
        for index_name, _body in INDEXES_TO_DROP:
            op.execute(f'DROP INDEX CONCURRENTLY IF EXISTS "{index_name}"')


def downgrade():
    with op.get_context().autocommit_block():
        for index_name, body in INDEXES_TO_DROP:
            op.execute(f'CREATE INDEX CONCURRENTLY IF NOT EXISTS "{index_name}" {body}')

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win


Add explicit return type hints to migration functions.

The upgrade() (line 89) and downgrade() (line 95) functions must declare -> None to comply with repository typing guidelines for Python files.

♻️ Suggested patch
-def upgrade():
+def upgrade() -> None:
     with op.get_context().autocommit_block():
         for index_name, _body in INDEXES_TO_DROP:
             op.execute(f'DROP INDEX CONCURRENTLY IF EXISTS "{index_name}"')
 
 
-def downgrade():
+def downgrade() -> None:
     with op.get_context().autocommit_block():
         for index_name, body in INDEXES_TO_DROP:
             op.execute(f'CREATE INDEX CONCURRENTLY IF NOT EXISTS "{index_name}" {body}')
🧰 Tools
🪛 OpenGrep (1.20.0)

[ERROR] 92-92: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)


[ERROR] 98-98: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/alembic/versions/059_drop_redundant_indexes.py` around lines 89 -
98, The migration functions lack explicit return type hints; update the function
signatures for upgrade and downgrade to declare return type None (i.e., change
upgrade() and downgrade() to upgrade() -> None and downgrade() -> None) so they
comply with the repository typing guidelines while keeping the existing bodies
(including op.get_context().autocommit_block() and the loops over
INDEXES_TO_DROP) unchanged.

Comment on lines +47 to +147
def upgrade():
    # 1. Backfill missing timestamp columns. Each column is created with
    #    a NOW() server default so existing rows are populated atomically;
    #    the default is then dropped so future inserts get their value
    #    from the model's `default_factory=now`.
    op.add_column(
        "user_project",
        sa.Column(
            "updated_at",
            sa.DateTime(),
            nullable=False,
            server_default=sa.text("NOW()"),
            comment="Timestamp when the mapping was last updated",
        ),
    )
    op.alter_column("user_project", "updated_at", server_default=None)

    op.add_column(
        "user",
        sa.Column(
            "inserted_at",
            sa.DateTime(),
            nullable=False,
            server_default=sa.text("NOW()"),
            comment="Timestamp when the user was created",
        ),
    )
    op.alter_column("user", "inserted_at", server_default=None)

    op.add_column(
        "user",
        sa.Column(
            "updated_at",
            sa.DateTime(),
            nullable=False,
            server_default=sa.text("NOW()"),
            comment="Timestamp when the user was last updated",
        ),
    )
    op.alter_column("user", "updated_at", server_default=None)

    # 2. job.project_id foreign key (with orphan backfill)
    op.execute(
        """
        UPDATE job
        SET project_id = NULL
        WHERE project_id IS NOT NULL
          AND project_id NOT IN (SELECT id FROM project)
        """
    )
    op.create_foreign_key(
        "job_project_id_fkey",
        "job",
        "project",
        ["project_id"],
        ["id"],
        ondelete="CASCADE",
    )

    # 3. Rename created_at → inserted_at on job and llm_call, then create
    #    the llm_call hot-path index using the new column name. Index
    #    creation is CONCURRENTLY and must run outside a transaction.
    op.alter_column("job", "created_at", new_column_name="inserted_at")
    op.alter_column("llm_call", "created_at", new_column_name="inserted_at")
    with op.get_context().autocommit_block():
        op.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS "
            '"ix_llm_call_job_inserted_at_active" '
            'ON "llm_call" ("job_id", "inserted_at" DESC) '
            'WHERE "deleted_at" IS NULL'
        )

    # 4. documentcollection unique constraint (with dedupe).
    #    Build the underlying unique index CONCURRENTLY (no AccessExclusive
    #    on the table during the scan/build), then attach it as a constraint
    #    via ADD CONSTRAINT ... USING INDEX (catalog-only, brief lock).
    op.execute(
        """
        DELETE FROM documentcollection
        WHERE id NOT IN (
            SELECT MIN(id)
            FROM documentcollection
            GROUP BY document_id, collection_id
        )
        """
    )
    with op.get_context().autocommit_block():
        op.execute(
            "CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS "
            '"uq_document_collection" '
            'ON "documentcollection" ("document_id", "collection_id")'
        )
    op.execute(
        'ALTER TABLE "documentcollection" '
        'ADD CONSTRAINT "uq_document_collection" '
        'UNIQUE USING INDEX "uq_document_collection"'
    )

    # 5. Align global.languages.id to INTEGER
    op.execute("ALTER TABLE global.languages ALTER COLUMN id SET DATA TYPE INTEGER")


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift


Add return type hints and refactor migration to prevent partial application on failure.

This migration lacks return type annotations on upgrade() and downgrade() functions (see recent migrations 047–048 using -> None).

More critically, the destructive operations (column adds, renames, FK creation, row deletion) run within transaction blocks that precede two autocommit_block() calls. If execution fails after line 111's index creation but before the DELETE completes, or if the second autocommit block at line 133 fails, the migration leaves the database partially applied with committed DDL but rolled-back constraints. Future reruns become unsafe because op.add_column() is not idempotent.

Split concurrent-index work (lines 111–117 and 133–138) into a separate migration to complete all transactional schema changes atomically first, then handle index creation outside the transaction boundary.
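
One possible shape for the split, sketched as a hypothetical follow-up revision (the revision identifiers are placeholders, and the ADD CONSTRAINT ... USING INDEX step would move here too, since it depends on the concurrently built index):

from alembic import op

revision = "061_concurrent_indexes"  # placeholder identifier
down_revision = "060_v1_assorted_cleanups"  # placeholder identifier


def upgrade() -> None:
    # Only non-transactional work lives here, so a failure cannot leave
    # the transactional DDL of the previous revision half-applied.
    with op.get_context().autocommit_block():
        op.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS "
            '"ix_llm_call_job_inserted_at_active" '
            'ON "llm_call" ("job_id", "inserted_at" DESC) '
            'WHERE "deleted_at" IS NULL'
        )
        op.execute(
            "CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS "
            '"uq_document_collection" '
            'ON "documentcollection" ("document_id", "collection_id")'
        )
    op.execute(
        'ALTER TABLE "documentcollection" '
        'ADD CONSTRAINT "uq_document_collection" '
        'UNIQUE USING INDEX "uq_document_collection"'
    )


def downgrade() -> None:
    # Dropping the constraint also drops the index it absorbed.
    op.execute(
        'ALTER TABLE "documentcollection" '
        'DROP CONSTRAINT IF EXISTS "uq_document_collection"'
    )
    with op.get_context().autocommit_block():
        op.execute(
            'DROP INDEX CONCURRENTLY IF EXISTS '
            '"ix_llm_call_job_inserted_at_active"'
        )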

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/alembic/versions/060_v1_assorted_cleanups.py` around lines 47 -
147, Add explicit return type hints (-> None) to both upgrade() and downgrade(),
and refactor this migration so all transactional schema changes (op.add_column,
op.alter_column, op.create_foreign_key, DELETEs, ALTER TABLE ... ADD CONSTRAINT,
ALTER COLUMN type) occur inside the single migration transaction and do not
include CONCURRENTLY/index-creation autocommit blocks; extract the two
autocommit index creations that use op.get_context().autocommit_block() (the
CREATE INDEX CONCURRENTLY for "ix_llm_call_job_inserted_at_active" and the
CREATE UNIQUE INDEX CONCURRENTLY for "uq_document_collection") into a new,
separate migration file that only runs those concurrent, non-transactional
statements, leaving this file to perform only transactional changes and making
upgrade()/downgrade() idempotent and safe to rerun.

Comment on lines +9 to +11
__table_args__ = (
    UniqueConstraint("document_id", "collection_id", name="uq_document_collection"),
)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win


Add IntegrityError handling to DocumentCollectionCrud.create() to gracefully handle duplicate document-collection pairs.

The composite unique constraint is well-designed, but the upstream insertion code at backend/app/crud/document_collection.py lines 21–22 does not catch IntegrityError. When bulk_save_objects() and commit() are called, a duplicate (document_id, collection_id) pair will raise an unhandled IntegrityError, resulting in a 500 error instead of a clean validation response.

Wrap the commit in a try/except block to catch IntegrityError and either:

  • Skip duplicate pairs silently (if re-linking is idempotent), or
  • Raise a user-friendly validation error.

Additionally, the method is missing a return type hint (-> None) and violates the logging guideline (no [function_name] prefix on any errors).
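
A sketch of the suggested guard; the method shape and the session/logger attributes follow the review's description of DocumentCollectionCrud rather than verified code, so treat the names as assumptions:

import logging

from fastapi import HTTPException, status
from sqlalchemy.exc import IntegrityError

logger = logging.getLogger(__name__)


def create(self, documents: list["DocumentCollection"]) -> None:
    # Method of DocumentCollectionCrud; `self.session` is assumed to be
    # the SQLAlchemy/SQLModel session the CRUD class already holds.
    try:
        self.session.bulk_save_objects(documents)
        self.session.commit()
    except IntegrityError:
        # A duplicate (document_id, collection_id) pair hit the unique
        # constraint; roll back and surface a clean client error.
        self.session.rollback()
        logger.warning(
            "[create] Duplicate (document_id, collection_id) pair rejected"
        )
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail="Document is already linked to this collection",
        )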

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/models/document_collection.py` around lines 9 - 11,
DocumentCollectionCrud.create() currently commits without catching SQLAlchemy
IntegrityError from the composite UniqueConstraint, causing 500s on duplicates;
wrap the bulk_save_objects()/commit() in a try/except that catches
sqlalchemy.exc.IntegrityError, calls session.rollback(), and handle duplicates
by either skipping them (idempotent re-linking) or raising a user-friendly
validation error (e.g., a ValidationError/HTTPException with a clear message)
instead of letting the exception propagate; also add a return type hint "->
None" to create() and make any error logs include the "[create]" prefix per
logging guidelines, and import IntegrityError where needed.

Comment on lines +220 to +222
resp = client.post(
    f"{USER_PROJECTS_URL}/",
    json={

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Trailing slash in POST URLs is inconsistent with the rest of the test class.

Lines 221 and 252 use f"{USER_PROJECTS_URL}/", while every other POST in TestAddProjectUsers uses the bare USER_PROJECTS_URL. With FastAPI's default redirect_slashes=True, this causes a 307 redirect before the actual handler runs — so the tests still pass, but they're not hitting the canonical route directly. If redirect_slashes is ever disabled, these tests will break with a 404/405.

🔧 Proposed fix
-        resp = client.post(
-            f"{USER_PROJECTS_URL}/",
+        resp = client.post(
+            USER_PROJECTS_URL,

(Apply to both occurrences at lines 221 and 252.)

Also applies to: 251-253

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/tests/api/test_user_project.py` around lines 220 - 222, The POST
requests in TestAddProjectUsers are using a trailing slash
(f"{USER_PROJECTS_URL}/") which triggers a 307 redirect; update both occurrences
to use the canonical route without the slash (use USER_PROJECTS_URL or
f"{USER_PROJECTS_URL}") so the POST hits the intended handler directly (search
for f"{USER_PROJECTS_URL}/" inside TestAddProjectUsers and replace those two
occurrences).


Labels

enhancement (New feature or request), ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant