Evaluation: TTS#619
Conversation
Note: Reviews paused. This branch appears to be under active development, so to avoid overwhelming you with review comments from an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews:
📝 Walkthrough
Adds a TTS evaluation subsystem: API docs and routes, CRUD for datasets/runs/results, Gemini batch submission support, cron polling/processing, audio utilities, DB models and migration, and wiring into the existing evaluations cron and routing.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API as API Routes
    participant CRUD as CRUD Layer
    participant Gemini as Gemini Batch API
    participant Storage as Object Storage
    participant DB as Database
    participant Cron as Cron Poller
    Client->>API: POST /evaluations/tts/runs (start evaluation)
    API->>CRUD: create_tts_run + create_tts_results
    CRUD->>DB: persist run & results (PENDING)
    API->>CRUD: start_tts_evaluation_batch
    CRUD->>Gemini: submit batch job(s) (JSONL texts + voice config)
    Gemini-->>CRUD: batch_job_id(s)
    CRUD->>DB: update run (PROCESSING, batch_job_id)
    API-->>Client: return TTSEvaluationRunPublic
    Note over Cron: asynchronous polling loop
    Cron->>CRUD: poll_all_pending_tts_evaluations
    CRUD->>Gemini: get batch status / download results
    Gemini-->>CRUD: batch results (base64 PCM in JSONL)
    CRUD->>CRUD: _extract_audio_from_response -> pcm_to_wav
    CRUD->>Storage: upload WAV
    Storage-->>CRUD: object_store_url
    CRUD->>DB: update result (COMPLETED, url, metadata)
    CRUD->>DB: update run (COMPLETED)
```
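The `_extract_audio_from_response -> pcm_to_wav` step in the diagram can be done with the stdlib `wave` module alone. The following is a hypothetical sketch, not the repository's implementation; the 24 kHz, mono, 16-bit parameters are assumptions about the PCM format Gemini returns:

```python
import io
import wave

def pcm_to_wav(pcm_data: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM frames in a WAV container using the stdlib wave module."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)      # assumed mono
        wav_file.setsampwidth(sample_width)  # assumed 16-bit samples
        wav_file.setframerate(sample_rate)   # assumed 24 kHz
        wav_file.writeframes(pcm_data)
    return buffer.getvalue()

# One second of silence at the assumed defaults
wav_bytes = pcm_to_wav(b"\x00\x00" * 24000)
```

Because `wave` writes the RIFF header itself, no manual `struct` packing is needed.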
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 9
🧹 Nitpick comments (13)
backend/app/services/tts_evaluations/audio.py (1)
3-5: Unused import: `struct`.
`struct` is imported but never used; the `wave` module handles header construction internally.
🧹 Proposed fix
```diff
 import io
-import struct
 import wave
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/tts_evaluations/audio.py` around lines 3 - 5: the import of struct is unused and should be removed to clean up the module imports; update the top-of-file imports by deleting the struct import so only io and wave remain (leave any other existing imports untouched).

backend/app/alembic/versions/047_add_tts_evaluation_tables.py (2)
133-150: Redundant single-column index `ix_tts_result_run_id`.
Both `idx_tts_result_feedback` (evaluation_run_id, is_correct) and `idx_tts_result_status` (evaluation_run_id, status) have `evaluation_run_id` as the leading column, so either composite index already satisfies queries that filter only on `evaluation_run_id`. The standalone index `ix_tts_result_run_id` is redundant and adds write overhead without benefit.
♻️ Proposed fix: remove the redundant index
In `upgrade()`:
```diff
-    op.create_index(
-        "ix_tts_result_run_id",
-        "tts_result",
-        ["evaluation_run_id"],
-        unique=False,
-    )
```
In `downgrade()`:
```diff
-    op.drop_index("ix_tts_result_run_id", table_name="tts_result")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/047_add_tts_evaluation_tables.py` around lines 133 - 150, The migration creates a redundant single-column index ix_tts_result_run_id on tts_result.evaluation_run_id because the composite indexes idx_tts_result_feedback (evaluation_run_id, is_correct) and idx_tts_result_status (evaluation_run_id, status) already cover queries filtering only by evaluation_run_id; remove the op.create_index call for "ix_tts_result_run_id" from upgrade(), and also remove the corresponding op.drop_index for "ix_tts_result_run_id" from downgrade() so the migration no longer creates or drops that redundant index (leave idx_tts_result_feedback and idx_tts_result_status unchanged).
103-114: Consider adding `server_default` for timestamp columns.
`inserted_at` and `updated_at` are `nullable=False` with no `server_default`. If a row is ever inserted outside the ORM (e.g., a manual SQL fix, a migration backfill, or another service), the insert will fail. Adding `server_default=sa.func.now()` provides a safety net.
♻️ Suggested change
```diff
 sa.Column(
     "inserted_at",
     sa.DateTime(),
     nullable=False,
+    server_default=sa.func.now(),
     comment="Timestamp when the result was created",
 ),
 sa.Column(
     "updated_at",
     sa.DateTime(),
     nullable=False,
+    server_default=sa.func.now(),
     comment="Timestamp when the result was last updated",
 ),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/047_add_tts_evaluation_tables.py` around lines 103 - 114: the timestamp columns inserted_at and updated_at are non-nullable but lack server defaults; update their Column definitions to include a server_default (e.g., server_default=sa.func.now() or sa.text('now()')) so inserts outside the ORM won't fail, and keep the existing nullable=False and comments intact; apply this change to both Column(...) calls in this migration.

backend/app/crud/evaluations/cron.py (1)
49-66: Minor inconsistency in dict access style (pre-existing).
Lines 49–61 use direct key access (`summary["processed"]`) while lines 62–65 use `.get("details", [])`. Both are safe since all three polling functions guarantee these keys, but you might consider making the access style uniform for consistency. Not blocking, since this pattern predates the TTS addition.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/evaluations/cron.py` around lines 49 - 66: make dict access style consistent by using .get for all summary keys. Replace direct key indexing for processed/failed/still_processing in the computations of total_processed, total_failed, and total_still_processing with text_summary.get("processed", 0), stt_summary.get("processed", 0), tts_summary.get("processed", 0) (and similarly for "failed" and "still_processing"), and keep all_details using .get("details", []); this ensures uniform access for text_summary, stt_summary, and tts_summary and safe defaults if keys are ever missing.

backend/app/api/routes/tts_evaluations/result.py (2)
64-83: Duplicated `TTSResultPublic` construction across both endpoints.
The mapping from `TTSResult` to `TTSResultPublic` (including the `metadata_` extraction) is repeated verbatim. Consider extracting a small helper like `_to_public(result: TTSResult) -> TTSResultPublic` to keep it DRY, especially since this same pattern appears in the evaluation routes too.
Also applies to: 109-128
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/tts_evaluations/result.py` around lines 64 - 83, The TTSResult→TTSResultPublic mapping is duplicated; extract a helper function (e.g. _to_public(result: TTSResult) -> TTSResultPublic) that centralizes building TTSResultPublic (including extracting duration_seconds and size_bytes from result.metadata_). Replace the inline constructions in this file’s endpoints (the current return block that creates TTSResultPublic and the similar block at lines 109-128) with calls to _to_public(result) so both endpoints reuse the same conversion logic.
44-62: Redundant existence check before `update_tts_human_feedback`.
`update_tts_human_feedback` (per `backend/app/crud/tts_evaluations/result.py` lines 221-273) already validates existence via `get_tts_result_by_id` and raises `HTTPException(404)` if not found. The pre-check on lines 44-52 duplicates that query.
You can remove the pre-check and rely on the CRUD layer's 404 handling, or remove the internal check from `update_tts_human_feedback` and keep it here, but having both is an unnecessary extra DB query.
♻️ Suggested simplification
```diff
 def update_result_feedback(
     _session: SessionDep,
     auth_context: AuthContextDep,
     result_id: int,
     feedback: TTSFeedbackUpdate = Body(...),
 ) -> APIResponse[TTSResultPublic]:
     """Update human feedback on a TTS result."""
     logger.info(
         f"[update_result_feedback] Updating feedback | "
         f"result_id: {result_id}, is_correct: {feedback.is_correct}"
     )
-    # Verify result exists and belongs to this project
-    existing = get_tts_result_by_id(
-        session=_session,
-        result_id=result_id,
-        org_id=auth_context.organization_.id,
-        project_id=auth_context.project_.id,
-    )
-
-    if not existing:
-        raise HTTPException(status_code=404, detail="Result not found")
-
     # Update feedback
     result = update_tts_human_feedback(
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/tts_evaluations/result.py` around lines 44 - 62: remove the redundant existence check that calls get_tts_result_by_id before calling update_tts_human_feedback: delete the block that queries existing and raises HTTPException when falsy, and rely on update_tts_human_feedback to perform the existence validation (or conversely, if you prefer validation in the route, remove the internal check inside update_tts_human_feedback); locate the redundant code by the get_tts_result_by_id call and the subsequent if not existing / raise HTTPException block and remove it so only one DB existence check remains for result_id with auth_context.organization_.id and auth_context.project_.id.

backend/app/crud/tts_evaluations/batch.py (1)
152-153: Bare `Exception` raised when all batch submissions fail.
Using `raise Exception(...)` loses context and makes it hard to distinguish this failure from unexpected errors upstream (e.g., in the API route's `except Exception` handler at `evaluation.py` line 119). Consider defining or reusing a domain-specific exception.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/batch.py` around lines 152 - 153: replace the bare raise Exception(...) used when batch_jobs is empty with a domain-specific exception: define a new exception class (e.g., BatchSubmissionError) in this module or a shared exceptions module and raise that instead (include the original error details or chain underlying exceptions via "from" where applicable) so callers can distinguish batch submission failures; update any upstream handlers if needed to catch BatchSubmissionError instead of a generic Exception.

backend/app/crud/tts_evaluations/dataset.py (1)
117-173: Consider using `model_validate` to reduce boilerplate in the mapping.
The manual field-by-field construction of `TTSDatasetPublic` (lines 157-169) is repeated here and in the API route's `get_dataset`. If the `TTSDatasetPublic` fields align with `EvaluationDataset`, Pydantic's `model_validate` (or `from_orm`) can eliminate this boilerplate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/dataset.py` around lines 117 - 173: replace the manual field-by-field construction in list_tts_datasets with Pydantic model validation to remove boilerplate: instead of building TTSDatasetPublic(...) for each EvaluationDataset, call TTSDatasetPublic.model_validate(dataset); if needed, enable from_attributes in TTSDatasetPublic's model_config so model_validate can read attributes from EvaluationDataset, then update the mapping in list_tts_datasets (and mirror the same change in get_dataset) to return [TTSDatasetPublic.model_validate(d) for d in datasets] and keep the same returned total.

backend/app/crud/tts_evaluations/result.py (3)
62-64: Redundant `flush()` before `commit()`.
`session.commit()` implicitly flushes, so the `session.flush()` on line 63 is unnecessary here.
Simplify
```diff
 session.add_all(results)
-session.flush()
 session.commit()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/result.py` around lines 62 - 64, Remove the redundant explicit flush before commit: the block that calls session.add_all(results) followed by session.flush() and session.commit() should drop the session.flush() call because session.commit() implicitly flushes; keep session.add_all(results) and then call session.commit() (reference: session.add_all, session.flush, session.commit in the tts_evaluations/result.py function).
320-328: Simplify dict comprehension to the `dict()` constructor.
Per static analysis (Ruff C416), `{status: count for status, count in rows}` can be simplified.
Fix
```diff
-    return {status: count for status, count in rows}
+    return dict(rows)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/result.py` around lines 320 - 328, The dict comprehension building the status->count map can be simplified: replace the final return that currently constructs {status: count for status, count in rows} with a direct dict conversion of the query result (rows) so the code returns dict(rows); locate the query using TTSResult (the statement variable) and the rows variable set from session.exec(statement).all(), and change only the return expression to use dict(rows).
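As a quick illustration of the C416 point: `dict()` over an iterable of 2-tuples, such as the `(status, count)` rows this query returns, builds the same mapping as the comprehension:

```python
# Rows shaped like the status-count query result (example data, not real output)
rows = [("completed", 5), ("failed", 2), ("pending", 1)]

by_comprehension = {status: count for status, count in rows}
by_constructor = dict(rows)

print(by_constructor)  # {'completed': 5, 'failed': 2, 'pending': 1}
```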
177-219: `update_tts_result` flushes but does not commit; document this caller obligation.
This function intentionally uses `flush()` to support batch contexts (e.g., `process_completed_tts_batch` in `cron.py` commits once at the end). However, if called from a non-batch context, uncommitted changes will be silently lost. A brief docstring note about the caller's commit responsibility would prevent misuse.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/result.py` around lines 177 - 219: update the update_tts_result docstring to explicitly state that this function calls session.flush() but does not commit the transaction, so callers (e.g., process_completed_tts_batch) are responsible for calling session.commit() when not running inside a batch; reference the function name update_tts_result, the model TTSResult, and the use of now() to indicate this only persists flushed changes until the caller commits.

backend/app/models/tts_evaluation.py (1)
207-230: Validator parameter name `valid_model` is misleading; default duplicates the constant.
The `field_validator` parameter (line 220) is named `valid_model`, but it receives the raw input before validation. A name like `models` or `v` (Pydantic convention) would be clearer.
The default factory on line 213 hard-codes `["gemini-2.5-pro-preview-tts"]` while `SUPPORTED_TTS_MODELS` already holds the same list. Reuse the constant to avoid drift.
Suggested changes
```diff
 models: list[str] = Field(
-    default_factory=lambda: ["gemini-2.5-pro-preview-tts"],
+    default_factory=lambda: list(SUPPORTED_TTS_MODELS),
     description="List of TTS models to use",
     min_length=1,
 )

 @field_validator("models")
 @classmethod
-def validate_models(cls, valid_model: list[str]) -> list[str]:
+def validate_models(cls, v: list[str]) -> list[str]:
     """Validate that all models are supported."""
-    if not valid_model:
+    if not v:
         raise ValueError("At least one model must be specified")
-    unsupported = [m for m in valid_model if m not in SUPPORTED_TTS_MODELS]
+    unsupported = [m for m in v if m not in SUPPORTED_TTS_MODELS]
     if unsupported:
         raise ValueError(
             f"Unsupported model(s): {', '.join(unsupported)}. "
             f"Supported models are: {', '.join(SUPPORTED_TTS_MODELS)}"
         )
-    return valid_model
+    return v
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/tts_evaluation.py` around lines 207 - 230: in TTSEvaluationRunCreate, update the validator and default to be clearer and less brittle: rename the validator parameter in validate_models (the method decorated with `@field_validator`("models")) from valid_model to a conventional name like models or v to reflect it receives the raw input, and change the models Field default_factory from the hard-coded ["gemini-2.5-pro-preview-tts"] to reuse SUPPORTED_TTS_MODELS (e.g., default_factory=lambda: list(SUPPORTED_TTS_MODELS)) so the default cannot drift from the supported list; keep the validation logic in validate_models the same but refer to the renamed parameter.

backend/app/crud/tts_evaluations/run.py (1)
178-230: `update_tts_run` lacks org/project/type scoping, unlike other functions in this module.
`get_tts_run_by_id` and `list_tts_runs` both filter by `organization_id`, `project_id`, and `EvaluationType.TTS`. This function queries only by `run_id` (line 202), which is inconsistent. While current callers already have valid run IDs, adding scoping here would be more defensive.
Also, the `if value is not None` guard (line 217) means you can never explicitly clear a field (e.g., reset `error_message` to `None` on a retry). Consider using a sentinel value or an explicit "fields to update" set if clearing is needed later.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/run.py` around lines 178 - 230, The update_tts_run function currently selects EvaluationRun only by run_id; change its lookup to also filter by organization_id, project_id and EvaluationType.TTS (similar to get_tts_run_by_id and list_tts_runs) so the update is properly scoped; additionally replace the current updates dict + "if value is not None" loop (which prevents clearing fields like error_message) with an explicit fields_to_update parameter or a sentinel (e.g., use a special UNSET constant) and only update attributes present in that set/whose value is not UNSET while allowing None to be written, then persist via session.add/session.commit/session.refresh and keep logging with update_tts_run and updated_at = now().
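One way to allow explicit `None` writes, as suggested above, is a module-level sentinel. A minimal sketch, with a hypothetical stand-in for the run model:

```python
from typing import Any

class _Unset:
    """Sentinel type distinguishing 'not provided' from an explicit None."""
    def __repr__(self) -> str:
        return "UNSET"

UNSET: Any = _Unset()

class Run:
    """Hypothetical stand-in for the EvaluationRun row."""
    def __init__(self) -> None:
        self.status = "processing"
        self.error_message: Any = "old failure"

def update_tts_run(run: Run, *, status: Any = UNSET, error_message: Any = UNSET) -> Run:
    # Apply only the fields the caller actually passed; None is a legal value
    for field, value in (("status", status), ("error_message", error_message)):
        if value is not UNSET:
            setattr(run, field, value)
    return run
```

With this pattern, `update_tts_run(run, error_message=None)` clears the field on a retry, while omitting the argument leaves it untouched — something the `if value is not None` guard cannot express.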
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/routes/tts_evaluations/evaluation.py`:
- Around line 277-297: The GET handler get_tts_evaluation_run must include the
same run_metadata that start_tts_evaluation sets on TTSEvaluationRunPublic so
payloads match; update the TTSEvaluationRunWithResults construction in
get_tts_evaluation_run to populate run_metadata (e.g.,
run_metadata=run.run_metadata or the appropriate attribute on the run object)
and return it in the APIResponse (or remove from POST if you choose consistency
the other way), ensuring the field name matches TTSEvaluationRunPublic's
run_metadata.
- Line 130: Replace the HTTPException detail that interpolates the raw exception
(the raise HTTPException(status_code=500, detail=f"Batch submission failed:
{e}") line) with a generic message like "Batch submission failed" and do not
include the exception text; keep the existing server-side logging (the logger
calls around lines 120-123) to record full error details for debugging. Ensure
only the generic message is returned to the client while the original exception
is logged server-side.
- Around line 165-169: The helper _get_sample_texts_from_dataset uses
forward-string types that are unresolved by type checkers; import
EvaluationDataset and use a concrete sqlmodel Session type instead of the
FastAPI dependency alias SessionDep: add an import for EvaluationDataset (from
app.models) at the top of the file, replace the parameter type SessionDep with
Session in the _get_sample_texts_from_dataset signature, and ensure Session is
imported from sqlmodel (add it to the existing sqlmodel imports if missing).
- Around line 132-138: get_tts_run_by_id may return None but the code assumes
run is non-null; add a null guard after the refresh call that checks the run
variable returned by get_tts_run_by_id and handles the None case (e.g., raise a
clear HTTPException/return a 404 or appropriate error using the same
auth_context/project/org info) instead of dereferencing run.id later; update the
block around run = get_tts_run_by_id(...) to validate run and provide a
descriptive error path so subsequent accesses (run.id, run.status, etc.) are
safe.
In `@backend/app/crud/tts_evaluations/batch.py`:
- Around line 98-118: The config dict passed to start_batch_job uses the bare
variable model while GeminiBatchProvider was constructed with model_path =
f"models/{model}", causing create_batch (which prefers config.get("model",
self._model)) to receive an un-prefixed model name; update the config to pass
the prefixed model_path (i.e., use model_path instead of model) so the
provider/Google API gets the "models/..." name expected by
GeminiBatchProvider.create_batch; apply the same change in the STT batch code
where a bare model is passed.
In `@backend/app/crud/tts_evaluations/cron.py`:
- Around line 466-474: The string passed as the subdirectory argument to
upload_to_object_store uses an unnecessary f-string prefix; update the call in
the block that prepares audio upload (where audio_filename is set and
upload_to_object_store is invoked) to use a plain string for subdirectory
(replace f"evaluations/tts/audio" with "evaluations/tts/audio") so there is no
redundant f-prefix in the subdirectory parameter.
- Around line 506-518: The exception handler in process_completed_tts_batch
currently re-raises without rolling back the SQLAlchemy session, which can leave
session in an invalid state and cause later commits (e.g., in
poll_all_pending_tts_evaluations → update_tts_run) to fail; fix by calling
session.rollback() (or using a nested savepoint / begin_nested around the
batch-result loop) before re-raising so the session is returned to a usable
state, ensuring you reference the same session variable and keep the existing
logger.error call and raise behavior.
- Around line 275-368: The code sets any_succeeded but never uses it and may
mark runs as "completed" even when some batch jobs failed; remove the unused
any_succeeded variable and ensure final_status reflects failures: change the
finalization logic around final_status (used before calling update_tts_run and
in the return) to set final_status to "failed" (or "completed_with_errors") when
any_failed is True (instead of basing it solely on pending == 0), and then call
update_tts_run and compute action from any_failed as done now so DB state and
returned action are consistent; update references in the block that computes
final_status, error_message, update_tts_run, and the returned dict accordingly
(functions/variables to edit: any_succeeded, any_failed, final_status,
update_tts_run, and the return payload).
In `@backend/app/services/tts_evaluations/dataset.py`:
- Around line 96-145: Currently _upload_samples_to_object_store swallows all
exceptions and returns None, which allows upload_tts_dataset to create a dataset
with object_store_url=None; instead, surface the failure so the dataset creation
aborts or the caller is notified: remove the broad except swallowing or re-raise
after logging (or raise a specific UploadError) from
_upload_samples_to_object_store, and update upload_tts_dataset to catch that
specific error and return/propagate an HTTP/functional error rather than
proceeding; reference _upload_samples_to_object_store, upload_tts_dataset and
_get_sample_texts_from_dataset when making these changes.
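The rollback-before-re-raise shape described in the `cron.py` comment above (around lines 506-518) can be sketched as follows; `FakeSession` is a hypothetical stand-in for a SQLAlchemy session, used only to show the control flow:

```python
from typing import Callable

class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy Session."""
    def __init__(self) -> None:
        self.rolled_back = False
    def rollback(self) -> None:
        self.rolled_back = True

def process_completed_tts_batch(session: FakeSession, process: Callable[[], None]) -> None:
    try:
        process()  # download batch results, flush per-result updates
    except Exception:
        # Return the session to a usable state before propagating, so a
        # later commit in the polling loop does not fail on a dirty session
        session.rollback()
        raise
```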
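The run-finalization fix described in the `cron.py` comment above (around lines 275-368) reduces to a small pure function. The status strings mirror those in the comment; everything else is an assumption:

```python
from typing import Optional

def decide_final_status(pending: int, any_failed: bool) -> Optional[str]:
    """Finalize only when no batch jobs are pending; reflect failures in the status."""
    if pending > 0:
        return None  # keep the run in PROCESSING
    return "failed" if any_failed else "completed"
```

Basing the final status on `any_failed` rather than only on `pending == 0` keeps the DB state and the returned action consistent when some batch jobs failed.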
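The error-surfacing fix for `_upload_samples_to_object_store` described above (around lines 96-145) might look like this sketch; `UploadError` and the uploader callable are hypothetical names, not the repository's API:

```python
from typing import Callable

class UploadError(Exception):
    """Hypothetical error type raised when sample upload fails."""

def _upload_samples_to_object_store(samples: list[str],
                                    uploader: Callable[[list[str]], str]) -> str:
    # Re-raise with context instead of swallowing and returning None
    try:
        return uploader(samples)
    except Exception as e:
        raise UploadError("Failed to upload TTS samples") from e

def upload_tts_dataset(samples: list[str],
                       uploader: Callable[[list[str]], str]) -> dict:
    # Any UploadError propagates, so a dataset is never created
    # with object_store_url=None
    url = _upload_samples_to_object_store(samples, uploader)
    return {"object_store_url": url}
```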
---
In `@backend/app/api/routes/tts_evaluations/evaluation.py`:

```python
            status="failed",
            error_message=str(e),
        )
    raise HTTPException(status_code=500, detail=f"Batch submission failed: {e}")
```

Internal error details leaked to the client.
`f"Batch submission failed: {e}"` exposes the raw exception message (which could contain credentials, internal paths, or stack details) in the HTTP 500 response body. Return a generic message and rely on the server-side log (lines 120-123) for debugging.
🔧 Suggested fix
```diff
-    raise HTTPException(status_code=500, detail=f"Batch submission failed: {e}")
+    raise HTTPException(status_code=500, detail="Batch submission failed")
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        raise HTTPException(status_code=500, detail="Batch submission failed")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/routes/tts_evaluations/evaluation.py` at line 130, Replace
the HTTPException detail that interpolates the raw exception (the raise
HTTPException(status_code=500, detail=f"Batch submission failed: {e}") line)
with a generic message like "Batch submission failed" and do not include the
exception text; keep the existing server-side logging (the logger calls around
lines 120-123) to record full error details for debugging. Ensure only the
generic message is returned to the client while the original exception is logged
server-side.
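The log-internally, respond-generically pattern the fix asks for can be sketched framework-free. The `HTTPException` class below is a minimal stand-in for `fastapi.HTTPException`, and the simulated failure message is invented for illustration:

```python
import logging

logger = logging.getLogger("tts_evaluations")


class HTTPException(Exception):
    # Minimal stand-in for fastapi.HTTPException, just for this sketch.
    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def submit_batch() -> None:
    # Simulated internal failure containing sensitive details.
    raise RuntimeError("gs://bucket/creds.json: permission denied")


def start_run() -> None:
    try:
        submit_batch()
    except Exception:
        # Full message and traceback go to the server log only.
        logger.exception("Batch submission failed")
        # The client sees a generic message with no internal details.
        raise HTTPException(status_code=500, detail="Batch submission failed")
```

`logger.exception` records the active traceback, so nothing is lost by keeping the HTTP detail generic.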
```python
        model_path = f"models/{model}"
        batch_provider = GeminiBatchProvider(
            client=gemini_client.client, model=model_path
        )

        try:
            batch_job = start_batch_job(
                session=session,
                provider=batch_provider,
                provider_name="google",
                job_type="tts_evaluation",
                organization_id=org_id,
                project_id=project_id,
                jsonl_data=jsonl_data,
                config={
                    "model": model,
                    "tts_provider": model,
                    "evaluation_run_id": run.id,
                    "voice_name": DEFAULT_VOICE_NAME,
                    "style_prompt": DEFAULT_STYLE_PROMPT,
                },
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's examine the actual file content around lines 98-118
cat -n backend/app/crud/tts_evaluations/batch.py | sed -n '90,125p'
```
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1512
🏁 Script executed:
```shell
# Find the GeminiBatchProvider class to see how create_batch uses the model parameter
fd -t f '*.py' | xargs grep -l "class GeminiBatchProvider" | head -1
```
Length of output: 358
🏁 Script executed:
```shell
# Search for GeminiBatchProvider.create_batch method
rg -n "def create_batch" --type=py -A20 backend/app/core/batch/
```
Length of output: 4866
🏁 Script executed:
```shell
# Look for DEFAULT_MODEL in GeminiBatchProvider
rg -n "DEFAULT_MODEL\|class GeminiBatchProvider" --type=py -B5 -A15
```
Length of output: 57
🏁 Script executed:
```shell
# Check how STT batch code handles model paths for comparison
rg -n "model_path\|models/" --type=py backend/app/crud/tts_evaluations/ -C2
```
Length of output: 57
🏁 Script executed:
```shell
# Get the full GeminiBatchProvider class to see DEFAULT_MODEL and the implementation
cat -n backend/app/core/batch/gemini.py
```
Length of output: 20920
🏁 Script executed:
```shell
# Check how create_batch uses the model variable
sed -n '68,120p' backend/app/core/batch/gemini.py
```
Length of output: 2177
🏁 Script executed:
```shell
# Search for other uses of GeminiBatchProvider to see how model is typically passed
rg -n "GeminiBatchProvider" --type=py -B3 -A3
```
Length of output: 17820
🏁 Script executed:
```shell
# Look for how the model is actually used in the batch request creation
sed -n '88,115p' backend/app/core/batch/gemini.py
```
Length of output: 1076
🏁 Script executed:
```shell
# Check the STT batch code to see how it handles the config model parameter
sed -n '138,170p' backend/app/crud/stt_evaluations/batch.py
```
Length of output: 1119
🏁 Script executed:
```shell
# Look at the full context of config in STT batch
sed -n '140,180p' backend/app/crud/stt_evaluations/batch.py
```
Length of output: 1457
Model name inconsistency: config passes bare model name while provider expects the "models/" prefix.
Line 98 constructs `model_path = f"models/{model}"` and passes it to `GeminiBatchProvider`, but line 113 passes the bare `model` (without the "models/" prefix) in the config dict. Inside `GeminiBatchProvider.create_batch` (line 88), the config value takes priority (`config.get("model", self._model)`), so the Gemini API receives the model name without the prefix at line 109, contradicting the provider's `DEFAULT_MODEL = "models/gemini-2.5-pro"` convention. This will cause the batch job creation to fail with an invalid model error.
The same pattern exists in the STT batch code (`backend/app/crud/stt_evaluations/batch.py` line 154), indicating a systemic issue.
🐛 Suggested fix — use the prefixed path in config
```diff
     model_path = f"models/{model}"
     batch_provider = GeminiBatchProvider(
         client=gemini_client.client, model=model_path
     )

     try:
         batch_job = start_batch_job(
             session=session,
             provider=batch_provider,
             provider_name="google",
             job_type="tts_evaluation",
             organization_id=org_id,
             project_id=project_id,
             jsonl_data=jsonl_data,
             config={
-                "model": model,
+                "model": model_path,
                 "tts_provider": model,
                 "evaluation_run_id": run.id,
                 "voice_name": DEFAULT_VOICE_NAME,
                 "style_prompt": DEFAULT_STYLE_PROMPT,
             },
         )
```
📝 Committable suggestion
```python
        model_path = f"models/{model}"
        batch_provider = GeminiBatchProvider(
            client=gemini_client.client, model=model_path
        )

        try:
            batch_job = start_batch_job(
                session=session,
                provider=batch_provider,
                provider_name="google",
                job_type="tts_evaluation",
                organization_id=org_id,
                project_id=project_id,
                jsonl_data=jsonl_data,
                config={
                    "model": model_path,
                    "tts_provider": model,
                    "evaluation_run_id": run.id,
                    "voice_name": DEFAULT_VOICE_NAME,
                    "style_prompt": DEFAULT_STYLE_PROMPT,
                },
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/crud/tts_evaluations/batch.py` around lines 98 - 118, The config
dict passed to start_batch_job uses the bare variable model while
GeminiBatchProvider was constructed with model_path = f"models/{model}", causing
create_batch (which prefers config.get("model", self._model)) to receive an
un-prefixed model name; update the config to pass the prefixed model_path (i.e.,
use model_path instead of model) so the provider/Google API gets the
"models/..." name expected by GeminiBatchProvider.create_batch; apply the same
change in the STT batch code where a bare model is passed.
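The precedence that causes the bug can be reproduced in isolation. The function below is a hypothetical reduction of the provider's model lookup, not the real `create_batch`:

```python
def resolve_model(config: dict, provider_model: str = "models/gemini-2.5-pro") -> str:
    # Mirrors config.get("model", self._model): the config entry wins when present,
    # so the provider's prefixed default is shadowed.
    return config.get("model", provider_model)


# A bare name in config reaches the API un-prefixed, despite the prefixed default.
assert resolve_model({"model": "gemini-2.5-pro-preview-tts"}) == "gemini-2.5-pro-preview-tts"

# Passing the prefixed path in config keeps the "models/..." convention intact.
assert resolve_model({"model": "models/gemini-2.5-pro-preview-tts"}) == "models/gemini-2.5-pro-preview-tts"
```

This is why the fix passes `model_path` in the config rather than relying on the constructor argument.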
```python
def _upload_samples_to_object_store(
    session: Session,
    project_id: int,
    dataset_name: str,
    samples: list[TTSSampleCreate],
) -> str | None:
    """Upload TTS samples as CSV to object store.

    Args:
        session: Database session
        project_id: Project ID for storage credentials
        dataset_name: Dataset name for filename
        samples: List of samples to upload

    Returns:
        Object store URL if successful, None otherwise
    """
    try:
        storage = get_cloud_storage(session=session, project_id=project_id)

        csv_content = _samples_to_csv(samples)

        filename = generate_timestamped_filename(dataset_name, "csv")
        object_store_url = upload_to_object_store(
            storage=storage,
            content=csv_content,
            filename=filename,
            subdirectory="tts_datasets",
            content_type="text/csv",
        )

        if object_store_url:
            logger.info(
                f"[_upload_samples_to_object_store] Upload successful | "
                f"url={object_store_url}"
            )
        else:
            logger.info(
                "[_upload_samples_to_object_store] Upload returned None | "
                "continuing without object store storage"
            )

        return object_store_url

    except Exception as e:
        logger.warning(
            f"[_upload_samples_to_object_store] Failed to upload | {e}",
            exc_info=True,
        )
        return None
```
Silent upload failure creates a dataset that cannot be used for evaluation runs.
If `_upload_samples_to_object_store` fails (lines 140-145), it returns `None` and `upload_tts_dataset` still creates a dataset record with `object_store_url=None`. Downstream, `_get_sample_texts_from_dataset` in `evaluation.py` (line 180) returns an empty list when `object_store_url` is absent, causing any evaluation run against this dataset to fail with "Could not load samples from dataset".
Consider either propagating the upload error so the dataset isn't created at all, or at minimum warning the caller in the response.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/tts_evaluations/dataset.py` around lines 96 - 145,
Currently _upload_samples_to_object_store swallows all exceptions and returns
None, which allows upload_tts_dataset to create a dataset with
object_store_url=None; instead, surface the failure so the dataset creation
aborts or the caller is notified: remove the broad except swallowing or re-raise
after logging (or raise a specific UploadError) from
_upload_samples_to_object_store, and update upload_tts_dataset to catch that
specific error and return/propagate an HTTP/functional error rather than
proceeding; reference _upload_samples_to_object_store, upload_tts_dataset and
_get_sample_texts_from_dataset when making these changes.
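One way to implement the "propagate instead of swallow" option is a small domain error. `UploadError` and `upload_samples` below are hypothetical names sketching the shape of the change, not the project's actual API:

```python
class UploadError(Exception):
    """Raised when a dataset CSV cannot be persisted to the object store."""


def upload_samples(upload_fn, csv_content: str) -> str:
    # Instead of returning None on failure, re-raise as a domain-specific error
    # so the caller can abort dataset creation rather than persist a record
    # that can never be evaluated.
    try:
        url = upload_fn(csv_content)
    except Exception as e:
        raise UploadError(f"object store upload failed: {e}") from e
    if url is None:
        raise UploadError("object store returned no URL")
    return url
```

The caller (`upload_tts_dataset` in this codebase) would then catch `UploadError` and translate it into an HTTP error instead of creating the dataset.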
Actionable comments posted: 2
🧹 Nitpick comments (3)
backend/app/alembic/versions/048_add_tts_evaluation_tables.py (3)
133-150: Standardize index name prefix — `ix_` vs `idx_` are mixed.
`ix_tts_result_run_id` uses SQLAlchemy's conventional `ix_` prefix while the two composite indexes use `idx_`. Pick one and apply it consistently across all three.
♻️ Proposed fix (standardize on `ix_`)
```diff
 op.create_index(
-    "idx_tts_result_feedback",
+    "ix_tts_result_feedback",
     "tts_result",
     ["evaluation_run_id", "is_correct"],
     unique=False,
 )
 op.create_index(
-    "idx_tts_result_status",
+    "ix_tts_result_status",
     "tts_result",
     ["evaluation_run_id", "status"],
     unique=False,
 )
```
And mirror the renames in `downgrade()`:
```diff
-    op.drop_index("idx_tts_result_status", table_name="tts_result")
-    op.drop_index("idx_tts_result_feedback", table_name="tts_result")
+    op.drop_index("ix_tts_result_status", table_name="tts_result")
+    op.drop_index("ix_tts_result_feedback", table_name="tts_result")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py` around lines 133 - 150, The three index names are inconsistent: change "idx_tts_result_feedback" and "idx_tts_result_status" to use the same "ix_" prefix as "ix_tts_result_run_id" by renaming them to "ix_tts_result_feedback" and "ix_tts_result_status" in the op.create_index calls; also mirror these exact name changes in the downgrade() op.drop_index calls so upgrade/downgrade use matching index names. Use the existing op.create_index/op.drop_index calls to locate where to update the names.
54-60: `status` has no DB-level constraint on valid values.
The column accepts any string ≤ 20 characters. Adding a `CHECK` constraint (or using a native PostgreSQL `ENUM`) ensures the DB rejects anything outside `PENDING`, `SUCCESS`, `FAILED` — a cheap safety net against data corruption from direct DB writes or bugs.
♻️ Proposed fix (add CHECK constraint inline)
```diff
 sa.Column(
     "status",
     sa.String(length=20),
     nullable=False,
     server_default="PENDING",
     comment="Result status: PENDING, SUCCESS, FAILED",
 ),
+sa.CheckConstraint(
+    "status IN ('PENDING', 'SUCCESS', 'FAILED')",
+    name="ck_tts_result_status",
+),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py` around lines 54 - 60, The migration defines the "status" Column in the 048_add_tts_evaluation_tables migration but doesn't enforce allowed values; update the Column definition for "status" to include a DB-level constraint that restricts values to 'PENDING','SUCCESS','FAILED' (either an inline sa.CheckConstraint on the Column or use a PostgreSQL sa.Enum and add/drop it in upgrade()/downgrade()); modify the "status" Column declaration in the migration (the sa.Column("status", sa.String(...)) line) to include the constraint and ensure the downgrade removes the constraint/enum accordingly.
115-130: Name the remaining FK constraints for consistency and manageability.
`evaluation_run_id` has `name="fk_tts_result_run_id"` but the `organization_id` and `project_id` constraints are unnamed, making them hard to reference in future migrations or schema introspection.
♻️ Proposed fix
```diff
 sa.ForeignKeyConstraint(
     ["organization_id"],
     ["organization.id"],
+    name="fk_tts_result_organization_id",
     ondelete="CASCADE",
 ),
 sa.ForeignKeyConstraint(
     ["project_id"],
     ["project.id"],
+    name="fk_tts_result_project_id",
     ondelete="CASCADE",
 ),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py` around lines 115 - 130, The two unnamed ForeignKeyConstraint entries in the TTS result table should be given explicit names for consistency with the existing evaluation_run FK (name="fk_tts_result_run_id"); update the organization_id constraint to include a name like "fk_tts_result_organization_id" and the project_id constraint to include a name like "fk_tts_result_project_id" (keeping ondelete="CASCADE"), so the ForeignKeyConstraint definitions for organization_id and project_id mirror the pattern used by the existing evaluation_run constraint.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py`:
- Line 20: Add explicit return type annotations to the migration functions:
update the function signatures for upgrade and downgrade to include "-> None"
(i.e., def upgrade() -> None: and def downgrade() -> None:) so both functions
have explicit return type hints; locate the definitions named upgrade and
downgrade in this migration module (the top-level functions in
048_add_tts_evaluation_tables.py) and modify their signatures accordingly.
- Around line 103-114: The migration defines non-nullable timestamp columns
inserted_at and updated_at without server defaults which will cause NOT NULL
violations on inserts; update the sa.Column definitions for "inserted_at" and
"updated_at" in the migration (functions/classes around the table creation in
the 048 migration) to include server_default=sa.text("now()") so they match the
established pattern used in other revisions (e.g., migrations 005/032/040) and
avoid failing raw/bulk inserts or tests.
---
Nitpick comments:
In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py`:
- Around line 133-150: The three index names are inconsistent: change
"idx_tts_result_feedback" and "idx_tts_result_status" to use the same "ix_"
prefix as "ix_tts_result_run_id" by renaming them to "ix_tts_result_feedback"
and "ix_tts_result_status" in the op.create_index calls; also mirror these exact
name changes in the downgrade() op.drop_index calls so upgrade/downgrade use
matching index names. Use the existing op.create_index/op.drop_index calls to
locate where to update the names.
- Around line 54-60: The migration defines the "status" Column in the
048_add_tts_evaluation_tables migration but doesn't enforce allowed values;
update the Column definition for "status" to include a DB-level constraint that
restricts values to 'PENDING','SUCCESS','FAILED' (either an inline
sa.CheckConstraint on the Column or use a PostgreSQL sa.Enum and add/drop it in
upgrade()/downgrade()); modify the "status" Column declaration in the migration
(the sa.Column("status", sa.String(...)) line) to include the constraint and
ensure the downgrade removes the constraint/enum accordingly.
- Around line 115-130: The two unnamed ForeignKeyConstraint entries in the TTS
result table should be given explicit names for consistency with the existing
evaluation_run FK (name="fk_tts_result_run_id"); update the organization_id
constraint to include a name like "fk_tts_result_organization_id" and the
project_id constraint to include a name like "fk_tts_result_project_id" (keeping
ondelete="CASCADE"), so the ForeignKeyConstraint definitions for organization_id
and project_id mirror the pattern used by the existing evaluation_run
constraint.
```python
        sa.Column(
            "inserted_at",
            sa.DateTime(),
            nullable=False,
            comment="Timestamp when the result was created",
        ),
        sa.Column(
            "updated_at",
            sa.DateTime(),
            nullable=False,
            comment="Timestamp when the result was last updated",
        ),
```
🧩 Analysis chain
🏁 Script executed:
```shell
rg -n "inserted_at|updated_at" --type=py backend/app/alembic/versions/ -A 3 | head -100
```
Length of output: 9016
🏁 Script executed:
```shell
rg -n "server_default" --type=py backend/app/alembic/versions/ | head -50
```
Length of output: 5826
🏁 Script executed:
```shell
wc -l backend/app/alembic/versions/048_add_tts_evaluation_tables.py
```
Length of output: 140
🏁 Script executed:
```shell
cat -n backend/app/alembic/versions/048_add_tts_evaluation_tables.py | head -120
```
Length of output: 4189
inserted_at and updated_at are non-nullable with no server_default — inserts without explicit values will fail.
Both columns are nullable=False but lack a server_default. Any INSERT that doesn't supply these explicitly (raw SQL, test fixtures, bulk operations) will hit a NOT NULL constraint violation. The established pattern across migrations (005, 032, 040) consistently uses server_default=sa.text("now()") for timestamp columns.
🐛 Proposed fix
```diff
 sa.Column(
     "inserted_at",
     sa.DateTime(),
     nullable=False,
+    server_default=sa.text("now()"),
     comment="Timestamp when the result was created",
 ),
 sa.Column(
     "updated_at",
     sa.DateTime(),
     nullable=False,
+    server_default=sa.text("now()"),
     comment="Timestamp when the result was last updated",
 ),
```
📝 Committable suggestion
```python
        sa.Column(
            "inserted_at",
            sa.DateTime(),
            nullable=False,
            server_default=sa.text("now()"),
            comment="Timestamp when the result was created",
        ),
        sa.Column(
            "updated_at",
            sa.DateTime(),
            nullable=False,
            server_default=sa.text("now()"),
            comment="Timestamp when the result was last updated",
        ),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py` around lines
103 - 114, The migration defines non-nullable timestamp columns inserted_at and
updated_at without server defaults which will cause NOT NULL violations on
inserts; update the sa.Column definitions for "inserted_at" and "updated_at" in
the migration (functions/classes around the table creation in the 048 migration)
to include server_default=sa.text("now()") so they match the established pattern
used in other revisions (e.g., migrations 005/032/040) and avoid failing
raw/bulk inserts or tests.
Codecov Report
❌ Patch coverage is
Actionable comments posted: 2
♻️ Duplicate comments (3)
backend/app/crud/tts_evaluations/cron.py (3)
517-523: ⚠️ Potential issue | 🟠 Major
Add `session.rollback()` before re-raising to keep the session usable for the caller.
If an exception (e.g., `IntegrityError`) leaves the session in an invalidated state, the caller's subsequent `update_tts_run` call will also fail, preventing the run from being marked as `"failed"`.
Proposed fix
```diff
     except Exception as e:
         logger.error(
             f"[process_completed_tts_batch] Failed to process batch results | "
             f"batch_job_id={batch_job.id}, error={str(e)}",
             exc_info=True,
         )
+        session.rollback()
         raise
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 517 - 523, The exception handler in process_completed_tts_batch currently logs and re-raises without rolling back the DB session; add a session.rollback() before re-raising so the SQLAlchemy session is returned to a usable state for the caller (e.g., so the subsequent update_tts_run call can run and mark the run failed). Locate the except Exception as e block in process_completed_tts_batch, call session.rollback() (or the appropriate Session object used in that scope) immediately after logging and before the raise, then re-raise the exception.
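The rollback-before-reraise pattern can be shown with a stub session. `FakeSession` below imitates SQLAlchemy's "pending rollback" behavior so the sketch stays self-contained; it is not the real `sqlmodel.Session`:

```python
class FakeSession:
    """Stand-in for a SQLAlchemy Session: after a failed flush it must be
    rolled back before it accepts further work (PendingRollbackError-style)."""

    def __init__(self) -> None:
        self._needs_rollback = False

    def flush_failure(self) -> None:
        self._needs_rollback = True

    def rollback(self) -> None:
        self._needs_rollback = False

    def add(self, obj) -> None:
        if self._needs_rollback:
            raise RuntimeError("session must be rolled back before further use")


def process_completed_batch(session: FakeSession) -> None:
    try:
        session.flush_failure()                  # simulate an IntegrityError mid-processing
        raise ValueError("simulated IntegrityError")
    except Exception:
        session.rollback()                       # restore the session for the caller...
        raise                                    # ...then surface the original error


session = FakeSession()
try:
    process_completed_batch(session)
except ValueError:
    session.add(object())                        # caller can still mark the run failed
```

Without the `rollback()` inside the handler, the final `session.add` would raise, which is exactly the failure mode the review describes for `update_tts_run`.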
280-373: ⚠️ Potential issue | 🟠 Major
`any_succeeded` is unused; `final_status` can be `"completed"` even when batch jobs failed.
These two issues remain from the prior review round:
- `any_succeeded` (lines 281, 291, 327) is assigned but never read — remove it.
- When `any_failed is True` and `pending == 0`, `final_status` is set to `"completed"` (line 349) while `action` is `"failed"` (line 363). The DB record will be marked `"completed"` despite batch-level failures. Consider setting `final_status = "failed"` when `any_failed` is true.
Proposed fix
```diff
     all_terminal = True
-    any_succeeded = False
     any_failed = False
     errors: list[str] = []

     for batch_job in batch_jobs:
         provider_name = batch_job.config.get("tts_provider", "unknown")

         # Skip batch jobs already in terminal state
         if batch_job.provider_status in TERMINAL_STATES:
-            if batch_job.provider_status == BatchJobState.SUCCEEDED.value:
-                any_succeeded = True
-            else:
+            if batch_job.provider_status != BatchJobState.SUCCEEDED.value:
                 any_failed = True
                 errors.append(
                     f"{provider_name}: {batch_job.error_message or batch_job.provider_status}"
                 )
             continue
     ...
         if provider_status == BatchJobState.SUCCEEDED.value:
             await process_completed_tts_batch(...)
-            any_succeeded = True
         else:
             any_failed = True
     ...
-    final_status = "completed" if pending == 0 else "processing"
+    if any_failed:
+        final_status = "failed"
+    elif pending == 0:
+        final_status = "completed"
+    else:
+        final_status = "processing"
     ...
-    action = "completed" if not any_failed else "failed"
+    action = final_status
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 280 - 373, Remove the unused any_succeeded variable and its assignments (references in the loop where any_succeeded is set) and adjust final_status logic so a run with any_failed == True is marked "failed" rather than "completed": compute final_status from pending and any_failed (e.g., if any_failed then "failed" else "completed" if pending == 0 else "processing") before calling update_tts_run and when constructing the returned payload; ensure update_tts_run(...) and the action variable still reflect the new final_status.
471-479: ⚠️ Potential issue | 🟡 Minor
Remove extraneous `f` prefix on string without placeholders.
Proposed fix
```diff
-    subdirectory=f"evaluations/tts/audio",
+    subdirectory="evaluations/tts/audio",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 471 - 479, The filename construction uses an unnecessary f-string prefix; change the audio_filename assignment in cron.py (variable audio_filename used before calling upload_to_object_store) to build the string without an f-string, e.g. audio_filename = str(uuid.uuid4()) + ".wav" (or use "{}.wav".format(uuid.uuid4())), so the code no longer uses an extraneous f prefix while preserving the same filename value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/crud/tts_evaluations/cron.py`:
- Around line 376-529: process_completed_tts_batch is async but calls blocking
functions and will block the event loop; either make it sync or run blocking
work in a thread. Recommended quick fix: keep process_completed_tts_batch async
and wrap blocking calls with asyncio.to_thread() (and await them) — specifically
wrap batch_provider.download_batch_results(provider_batch_id),
upload_to_object_store(...), base64.b64decode(audio_b64), and
pcm_to_wav(pcm_data) so they run off the event loop; leave small CPU ops (e.g.,
calculate_duration) inline or also push to threads if needed; ensure you await
the asyncio.to_thread() results and keep session.commit() behavior unchanged.
- Around line 376-381: The parameter batch_job in process_completed_tts_batch is
typed as Any but is used as a BatchJob; change its annotation from Any to
BatchJob, update the function signature async def
process_completed_tts_batch(session: Session, run: EvaluationRun, batch_job:
BatchJob, batch_provider: GeminiBatchProvider) -> None, and add the necessary
import for BatchJob at the top of the file (or from its module) so static type
checkers recognize attributes like batch_job.id, batch_job.config, and
batch_job.provider_batch_id.
---
Duplicate comments:
In `@backend/app/crud/tts_evaluations/cron.py`:
- Around line 517-523: The exception handler in process_completed_tts_batch
currently logs and re-raises without rolling back the DB session; add a
session.rollback() before re-raising so the SQLAlchemy session is returned to a
usable state for the caller (e.g., so the subsequent update_tts_run call can run
and mark the run failed). Locate the except Exception as e block in
process_completed_tts_batch, call session.rollback() (or the appropriate Session
object used in that scope) immediately after logging and before the raise, then
re-raise the exception.
- Around line 280-373: Remove the unused any_succeeded variable and its
assignments (references in the loop where any_succeeded is set) and adjust
final_status logic so a run with any_failed == True is marked "failed" rather
than "completed": compute final_status from pending and any_failed (e.g., if
any_failed then "failed" else "completed" if pending == 0 else "processing")
before calling update_tts_run and when constructing the returned payload; ensure
update_tts_run(...) and the action variable still reflect the new final_status.
- Around line 471-479: The filename construction uses an unnecessary f-string
prefix; change the audio_filename assignment in cron.py (variable audio_filename
used before calling upload_to_object_store) to build the string without an
f-string, e.g. audio_filename = str(uuid.uuid4()) + ".wav" (or use
"{}.wav".format(uuid.uuid4())), so the code no longer uses an extraneous f
prefix while preserving the same filename value.
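The `asyncio.to_thread` remedy recommended in the prompts above can be sketched generically: each blocking helper runs on a worker thread while the coroutine merely awaits it, so the event loop keeps servicing other tasks. The function names and sleeps are placeholders for the real download/upload calls, not this codebase's signatures:

```python
import asyncio
import time


def download_batch_results(batch_id: str) -> bytes:
    time.sleep(0.05)  # stand-in for a blocking HTTP download
    return b"\x00\x01" * 4


def upload_to_object_store(data: bytes) -> str:
    time.sleep(0.05)  # stand-in for a blocking upload
    return "s3://bucket/audio.wav"


async def process_completed_tts_batch(batch_id: str) -> str:
    # Each blocking call is pushed onto a worker thread; the event loop stays free.
    pcm = await asyncio.to_thread(download_batch_results, batch_id)
    url = await asyncio.to_thread(upload_to_object_store, pcm)
    return url


print(asyncio.run(process_completed_tts_batch("batch-123")))  # s3://bucket/audio.wav
```

Small CPU-only steps (base64 decode, duration math) can stay inline; it is the network and disk calls that must not run directly on the loop.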
Actionable comments posted: 2
♻️ Duplicate comments (5)
backend/app/crud/tts_evaluations/cron.py (5)
538-544: Pre-existing: no `session.rollback()` before re-raise may leave the session invalid.
This was flagged in a previous review cycle and remains unaddressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 538 - 544, The except block in process_completed_tts_batch logs and re-raises exceptions without rolling back the DB session (leaving it invalid); update the except handler to call session.rollback() before re-raising so the SQLAlchemy session is returned to a clean state (e.g., invoke session.rollback() immediately before the logger.error and/or before raise in the except Exception as e block that currently logs batch_job.id and re-raises). Ensure you reference the same session object used in process_completed_tts_batch and keep exc_info logging as-is.
284-394: Pre-existing: `any_succeeded` unused; `final_status`/`action` can disagree when `any_failed` is `True`.
These issues were raised in a previous review cycle and remain unaddressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 284 - 394, The variable any_succeeded is never used—remove it (or wire it into logic if intended) and eliminate related assignments; ensure final_status and action cannot disagree by deriving final_status from any_failed and pending: compute status_counts via count_results_by_status(session, run_id=run.id), set final_status = "failed" if any_failed else ("completed" if pending == 0 else "processing"), set action = "failed" if any_failed else "completed", then call update_tts_run(...) with that final_status and error_message; update references to JobStatus, any_failed, final_status, action, and update_tts_run in the function to keep the run result consistent.
397-402: Pre-existing: `batch_job` typed as `Any` instead of `BatchJob`.
This was flagged in a previous review cycle and remains unaddressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 397 - 402, The parameter batch_job in process_completed_tts_batch is currently typed as Any; change its annotation to the correct BatchJob type (e.g., batch_job: BatchJob) and import BatchJob from the module where GeminiBatchProvider/BatchJob are defined so static typing picks it up; update any other references/usages in this file that assume Any to use the proper type annotation to satisfy the earlier review comment.
397-550: Pre-existing: `process_completed_tts_batch` is `async` but all I/O is blocking — no `await` in the function body.
This was flagged in a previous review cycle and remains unaddressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 397 - 550, The function process_completed_tts_batch is declared async but only does blocking I/O; either make it synchronous (remove async) or convert blocking calls to async-compatible thread calls — e.g., wrap get_cloud_storage, batch_provider.download_batch_results, session.exec / session.commit, upload_to_object_store, and update_tts_result calls with asyncio.to_thread (or anyio.to_thread) and await them so the async function doesn't block the event loop; update references to those symbols (process_completed_tts_batch, batch_provider.download_batch_results, get_cloud_storage, upload_to_object_store, session.exec, session.commit, update_tts_result) accordingly so callers remain correct.
498-498: Pre-existing: extraneous `f` prefix on `f"evaluations/tts/audio"` — no interpolation.
This was flagged in a previous review cycle and remains unaddressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` at line 498, The code uses an unnecessary f-string prefix in the call using subdirectory=f"evaluations/tts/audio" in cron.py; replace it with a plain string subdirectory="evaluations/tts/audio" (remove the leading f) wherever this exact parameter occurs to eliminate the extraneous interpolation marker.
🧹 Nitpick comments (1)
backend/app/crud/tts_evaluations/cron.py (1)
38-38: Consider moving `GeminiClient` to a shared location to avoid the STT→TTS cross-module dependency.

TTS cron pulling `GeminiClient` from `app.services.stt_evaluations.gemini` creates coupling between two evaluation domains. If `GeminiClient` is common to both, it belongs in `app.services.gemini` or `app.core.batch`.

```bash
#!/bin/bash
# Check all importers of GeminiClient to understand its current reach
rg -n --type py 'from app\.services\.stt_evaluations\.gemini import GeminiClient'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` at line 38, TTS cron is importing GeminiClient from app.services.stt_evaluations.gemini which creates a cross-domain dependency; move the GeminiClient class into a shared module (e.g., app.services.gemini or app.core.batch), update the class location, then change the import in backend/app/crud/tts_evaluations/cron.py to import GeminiClient from the new shared module and update any other files that import from app.services.stt_evaluations.gemini (use the unique symbol GeminiClient to find references and ensure tests/imports are updated accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/core/batch/gemini.py`:
- Around line 222-230: The code currently uses a truthiness check on
parsed.get("response") which will skip valid-but-empty response payloads; change
the conditional to check for key presence (e.g., if "response" in parsed) and
then set response_obj = parsed.get("response") and append the dict to results
using BATCH_KEY and custom_id so empty responses are not dropped; update the
block around parsed, response_obj, results, BATCH_KEY and custom_id accordingly.
In `@backend/app/crud/tts_evaluations/cron.py`:
- Around line 319-325: poll_tts_run currently calls the synchronous
poll_batch_status (which calls provider.get_batch_status) directly and will
block the event loop; change the call so the blocking work runs off the event
loop (e.g. wrap poll_batch_status in asyncio.to_thread or loop.run_in_executor)
when invoked from the async poll_tts_run; do the same symmetric fix for
poll_stt_run to ensure provider.get_batch_status is executed in a
thread/executor rather than synchronously on the event loop.
---
Duplicate comments:
In `@backend/app/crud/tts_evaluations/cron.py`:
- Around line 538-544: The except block in process_completed_tts_batch logs and
re-raises exceptions without rolling back the DB session (leaving it invalid);
update the except handler to call session.rollback() before re-raising so the
SQLAlchemy session is returned to a clean state (e.g., invoke session.rollback()
immediately before the logger.error and/or before raise in the except Exception
as e block that currently logs batch_job.id and re-raises). Ensure you reference
the same session object used in process_completed_tts_batch and keep exc_info
logging as-is.
- Around line 284-394: The variable any_succeeded is never used—remove it (or
wire it into logic if intended) and eliminate related assignments; ensure
final_status and action cannot disagree by deriving final_status from any_failed
and pending: compute status_counts via count_results_by_status(session,
run_id=run.id), set final_status = "failed" if any_failed else ("completed" if
pending == 0 else "processing"), set action = "failed" if any_failed else
"completed", then call update_tts_run(...) with that final_status and
error_message; update references to JobStatus, any_failed, final_status, action,
and update_tts_run in the function to keep the run result consistent.
- Around line 397-402: The parameter batch_job in process_completed_tts_batch is
currently typed as Any; change its annotation to the correct BatchJob type
(e.g., batch_job: BatchJob) and import BatchJob from the module where
GeminiBatchProvider/BatchJob are defined so static typing picks it up; update
any other references/usages in this file that assume Any to use the proper type
annotation to satisfy the earlier review comment.
- Around line 397-550: The function process_completed_tts_batch is declared
async but only does blocking I/O; either make it synchronous (remove async) or
convert blocking calls to async-compatible thread calls — e.g., wrap
get_cloud_storage, batch_provider.download_batch_results, session.exec /
session.commit, upload_to_object_store, and update_tts_result calls with
asyncio.to_thread (or anyio.to_thread) and await them so the async function
doesn't block the event loop; update references to those symbols
(process_completed_tts_batch, batch_provider.download_batch_results,
get_cloud_storage, upload_to_object_store, session.exec, session.commit,
update_tts_result) accordingly so callers remain correct.
- Line 498: The code uses an unnecessary f-string prefix in the call using
subdirectory=f"evaluations/tts/audio" in cron.py; replace it with a plain string
subdirectory="evaluations/tts/audio" (remove the leading f) wherever this exact
parameter occurs to eliminate the extraneous interpolation marker.
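The rollback pattern from the duplicate comments above can be sketched with a stand-in session; `FakeSession` and `process_with_rollback` are illustrative, not the project's actual API:

```python
import logging

logger = logging.getLogger(__name__)


class FakeSession:
    """Minimal stand-in for a SQLAlchemy session."""

    def __init__(self) -> None:
        self.rolled_back = False

    def rollback(self) -> None:
        self.rolled_back = True


def process_with_rollback(session: FakeSession, batch_job_id: str) -> None:
    try:
        raise RuntimeError("simulated processing failure")
    except Exception:
        # return the session to a clean state before re-raising
        session.rollback()
        logger.error("Failed to process batch %s", batch_job_id, exc_info=True)
        raise


session = FakeSession()
try:
    process_with_rollback(session, "batch-1")
except RuntimeError:
    pass
```

The key point is that `session.rollback()` runs before the exception propagates, so subsequent use of the session does not fail with a dirty-transaction error.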
---
Nitpick comments:
In `@backend/app/crud/tts_evaluations/cron.py`:
- Line 38: TTS cron is importing GeminiClient from
app.services.stt_evaluations.gemini which creates a cross-domain dependency;
move the GeminiClient class into a shared module (e.g., app.services.gemini or
app.core.batch), update the class location, then change the import in
backend/app/crud/tts_evaluations/cron.py to import GeminiClient from the new
shared module and update any other files that import from
app.services.stt_evaluations.gemini (use the unique symbol GeminiClient to find
references and ensure tests/imports are updated accordingly).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- backend/app/core/batch/gemini.py
- backend/app/crud/stt_evaluations/cron.py
- backend/app/crud/tts_evaluations/cron.py
- backend/app/tests/core/batch/test_gemini.py
```diff
+                # Return the raw response so callers can extract
+                # text (STT) or audio (TTS) as needed.
                 response_obj = parsed.get("response")
                 if response_obj:
-                    text = self._extract_text_from_response_dict(response_obj)
                     results.append(
                         {
                             BATCH_KEY: custom_id,
-                            "response": {"text": text},
+                            "response": response_obj,
                             "error": None,
```
Don’t drop empty response objects.
Using a truthiness check can skip valid-but-empty response payloads. Prefer key presence to avoid losing results.
✅ Proposed fix
```diff
-            response_obj = parsed.get("response")
-            if response_obj:
+            if "response" in parsed:
+                response_obj = parsed.get("response")
                 results.append(
                     {
                         BATCH_KEY: custom_id,
                         "response": response_obj,
                         "error": None,
                     }
                 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/core/batch/gemini.py` around lines 222 - 230, The code currently
uses a truthiness check on parsed.get("response") which will skip
valid-but-empty response payloads; change the conditional to check for key
presence (e.g., if "response" in parsed) and then set response_obj =
parsed.get("response") and append the dict to results using BATCH_KEY and
custom_id so empty responses are not dropped; update the block around parsed,
response_obj, results, BATCH_KEY and custom_id accordingly.
…/kaapi-backend into feature/tts-evaluation
```python
            TTSResult.provider == tts_provider,
        )
        result_map: dict[int, TTSResult] = {
            r.id: r for r in session.exec(prefetch_stmt).all()  # type: ignore[misc]
```
This line is unreadable and difficult to debug
```python
            TTSResult.project_id == project_id,
        ]

    statement = select(TTSResult).where(*where_clauses).order_by(TTSResult.id)
```
instead of ordering by id, reverse chronological ordering looks to be a better choice
why is that? can u elaborate
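To illustrate the suggestion, a reverse-chronological variant of the query would order by a timestamp column instead of the primary key. The model below is a minimal stand-in that assumes a `created_at` column; it is not the project's actual schema:

```python
from datetime import datetime, timedelta

from sqlalchemy import Column, DateTime, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class TTSResult(Base):  # minimal stand-in for the real model
    __tablename__ = "tts_result"
    id = Column(Integer, primary_key=True)
    created_at = Column(DateTime)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    base_time = datetime(2024, 1, 1)
    session.add_all(
        [
            TTSResult(id=1, created_at=base_time),
            TTSResult(id=2, created_at=base_time + timedelta(minutes=5)),
        ]
    )
    session.commit()
    # newest results first, instead of .order_by(TTSResult.id)
    rows = session.execute(
        select(TTSResult).order_by(TTSResult.created_at.desc())
    ).scalars().all()
    ids = [r.id for r in rows]
```

Note that with auto-incrementing primary keys the two orderings usually agree up to direction, which may be part of the question being asked here.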
```python
        EvaluationRun | None: Updated run
    """
    where_clauses = [EvaluationRun.id == run_id]
    if org_id is not None:
```
nitpick: Can the above statement be made more readable, like:

```python
statement = select(EvaluationRun).where(EvaluationRun.id == run_id)
if org_id is not None:
    statement = statement.where(EvaluationRun.organization_id == org_id)
if project_id is not None:
    statement = statement.where(EvaluationRun.project_id == project_id)
run = session.exec(statement).one_or_none()
```
```diff
@@ -0,0 +1,56 @@
+"""Audio processing utilities for TTS evaluation."""
```
There are similar pydub utils already available at backend/app/core/audio_utils.py; we should reuse them.
```python
def _samples_to_csv(samples: list[TTSSampleCreate]) -> bytes:
    """Convert TTS samples to CSV format.
```
imo this util might already exist, or could be made more generic to be usable as a common CSV util
this is different: it takes the text and then serialises it to CSV to be uploaded to S3 as a copy, which differs from the text and STT flows as well
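For reference, a CSV serialiser along the lines being discussed might look like this; the `TTSSampleCreate` fields (`text`, `language`) are assumptions for illustration, not the project's actual schema:

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TTSSampleCreate:  # minimal stand-in for the real schema
    text: str
    language: str


def samples_to_csv(samples: list[TTSSampleCreate]) -> bytes:
    """Serialise sample texts to CSV bytes, e.g. for upload to S3 as a copy."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["text", "language"])
    for sample in samples:
        writer.writerow([sample.text, sample.language])
    return buf.getvalue().encode("utf-8")


data = samples_to_csv([TTSSampleCreate(text="hello world", language="en")])
```

Using `csv.writer` (rather than manual string joins) keeps quoting correct when sample texts contain commas or newlines.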
Summary
Target issue is #535
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.
Notes
New Features
Functional Requirements Testing