Evaluation: TTS#619
Conversation
Note: Reviews paused. This branch appears to be under active development, so to avoid overwhelming you with review comments from an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews:
📝 Walkthrough
Adds a TTS evaluation subsystem: API docs and routes, CRUD for datasets/runs/results, Gemini batch submission support, cron polling/processing, audio utilities, DB models and migration, and wiring into the existing evaluations cron and routing.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API as API Routes
    participant CRUD as CRUD Layer
    participant Gemini as Gemini Batch API
    participant Storage as Object Storage
    participant DB as Database
    participant Cron as Cron Poller
    Client->>API: POST /evaluations/tts/runs (start evaluation)
    API->>CRUD: create_tts_run + create_tts_results
    CRUD->>DB: persist run & results (PENDING)
    API->>CRUD: start_tts_evaluation_batch
    CRUD->>Gemini: submit batch job(s) (JSONL texts + voice config)
    Gemini-->>CRUD: batch_job_id(s)
    CRUD->>DB: update run (PROCESSING, batch_job_id)
    API-->>Client: return TTSEvaluationRunPublic
    Note over Cron: asynchronous polling loop
    Cron->>CRUD: poll_all_pending_tts_evaluations
    CRUD->>Gemini: get batch status / download results
    Gemini-->>CRUD: batch results (base64 PCM in JSONL)
    CRUD->>CRUD: _extract_audio_from_response -> pcm_to_wav
    CRUD->>Storage: upload WAV
    Storage-->>CRUD: object_store_url
    CRUD->>DB: update result (COMPLETED, url, metadata)
    CRUD->>DB: update run (COMPLETED)
```
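The `_extract_audio_from_response -> pcm_to_wav` step in the diagram can be done with the stdlib `wave` module alone. The following is a hypothetical sketch, not the repository's implementation; the 24 kHz, mono, 16-bit parameters are assumptions about the PCM format Gemini returns:

```python
import io
import wave

def pcm_to_wav(pcm_data: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM frames in a WAV container using the stdlib wave module."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)      # assumed mono
        wav_file.setsampwidth(sample_width)  # assumed 16-bit samples
        wav_file.setframerate(sample_rate)   # assumed 24 kHz
        wav_file.writeframes(pcm_data)
    return buffer.getvalue()

# One second of silence at the assumed defaults
wav_bytes = pcm_to_wav(b"\x00\x00" * 24000)
```

Because `wave` writes the RIFF header itself, no manual `struct` packing is needed.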
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 9
🧹 Nitpick comments (13)
backend/app/services/tts_evaluations/audio.py (1)
3-5: Unused import: `struct`.
`struct` is imported but never used; the `wave` module handles header construction internally.
🧹 Proposed fix
```diff
 import io
-import struct
 import wave
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/tts_evaluations/audio.py` around lines 3 - 5: the import of struct is unused and should be removed to clean up the module imports; update the top-of-file imports by deleting the struct import so only io and wave remain (leave any other existing imports untouched).

backend/app/alembic/versions/047_add_tts_evaluation_tables.py (2)
133-150: Redundant single-column index `ix_tts_result_run_id`.
Both `idx_tts_result_feedback` (evaluation_run_id, is_correct) and `idx_tts_result_status` (evaluation_run_id, status) have `evaluation_run_id` as the leading column, so either composite index already satisfies queries that filter only on `evaluation_run_id`. The standalone index `ix_tts_result_run_id` is redundant and adds write overhead without benefit.
♻️ Proposed fix: remove the redundant index
In `upgrade()`:
```diff
-    op.create_index(
-        "ix_tts_result_run_id",
-        "tts_result",
-        ["evaluation_run_id"],
-        unique=False,
-    )
```
In `downgrade()`:
```diff
-    op.drop_index("ix_tts_result_run_id", table_name="tts_result")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/047_add_tts_evaluation_tables.py` around lines 133 - 150, The migration creates a redundant single-column index ix_tts_result_run_id on tts_result.evaluation_run_id because the composite indexes idx_tts_result_feedback (evaluation_run_id, is_correct) and idx_tts_result_status (evaluation_run_id, status) already cover queries filtering only by evaluation_run_id; remove the op.create_index call for "ix_tts_result_run_id" from upgrade(), and also remove the corresponding op.drop_index for "ix_tts_result_run_id" from downgrade() so the migration no longer creates or drops that redundant index (leave idx_tts_result_feedback and idx_tts_result_status unchanged).
103-114: Consider adding `server_default` for timestamp columns.
`inserted_at` and `updated_at` are `nullable=False` with no `server_default`. If a row is ever inserted outside the ORM (e.g., a manual SQL fix, a migration backfill, or another service), the insert will fail. Adding `server_default=sa.func.now()` provides a safety net.
♻️ Suggested change
```diff
 sa.Column(
     "inserted_at",
     sa.DateTime(),
     nullable=False,
+    server_default=sa.func.now(),
     comment="Timestamp when the result was created",
 ),
 sa.Column(
     "updated_at",
     sa.DateTime(),
     nullable=False,
+    server_default=sa.func.now(),
     comment="Timestamp when the result was last updated",
 ),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/047_add_tts_evaluation_tables.py` around lines 103 - 114: the timestamp columns inserted_at and updated_at are non-nullable but lack server defaults; update their Column definitions to include a server_default (e.g., server_default=sa.func.now() or sa.text('now()')) so inserts outside the ORM won't fail, and keep the existing nullable=False and comments intact; apply this change to both Column(...) calls in this migration.

backend/app/crud/evaluations/cron.py (1)
49-66: Minor inconsistency in dict access style (pre-existing).
Lines 49–61 use direct key access (`summary["processed"]`) while lines 62–65 use `.get("details", [])`. Both are safe since all three polling functions guarantee these keys, but you might consider making the access style uniform for consistency. Not blocking, since this pattern predates the TTS addition.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/evaluations/cron.py` around lines 49 - 66: make dict access style consistent by using .get for all summary keys. Replace direct key indexing for processed/failed/still_processing in the computations of total_processed, total_failed, and total_still_processing with text_summary.get("processed", 0), stt_summary.get("processed", 0), tts_summary.get("processed", 0) (and similarly for "failed" and "still_processing"), and keep all_details using .get("details", []); this ensures uniform access for text_summary, stt_summary, and tts_summary and safe defaults if keys are ever missing.

backend/app/api/routes/tts_evaluations/result.py (2)
64-83: Duplicated `TTSResultPublic` construction across both endpoints.
The mapping from `TTSResult` to `TTSResultPublic` (including the `metadata_` extraction) is repeated verbatim. Consider extracting a small helper like `_to_public(result: TTSResult) -> TTSResultPublic` to keep it DRY, especially since this same pattern appears in the evaluation routes too.
Also applies to: 109-128
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/tts_evaluations/result.py` around lines 64 - 83, The TTSResult→TTSResultPublic mapping is duplicated; extract a helper function (e.g. _to_public(result: TTSResult) -> TTSResultPublic) that centralizes building TTSResultPublic (including extracting duration_seconds and size_bytes from result.metadata_). Replace the inline constructions in this file’s endpoints (the current return block that creates TTSResultPublic and the similar block at lines 109-128) with calls to _to_public(result) so both endpoints reuse the same conversion logic.
44-62: Redundant existence check before `update_tts_human_feedback`.
`update_tts_human_feedback` (per `backend/app/crud/tts_evaluations/result.py` lines 221-273) already validates existence via `get_tts_result_by_id` and raises `HTTPException(404)` if not found. The pre-check on lines 44-52 duplicates that query.
You can remove the pre-check and rely on the CRUD layer's 404 handling, or remove the internal check from `update_tts_human_feedback` and keep it here, but having both is an unnecessary extra DB query.
♻️ Suggested simplification
```diff
 def update_result_feedback(
     _session: SessionDep,
     auth_context: AuthContextDep,
     result_id: int,
     feedback: TTSFeedbackUpdate = Body(...),
 ) -> APIResponse[TTSResultPublic]:
     """Update human feedback on a TTS result."""
     logger.info(
         f"[update_result_feedback] Updating feedback | "
         f"result_id: {result_id}, is_correct: {feedback.is_correct}"
     )
-    # Verify result exists and belongs to this project
-    existing = get_tts_result_by_id(
-        session=_session,
-        result_id=result_id,
-        org_id=auth_context.organization_.id,
-        project_id=auth_context.project_.id,
-    )
-
-    if not existing:
-        raise HTTPException(status_code=404, detail="Result not found")
-
     # Update feedback
     result = update_tts_human_feedback(
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/tts_evaluations/result.py` around lines 44 - 62: remove the redundant existence check that calls get_tts_result_by_id before calling update_tts_human_feedback: delete the block that queries existing and raises HTTPException when falsy, and rely on update_tts_human_feedback to perform the existence validation (or conversely, if you prefer validation in the route, remove the internal check inside update_tts_human_feedback); locate the redundant code by the get_tts_result_by_id call and the subsequent if not existing / raise HTTPException block and remove it so only one DB existence check remains for result_id with auth_context.organization_.id and auth_context.project_.id.

backend/app/crud/tts_evaluations/batch.py (1)
152-153: Bare `Exception` raised when all batch submissions fail.
Using `raise Exception(...)` loses context and makes it hard to distinguish this failure from unexpected errors upstream (e.g., in the API route's `except Exception` handler at `evaluation.py` line 119). Consider defining or reusing a domain-specific exception.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/batch.py` around lines 152 - 153: replace the bare raise Exception(...) used when batch_jobs is empty with a domain-specific exception: define a new exception class (e.g., BatchSubmissionError) in this module or a shared exceptions module and raise that instead (include the original error details or chain underlying exceptions via "from" where applicable) so callers can distinguish batch submission failures; update any upstream handlers if needed to catch BatchSubmissionError instead of a generic Exception.

backend/app/crud/tts_evaluations/dataset.py (1)
117-173: Consider using `model_validate` to reduce boilerplate in the mapping.
The manual field-by-field construction of `TTSDatasetPublic` (lines 157-169) is repeated here and in the API route's `get_dataset`. If the `TTSDatasetPublic` fields align with `EvaluationDataset`, Pydantic's `model_validate` (or `from_orm`) can eliminate this boilerplate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/dataset.py` around lines 117 - 173: replace the manual field-by-field construction in list_tts_datasets with Pydantic model validation to remove boilerplate: instead of building TTSDatasetPublic(...) for each EvaluationDataset, call TTSDatasetPublic.model_validate(dataset); if needed, enable from_attributes in TTSDatasetPublic's model_config so model_validate can read attributes from EvaluationDataset, then update the mapping in list_tts_datasets (and mirror the same change in get_dataset) to return [TTSDatasetPublic.model_validate(d) for d in datasets] and keep the same returned total.

backend/app/crud/tts_evaluations/result.py (3)
62-64: Redundant `flush()` before `commit()`.
`session.commit()` implicitly flushes, so the `session.flush()` on line 63 is unnecessary here.
Simplify
```diff
 session.add_all(results)
-session.flush()
 session.commit()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/result.py` around lines 62 - 64, Remove the redundant explicit flush before commit: the block that calls session.add_all(results) followed by session.flush() and session.commit() should drop the session.flush() call because session.commit() implicitly flushes; keep session.add_all(results) and then call session.commit() (reference: session.add_all, session.flush, session.commit in the tts_evaluations/result.py function).
320-328: Simplify dict comprehension to the `dict()` constructor.
Per static analysis (Ruff C416), `{status: count for status, count in rows}` can be simplified.
Fix
```diff
-    return {status: count for status, count in rows}
+    return dict(rows)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/result.py` around lines 320 - 328, The dict comprehension building the status->count map can be simplified: replace the final return that currently constructs {status: count for status, count in rows} with a direct dict conversion of the query result (rows) so the code returns dict(rows); locate the query using TTSResult (the statement variable) and the rows variable set from session.exec(statement).all(), and change only the return expression to use dict(rows).
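As a quick illustration of the C416 point: `dict()` over an iterable of 2-tuples, such as the `(status, count)` rows this query returns, builds the same mapping as the comprehension:

```python
# Rows shaped like the status-count query result (example data, not real output)
rows = [("completed", 5), ("failed", 2), ("pending", 1)]

by_comprehension = {status: count for status, count in rows}
by_constructor = dict(rows)

print(by_constructor)  # {'completed': 5, 'failed': 2, 'pending': 1}
```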
177-219: `update_tts_result` flushes but does not commit; document this caller obligation.
This function intentionally uses `flush()` to support batch contexts (e.g., `process_completed_tts_batch` in `cron.py` commits once at the end). However, if called from a non-batch context, uncommitted changes will be silently lost. A brief docstring note about the caller's commit responsibility would prevent misuse.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/result.py` around lines 177 - 219: update the update_tts_result docstring to explicitly state that this function calls session.flush() but does not commit the transaction, so callers (e.g., process_completed_tts_batch) are responsible for calling session.commit() when not running inside a batch; reference the function name update_tts_result, the model TTSResult, and the use of now() to indicate this only persists flushed changes until the caller commits.

backend/app/models/tts_evaluation.py (1)
207-230: Validator parameter name `valid_model` is misleading; default duplicates the constant.
The `field_validator` parameter (line 220) is named `valid_model`, but it receives the raw input before validation. A name like `models` or `v` (Pydantic convention) would be clearer.
The default factory on line 213 hard-codes `["gemini-2.5-pro-preview-tts"]` while `SUPPORTED_TTS_MODELS` already holds the same list. Reuse the constant to avoid drift.
Suggested changes
```diff
 models: list[str] = Field(
-    default_factory=lambda: ["gemini-2.5-pro-preview-tts"],
+    default_factory=lambda: list(SUPPORTED_TTS_MODELS),
     description="List of TTS models to use",
     min_length=1,
 )

 @field_validator("models")
 @classmethod
-def validate_models(cls, valid_model: list[str]) -> list[str]:
+def validate_models(cls, v: list[str]) -> list[str]:
     """Validate that all models are supported."""
-    if not valid_model:
+    if not v:
         raise ValueError("At least one model must be specified")
-    unsupported = [m for m in valid_model if m not in SUPPORTED_TTS_MODELS]
+    unsupported = [m for m in v if m not in SUPPORTED_TTS_MODELS]
     if unsupported:
         raise ValueError(
             f"Unsupported model(s): {', '.join(unsupported)}. "
             f"Supported models are: {', '.join(SUPPORTED_TTS_MODELS)}"
         )
-    return valid_model
+    return v
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/tts_evaluation.py` around lines 207 - 230: in TTSEvaluationRunCreate, update the validator and default to be clearer and less brittle: rename the validator parameter in validate_models (the method decorated with `@field_validator`("models")) from valid_model to a conventional name like models or v to reflect it receives the raw input, and change the models Field default_factory from the hard-coded ["gemini-2.5-pro-preview-tts"] to reuse SUPPORTED_TTS_MODELS (e.g., default_factory=lambda: list(SUPPORTED_TTS_MODELS)) so the default cannot drift from the supported list; keep the validation logic in validate_models the same but refer to the renamed parameter.

backend/app/crud/tts_evaluations/run.py (1)
178-230: `update_tts_run` lacks org/project/type scoping, unlike other functions in this module.
`get_tts_run_by_id` and `list_tts_runs` both filter by `organization_id`, `project_id`, and `EvaluationType.TTS`. This function queries only by `run_id` (line 202), which is inconsistent. While current callers already have valid run IDs, adding scoping here would be more defensive.
Also, the `if value is not None` guard (line 217) means you can never explicitly clear a field (e.g., reset `error_message` to `None` on a retry). Consider using a sentinel value or an explicit "fields to update" set if clearing is needed later.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/run.py` around lines 178 - 230, The update_tts_run function currently selects EvaluationRun only by run_id; change its lookup to also filter by organization_id, project_id and EvaluationType.TTS (similar to get_tts_run_by_id and list_tts_runs) so the update is properly scoped; additionally replace the current updates dict + "if value is not None" loop (which prevents clearing fields like error_message) with an explicit fields_to_update parameter or a sentinel (e.g., use a special UNSET constant) and only update attributes present in that set/whose value is not UNSET while allowing None to be written, then persist via session.add/session.commit/session.refresh and keep logging with update_tts_run and updated_at = now().
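One way to allow explicit `None` writes, as suggested above, is a module-level sentinel. A minimal sketch, with a hypothetical stand-in for the run model:

```python
from typing import Any

class _Unset:
    """Sentinel type distinguishing 'not provided' from an explicit None."""
    def __repr__(self) -> str:
        return "UNSET"

UNSET: Any = _Unset()

class Run:
    """Hypothetical stand-in for the EvaluationRun row."""
    def __init__(self) -> None:
        self.status = "processing"
        self.error_message: Any = "old failure"

def update_tts_run(run: Run, *, status: Any = UNSET, error_message: Any = UNSET) -> Run:
    # Apply only the fields the caller actually passed; None is a legal value
    for field, value in (("status", status), ("error_message", error_message)):
        if value is not UNSET:
            setattr(run, field, value)
    return run
```

With this pattern, `update_tts_run(run, error_message=None)` clears the field on a retry, while omitting the argument leaves it untouched — something the `if value is not None` guard cannot express.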
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/routes/tts_evaluations/evaluation.py`:
- Around line 277-297: The GET handler get_tts_evaluation_run must include the
same run_metadata that start_tts_evaluation sets on TTSEvaluationRunPublic so
payloads match; update the TTSEvaluationRunWithResults construction in
get_tts_evaluation_run to populate run_metadata (e.g.,
run_metadata=run.run_metadata or the appropriate attribute on the run object)
and return it in the APIResponse (or remove from POST if you choose consistency
the other way), ensuring the field name matches TTSEvaluationRunPublic's
run_metadata.
- Line 130: Replace the HTTPException detail that interpolates the raw exception
(the raise HTTPException(status_code=500, detail=f"Batch submission failed:
{e}") line) with a generic message like "Batch submission failed" and do not
include the exception text; keep the existing server-side logging (the logger
calls around lines 120-123) to record full error details for debugging. Ensure
only the generic message is returned to the client while the original exception
is logged server-side.
- Around line 165-169: The helper _get_sample_texts_from_dataset uses
forward-string types that are unresolved by type checkers; import
EvaluationDataset and use a concrete sqlmodel Session type instead of the
FastAPI dependency alias SessionDep: add an import for EvaluationDataset (from
app.models) at the top of the file, replace the parameter type SessionDep with
Session in the _get_sample_texts_from_dataset signature, and ensure Session is
imported from sqlmodel (add it to the existing sqlmodel imports if missing).
- Around line 132-138: get_tts_run_by_id may return None but the code assumes
run is non-null; add a null guard after the refresh call that checks the run
variable returned by get_tts_run_by_id and handles the None case (e.g., raise a
clear HTTPException/return a 404 or appropriate error using the same
auth_context/project/org info) instead of dereferencing run.id later; update the
block around run = get_tts_run_by_id(...) to validate run and provide a
descriptive error path so subsequent accesses (run.id, run.status, etc.) are
safe.
In `@backend/app/crud/tts_evaluations/batch.py`:
- Around line 98-118: The config dict passed to start_batch_job uses the bare
variable model while GeminiBatchProvider was constructed with model_path =
f"models/{model}", causing create_batch (which prefers config.get("model",
self._model)) to receive an un-prefixed model name; update the config to pass
the prefixed model_path (i.e., use model_path instead of model) so the
provider/Google API gets the "models/..." name expected by
GeminiBatchProvider.create_batch; apply the same change in the STT batch code
where a bare model is passed.
In `@backend/app/crud/tts_evaluations/cron.py`:
- Around line 466-474: The string passed as the subdirectory argument to
upload_to_object_store uses an unnecessary f-string prefix; update the call in
the block that prepares audio upload (where audio_filename is set and
upload_to_object_store is invoked) to use a plain string for subdirectory
(replace f"evaluations/tts/audio" with "evaluations/tts/audio") so there is no
redundant f-prefix in the subdirectory parameter.
- Around line 506-518: The exception handler in process_completed_tts_batch
currently re-raises without rolling back the SQLAlchemy session, which can leave
session in an invalid state and cause later commits (e.g., in
poll_all_pending_tts_evaluations → update_tts_run) to fail; fix by calling
session.rollback() (or using a nested savepoint / begin_nested around the
batch-result loop) before re-raising so the session is returned to a usable
state, ensuring you reference the same session variable and keep the existing
logger.error call and raise behavior.
- Around line 275-368: The code sets any_succeeded but never uses it and may
mark runs as "completed" even when some batch jobs failed; remove the unused
any_succeeded variable and ensure final_status reflects failures: change the
finalization logic around final_status (used before calling update_tts_run and
in the return) to set final_status to "failed" (or "completed_with_errors") when
any_failed is True (instead of basing it solely on pending == 0), and then call
update_tts_run and compute action from any_failed as done now so DB state and
returned action are consistent; update references in the block that computes
final_status, error_message, update_tts_run, and the returned dict accordingly
(functions/variables to edit: any_succeeded, any_failed, final_status,
update_tts_run, and the return payload).
In `@backend/app/services/tts_evaluations/dataset.py`:
- Around line 96-145: Currently _upload_samples_to_object_store swallows all
exceptions and returns None, which allows upload_tts_dataset to create a dataset
with object_store_url=None; instead, surface the failure so the dataset creation
aborts or the caller is notified: remove the broad except swallowing or re-raise
after logging (or raise a specific UploadError) from
_upload_samples_to_object_store, and update upload_tts_dataset to catch that
specific error and return/propagate an HTTP/functional error rather than
proceeding; reference _upload_samples_to_object_store, upload_tts_dataset and
_get_sample_texts_from_dataset when making these changes.
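The rollback-before-re-raise shape described in the `cron.py` comment above (around lines 506-518) can be sketched as follows; `FakeSession` is a hypothetical stand-in for a SQLAlchemy session, used only to show the control flow:

```python
from typing import Callable

class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy Session."""
    def __init__(self) -> None:
        self.rolled_back = False
    def rollback(self) -> None:
        self.rolled_back = True

def process_completed_tts_batch(session: FakeSession, process: Callable[[], None]) -> None:
    try:
        process()  # download batch results, flush per-result updates
    except Exception:
        # Return the session to a usable state before propagating, so a
        # later commit in the polling loop does not fail on a dirty session
        session.rollback()
        raise
```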
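The run-finalization fix described in the `cron.py` comment above (around lines 275-368) reduces to a small pure function. The status strings mirror those in the comment; everything else is an assumption:

```python
from typing import Optional

def decide_final_status(pending: int, any_failed: bool) -> Optional[str]:
    """Finalize only when no batch jobs are pending; reflect failures in the status."""
    if pending > 0:
        return None  # keep the run in PROCESSING
    return "failed" if any_failed else "completed"
```

Basing the final status on `any_failed` rather than only on `pending == 0` keeps the DB state and the returned action consistent when some batch jobs failed.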
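The error-surfacing fix for `_upload_samples_to_object_store` described above (around lines 96-145) might look like this sketch; `UploadError` and the uploader callable are hypothetical names, not the repository's API:

```python
from typing import Callable

class UploadError(Exception):
    """Hypothetical error type raised when sample upload fails."""

def _upload_samples_to_object_store(samples: list[str],
                                    uploader: Callable[[list[str]], str]) -> str:
    # Re-raise with context instead of swallowing and returning None
    try:
        return uploader(samples)
    except Exception as e:
        raise UploadError("Failed to upload TTS samples") from e

def upload_tts_dataset(samples: list[str],
                       uploader: Callable[[list[str]], str]) -> dict:
    # Any UploadError propagates, so a dataset is never created
    # with object_store_url=None
    url = _upload_samples_to_object_store(samples, uploader)
    return {"object_store_url": url}
```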
---
In `@backend/app/api/routes/tts_evaluations/evaluation.py`:

```python
            status="failed",
            error_message=str(e),
        )
    raise HTTPException(status_code=500, detail=f"Batch submission failed: {e}")
```

Internal error details leaked to the client.
`f"Batch submission failed: {e}"` exposes the raw exception message (which could contain credentials, internal paths, or stack details) in the HTTP 500 response body. Return a generic message and rely on the server-side log (lines 120-123) for debugging.
🔧 Suggested fix
```diff
-    raise HTTPException(status_code=500, detail=f"Batch submission failed: {e}")
+    raise HTTPException(status_code=500, detail="Batch submission failed")
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        raise HTTPException(status_code=500, detail="Batch submission failed")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/routes/tts_evaluations/evaluation.py` at line 130, Replace
the HTTPException detail that interpolates the raw exception (the raise
HTTPException(status_code=500, detail=f"Batch submission failed: {e}") line)
with a generic message like "Batch submission failed" and do not include the
exception text; keep the existing server-side logging (the logger calls around
lines 120-123) to record full error details for debugging. Ensure only the
generic message is returned to the client while the original exception is logged
server-side.
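The log-internally, respond-generically pattern the fix asks for can be sketched framework-free. The `HTTPException` class below is a minimal stand-in for `fastapi.HTTPException`, and the simulated failure message is invented for illustration:

```python
import logging

logger = logging.getLogger("tts_evaluations")


class HTTPException(Exception):
    # Minimal stand-in for fastapi.HTTPException, just for this sketch.
    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def submit_batch() -> None:
    # Simulated internal failure containing sensitive details.
    raise RuntimeError("gs://bucket/creds.json: permission denied")


def start_run() -> None:
    try:
        submit_batch()
    except Exception:
        # Full message and traceback go to the server log only.
        logger.exception("Batch submission failed")
        # The client sees a generic message with no internal details.
        raise HTTPException(status_code=500, detail="Batch submission failed")
```

`logger.exception` records the active traceback, so nothing is lost by keeping the HTTP detail generic.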
```python
        model_path = f"models/{model}"
        batch_provider = GeminiBatchProvider(
            client=gemini_client.client, model=model_path
        )

        try:
            batch_job = start_batch_job(
                session=session,
                provider=batch_provider,
                provider_name="google",
                job_type="tts_evaluation",
                organization_id=org_id,
                project_id=project_id,
                jsonl_data=jsonl_data,
                config={
                    "model": model,
                    "tts_provider": model,
                    "evaluation_run_id": run.id,
                    "voice_name": DEFAULT_VOICE_NAME,
                    "style_prompt": DEFAULT_STYLE_PROMPT,
                },
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's examine the actual file content around lines 98-118
cat -n backend/app/crud/tts_evaluations/batch.py | sed -n '90,125p'
```
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1512
🏁 Script executed:
```shell
# Find the GeminiBatchProvider class to see how create_batch uses the model parameter
fd -t f '*.py' | xargs grep -l "class GeminiBatchProvider" | head -1
```
Length of output: 358
🏁 Script executed:
```shell
# Search for GeminiBatchProvider.create_batch method
rg -n "def create_batch" --type=py -A20 backend/app/core/batch/
```
Length of output: 4866
🏁 Script executed:
```shell
# Look for DEFAULT_MODEL in GeminiBatchProvider
rg -n "DEFAULT_MODEL\|class GeminiBatchProvider" --type=py -B5 -A15
```
Length of output: 57
🏁 Script executed:
```shell
# Check how STT batch code handles model paths for comparison
rg -n "model_path\|models/" --type=py backend/app/crud/tts_evaluations/ -C2
```
Length of output: 57
🏁 Script executed:
```shell
# Get the full GeminiBatchProvider class to see DEFAULT_MODEL and the implementation
cat -n backend/app/core/batch/gemini.py
```
Length of output: 20920
🏁 Script executed:
```shell
# Check how create_batch uses the model variable
sed -n '68,120p' backend/app/core/batch/gemini.py
```
Length of output: 2177
🏁 Script executed:
```shell
# Search for other uses of GeminiBatchProvider to see how model is typically passed
rg -n "GeminiBatchProvider" --type=py -B3 -A3
```
Length of output: 17820
🏁 Script executed:
```shell
# Look for how the model is actually used in the batch request creation
sed -n '88,115p' backend/app/core/batch/gemini.py
```
Length of output: 1076
🏁 Script executed:
```shell
# Check the STT batch code to see how it handles the config model parameter
sed -n '138,170p' backend/app/crud/stt_evaluations/batch.py
```
Length of output: 1119
🏁 Script executed:
```shell
# Look at the full context of config in STT batch
sed -n '140,180p' backend/app/crud/stt_evaluations/batch.py
```
Length of output: 1457
Model name inconsistency: config passes bare model name while provider expects the "models/" prefix.
Line 98 constructs `model_path = f"models/{model}"` and passes it to `GeminiBatchProvider`, but line 113 passes the bare `model` (without the "models/" prefix) in the config dict. Inside `GeminiBatchProvider.create_batch` (line 88), the config value takes priority (`config.get("model", self._model)`), so the Gemini API receives the model name without the prefix at line 109, contradicting the provider's `DEFAULT_MODEL = "models/gemini-2.5-pro"` convention. This will cause the batch job creation to fail with an invalid model error.
The same pattern exists in the STT batch code (`backend/app/crud/stt_evaluations/batch.py` line 154), indicating a systemic issue.
🐛 Suggested fix — use the prefixed path in config
```diff
     model_path = f"models/{model}"
     batch_provider = GeminiBatchProvider(
         client=gemini_client.client, model=model_path
     )

     try:
         batch_job = start_batch_job(
             session=session,
             provider=batch_provider,
             provider_name="google",
             job_type="tts_evaluation",
             organization_id=org_id,
             project_id=project_id,
             jsonl_data=jsonl_data,
             config={
-                "model": model,
+                "model": model_path,
                 "tts_provider": model,
                 "evaluation_run_id": run.id,
                 "voice_name": DEFAULT_VOICE_NAME,
                 "style_prompt": DEFAULT_STYLE_PROMPT,
             },
         )
```
📝 Committable suggestion
```python
        model_path = f"models/{model}"
        batch_provider = GeminiBatchProvider(
            client=gemini_client.client, model=model_path
        )

        try:
            batch_job = start_batch_job(
                session=session,
                provider=batch_provider,
                provider_name="google",
                job_type="tts_evaluation",
                organization_id=org_id,
                project_id=project_id,
                jsonl_data=jsonl_data,
                config={
                    "model": model_path,
                    "tts_provider": model,
                    "evaluation_run_id": run.id,
                    "voice_name": DEFAULT_VOICE_NAME,
                    "style_prompt": DEFAULT_STYLE_PROMPT,
                },
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/crud/tts_evaluations/batch.py` around lines 98 - 118, The config
dict passed to start_batch_job uses the bare variable model while
GeminiBatchProvider was constructed with model_path = f"models/{model}", causing
create_batch (which prefers config.get("model", self._model)) to receive an
un-prefixed model name; update the config to pass the prefixed model_path (i.e.,
use model_path instead of model) so the provider/Google API gets the
"models/..." name expected by GeminiBatchProvider.create_batch; apply the same
change in the STT batch code where a bare model is passed.
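The precedence that causes the bug can be reproduced in isolation. The function below is a hypothetical reduction of the provider's model lookup, not the real `create_batch`:

```python
def resolve_model(config: dict, provider_model: str = "models/gemini-2.5-pro") -> str:
    # Mirrors config.get("model", self._model): the config entry wins when present,
    # so the provider's prefixed default is shadowed.
    return config.get("model", provider_model)


# A bare name in config reaches the API un-prefixed, despite the prefixed default.
assert resolve_model({"model": "gemini-2.5-pro-preview-tts"}) == "gemini-2.5-pro-preview-tts"

# Passing the prefixed path in config keeps the "models/..." convention intact.
assert resolve_model({"model": "models/gemini-2.5-pro-preview-tts"}) == "models/gemini-2.5-pro-preview-tts"
```

This is why the fix passes `model_path` in the config rather than relying on the constructor argument.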
```python
def _upload_samples_to_object_store(
    session: Session,
    project_id: int,
    dataset_name: str,
    samples: list[TTSSampleCreate],
) -> str | None:
    """Upload TTS samples as CSV to object store.

    Args:
        session: Database session
        project_id: Project ID for storage credentials
        dataset_name: Dataset name for filename
        samples: List of samples to upload

    Returns:
        Object store URL if successful, None otherwise
    """
    try:
        storage = get_cloud_storage(session=session, project_id=project_id)

        csv_content = _samples_to_csv(samples)

        filename = generate_timestamped_filename(dataset_name, "csv")
        object_store_url = upload_to_object_store(
            storage=storage,
            content=csv_content,
            filename=filename,
            subdirectory="tts_datasets",
            content_type="text/csv",
        )

        if object_store_url:
            logger.info(
                f"[_upload_samples_to_object_store] Upload successful | "
                f"url={object_store_url}"
            )
        else:
            logger.info(
                "[_upload_samples_to_object_store] Upload returned None | "
                "continuing without object store storage"
            )

        return object_store_url

    except Exception as e:
        logger.warning(
            f"[_upload_samples_to_object_store] Failed to upload | {e}",
            exc_info=True,
        )
        return None
```
Silent upload failure creates a dataset that cannot be used for evaluation runs.
If `_upload_samples_to_object_store` fails (lines 140-145), it returns `None` and `upload_tts_dataset` still creates a dataset record with `object_store_url=None`. Downstream, `_get_sample_texts_from_dataset` in `evaluation.py` (line 180) returns an empty list when `object_store_url` is absent, causing any evaluation run against this dataset to fail with "Could not load samples from dataset".
Consider either propagating the upload error so the dataset isn't created at all, or at minimum warning the caller in the response.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/tts_evaluations/dataset.py` around lines 96 - 145,
Currently _upload_samples_to_object_store swallows all exceptions and returns
None, which allows upload_tts_dataset to create a dataset with
object_store_url=None; instead, surface the failure so the dataset creation
aborts or the caller is notified: remove the broad except swallowing or re-raise
after logging (or raise a specific UploadError) from
_upload_samples_to_object_store, and update upload_tts_dataset to catch that
specific error and return/propagate an HTTP/functional error rather than
proceeding; reference _upload_samples_to_object_store, upload_tts_dataset and
_get_sample_texts_from_dataset when making these changes.
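One way to implement the "propagate instead of swallow" option is a small domain error. `UploadError` and `upload_samples` below are hypothetical names sketching the shape of the change, not the project's actual API:

```python
class UploadError(Exception):
    """Raised when a dataset CSV cannot be persisted to the object store."""


def upload_samples(upload_fn, csv_content: str) -> str:
    # Instead of returning None on failure, re-raise as a domain-specific error
    # so the caller can abort dataset creation rather than persist a record
    # that can never be evaluated.
    try:
        url = upload_fn(csv_content)
    except Exception as e:
        raise UploadError(f"object store upload failed: {e}") from e
    if url is None:
        raise UploadError("object store returned no URL")
    return url
```

The caller (`upload_tts_dataset` in this codebase) would then catch `UploadError` and translate it into an HTTP error instead of creating the dataset.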
Actionable comments posted: 2
🧹 Nitpick comments (3)
backend/app/alembic/versions/048_add_tts_evaluation_tables.py (3)
133-150: Standardize index name prefix — `ix_` vs `idx_` are mixed.
`ix_tts_result_run_id` uses SQLAlchemy's conventional `ix_` prefix while the two composite indexes use `idx_`. Pick one and apply it consistently across all three.
♻️ Proposed fix (standardize on `ix_`)
```diff
 op.create_index(
-    "idx_tts_result_feedback",
+    "ix_tts_result_feedback",
     "tts_result",
     ["evaluation_run_id", "is_correct"],
     unique=False,
 )
 op.create_index(
-    "idx_tts_result_status",
+    "ix_tts_result_status",
     "tts_result",
     ["evaluation_run_id", "status"],
     unique=False,
 )
```
And mirror the renames in `downgrade()`:
```diff
-    op.drop_index("idx_tts_result_status", table_name="tts_result")
-    op.drop_index("idx_tts_result_feedback", table_name="tts_result")
+    op.drop_index("ix_tts_result_status", table_name="tts_result")
+    op.drop_index("ix_tts_result_feedback", table_name="tts_result")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py` around lines 133 - 150, The three index names are inconsistent: change "idx_tts_result_feedback" and "idx_tts_result_status" to use the same "ix_" prefix as "ix_tts_result_run_id" by renaming them to "ix_tts_result_feedback" and "ix_tts_result_status" in the op.create_index calls; also mirror these exact name changes in the downgrade() op.drop_index calls so upgrade/downgrade use matching index names. Use the existing op.create_index/op.drop_index calls to locate where to update the names.
54-60: `status` has no DB-level constraint on valid values.
The column accepts any string ≤ 20 characters. Adding a `CHECK` constraint (or using a native PostgreSQL `ENUM`) ensures the DB rejects anything outside `PENDING`, `SUCCESS`, `FAILED` — a cheap safety net against data corruption from direct DB writes or bugs.
♻️ Proposed fix (add CHECK constraint inline)
```diff
 sa.Column(
     "status",
     sa.String(length=20),
     nullable=False,
     server_default="PENDING",
     comment="Result status: PENDING, SUCCESS, FAILED",
 ),
+sa.CheckConstraint(
+    "status IN ('PENDING', 'SUCCESS', 'FAILED')",
+    name="ck_tts_result_status",
+),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py` around lines 54 - 60, The migration defines the "status" Column in the 048_add_tts_evaluation_tables migration but doesn't enforce allowed values; update the Column definition for "status" to include a DB-level constraint that restricts values to 'PENDING','SUCCESS','FAILED' (either an inline sa.CheckConstraint on the Column or use a PostgreSQL sa.Enum and add/drop it in upgrade()/downgrade()); modify the "status" Column declaration in the migration (the sa.Column("status", sa.String(...)) line) to include the constraint and ensure the downgrade removes the constraint/enum accordingly.
115-130: Name the remaining FK constraints for consistency and manageability.
`evaluation_run_id` has `name="fk_tts_result_run_id"` but the `organization_id` and `project_id` constraints are unnamed, making them hard to reference in future migrations or schema introspection.
♻️ Proposed fix
```diff
 sa.ForeignKeyConstraint(
     ["organization_id"],
     ["organization.id"],
+    name="fk_tts_result_organization_id",
     ondelete="CASCADE",
 ),
 sa.ForeignKeyConstraint(
     ["project_id"],
     ["project.id"],
+    name="fk_tts_result_project_id",
     ondelete="CASCADE",
 ),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py` around lines 115 - 130, The two unnamed ForeignKeyConstraint entries in the TTS result table should be given explicit names for consistency with the existing evaluation_run FK (name="fk_tts_result_run_id"); update the organization_id constraint to include a name like "fk_tts_result_organization_id" and the project_id constraint to include a name like "fk_tts_result_project_id" (keeping ondelete="CASCADE"), so the ForeignKeyConstraint definitions for organization_id and project_id mirror the pattern used by the existing evaluation_run constraint.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py`:
- Line 20: Add explicit return type annotations to the migration functions:
update the function signatures for upgrade and downgrade to include "-> None"
(i.e., def upgrade() -> None: and def downgrade() -> None:) so both functions
have explicit return type hints; locate the definitions named upgrade and
downgrade in this migration module (the top-level functions in
048_add_tts_evaluation_tables.py) and modify their signatures accordingly.
- Around line 103-114: The migration defines non-nullable timestamp columns
inserted_at and updated_at without server defaults which will cause NOT NULL
violations on inserts; update the sa.Column definitions for "inserted_at" and
"updated_at" in the migration (functions/classes around the table creation in
the 048 migration) to include server_default=sa.text("now()") so they match the
established pattern used in other revisions (e.g., migrations 005/032/040) and
avoid failing raw/bulk inserts or tests.
---
Nitpick comments:
In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py`:
- Around line 133-150: The three index names are inconsistent: change
"idx_tts_result_feedback" and "idx_tts_result_status" to use the same "ix_"
prefix as "ix_tts_result_run_id" by renaming them to "ix_tts_result_feedback"
and "ix_tts_result_status" in the op.create_index calls; also mirror these exact
name changes in the downgrade() op.drop_index calls so upgrade/downgrade use
matching index names. Use the existing op.create_index/op.drop_index calls to
locate where to update the names.
- Around line 54-60: The migration defines the "status" Column in the
048_add_tts_evaluation_tables migration but doesn't enforce allowed values;
update the Column definition for "status" to include a DB-level constraint that
restricts values to 'PENDING','SUCCESS','FAILED' (either an inline
sa.CheckConstraint on the Column or use a PostgreSQL sa.Enum and add/drop it in
upgrade()/downgrade()); modify the "status" Column declaration in the migration
(the sa.Column("status", sa.String(...)) line) to include the constraint and
ensure the downgrade removes the constraint/enum accordingly.
- Around line 115-130: The two unnamed ForeignKeyConstraint entries in the TTS
result table should be given explicit names for consistency with the existing
evaluation_run FK (name="fk_tts_result_run_id"); update the organization_id
constraint to include a name like "fk_tts_result_organization_id" and the
project_id constraint to include a name like "fk_tts_result_project_id" (keeping
ondelete="CASCADE"), so the ForeignKeyConstraint definitions for organization_id
and project_id mirror the pattern used by the existing evaluation_run
constraint.
```python
        sa.Column(
            "inserted_at",
            sa.DateTime(),
            nullable=False,
            comment="Timestamp when the result was created",
        ),
        sa.Column(
            "updated_at",
            sa.DateTime(),
            nullable=False,
            comment="Timestamp when the result was last updated",
        ),
```
🧩 Analysis chain
🏁 Script executed:
```shell
rg -n "inserted_at|updated_at" --type=py backend/app/alembic/versions/ -A 3 | head -100
```
Length of output: 9016
🏁 Script executed:
```shell
rg -n "server_default" --type=py backend/app/alembic/versions/ | head -50
```
Length of output: 5826
🏁 Script executed:
```shell
wc -l backend/app/alembic/versions/048_add_tts_evaluation_tables.py
```
Length of output: 140
🏁 Script executed:
```shell
cat -n backend/app/alembic/versions/048_add_tts_evaluation_tables.py | head -120
```
Length of output: 4189
inserted_at and updated_at are non-nullable with no server_default — inserts without explicit values will fail.
Both columns are nullable=False but lack a server_default. Any INSERT that doesn't supply these explicitly (raw SQL, test fixtures, bulk operations) will hit a NOT NULL constraint violation. The established pattern across migrations (005, 032, 040) consistently uses server_default=sa.text("now()") for timestamp columns.
🐛 Proposed fix
```diff
 sa.Column(
     "inserted_at",
     sa.DateTime(),
     nullable=False,
+    server_default=sa.text("now()"),
     comment="Timestamp when the result was created",
 ),
 sa.Column(
     "updated_at",
     sa.DateTime(),
     nullable=False,
+    server_default=sa.text("now()"),
     comment="Timestamp when the result was last updated",
 ),
```
📝 Committable suggestion
```python
        sa.Column(
            "inserted_at",
            sa.DateTime(),
            nullable=False,
            server_default=sa.text("now()"),
            comment="Timestamp when the result was created",
        ),
        sa.Column(
            "updated_at",
            sa.DateTime(),
            nullable=False,
            server_default=sa.text("now()"),
            comment="Timestamp when the result was last updated",
        ),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/alembic/versions/048_add_tts_evaluation_tables.py` around lines
103 - 114, The migration defines non-nullable timestamp columns inserted_at and
updated_at without server defaults which will cause NOT NULL violations on
inserts; update the sa.Column definitions for "inserted_at" and "updated_at" in
the migration (functions/classes around the table creation in the 048 migration)
to include server_default=sa.text("now()") so they match the established pattern
used in other revisions (e.g., migrations 005/032/040) and avoid failing
raw/bulk inserts or tests.
Codecov Report
❌ Patch coverage is
Actionable comments posted: 2
♻️ Duplicate comments (3)
backend/app/crud/tts_evaluations/cron.py (3)
517-523: ⚠️ Potential issue | 🟠 Major
Add `session.rollback()` before re-raising to keep the session usable for the caller.
If an exception (e.g., `IntegrityError`) leaves the session in an invalidated state, the caller's subsequent `update_tts_run` call will also fail, preventing the run from being marked as `"failed"`.
Proposed fix
```diff
     except Exception as e:
         logger.error(
             f"[process_completed_tts_batch] Failed to process batch results | "
             f"batch_job_id={batch_job.id}, error={str(e)}",
             exc_info=True,
         )
+        session.rollback()
         raise
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 517 - 523, The exception handler in process_completed_tts_batch currently logs and re-raises without rolling back the DB session; add a session.rollback() before re-raising so the SQLAlchemy session is returned to a usable state for the caller (e.g., so the subsequent update_tts_run call can run and mark the run failed). Locate the except Exception as e block in process_completed_tts_batch, call session.rollback() (or the appropriate Session object used in that scope) immediately after logging and before the raise, then re-raise the exception.
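The rollback-before-reraise pattern can be shown with a stub session. `FakeSession` below imitates SQLAlchemy's "pending rollback" behavior so the sketch stays self-contained; it is not the real `sqlmodel.Session`:

```python
class FakeSession:
    """Stand-in for a SQLAlchemy Session: after a failed flush it must be
    rolled back before it accepts further work (PendingRollbackError-style)."""

    def __init__(self) -> None:
        self._needs_rollback = False

    def flush_failure(self) -> None:
        self._needs_rollback = True

    def rollback(self) -> None:
        self._needs_rollback = False

    def add(self, obj) -> None:
        if self._needs_rollback:
            raise RuntimeError("session must be rolled back before further use")


def process_completed_batch(session: FakeSession) -> None:
    try:
        session.flush_failure()                  # simulate an IntegrityError mid-processing
        raise ValueError("simulated IntegrityError")
    except Exception:
        session.rollback()                       # restore the session for the caller...
        raise                                    # ...then surface the original error


session = FakeSession()
try:
    process_completed_batch(session)
except ValueError:
    session.add(object())                        # caller can still mark the run failed
```

Without the `rollback()` inside the handler, the final `session.add` would raise, which is exactly the failure mode the review describes for `update_tts_run`.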
280-373: ⚠️ Potential issue | 🟠 Major
`any_succeeded` is unused; `final_status` can be `"completed"` even when batch jobs failed.
These two issues remain from the prior review round:
- `any_succeeded` (lines 281, 291, 327) is assigned but never read — remove it.
- When `any_failed is True` and `pending == 0`, `final_status` is set to `"completed"` (line 349) while `action` is `"failed"` (line 363). The DB record will be marked `"completed"` despite batch-level failures. Consider setting `final_status = "failed"` when `any_failed` is true.
Proposed fix
```diff
     all_terminal = True
-    any_succeeded = False
     any_failed = False
     errors: list[str] = []

     for batch_job in batch_jobs:
         provider_name = batch_job.config.get("tts_provider", "unknown")

         # Skip batch jobs already in terminal state
         if batch_job.provider_status in TERMINAL_STATES:
-            if batch_job.provider_status == BatchJobState.SUCCEEDED.value:
-                any_succeeded = True
-            else:
+            if batch_job.provider_status != BatchJobState.SUCCEEDED.value:
                 any_failed = True
                 errors.append(
                     f"{provider_name}: {batch_job.error_message or batch_job.provider_status}"
                 )
             continue
     ...
         if provider_status == BatchJobState.SUCCEEDED.value:
             await process_completed_tts_batch(...)
-            any_succeeded = True
         else:
             any_failed = True
     ...
-    final_status = "completed" if pending == 0 else "processing"
+    if any_failed:
+        final_status = "failed"
+    elif pending == 0:
+        final_status = "completed"
+    else:
+        final_status = "processing"
     ...
-    action = "completed" if not any_failed else "failed"
+    action = final_status
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 280 - 373, Remove the unused any_succeeded variable and its assignments (references in the loop where any_succeeded is set) and adjust final_status logic so a run with any_failed == True is marked "failed" rather than "completed": compute final_status from pending and any_failed (e.g., if any_failed then "failed" else "completed" if pending == 0 else "processing") before calling update_tts_run and when constructing the returned payload; ensure update_tts_run(...) and the action variable still reflect the new final_status.
471-479: ⚠️ Potential issue | 🟡 Minor
Remove extraneous `f` prefix on string without placeholders.
Proposed fix
```diff
-    subdirectory=f"evaluations/tts/audio",
+    subdirectory="evaluations/tts/audio",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 471 - 479, The filename construction uses an unnecessary f-string prefix; change the audio_filename assignment in cron.py (variable audio_filename used before calling upload_to_object_store) to build the string without an f-string, e.g. audio_filename = str(uuid.uuid4()) + ".wav" (or use "{}.wav".format(uuid.uuid4())), so the code no longer uses an extraneous f prefix while preserving the same filename value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/crud/tts_evaluations/cron.py`:
- Around line 376-529: process_completed_tts_batch is async but calls blocking
functions and will block the event loop; either make it sync or run blocking
work in a thread. Recommended quick fix: keep process_completed_tts_batch async
and wrap blocking calls with asyncio.to_thread() (and await them) — specifically
wrap batch_provider.download_batch_results(provider_batch_id),
upload_to_object_store(...), base64.b64decode(audio_b64), and
pcm_to_wav(pcm_data) so they run off the event loop; leave small CPU ops (e.g.,
calculate_duration) inline or also push to threads if needed; ensure you await
the asyncio.to_thread() results and keep session.commit() behavior unchanged.
- Around line 376-381: The parameter batch_job in process_completed_tts_batch is
typed as Any but is used as a BatchJob; change its annotation from Any to
BatchJob, update the function signature async def
process_completed_tts_batch(session: Session, run: EvaluationRun, batch_job:
BatchJob, batch_provider: GeminiBatchProvider) -> None, and add the necessary
import for BatchJob at the top of the file (or from its module) so static type
checkers recognize attributes like batch_job.id, batch_job.config, and
batch_job.provider_batch_id.
---
Duplicate comments:
In `@backend/app/crud/tts_evaluations/cron.py`:
- Around line 517-523: The exception handler in process_completed_tts_batch
currently logs and re-raises without rolling back the DB session; add a
session.rollback() before re-raising so the SQLAlchemy session is returned to a
usable state for the caller (e.g., so the subsequent update_tts_run call can run
and mark the run failed). Locate the except Exception as e block in
process_completed_tts_batch, call session.rollback() (or the appropriate Session
object used in that scope) immediately after logging and before the raise, then
re-raise the exception.
- Around line 280-373: Remove the unused any_succeeded variable and its
assignments (references in the loop where any_succeeded is set) and adjust
final_status logic so a run with any_failed == True is marked "failed" rather
than "completed": compute final_status from pending and any_failed (e.g., if
any_failed then "failed" else "completed" if pending == 0 else "processing")
before calling update_tts_run and when constructing the returned payload; ensure
update_tts_run(...) and the action variable still reflect the new final_status.
- Around line 471-479: The filename construction uses an unnecessary f-string
prefix; change the audio_filename assignment in cron.py (variable audio_filename
used before calling upload_to_object_store) to build the string without an
f-string, e.g. audio_filename = str(uuid.uuid4()) + ".wav" (or use
"{}.wav".format(uuid.uuid4())), so the code no longer uses an extraneous f
prefix while preserving the same filename value.
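The `asyncio.to_thread` remedy recommended in the prompts above can be sketched generically: each blocking helper runs on a worker thread while the coroutine merely awaits it, so the event loop keeps servicing other tasks. The function names and sleeps are placeholders for the real download/upload calls, not this codebase's signatures:

```python
import asyncio
import time


def download_batch_results(batch_id: str) -> bytes:
    time.sleep(0.05)  # stand-in for a blocking HTTP download
    return b"\x00\x01" * 4


def upload_to_object_store(data: bytes) -> str:
    time.sleep(0.05)  # stand-in for a blocking upload
    return "s3://bucket/audio.wav"


async def process_completed_tts_batch(batch_id: str) -> str:
    # Each blocking call is pushed onto a worker thread; the event loop stays free.
    pcm = await asyncio.to_thread(download_batch_results, batch_id)
    url = await asyncio.to_thread(upload_to_object_store, pcm)
    return url


print(asyncio.run(process_completed_tts_batch("batch-123")))  # s3://bucket/audio.wav
```

Small CPU-only steps (base64 decode, duration math) can stay inline; it is the network and disk calls that must not run directly on the loop.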
Actionable comments posted: 2
♻️ Duplicate comments (5)
backend/app/crud/tts_evaluations/cron.py (5)
538-544: Pre-existing: no `session.rollback()` before re-raise may leave the session invalid.
This was flagged in a previous review cycle and remains unaddressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 538 - 544, The except block in process_completed_tts_batch logs and re-raises exceptions without rolling back the DB session (leaving it invalid); update the except handler to call session.rollback() before re-raising so the SQLAlchemy session is returned to a clean state (e.g., invoke session.rollback() immediately before the logger.error and/or before raise in the except Exception as e block that currently logs batch_job.id and re-raises). Ensure you reference the same session object used in process_completed_tts_batch and keep exc_info logging as-is.
284-394: Pre-existing: `any_succeeded` unused; `final_status`/`action` can disagree when `any_failed` is `True`.
These issues were raised in a previous review cycle and remain unaddressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 284 - 394, The variable any_succeeded is never used—remove it (or wire it into logic if intended) and eliminate related assignments; ensure final_status and action cannot disagree by deriving final_status from any_failed and pending: compute status_counts via count_results_by_status(session, run_id=run.id), set final_status = "failed" if any_failed else ("completed" if pending == 0 else "processing"), set action = "failed" if any_failed else "completed", then call update_tts_run(...) with that final_status and error_message; update references to JobStatus, any_failed, final_status, action, and update_tts_run in the function to keep the run result consistent.
397-402: Pre-existing: `batch_job` typed as `Any` instead of `BatchJob`.
This was flagged in a previous review cycle and remains unaddressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 397 - 402, The parameter batch_job in process_completed_tts_batch is currently typed as Any; change its annotation to the correct BatchJob type (e.g., batch_job: BatchJob) and import BatchJob from the module where GeminiBatchProvider/BatchJob are defined so static typing picks it up; update any other references/usages in this file that assume Any to use the proper type annotation to satisfy the earlier review comment.
397-550: Pre-existing: `process_completed_tts_batch` is `async` but all I/O is blocking — no `await` in the function body.
This was flagged in a previous review cycle and remains unaddressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` around lines 397 - 550, The function process_completed_tts_batch is declared async but only does blocking I/O; either make it synchronous (remove async) or convert blocking calls to async-compatible thread calls — e.g., wrap get_cloud_storage, batch_provider.download_batch_results, session.exec / session.commit, upload_to_object_store, and update_tts_result calls with asyncio.to_thread (or anyio.to_thread) and await them so the async function doesn't block the event loop; update references to those symbols (process_completed_tts_batch, batch_provider.download_batch_results, get_cloud_storage, upload_to_object_store, session.exec, session.commit, update_tts_result) accordingly so callers remain correct.
498-498: Pre-existing: extraneous `f` prefix on `f"evaluations/tts/audio"` — no interpolation.
This was flagged in a previous review cycle and remains unaddressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` at line 498, The code uses an unnecessary f-string prefix in the call using subdirectory=f"evaluations/tts/audio" in cron.py; replace it with a plain string subdirectory="evaluations/tts/audio" (remove the leading f) wherever this exact parameter occurs to eliminate the extraneous interpolation marker.
🧹 Nitpick comments (1)
backend/app/crud/tts_evaluations/cron.py (1)
38-38: Consider moving `GeminiClient` to a shared location to avoid the STT→TTS cross-module dependency.

TTS cron pulling `GeminiClient` from `app.services.stt_evaluations.gemini` creates coupling between two evaluation domains. If `GeminiClient` is common to both, it belongs in `app.services.gemini` or `app.core.batch`.

```bash
#!/bin/bash
# Check all importers of GeminiClient to understand its current reach
rg -n --type py 'from app\.services\.stt_evaluations\.gemini import GeminiClient'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/cron.py` at line 38, TTS cron is importing GeminiClient from app.services.stt_evaluations.gemini which creates a cross-domain dependency; move the GeminiClient class into a shared module (e.g., app.services.gemini or app.core.batch), update the class location, then change the import in backend/app/crud/tts_evaluations/cron.py to import GeminiClient from the new shared module and update any other files that import from app.services.stt_evaluations.gemini (use the unique symbol GeminiClient to find references and ensure tests/imports are updated accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/core/batch/gemini.py`:
- Around line 222-230: The code currently uses a truthiness check on
parsed.get("response") which will skip valid-but-empty response payloads; change
the conditional to check for key presence (e.g., if "response" in parsed) and
then set response_obj = parsed.get("response") and append the dict to results
using BATCH_KEY and custom_id so empty responses are not dropped; update the
block around parsed, response_obj, results, BATCH_KEY and custom_id accordingly.
In `@backend/app/crud/tts_evaluations/cron.py`:
- Around line 319-325: poll_tts_run currently calls the synchronous
poll_batch_status (which calls provider.get_batch_status) directly and will
block the event loop; change the call so the blocking work runs off the event
loop (e.g. wrap poll_batch_status in asyncio.to_thread or loop.run_in_executor)
when invoked from the async poll_tts_run; do the same symmetric fix for
poll_stt_run to ensure provider.get_batch_status is executed in a
thread/executor rather than synchronously on the event loop.
---
Duplicate comments:
In `@backend/app/crud/tts_evaluations/cron.py`:
- Around line 538-544: The except block in process_completed_tts_batch logs and
re-raises exceptions without rolling back the DB session (leaving it invalid);
update the except handler to call session.rollback() before re-raising so the
SQLAlchemy session is returned to a clean state (e.g., invoke session.rollback()
immediately before the logger.error and/or before raise in the except Exception
as e block that currently logs batch_job.id and re-raises). Ensure you reference
the same session object used in process_completed_tts_batch and keep exc_info
logging as-is.
- Around line 284-394: The variable any_succeeded is never used—remove it (or
wire it into logic if intended) and eliminate related assignments; ensure
final_status and action cannot disagree by deriving final_status from any_failed
and pending: compute status_counts via count_results_by_status(session,
run_id=run.id), set final_status = "failed" if any_failed else ("completed" if
pending == 0 else "processing"), set action = "failed" if any_failed else
"completed", then call update_tts_run(...) with that final_status and
error_message; update references to JobStatus, any_failed, final_status, action,
and update_tts_run in the function to keep the run result consistent.
- Around line 397-402: The parameter batch_job in process_completed_tts_batch is
currently typed as Any; change its annotation to the correct BatchJob type
(e.g., batch_job: BatchJob) and import BatchJob from the module where
GeminiBatchProvider/BatchJob are defined so static typing picks it up; update
any other references/usages in this file that assume Any to use the proper type
annotation to satisfy the earlier review comment.
- Around line 397-550: The function process_completed_tts_batch is declared
async but only does blocking I/O; either make it synchronous (remove async) or
convert blocking calls to async-compatible thread calls — e.g., wrap
get_cloud_storage, batch_provider.download_batch_results, session.exec /
session.commit, upload_to_object_store, and update_tts_result calls with
asyncio.to_thread (or anyio.to_thread) and await them so the async function
doesn't block the event loop; update references to those symbols
(process_completed_tts_batch, batch_provider.download_batch_results,
get_cloud_storage, upload_to_object_store, session.exec, session.commit,
update_tts_result) accordingly so callers remain correct.
- Line 498: The code uses an unnecessary f-string prefix in the call using
subdirectory=f"evaluations/tts/audio" in cron.py; replace it with a plain string
subdirectory="evaluations/tts/audio" (remove the leading f) wherever this exact
parameter occurs to eliminate the extraneous interpolation marker.
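The rollback pattern from the duplicate comments above can be sketched with a stand-in session; `FakeSession` and `process_with_rollback` are illustrative, not the project's actual API:

```python
import logging

logger = logging.getLogger(__name__)


class FakeSession:
    """Minimal stand-in for a SQLAlchemy session."""

    def __init__(self) -> None:
        self.rolled_back = False

    def rollback(self) -> None:
        self.rolled_back = True


def process_with_rollback(session: FakeSession, batch_job_id: str) -> None:
    try:
        raise RuntimeError("simulated processing failure")
    except Exception:
        # return the session to a clean state before re-raising
        session.rollback()
        logger.error("Failed to process batch %s", batch_job_id, exc_info=True)
        raise


session = FakeSession()
try:
    process_with_rollback(session, "batch-1")
except RuntimeError:
    pass
```

The key point is that `session.rollback()` runs before the exception propagates, so subsequent use of the session does not fail with a dirty-transaction error.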
---
Nitpick comments:
In `@backend/app/crud/tts_evaluations/cron.py`:
- Line 38: TTS cron is importing GeminiClient from
app.services.stt_evaluations.gemini which creates a cross-domain dependency;
move the GeminiClient class into a shared module (e.g., app.services.gemini or
app.core.batch), update the class location, then change the import in
backend/app/crud/tts_evaluations/cron.py to import GeminiClient from the new
shared module and update any other files that import from
app.services.stt_evaluations.gemini (use the unique symbol GeminiClient to find
references and ensure tests/imports are updated accordingly).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- backend/app/core/batch/gemini.py
- backend/app/crud/stt_evaluations/cron.py
- backend/app/crud/tts_evaluations/cron.py
- backend/app/tests/core/batch/test_gemini.py
```diff
+                # Return the raw response so callers can extract
+                # text (STT) or audio (TTS) as needed.
                 response_obj = parsed.get("response")
                 if response_obj:
-                    text = self._extract_text_from_response_dict(response_obj)
                     results.append(
                         {
                             BATCH_KEY: custom_id,
-                            "response": {"text": text},
+                            "response": response_obj,
                             "error": None,
```
Don’t drop empty response objects.
Using a truthiness check can skip valid-but-empty response payloads. Prefer key presence to avoid losing results.
✅ Proposed fix
```diff
-            response_obj = parsed.get("response")
-            if response_obj:
+            if "response" in parsed:
+                response_obj = parsed.get("response")
                 results.append(
                     {
                         BATCH_KEY: custom_id,
                         "response": response_obj,
                         "error": None,
                     }
                 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/core/batch/gemini.py` around lines 222 - 230, The code currently
uses a truthiness check on parsed.get("response") which will skip
valid-but-empty response payloads; change the conditional to check for key
presence (e.g., if "response" in parsed) and then set response_obj =
parsed.get("response") and append the dict to results using BATCH_KEY and
custom_id so empty responses are not dropped; update the block around parsed,
response_obj, results, BATCH_KEY and custom_id accordingly.
…/kaapi-backend into feature/tts-evaluation
```python
            TTSResult.provider == tts_provider,
        )
        result_map: dict[int, TTSResult] = {
            r.id: r for r in session.exec(prefetch_stmt).all()  # type: ignore[misc]
```
This line is unreadable and difficult to debug
```python
            TTSResult.project_id == project_id,
        ]

    statement = select(TTSResult).where(*where_clauses).order_by(TTSResult.id)
```
instead of ordering by id, reverse chronological ordering looks to be a better choice
why is that? can u elaborate
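To illustrate the suggestion, a reverse-chronological variant of the query would order by a timestamp column instead of the primary key. The model below is a minimal stand-in that assumes a `created_at` column; it is not the project's actual schema:

```python
from datetime import datetime, timedelta

from sqlalchemy import Column, DateTime, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class TTSResult(Base):  # minimal stand-in for the real model
    __tablename__ = "tts_result"
    id = Column(Integer, primary_key=True)
    created_at = Column(DateTime)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    base_time = datetime(2024, 1, 1)
    session.add_all(
        [
            TTSResult(id=1, created_at=base_time),
            TTSResult(id=2, created_at=base_time + timedelta(minutes=5)),
        ]
    )
    session.commit()
    # newest results first, instead of .order_by(TTSResult.id)
    rows = session.execute(
        select(TTSResult).order_by(TTSResult.created_at.desc())
    ).scalars().all()
    ids = [r.id for r in rows]
```

Note that with auto-incrementing primary keys the two orderings usually agree up to direction, which may be part of the question being asked here.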
```python
        EvaluationRun | None: Updated run
    """
    where_clauses = [EvaluationRun.id == run_id]
    if org_id is not None:
```
nitpick: Can the above statement be made more readable, like:

```python
statement = select(EvaluationRun).where(EvaluationRun.id == run_id)
if org_id is not None:
    statement = statement.where(EvaluationRun.organization_id == org_id)
if project_id is not None:
    statement = statement.where(EvaluationRun.project_id == project_id)
run = session.exec(statement).one_or_none()
```
```diff
@@ -0,0 +1,56 @@
+"""Audio processing utilities for TTS evaluation."""
```
There are similar pydub utils already available at backend/app/core/audio_utils.py; we should reuse them.
```python
def _samples_to_csv(samples: list[TTSSampleCreate]) -> bytes:
    """Convert TTS samples to CSV format.
```
imo this util might already exist, or could be made more generic to be usable as a common CSV util
this is different: it takes the text and then serialises it to CSV to be uploaded to S3 as a copy, which differs from the text and STT flows as well
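For reference, a CSV serialiser along the lines being discussed might look like this; the `TTSSampleCreate` fields (`text`, `language`) are assumptions for illustration, not the project's actual schema:

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TTSSampleCreate:  # minimal stand-in for the real schema
    text: str
    language: str


def samples_to_csv(samples: list[TTSSampleCreate]) -> bytes:
    """Serialise sample texts to CSV bytes, e.g. for upload to S3 as a copy."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["text", "language"])
    for sample in samples:
        writer.writerow([sample.text, sample.language])
    return buf.getvalue().encode("utf-8")


data = samples_to_csv([TTSSampleCreate(text="hello world", language="en")])
```

Using `csv.writer` (rather than manual string joins) keeps quoting correct when sample texts contain commas or newlines.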
Summary
Target issue is #535
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.
Notes
New Features
Functional Requirements Testing