Evaluation: TTS #619
Merged
24 commits, all by AkhileshNegi:

- f0ae443 first stab at TTS evaluation
- 1c3f326 Merge branch 'main' into feature/tts-evaluation
- 52abe01 Merge branch 'main' into feature/tts-evaluation
- e63e8bd updated migratino
- 348e81b update to custom id
- f9cfdb6 fix bug for parsing audio
- 8b37b8b updats
- 603a6ee first stab at moving to celery
- 3b6c2fe cleanups and refactoring
- 99610bb cleanups and refactoring
- bba3d03 cleanups
- 0e11857 typo cleanups
- bcd9659 doc updates
- b56b20d minor cleanups and refactoring crons
- 613dc3a cleanups
- a0f83b3 Merge branch 'main' into feature/tts-evaluation
- 69593b5 minor cleanups
- fc64736 Merge branch 'feature/tts-evaluation' of github.com:ProjectTech4DevAI…
- cb56f76 added testcases and cover few more edgecases
- 8cafe1a testcases pass
- a755392 cleanup
- 56056b0 refactoring
- 2c174d0 cleanups
- 3ca63c2 refactor gemini client
backend/app/alembic/versions/049_add_tts_evaluation_tables.py (157 additions, 0 deletions)
```python
"""add tts evaluation tables

Revision ID: 049
Revises: 048
Create Date: 2026-02-14 12:00:00.000000

"""

import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql

# revision identifiers, used by Alembic.
revision = "049"
down_revision = "048"
branch_labels = None
depends_on = None


def upgrade():
    # Create tts_result table
    op.create_table(
        "tts_result",
        sa.Column(
            "id",
            sa.Integer(),
            nullable=False,
            comment="Unique identifier for the TTS result",
        ),
        sa.Column(
            "sample_text",
            sa.Text(),
            nullable=False,
            comment="Input text that will be synthesized to speech",
        ),
        sa.Column(
            "object_store_url",
            sa.String(),
            nullable=True,
            comment="S3 URL of the generated WAV audio file",
        ),
        sa.Column(
            "metadata",
            postgresql.JSONB(astext_type=sa.Text()),
            nullable=True,
            comment="Audio metadata: {duration_seconds, size_bytes}",
        ),
        sa.Column(
            "provider",
            sa.String(length=100),
            nullable=False,
            comment="TTS provider used (e.g., gemini-2.5-pro-preview-tts)",
        ),
        sa.Column(
            "status",
            sa.String(length=20),
            nullable=False,
            server_default="PENDING",
            comment="Result status: PENDING, SUCCESS, FAILED",
        ),
        sa.Column(
            "score",
            postgresql.JSONB(astext_type=sa.Text()),
            nullable=True,
            comment="Extensible evaluation metrics",
        ),
        sa.Column(
            "is_correct",
            sa.Boolean(),
            nullable=True,
            comment="Human feedback flag on audio quality correctness",
        ),
        sa.Column(
            "comment",
            sa.Text(),
            nullable=True,
            comment="Human feedback comment on audio quality",
        ),
        sa.Column(
            "error_message",
            sa.Text(),
            nullable=True,
            comment="Error message if synthesis failed",
        ),
        sa.Column(
            "evaluation_run_id",
            sa.Integer(),
            nullable=False,
            comment="Reference to the evaluation run",
        ),
        sa.Column(
            "organization_id",
            sa.Integer(),
            nullable=False,
            comment="Reference to the organization",
        ),
        sa.Column(
            "project_id",
            sa.Integer(),
            nullable=False,
            comment="Reference to the project",
        ),
        sa.Column(
            "inserted_at",
            sa.DateTime(),
            nullable=False,
            comment="Timestamp when the result was created",
        ),
        sa.Column(
            "updated_at",
            sa.DateTime(),
            nullable=False,
            comment="Timestamp when the result was last updated",
        ),
        sa.ForeignKeyConstraint(
            ["evaluation_run_id"],
            ["evaluation_run.id"],
            name="fk_tts_result_run_id",
            ondelete="CASCADE",
        ),
        sa.ForeignKeyConstraint(
            ["organization_id"],
            ["organization.id"],
            ondelete="CASCADE",
        ),
        sa.ForeignKeyConstraint(
            ["project_id"],
            ["project.id"],
            ondelete="CASCADE",
        ),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_index(
        "ix_tts_result_run_id",
        "tts_result",
        ["evaluation_run_id"],
        unique=False,
    )
    op.create_index(
        "idx_tts_result_feedback",
        "tts_result",
        ["evaluation_run_id", "is_correct"],
        unique=False,
    )
    op.create_index(
        "idx_tts_result_status",
        "tts_result",
        ["evaluation_run_id", "status"],
        unique=False,
    )


def downgrade():
    op.drop_index("idx_tts_result_status", table_name="tts_result")
    op.drop_index("idx_tts_result_feedback", table_name="tts_result")
    op.drop_index("ix_tts_result_run_id", table_name="tts_result")
    op.drop_table("tts_result")
```
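The `metadata` column is documented as holding `{duration_seconds, size_bytes}` for a WAV file stored in S3. The sketch below shows one way that payload could be derived when raw audio is wrapped into a WAV container before upload. It assumes the provider returns 16-bit mono PCM at 24 kHz (the defaults used in Google's Gemini TTS examples); the helper names `pcm_to_wav` and `wav_metadata` are hypothetical and do not come from this PR.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV (RIFF) container.

    Assumed format: 24 kHz, mono, 16-bit (sample_width is in bytes).
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a caller-supplied file object, so the buffer is still readable.
    return buffer.getvalue()


def wav_metadata(wav_bytes: bytes) -> dict:
    """Build a {duration_seconds, size_bytes} payload for the metadata column."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return {"duration_seconds": round(duration, 3), "size_bytes": len(wav_bytes)}


# One second of 16-bit mono silence at 24 kHz: 24000 frames of 2 bytes each.
silence = bytes(24000 * 2)
wav_bytes = pcm_to_wav(silence)
print(wav_metadata(wav_bytes))
```

`size_bytes` here counts the whole WAV file, header included, since that is what would be uploaded.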
New file (9 lines):

Create a new TTS evaluation dataset with text samples.

Required fields:
- **name**: Dataset name
- **samples**: List of text samples, each with a **text** field

Optional fields:
- **description**: Dataset description
- **language_id**: ID of a language from the global languages table
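The field rules above could be checked with a small validator. This is a sketch only: the real endpoint presumably relies on its own request schema, and the `CreateDatasetRequest` dataclass and `parse_create_request` helper here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreateDatasetRequest:
    name: str
    samples: list[dict]
    description: Optional[str] = None
    language_id: Optional[int] = None


def parse_create_request(payload: dict) -> CreateDatasetRequest:
    """Check the required and optional fields listed in the endpoint docs."""
    name = payload.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("'name' is required")
    samples = payload.get("samples")
    if not isinstance(samples, list):
        raise ValueError("'samples' must be a list")
    for i, sample in enumerate(samples):
        # Each sample must be an object carrying a string 'text' field.
        if not isinstance(sample, dict) or not isinstance(sample.get("text"), str):
            raise ValueError(f"samples[{i}] needs a 'text' field")
    language_id = payload.get("language_id")
    if language_id is not None and not isinstance(language_id, int):
        raise ValueError("'language_id' must be an integer ID")
    return CreateDatasetRequest(
        name=name,
        samples=samples,
        description=payload.get("description"),
        language_id=language_id,
    )


request = parse_create_request(
    {"name": "demo-dataset", "samples": [{"text": "Hello there"}, {"text": "Good morning"}]}
)
print(len(request.samples), request.language_id)
```

Omitted optional fields simply come back as `None`, which matches their "optional" status in the docs.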
New file (3 lines):

Get a TTS evaluation dataset by ID.

Returns dataset including sample count.
New file (3 lines):

Get a single TTS synthesis result by ID.

Returns the result including audio URL, metadata, and human feedback status.
New file (4 lines):

Get a TTS evaluation run by ID with optional results.

Query parameters:
- `include_results`: Include synthesis results (default: true)
New file (3 lines):

List all TTS evaluation datasets for the current project.

Supports pagination with `limit` and `offset` parameters.
New file (3 lines):

List TTS evaluation runs for the current project.

Supports filtering by `dataset_id` and `status`, with pagination via `limit` and `offset`.
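Both list endpoints page with `limit` and `offset`, and the runs endpoint also filters by `dataset_id` and `status`. The in-memory sketch below illustrates the usual semantics (filter first, then skip `offset` matches and return at most `limit`); a database-backed endpoint would express the same steps in SQL. The `Run` dataclass, the default `limit` of 50, and the status strings used in the example are assumptions, not values taken from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    id: int
    dataset_id: int
    status: str


def list_runs(
    runs: list[Run],
    dataset_id: Optional[int] = None,
    status: Optional[str] = None,
    limit: int = 50,  # hypothetical default
    offset: int = 0,
) -> list[Run]:
    """Filter by dataset_id/status, then apply offset/limit pagination.

    Assumes `runs` is already in the order the API should return them.
    """
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    matches = [
        r
        for r in runs
        if (dataset_id is None or r.dataset_id == dataset_id)
        and (status is None or r.status == status)
    ]
    return matches[offset : offset + limit]


runs = [Run(1, 10, "done"), Run(2, 10, "running"), Run(3, 11, "done"), Run(4, 10, "done")]
print([r.id for r in list_runs(runs, dataset_id=10, status="done", limit=1, offset=1)])
```

Applying the filters before pagination keeps page boundaries stable for a given filter combination; paginating first would return short or empty pages.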
New file (15 lines):

Start a TTS evaluation run on a dataset.

Required fields:
- **run_name**: Name for this evaluation run
- **dataset_id**: ID of the TTS dataset to evaluate

Optional fields:
- **models**: List of TTS models to use (default: `["gemini-2.5-pro-preview-tts"]`)

The evaluation will:
1. Process each text sample through the specified TTS models
2. Generate speech audio using Gemini Batch API
3. Store WAV audio files in S3 for human review

**Supported models:** `gemini-2.5-pro-preview-tts`
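Step 1 implies one result per (sample, model) pair. The sketch below expands a dataset into such pending rows and rejects unsupported models up front. It assumes results are created before the batch job is submitted, with `provider` holding the model name (as the migration's column comment suggests) and `status` starting at `PENDING` (the column's server default); the `PendingResult` type and `plan_results` function are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# From the endpoint docs: the only supported model, which is also the default.
SUPPORTED_MODELS = {"gemini-2.5-pro-preview-tts"}
DEFAULT_MODELS = ["gemini-2.5-pro-preview-tts"]


@dataclass
class PendingResult:
    sample_text: str
    provider: str  # model name, mirroring the tts_result.provider comment
    status: str = "PENDING"  # mirrors the status column's server_default


def plan_results(samples: list[str], models: Optional[list[str]] = None) -> list[PendingResult]:
    """Create one PENDING result per (sample, model) pair."""
    if models is None:
        models = DEFAULT_MODELS
    models = list(dict.fromkeys(models))  # drop duplicate model names, keep order
    unsupported = sorted(set(models) - SUPPORTED_MODELS)
    if unsupported:
        raise ValueError(f"unsupported TTS models: {unsupported}")
    return [
        PendingResult(sample_text=text, provider=model)
        for text, model in product(samples, models)
    ]


rows = plan_results(["Hello there", "Good morning"])
print(len(rows), rows[0].provider, rows[0].status)
```

Validating the model list before creating any rows means a bad request fails cleanly instead of leaving a partially populated run.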
New file (5 lines):

Update human feedback on a TTS synthesis result.

Fields:
- **is_correct**: Whether the synthesized audio quality is acceptable (null to clear)
- **comment**: Optional feedback comment
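Because `is_correct` may be sent as `null` to clear the flag, a handler has to distinguish "field set to null" from "field omitted". One common approach, sketched here with a plain dict payload, is to check whether the key is present rather than looking at its value; the `apply_feedback` helper and the dict-shaped result are hypothetical. In Pydantic v2 models, `model_dump(exclude_unset=True)` serves the same purpose.

```python
def apply_feedback(result: dict, payload: dict) -> dict:
    """Return a copy of `result` with feedback fields from `payload` applied.

    A key that is present is applied even if its value is None, so
    {"is_correct": None} clears the flag; an absent key is left unchanged.
    """
    unknown = set(payload) - {"is_correct", "comment"}
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    value = payload.get("is_correct")
    # isinstance check rejects 0/1, which would otherwise compare equal to False/True.
    if value is not None and not isinstance(value, bool):
        raise ValueError("is_correct must be true, false, or null")
    updated = dict(result)
    updated.update(payload)
    return updated


result = {"id": 1, "is_correct": True, "comment": "clear audio"}
print(apply_feedback(result, {"is_correct": None}))
```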
`inserted_at` and `updated_at` are non-nullable with no `server_default`, so inserts without explicit values will fail.

Both columns are `nullable=False` but lack a `server_default`. Any INSERT that doesn't supply these values explicitly (raw SQL, test fixtures, bulk operations) will hit a NOT NULL constraint violation. The established pattern across migrations (005, 032, 040) consistently uses `server_default=sa.text("now()")` for timestamp columns.

🐛 Proposed fix

```diff
 sa.Column(
     "inserted_at",
     sa.DateTime(),
     nullable=False,
+    server_default=sa.text("now()"),
     comment="Timestamp when the result was created",
 ),
 sa.Column(
     "updated_at",
     sa.DateTime(),
     nullable=False,
+    server_default=sa.text("now()"),
     comment="Timestamp when the result was last updated",
 ),
```
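The failure mode described in the review, and the effect of the proposed fix, can be reproduced without PostgreSQL. This sketch uses Python's built-in `sqlite3` as a stand-in, so SQLite's `CURRENT_TIMESTAMP` plays the role of PostgreSQL's `now()`; the two table names are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Mirrors the migration as merged: NOT NULL timestamps with no default.
conn.execute(
    "CREATE TABLE result_no_default ("
    " id INTEGER PRIMARY KEY,"
    " sample_text TEXT NOT NULL,"
    " inserted_at TIMESTAMP NOT NULL,"
    " updated_at TIMESTAMP NOT NULL)"
)
# Mirrors the proposed fix: the database fills in the timestamps itself.
conn.execute(
    "CREATE TABLE result_with_default ("
    " id INTEGER PRIMARY KEY,"
    " sample_text TEXT NOT NULL,"
    " inserted_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,"
    " updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)


def try_insert(table: str) -> str:
    """Insert a row without supplying timestamps and report the outcome."""
    try:
        conn.execute(f"INSERT INTO {table} (sample_text) VALUES ('hello')")
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"failed: {exc}"


print(try_insert("result_no_default"))
print(try_insert("result_with_default"))
```

Only the insert into `result_no_default` raises; the table with defaults accepts the same statement.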