Evaluation: Automated metrics TTS/STT evals #555

@kartpop

Description

Is your feature request related to a problem?

Once STT evaluation results are populated, there are no automated quality metrics. Reviewers must manually compare transcriptions against ground truth, with no quantitative measure of transcription accuracy.

Describe the solution you'd like

Compute automated metrics for each stt_result by comparing its transcription column against stt_sample.ground_truth (linked via the stt_sample_id foreign key); a code sketch follows the list:

  • WER (Word Error Rate) — word-level accuracy
  • CER (Character Error Rate) — character-level accuracy
  • Lenient WER — WER after script-aware normalization (handles Unicode inconsistencies in Indic scripts)
  • WIP (Word Information Preserved) — proportion of word information preserved in the transcription
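
A minimal sketch of how these could be computed per row, assuming the jiwer library (which provides wer, cer, and wip) and the column names from this issue. The normalize_lenient helper is hypothetical, since the exact script-aware normalization isn't specified here:

```python
import unicodedata

import jiwer


def normalize_lenient(text: str) -> str:
    """Hypothetical lenient normalization: NFC-normalize Unicode (collapses
    composed/decomposed codepoint variants common in Indic scripts),
    lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.lower().split())


def compute_stt_metrics(ground_truth: str, transcription: str) -> dict:
    """Compute the four proposed metrics for one stt_result row."""
    return {
        "wer": jiwer.wer(ground_truth, transcription),
        "cer": jiwer.cer(ground_truth, transcription),
        "lenient_wer": jiwer.wer(
            normalize_lenient(ground_truth), normalize_lenient(transcription)
        ),
        "wip": jiwer.wip(ground_truth, transcription),
    }


# Example: one (ground_truth, transcription) pair from an stt_result row.
print(compute_stt_metrics("the quick brown fox", "the quick brown box"))
```

Results could then be written back as additional columns on stt_result, giving reviewers a quantitative score alongside each transcription.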
