Evaluation: Automated metrics TTS/STT evals #555

@kartpop

Description

Is your feature request related to a problem?

Once STT evaluation results are populated, there are no automated quality metrics. Reviewers must manually compare transcriptions against ground truth, with no quantitative measure of transcription accuracy.

Describe the solution you'd like

Compute automated metrics for each stt_result by comparing its transcription column against stt_sample.ground_truth (linked via the stt_sample_id foreign key); a code sketch follows the list:

  • WER (Word Error Rate) — word-level accuracy
  • CER (Character Error Rate) — character-level accuracy
  • Lenient WER — WER after script-aware normalization (handles Unicode inconsistencies in Indic scripts)
  • WIP (Word Information Preserved) — proportion of word information preserved in the transcription
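
A minimal sketch of how these could be computed per row, assuming the jiwer library (which provides wer, cer, and wip) and the column names from this issue. The normalize_lenient helper is hypothetical, since the exact script-aware normalization isn't specified here:

```python
import unicodedata

import jiwer


def normalize_lenient(text: str) -> str:
    """Hypothetical lenient normalization: NFC-normalize Unicode (collapses
    composed/decomposed codepoint variants common in Indic scripts),
    lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.lower().split())


def compute_stt_metrics(ground_truth: str, transcription: str) -> dict:
    """Compute the four proposed metrics for one stt_result row."""
    return {
        "wer": jiwer.wer(ground_truth, transcription),
        "cer": jiwer.cer(ground_truth, transcription),
        "lenient_wer": jiwer.wer(
            normalize_lenient(ground_truth), normalize_lenient(transcription)
        ),
        "wip": jiwer.wip(ground_truth, transcription),
    }


# Example: one (ground_truth, transcription) pair from an stt_result row.
print(compute_stt_metrics("the quick brown fox", "the quick brown box"))
```

Results could then be written back as additional columns on stt_result, giving reviewers a quantitative score alongside each transcription.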
