
Use punctuation attachment metadata in batch transcripts #113

Open
sudojatin wants to merge 1 commit into speechmatics:main from sudojatin:fix/batch-punctuation-attachments

sudojatin commented Jun 12, 2026

Issue

In Speechmatics Batch JSON output, punctuation appears as separate items in results[]. Each punctuation item can specify how it attaches to neighbouring words through attaches_to (previous, next, both, or none) and can also carry an is_eos (end-of-sentence) flag.
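
To make the metadata concrete, here is a minimal sketch that pulls the relevant fields out of a payload. The payload shape is a simplified assumption for illustration (type on each item, content in the first entry of alternatives, attaches_to and is_eos on punctuation items), and extract_items is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, List, Optional, Tuple

# Simplified, assumed shape of a Batch JSON payload: each results[] item has a
# "type", an "alternatives" list whose first entry holds the "content", and
# punctuation items may add "attaches_to" and "is_eos".
example_payload: Dict[str, Any] = {
    "results": [
        {"type": "punctuation", "attaches_to": "next",
         "alternatives": [{"content": "¿"}]},
        {"type": "word", "alternatives": [{"content": "Hola"}]},
        {"type": "punctuation", "attaches_to": "previous", "is_eos": True,
         "alternatives": [{"content": "?"}]},
    ]
}


def extract_items(payload: Dict[str, Any]) -> List[Tuple[str, str, Optional[str], bool]]:
    """Hypothetical helper: return (type, content, attaches_to, is_eos) per item."""
    items = []
    for result in payload.get("results", []):
        alternatives = result.get("alternatives") or [{}]
        content = alternatives[0].get("content", "")
        items.append(
            (
                result["type"],
                content,
                result.get("attaches_to"),  # None for words
                bool(result.get("is_eos", False)),  # missing flag treated as False
            )
        )
    return items


for item in extract_items(example_payload):
    print(item)
```

The key point is that the spacing information already lives in the payload, so a renderer does not need to inspect the punctuation character at all.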

The gap in the SDK was in Transcript.transcript_text: it ignored attaches_to when building the plain-text transcript and instead inferred spacing from the punctuation character itself. That heuristic got the spacing wrong for some punctuation, such as Spanish opening punctuation (¿Hola?) and punctuation that attaches on both sides (and/or).
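
A minimal sketch of metadata-driven joining, under stated assumptions: Token is a hypothetical simplified item type (not the SDK's RecognitionResult), and a punctuation item with no attaches_to is treated like none. The rule looks only at the metadata, never at the character: a space goes between two adjacent items unless the right one is punctuation attaching to previous or both, or the left one is punctuation attaching to next or both.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Token:
    """Hypothetical simplified results[] item (not the SDK's RecognitionResult)."""

    type: str  # "word" or "punctuation"
    content: str
    attaches_to: Optional[str] = None  # "previous", "next", "both", "none"; None for words


def glues_to_left(tok: Token) -> bool:
    # Punctuation attaching to the preceding item suppresses the space before it.
    return tok.type == "punctuation" and tok.attaches_to in ("previous", "both")


def glues_to_right(tok: Token) -> bool:
    # Punctuation attaching to the following item suppresses the space after it.
    return tok.type == "punctuation" and tok.attaches_to in ("next", "both")


def join_tokens(tokens: Sequence[Token]) -> str:
    parts: List[str] = []
    prev: Optional[Token] = None
    for tok in tokens:
        # Insert a separating space only if neither neighbour asks to be glued.
        if prev is not None and not (glues_to_left(tok) or glues_to_right(prev)):
            parts.append(" ")
        parts.append(tok.content)
        prev = tok
    return "".join(parts)


print(join_tokens([Token("punctuation", "¿", "next"), Token("word", "Hola"),
                   Token("punctuation", "?", "previous")]))  # ¿Hola?
print(join_tokens([Token("word", "and"), Token("punctuation", "/", "both"),
                   Token("word", "or")]))  # and/or
```

Words never glue in this sketch, so a word-only sequence comes out space-separated exactly as before, and attaches_to none yields spaces on both sides (a - b).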

Summary

  • Use full RecognitionResult objects when building Transcript.transcript_text, so punctuation spacing follows result.type and attaches_to metadata.
  • Add regression tests verifying that punctuation metadata is preserved through a from_dict/asdict round trip.
  • Cover rendering for previous, next, none, and both attachments, and confirm that word-only transcripts are unchanged.
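
The metadata-preservation check can be sketched as a pytest-style test. ResultSketch and its from_dict are hypothetical stand-ins whose field names are assumptions (the SDK's RecognitionResult may be structured differently), so this shows the shape of the check rather than the PR's actual test. dataclasses.asdict is the standard-library function.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in for RecognitionResult; field names are assumptions."""

    type: str
    content: str
    attaches_to: Optional[str] = None
    is_eos: Optional[bool] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ResultSketch":
        return cls(
            type=data["type"],
            content=data["alternatives"][0]["content"],
            # The regression being guarded against: these fields must not be dropped.
            attaches_to=data.get("attaches_to"),
            is_eos=data.get("is_eos"),
        )


def test_punctuation_metadata_survives_round_trip() -> None:
    raw = {
        "type": "punctuation",
        "attaches_to": "previous",
        "is_eos": True,
        "alternatives": [{"content": "?"}],
    }
    dumped = asdict(ResultSketch.from_dict(raw))
    assert dumped["attaches_to"] == "previous"
    assert dumped["is_eos"] is True


test_punctuation_metadata_survives_round_trip()
```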

Test Plan

  • PYTHONPATH=sdk/batch uv run --no-project --with pytest --with pytest-asyncio --with aiofiles --with aiohttp --with typing-extensions pytest tests/batch
  • uv run --no-project --with black black --check sdk/batch/speechmatics/batch/_models.py tests/batch/test_models.py
  • uv run --no-project --with ruff ruff check sdk/batch/speechmatics/batch/_models.py tests/batch/test_models.py
  • PYTHONPATH=sdk/batch uv run --no-project --with mypy==1.17.1 --with aiofiles --with aiohttp --with typing-extensions --with types-aiofiles mypy sdk/batch/speechmatics

sudojatin (Author)

@dumitrugutu, could you review this PR? Thanks.
