Parallelize multi-chunk inference for ~3-4x speedup on multi-sentence text by namanomar · Pull Request #147 · KittenML/KittenTTS

namanomar · 2026-06-28T21:29:18Z

Summary

generate() previously processed each text chunk (from chunk_text()) strictly sequentially: phonemize, then run ONNX inference, one chunk at a time. For multi-sentence input (the common real-world case, not just the README's one-liner demo), this meant N sequential session.run() calls even though the chunks are independent of each other.

This PR splits the two steps:

Phonemization stays sequential. eSpeak/phonemizer keeps shared internal state and is not safe to call concurrently — confirmed by reproducing a RuntimeError: number of lines in input and output must be equal when calling it from multiple threads.
ONNX inference now runs concurrently across chunks via a ThreadPoolExecutor, since ONNX Runtime sessions support concurrent run() calls on the same session, and profiling showed inference is >99.9% of per-chunk latency (phonemization + tokenization is sub-2ms).

Single-chunk text (the common short-sentence case) is unaffected — it skips the executor entirely and behaves exactly as before.
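The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in kittentts/onnx_model.py: `phonemize`, `run_inference`, and `generate_chunks` are hypothetical stand-ins, and the stub bodies only mimic the shape of the real phonemizer and `session.run()` calls.

```python
from concurrent.futures import ThreadPoolExecutor


def phonemize(chunk):
    # Hypothetical stand-in for the eSpeak-backed phonemizer.
    # In the real code this step is not thread-safe, so it is only
    # ever called from the calling thread.
    return [ord(c) for c in chunk]


def run_inference(tokens):
    # Hypothetical stand-in for session.run(...) on a shared
    # ONNX Runtime session.
    return [t / 255.0 for t in tokens]


def generate_chunks(chunks, max_workers=None):
    # Step 1: phonemize every chunk sequentially.
    token_lists = [phonemize(c) for c in chunks]

    # Single-chunk fast path: skip the executor entirely.
    if len(token_lists) == 1:
        return [run_inference(token_lists[0])]

    # Step 2: run inference concurrently. Executor.map() returns
    # results in input order, so chunk order is preserved.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_inference, token_lists))
```

Using `map()` rather than `as_completed()` keeps the outputs aligned with the input chunks, which matters when the per-chunk audio is concatenated afterwards.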

Benchmarks

On kitten-tts-mini-0.8, a 5-sentence paragraph, 12-core CPU:

Sequential (current behavior): ~37-38s
Parallel (this PR): ~8-9s (~4x speedup)
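For reference, a comparison of this kind can be set up with a small timing harness like the one below. It is only a sketch: `fake_inference` uses `time.sleep` as a hypothetical placeholder for a model call, so the timings it prints say nothing about KittenTTS itself and are not the figures reported above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_inference(chunk_id, seconds=0.05):
    # Hypothetical placeholder for one session.run() call.
    # time.sleep releases the GIL, so threads can overlap here.
    time.sleep(seconds)
    return chunk_id


def timed(fn):
    # Return (result, elapsed wall-clock seconds) for a zero-arg callable.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_sequential(n):
    return [fake_inference(i) for i in range(n)]


def run_parallel(n):
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(fake_inference, range(n)))


seq_out, seq_s = timed(lambda: run_sequential(5))
par_out, par_s = timed(lambda: run_parallel(5))
print(f"sequential {seq_s:.3f}s, parallel {par_s:.3f}s, "
      f"speedup {seq_s / par_s:.1f}x")
```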

I also benchmarked manual SessionOptions tuning (explicit thread counts, ORT_ENABLE_EXTENDED, ORT_PARALLEL execution mode) as an alternative approach — every manual override I tried was slower than ORT's auto-tuned defaults (e.g. forcing all 12 threads was 2.5x slower than the default). So this PR doesn't touch SessionOptions at all; it only changes how chunks are dispatched.

Verification

Existing test suite passes (python -m unittest discover -s tests).
No NaN/Inf, no clipping, stable RMS across 5 repeated runs of the same multi-chunk input.
Verified across multiple voices (Bella, Hugo, Kiki, Jasper) and longer inputs (up to 15 chunks).
Note: the model itself has inherent stochastic sampling (two purely sequential calls on identical input already produce different output, e.g. max abs diff ~0.2-0.3) — this is pre-existing behavior on main, unrelated to this change, and confirmed by testing main directly before making any modifications.
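The checks listed above could be scripted along these lines. This is a hedged sketch using only the standard library: `audio_stats` and `rms_spread` are hypothetical helper names, and the clipping threshold of 1.0 assumes float audio normalized to [-1, 1], which the PR does not state. In practice the same checks would more likely be written with NumPy on the generated array.

```python
import math


def audio_stats(samples, clip_level=1.0):
    # Summarize one generated waveform given as an iterable of floats.
    samples = [float(s) for s in samples]
    finite = all(math.isfinite(s) for s in samples)
    peak = max((abs(s) for s in samples), default=0.0)
    if samples and finite:
        rms = math.sqrt(math.fsum(s * s for s in samples) / len(samples))
    else:
        rms = float("nan")
    return {
        "finite": finite,              # False if any NaN/Inf is present
        "clipped": peak >= clip_level, # assumes a [-1, 1] float range
        "peak": peak,
        "rms": rms,
    }


def rms_spread(runs):
    # Relative spread of RMS across repeated runs of the same input.
    # Because the model samples stochastically, compare summary
    # statistics like this rather than exact sample values.
    values = [audio_stats(r)["rms"] for r in runs]
    return (max(values) - min(values)) / (sum(values) / len(values))
```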

Scope

Only kittentts/onnx_model.py is touched. generate_stream() is left untouched/sequential on purpose, since streaming cares about low time-to-first-chunk rather than total throughput, and parallelizing it would change that latency characteristic.
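The latency argument for keeping streaming sequential can be illustrated with a toy generator. This is not the actual generate_stream() implementation: `stream_chunks` and `run_inference` are hypothetical, and the `log` list exists only to show how much work has been done by the time the first chunk is yielded.

```python
def run_inference(chunk, log):
    # Hypothetical stand-in for synthesizing one chunk.
    log.append(chunk)
    return chunk.upper()


def stream_chunks(chunks, log):
    # Sequential streaming: each chunk is yielded as soon as its own
    # inference finishes, before any later chunk is started.
    for chunk in chunks:
        yield run_inference(chunk, log)


log = []
stream = stream_chunks(["one.", "two.", "three."], log)
first = next(stream)  # only the first chunk has been synthesized so far
```

At the point where `first` is available, `log` contains a single entry, so the first chunk does not compete with later chunks for CPU time.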

… text generate() previously ran each text chunk's phonemization and ONNX inference strictly sequentially. Phonemization (espeak) keeps shared internal state and isn't safe to call concurrently, but ONNX Runtime sessions support concurrent run() calls. Splitting these two steps lets inference across chunks run in a thread pool while keeping phonemization sequential, since it's the inference step that dominates latency (>99.9% of per-chunk time per profiling). Benchmarked on a 5-sentence paragraph (kitten-tts-mini-0.8, 12-core CPU): ~37s sequential -> ~8-9s parallel. Single-chunk text is unaffected (no executor overhead, same as before). Verified no NaN/Inf, no clipping, and stable output across repeated runs, multiple voices, and longer multi-chunk inputs (up to 15 chunks).

greptile-apps Bot reviewed Jun 28, 2026

namanomar wants to merge 1 commit into KittenML:main from namanomar:perf/onnx-inference-speedup