Support loading local model directories #137

Draft
dewana-sl wants to merge 1 commit into KittenML:main from dewana-sl:load-local-models


dewana-sl commented May 21, 2026

Summary

Adds local model-directory loading for users who have already downloaded KittenTTS model assets.

Details

  • Adds load_from_local(model_path, backend=None).
  • Allows KittenTTS("/path/to/model-dir") when the path exists locally.
  • Validates config.json, model file, and voices file before constructing the ONNX model.
  • Keeps Hugging Face imports lazy, so the local-loading path can be imported without initializing the Hugging Face download client.
  • Documents the local-directory usage in the README.
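The validation step described above can be sketched roughly as follows. This is not the code from the PR: the helper name `resolve_local_model_files` is hypothetical, and the `config.json` keys `model_file` and `voices` are an assumption about how the model and voices files are named in the config.

```python
import json
from pathlib import Path


def resolve_local_model_files(model_dir):
    """Return (model_path, voices_path), or raise if the directory is incomplete.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Not a model directory: {root}")

    config_path = root / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"Missing config.json in {root}")

    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)

    # ASSUMPTION: the config names its assets under these two keys.
    resolved = []
    for key in ("model_file", "voices"):
        name = config.get(key)
        if not name:
            raise ValueError(f"config.json has no '{key}' entry")
        path = root / name
        if not path.is_file():
            raise FileNotFoundError(f"'{key}' points to a missing file: {path}")
        resolved.append(path)

    return tuple(resolved)
```

Checking all three files before building the inference session means a half-copied snapshot fails early with a message that names the missing file, rather than with an error from inside the ONNX runtime.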

Addresses #132 and covers the local-cache/repeated-download pain behind #20.

Validation

  • python3 -m unittest -q
  • Editable install in a fresh virtualenv with declared package dependencies.
  • Import smoke for kittentts, KittenTTS, load_from_local, and normalize_text.
  • Real inference smoke from a cached local kitten-tts-nano-0.8-int8 snapshot directory.

@namanomar

Found a significant int8 quality regression specific to nano on this branch

I tested this PR locally (Windows, cp314, kitten-inference==0.1.1 from PyPI) and the native engine itself works great — all 8 voices generate correctly, speed control/streaming/generate_to_file all pass, and CPU latency is genuinely ~3-4x faster than the current ONNX path on this machine. Nice work.

While digging into the README's existing note about kitten-tts-nano-0.8-int8 quality issues, I ran a controlled fp32-vs-int8 comparison across all 3 model sizes, all 8 voices, and 80 varied sentences to quantify it: 640 fp32/int8 pairs per engine/model row in the table below, 5,120 generations total including the ONNX nano baseline. Metrics: RMS energy, sample-to-sample "roughness" (mean squared first difference, normalized by signal energy), and the fraction of spectral energy above 8 kHz. All three should stay roughly flat between fp32 and a correctly quantized int8 model.
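For anyone who wants to reproduce the measurement, here is a minimal sketch of the three metrics as defined above. It is not the script used for the comparison: the function names are hypothetical, waveforms are assumed to be mono float arrays, and the sample rate has to be supplied by the caller.

```python
import numpy as np


def rms(x):
    """Root-mean-square level of a waveform."""
    x = np.asarray(x, dtype=np.float64)
    return float(np.sqrt(np.mean(x ** 2)))


def roughness(x):
    """Mean squared first difference, normalized by mean signal energy."""
    x = np.asarray(x, dtype=np.float64)
    energy = np.mean(x ** 2)
    if energy == 0.0:
        return 0.0  # silent clip: treat roughness as zero (a convention chosen here)
    return float(np.mean(np.diff(x) ** 2) / energy)


def hf_energy_fraction(x, sample_rate, cutoff_hz=8000.0):
    """Fraction of spectral energy strictly above cutoff_hz."""
    x = np.asarray(x, dtype=np.float64)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0
    return float(power[freqs > cutoff_hz].sum() / total)


def relative_delta(int8_value, fp32_value):
    """Relative change of an int8 metric against its fp32 baseline."""
    return (int8_value - fp32_value) / fp32_value
```

Each delta is then the relative change of a metric for one sentence/voice pair, for example `relative_delta(rms(int8_audio), rms(fp32_audio))`, averaged over the 640 pairs of a row.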

| Engine / model | n | RMS delta (int8 vs fp32) | Roughness delta | HF-spectral delta |
| --- | --- | --- | --- | --- |
| ONNX (main) nano | 640 | -6.2% (σ 5.6%) | +6.2% (σ 16.1%) | +22.0% (σ 33.0%) |
| native (this PR) nano | 640 | -22.5% (σ 1.9%) | +132.6% (σ 21.7%) | +27.1% (σ 8.9%) |
| native (this PR) micro | 640 | -0.3% (σ 0.7%) | -0.1% (σ 1.7%) | +0.4% (σ 3.8%) |
| native (this PR) mini | 640 | -0.6% (σ 0.7%) | -0.7% (σ 2.0%) | +0.4% (σ 5.0%) |

Two things stand out:

  1. micro and mini int8 are statistically indistinguishable from fp32 under the native engine (deltas under 1%, tiny variance) — so the engine's int8 path is fine in general.
  2. nano int8 specifically loses ~22% of signal energy and gains ~133% more sample-to-sample roughness, with very low variance (σ ~2%) across 640 samples — that's not noise, that's a systematic, reproducible effect isolated to one model size. And it's notably worse than the same nano int8 weights running through the existing ONNX engine (which shows a much smaller, noisier ~6% effect).
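One illustrative way to read the RMS column is to compare each mean delta with its own spread, |mean| / σ. This is a reading of the reported numbers, not an analysis from the comparison itself, and the helper name is hypothetical. A ratio near or below 1 means individual samples scatter on both sides of zero. A ratio far above 1 means nearly every sample shifts in the same direction.

```python
def shift_to_spread_ratio(mean_pct, sigma_pct):
    """Size of the mean delta relative to its sample-to-sample spread."""
    return abs(mean_pct) / sigma_pct


# RMS delta (mean %, sigma %) copied from the table above.
rms_rows = {
    "ONNX (main) nano": (-6.2, 5.6),
    "native (this PR) nano": (-22.5, 1.9),
    "native (this PR) micro": (-0.3, 0.7),
    "native (this PR) mini": (-0.6, 0.7),
}

for name, (mean_pct, sigma_pct) in rms_rows.items():
    print(f"{name}: {shift_to_spread_ratio(mean_pct, sigma_pct):.1f}")

# ONNX (main) nano: 1.1
# native (this PR) nano: 11.8
# native (this PR) micro: 0.4
# native (this PR) mini: 0.9
```

Only native nano sits far outside its own spread, which is consistent with a systematic effect rather than sample noise.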

Since micro/mini/nano int8 all go through the same engine code path and the same model_inference.InferenceModel, the fact that only nano regresses this hard points at something specific to kitten_int8_15m_arch.json (or the corresponding kitten_fp32_15m_arch.json vs int8 weight layout) rather than a general int8-handling bug in the engine.

This is very likely the root cause behind the README's existing "some users have reported issues with kitten-tts-nano-0.8-int8" note — and this PR makes it measurably worse for that model.

Happy to share the full 80-sentence corpus and raw per-sample JSON if useful for debugging the nano int8 arch/weights specifically.
