Support loading local model directories #137
Found a significant int8 quality regression specific to
| Engine / model | n | RMS delta (int8 vs fp32) | Roughness delta | HF-spectral delta |
|---|---|---|---|---|
| ONNX (main) nano | 640 | -6.2% (σ 5.6%) | +6.2% (σ 16.1%) | +22.0% (σ 33.0%) |
| native (this PR) nano | 640 | -22.5% (σ 1.9%) | +132.6% (σ 21.7%) | +27.1% (σ 8.9%) |
| native (this PR) micro | 640 | -0.3% (σ 0.7%) | -0.1% (σ 1.7%) | +0.4% (σ 3.8%) |
| native (this PR) mini | 640 | -0.6% (σ 0.7%) | -0.7% (σ 2.0%) | +0.4% (σ 5.0%) |
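The exact metric definitions behind these columns are not stated here. As a rough sketch, assuming "RMS delta" is the relative change in root-mean-square amplitude and "roughness" is the mean absolute sample-to-sample difference, the per-sample deltas and their mean/σ could be computed like this (all function names are hypothetical, and the HF-spectral column is left out because its definition is unknown):

```python
import math
from statistics import mean, pstdev


def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def roughness(samples):
    """Mean absolute difference between consecutive samples.

    Assumed definition for illustration; needs at least two samples.
    """
    return sum(abs(b - a) for a, b in zip(samples, samples[1:])) / (len(samples) - 1)


def relative_delta(metric, candidate, reference):
    """Relative change of `metric` on candidate vs. reference, in percent."""
    ref = metric(reference)
    return 100.0 * (metric(candidate) - ref) / ref


def summarize(pairs, metric):
    """Mean and population std-dev of the delta over (int8, fp32) audio pairs."""
    deltas = [relative_delta(metric, int8, fp32) for int8, fp32 in pairs]
    return mean(deltas), pstdev(deltas)
```

With these definitions, an int8 output that is simply the fp32 output at half amplitude gives a delta of -50% on both metrics, so a large roughness increase alongside an RMS drop cannot be explained by a plain gain change.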
Two things stand out:
- `micro` and `mini` int8 are statistically indistinguishable from fp32 under the native engine (deltas under 1%, tiny variance), so the engine's int8 path is fine in general.
- `nano` int8 specifically loses ~22% of signal energy and gains ~133% more sample-to-sample roughness, with very low variance (σ ~2%) across 640 samples. That's not noise, that's a systematic, reproducible effect isolated to one model size. And it's notably worse than the same `nano` int8 weights running through the existing ONNX engine (which shows a much smaller, noisier ~6% effect).
Since micro/mini/nano int8 all go through the same engine code path and the same model_inference.InferenceModel, the fact that only nano regresses this hard points at something specific to kitten_int8_15m_arch.json (or the corresponding kitten_fp32_15m_arch.json vs int8 weight layout) rather than a general int8-handling bug in the engine.
This is very likely the root cause behind the README's existing "some users have reported issues with kitten-tts-nano-0.8-int8" note — and this PR makes it measurably worse for that model.
Happy to share the full 80-sentence corpus and raw per-sample JSON if useful for debugging the nano int8 arch/weights specifically.
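One possible first debugging step is a structural diff of the two arch files named above. The sketch below is a generic recursive JSON diff; it assumes nothing about the actual layout of `kitten_int8_15m_arch.json` or `kitten_fp32_15m_arch.json`, and `diff_json` / `diff_files` are hypothetical helpers, not part of this PR.

```python
import json

MISSING = "<missing>"  # marker for a key present on only one side


def diff_json(a, b, path=""):
    """Yield (path, left, right) for every place two parsed JSON values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else str(key)
            if key not in a:
                yield child, MISSING, b[key]
            elif key not in b:
                yield child, a[key], MISSING
            else:
                yield from diff_json(a[key], b[key], child)
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        # Same-length lists are compared element by element.
        for i, (x, y) in enumerate(zip(a, b)):
            yield from diff_json(x, y, f"{path}[{i}]")
    elif a != b:
        # Scalars, mismatched types, or lists of different length.
        yield path, a, b


def diff_files(left_path, right_path):
    """Load two JSON files and return the list of differences between them."""
    with open(left_path, encoding="utf-8") as f:
        left = json.load(f)
    with open(right_path, encoding="utf-8") as f:
        right = json.load(f)
    return list(diff_json(left, right))
```

Calling `diff_files("kitten_fp32_15m_arch.json", "kitten_int8_15m_arch.json")` would list every differing path; entries that differ for `nano` but not for the corresponding `micro`/`mini` files would be the ones worth a closer look.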
Summary
Adds local model-directory loading for users who have already downloaded KittenTTS model assets.
Details
- Adds `load_from_local(model_path, backend=None)`.
- Supports `KittenTTS("/path/to/model-dir")` when the path exists locally.
- Validates `config.json`, model file, and voices file before constructing the ONNX model.

Addresses #132 and covers the local-cache/repeated-download pain behind #20.
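For illustration, the pre-construction check could look roughly like the sketch below. This is not the code from this PR: `check_model_dir` is a hypothetical helper, and the `"model_file"` and `"voices"` config keys are assumptions about how `config.json` names its assets.

```python
import json
from pathlib import Path


def check_model_dir(model_dir):
    """Return (config, model_path, voices_path), raising if an asset is missing.

    Hypothetical helper: the "model_file" and "voices" config keys are
    assumptions for illustration, not confirmed by this PR.
    """
    root = Path(model_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")

    config_path = root / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"missing config.json in {root}")
    config = json.loads(config_path.read_text(encoding="utf-8"))

    resolved = []
    for key in ("model_file", "voices"):
        name = config.get(key)
        if not name:
            raise ValueError(f"config.json has no '{key}' entry")
        asset = root / name
        if not asset.is_file():
            raise FileNotFoundError(f"{key} not found: {asset}")
        resolved.append(asset)

    model_path, voices_path = resolved
    return config, model_path, voices_path
```

Failing before the model is constructed means a missing or misnamed file surfaces as a clear error that names the asset, rather than as a lower-level load failure.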
Validation
- `python3 -m unittest -q`
- `kittentts`, `KittenTTS`, `load_from_local`, and `normalize_text`.
- `kitten-tts-nano-0.8-int8` snapshot directory.