feat: TTS humanization pipeline, rating system, and STT upgrade by Reuz93 · Pull Request #414 · jamiepine/voicebox

Reuz93 · 2026-04-15T08:30:24Z

Summary

STT upgrade: Default Whisper model upgraded to large-v3-mlx (Apple Silicon) / large-v3 (PyTorch) for better Spanish transcription. Fixed missing preprocessor_config.json with fallback to turbo processor.
Paralinguistic tags: Added sniff, shush, whimper, scream, whisper to the tag router PARA_TAGS set for richer expressiveness in TTS output.
Generation rating system: Thumbs up/down on history rows. Rating + sampling params stored per generation. GET /profiles/{id}/suggested-params returns averaged best params after 3+ high-rated generations.
History params visibility: All 5 sampling params (temperature, top_k, top_p, repetition_penalty, speed) shown in badge popover per history row. "Reuse" button applies text + params back to the generation form.
Advanced panel: Added Top-K, Top-P, and Rep. Penalty sliders to the FloatingGenerateBox Advanced popover (they were missing, so those fields were never saved).
TTS humanization utilities: New modules — breath_injection, hybrid_generate, tag_router, text_preprocess — form the backbone of the humanization pipeline.
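
As a rough illustration of the suggested-params behaviour described in this summary (a plain average once at least three generations are rated highly), here is a minimal sketch. The record layout, the `rating == 1` encoding for thumbs up, and the function name `suggest_params` are assumptions for illustration, not the PR's actual code.

```python
from statistics import mean
from typing import Optional

# The five sampling params named in the summary.
PARAM_NAMES = ("temperature", "top_k", "top_p", "repetition_penalty", "speed")
MIN_RATED = 3  # "3+ high-rated generations"


def suggest_params(generations: list[dict]) -> Optional[dict]:
    """Average the params of thumbs-up generations, or return None if too few."""
    # Assumption: a thumbs up is stored as rating == 1.
    liked = [g for g in generations if g.get("rating") == 1]
    if len(liked) < MIN_RATED:
        return None
    suggestion = {}
    for name in PARAM_NAMES:
        # Skip generations that did not record this param.
        values = [g[name] for g in liked if g.get(name) is not None]
        if values:
            suggestion[name] = mean(values)
    return suggestion
```

With fewer than three thumbs-up rows the function returns `None`, which a caller could treat as "no suggestion yet".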

Test plan

Record or upload a voice sample and confirm STT transcribes Spanish speech correctly using large-v3-mlx on Apple Silicon
Verify fallback to turbo processor when preprocessor_config.json is absent
Generate TTS with text containing paralinguistic tags ([sniff], [whimper], [scream], [whisper], [shush]) and confirm they route correctly
Rate several generations thumbs up; confirm GET /profiles/{id}/suggested-params returns averaged params after 3+ ratings
Open a history row popover and verify all 5 sampling params display correctly
Click "Reuse" on a history row and confirm text + params populate the generation form
Open the Advanced panel in FloatingGenerateBox and confirm Top-K, Top-P, and Rep. Penalty sliders are present and their values are saved on generation
Run a generation end-to-end and confirm no regressions in audio output quality
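
The tag-routing step in the test plan can be pictured with a small parser. This is a sketch only: `PARA_TAGS` here holds just the five tags named in this PR (the real set presumably contains more), and `split_tags` is a hypothetical helper, not the `tag_router` module's actual API.

```python
import re

# Only the five tags this PR adds; the real PARA_TAGS set is likely larger.
PARA_TAGS = {"sniff", "shush", "whimper", "scream", "whisper"}
_TAG_RE = re.compile(r"\[([a-z_]+)\]")


def split_tags(text: str) -> list[tuple[str, str]]:
    """Split text into ("text", ...) and ("tag", ...) segments.

    Bracketed words that are not in PARA_TAGS are left inside the text.
    """
    segments = []
    pos = 0
    for match in _TAG_RE.finditer(text):
        if match.group(1) not in PARA_TAGS:
            continue  # unknown tag: keep it as ordinary text
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("text", before))
        segments.append(("tag", match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments
```

A router could then send `("tag", ...)` segments to an engine that supports paralinguistic sounds and `("text", ...)` segments to the default engine.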

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Advanced generation panel with sampling controls (temperature, top_k/top_p, repetition_penalty, speed, jitter, humanize options) and “Proven params” suggestions per voice
- Rate generations (thumbs up/down) and reuse parameters from history
- Guided 40s voice recording UI with auto-filled script and improved upload flow
- Breath injection and micro-timing (jitter) for more natural audio
- New audio effects and presets; expanded speech-recognition models
Performance
- Automatic idle-model unloading to reduce memory usage
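
One way idle-model unloading could work is to record a last-used timestamp per model and periodically unload anything that has sat unused past a timeout. The class below is a hypothetical sketch under that assumption; the name `IdleModelTracker`, its methods, and the injectable clock are illustrative and do not come from the PR.

```python
import time
from typing import Callable


class IdleModelTracker:
    """Track last use per model and unload models idle for too long."""

    def __init__(self, idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.idle_seconds = idle_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_used: dict[str, float] = {}
        self._unloaders: dict[str, Callable[[], None]] = {}

    def touch(self, name: str, unload: Callable[[], None]) -> None:
        """Mark a model as just used and remember how to unload it."""
        self._last_used[name] = self._clock()
        self._unloaders[name] = unload

    def unload_idle(self) -> list[str]:
        """Unload every model idle for at least idle_seconds; return their names."""
        now = self._clock()
        expired = [name for name, used in self._last_used.items()
                   if now - used >= self.idle_seconds]
        for name in expired:
            self._unloaders.pop(name)()
            del self._last_used[name]
        return expired
```

A background task started at application startup could call `unload_idle()` on a fixed interval.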

- Upgrade default Whisper model to large-v3-mlx (Apple Silicon) / large-v3 (PyTorch) for better Spanish transcription; fix missing preprocessor_config.json with fallback to turbo processor
- Add paralinguistic tags (sniff, shush, whimper, scream, whisper) to tag router PARA_TAGS set
- Add thumbs up/down rating system on history rows; rating + sampling params stored per generation; GET /profiles/{id}/suggested-params returns averaged best params after 3+ high-rated generations
- Show all 5 sampling params (temperature, top_k, top_p, repetition_penalty, speed) in history row badge popover with Reuse button that restores text + params to generation form
- Add Top-K, Top-P, Rep. Penalty sliders to FloatingGenerateBox Advanced popover so those fields are properly saved
- Add breath_injection, hybrid_generate, tag_router, text_preprocess utility modules for TTS humanization pipeline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-04-15T08:30:44Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d5872451-045d-46ba-94b8-639f0af933df

📥 Commits

Reviewing files that changed from the base of the PR and between f728150 and 5c25994.

📒 Files selected for processing (9)

app/src/components/Generation/FloatingGenerateBox.tsx
app/src/components/History/HistoryTable.tsx
app/src/lib/api/types.ts
backend/database/migrations.py
backend/database/models.py
backend/models.py
backend/routes/generations.py
backend/routes/profiles.py
backend/services/history.py

📝 Walkthrough

Advanced generation controls, sampling and humanization options, breath/jitter audio shaping, history/profile-based parameter suggestions and reuse, 40s recording guide, new backend hybrid multi-engine flow, idle model unloading, new DB fields/endpoints for ratings and suggested params, and expanded effects/preset registry.

Changes

Cohort / File(s)	Summary
Frontend Generation UI `app/src/components/Generation/FloatingGenerateBox.tsx`, `app/src/components/Generation/GenerationForm.tsx`	Added advanced settings panel (temperature, top_k, top_p, repetition_penalty, speed, inject_breaths, jitter_ms, humanize_text/intensity), suggested params fetch/apply, _reuse preset handling, and engine-specific input adjustments.
Frontend History & Store `app/src/components/History/HistoryTable.tsx`, `app/src/stores/generationStore.ts`	Added ParamsBadge popover, rating actions, "reuse params" action that populates `generationStore.reuseParams`, and new `ReuseParams` store typing.
Frontend Voice Recording / Profiles `app/src/components/VoiceProfiles/AudioSampleRecording.tsx`, `app/src/components/VoiceProfiles/ProfileForm.tsx`, `app/src/components/VoiceProfiles/SampleUpload.tsx`	Recording guide UI, auto-advance/scroll lines, increased recording guidance to 40s, auto-fill referenceText from `SCRIPT_LINES`, and removal of explicit transcription button/props in some flows.
Frontend Hooks & API types `app/src/lib/hooks/useGenerationForm.ts`, `app/src/lib/hooks/useAudioRecording.ts`, `app/src/lib/api/client.ts`, `app/src/lib/api/types.ts`	Extended form schema and hook to accept sampling/humanization fields; increased recording maxDuration default; added `rateGeneration` and `getSuggestedParams` client methods; expanded request/response types and `WhisperModelSize`.
Backend Routes & Services `backend/routes/generations.py`, `backend/routes/profiles.py`, `backend/services/generation.py`, `backend/services/history.py`	Persist and propagate sampling/jitter/humanize/inject_breaths fields through generation pipeline; added PATCH `/generations/{id}/rating`; added GET `/profiles/{id}/suggested-params` computing decayed averages; `run_generation` updated to accept new params and route hybrid generation.
Backend Models & DB `backend/models.py`, `backend/database/models.py`, `backend/database/migrations.py`	Added persisted sampling and humanization fields plus `rating` and `jitter_ms` to DB/model layers; migrations to add new columns.
Backend Backends & Lifecycle `backend/backends/__init__.py`, `backend/backends/chatterbox_turbo_backend.py`, `backend/backends/mlx_backend.py`, `backend/backends/pytorch_backend.py`, `backend/app.py`	TTS backend APIs now accept `sampling_params`; MLX/backends improved unload/cache clearing and processor fallback; default Whisper sizes updated; added idle model tracking and startup background unload loop.
Backend Audio/Effects Utilities `backend/utils/audio.py`, `backend/utils/chunked_tts.py`, `backend/utils/effects.py`	Added warm-up trimming, increased reference audio max to 40s, `jitter_ms` support in concatenation, and new effects (distortion, clipping, noise_gate, limiter) plus new presets.
Backend Text & Audio Helpers `backend/utils/text_preprocess.py`, `backend/utils/breath_injection.py`, `backend/utils/tag_router.py`, `backend/utils/hybrid_generate.py`	Added Ollama-based disfluency injector, breath-injection synthesis, paralinguistic tag parser, and hybrid multi-engine generation routing.
Backend Transcription `backend/routes/transcription.py`	Improved temp-file extension derivation from MIME type with fallback and refined error messages.
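
The transcription row above mentions deriving the temp-file extension from the MIME type with a fallback. A minimal sketch of that idea follows; the mapping table, the `.wav` default, and the helper name `temp_suffix` are assumptions for illustration rather than the contents of `backend/routes/transcription.py`.

```python
import os
from typing import Optional

# Hypothetical mapping; the real route may cover different MIME types.
_MIME_TO_EXT = {
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/mpeg": ".mp3",
    "audio/webm": ".webm",
    "audio/ogg": ".ogg",
    "audio/mp4": ".m4a",
}


def temp_suffix(content_type: Optional[str],
                filename: Optional[str] = None,
                default: str = ".wav") -> str:
    """Pick a temp-file suffix from the MIME type, then the filename, then a default."""
    if content_type:
        # Drop parameters such as ";codecs=opus" before the lookup.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in _MIME_TO_EXT:
            return _MIME_TO_EXT[mime]
    if filename:
        ext = os.path.splitext(filename)[1].lower()
        if ext:
            return ext
    return default
```

The result can be passed as the `suffix` argument of `tempfile.NamedTemporaryFile`, so that downstream audio decoders see a meaningful extension.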

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant UI as Generation UI
    participant API as API Server
    participant DB as Database
    participant TTS as TTS Engine

    User->>UI: Submit text + sampling/humanize params
    UI->>API: POST /generate {text, engine, sampling_params, ...}
    API->>DB: create_generation(record with params)
    API->>API: build_sampling_params()
    API->>TTS: generate(text, voice_prompt, sampling_params, jitter_ms)
    TTS->>TTS: apply sampling overrides / generate audio
    TTS-->>API: audio + metadata
    API->>API: optionally inject_breaths(), trim_warmup(), apply_jitter()
    API->>DB: update generation record with outputs
    API-->>UI: return audio + generation id
    User->>UI: Click rating button
    UI->>API: PATCH /generations/{id}/rating {rating}
    API->>DB: update rating
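
The diagram names a `build_sampling_params()` step without showing it. One plausible reading is that it collects only the sampling fields the user actually set, so engines fall back to their own defaults for the rest. The sketch below assumes exactly that; the signature and the dict-based request are hypothetical.

```python
from typing import Any, Optional

# The five sampling params listed in the PR summary.
SAMPLING_KEYS = ("temperature", "top_k", "top_p", "repetition_penalty", "speed")


def build_sampling_params(request: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Collect the sampling overrides present in a request.

    Returns None when nothing was overridden, so a backend can use its defaults.
    """
    params = {key: request[key] for key in SAMPLING_KEYS
              if request.get(key) is not None}
    return params or None
```

For example, a request that sets only `temperature` yields a one-entry dict, and a request with no sampling fields yields `None`.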

sequenceDiagram
    actor User
    participant History as History UI
    participant Store as generationStore
    participant UI as Generation UI

    User->>History: Click "Reuse params" on a row
    History->>Store: setReuseParams(ReuseParams)
    Store-->>UI: reuseParams updated
    UI->>UI: populate form fields (engine, language, sampling, effects=_reuse)
    UI->>API: GET /profiles/{profileId}/suggested-params
    API->>DB: query high-rated generations, compute decayed averages
    API-->>UI: SuggestedParams
    UI->>UI: show banner "Proven params" -> Apply -> update sliders

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: chunked TTS generation for long text (engine-agnostic) #266 — Related changes to chunked TTS utilities (jitter_ms, sampling_params) that overlap backend generation flow.
feat: Kokoro 82M TTS engine + voice profile type system #325 — Overlaps frontend generation UI and effects/preset handling including preset sourcing and reuse logic.
fix: GUI startup with external server + data refresh on server switch #319 — Related changes to FloatingGenerateBox effects-preset handling and reuse sentinel behavior.

Suggested reviewers

rhmod09-dev

"🐰 I fiddled with sliders, nudged breaths in the dark,
Reused an old sample, sparked a fresh lark.
Forty seconds of cadence, presets that play nice—
I hopped through the pipeline and sprinkled some spice. 🥕🎶"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 72.22% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the three main change categories: TTS humanization pipeline, rating system, and STT upgrade, which align with the substantial changes across the codebase.

- Added humanize_text, humanize_intensity, jitter_ms fields to generation history (DB migration, models, API types, UI badge)
- Rating system: weighted average with exponential decay, no minimum threshold, displays "Based on N ratings"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
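
A weighted average with exponential decay can be sketched as follows. The commit message does not say what the decay is applied to, so this version assumes values are ordered newest first and that each older rated generation counts `decay` times as much as the one after it; the function name and the default of 0.8 are illustrative.

```python
def decayed_average(values: list[float], decay: float = 0.8) -> float:
    """Weighted mean where the i-th value (newest first) has weight decay**i."""
    if not values:
        raise ValueError("need at least one rated generation")
    weights = [decay ** i for i in range(len(values))]
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return weighted_sum / sum(weights)
```

Because there is no minimum threshold, a single rated generation simply returns its own value, and the number of inputs can be reported alongside the result as "Based on N ratings".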

Reuz93 closed this Apr 15, 2026

Reuz93 wants to merge 2 commits into jamiepine:main from Reuz93:main
