feat: TTS humanization pipeline, rating system, and STT upgrade #414
Reuz93 wants to merge 2 commits into jamiepine:main from
- Upgrade default Whisper model to large-v3-mlx (Apple Silicon) / large-v3 (PyTorch) for better Spanish transcription; fix missing preprocessor_config.json with fallback to turbo processor
- Add paralinguistic tags (sniff, shush, whimper, scream, whisper) to tag router PARA_TAGS set
- Add thumbs up/down rating system on history rows; rating + sampling params stored per generation; GET /profiles/{id}/suggested-params returns averaged best params after 3+ high-rated generations
- Show all 5 sampling params (temperature, top_k, top_p, repetition_penalty, speed) in history row badge popover with Reuse button that restores text + params to generation form
- Add Top-K, Top-P, Rep. Penalty sliders to FloatingGenerateBox Advanced popover so those fields are properly saved
- Add breath_injection, hybrid_generate, tag_router, text_preprocess utility modules for TTS humanization pipeline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
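As a rough illustration of the tag routing described in this commit, the sketch below splits input text into plain-text and paralinguistic-tag segments. It assumes tags are written in square brackets (as in the test plan) and that `PARA_TAGS` holds only the five tags added here; `split_tags` and the segment format are hypothetical and not taken from the actual `tag_router` module.

```python
import re

# Assumption: only the five tags added in this PR; the real PARA_TAGS
# set in tag_router may contain more entries.
PARA_TAGS = {"sniff", "shush", "whimper", "scream", "whisper"}

_TAG_RE = re.compile(r"\[([a-z_]+)\]")


def split_tags(text):
    """Split text into ("text", str) and ("tag", name) segments.

    Bracketed words that are not in PARA_TAGS stay inside the
    surrounding plain text. (Hypothetical helper, for illustration.)
    """
    segments = []
    pos = 0
    for match in _TAG_RE.finditer(text):
        if match.group(1) not in PARA_TAGS:
            continue  # unknown tag: leave it in the plain text
        plain = text[pos:match.start()].strip()
        if plain:
            segments.append(("text", plain))
        segments.append(("tag", match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments
```

A caller could then send `("tag", ...)` segments to an engine that supports paralinguistic tags and `("text", ...)` segments to the default engine, which is one plausible reading of the hybrid flow.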
Caution: Review failed. The pull request is closed.
📝 Walkthrough
Advanced generation controls, sampling and humanization options, breath/jitter audio shaping, history/profile-based parameter suggestions and reuse, 40s recording guide, new backend hybrid multi-engine flow, idle model unloading, new DB fields/endpoints for ratings and suggested params, and expanded effects/preset registry.
Sequence Diagram(s)
sequenceDiagram
actor User
participant UI as Generation UI
participant API as API Server
participant DB as Database
participant TTS as TTS Engine
User->>UI: Submit text + sampling/humanize params
UI->>API: POST /generate {text, engine, sampling_params, ...}
API->>DB: create_generation(record with params)
API->>API: build_sampling_params()
API->>TTS: generate(text, voice_prompt, sampling_params, jitter_ms)
TTS->>TTS: apply sampling overrides / generate audio
TTS-->>API: audio + metadata
API->>API: optionally inject_breaths(), trim_warmup(), apply_jitter()
API->>DB: update generation record with outputs
API-->>UI: return audio + generation id
User->>UI: Click rating button
UI->>API: PATCH /generations/{id}/rating {rating}
API->>DB: update rating
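The `build_sampling_params()` step in the diagram above might look roughly like the following: start from defaults and apply only the overrides the client actually sent. This is a minimal sketch. The five parameter names come from the PR, but the default values and the exact signature are assumptions.

```python
from typing import Optional

# Hypothetical defaults; the PR does not state the real values.
DEFAULTS = {
    "temperature": 0.8,
    "top_k": 50,
    "top_p": 0.95,
    "repetition_penalty": 1.1,
    "speed": 1.0,
}


def build_sampling_params(overrides: Optional[dict] = None) -> dict:
    """Merge client-supplied overrides onto the defaults.

    None values are treated as "not set" so that an empty form field
    does not clobber a default. Unknown keys are rejected early.
    """
    params = dict(DEFAULTS)
    for key, value in (overrides or {}).items():
        if key not in DEFAULTS:
            raise ValueError(f"unknown sampling parameter: {key}")
        if value is not None:
            params[key] = value
    return params
```

Storing the merged result (rather than only the overrides) with each generation record would make a later "Reuse" action reproducible even if the defaults change.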
sequenceDiagram
actor User
participant History as History UI
participant Store as generationStore
participant UI as Generation UI
User->>History: Click "Reuse params" on a row
History->>Store: setReuseParams(ReuseParams)
Store-->>UI: reuseParams updated
UI->>UI: populate form fields (engine, language, sampling, effects=_reuse)
UI->>API: GET /profiles/{profileId}/suggested-params
API->>DB: query high-rated generations, compute decayed averages
API-->>UI: SuggestedParams
UI->>UI: show banner "Proven params" -> Apply -> update sliders
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: 2 passed, 1 failed (1 warning)
- Added humanize_text, humanize_intensity, jitter_ms fields to generation history (DB migration, models, API types, UI badge)
- Rating system: weighted average with exponential decay, no minimum threshold, displays "Based on N ratings"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
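One way to read "weighted average with exponential decay" is sketched below. It assumes that only thumbs-up generations contribute, that the list is ordered newest first, and that the weight halves every `half_life` positions in that list. The commit does not say whether decay is by rank or by timestamp, so `suggest_params`, `half_life`, and the record shape are all hypothetical.

```python
def suggest_params(rated, half_life=5.0):
    """Return decay-weighted average params from liked generations.

    rated: newest-first list of {"rating": int, "params": dict}.
    Returns None when there are no positive ratings; otherwise a dict
    with the averaged params and a display label. There is no minimum
    count, matching the "no minimum threshold" note.
    """
    liked = [g["params"] for g in rated if g["rating"] > 0]
    if not liked:
        return None
    # Newest entry gets weight 1.0; weight halves every half_life entries.
    weights = [0.5 ** (i / half_life) for i in range(len(liked))]
    total = sum(weights)
    averaged = {
        key: sum(w * p[key] for w, p in zip(weights, liked)) / total
        for key in liked[0]
    }
    return {"params": averaged, "label": f"Based on {len(liked)} ratings"}
```

With rank-based decay a burst of old high ratings fades as newer ones arrive, so the suggestion tracks the user's recent taste without discarding history outright.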
Summary
- Upgraded the default Whisper model to large-v3-mlx (Apple Silicon) / large-v3 (PyTorch) for better Spanish transcription. Fixed missing preprocessor_config.json with fallback to turbo processor.
- Added sniff, shush, whimper, scream, whisper to the tag router PARA_TAGS set for richer expressiveness in TTS output.
- Added a thumbs up/down rating system; GET /profiles/{id}/suggested-params returns averaged best params after 3+ high-rated generations.
- New utility modules breath_injection, hybrid_generate, tag_router, text_preprocess form the backbone of the humanization pipeline.
Test plan
- Verify the fallback to the turbo processor when preprocessor_config.json is absent
- Test the new paralinguistic tags ([sniff], [whimper], [scream], [whisper], [shush]) and confirm they route correctly
- Confirm GET /profiles/{id}/suggested-params returns averaged params after 3+ ratings
🤖 Generated with Claude Code
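The preprocessor fallback in the test plan can be exercised without loading any model if the decision is isolated in a small function. The sketch below only picks which source the processor should be loaded from; `pick_processor_source` and the source names passed to it are hypothetical, and the actual loading call is left out.

```python
from pathlib import Path


def pick_processor_source(model_dir, model_source, fallback_source):
    """Choose where to load the Whisper processor from.

    If the downloaded model directory ships its own
    preprocessor_config.json, use the model itself; otherwise fall
    back to another source (the turbo processor in this PR).
    """
    config = Path(model_dir) / "preprocessor_config.json"
    return model_source if config.is_file() else fallback_source
```

Keeping the check separate from the loader makes the "file absent" case testable with an empty temporary directory.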
Summary by CodeRabbit
New Features
Performance