feat: intent-driven dual-model architecture (text2music + lego) #886
Add Text2MusicTaskParams, GenerationIntent, and ModelCategory types. Add a 'mix' track type for full-song mixed audio output. Extend modelStore with category-aware getters and ensureModelForIntent(), which auto-selects the right model family based on user intent.

Closes #884 (partial)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New pipeline function for full-song generation via the text2music model:
- Creates a mix track + clip and submits a text2music task
- Polls for completion, downloads the audio, and stores it in IndexedDB
- Optionally auto-splits into stem tracks via stem separation
- Auto-ensures the correct model (text2music + LM) before generation

Closes #884 (partial)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
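The "poll for completion" step can be sketched as a generic loop with a deadline. This is a minimal sketch, not the PR's code: `pollUntilDone`, the `TaskStatus` shape, and its status strings are hypothetical stand-ins for whatever the ACE-Step task-status endpoint actually returns.

```typescript
// Hypothetical status shape; the real ACE-Step task-status response may differ.
type TaskStatus = { status: 'queued' | 'running' | 'succeeded' | 'failed'; audioUrl?: string };

interface PollOptions {
  intervalMs: number;
  timeoutMs: number;
  // Injectable so tests can avoid real timers.
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls getStatus until the task reaches a terminal state or the deadline passes.
export async function pollUntilDone(
  getStatus: () => Promise<TaskStatus>,
  { intervalMs, timeoutMs, sleep = defaultSleep, now = Date.now }: PollOptions,
): Promise<TaskStatus> {
  const deadline = now() + timeoutMs;
  for (;;) {
    const status = await getStatus();
    if (status.status === 'succeeded') return status;
    if (status.status === 'failed') throw new Error('Task failed');
    if (now() >= deadline) throw new Error(`Task did not finish within ${timeoutMs}ms`);
    await sleep(intervalMs);
  }
}
```

Injecting `sleep` and `now` keeps the loop deterministic under test; in the app both would use their defaults.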
Add an IntentSelector component that switches between Full Song (text2music) and Single Track (lego) modes within the generation panel and shows a model status indicator (ready/switching). Add FullSongForm for text2music generation with prompt, lyrics, BPM/key/duration, a split-to-stems toggle, and advanced parameters.

Closes #884 (partial)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR introduces an intent-driven generation flow that distinguishes between text2music (full-song) and lego (single-track) model families, adds a full-song text2music generation pipeline, and updates the generation UI to let users choose intent while the app auto-selects/initializes the required model(s).
Changes:
- Added new API/types for intent-driven generation (`GenerationIntent`, `ModelCategory`, `Text2MusicTaskParams`) and introduced the `'mix'` track type.
- Extended `modelStore` with category-aware helpers and `ensureModelForIntent()` to auto-select the correct model family and (attempt to) ensure LM readiness.
- Added the `generateText2Music()` pipeline and new UI components (`IntentSelector`, `FullSongForm`) integrated into `GenerationSidePanel`.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tests/unit/generationSidePanel.test.tsx | Updates assertions to match new generation panel tab markup and default intent behavior. |
| src/types/project.ts | Adds 'mix' to TrackType for full-song output track typing. |
| src/types/api.ts | Introduces Text2MusicTaskParams, GenerationIntent, and ModelCategory plus ModelEntry.category. |
| src/store/modelStore.ts | Adds intent/category model selection (ensureModelForIntent, category getters, overrides). |
| src/store/tests/modelStoreIntents.test.ts | Adds unit tests for intent mapping and category-aware model resolution behavior. |
| src/services/generationPipeline.ts | Adds generateText2Music() full-song pipeline including optional stem splitting. |
| src/services/aceStepApi.ts | Adds inferModelCategory() and expands task submission types to include text2music. |
| src/services/tests/inferModelCategory.test.ts | Adds tests for category inference behavior and precedence rules. |
| src/constants/tracks.ts | Adds catalog entry for new 'mix' track type. |
| src/constants/trackHeight.ts | Adds default lane height for 'mix' in auto preset. |
| src/components/generation/IntentSelector.tsx | New intent toggle UI with model status indicator. |
| src/components/generation/GenerationSidePanel.tsx | Integrates intent selection and routes to FullSongForm for full-song flow. |
| src/components/generation/FullSongForm.tsx | Adds a dedicated full-song (text2music) form and trigger to run the new pipeline. |
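The aceStepApi row mentions `inferModelCategory()` with "precedence rules" but does not spell them out. A minimal sketch of one plausible precedence, assuming an explicit `category` from `/v1/model_inventory` always wins and a name check is only a fallback; the `'lego'` substring heuristic, the `text2music` default, and the `ModelEntryLike` type are hypothetical, not the PR's implementation.

```typescript
type ModelCategory = 'text2music' | 'lego';

// Hypothetical subset of ModelEntry; only the fields this sketch needs.
interface ModelEntryLike {
  name: string;
  category?: string | null;
}

// 1. A valid explicit category from the backend wins.
// 2. Otherwise guess from the model name (assumed heuristic), defaulting to text2music.
export function inferModelCategory(entry: ModelEntryLike): ModelCategory {
  if (entry.category === 'text2music' || entry.category === 'lego') {
    return entry.category;
  }
  return entry.name.toLowerCase().includes('lego') ? 'lego' : 'text2music';
}
```

Treating an unrecognized `category` value like a missing one keeps older or misconfigured backends usable instead of failing hard.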
```tsx
<label className="block text-[10px] text-zinc-500">Inference Steps</label>
<input
  type="number"
  value={inferenceSteps}
  onChange={(e) => setInferenceSteps(Number(e.target.value))}
  min={10}
  max={200}
  className="mt-0.5 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-xs focus:border-indigo-500 focus:outline-none"
  disabled={isDisabled}
/>
```
Advanced numeric inputs (inferenceSteps/guidanceScale/shift) use Number(e.target.value) and can also become NaN if cleared. Consider guarding against NaN and ensuring only finite values are sent (or restoring the previous valid value on blur).
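One way to implement the "restore the previous valid value on blur" option is to keep the raw input text as a draft and commit a number only when the field loses focus. A minimal sketch written as a pure function so it can be tested without React; `commitNumericDraft` is a hypothetical helper name.

```typescript
// Returns the value to commit when a numeric input blurs.
// An empty or non-finite draft falls back to the last valid value;
// a finite draft is clamped into [min, max].
export function commitNumericDraft(
  draft: string,
  lastValid: number,
  min: number,
  max: number,
): number {
  const trimmed = draft.trim();
  if (trimmed === '') return lastValid;
  const parsed = Number(trimmed);
  if (!Number.isFinite(parsed)) return lastValid;
  return Math.min(Math.max(parsed, min), max);
}
```

In the form, `onChange` would update only a string draft, and `onBlur` would call something like `setInferenceSteps(commitNumericDraft(draft, inferenceSteps, 10, 200))`, using the input's own min/max. Because clamping waits until blur, intermediate keystrokes such as typing "1" on the way to "150" are not rewritten.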
```ts
// If no specific category required (cover/repaint), just check LM
if (requiredCategory === null) {
  // Cover/repaint work with either model — no switching needed
```
The comment says cover/repaint should "just check LM", but this branch returns immediately and does not perform any LM-related checks. Either update the comment to match the behavior, or implement the intended LM check here so future intents don’t silently skip required initialization.
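The issue is easier to see when the intent-to-requirement mapping is written out. From this thread: full-song needs the text2music model plus an LM, and cover/repaint accept either model family (`requiredCategory === null`). The sketch below assumes that single-track needs lego without an LM. It is a minimal sketch: the intent string values, the `IntentRequirement` shape, and the `needsLm` values for single-track and cover/repaint are hypothetical, and the real `GenerationIntent` union may differ. Because `needsLm` is a separate field, any early return on `requiredCategory === null` must still honor it.

```typescript
type ModelCategory = 'text2music' | 'lego';
// Hypothetical intent values; the real GenerationIntent union may differ.
type GenerationIntent = 'full-song' | 'single-track' | 'cover' | 'repaint';

interface IntentRequirement {
  requiredCategory: ModelCategory | null; // null = either model family works
  needsLm: boolean;
}

export function requirementsForIntent(intent: GenerationIntent): IntentRequirement {
  switch (intent) {
    case 'full-song':
      return { requiredCategory: 'text2music', needsLm: true };
    case 'single-track':
      return { requiredCategory: 'lego', needsLm: false }; // assumed: no LM needed
    case 'cover':
    case 'repaint':
      // Assumed: no LM dependency today. If that changes, the null-category
      // branch must check needsLm instead of returning immediately.
      return { requiredCategory: null, needsLm: false };
    default: {
      const unreachable: never = intent;
      throw new Error(`Unknown intent: ${String(unreachable)}`);
    }
  }
}
```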
Suggested change:

```diff
-  // Cover/repaint work with either model — no switching needed
+  // Cover/repaint work with either model — no switching needed
+  // But we still need to ensure LM is initialized if required.
+  if (needsLm) {
+    const refreshedState = get();
+    const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
+    if (!lmLoaded && refreshedState.availableLmModels.length > 0) {
+      const defaultLm = refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
+      await refreshedState.switchLmModel(defaultLm);
+    }
+  }
```
```ts
// Submit — text2music doesn't need source audio, send silence as placeholder
const jobStartedAt = Date.now();
genStore.updateJob(jobId, {
  status: 'generating',
  startedAt: jobStartedAt,
  progress: 'Submitting...',
  stage: 'Submitting request',
});
store.updateClipStatus(clipId, 'generating');

const silenceBlob = generateSilenceWav(request.durationSeconds);
const releaseResp = await api.releaseLegoTask(silenceBlob, params);
```
This submits a full-length silence WAV even though the comment notes text2music doesn't need source audio. For longer durations this can add significant upload/downsample time and risk request-size limits. If the backend supports it, consider skipping the src_audio upload entirely for task_type='text2music' (e.g., adjust releaseTask/releaseLegoTask to omit the blob for text2music).
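If the backend accepts a text2music request with no file part, the request builder could branch on `task_type` and skip the upload. A minimal sketch using the standard `FormData`/`Blob` globals (browsers, Node 18+). The field names `src_audio`, `task_type`, and `audio_duration` come from this thread. `buildReleaseTaskForm` and the choice to encode each remaining param as its own string field are assumptions about how `releaseTask` serializes its payload.

```typescript
// Hypothetical param shape: task_type plus arbitrary scalar fields.
type TaskParams = { task_type: string } & Record<string, string | number | boolean | undefined>;

export function buildReleaseTaskForm(params: TaskParams, srcAudio?: Blob): FormData {
  const form = new FormData();
  // text2music generates from prompt/lyrics alone, so no placeholder upload is sent.
  if (params.task_type !== 'text2music') {
    if (!srcAudio) throw new Error(`task_type '${params.task_type}' requires src_audio`);
    form.append('src_audio', srcAudio, 'src_audio.wav');
  }
  // Remaining params go up as plain string fields (assumed encoding).
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) form.append(key, String(value));
  }
  return form;
}
```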
```diff
 export async function releaseLegoTask(
   srcAudioBlob: Blob,
-  params: LegoTaskParams | CoverTaskParams | RepaintTaskParams,
+  params: LegoTaskParams | Text2MusicTaskParams | CoverTaskParams | RepaintTaskParams,
 ): Promise<ReleaseTaskResponse> {
   return releaseTask(srcAudioBlob, params);
 }
```
This wrapper now accepts Text2MusicTaskParams in addition to lego/cover/repaint, so the name releaseLegoTask is no longer accurate and can be confusing for future callers. Consider renaming to something task-agnostic (or introducing a new wrapper) to better reflect its responsibilities.
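A low-churn way to act on this is to add a task-agnostic wrapper and keep the old name as a deprecated alias, so existing lego/cover/repaint callers keep compiling. A minimal sketch: `releaseGenerationTask` is a hypothetical name, and the stub types and stub `releaseTask` stand in for the real ones in `src/types/api.ts` and `aceStepApi`. The nullable blob reflects the separate suggestion that text2music skip the upload.

```typescript
// Stubs standing in for the real types in src/types/api.ts.
type TaskType = 'lego' | 'text2music' | 'cover' | 'repaint';
interface GenerationTaskParams { task_type: TaskType } // real code: union of the four *TaskParams
interface ReleaseTaskResponse { task_id: string }

// Stub for the existing low-level releaseTask in aceStepApi.
async function releaseTask(_src: Blob | null, params: GenerationTaskParams): Promise<ReleaseTaskResponse> {
  return { task_id: `${params.task_type}-task` };
}

/** Task-agnostic entry point: the name no longer implies lego-only usage. */
export function releaseGenerationTask(
  srcAudioBlob: Blob | null,
  params: GenerationTaskParams,
): Promise<ReleaseTaskResponse> {
  return releaseTask(srcAudioBlob, params);
}

/** @deprecated Use releaseGenerationTask; kept so existing callers compile. */
export const releaseLegoTask = releaseGenerationTask;
```

Because the alias is the same function object, the rename can land without touching call sites, and the `@deprecated` tag makes editors flag remaining uses.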
```tsx
<input
  type="number"
  value={bpm}
  onChange={(e) => setBpm(Number(e.target.value))}
  min={MIN_BPM}
```
Using Number(e.target.value) on an `<input type="number">` will produce NaN when the user clears the field, and that NaN will be sent to generateText2Music() (and ultimately to the backend). Consider treating "" as null/undefined and validating/clamping before enabling submission.
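Treating an empty field as "no value" instead of NaN, and gating submission on validity, could look like the following minimal sketch. `parseOptionalNumber` and `isValidNumberField` are hypothetical helpers. Whether an empty BPM should block submission is a product decision, so the sketch leaves that to a `required` flag.

```typescript
// "" (or whitespace) → null; anything that is not a finite number → null.
export function parseOptionalNumber(raw: string): number | null {
  if (raw.trim() === '') return null;
  const n = Number(raw);
  return Number.isFinite(n) ? n : null;
}

// null is acceptable only for optional fields; otherwise the value must be in range.
export function isValidNumberField(
  value: number | null,
  min: number,
  max: number,
  required: boolean,
): boolean {
  if (value === null) return !required;
  return value >= min && value <= max;
}
```

In the form, the input would render `value={bpm ?? ''}`, `onChange` would call `setBpm(parseOptionalNumber(e.target.value))`, and the generate button would stay disabled unless `isValidNumberField(bpm, MIN_BPM, MAX_BPM, required)` holds. That way NaN never reaches the request.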
```ts
const currentCategory = state.getActiveModelCategory();
if (currentCategory !== requiredCategory) {
  // Need to switch model
  const targetModel = state.getDefaultModelForCategory(requiredCategory);
  if (!targetModel) {
    throw new Error(`No ${requiredCategory} model available. Check backend model inventory.`);
  }
  await state.switchModel(targetModel);
}
```
ensureModelForIntent() awaits switchModel(), but switchModel() swallows init errors (it only sets modelLoadingState='error'). That means this function can continue and run generation with the wrong model loaded. Consider making switchModel/switchLmModel rethrow, or have ensureModelForIntent verify after refresh that an expected model is loaded and throw if not.
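The second option (verify after switching rather than trusting `switchModel()`) can be sketched against a minimal store interface. This illustration relies only on two behaviors the review describes: `switchModel()` resolves even when initialization fails, and the store can report the active category afterwards. `ModelStoreLike` and `ensureCategoryLoaded` are hypothetical names.

```typescript
type ModelCategory = 'text2music' | 'lego';

// Hypothetical subset of modelStore used by this sketch.
interface ModelStoreLike {
  getActiveModelCategory(): ModelCategory | null;
  getDefaultModelForCategory(category: ModelCategory): string | null;
  switchModel(modelId: string): Promise<void>; // may resolve even if init failed
}

export async function ensureCategoryLoaded(
  getState: () => ModelStoreLike,
  required: ModelCategory,
): Promise<void> {
  const state = getState();
  if (state.getActiveModelCategory() === required) return;
  const target = state.getDefaultModelForCategory(required);
  if (!target) throw new Error(`No ${required} model available. Check backend model inventory.`);
  await state.switchModel(target);
  // Re-read state: switchModel swallows errors, so confirm the switch took effect.
  if (getState().getActiveModelCategory() !== required) {
    throw new Error(`Failed to load ${required} model "${target}".`);
  }
}
```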
```ts
const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
if (!lmLoaded && refreshedState.availableLmModels.length > 0) {
  const defaultLm = refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
  await refreshedState.switchLmModel(defaultLm);
```
When needsLm is true (full-song), this silently does nothing if availableLmModels is empty, which will likely lead to a backend failure later. It also relies on switchLmModel(), which currently swallows init errors. Consider throwing a clear error when no LM models are available, and/or validating that an LM is actually loaded after attempting init.
Suggested change:

```diff
-const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
-if (!lmLoaded && refreshedState.availableLmModels.length > 0) {
-  const defaultLm = refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
-  await refreshedState.switchLmModel(defaultLm);
+if (refreshedState.availableLmModels.length === 0) {
+  throw new Error('No LM models available. Check backend LM model inventory.');
+}
+const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
+if (!lmLoaded) {
+  const defaultLm =
+    refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
+  await refreshedState.switchLmModel(defaultLm);
+  // Re-validate that an LM is actually loaded after attempting init
+  const postSwitchState = get();
+  const lmLoadedAfterSwitch = postSwitchState.availableLmModels.some((m) => m.is_loaded);
+  if (!lmLoadedAfterSwitch) {
+    throw new Error(
+      `Failed to initialize LM model "${defaultLm}". Please verify LM backend configuration.`,
+    );
+  }
```
```ts
const silenceBlob = generateSilenceWav(request.durationSeconds);
const releaseResp = await api.releaseLegoTask(silenceBlob, params);
const taskId = releaseResp.task_id;
```
generateText2Music() calls api.releaseLegoTask() with Text2MusicTaskParams. Since this API wrapper is now used for multiple task types, the name (and related log/error strings inside aceStepApi) is misleading. Consider adding a task-agnostic wrapper (e.g. releaseTask/releaseGenerationTask) or renaming releaseLegoTask to reflect broader usage.
```tsx
  value={durationSeconds}
  onChange={(e) => setDurationSeconds(Number(e.target.value))}
  min={MIN_DURATION}
  max={MAX_DURATION}
  step={1}
  className="mt-1 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-sm focus:border-indigo-500 focus:outline-none"
  disabled={isDisabled}
/>
<span className="mt-0.5 block text-[10px] text-zinc-600">{durationSeconds}s</span>
```
Same NaN risk applies to durationSeconds: clearing the input yields NaN, but the UI still renders ${durationSeconds}s and the request will submit NaN as audio_duration. Consider parsing safely ("" → default/null) and preventing submission unless duration is a finite number within [MIN_DURATION, MAX_DURATION].
Suggested change:

```diff
-  value={durationSeconds}
-  onChange={(e) => setDurationSeconds(Number(e.target.value))}
-  min={MIN_DURATION}
-  max={MAX_DURATION}
-  step={1}
-  className="mt-1 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-sm focus:border-indigo-500 focus:outline-none"
-  disabled={isDisabled}
-/>
-<span className="mt-0.5 block text-[10px] text-zinc-600">{durationSeconds}s</span>
+  value={Number.isFinite(durationSeconds) ? durationSeconds : ''}
+  onChange={(e) => {
+    const rawValue = e.target.value;
+    if (rawValue === '') {
+      // Treat empty input as reset to a safe default duration
+      setDurationSeconds(MIN_DURATION);
+      return;
+    }
+    const parsed = Number(rawValue);
+    if (Number.isNaN(parsed)) {
+      // Do not update state with an invalid number
+      return;
+    }
+    const clamped = Math.min(Math.max(parsed, MIN_DURATION), MAX_DURATION);
+    setDurationSeconds(clamped);
+  }}
+  min={MIN_DURATION}
+  max={MAX_DURATION}
+  step={1}
+  className="mt-1 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-sm focus:border-indigo-500 focus:outline-none"
+  disabled={isDisabled}
+/>
+<span className="mt-0.5 block text-[10px] text-zinc-600">
+  {Number.isFinite(durationSeconds) ? `${durationSeconds}s` : ''}
+</span>
```
1. Fix the generation side panel being clipped by the bottom dock bar by increasing the panel's bottom offset to clear the dock height.
2. Add modelDefaults.ts with per-variant (turbo/base/sft) defaults derived from the ACE-Step-1.5 backend:
   - turbo: 8 steps, no CFG, shift=3
   - base: 32 steps, CFG=7, shift=3
   - sft: 50 steps, CFG=7, shift=3
3. Wire model defaults into FullSongForm:
   - Show the active model variant badge (Turbo/Base/SFT)
   - Hide the CFG control for turbo models
   - Use the correct step range per variant
   - Add a "Reset to model defaults" button in advanced params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
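The per-variant defaults in item 2 map naturally onto a lookup table. A minimal sketch of what modelDefaults.ts might contain: only the numbers come from the commit message. The field names, encoding "no CFG" as `guidanceScale: null`, and detecting the variant from the model name (defaulting to base) are assumptions.

```typescript
export type ModelVariant = 'turbo' | 'base' | 'sft';

export interface VariantDefaults {
  inferenceSteps: number;
  guidanceScale: number | null; // null = CFG not used (control hidden in the UI)
  shift: number;
}

// Values from the commit message (derived from the ACE-Step-1.5 backend).
export const MODEL_DEFAULTS: Record<ModelVariant, VariantDefaults> = {
  turbo: { inferenceSteps: 8, guidanceScale: null, shift: 3 },
  base: { inferenceSteps: 32, guidanceScale: 7, shift: 3 },
  sft: { inferenceSteps: 50, guidanceScale: 7, shift: 3 },
};

// Hypothetical: infer the variant from a model name, defaulting to 'base'.
export function variantFromModelName(name: string): ModelVariant {
  const lower = name.toLowerCase();
  if (lower.includes('turbo')) return 'turbo';
  if (lower.includes('sft')) return 'sft';
  return 'base';
}
```

A `null` guidance scale lets the form hide the CFG control with a single check instead of special-casing the turbo variant.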
Replace the right-side sidebar with a centered floating dialog matching the EnhancePanel / ACE Studio "Inspire Me" style:
- Centered with fixed left-1/2 -translate-x-1/2 positioning
- Two-column layout: left form + right history sidebar
- Same visual language as EnhancePanel (bg-[#1e1e22], rounded-xl, shadow-2xl)
- Dynamic bottom positioning using getBottomPanelHeight() to avoid dock overlap
- Header includes the tab switcher (Generate / Multi-Track / History) inline
- max-h-[60vh] with scrollable content areas
- Scale transition on open/close instead of slide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Generate dialog is now exclusively for text2music full-song generation. Single-track (lego) generation stays in the existing per-track clip workflow.
- Remove IntentSelector, the old single-track form, and all related state/hooks
- Reduce GenerationSidePanel from ~1100 to ~500 lines
- Clean up unused imports (PromptAutocompleteTextarea, KEY_SCALES, etc.)
- Skip legacy tests that covered removed form elements (marked with TODO)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename tabs: "Generate|Multi-Track|History" → "Mix|Stems" (remove History)
- Remove the right-side History sidebar (no longer needed)
- Add a Simple/Custom sub-mode switcher inside the Mix tab
- Simple mode: short description + vocal language + instrumental → Create Sample calls the backend LM to infer caption/lyrics/BPM/key/duration
- On Create Sample success, auto-switch to Custom with pre-filled fields
- Custom mode: existing FullSongForm with initialData prop support
- Add a createSample API endpoint and CreateSampleRequest/Response types
- Shrink dialog width from 640px to 560px (no sidebar needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Increase bottom offset so the Generate dialog reliably floats above the dock bar with ~18px gap. Use dynamic maxHeight based on viewport and dock position to prevent overflow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use fixed height (capped at 580px) instead of content-driven sizing so the dialog doesn't jump when switching between Mix (less content) and Stems (more content). Content scrolls within the fixed frame. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
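Together, the last two commits describe a fixed dialog height. The height is capped at 580px, shrinks on short viewports, and leaves a ~18px gap above the dock. A minimal sketch of that arithmetic: `computeDialogLayout`, the 16px top margin, and passing the dock height in (rather than calling getBottomPanelHeight() inside) are assumptions.

```typescript
interface DialogLayout {
  bottom: number; // px offset from the viewport bottom
  height: number; // fixed px height; content scrolls inside
}

export function computeDialogLayout(
  viewportHeight: number,
  dockHeight: number,
  { maxHeight = 580, gap = 18, topMargin = 16 } = {},
): DialogLayout {
  // Float above the dock with a small gap.
  const bottom = dockHeight + gap;
  // Space between the top margin and the dock gap; never negative.
  const available = Math.max(0, viewportHeight - bottom - topMargin);
  return { bottom, height: Math.min(maxHeight, available) };
}
```

The height depends only on the viewport and dock, not on content, so switching between Mix and Stems leaves the frame and footer in place.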
- Move the action button out of individual forms into a fixed dialog footer
- Each form reports its button state (label/disabled/action) via an onFooterChange callback; the dialog renders a single Button at the bottom
- The footer stays in exactly the same position when switching Mix/Stems tabs and Simple/Custom sub-modes, so it no longer jumps
- Rename "Describe your track" → "Song Description" in Simple mode
- Remove the generate button from MultiTrackGenerateSection (now in the footer)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
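The footer pattern amounts to lifting one piece of state out of each form. A minimal sketch of the contract, written without React so it stays self-contained: `FooterState`, `createFooterHost`, and the disabled "Generate" fallback are hypothetical. In the real component the host would be React state that each form sets through `onFooterChange`.

```typescript
export interface FooterState {
  label: string;
  disabled: boolean;
  action: () => void;
}

// The dialog owns a single footer; whichever form is active reports into it.
export function createFooterHost() {
  let current: FooterState | null = null;
  return {
    onFooterChange(next: FooterState | null) {
      current = next;
    },
    // What the single footer Button renders; falls back to a disabled default.
    render(): { label: string; disabled: boolean } {
      return current
        ? { label: current.label, disabled: current.disabled }
        : { label: 'Generate', disabled: true };
    },
    click() {
      if (current && !current.disabled) current.action();
    },
  };
}
```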
Compress each track row from multi-line (select + textarea + lyrics) to a compact single line: checkbox + track select + inline description input + × remove button. Vocals tracks show an additional lyrics input. Saves ~60% vertical space, all 4 default tracks + Generate button visible without scrolling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Track name select: fixed w-[90px] (was flex-1, took too much space)
- Description input: flex-1 min-w-0 (takes all remaining space)
- "+ Add Track" → compact "+" square button with title tooltip

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace the large Multi-Track description card with an inline mode toggle
- Song description → single-line input (was 2-row textarea)
- × remove button → − button matching + style (square, bordered)
- All 4 tracks + seed + Generate button visible without scrolling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…align
- Song Description textarea: 4 rows → 3 rows
- Remove BPM/Key from Custom form (inherited from project), keep Duration only
- Split into stems: default unchecked (was checked)
- +/− buttons: align with pr-0.5 on header, compact "4/4" count

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d/thinking inline

Custom form:
- "Song Description" → "Music Caption"
- Lyrics always visible with Instrumental toggle (was collapsible)
- Duration as dropdown with Auto option (removed BPM/Key, inherited from project)
- Seed with 🎲 random button + Rand checkbox inline
- Thinking checkbox moved from Advanced to inline row
- Vocal Language selector inline next to Thinking

Stems form:
- Song description → 3 rows (was 1)
- Shuffle → 🎲 emoji button
- Added Duration dropdown next to Seed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix dice button: it was disabled when Rand was checked; it is now always clickable and auto-unchecks Rand on click
- Duration: replace dropdown with number input + Auto checkbox (more flexible, matches Gradio's approach)
- Studied Gradio UI layout patterns for reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
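The dice/Random interaction described here is a small state transition: the dice is always clickable, and clicking it pins a fresh seed by unchecking Random. A minimal sketch. The `SeedState` shape is an assumption. The 0 to 2^32 − 1 seed range is also an assumption, chosen because it is consistent with the "10-digit seeds" mentioned in the following commit (2^32 − 1 = 4294967295). `rng` is injectable for testing.

```typescript
export interface SeedState {
  seed: number;
  randomSeed: boolean; // "Rand" checkbox: pick a new seed per generation
}

const MAX_SEED = 2 ** 32 - 1; // assumed unsigned 32-bit seed range

// Dice click: always allowed. Rolls a concrete seed and turns Random off,
// so the rolled seed is the one actually sent with the next generation.
export function rollDice(rng: () => number = Math.random): SeedState {
  return { seed: Math.floor(rng() * (MAX_SEED + 1)), randomSeed: false };
}
```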
- Replace the Stems duration dropdown with number input + Auto checkbox (matches Custom form pattern)
- Widen seed input from 80/90px to 110px to show full 10-digit seeds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace messy flex-wrap with a structured 4-column grid in a card:
- Duration (number + Auto checkbox)
- Language (dropdown)
- Seed (number + 🎲)
- Toggles (Random seed, Thinking)

All labels are top-aligned with consistent input heights, contained in a subtle border card for visual grouping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move the Thinking checkbox from the parameters card to the dialog footer (next to the Generate button), making it more prominent and reducing clutter
- Parameters are now a clean 3-column grid: Duration | Language | Seed
- Seed has inline 🎲 + R(andom) checkbox
- Duration has inline A(uto) checkbox
- Thinking only shows in the footer for Custom mode (via thinkingState)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s Random seed

Custom form:
- Language selector moved to Lyrics header row (next to Instrumental)
- Parameter section simplified to inline: Duration + Auto | Seed + 🎲 + Random
- Removed card wrapper for a cleaner flat layout

Stems form:
- Added missing Random seed checkbox next to 🎲 button
- 🎲 click now also unchecks Random (same behavior as Custom)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lyrics textarea
- Remove Advanced Parameters section from Custom form (duplicates Settings > Generation Defaults)
- Lyrics textarea: 3 rows → 5 rows (uses freed space)
- Stems vocals: single-line input → 3-row textarea for lyrics
- Use project.generationDefaults for inference params instead of local model-variant state
- Remove model variant indicator (was showing Turbo/Base/SFT badge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
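With the Advanced section removed, inference parameters come from `project.generationDefaults`, and the form supplies only per-song fields. A minimal sketch of that merge. Apart from `task_type` and `audio_duration`, which appear in this PR's review thread, every field name is a hypothetical stand-in for the real `Text2MusicTaskParams` and defaults shapes. Two behaviors are also assumptions: omitting `audio_duration` when Duration is on Auto lets the backend choose, and an instrumental track is sent with empty lyrics.

```typescript
// Hypothetical shape of project.generationDefaults.
interface GenerationDefaults {
  inferenceSteps: number;
  guidanceScale: number;
  shift: number;
}

// Hypothetical per-song values from the Custom form.
interface FullSongFormValues {
  caption: string;
  lyrics: string;
  instrumental: boolean;
  durationSeconds: number | null; // null when Duration "Auto" is checked
  seed: number | null; // null when "Random" is checked
}

// Hypothetical subset of Text2MusicTaskParams.
interface Text2MusicParamsSketch {
  task_type: 'text2music';
  prompt: string;
  lyrics: string;
  audio_duration?: number;
  seed?: number;
  inference_steps: number;
  guidance_scale: number;
  shift: number;
}

export function buildText2MusicParams(
  form: FullSongFormValues,
  defaults: GenerationDefaults,
): Text2MusicParamsSketch {
  return {
    task_type: 'text2music',
    prompt: form.caption,
    // Assumed convention: instrumental tracks send no lyrics.
    lyrics: form.instrumental ? '' : form.lyrics,
    // Auto duration / random seed: omit the field entirely (assumed backend behavior).
    ...(form.durationSeconds !== null ? { audio_duration: form.durationSeconds } : {}),
    ...(form.seed !== null ? { seed: form.seed } : {}),
    inference_steps: defaults.inferenceSteps,
    guidance_scale: defaults.guidanceScale,
    shift: defaults.shift,
  };
}
```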
# Conflicts: # src/components/generation/GenerationSidePanel.tsx
Summary
- `Text2MusicTaskParams`, `GenerationIntent`, `ModelCategory` types and `'mix'` track type
- `ensureModelForIntent()`: auto-selects the right model family (text2music vs lego) and ensures the LM is loaded when needed
- `generateText2Music()` pipeline: creates mix track/clip → submits text2music task → polls → downloads audio → optionally auto-splits into stems

Backend requirements
For end-to-end operation, the backend needs:
- `category` field on `ModelEntry` in `/v1/model_inventory` (`'text2music'` or `'lego'`)
- `text2music` task_type accepted by `/release_task`

Test plan
- `npx tsc --noEmit`: 0 type errors
- `npm run build`: succeeds

Closes #884
🤖 Generated with Claude Code