Skip to content

feat: intent-driven dual-model architecture (text2music + lego)#886

Merged
ChuxiJ merged 22 commits intomainfrom
feat/issue-884
Mar 26, 2026
Merged

feat: intent-driven dual-model architecture (text2music + lego)#886
ChuxiJ merged 22 commits intomainfrom
feat/issue-884

Conversation

@ChuxiJ
Copy link

@ChuxiJ ChuxiJ commented Mar 25, 2026

Summary

  • Add Text2MusicTaskParams, GenerationIntent, ModelCategory types and 'mix' track type
  • Extend modelStore with ensureModelForIntent() — auto-selects the right model family (text2music vs lego) and ensures LM is loaded when needed
  • Add generateText2Music() pipeline: creates mix track/clip → submits text2music task → polls → downloads audio → optionally auto-splits into stems
  • Add IntentSelector component: Full Song / Single Track toggle with model status indicator
  • Add FullSongForm: dedicated form for text2music (prompt, lyrics, BPM/key/duration, split-to-stems, advanced params)

Backend requirements

For end-to-end operation, the backend needs:

  1. category field on ModelEntry in /v1/model_inventory ('text2music' or 'lego')
  2. text2music task_type accepted by /release_task

Test plan

  • 36 new unit tests pass (inferModelCategory, intentToCategory, intentNeedsLm, modelStore category getters, ensureModelForIntent)
  • All 2679 existing tests pass (0 regressions)
  • npx tsc --noEmit — 0 type errors
  • npm run build — succeeds
  • Visual verification: IntentSelector toggles correctly, FullSongForm renders all fields, no console errors

Closes #884

🤖 Generated with Claude Code

ChuxiJ and others added 3 commits March 25, 2026 22:43
Add Text2MusicTaskParams, GenerationIntent, ModelCategory types.
Add 'mix' track type for full-song mixed audio output.
Extend modelStore with category-aware getters and ensureModelForIntent()
that auto-selects the right model family based on user intent.

Closes #884 (partial)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New pipeline function for full-song generation via text2music model:
- Creates mix track + clip, submits text2music task
- Polls for completion, downloads audio, stores in IndexedDB
- Optionally auto-splits into stem tracks via stem separation
- Auto-ensures correct model (text2music + LM) before generation

Closes #884 (partial)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add IntentSelector component that switches between Full Song (text2music)
and Single Track (lego) modes within the generation panel. Shows model
status indicator (ready/switching). Add FullSongForm for text2music
generation with prompt, lyrics, BPM/key/duration, split-to-stems toggle,
and advanced parameters.

Closes #884 (partial)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 25, 2026 15:36
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an intent-driven generation flow that distinguishes between text2music (full-song) and lego (single-track) model families, adds a full-song text2music generation pipeline, and updates the generation UI to let users choose intent while the app auto-selects/initializes the required model(s).

Changes:

  • Added new API/types for intent-driven generation (GenerationIntent, ModelCategory, Text2MusicTaskParams) and introduced the 'mix' track type.
  • Extended modelStore with category-aware helpers and ensureModelForIntent() to auto-select the correct model family and (attempt to) ensure LM readiness.
  • Added generateText2Music() pipeline and new UI components (IntentSelector, FullSongForm) integrated into GenerationSidePanel.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tests/unit/generationSidePanel.test.tsx Updates assertions to match new generation panel tab markup and default intent behavior.
src/types/project.ts Adds 'mix' to TrackType for full-song output track typing.
src/types/api.ts Introduces Text2MusicTaskParams, GenerationIntent, and ModelCategory plus ModelEntry.category.
src/store/modelStore.ts Adds intent/category model selection (ensureModelForIntent, category getters, overrides).
src/store/tests/modelStoreIntents.test.ts Adds unit tests for intent mapping and category-aware model resolution behavior.
src/services/generationPipeline.ts Adds generateText2Music() full-song pipeline including optional stem splitting.
src/services/aceStepApi.ts Adds inferModelCategory() and expands task submission types to include text2music.
src/services/tests/inferModelCategory.test.ts Adds tests for category inference behavior and precedence rules.
src/constants/tracks.ts Adds catalog entry for new 'mix' track type.
src/constants/trackHeight.ts Adds default lane height for 'mix' in auto preset.
src/components/generation/IntentSelector.tsx New intent toggle UI with model status indicator.
src/components/generation/GenerationSidePanel.tsx Integrates intent selection and routes to FullSongForm for full-song flow.
src/components/generation/FullSongForm.tsx Adds a dedicated full-song (text2music) form and trigger to run the new pipeline.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +198 to +207
<label className="block text-[10px] text-zinc-500">Inference Steps</label>
<input
type="number"
value={inferenceSteps}
onChange={(e) => setInferenceSteps(Number(e.target.value))}
min={10}
max={200}
className="mt-0.5 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-xs focus:border-indigo-500 focus:outline-none"
disabled={isDisabled}
/>
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Advanced numeric inputs (inferenceSteps/guidanceScale/shift) use Number(e.target.value) and can also become NaN if cleared. Consider guarding against NaN and ensuring only finite values are sent (or restoring the previous valid value on blur).

Copilot uses AI. Check for mistakes.

// If no specific category required (cover/repaint), just check LM
if (requiredCategory === null) {
// Cover/repaint work with either model — no switching needed
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says cover/repaint should "just check LM", but this branch returns immediately and does not perform any LM-related checks. Either update the comment to match the behavior, or implement the intended LM check here so future intents don’t silently skip required initialization.

Suggested change
// Cover/repaint work with either model — no switching needed
// Cover/repaint work with either model — no switching needed
// But we still need to ensure LM is initialized if required.
if (needsLm) {
const refreshedState = get();
const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
if (!lmLoaded && refreshedState.availableLmModels.length > 0) {
const defaultLm = refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
await refreshedState.switchLmModel(defaultLm);
}
}

Copilot uses AI. Check for mistakes.
Comment on lines +2306 to +2317
// Submit — text2music doesn't need source audio, send silence as placeholder
const jobStartedAt = Date.now();
genStore.updateJob(jobId, {
status: 'generating',
startedAt: jobStartedAt,
progress: 'Submitting...',
stage: 'Submitting request',
});
store.updateClipStatus(clipId, 'generating');

const silenceBlob = generateSilenceWav(request.durationSeconds);
const releaseResp = await api.releaseLegoTask(silenceBlob, params);
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This submits a full-length silence WAV even though the comment notes text2music doesn't need source audio. For longer durations this can add significant upload/downsample time and risk request-size limits. If the backend supports it, consider skipping the src_audio upload entirely for task_type='text2music' (e.g., adjust releaseTask/releaseLegoTask to omit the blob for text2music).

Copilot uses AI. Check for mistakes.
Comment on lines 272 to 277
export async function releaseLegoTask(
srcAudioBlob: Blob,
params: LegoTaskParams | CoverTaskParams | RepaintTaskParams,
params: LegoTaskParams | Text2MusicTaskParams | CoverTaskParams | RepaintTaskParams,
): Promise<ReleaseTaskResponse> {
return releaseTask(srcAudioBlob, params);
}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wrapper now accepts Text2MusicTaskParams in addition to lego/cover/repaint, so the name releaseLegoTask is no longer accurate and can be confusing for future callers. Consider renaming to something task-agnostic (or introducing a new wrapper) to better reflect its responsibilities.

Copilot uses AI. Check for mistakes.
Comment on lines +111 to +115
<input
type="number"
value={bpm}
onChange={(e) => setBpm(Number(e.target.value))}
min={MIN_BPM}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Number(e.target.value) on a will produce NaN when the user clears the field, and that NaN will be sent to generateText2Music() (and ultimately the backend). Consider treating "" as null/undefined and validating/clamping before enabling submission.

Copilot uses AI. Check for mistakes.
Comment on lines +187 to +195
const currentCategory = state.getActiveModelCategory();
if (currentCategory !== requiredCategory) {
// Need to switch model
const targetModel = state.getDefaultModelForCategory(requiredCategory);
if (!targetModel) {
throw new Error(`No ${requiredCategory} model available. Check backend model inventory.`);
}
await state.switchModel(targetModel);
}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureModelForIntent() awaits switchModel(), but switchModel() swallows init errors (it only sets modelLoadingState='error'). That means this function can continue and run generation with the wrong model loaded. Consider making switchModel/switchLmModel rethrow, or have ensureModelForIntent verify after refresh that an expected model is loaded and throw if not.

Copilot uses AI. Check for mistakes.
Comment on lines +200 to +203
const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
if (!lmLoaded && refreshedState.availableLmModels.length > 0) {
const defaultLm = refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
await refreshedState.switchLmModel(defaultLm);
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When needsLm is true (full-song), this silently does nothing if availableLmModels is empty, which will likely lead to a backend failure later. It also relies on switchLmModel(), which currently swallows init errors. Consider throwing a clear error when no LM models are available, and/or validating that an LM is actually loaded after attempting init.

Suggested change
const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
if (!lmLoaded && refreshedState.availableLmModels.length > 0) {
const defaultLm = refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
await refreshedState.switchLmModel(defaultLm);
if (refreshedState.availableLmModels.length === 0) {
throw new Error('No LM models available. Check backend LM model inventory.');
}
const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
if (!lmLoaded) {
const defaultLm =
refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
await refreshedState.switchLmModel(defaultLm);
// Re-validate that an LM is actually loaded after attempting init
const postSwitchState = get();
const lmLoadedAfterSwitch = postSwitchState.availableLmModels.some((m) => m.is_loaded);
if (!lmLoadedAfterSwitch) {
throw new Error(
`Failed to initialize LM model "${defaultLm}". Please verify LM backend configuration.`,
);
}

Copilot uses AI. Check for mistakes.
Comment on lines +2316 to +2318
const silenceBlob = generateSilenceWav(request.durationSeconds);
const releaseResp = await api.releaseLegoTask(silenceBlob, params);
const taskId = releaseResp.task_id;
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generateText2Music() calls api.releaseLegoTask() with Text2MusicTaskParams. Since this API wrapper is now used for multiple task types, the name (and related log/error strings inside aceStepApi) is misleading. Consider adding a task-agnostic wrapper (e.g. releaseTask/releaseGenerationTask) or renaming releaseLegoTask to reflect broader usage.

Copilot uses AI. Check for mistakes.
Comment on lines +138 to +146
value={durationSeconds}
onChange={(e) => setDurationSeconds(Number(e.target.value))}
min={MIN_DURATION}
max={MAX_DURATION}
step={1}
className="mt-1 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-sm focus:border-indigo-500 focus:outline-none"
disabled={isDisabled}
/>
<span className="mt-0.5 block text-[10px] text-zinc-600">{durationSeconds}s</span>
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same NaN risk applies to durationSeconds: clearing the input yields NaN, but the UI still renders ${durationSeconds}s and the request will submit NaN as audio_duration. Consider parsing safely ("" → default/null) and preventing submission unless duration is a finite number within [MIN_DURATION, MAX_DURATION].

Suggested change
value={durationSeconds}
onChange={(e) => setDurationSeconds(Number(e.target.value))}
min={MIN_DURATION}
max={MAX_DURATION}
step={1}
className="mt-1 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-sm focus:border-indigo-500 focus:outline-none"
disabled={isDisabled}
/>
<span className="mt-0.5 block text-[10px] text-zinc-600">{durationSeconds}s</span>
value={Number.isFinite(durationSeconds) ? durationSeconds : ''}
onChange={(e) => {
const rawValue = e.target.value;
if (rawValue === '') {
// Treat empty input as reset to a safe default duration
setDurationSeconds(MIN_DURATION);
return;
}
const parsed = Number(rawValue);
if (Number.isNaN(parsed)) {
// Do not update state with an invalid number
return;
}
const clamped = Math.min(Math.max(parsed, MIN_DURATION), MAX_DURATION);
setDurationSeconds(clamped);
}}
min={MIN_DURATION}
max={MAX_DURATION}
step={1}
className="mt-1 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-sm focus:border-indigo-500 focus:outline-none"
disabled={isDisabled}
/>
<span className="mt-0.5 block text-[10px] text-zinc-600">
{Number.isFinite(durationSeconds) ? `${durationSeconds}s` : ''}
</span>

Copilot uses AI. Check for mistakes.
ChuxiJ and others added 19 commits March 25, 2026 23:47
1. Fix generation side panel being clipped by the bottom dock bar by
   increasing the panel's bottom offset to clear the dock height.

2. Add modelDefaults.ts with per-variant (turbo/base/sft) defaults
   derived from ACE-Step-1.5 backend:
   - turbo: 8 steps, no CFG, shift=3
   - base: 32 steps, CFG=7, shift=3
   - sft: 50 steps, CFG=7, shift=3

3. Wire model defaults into FullSongForm:
   - Shows active model variant badge (Turbo/Base/SFT)
   - Hides CFG control for turbo models
   - Uses correct step range per variant
   - "Reset to model defaults" button in advanced params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the right-side sidebar with a centered floating dialog matching
the EnhancePanel / ACE Studio "Inspire Me" style:

- Centered with fixed left-1/2 -translate-x-1/2 positioning
- Two-column layout: left form + right history sidebar
- Same visual language as EnhancePanel (bg-[#1e1e22], rounded-xl, shadow-2xl)
- Dynamic bottom positioning using getBottomPanelHeight() to avoid dock overlap
- Header includes tab switcher (Generate / Multi-Track / History) inline
- max-h-[60vh] with scrollable content areas
- Scale transition on open/close instead of slide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Generate dialog is now exclusively for text2music full-song generation.
Single-track (lego) generation stays in the existing per-track clip workflow.

- Remove IntentSelector, old single-track form, and all related state/hooks
- Significantly reduce GenerationSidePanel from ~1100 to ~500 lines
- Clean up unused imports (PromptAutocompleteTextarea, KEY_SCALES, etc.)
- Skip legacy tests that tested removed form elements (marked with TODO)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename tabs: "Generate|Multi-Track|History" → "Mix|Stems" (remove History)
- Remove right-side History sidebar (no longer needed)
- Add Simple/Custom sub-mode switcher inside Mix tab
- Simple mode: short description + vocal language + instrumental → Create Sample
  calls backend LM to infer caption/lyrics/BPM/key/duration
- On Create Sample success, auto-switches to Custom with pre-filled fields
- Custom mode: existing FullSongForm with initialData prop support
- Add createSample API endpoint and CreateSampleRequest/Response types
- Shrink dialog width from 640px to 560px (no sidebar needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Increase bottom offset so the Generate dialog reliably floats above
the dock bar with ~18px gap. Use dynamic maxHeight based on viewport
and dock position to prevent overflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use fixed height (capped at 580px) instead of content-driven sizing
so the dialog doesn't jump when switching between Mix (less content)
and Stems (more content). Content scrolls within the fixed frame.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move action button out of individual forms into a fixed dialog footer
- Each form reports its button state (label/disabled/action) via
  onFooterChange callback, dialog renders a single Button at the bottom
- Footer stays in exact same position when switching Mix/Stems tabs
  and Simple/Custom sub-modes — no more jumping
- Rename "Describe your track" → "Song Description" in Simple mode
- Remove generate button from MultiTrackGenerateSection (now in footer)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compress each track row from multi-line (select + textarea + lyrics)
to a compact single line: checkbox + track select + inline description
input + × remove button. Vocals tracks show an additional lyrics input.

Saves ~60% vertical space, all 4 default tracks + Generate button
visible without scrolling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Track name select: fixed w-[90px] (was flex-1, took too much space)
- Description input: flex-1 min-w-0 (takes all remaining space)
- "+ Add Track" → compact "+" square button with title tooltip

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace large Multi-Track description card with inline mode toggle
- Song description → single-line input (was 2-row textarea)
- × remove button → − button matching + style (square, bordered)
- All 4 tracks + seed + Generate button visible without scrolling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…align

- Song Description textarea: 4 rows → 3 rows
- Remove BPM/Key from Custom form (inherited from project), keep Duration only
- Split into stems: default unchecked (was checked)
- +/− buttons: align with pr-0.5 on header, compact "4/4" count

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d/thinking inline

Custom form:
- "Song Description" → "Music Caption"
- Lyrics always visible with Instrumental toggle (was collapsible)
- Duration as dropdown with Auto option (removed BPM/Key — inherited from project)
- Seed with 🎲 random button + Rand checkbox inline
- Thinking checkbox moved from Advanced to inline row
- Vocal Language selector inline next to Thinking

Stems form:
- Song description → 3 rows (was 1)
- Shuffle → 🎲 emoji button
- Added Duration dropdown next to Seed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix dice button: was disabled when Rand checked, now always clickable
  and auto-unchecks Rand on click
- Duration: replace dropdown with number input + Auto checkbox (more
  flexible, matches Gradio's approach)
- Studied Gradio UI layout patterns for reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace Stems duration dropdown with number input + Auto checkbox
  (matches Custom form pattern)
- Widen seed input from 80/90px to 110px to show full 10-digit seeds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace messy flex-wrap with a structured 4-column grid in a card:
- Duration (number + Auto checkbox)
- Language (dropdown)
- Seed (number + 🎲)
- Toggles (Random seed, Thinking)

All labels top-aligned, consistent input heights, contained in a
subtle border card for visual grouping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move Thinking checkbox from parameters card to dialog footer
  (next to Generate button) — more prominent, less clutter
- Parameters now clean 3-column grid: Duration | Language | Seed
- Seed has inline 🎲 + R(andom) checkbox
- Duration has inline A(uto) checkbox
- Thinking only shows in footer for Custom mode (via thinkingState)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s Random seed

Custom form:
- Language selector moved to Lyrics header row (next to Instrumental)
- Parameter section simplified to inline: Duration + Auto | Seed + 🎲 + Random
- Removed card wrapper — cleaner flat layout

Stems form:
- Added missing Random seed checkbox next to 🎲 button
- 🎲 click now also unchecks Random (same behavior as Custom)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lyrics textarea

- Remove Advanced Parameters section from Custom form (duplicates
  Settings > Generation Defaults)
- Lyrics textarea: 3 rows → 5 rows (uses freed space)
- Stems vocals: single-line input → 3-row textarea for lyrics
- Use project.generationDefaults for inference params instead of
  local model-variant state
- Remove model variant indicator (was showing Turbo/Base/SFT badge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	src/components/generation/GenerationSidePanel.tsx
@ChuxiJ ChuxiJ merged commit f305849 into main Mar 26, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: intent-driven dual-model architecture (text2music + lego)

2 participants