feat: intent-driven dual-model architecture (text2music + lego) by ChuxiJ · Pull Request #886 · ace-step/ACE-Step-DAW

ChuxiJ · 2026-03-25T15:36:09Z

Summary

Add Text2MusicTaskParams, GenerationIntent, ModelCategory types and 'mix' track type
Extend modelStore with ensureModelForIntent() — auto-selects the right model family (text2music vs lego) and ensures LM is loaded when needed
Add generateText2Music() pipeline: creates mix track/clip → submits text2music task → polls → downloads audio → optionally auto-splits into stems
Add IntentSelector component: Full Song / Single Track toggle with model status indicator
Add FullSongForm: dedicated form for text2music (prompt, lyrics, BPM/key/duration, split-to-stems, advanced params)
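
The intent-to-model mapping described above can be sketched as follows. This is a minimal illustration, not the PR's code: the intent names `'full-song'`, `'single-track'`, `'cover'` and `'repaint'` are assumptions, and the mapping is inferred from the review discussion (full-song needs the text2music family plus an LM, cover/repaint work with either family).

```typescript
// Hypothetical shapes; the PR's real definitions live in src/types/api.ts.
type ModelCategory = 'text2music' | 'lego';
type GenerationIntent = 'full-song' | 'single-track' | 'cover' | 'repaint';

// Maps a user intent to the model family it requires.
// null means "either family works, no switch needed".
function intentToCategory(intent: GenerationIntent): ModelCategory | null {
  switch (intent) {
    case 'full-song':
      return 'text2music';
    case 'single-track':
      return 'lego';
    case 'cover':
    case 'repaint':
      return null;
  }
}

// Assumption: only full-song generation requires a loaded LM.
function intentNeedsLm(intent: GenerationIntent): boolean {
  return intent === 'full-song';
}
```

Returning `null` rather than a default category keeps the "no switch required" case explicit at the call site.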

Backend requirements

For end-to-end operation, the backend needs:

category field on ModelEntry in /v1/model_inventory ('text2music' or 'lego')
text2music task_type accepted by /release_task
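
Until the backend reports `category`, the frontend has to infer it. The sketch below shows one possible precedence rule: an explicit `category` from the inventory wins, and a name-based guess is used only as a fallback. The name heuristic (looking for `lego` in the model name) is purely an assumption for illustration; the PR's actual `inferModelCategory()` rules are not shown on this page.

```typescript
type ModelCategory = 'text2music' | 'lego';

// Hypothetical subset of the inventory entry; only these two fields are used here.
interface ModelEntry {
  name: string;
  category?: string;
}

function inferModelCategory(model: ModelEntry): ModelCategory {
  // 1. An explicit, valid category from the backend takes precedence.
  if (model.category === 'text2music' || model.category === 'lego') {
    return model.category;
  }
  // 2. Assumed fallback: guess from the model name, defaulting to text2music.
  return model.name.toLowerCase().includes('lego') ? 'lego' : 'text2music';
}
```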

Test plan

36 new unit tests pass (inferModelCategory, intentToCategory, intentNeedsLm, modelStore category getters, ensureModelForIntent)
All 2679 existing tests pass (0 regressions)
npx tsc --noEmit — 0 type errors
npm run build — succeeds
Visual verification: IntentSelector toggles correctly, FullSongForm renders all fields, no console errors

Closes #884

🤖 Generated with Claude Code

Add Text2MusicTaskParams, GenerationIntent, ModelCategory types. Add 'mix' track type for full-song mixed audio output. Extend modelStore with category-aware getters and ensureModelForIntent() that auto-selects the right model family based on user intent.

Closes #884 (partial)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New pipeline function for full-song generation via text2music model:
- Creates mix track + clip, submits text2music task
- Polls for completion, downloads audio, stores in IndexedDB
- Optionally auto-splits into stem tracks via stem separation
- Auto-ensures correct model (text2music + LM) before generation

Closes #884 (partial)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
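
The "polls for completion" step of such a pipeline might look like the sketch below. The status values, the `TaskState` shape, and the `pollUntilDone` name are assumptions for illustration; the actual polling code in `generationPipeline.ts` is not shown on this page.

```typescript
// Hypothetical task state; the real backend response shape may differ.
type TaskStatus = 'pending' | 'running' | 'succeeded' | 'failed';
interface TaskState {
  status: TaskStatus;
  audioUrl?: string;
}

// Repeatedly calls fetchState until the task succeeds, fails, or the
// attempt budget runs out.
async function pollUntilDone(
  fetchState: () => Promise<TaskState>,
  intervalMs = 1000,
  maxAttempts = 600,
): Promise<TaskState> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await fetchState();
    if (state.status === 'succeeded') return state;
    if (state.status === 'failed') throw new Error('Generation task failed');
    // Wait before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for generation task');
}
```

Bounding the number of attempts keeps a stuck backend task from leaving the clip in a "generating" state indefinitely.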

Add IntentSelector component that switches between Full Song (text2music) and Single Track (lego) modes within the generation panel. Shows model status indicator (ready/switching). Add FullSongForm for text2music generation with prompt, lyrics, BPM/key/duration, split-to-stems toggle, and advanced parameters.

Closes #884 (partial)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR introduces an intent-driven generation flow that distinguishes between text2music (full-song) and lego (single-track) model families, adds a full-song text2music generation pipeline, and updates the generation UI to let users choose intent while the app auto-selects/initializes the required model(s).

Changes:

Added new API/types for intent-driven generation (GenerationIntent, ModelCategory, Text2MusicTaskParams) and introduced the 'mix' track type.
Extended modelStore with category-aware helpers and ensureModelForIntent() to auto-select the correct model family and (attempt to) ensure LM readiness.
Added generateText2Music() pipeline and new UI components (IntentSelector, FullSongForm) integrated into GenerationSidePanel.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.


| File | Description |
| --- | --- |
| tests/unit/generationSidePanel.test.tsx | Updates assertions to match new generation panel tab markup and default intent behavior. |
| src/types/project.ts | Adds `'mix'` to `TrackType` for full-song output track typing. |
| src/types/api.ts | Introduces `Text2MusicTaskParams`, `GenerationIntent`, and `ModelCategory` plus `ModelEntry.category`. |
| src/store/modelStore.ts | Adds intent/category model selection (`ensureModelForIntent`, category getters, overrides). |
| src/store/tests/modelStoreIntents.test.ts | Adds unit tests for intent mapping and category-aware model resolution behavior. |
| src/services/generationPipeline.ts | Adds `generateText2Music()` full-song pipeline including optional stem splitting. |
| src/services/aceStepApi.ts | Adds `inferModelCategory()` and expands task submission types to include text2music. |
| src/services/tests/inferModelCategory.test.ts | Adds tests for category inference behavior and precedence rules. |
| src/constants/tracks.ts | Adds catalog entry for new `'mix'` track type. |
| src/constants/trackHeight.ts | Adds default lane height for `'mix'` in auto preset. |
| src/components/generation/IntentSelector.tsx | New intent toggle UI with model status indicator. |
| src/components/generation/GenerationSidePanel.tsx | Integrates intent selection and routes to `FullSongForm` for full-song flow. |
| src/components/generation/FullSongForm.tsx | Adds a dedicated full-song (text2music) form and trigger to run the new pipeline. |


Copilot · 2026-03-25T15:41:39Z

src/components/generation/FullSongForm.tsx

+              <label className="block text-[10px] text-zinc-500">Inference Steps</label>
+              <input
+                type="number"
+                value={inferenceSteps}
+                onChange={(e) => setInferenceSteps(Number(e.target.value))}
+                min={10}
+                max={200}
+                className="mt-0.5 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-xs focus:border-indigo-500 focus:outline-none"
+                disabled={isDisabled}
+              />


Advanced numeric inputs (inferenceSteps/guidanceScale/shift) use Number(e.target.value) and can also become NaN if cleared. Consider guarding against NaN and ensuring only finite values are sent (or restoring the previous valid value on blur).
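
One way to implement the "restore the previous valid value" option is a small pure helper that the `onBlur` (or `onChange`) handler delegates to. This is a sketch under assumptions: the helper name `nextNumericValue` is hypothetical, and the range in the usage line simply mirrors the `min={10}` / `max={200}` attributes in the snippet above.

```typescript
// Returns the value the state should hold after the user edits a numeric field.
// Empty or non-finite input keeps the previous valid value; finite input is
// clamped into [min, max].
function nextNumericValue(raw: string, previous: number, min: number, max: number): number {
  const trimmed = raw.trim();
  if (trimmed === '') return previous;
  const parsed = Number(trimmed);
  if (!Number.isFinite(parsed)) return previous;
  return Math.min(Math.max(parsed, min), max);
}

// Example: a cleared "Inference Steps" field falls back to the previous 50.
const stepsAfterClear = nextNumericValue('', 50, 10, 200);
```

Applying the clamp on blur rather than on every keystroke avoids rewriting the field while the user is still typing a multi-digit value.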

Copilot · 2026-03-25T15:41:40Z

src/store/modelStore.ts

+
+        // If no specific category required (cover/repaint), just check LM
+        if (requiredCategory === null) {
+          // Cover/repaint work with either model — no switching needed


The comment says cover/repaint should "just check LM", but this branch returns immediately and does not perform any LM-related checks. Either update the comment to match the behavior, or implement the intended LM check here so future intents don’t silently skip required initialization.

Suggested change

-          // Cover/repaint work with either model — no switching needed
+          // Cover/repaint work with either model — no switching needed
+          // But we still need to ensure LM is initialized if required.
+          if (needsLm) {
+            const refreshedState = get();
+            const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
+            if (!lmLoaded && refreshedState.availableLmModels.length > 0) {
+              const defaultLm = refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
+              await refreshedState.switchLmModel(defaultLm);
+            }
+          }

Copilot · 2026-03-25T15:41:40Z

src/services/generationPipeline.ts

+    // Submit — text2music doesn't need source audio, send silence as placeholder
+    const jobStartedAt = Date.now();
+    genStore.updateJob(jobId, {
+      status: 'generating',
+      startedAt: jobStartedAt,
+      progress: 'Submitting...',
+      stage: 'Submitting request',
+    });
+    store.updateClipStatus(clipId, 'generating');
+
+    const silenceBlob = generateSilenceWav(request.durationSeconds);
+    const releaseResp = await api.releaseLegoTask(silenceBlob, params);


This submits a full-length silence WAV even though the comment notes text2music doesn't need source audio. For longer durations this can add significant upload/downsample time and risk request-size limits. If the backend supports it, consider skipping the src_audio upload entirely for task_type='text2music' (e.g., adjust releaseTask/releaseLegoTask to omit the blob for text2music).
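
A sketch of the "omit the blob for text2music" idea is shown below. It models the multipart body as a plain list of parts so the decision logic can be read in isolation. The `params` field name and the `buildReleaseParts` helper are hypothetical (`src_audio` is the field named in the comment above), and whether the backend accepts a request without `src_audio` is an open assumption.

```typescript
type TaskType = 'text2music' | 'lego' | 'cover' | 'repaint';

// Simplified stand-in for a multipart form field.
interface UploadPart {
  field: string;
  value: string | Uint8Array;
}

// Builds the parts for a release request, attaching source audio only for
// task types that actually consume it.
function buildReleaseParts(taskType: TaskType, paramsJson: string, srcAudio?: Uint8Array): UploadPart[] {
  const parts: UploadPart[] = [{ field: 'params', value: paramsJson }];
  if (taskType !== 'text2music') {
    if (!srcAudio) throw new Error(`${taskType} requires source audio`);
    parts.push({ field: 'src_audio', value: srcAudio });
  }
  return parts;
}
```

With this shape, `generateText2Music()` would not need to call `generateSilenceWav()` at all, so the upload size would no longer grow with the requested duration.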

Copilot · 2026-03-25T15:41:40Z

src/services/aceStepApi.ts

 export async function releaseLegoTask(
  srcAudioBlob: Blob,
-  params: LegoTaskParams | CoverTaskParams | RepaintTaskParams,
+  params: LegoTaskParams | Text2MusicTaskParams | CoverTaskParams | RepaintTaskParams,
 ): Promise<ReleaseTaskResponse> {
  return releaseTask(srcAudioBlob, params);
 }


This wrapper now accepts Text2MusicTaskParams in addition to lego/cover/repaint, so the name releaseLegoTask is no longer accurate and can be confusing for future callers. Consider renaming to something task-agnostic (or introducing a new wrapper) to better reflect its responsibilities.

Copilot · 2026-03-25T15:41:41Z

src/components/generation/FullSongForm.tsx

+          <input
+            type="number"
+            value={bpm}
+            onChange={(e) => setBpm(Number(e.target.value))}
+            min={MIN_BPM}


Using Number(e.target.value) on an <input type="number"> will produce NaN when the user clears the field, and that NaN will be sent to generateText2Music() (and ultimately the backend). Consider treating "" as null/undefined and validating/clamping before enabling submission.
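
The "treat empty as null, validate before enabling submission" suggestion could be sketched like this. Note that in JavaScript `Number('')` evaluates to `0` rather than `NaN` (it is `parseFloat('')` and `valueAsNumber` that yield `NaN`), so an unguarded cleared field may reach the backend as an out-of-range `0`; the helper below handles both cases. The function names and the BPM bounds used in the usage lines are hypothetical, since the values of `MIN_BPM` and its counterpart are not shown here.

```typescript
// Parses a raw input string; empty or non-finite input becomes null.
function parseOptionalNumber(raw: string): number | null {
  const trimmed = raw.trim();
  if (trimmed === '') return null;
  const parsed = Number(trimmed);
  return Number.isFinite(parsed) ? parsed : null;
}

// Type guard used to decide whether the submit button may be enabled.
function isInRange(value: number | null, min: number, max: number): value is number {
  return value !== null && value >= min && value <= max;
}

// Example with assumed bounds of 40..300 BPM.
const bpm = parseOptionalNumber('');
const canSubmit = isInRange(bpm, 40, 300);
```

Keeping `null` in state lets the field render as empty while the user edits it, and the range check gates submission instead of silently substituting a value.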

Copilot · 2026-03-25T15:41:41Z

src/store/modelStore.ts

+        const currentCategory = state.getActiveModelCategory();
+        if (currentCategory !== requiredCategory) {
+          // Need to switch model
+          const targetModel = state.getDefaultModelForCategory(requiredCategory);
+          if (!targetModel) {
+            throw new Error(`No ${requiredCategory} model available. Check backend model inventory.`);
+          }
+          await state.switchModel(targetModel);
+        }


ensureModelForIntent() awaits switchModel(), but switchModel() swallows init errors (it only sets modelLoadingState='error'). That means this function can continue and run generation with the wrong model loaded. Consider making switchModel/switchLmModel rethrow, or have ensureModelForIntent verify after refresh that an expected model is loaded and throw if not.
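
The "verify after switching" option can be sketched against a narrow interface, so that it works even if `switchModel()` keeps swallowing its own errors. The `ModelOps` interface and `ensureCategoryLoaded` name are hypothetical stand-ins for the store's real methods.

```typescript
type ModelCategory = 'text2music' | 'lego';

// Hypothetical slice of the model store used by this sketch.
interface ModelOps {
  getActiveCategory(): ModelCategory | null;
  getDefaultModelFor(category: ModelCategory): string | null;
  // May resolve even when initialization failed, as described above.
  switchModel(name: string): Promise<void>;
}

async function ensureCategoryLoaded(ops: ModelOps, required: ModelCategory): Promise<void> {
  if (ops.getActiveCategory() === required) return;
  const target = ops.getDefaultModelFor(required);
  if (!target) {
    throw new Error(`No ${required} model available. Check backend model inventory.`);
  }
  await ops.switchModel(target);
  // Re-read the state instead of trusting that the switch succeeded.
  if (ops.getActiveCategory() !== required) {
    throw new Error(`Failed to switch to ${required} model "${target}".`);
  }
}
```

Checking the post-switch state turns a silent failure into an error the pipeline can surface before any generation request is sent.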

Copilot · 2026-03-25T15:41:41Z

src/store/modelStore.ts

+          const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
+          if (!lmLoaded && refreshedState.availableLmModels.length > 0) {
+            const defaultLm = refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
+            await refreshedState.switchLmModel(defaultLm);


When needsLm is true (full-song), this silently does nothing if availableLmModels is empty, which will likely lead to a backend failure later. It also relies on switchLmModel(), which currently swallows init errors. Consider throwing a clear error when no LM models are available, and/or validating that an LM is actually loaded after attempting init.

Suggested change

-          const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
-          if (!lmLoaded && refreshedState.availableLmModels.length > 0) {
-            const defaultLm = refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
-            await refreshedState.switchLmModel(defaultLm);
+          if (refreshedState.availableLmModels.length === 0) {
+            throw new Error('No LM models available. Check backend LM model inventory.');
+          }
+          const lmLoaded = refreshedState.availableLmModels.some((m) => m.is_loaded);
+          if (!lmLoaded) {
+            const defaultLm =
+              refreshedState.activeLmModelId ?? refreshedState.availableLmModels[0].name;
+            await refreshedState.switchLmModel(defaultLm);
+            // Re-validate that an LM is actually loaded after attempting init
+            const postSwitchState = get();
+            const lmLoadedAfterSwitch = postSwitchState.availableLmModels.some((m) => m.is_loaded);
+            if (!lmLoadedAfterSwitch) {
+              throw new Error(
+                `Failed to initialize LM model "${defaultLm}". Please verify LM backend configuration.`,
+              );
+            }

Copilot · 2026-03-25T15:41:41Z

src/services/generationPipeline.ts

+    const silenceBlob = generateSilenceWav(request.durationSeconds);
+    const releaseResp = await api.releaseLegoTask(silenceBlob, params);
+    const taskId = releaseResp.task_id;


generateText2Music() calls api.releaseLegoTask() with Text2MusicTaskParams. Since this API wrapper is now used for multiple task types, the name (and related log/error strings inside aceStepApi) is misleading. Consider adding a task-agnostic wrapper (e.g. releaseTask/releaseGenerationTask) or renaming releaseLegoTask to reflect broader usage.

Copilot · 2026-03-25T15:41:42Z

src/components/generation/FullSongForm.tsx

+            value={durationSeconds}
+            onChange={(e) => setDurationSeconds(Number(e.target.value))}
+            min={MIN_DURATION}
+            max={MAX_DURATION}
+            step={1}
+            className="mt-1 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-sm focus:border-indigo-500 focus:outline-none"
+            disabled={isDisabled}
+          />
+          <span className="mt-0.5 block text-[10px] text-zinc-600">{durationSeconds}s</span>


Same NaN risk applies to durationSeconds: clearing the input yields NaN, but the UI still renders ${durationSeconds}s and the request will submit NaN as audio_duration. Consider parsing safely ("" → default/null) and preventing submission unless duration is a finite number within [MIN_DURATION, MAX_DURATION].

Suggested change

-            value={durationSeconds}
-            onChange={(e) => setDurationSeconds(Number(e.target.value))}
-            min={MIN_DURATION}
-            max={MAX_DURATION}
-            step={1}
-            className="mt-1 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-sm focus:border-indigo-500 focus:outline-none"
-            disabled={isDisabled}
-          />
-          <span className="mt-0.5 block text-[10px] text-zinc-600">{durationSeconds}s</span>
+            value={Number.isFinite(durationSeconds) ? durationSeconds : ''}
+            onChange={(e) => {
+              const rawValue = e.target.value;
+              if (rawValue === '') {
+                // Treat empty input as reset to a safe default duration
+                setDurationSeconds(MIN_DURATION);
+                return;
+              }
+              const parsed = Number(rawValue);
+              if (Number.isNaN(parsed)) {
+                // Do not update state with an invalid number
+                return;
+              }
+              const clamped = Math.min(Math.max(parsed, MIN_DURATION), MAX_DURATION);
+              setDurationSeconds(clamped);
+            }}
+            min={MIN_DURATION}
+            max={MAX_DURATION}
+            step={1}
+            className="mt-1 w-full rounded border border-[#444] bg-[#2a2a2a] px-2 py-1 text-sm focus:border-indigo-500 focus:outline-none"
+            disabled={isDisabled}
+          />
+          <span className="mt-0.5 block text-[10px] text-zinc-600">
+            {Number.isFinite(durationSeconds) ? `${durationSeconds}s` : ''}
+          </span>

1. Fix generation side panel being clipped by the bottom dock bar by increasing the panel's bottom offset to clear the dock height.
2. Add modelDefaults.ts with per-variant (turbo/base/sft) defaults derived from ACE-Step-1.5 backend:
   - turbo: 8 steps, no CFG, shift=3
   - base: 32 steps, CFG=7, shift=3
   - sft: 50 steps, CFG=7, shift=3
3. Wire model defaults into FullSongForm:
   - Shows active model variant badge (Turbo/Base/SFT)
   - Hides CFG control for turbo models
   - Uses correct step range per variant
   - "Reset to model defaults" button in advanced params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
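
The per-variant defaults listed in this commit could be encoded as a lookup table along the following lines. The numeric values come from the commit message; the field names, the use of `null` for "no CFG", and the `showsCfgControl` helper are assumptions, since the contents of `modelDefaults.ts` are not shown on this page.

```typescript
type ModelVariant = 'turbo' | 'base' | 'sft';

interface VariantDefaults {
  inferenceSteps: number;
  // null represents "no CFG" (assumed encoding).
  guidanceScale: number | null;
  shift: number;
}

const MODEL_DEFAULTS: Record<ModelVariant, VariantDefaults> = {
  turbo: { inferenceSteps: 8, guidanceScale: null, shift: 3 },
  base: { inferenceSteps: 32, guidanceScale: 7, shift: 3 },
  sft: { inferenceSteps: 50, guidanceScale: 7, shift: 3 },
};

// The form hides the CFG control for variants that do not use guidance.
function showsCfgControl(variant: ModelVariant): boolean {
  return MODEL_DEFAULTS[variant].guidanceScale !== null;
}
```

A "Reset to model defaults" action can then copy `MODEL_DEFAULTS[variant]` back into the form state.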

Replace the right-side sidebar with a centered floating dialog matching the EnhancePanel / ACE Studio "Inspire Me" style:
- Centered with fixed left-1/2 -translate-x-1/2 positioning
- Two-column layout: left form + right history sidebar
- Same visual language as EnhancePanel (bg-[#1e1e22], rounded-xl, shadow-2xl)
- Dynamic bottom positioning using getBottomPanelHeight() to avoid dock overlap
- Header includes tab switcher (Generate / Multi-Track / History) inline
- max-h-[60vh] with scrollable content areas
- Scale transition on open/close instead of slide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The Generate dialog is now exclusively for text2music full-song generation. Single-track (lego) generation stays in the existing per-track clip workflow.
- Remove IntentSelector, old single-track form, and all related state/hooks
- Significantly reduce GenerationSidePanel from ~1100 to ~500 lines
- Clean up unused imports (PromptAutocompleteTextarea, KEY_SCALES, etc.)
- Skip legacy tests that tested removed form elements (marked with TODO)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Rename tabs: "Generate|Multi-Track|History" → "Mix|Stems" (remove History)
- Remove right-side History sidebar (no longer needed)
- Add Simple/Custom sub-mode switcher inside Mix tab
- Simple mode: short description + vocal language + instrumental → Create Sample calls backend LM to infer caption/lyrics/BPM/key/duration
- On Create Sample success, auto-switches to Custom with pre-filled fields
- Custom mode: existing FullSongForm with initialData prop support
- Add createSample API endpoint and CreateSampleRequest/Response types
- Shrink dialog width from 640px to 560px (no sidebar needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Increase bottom offset so the Generate dialog reliably floats above the dock bar with ~18px gap. Use dynamic maxHeight based on viewport and dock position to prevent overflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use fixed height (capped at 580px) instead of content-driven sizing so the dialog doesn't jump when switching between Mix (less content) and Stems (more content). Content scrolls within the fixed frame.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
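
The positioning rules from the last few commits (sit about 18px above the dock, cap the height at 580px, never overflow the viewport) can be combined in a small pure function. This is a sketch: `computeDialogFrame` and the 16px top margin are assumptions, and in the real component the dock height would presumably come from `getBottomPanelHeight()`.

```typescript
const DOCK_GAP_PX = 18; // gap above the dock, per the commit message
const MAX_DIALOG_HEIGHT_PX = 580; // fixed-height cap, per the commit message
const TOP_MARGIN_PX = 16; // hypothetical breathing room at the top of the viewport

interface DialogFrame {
  bottom: number; // CSS bottom offset in px
  height: number; // dialog height in px
}

function computeDialogFrame(viewportHeight: number, dockHeight: number): DialogFrame {
  const bottom = dockHeight + DOCK_GAP_PX;
  // Space left between the top margin and the dialog's bottom edge.
  const available = Math.max(0, viewportHeight - bottom - TOP_MARGIN_PX);
  return { bottom, height: Math.min(MAX_DIALOG_HEIGHT_PX, available) };
}
```

For a 900px viewport and a 200px dock this gives a bottom offset of 218px and the full 580px height; on a 600px viewport the height shrinks to the 366px that remain.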

- Move action button out of individual forms into a fixed dialog footer
- Each form reports its button state (label/disabled/action) via onFooterChange callback, dialog renders a single Button at the bottom
- Footer stays in exact same position when switching Mix/Stems tabs and Simple/Custom sub-modes — no more jumping
- Rename "Describe your track" → "Song Description" in Simple mode
- Remove generate button from MultiTrackGenerateSection (now in footer)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Compress each track row from multi-line (select + textarea + lyrics) to a compact single line: checkbox + track select + inline description input + × remove button. Vocals tracks show an additional lyrics input. Saves ~60% vertical space, all 4 default tracks + Generate button visible without scrolling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Track name select: fixed w-[90px] (was flex-1, took too much space)
- Description input: flex-1 min-w-0 (takes all remaining space)
- "+ Add Track" → compact "+" square button with title tooltip

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Replace large Multi-Track description card with inline mode toggle
- Song description → single-line input (was 2-row textarea)
- × remove button → − button matching + style (square, bordered)
- All 4 tracks + seed + Generate button visible without scrolling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…align

- Song Description textarea: 4 rows → 3 rows
- Remove BPM/Key from Custom form (inherited from project), keep Duration only
- Split into stems: default unchecked (was checked)
- +/− buttons: align with pr-0.5 on header, compact "4/4" count

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…d/thinking inline

Custom form:
- "Song Description" → "Music Caption"
- Lyrics always visible with Instrumental toggle (was collapsible)
- Duration as dropdown with Auto option (removed BPM/Key — inherited from project)
- Seed with 🎲 random button + Rand checkbox inline
- Thinking checkbox moved from Advanced to inline row
- Vocal Language selector inline next to Thinking

Stems form:
- Song description → 3 rows (was 1)
- Shuffle → 🎲 emoji button
- Added Duration dropdown next to Seed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix dice button: was disabled when Rand checked, now always clickable and auto-unchecks Rand on click
- Duration: replace dropdown with number input + Auto checkbox (more flexible, matches Gradio's approach)
- Studied Gradio UI layout patterns for reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
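
The fixed dice-button behavior (always clickable, rolls a concrete seed, and unchecks Rand) can be expressed as a pure state transition. The `SeedState` shape, the `rollDice` name, and the 32-bit seed range are assumptions for illustration; the random source is injectable so the behavior can be checked deterministically.

```typescript
interface SeedState {
  seed: number;
  useRandom: boolean; // the "Rand" checkbox
}

// Assumed upper bound: 2^32 - 1 (a 10-digit value).
const MAX_SEED = 2 ** 32 - 1;

// Clicking the dice picks a concrete seed and turns off random mode,
// so the rolled value is the one actually used for generation.
function rollDice(state: SeedState, random: () => number = Math.random): SeedState {
  return {
    ...state,
    seed: Math.floor(random() * (MAX_SEED + 1)),
    useRandom: false,
  };
}
```

Unchecking Rand on click avoids the confusing case where a freshly rolled seed is displayed but then ignored because random mode is still on.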

- Replace Stems duration dropdown with number input + Auto checkbox (matches Custom form pattern)
- Widen seed input from 80/90px to 110px to show full 10-digit seeds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace messy flex-wrap with a structured 4-column grid in a card:
- Duration (number + Auto checkbox)
- Language (dropdown)
- Seed (number + 🎲)
- Toggles (Random seed, Thinking)

All labels top-aligned, consistent input heights, contained in a subtle border card for visual grouping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Move Thinking checkbox from parameters card to dialog footer (next to Generate button) — more prominent, less clutter
- Parameters now clean 3-column grid: Duration | Language | Seed
- Seed has inline 🎲 + R(andom) checkbox
- Duration has inline A(uto) checkbox
- Thinking only shows in footer for Custom mode (via thinkingState)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…s Random seed

Custom form:
- Language selector moved to Lyrics header row (next to Instrumental)
- Parameter section simplified to inline: Duration + Auto | Seed + 🎲 + Random
- Removed card wrapper — cleaner flat layout

Stems form:
- Added missing Random seed checkbox next to 🎲 button
- 🎲 click now also unchecks Random (same behavior as Custom)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…lyrics textarea

- Remove Advanced Parameters section from Custom form (duplicates Settings > Generation Defaults)
- Lyrics textarea: 3 rows → 5 rows (uses freed space)
- Stems vocals: single-line input → 3-row textarea for lyrics
- Use project.generationDefaults for inference params instead of local model-variant state
- Remove model variant indicator (was showing Turbo/Base/SFT badge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

# Conflicts: # src/components/generation/GenerationSidePanel.tsx

ChuxiJ and others added 3 commits March 25, 2026 22:43

Copilot AI review requested due to automatic review settings March 25, 2026 15:36

Copilot started reviewing on behalf of ChuxiJ March 25, 2026 15:36

Copilot AI reviewed Mar 25, 2026


ChuxiJ and others added 19 commits March 25, 2026 23:47

Merge remote-tracking branch 'origin/main' into feat/issue-884

19f7657


ChuxiJ merged commit f305849 into main Mar 26, 2026
4 checks passed

ChuxiJ merged 22 commits into main from feat/issue-884

2 participants
