Fix: Shape mismatch in extend mode causing AssertionError #373
iackov wants to merge 1 commit into ace-step:main
- Added automatic shape alignment for target_latents and x0
- Handles both shorter (padding) and longer (trimming) cases
- Fixes crash in extend mode with long audio files
- Minimal impact on audio quality (~0.05-0.15 sec)

Resolves an issue where extend mode fails with an AssertionError when the target_latents shape doesn't match the x0 shape after padding/trimming operations.
Additional Context
This PR fixes the issue reported in #374.

Full Error Details
The error occurs with the following shape mismatch:

Stack Trace Location
Impact
Without this fix, extend mode is completely unusable, preventing users from extending their generated audio files.

Closes #374
Testing Confirmation
I've tested this fix with the following scenarios:
✅ Extend to the left (negative start frame)
✅ Extend to the right (end frame > source length)
✅ Long audio near 240 sec limit
✅ Combined left + right padding
Audio Quality Impact
The padding/trimming adds or removes ~0.05-0.15 seconds (1-5 frames), which is imperceptible to users. The fix ensures stability without compromising audio quality.
output_20260128180141_0.wav
Fix: Shape mismatch in extend mode causing AssertionError
Description
This PR fixes a critical bug in `extend` mode where the `target_latents` and `x0` tensor shapes don't match after padding/trimming operations, causing the pipeline to crash with an AssertionError.

Problem
When using the extend mode (extending audio left/right), the following error occurs:
Root Cause
The shape mismatch happens due to:
- `max_infer_fame_length` (240 seconds)

These operations can create a 1-5 frame difference between `target_latents` and `x0`.

Solution
Added automatic shape alignment before the assertion check:
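For readers who want to see the idea in isolation, here is a minimal sketch of this kind of alignment. It is not the diff from this PR: the helper name `align_time_axis`, the example shapes, and the assumption that the frame (time) axis is the last dimension are all hypothetical. It uses NumPy so it runs standalone; on PyTorch tensors, `torch.nn.functional.pad` would play the role of `np.pad`.

```python
import numpy as np


def align_time_axis(target_latents: np.ndarray, x0: np.ndarray) -> np.ndarray:
    """Pad or trim target_latents along its last axis so it matches x0.

    Hypothetical helper: assumes the two arrays differ only in the
    length of the last (frame) axis.
    """
    diff = x0.shape[-1] - target_latents.shape[-1]
    if diff > 0:
        # target_latents is shorter: pad with zeros on the right of the last axis only.
        pad_width = [(0, 0)] * (target_latents.ndim - 1) + [(0, diff)]
        target_latents = np.pad(target_latents, pad_width, mode="constant")
    elif diff < 0:
        # target_latents is longer: trim the excess frames from the right.
        target_latents = target_latents[..., : x0.shape[-1]]
    return target_latents


# Hypothetical shapes that are 3 frames apart in each direction.
x0 = np.zeros((1, 8, 16, 100), dtype=np.float32)
short = np.ones((1, 8, 16, 97), dtype=np.float32)
long = np.ones((1, 8, 16, 103), dtype=np.float32)

aligned_short = align_time_axis(short, x0)
aligned_long = align_time_axis(long, x0)

# After alignment, a shape assertion of this kind no longer fails.
assert aligned_short.shape == x0.shape
assert aligned_long.shape == x0.shape
```

Padding and trimming only on the right keeps the existing frames at their original positions, so the only part of the audio affected is the last few frames.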
Logic:
- `target_latents` is shorter: pad with zeros on the right
- `target_latents` is longer: trim excess frames from the right

Impact
Before Fix:
After Fix:
Testing
Tested scenarios:
Files Changed
`acestep/pipeline_ace_step.py` - Added shape alignment logic in extend mode

Severity
🔴 CRITICAL - Without this fix, extend mode is completely broken.
Additional Notes
This fix addresses the issue reported by users when using the Upload tab with Text2Music Parameters in extend mode. The shape mismatch was causing the pipeline to fail before generating any audio.