ACE-Step 1.5: Reference audio (timbre) in ComfyUI — shared nodes and workflow

[ACE-Step_audio_to_audio.zip](https://github.com/user-attachments/files/25114630/ACE-Step_audio_to_audio.zip)


I’ve been experimenting with ACE-Step to get *something usable* working for **reference audio input**: feeding in a reference clip and generating new variations from it.
I don’t have time to keep developing this right now, so I’m sharing what I’ve put together so far in case it helps someone or anyone wants to continue from here.

Thanks 🙌

---

## Summary of the nodes I created (ACE_Step_RefAudio)

The ZIP includes several nodes **compatible with existing workflows** that expect specific node IDs, plus one auxiliary node:

### 1. **`ACEStepDCAEEncodeAudioToLatent` (Compat)**

* Takes an **AUDIO** input (reference audio), normalises it (resampled to 48 kHz, converted to stereo, limited to a time window) and passes it through ACE-Step’s internal **DCAE encoder**.
* Outputs a **LATENT** suitable for timbre conditioning, shaped **[B, 64, T]** (batch, 64 latent channels, time frames).
* Used to extract the *colour / timbre* of the reference audio (not the exact melody).
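The normalisation step described above can be sketched roughly as follows. This is not the code from the ZIP: it uses NumPy arrays as a stand-in for the torch tensors ComfyUI passes around, the function name `normalise_reference` and the 30-second window are hypothetical, and the linear-interpolation resampler is a deliberately crude placeholder for a proper one. The DCAE encoder call itself is left out because its interface is not described here.

```python
import numpy as np

TARGET_SR = 48_000    # sample rate named in the node description
MAX_SECONDS = 30.0    # hypothetical window length, not taken from the ZIP


def normalise_reference(waveform, sample_rate,
                        target_sr=TARGET_SR, max_seconds=MAX_SECONDS):
    """Return a float32 array shaped [2, n] at target_sr, at most max_seconds long."""
    wav = np.asarray(waveform, dtype=np.float32)
    if wav.ndim == 1:                 # bare mono signal -> [1, n]
        wav = wav[None, :]

    # Force stereo: duplicate a mono channel, or keep the first two channels.
    if wav.shape[0] == 1:
        wav = np.repeat(wav, 2, axis=0)
    elif wav.shape[0] > 2:
        wav = wav[:2]

    # Crude resampling by linear interpolation (assumption: good enough for a sketch).
    if sample_rate != target_sr:
        n_in = wav.shape[1]
        n_out = int(round(n_in * target_sr / sample_rate))
        src_t = np.arange(n_in) / sample_rate
        dst_t = np.arange(n_out) / target_sr
        wav = np.stack([np.interp(dst_t, src_t, ch) for ch in wav]).astype(np.float32)

    # Time window: keep only the first max_seconds of audio.
    max_len = int(max_seconds * target_sr)
    return wav[:, :max_len]
```

In a real node the result would then be batched and handed to the encoder to obtain the `[B, 64, T]` latent.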

### 2. **`ACE15_ReferenceAudioToLatent` (Compat)**

* Same functionality as the previous node, but exposed under a **different ID**, as some workflows expect this specific name.

### 3. **`ACE15_ApplyTimbreReference`**

* Injects the latent into the **CONDITIONING** under the key
  `reference_audio_timbre_latents`,
  allowing the sampler to use it as a **timbre reference** (without copying the original melody).
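For readers unfamiliar with how ComfyUI conditioning is structured, here is a minimal sketch of the injection. It assumes the usual ComfyUI conventions: a CONDITIONING is a list of `[tensor, options_dict]` pairs and a LATENT is a dict with a `"samples"` entry. The function name `apply_timbre_reference` is hypothetical, and whether the actual node stores the raw samples or the whole latent dict is an assumption on my part.

```python
TIMBRE_KEY = "reference_audio_timbre_latents"  # key named in the node description


def apply_timbre_reference(conditioning, latent):
    """Return a copy of `conditioning` with the timbre latent attached to every entry."""
    samples = latent["samples"]  # assumption: only the latent samples are stored
    out = []
    for cond_tensor, options in conditioning:
        new_options = dict(options)        # shallow copy so the input is not mutated
        new_options[TIMBRE_KEY] = samples
        out.append([cond_tensor, new_options])
    return out
```

Copying the options dict instead of editing it in place matters in ComfyUI, because the same conditioning object may be cached and reused by other branches of the graph.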

### 4. **`AudioEstimateBPMHzTuningPRO` (Compat)**

* Estimates **BPM** and **tuning (Hz)** using `librosa` if it is installed.
* If `librosa` is not available, it falls back to safe default values so the workflow doesn’t break.
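The "use `librosa` if present, otherwise fall back" pattern might look like the sketch below. The function name and the default values (120 BPM, 440 Hz) are hypothetical, since the actual defaults used by the node are not stated above. The `librosa` calls used are `beat.beat_track`, `estimate_tuning` and `tuning_to_A4`.

```python
DEFAULT_BPM = 120.0        # hypothetical fallback, not taken from the ZIP
DEFAULT_TUNING_HZ = 440.0  # hypothetical fallback (concert pitch A4)


def estimate_bpm_and_tuning(samples, sample_rate):
    """Return (bpm, a4_hz) for a mono signal, or the defaults when estimation is not possible."""
    if len(samples) == 0:
        return DEFAULT_BPM, DEFAULT_TUNING_HZ

    try:
        # Imported lazily so the node still loads when librosa is missing.
        import numpy as np
        import librosa
    except ImportError:
        return DEFAULT_BPM, DEFAULT_TUNING_HZ

    try:
        y = np.asarray(samples, dtype=np.float32)
        tempo, _beats = librosa.beat.beat_track(y=y, sr=sample_rate)
        bpm = float(np.atleast_1d(tempo)[0])   # tempo may be a scalar or a 1-element array
        tuning = librosa.estimate_tuning(y=y, sr=sample_rate)  # deviation in fractions of a bin
        a4_hz = float(librosa.tuning_to_A4(tuning))            # converted to an A4 reference in Hz
    except Exception:
        # Any analysis failure should not break the workflow.
        return DEFAULT_BPM, DEFAULT_TUNING_HZ

    if bpm <= 0:  # e.g. silent input where no beat is detected
        bpm = DEFAULT_BPM
    return bpm, a4_hz
```

Importing inside the function keeps `librosa` an optional dependency: the node registers and runs even on installs that do not have it.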

---

## Workflow

The ZIP also includes **an example workflow** that connects:

* **LoadAudio → (EncodeAudioToLatent)**
* **TextEncodeAceStepAudio1.5 → (ApplyTimbreReference)**
* The resulting conditioning then goes to the ACE-Step sampler for final generation.
