Agent speech silently dropped when interrupted before the first audio frame (resumeFalseInterruption) — port of Python #5039

### Summary

With `resumeFalseInterruption: true`, a brief user sound that arrives **after `say()` is issued but before the agent's first TTS audio frame is forwarded** pauses the speech, and `firstFrameFut` never resolves. The user hears **no audio at all**, the turn is dropped from history, and the call then sits in silence.

This is the JS counterpart of livekit/agents#5038 (Python), which was fixed by livekit/agents#5039. That fix does not appear to be ported to `agents-js` (the relevant code path is unchanged on `main`).

### Environment

| Field | Value |
|---|---|
| `@livekit/agents` | 1.4.6 (relevant path identical on 1.4.7 and current `main`) |
| Node | 22.x |
| Turn detection | streaming STT, `turnDetection: "stt"` |
| Interruption config | `{ mode: "adaptive", minWords: 2, minDuration: 500, resumeFalseInterruption: true }` |
| Transport | outbound telephony (SIP); mechanism is transport-independent |

### Steps to reproduce

1. Outbound call. On answer, the agent speaks a fixed opener: `session.say(text, { allowInterruptions: true })`.
2. The TTS has a non-trivial time-to-first-byte (a few hundred ms), so the first audio frame is still in flight after `say()` returns.
3. The callee makes a brief sound ("hello?") in that pre-first-frame window.

### Expected

The brief false interruption pauses the speech and then **resumes and plays it** once the false interruption clears (per `resumeFalseInterruption`), or the speech is interrupted cleanly and re-attempted.

### Actual

`firstFrameFut` never resolves; the first audio frame arrives after the segment is torn down and is discarded; the turn is dropped from history. **No audio reaches the user**, and the interruption is not counted (`session` reports zero interruptions). `minWords` is irrelevant — the path that fires has no word-count gate.

### Root cause (JS)

Two behaviors combine.

**1. `onStartOfSpeech` pauses a not-yet-playing speech, ungated.** In `voice/agent_activity.ts`, `onStartOfSpeech` pauses the current speech as soon as user VAD-start fires, guarded only by:

```ts
agentState !== "speaking"            // agent's first audio frame hasn't played yet
&& pauseEnabled()
&& _currentSpeech.allowInterruptions  // the opener is interruptible
```

There is no `minWords` and no duration check on this branch, so any user sound in the pre-first-frame window pauses the speech.
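
To make the missing gate concrete, here is a minimal sketch of the three-part condition as a pure function. `PauseGuardInput` and `pausesOnStartOfSpeech` are hypothetical names for illustration, not identifiers from `agents-js`. The point is only that neither a word count nor a duration appears among the inputs.

```ts
// Hypothetical model of the guard quoted above; not the real agent_activity.ts code.
interface PauseGuardInput {
  agentState: string; // e.g. "speaking"; other values are illustrative
  pauseEnabled: boolean;
  allowInterruptions: boolean;
}

// Returns true when a user VAD-start would pause the current speech.
// Note that no minWords or minDuration value is consulted here.
function pausesOnStartOfSpeech(input: PauseGuardInput): boolean {
  return (
    input.agentState !== "speaking" &&
    input.pauseEnabled &&
    input.allowInterruptions
  );
}

// Pre-first-frame window: the agent is not yet "speaking", so any sound pauses it.
const pausedBeforeFirstFrame = pausesOnStartOfSpeech({
  agentState: "thinking", // assumed placeholder for any non-speaking state
  pauseEnabled: true,
  allowInterruptions: true,
});

// Once audio is playing, this particular branch no longer applies.
const pausedWhileSpeaking = pausesOnStartOfSpeech({
  agentState: "speaking",
  pauseEnabled: true,
  allowInterruptions: true,
});
```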

**2. The pause ends with `firstFrameFut` rejected instead of resolved, so audio and transcript are dropped.** In `voice/generation.ts`, the `finally` block of `forwardAudio` rejects the future when no frame was forwarded:

```ts
if (!out.firstFrameFut.done) {
  out.firstFrameFut.reject(new Error("audio forwarding cancelled before playback started"));
}
audioOutput.flush();
if (signal?.aborted) audioOutput.clearBuffer();
```

Downstream, the reply task only preserves the synchronized transcript when `firstFrameFut.done && !firstFrameFut.rejected`, so on the rejected (no-first-frame) path the transcript is blanked and the audio is discarded — the JS analogue of the `else: forwarded_text = ""` overwrite called out in livekit/agents#5038.

Observable log signature on the dropped turn:

```
SegmentSynchronizerImpl.markPlaybackFinished called before text/audio input is done
SegmentSynchronizerImpl.onPlaybackStarted called after close
playback_finished called more times than playback segments were captured
```

### Minimal reproduction (mechanism)

The half of the mechanism that discards the audio reproduces deterministically against the real `performAudioForwarding`: when the forwarding signal is aborted before the first frame, zero frames reach the sink and `firstFrameFut` rejects.

| Case | Frames to output | `firstFrameFut` | `clearBuffer` calls |
|---|---|---|---|
| Frames flow, no cancel | 3 | resolved | 0 |
| Cancelled before first frame | 0 (silent) | rejected | 1 |
| TTS yields no frame | 0 (silent) | rejected | 0 |

I can attach a runnable `AgentSession`-level reproduction if helpful.
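
In the meantime, the three table rows can be modeled with a small standalone sketch. It does not import `@livekit/agents`: `simulateForwarding`, `ForwardResult`, and the `{ aborted }` object are hypothetical stand-ins that only mirror the `finally` logic quoted under "Root cause", so treat it as an illustration of the mechanism rather than the library's implementation.

```ts
// Hypothetical stand-ins; none of these are the real @livekit/agents types.
type FirstFrameState = "pending" | "resolved" | "rejected";

interface ForwardResult {
  framesToOutput: number;
  firstFrame: FirstFrameState;
  clearBufferCalls: number;
}

// Models a forwarding loop that stops as soon as the signal is aborted,
// then applies the same cleanup order as the quoted `finally` block.
function simulateForwarding(
  frames: readonly string[],
  signal: { aborted: boolean },
): ForwardResult {
  const sink: string[] = [];
  let firstFrame: FirstFrameState = "pending";
  let clearBufferCalls = 0;
  try {
    for (const frame of frames) {
      if (signal.aborted) break; // cancelled: nothing more is forwarded
      sink.push(frame);
      if (firstFrame === "pending") firstFrame = "resolved";
    }
  } finally {
    // No frame was forwarded, so the first-frame future is rejected.
    if (firstFrame === "pending") firstFrame = "rejected";
    // Mirrors `if (signal?.aborted) audioOutput.clearBuffer()`.
    if (signal.aborted) clearBufferCalls += 1;
  }
  return { framesToOutput: sink.length, firstFrame, clearBufferCalls };
}

const framesFlow = simulateForwarding(["f1", "f2", "f3"], { aborted: false });
const cancelledEarly = simulateForwarding(["f1", "f2", "f3"], { aborted: true });
const emptyTts = simulateForwarding([], { aborted: false });
```

`framesFlow`, `cancelledEarly`, and `emptyTts` correspond to the three rows of the table, in order.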

### Relationship to the Python fix

livekit/agents#5038 describes this exact failure and was fixed by livekit/agents#5039, which relocates `first_frame_fut` handling to the callers and avoids blanking the generated text on the pre-first-frame path. The JS port still rejects `firstFrameFut` in `forwardAudio` and gates transcript preservation on `!firstFrameFut.rejected`, so the same defect is present.

One porting subtlety: in the JS `Future`, `cancel()` also sets `rejected = true`, so a literal "cancel instead of reject" transcription of #5039 would not change the downstream `!firstFrameFut.rejected` check. The JS fix likely needs to preserve the generated audio/transcript on the no-first-frame path explicitly.
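
A minimal sketch of that subtlety, using a hypothetical `SketchFuture` that models only the behavior described above (it is not the real `Future` class from `agents-js`, and `keepsTranscript` is an illustrative name for the downstream check):

```ts
// Hypothetical model: only the `done` / `rejected` flags matter for this point.
class SketchFuture {
  done = false;
  rejected = false;

  resolve(): void {
    if (!this.done) this.done = true;
  }

  reject(): void {
    if (!this.done) {
      this.done = true;
      this.rejected = true;
    }
  }

  // Assumption taken from the paragraph above: cancel() also sets rejected = true.
  cancel(): void {
    this.reject();
  }
}

// Mirrors the downstream gate `firstFrameFut.done && !firstFrameFut.rejected`.
function keepsTranscript(fut: SketchFuture): boolean {
  return fut.done && !fut.rejected;
}

const resolvedFut = new SketchFuture();
resolvedFut.resolve();

const rejectedFut = new SketchFuture();
rejectedFut.reject();

const cancelledFut = new SketchFuture();
cancelledFut.cancel();

const outcomes = {
  resolved: keepsTranscript(resolvedFut),
  rejected: keepsTranscript(rejectedFut),
  cancelled: keepsTranscript(cancelledFut),
};
```

Under this model, swapping `reject()` for `cancel()` gives the same result at the gate, which is why the check itself (or the data it protects) would need to change.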

### Ask

Port the #5039 fix to `agents-js` (or confirm the intended approach), so a brief false interruption arriving before the first audio frame no longer drops the entire turn — and ideally resumes it, per `resumeFalseInterruption`. Happy to open a PR with a regression test if that's welcome.

