
Multiple AgentSession.start({ room }) in one Job fails: lk.agent.session byte stream handler already set #1927

Description

@GuoRuiLv


When running multiple AgentSession instances bound to the same Room within a single Job (one per remote participant), the second session.start() throws and the Job fails.

This pattern is documented and works in the Python agents SDK: the official multi-user-transcriber example creates one AgentSession per participant on the same ctx.room.

We are building a listen-only realtime transcription agent (STT + lk.transcription, no LLM/TTS). Our use case requires concurrent per-participant STT (separate participantIdentity per session).

Environment

  • @livekit/agents: 1.4.11
  • @livekit/rtc-node: 0.13.29 (resolved by agents)
  • LiveKit Server: 1.13.1 (Standard)
  • OS: Linux (K8s pod, Debian 12) and Windows (dev)
  • Node.js: 20+

Steps to reproduce

  1. Dispatch a transcription-style agent Job into a room with ≥2 human participants (or iterate all remoteParticipants at Job start).
  2. For each non-agent participant, create a listen-only AgentSession (STT + VAD, no LLM/TTS) and call:
await session.start({
  agent: new voice.Agent({ instructions: '...' }),
  room: ctx.room, // same Room instance for all sessions
  record: false,
  inputOptions: {
    audioEnabled: true,
    textEnabled: false,
    participantIdentity: participant.identity,
    closeOnDisconnect: false,
  },
  outputOptions: {
    audioEnabled: false,
    syncTranscription: false,
    transcriptionEnabled: true,
  },
});
  3. The first participant's session starts successfully; starting the second session throws immediately.

Minimal structural equivalent:

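import { type JobContext, defineAgent, voice } from '@livekit/agents';

// stt and vad are assumed to be constructed elsewhere (any STT plugin + a VAD such as Silero).
export default defineAgent({
  entry: async (ctx: JobContext) => {
    await ctx.connect();
    for (const p of ctx.room.remoteParticipants.values()) {
      // One listen-only session per remote participant, all bound to the same Room.
      const session = new voice.AgentSession({ stt, vad });
      await session.start({
        agent: new voice.Agent({ instructions: '...' }), // fresh Agent per session, as in step 2
        room: ctx.room,
        record: false,
        inputOptions: { participantIdentity: p.identity, audioEnabled: true, textEnabled: false },
        outputOptions: { audioEnabled: false, transcriptionEnabled: true, syncTranscription: false },
      });
      // The second iteration throws: byte stream handler for "lk.agent.session" already set.
    }
  },
});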

Expected behavior

Either:

  • Multiple AgentSession instances can share one Room in a single Job when using record: false on secondary sessions (as implied by the record: false / primary-session handling in agent_session.ts), similar to Python's multi-user transcriber; or
  • Documentation clearly states that only one AgentSession may call start({ room }) per Job, with guidance for multi-participant transcription on Node.js.

Actual behavior

Second AgentSession.start() fails:

Error: A byte stream handler for topic "lk.agent.session" has already been set.
    at Room.registerByteStreamHandler (@livekit/rtc-node/dist/room.js:671:13)
    at RoomSessionTransport.start (@livekit/agents/dist/voice/remote_session.js:38:15)
    at SessionHost.start (@livekit/agents/dist/voice/remote_session.js:516:26)
    at AgentSession._startImpl (@livekit/agents/dist/voice/agent_session.js:320:30)
    at async AgentSession.start (@livekit/agents/dist/voice/agent_session.js:377:5)

The Job's entry function then fails (logged as "error in entry function"), and the first session may close as a result.

Analysis

In AgentSession._startImpl, each session started with a room creates its own RoomSessionTransport and calls SessionHost.start(), which registers a byte stream handler on the shared Room for TOPIC_SESSION_MESSAGES (lk.agent.session). A Room allows only one handler per topic, so the second registration throws.

record: false correctly avoids the primary recording conflict (ctx._primaryAgentSession), but does not avoid the RoomSessionTransport registration conflict.
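To show the conflict in isolation, here is a small self-contained TypeScript model of what the stack trace shows. TopicHandlerRegistry and startSession are hypothetical stand-ins, not rtc-node or agents code; the sketch assumes only what the error message states, namely that a Room accepts a single byte stream handler per topic.

```typescript
// Hypothetical model of the rule the stack trace shows Room.registerByteStreamHandler
// enforcing: one handler per topic per Room. This is NOT the rtc-node implementation.
type ByteStreamHandler = (senderIdentity: string) => void;

class TopicHandlerRegistry {
  private readonly handlers = new Map<string, ByteStreamHandler>();

  register(topic: string, handler: ByteStreamHandler): void {
    if (this.handlers.has(topic)) {
      // Same wording as the error reported by rtc-node.
      throw new Error(`A byte stream handler for topic "${topic}" has already been set.`);
    }
    this.handlers.set(topic, handler);
  }
}

const SESSION_TOPIC = 'lk.agent.session';
const sharedRoom = new TopicHandlerRegistry(); // one Room shared by every session

// Stand-in for SessionHost.start(): each session registers its own handler.
function startSession(name: string): string {
  sharedRoom.register(SESSION_TOPIC, (sender) => {
    console.log(`${name} received a message from ${sender}`);
  });
  return `${name} started`;
}

const results: string[] = [];
for (const name of ['session-A', 'session-B']) {
  try {
    results.push(startSession(name));
  } catch (err) {
    results.push(`${name} failed: ${(err as Error).message}`);
  }
}
console.log(results.join('\n'));
// session-A started
// session-B failed: A byte stream handler for topic "lk.agent.session" has already been set.
```

Because the check is keyed only on the topic, record: false (or any other per-session option) cannot change the outcome; only sharing the registration or skipping it can.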

Python reference (works)

https://github.com/livekit/agents/blob/main/examples/other/transcription/multi-user-transcriber.py

session = AgentSession()
await session.start(
    agent=Transcriber(participant_identity=participant.identity),
    room=self.ctx.room,
    room_options=room_io.RoomOptions(
        participant_identity=participant.identity,
        audio_input=True,
        text_output=True,
        audio_output=False,
        text_input=False,
    ),
)

Questions for maintainers

  1. Is multi-participant transcription via multiple AgentSession on one Room in one Job intended to work on Node.js?
  2. If not, what is the recommended Node.js pattern for concurrent per-participant transcription (not active-speaker switching)?
    • addParticipantEntrypoint without AgentSession?
    • One dispatch Job per participant?
  3. Should RoomSessionTransport be shared across sessions in the same Job, or should secondary sessions skip remote session transport registration?
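For question 3, the "share" option could look roughly like the sketch below: register the lk.agent.session handler once per Room and fan incoming messages out to every session that subscribed. SharedTopicRouter and SingleHandlerRegistry are hypothetical names (the registry only models the one-handler-per-topic rule), and broadcasting to all subscribers is an assumption; a real fix might need to route each message to one specific session instead.

```typescript
// Hypothetical sketch: one lk.agent.session registration per Room, shared by all
// sessions in the Job. None of these names are @livekit/agents APIs.
type Handler = (payload: string, senderIdentity: string) => void;

interface Registry {
  register(topic: string, handler: Handler): void;
  unregister(topic: string): void;
}

// Models the Room rule: a second register() on the same topic throws.
class SingleHandlerRegistry implements Registry {
  readonly handlers = new Map<string, Handler>();

  register(topic: string, handler: Handler): void {
    if (this.handlers.has(topic)) {
      throw new Error(`A byte stream handler for topic "${topic}" has already been set.`);
    }
    this.handlers.set(topic, handler);
  }

  unregister(topic: string): void {
    this.handlers.delete(topic);
  }
}

// Registers the real handler on the first subscribe, fans each message out to
// every subscriber, and unregisters when the last subscriber leaves.
class SharedTopicRouter {
  private readonly subscribers = new Set<Handler>();
  private readonly registry: Registry;
  private readonly topic: string;

  constructor(registry: Registry, topic: string) {
    this.registry = registry;
    this.topic = topic;
  }

  subscribe(handler: Handler): () => void {
    if (this.subscribers.size === 0) {
      this.registry.register(this.topic, (payload, sender) => {
        // Iterate over a copy so a handler may unsubscribe during dispatch.
        for (const h of Array.from(this.subscribers)) h(payload, sender);
      });
    }
    this.subscribers.add(handler);
    return () => {
      // Only the first call for a given handler has an effect.
      if (this.subscribers.delete(handler) && this.subscribers.size === 0) {
        this.registry.unregister(this.topic);
      }
    };
  }
}

const room = new SingleHandlerRegistry();
const router = new SharedTopicRouter(room, 'lk.agent.session');
const received: string[] = [];

// Two sessions on the same Room no longer collide.
const unsubscribeA = router.subscribe((payload) => received.push(`A:${payload}`));
const unsubscribeB = router.subscribe((payload) => received.push(`B:${payload}`));

// Simulate one incoming message on the topic.
room.handlers.get('lk.agent.session')?.('hello', 'remote-client');
console.log(received); // [ 'A:hello', 'B:hello' ]

unsubscribeA();
unsubscribeB();
console.log(room.handlers.has('lk.agent.session')); // false
```

The other option in question 3, having secondary sessions skip registration, needs no router but leaves those sessions unreachable on lk.agent.session.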

Possible workarounds (for context)

  • Multi-dispatch: one Job (and one AgentSession) per participant. This works, but puts N agent participants in the room.
  • Participant entrypoint + custom STT pipeline: avoids a second AgentSession, at the cost of more application code.
  • Single session + room_io.setParticipant() on the active speaker: not suitable when multiple speakers need concurrent transcription.
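Whichever workaround is chosen, concurrent per-participant transcription needs the same bookkeeping: one pipeline per non-agent participant, started on join and closed on leave. A minimal sketch, assuming a hypothetical StartTranscriber factory that wraps the real pipeline (for example a custom STT stream per participant); in a real agent, onJoin and onLeave would be wired to the Room's participant connected/disconnected events.

```typescript
// Hypothetical bookkeeping for concurrent per-participant transcription.
// TranscriberHandle and StartTranscriber are placeholders for the real
// pipeline, not @livekit/agents APIs.
interface TranscriberHandle {
  close(): Promise<void>;
}

type StartTranscriber = (identity: string) => Promise<TranscriberHandle>;

class PerParticipantTranscribers {
  // Store the start promise so a repeated join event cannot start a second pipeline.
  private readonly active = new Map<string, Promise<TranscriberHandle>>();
  private readonly start: StartTranscriber;

  constructor(start: StartTranscriber) {
    this.start = start;
  }

  onJoin(identity: string, isAgent: boolean): void {
    if (isAgent || this.active.has(identity)) return; // skip agents and duplicates
    const pending = this.start(identity);
    this.active.set(identity, pending);
    pending.catch((err) => {
      console.error(`failed to start transcriber for ${identity}`, err);
      // Forget the failed attempt (unless already replaced) so a later join can retry.
      if (this.active.get(identity) === pending) this.active.delete(identity);
    });
  }

  onLeave(identity: string): void {
    const pending = this.active.get(identity);
    if (!pending) return;
    this.active.delete(identity);
    // Close in the background; a failure here should not affect other participants.
    pending
      .then((handle) => handle.close())
      .catch((err) => console.error(`failed to close transcriber for ${identity}`, err));
  }

  get identities(): string[] {
    return Array.from(this.active.keys());
  }
}

// Usage with a fake pipeline that only records calls.
const calls: string[] = [];
const fakeStart: StartTranscriber = async (identity) => {
  calls.push(`start ${identity}`);
  return {
    close: async () => {
      calls.push(`close ${identity}`);
    },
  };
};

const transcribers = new PerParticipantTranscribers(fakeStart);
transcribers.onJoin('alice', false);
transcribers.onJoin('bob', false);
transcribers.onJoin('alice', false); // duplicate event: ignored
transcribers.onJoin('agent-1', true); // agent participant: skipped
transcribers.onLeave('alice');
console.log(transcribers.identities); // [ 'bob' ]
```

Keying on participant identity, rather than on the active speaker, is what keeps every speaker transcribed at the same time.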

Happy to provide a minimal reproduction repository or additional logs if helpful.
