How to interrupt Gemini Live API response mid-stream when using automatic VAD (Python google-genai) #2593

@jg-itxi

Question

I'm using the Gemini Live API via the Python google-genai SDK with automatic VAD (voice activity detection) enabled. During a session, the model starts streaming audio/text chunks back to the client, and I need to interrupt the server mid-response — stopping it from sending further chunks — similar to how a user would talk over the AI in a real conversation.

Context

My setup looks roughly like this:

import asyncio
from google import genai

client = genai.Client(api_key="...")

config = {
    "response_modalities": ["AUDIO"],
    "realtime_input_config": {
        "automatic_activity_detection": {
            "disabled": False,  # automatic VAD is ON
        }
    },
}

Problem

While iterating over session.receive(), the server keeps pushing chunks. I want to interrupt the current generation — i.e., stop the server from sending more audio/text for the current turn — and either:

  • Send new user audio immediately (simulating the user talking over the model), or
  • Explicitly signal "stop generating" without sending new audio.

What I've tried

  • Breaking out of the receive() loop: This stops me from consuming chunks locally, but the server doesn't know to stop, so it keeps generating.
  • Sending new audio while receiving: The automatic VAD is supposed to detect new user speech and trigger an interruption. Does sending audio via session.send_realtime_input() while still in the receive loop achieve this? Or does the receive loop need to be running concurrently on a separate task?
  • Sending an end_of_turn message mid-stream: Not clear if this signals an interruption or is simply ignored during active generation.
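
To make the "sending new audio while receiving" attempt concrete, here is a minimal offline sketch of the concurrent pattern I have in mind: one asyncio task consumes receive() while another sends audio. FakeSession is a hypothetical stand-in I wrote so the snippet runs without the API; it is not part of google-genai. Its behavior (any incoming audio counts as user speech and ends the stream with an "interrupted" marker) is an assumption about how automatic VAD behaves, not confirmed server behavior.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a Live API session (not part of google-genai)."""

    def __init__(self):
        self.sent = []
        self._user_spoke = asyncio.Event()

    async def send_realtime_input(self, audio):
        # Assumption: the server-side VAD treats any incoming audio as speech.
        self.sent.append(audio)
        self._user_spoke.set()

    async def receive(self):
        # Stream up to 100 chunks; stop early once "user speech" has arrived.
        for i in range(100):
            if self._user_spoke.is_set():
                yield {"interrupted": True}
                return
            yield {"data": f"chunk-{i}"}
            await asyncio.sleep(0.01)


async def receive_loop(session, log):
    # Consumer task: record every message and stop on the interruption marker.
    async for message in session.receive():
        log.append(message)
        if message.get("interrupted"):
            break


async def send_loop(session, mic_chunks):
    # Producer task: the user starts talking while the model is still responding.
    await asyncio.sleep(0.035)
    for chunk in mic_chunks:
        await session.send_realtime_input(audio=chunk)


async def run_demo():
    session = FakeSession()
    log = []
    # Both loops run concurrently on the same session.
    await asyncio.gather(
        receive_loop(session, log),
        send_loop(session, [b"\x00\x01", b"\x02\x03"]),
    )
    return session, log
```

With the real SDK I would expect the two loops to share the session object returned by the connect call in the same way, but whether that is the intended pattern is exactly what I'm asking.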

Questions

  • What is the correct way to signal an interruption to the server when using automatic VAD — is it purely driven by sending new audio input, or is there an explicit interrupt/cancel message in the SDK?
  • Should the send and receive loops run concurrently (e.g., two asyncio tasks — one consuming session.receive() and one sending audio)? Is that the intended pattern for supporting interruptions?
  • Does the BidiGenerateContentClientContent or any other message type support an explicit interrupt flag that tells the server to stop the current generation?
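
For reference, this is roughly how I would expect to react on the client side if the server does report an interruption. It is a sketch under assumptions: I'm assuming each received message exposes data, server_content.interrupted and server_content.turn_complete, which matches my reading of the SDK types but should be verified. make_message and fake_receive are hypothetical helpers that only exist so the snippet runs offline.

```python
import asyncio
from types import SimpleNamespace


def make_message(data=None, interrupted=False, turn_complete=False):
    # Hypothetical helper: builds an object shaped like the messages I assume
    # the SDK yields (.data plus .server_content flags).
    return SimpleNamespace(
        data=data,
        server_content=SimpleNamespace(
            interrupted=interrupted, turn_complete=turn_complete
        ),
    )


async def fake_receive(messages):
    # Hypothetical stand-in for session.receive().
    for message in messages:
        yield message


async def handle_turn(stream, playback_queue):
    """Consume one turn; return True if it ended because of an interruption."""
    async for message in stream:
        content = message.server_content
        if content is not None and content.interrupted:
            # Drop audio that is buffered locally but not yet played.
            while not playback_queue.empty():
                playback_queue.get_nowait()
            return True
        if message.data is not None:
            playback_queue.put_nowait(message.data)
        if content is not None and content.turn_complete:
            return False
    return False
```

The point of flushing the queue is that, even if the server stops generating, chunks already received would otherwise keep playing over the user's speech.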

Labels

priority: p3 (Desirable enhancement or fix. May not be included in next release.)
type: question (Request for information or clarification. Not an issue.)
