The `gemini-2.5-flash-native-audio-preview-12-2025` model cannot be used with the text modality in a hybrid architecture with a separate TTS plugin

### Bug Description

Creating an `AgentSession` with the `gemini-2.5-flash-native-audio-preview-12-2025` realtime model, text-only `modalities`, and a separate TTS plugin fails: the Gemini Live WebSocket connection is closed with code 1007.

### Error message

```
websockets.exceptions.ConnectionClosedError: received 1007 (invalid frame payload data) Cannot extract voices from a non-audio request.
```

### Code to reproduce

```python
from livekit.agents import AgentSession
from livekit.plugins import elevenlabs, google, silero

session = AgentSession(
    llm=google.realtime.RealtimeModel(
        model="gemini-2.5-flash-native-audio-preview-12-2025",
        modalities=["text"],  # text-only output, to be spoken by the TTS below
    ),
    tts=elevenlabs.TTS(),  # any TTS plugin, e.g. deepgram.TTS()
    vad=silero.VAD.load(),
)
```

### Expected Behavior

When `modalities=[Modality.TEXT]` is set, the Gemini Live API should return text-only responses so that the agent can use a separate TTS plugin for speech synthesis (half-cascade architecture).
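
Until text output works with this model, a client-side check could surface the mismatch before the WebSocket is opened. The sketch below is hypothetical: `validate_modalities` is not part of `livekit-agents`, and the rule that model names containing `native-audio` reject text-only output is an assumption inferred from the error message above, not something this report confirms.

```python
def validate_modalities(model: str, modalities: list[str]) -> None:
    """Raise ValueError if a text-only request targets a native-audio model.

    Hypothetical helper. The "native-audio" substring check is an assumption
    based on the 1007 error in this report, not a documented rule.
    """
    # Normalize so that "text", "TEXT", and str-based enum members compare equal.
    requested = {m.lower() for m in modalities}
    if "native-audio" in model and "audio" not in requested:
        raise ValueError(
            f"{model!r} appears to require audio output; "
            f"got modalities={sorted(requested)}"
        )
```

Calling `validate_modalities("gemini-2.5-flash-native-audio-preview-12-2025", ["text"])` before constructing the `RealtimeModel` would then raise a `ValueError` locally instead of a `ConnectionClosedError` from the server.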

### Reproduction Steps

```python
from livekit.agents import AgentSession
from livekit.plugins import elevenlabs, google, silero
from livekit.plugins.google.realtime import Modality

session = AgentSession(
    llm=google.realtime.RealtimeModel(
        model="gemini-2.5-flash-native-audio-preview-12-2025",
        modalities=[Modality.TEXT],  # text-only output, to be spoken by the TTS below
    ),
    tts=elevenlabs.TTS(),  # any TTS plugin, e.g. deepgram.TTS()
    vad=silero.VAD.load(),
)
```

### Operating System

Ubuntu 22.04

### Models Used

Deepgram, Google, ElevenLabs

### Package Versions

```bash
livekit-agents==1.3.10
```

### Session/Room/Call IDs

_No response_

### Proposed Solution

_No response_

### Additional Context

_No response_

### Screenshots and Recordings

_No response_

Issue #4423