Feature Suggestion: Add AI subtitles #147

@janwilmake

Having subtitles would be great for language learning...

According to the Realtime API documentation, you can receive real-time transcripts of the audio through the response.audio_transcript.delta server events. These events arrive concurrently with the audio stream.

For WebRTC connections, the documentation mentions that during a session you'll receive:

  • input_audio_buffer.speech_started events when input starts
  • input_audio_buffer.speech_stopped events when input stops
  • response.audio_transcript.delta events for the in-progress audio transcript
  • response.done event when the model has completed transcribing and sending a response

This means you can get incremental transcript updates while the audio is still streaming, which is enough to build features like real-time captions or text displays alongside the voice interaction.
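As a rough sketch of how those events could drive a caption, the reducer below folds the event stream into a subtitle state. The event shapes are trimmed to the fields used here, the `delta` field name is my assumption about the payload, and `SubtitleState` and `reduceSubtitles` are hypothetical names, not part of this repo.

```typescript
// Minimal event shapes (assumption: transcript deltas carry a string `delta` field).
type RealtimeEvent =
  | { type: 'response.audio_transcript.delta'; delta: string }
  | { type: 'response.done' }
  | { type: 'input_audio_buffer.speech_started' }
  | { type: 'input_audio_buffer.speech_stopped' };

// Hypothetical state for a subtitle overlay.
interface SubtitleState {
  text: string;       // transcript accumulated for the current response
  isVisible: boolean; // whether the overlay should be shown
  done: boolean;      // true once the current response has finished
}

const initialSubtitleState: SubtitleState = { text: '', isVisible: false, done: false };

function reduceSubtitles(state: SubtitleState, event: RealtimeEvent): SubtitleState {
  switch (event.type) {
    case 'response.audio_transcript.delta':
      // Append the new chunk; start from scratch if the previous response had finished.
      return {
        text: (state.done ? '' : state.text) + event.delta,
        isVisible: true,
        done: false,
      };
    case 'response.done':
      // Keep the finished line on screen until the next response or user speech.
      return { ...state, done: true };
    case 'input_audio_buffer.speech_started':
      // The user started talking, so hide the AI subtitles.
      return initialSubtitleState;
    default:
      return state;
  }
}
```

In a component this could be plugged into React's `useReducer`, with `text` and `isVisible` replacing the two `useState` calls in the draft further down.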

The transcription events are part of the standard event lifecycle whether you're using WebRTC or WebSocket connections, so you'll have access to the transcript regardless of which connection method you choose.
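Since the events look the same on both transports, the parsing could live in one place. This is only a sketch: it assumes server events arrive as JSON strings on a `message` event (from the WebRTC data channel or the WebSocket), and `MessageSource`, `extractTranscriptDelta`, and `attachTranscriptListener` are hypothetical names introduced for illustration.

```typescript
// Hypothetical minimal interface: anything that emits 'message' events with a `data` payload.
// In a browser, a WebSocket or an RTCDataChannel would be passed here (possibly via a thin wrapper).
interface MessageSource {
  addEventListener(type: 'message', listener: (event: { data: unknown }) => void): void;
}

// Returns the transcript chunk if `raw` is a transcript delta event, otherwise null.
// Assumption: the event is a JSON string with `type` and `delta` fields.
function extractTranscriptDelta(raw: unknown): string | null {
  if (typeof raw !== 'string') return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON, ignore
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const event = parsed as { type?: unknown; delta?: unknown };
  if (event.type === 'response.audio_transcript.delta' && typeof event.delta === 'string') {
    return event.delta;
  }
  return null;
}

// Subscribes to the source and forwards only transcript chunks to `onDelta`.
function attachTranscriptListener(
  source: MessageSource,
  onDelta: (delta: string) => void,
): void {
  source.addEventListener('message', (event) => {
    const delta = extractTranscriptDelta(event.data);
    if (delta !== null) onDelta(delta);
  });
}
```

Keeping the transport behind a small interface means the subtitle component would not need to know which connection method the room uses.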

We can probably create a component like this:

import React, { useState, useEffect } from 'react';
import { useRoomContext } from '~/hooks/useRoomContext';
import type { ClientMessage, User } from '~/types/Messages';

const AiSubtitles = () => {
  const [subtitles, setSubtitles] = useState('');
  const [isVisible, setIsVisible] = useState(false);
  const { room } = useRoomContext();

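  // Record AI speech activity.
  // NOTE: nothing calls this yet; it still has to be wired up (for example
  // from a useEffect that watches the room state) before subtitles appear.
  const recordActivity = (user: User) => {
    if (user.id === 'ai' && user.speaking) {
      setIsVisible(true);
      // Placeholder: the real text should come from the
      // response.audio_transcript.delta events of the AI service.
      // For now, we just show a speaking indicator.
      setSubtitles('AI is speaking...');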
    } else {
      setIsVisible(false);
      setSubtitles('');
    }
  };

  if (!isVisible) return null;

  return (
    <div className="fixed bottom-24 left-1/2 -translate-x-1/2 w-full max-w-2xl mx-auto px-4">
      <div className="bg-black/75 text-white p-4 rounded-lg text-center text-lg animate-fadeIn">
        {subtitles}
      </div>
    </div>
  );
};

export default AiSubtitles;

That component would display the subtitles. To feed it, we would process the Realtime API response events and include the component in /app/routes/_room.$roomName.room.tsx
