|
30 | 30 | "\n", |
31 | 31 | "\n", |
32 | 32 | "*Notes:*\n", |
33 | | - "> 1. GPT-4o-Realtime supports a 128k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n", |
| 33 | + "> 1. gpt-realtime supports a 32k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n", |
34 | 34 | "> 2. Token window = all tokens (words and audio tokens) the model currently keeps in memory for the session.\n", |
35 | 35 | "\n", |
36 | 36 | "### One‑liner install (run in a fresh cell)" |
|
48 | 48 | }, |
49 | 49 | { |
50 | 50 | "cell_type": "code", |
51 | | - "execution_count": 4, |
| 51 | + "execution_count": 1, |
52 | 52 | "metadata": {}, |
53 | 53 | "outputs": [], |
54 | 54 | "source": [ |
|
74 | 74 | }, |
75 | 75 | { |
76 | 76 | "cell_type": "code", |
77 | | - "execution_count": 5, |
| 77 | + "execution_count": 2, |
78 | 78 | "metadata": {}, |
79 | 79 | "outputs": [], |
80 | 80 | "source": [ |
|
96 | 96 | "In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence in audio versus text.\n", |
97 | 97 | "\n", |
98 | 98 | "\n", |
99 | | - "* GPT-4o realtime accepts up to **128k tokens** and as the token size increases, instruction adherence can drift.\n", |
| 99 | + "* gpt-realtime accepts up to **32k tokens**, and as the token count increases, instruction adherence can drift.\n", |
100 | 100 | "* Every user/assistant turn consumes tokens → the window **only grows**.\n", |
101 | 101 | "* **Strategy**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.\n", |
102 | 102 | "\n", |
|
128 | 128 | }, |
129 | 129 | { |
130 | 130 | "cell_type": "code", |
131 | | - "execution_count": 6, |
| 131 | + "execution_count": 3, |
132 | 132 | "metadata": {}, |
133 | 133 | "outputs": [], |
134 | 134 | "source": [ |
|
159 | 159 | }, |
160 | 160 | { |
161 | 161 | "cell_type": "code", |
162 | | - "execution_count": 7, |
| 162 | + "execution_count": 4, |
163 | 163 | "metadata": {}, |
164 | 164 | "outputs": [], |
165 | 165 | "source": [ |
|
196 | 196 | }, |
197 | 197 | { |
198 | 198 | "cell_type": "code", |
199 | | - "execution_count": 8, |
| 199 | + "execution_count": 5, |
200 | 200 | "metadata": {}, |
201 | 201 | "outputs": [], |
202 | 202 | "source": [ |
|
248 | 248 | }, |
249 | 249 | { |
250 | 250 | "cell_type": "code", |
251 | | - "execution_count": 9, |
| 251 | + "execution_count": 6, |
252 | 252 | "metadata": {}, |
253 | 253 | "outputs": [], |
254 | 254 | "source": [ |
|
297 | 297 | "metadata": {}, |
298 | 298 | "source": [ |
299 | 299 | "### 3.3 Detect When to Summarise\n", |
300 | | - "The Realtime model keeps a **large 128 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n", |
| 300 | + "The Realtime model keeps a **large 32 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n", |
301 | 301 | "\n", |
302 | 302 | "Our goal: **auto‑summarise** once the running window nears a safe threshold (default **2 000 tokens** for the notebook), then prune the superseded turns both locally *and* server‑side.\n", |
303 | 303 | "\n", |
304 | | - "We monitor latest_tokens returned in `response.done`. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarisation coroutine.\n", |
| 304 | + "We monitor `latest_tokens` returned in `response.done`. When it exceeds `SUMMARY_TRIGGER` and we have more than `KEEP_LAST_TURNS` turns, we spin up a background summarization coroutine.\n", |
305 | 305 | "\n", |
306 | 306 | "We compress everything except the last 2 turns into a single French paragraph, then:\n", |
307 | 307 | "\n", |
|
314 | 314 | }, |
315 | 315 | { |
316 | 316 | "cell_type": "code", |
317 | | - "execution_count": 10, |
| 317 | + "execution_count": 7, |
318 | 318 | "metadata": {}, |
319 | 319 | "outputs": [], |
320 | 320 | "source": [ |
|
343 | 343 | }, |
344 | 344 | { |
345 | 345 | "cell_type": "code", |
346 | | - "execution_count": 11, |
| 346 | + "execution_count": 8, |
347 | 347 | "metadata": {}, |
348 | 348 | "outputs": [], |
349 | 349 | "source": [ |
|
401 | 401 | }, |
402 | 402 | { |
403 | 403 | "cell_type": "code", |
404 | | - "execution_count": 12, |
| 404 | + "execution_count": 9, |
405 | 405 | "metadata": {}, |
406 | 406 | "outputs": [], |
407 | 407 | "source": [ |
|
451 | 451 | }, |
452 | 452 | { |
453 | 453 | "cell_type": "code", |
454 | | - "execution_count": 13, |
| 454 | + "execution_count": 10, |
455 | 455 | "metadata": {}, |
456 | 456 | "outputs": [], |
457 | 457 | "source": [ |
|
466 | 466 | }, |
467 | 467 | { |
468 | 468 | "cell_type": "code", |
469 | | - "execution_count": 14, |
| 469 | + "execution_count": 11, |
470 | 470 | "metadata": {}, |
471 | 471 | "outputs": [], |
472 | 472 | "source": [ |
473 | 473 | "# --------------------------------------------------------------------------- #\n", |
474 | | - "# 🎤 Realtime session #\n", |
| 474 | + "# Realtime session #\n", |
475 | 475 | "# --------------------------------------------------------------------------- #\n", |
476 | | - "async def realtime_session(model=\"gpt-4o-realtime-preview\", voice=\"shimmer\", enable_playback=True):\n", |
| 476 | + "async def realtime_session(model=\"gpt-realtime\", voice=\"shimmer\", enable_playback=True):\n", |
477 | 477 | " \"\"\"\n", |
478 | 478 | " Main coroutine: connects to the Realtime endpoint, spawns helper tasks,\n", |
479 | 479 | " and processes incoming events in a big async‑for loop.\n", |
|
487 | 487 | " # Open the WebSocket connection to the Realtime API #\n", |
488 | 488 | " # ----------------------------------------------------------------------- #\n", |
489 | 489 | " url = f\"wss://api.openai.com/v1/realtime?model={model}\"\n", |
490 | | - " headers = {\"Authorization\": f\"Bearer {openai.api_key}\", \"OpenAI-Beta\": \"realtime=v1\"}\n", |
| 490 | + " headers = {\"Authorization\": f\"Bearer {openai.api_key}\"}\n", |
491 | 491 | "\n", |
492 | 492 | " async with websockets.connect(url, extra_headers=headers, max_size=1 << 24) as ws:\n", |
493 | 493 | " # ------------------------------------------------------------------- #\n", |
|
503 | 503 | " await ws.send(json.dumps({\n", |
504 | 504 | " \"type\": \"session.update\",\n", |
505 | 505 | " \"session\": {\n", |
| 506 | + " \"type\": \"realtime\",\n", |
| 507 | + " \"model\": \"gpt-realtime\",\n", |
506 | 508 | " \"voice\": voice,\n", |
507 | 509 | " \"modalities\": [\"audio\", \"text\"],\n", |
508 | 510 | " \"input_audio_format\": \"pcm16\",\n", |
|
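For reference, the `session.update` message above serialises as below; the payload shape mirrors the diff (note that `model` must be a quoted dict key), and the field values are the ones assumed in this notebook:

```python
import json

# Build the session.update payload as a plain dict, then serialise it
# the same way ws.send(json.dumps(...)) would in the notebook.
payload = {
    "type": "session.update",
    "session": {
        "type": "realtime",
        "model": "gpt-realtime",
        "voice": "shimmer",
        "modalities": ["audio", "text"],
        "input_audio_format": "pcm16",
    },
}

encoded = json.dumps(payload)       # the string sent over the WebSocket
decoded = json.loads(encoded)       # round-trip check of the structure
print(decoded["session"]["model"])  # gpt-realtime
```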