|
30 | 30 | "\n", |
31 | 31 | "\n", |
32 | 32 | "*Notes:*\n", |
33 | | - "> 1. GPT-4o-Realtime supports a 128k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n", |
| 33 | + "> 1. gpt-realtime supports a 32k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n", |
34 | 34 | "> 2. Token window = all tokens (words and audio tokens) the model currently keeps in memory for the session.\n", |
35 | 35 | "\n", |
36 | 36 | "### One‑liner install (run in a fresh cell)" |
|
48 | 48 | }, |
49 | 49 | { |
50 | 50 | "cell_type": "code", |
51 | | - "execution_count": 4, |
| 51 | + "execution_count": 1, |
52 | 52 | "metadata": {}, |
53 | 53 | "outputs": [], |
54 | 54 | "source": [ |
|
74 | 74 | }, |
75 | 75 | { |
76 | 76 | "cell_type": "code", |
77 | | - "execution_count": 5, |
| 77 | + "execution_count": 2, |
78 | 78 | "metadata": {}, |
79 | 79 | "outputs": [], |
80 | 80 | "source": [ |
|
96 | 96 | "In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence in audio versus text.\n", |
97 | 97 | "\n", |
98 | 98 | "\n", |
99 | | - "* GPT-4o realtime accepts up to **128k tokens** and as the token size increases, instruction adherence can drift.\n", |
| 99 | + "* gpt-realtime accepts up to **32k tokens**, and as the token count increases, instruction adherence can drift.\n", |
100 | 100 | "* Every user/assistant turn consumes tokens → the window **only grows**.\n", |
101 | 101 | "* **Strategy**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.\n", |
102 | 102 | "\n", |
|
128 | 128 | }, |
129 | 129 | { |
130 | 130 | "cell_type": "code", |
131 | | - "execution_count": 6, |
| 131 | + "execution_count": 3, |
132 | 132 | "metadata": {}, |
133 | 133 | "outputs": [], |
134 | 134 | "source": [ |
|
159 | 159 | }, |
160 | 160 | { |
161 | 161 | "cell_type": "code", |
162 | | - "execution_count": 7, |
| 162 | + "execution_count": 4, |
163 | 163 | "metadata": {}, |
164 | 164 | "outputs": [], |
165 | 165 | "source": [ |
|
196 | 196 | }, |
197 | 197 | { |
198 | 198 | "cell_type": "code", |
199 | | - "execution_count": 8, |
| 199 | + "execution_count": 5, |
200 | 200 | "metadata": {}, |
201 | 201 | "outputs": [], |
202 | 202 | "source": [ |
|
248 | 248 | }, |
249 | 249 | { |
250 | 250 | "cell_type": "code", |
251 | | - "execution_count": 9, |
| 251 | + "execution_count": 6, |
252 | 252 | "metadata": {}, |
253 | 253 | "outputs": [], |
254 | 254 | "source": [ |
|
297 | 297 | "metadata": {}, |
298 | 298 | "source": [ |
299 | 299 | "### 3.3 Detect When to Summarise\n", |
300 | | - "The Realtime model keeps a **large 128 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n", |
| 300 | + "The Realtime model keeps a **large 32 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n", |
301 | 301 | "\n", |
302 | 302 | "Our goal: **auto‑summarise** once the running window nears a safe threshold (default **2 000 tokens** for the notebook), then prune the superseded turns both locally *and* server‑side.\n", |
303 | 303 | "\n", |
304 | | - "We monitor latest_tokens returned in `response.done`. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarisation coroutine.\n", |
| 304 | + "We monitor `latest_tokens` returned in `response.done`. When it exceeds `SUMMARY_TRIGGER` and we have more than `KEEP_LAST_TURNS` turns, we spin up a background summarization coroutine.\n", |
305 | 305 | "\n", |
306 | 306 | "We compress everything except the last 2 turns into a single French paragraph, then:\n", |
307 | 307 | "\n", |
|
314 | 314 | }, |
315 | 315 | { |
316 | 316 | "cell_type": "code", |
317 | | - "execution_count": 10, |
| 317 | + "execution_count": 7, |
318 | 318 | "metadata": {}, |
319 | 319 | "outputs": [], |
320 | 320 | "source": [ |
|
343 | 343 | }, |
344 | 344 | { |
345 | 345 | "cell_type": "code", |
346 | | - "execution_count": 11, |
| 346 | + "execution_count": 8, |
347 | 347 | "metadata": {}, |
348 | 348 | "outputs": [], |
349 | 349 | "source": [ |
|
401 | 401 | }, |
402 | 402 | { |
403 | 403 | "cell_type": "code", |
404 | | - "execution_count": 12, |
| 404 | + "execution_count": 9, |
405 | 405 | "metadata": {}, |
406 | 406 | "outputs": [], |
407 | 407 | "source": [ |
|
451 | 451 | }, |
452 | 452 | { |
453 | 453 | "cell_type": "code", |
454 | | - "execution_count": 13, |
| 454 | + "execution_count": 10, |
455 | 455 | "metadata": {}, |
456 | 456 | "outputs": [], |
457 | 457 | "source": [ |
|
466 | 466 | }, |
467 | 467 | { |
468 | 468 | "cell_type": "code", |
469 | | - "execution_count": 14, |
| 469 | + "execution_count": 11, |
470 | 470 | "metadata": {}, |
471 | 471 | "outputs": [], |
472 | 472 | "source": [ |
473 | 473 | "# --------------------------------------------------------------------------- #\n", |
474 | | - "# 🎤 Realtime session #\n", |
| 474 | + "# Realtime session #\n", |
475 | 475 | "# --------------------------------------------------------------------------- #\n", |
476 | | - "async def realtime_session(model=\"gpt-4o-realtime-preview\", voice=\"shimmer\", enable_playback=True):\n", |
| 476 | + "async def realtime_session(model=\"gpt-realtime\", voice=\"shimmer\", enable_playback=True):\n", |
477 | 477 | " \"\"\"\n", |
478 | 478 | " Main coroutine: connects to the Realtime endpoint, spawns helper tasks,\n", |
479 | 479 | " and processes incoming events in a big async‑for loop.\n", |
|
487 | 487 | " # Open the WebSocket connection to the Realtime API #\n", |
488 | 488 | " # ----------------------------------------------------------------------- #\n", |
489 | 489 | " url = f\"wss://api.openai.com/v1/realtime?model={model}\"\n", |
490 | | - " headers = {\"Authorization\": f\"Bearer {openai.api_key}\", \"OpenAI-Beta\": \"realtime=v1\"}\n", |
| 490 | + " headers = {\"Authorization\": f\"Bearer {openai.api_key}\"}\n", |
491 | 491 | "\n", |
492 | 492 | " async with websockets.connect(url, extra_headers=headers, max_size=1 << 24) as ws:\n", |
493 | 493 | " # ------------------------------------------------------------------- #\n", |
|
503 | 503 | " await ws.send(json.dumps({\n", |
504 | 504 | " \"type\": \"session.update\",\n", |
505 | 505 | " \"session\": {\n", |
| 506 | + " \"type\": \"realtime\",\n", |
| 507 | + " \"model\": \"gpt-realtime\",\n", |
506 | 508 | " \"voice\": voice,\n", |
507 | 509 | " \"modalities\": [\"audio\", \"text\"],\n", |
508 | 510 | " \"input_audio_format\": \"pcm16\",\n", |
|
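For reference, the `session.update` message above serialises as below; the payload shape mirrors the diff (note that `model` must be a quoted dict key), and the field values are the ones assumed in this notebook:

```python
import json

# Build the session.update payload as a plain dict, then serialise it
# the same way ws.send(json.dumps(...)) would in the notebook.
payload = {
    "type": "session.update",
    "session": {
        "type": "realtime",
        "model": "gpt-realtime",
        "voice": "shimmer",
        "modalities": ["audio", "text"],
        "input_audio_format": "pcm16",
    },
}

encoded = json.dumps(payload)       # the string sent over the WebSocket
decoded = json.loads(encoded)       # round-trip check of the structure
print(decoded["session"]["model"])  # gpt-realtime
```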