diff --git a/README.md b/README.md
index c224e74..1458f12 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ AI dictation App for macOS (MVP scaffold).
 - Pass-2 finalize pass using `gpt-4o-transcribe` for better punctuation and stability.
 - Optional Pass-3 rewrite for cleaner English output with numeric/proper noun protection.
 - Auto-paste into the app that was frontmost when recording began.
-- Configurable behavior and models via `config.toml`.
+- Configurable behavior and models via Settings-backed `config.toml`.
 
 For the normative product contract, constraints, and gaps, see the
 [Runtime Spec](docs/spec/runtime.md).
@@ -37,8 +37,8 @@ V1 target is **macOS-first** and aligned to the English-only voice input design.
 - Scope: ✅ Native macOS mic capture + OpenAI model pipeline only.
 - Limitation: ✅ Linux/Windows build is intentionally disabled.
 - Limitation: ⚠️ Known gaps are documented in the
-  [Runtime Spec](docs/spec/runtime.md) (runtime action wiring, config write-through,
-  CPAL fallback robustness, and rollout cleanup items).
+  [Runtime Spec](docs/spec/runtime.md) (explicit microphone picker, CPAL fallback
+  robustness, app-rule authoring, and rollout cleanup items).
 
 ## Usage
 
@@ -104,13 +104,14 @@ realtime_target_rate_hz = 24000
 
 [openai]
 api_base_url = "https://api.openai.com/v1"
-realtime_model = "gpt-4o-mini-transcribe"
+realtime_model = "gpt-realtime-2"
 finalize_model = "gpt-4o-transcribe"
 rewrite_model = "gpt-5.2-mini"
 language = "en"
 
 [openai.realtime]
 noise_reduction = "near_field" # near_field | far_field | off
+transcription_model = "gpt-4o-mini-transcribe"
 
 [rewrite]
 enabled = true
@@ -130,14 +131,14 @@ First-run onboarding checklist:
 - Microphone permission in **System Settings → Privacy & Security → Microphone**.
 - Accessibility permission in **Privacy & Security → Accessibility** (for Cmd+V fallback).
 - Input Monitoring permission in **Privacy & Security → Input Monitoring** (for global hotkey hooks).
-- Voxit uses request buttons to guide you through the permission prompts in sequence (Microphone → Accessibility → Input Monitoring); grant each permission and re-check when prompted.
+- Voxit Settings includes shortcut buttons for the relevant macOS privacy panes; grant each permission and re-check before a real dictation run.
 - Verify paste flow after permission grant and restart the app if needed.
 
 For the full guided sequence, see [First Run](docs/runbook/first-run.md).
 
 Runtime configuration remains sourced from `config.toml`. The current Swift Settings
-window persists shell preferences in macOS `UserDefaults`; writing those settings back
-through the Rust config path is a tracked runtime gap.
+window persists shell and model preferences in macOS `UserDefaults` and writes
+supported preferences back to `config.toml` through the Rust host FFI.
 
 ### Interaction
 
@@ -147,8 +148,10 @@ through the Rust config path is a tracked runtime gap.
 - While listening: panel shows live draft text and committed segments.
 - Stop recording: toggle key again or release key in hold mode.
 - Finalize: Pass-2 runs automatically; rewrite runs by default unless disabled in settings.
-- Microphone input selection is persisted in config as `audio.input_device_id` and `audio.input_device_name`.
-- Refresh workflow: the picker list is refreshed at startup and via the **Refresh microphones** control before choosing from a list of input-capable devices.
+- Model choice: Settings exposes editable OpenAI model IDs for realtime voice,
+  realtime transcript, finalize, and rewrite passes.
+- The Swift Settings audio picker currently exposes only the system default microphone;
+  explicit `audio.input_device_id` values set in `config.toml` are still resolved by Rust.
 - Runtime fallback: if a saved explicit device id is unavailable, Voxit falls back to the system default input device and continues recording.
 - Paste behavior: by default paste rewritten text after finalize, or paste raw transcript via available controls.
 - Output target: text is pasted into the app that was frontmost when dictation started.
diff --git a/docs/decisions/contextual-voice-layer.md b/docs/decisions/contextual-voice-layer.md
index 77ac40b..47081a8 100644
--- a/docs/decisions/contextual-voice-layer.md
+++ b/docs/decisions/contextual-voice-layer.md
@@ -21,8 +21,8 @@ Consequences:
 - The main Voxit window is a control center for activity, app rules, profiles,
   glossary, prompt experiments, and debug/evaluation surfaces.
 - The Settings window stays separate and limited to app preferences such as startup,
-  shortcuts, microphone, permissions, account defaults, privacy, logging, and
-  notifications.
+  shortcuts, model choices, microphone, permissions, account defaults, privacy, logging,
+  and notifications.
 - Swift owns the native macOS presentation layer and UI glue. Rust owns durable product
   logic, context classification, prompt profile selection, voice session planning,
   output policy, and provider orchestration.
diff --git a/docs/reference/repository-layout.md b/docs/reference/repository-layout.md
index 004ff28..936b66f 100644
--- a/docs/reference/repository-layout.md
+++ b/docs/reference/repository-layout.md
@@ -16,8 +16,9 @@ files.
 ## Top-level surfaces
 
 - `native/macos-host/` holds the SwiftPM native macOS host. It owns platform UI
-  composition, the menu bar extra, the Voxit control-center window, the Settings
-  window, and links Rust through the host FFI static library.
+  composition, the menu bar extra, global hotkey observation, the floating recording
+  HUD, the Voxit control-center window, the Settings window, and links Rust through the
+  host FFI static library.
 - `packages/voxit-core/` holds the shared runtime logic, auth, OpenAI integration, and
   dictation pipeline code. Platform-neutral UI model types and contextual voice
   planning contracts also live here so hosts do not invent divergent state names,
diff --git a/docs/runbook/first-run.md b/docs/runbook/first-run.md
index 7b2919f..7604dd5 100644
--- a/docs/runbook/first-run.md
+++ b/docs/runbook/first-run.md
@@ -49,17 +49,17 @@ Verification:
 ## 4. Confirm runtime configuration
 
 - Open **Settings...** from the menu bar menu or press `Cmd+,` to confirm shell
-  preferences and permission shortcuts are available.
+  preferences, model choices, and permission shortcuts are available.
 - Check the config file at:
 
 ```text
 $HOME/Library/Application Support/voxit/config.toml
 ```
 
-- Confirm the default runtime hotkey and audio device settings look reasonable for the
-  machine.
-- If you need an explicit microphone, refresh the device list and select it before the
-  first real dictation run.
+- Confirm the default runtime hotkey, OpenAI model IDs, and system-default audio route
+  look reasonable for the machine.
+- If you need an explicit microphone before the Swift picker exposes one, set
+  `audio.input_device_id` and `audio.input_device_name` in `config.toml`.
 
 ## 5. Verify paste flow
 
diff --git a/docs/spec/contextual-voice.md b/docs/spec/contextual-voice.md
index 393dbeb..c16c9fe 100644
--- a/docs/spec/contextual-voice.md
+++ b/docs/spec/contextual-voice.md
@@ -167,7 +167,7 @@ Swift hosts own:
 
 - menu bar, HUD, main window, and Settings presentation
 - macOS-specific context capture
-- permission prompts and native controls
+- permission panes and native controls
 - rendering Rust-owned snapshots and session plans
 - user confirmation UX
 
diff --git a/docs/spec/runtime.md b/docs/spec/runtime.md
index 5a6f907..debc54b 100644
--- a/docs/spec/runtime.md
+++ b/docs/spec/runtime.md
@@ -93,14 +93,10 @@ State transitions:
 
 ### 4.2 Device picker lifecycle
 
-- On startup, the app refreshes available input-capable devices and caches the result.
-- A manual **Refresh microphones** action is available in the UI to repopulate the
-  picker.
-- Picker values map to:
-  - **System default** (`audio.input_device_id = 0`)
-  - an explicit input device id and name pair from a discovered device list
-- Selection changes persist `audio.input_device_name` and `audio.input_device_id` to
-  config.
+- The current Swift Settings audio picker exposes **System default**
+  (`audio.input_device_id = 0`).
+- Rust can resolve explicit `audio.input_device_id` and `audio.input_device_name` values
+  supplied through config.
 - If a configured device id is invalid or stale when starting recording, the runtime
   falls back to system default and reports fallback in status or logs.
 
@@ -108,10 +104,16 @@ State transitions:
 
 - For each chunk, send `input_audio_buffer.append` payload frames to OpenAI Realtime.
 - Realtime session must be configured with:
+  - `model`: `openai.realtime_model` (default `gpt-realtime-2`)
+  - `reasoning.effort`: the Rust-selected contextual voice plan effort
   - `audio.input.format`: `audio/pcm` with sample rate from config (default `24000`)
-  - `audio.input.noise_reduction`: configured profile (default `near_field`)
-  - `audio.input.transcription.model`: Pass1 model
+  - `audio.input.noise_reduction`: configured profile (default `near_field`) or `null`
+    when set to `off`
+  - `audio.input.transcription.model`: `openai.realtime.transcription_model` (default
+    `gpt-4o-mini-transcribe`)
+  - `audio.input.transcription.language`: `openai.language` (default `en`)
   - `audio.input.turn_detection.type`: `server_vad`
+  - `audio.input.turn_detection.create_response`: `false`
 - Realtime events consumed by the UI:
   - `conversation.item.input_audio_transcription.delta` (draft)
   - `conversation.item.input_audio_transcription.completed` (committed)
@@ -167,8 +169,10 @@ State transitions:
 
 - Hotkey chord handling:
   - supported mode switch: toggle or hold
-  - the menu command uses the configured `hotkey.chord` presentation
-  - system-wide hotkey capture is not active yet
+  - system-wide and app-local key monitors observe the configured `hotkey.chord`
+  - pressing the chord presents the non-activating floating recording HUD and starts
+    dictation without making Voxit the target-app context
+  - toggle mode stops on the next chord press; hold mode stops on hotkey release
 - Menu bar behavior:
   - `MenuBarExtra` exposes `Open Voxit` (`Cmd+O`), `Settings...` (`Cmd+,`),
     `Start Dictation`, `Stop Dictation`, `Refresh Status` (`Cmd+R`), and `Quit Voxit`
@@ -185,15 +189,14 @@ State transitions:
     controls
   - Voxit control-center window: activity, app rules, profiles, glossary, prompt lab,
     and debug/evaluation surfaces
-  - Settings window: app preferences, shortcuts, microphone, permissions, account
-    defaults, privacy, logging, and notifications
-- Onboarding checklist provides request actions for required macOS permissions. The UI
-  prompts permission requests in order:
-  - Microphone: probe-based request and retry loop when denied
-  - Accessibility: system prompt request plus re-check
-  - Input Monitoring: system prompt request plus re-check
-- Grant each permission in macOS Privacy & Security settings when prompted, then
-  re-check in Voxit before continuing.
+  - Settings window: app preferences, shortcuts, model choices, microphone,
+    permissions, account defaults, privacy, logging, and notifications
+- Settings provides shortcut actions for required macOS permission panes:
+  - Microphone
+  - Accessibility
+  - Input Monitoring
+- Grant each permission in macOS Privacy & Security settings, then re-check before
+  continuing to a real dictation run.
 - "Paste raw now" is always available when finalization or rewrite is active and should
   bypass Pass3.
 - The Control Center exposes the current focused context, selected profile, profile
@@ -217,7 +220,7 @@ Supported sections and keys:
   `audio.input_device_id`, `audio.realtime_target_rate_hz`
 - `openai.api_base_url`, `openai.realtime_model`, `openai.finalize_model`,
   `openai.rewrite_model`, `openai.language`
-- `openai.realtime.noise_reduction`
+- `openai.realtime.noise_reduction`, `openai.realtime.transcription_model`
 - `rewrite.enabled`, `rewrite.auto`, `rewrite.guard_numbers`,
   `rewrite.max_output_chars`, `rewrite.style`
 - `paste.lock_frontmost_app`, `paste.method`
@@ -233,7 +236,10 @@ On load:
 Current Swift Settings window:
 
 - persists shell preferences in macOS `UserDefaults`
-- writes supported preferences through the Rust host FFI into `config.toml`
+- exposes editable OpenAI model IDs for realtime voice, realtime transcript, finalize,
+  and rewrite passes
+- writes supported shell and model preferences through the Rust host FFI into
+  `config.toml`
 
 ## 11) CI and Release
 
@@ -253,10 +259,6 @@ Current Swift Settings window:
 
 ## 13) Known Gaps
 
-- System-wide global hotkey capture is not implemented yet; the configured shortcut is
-  currently a Swift menu command.
-- The native HUD does not yet render Pass1 realtime draft/committed transcript events;
-  it shows active profile/state plus raw and final output after Pass2/Pass3.
 - App-rule authoring is not implemented yet; users can refresh focus context and
   manually override the active built-in profile.
 - The Swift Settings audio picker still exposes only System Default even though Rust can
diff --git a/native/macos-host/Sources/VoxitHostBridge/HostFFI.swift b/native/macos-host/Sources/VoxitHostBridge/HostFFI.swift
index 0c9ec68..a92c048 100644
--- a/native/macos-host/Sources/VoxitHostBridge/HostFFI.swift
+++ b/native/macos-host/Sources/VoxitHostBridge/HostFFI.swift
@@ -70,6 +70,8 @@ public struct HostSnapshot: Equatable, Sendable {
   public var hasFocusedContext: Bool
   public var selectedTextPresent: Bool
   public var hasRawTranscript: Bool
+  public var hasPass1CommittedTranscript: Bool
+  public var hasPass1DraftTranscript: Bool
   public var hasFinalOutput: Bool
   public var hasError: Bool
   public var recordingDurationMS: UInt64
@@ -80,6 +82,8 @@ public struct HostSnapshot: Equatable, Sendable {
   public var focusedElementRole: String?
   public var promptProfileID: String?
   public var promptDirective: String?
+  public var pass1CommittedTranscript: String?
+  public var pass1DraftTranscript: String?
   public var rawTranscript: String?
   public var finalOutput: String?
   public var lastError: String?
@@ -100,6 +104,8 @@ public struct HostSnapshot: Equatable, Sendable {
     hasFocusedContext: Bool,
     selectedTextPresent: Bool,
     hasRawTranscript: Bool,
+    hasPass1CommittedTranscript: Bool,
+    hasPass1DraftTranscript: Bool,
     hasFinalOutput: Bool,
     hasError: Bool,
     recordingDurationMS: UInt64,
@@ -110,6 +116,8 @@ public struct HostSnapshot: Equatable, Sendable {
     focusedElementRole: String?,
     promptProfileID: String?,
     promptDirective: String?,
+    pass1CommittedTranscript: String?,
+    pass1DraftTranscript: String?,
     rawTranscript: String?,
     finalOutput: String?,
     lastError: String?,
@@ -129,6 +137,8 @@ public struct HostSnapshot: Equatable, Sendable {
     self.hasFocusedContext = hasFocusedContext
     self.selectedTextPresent = selectedTextPresent
     self.hasRawTranscript = hasRawTranscript
+    self.hasPass1CommittedTranscript = hasPass1CommittedTranscript
+    self.hasPass1DraftTranscript = hasPass1DraftTranscript
     self.hasFinalOutput = hasFinalOutput
     self.hasError = hasError
     self.recordingDurationMS = recordingDurationMS
@@ -139,6 +149,8 @@ public struct HostSnapshot: Equatable, Sendable {
     self.focusedElementRole = focusedElementRole
     self.promptProfileID = promptProfileID
     self.promptDirective = promptDirective
+    self.pass1CommittedTranscript = pass1CommittedTranscript
+    self.pass1DraftTranscript = pass1DraftTranscript
     self.rawTranscript = rawTranscript
     self.finalOutput = finalOutput
     self.lastError = lastError
@@ -277,6 +289,34 @@ public final class VoxitHostSession {
     return try currentSnapshot()
   }
 
+  public func saveModelPreferences(
+    realtimeModel: String,
+    realtimeTranscriptionModel: String,
+    finalizeModel: String,
+    rewriteModel: String
+  ) throws -> HostSnapshot {
+    try realtimeModel.withCString { realtime in
+      try realtimeTranscriptionModel.withCString { realtimeTranscription in
+        try finalizeModel.withCString { finalize in
+          try rewriteModel.withCString { rewrite in
+            try requireOk(
+              voxit_host_session_save_model_preferences(
+                handle,
+                realtime,
+                realtimeTranscription,
+                finalize,
+                rewrite
+              ),
+              context: "saving model preferences"
+            )
+          }
+        }
+      }
+    }
+
+    return try currentSnapshot()
+  }
+
   public func setProfileOverride(_ profileKind: PromptProfileKind) throws -> HostSnapshot {
     try requireOk(
       voxit_host_session_set_profile_override(handle, encode(promptProfileKind: profileKind)),
@@ -321,6 +361,8 @@ public final class VoxitHostSession {
       hasFocusedContext: snapshot.has_focused_context != 0,
       selectedTextPresent: snapshot.selected_text_present != 0,
       hasRawTranscript: snapshot.has_raw_transcript != 0,
+      hasPass1CommittedTranscript: snapshot.has_pass1_committed_transcript != 0,
+      hasPass1DraftTranscript: snapshot.has_pass1_draft_transcript != 0,
       hasFinalOutput: snapshot.has_final_output != 0,
       hasError: snapshot.has_error != 0,
       recordingDurationMS: snapshot.recording_duration_ms,
@@ -331,6 +373,8 @@ public final class VoxitHostSession {
       focusedElementRole: try copyString(field: VOXIT_HOST_STRING_FOCUSED_ELEMENT_ROLE),
       promptProfileID: try copyString(field: VOXIT_HOST_STRING_PROMPT_PROFILE_ID),
       promptDirective: try copyString(field: VOXIT_HOST_STRING_PROMPT_DIRECTIVE),
+      pass1CommittedTranscript: try copyString(field: VOXIT_HOST_STRING_PASS1_COMMITTED_TRANSCRIPT),
+      pass1DraftTranscript: try copyString(field: VOXIT_HOST_STRING_PASS1_DRAFT_TRANSCRIPT),
       rawTranscript: try copyString(field: VOXIT_HOST_STRING_RAW_TRANSCRIPT),
       finalOutput: try copyString(field: VOXIT_HOST_STRING_FINAL_OUTPUT),
       lastError: try copyString(field: VOXIT_HOST_STRING_LAST_ERROR),
diff --git a/native/macos-host/Sources/VoxitNativeHostKit/App/VoxitNativeHostApp.swift b/native/macos-host/Sources/VoxitNativeHostKit/App/VoxitNativeHostApp.swift
index c50795f..002f236 100644
--- a/native/macos-host/Sources/VoxitNativeHostKit/App/VoxitNativeHostApp.swift
+++ b/native/macos-host/Sources/VoxitNativeHostKit/App/VoxitNativeHostApp.swift
@@ -5,7 +5,9 @@ public struct VoxitNativeHostApp: App {
   @Environment(\.openWindow) private var openWindow
   @StateObject private var store = HostStore()
   @StateObject private var settingsStore = VoxitSettingsStore()
+  @StateObject private var hotkeyMonitor = GlobalHotkeyMonitor()
   @State private var settingsWindowController: VoxitSettingsWindowController?
+  @State private var recordingHUDWindowController: RecordingHUDWindowController?
 
   public init() {}
 
@@ -16,6 +18,7 @@ public struct VoxitNativeHostApp: App {
         .task {
           VoxitArtwork.applyApplicationIcon()
           configureSettingsSync()
+          configureHotkeyMonitor()
           await store.reload()
           await store.savePreferences(settingsStore.settings)
           await store.setGlossary(UserDefaults.standard.string(forKey: "glossaryTerms") ?? "")
@@ -36,10 +39,6 @@ public struct VoxitNativeHostApp: App {
         Button("Start Dictation") {
           startDictation()
         }
-        .keyboardShortcut(
-          settingsStore.settings.dictationHotkeyPresentation.swiftUIKeyEquivalent,
-          modifiers: settingsStore.settings.dictationHotkeyPresentation.swiftUIModifiers
-        )
 
         Button("Stop Dictation") {
           Task {
@@ -60,15 +59,6 @@ public struct VoxitNativeHostApp: App {
       }
     }
 
-    Window("Voxit Recording", id: "recording-hud") {
-      RecordingHUDView(store: store)
-        .task {
-          await store.reload()
-        }
-    }
-    .windowResizability(.contentSize)
-    .defaultPosition(.topTrailing)
-
     MenuBarExtra {
       Button("Open Voxit") {
         openWindow(id: "main")
@@ -79,10 +69,6 @@ public struct VoxitNativeHostApp: App {
       Button("Start Dictation") {
         startDictation()
       }
-      .keyboardShortcut(
-        settingsStore.settings.dictationHotkeyPresentation.swiftUIKeyEquivalent,
-        modifiers: settingsStore.settings.dictationHotkeyPresentation.swiftUIModifiers
-      )
 
       Button("Stop Dictation") {
         Task {
@@ -121,20 +107,79 @@ public struct VoxitNativeHostApp: App {
   @MainActor
   private func configureSettingsSync() {
     settingsStore.setSyncHandler { settings in
-      Task {
+      Task { @MainActor in
         await store.savePreferences(settings)
+        configureHotkeyMonitor()
       }
     }
   }
 
+  @MainActor
+  private func configureHotkeyMonitor() {
+    hotkeyMonitor.configure(
+      settings: settingsStore.settings,
+      keyDown: {
+        handleHotkeyDown()
+      },
+      keyUp: {
+        handleHotkeyUp()
+      }
+    )
+  }
+
   @MainActor
   private func startDictation() {
-    openWindow(id: "recording-hud")
+    presentRecordingHUD()
     Task {
       await store.startDictation()
     }
   }
 
+  @MainActor
+  private func handleHotkeyDown() {
+    presentRecordingHUD()
+
+    if settingsStore.settings.hotkeyMode == .hold {
+      guard store.snapshot?.dictationState != .listening else {
+        return
+      }
+
+      Task {
+        await store.startDictation()
+      }
+    } else if store.snapshot?.dictationState == .listening {
+      Task {
+        await store.stopDictation()
+      }
+    } else {
+      Task {
+        await store.startDictation()
+      }
+    }
+  }
+
+  @MainActor
+  private func handleHotkeyUp() {
+    guard settingsStore.settings.hotkeyMode == .hold,
+      store.snapshot?.dictationState == .listening
+    else {
+      return
+    }
+
+    Task {
+      await store.stopDictation()
+    }
+  }
+
+  @MainActor
+  private func presentRecordingHUD() {
+    if recordingHUDWindowController == nil {
+      recordingHUDWindowController = RecordingHUDWindowController(store: store)
+    }
+
+    recordingHUDWindowController?.present()
+  }
+
   @MainActor
   private func presentSettings() {
     if settingsWindowController == nil {
diff --git a/native/macos-host/Sources/VoxitNativeHostKit/Stores/HostStore.swift b/native/macos-host/Sources/VoxitNativeHostKit/Stores/HostStore.swift
index 8bcac9d..f3eae04 100644
--- a/native/macos-host/Sources/VoxitNativeHostKit/Stores/HostStore.swift
+++ b/native/macos-host/Sources/VoxitNativeHostKit/Stores/HostStore.swift
@@ -7,9 +7,14 @@ public final class HostStore: ObservableObject {
   @Published public private(set) var errorMessage: String?
 
   private var session: VoxitHostSession?
+  private var pollingTask: Task<Void, Never>?
 
   public init() {}
 
+  deinit {
+    pollingTask?.cancel()
+  }
+
   public func reload() async {
     do {
       let session = try currentSession()
@@ -35,12 +40,14 @@ public final class HostStore: ObservableObject {
       let session = try currentSession()
       snapshot = try session.startDictation()
       errorMessage = snapshot?.lastError
+      startRealtimePolling()
     } catch {
       errorMessage = String(describing: error)
     }
   }
 
   public func stopDictation() async {
+    pollingTask?.cancel()
     do {
       let session = try currentSession()
       snapshot = try session.stopDictation()
@@ -70,6 +77,12 @@ public final class HostStore: ObservableObject {
         pasteAfterTranscription: settings.pasteAfterTranscription,
         rewriteAfterTranscription: settings.rewriteAfterTranscription
       )
+      snapshot = try session.saveModelPreferences(
+        realtimeModel: settings.realtimeModel,
+        realtimeTranscriptionModel: settings.realtimeTranscriptionModel,
+        finalizeModel: settings.finalizeModel,
+        rewriteModel: settings.rewriteModel
+      )
       errorMessage = snapshot?.lastError
     } catch {
       errorMessage = String(describing: error)
@@ -110,4 +123,19 @@ public final class HostStore: ObservableObject {
 
     return session
   }
+
+  private func startRealtimePolling() {
+    pollingTask?.cancel()
+    pollingTask = Task { [weak self] in
+      while Task.isCancelled == false {
+        guard (try? await Task.sleep(nanoseconds: 250_000_000)) != nil else { break }
+        await self?.reload()
+
+        let state = self?.snapshot?.dictationState
+        if state != .listening {
+          break
+        }
+      }
+    }
+  }
 }
diff --git a/native/macos-host/Sources/VoxitNativeHostKit/Stores/VoxitSettingsStore.swift b/native/macos-host/Sources/VoxitNativeHostKit/Stores/VoxitSettingsStore.swift
index 12b22d9..c41412f 100644
--- a/native/macos-host/Sources/VoxitNativeHostKit/Stores/VoxitSettingsStore.swift
+++ b/native/macos-host/Sources/VoxitNativeHostKit/Stores/VoxitSettingsStore.swift
@@ -16,6 +16,10 @@ final class VoxitSettingsStore: ObservableObject {
     static let rewriteAfterTranscription = "rewriteAfterTranscription"
     static let authRoute = "authRoute"
     static let audioInput = "audioInput"
+    static let realtimeModel = "realtimeModel"
+    static let realtimeTranscriptionModel = "realtimeTranscriptionModel"
+    static let finalizeModel = "finalizeModel"
+    static let rewriteModel = "rewriteModel"
   }
 
   private let defaults: UserDefaults
@@ -43,7 +47,16 @@ final class VoxitSettingsStore: ObservableObject {
         ?? baseSettings.authRoute,
       audioInput: VoxitAudioInputPreference(
         rawValue: defaults.string(forKey: DefaultsKey.audioInput) ?? "")
-        ?? baseSettings.audioInput
+        ?? baseSettings.audioInput,
+      realtimeModel: defaults.string(forKey: DefaultsKey.realtimeModel)
+        ?? baseSettings.realtimeModel,
+      realtimeTranscriptionModel: defaults.string(
+        forKey: DefaultsKey.realtimeTranscriptionModel)
+        ?? baseSettings.realtimeTranscriptionModel,
+      finalizeModel: defaults.string(forKey: DefaultsKey.finalizeModel)
+        ?? baseSettings.finalizeModel,
+      rewriteModel: defaults.string(forKey: DefaultsKey.rewriteModel)
+        ?? baseSettings.rewriteModel
     )
     self.settings = settings.sanitized()
     Self.persist(self.settings, into: defaults)
@@ -71,6 +84,13 @@ final class VoxitSettingsStore: ObservableObject {
     defaults.set(settings.rewriteAfterTranscription, forKey: DefaultsKey.rewriteAfterTranscription)
     defaults.set(settings.authRoute.rawValue, forKey: DefaultsKey.authRoute)
     defaults.set(settings.audioInput.rawValue, forKey: DefaultsKey.audioInput)
+    defaults.set(settings.realtimeModel, forKey: DefaultsKey.realtimeModel)
+    defaults.set(
+      settings.realtimeTranscriptionModel,
+      forKey: DefaultsKey.realtimeTranscriptionModel
+    )
+    defaults.set(settings.finalizeModel, forKey: DefaultsKey.finalizeModel)
+    defaults.set(settings.rewriteModel, forKey: DefaultsKey.rewriteModel)
   }
 }
 
@@ -82,6 +102,10 @@ struct VoxitSettings: Equatable {
   var rewriteAfterTranscription: Bool
   var authRoute: VoxitAuthRoutePreference
   var audioInput: VoxitAudioInputPreference
+  var realtimeModel: String
+  var realtimeTranscriptionModel: String
+  var finalizeModel: String
+  var rewriteModel: String
 
   static var defaults: Self {
     Self(
@@ -91,7 +115,11 @@ struct VoxitSettings: Equatable {
       pasteAfterTranscription: true,
       rewriteAfterTranscription: true,
       authRoute: .chatGPTDeviceCode,
-      audioInput: .systemDefault
+      audioInput: .systemDefault,
+      realtimeModel: "gpt-realtime-2",
+      realtimeTranscriptionModel: "gpt-4o-mini-transcribe",
+      finalizeModel: "gpt-4o-transcribe",
+      rewriteModel: "gpt-5.2-mini"
     )
   }
 
@@ -104,6 +132,22 @@ struct VoxitSettings: Equatable {
     copy.dictationHotkey =
       Self.dictationHotkeyPresentation(for: copy.dictationHotkey)
       .displayTitle
+    copy.realtimeModel = Self.sanitizedModelID(
+      copy.realtimeModel,
+      fallback: Self.defaults.realtimeModel
+    )
+    copy.realtimeTranscriptionModel = Self.sanitizedModelID(
+      copy.realtimeTranscriptionModel,
+      fallback: Self.defaults.realtimeTranscriptionModel
+    )
+    copy.finalizeModel = Self.sanitizedModelID(
+      copy.finalizeModel,
+      fallback: Self.defaults.finalizeModel
+    )
+    copy.rewriteModel = Self.sanitizedModelID(
+      copy.rewriteModel,
+      fallback: Self.defaults.rewriteModel
+    )
     return copy
   }
 
@@ -192,6 +236,12 @@ struct VoxitSettings: Equatable {
       .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
       .filter { $0.isEmpty == false }
   }
+
+  private static func sanitizedModelID(_ raw: String, fallback: String) -> String {
+    let modelID = raw.trimmingCharacters(in: .whitespacesAndNewlines)
+
+    return modelID.isEmpty ? fallback : modelID
+  }
 }
 
 struct VoxitHotkeyPresentation: Equatable {
diff --git a/native/macos-host/Sources/VoxitNativeHostKit/Support/GlobalHotkeyMonitor.swift b/native/macos-host/Sources/VoxitNativeHostKit/Support/GlobalHotkeyMonitor.swift
new file mode 100644
index 0000000..b304aad
--- /dev/null
+++ b/native/macos-host/Sources/VoxitNativeHostKit/Support/GlobalHotkeyMonitor.swift
@@ -0,0 +1,118 @@
+import AppKit
+
+@MainActor
+final class GlobalHotkeyMonitor: ObservableObject {
+  private enum Phase: Sendable {
+    case down
+    case up
+  }
+
+  private struct EventPayload: Sendable {
+    let characters: String
+    let modifierRawValue: UInt
+    let phase: Phase
+  }
+
+  private static let relevantModifiers: NSEvent.ModifierFlags = [
+    .command, .control, .option, .shift,
+  ]
+
+  private var globalKeyDownMonitor: Any?
+  private var globalKeyUpMonitor: Any?
+  private var localKeyDownMonitor: Any?
+  private var localKeyUpMonitor: Any?
+  private var presentation = VoxitSettings.defaults.dictationHotkeyPresentation
+  private var hotkeyMode = VoxitHotkeyModePreference.toggle
+  private var isPressed = false
+  private var keyDownHandler: (() -> Void)?
+  private var keyUpHandler: (() -> Void)?
+
+  init() {
+    installMonitors()
+  }
+
+  func configure(
+    settings: VoxitSettings,
+    keyDown: @escaping () -> Void,
+    keyUp: @escaping () -> Void
+  ) {
+    presentation = settings.dictationHotkeyPresentation
+    hotkeyMode = settings.hotkeyMode
+    keyDownHandler = keyDown
+    keyUpHandler = keyUp
+  }
+
+  private func installMonitors() {
+    globalKeyDownMonitor = NSEvent.addGlobalMonitorForEvents(matching: .keyDown) {
+      [weak self] event in
+      Self.enqueue(event: event, phase: .down, target: self)
+    }
+    globalKeyUpMonitor = NSEvent.addGlobalMonitorForEvents(matching: .keyUp) { [weak self] event in
+      Self.enqueue(event: event, phase: .up, target: self)
+    }
+    localKeyDownMonitor = NSEvent.addLocalMonitorForEvents(matching: .keyDown) {
+      [weak self] event in
+      Self.enqueue(event: event, phase: .down, target: self)
+      return event
+    }
+    localKeyUpMonitor = NSEvent.addLocalMonitorForEvents(matching: .keyUp) { [weak self] event in
+      Self.enqueue(event: event, phase: .up, target: self)
+      return event
+    }
+  }
+
+  private func handle(_ payload: EventPayload) {
+    guard matchesHotkey(payload) else {
+      return
+    }
+
+    switch payload.phase {
+    case .down:
+      guard isPressed == false else {
+        return
+      }
+      isPressed = true
+      keyDownHandler?()
+    case .up:
+      guard isPressed else {
+        return
+      }
+      isPressed = false
+      if hotkeyMode == .hold {
+        keyUpHandler?()
+      }
+    }
+  }
+
+  private func matchesHotkey(_ payload: EventPayload) -> Bool {
+    let modifiers = NSEvent.ModifierFlags(rawValue: payload.modifierRawValue)
+      .intersection(Self.relevantModifiers)
+    let expectedModifiers = presentation.modifierMask.intersection(Self.relevantModifiers)
+
+    guard modifiers == expectedModifiers else {
+      return false
+    }
+
+    return normalizedKey(payload.characters) == normalizedKey(presentation.keyEquivalent)
+  }
+
+  private func normalizedKey(_ value: String) -> String {
+    if value == " " {
+      return "space"
+    }
+
+    return value.lowercased()
+  }
+
+  private static func enqueue(event: NSEvent, phase: Phase, target: GlobalHotkeyMonitor?) {
+    let payload = EventPayload(
+      characters: event.charactersIgnoringModifiers ?? "",
+      modifierRawValue: event.modifierFlags.rawValue,
+      phase: phase
+    )
+
+    Task { @MainActor in
+      target?.handle(payload)
+    }
+  }
+}
diff --git a/native/macos-host/Sources/VoxitNativeHostKit/Support/Labels.swift b/native/macos-host/Sources/VoxitNativeHostKit/Support/Labels.swift
index 0b5d148..4f18542 100644
--- a/native/macos-host/Sources/VoxitNativeHostKit/Support/Labels.swift
+++ b/native/macos-host/Sources/VoxitNativeHostKit/Support/Labels.swift
@@ -1,3 +1,4 @@
+import Foundation
 import VoxitHostBridge
 
 extension AuthMethod {
@@ -135,4 +136,20 @@ extension HostSnapshot {
     }
     return "No Runs"
   }
+
+  var pass1TranscriptPreview: String? {
+    let committed = pass1CommittedTranscript?.trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
+    let draft = pass1DraftTranscript?.trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
+
+    switch (committed.isEmpty, draft.isEmpty) {
+    case (false, false):
+      return "\(committed) \(draft)"
+    case (false, true):
+      return committed
+    case (true, false):
+      return draft
+    case (true, true):
+      return nil
+    }
+  }
 }
diff --git a/native/macos-host/Sources/VoxitNativeHostKit/Support/RecordingHUDWindowController.swift b/native/macos-host/Sources/VoxitNativeHostKit/Support/RecordingHUDWindowController.swift
new file mode 100644
index 0000000..6cf753c
--- /dev/null
+++ b/native/macos-host/Sources/VoxitNativeHostKit/Support/RecordingHUDWindowController.swift
@@ -0,0 +1,57 @@
+import AppKit
+import SwiftUI
+
+@MainActor
+final class RecordingHUDWindowController: NSWindowController, NSWindowDelegate {
+  private let store: HostStore
+
+  init(store: HostStore) {
+    self.store = store
+
+    let contentRect = NSRect(x: 0, y: 0, width: 380, height: 220)
+    let panel = NSPanel(
+      contentRect: contentRect,
+      styleMask: [.titled, .closable, .hudWindow, .nonactivatingPanel, .fullSizeContentView],
+      backing: .buffered,
+      defer: false
+    )
+    panel.title = "Voxit Recording"
+    panel.titleVisibility = .hidden
+    panel.titlebarAppearsTransparent = true
+    panel.isReleasedWhenClosed = false
+    panel.hidesOnDeactivate = false
+    panel.level = .floating
+    panel.collectionBehavior = [.canJoinAllSpaces, .moveToActiveSpace, .transient]
+
+    super.init(window: panel)
+
+    panel.delegate = self
+    panel.contentViewController = NSHostingController(rootView: RecordingHUDView(store: store))
+  }
+
+  @available(*, unavailable)
+  required init?(coder: NSCoder) {
+    fatalError("init(coder:) has not been implemented")
+  }
+
+  func present() {
+    guard let window else {
+      return
+    }
+
+    positionNearTopTrailing(window)
+    showWindow(nil)
+    window.orderFrontRegardless()
+  }
+
+  private func positionNearTopTrailing(_ window: NSWindow) {
+    let visibleFrame = NSScreen.main?.visibleFrame ?? NSRect(x: 0, y: 0, width: 1_280, height: 720)
+    let frame = window.frame
+    let origin = NSPoint(
+      x: visibleFrame.maxX - frame.width - 24,
+      y: visibleFrame.maxY - frame.height - 24
+    )
+
+    window.setFrameOrigin(origin)
+  }
+}
diff --git a/native/macos-host/Sources/VoxitNativeHostKit/Views/DetailView.swift b/native/macos-host/Sources/VoxitNativeHostKit/Views/DetailView.swift
index 750003d..22a1d9b 100644
--- a/native/macos-host/Sources/VoxitNativeHostKit/Views/DetailView.swift
+++ b/native/macos-host/Sources/VoxitNativeHostKit/Views/DetailView.swift
@@ -136,6 +136,9 @@ private struct ActivityDetail: View {
     if let rawTranscript = snapshot?.rawTranscript {
       TranscriptPreview(title: "Raw Transcript", text: rawTranscript)
     }
+    if let pass1Transcript = snapshot?.pass1TranscriptPreview {
+      TranscriptPreview(title: "Realtime Draft", text: pass1Transcript)
+    }
   }
 }
 
diff --git a/native/macos-host/Sources/VoxitNativeHostKit/Views/RecordingHUDView.swift b/native/macos-host/Sources/VoxitNativeHostKit/Views/RecordingHUDView.swift
index de41c80..2f8578c 100644
--- a/native/macos-host/Sources/VoxitNativeHostKit/Views/RecordingHUDView.swift
+++ b/native/macos-host/Sources/VoxitNativeHostKit/Views/RecordingHUDView.swift
@@ -55,6 +55,9 @@ struct RecordingHUDView: View {
     if let rawTranscript = store.snapshot?.rawTranscript {
       return rawTranscript
     }
+    if let pass1Transcript = store.snapshot?.pass1TranscriptPreview {
+      return pass1Transcript
+    }
     if let error = store.snapshot?.lastError {
       return error
     }
diff --git a/native/macos-host/Sources/VoxitNativeHostKit/Views/VoxitSettingsView.swift b/native/macos-host/Sources/VoxitNativeHostKit/Views/VoxitSettingsView.swift
index 11b8bed..d4a9462 100644
--- a/native/macos-host/Sources/VoxitNativeHostKit/Views/VoxitSettingsView.swift
+++ b/native/macos-host/Sources/VoxitNativeHostKit/Views/VoxitSettingsView.swift
@@ -3,8 +3,8 @@ import SwiftUI
 
 enum VoxitSettingsWindowMetrics {
   static let width: CGFloat = 620
-  static let minHeight: CGFloat = 336
-  static let idealHeight: CGFloat = 396
+  static let minHeight: CGFloat = 420
+  static let idealHeight: CGFloat = 520
   static let cornerRadius: CGFloat = 18
 }
 
@@ -40,6 +40,10 @@ final class VoxitSettingsViewModel: ObservableObject {
     openPrivacySettings(query: "Privacy_Accessibility")
   }
 
+  func openInputMonitoringSettings() {
+    openPrivacySettings(query: "Privacy_ListenEvent")
+  }
+
   private func openPrivacySettings(query: String) {
     let modernURLString =
       "x-apple.systempreferences:com.apple.settings.PrivacySecurity.extension?\(query)"
@@ -88,6 +92,7 @@ struct VoxitSettingsView: View {
 private enum VoxitSettingsSection: String, CaseIterable, Identifiable {
   case general
   case dictation
+  case models
   case audio
   case permissions
   case about
@@ -100,6 +105,8 @@ private enum VoxitSettingsSection: String, CaseIterable, Identifiable {
       return "General"
     case .dictation:
       return "Dictation"
+    case .models:
+      return "Models"
     case .audio:
       return "Audio"
     case .permissions:
@@ -115,6 +122,8 @@ private enum VoxitSettingsSection: String, CaseIterable, Identifiable {
       return "Startup"
     case .dictation:
       return "Shortcut"
+    case .models:
+      return "OpenAI"
     case .audio:
       return "Input"
     case .permissions:
@@ -130,6 +139,8 @@ private enum VoxitSettingsSection: String, CaseIterable, Identifiable {
       return "switch.2"
     case .dictation:
       return "waveform"
+    case .models:
+      return "cpu"
     case .audio:
       return "mic"
     case .permissions:
@@ -141,7 +152,7 @@ private enum VoxitSettingsSection: String, CaseIterable, Identifiable {
 
   var allowsRestoreDefaults: Bool {
     switch self {
-    case .general, .dictation, .audio:
+    case .general, .dictation, .models, .audio:
       return true
     case .permissions, .about:
       return false
@@ -259,6 +270,8 @@ private struct SettingsDashboard: View {
           GeneralSettingsPane(model: model)
         case .dictation:
           DictationSettingsPane(model: model)
+        case .models:
+          ModelsSettingsPane(model: model)
         case .audio:
           AudioSettingsPane(model: model)
         case .permissions:
@@ -360,6 +373,130 @@ private struct DictationSettingsPane: View {
   }
 }
 
+private struct ModelsSettingsPane: View {
+  @ObservedObject var model: VoxitSettingsViewModel
+
+  var body: some View {
+    SettingsPanel {
+      ModelSettingRow(
+        title: "Realtime voice",
+        presets: ["gpt-realtime-2"],
+        modelID: modelBinding(\.realtimeModel)
+      )
+      ModelSettingRow(
+        title: "Realtime text",
+        presets: ["gpt-4o-mini-transcribe", "gpt-4o-transcribe"],
+        modelID: modelBinding(\.realtimeTranscriptionModel)
+      )
+      ModelSettingRow(
+        title: "Finalize",
+        presets: ["gpt-4o-transcribe", "gpt-4o-mini-transcribe"],
+        modelID: modelBinding(\.finalizeModel)
+      )
+      ModelSettingRow(
+        title: "Rewrite",
+        presets: ["gpt-5.2-mini", "gpt-5.5", "gpt-5.4", "gpt-5.4-mini"],
+        modelID: modelBinding(\.rewriteModel)
+      )
+    }
+  }
+
+  private func modelBinding(_ keyPath: WritableKeyPath<VoxitSettings, String>) -> Binding<String> {
+    Binding(
+      get: { model.settings[keyPath: keyPath] },
+      set: { value in
+        model.update { $0[keyPath: keyPath] = value }
+      }
+    )
+  }
+}
+
+private struct ModelSettingRow: View {
+  private static let customPresetTag = "__voxit_custom_model__"
+
+  let title: String
+  let presets: [String]
+  @Binding var modelID: String
+  @State private var draftModelID: String
+
+  init(title: String, presets: [String], modelID: Binding<String>) {
+    self.title = title
+    self.presets = presets
+    self._modelID = modelID
+    self._draftModelID = State(initialValue: modelID.wrappedValue)
+  }
+
+  var body: some View {
+    VStack(alignment: .leading, spacing: 6) {
+      HStack(alignment: .firstTextBaseline, spacing: 8) {
+        Text(title)
+          .frame(width: 116, alignment: .leading)
+        Picker("", selection: presetBinding) {
+          ForEach(presets, id: \.self) { preset in
+            Text(preset).tag(preset)
+          }
+          Text("Custom").tag(Self.customPresetTag)
+        }
+        .labelsHidden()
+        .pickerStyle(.menu)
+        .frame(width: 210, alignment: .leading)
+      }
+
+      HStack(spacing: 6) {
+        TextField("Model ID", text: $draftModelID)
+          .textFieldStyle(.roundedBorder)
+          .onSubmit(commitDraft)
+        Button("Apply", action: commitDraft)
+          .disabled(canApplyDraft == false)
+      }
+      .padding(.leading, 124)
+    }
+    .onChange(of: modelID) { _, newValue in
+      if draftModelID != newValue {
+        draftModelID = newValue
+      }
+    }
+  }
+
+  private var presetBinding: Binding<String> {
+    Binding(
+      get: {
+        presets.contains(modelID) ? modelID : Self.customPresetTag
+      },
+      set: { value in
+        guard value != Self.customPresetTag else {
+          return
+        }
+        draftModelID = value
+        modelID = value
+      }
+    )
+  }
+
+  private var canApplyDraft: Bool {
+    let sanitized = sanitizedDraftModelID
+
+    return sanitized.isEmpty == false && sanitized != modelID
+  }
+
+  private var sanitizedDraftModelID: String {
+    draftModelID.trimmingCharacters(in: .whitespacesAndNewlines)
+  }
+
+  private func commitDraft() {
+    let sanitized = sanitizedDraftModelID
+
+    guard sanitized.isEmpty == false else {
+      draftModelID = modelID
+
+      return
+    }
+
+    draftModelID = sanitized
+    modelID = sanitized
+  }
+}
+
 private struct AudioSettingsPane: View {
   @ObservedObject var model: VoxitSettingsViewModel
 
@@ -405,6 +542,13 @@ private struct PermissionsSettingsPane: View {
           model.openAccessibilitySettings()
         }
       }
+
+      HStack {
+        LabeledContent("Input Monitoring", value: "Shortcut")
+        Button("Open") {
+          model.openInputMonitoringSettings()
+        }
+      }
     }
   }
 }
diff --git a/packages/voxit-audio/src/lib.rs b/packages/voxit-audio/src/lib.rs
index d084ee3..93f2c29 100644
--- a/packages/voxit-audio/src/lib.rs
+++ b/packages/voxit-audio/src/lib.rs
@@ -100,6 +100,8 @@ impl Recorder {
 	pub fn start_with_stream(
 		stream_tx: Option<SyncSender<AudioChunk>>,
 		selection: &InputDeviceSelection,
+		target_sample_rate_hz: u32,
+		target_channels: u16,
 	) -> Result<Self, String> {
 		let use_voice_processing = selection.requested_device_id.is_none();
 		let io_type =
@@ -121,11 +123,15 @@ impl Recorder {
 		let input_format =
 			audio_unit.input_stream_format().map_err(|err: Error| err.to_string())?;
 		let _ = audio_unit.uninitialize();
-		let (sample_rate, channels) = configure_input_format(
-			&mut audio_unit,
-			input_format.sample_rate,
-			input_format.channels,
-		)?;
+		let requested_sample_rate = if target_sample_rate_hz == 0 {
+			input_format.sample_rate
+		} else {
+			f64::from(target_sample_rate_hz)
+		};
+		let requested_channels =
+			if target_channels == 0 { input_format.channels } else { u32::from(target_channels) };
+		let (sample_rate, channels) =
+			configure_input_format(&mut audio_unit, requested_sample_rate, requested_channels)?;
 		let recording = Arc::new(Mutex::new(Vec::<i16>::new()));
 		let recording_cb = Arc::clone(&recording);
 		let callback_tx = stream_tx.clone();
@@ -265,10 +271,13 @@ pub struct InputDeviceSelection {
 pub fn start_recording_with_stream(
 	chunk_capacity: usize,
 	preferred_device_id: Option<u32>,
+	target_sample_rate_hz: u32,
+	target_channels: u16,
 ) -> Result<(Recorder, AudioChunkReceiver, InputDeviceSelection), String> {
 	let (tx, rx) = mpsc::sync_channel(chunk_capacity);
 	let selection = resolve_input_device(preferred_device_id)?;
-	let recorder = Recorder::start_with_stream(Some(tx), &selection)?;
+	let recorder =
+		Recorder::start_with_stream(Some(tx), &selection, target_sample_rate_hz, target_channels)?;
 
 	Ok((recorder, rx, selection))
 }
@@ -313,9 +322,13 @@ pub fn list_input_devices() -> Result<Vec<InputDevice>, String> {
 pub fn start_recording_with_stream(
 	_chnk_capacity: usize,
 	_preferred_device_id: Option<u32>,
+	_target_sample_rate_hz: u32,
+	_target_channels: u16,
 ) -> Result<(Recorder, AudioChunkReceiver, InputDeviceSelection), String> {
 	let _ = _chnk_capacity;
 	let _ = _preferred_device_id;
+	let _ = _target_sample_rate_hz;
+	let _ = _target_channels;
 
 	Err("recording is only supported on macOS in this build".to_string())
 }
diff --git a/packages/voxit-core/src/config.rs b/packages/voxit-core/src/config.rs
index a191b4e..ff551b2 100644
--- a/packages/voxit-core/src/config.rs
+++ b/packages/voxit-core/src/config.rs
@@ -84,7 +84,7 @@ impl Default for OpenAiConfig {
 	fn default() -> Self {
 		Self {
 			api_base_url: "https://api.openai.com/v1".to_string(),
-			realtime_model: "gpt-4o-mini-transcribe".to_string(),
+			realtime_model: "gpt-realtime-2".to_string(),
 			finalize_model: "gpt-4o-transcribe".to_string(),
 			rewrite_model: "gpt-5.2-mini".to_string(),
 			language: "en".to_string(),
@@ -98,10 +98,15 @@ impl Default for OpenAiConfig {
 pub struct OpenAiRealtimeConfig {
 	/// Optional noise reduction profile.
 	pub noise_reduction: String,
+	/// Input-audio transcription model used for realtime Pass1 transcript events.
+	pub transcription_model: String,
 }
 impl Default for OpenAiRealtimeConfig {
 	fn default() -> Self {
-		Self { noise_reduction: "near_field".to_string() }
+		Self {
+			noise_reduction: "near_field".to_string(),
+			transcription_model: "gpt-4o-mini-transcribe".to_string(),
+		}
 	}
 }
 
@@ -410,6 +415,13 @@ fn apply_openai_config(
 				config.openai.realtime.noise_reduction = v.to_string();
 			}
 		},
+		([openai_section, realtime_section], "transcription_model")
+			if openai_section == "openai" && realtime_section == "realtime" =>
+		{
+			if let Some(v) = value.str.clone() {
+				config.openai.realtime.transcription_model = v;
+			}
+		},
 		_ => return false,
 	}
 
@@ -535,8 +547,11 @@ fn serialize_toml(config: &Config) -> String {
 	output.push_str(&format!("rewrite_model = \"{}\"\n", config.openai.rewrite_model));
 	output.push_str(&format!("language = \"{}\"\n\n", config.openai.language));
 	output.push_str("[openai.realtime]\n");
-	output
-		.push_str(&format!("noise_reduction = \"{}\"\n\n", config.openai.realtime.noise_reduction));
+	output.push_str(&format!("noise_reduction = \"{}\"\n", config.openai.realtime.noise_reduction));
+	output.push_str(&format!(
+		"transcription_model = \"{}\"\n\n",
+		config.openai.realtime.transcription_model
+	));
 	output.push_str("[rewrite]\n");
 	output.push_str(&format!("enabled = {}\n", config.rewrite.enabled));
 	output.push_str(&format!("auto = {}\n", config.rewrite.auto));
@@ -576,13 +591,14 @@ mode = "hold"
 
 [openai]
 api_base_url = "https://api.openai.com/v1"
-realtime_model = "gpt-4o-mini-transcribe"
+realtime_model = "gpt-realtime-2"
 finalize_model = "gpt-4o-transcribe"
 rewrite_model = "gpt-5.2-mini"
 language = "en"
 
 [openai.realtime]
 noise_reduction = "near_field"
+transcription_model = "gpt-4o-mini-transcribe"
 
 [rewrite]
 enabled = false
@@ -605,6 +621,7 @@ method = "clipboard_cmd_v"
 		assert_eq!(parsed.audio.input_device_name, "USB Mic");
 		assert_eq!(parsed.audio.input_device_id, 123);
 		assert_eq!(parsed.openai.realtime.noise_reduction, "near_field");
+		assert_eq!(parsed.openai.realtime.transcription_model, "gpt-4o-mini-transcribe");
 	}
 
 	#[test]
@@ -617,5 +634,6 @@ method = "clipboard_cmd_v"
 		assert_eq!(parsed.paste.method, "clipboard_cmd_v");
 		assert_eq!(parsed.audio.input_device_id, 0);
 		assert!(parsed.audio.input_device_name.is_empty());
+		assert_eq!(parsed.openai.realtime_model, "gpt-realtime-2");
 	}
 }
diff --git a/packages/voxit-core/src/realtime.rs b/packages/voxit-core/src/realtime.rs
index 01c0a11..bb53e3a 100644
--- a/packages/voxit-core/src/realtime.rs
+++ b/packages/voxit-core/src/realtime.rs
@@ -27,18 +27,30 @@ pub const REALTIME_ENDPOINT: &str = "wss://api.openai.com/v1/realtime";
 pub struct RealtimeSessionConfig {
 	/// API model id.
 	pub model: String,
+	/// Input-audio transcription model id.
+	pub transcription_model: String,
+	/// Input language hint for realtime transcription.
+	pub language: String,
 	/// Input sample rate expected by OpenAI (`24000` by plan).
 	pub sample_rate_hz: u32,
 	/// `near_field` | `far_field` | `off`.
 	pub noise_reduction: String,
+	/// Session instructions for contextual voice behavior.
+	pub instructions: String,
+	/// Realtime reasoning effort for models that support it.
+	pub reasoning_effort: String,
 }
 impl Default for RealtimeSessionConfig {
 	/// Default session configuration for English pass1 streaming.
 	fn default() -> Self {
 		Self {
-			model: "gpt-4o-mini-transcribe".to_string(),
+			model: "gpt-realtime-2".to_string(),
+			transcription_model: "gpt-4o-mini-transcribe".to_string(),
+			language: "en".to_string(),
 			sample_rate_hz: 24_000,
 			noise_reduction: "near_field".to_string(),
+			instructions: "Transcribe the user's dictation as text for the target app.".to_string(),
+			reasoning_effort: "minimal".to_string(),
 		}
 	}
 }
@@ -139,8 +151,13 @@ fn start_realtime_session_impl(
 	event_tx: Sender<RealtimeEvent>,
 ) -> Result<RealtimeSession, RealtimeError> {
 	let (stop_tx, stop_rx) = mpsc::channel::<()>();
+	let worker_event_tx = event_tx.clone();
 	let worker = thread::spawn(move || {
-		let _ = run_realtime_worker(api_key, account_id, config, chunk_rx, event_tx, stop_rx);
+		if let Err(err) =
+			run_realtime_worker(api_key, account_id, config, chunk_rx, event_tx, stop_rx)
+		{
+			let _ = worker_event_tx.send(RealtimeEvent::StreamError(err.to_string()));
+		}
 	});
 
 	Ok(RealtimeSession { stop_tx: Some(stop_tx), worker: Some(worker) })
@@ -159,6 +176,7 @@ fn run_realtime_worker(
 		reason: format!("failed to create tokio runtime: {err}"),
 	})?;
 	let endpoint = format!("{REALTIME_ENDPOINT}?model={}", config.model);
+	let session_update = realtime_session_update(&config);
 
 	rt.block_on(async move {
 		let mut builder = Request::builder()
@@ -174,22 +192,6 @@ fn run_realtime_worker(
 		let request = builder.body(()).map_err(|err| RealtimeError::RuntimeError {
 			reason: format!("invalid realtime request: {err}"),
 		})?;
-		let session_update = serde_json::json!({
-				"type": "session.update",
-				"session": {
-					"audio": {
-						"input": {
-						"format": {
-							"type": "audio/pcm",
-							"rate": config.sample_rate_hz,
-						},
-						"noise_reduction": { "type": config.noise_reduction },
-						"transcription": { "model": config.model },
-						"turn_detection": { "type": "server_vad" },
-					},
-				},
-			}
-		});
 		let (mut ws, _) = tokio_tungstenite::connect_async(request).await.map_err(|err| {
 			RealtimeError::RuntimeError {
 				reason: format!("realtime websocket connect failed: {err}"),
@@ -271,6 +273,37 @@ fn run_realtime_worker(
 	Ok(())
 }
 
+fn realtime_session_update(config: &RealtimeSessionConfig) -> Value {
+	serde_json::json!({
+		"type": "session.update",
+		"session": {
+			"type": "realtime",
+			"instructions": config.instructions,
+			"output_modalities": ["text"],
+			"reasoning": {
+				"effort": config.reasoning_effort,
+			},
+			"audio": {
+				"input": {
+					"format": {
+						"type": "audio/pcm",
+						"rate": config.sample_rate_hz,
+					},
+					"noise_reduction": noise_reduction_payload(&config.noise_reduction),
+					"transcription": {
+						"model": config.transcription_model,
+						"language": config.language,
+					},
+					"turn_detection": {
+						"type": "server_vad",
+						"create_response": false,
+					},
+				},
+			},
+		}
+	})
+}
+
 fn chunk_to_base64(samples: &[i16]) -> String {
 	let mut bytes = Vec::with_capacity(samples.len() * 2);
 
@@ -281,6 +314,10 @@ fn chunk_to_base64(samples: &[i16]) -> String {
 	STANDARD.encode(bytes)
 }
 
+fn noise_reduction_payload(profile: &str) -> Value {
+	if profile == "off" { Value::Null } else { serde_json::json!({ "type": profile }) }
+}
+
 fn parse_realtime_frame(body: &str) -> Result<Option<ParsedFrame>, RealtimeError> {
 	let value: Value = serde_json::from_str(body).map_err(|err| RealtimeError::RuntimeError {
 		reason: format!("invalid realtime frame json: {err}"),
@@ -368,6 +405,14 @@ mod tests {
 		let config = RealtimeSessionConfig::default();
 
 		assert!(config.model.contains("gpt"));
+		assert_eq!(config.transcription_model, "gpt-4o-mini-transcribe");
+		assert_eq!(config.language, "en");
 		assert_eq!(config.sample_rate_hz, 24_000);
+		assert_eq!(config.reasoning_effort, "minimal");
+	}
+
+	#[test]
+	fn noise_reduction_off_maps_to_null() {
+		assert!(realtime::noise_reduction_payload("off").is_null());
 	}
 }
diff --git a/packages/voxit-host-ffi/include/voxit_host_ffi.h b/packages/voxit-host-ffi/include/voxit_host_ffi.h
index de2b6b8..dced076 100644
--- a/packages/voxit-host-ffi/include/voxit_host_ffi.h
+++ b/packages/voxit-host-ffi/include/voxit_host_ffi.h
@@ -7,7 +7,7 @@
 extern "C" {
 #endif
 
-#define VOXIT_HOST_FFI_ABI_VERSION 4u
+#define VOXIT_HOST_FFI_ABI_VERSION 6u
 
 typedef struct VoxitHostSessionHandle VoxitHostSessionHandle;
 
@@ -86,6 +86,8 @@ typedef enum VoxitHostStringField {
 	VOXIT_HOST_STRING_RAW_TRANSCRIPT = 7,
 	VOXIT_HOST_STRING_FINAL_OUTPUT = 8,
 	VOXIT_HOST_STRING_LAST_ERROR = 9,
+	VOXIT_HOST_STRING_PASS1_COMMITTED_TRANSCRIPT = 10,
+	VOXIT_HOST_STRING_PASS1_DRAFT_TRANSCRIPT = 11,
 } VoxitHostStringField;
 
 typedef struct VoxitHostConfig {
@@ -111,6 +113,8 @@ typedef struct VoxitHostSnapshot {
 	uint8_t has_focused_context;
 	uint8_t selected_text_present;
 	uint8_t has_raw_transcript;
+	uint8_t has_pass1_committed_transcript;
+	uint8_t has_pass1_draft_transcript;
 	uint8_t has_final_output;
 	uint8_t has_error;
 	uint64_t recording_duration_ms;
@@ -132,6 +136,13 @@ enum VoxitStatus voxit_host_session_save_preferences(
 	struct VoxitHostPreferences preferences,
 	const char *hotkey_chord
 );
+enum VoxitStatus voxit_host_session_save_model_preferences(
+	VoxitHostSessionHandle *handle,
+	const char *realtime_model,
+	const char *realtime_transcription_model,
+	const char *finalize_model,
+	const char *rewrite_model
+);
 enum VoxitStatus voxit_host_session_set_profile_override(
 	VoxitHostSessionHandle *handle,
 	enum VoxitPromptProfileKind profile_kind
diff --git a/packages/voxit-host-ffi/src/lib.rs b/packages/voxit-host-ffi/src/lib.rs
index 3ee054d..1bcf8a0 100644
--- a/packages/voxit-host-ffi/src/lib.rs
+++ b/packages/voxit-host-ffi/src/lib.rs
@@ -4,16 +4,19 @@
 //! This gives the Swift host a stable Rust-owned model without moving audio, auth, or
 //! inference orchestration across FFI before those boundaries are ready.
 
+#[cfg(target_os = "macos")] use std::sync::mpsc;
 use std::{
 	ffi::{CStr, c_char},
 	ptr::{self, NonNull},
+	sync::mpsc::{Receiver, TryRecvError},
 };
 
 #[cfg(target_os = "macos")] use voxit_audio::Recorder;
+#[cfg(target_os = "macos")] use voxit_core::RealtimeSessionConfig;
 #[cfg(target_os = "macos")] use voxit_core::RewriteSettings;
 use voxit_core::{
 	self, Config, ContextualVoiceRouter, FocusedAppContext, NativeHostSnapshot, PlatformHost,
-	VoiceSessionPlan,
+	RealtimeEvent, RealtimeSession, TranscriptAssembler, VoiceSessionPlan,
 	contextual::{
 		PromptProfileKind, VoiceInteractionTier, VoiceOutputPolicy, VoiceReasoningEffort,
 	},
@@ -22,7 +25,7 @@ use voxit_core::{
 #[cfg(target_os = "macos")] use voxit_macos::TargetApp;
 
 /// ABI version exported by the thin C host bridge.
-pub const VOXIT_HOST_FFI_ABI_VERSION: u32 = 4;
+pub const VOXIT_HOST_FFI_ABI_VERSION: u32 = 6;
 
 /// Opaque session handle owned by the native host through the C ABI.
 pub struct VoxitHostSessionHandle {
@@ -32,10 +35,15 @@ pub struct VoxitHostSessionHandle {
 	profile_override: Option<PromptProfileKind>,
 	voice_plan: VoiceSessionPlan,
 	glossary_terms: String,
+	transcript_assembler: TranscriptAssembler,
+	pass1_committed_transcript: String,
+	pass1_draft_transcript: String,
 	last_raw_transcript: String,
 	last_final_output: String,
 	last_error: String,
 	recording_duration_ms: u64,
+	realtime_session: Option<RealtimeSession>,
+	realtime_event_rx: Option<Receiver<RealtimeEvent>>,
 	#[cfg(target_os = "macos")]
 	recorder: Option<Recorder>,
 	#[cfg(target_os = "macos")]
@@ -194,6 +202,10 @@ pub enum VoxitHostStringField {
 	FinalOutput = 8,
 	/// Latest user-actionable error.
 	LastError = 9,
+	/// Latest committed realtime Pass1 transcript.
+	Pass1CommittedTranscript = 10,
+	/// Latest in-flight realtime Pass1 draft transcript.
+	Pass1DraftTranscript = 11,
 }
 
 /// FFI-safe session configuration.
@@ -244,6 +256,10 @@ pub struct VoxitHostSnapshot {
 	pub selected_text_present: u8,
 	/// Non-zero when a raw Pass2 transcript is available.
 	pub has_raw_transcript: u8,
+	/// Non-zero when realtime Pass1 committed transcript text is available.
+	pub has_pass1_committed_transcript: u8,
+	/// Non-zero when realtime Pass1 draft transcript text is available.
+	pub has_pass1_draft_transcript: u8,
 	/// Non-zero when a final output is available.
 	pub has_final_output: u8,
 	/// Non-zero when the last command failed or produced a warning.
@@ -273,6 +289,8 @@ impl Default for VoxitHostSnapshot {
 			has_focused_context: 0,
 			selected_text_present: 0,
 			has_raw_transcript: 0,
+			has_pass1_committed_transcript: 0,
+			has_pass1_draft_transcript: 0,
 			has_final_output: 0,
 			has_error: 0,
 			recording_duration_ms: 0,
@@ -311,10 +329,15 @@ pub extern "C" fn voxit_host_session_create(
 		profile_override: None,
 		voice_plan,
 		glossary_terms: String::new(),
+		transcript_assembler: TranscriptAssembler::new(),
+		pass1_committed_transcript: String::new(),
+		pass1_draft_transcript: String::new(),
 		last_raw_transcript: String::new(),
 		last_final_output: String::new(),
 		last_error: String::new(),
 		recording_duration_ms: 0,
+		realtime_session: None,
+		realtime_event_rx: None,
 		#[cfg(target_os = "macos")]
 		recorder: None,
 		#[cfg(target_os = "macos")]
@@ -331,7 +354,9 @@ pub extern "C" fn voxit_host_session_create(
 #[unsafe(no_mangle)]
 pub unsafe extern "C" fn voxit_host_session_destroy(handle: *mut VoxitHostSessionHandle) {
 	if let Some(handle) = NonNull::new(handle) {
-		unsafe { drop(Box::from_raw(handle.as_ptr())) };
+		let mut handle = unsafe { Box::from_raw(handle.as_ptr()) };
+
+		stop_realtime_preview(&mut handle);
 	}
 }
 
@@ -346,13 +371,16 @@ pub unsafe extern "C" fn voxit_host_session_copy_snapshot(
 	handle: *mut VoxitHostSessionHandle,
 	out: *mut VoxitHostSnapshot,
 ) -> VoxitStatus {
-	let Some(handle) = NonNull::new(handle) else {
+	let Some(mut handle) = NonNull::new(handle) else {
 		return VoxitStatus::NullHandle;
 	};
 	let Some(out) = NonNull::new(out) else {
 		return VoxitStatus::NullOutput;
 	};
-	let handle_ref = unsafe { handle.as_ref() };
+	let handle_ref = unsafe { handle.as_mut() };
+
+	drain_realtime_events(handle_ref);
+
 	let snapshot = &handle_ref.snapshot;
 	let focused_context = &handle_ref.focused_context;
 	let voice_plan = &handle_ref.voice_plan;
@@ -469,6 +497,54 @@ pub unsafe extern "C" fn voxit_host_session_save_preferences(
 	save_preferences(handle, preferences, hotkey_chord)
 }
 
+/// Saves OpenAI model preferences through the Rust-owned config file.
+///
+/// # Safety
+///
+/// `handle` must be a valid pointer returned by [`voxit_host_session_create`]. Model
+/// pointers must point to null-terminated UTF-8 strings.
+#[unsafe(no_mangle)]
+pub unsafe extern "C" fn voxit_host_session_save_model_preferences(
+	handle: *mut VoxitHostSessionHandle,
+	realtime_model: *const c_char,
+	realtime_transcription_model: *const c_char,
+	finalize_model: *const c_char,
+	rewrite_model: *const c_char,
+) -> VoxitStatus {
+	let Some(mut handle) = NonNull::new(handle) else {
+		return VoxitStatus::NullHandle;
+	};
+	let handle = unsafe { handle.as_mut() };
+	let realtime_model = match read_required_c_string(handle, realtime_model, "realtime model") {
+		Ok(value) => value,
+		Err(status) => return status,
+	};
+	let realtime_transcription_model = match read_required_c_string(
+		handle,
+		realtime_transcription_model,
+		"realtime transcription model",
+	) {
+		Ok(value) => value,
+		Err(status) => return status,
+	};
+	let finalize_model = match read_required_c_string(handle, finalize_model, "finalize model") {
+		Ok(value) => value,
+		Err(status) => return status,
+	};
+	let rewrite_model = match read_required_c_string(handle, rewrite_model, "rewrite model") {
+		Ok(value) => value,
+		Err(status) => return status,
+	};
+
+	save_model_preferences(
+		handle,
+		realtime_model,
+		realtime_transcription_model,
+		finalize_model,
+		rewrite_model,
+	)
+}
+
 /// Sets a manual prompt-profile override for the current host session.
 ///
 /// # Safety
@@ -555,7 +631,7 @@ pub unsafe extern "C" fn voxit_host_session_copy_string(
 	out: *mut c_char,
 	out_len: usize,
 ) -> VoxitStatus {
-	let Some(handle) = NonNull::new(handle) else {
+	let Some(mut handle) = NonNull::new(handle) else {
 		return VoxitStatus::NullHandle;
 	};
 	let Some(out) = NonNull::new(out) else {
@@ -566,7 +642,10 @@ pub unsafe extern "C" fn voxit_host_session_copy_string(
 		return VoxitStatus::InvalidInput;
 	}
 
-	let handle = unsafe { handle.as_ref() };
+	let handle = unsafe { handle.as_mut() };
+
+	drain_realtime_events(handle);
+
 	let value = string_field_value(handle, field);
 
 	write_c_string(out, out_len, value)
@@ -588,8 +667,31 @@ fn start_dictation(handle: &mut VoxitHostSessionHandle) -> VoxitStatus {
 		let preferred_device_id = (handle.config.audio.input_device_id != 0)
 			.then_some(handle.config.audio.input_device_id);
 
-		match voxit_audio::start_recording_with_stream(64, preferred_device_id) {
-			Ok((recorder, _chunk_rx, selection)) => {
+		match voxit_audio::start_recording_with_stream(
+			64,
+			preferred_device_id,
+			handle.config.audio.realtime_target_rate_hz,
+			1,
+		) {
+			Ok((recorder, chunk_rx, selection)) => {
+				let (event_tx, event_rx) = mpsc::channel();
+
+				match voxit_core::start_realtime_session(
+					realtime_session_config(handle),
+					chunk_rx,
+					event_tx,
+				) {
+					Ok(session) => {
+						handle.realtime_session = Some(session);
+						handle.realtime_event_rx = Some(event_rx);
+					},
+					Err(err) => {
+						handle.realtime_session = None;
+						handle.realtime_event_rx = None;
+						handle.last_error = format!("realtime preview unavailable: {err}");
+					},
+				}
+
 				handle.recorder = Some(recorder);
 				handle.snapshot.dictation_state = DictationSurfaceState::Listening;
 				handle.recording_duration_ms = 0;
@@ -636,6 +738,7 @@ fn stop_dictation(handle: &mut VoxitHostSessionHandle) -> VoxitStatus {
 			Err(err) => {
 				handle.snapshot.dictation_state = DictationSurfaceState::Done;
 
+				stop_realtime_preview(handle);
 				set_error(handle, format!("failed to stop recording: {err}"));
 
 				return VoxitStatus::Ok;
@@ -644,16 +747,28 @@ fn stop_dictation(handle: &mut VoxitHostSessionHandle) -> VoxitStatus {
 
 		handle.recording_duration_ms = recording.duration_ms;
 
+		stop_realtime_preview(handle);
+		drain_realtime_events(handle);
+
 		let (raw_transcript, _) =
 			match voxit_core::transcribe_only(&recording.wav, &handle.config.openai.finalize_model)
 			{
 				Ok(result) => result,
 				Err(err) => {
-					handle.snapshot.dictation_state = DictationSurfaceState::Done;
+					let fallback = realtime_transcript_text(handle);
+
+					if fallback.is_empty() {
+						handle.snapshot.dictation_state = DictationSurfaceState::Done;
 
-					set_error(handle, format!("transcription failed: {err}"));
+						set_error(handle, format!("transcription failed: {err}"));
 
-					return VoxitStatus::Ok;
+						return VoxitStatus::Ok;
+					}
+
+					handle.last_error =
+						format!("transcription failed; using realtime transcript: {err}");
+
+					(fallback, 0)
 				},
 			};
 
@@ -767,7 +882,33 @@ fn save_preferences(
 	VoxitStatus::Ok
 }
 
+fn save_model_preferences(
+	handle: &mut VoxitHostSessionHandle,
+	realtime_model: String,
+	realtime_transcription_model: String,
+	finalize_model: String,
+	rewrite_model: String,
+) -> VoxitStatus {
+	handle.config.openai.realtime_model = realtime_model;
+	handle.config.openai.realtime.transcription_model = realtime_transcription_model;
+	handle.config.openai.finalize_model = finalize_model;
+	handle.config.openai.rewrite_model = rewrite_model;
+
+	if let Err(err) = handle.config.save() {
+		set_error(handle, format!("failed to save config: {err}"));
+	} else {
+		handle.last_error.clear();
+	}
+
+	VoxitStatus::Ok
+}
+
 fn clear_run_output(handle: &mut VoxitHostSessionHandle) {
+	stop_realtime_preview(handle);
+
+	handle.transcript_assembler.reset();
+	handle.pass1_committed_transcript.clear();
+	handle.pass1_draft_transcript.clear();
 	handle.last_raw_transcript.clear();
 	handle.last_final_output.clear();
 	handle.last_error.clear();
@@ -775,10 +916,131 @@ fn clear_run_output(handle: &mut VoxitHostSessionHandle) {
 	handle.recording_duration_ms = 0;
 }
 
+#[cfg(target_os = "macos")]
+fn realtime_session_config(handle: &VoxitHostSessionHandle) -> RealtimeSessionConfig {
+	RealtimeSessionConfig {
+		model: handle.config.openai.realtime_model.clone(),
+		transcription_model: handle.config.openai.realtime.transcription_model.clone(),
+		language: handle.config.openai.language.clone(),
+		sample_rate_hz: handle.config.audio.realtime_target_rate_hz,
+		noise_reduction: handle.config.openai.realtime.noise_reduction.clone(),
+		instructions: realtime_session_instructions(handle),
+		reasoning_effort: reasoning_effort_value(handle.voice_plan.reasoning_effort).to_string(),
+	}
+}
+
+#[cfg(target_os = "macos")]
+fn realtime_session_instructions(handle: &VoxitHostSessionHandle) -> String {
+	format!(
+		"You are Voxit, a contextual voice input layer. Listen to the user's dictation for the focused target app and keep any response text suitable for insertion or preview.\n\
+		Active profile: {profile_title} ({profile_id}).\n\
+		Profile direction: {prompt_directive}\n\
+		Output policy: {output_policy}.\n\
+		Do not claim that app actions or shell commands have already run.",
+		profile_title = handle.voice_plan.profile_title,
+		profile_id = handle.voice_plan.profile_id,
+		prompt_directive = handle.voice_plan.prompt_directive,
+		output_policy = output_policy_value(handle.voice_plan.output_policy),
+	)
+}
+
+#[cfg(target_os = "macos")]
+fn reasoning_effort_value(effort: VoiceReasoningEffort) -> &'static str {
+	match effort {
+		VoiceReasoningEffort::Minimal => "minimal",
+		VoiceReasoningEffort::Low => "low",
+		VoiceReasoningEffort::Medium => "medium",
+		VoiceReasoningEffort::High => "high",
+	}
+}
+
+#[cfg(target_os = "macos")]
+fn output_policy_value(policy: VoiceOutputPolicy) -> &'static str {
+	match policy {
+		VoiceOutputPolicy::InsertText => "insert_text",
+		VoiceOutputPolicy::PreviewBeforeInsert => "preview_before_insert",
+		VoiceOutputPolicy::ConfirmBeforeAction => "confirm_before_action",
+	}
+}
+
+fn stop_realtime_preview(handle: &mut VoxitHostSessionHandle) {
+	if let Some(session) = handle.realtime_session.take() {
+		session.stop();
+	}
+}
+
+fn drain_realtime_events(handle: &mut VoxitHostSessionHandle) {
+	loop {
+		let event = match handle.realtime_event_rx.as_ref().map(Receiver::try_recv) {
+			Some(Ok(event)) => event,
+			Some(Err(TryRecvError::Empty)) | None => break,
+			Some(Err(TryRecvError::Disconnected)) => {
+				handle.realtime_event_rx = None;
+
+				break;
+			},
+		};
+
+		match event {
+			RealtimeEvent::Draft(event) | RealtimeEvent::Committed(event) => {
+				handle.transcript_assembler.apply(event);
+			},
+			RealtimeEvent::StreamError(reason) => {
+				handle.last_error = reason;
+			},
+		}
+	}
+
+	let transcript = handle.transcript_assembler.snapshot();
+
+	handle.pass1_committed_transcript = transcript.committed;
+	handle.pass1_draft_transcript = transcript.draft;
+}
+
+#[cfg(target_os = "macos")]
+fn realtime_transcript_text(handle: &VoxitHostSessionHandle) -> String {
+	let committed = handle.pass1_committed_transcript.trim();
+	let draft = handle.pass1_draft_transcript.trim();
+
+	match (committed.is_empty(), draft.is_empty()) {
+		(false, false) => format!("{committed} {draft}"),
+		(false, true) => committed.to_string(),
+		(true, false) => draft.to_string(),
+		(true, true) => String::new(),
+	}
+}
+
 fn set_error(handle: &mut VoxitHostSessionHandle, message: impl Into<String>) {
 	handle.last_error = message.into();
 }
 
+fn read_required_c_string(
+	handle: &mut VoxitHostSessionHandle,
+	value: *const c_char,
+	label: &str,
+) -> Result<String, VoxitStatus> {
+	let Some(value) = NonNull::new(value.cast_mut()) else {
+		set_error(handle, format!("{label} is missing"));
+
+		return Err(VoxitStatus::InvalidInput);
+	};
+	let value = unsafe { CStr::from_ptr(value.as_ptr()) };
+	let Ok(value) = value.to_str() else {
+		set_error(handle, format!("{label} is not valid UTF-8"));
+
+		return Err(VoxitStatus::InvalidInput);
+	};
+	let value = value.trim();
+
+	if value.is_empty() {
+		set_error(handle, format!("{label} cannot be empty"));
+
+		return Err(VoxitStatus::Ok);
+	}
+
+	Ok(value.to_string())
+}
+
 #[cfg(target_os = "macos")]
 fn rewrite_settings(handle: &VoxitHostSessionHandle) -> RewriteSettings {
 	RewriteSettings {
@@ -815,6 +1077,8 @@ fn encode_snapshot(
 		has_focused_context: 0,
 		selected_text_present: 0,
 		has_raw_transcript: 0,
+		has_pass1_committed_transcript: 0,
+		has_pass1_draft_transcript: 0,
 		has_final_output: 0,
 		has_error: 0,
 		recording_duration_ms: 0,
@@ -836,6 +1100,9 @@ fn encode_snapshot_with_context(
 	encoded.has_focused_context = u8::from(!focused_context.is_empty());
 	encoded.selected_text_present = u8::from(focused_context.selected_text_present);
 	encoded.has_raw_transcript = u8::from(!handle.last_raw_transcript.is_empty());
+	encoded.has_pass1_committed_transcript =
+		u8::from(!handle.pass1_committed_transcript.is_empty());
+	encoded.has_pass1_draft_transcript = u8::from(!handle.pass1_draft_transcript.is_empty());
 	encoded.has_final_output = u8::from(!handle.last_final_output.is_empty());
 	encoded.has_error = u8::from(!handle.last_error.is_empty());
 	encoded.recording_duration_ms = handle.recording_duration_ms;
@@ -946,6 +1213,8 @@ fn string_field_value(handle: &VoxitHostSessionHandle, field: VoxitHostStringFie
 		VoxitHostStringField::RawTranscript => &handle.last_raw_transcript,
 		VoxitHostStringField::FinalOutput => &handle.last_final_output,
 		VoxitHostStringField::LastError => &handle.last_error,
+		VoxitHostStringField::Pass1CommittedTranscript => &handle.pass1_committed_transcript,
+		VoxitHostStringField::Pass1DraftTranscript => &handle.pass1_draft_transcript,
 	}
 }
 
@@ -1020,6 +1289,8 @@ mod tests {
 		assert_eq!(snapshot.rewrite_enabled, 1);
 		assert_eq!(snapshot.has_focused_context, 0);
 		assert_eq!(snapshot.selected_text_present, 0);
+		assert_eq!(snapshot.has_pass1_committed_transcript, 0);
+		assert_eq!(snapshot.has_pass1_draft_transcript, 0);
 		assert_eq!(snapshot.prompt_profile_kind, VoxitPromptProfileKind::FastDictation);
 		assert_eq!(snapshot.voice_tier, VoxitVoiceInteractionTier::FastDictation);
 		assert_eq!(snapshot.reasoning_effort, VoxitVoiceReasoningEffort::Minimal);