Improve dictation startup and dictionary training#496
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce8185fbf8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| self.isTrainingRecording = true | ||
|
|
||
| await self.asr.start(forDictionaryTraining: true) |
There was a problem hiding this comment.
Defer stop until training capture has actually started
When a user clicks Record and then clicks Stop while asr.start(forDictionaryTraining:) is still starting the audio engine, isTrainingRecording is already true, so stopTrainingSample() calls asr.stop() while ASRService.isRunning is still false; that stop returns early, and the pending start can then finish with isRunning/dictionary capture active while this view has already cleared its recording state and disabled the Stop button. Please either prevent stopping until start returns or queue a deferred stop like the hotkey release path does.
Useful? React with 👍 / 👎.
| let captureCallbackStartedAt = Date().timeIntervalSince1970 | ||
| onCaptureStarted?() |
There was a problem hiding this comment.
Play the start cue before enabling capture
For users with transcription sounds enabled on a speaker/mic setup, invoking onCaptureStarted here plays the start sound after audioCapturePipeline.setRecordingEnabled(true) and after the input tap has been installed, so the cue can be appended to the dictation buffer and contaminate the first transcription. Keep recording disabled until after the cue, or play the cue before the tap begins capturing.
Useful? React with 👍 / 👎.
| self.appBench("capture_context_start") | ||
| self.captureRecordingContext() |
There was a problem hiding this comment.
Capture the target app before awaiting ASR startup
In the hotkey dictation path, captureRecordingContext() now runs only after the awaited ASR startup above, which can take noticeable time on a cold audio route/model path. If the user switches focus during that startup window, recordingTargetPID/recordingAppInfo are captured from the wrong app, so per-app AI routing and the eventual paste/focus restore can target the wrong application; capture this context before awaiting asr.start() as the old flow did.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b269532ada
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if matchingShortcut { | ||
| self.prewarmPrimaryDictationShortcutIfNeeded(reason: "primary dictation modifier prefix down") |
There was a problem hiding this comment.
Avoid starting the mic on modifier prefixes
When a primary dictation shortcut uses a common modifier chord such as ⌘Space, this flagsChanged path runs as soon as the modifier portion alone matches. prewarmCaptureEngineForFastStart starts the AVAudioEngine and installs the input tap, so merely holding ⌘ for an unrelated shortcut opens the microphone until another key/release cancels it or the prewarm expires, causing privacy indicators and Bluetooth headset call-mode side effects without dictation intent. Please wait for the full shortcut, or add a delayed/less invasive prewarm, before starting the capture engine.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2d09afe124
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| try self.setupEngineTap() | ||
| DebugLogger.shared.debug("✅ Engine tap setup complete", source: "ASRService") | ||
| let captureCallbackStartedAt = Date().timeIntervalSince1970 | ||
| await beforeCaptureEnabled?() |
There was a problem hiding this comment.
Honor cancel while the start-sound gate is pending
When transcription sounds are enabled, this callback can sleep for up to a second before isRunning is set. During that window the cancel shortcut path only stops ASR when asrService.isRunning is already true, so pressing Escape/cancel while the cue is playing can clear the UI state but then return here and enable capture anyway. Please make startup cancellable across this await or set/check a pending-cancel state before enabling capture.
Useful? React with 👍 / 👎.
| player.volume = desiredVolume | ||
| } | ||
| player.play() | ||
| guard player.play() else { return 0 } |
There was a problem hiding this comment.
Restore system volume when playback fails
With independent transcription sound volume enabled, the code has already saved and changed the system output volume before reaching this guard. If AVAudioPlayer.play() returns false (for example during an output-route interruption), this new early return skips the restore block below and leaves the user's system volume at the temporary cue volume. Please restore savedSystemVolume on this failure path before returning.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5631029dd5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| guard !self.asrService.isDictionaryTrainingCaptureActive else { | ||
| DebugLogger.shared.debug("Ignoring \(label) - dictionary training capture is active", source: "GlobalHotkeyManager") | ||
| return false |
There was a problem hiding this comment.
Apply the training guard to cancel hotkeys
The new dictionary-training guard only protects the normal trigger/stop helpers, but the cancel shortcut path above bypasses it and calls stopWithoutTranscription() directly whenever ASR is running. If the user presses the configured cancel key while recording a training sample, ASR is stopped behind CustomDictionaryView, leaving its isTrainingRecording state set until the next Stop click only queues a pending stop against an ASR service that is no longer running. Please route cancel through the same dictionary-training guard or notify the training UI to own the cancellation.
Useful? React with 👍 / 👎.
| if self.isEngineWarmForFastRestart || self.engineStorage != nil { | ||
| self.tearDownCaptureEngine(reason: "audio route changed while warm: \(reason)", releaseAsync: true) | ||
| self.scheduleIdleCaptureEnginePreparation(reason: "audio route changed while idle: \(reason)") |
There was a problem hiding this comment.
Ignore route teardown while startup is in flight
This idle-route branch also runs while start() is awaiting the start-sound/media-pause gate: isRunning is still false but engineStorage already points at the just-started engine. If a default-device or AVAudioEngineConfigurationChange notification is delivered during that window, this tears down the engine and schedules idle preparation, then start() resumes and marks isRunning with no live tap/engine to capture audio. Please skip this teardown when isStarting is true, or abort/retry the in-flight start.
Useful? React with 👍 / 👎.
| } else { | ||
| self.scheduleFastRestartWarmEngineShutdown(reason: "recording stopped") | ||
| } |
There was a problem hiding this comment.
Release the microphone when dictation stops
For every normal dictation stop this branch keeps the existing AVAudioEngine running for the 8-second fast-restart window instead of tearing it down. Even though recordingEnabled is false, the input tap and running engine still hold the microphone device, so users see the mic remain active and Bluetooth headsets can stay in call/HFP mode after they stopped recording. Please release the live input device on stop, or make any warm restart path avoid holding the running microphone.
Useful? React with 👍 / 👎.
Description
Adds faster dictation capture startup/prewarm behavior and a guided Custom Dictionary training flow for replacement words.
Type of Change
Related Issues
None.
Testing
swiftlint --strict --config .swiftlint.yml Sourcesswiftformat --config .swiftformat Sourcessh build_with_FI_incremental.shNotes
Screenshots / Video
Not attached.