Describe the bug
The recording overlay and start chime can indicate that dictation is ready before audio capture is actually stable. When I start speaking immediately after the hotkey/UI cue, the first words are sometimes missing from the transcript even though the rest of the dictation is accurate.
This is especially confusing because the black recording overlay appears immediately, which suggests the app is already listening. In practice, debug logs show AVAudioEngine.start() returning before a startup AVAudioEngineConfigurationChange route recovery finishes and before audio samples have accumulated.
To Reproduce
Steps to reproduce the behavior:
- Enable dictation start sounds.
- Trigger dictation with the global hotkey.
- Start speaking as soon as the overlay/chime indicates recording.
- Observe that the first words can be dropped, while later speech is transcribed correctly.
Expected behavior
The app should only play the start chime once capture is actually ready to receive audio, or the UI should show a distinct "starting" state until the capture graph is stable. The overlay/chime should not imply that speech is being captured while startup route recovery is still pending.
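To make the "starting" state concrete, here is a minimal sketch of how the overlay could separate "dictation requested" from "capture confirmed". All type and method names (`OverlayPhase`, `OverlayModel`, `captureBecameReady`) are hypothetical and not taken from the app's code, and the chime is represented by a counter rather than real audio playback.

```swift
// Hypothetical overlay phases; none of these names come from the app's code.
enum OverlayPhase: Equatable {
    case hidden
    case starting   // hotkey pressed, capture graph not yet stable
    case recording  // capture confirmed ready; the start chime belongs here
}

struct OverlayModel {
    private(set) var phase: OverlayPhase = .hidden
    private(set) var chimeCount = 0

    // Called when the hotkey triggers dictation.
    mutating func dictationRequested() {
        if phase == .hidden { phase = .starting }
    }

    // Called once capture is known to be ready to receive audio.
    mutating func captureBecameReady() {
        guard phase == .starting else { return }
        phase = .recording
        chimeCount += 1  // stand-in for playing the start sound exactly once
    }

    // Called when dictation stops or is cancelled.
    mutating func dictationEnded() {
        phase = .hidden
    }
}
```

With this shape, the overlay can still appear immediately, but in a visually distinct `starting` phase, and the chime only fires on the transition to `recording`.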
Screenshots
Not applicable.
Environment:
- macOS Version: macOS 26.5.1 (25F80)
- App Version: 1.6.1, installed via Homebrew cask
- Architecture: Apple Silicon
Additional context
Observed debug timings from local testing:
- The recording overlay is shown immediately when dictation starts.
- asr_start_return has been observed around 460-662 ms after asr_start_call.
- A startup AVAudioEngineConfigurationChange can arrive after asr_start_return.
- Route recovery then completes roughly 140-170 ms later in tested sessions.
- Waiting for route recovery to be idle, a short stability delay, and at least a small captured sample buffer moved the start chime to the point where capture was actually ready.
- In local patched runs, the capture-ready cue fired roughly 513-932 ms after asr_start_return, with the start sound playing immediately afterward.
The main UX issue is that the current UI gives a false-ready signal. There may also be follow-up performance opportunities around startup graph construction, output node/device routing, or short-lived engine reuse, but a conservative first fix is to make the audible cue reflect capture readiness.
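The three conditions used in the local patch (route recovery idle, a short stability delay, and a small captured sample buffer) can be sketched as a pure gate that is independent of AVFoundation. This is an illustration under assumptions, not the patch itself: the type name `CaptureReadinessGate`, its methods, and the default thresholds (150 ms, 1,600 frames) are placeholders chosen for the example, and the choice to discard frames captured before a configuration change is also an assumption.

```swift
// Hypothetical readiness gate; names and thresholds are illustrative only.
struct CaptureReadinessGate {
    let stabilityDelayMs: Int   // quiet period required after the last route event
    let minimumFrames: Int      // frames that must arrive before the cue may fire

    private(set) var routeRecoveryPending = false
    private(set) var lastRouteEventMs: Int
    private(set) var capturedFrames = 0
    private(set) var cueFired = false

    init(engineStartedAtMs: Int, stabilityDelayMs: Int = 150, minimumFrames: Int = 1_600) {
        self.stabilityDelayMs = stabilityDelayMs
        self.minimumFrames = minimumFrames
        self.lastRouteEventMs = engineStartedAtMs
    }

    // A startup configuration change arrived; recovery is now in flight.
    mutating func configurationChanged(atMs now: Int) {
        routeRecoveryPending = true
        lastRouteEventMs = now
        capturedFrames = 0  // assumption: frames from before the change do not count
    }

    // Route recovery completed; the stability delay restarts from here.
    mutating func routeRecoveryFinished(atMs now: Int) {
        routeRecoveryPending = false
        lastRouteEventMs = now
    }

    // Report frames delivered by the input tap; ignored while recovery is pending.
    mutating func framesCaptured(_ count: Int) {
        guard !routeRecoveryPending else { return }
        capturedFrames += count
    }

    // Returns true exactly once, when all three conditions hold.
    mutating func shouldFireCue(atMs now: Int) -> Bool {
        guard !cueFired,
              !routeRecoveryPending,
              now - lastRouteEventMs >= stabilityDelayMs,
              capturedFrames >= minimumFrames else { return false }
        cueFired = true
        return true
    }
}

// Example timeline, in milliseconds relative to asr_start_return.
var gate = CaptureReadinessGate(engineStartedAtMs: 0)
gate.configurationChanged(atMs: 50)        // startup route change arrives late
gate.routeRecoveryFinished(atMs: 200)      // recovery takes about 150 ms
gate.framesCaptured(4_800)
let tooEarly = gate.shouldFireCue(atMs: 250)   // stability delay not yet elapsed
let ready = gate.shouldFireCue(atMs: 350)      // all conditions satisfied
```

In the app, the configuration-change and frame-count inputs would come from the engine's notification and the input tap respectively; the gate only decides when the start sound is allowed to play.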
Crash Logs
No crash.