[🐞 BUG] Start overlay/chime can indicate recording before audio capture is ready #477

**Describe the bug**
The recording overlay and start chime can indicate that dictation is ready before audio capture is actually stable. When I start speaking immediately after the hotkey/UI cue, the first words are sometimes missing from the transcript even though the rest of the dictation is accurate.

This is especially confusing because the black recording overlay appears immediately, which suggests the app is already listening. In practice, debug logs show `AVAudioEngine.start()` returning before a startup `AVAudioEngineConfigurationChange` route recovery finishes and before audio samples have accumulated.

**To Reproduce**
Steps to reproduce the behavior:
1. Enable dictation start sounds.
2. Trigger dictation with the global hotkey.
3. Start speaking as soon as the overlay/chime indicates recording.
4. Observe that the first words can be dropped, while later speech is transcribed correctly.

**Expected behavior**
The app should only play the start chime once capture is actually ready to receive audio, or the UI should show a distinct "starting" state until the capture graph is stable. The overlay/chime should not imply that speech is being captured while startup route recovery is still pending.
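
One way to picture the distinct "starting" state is a small phase model in which the chime is tied to the transition into recording rather than to the hotkey press. This is a minimal sketch, not the app's actual code: `DictationPhase`, `DictationUI`, the overlay strings, and `chimeCount` (a stand-in for playing the start sound) are all hypothetical names introduced for illustration.

```swift
// Hypothetical UI phases; names are illustrative, not taken from the app.
enum DictationPhase: Equatable {
    case idle
    case starting   // hotkey pressed, capture graph not yet confirmed stable
    case recording  // capture confirmed ready
}

struct DictationUI {
    private(set) var phase: DictationPhase = .idle
    // Stand-in for playing the start sound, so the behavior can be checked.
    private(set) var chimeCount = 0

    // Text the overlay would show in each phase (assumed wording).
    var overlayText: String {
        switch phase {
        case .idle: return ""
        case .starting: return "Starting..."
        case .recording: return "Listening"
        }
    }

    // Hotkey press shows the overlay in the "starting" state, without a chime.
    mutating func hotkeyPressed() {
        guard phase == .idle else { return }
        phase = .starting
    }

    // Called once capture is known to be ready; only here does the chime fire.
    mutating func captureBecameReady() {
        guard phase == .starting else { return }
        phase = .recording
        chimeCount += 1
    }

    mutating func stop() {
        phase = .idle
    }
}
```

With this shape the overlay can still appear immediately, but it reads "Starting..." until capture is confirmed, and a repeated readiness signal cannot trigger a second chime.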

**Screenshots**
Not applicable.

**Environment:**
 - macOS Version: macOS 26.5.1 (25F80)
 - App Version: 1.6.1, installed via Homebrew cask
 - Architecture: Apple Silicon

**Additional context**
Observed debug timings from local testing:

- The recording overlay is shown immediately when dictation starts.
- `asr_start_return` has been observed around 460-662 ms after `asr_start_call`.
- A startup `AVAudioEngineConfigurationChange` can arrive after `asr_start_return`.
- Route recovery then completes roughly 140-170 ms later in tested sessions.
- Gating the start chime on three conditions (route recovery idle, a short stability delay elapsed, and at least a small buffer of captured samples) moved the chime to the point where capture was actually ready.
- In local patched runs, the capture-ready cue fired roughly 513-932 ms after `asr_start_return`, with the start sound playing immediately afterward.
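
The three readiness conditions in the list above could be combined roughly as follows. This is a sketch under stated assumptions, not the patch used in local testing: `CaptureReadinessGate`, its method names, and both thresholds (150 ms of quiet, 1,600 frames) are hypothetical, and timestamps are plain seconds so the logic can run without an audio device.

```swift
// Hypothetical readiness gate; names and thresholds are assumptions.
struct CaptureReadinessGate {
    // Seconds of quiet required after the last graph event (assumed value).
    var stabilityDelay: Double = 0.15
    // Minimum frames captured since the last route change (assumed value,
    // about 100 ms of audio at a 16 kHz sample rate).
    var minimumFrames: Int = 1_600

    private(set) var routeRecoveryPending = false
    private(set) var lastGraphEvent: Double
    private(set) var capturedFrames = 0

    init(engineStartedAt: Double) {
        lastGraphEvent = engineStartedAt
    }

    // A configuration change invalidates earlier samples and restarts the wait.
    mutating func configurationChanged(at time: Double) {
        routeRecoveryPending = true
        lastGraphEvent = time
        capturedFrames = 0
    }

    mutating func routeRecoveryFinished(at time: Double) {
        routeRecoveryPending = false
        lastGraphEvent = time
    }

    mutating func didCapture(frames: Int) {
        capturedFrames += frames
    }

    // Ready only when recovery is idle, the graph has been quiet long enough,
    // and enough audio has actually arrived.
    func isReady(at now: Double) -> Bool {
        !routeRecoveryPending
            && now - lastGraphEvent >= stabilityDelay
            && capturedFrames >= minimumFrames
    }
}
```

In a real app these events would presumably be driven by the `AVAudioEngineConfigurationChange` notification and by an input-node tap reporting frame counts, with the start sound played the first time `isReady(at:)` returns true. Resetting `capturedFrames` on a configuration change is a design choice here: samples collected before a route change are not treated as evidence that the recovered graph is delivering audio.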

The main UX issue is that the current UI gives a false-ready signal. There may also be follow-up performance opportunities around startup graph construction, output node/device routing, or short-lived engine reuse, but a conservative first fix is to make the audible cue reflect capture readiness.

**Crash Logs**
No crash.
