Play dictation start cue after capture is ready#478

Open
freshyjmp wants to merge 10 commits into
altic-dev:mainfrom
freshyjmp:fix/start-cue-after-capture-ready

@freshyjmp freshyjmp commented Jun 29, 2026

Description

Moves the dictation start sound from the hotkey/overlay moment to the point where the audio capture path is actually ready, and adds a conservative short-lived engine reuse window for repeated dictations.

The start cue now waits until:

  • pending startup audio-route recovery is idle
  • the engine has had a short stability window after engine start or startup recovery
  • at least a small number of captured samples are buffered
  • the same recording session that requested the cue is still active
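The four conditions above can be sketched as a single predicate. This is a minimal illustration, not the PR's code: `CaptureSnapshot`, `StartCueGate`, and the threshold values are hypothetical names and numbers chosen for the example.

```swift
import Foundation

// Hypothetical snapshot of capture state; names are illustrative, not from the PR.
struct CaptureSnapshot {
    var startupRecoveryPending: Bool
    var lastEngineEvent: Date        // engine start or end of startup recovery
    var bufferedSampleCount: Int
    var activeSessionID: UUID?
}

struct StartCueGate {
    var stabilityWindow: TimeInterval = 0.15   // assumed value
    var minimumBufferedSamples = 1_600         // assumed value

    // True only when all four readiness conditions hold.
    func isReady(_ snapshot: CaptureSnapshot,
                 requestedBy sessionID: UUID,
                 now: Date = Date()) -> Bool {
        // 1. Startup audio-route recovery must be idle.
        guard !snapshot.startupRecoveryPending else { return false }
        // 2. The engine must have been stable for a short window.
        guard now.timeIntervalSince(snapshot.lastEngineEvent) >= stabilityWindow else { return false }
        // 3. Some captured samples must already be buffered.
        guard snapshot.bufferedSampleCount >= minimumBufferedSamples else { return false }
        // 4. The session that asked for the cue must still be the active one.
        return snapshot.activeSessionID == sessionID
    }
}
```

Passing the requesting session's identifier into the check is what lets a stale waiter from an earlier recording fail the last condition instead of playing a late cue.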

Startup AVAudioEngineConfigurationChange recovery is also treated separately from later route changes by using a shorter startup recovery delay. Later route changes keep the existing longer recovery delay and clear any startup-only readiness state if they replace startup recovery.
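As a rough sketch of the split between startup and later recovery, assuming hypothetical names (`RecoveryKind`, `RouteRecoveryPlanner`) and placeholder delay values that are not taken from the PR:

```swift
import Foundation

enum RecoveryKind { case startup, laterRouteChange }

struct RouteRecoveryPlanner {
    var startupDelay: TimeInterval = 0.1   // assumed value
    var laterDelay: TimeInterval = 0.5     // assumed value
    private(set) var pending: RecoveryKind?
    private(set) var startupReadinessArmed = false

    // Records the pending recovery and returns how long to wait before running it.
    mutating func schedule(_ kind: RecoveryKind) -> TimeInterval {
        switch kind {
        case .startup:
            pending = .startup
            startupReadinessArmed = true
            return startupDelay
        case .laterRouteChange:
            // A later route change that replaces a pending startup recovery
            // clears the startup-only readiness state.
            if pending == .startup { startupReadinessArmed = false }
            pending = .laterRouteChange
            return laterDelay
        }
    }
}
```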

For responsiveness, normal successful stops can retain the stopped AVAudioEngine for 20 seconds. If another dictation starts during that grace window, it reuses the retained engine instance; otherwise the engine is released when the grace window expires. Reuse is skipped and the engine is released immediately for Bluetooth routes or independent device binding, so headphones can return to their normal high-quality mode after dictation.
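The retention rule can be reduced to two small decisions. The sketch below is illustrative only; `EngineReusePolicy` and its method names are hypothetical, and only the 20-second window comes from the description.

```swift
import Foundation

struct EngineReusePolicy {
    var graceWindow: TimeInterval = 20   // from the PR description

    // Retain only after a normal stop on a route where retention is safe.
    func shouldRetain(stoppedNormally: Bool,
                      isBluetoothRoute: Bool,
                      usesIndependentDeviceBinding: Bool) -> Bool {
        stoppedNormally && !isBluetoothRoute && !usesIndependentDeviceBinding
    }

    // A retained engine is reusable only while the grace window is open.
    func canReuse(stoppedAt: Date, now: Date = Date()) -> Bool {
        now.timeIntervalSince(stoppedAt) < graceWindow
    }
}
```

Keeping the route check separate from the age check makes it easy to log a reuse skip (unsafe route) differently from a reuse expiry (window elapsed).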

This also adds explicit start/stop sound and engine-reuse logging so future debugging can distinguish sound playback, readiness wait, engine reuse hits, reuse skips, and reuse expiry.

A local debug retest also exposed that immediate event-tap recovery after a timeout could interfere with keyboard input. Event-tap timeout recovery now yields back to the system event callback, recreates the tap asynchronously, and uses the existing retry loop for transient tap-creation failures.
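The recovery pattern described here (return from the callback first, recreate later, retry on transient failure) might look roughly like the following. `TapRecoverer`, its parameters, and the retry values are hypothetical; `createTap` stands in for whatever actually recreates the event tap.

```swift
import Foundation
import Dispatch

final class TapRecoverer {
    typealias Scheduler = (TimeInterval, @escaping () -> Void) -> Void

    private let createTap: () -> Bool   // hypothetical: true when the tap was recreated
    private let schedule: Scheduler
    private let maxAttempts: Int
    private let retryDelay: TimeInterval
    private(set) var attempts = 0
    private(set) var recovered = false

    init(maxAttempts: Int = 5,              // assumed value
         retryDelay: TimeInterval = 0.25,   // assumed value
         schedule: @escaping Scheduler = { delay, work in
             DispatchQueue.main.asyncAfter(deadline: .now() + delay, execute: work)
         },
         createTap: @escaping () -> Bool) {
        self.maxAttempts = maxAttempts
        self.retryDelay = retryDelay
        self.schedule = schedule
        self.createTap = createTap
    }

    // Called from the event callback on timeout. It only schedules work,
    // so the callback can return to the system right away.
    func handleTimeout() {
        attempts = 0
        recovered = false
        schedule(0) { [weak self] in self?.attempt() }
    }

    // Tries to recreate the tap, retrying after a delay on transient failure.
    private func attempt() {
        attempts += 1
        if createTap() {
            recovered = true
            return
        }
        guard attempts < maxAttempts else { return }
        schedule(retryDelay) { [weak self] in self?.attempt() }
    }
}
```

The scheduler is injected so the retry behavior can be exercised in tests without a real run loop or a real event tap.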

Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 🧹 Chore
  • 📝 Documentation update

Related Issues

Testing

  • Tested on Intel Mac
  • Tested on Apple Silicon Mac
  • Tested on macOS 26.5.1
  • Ran linter locally: swiftlint --strict --config .swiftlint.yml Sources (SwiftLint is not installed locally)
  • Ran formatter locally: swiftformat --config .swiftformat Sources (SwiftFormat is not installed locally)
  • Ran git diff --check
  • Ran xcrun swiftc -frontend -parse Sources/Fluid/Services/AudioDeviceService.swift Sources/Fluid/Services/GlobalHotkeyManager.swift Sources/Fluid/Services/ASRService.swift Sources/Fluid/ContentView.swift Sources/Fluid/Services/TranscriptionSoundPlayer.swift Tests/FluidDictationIntegrationTests/StartCueCaptureReadinessTests.swift
  • Ran xcodebuild test -project Fluid.xcodeproj -scheme Fluid -configuration Debug -destination 'platform=macOS,arch=arm64' -only-testing:FluidDictationIntegrationTests/StartCueCaptureReadinessTests PRODUCT_BUNDLE_IDENTIFIER=com.FluidApp.debug CODE_SIGN_STYLE=Manual DEVELOPMENT_TEAM= CODE_SIGN_IDENTITY=- CODE_SIGNING_ALLOWED=YES CODE_SIGNING_REQUIRED=YES (6 tests, 0 failures)
  • Ran xcodebuild build -project Fluid.xcodeproj -scheme Fluid -configuration Debug -destination 'platform=macOS,arch=arm64' PRODUCT_BUNDLE_IDENTIFIER=com.FluidApp.debug CODE_SIGN_STYLE=Manual DEVELOPMENT_TEAM= CODE_SIGN_IDENTITY=- CODE_SIGNING_ALLOWED=YES CODE_SIGNING_REQUIRED=YES

Runtime debug testing showed:

  • cold/new-engine start: engine_reuse_start hit=false, asr_start_return elapsedMs=105, start cue after capture-ready wait
  • reuse starts: engine_reuse_start hit=true, retained engine age around 1.4-2.6s, asr_start_return elapsedMs=36-40
  • with no restart during the grace window, the retained engine was released on schedule: engine_reuse_release reason=reuse_grace_expired hadEngine=true

Notes

  • This intentionally makes the audible start cue represent actual capture readiness instead of the instant hotkey/overlay transition. On routes that need startup recovery, the cue may therefore happen later, but it should be a truer signal to begin speaking.
  • The 20-second engine reuse window is deliberately short so FluidVoice does not become an always-hot audio app. The engine is stopped during this window; the retained object avoids rebuilding the graph for rapid repeated dictations only on routes where retention is safe.
  • This PR does not change the visual overlay state yet. A follow-up could show a distinct "starting" state before capture is ready, then switch to "listening" when this readiness check passes.
  • Potential follow-up performance work: avoid unnecessary output-node/device work for capture-only dictation and see whether the startup format/configuration change can be avoided.

Screenshots / Video

Not included; this change is timing/audio-cue behavior and logging rather than a visual UI change.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e9fa709ce

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread Sources/Fluid/Services/ASRService.swift Outdated
Comment thread Sources/Fluid/Services/ASRService.swift Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8890dcb2a

Comment thread Sources/Fluid/ContentView.swift Outdated
@altic-dev

Owner

Good fix! But I was wondering whether you could optimize the engine to open faster, rather than the other way around. Thoughts?

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c81e63b85

Comment thread Sources/Fluid/Services/ASRService.swift
Comment thread Sources/Fluid/Services/GlobalHotkeyManager.swift Outdated
@freshyjmp

Author

Replying to @altic-dev: agreed that making engine startup faster is the better direction where we can do it safely.

I kept this PR split between correctness and a conservative startup optimization:

  • The cue change makes the UI/audio cue mean “capture is actually ready,” not just “the hotkey was accepted.”
  • The performance piece keeps a stopped AVAudioEngine around briefly after normal stops, so rapid repeat dictations can reuse the engine instead of paying full teardown/recreate cost each time.
  • I added targeted regression coverage in 311a3cf for stale cue waiters and the post-recovery sample baseline.

I would be cautious about a permanently prewarmed/running engine because it can keep mic/CoreAudio/Bluetooth resources active. For a follow-up, I think the next lower-risk cold-start optimization is to measure and trim the capture graph setup, especially avoiding unnecessary output-node/device-binding work when sync mode is forced and dictation only needs input capture.

Happy to split the engine reuse out if you prefer this PR to stay cue/readiness-only.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 311a3cf3b4

Comment thread Sources/Fluid/Services/ASRService.swift Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53da73873b

Comment thread Sources/Fluid/Services/ASRService.swift

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 814e2a6842

Comment thread Sources/Fluid/Services/GlobalHotkeyManager.swift
@freshyjmp freshyjmp force-pushed the fix/start-cue-after-capture-ready branch from 814e2a6 to 21a82d1 Compare June 30, 2026 19:32

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21a82d1ab1

Comment thread Sources/Fluid/Services/ASRService.swift Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84115b771b

Comment thread Sources/Fluid/Services/ASRService.swift Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fa95c81f50

return
}

TranscriptionSoundPlayer.shared.playStartSound()

P2: Mute capture while playing the start cue

When transcription sounds are enabled and the output is audible to the microphone (built-in speakers, loopback/monitoring, or loud headphones), this plays the start cue while ASRService is still running and its tap is appending to the same buffer that stop() later transcribes and saves. That means the cue itself can be captured at the beginning of the dictation and can pollute the final/streaming transcription; consider temporarily gating or clearing capture around the cue playback after readiness has been established.
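One way to picture the suggested gating, as a minimal sketch with hypothetical names (`GatedCaptureBuffer`, `beginCue`, `endCue`) rather than anything in ASRService: samples that arrive while the cue is playing are dropped instead of appended.

```swift
struct GatedCaptureBuffer {
    private(set) var samples: [Float] = []
    private(set) var isGated = false

    // Called from the capture tap; ignores audio while the gate is closed.
    mutating func append(_ chunk: [Float]) {
        guard !isGated else { return }
        samples.append(contentsOf: chunk)
    }

    // Close the gate just before the cue starts playing.
    mutating func beginCue() { isGated = true }

    // Reopen the gate once the cue has finished.
    mutating func endCue() { isGated = false }
}
```

A real implementation would also need to decide how long the gate stays closed relative to the cue's duration, which this sketch leaves to the caller.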

@altic-dev

altic-dev commented Jun 30, 2026 via email

Owner

2 participants