Play dictation start cue after capture is ready #478
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e9fa709ce
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
💡 Codex Review
Reviewed commit: a8890dcb2a
Good fix! But I was wondering if you can optimize the engine to open faster, rather than the other way around. Thoughts?
💡 Codex Review
Reviewed commit: 0c81e63b85
Replying to @altic-dev: agreed that making engine startup faster is the better direction where we can do it safely. I kept this PR split between correctness and a conservative startup optimization:
I would be cautious about a permanently prewarmed/running engine because it can keep mic/CoreAudio/Bluetooth resources active. For a follow-up, I think the next lower-risk cold-start optimization is to measure and trim the capture graph setup, especially avoiding unnecessary output-node/device-binding work when sync mode is forced and dictation only needs input capture. Happy to split the engine reuse out if you prefer this PR to stay cue/readiness-only.
💡 Codex Review
Reviewed commit: 311a3cf3b4
💡 Codex Review
Reviewed commit: 53da73873b
💡 Codex Review
Reviewed commit: 814e2a6842
814e2a6 to 21a82d1
💡 Codex Review
Reviewed commit: 21a82d1ab1
💡 Codex Review
Reviewed commit: 84115b771b
💡 Codex Review
Reviewed commit: fa95c81f50
    return
}

TranscriptionSoundPlayer.shared.playStartSound()
Mute capture while playing the start cue
When transcription sounds are enabled and the output is audible to the microphone (built-in speakers, loopback/monitoring, or loud headphones), this plays the start cue while ASRService is still running and its tap is appending to the same buffer that stop() later transcribes and saves. That means the cue itself can be captured at the beginning of the dictation and can pollute the final/streaming transcription; consider temporarily gating or clearing capture around the cue playback after readiness has been established.
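One way the suggested gating could look is sketched below. This is a minimal illustration, not code from the PR: the `CueGatedBuffer` type, its method names, and the 50 ms tail are all hypothetical, and it assumes the tap can attach a capture timestamp to each chunk it appends.

```swift
import Foundation

/// Hypothetical sketch (not from the PR): a capture buffer that discards
/// chunks recorded while the start cue is expected to be audible.
struct CueGatedBuffer {
    private(set) var samples: [Float] = []
    /// Capture timestamps earlier than this are treated as cue bleed and dropped.
    private var mutedUntil: TimeInterval = 0

    /// Call right before playing the cue. `cueDuration` and `tail` are assumed values.
    mutating func beginCue(now: TimeInterval, cueDuration: TimeInterval, tail: TimeInterval = 0.05) {
        mutedUntil = now + cueDuration + tail
    }

    /// Appends a chunk only if it was captured outside the mute window.
    mutating func append(_ chunk: [Float], capturedAt timestamp: TimeInterval) {
        guard timestamp >= mutedUntil else { return }
        samples.append(contentsOf: chunk)
    }
}

var buffer = CueGatedBuffer()
buffer.append([0.1, 0.2], capturedAt: 0.00)    // before the cue: kept
buffer.beginCue(now: 0.01, cueDuration: 0.20)  // mute window ends at about 0.26 s
buffer.append([0.9, 0.9], capturedAt: 0.10)    // during the cue: dropped
buffer.append([0.3], capturedAt: 0.30)         // after the window: kept
```

Dropping chunks is the simplest option; clearing the buffer once the cue finishes would be an alternative, at the cost of losing any speech that overlaps the cue.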
This PR is maxing out GitHub Actions for some reason. I would appreciate it if
you could reduce the PR commits and make a final commit with the changes :)
Thanks. Also, I am working on this optimization on the side, so I'd suggest you
wait a little before making further changes; if I am able to figure this out
in a better way, we don't have to waste your efforts.
Thanks for understanding!
…On Tue, Jun 30, 2026 at 1:50 PM chatgpt-codex-connector[bot] < ***@***.***> wrote:
In Sources/Fluid/ContentView.swift:
+ let ready = await self.asr.waitForCaptureReadyForStartCue(sessionID: sessionID)
+ DebugLogger.shared.benchmark(
+ "APP_BENCH",
+ message: "start_cue_ready ready=\(ready) elapsedMs=\(Int(((ProcessInfo.processInfo.systemUptime - cueWaitStartedAt) * 1000).rounded()))",
+ source: "AppBenchmark"
+ )
+
+ guard ready,
+ self.asr.isRunning,
+ self.asr.currentRecordingSessionID == sessionID
+ else {
+ DebugLogger.shared.debug("Start cue skipped because capture is no longer active", source: "ContentView")
+ return
+ }
+
+ TranscriptionSoundPlayer.shared.playStartSound()
P2: Mute capture while playing the start cue
When transcription sounds are enabled and the output is audible to the
microphone (built-in speakers, loopback/monitoring, or loud headphones),
this plays the start cue while ASRService is still running and its tap is
appending to the same buffer that stop() later transcribes and saves.
That means the cue itself can be captured at the beginning of the dictation
and can pollute the final/streaming transcription; consider temporarily
gating or clearing capture around the cue playback after readiness has been
established.
Description
Moves the dictation start sound from the hotkey/overlay moment to the point where the audio capture path is actually ready, and adds a conservative short-lived engine reuse window for repeated dictations.
The start cue now waits until:
Startup AVAudioEngineConfigurationChange recovery is also treated separately from later route changes by using a shorter startup recovery delay. Later route changes keep the existing longer recovery delay and clear any startup-only readiness state if they replace startup recovery.
For responsiveness, normal successful stops can retain the stopped AVAudioEngine for 20 seconds. If another dictation starts during that grace window, it reuses the retained engine instance; otherwise the engine is released when the grace window expires. Reuse is skipped and the engine is released immediately for Bluetooth routes or independent device binding, so headphones can return to their normal high-quality mode after dictation.
This also adds explicit start/stop sound and engine-reuse logging so future debugging can distinguish sound playback, readiness wait, engine reuse hits, reuse skips, and reuse expiry.
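The retention rules described above can be summarized as a small decision helper. This is only a sketch of the policy, not the PR's implementation: `EngineReusePolicy`, its method names, and its parameters are hypothetical, and only the 20-second window and the Bluetooth / independent-binding exclusions come from the description.

```swift
import Foundation

/// Hypothetical sketch of the reuse policy; names are illustrative only.
struct EngineReusePolicy {
    /// Grace window from the PR description.
    var graceWindow: TimeInterval = 20

    enum Decision { case reuse, createNew }

    /// Whether a stopped engine may be kept around after this stop.
    func shouldRetain(stopSucceeded: Bool, isBluetoothRoute: Bool, usesIndependentDeviceBinding: Bool) -> Bool {
        // Release immediately on Bluetooth or independent binding so
        // headphones can return to their high-quality mode.
        return stopSucceeded && !isBluetoothRoute && !usesIndependentDeviceBinding
    }

    /// What a new dictation should do, given when (if ever) an engine was retained.
    func decision(retainedAt: TimeInterval?, now: TimeInterval) -> Decision {
        guard let retainedAt = retainedAt else { return .createNew }
        return (now - retainedAt) < graceWindow ? .reuse : .createNew
    }
}

let policy = EngineReusePolicy()
let keepBuiltIn = policy.shouldRetain(stopSucceeded: true, isBluetoothRoute: false, usesIndependentDeviceBinding: false)
let keepBluetooth = policy.shouldRetain(stopSucceeded: true, isBluetoothRoute: true, usesIndependentDeviceBinding: false)
let withinWindow = policy.decision(retainedAt: 100, now: 102)
let afterWindow = policy.decision(retainedAt: 100, now: 121)
```

Keeping the decision separate from the engine object makes the timing rules testable without touching audio hardware.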
A local debug retest also exposed that immediate event-tap recovery after a timeout could interfere with keyboard input. Event-tap timeout recovery now yields back to the system event callback, recreates the tap asynchronously, and uses the existing retry loop for transient tap-creation failures.
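The shape of that recovery can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: `EventTapRecovery`, `schedule`, and `createTap` are hypothetical names, the retry limit is arbitrary, and in the app the scheduler would presumably be something like `DispatchQueue.main.async` with the tap creation wrapping the real event-tap API.

```swift
import Foundation

/// Hypothetical sketch: on timeout, return to the caller immediately and
/// recreate the tap later, retrying a bounded number of times.
final class EventTapRecovery {
    private let maxAttempts: Int
    private let schedule: (@escaping () -> Void) -> Void
    private let createTap: () -> Bool
    private(set) var attempts = 0
    private(set) var recovered = false

    init(maxAttempts: Int,
         schedule: @escaping (@escaping () -> Void) -> Void,
         createTap: @escaping () -> Bool) {
        self.maxAttempts = maxAttempts
        self.schedule = schedule
        self.createTap = createTap
    }

    /// Called from the event callback; does no tap work inline.
    func handleTimeout() {
        attempts = 0
        recovered = false
        schedule { [weak self] in self?.attemptRecreate() }
    }

    private func attemptRecreate() {
        attempts += 1
        if createTap() {
            recovered = true
            return
        }
        // Transient failure: try again later, up to the limit.
        guard attempts < maxAttempts else { return }
        schedule { [weak self] in self?.attemptRecreate() }
    }
}

// Manual scheduler standing in for an async queue; tap creation fails once.
var pending: [() -> Void] = []
var createCalls = 0
let recovery = EventTapRecovery(
    maxAttempts: 3,
    schedule: { pending.append($0) },
    createTap: { createCalls += 1; return createCalls >= 2 }
)
recovery.handleTimeout()
let callsMadeInline = createCalls  // still 0: the callback returned first
while !pending.isEmpty {
    let next = pending.removeFirst()
    next()
}
```

Injecting the scheduler keeps the sketch deterministic; the point it illustrates is only that the callback path never creates the tap itself.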
Type of Change
Related Issues
Testing
- swiftlint --strict --config .swiftlint.yml Sources (SwiftLint is not installed locally)
- swiftformat --config .swiftformat Sources (SwiftFormat is not installed locally)
- git diff --check
- xcrun swiftc -frontend -parse Sources/Fluid/Services/AudioDeviceService.swift Sources/Fluid/Services/GlobalHotkeyManager.swift Sources/Fluid/Services/ASRService.swift Sources/Fluid/ContentView.swift Sources/Fluid/Services/TranscriptionSoundPlayer.swift Tests/FluidDictationIntegrationTests/StartCueCaptureReadinessTests.swift
- xcodebuild test -project Fluid.xcodeproj -scheme Fluid -configuration Debug -destination 'platform=macOS,arch=arm64' -only-testing:FluidDictationIntegrationTests/StartCueCaptureReadinessTests PRODUCT_BUNDLE_IDENTIFIER=com.FluidApp.debug CODE_SIGN_STYLE=Manual DEVELOPMENT_TEAM= CODE_SIGN_IDENTITY=- CODE_SIGNING_ALLOWED=YES CODE_SIGNING_REQUIRED=YES (6 tests, 0 failures)
- xcodebuild build -project Fluid.xcodeproj -scheme Fluid -configuration Debug -destination 'platform=macOS,arch=arm64' PRODUCT_BUNDLE_IDENTIFIER=com.FluidApp.debug CODE_SIGN_STYLE=Manual DEVELOPMENT_TEAM= CODE_SIGN_IDENTITY=- CODE_SIGNING_ALLOWED=YES CODE_SIGNING_REQUIRED=YES
Runtime debug testing showed:
- engine_reuse_start hit=false, asr_start_return elapsedMs=105, start cue after capture-ready wait
- engine_reuse_start hit=true, retained engine age around 1.4-2.6s, asr_start_return elapsedMs=36-40
- engine_reuse_release reason=reuse_grace_expired hadEngine=true
Notes
Screenshots / Video
Not included; this change is timing/audio-cue behavior and logging rather than a visual UI change.