Skip to content

Improve dictation startup and dictionary training#496

Open
altic-dev wants to merge 10 commits into
mainfrom
B/custom-dictionary-training
Open

Improve dictation startup and dictionary training#496
altic-dev wants to merge 10 commits into
mainfrom
B/custom-dictionary-training

Conversation

@altic-dev

Copy link
Copy Markdown
Owner

Description

Adds faster dictation capture startup/prewarm behavior and a guided Custom Dictionary training flow for replacement words.

Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 🧹 Chore
  • 📝 Documentation update

Related Issues

None.

Testing

  • Tested on Intel Mac
  • Tested on Apple Silicon Mac
  • Tested on macOS 27.0
  • Ran linter locally: swiftlint --strict --config .swiftlint.yml Sources
  • Ran formatter locally: swiftformat --config .swiftformat Sources
  • Built and installed locally: sh build_with_FI_incremental.sh

Notes

  • Includes the audio capture startup/prewarm fix.
  • Adds Train by Voice in Custom Dictionary with raw ASR capture, removable heard variants, merge-on-save behavior, and a collapsed dictionary section.
  • Keeps manual Add Replacement behavior available in the same Teach Words flow.

Screenshots / Video

Not attached.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce8185fbf8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +984 to +986
self.isTrainingRecording = true

await self.asr.start(forDictionaryTraining: true)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Defer stop until training capture has actually started

When a user clicks Record and then clicks Stop while asr.start(forDictionaryTraining:) is still starting the audio engine, isTrainingRecording is already true, so stopTrainingSample() calls asr.stop() while ASRService.isRunning is still false; that stop returns early, and the pending start can then finish with isRunning/dictionary capture active while this view has already cleared its recording state and disabled the Stop button. Please either prevent stopping until start returns or queue a deferred stop like the hotkey release path does.

Useful? React with 👍 / 👎.

Comment thread Sources/Fluid/Services/ASRService.swift Outdated
Comment on lines +953 to +954
let captureCallbackStartedAt = Date().timeIntervalSince1970
onCaptureStarted?()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Play the start cue before enabling capture

For users with transcription sounds enabled on a speaker/mic setup, invoking onCaptureStarted here plays the start sound after audioCapturePipeline.setRecordingEnabled(true) and after the input tap has been installed, so the cue can be appended to the dictation buffer and contaminate the first transcription. Keep recording disabled until after the cue, or play the cue before the tap begins capturing.

Useful? React with 👍 / 👎.

Comment thread Sources/Fluid/ContentView.swift Outdated
Comment on lines +3577 to +3578
self.appBench("capture_context_start")
self.captureRecordingContext()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Capture the target app before awaiting ASR startup

In the hotkey dictation path, captureRecordingContext() now runs only after the awaited ASR startup above, which can take noticeable time on a cold audio route/model path. If the user switches focus during that startup window, recordingTargetPID/recordingAppInfo are captured from the wrong app, so per-app AI routing and the eventual paste/focus restore can target the wrong application; capture this context before awaiting asr.start() as the old flow did.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b269532ada

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1149 to +1150
if matchingShortcut {
self.prewarmPrimaryDictationShortcutIfNeeded(reason: "primary dictation modifier prefix down")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid starting the mic on modifier prefixes

When a primary dictation shortcut uses a common modifier chord such as ⌘Space, this flagsChanged path runs as soon as the modifier portion alone matches. prewarmCaptureEngineForFastStart starts the AVAudioEngine and installs the input tap, so merely holding ⌘ for an unrelated shortcut opens the microphone until another key/release cancels it or the prewarm expires, causing privacy indicators and Bluetooth headset call-mode side effects without dictation intent. Please wait for the full shortcut, or add a delayed/less invasive prewarm, before starting the capture engine.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2d09afe124

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

try self.setupEngineTap()
DebugLogger.shared.debug("✅ Engine tap setup complete", source: "ASRService")
let captureCallbackStartedAt = Date().timeIntervalSince1970
await beforeCaptureEnabled?()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor cancel while the start-sound gate is pending

When transcription sounds are enabled, this callback can sleep for up to a second before isRunning is set. During that window the cancel shortcut path only stops ASR when asrService.isRunning is already true, so pressing Escape/cancel while the cue is playing can clear the UI state but then return here and enable capture anyway. Please make startup cancellable across this await or set/check a pending-cancel state before enabling capture.

Useful? React with 👍 / 👎.

player.volume = desiredVolume
}
player.play()
guard player.play() else { return 0 }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restore system volume when playback fails

With independent transcription sound volume enabled, the code has already saved and changed the system output volume before reaching this guard. If AVAudioPlayer.play() returns false (for example during an output-route interruption), this new early return skips the restore block below and leaves the user's system volume at the temporary cue volume. Please restore savedSystemVolume on this failure path before returning.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5631029dd5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1857 to +1859
guard !self.asrService.isDictionaryTrainingCaptureActive else {
DebugLogger.shared.debug("Ignoring \(label) - dictionary training capture is active", source: "GlobalHotkeyManager")
return false

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Apply the training guard to cancel hotkeys

The new dictionary-training guard only protects the normal trigger/stop helpers, but the cancel shortcut path above bypasses it and calls stopWithoutTranscription() directly whenever ASR is running. If the user presses the configured cancel key while recording a training sample, ASR is stopped behind CustomDictionaryView, leaving its isTrainingRecording state set until the next Stop click only queues a pending stop against an ASR service that is no longer running. Please route cancel through the same dictionary-training guard or notify the training UI to own the cancellation.

Useful? React with 👍 / 👎.

Comment on lines +2322 to +2324
if self.isEngineWarmForFastRestart || self.engineStorage != nil {
self.tearDownCaptureEngine(reason: "audio route changed while warm: \(reason)", releaseAsync: true)
self.scheduleIdleCaptureEnginePreparation(reason: "audio route changed while idle: \(reason)")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ignore route teardown while startup is in flight

This idle-route branch also runs while start() is awaiting the start-sound/media-pause gate: isRunning is still false but engineStorage already points at the just-started engine. If a default-device or AVAudioEngineConfigurationChange notification is delivered during that window, this tears down the engine and schedules idle preparation, then start() resumes and marks isRunning with no live tap/engine to capture audio. Please skip this teardown when isStarting is true, or abort/retry the in-flight start.

Useful? React with 👍 / 👎.

Comment on lines +1135 to +1137
} else {
self.scheduleFastRestartWarmEngineShutdown(reason: "recording stopped")
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Release the microphone when dictation stops

For every normal dictation stop this branch keeps the existing AVAudioEngine running for the 8-second fast-restart window instead of tearing it down. Even though recordingEnabled is false, the input tap and running engine still hold the microphone device, so users see the mic remain active and Bluetooth headsets can stay in call/HFP mode after they stopped recording. Please release the live input device on stop, or make any warm restart path avoid holding the running microphone.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant