feat: voice editing commands — scratch/capitalize/slash/new line (#406)#476
Kayaba-Attribution wants to merge 1 commit into
…ic-dev#406)

Add VoiceCommandProcessor with detect/apply pipeline for inline editing commands spoken at the end of a dictation utterance.

Commands:
- "scratch that" / "delete that" → delete last word (configurable N via voiceCommandScratchWordCount)
- "capitalize that" → capitalize last word
- "slash that" → append "/" to last word
- "new line" / "new paragraph" → insert newline

Intercept in DictationPostProcessingService immediately after trimming: command-only utterances bypass the LLM entirely; mixed utterances strip the command phrase before the LLM and apply the edit post-GAAV.

Feature is gated by voiceCommandsEnabled (defaults off).

19 unit tests covering all commands, synonyms, trailing-only enforcement, word-boundary matching, ASR variance normalization, punctuation handling, and configurable scratch count.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
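For readers skimming the commit message, here is a minimal sketch of what a detect/apply split like this could look like. It is not the PR's code: `SketchSettings`, `SketchAction`, `VoiceCommandSketch`, and `DemoSettings` are hypothetical names, and only the phrase table and the two setting names come from the commit message.

```swift
import Foundation

// Hypothetical stand-in for the settings protocol; only the two property
// names are taken from the commit message.
protocol SketchSettings {
    var voiceCommandsEnabled: Bool { get }
    var voiceCommandScratchWordCount: Int { get }
}

// Hypothetical action type; the PR's real type is not shown in this excerpt.
enum SketchAction: Equatable {
    case scratch(Int)
    case capitalize
    case slash
    case newLine
}

enum VoiceCommandSketch {
    /// Returns the text with a trailing command phrase removed, plus the action to apply later.
    static func detect(in text: String, settings: SketchSettings) -> (stripped: String, action: SketchAction?) {
        guard settings.voiceCommandsEnabled else { return (text, nil) }
        let n = settings.voiceCommandScratchWordCount
        let table: [(phrase: String, action: SketchAction)] = [
            ("scratch that", .scratch(n)), ("delete that", .scratch(n)),
            ("capitalize that", .capitalize), ("slash that", .slash),
            ("new line", .newLine), ("new paragraph", .newLine),
        ]
        for entry in table {
            // Anchored backwards search: the phrase must sit at the very end of the text.
            guard let range = text.range(of: entry.phrase,
                                         options: [.caseInsensitive, .anchored, .backwards])
            else { continue }
            let head = text[..<range.lowerBound]
            // Word-boundary check: the phrase is the whole text or follows whitespace.
            if let previous = head.last, !previous.isWhitespace { continue }
            return (head.trimmingCharacters(in: .whitespaces), entry.action)
        }
        return (text, nil)
    }

    /// Applies a previously detected action to the (possibly LLM-processed) text.
    static func apply(_ action: SketchAction, to text: String) -> String {
        var words = text.split(separator: " ").map(String.init)
        switch action {
        case .scratch(let count):
            words.removeLast(min(max(count, 0), words.count))
            return words.joined(separator: " ")
        case .capitalize:
            guard let last = words.popLast() else { return text }
            words.append(last.prefix(1).uppercased() + last.dropFirst())
            return words.joined(separator: " ")
        case .slash:
            return text.isEmpty ? text : text + "/"
        case .newLine:
            return text + "\n"
        }
    }
}

// Usage with a hypothetical settings value.
struct DemoSettings: SketchSettings {
    var voiceCommandsEnabled = true
    var voiceCommandScratchWordCount = 1
}

let detected = VoiceCommandSketch.detect(in: "send the report scratch that", settings: DemoSettings())
if let action = detected.action {
    print(VoiceCommandSketch.apply(action, to: detected.stripped)) // prints "send the"
}
```

Keeping `detect` and `apply` separate is what lets a mixed utterance go through the LLM between the two calls, while a command-only utterance can skip it.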
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c8b8253d35
    let trimmed = inputText.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else {
    …
    let (voiceStripped, voicePendingAction) = VoiceCommandProcessor.detect(in: trimmed, settings: SettingsStore.shared)
Run voice-command detection from the user dictation flow
This hook only executes inside `DictationPostProcessingService`, but normal hotkey dictation in `ContentView.stopAndProcessTranscription` calls `processTextWithAI` directly when AI is configured (ContentView.swift:2147) and otherwise uses `transcribedText` unchanged (ContentView.swift:2193). A repo-wide search shows this service is only called by LocalAPI/InferenceAPIController.swift:88. As a result, enabling `voiceCommandsEnabled` in the app will not process "... scratch that" or "new line" for the main dictation workflow, only for the local /v1/postprocess API.
    for command in self.commands {
        for phrase in command.phrases {
            guard normalized.hasSuffix(phrase) else { continue }
Strip terminal punctuation before matching commands
When the ASR includes punctuation at the end of the utterance, such as "scratch that." or "new line,", the normalized string still ends with that punctuation because only comma-space and hyphens are rewritten, so `normalized.hasSuffix(phrase)` is false and the command text is emitted literally. Since the app already has trailing-period cleanup after processing, these common punctuated ASR outputs need to be normalized before this suffix check.
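One possible shape for that normalization, as a minimal sketch: `normalizedForCommandMatch` is a hypothetical helper name, not something from the PR, and it assumes the normalized string is used only for matching, never as output text.

```swift
/// Hypothetical helper: lowercases the utterance and drops trailing
/// punctuation and whitespace so a suffix check sees the bare phrase.
func normalizedForCommandMatch(_ utterance: String) -> String {
    var text = Substring(utterance.lowercased())
    // Peel characters off the end until a non-punctuation, non-whitespace one remains.
    while let last = text.last, last.isPunctuation || last.isWhitespace {
        text.removeLast()
    }
    return String(text)
}

// Usage: both punctuated ASR variants now pass the suffix check.
print(normalizedForCommandMatch("Send it. Scratch that.").hasSuffix("scratch that")) // prints "true"
print(normalizedForCommandMatch("new line,").hasSuffix("new line"))                  // prints "true"
```

Because this strips every trailing punctuation mark, it should feed the detection step only; the text handed to the LLM or returned to the user would still need its original punctuation.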
    let phraseWordCount = phrase.split(separator: " ").count
    var tokens = self.tokenize(input)
    guard tokens.count >= phraseWordCount else { return "" }
    tokens.removeLast(phraseWordCount)
Strip only the matched hyphenated command token
For the hyphen variants that detection accepts, such as "send the report scratch-that", the original input has one token for "scratch-that" while `phraseWordCount` is 2, so `removeLast(2)` also drops "report"; applying the scratch action then deletes an additional word and produces "send" instead of "send the". The strip step needs to remove the matched original suffix/range rather than a normalized word count.
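A range-based strip along those lines might look like the sketch below. `stripTrailingCommand` is a hypothetical helper, and the regex (space or hyphen between the phrase's words, optional trailing punctuation) is an assumption about which ASR variants should be accepted, not the PR's actual matching rules.

```swift
import Foundation

/// Hypothetical helper: removes a trailing command phrase from the original
/// input by matched range, so "scratch-that" counts as one suffix rather
/// than two tokens. Returns nil when the input does not end with the phrase.
func stripTrailingCommand(_ phrase: String, from input: String) -> String? {
    // Escape each word, then allow whitespace or hyphens between words.
    let words = phrase.split(separator: " ").map {
        NSRegularExpression.escapedPattern(for: String($0))
    }
    // The phrase must start the input or follow whitespace, and may be
    // followed only by whitespace or punctuation up to the end.
    let pattern = "(^|\\s)" + words.joined(separator: "[\\s\\-]+") + "[\\s\\p{P}]*$"
    guard let range = input.range(of: pattern, options: [.regularExpression, .caseInsensitive]) else {
        return nil
    }
    return String(input[..<range.lowerBound])
}

// Usage: the hyphenated and spaced variants strip to the same remainder.
print(stripTrailingCommand("scratch that", from: "send the report scratch-that") ?? "no match") // prints "send the report"
print(stripTrailingCommand("scratch that", from: "send the report scratch that") ?? "no match") // prints "send the report"
```

With the remainder intact as "send the report", a one-word scratch then yields "send the", which is the result the review comment expects.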
Will fully test and verify in the upcoming days

I was going to add this but it needs clear UI and UX design before we add it in so people can use it easily. Nice work but we need to spend a lot of time on this :(
Closes #406
What this does
Adds inline voice editing commands spoken at the end of a dictation utterance:
- "scratch that" / "delete that" → delete last word (configurable N via `voiceCommandScratchWordCount`)
- "capitalize that" → capitalize last word
- "slash that" → append "/" to last word
- "new line" / "new paragraph" → insert newline
Design
New file: `Sources/Fluid/Services/VoiceCommandProcessor.swift`, a self-contained `detect`/`apply` pipeline with no dependencies beyond Foundation. Parameterized via the `VoiceCommandSettings` protocol so it's testable in isolation.
Intercept point: `DictationPostProcessingService.process(_:dictationSlot:)`, immediately after input trimming and before any provider routing, so it covers the Private AI, Apple Intelligence, and cloud LLM paths uniformly.
Two execution paths:
- Command-only utterances bypass the LLM entirely.
- Mixed utterances strip the command phrase before the LLM and apply the edit post-GAAV.
Feature gate: `voiceCommandsEnabled` in Settings (defaults off). "Scratch that" word count is configurable via `voiceCommandScratchWordCount` (defaults 1).
Scope / v1 limits
… `TODO(v2)` comment.
Tests
19 unit tests in `VoiceCommandProcessorTests` covering: feature toggle, all commands and synonyms, trailing-only enforcement, word-boundary matching, ASR variance normalization, punctuation stripping/reattach, one-word empty result, and configurable scratch count.
Checklist