
feat: voice editing commands — scratch/capitalize/slash/new line (#406)#476

Open. Kayaba-Attribution wants to merge 1 commit into altic-dev:main from Kayaba-Attribution:feat/issue-406-voice-commands


@Kayaba-Attribution

Contributor

Closes #406

What this does

Adds inline voice editing commands spoken at the end of a dictation utterance:

| Phrase | Effect | Example |
|---|---|---|
| "scratch that" / "delete that" | Delete last N words (default 1) | "send the report scratch that" → "send the" |
| "capitalize that" | Capitalize first letter of last word | "call monday capitalize that" → "call Monday" |
| "slash that" | Append "/" to last word | "src slash that" → "src/" |
| "new line" / "new paragraph" | Insert newline | "first item new line" → "first item\n" |

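As a rough illustration of the four effects listed above, here is a minimal sketch of the "apply" half. The `VoiceEdit` enum and `applyVoiceEdit` function are hypothetical names chosen for this example, not the types in VoiceCommandProcessor.swift, and the sketch assumes the command phrase has already been stripped from the text. It also ignores the punctuation handling described under "Scope / v1 limits".

```swift
import Foundation

// Hypothetical action type for illustration; the PR's real type names may differ.
enum VoiceEdit {
    case scratch(wordCount: Int)
    case capitalize
    case slash
    case newline
}

/// Applies one edit to text whose trailing command phrase was already removed.
func applyVoiceEdit(_ edit: VoiceEdit, to text: String) -> String {
    var words = text.split(separator: " ").map(String.init)
    switch edit {
    case .scratch(let count):
        // Clamp so a large count empties the text instead of trapping.
        words.removeLast(min(max(count, 0), words.count))
    case .capitalize:
        if let last = words.popLast() {
            words.append(last.prefix(1).uppercased() + String(last.dropFirst()))
        }
    case .slash:
        if let last = words.popLast() {
            words.append(last + "/")
        }
    case .newline:
        return words.joined(separator: " ") + "\n"
    }
    return words.joined(separator: " ")
}
```

With the table's examples, `applyVoiceEdit(.scratch(wordCount: 1), to: "send the report")` gives "send the" and `applyVoiceEdit(.capitalize, to: "call monday")` gives "call Monday".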
Design

New file: Sources/Fluid/Services/VoiceCommandProcessor.swift — self-contained detect/apply pipeline, no dependencies beyond Foundation. Parameterized via VoiceCommandSettings protocol so it's testable in isolation.

Intercept point: DictationPostProcessingService.process(_:dictationSlot:), immediately after input trimming, before any provider routing — covers Private AI, Apple Intelligence, and cloud LLM paths uniformly.

Two execution paths:

  • Command-only utterance (e.g. "scratch that" alone) → early return, LLM bypassed entirely
  • Mixed utterance (text + command) → strip command phrase before LLM, apply edit post-GAAV formatting

Feature gate: voiceCommandsEnabled in Settings (defaults off). "Scratch that" word count is configurable via voiceCommandScratchWordCount (defaults 1).
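For readers who want to picture the gate, here is a hedged sketch of a settings protocol with the two properties named above. The protocol shape, the `DefaultsBackedVoiceCommandSettings` and `StubVoiceCommandSettings` types, and the UserDefaults key strings are all assumptions made for this example; the actual keys used by SettingsStore are not shown in this PR description.

```swift
import Foundation

// Property names follow the PR description; the protocol shape is a guess.
protocol VoiceCommandSettings {
    var voiceCommandsEnabled: Bool { get }
    var voiceCommandScratchWordCount: Int { get }
}

// Hypothetical UserDefaults-backed implementation. Key strings are assumed.
struct DefaultsBackedVoiceCommandSettings: VoiceCommandSettings {
    let defaults: UserDefaults

    var voiceCommandsEnabled: Bool {
        // bool(forKey:) returns false for a missing key, so the feature is off by default.
        defaults.bool(forKey: "voiceCommandsEnabled")
    }

    var voiceCommandScratchWordCount: Int {
        // integer(forKey:) returns 0 for a missing key; fall back to 1 word.
        let stored = defaults.integer(forKey: "voiceCommandScratchWordCount")
        return stored > 0 ? stored : 1
    }
}

// Hypothetical stub for unit tests: no UserDefaults involved.
struct StubVoiceCommandSettings: VoiceCommandSettings {
    var voiceCommandsEnabled = true
    var voiceCommandScratchWordCount = 2
}
```

Injecting the protocol rather than reading a global store is what lets tests exercise the toggle and the scratch count without touching persisted preferences.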

Scope / v1 limits

  • Trailing-position commands only. Mid-sentence detection is deferred to v2.
  • ASR variance handled: comma-space and hyphen variants normalized before matching.
  • Word-boundary enforced: "capitalize that letter" does not trigger "capitalize that".
  • Punctuation on tokens: stripped before edit, re-attached after.
  • Literal intent escape hatch (saying "new line" as dictated text) is flagged with a TODO(v2) comment.
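A minimal sketch of three of the limits above (ASR variance normalization, word-boundary matching, and punctuation strip/re-attach) might look like this. The function names are hypothetical and the rules are inferred from the bullet list, so treat it as an illustration rather than the code in this PR.

```swift
import Foundation

/// Lowercases and rewrites comma-space and hyphen variants to single spaces.
func normalizeForMatching(_ text: String) -> String {
    text.lowercased()
        .replacingOccurrences(of: ", ", with: " ")
        .replacingOccurrences(of: "-", with: " ")
}

/// True when `phrase` is the whole utterance, or a suffix preceded by a space.
/// Requiring the leading space is what enforces the word boundary.
func endsWithCommand(_ text: String, phrase: String) -> Bool {
    let normalized = normalizeForMatching(text)
    return normalized == phrase || normalized.hasSuffix(" " + phrase)
}

/// Splits trailing punctuation off a token so an edit can act on the bare word.
func splitTrailingPunctuation(_ token: String) -> (core: String, punctuation: String) {
    var core = Substring(token)
    var trailing = ""
    while let last = core.last, last.isPunctuation {
        trailing.insert(last, at: trailing.startIndex)
        core.removeLast()
    }
    return (String(core), trailing)
}

/// Capitalizes the bare word, then re-attaches the punctuation.
func capitalizePreservingPunctuation(_ token: String) -> String {
    let (core, punctuation) = splitTrailingPunctuation(token)
    return core.prefix(1).uppercased() + String(core.dropFirst()) + punctuation
}
```

Under these rules "send it scratch, that" and "send it scratch-that" both match "scratch that", while "capitalize that letter" does not match "capitalize that" because the phrase is not in trailing position.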

Tests

19 unit tests in VoiceCommandProcessorTests covering: feature toggle, all commands and synonyms, trailing-only enforcement, word-boundary matching, ASR variance normalization, punctuation stripping/reattach, one-word empty result, and configurable scratch count.

Checklist

  • Builds without errors
  • All 63 tests pass (19 new + 44 existing)
  • swiftlint clean (0 warnings on changed files)
  • swiftformat run on changed files
  • Feature gated off by default — no behaviour change for existing users
  • Settings UI for the toggle (out of scope for this PR — feature is accessible via UserDefaults for now)

…ic-dev#406)

Add VoiceCommandProcessor with detect/apply pipeline for inline editing
commands spoken at the end of a dictation utterance.

Commands: "scratch that" / "delete that" → delete last word (configurable
N via voiceCommandScratchWordCount); "capitalize that" → capitalize last
word; "slash that" → append "/" to last word; "new line" / "new
paragraph" → insert newline.

Intercept in DictationPostProcessingService immediately after trimming —
command-only utterances bypass the LLM entirely; mixed utterances strip
the command phrase before the LLM and apply the edit post-GAAV. Feature
is gated by voiceCommandsEnabled (defaults off).

19 unit tests covering all commands, synonyms, trailing-only enforcement,
word-boundary matching, ASR variance normalization, punctuation handling,
and configurable scratch count.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c8b8253d35


let trimmed = inputText.trimmingCharacters(in: .whitespacesAndNewlines)
guard !trimmed.isEmpty else {

let (voiceStripped, voicePendingAction) = VoiceCommandProcessor.detect(in: trimmed, settings: SettingsStore.shared)


P2: Run voice-command detection from the user dictation flow

This hook only executes inside DictationPostProcessingService, but normal hotkey dictation in ContentView.stopAndProcessTranscription calls processTextWithAI directly when AI is configured (ContentView.swift:2147) and otherwise uses transcribedText unchanged (ContentView.swift:2193). A repo-wide search shows this service is only called by LocalAPI/InferenceAPIController.swift:88. As a result, enabling VoiceCommandsEnabled in the app will not process "... scratch that" or "new line" for the main dictation workflow, only for the local /v1/postprocess API.



for command in self.commands {
    for phrase in command.phrases {
        guard normalized.hasSuffix(phrase) else { continue }


P2: Strip terminal punctuation before matching commands

When the ASR includes punctuation at the end of the utterance, such as "scratch that." or "new line,", the normalized string still ends with that punctuation because only comma-space and hyphens are rewritten, so normalized.hasSuffix(phrase) is false and the command text is emitted literally. Since the app already has trailing-period cleanup after processing, these common punctuated ASR outputs need to be normalized before this suffix check.
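One possible way to address this, sketched under assumptions: trim trailing punctuation and whitespace from the normalized string before the suffix check. The helper names below are hypothetical and the sketch is independent of the PR's actual normalization code. Note that it strips any trailing punctuation character, which may be broader than wanted.

```swift
import Foundation

/// Drops trailing punctuation and whitespace, e.g. "scratch that." -> "scratch that".
func strippingTerminalPunctuation(_ text: String) -> String {
    var result = Substring(text)
    while let last = result.last, last.isPunctuation || last.isWhitespace {
        result.removeLast()
    }
    return String(result)
}

/// Suffix check that tolerates ASR-added terminal punctuation.
func matchesTrailingPhrase(_ utterance: String, phrase: String) -> Bool {
    let normalized = strippingTerminalPunctuation(utterance.lowercased())
    return normalized == phrase || normalized.hasSuffix(" " + phrase)
}
```

With this in place, "send it scratch that." and "first item new line," would both be detected as commands instead of being emitted literally.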


Comment on lines +116 to +119
let phraseWordCount = phrase.split(separator: " ").count
var tokens = self.tokenize(input)
guard tokens.count >= phraseWordCount else { return "" }
tokens.removeLast(phraseWordCount)


P2: Strip only the matched hyphenated command token

For the hyphen variants that detection accepts, such as "send the report scratch-that", the original input has one token for "scratch-that" while phraseWordCount is 2, so removeLast(2) also drops "report"; applying the scratch action then deletes an additional word and produces "send" instead of "send the". The strip step needs to remove the matched original suffix/range rather than a normalized word count.
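A hedged sketch of range-based stripping: build a pattern from the phrase words that allows spaces, commas, or hyphens between them, match it at the end of the original input, and cut at the start of the match. `strippingTrailingCommand` is a hypothetical helper, not code from this PR, and it relies on Foundation's regular-expression search options.

```swift
import Foundation

/// Returns the input with a trailing command phrase removed, or nil if the
/// phrase is not in trailing position. Works on the original string, so a
/// single "scratch-that" token is removed without touching the word before it.
func strippingTrailingCommand(_ input: String, phrase: String) -> String? {
    let words = phrase.split(separator: " ").map {
        NSRegularExpression.escapedPattern(for: String($0))
    }
    // Start of string or whitespace, then the words separated by
    // whitespace, commas, or hyphens, anchored at the end of the input.
    let pattern = "(^|\\s+)" + words.joined(separator: "[\\s,-]+") + "$"
    guard let range = input.range(of: pattern,
                                  options: [.regularExpression, .caseInsensitive]) else {
        return nil
    }
    return String(input[..<range.lowerBound])
}
```

For "send the report scratch-that" this leaves "send the report", so a subsequent one-word scratch produces "send the" as intended.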


@Kayaba-Attribution

Kayaba-Attribution commented Jun 29, 2026

Contributor Author

Will fully test and verify in the upcoming days

@altic-dev

Owner

I was going to add this, but it needs clear UI and UX design before we add it so people can use it easily. Nice work, but we need to spend a lot of time on this :(


Development

Successfully merging this pull request may close these issues.

[✨ FEATURE] Add voice command triggers for punctuation and editing during dictation
