Summary
Let each prompt carry a minimum recording-duration threshold that determines when it triggers, so FluidVoice automatically selects the right prompt based on how long the recording is — not only on which app is in focus. Short utterances would map to a lightweight prompt (or no AI cleanup at all), while progressively longer recordings would map to heavier cleanup/formatting prompts, up to a dedicated meeting prompt for very long sessions.
The goal is to skip unnecessary post-processing on trivial inputs (e.g., "yeah, sure" or "sounds good"), where AI cleanup only adds latency, while still getting full cleanup and formatting on longer, more substantive dictations.
Context: how FluidVoice works today
FluidVoice doesn't have a "modes" concept — processing behavior is driven by prompts, and prompts can already be routed per application (different prompt sets for different apps).
The limitation is that prompt selection is keyed on the destination app, which assumes the app is a good proxy for how much (and what kind of) cleanup the text needs. In practice, the length of what was said is often the better proxy, and a single app receives both short and long input. A short "yep" and a three-paragraph update can both go into the same chat box but need very different handling, so an app-based rule alone can't get this right.
Proposed Behavior
Add a minimum-duration trigger to each prompt. When a dictation finishes, FluidVoice picks the prompt whose duration band the recording falls into. Users can define a ladder of prompts with escalating thresholds, for example:
- Short (e.g., ≥ 0s, below the next threshold): a quick-reply prompt — raw or minimal cleanup, Slack-style. Could also mean "no AI enhancement at all," outputting the local transcript directly.
- Medium (e.g., ≥ 10s): light cleanup — disfluency removal and punctuation.
- Long (e.g., ≥ 60s): full formatting — suitable for a document, wiki page, or word processor.
- Meeting (e.g., ≥ 30 min): a dedicated meeting prompt — summary, notes, or structured output.
The exact thresholds and the number of bands are entirely user-defined. Each threshold is the lower bound of its prompt's band, and the band extends up to the next higher threshold.
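The ladder above could be resolved with a simple "highest threshold met" lookup. This is a minimal sketch, not FluidVoice's actual code: the `Prompt` type, the `min_duration_s` field, and `select_prompt` are hypothetical names, and the thresholds are the example values from the list.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prompt:
    name: str
    min_duration_s: float  # hypothetical field: minimum recording duration in seconds


def select_prompt(prompts: list[Prompt], duration_s: float) -> Optional[Prompt]:
    """Return the prompt with the highest threshold that duration_s meets or exceeds."""
    ladder = sorted(prompts, key=lambda p: p.min_duration_s)
    thresholds = [p.min_duration_s for p in ladder]
    # bisect_right counts the thresholds <= duration_s; the last of those wins.
    index = bisect_right(thresholds, duration_s) - 1
    return ladder[index] if index >= 0 else None


# Example ladder using the illustrative thresholds from the list above.
LADDER = [
    Prompt("short", 0),
    Prompt("medium", 10),
    Prompt("long", 60),
    Prompt("meeting", 30 * 60),
]
```

With this ladder, a 4-second recording resolves to `short` and a recording of exactly 60 seconds resolves to `long`. If no prompt has a threshold at or below the recording length, the function returns `None`, which a caller could treat as "no AI enhancement."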
Empty / no-speech case
If the hotkey is pressed and released with no speech detected (silence or an accidental press), FluidVoice should remain idle — no transcript, no AI call, no output — rather than emit empty text or invoke a prompt.
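One way to express this idle path is an early return before any transcription or prompt work happens. The sketch below is only an illustration: `handle_recording`, `speech_detected`, `transcribe`, and `run_prompt` are hypothetical names, and how speech is actually detected is left out.

```python
from typing import Callable, Optional


def handle_recording(
    speech_detected: bool,
    transcribe: Callable[[], str],
    run_prompt: Callable[[str], str],
) -> Optional[str]:
    """Return the text to insert, or None when the app should stay idle."""
    # No speech: skip transcription, skip the AI call, emit nothing.
    if not speech_detected:
        return None
    transcript = transcribe()
    # Assumption: a whitespace-only transcript is treated like no speech.
    if not transcript.strip():
        return None
    return run_prompt(transcript)
```

Returning `None` rather than an empty string lets the caller tell "nothing to insert" apart from a real (if short) result.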
How it fits the existing prompt system
- Per-prompt setting: the threshold is just one more field on a prompt, alongside its text and provider/model.
- Selection logic: at the end of a recording, choose the prompt whose duration band contains the recording length (e.g., the highest threshold that the recording meets or exceeds).
- Interaction with per-app routing: there are two possible approaches.
  - Approach A (compose with per-app routing): keep per-app routing and let duration pick among the prompts available for that app. Per-app routing narrows the candidate set, then duration selects within it. If a user defines duration bands globally, they apply regardless of app. The precedence should be explicit and, ideally, user-configurable.
  - Approach B (replace per-app routing entirely): make duration the primary (and only) selector. You define all of your prompts, attach a duration threshold to each, and the system picks purely on recording length, independent of the active app. This is simpler to reason about and is my preferred direction: a longer recording gets a more complex cleanup prompt, while shorter ones get lighter (or no) cleanup, regardless of where the text is going.
- Local vs. cloud per band: because each band points to its own prompt, the user can already choose a local (Fluid Intelligence) or cloud/BYOK provider per band — e.g., short band = no AI / local only, long band = heavier cloud model.
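The two approaches to per-app routing can be compared in a small sketch. Everything here is hypothetical (the `Prompt` fields, the `per_app` mapping, the `use_app_routing` switch, and the example bundle IDs). The fallback order, app-specific prompts first and then global bands, is one possible precedence rather than a settled decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prompt:
    name: str
    min_duration_s: Optional[float] = None  # None means no duration threshold is set
    provider: str = "local"  # hypothetical label, e.g. "local" or "cloud"


def pick_by_duration(candidates: list[Prompt], duration_s: float) -> Optional[Prompt]:
    """Highest threshold that the recording meets or exceeds, if any."""
    eligible = [
        p for p in candidates
        if p.min_duration_s is not None and p.min_duration_s <= duration_s
    ]
    return max(eligible, key=lambda p: p.min_duration_s, default=None)


def select_prompt_for_app(
    app_id: str,
    duration_s: float,
    per_app: dict[str, list[Prompt]],
    global_prompts: list[Prompt],
    use_app_routing: bool = True,
) -> Optional[Prompt]:
    """Approach A when use_app_routing is True, Approach B otherwise."""
    if use_app_routing:
        # Approach A: the app narrows the candidate set, duration picks within it.
        chosen = pick_by_duration(per_app.get(app_id, []), duration_s)
        if chosen is not None:
            return chosen
    # Approach B, or the fallback for A: global bands apply regardless of app.
    return pick_by_duration(global_prompts, duration_s)
```

For example, with a chat app that defines its own 0 s and 10 s prompts, a 90-second recording picks the app's 10 s prompt under Approach A, but the global 60 s prompt under Approach B.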
Configuration / UX Notes
- Threshold value: a single minimum-duration field per prompt is the minimum viable control. Allowing values from a few hundred milliseconds up to tens of minutes covers everything from one-word replies to full meetings.
- Sensible default: ship with duration thresholds unset (feature off) so existing per-app prompt behavior is unchanged until a user opts in.
- Optional refinement (nice-to-have): allow the trigger to be based on transcript word/character count instead of (or in addition to) raw audio duration, since transcript length can be a more direct signal of "does this need cleanup." Duration is simpler and is a fine v1.
- Feedback: a subtle indicator (or a history entry) showing which prompt/band was selected would help users tune their thresholds, but isn't required for a first version.
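The "unset means off" default and the optional word-count refinement could share one trigger type. This is a rough sketch under assumptions: `Trigger`, `min_words`, and `trigger_matches` are hypothetical names, and requiring every configured threshold to be met (rather than any one of them) is just one possible rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trigger:
    min_duration_s: Optional[float] = None  # unset by default (feature off)
    min_words: Optional[int] = None         # optional transcript-length refinement


def trigger_matches(trigger: Trigger, duration_s: float, transcript: str) -> bool:
    """True when every configured threshold is met; an empty trigger never matches."""
    if trigger.min_duration_s is None and trigger.min_words is None:
        return False  # nothing configured: leave existing behavior untouched
    if trigger.min_duration_s is not None and duration_s < trigger.min_duration_s:
        return False
    if trigger.min_words is not None and len(transcript.split()) < trigger.min_words:
        return False
    return True
```

Counting whitespace-separated words is a crude measure and would not suit languages written without spaces; a character count is a possible substitute there.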
Acceptance Criteria
- A user can assign a minimum recording-duration threshold to each prompt.
- After a recording, FluidVoice selects the prompt whose duration band the recording falls into.
- A user can configure a short band that outputs the local transcript with no AI enhancement.
- A user can configure a long/meeting band (e.g., ≥ 30 min) that routes to a dedicated prompt.
- Pressing the hotkey with no detected speech results in no output and no AI call (idle).
- Existing per-app prompt routing continues to work for users who don't set duration thresholds, and the precedence between per-app and per-duration selection is clearly defined.