Turn your voice into text with a triple-tap — minimal, fast, and macOS-native.
ctrlSPEAK is your set-it-and-forget-it speech-to-text companion. Triple-tap Ctrl, speak your mind, and watch your words appear wherever your cursor blinks — effortlessly copied and pasted. Built for macOS, it's lightweight, low-overhead, and stays out of your way until you call it.
Demo video: `ctrlspeak-demo.mp4`
- 🖥️ Minimal Interface: Runs quietly in the background via the command line
- ⚡ Triple-Tap Magic: Start/stop recording with a quick `Ctrl` triple-tap (see the sketch after this list)
- 📋 Auto-Paste: Text lands right where you need it, no extra clicks
- 🔊 Audio Cues: Hear when recording begins and ends
- 🍎 Mac Optimized: Harnesses Apple Silicon's MPS for blazing performance
- 🌟 Top-Tier Models: Powered by NVIDIA NeMo and OpenAI Whisper
- 📜 History Browser: Review, search, and copy past transcriptions (press `r` in the UI)
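To make the triple-tap behavior concrete, here is a minimal sketch of how a global triple-tap listener can be built with the `pynput` library. This is an illustration only, not ctrlSPEAK's actual implementation; the tap count and time window are arbitrary choices.

```python
import time
from pynput import keyboard

TAPS = 3      # presses required
WINDOW = 0.6  # seconds in which they must land
tap_times = []

def on_press(key):
    # Only count presses of a Ctrl key.
    if key in (keyboard.Key.ctrl, keyboard.Key.ctrl_l, keyboard.Key.ctrl_r):
        now = time.monotonic()
        tap_times.append(now)
        tap_times[:] = [t for t in tap_times if now - t <= WINDOW]
        if len(tap_times) >= TAPS:
            tap_times.clear()
            print("toggle recording")  # a real app would start/stop capture here

# Listening for global keys requires the Accessibility permission below.
with keyboard.Listener(on_press=on_press) as listener:
    listener.join()
```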
Requirements:

- System: macOS 12.3+ (MPS acceleration supported)
- Python: 3.10
- Permissions:
- 🎤 Microphone (for recording)
- ⌨️ Accessibility (for shortcuts)
Grant these on first launch and you're good to go!
Installation via Homebrew:

```bash
# Basic installation (MLX models only)
brew tap patelnav/ctrlspeak
brew install ctrlspeak

# Recommended: Full installation with all model support
brew install ctrlspeak --with-nvidia --with-whisper

# Check what models are available after installation
ctrlspeak --list-models
```

What each option does:

- `--with-nvidia`: Enables NVIDIA Parakeet and Canary models (recommended for best performance)
- `--with-whisper`: Enables OpenAI Whisper models (optional)
If you get "No module named 'nemo'" errors:

```bash
# Reinstall with NVIDIA support
brew reinstall ctrlspeak --with-nvidia
```

To install manually instead, clone the repository:

```bash
git clone https://github.com/patelnav/ctrlspeak.git
cd ctrlspeak
```

Create and activate a virtual environment:

```bash
# Create a virtual environment
python -m venv .venv

# Activate it on macOS/Linux
source .venv/bin/activate
```

Install dependencies:

```bash
# Install core dependencies
pip install -r requirements.txt

# For NVIDIA model support (optional)
pip install -r requirements-nvidia.txt

# For Whisper model support (optional)
pip install -r requirements-whisper.txt
```

Key scripts in the repository:

- `ctrlspeak.py`: The full-featured star of the show
- `live_transcribe.py`: Continuous transcription for testing vibes
- `test_transcription.py`: Debug or benchmark with ease
- `test_parallel_models.py`: Compare Nemotron streaming vs Parakeet side-by-side
- Run ctrlSPEAK in a terminal window:

  ```bash
  # If installed with Homebrew
  ctrlspeak

  # If installed manually (from the project directory with activated venv)
  python ctrlspeak.py
  ```
- Triple-tap Ctrl to start recording
- Speak clearly into your microphone
- Triple-tap Ctrl again to stop recording
- The transcribed text will be automatically pasted at your cursor position
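One common way that last auto-paste step is implemented on macOS is to place the text on the clipboard and then synthesize a Cmd+V keystroke. The sketch below shows that pattern using `pyperclip` and `pynput`; it is an assumption for illustration, not necessarily how ctrlSPEAK does it.

```python
import time

import pyperclip
from pynput.keyboard import Controller, Key

def paste_text(text: str) -> None:
    # Put the text on the clipboard, then send Cmd+V so it lands
    # wherever the cursor currently is.
    pyperclip.copy(text)
    time.sleep(0.05)  # give the clipboard a moment to update
    kb = Controller()
    with kb.pressed(Key.cmd):
        kb.tap("v")

paste_text("hello from ctrlSPEAK")
```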
Once running, you can use these keyboard shortcuts in the terminal UI:
- `r` - View transcription history
- `m` - Switch speech recognition models
- `d` - Change audio input device
- `l` - View logs
- `h` - Show help
- `q` - Quit
ctrlSPEAK uses open-source speech recognition models:
- Parakeet 0.6B (MLX) (default): `mlx-community/parakeet-tdt-0.6b-v3`, optimized for Apple Silicon. Recommended for most users on M1/M2/M3 Macs.
- Canary: NVIDIA NeMo's `nvidia/canary-1b-flash` multilingual model (En, De, Fr, Es) with punctuation, but can be slower. Requires `requirements-nvidia.txt`.
- Canary (180M): NVIDIA NeMo's `nvidia/canary-180m-flash` multilingual model, smaller and less accurate. Requires `requirements-nvidia.txt`.
- Whisper (optional): OpenAI's `openai/whisper-large-v3` model. Fast, accurate, and powerful, with excellent punctuation and capitalization. Requires `requirements-whisper.txt`.
- Nemotron (Streaming) [Experimental]: NVIDIA's `nvidia/nemotron-speech-streaming-en-0.6b` streaming model with real-time transcription; text appears as you speak. Requires `requirements-nvidia.txt`.

Note: The `nvidia/parakeet-tdt-1.1b` model is also available for testing, but it is not recommended for general use as it lacks punctuation and is slower than the 0.6b model. Requires `requirements-nvidia.txt`.
The models are automatically downloaded from HuggingFace the first time you use them.
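If you want to warm the cache ahead of time (say, before going offline), you can pre-fetch a model with the `huggingface_hub` library. This is a convenience sketch assuming `huggingface_hub` is available in your environment; ctrlSPEAK itself handles the download automatically.

```python
from huggingface_hub import snapshot_download

# Pre-fetch the default model into the local Hugging Face cache so the
# first real run doesn't block on a download.
snapshot_download("mlx-community/parakeet-tdt-0.6b-v3")
```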
To see a list of all supported models, use the `--list-models` flag:

```bash
ctrlspeak --list-models
```

This will output a list of the available model aliases and their corresponding Hugging Face model names.
For users on Apple Silicon (M1/M2/M3 Macs), an optimized version of the Parakeet model is available using Apple's MLX framework. This is the default model and provides a significant performance boost.
You can select a model using the `--model` flag, passing either the full model name from Hugging Face or a short alias.
Short Names:

- `parakeet`: Parakeet 0.6B optimized for Apple Silicon (MLX). (Default)
- `canary`: NVIDIA's Canary 1B Flash model.
- `canary-180m`: NVIDIA's Canary 180M Flash model.
- `whisper`: OpenAI's Whisper v3 model.
- `nemotron`: NVIDIA's Nemotron streaming model. [Experimental]
Full Model URL:
You can also provide a full model URL from Hugging Face. For example:
```bash
ctrlspeak --model nvidia/parakeet-tdt-1.1b
```

This will download and use the specified model.
```bash
# Using Homebrew installation
ctrlspeak --model parakeet      # Default
ctrlspeak --model canary        # Multilingual with punctuation
ctrlspeak --model canary-180m   # The smaller Canary model
ctrlspeak --model canary-v2
ctrlspeak --model whisper       # OpenAI's model
ctrlspeak --model parakeet-mlx  # MLX-accelerated model
ctrlspeak --model nemotron      # Streaming (experimental)

# Using manual installation
python ctrlspeak.py --model parakeet
python ctrlspeak.py --model canary
python ctrlspeak.py --model canary-180m
python ctrlspeak.py --model canary-v2
python ctrlspeak.py --model whisper
python ctrlspeak.py --model parakeet-mlx
python ctrlspeak.py --model nemotron
```

ctrlSPEAK automatically saves your transcriptions locally for later review.
Access the interactive history browser by pressing `r` in the terminal UI:
- View past transcriptions - Browse all saved transcriptions with timestamps
- Copy to clipboard - Press `Enter` or `c` to copy any previous transcription
- Delete entries - Press `Delete` or `d` to remove unwanted entries
- Navigate - Use arrow keys to browse through your history
- See statistics - View total entries, word count, and recording time
History is stored locally in a SQLite database:
- Location: `~/.ctrlspeak/history.db`
- What's stored: Timestamp, transcription text, model used, duration, language
- Permissions: File is created with user-only access (700)
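Since it's a plain SQLite file, you can inspect it directly. Below is a minimal read-only sketch; the table and column names are assumptions for illustration, so check the real schema first with `sqlite3 ~/.ctrlspeak/history.db ".schema"`.

```python
import sqlite3
from pathlib import Path

# Open the history database read-only, at the location documented above.
db_path = Path.home() / ".ctrlspeak" / "history.db"
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)

# NOTE: "history", "timestamp", and "text" are assumed names, not the
# app's documented schema -- adjust after checking .schema.
for row in conn.execute(
    "SELECT timestamp, text FROM history ORDER BY timestamp DESC LIMIT 5"
):
    print(row)
conn.close()
```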
You have full control over your transcription history:
```bash
# Disable history saving
ctrlspeak --no-history

# Use custom database location
ctrlspeak --history-db ~/my-custom-path/history.db

# Delete all history data
rm ~/.ctrlspeak/history.db
```

Command-line reference:

```
ctrlspeak [OPTIONS]

Options:
  --model MODEL          Select speech recognition model (default: parakeet)
  --list-models          Show all available models
  --no-history           Disable transcription history saving
  --history-db PATH      Custom path for history database
  --source-lang LANG     Source language code (default: en)
  --target-lang LANG     Target language code (default: en)
  --debug                Enable debug logging
  --check-only           Verify configuration without running
  --check-compatibility  Check system compatibility

Examples:
  ctrlspeak                                    # Run with defaults
  ctrlspeak --model whisper                    # Use Whisper model
  ctrlspeak --no-history                       # Disable history
  ctrlspeak --history-db ~/backup/history.db   # Custom DB location
  ctrlspeak --debug                            # Enable debug mode
```

Supported models:

- Parakeet 0.6B (NVIDIA) - `nvidia/parakeet-tdt-0.6b-v3` (Default)
- Parakeet 1.1B (NVIDIA) - `nvidia/parakeet-tdt-1.1b`
- Canary (NVIDIA) - `nvidia/canary-1b-flash`
- Canary (NVIDIA) - `nvidia/canary-180m-flash`
- Canary (NVIDIA) - `nvidia/canary-1b-v2`
- Whisper (OpenAI) - `openai/whisper-large-v3`
- Nemotron (NVIDIA) - `nvidia/nemotron-speech-streaming-en-0.6b` [Experimental, Streaming]
| Model | Framework | Load Time (s) | Transcription Time (s) | Output Example (test.wav) |
|---|---|---|---|---|
| `parakeet-tdt-0.6b-v3` | MLX (Apple Silicon) | 0.97 | 0.53 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| `parakeet-tdt-0.6b-v3` | NeMo (NVIDIA) | 15.52 | 1.68 | |
| `parakeet-tdt-0.6b-v2` | MLX (Apple Silicon) | 0.99 | 0.56 | "Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait." |
| `parakeet-tdt-0.6b-v2` | NeMo (NVIDIA) | 8.23 | 1.61 | |
| `canary-1b-flash` | NeMo (NVIDIA) | 32.06 | 3.20 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| `canary-180m-flash` | NeMo (NVIDIA) | 6.16 | 3.20 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| `whisper-large-v3` | Transformers (OpenAI) | 5.44 | 2.53 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
Testing performed on a MacBook Pro (M2 Max) with a 7-second audio file (test.wav). Your results may vary.
Note: The Whisper model uses translate mode to enable proper punctuation and capitalization for English transcription.
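Reproducing numbers like these only requires timing model loading and transcription separately. A sketch of such a harness, where `load_model` and `transcribe` are hypothetical callables standing in for whichever framework you're testing (not the repo's actual benchmarking API):

```python
import time

def benchmark(load_model, transcribe, audio_path="test.wav"):
    # Time model loading and transcription separately, mirroring the
    # "Load Time" and "Transcription Time" columns above.
    t0 = time.perf_counter()
    model = load_model()
    load_s = time.perf_counter() - t0

    t1 = time.perf_counter()
    text = transcribe(model, audio_path)
    run_s = time.perf_counter() - t1

    return load_s, run_s, text
```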
The Nemotron model uses real-time streaming transcription where text appears as you speak. This provides instant feedback but has accuracy tradeoffs compared to batch models like Parakeet:
- Streaming (Nemotron): Text appears incrementally during speech. Lower accuracy due to limited context - may miss or misinterpret phrases.
- Batch (Parakeet, etc.): Transcription happens after recording stops. Higher accuracy because the model has the full audio context.
For most users, Parakeet MLX (default) provides the best balance of speed and accuracy.
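In code terms, the tradeoff looks roughly like this (a toy sketch; `model.transcribe` is a hypothetical stand-in, not ctrlSPEAK's real interface, and real streaming decoders carry state across chunks):

```python
def batch_transcribe(model, audio):
    # Batch: decode once, with the full recording as context.
    return model.transcribe(audio)

def streaming_transcribe(model, chunks):
    # Streaming: decode chunk by chunk. Each step only sees audio so
    # far, so early words can't be revised using later context.
    text = ""
    for chunk in chunks:
        text += model.transcribe(chunk)
        print(text)  # partial text appears while you're still speaking
    return text
```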
The app requires:
- Microphone access (for recording audio)
- Accessibility permissions (for global keyboard shortcuts)
You'll be prompted to grant these permissions on first run.
- No sound on recording start/stop: Ensure your system volume is not muted
- Keyboard shortcuts not working: Grant accessibility permissions in System Settings
- Transcription errors: Try speaking more clearly or switching to a different model
- @swanhtet1992 - Transcription history feature
- Start sound: "Notification Pluck On" from Pixabay
- Stop sound: "Notification Pluck Off" from Pixabay
This section outlines the steps to create a new release and update the associated Homebrew tap.
1. Prepare the Release:
- Ensure the code is stable and tests pass.
- Update the version number in the following files:
  - `VERSION` (e.g., `1.2.0`)
  - `__init__.py` (`__version__ = "1.2.0"`)
  - `pyproject.toml` (`version = "1.2.0"`)
- Commit these version changes:

  ```bash
  git add VERSION __init__.py pyproject.toml
  git commit -m "Bump version to X.Y.Z"
  ```
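Because the version string lives in three files, it can drift. A small pre-tag check like the following sketch (assuming the repo layout above; it's a convenience, not part of the project's tooling) catches mismatches:

```python
import re
from pathlib import Path

# Read the version from each of the three files listed above.
version = Path("VERSION").read_text().strip()
init_src = Path("__init__.py").read_text()
toml_src = Path("pyproject.toml").read_text()

init_ver = re.search(r'__version__\s*=\s*"([^"]+)"', init_src).group(1)
toml_ver = re.search(r'^version\s*=\s*"([^"]+)"', toml_src, re.M).group(1)

assert version == init_ver == toml_ver, (
    f"version mismatch: {version} / {init_ver} / {toml_ver}"
)
print(f"All three files agree on {version}")
```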
2. Tag and Push:
- Create a git tag matching the version:

  ```bash
  git tag vX.Y.Z
  ```

- Push the commits and the tag to the remote repository:

  ```bash
  git push && git push origin vX.Y.Z
  ```
3. Update Homebrew Tap:
- The source code tarball URL is automatically generated based on the tag (usually `https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz`).
- Download the tarball using its URL and calculate its SHA256 checksum:

  ```bash
  # Replace URL with the actual tarball link based on the tag
  curl -sL https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz | shasum -a 256
  ```

- Clone or navigate to your Homebrew tap repository (e.g., `../homebrew-ctrlspeak`).
- Edit the formula file (e.g., `Formula/ctrlspeak.rb`):
  - Update the `url` line with the tag tarball URL.
  - Update the `sha256` line with the checksum you calculated.
  - Optional: Update the `version` line if necessary (though it's often inferred).
  - Optional: If `requirements.txt` or dependencies changed, update the `depends_on` and `install` steps accordingly.
- Commit and push the changes in the tap repository:

  ```bash
  cd ../path/to/homebrew-ctrlspeak  # Or wherever your tap repo is
  git add Formula/ctrlspeak.rb
  git commit -m "Update ctrlspeak to vX.Y.Z"
  git push
  ```
4. Verify (Optional):
- Run `brew update` locally to fetch the updated formula.
- Run `brew upgrade ctrlspeak` to install the new version.
- Test the installed version.