Press a key, speak, get text. A simple voice input tool for macOS that works in any application.
Here is what Voice2Text offers:
- ✅ Free tier available — use free models
- ✅ Cheap, high-quality models — the recommended Gemini 3 Flash costs ~$0.002/min and delivers better quality than Wispr Flow's $7/month subscription
- ✅ No subscriptions — pay only for what you use (if anything)
- ✅ Your own API key — works with any OpenAI-compatible API (OpenRouter, OpenAI, Anthropic, etc.)
- ✅ Global hotkey — press F8, speak, press F8 again — text is in your clipboard
Press F8 → Speak → Press F8 again → Text in clipboard!
Recording is automatically compressed to OGG/OPUS (10-20x smaller) and sent to your configured API for transcription. The whole process takes 1-3 seconds.
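The compression step can be sketched with FFmpeg's OPUS encoder. This is an illustrative sketch, not the project's actual code; the helper names and the 24 kbit/s bitrate are assumptions:

```python
import subprocess

def build_compress_cmd(wav_path: str, ogg_path: str) -> list[str]:
    # Hypothetical helper: builds the FFmpeg invocation that turns the
    # temporary WAV into a small OGG/OPUS file. Low bitrates are fine
    # for mono speech, which is where the 10-20x size saving comes from.
    return [
        "ffmpeg", "-y",      # overwrite output without prompting
        "-i", wav_path,      # input WAV recording
        "-c:a", "libopus",   # encode with the OPUS codec
        "-b:a", "24k",       # low bitrate, adequate for speech
        ogg_path,
    ]

def compress(wav_path: str, ogg_path: str) -> None:
    # Requires `brew install ffmpeg` so the binary is on PATH.
    subprocess.run(build_compress_cmd(wav_path, ogg_path), check=True)
```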
```bash
# 1. Clone the repository
git clone https://github.com/anoru/voice2text.git
cd voice2text

# 2. Install dependencies
pip3 install -r requirements.txt
brew install ffmpeg  # Required for audio compression

# 3. Configure API (copy .env.example and edit)
cp .env.example .env
# Edit .env and add your API key

# 4. Run!
./start.sh
```

Press F8, say something, press F8 again — text is copied to clipboard!
| Key | Action |
|---|---|
| F8 | Start/stop recording |
| F10 | Cancel recording or transcription (saves API tokens) |
| Ctrl+C | Quit application |
Hotkeys can be customized via environment variables in the `.env` file.
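As a sketch, reading such overrides might look like this (the variable names below are illustrative assumptions; check `.env.example` for the ones the project actually uses):

```python
import os

# Assumed variable names and defaults, mirroring the F8/F10 table above.
HOTKEY_DEFAULTS = {
    "VOICE2TEXT_RECORD_KEY": "f8",
    "VOICE2TEXT_CANCEL_KEY": "f10",
}

def load_hotkeys(env=os.environ) -> dict:
    # Fall back to the defaults when a variable is absent from .env.
    return {name: env.get(name, default)
            for name, default in HOTKEY_DEFAULTS.items()}
```

With `VOICE2TEXT_RECORD_KEY=f9` in `.env`, recording would then toggle on F9 instead.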
- macOS (uses `osascript` for notifications)
- Python 3.10+
- Microphone access
- FFmpeg (for audio compression)
- API key from any OpenAI-compatible provider (OpenRouter, OpenAI, Anthropic, etc.)
```bash
# 1. Clone the repository
git clone https://github.com/anoru/voice2text.git
cd voice2text

# 2. Create virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate

# 3. Install Python dependencies
pip install -r requirements.txt

# 4. Install FFmpeg (required for audio compression)
brew install ffmpeg

# 5. Configure environment
cp .env.example .env
```

All configuration is done via the `.env` file (created from `.env.example`):
```bash
# Copy example file
cp .env.example .env
# Edit .env with your settings
```

Voice2Text works with any OpenAI-compatible API endpoint. Edit your `.env` file:
Example with OpenRouter:

```bash
VOICE2TEXT_API_KEY=sk-or-v1-xxx
VOICE2TEXT_API_URL=https://openrouter.ai/api/v1
VOICE2TEXT_MODEL=google/gemini-3-flash-preview
```

OpenAI:

```bash
VOICE2TEXT_API_KEY=sk-xxx
VOICE2TEXT_API_URL=https://api.openai.com/v1
VOICE2TEXT_MODEL=gpt-4o-mini
```

Anthropic:

```bash
VOICE2TEXT_API_KEY=sk-ant-xxx
VOICE2TEXT_API_URL=https://api.anthropic.com/v1
VOICE2TEXT_MODEL=claude-3-haiku
```

Any other provider — just set the API key and endpoint URL in `.env`.
- Sign up at OpenRouter (or any other provider)
- Create an API key in your provider's dashboard
- Open the `.env` file and paste your key: `VOICE2TEXT_API_KEY=sk-or-v1-your-key-here`
- Set the endpoint URL: `VOICE2TEXT_API_URL=https://openrouter.ai/api/v1`
- Set the model (check your provider's documentation for available models): `VOICE2TEXT_MODEL=google/gemini-3-flash-preview`
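Under the hood, a request to such an endpoint might be assembled roughly like this. The exact message shape for audio input varies by provider; the OpenAI-style `input_audio` content part below is an assumption, as is the function name:

```python
import base64

def build_transcription_request(model: str, ogg_bytes: bytes) -> dict:
    # Encode the compressed recording so it can travel in a JSON body.
    audio_b64 = base64.b64encode(ogg_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe this audio verbatim."},
                {"type": "input_audio",
                 "input_audio": {"data": audio_b64, "format": "ogg"}},
            ],
        }],
    }
```

The payload would then be POSTed to the chat-completions route of `VOICE2TEXT_API_URL`, with the API key in an `Authorization: Bearer` header.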
Voice2Text provides native macOS notifications throughout the transcription process:
- 🎙️ Recording Started — When you press F8 to begin recording
- ⏳ Transcribing — When recording stops and audio is being processed
- ✅ Transcription Ready — When text is successfully transcribed and copied to clipboard
Notifications help you track the workflow without watching the terminal. They appear in the top-right corner of your screen and automatically dismiss after a few seconds.
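On macOS such notifications can be produced with `osascript`; a minimal sketch (helper names are illustrative):

```python
import subprocess

def build_notify_cmd(title: str, message: str) -> list[str]:
    # AppleScript one-liner that shows a native notification banner.
    script = f'display notification "{message}" with title "{title}"'
    return ["osascript", "-e", script]

def notify(title: str, message: str) -> None:
    # check=False: a failed notification should never crash transcription.
    subprocess.run(build_notify_cmd(title, message), check=False)
```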
- 🎙️ Audio compression — automatic conversion to OGG/OPUS (10-20x smaller file size)
- 🔄 Retry functionality — if transcription fails (API error, network issue), your recording is saved locally. Retry with the same or different model without re-recording
- 📋 Clipboard integration — result instantly copied to clipboard, paste anywhere
- 🔔 macOS notifications — native notifications when transcription is ready
- 💾 Local save — recording saved locally in case of API error
```
Hotkey (F8) → Record Audio → Save as WAV → Compress to OGG/OPUS
    → Send to API → Transcription → Copy to Clipboard
```
- Press hotkey to start recording
- Audio captured at 16kHz mono
- Saved as temporary WAV
- Compressed to OGG/OPUS using FFmpeg (10-20x smaller)
- Sent to API with selected model
- Transcription returned and copied to clipboard
- macOS notification shown
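The steps above can be sketched as a small pipeline. Each stage is passed in as a function, so this is a structural sketch rather than the project's actual code:

```python
def run_pipeline(record, compress, transcribe, copy_to_clipboard):
    # record() -> path to the temporary WAV file
    wav_path = record()
    # compress(wav) -> path to the much smaller OGG/OPUS file
    ogg_path = compress(wav_path)
    # transcribe(ogg) -> the recognized text returned by the API
    text = transcribe(ogg_path)
    # finally place the result on the clipboard
    copy_to_clipboard(text)
    return text
```

Injecting the stages also makes each one easy to swap or test in isolation.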
```bash
# Start recording mode
./start.sh

# Retry last saved recording
./start.sh retry
```

Why it's useful:
Sometimes transcription fails due to:
- API rate limits
- Network connectivity issues
- Temporary service outages
- Choosing the wrong model
Your recording is never lost. When an error occurs, Voice2Text automatically saves your audio file locally. You can retry transcription later without re-recording.
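The save-on-error behavior might look roughly like this (a sketch; the file name and function names are assumptions):

```python
import shutil

def transcribe_or_keep(ogg_path: str, transcribe,
                       keep_path: str = "recording.ogg"):
    # Try the API call; if anything goes wrong (rate limit, network
    # issue, outage), copy the audio aside so a later retry can reuse it.
    try:
        return transcribe(ogg_path)
    except Exception:
        shutil.copy(ogg_path, keep_path)
        return None  # caller notifies the user and suggests a retry
```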
Example scenario:
- You record a 2-minute voice memo
- You stop recording, but the API returns an error
- Voice2Text saves `recording.ogg` locally
- You wait a moment, then run `./start.sh retry`
- The transcription completes successfully
Or retry with a different model (edit .env first):
```bash
# Edit .env and change VOICE2TEXT_MODEL
./start.sh retry
```

For quick access, create a shell alias to launch Voice2Text with a single letter:
For Zsh (default on macOS):
```bash
# Add to ~/.zshrc
echo "alias v='cd ~/path/to/voice2text && ./start.sh'" >> ~/.zshrc
source ~/.zshrc

# Now just type:
v
```

For Bash:
```bash
# Add to ~/.bashrc
echo "alias v='cd ~/path/to/voice2text && ./start.sh'" >> ~/.bashrc
source ~/.bashrc

# Now just type:
v
```

Error: No input device found
Solution: Check System Preferences → Security & Privacy → Privacy → Microphone and ensure Terminal has access.
Error: pynput requires accessibility permissions
Solution:
- System Preferences → Security & Privacy → Privacy → Accessibility
- Add Terminal (or your IDE) to the list
- Restart the application
Error: Compression failed
Solution: Install FFmpeg:

```bash
brew install ffmpeg
```

Error: Invalid API key
Solution: Check that your `.env` file has `VOICE2TEXT_API_KEY` set correctly.
```bash
# Make executable and use
chmod +x start.sh
./start.sh

# Retry mode
./start.sh retry
```

Lint and format the code with ruff:

```bash
pip install ruff
ruff check .
ruff format .
```

This project is released into the public domain under the Unlicense. You can do whatever you want with this code — no attribution required.
- OpenRouter for unified API access to AI models
- pynput for keyboard control
- sounddevice for audio recording
- pydub for audio compression
If you encounter any issues or have questions, please open an issue on GitHub.
Made for people who prefer speaking to typing