🎙️ Voice2Text

Press a key, speak, get text. A simple voice input tool for macOS that works in any application.

Voice2Text solves this:

✅ Free tier available — use free models
✅ Cheap high-quality models — use affordable models (recommended: Gemini 3 Flash, ~$0.002/min, quality better than Wispr Flow, which costs $7/month)
✅ No subscriptions — pay only for what you use (if anything)
✅ Your own API key — works with any OpenAI-compatible API (OpenRouter, OpenAI, Anthropic, etc.)
✅ Global hotkey — press F8, speak, press F8 again — text is in your clipboard

How It Works

Press F8 → Speak → Press F8 again → Text in clipboard!

Recording is automatically compressed to OGG/OPUS (10-20x smaller) and sent to your configured API for transcription. The whole process takes 1-3 seconds.

Quick Start

# 1. Clone the repository
git clone https://github.com/anoru/voice2text.git
cd voice2text

# 2. Install dependencies
pip3 install -r requirements.txt
brew install ffmpeg  # Required for audio compression

# 3. Configure API (copy .env.example and edit)
cp .env.example .env
# Edit .env and add your API key

# 4. Run!
./start.sh

Press F8, say something, press F8 again — text is copied to clipboard!

Hotkeys

Key	Action
F8	Start/stop recording
F10	Cancel recording or transcription (saves API tokens)
Ctrl+C	Quit application

Hotkeys can be customized via environment variables in .env file.

Installation

Requirements

macOS (uses osascript for notifications)
Python 3.10+
Microphone access
FFmpeg (for audio compression)
API key from any OpenAI-compatible provider (OpenRouter, OpenAI, Anthropic, etc.)

Step-by-Step Installation

# 1. Clone the repository
git clone https://github.com/anoru/voice2text.git
cd voice2text

# 2. Create virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate

# 3. Install Python dependencies
pip install -r requirements.txt

# 4. Install FFmpeg (required for audio compression)
brew install ffmpeg

# 5. Configure environment
cp .env.example .env

Configuration

All configuration is done via the .env file (created from .env.example):

# Copy example file
cp .env.example .env

# Edit .env with your settings

Using Different API Providers

Voice2Text works with any OpenAI-compatible API endpoint. Edit your .env file:

Example with OpenRouter:

VOICE2TEXT_API_KEY=sk-or-v1-xxx
VOICE2TEXT_API_URL=https://openrouter.ai/api/v1
VOICE2TEXT_MODEL=google/gemini-3-flash-preview

OpenAI:

VOICE2TEXT_API_KEY=sk-xxx
VOICE2TEXT_API_URL=https://api.openai.com/v1
VOICE2TEXT_MODEL=gpt-4o-mini

Anthropic:

VOICE2TEXT_API_KEY=sk-ant-xxx
VOICE2TEXT_API_URL=https://api.anthropic.com/v1
VOICE2TEXT_MODEL=claude-3-haiku

Any other provider — just set the API key and endpoint URL in .env.

Getting API Key

Sign up at OpenRouter (or any other provider)
Create an API key in your provider's dashboard

Open .env file and paste your key:

VOICE2TEXT_API_KEY=sk-or-v1-your-key-here

Set the endpoint URL:

VOICE2TEXT_API_URL=https://openrouter.ai/api/v1

Set the model (check your provider's documentation for available models):
```
VOICE2TEXT_MODEL=google/gemini-3-flash-preview
```

Notifications

Voice2Text provides native macOS notifications throughout the transcription process:

🎙️ Recording Started — When you press F8 to begin recording
⏳ Transcribing — When recording stops and audio is being processed
✅ Transcription Ready — When text is successfully transcribed and copied to clipboard

Notifications help you track the workflow without watching the terminal. They appear in the top-right corner of your screen and automatically dismiss after a few seconds.

Features

🎙️ Audio compression — automatic conversion to OGG/OPUS (10-20x smaller file size)
🔄 Retry functionality — if transcription fails (API error, network issue), your recording is saved locally. Retry with the same or different model without re-recording
📋 Clipboard integration — result instantly copied to clipboard, paste anywhere
🔔 macOS notifications — native notifications when transcription is ready
💾 Local save — recording saved locally in case of API error

Technical Details

Architecture

Hotkey (F8) → Record Audio → Save as WAV → Compress to OGG/OPUS
     → Send to API → Transcription → Copy to Clipboard

Press hotkey to start recording
Audio captured at 16kHz mono
Saved as temporary WAV
Compressed to OGG/OPUS using FFmpeg (10-20x smaller)
Sent to API with selected model
Transcription returned and copied to clipboard
macOS notification shown

Command Line Options

# Start recording mode
./start.sh

# Retry last saved recording
./start.sh retry

Retry Feature

Why it's useful:

Sometimes transcription fails due to:

API rate limits
Network connectivity issues
Temporary service outages
Choosing the wrong model

Your recording is never lost. When an error occurs, Voice2Text automatically saves your audio file locally. You can retry transcription later without re-recording.

Example scenario:

You record a 2-minute voice memo
You stop recording, but the API returns an error
Voice2Text saves recording.ogg locally
You wait a moment, then run: ./start.sh retry
The transcription completes successfully

Or retry with a different model (edit .env first):

# Edit .env and change VOICE2TEXT_MODEL
./start.sh retry

Create an Alias (Optional)

For quick access, create a shell alias to launch Voice2Text with a single letter:

For Zsh (default on macOS):

# Add to ~/.zshrc
echo "alias v='cd ~/path/to/voice2text && ./start.sh'" >> ~/.zshrc
source ~/.zshrc

# Now just type:
v

For Bash:

# Add to ~/.bashrc
echo "alias v='cd ~/path/to/voice2text && ./start.sh'" >> ~/.bashrc
source ~/.bashrc

# Now just type:
v

Troubleshooting

Microphone not found

Error: No input device found

Solution: Check System Preferences → Security & Privacy → Privacy → Microphone and ensure Terminal has access.

Accessibility permissions

Error: pynput requires accessibility permissions

Solution:

System Preferences → Security & Privacy → Privacy → Accessibility
Add Terminal (or your IDE) to the list
Restart the application

FFmpeg not found

Error: Compression failed

Solution: Install FFmpeg:

brew install ffmpeg

API errors

Error: Invalid API key

Solution: Check your .env file has VOICE2TEXT_API_KEY set correctly.

Development

Using the launcher script

# Make executable and use
chmod +x start.sh
./start.sh

# Retry mode
./start.sh retry

Linting and formatting

pip install ruff
ruff check .
ruff format .

License

This project is released into the public domain using the Unlicense. You can do whatever you want with this code — no attribution required.

Acknowledgments

OpenRouter for unified API access to AI models
pynput for keyboard control
sounddevice for audio recording
pydub for audio compression

Support

If you encounter any issues or have questions, please open an issue on GitHub.

Made for people who prefer speaking to typing

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
start.sh		start.sh
voice2text.py		voice2text.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🎙️ Voice2Text

How It Works

Quick Start

Hotkeys

Installation

Requirements

Step-by-Step Installation

Configuration

Using Different API Providers

Getting API Key

Notifications

Features

Technical Details

Architecture

Command Line Options

Retry Feature

Create an Alias (Optional)

Troubleshooting

Microphone not found

Accessibility permissions

FFmpeg not found

API errors

Development

Using the launcher script

Linting and formatting

License

Acknowledgments

Support

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🎙️ Voice2Text

How It Works

Quick Start

Hotkeys

Installation

Requirements

Step-by-Step Installation

Configuration

Using Different API Providers

Getting API Key

Notifications

Features

Technical Details

Architecture

Command Line Options

Retry Feature

Create an Alias (Optional)

Troubleshooting

Microphone not found

Accessibility permissions

FFmpeg not found

API errors

Development

Using the launcher script

Linting and formatting

License

Acknowledgments

Support

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages