Voice-Activated AI Assistant that just does stuff -- no BS.
Heathcliff is a voice-enabled personal AI assistant that integrates with your daily services. Wake it up with "Heathcliff", give it commands, and watch it orchestrate tasks across Gmail, Calendar, Spotify, Weather, News, and more using Gemini 2.5 Flash-powered decision making.
Get up and running in 5 minutes:
# 1. Clone and navigate
git clone <your-repo-url>
cd heathcliff
# 2. Install system dependencies (Linux/WSL)
sudo apt install python3-pyaudio portaudio19-dev espeak
# 3. Set up Python environment with uv
curl -LsSf https://astral.sh/uv/install.sh | sh # skip if you already have uv
uv sync # creates .venv from pyproject.toml / uv.lock
# 4. Configure API keys
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY + service keys
# (Optional) Add LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY
# + LANGFUSE_BASE_URL (https://cloud.langfuse.com or us.cloud...) for observability
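A filled-in `.env` might look like this (values are placeholders; the variable names are the ones referenced in this README):

```bash
# .env — example values only
GEMINI_API_KEY=your-gemini-api-key

# Optional: Langfuse observability
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_BASE_URL=https://cloud.langfuse.com
```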
# 5. Run in text mode (no voice hardware needed)
uv run python main.py --text
# OR run in voice mode
uv run python main.py
# OR launch the Streamlit dashboard
uv run streamlit run ui/Home.py
To ensure code quality and consistent formatting, this project uses pre-commit hooks (isort, black, etc.).
Run formatting and linting manually:
uv run pre-commit run --all-files
That's it! For detailed setup including Google OAuth, Spotify, and other integrations, see SETUP.md.
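The hook set might look like this in `.pre-commit-config.yaml` (a sketch only; the exact hooks and pinned versions in this repo may differ):

```yaml
repos:
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2
    hooks:
      - id: isort
  - repo: https://github.com/psf/black
    rev: 24.8.0
    hooks:
      - id: black
```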
- Wake word detection ("Heathcliff")
- Speech-to-text and text-to-speech
- Conversational memory and context
- Gmail: Read, search, send emails
- Google Calendar: View schedule, create events
- Spotify: Play music, control playback
- Weather: Real-time weather updates
- News: Latest headlines by topic
- Web Search: DuckDuckGo + Wikipedia
- Telegram: Send notifications
- Google Drive: Read files
- Gemini 2.5 Flash LLM
- LangGraph agent orchestration
- ChromaDB vector memory
- Multi-turn conversation context
- Long-term memory storage
- Built-in Langfuse tracing for every conversation
- LangChain callback handler automatically captures Gemini prompts/completions
- Tool usage + errors are streamed to Langfuse events for debugging
- Voice mode (main.py)
- Text mode for testing
- Streamlit web dashboard
- LLM Framework: LangChain + LangGraph with Gemini 2.5 Flash
- Memory: ChromaDB for persistent vector storage
- Voice: openwakeword (wake word), Google STT, espeak TTS
- Integrations: Gmail, Google Calendar, Spotify APIs
- Audio: PyAudio
Heathcliff uses a LangGraph-based agent architecture with 4 nodes:
- Retrieval Node: Fetches relevant context and memories from ChromaDB
- Reasoning Node: Processes input with Gemini LLM, determines actions
- Tool Calling Node: Executes requested tools (weather, time, etc.)
- Output Node: Saves conversation to memory, returns response
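The four-node flow above can be sketched in plain Python (this is an illustrative stand-in, not the actual LangGraph wiring in core/agent_core.py; the node functions here are stubs):

```python
# Illustrative sketch of the 4-node agent flow. Each callable stands in for
# one LangGraph node; the real nodes use ChromaDB, Gemini, and the tool layer.
def run_agent_turn(user_input, retrieve, reason, call_tools, respond):
    state = {"input": user_input}
    state["context"] = retrieve(user_input)                # Retrieval node
    state["plan"] = reason(user_input, state["context"])   # Reasoning node
    state["tool_result"] = call_tools(state["plan"])       # Tool-calling node
    return respond(state)                                  # Output node

# Stubbed example turn:
reply = run_agent_turn(
    "what's the weather?",
    retrieve=lambda q: ["user lives in London"],
    reason=lambda q, ctx: {"tool": "weather", "arg": "London"},
    call_tools=lambda plan: "72F, partly cloudy",
    respond=lambda s: f"The weather in London is {s['tool_result']}.",
)
```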
python main.py
- Say "Heathcliff" to activate
- Speak your command
- Heathcliff responds via audio
Example:
[You say]: "Heathcliff"
[Heathcliff]: *listening beep*
[You say]: "What's the weather in London?"
[Heathcliff]: "The current weather in London is 72°F and partly cloudy..."
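The interaction above boils down to a wake-word-gated loop. A minimal sketch with stubbed I/O (illustrative only; the real loop lives in core/audio_handler.py and uses openwakeword plus Google STT):

```python
# Sketch of the voice loop: wake word -> transcribe -> agent -> speak.
# All I/O callables are stand-ins for the real audio and agent components.
def run_voice_loop(listen, transcribe, respond, speak, max_turns=1):
    for _ in range(max_turns):
        if listen() != "heathcliff":   # wake-word gate
            continue
        command = transcribe()         # speech-to-text
        speak(respond(command))        # agent reply, spoken via TTS

# Example with stubbed components:
heard = []
run_voice_loop(
    listen=lambda: "heathcliff",
    transcribe=lambda: "what's the weather in london?",
    respond=lambda text: f"Echo: {text}",
    speak=heard.append,
)
```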
python main.py --text
- Type commands in terminal
- Great for debugging and testing
- No microphone/speakers required
Example:
You: What's the weather?
Heathcliff: The current weather in New York is 68Β°F and sunny...
You: Add an event to my calendar for tomorrow at 2pm
Heathcliff: I've added an event to your calendar for tomorrow at 2:00 PM...
streamlit run ui/Home.py
Access it at http://localhost:8501
Dashboard Features:
- Home: Chat interface with Heathcliff
- Memories: View, search, and add long-term memories
- Analytics: Usage statistics and conversation insights
- Settings: View API configuration and system status
from core import MemoryManager, HeathcliffAgent
from config import Config
# Initialize components
config = Config
memory = MemoryManager(config=config)
agent = HeathcliffAgent(config=config, memory_manager=memory)
# Single turn conversation
response = agent.invoke("Hello! What can you do?")
print(response)
# Multi-turn conversation (same session maintains context)
session_id = "my-session-123"
response1 = agent.invoke("My name is Adi", session_id=session_id)
response2 = agent.invoke("What's my name?", session_id=session_id)
# response2 will know your name is Adi

from core import MemoryManager
memory = MemoryManager(persist_dir="./chroma_db")
# Store a long-term memory
memory_id = memory.add_memory("User prefers dark mode", category="preferences")
# Recall relevant memories
results = memory.recall("what are user preferences?", n=3)
print(results["documents"])
# Save chat conversation
memory.save_chat(
user_msg="What's the weather?",
assistant_msg="It's sunny and 72F",
session_id="session-123"
)
# Retrieve chat context
context = memory.get_chat_context("weather", session_id="session-123")
Heathcliff now ships with first-class Langfuse instrumentation:
- Set `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and (optionally) `LANGFUSE_HOST`/`LANGFUSE_RELEASE` in `.env`.
- If you're on Langfuse Cloud US/EU, also set `LANGFUSE_BASE_URL` to `https://us.cloud.langfuse.com` or `https://cloud.langfuse.com`.
- Start the assistant as normal; every agent run creates a Langfuse trace named `heathcliff.agent`, tagged with `user_id=adiagarwal` (configurable via `observability.langfuse.user_id`).
- Gemini prompt/response pairs automatically stream through the Langfuse LangChain callback handler.
- Each external tool invocation is logged as a Langfuse event, so you can inspect failures and latency directly in the Langfuse UI.
Troubleshooting tips
- If no traces appear, run `python -m utils.langfuse_client` or start Heathcliff with `LOG_LEVEL=DEBUG` to confirm the Langfuse callback is registering.
- Double-check that the Langfuse dashboard filters (environment/project) match the `observability.langfuse.environment` value in `config/config.py`.
- Serverless/text-only sessions may exit before the SDK flushes; set `LANGFUSE_DISABLE_BACKGROUND_FLUSH=false` or keep the process alive for a few seconds.
- The Langfuse callback handler automatically reads keys from environment variables. Passing `public_key`/`secret_key` directly will fail on newer Langfuse releases, so be sure the env vars are loaded before the process starts.
Disable observability anytime by setting observability.langfuse.enabled to false in config/config.py.
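Since the handler reads its keys from the environment, the enable/disable decision reduces to an env-var check. A hypothetical helper sketching that logic (this function is illustrative, not part of Heathcliff's codebase):

```python
import os

def langfuse_enabled(env=None):
    """Return True only when both Langfuse keys are present.

    Hypothetical helper: mirrors the idea that the Langfuse callback
    handler is only registered when LANGFUSE_PUBLIC_KEY and
    LANGFUSE_SECRET_KEY are set in the environment.
    """
    env = env if env is not None else os.environ
    return bool(env.get("LANGFUSE_PUBLIC_KEY")) and bool(env.get("LANGFUSE_SECRET_KEY"))
```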
Say "Heathcliff" to activate, then give your command.
User: Hello!
Heathcliff: Hello! I'm Heathcliff, your personal AI assistant. How can I help you today?
User: What's the weather in London?
Heathcliff: The weather in London is 72F and sunny.
User: My name is Adi and I work as a software engineer.
Heathcliff: Nice to meet you, Adi! I'll remember that you're a software engineer.
User: What do you know about me?
Heathcliff: Based on what I know, your name is Adi and you work as a software engineer.
heathcliff/
├── main.py                  # Main entry point (voice/text modes)
├── core/
│   ├── memory_manager.py    # ChromaDB-backed memory storage
│   ├── agent_core.py        # LangGraph agent orchestrator
│   └── audio_handler.py     # Voice I/O (wake word, STT, TTS)
├── config/
│   ├── config.py            # Configuration classes
│   └── __init__.py          # Config singleton
├── tools/                   # Tool integrations
│   ├── email_tool.py        # Gmail integration
│   ├── calendar_tool.py     # Google Calendar
│   ├── spotify_tool.py      # Spotify playback
│   ├── info_tools.py        # Weather, news, web search
│   └── comm_tools.py        # Telegram, Google Drive
├── utils/
│   └── google_auth.py       # OAuth2 credential manager
├── ui/                      # Streamlit dashboard
│   ├── Home.py              # Main chat interface
│   └── pages/
│       ├── 1_🧠_Memories.py
│       ├── 2_📊_Analytics.py
│       └── 3_⚙️_Settings.py
├── plan/                    # Planning docs
│   ├── INIT.md
│   ├── TODO.md
│   └── EXECUTION.md
├── .env.example             # API key template
├── config/config.py         # Runtime configuration
├── requirements.txt         # Python dependencies
├── SETUP.md                 # Detailed setup guide
└── README.md
Phase 4 Complete (v1.0.0) ✅
- ✅ Foundation Setup (Config, Memory, Audio)
- ✅ Core Agent (LangGraph with Gemini 2.5 Flash)
- ✅ Tools Integration (Gmail, Calendar, Spotify, Weather, News, etc.)
- ✅ UI & Integration (Voice mode, Text mode, Streamlit dashboard)
- ⏳ Testing & Polish (Pending)
Ready for production use with basic testing!
See plan/EXECUTION.md for architecture details and plan/TODO.md for remaining tasks.
- SETUP.md: Complete setup guide with API credentials and troubleshooting
- plan/INIT.md: Initial architecture and design decisions
- plan/EXECUTION.md: Detailed implementation plan
MIT