🤖 J.A.R.V.I.S. — AI Desktop Assistant

Just A Rather Very Intelligent System. A real-time, voice-first AI desktop assistant with full-duplex Gemini Live conversation, an Iron Man–inspired Web HUD, gesture control, emotion detection, and 30+ integrated system tools, all running on your Windows desktop, with AI processing handled by the Gemini API.

🚀 Active Ongoing Project & Massive Future Potential: JARVIS is continuously evolving with new capabilities and has significant room to grow into advanced desktop automation, custom agent workflows, and deeper spatial-visual perception.

🖥️ See It In Action

The Iron Man–style Web HUD: 3D globe, system vitals, news feed, chat, and Feature Hub — all in real-time.

Full-duplex voice conversation mode: waveform visualization, live tool execution, and real-time responses.

🎥 Want to see it live? Clone the repo, add your Gemini API key, and run start_jarvis.bat — the HUD opens in your browser at http://localhost:8080 with full voice interaction. You can also run python docs/record_demo.py to record a 60-second screen capture demo of the HUD in action.

Overview

JARVIS is a modular AI assistant that goes beyond chat. It combines real-time bidirectional voice (via Google Gemini Live), a multi-layer intelligence pipeline (intent classification → entity extraction → decision engine → emotion routing), and direct system control (apps, volume, brightness, screenshots) into a single cohesive runtime.

The interface is a full Iron Man–style Web HUD, served over HTTP and updated in real time over WebSocket, with system vitals, a 3D globe, waveform visualization, and a tabbed Feature Hub for face recognition, gestures, WhatsApp, and news.

What makes this different

Voice-first, not text-first — Gemini Live provides full-duplex audio streaming. You talk, JARVIS talks back — simultaneously, with echo suppression and interrupt handling.
Tool execution, not just conversation — When you say "open Chrome," JARVIS doesn't just say "I opened Chrome." It calls open_app("Chrome") through a registered tool dispatcher and actually opens it.
Layered intelligence — A fast keyword classifier handles 90% of commands instantly. Ambiguous inputs route through BrainAdapter's ML pipeline. Only truly open-ended queries go to the LLM.
Tactical personality — JARVIS warns you before destructive actions (shutdown, max volume), detects repeated failures, and adapts responses to your emotional state.
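The layered routing described in the list above can be sketched as a three-tier fallback. This is a minimal illustration, not the project's code: `route`, `Route`, the 0.7 confidence threshold, and the two demo stubs are hypothetical stand-ins for the keyword engine and BrainAdapter.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Route:
    tier: str          # "keyword", "ml", or "llm"
    intent: str        # resolved intent name
    confidence: float


def route(text: str,
          keyword_match: Callable[[str], Optional[str]],
          ml_classify: Callable[[str], Tuple[str, float]],
          threshold: float = 0.7) -> Route:
    # Tier 1: cheap pattern match, no model call at all
    intent = keyword_match(text)
    if intent is not None:
        return Route("keyword", intent, 1.0)
    # Tier 2: ML pipeline (hypothetical stand-in for BrainAdapter)
    intent, confidence = ml_classify(text)
    if confidence >= threshold:
        return Route("ml", intent, confidence)
    # Tier 3: open-ended query, hand it to the LLM
    return Route("llm", "chat", confidence)


# Demo stubs (hypothetical, for illustration only)
def demo_keyword(text: str) -> Optional[str]:
    return "open_app" if text.lower().startswith("open ") else None


def demo_ml(text: str) -> Tuple[str, float]:
    return ("weather", 0.9) if "weather" in text.lower() else ("unknown", 0.2)
```

Because each tier only runs when the cheaper one before it declines, common commands never pay for a model call.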

Architecture

┌───────────────────────────────────────────────────────────────┐
│                        TRANSPORT LAYER                        │
│            websocket_server.py (WebSocket gateway)            │
│     Serves Web HUD · Routes all commands · Manages state      │
└──────────┬────────────────────┬────────────────────┬──────────┘
           │                    │                    │
  ┌────────▼────────┐  ┌────────▼────────┐  ┌────────▼────────┐
  │ GEMINI LIVE     │  │ BRAIN           │  │ KEYWORD ENGINE  │
  │ ENGINE          │  │ ADAPTER         │  │ (dispatcher.py) │
  │                 │  │                 │  │                 │
  │ Full-duplex     │  │ IntentModel     │  │ Pattern-match   │
  │ audio streaming │  │ EntityExtr.     │  │ 30+ intents     │
  │ 14 native tools │  │ DecisionEng.    │  │ Cache + memory  │
  │ Echo gate       │  │ EmotionRoute    │  │                 │
  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘
           │                    │                    │
           └────────────────────┴────────────────────┘
                                │
              ┌─────────────────▼──────────────────┐
              │          EXECUTION LAYER           │
              │                                    │
              │  system_control.py    apps         │
              │  voice_engine.py      TTS          │
              │  workflow_manager.py  automation   │
              │  news / weather / email / notes    │
              │  WhatsApp / YouTube / calendar     │
              └────────────────────────────────────┘
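The "registered tool dispatcher" that feeds the execution layer can be pictured as a map from tool names to coroutines. The sketch below assumes tools are async functions keyed by the name the model emits; `ToolRegistry`, `dispatch`, and the placeholder `open_app` body are hypothetical and do not reflect the real `tool_registry.py` API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

ToolFn = Callable[..., Awaitable[Any]]


class ToolRegistry:
    """Maps tool names (as a model would emit them) to async callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str) -> Callable[[ToolFn], ToolFn]:
        # Decorator that records the coroutine function under `name`
        def decorator(fn: ToolFn) -> ToolFn:
            self._tools[name] = fn
            return fn
        return decorator

    async def dispatch(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        fn = self._tools.get(name)
        if fn is None:
            return {"ok": False, "error": f"unknown tool: {name}"}
        try:
            return {"ok": True, "result": await fn(**kwargs)}
        except Exception as exc:  # report failures instead of crashing the session
            return {"ok": False, "error": str(exc)}


registry = ToolRegistry()


@registry.register("open_app")
async def open_app(app_name: str) -> str:
    # Placeholder: a real tool would launch the process here
    return f"launched {app_name}"
```

Returning a result dict instead of raising lets a failed tool call be reported back to the conversation rather than ending it.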

Key Modules

Module	Role
`websocket_server.py`	Central gateway. Serves the Web HUD, manages WebSocket connections, routes every command through the keyword engine or BrainAdapter, and manages the Gemini Live lifecycle.
`gemini_live_engine.py`	Full-duplex audio via Gemini 2.0 Flash. Handles mic capture, speaker playback, echo suppression, tool calls, and turn management — all async.
`brain_adapter.py`	ML pipeline bridge. Routes text through IntentModel → EntityExtractor → DecisionEngine → EmotionRouter for nuanced understanding. Falls back gracefully if any module is unavailable.
`state_controller.py`	UI state machine (`UIStateController`). Tracks state transitions (idle → listening → processing → speaking), trust scoring, emotion vectors, and deduplication.
`startup_orchestrator.py`	Boot sequence. Generates time-aware greetings, loads session history, reports system status, and builds context for the first Gemini Live turn.
`intent_classifier.py`	30+ intent classifier with confidence scoring. Maps natural language to structured actions.
`decision_engine.py`	Safety layer. Evaluates commands before execution — warns on destructive actions, blocks dangerous operations, enforces tactical personality.
`voice_engine.py`	Edge TTS backend with automatic cache cleanup. Provides `speak()` for non-live-mode responses.
`perception.py`	HUDPerception layer. Manages assistant identity (JARVIS/FRIDAY), speech deduplication, live-mode gating, and persona switching.
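A safety layer in the spirit of `decision_engine.py` can be reduced to a pre-execution check that returns allow, warn, or block, plus a counter for repeated failures. The policy sets, the volume threshold of 100, and the `failure_limit` of 3 below are assumptions for illustration, not the project's actual rules.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Verdict:
    action: str        # "allow", "warn", or "block"
    message: str = ""


# Hypothetical policy tables
BLOCKED_INTENTS = {"format_disk"}
DESTRUCTIVE_INTENTS = {"shutdown", "restart"}


@dataclass
class DecisionEngine:
    failure_limit: int = 3
    failures: Counter = field(default_factory=Counter)

    def evaluate(self, intent: str, params: Optional[Dict[str, Any]] = None) -> Verdict:
        params = params or {}
        if intent in BLOCKED_INTENTS:
            return Verdict("block", f"{intent} is not permitted.")
        if intent in DESTRUCTIVE_INTENTS:
            return Verdict("warn", f"Confirm {intent}? Unsaved work may be lost.")
        if intent == "set_volume" and int(params.get("level", 0)) >= 100:
            return Verdict("warn", "Maximum volume requested. Proceed?")
        if self.failures[intent] >= self.failure_limit:
            return Verdict("warn", f"{intent} has failed {self.failures[intent]} times in a row.")
        return Verdict("allow")

    def record_result(self, intent: str, success: bool) -> None:
        # Only consecutive failures count: a success clears the streak
        if success:
            self.failures.pop(intent, None)
        else:
            self.failures[intent] += 1
```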

What Works Now

✅ Core — Fully Functional

Feature	Status	Details
Gemini Live Voice	✅ Working	Full-duplex audio, echo gate, interrupt handling, tool dispatch
BrainAdapter Text Routing	✅ Working	ML pipeline with intent → entity → decision → emotion
Keyword Engine	✅ Working	30+ intents, cache, memory, pattern matching
System Control	✅ Working	Open/close apps, volume, brightness, screenshots, lock/shutdown
Web HUD	✅ Working	Real-time dashboard with globe, vitals, chat, waveform
Feature Hub Tabs	✅ Working	Face Recognition, WhatsApp, Hand Gestures, News — tabbed UI
Tactical Personality	✅ Working	Safety warnings, failure detection, emotional adaptation
Switch to FRIDAY	✅ Working	Voice command to swap persona (JARVIS ↔ FRIDAY)
News	✅ Working	Category-filtered headlines via News API
Weather	✅ Working	Live weather via OpenWeatherMap (requires API key)
Reminders & Alarms	✅ Working	Natural language time parsing, background checker
Chat History	✅ Working	SQLite-backed with FTS5 search, thread-safe
Smart Notes	✅ Working	Create, search, list notes
Hotkeys	✅ Working	Ctrl+Alt+J (wake), Ctrl+Alt+S (shutdown)
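Natural-language reminder parsing such as "Remind me to call Mom in 30 minutes" can start with a relative-time regex. This hypothetical `parse_reminder` handles only the "in N seconds/minutes/hours/days" form; absolute times like "7 AM" would need a separate rule or a date-parsing library.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Singular unit name -> timedelta keyword argument
_UNITS = {"second": "seconds", "minute": "minutes", "hour": "hours", "day": "days"}

_RELATIVE = re.compile(
    r"remind me to (?P<task>.+?) in (?P<n>\d+) (?P<unit>second|minute|hour|day)s?\b",
    re.IGNORECASE,
)


def parse_reminder(text: str, now: Optional[datetime] = None) -> Optional[Tuple[str, datetime]]:
    """Return (task, due_time) for a relative reminder, or None if it doesn't match."""
    now = now or datetime.now()
    m = _RELATIVE.search(text)
    if not m:
        return None
    delta = timedelta(**{_UNITS[m["unit"].lower()]: int(m["n"])})
    return m["task"].strip(), now + delta
```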

⚙️ Optional — Dependency-Based

Feature	Requires	Details
Face Recognition	Webcam + OpenCV	Enrolls and recognizes users
Hand Gestures	Webcam + MediaPipe	Thumbs up/down, wave, swipe
Emotion Detection	Webcam + TensorFlow	Facial emotion → response adaptation
WhatsApp	`pywhatkit`	Send messages to contacts
Email	Gmail SMTP credentials	Send/read emails
YouTube	`yt-dlp`	Search and play videos
Calendar	Google Calendar API	Event listing and reminders
Spotify	`spotipy` + Spotify API	Music playback

Quick Start

Prerequisites

Python 3.10+
Windows 10/11 (system control features are Windows-native)
Gemini API Key from Google AI Studio
Microphone + Speakers (for Gemini Live voice)
Webcam (optional — for gesture, face recognition, emotion)

Installation

# Clone the repository
git clone https://github.com/Raghava001-web/Jarvis.git
cd Jarvis

# Create virtual environment
python -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
copy .env.example .env
# Edit .env and add your GEMINI_API_KEY

Configuration

Create a .env file in the project root, or edit the .env you copied from .env.example in the previous step:

# Required
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_LIVE_ENABLED=true

# Optional
SMTP_EMAIL=your_email@gmail.com
SMTP_PASSWORD=your_app_password
OPENWEATHER_API_KEY=your_weather_key
NEWS_API_KEY=your_news_key
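The project may well load `.env` through a library such as python-dotenv; as a dependency-free sketch of the same idea, the hypothetical `parse_env`, `load_env`, and `missing_required` below read `KEY=VALUE` lines, skip blanks and `#` comments, and report required keys that are absent or empty. Inline comments after a value are not handled.

```python
from typing import Dict, Iterable, List

REQUIRED_KEYS = ("GEMINI_API_KEY",)


def parse_env(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: ignores blank lines and # comments, strips quotes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env(path: str = ".env") -> Dict[str, str]:
    try:
        with open(path, encoding="utf-8") as fh:
            return parse_env(fh.read())
    except FileNotFoundError:
        return {}


def missing_required(values: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    # A key set to an empty string counts as missing
    return [key for key in required if not values.get(key)]
```

Checking required keys before starting the servers gives a clear error up front instead of a failed Gemini Live connection later.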

Run JARVIS

# Option 1: Batch launcher
start_jarvis.bat

# Option 2: Direct
python jarvis/gui/websocket_server.py

JARVIS will:

Start the WebSocket server on ws://localhost:8765
Serve the Web HUD on http://localhost:8080
Connect to Gemini Live for voice interaction
Initialize gesture/face/emotion (if webcam available)

Access the Web HUD

Open your browser to http://localhost:8080 to see the Iron Man–style dashboard.

Testing

Smoke Test Suite

A 53-test smoke suite validates all critical paths without requiring hardware or API keys:

python -m pytest tests/test_smoke.py -v

Current status: 90/90 tests passing across all suites (53 smoke tests + 37 NLP and router unit tests)

The suite covers:

Application boot and startup orchestrator
Gemini Live deduplication and echo gating
BrainAdapter pipeline routing
Intent classification (30+ intents)
News command end-to-end flow
JARVIS ↔ FRIDAY persona switching
Tactical personality (safety warnings, failure detection)
Handler map completeness
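The handler map completeness check can be expressed as plain pytest-style test functions. `INTENTS` and `HANDLERS` here are hypothetical stand-ins; a real test would import the classifier's intent list and the map from `intent_handlers.py`.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for the classifier's intent list and the handler map
INTENTS: List[str] = ["open_app", "close_app", "set_volume", "get_news", "switch_persona"]

HANDLERS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "open_app": lambda e: f"Opening {e['app']}",
    "close_app": lambda e: f"Closing {e['app']}",
    "set_volume": lambda e: f"Volume set to {e['level']}",
    "get_news": lambda e: "Fetching headlines",
    "switch_persona": lambda e: f"Switching to {e['persona']}",
}


def test_every_intent_has_handler() -> None:
    missing = [intent for intent in INTENTS if intent not in HANDLERS]
    assert not missing, f"intents without handlers: {missing}"


def test_no_orphan_handlers() -> None:
    orphans = sorted(set(HANDLERS) - set(INTENTS))
    assert not orphans, f"handlers with no matching intent: {orphans}"


def test_handlers_are_callable() -> None:
    assert all(callable(handler) for handler in HANDLERS.values())
```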

Additional Test Suites

# Intent model unit tests
python -m pytest tests/test_intent_model.py -v

# Entity extractor tests
python -m pytest tests/test_entity_extractor.py -v

# Intent router tests
python -m pytest tests/test_intent_router.py -v

# Run everything
python -m pytest tests/ -v

Voice Commands

"Open YouTube"                              → launches YouTube
"Close Chrome"                              → closes Chrome
"Set volume to 50"                          → adjusts system volume
"Take a screenshot"                         → captures screen
"What's the weather like?"                  → weather report
"Tell me the news"                          → headlines summary
"Set alarm for 7 AM"                        → alarm
"Remind me to call Mom in 30 minutes"       → reminder
"Send WhatsApp to Dad saying I'll be late"  → WhatsApp message
"Switch to Friday"                          → persona swap
"What time is it?"                          → time
"Search for latest AI research"             → web search
"Tell me a joke"                            → entertainment
"Shutdown JARVIS"                           → graceful shutdown

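A keyword engine can resolve several of these commands with anchored regular expressions whose named groups double as extracted entities. The pattern table and `match_command` are an illustrative assumption covering only a few of the commands listed, not the contents of `dispatcher.py`.

```python
import re
from typing import Dict, List, Optional, Pattern, Tuple

# Hypothetical (pattern, intent) table; first match wins, so order matters
PATTERNS: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"^set volume to (?P<level>\d{1,3})$", re.I), "set_volume"),
    (re.compile(r"^open (?P<app>.+)$", re.I), "open_app"),
    (re.compile(r"^close (?P<app>.+)$", re.I), "close_app"),
    (re.compile(r"^send whatsapp to (?P<contact>\w+) saying (?P<message>.+)$", re.I), "send_whatsapp"),
    (re.compile(r"^switch to (?P<persona>jarvis|friday)$", re.I), "switch_persona"),
]


def match_command(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent, entities) for the first matching pattern, else None."""
    text = utterance.strip().rstrip(".?!")
    for pattern, intent in PATTERNS:
        m = pattern.match(text)
        if m:
            return intent, m.groupdict()
    return None
```

Entities come back as strings, so a handler would still convert `level` to an integer before calling the volume API.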
Project Structure

Jarvis/
├── jarvis/
│   ├── core/                        # 68 modules — brain, voice, tools, handlers
│   │   ├── gemini_live_engine.py       # Gemini Live full-duplex audio (1500+ lines)
│   │   ├── brain_adapter.py            # ML pipeline bridge
│   │   ├── intent_classifier.py        # 30+ intent classifier
│   │   ├── intent_handlers.py          # Handler implementations
│   │   ├── decision_engine.py          # Safety/tactical layer
│   │   ├── voice_engine.py             # Edge TTS backend
│   │   ├── perception.py               # HUDPerception + persona management
│   │   ├── startup_orchestrator.py     # Boot sequence
│   │   ├── system_control.py           # OS-level commands
│   │   ├── reminder_manager.py         # Thread-safe SQLite reminders
│   │   ├── chat_history.py             # Thread-safe chat storage
│   │   ├── context_memory.py           # Conversation memory
│   │   ├── emotion_router.py           # Text → mood detection
│   │   ├── state_manager.py            # Core state machine
│   │   └── ...                         # weather, news, email, WhatsApp, etc.
│   │
│   ├── gui/                         # Interface layer
│   │   ├── websocket_server.py         # Central gateway (WebSocket + HTTP)
│   │   ├── state_controller.py         # UI state controller
│   │   ├── mood_engine.py              # Emotion state machine
│   │   ├── command_processor.py        # Command routing
│   │   ├── desktop_gui.py              # Pygame desktop window
│   │   ├── advanced_hud.py             # Pygame HUD renderer
│   │   └── web_hud/
│   │       └── index.html              # Iron Man Web HUD (single-file app)
│   │
│   ├── tools/                       # Tool architecture
│   │   ├── dispatcher.py               # Intent → tool routing + caching
│   │   ├── tool_registry.py            # Async tool execution
│   │   └── web_tools.py                # Web search, news, URL fetch
│   │
│   └── data/                        # Runtime databases (SQLite)
│
├── tests/                           # Test suites
│   ├── test_smoke.py                   # 53 smoke tests
│   ├── test_intent_model.py            # Intent model tests
│   ├── test_entity_extractor.py        # Entity extraction tests
│   └── test_intent_router.py           # Router tests
│
├── jarvis_data/                     # Session data (gitignored)
├── requirements.txt                 # Python dependencies
├── .env.example                     # Environment template
├── start_jarvis.bat                 # One-click launcher
└── README.md                        # This file

Tech Stack

Layer	Technology
AI Engine	Google Gemini 2.0 Flash (Live + Text)
Voice	Gemini Live (full-duplex), Edge TTS
Vision	MediaPipe, OpenCV, TensorFlow
Desktop GUI	Pygame
Web HUD	Vanilla HTML/CSS/JS + WebSocket
Transport	WebSocket (real-time), HTTP (HUD serving)
System Control	pyautogui, pycaw, psutil
NLP	Custom intent classifier + sentence-transformers
Storage	SQLite (thread-safe, persistent)

API Keys

Service	Required	Get Key
Gemini API	✅ Required	Google AI Studio
OpenWeather	Optional	openweathermap.org
News API	Optional	newsapi.org
Gmail SMTP	Optional	Google App Passwords

Project Status & Ongoing Potential

🚀 Active & Ongoing Development — Massive Potential

JARVIS is an ongoing project with immense potential for expansion. While it is currently highly stable and fully prepared for demos, publishing, and daily use, we are continuously pushing updates to expand its capabilities.

Our current focus:

Core Stability: The runtime is stabilized with thread-safe database operations, audio clash prevention, and resource leak patches.
Robust Verification: 53/53 smoke tests pass consistently to prevent regressions.
Multimodal Focus: Gemini Live voice is the primary, full-duplex interaction mode.
Modular & Extensible: All core features are functional out of the box. Optional features (webcam, email, calendar) degrade gracefully when dependencies are missing.
Active Roadmap: Future releases aim to add offline-first intent routing, tighter OS-level automation loops, and enhanced multimodal vision perception.

Known Limitations

Windows only — System control (volume, brightness, app management) uses Windows-native APIs
Gemini API key required — No offline fallback for AI features
Single user — Designed as a personal desktop assistant, not multi-tenant
Webcam features are optional — Face recognition, gestures, and emotion detection require a webcam and their respective ML dependencies
WebSocket server is monolithic — websocket_server.py is large (~4100 lines); partial extraction into command_processor.py, state_controller.py, and ws_channels.py has begun

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License — see the LICENSE file for details.

Acknowledgments

Google Gemini — For the Gemini Live API and Flash model
MediaPipe — For real-time hand and face tracking
The Iron Man franchise — For the JARVIS inspiration

"Good evening, sir. All systems are online and ready."
— J.A.R.V.I.S.
