Control your real Chrome browser from the command line — with all your logins, cookies, and extensions intact.
An open-source alternative to cloud browser agents. Instead of spinning up a headless browser, obo lets AI agents (or you) drive the browser you already use.
AI Agent ──→ obo CLI ──→ HTTP ──→ obo server ←── WebSocket ──→ Extension ──→ CDP ──→ Chrome
                                  (Fastify)                    (Manifest V3)         (your tabs)
The Chrome extension connects to a local server via WebSocket. The server exposes an HTTP API and a CLI. AI agents call the CLI; the server relays commands to the extension, which uses Chrome DevTools Protocol to interact with your tabs.
Key insight: because it controls your real browser, every site you're already logged into just works — no credential management, no cookie juggling.
Load unpacked from packages/extension/dist, or install from the Chrome Web Store (coming soon).
You can run OBO without installing anything globally:
npx @agenthand/obo
Or install the CLI globally:
npm install -g @agenthand/obo
npx @agenthand/obo
# Server listening on http://127.0.0.1:3333
The extension connects automatically.
CLI commands require a running server. obo status checks connectivity only and does not auto-start the server.
Most users never need to change connection settings. If your browser environment cannot reach the default local server, open the extension popup, expand Advanced, and set host + port.
The extension retries every 3 seconds while enabled, and you can Pause/Resume or trigger Reconnect manually from the popup.
See Troubleshooting for alternate ports, isolated browser environments, and localhost/network issues.
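The retry behaviour described above can be pictured as a small state machine. The following is a minimal sketch, not the extension's actual code: the `Reconnector` class, its method names, and the injected `schedule` function are all hypothetical, and only the 3-second interval and the Pause/Resume/Reconnect actions come from the text.

```typescript
// Hypothetical sketch; names are illustrative, not the extension's real code.
const RETRY_MS = 3000; // "retries every 3 seconds", per the text above

type Scheduler = (fn: () => void, ms: number) => void;

class Reconnector {
  private enabled = true;
  private connected = false;
  private retryQueued = false; // true while a retry is already scheduled
  private readonly tryConnect: () => boolean;
  private readonly schedule: Scheduler;

  // tryConnect returns true once the socket is up; schedule would be
  // a setTimeout wrapper in a real extension (injected here for testing).
  constructor(tryConnect: () => boolean, schedule: Scheduler) {
    this.tryConnect = tryConnect;
    this.schedule = schedule;
  }

  // Attempt a connection now; queue exactly one retry if it fails.
  attempt(): void {
    if (!this.enabled || this.connected) return;
    this.connected = this.tryConnect();
    if (!this.connected && !this.retryQueued) {
      this.retryQueued = true;
      this.schedule(() => {
        this.retryQueued = false;
        this.attempt();
      }, RETRY_MS);
    }
  }

  pause(): void {
    this.enabled = false; // queued retries become no-ops while paused
  }

  resume(): void {
    this.enabled = true;
    this.attempt();
  }

  // Manual "Reconnect": drop the current state and try immediately.
  reconnect(): void {
    this.connected = false;
    this.attempt();
  }

  isConnected(): boolean {
    return this.connected;
  }
}
```

Injecting the scheduler keeps the sketch deterministic: a test can collect the queued callbacks and fire them by hand instead of waiting on real timers.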
obo tabs # list open tabs
obo doctor # diagnose connection issues
obo new "https://example.com" --group "Research"
obo snapshot <tabId> -i # see interactive elements
obo click <tabId> @e1 # click an element
obo type <tabId> @e3 "hello" # type into a field
Install the skill:
npx skills add agenthand/obo
Once installed, Claude Code will automatically use obo when tasks involve browsing: "check my email", "fill out this form", "open twitter", etc.
Add the obo CLI commands to your agent's system prompt. The CLI outputs structured text that's easy for LLMs to parse:
- obo snapshot returns an accessibility tree (plain text)
- obo extract returns normalized page content (JSON or Markdown)
- obo screenshot saves a PNG to a temp path and prints the path
- All other commands return JSON
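An agent wrapper might route CLI output by command, following the list above. This is a minimal sketch under stated assumptions: `parseOboOutput` and the `ParsedOutput` type are hypothetical names, the `"json"` default for `extract` is a guess (the source does not state the default format), and the `--base64` screenshot mode is not handled.

```typescript
// Hypothetical helper; not part of obo itself.
type ParsedOutput =
  | { kind: "text"; text: string }      // snapshot: plain-text accessibility tree
  | { kind: "markdown"; text: string }  // extract with Markdown output
  | { kind: "path"; path: string }      // screenshot: path to the saved PNG
  | { kind: "json"; value: unknown };   // every other command

function parseOboOutput(
  command: string,
  stdout: string,
  extractFormat: "json" | "md" = "json", // assumed default, not confirmed
): ParsedOutput {
  switch (command) {
    case "snapshot":
      return { kind: "text", text: stdout };
    case "screenshot":
      // The CLI prints the file path; strip the trailing newline.
      return { kind: "path", path: stdout.trim() };
    case "extract":
      if (extractFormat === "md") return { kind: "markdown", text: stdout };
      return { kind: "json", value: JSON.parse(stdout) };
    default:
      return { kind: "json", value: JSON.parse(stdout) };
  }
}
```

Returning a tagged union lets the caller switch on `kind` instead of guessing whether a string is JSON.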
You do not manually upload a package to skills.sh.
- Put your skill in a public GitHub repo (for example: skills/my-skill/SKILL.md)
- Install directly from GitHub:
npx skills add <owner>/<repo>
# install one specific skill in a multi-skill repo:
npx skills add <owner>/<repo> --skill <skill-name>
# or install a direct skill folder URL:
npx skills add https://github.com/<owner>/<repo>/tree/main/<path-to-skill>
- Share that same install command with others
skills.sh can discover community skills from what people install with npx skills add, so distribution is repo-first.
Usage: obo [command] [options]
Server:
obo Start the server (default)
obo server [--port N] [--host H] Start with options
--token <token> Enable Bearer token auth (server mode)
--verbose Enable verbose server logs
Browser:
obo status Show connection status and active sessions
obo doctor Diagnose server and extension connection
obo tabs List all open tabs
Tab Management:
obo new [url] [--group name] Open a new grouped tab (default: blank)
obo close <tabId> Close a tab
obo attach <tabId> [--group name] Activate and group a tab
obo open <tabId> <url> Navigate a tab to a URL
obo navigate <tabId> <url> Alias for open
Interaction:
obo snapshot <tabId> [-i] Get accessibility tree (-i = interactive only)
obo extract <tabId> [--format] Extract page content (json|md)
obo screenshot <tabId> [-o file] [--base64] Save screenshot (temp file by default)
obo click <tabId> <ref|x> [y] Click an element (@e1) or coordinates (100 200)
obo type <tabId> <ref> "text" Type text into an element [--submit]
obo scroll <tabId> Scroll [--dy N] [--dx N] [--ref @e1]
obo upload <tabId> [ref] <file...> Upload local file(s) to an input
obo wait <tabId> --load|--idle Wait for page load/network idle
obo eval <tabId> "expression" Evaluate JavaScript in the page
obo eval <tabId> -f <file> Evaluate JavaScript from file
obo eval <tabId> --stdin Evaluate JavaScript from stdin
obo eval <tabId> -b <base64> Evaluate base64-encoded JavaScript
Global Options:
--url <url> Server URL (default: http://127.0.0.1:3333)
Environment Variables:
OBO_URL Server URL override
OBO_TOKEN Bearer token
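A client that talks to the server directly would need to resolve the same settings as the CLI. The sketch below is an assumption-heavy illustration: `resolveClientConfig` is a hypothetical helper, and the precedence shown (`--url` flag, then `OBO_URL`, then the default) is a common convention rather than something the usage text confirms. The `Authorization: Bearer` header is the standard form for Bearer token auth.

```typescript
// Hypothetical helper; precedence order is an assumption, not documented behaviour.
const DEFAULT_URL = "http://127.0.0.1:3333"; // default from the usage text

interface ClientConfig {
  url: string;
  headers: Record<string, string>;
}

type Env = Record<string, string | undefined>;

function resolveClientConfig(flags: { url?: string }, env: Env): ClientConfig {
  // Assumed precedence: explicit flag, then environment, then default.
  const rawUrl = flags.url || env["OBO_URL"] || DEFAULT_URL;
  const headers: Record<string, string> = {};
  const token = env["OBO_TOKEN"];
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  // Drop trailing slashes so callers can append "/tabs", "/click", etc.
  return { url: rawUrl.replace(/\/+$/, ""), headers };
}
```

Passing the environment in as a plain object, rather than reading a global, keeps the function easy to test and usable outside Node.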
When you run obo snapshot <tabId> -i, interactive elements are labeled with refs like @e1, @e2, etc. Use these refs with click, type, and scroll:
$ obo snapshot 123 -i
document "Example Page"
heading "Welcome"
button "Sign In" @e0
textbox "Email" @e1
textbox "Password" @e2
button "Submit" @e3
$ obo click 123 @e1
$ obo type 123 @e1 "user@example.com"
$ obo click 123 @e3
Refs are invalidated when the page changes, so always re-snapshot after navigation or interaction.
Snapshots include snapshotId for traceability across agent steps.
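An agent can pull refs out of the snapshot text before deciding what to click. The following is a rough sketch: `parseRefs` and `findRef` are hypothetical helpers, and the line shape they expect (`role "name" @eN`) is inferred only from the sample output above, so real snapshots may contain lines this pattern does not cover.

```typescript
// Hypothetical parser; the line format is inferred from the sample snapshot only.
interface RefNode {
  role: string; // e.g. "button", "textbox"
  name: string; // accessible name inside the quotes
  ref: string;  // e.g. "@e1"
}

function parseRefs(snapshot: string): RefNode[] {
  // Matches lines such as: textbox "Email" @e1
  const linePattern = /^\s*(\S+)\s+"(.*)"\s+(@e\d+)\s*$/;
  const nodes: RefNode[] = [];
  for (const raw of snapshot.split("\n")) {
    const match = linePattern.exec(raw);
    if (!match) continue; // lines without a ref (headings, document) are skipped
    const [, role = "", name = "", ref = ""] = match;
    nodes.push({ role, name, ref });
  }
  return nodes;
}

// Look up a ref by role and accessible name; undefined if not present.
function findRef(nodes: RefNode[], role: string, name: string): string | undefined {
  return nodes.find((n) => n.role === role && n.name === name)?.ref;
}
```

Because refs are invalidated when the page changes, the parsed list should be treated as valid only for the snapshot it came from.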
Most validation and connectivity failures return:
{ "error": "message", "code": "ERROR_CODE" }
See docs/ERROR_CODES.md for code meanings and recommended recovery actions.
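Callers can detect this shape before trusting a response. A minimal sketch, assuming the two-field error object shown above is the only error form; `isOboError`, `parseResult`, and the `Result` type are hypothetical names, and actual code values should be taken from docs/ERROR_CODES.md.

```typescript
// Hypothetical helpers; only the { error, code } shape comes from the text above.
interface OboError {
  error: string; // human-readable message
  code: string;  // machine-readable code, see docs/ERROR_CODES.md
}

type Result<T> = { ok: true; value: T } | { ok: false; error: OboError };

// Type guard: true only for objects with string "error" and "code" fields.
function isOboError(value: unknown): value is OboError {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["error"] === "string" && typeof record["code"] === "string";
}

// Parse CLI/HTTP JSON output into a success or error result.
function parseResult<T>(raw: string): Result<T> {
  const parsed: unknown = JSON.parse(raw);
  if (isOboError(parsed)) return { ok: false, error: parsed };
  // The cast trusts the caller's expected type; no runtime validation is done.
  return { ok: true, value: parsed as T };
}
```

Returning a result object instead of throwing lets an agent loop branch on `code` to pick a recovery action.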
The extension intentionally keeps its permission surface small:
- debugger: drive the current browser via Chrome DevTools Protocol
- tabs: list and manage tabs
- tabGroups: group controlled tabs
- storage: save local connection settings
OBO does not request broad host permissions in the extension manifest.
Security and publishing notes:
open-browser-operator/
├── packages/
│ ├── extension/ Chrome Manifest V3 extension
│ ├── server/ Fastify HTTP/WS server + CLI
│ └── shared/ TypeScript protocol types
└── pnpm-workspace.yaml
Chrome extension (Manifest V3) that connects to the local server via WebSocket. Uses the Chrome Debugger API (CDP 1.3) to:
- Capture accessibility trees and screenshots
- Click, type, scroll, navigate
- Manage tab lifecycle
Controlled tabs are grouped under an "OBO" tab group by default. Agents can pass a semantic group title, for example obo new "https://example.com" --group "Market Research", which creates or reuses a group named OBO: Market Research.
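The naming and create-or-reuse rule can be sketched in a few lines. This is an illustration only: `groupTitle` and `GroupRegistry` are hypothetical, the in-memory map stands in for whatever lookup the extension performs through the browser's tab group API, and treating a blank name as the default group is an assumption.

```typescript
// Hypothetical sketch of the "create or reuse" rule; not the extension's code.
const BASE_TITLE = "OBO";

// "Market Research" -> "OBO: Market Research"; no name -> "OBO".
function groupTitle(name?: string): string {
  const trimmed = (name ?? "").trim();
  return trimmed === "" ? BASE_TITLE : `${BASE_TITLE}: ${trimmed}`;
}

class GroupRegistry {
  private readonly idsByTitle = new Map<string, number>();
  private nextId = 1;

  // Return the existing group for this title, or create a new one.
  resolve(name?: string): { id: number; title: string; created: boolean } {
    const title = groupTitle(name);
    const existing = this.idsByTitle.get(title);
    if (existing !== undefined) {
      return { id: existing, title, created: false };
    }
    const id = this.nextId++;
    this.idsByTitle.set(title, id);
    return { id, title, created: true };
  }
}
```

Keying on the full title means two agents that pass the same `--group` value end up sharing one group.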
Fastify server exposing:
- REST API: 12 endpoints for browser control (/tabs, /snapshot, /click, etc.)
- WebSocket: single persistent connection to the extension
- CLI: obo binary with subcommands that call the REST API
Request timeout: 30s. Optional Bearer token auth.
TypeScript types shared between server and extension:
- SessionInfo, TabInfo, SnapshotResult, SnapshotNode
- WebSocket protocol message types (command requests/responses, session updates)
1. CLI/Agent calls obo click 123 @e1
2. CLI sends POST /click { tabId: 123, ref: "@e1" } to the server
3. Server wraps it in a command:request WebSocket message with a unique ID
4. Extension receives the command, resolves @e1 to coordinates via CDP
5. Extension dispatches a mouse click via Input.dispatchMouseEvent
6. Extension sends command:response back over WebSocket
7. Server returns the HTTP response to the CLI
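The server-side part of this flow, wrapping a command with a unique ID and matching the reply to it, can be sketched as follows. Only the message type names `command:request` and `command:response` come from the text; the `CommandRelay` class and the other fields (`id`, `action`, `params`, `ok`, `result`) are assumptions for illustration, and the 30s request timeout is left out to keep the sketch short.

```typescript
// Hypothetical message shapes; only the "type" values appear in the source.
interface CommandRequest {
  type: "command:request";
  id: string;
  action: string;
  params: Record<string, unknown>;
}

interface CommandResponse {
  type: "command:response";
  id: string;
  ok: boolean;
  result?: unknown;
}

type Send = (msg: CommandRequest) => void;
type OnResponse = (res: CommandResponse) => void;

class CommandRelay {
  private readonly pending = new Map<string, OnResponse>();
  private counter = 0;
  private readonly send: Send;

  // send would write to the extension's WebSocket in a real server.
  constructor(send: Send) {
    this.send = send;
  }

  // Wrap the command with a unique ID and remember who is waiting for it.
  dispatch(action: string, params: Record<string, unknown>, onResponse: OnResponse): string {
    const id = `cmd-${++this.counter}`;
    this.pending.set(id, onResponse);
    this.send({ type: "command:request", id, action, params });
    return id;
  }

  // Route a response to its waiter; returns false for unknown or repeated IDs.
  handleResponse(res: CommandResponse): boolean {
    const waiter = this.pending.get(res.id);
    if (!waiter) return false;
    this.pending.delete(res.id);
    waiter(res);
    return true;
  }
}
```

The unique ID is what lets a single persistent WebSocket carry several in-flight commands without mixing up their replies.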
git clone https://github.com/agenthand/obo.git open-browser-operator
cd open-browser-operator
pnpm install
pnpm build
cd packages/extension
pnpm dev # Vite dev server with CRXJS hot reload
Load packages/extension/dist as an unpacked extension in Chrome.
cd packages/server
pnpm dev # tsx watch mode
Changes to packages/shared require rebuilding downstream packages.
| | obo | Playwright / Puppeteer |
|---|---|---|
| Browser | Your real Chrome | Headless / clean instance |
| Login state | Already logged in | Must authenticate |
| Cookies | Your real cookies | Empty |
| Extensions | Your extensions | None |
| Bookmarks | Your bookmarks | None |
| Use case | Personal automation, AI assistant | Testing, scraping |
| Setup | Install extension + start server | npm install |
Use obo when the task needs your browser context. Use headless browsers when you need a clean, reproducible environment.