Open Browser Operator (obo)

Control your real Chrome browser from the command line — with all your logins, cookies, and extensions intact.

An open-source alternative to cloud browser agents. Instead of spinning up a headless browser, obo lets AI agents (or you) drive the browser you already use.

How It Works

AI Agent ──→ obo CLI ──→ HTTP ──→ obo server ←── WebSocket ──→ Extension ──→ CDP ──→ Chrome
                                  (Fastify)                    (Manifest V3)        (your tabs)

The Chrome extension connects to a local server via WebSocket. The server exposes an HTTP API and a CLI. AI agents call the CLI; the server relays commands to the extension, which uses Chrome DevTools Protocol to interact with your tabs.

Key insight: because it controls your real browser, every site you're already logged into just works — no credential management, no cookie juggling.

Quick Start

1. Install the Chrome Extension

Load unpacked from packages/extension/dist, or install from the Chrome Web Store (coming soon).

2. Install the Server + CLI

You can run OBO without installing anything globally:

npx @agenthand/obo

Or install the CLI globally:

npm install -g @agenthand/obo

3. Start the Server

npx @agenthand/obo
# Server listening on http://127.0.0.1:3333

The extension connects automatically. CLI commands require a running server. obo status checks connectivity only and does not auto-start the server.

Most users never need to change connection settings. If your browser environment cannot reach the default local server, open the extension popup, expand Advanced, and set the host and port. While enabled, the extension retries the connection every 3 seconds. You can also Pause/Resume or trigger Reconnect manually from the popup.

See Troubleshooting for alternate ports, isolated browser environments, and localhost/network issues.

4. Try It

obo tabs                                  # list open tabs
obo doctor                                # diagnose connection issues
obo new "https://example.com" --group "Research"
obo snapshot <tabId> -i                   # see interactive elements
obo click <tabId> @e1                     # click an element
obo type <tabId> @e3 "hello"              # type into a field

Use with AI Agents

Claude Code

Install the skill:

npx skills add agenthand/obo

Once installed, Claude Code will automatically use obo when tasks involve browsing — "check my email", "fill out this form", "open twitter", etc.

Any AI Agent

Add the obo CLI commands to your agent's system prompt. The CLI outputs structured text that's easy for LLMs to parse:

  • obo snapshot returns an accessibility tree (plain text)
  • obo extract returns normalized page content (JSON or Markdown)
  • obo screenshot saves a PNG to a temp path and prints the path
  • All other commands return JSON
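
An agent wrapper can branch on the command name to decide how to read the CLI's stdout. The following is a minimal sketch of that idea; `OboOutput` and `interpretOutput` are hypothetical names, and the assumptions that `extract` defaults to JSON and that `screenshot` prints only the path are not confirmed by this README.

```typescript
// Hypothetical helper: the type and function names are illustrative, not part of obo.
type OboOutput =
  | { kind: "text"; text: string } // snapshot: accessibility tree as plain text
  | { kind: "markdown"; text: string } // extract with the md format
  | { kind: "path"; path: string } // screenshot: path of the saved PNG
  | { kind: "json"; value: unknown }; // all other commands

function interpretOutput(
  command: string,
  stdout: string,
  format: "json" | "md" = "json", // assumption: JSON is the default extract format
): OboOutput {
  switch (command) {
    case "snapshot":
      return { kind: "text", text: stdout };
    case "screenshot":
      // Assumption: the printed path is the only content on stdout.
      return { kind: "path", path: stdout.trim() };
    case "extract":
      return format === "md"
        ? { kind: "markdown", text: stdout }
        : { kind: "json", value: JSON.parse(stdout) };
    default:
      return { kind: "json", value: JSON.parse(stdout) };
  }
}
```

Keeping the result as a tagged union lets the calling agent code handle each output kind explicitly instead of guessing from the content.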

Publishing Your Own Skill on skills.sh

You do not manually upload a package to skills.sh.

  1. Put your skill in a public GitHub repo (for example: skills/my-skill/SKILL.md)
  2. Install directly from GitHub:
npx skills add <owner>/<repo>
# install one specific skill in a multi-skill repo:
npx skills add <owner>/<repo> --skill <skill-name>
# or install a direct skill folder URL:
npx skills add https://github.com/<owner>/<repo>/tree/main/<path-to-skill>
  3. Share that same install command with others

skills.sh can discover community skills from what people install with npx skills add, so distribution is repo-first.

CLI Reference

Usage: obo [command] [options]

Server:
  obo                              Start the server (default)
  obo server [--port N] [--host H] Start with options
  --token <token>                  Enable Bearer token auth (server mode)
  --verbose                        Enable verbose server logs

Browser:
  obo status                       Show connection status and active sessions
  obo doctor                       Diagnose server and extension connection
  obo tabs                         List all open tabs

Tab Management:
  obo new [url] [--group name]     Open a new grouped tab (default: blank)
  obo close <tabId>                Close a tab
  obo attach <tabId> [--group name] Activate and group a tab
  obo open <tabId> <url>           Navigate a tab to a URL
  obo navigate <tabId> <url>       Alias for open

Interaction:
  obo snapshot <tabId> [-i]        Get accessibility tree (-i = interactive only)
  obo extract <tabId> [--format]   Extract page content (json|md)
  obo screenshot <tabId> [-o file] [--base64] Save screenshot (temp file by default)
  obo click <tabId> <ref|x> [y]   Click an element (@e1) or coordinates (100 200)
  obo type <tabId> <ref> "text"    Type text into an element [--submit]
  obo scroll <tabId>               Scroll [--dy N] [--dx N] [--ref @e1]
  obo upload <tabId> [ref] <file...> Upload local file(s) to an input
  obo wait <tabId> --load|--idle   Wait for page load/network idle
  obo eval <tabId> "expression"    Evaluate JavaScript in the page
  obo eval <tabId> -f <file>       Evaluate JavaScript from file
  obo eval <tabId> --stdin         Evaluate JavaScript from stdin
  obo eval <tabId> -b <base64>     Evaluate base64-encoded JavaScript

Global Options:
  --url <url>                      Server URL (default: http://127.0.0.1:3333)

Environment Variables:
  OBO_URL                          Server URL override
  OBO_TOKEN                        Bearer token
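
A wrapper script may need to resolve the same connection settings the CLI uses. The sketch below shows one plausible resolution order (flag, then environment variable, then default). That precedence is an assumption, not documented behavior, and `resolveConnection` is a hypothetical helper.

```typescript
const DEFAULT_URL = "http://127.0.0.1:3333"; // default listed under Global Options

// Hypothetical helper. Assumed precedence: --url flag, then OBO_URL, then the default.
function resolveConnection(
  flags: { url?: string },
  env: Record<string, string | undefined>,
): { url: string; token?: string } {
  const url = flags.url ?? env["OBO_URL"] ?? DEFAULT_URL;
  const token = env["OBO_TOKEN"];
  // Only include the token key when a token is actually set.
  return token ? { url, token } : { url };
}
```

In a Node script the second argument would typically be `process.env`.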

Element References

When you run obo snapshot <tabId> -i, interactive elements are labeled with refs like @e1, @e2, etc. Use these refs with click, type, and scroll:

$ obo snapshot 123 -i
document "Example Page"
  heading "Welcome"
  button "Sign In" @e0
  textbox "Email" @e1
  textbox "Password" @e2
  button "Submit" @e3

$ obo click 123 @e1
$ obo type 123 @e1 "user@example.com"
$ obo click 123 @e3

Refs are invalidated when the page changes — always re-snapshot after navigation or interaction. Snapshots include snapshotId for traceability across agent steps.
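
An agent usually needs to map a described element ("the Email textbox") to its ref. The sketch below parses snapshot text into a list of refs. It assumes each labeled line has the form `<role> "<name>" @eN`, as inferred from the sample above; the real output may contain more fields. `parseRefs` and `findRef` are hypothetical helpers.

```typescript
interface RefEntry {
  role: string;
  name: string;
  ref: string;
}

// Assumption: labeled lines look like `  textbox "Email" @e1` (inferred from the sample).
function parseRefs(snapshot: string): RefEntry[] {
  const linePattern = /^\s*(\S+)\s+"(.*)"\s+(@e\d+)\s*$/;
  const entries: RefEntry[] = [];
  for (const line of snapshot.split("\n")) {
    const match = linePattern.exec(line);
    if (match) {
      entries.push({ role: match[1] ?? "", name: match[2] ?? "", ref: match[3] ?? "" });
    }
  }
  return entries;
}

// Returns the ref of the first element with the given role and name, if any.
function findRef(entries: RefEntry[], role: string, name: string): string | undefined {
  for (const entry of entries) {
    if (entry.role === role && entry.name === name) return entry.ref;
  }
  return undefined;
}
```

Because refs are invalidated when the page changes, the parsed list should be discarded and rebuilt from a fresh snapshot after every navigation or interaction.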

Error Handling

Most validation and connectivity failures return:

{ "error": "message", "code": "ERROR_CODE" }

See docs/ERROR_CODES.md for code meanings and recommended recovery actions.
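
Callers can separate failures from results by checking for this shape. A minimal sketch, assuming that any JSON object with string `error` and `code` fields is a failure; `isOboError` and `parseResult` are hypothetical helpers.

```typescript
interface OboError {
  error: string;
  code: string;
}

// Type guard for the { error, code } shape shown above.
function isOboError(value: unknown): value is OboError {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate["error"] === "string" && typeof candidate["code"] === "string";
}

// Parses JSON output and tags it as a success or a failure.
function parseResult(
  stdout: string,
): { ok: true; value: unknown } | { ok: false; error: OboError } {
  const value: unknown = JSON.parse(stdout);
  return isOboError(value) ? { ok: false, error: value } : { ok: true, value };
}
```

An agent loop can then switch on `error.code` and apply the recovery action that docs/ERROR_CODES.md recommends for that code.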

Permissions, Security, And Privacy

The extension intentionally keeps its permission surface small:

  • debugger - drive the current browser via Chrome DevTools Protocol
  • tabs - list and manage tabs
  • tabGroups - group controlled tabs
  • storage - save local connection settings

OBO does not request broad host permissions in the extension manifest.

Security and publishing notes:

Architecture

open-browser-operator/
├── packages/
│   ├── extension/     Chrome Manifest V3 extension
│   ├── server/        Fastify HTTP/WS server + CLI
│   └── shared/        TypeScript protocol types
└── pnpm-workspace.yaml

packages/extension

Chrome extension (Manifest V3) that connects to the local server via WebSocket. Uses the Chrome Debugger API (CDP 1.3) to:

  • Capture accessibility trees and screenshots
  • Click, type, scroll, navigate
  • Manage tab lifecycle

Controlled tabs are grouped under an "OBO" tab group by default. Agents can pass a semantic group title, for example obo new "https://example.com" --group "Market Research", which creates or reuses a group named OBO: Market Research.
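
The naming rule above can be sketched as a small function. `groupTitle` is a hypothetical helper, and trimming whitespace or treating an empty name as "no name" are assumptions.

```typescript
const GROUP_PREFIX = "OBO"; // default group title stated above

// Hypothetical helper: maps an optional --group value to the tab group title.
function groupTitle(name?: string): string {
  const trimmed = name?.trim(); // assumption: surrounding whitespace is ignored
  return trimmed ? `${GROUP_PREFIX}: ${trimmed}` : GROUP_PREFIX;
}
```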

packages/server

Fastify server exposing:

  • REST API — 12 endpoints for browser control (/tabs, /snapshot, /click, etc.)
  • WebSocket — single persistent connection to the extension
  • CLI - obo binary with subcommands that call the REST API

Request timeout: 30s. Optional Bearer token auth.
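
A custom client that calls the REST API directly would need to attach the token and should probably not wait longer than the server does. The sketch below builds a request description under those assumptions. `buildRequest` and `RequestSpec` are hypothetical, and mirroring the 30s timeout on the client side is a design choice, not documented behavior.

```typescript
const REQUEST_TIMEOUT_MS = 30_000; // mirrors the 30s request timeout stated above

interface RequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
  timeoutMs: number;
}

// Hypothetical helper: describes a JSON POST to the obo server.
function buildRequest(
  baseUrl: string,
  path: string,
  payload: unknown,
  token?: string,
): RequestSpec {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (token) headers["Authorization"] = `Bearer ${token}`; // only when auth is enabled
  return {
    url: baseUrl.replace(/\/+$/, "") + path, // avoid a double slash when joining
    method: "POST",
    headers,
    body: JSON.stringify(payload),
    timeoutMs: REQUEST_TIMEOUT_MS,
  };
}
```

The resulting spec can be passed to any HTTP client; with `fetch`, the timeout could be applied through `AbortSignal.timeout(spec.timeoutMs)`.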

packages/shared

TypeScript types shared between server and extension:

  • SessionInfo, TabInfo, SnapshotResult, SnapshotNode
  • WebSocket protocol message types (command requests/responses, session updates)

Data Flow

  1. CLI/Agent calls obo click 123 @e1
  2. CLI sends POST /click { tabId: 123, ref: "@e1" } to the server
  3. Server wraps it in a command:request WebSocket message with a unique ID
  4. Extension receives the command, resolves @e1 to coordinates via CDP
  5. Extension dispatches a mouse click via Input.dispatchMouseEvent
  6. Extension sends command:response back over WebSocket
  7. Server returns the HTTP response to the CLI
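
Steps 3, 6, and 7 amount to request/response correlation over a single WebSocket. The sketch below illustrates that pattern. Only the `command:request` and `command:response` names and the unique ID come from the steps above; the other field names, the ID format, and the `CommandRouter` class are assumptions, not the actual protocol types in packages/shared.

```typescript
// Hypothetical message shapes; field names other than `type` and `id` are assumptions.
interface CommandRequest {
  type: "command:request";
  id: string;
  action: string;
  params: Record<string, unknown>;
}

interface CommandResponse {
  type: "command:response";
  id: string;
  result?: unknown;
  error?: string;
}

class CommandRouter {
  private nextId = 1;
  private pending = new Map<string, (response: CommandResponse) => void>();

  // Step 3: wrap an HTTP request body in a command:request with a unique ID.
  wrap(
    action: string,
    params: Record<string, unknown>,
    onResponse: (response: CommandResponse) => void,
  ): CommandRequest {
    const id = `cmd-${this.nextId++}`;
    this.pending.set(id, onResponse);
    return { type: "command:request", id, action, params };
  }

  // Steps 6-7: match a response to its request by ID and hand it to the HTTP layer.
  resolve(response: CommandResponse): boolean {
    const handler = this.pending.get(response.id);
    if (!handler) return false; // unknown or already-handled ID
    this.pending.delete(response.id);
    handler(response);
    return true;
  }
}
```

A production version would also reject pending entries after the request timeout so that a disconnected extension cannot leave HTTP requests hanging.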

Development

git clone https://github.com/agenthand/obo.git
cd obo            # git clone names the directory after the repo
pnpm install
pnpm build

Extension

cd packages/extension
pnpm dev          # Vite dev server with CRXJS hot reload

Load packages/extension/dist as an unpacked extension in Chrome.

Server

cd packages/server
pnpm dev          # tsx watch mode

Shared

Changes to packages/shared require rebuilding downstream packages.

OBO vs Headless Browsers

             obo                                Playwright / Puppeteer
Browser      Your real Chrome                   Headless / clean instance
Login state  Already logged in                  Must authenticate
Cookies      Your real cookies                  Empty
Extensions   Your extensions                    None
Bookmarks    Your bookmarks                     None
Use case     Personal automation, AI assistant  Testing, scraping
Setup        Install extension + start server   npm install

Use obo when the task needs your browser context. Use headless browsers when you need a clean, reproducible environment.

License

MIT
