5 changes: 2 additions & 3 deletions packages/bcode-browser/README.md
@@ -10,10 +10,9 @@ See `decisions.md §1c` (three-level model) and `§1d` (this package) in the Bro
|---|---|
| `src/cdp/` | Vendored CDP layer (`session.ts`, `gen.ts`, `generated.ts`, protocol JSONs). Initial copy from `browser-use/browser-harness-js`; ours after — see `src/cdp/PROVENANCE.md`. |
| `src/browser-execute.ts` | In-process JS-eval `browser_execute` body. |
| `src/cloud-browser.ts` | Browser Use cloud-browser provision + attach. |
| `src/session-store.ts` | Per-opencode-session CDP `Session` map shared by both browser tools. |
| `src/session-store.ts` | Per-opencode-session CDP `Session` map. The agent calls `session.connect(...)` from a snippet; subsequent snippets find the same Session. |
| `src/skills.ts` | Runtime resolver for embedded skills (extract on first call in compiled mode; in-tree path in dev). |
| `skills/` | `BROWSER.md` (the agent's prompt for `browser_execute`) plus `interaction-skills/*.md` (UI mechanic reference docs). Embedded into the binary by `script/embed-skills.ts`. |
| `skills/` | `BROWSER.md` (the agent's prompt for `browser_execute`), `cloud-browser.md` (Way 3 — provision/stop a Browser Use cloud browser via raw HTTP from inside a snippet), and `interaction-skills/*.md` (UI mechanic reference docs). Embedded into the binary by `script/embed-skills.ts`. |
| `script/embed-skills.ts` | Build-time embed; emits `bcode-skills.gen.ts` consumed by the compiled binary. |
| `test/` | `bun test` smoke coverage for the workspace dynamic-import pattern. |
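
The "per-opencode-session `Session` map" in `src/session-store.ts` amounts to a lazily populated cache keyed by session ID. Here is a minimal sketch that assumes nothing about the real module: `SessionStore`, `get`, and `delete` are hypothetical names, and the factory stands in for however the package actually constructs a CDP `Session`.

```ts
// Hypothetical sketch: one value per opencode session ID, created on first
// access and returned unchanged on every later access with the same ID.
class SessionStore<S> {
  private readonly sessions = new Map<string, S>()
  private readonly create: () => S

  constructor(create: () => S) {
    this.create = create
  }

  // Return the existing entry for this session ID, or create and remember one.
  get(sessionID: string): S {
    const existing = this.sessions.get(sessionID)
    if (existing !== undefined) return existing
    const fresh = this.create()
    this.sessions.set(sessionID, fresh)
    return fresh
  }

  // Forget an entry (e.g. when the opencode session ends); true if one existed.
  delete(sessionID: string): boolean {
    return this.sessions.delete(sessionID)
  }
}
```

Because every call with the same `sessionID` gets the same object back, a `connect()` made in one snippet is still in effect in the next.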

2 changes: 1 addition & 1 deletion packages/bcode-browser/package.json
@@ -2,7 +2,7 @@
"$schema": "https://json.schemastore.org/package.json",
"version": "0.0.0",
"name": "@browser-use/bcode-browser",
"description": "BrowserCode Level-1 code: in-process CDP harness, browser_execute, cloud-browser attach, embedded skills",
"description": "BrowserCode Level-1 code: in-process CDP harness, browser_execute, embedded skills",
"type": "module",
"license": "MIT",
"private": true,
68 changes: 57 additions & 11 deletions packages/bcode-browser/skills/BROWSER.md
@@ -1,28 +1,62 @@
# BROWSER.md — driving a real browser with `browser_execute`

Use the `browser_execute` tool to run JavaScript against a connected browser via the Chrome DevTools Protocol. The snippet runs in-process; `session` is bound to a long-lived CDP `Session` that survives across calls within the same bcode session.
Use the `browser_execute` tool to run JavaScript against a connected browser via the Chrome DevTools Protocol. The snippet runs in-process; `session` is bound to a long-lived CDP `Session` that persists across calls within the same bcode session. Connect once, then drive the browser from as many calls as you need.

**Locations:**

- Workspace (read/write your reusable scripts): `<projectRoot>/.bcode/agent-workspace/`. The bcode CLI runs from the project root, so `./.bcode/agent-workspace/foo.ts` works directly with the `read`/`write`/`edit` tools.
- Skills (read-only reference docs): `{{SKILLS_DIR}}/interaction-skills/`
- Skills (read-only reference docs): `{{SKILLS_DIR}}/`. Run `read {{SKILLS_DIR}}/interaction-skills/` to list every available interaction skill before reading any one of them.

## The model in one paragraph

`browser_execute` evaluates whatever JS you write against `session`. There is no auto-loaded library, no privileged file, no helper namespace — just `session` and standard JS globals. To reuse code from a previous snippet, save it as a `.ts` file under `./.bcode/agent-workspace/` (using the `write` tool) and `await import("/abs/path?t=" + Date.now())` it from a later snippet. The import takes an **absolute** path — construct it from `process.cwd()` inside the snippet, or shell out via the `bash` tool to get the project root. Same mechanism for a 5-line wrapper and a 500-line script. Skills under `{{SKILLS_DIR}}/interaction-skills/` are documentation you `read`, not modules you `import` — they teach you the CDP patterns; you write the code.
`browser_execute` evaluates whatever JS you write against `session`. There is no auto-loaded library, no privileged file, no helper namespace — just `session` and standard JS globals. To reuse code from a previous snippet, save it as a `.ts` file under `./.bcode/agent-workspace/` (using the `write` tool) and `await import("/abs/path?t=" + Date.now())` it from a later snippet. The import takes an **absolute** path — construct it from `process.cwd()` inside the snippet. Same mechanism for a 5-line wrapper and a 500-line script. Skills under `{{SKILLS_DIR}}/` are documentation you `read`, not modules you `import` — they teach you the CDP patterns; you write the code.
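
The absolute, cache-busted import specifier described above can be built by a small pure function. This is a sketch: `workspaceImportPath` is a hypothetical name, and it assumes, as the paragraph says, that `process.cwd()` inside a snippet is the project root.

```ts
import { join } from "node:path"

// Hypothetical helper: absolute path of a workspace module plus a
// `?t=<timestamp>` query so the next import() sees your latest edits.
function workspaceImportPath(
  file: string,
  cwd: string = process.cwd(),
  now: number = Date.now(),
): string {
  return `${join(cwd, ".bcode", "agent-workspace", file)}?t=${now}`
}

// In a snippet, assuming ./.bcode/agent-workspace/foo.ts exists:
// const foo = await import(workspaceImportPath("foo.ts"))
```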

## Connecting

The first `browser_execute` call connects automatically by scanning OS-typical Chrome profile dirs for a `DevToolsActivePort` file (Chrome must be running with `--remote-debugging-port`). To attach explicitly:
You always call `session.connect(...)` once at the start of your work. The `Session` is fresh on the first `browser_execute` call of an opencode session; subsequent calls reuse it. Three connection methods, in order of preference for typical tasks:

**Way 1 — connect to the user's running Chrome (real profile, popup-gated).** Best when the task involves the user's actual logged-in sites.

```js
// Auto-detect the most-recently-launched Chrome with remote debugging enabled.
await session.connect()
```

The user must have ticked "Allow remote debugging for this browser instance" once at `chrome://inspect/#remote-debugging` (the setting is sticky per profile) and, on Chrome 144+, must click "Allow" on the in-browser popup at first attach. If `connect()` fails with a 403/permission message, ask the user to do this. To wait for the click instead of failing fast, pass `{ profileDir: "/abs/path", timeoutMs: 30000 }`.

**Way 2 — connect to a Chrome you (or the user) launched with a debug port (isolated profile, no popups).** Best for unattended automation.

```bash
# User runs this once (or you run it via the `bash` tool):
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/bcode-chrome
```

```js
await session.connect({ wsUrl: "ws://127.0.0.1:9222/devtools/browser" })
// or, if you know the profile dir:
await session.connect({ profileDir: "/tmp/bcode-chrome" })
```
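
With remote debugging on, Chrome writes a `DevToolsActivePort` file into the user-data directory: the first line is the port, the second the browser endpoint path (`/devtools/browser/<id>`). If you want the exact `wsUrl` that a given `--user-data-dir` corresponds to, the sketch below reads it. The helper names are hypothetical, and the parser assumes that two-line layout.

```ts
import { readFileSync } from "node:fs"
import { join } from "node:path"

// Parse DevToolsActivePort contents: line 1 = port, line 2 = browser path.
function wsUrlFromActivePort(contents: string, host = "127.0.0.1"): string {
  const [portLine, pathLine] = contents.trim().split(/\r?\n/)
  const port = Number(portLine)
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`invalid port in DevToolsActivePort: ${portLine}`)
  }
  if (!pathLine?.startsWith("/devtools/browser/")) {
    throw new Error(`unexpected browser path in DevToolsActivePort: ${pathLine}`)
  }
  return `ws://${host}:${port}${pathLine}`
}

// Read the file from a user-data dir such as /tmp/bcode-chrome.
function wsUrlFromUserDataDir(userDataDir: string): string {
  return wsUrlFromActivePort(readFileSync(join(userDataDir, "DevToolsActivePort"), "utf8"))
}
```

For example, `await session.connect({ wsUrl: wsUrlFromUserDataDir("/tmp/bcode-chrome") })`, assuming your snippet imports the helper from the workspace.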

The `--user-data-dir` must NOT be Chrome's platform default (`%LOCALAPPDATA%\Google\Chrome\User Data` on Windows, `~/Library/Application Support/Google/Chrome` on macOS, `~/.config/google-chrome` on Linux) — Chrome 136+ silently no-ops the port flag in that case.
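
To catch this before launching Chrome, you can compare a candidate `--user-data-dir` with the defaults listed above. This is a sketch under stated assumptions: `isDefaultUserDataDir` is hypothetical, it compares resolved path strings only (no symlink resolution, no case folding for macOS/Windows filesystems), and Windows is covered only when `LOCALAPPDATA` is set.

```ts
import { homedir } from "node:os"
import { join, resolve } from "node:path"

// Chrome's default user-data dir per platform, as listed above.
function defaultUserDataDir(platform: string, home: string, localAppData?: string): string | undefined {
  switch (platform) {
    case "win32":
      return localAppData ? join(localAppData, "Google", "Chrome", "User Data") : undefined
    case "darwin":
      return join(home, "Library", "Application Support", "Google", "Chrome")
    case "linux":
      return join(home, ".config", "google-chrome")
    default:
      return undefined
  }
}

// True if `dir` is the platform default, where Chrome 136+ ignores the port flag.
function isDefaultUserDataDir(
  dir: string,
  platform: string = process.platform,
  home: string = homedir(),
  localAppData: string | undefined = process.env.LOCALAPPDATA,
): boolean {
  const def = defaultUserDataDir(platform, home, localAppData)
  return def !== undefined && resolve(dir) === resolve(def)
}
```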

**Way 3 — provision and connect to a Browser Use cloud browser.** Best when the user can't see the browser, or when you need a clean profile, a geo-located proxy, or fingerprint isolation. Read `{{SKILLS_DIR}}/cloud-browser.md` for the full pattern (provision, stop, swap profile/proxy). Briefly:

```js
await session.connect({ profileDir: "/abs/path/to/Chrome/Default" })
// or
await session.connect({ wsUrl: "ws://127.0.0.1:9222/devtools/browser/<id>" })
// or for a Browser Use cloud browser, call the `browser_open_cloud` tool first.
const r = await fetch("https://api.browser-use.com/api/v3/browsers", {
  method: "POST",
  headers: { "X-Browser-Use-API-Key": process.env.BROWSER_USE_API_KEY, "Content-Type": "application/json" },
  body: "{}",
})
if (!r.ok) throw new Error(`provision failed: ${r.status} ${await r.text()}`)
const body = await r.json()
// Some BU regions return camelCase, others snake_case. Accept both.
const id = body.id // keep this: you need it to stop the browser later
const wsUrl = body.cdp_url ?? body.cdpUrl
const liveUrl = body.live_url ?? body.liveUrl
if (!wsUrl) throw new Error("Browser Use response missing cdp_url/cdpUrl")
await session.connect({ wsUrl })
console.log("liveUrl for the user to watch:", liveUrl)
```

After connect, attach to a page target:
Requires `BROWSER_USE_API_KEY` in the environment (the user should have set this before launching bcode). If absent, tell the user to get a key at https://browser-use.com and `export BROWSER_USE_API_KEY=...`.

## Attaching to a target

After `connect()`, attach to a page target before driving the browser:

```js
const targets = (await session.Target.getTargets({})).targetInfos
@@ -65,7 +99,18 @@ const { data } = await session.Page.captureScreenshot({ format: "png" })
// data is base64; write with the `write` tool or process in JS.
```
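
`Target.getTargets` also lists DevTools windows, extension pages, and other non-page targets, and a browser may have no page target at all. Below is a hypothetical helper that prefers an ordinary web page and fails with a clear message otherwise. The `TargetInfo` fields it reads (`targetId`, `type`, `url`) come from the CDP `Target` domain; the URL filter is a heuristic you may want to change.

```ts
// Subset of CDP's Target.TargetInfo that this helper reads.
interface TargetInfo {
  targetId: string
  type: string
  url: string
}

// Prefer a normal web page; otherwise any page (e.g. chrome://newtab); else throw.
function pickPageTarget(targets: TargetInfo[]): TargetInfo {
  const pages = targets.filter((t) => t.type === "page")
  const internal = /^(devtools|chrome|chrome-extension):/
  const chosen = pages.find((t) => !internal.test(t.url)) ?? pages[0]
  if (!chosen) {
    throw new Error("No page target found; open a tab or call Target.createTarget first.")
  }
  return chosen
}
```

In a snippet this could be used as `await session.use(pickPageTarget((await session.Target.getTargets({})).targetInfos).targetId)`.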

For the full menu of UI mechanics — dropdowns, dialogs, iframes, shadow DOM, uploads, scrolling, screenshots-with-highlights — read the relevant skill: `{{SKILLS_DIR}}/interaction-skills/<topic>.md`.
For the full menu of UI mechanics — dropdowns, dialogs, iframes, shadow DOM, uploads, scrolling, screenshots-with-highlights — list `{{SKILLS_DIR}}/interaction-skills/` to see all available topics, then read the relevant one.

## Switching browsers mid-session

You own the connection. To swap:

```js
await session.close()
await session.connect({ /* new opts */ })
```

Cloud cleanup is your responsibility: when you're done with a cloud browser, stop it explicitly (see `{{SKILLS_DIR}}/cloud-browser.md` for the PATCH call). Otherwise it keeps running until your API quota or Browser Use's idle timer reclaims it.

## Reusing code: write to the workspace, import from snippet

@@ -110,4 +155,5 @@ Cache-bust (`?t=${Date.now()}`) is your responsibility: without it, edits to the
- **`session.Page.navigate` hangs forever** → the page is showing a native dialog. Use `session.Page.handleJavaScriptDialog({ accept: true })` to dismiss.
- **Selectors don't find elements that you can see** → likely an iframe or shadow DOM. Read `{{SKILLS_DIR}}/interaction-skills/iframes.md` or `shadow-dom.md`.
- **Actions silently no-op** → the page is mid-load. After `Page.navigate`, await `session.waitFor("Page.loadEventFired")` before driving inputs.
- **Connection refused or 403 on connect()** → Chrome wasn't started with `--remote-debugging-port`, or the user hasn't clicked "Allow" on the remote-debugging prompt. Pass `{ timeoutMs: 30000 }` to wait for the click.
- **Connection refused or 403 on connect()** → Chrome wasn't started with `--remote-debugging-port`, or the user hasn't clicked "Allow" on the remote-debugging prompt. Pass `{ profileDir, timeoutMs: 30000 }` to wait for the click, or fall back to Way 2.
- **Cloud `connect()` fails after a successful provision** → check that `cdp_url` came back in the POST response; some BU regions return `cdpUrl` (camelCase) — accept both. See `{{SKILLS_DIR}}/cloud-browser.md`.
145 changes: 145 additions & 0 deletions packages/bcode-browser/skills/cloud-browser.md
@@ -0,0 +1,145 @@
# cloud-browser.md — Browser Use cloud browser via raw HTTP

When BROWSER.md sent you here, the user wants a Browser Use cloud browser (Way 3): a clean isolated Chrome on BU's infrastructure, optionally with a geo-located proxy or a synced profile, with a `liveUrl` the user can open to watch you work.

There is no `browser_open_cloud` tool. You write the HTTP calls yourself in a `browser_execute` snippet. This keeps the connection model symmetric (you also call `session.connect()` for local browsers in Way 1 and Way 2) and gives you full control over the BU API surface — provision, stop, swap profiles, change proxies, anything BU exposes.

## Authentication

Every call to `https://api.browser-use.com/...` requires an API key in the `X-Browser-Use-API-Key` header. The key lives in the environment as `BROWSER_USE_API_KEY`; the user is expected to `export` it before launching bcode, the same way they would export an LLM provider key such as `ANTHROPIC_API_KEY`.

Read it once, fail clearly if missing:

```js
const apiKey = process.env.BROWSER_USE_API_KEY
if (!apiKey) {
  throw new Error("BROWSER_USE_API_KEY is not set. Get a key at https://browser-use.com and re-launch bcode with the key exported.")
}
```

## Provision

```js
const r = await fetch("https://api.browser-use.com/api/v3/browsers", {
  method: "POST",
  headers: { "X-Browser-Use-API-Key": apiKey, "Content-Type": "application/json" },
  body: JSON.stringify({
    // All optional: omit them for an ephemeral fresh-profile browser with no proxy.
    // profile_id: "<uuid>", // attach an existing BU profile
    // proxy_country_code: "us", // geo-located proxy
  }),
})
if (!r.ok) throw new Error(`provision failed: ${r.status} ${await r.text()}`)
const body = await r.json()
// Some BU regions return camelCase, others snake_case. Accept both.
const id = body.id
const cdpUrl = body.cdp_url ?? body.cdpUrl
const liveUrl = body.live_url ?? body.liveUrl
```

The `liveUrl` is a viewer URL the user can open in their own browser to watch the cloud browser's pixels. **Print it to console** so the user can click it:

```js
console.log("Cloud browser ready. Live view:", liveUrl)
```

Stash `id` somewhere (assign `globalThis.cloudBrowserId = id`, or return it from the snippet so it shows up in the tool output); you need it to stop the browser later.

## Connect

```js
await session.connect({ wsUrl: cdpUrl })
const targets = (await session.Target.getTargets({})).targetInfos
const page = targets.find(t => t.type === "page")
if (!page) throw new Error("No page target found in the cloud browser; open one with Target.createTarget first")
await session.use(page.targetId)
```

From here on `session.<Domain>.<method>(...)` drives the cloud browser exactly like a local Chrome.

## Stop

When you're done, stop the browser. BU's quotas and idle reclaim will eventually clean it up if you forget, but explicit stop is faster and frees the slot:

```js
const stopRes = await fetch(`https://api.browser-use.com/api/v3/browsers/${id}`, {
  method: "PATCH",
  headers: { "X-Browser-Use-API-Key": apiKey, "Content-Type": "application/json" },
  body: JSON.stringify({ state: "stop" }),
})
if (!stopRes.ok) throw new Error(`stop failed: ${stopRes.status} ${await stopRes.text()}`)
```

If you'll do this often within one project, save it as `./.bcode/agent-workspace/cloud.ts` (see BROWSER.md "Reusing code") and import it from later snippets.

## Swap

To switch from one cloud browser to another (e.g. different proxy country) within the same opencode session:

```js
// Stop the old one first.
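const stopRes = await fetch(`https://api.browser-use.com/api/v3/browsers/${oldId}`, {
  method: "PATCH",
  headers: { "X-Browser-Use-API-Key": apiKey, "Content-Type": "application/json" },
  body: JSON.stringify({ state: "stop" }),
})
if (!stopRes.ok) throw new Error(`stop failed: ${stopRes.status} ${await stopRes.text()}`)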

// Close the local Session's WS so connect() opens a fresh one.
await session.close()

// Provision and connect to the new one (provision block above, with new params).
```
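
The same sequence can be wrapped in a function. This sketch makes one robustness choice explicit: the local Session is closed in a `finally`, so a failed stop call does not leave the WebSocket attached to the old browser. `swapCloudBrowser` is hypothetical; `provision` and `stop` stand for the `cloud.ts` helpers in the next section (or your own `fetch` calls), and the `session` shape is an assumption limited to the `close()`/`connect()` calls these docs use.

```ts
// Minimal shapes assumed for this sketch (only the calls used in these docs).
interface CdpSession {
  close(): Promise<void>
  connect(opts: { wsUrl: string }): Promise<void>
}
interface CloudBrowser {
  id: string
  cdpUrl: string
  liveUrl: string
}

// Stop the old cloud browser, close the local WS, then provision and connect.
async function swapCloudBrowser(
  session: CdpSession,
  oldId: string,
  deps: {
    stop: (id: string) => Promise<void>
    provision: () => Promise<CloudBrowser>
  },
): Promise<CloudBrowser> {
  try {
    await deps.stop(oldId)
  } finally {
    // Close even if stop() threw, so the Session never stays on the old browser.
    await session.close()
  }
  const next = await deps.provision()
  await session.connect({ wsUrl: next.cdpUrl })
  return next
}
```

For example: `await swapCloudBrowser(session, oldId, { stop, provision: () => provision({ proxyCountryCode: "de" }) })`. If the stop call fails, the error propagates and no new browser is provisioned.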

## A reusable workspace helper

Recommended pattern for any project that uses cloud browsers more than once:

```ts
// ./.bcode/agent-workspace/cloud.ts
const API = "https://api.browser-use.com/api/v3/browsers"
const key = () => {
  const k = process.env.BROWSER_USE_API_KEY
  if (!k) throw new Error("BROWSER_USE_API_KEY is not set.")
  return k
}

export async function provision(opts: { profileId?: string; proxyCountryCode?: string } = {}) {
  const r = await fetch(API, {
    method: "POST",
    headers: { "X-Browser-Use-API-Key": key(), "Content-Type": "application/json" },
    body: JSON.stringify({
      profile_id: opts.profileId,
      proxy_country_code: opts.proxyCountryCode,
    }),
  })
  if (!r.ok) throw new Error(`provision failed: ${r.status} ${await r.text()}`)
  const body = await r.json()
  return {
    id: body.id as string,
    cdpUrl: (body.cdp_url ?? body.cdpUrl) as string,
    liveUrl: (body.live_url ?? body.liveUrl) as string,
  }
}

export async function stop(id: string) {
  const r = await fetch(`${API}/${id}`, {
    method: "PATCH",
    headers: { "X-Browser-Use-API-Key": key(), "Content-Type": "application/json" },
    body: JSON.stringify({ state: "stop" }),
  })
  if (!r.ok) throw new Error(`stop failed: ${r.status} ${await r.text()}`)
}
```

Then any snippet does:

```js
const { provision, stop } = await import(`${process.cwd()}/.bcode/agent-workspace/cloud.ts?t=${Date.now()}`)
const { id, cdpUrl, liveUrl } = await provision({ proxyCountryCode: "us" })
console.log("Live view:", liveUrl)
await session.connect({ wsUrl: cdpUrl })
// ... do work ...
await stop(id)
```
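
In the snippet above, if the work in the middle throws, `stop(id)` never runs and the browser keeps running until Browser Use reclaims it. A `try`/`finally` wrapper closes that gap. `withCloudBrowser` is a hypothetical addition, not part of the `cloud.ts` shown above; it takes `provision` and `stop` as parameters only so the sketch stays self-contained.

```ts
interface CloudBrowser {
  id: string
  cdpUrl: string
  liveUrl: string
}

// Provision a browser, run `work` against it, and always stop it afterwards.
async function withCloudBrowser<T>(
  deps: {
    provision: () => Promise<CloudBrowser>
    stop: (id: string) => Promise<void>
  },
  work: (browser: CloudBrowser) => Promise<T>,
): Promise<T> {
  const browser = await deps.provision()
  console.log("Live view:", browser.liveUrl)
  try {
    return await work(browser)
  } finally {
    // Runs on success and on error, so the cloud browser is always stopped.
    await deps.stop(browser.id)
  }
}
```

A snippet would then pass the imported helpers, e.g. `await withCloudBrowser({ provision: () => provision({ proxyCountryCode: "us" }), stop }, async ({ cdpUrl }) => { await session.connect({ wsUrl: cdpUrl }) /* ... do work ... */ })`.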

## Other BU API endpoints

The full BU cloud API (profile sync, profile list, custom proxies, recording on/off, etc.) is documented at https://browser-use.com — `read` the docs and write the matching `fetch` call. Anything BU's API exposes is reachable from a snippet without bcode-side wrapper code.
10 changes: 6 additions & 4 deletions packages/bcode-browser/src/browser-execute.ts
@@ -55,8 +55,9 @@ export type Parameters = Schema.Schema.Type<typeof parameters>

export interface ExecuteContext {
// Identifies the per-opencode-session CDP Session to bind into the snippet.
// Shared with `browser_open_cloud` via the SessionStore so a cloud-attach
// call's Session is driven by subsequent `browser_execute` calls.
// The same Session is reused across calls — the agent calls
// `session.connect(...)` in one snippet and subsequent snippets find the
// already-connected Session.
readonly sessionID: string
// Per-project workspace dir: <projectDir>/.bcode/agent-workspace/. Created
// on first call. The agent reads/writes/edits .ts files here via the
@@ -97,8 +98,9 @@ const serialize = (v: unknown): string => {
}

// Snippet executor. The CDP Session is resolved per-call from `SessionStore`
// keyed on `ctx.sessionID` so a Session attached via `browser_open_cloud` is
// the same one a follow-up `browser_execute` drives.
// keyed on `ctx.sessionID`. The agent connects with `await session.connect(...)`
// in one snippet (Way 1 / Way 2 / Way 3 in BROWSER.md); the Session persists
// for follow-up snippets in the same opencode session.
//
// `dataDir` is opencode's XDG_DATA_HOME for bcode (~/.local/share/bcode/ on
// Linux/Mac). Compiled-mode skills are extracted to `<dataDir>/skills/` once