browser-use · Alezander9 · May 8, 2026 · cubic-dev-ai · May 8, 2026 · cubic-dev-ai
diff --git a/packages/bcode-browser/README.md b/packages/bcode-browser/README.md
@@ -10,10 +10,9 @@ See `decisions.md §1c` (three-level model) and `§1d` (this package) in the Bro
 |---|---|
 | `src/cdp/` | Vendored CDP layer (`session.ts`, `gen.ts`, `generated.ts`, protocol JSONs). Initial copy from `browser-use/browser-harness-js`; ours after — see `src/cdp/PROVENANCE.md`. |
 | `src/browser-execute.ts` | In-process JS-eval `browser_execute` body. |
-| `src/cloud-browser.ts` | Browser Use cloud-browser provision + attach. |
-| `src/session-store.ts` | Per-opencode-session CDP `Session` map shared by both browser tools. |
+| `src/session-store.ts` | Per-opencode-session CDP `Session` map. The agent calls `session.connect(...)` from a snippet; subsequent snippets find the same Session. |
 | `src/skills.ts` | Runtime resolver for embedded skills (extract on first call in compiled mode; in-tree path in dev). |
-| `skills/` | `BROWSER.md` (the agent's prompt for `browser_execute`) plus `interaction-skills/*.md` (UI mechanic reference docs). Embedded into the binary by `script/embed-skills.ts`. |
+| `skills/` | `BROWSER.md` (the agent's prompt for `browser_execute`), `cloud-browser.md` (Way 3 — provision/stop a Browser Use cloud browser via raw HTTP from inside a snippet), and `interaction-skills/*.md` (UI mechanic reference docs). Embedded into the binary by `script/embed-skills.ts`. |
 | `script/embed-skills.ts` | Build-time embed; emits `bcode-skills.gen.ts` consumed by the compiled binary. |
 | `test/` | `bun test` smoke coverage for the workspace dynamic-import pattern. |
 

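The `src/session-store.ts` row in the README table above describes a per-opencode-session map of CDP `Session` objects. A minimal sketch of that idea follows. It is not the contents of `src/session-store.ts`: the `SessionStore` class, its `get`/`delete` methods, and the `createSession` factory are hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch only: not the actual src/session-store.ts.
// `createSession` stands in for whatever constructs a CDP Session.
class SessionStore {
  constructor(createSession) {
    this.createSession = createSession
    this.sessions = new Map() // sessionID -> Session
  }

  // Return the Session for this opencode session, creating it on first use.
  get(sessionID) {
    let s = this.sessions.get(sessionID)
    if (!s) {
      s = this.createSession()
      this.sessions.set(sessionID, s)
    }
    return s
  }

  // Forget a Session, for example when the opencode session ends.
  delete(sessionID) {
    return this.sessions.delete(sessionID)
  }
}

// Usage: two snippets in the same opencode session see the same object.
const store = new SessionStore(() => ({ connected: false }))
const first = store.get("session-a")
first.connected = true // stands in for `await session.connect(...)`
const second = store.get("session-a")
console.log(second.connected) // true
console.log(store.get("session-b").connected) // false
```

Keying on the session ID is what lets one snippet connect and a later snippet find the already-connected object, while a different opencode session starts from a fresh one.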
diff --git a/packages/bcode-browser/package.json b/packages/bcode-browser/package.json
@@ -2,7 +2,7 @@
   "$schema": "https://json.schemastore.org/package.json",
   "version": "0.0.0",
   "name": "@browser-use/bcode-browser",
-  "description": "BrowserCode Level-1 code: in-process CDP harness, browser_execute, cloud-browser attach, embedded skills",
+  "description": "BrowserCode Level-1 code: in-process CDP harness, browser_execute, embedded skills",
   "type": "module",
   "license": "MIT",
   "private": true,

diff --git a/packages/bcode-browser/skills/BROWSER.md b/packages/bcode-browser/skills/BROWSER.md
@@ -1,28 +1,62 @@
 # BROWSER.md — driving a real browser with `browser_execute`
 
-Use the `browser_execute` tool to run JavaScript against a connected browser via the Chrome DevTools Protocol. The snippet runs in-process; `session` is bound to a long-lived CDP `Session` that survives across calls within the same bcode session.
+Use the `browser_execute` tool to run JavaScript against a connected browser via the Chrome DevTools Protocol. The snippet runs in-process; `session` is bound to a long-lived CDP `Session` that persists across calls within the same bcode session. Connect once, then drive the browser from as many snippets as you need.
 
 **Locations:**
 
 - Workspace (read/write your reusable scripts): `<projectRoot>/.bcode/agent-workspace/`. The bcode CLI runs from the project root, so `./.bcode/agent-workspace/foo.ts` works directly with the `read`/`write`/`edit` tools.
-- Skills (read-only reference docs): `{{SKILLS_DIR}}/interaction-skills/`
+- Skills (read-only reference docs): `{{SKILLS_DIR}}/`. Run `read {{SKILLS_DIR}}/interaction-skills/` to list every available interaction skill before reading any one of them.
 
 ## The model in one paragraph
 
-`browser_execute` evaluates whatever JS you write against `session`. There is no auto-loaded library, no privileged file, no helper namespace — just `session` and standard JS globals. To reuse code from a previous snippet, save it as a `.ts` file under `./.bcode/agent-workspace/` (using the `write` tool) and `await import("/abs/path?t=" + Date.now())` it from a later snippet. The import takes an **absolute** path — construct it from `process.cwd()` inside the snippet, or shell out via the `bash` tool to get the project root. Same mechanism for a 5-line wrapper and a 500-line script. Skills under `{{SKILLS_DIR}}/interaction-skills/` are documentation you `read`, not modules you `import` — they teach you the CDP patterns; you write the code.
+`browser_execute` evaluates whatever JS you write against `session`. There is no auto-loaded library, no privileged file, no helper namespace — just `session` and standard JS globals. To reuse code from a previous snippet, save it as a `.ts` file under `./.bcode/agent-workspace/` (using the `write` tool) and `await import("/abs/path?t=" + Date.now())` it from a later snippet. The import takes an **absolute** path — construct it from `process.cwd()` inside the snippet. Same mechanism for a 5-line wrapper and a 500-line script. Skills under `{{SKILLS_DIR}}/` are documentation you `read`, not modules you `import` — they teach you the CDP patterns; you write the code.
 
 ## Connecting
 
-The first `browser_execute` call connects automatically by scanning OS-typical Chrome profile dirs for a `DevToolsActivePort` file (Chrome must be running with `--remote-debugging-port`). To attach explicitly:
+Always call `session.connect(...)` once at the start of your work. The `Session` is fresh on the first `browser_execute` call of an opencode session; subsequent calls reuse it. There are three connection methods, listed in order of preference for typical tasks:
+
+**Way 1 — connect to the user's running Chrome (real profile, popup-gated).** Best when the task involves the user's actual logged-in sites.
+
+```js
+// Auto-detect the most-recently-launched Chrome with remote debugging enabled.
+await session.connect()
+```
+
+The user must have ticked "Allow remote debugging for this browser instance" once at `chrome://inspect/#remote-debugging` (sticky per-profile), and on Chrome 144+ click "Allow" on the in-browser popup at first attach. If `connect()` fails with a 403/permission message, ask the user to do this. To wait for the click instead of erroring fast, pass `{ profileDir: "/abs/path", timeoutMs: 30000 }`.
+
+**Way 2 — connect to a Chrome you (or the user) launched with a debug port (isolated profile, no popups).** Best for unattended automation.
+
+```bash
+# User runs this once (or you run it via the `bash` tool):
+google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/bcode-chrome
+```
+
+```js
+await session.connect({ wsUrl: "ws://127.0.0.1:9222/devtools/browser" })
+// or, if you know the profile dir:
+await session.connect({ profileDir: "/tmp/bcode-chrome" })
+```
+
+The `--user-data-dir` must NOT be Chrome's platform default (`%LOCALAPPDATA%\Google\Chrome\User Data` on Windows, `~/Library/Application Support/Google/Chrome` on macOS, `~/.config/google-chrome` on Linux) — Chrome 136+ silently no-ops the port flag in that case.
+
+**Way 3 — provision and connect to a Browser Use cloud browser.** Best when the user can't see the browser, you need a clean profile, geo-located proxy, or fingerprint isolation. Read `{{SKILLS_DIR}}/cloud-browser.md` for the full pattern (provision, stop, swap profile/proxy). Briefly:
 
 ```js
-await session.connect({ profileDir: "/abs/path/to/Chrome/Default" })
-// or
-await session.connect({ wsUrl: "ws://127.0.0.1:9222/devtools/browser/<id>" })
-// or for a Browser Use cloud browser, call the `browser_open_cloud` tool first.
+const r = await fetch("https://api.browser-use.com/api/v3/browsers", {
+  method: "POST",
+  headers: { "X-Browser-Use-API-Key": process.env.BROWSER_USE_API_KEY, "Content-Type": "application/json" },
+  body: "{}",
+})
+// Keep `id`: you need it to stop the browser later.
+const { id, cdp_url, cdpUrl, live_url } = await r.json()
+const wsUrl = cdp_url ?? cdpUrl
+if (!wsUrl) throw new Error("Browser Use response missing cdp_url/cdpUrl")
+await session.connect({ wsUrl })
+console.log("liveUrl for the user to watch:", live_url)
 ```
 
-After connect, attach to a page target:
+Requires `BROWSER_USE_API_KEY` in the environment (the user should have set this before launching bcode). If absent, tell the user to get a key at https://browser-use.com and `export BROWSER_USE_API_KEY=...`.
+
+## Attaching to a target
+
+After `connect()`, attach to a page target before driving the browser:
 
 ```js
 const targets = (await session.Target.getTargets({})).targetInfos
@@ -65,7 +99,18 @@ const { data } = await session.Page.captureScreenshot({ format: "png" })
 // data is base64; write with the `write` tool or process in JS.
 ```
 
-For the full menu of UI mechanics — dropdowns, dialogs, iframes, shadow DOM, uploads, scrolling, screenshots-with-highlights — read the relevant skill: `{{SKILLS_DIR}}/interaction-skills/<topic>.md`.
+For the full menu of UI mechanics — dropdowns, dialogs, iframes, shadow DOM, uploads, scrolling, screenshots-with-highlights — list `{{SKILLS_DIR}}/interaction-skills/` to see all available topics, then read the relevant one.
+
+## Switching browsers mid-session
+
+You own the connection. To swap:
+
+```js
+await session.close()
+await session.connect({ /* new opts */ })
+```
+
+Cloud cleanup is your responsibility — if you're done with a cloud browser, stop it explicitly (see `{{SKILLS_DIR}}/cloud-browser.md` for the PATCH call). Otherwise it persists until your API quota or BU's idle timer reclaims it.
 
 ## Reusing code: write to the workspace, import from snippet
 
@@ -110,4 +155,5 @@ Cache-bust (`?t=${Date.now()}`) is your responsibility: without it, edits to the
 - **`session.Page.navigate` hangs forever** → the page is showing a native dialog. Use `session.Page.handleJavaScriptDialog({ accept: true })` to dismiss.
 - **Selectors don't find elements that you can see** → likely an iframe or shadow DOM. Read `{{SKILLS_DIR}}/interaction-skills/iframes.md` or `shadow-dom.md`.
 - **Actions silently no-op** → the page is mid-load. After `Page.navigate`, await `session.waitFor("Page.loadEventFired")` before driving inputs.
-- **Connection refused or 403 on connect()** → Chrome wasn't started with `--remote-debugging-port`, or the user hasn't clicked "Allow" on the remote-debugging prompt. Pass `{ timeoutMs: 30000 }` to wait for the click.
+- **Connection refused or 403 on connect()** → Chrome wasn't started with `--remote-debugging-port`, or the user hasn't clicked "Allow" on the remote-debugging prompt. Pass `{ profileDir, timeoutMs: 30000 }` to wait for the click, or fall back to Way 2.
+- **Cloud `connect()` fails after a successful provision** → check that `cdp_url` came back in the POST response; some BU regions return `cdpUrl` (camelCase) — accept both. See `{{SKILLS_DIR}}/cloud-browser.md`.
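The "accept both" advice in the last troubleshooting entry can be sketched as a small normalizer. This is a hedged illustration: `normalizeBrowserResponse` is a hypothetical helper name, and the `example.invalid` URLs are placeholders, not real Browser Use responses.

```javascript
// Hypothetical helper: normalizes a provision response body that may use
// snake_case or camelCase keys, and fails loudly if no CDP URL is present.
function normalizeBrowserResponse(body) {
  const cdpUrl = body.cdp_url ?? body.cdpUrl
  const liveUrl = body.live_url ?? body.liveUrl
  if (!cdpUrl) {
    throw new Error("Browser Use response missing cdp_url/cdpUrl")
  }
  return { id: body.id, cdpUrl, liveUrl }
}

// Both shapes normalize to the same object (placeholder values).
const snake = normalizeBrowserResponse({
  id: "b1",
  cdp_url: "wss://example.invalid/cdp",
  live_url: "https://example.invalid/live",
})
const camel = normalizeBrowserResponse({
  id: "b1",
  cdpUrl: "wss://example.invalid/cdp",
  liveUrl: "https://example.invalid/live",
})
console.log(snake.cdpUrl === camel.cdpUrl) // true
```

Throwing before `session.connect(...)` turns a confusing connection failure into a clear message about the response shape.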
diff --git a/packages/bcode-browser/skills/cloud-browser.md b/packages/bcode-browser/skills/cloud-browser.md
@@ -0,0 +1,145 @@
+# cloud-browser.md — Browser Use cloud browser via raw HTTP
+
+If BROWSER.md sent you here, the user wants a Browser Use cloud browser (Way 3): a clean, isolated Chrome on BU's infrastructure, optionally with a geo-located proxy or a synced profile, with a `liveUrl` the user can open to watch you work.
+
+There is no `browser_open_cloud` tool. You write the HTTP calls yourself in a `browser_execute` snippet. This keeps the connection model symmetric (you also call `session.connect()` for local browsers in Way 1 and Way 2) and gives you full control over the BU API surface — provision, stop, swap profiles, change proxies, anything BU exposes.
+
+## Authentication
+
+Every call to `https://api.browser-use.com/...` requires an API key in the `X-Browser-Use-API-Key` header. The key lives in the environment as `BROWSER_USE_API_KEY` (the user is expected to `export` it before launching bcode, the same way they'd set `AWS_BEDROCK_ACCESS_KEY_ID` for an LLM provider).
+
+Read it once, fail clearly if missing:
+
+```js
+const apiKey = process.env.BROWSER_USE_API_KEY
+if (!apiKey) {
+  throw new Error("BROWSER_USE_API_KEY is not set. Get a key at https://browser-use.com and re-launch bcode with the key exported.")
+}
+```
+
+## Provision
+
+```js
+const r = await fetch("https://api.browser-use.com/api/v3/browsers", {
+  method: "POST",
+  headers: { "X-Browser-Use-API-Key": apiKey, "Content-Type": "application/json" },
+  body: JSON.stringify({
+    // All optional — omit for an ephemeral fresh-profile browser with no proxy.
+    // profile_id: "<uuid>",          // attach an existing BU profile
+    // proxy_country_code: "us",      // geo-located proxy
+  }),
+})
+if (!r.ok) throw new Error(`provision failed: ${r.status} ${await r.text()}`)
+const body = await r.json()
+// Some BU regions return camelCase, others snake_case. Accept both.
+const id = body.id
+const cdpUrl = body.cdp_url ?? body.cdpUrl
+const liveUrl = body.live_url ?? body.liveUrl
+```
+
+The `liveUrl` is a viewer URL the user can open in their own browser to watch the cloud browser's pixels. **Print it to console** so the user can click it:
+
+```js
+console.log("Cloud browser ready. Live view:", liveUrl)
+```
+
+Stash `id` somewhere (a `globalThis.cloudBrowserId = id` is fine, or the snippet's return value) — you need it to stop the browser later.
+
+## Connect
+
+```js
+await session.connect({ wsUrl: cdpUrl })
+const targets = (await session.Target.getTargets({})).targetInfos
+const page = targets.find(t => t.type === "page")
+if (!page) throw new Error("No page target found on the cloud browser")
+await session.use(page.targetId)
+```
+
+From here on `session.<Domain>.<method>(...)` drives the cloud browser exactly like a local Chrome.
+
+## Stop
+
+When you're done, stop the browser. BU's quotas and idle reclaim will eventually clean it up if you forget, but explicit stop is faster and frees the slot:
+
+```js
+const stopRes = await fetch(`https://api.browser-use.com/api/v3/browsers/${id}`, {
+  method: "PATCH",
+  headers: { "X-Browser-Use-API-Key": apiKey, "Content-Type": "application/json" },
+  body: JSON.stringify({ state: "stop" }),
+})
+if (!stopRes.ok) throw new Error(`stop failed: ${stopRes.status} ${await stopRes.text()}`)
+```
+
+If you'll do this often within one project, save it as `./.bcode/agent-workspace/cloud.ts` (see BROWSER.md "Reusing code") and import it from later snippets.
+
+## Swap
+
+To switch from one cloud browser to another (e.g. different proxy country) within the same opencode session:
+
+```js
+// Stop the old one first.
+await fetch(`https://api.browser-use.com/api/v3/browsers/${oldId}`, {
+  method: "PATCH",
+  headers: { "X-Browser-Use-API-Key": apiKey, "Content-Type": "application/json" },
+  body: JSON.stringify({ state: "stop" }),
+})
+
+// Close the local Session's WS so connect() opens a fresh one.
+await session.close()
+
+// Provision and connect to the new one (provision block above, with new params).
+```
+
+## A reusable workspace helper
+
+Recommended pattern for any project that uses cloud browsers more than once:
+
+```ts
+// ./.bcode/agent-workspace/cloud.ts
+const API = "https://api.browser-use.com/api/v3/browsers"
+const key = () => {
+  const k = process.env.BROWSER_USE_API_KEY
+  if (!k) throw new Error("BROWSER_USE_API_KEY is not set.")
+  return k
+}
+
+export async function provision(opts: { profileId?: string; proxyCountryCode?: string } = {}) {
+  const r = await fetch(API, {
+    method: "POST",
+    headers: { "X-Browser-Use-API-Key": key(), "Content-Type": "application/json" },
+    body: JSON.stringify({
+      profile_id: opts.profileId,
+      proxy_country_code: opts.proxyCountryCode,
+    }),
+  })
+  if (!r.ok) throw new Error(`provision failed: ${r.status} ${await r.text()}`)
+  const body = await r.json()
+  return {
+    id: body.id as string,
+    cdpUrl: (body.cdp_url ?? body.cdpUrl) as string,
+    liveUrl: (body.live_url ?? body.liveUrl) as string,
+  }
+}
+
+export async function stop(id: string) {
+  const r = await fetch(`${API}/${id}`, {
+    method: "PATCH",
+    headers: { "X-Browser-Use-API-Key": key(), "Content-Type": "application/json" },
+    body: JSON.stringify({ state: "stop" }),
+  })
+  if (!r.ok) throw new Error(`stop failed: ${r.status} ${await r.text()}`)
+}
+```
+
+Then any snippet does:
+
+```js
+const { provision, stop } = await import(`${process.cwd()}/.bcode/agent-workspace/cloud.ts?t=${Date.now()}`)
+const { id, cdpUrl, liveUrl } = await provision({ proxyCountryCode: "us" })
+console.log("Live view:", liveUrl)
+await session.connect({ wsUrl: cdpUrl })
+// ... do work ...
+await stop(id)
+```
+
+## Other BU API endpoints
+
+The full BU cloud API (profile sync, profile list, custom proxies, recording on/off, etc.) is documented at https://browser-use.com — `read` the docs and write the matching `fetch` call. Anything BU's API exposes is reachable from a snippet without bcode-side wrapper code.
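The provision, connect, work, stop sequence that cloud-browser.md walks through can be wrapped so the stop call runs even when the work throws. The sketch below is an assumption-laden illustration, not part of this PR: `withCloudBrowser` is a hypothetical name, and `provision` and `stop` are passed in as parameters (for example the exports of the `cloud.ts` helper) so the wrapper makes no HTTP calls of its own.

```javascript
// Hypothetical wrapper. `session` is the CDP Session bound into the snippet;
// `provision` and `stop` are supplied by the caller.
async function withCloudBrowser({ session, provision, stop, opts = {} }, work) {
  const { id, cdpUrl, liveUrl } = await provision(opts)
  try {
    console.log("Live view:", liveUrl)
    await session.connect({ wsUrl: cdpUrl })
    return await work(session)
  } finally {
    // Runs whether `work` resolved or threw, so the slot is freed either way.
    await stop(id)
  }
}

// Usage inside a snippet (names as in the cloud.ts helper):
//   const result = await withCloudBrowser(
//     { session, provision, stop, opts: { proxyCountryCode: "us" } },
//     async (s) => s.Page.navigate({ url: "https://example.com" }),
//   )
```

One trade-off of this shape: if `stop` itself fails inside `finally`, its error replaces any error thrown by `work`, so callers who need the original error may prefer to catch and log the stop failure instead.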
diff --git a/packages/bcode-browser/src/browser-execute.ts b/packages/bcode-browser/src/browser-execute.ts
@@ -55,8 +55,9 @@ export type Parameters = Schema.Schema.Type<typeof parameters>
 
 export interface ExecuteContext {
   // Identifies the per-opencode-session CDP Session to bind into the snippet.
-  // Shared with `browser_open_cloud` via the SessionStore so a cloud-attach
-  // call's Session is driven by subsequent `browser_execute` calls.
+  // The same Session is reused across calls — the agent calls
+  // `session.connect(...)` in one snippet and subsequent snippets find the
+  // already-connected Session.
   readonly sessionID: string
   // Per-project workspace dir: <projectDir>/.bcode/agent-workspace/. Created
   // on first call. The agent reads/writes/edits .ts files here via the
@@ -97,8 +98,9 @@ const serialize = (v: unknown): string => {
 }
 
 // Snippet executor. The CDP Session is resolved per-call from `SessionStore`
-// keyed on `ctx.sessionID` so a Session attached via `browser_open_cloud` is
-// the same one a follow-up `browser_execute` drives.
+// keyed on `ctx.sessionID`. The agent connects with `await session.connect(...)`
+// in one snippet (Way 1 / Way 2 / Way 3 in BROWSER.md); the Session persists
+// for follow-up snippets in the same opencode session.
 //
 // `dataDir` is opencode's XDG_DATA_HOME for bcode (~/.local/share/bcode/ on
 // Linux/Mac). Compiled-mode skills are extracted to `<dataDir>/skills/` once