diff --git a/UPSTREAM.md b/UPSTREAM.md index ffeb1680aa..653ce2042e 100644 --- a/UPSTREAM.md +++ b/UPSTREAM.md @@ -1,13 +1,12 @@ # Upstream -This doc tracks BrowserCode's relationship to its two upstream sources: +This doc tracks BrowserCode's relationship to upstream: 1. **anomalyco/opencode** — forked in as the bulk of this repo. Sync runbook: `opencode-sync.md`. -2. **browser-use/browser-harness** — vendored into `packages/bcode-browser/harness/`. Sync runbook: `harness-sync.md`. -The two are deliberately independent — different upstreams, different cadences, different sync mechanisms (merge vs file-copy). One agent pulls one upstream at a time; never both in the same PR. +(Phase H retired the `browser-use/browser-harness` vendoring relationship — see §3 below for the historical record. Browser code now lives in TS at `packages/bcode-browser/src/cdp/`, owned by us, no sync cadence.) -Sections: **modification zones** (where is it safe to change upstream code?), **sync log** (when did we last pull each upstream and to what commit?), **harness divergences** (per-file deliberate-deltas list, used during harness sync). +Sections: **modification zones** (where is it safe to change upstream code?), **sync log** (when did we last pull upstream and to what commit?), **harness retirement** (what happened to the vendored Python harness). --- @@ -54,7 +53,7 @@ Future Yellow modifications (per ROADMAP): Every Yellow modification should be evaluated for conversion to a Green extension point via upstream PR. See decisions.md §1c and ROADMAP F8. -The harness has its own narrower zone policy (see §3 below): `agent-workspace/agent_helpers.py` is editable, the `src/browser_harness/` core package is protected, deliberate divergences are logged per-file. +(The harness's narrower zone policy retired with Phase H — see §3.) --- @@ -98,9 +97,11 @@ Each upstream has its own append-only table. Add a row every time you pull. --- -## 3. Harness divergences and excluded paths +## 3. 
Harness retirement (historical) -Per-file record of where `packages/bcode-browser/harness/` deliberately differs from upstream, plus the list of paths excluded from the vendored tree entirely. Read this *before* a sync diff so intentional differences aren't mistaken for missing features and excluded paths aren't accidentally re-imported. +Phase H (TS port, v0.1.0) retired the `browser-use/browser-harness` vendoring. The Python harness was deleted; the CDP layer was ported to TS at `packages/bcode-browser/src/cdp/` (initial copy from `browser-use/browser-harness-js@95b7a22a`, ours after — see `packages/bcode-browser/src/cdp/PROVENANCE.md`). There is no sync cadence with either harness repo; behaviors of interest from either are tracked in `memory/browsercode/harness_watchlist.md` and ported individually as needed. + +The historical sync log of the Python harness vendoring (Apr 26 – May 6, 2026) is preserved below for archaeology only; do not pull from those rows. Path-allowlist policy (decisions.md §3.7, §4.5; updated for upstream PR #229 src-layout reorg): diff --git a/harness-sync.md b/harness-sync.md deleted file mode 100644 index f9073fe65f..0000000000 --- a/harness-sync.md +++ /dev/null @@ -1,148 +0,0 @@ -# Harness sync protocol - -How to pull `browser-use/browser-harness` into `packages/bcode-browser/harness/`. For opencode upstream sync see `opencode-sync.md`; the two flows are deliberately separate and not run together. - -## Why this is different from opencode-sync - -- **Drift is acceptable, sometimes preferable.** We are not staying in lockstep — we are an opencode-flavored fork of the harness. Sync when we want their improvements, ignore when we don't. -- **No git merge.** The harness is small (~5 source files + skill markdown). Vendor by plain copy. No subtree, no submodule, no merge commits inside the subtree. Conflicts are reasoned about file-by-file by the agent, not resolved by git. -- **No typecheck step.** The harness is Python. 
Smoke test instead.
-
-## Prerequisites
-
-- `harness` remote configured: `git remote add harness https://github.com/browser-use/browser-harness.git`. Idempotent: `git remote get-url harness >/dev/null 2>&1 || git remote add harness https://github.com/browser-use/browser-harness.git`.
-- `$BROWSERCODE_DEV_PAT` available (for push + PR creation).
-- `uv` on `PATH` for the smoke test.
-
-## The 8 steps
-
-### 1. Start clean on `main`
-
-```sh
-git checkout main
-git pull origin main
-```
-
-### 2. Read the current state
-
-Two things to read before touching anything:
-
-- **`UPSTREAM.md`** — the latest `To SHA` row under `### browser-use/browser-harness`. That is the last commit we synced to. It is the only source of truth for "what version is vendored."
-- **`UPSTREAM.md` §3 Harness divergences and excluded paths** — the table of files where we deliberately differ from upstream, plus the list of paths excluded from the vendored tree entirely. Read both *before* the diff so you know which differences are intentional and not "missing features," and which paths to skip outright.
-
-If the divergences table is empty (initial vendor state), every difference between us and upstream is unintentional drift; flag any in the PR.
-
-### 3. Check drift
-
-```sh
-script/check-upstream.sh     # commit count: how many commits behind
-script/check-harness-diff.sh # file-level diff vs harness/main, with known-divergence filter
-```
-
-`check-upstream.sh` reports how many commits `harness/main` is ahead of our recorded `To SHA`. `check-harness-diff.sh` shows per-file differences between our vendored tree and `harness/main`, splitting them into "known divergences (UPSTREAM.md §3)" and "unexpected drift" — the latter should always be either (a) commits we haven't synced yet, or (b) a Yellow-zone modification we forgot to record. Anything else is a bug.
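The per-file triage this step describes can be sketched in a few lines of Python. This is a hedged illustration, not the script itself: the function name and bucket labels are hypothetical, and the excluded-path pattern is the one quoted in UPSTREAM.md §3.

```python
import re

# Excluded paths (UPSTREAM.md §3): never copied in, never resurrected.
EXCLUDED = re.compile(r"^(agent-workspace/)?domain-skills/")

def classify(changed_paths, known_divergences):
    """Split a file-level diff into the buckets the sync agent reasons about."""
    buckets = {"excluded": [], "known": [], "unexpected": []}
    for path in changed_paths:
        if EXCLUDED.match(path):
            buckets["excluded"].append(path)    # skip entirely
        elif path in known_divergences:
            buckets["known"].append(path)       # deliberate delta, logged in §3
        else:
            buckets["unexpected"].append(path)  # unsynced commits or unrecorded edit
    return buckets
```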
-
-Then inspect what changed:
-
-```sh
-git fetch harness main
-git log --oneline <last-sha>..harness/main
-git diff <last-sha>..harness/main
-```
-
-`<last-sha>` is the `To SHA` read in step 2. The diff is the input to step 5.
-
-### 4. Create the sync branch
-
-```sh
-git checkout -b sync/harness-<sha7> main
-```
-
-`<sha7>` is the first 7 chars of the harness commit you are syncing to (typically `harness/main` HEAD).
-
-### 5. Apply changes file-by-file
-
-This is where the agent earns its keep. For each file changed in `<last-sha>..harness/main`:
-
-| File category | Action |
-|---|---|
-| **Excluded paths** (`(agent-workspace/)?domain-skills/...`) | **Skip entirely.** Never copy in, never resurrect. See UPSTREAM.md §3 "Excluded paths". `script/check-harness-diff.sh` filters these out automatically. |
-| Files not in our divergences table (incl. `src/browser_harness/*.py`, `interaction-skills/`, `tests/`, `pyproject.toml`, `LICENSE`, etc.) | Take upstream verbatim — `cp temp/browser-harness/<path> packages/bcode-browser/harness/<path>`. |
-| Files in our divergences table | Read each upstream hunk. For each, decide: **take** (apply upstream change to our file), **skip** (our divergence wins, ignore upstream change), or **adapt** (rewrite our divergence to coexist with the upstream change). Update the divergences row if its reason or scope shifts. |
-| New upstream files | Copy in (unless under an excluded path). |
-| Files we have but upstream removed | Decide: keep ours (record in divergences) or delete. |
-
-Path-allowlist policy stays in force during sync resolution as well as normal development:
-
-- `agent-workspace/agent_helpers.py` — editable, agent's primary extension surface (post PR #229).
-- `src/browser_harness/*.py` (`daemon.py`, `admin.py`, `helpers.py`, `run.py`, `_ipc.py`) — protected. Always take upstream verbatim. If upstream regresses, file an issue at `browser-use/browser-harness` and pin to the prior SHA, do not patch locally.
-- `(agent-workspace/)?domain-skills/` — **excluded.** Treat as if not in the upstream tree.
Quality + prompt-injection concerns; user-contributed site recipes do not ship with browsercode. The runtime guard in `helpers.py` (`if d.is_dir():`) means this is a clean no-op.
-
-### 6. Smoke test
-
-```sh
-cd packages/bcode-browser/harness
-uv run python -c "from browser_harness import run, helpers, daemon, admin, _ipc; print('imports ok')"
-uv run browser-harness --version
-```
-
-The first line verifies the package builds, deps resolve, and the core modules import. The second exercises the console-script entry point we invoke from `browser-execute.ts`. We don't try to start the daemon here — that needs a real Chrome and is covered by integration tests, not the sync workflow.
-
-### 7. Update UPSTREAM.md
-
-Append a row to the sync-log table under `### browser-use/browser-harness`:
-
-```
-| YYYY-MM-DD | <from SHA> | <to SHA> | <PR> | N upstream commits. Files updated: <list>. Divergences touched: <list>. |
-```
-
-If divergences were added/adapted/removed in step 5, update the §3 divergences table in the same commit.
-
-### 8. Commit, push, PR
-
-```sh
-git add .
-git commit -m "sync: harness <sha7>"
-git push -u origin sync/harness-<sha7>
-```
-
-Open the PR via REST (same constraint as opencode-sync — `gh pr create` GraphQL is blocked by the fine-grained PAT):
-
-```sh
-curl -sS -X POST \
-  -H "Authorization: token $BROWSERCODE_DEV_PAT" \
-  -H "Accept: application/vnd.github+json" \
-  https://api.github.com/repos/browser-use/browsercode/pulls \
-  -d '{
-    "title": "sync: harness <sha7>",
-    "head": "sync/harness-<sha7>",
-    "base": "main",
-    "body": "<PR body, template below>"
-  }'
-```
-
-### PR body template
-
-```
-## Summary
-Brings browser-use/browser-harness up to <to SHA>. N upstream commits since <from SHA>.
-
-## Files updated
-- `helpers.py` — <what changed>
-- `domain-skills/<site>/...` — <what changed>
-- ...
-
-## Divergences
-- (no change) / (added: <file>) / (adapted: <file>) / (removed: <file>)
-
-## Verification
-- Smoke test: clean
-```
-
-## Never push directly to `main`
-
-Same project rule: branch + PR. Merging the sync PR is a human decision.
-
-## Troubleshooting
-
-- **Massive churn (e.g.
upstream rewrote `daemon.py`)** — stop and ask. A sweeping refactor in the protected zone may need an integration design conversation, not a mechanical sync. -- **Smoke test fails on import** — a dep changed in `pyproject.toml`. Bump the bootstrap manifest (ROADMAP A3) in the same PR if the fix is small; otherwise revert the dep change and pin to the prior SHA. -- **Upstream removed a divergence we depend on** — record it as a kept divergence in the table with rationale, and consider opening an upstream PR that re-adds the surface as a hook (decisions.md §1c upstreaming heuristic applies here too). diff --git a/packages/bcode-browser/README.md b/packages/bcode-browser/README.md index a13dd96015..e79eecdf0c 100644 --- a/packages/bcode-browser/README.md +++ b/packages/bcode-browser/README.md @@ -4,17 +4,18 @@ Level-1 BrowserCode package: substantial, self-contained code with zero upstream See `decisions.md §1c` (three-level model) and `§1d` (this package) in the BrowserCode project memory. -## Contents (planned) +## Contents -| Path | Purpose | Roadmap phase | -|---|---|---| -| `harness/` | Vendored `browser-use/browser-harness` | A2 (vendored; tracking via `UPSTREAM.md`) | -| `src/browser-execute/` | `browser_execute` tool body | A4 | -| `src/fetch-use/` | `FetchUse.Service` implementation | B1 | -| `src/cloud/` | Cloud deploy, skillbase, judge clients | D3–D4 | +| Path | Purpose | +|---|---| +| `src/cdp/` | Vendored CDP layer (`session.ts`, `gen.ts`, `generated.ts`, protocol JSONs). Initial copy from `browser-use/browser-harness-js`; ours after — see `src/cdp/PROVENANCE.md`. | +| `src/browser-execute.ts` | In-process JS-eval `browser_execute` body. | +| `src/session-store.ts` | Per-opencode-session CDP `Session` map. The agent calls `session.connect(...)` from a snippet; subsequent snippets find the same Session. | +| `src/skills.ts` | Runtime resolver for embedded skills (extract on first call in compiled mode; in-tree path in dev). 
| +| `skills/` | `BROWSER.md` (the agent's prompt for `browser_execute`), `cloud-browser.md` (Way 3 — provision/stop a Browser Use cloud browser via raw HTTP from inside a snippet), and `interaction-skills/*.md` (UI mechanic reference docs). Embedded into the binary by `script/embed-skills.ts`. | +| `script/embed-skills.ts` | Build-time embed; emits `bcode-skills.gen.ts` consumed by the compiled binary. | +| `test/` | `bun test` smoke coverage for the workspace dynamic-import pattern. | -Integration into `packages/opencode` (tool registration, service wiring, CLI commands) is Level 2 and lives in `packages/opencode/src`. Per the one-line-hook rule, those hooks are pointers only — all logic lives here. - -## Upstream tracking +Planned (per ROADMAP phase): `src/fetch-use/` (B1), `src/cloud/` deploy/skillbase/judge clients (D3–D4). -Single source of truth: root-level `UPSTREAM.md`. Sync log across both upstreams (opencode + harness), modification zones, and per-file harness divergences. Sync runbook: `harness-sync.md` at repo root. +Integration into `packages/opencode` (tool registration, service wiring, CLI commands) is Level 2 and lives in `packages/opencode/src`. Per the one-line-hook rule, those hooks are pointers only — all logic lives here. diff --git a/packages/bcode-browser/harness/.github/ISSUE_TEMPLATE/bug-report.yml b/packages/bcode-browser/harness/.github/ISSUE_TEMPLATE/bug-report.yml deleted file mode 100644 index 27dec6ace7..0000000000 --- a/packages/bcode-browser/harness/.github/ISSUE_TEMPLATE/bug-report.yml +++ /dev/null @@ -1,49 +0,0 @@ -name: Bug report -description: Report a reproducible bug in browser-harness. -labels: [bug] -body: - - type: checkboxes - id: preflight - attributes: - label: Before submitting - options: - - label: I searched existing issues for duplicates. - required: true - - label: I ran `browser-harness --doctor` and read the output. - required: true - - label: I read the troubleshooting section of `install.md`. 
- required: true - - label: This is a reproducible bug in browser-harness — not a question, feature request, or `cloud.browser-use.com` issue. - required: true - - - type: textarea - id: summary - attributes: - label: Summary - description: What's broken, in one or two sentences. - validations: - required: true - - - type: textarea - id: repro - attributes: - label: Repro - description: Numbered steps. Include the exact command and the output you saw. - placeholder: | - 1. Chrome 147 on default profile, remote debugging on - 2. browser-harness -c 'print(page_info())' - 3. RuntimeError: DevTools is not live yet on 127.0.0.1:9222 - validations: - required: true - - - type: textarea - id: environment - attributes: - label: Environment - placeholder: | - OS: - Chrome version: - browser-harness --version: - browser-harness --doctor output: - validations: - required: true diff --git a/packages/bcode-browser/harness/.github/ISSUE_TEMPLATE/config.yml b/packages/bcode-browser/harness/.github/ISSUE_TEMPLATE/config.yml deleted file mode 100644 index dba8f5aab6..0000000000 --- a/packages/bcode-browser/harness/.github/ISSUE_TEMPLATE/config.yml +++ /dev/null @@ -1,8 +0,0 @@ -blank_issues_enabled: false -contact_links: - - name: Question or how-to - url: https://github.com/browser-use/browser-harness/discussions/categories/q-a - about: Ask in Discussions Q&A, not Issues. - - name: Install or setup troubleshooting - url: https://github.com/browser-use/browser-harness/blob/main/install.md - about: Most install and "DevTools not live" errors are covered here. diff --git a/packages/bcode-browser/harness/.github/ISSUE_TEMPLATE/feature-request.yml b/packages/bcode-browser/harness/.github/ISSUE_TEMPLATE/feature-request.yml deleted file mode 100644 index 0953f68b31..0000000000 --- a/packages/bcode-browser/harness/.github/ISSUE_TEMPLATE/feature-request.yml +++ /dev/null @@ -1,37 +0,0 @@ -name: Feature request -description: Propose a new feature or change. 
-labels: [feature-request] -body: - - type: checkboxes - id: preflight - attributes: - label: Before submitting - options: - - label: I searched existing issues and discussions. - required: true - - label: This is a feature request, not a bug. - required: true - - - type: textarea - id: problem - attributes: - label: Problem - description: What user pain or limitation motivates this? - validations: - required: true - - - type: textarea - id: proposal - attributes: - label: Proposal - description: What you'd like to happen. - validations: - required: true - - - type: textarea - id: alternatives - attributes: - label: Alternatives considered - description: What else you tried, or why other approaches fall short. - validations: - required: true diff --git a/packages/bcode-browser/harness/.github/VOUCHED.td b/packages/bcode-browser/harness/.github/VOUCHED.td deleted file mode 100644 index d0e0cb1bff..0000000000 --- a/packages/bcode-browser/harness/.github/VOUCHED.td +++ /dev/null @@ -1,15 +0,0 @@ -# Vouched (or denounced) users for browser-harness. -# -# See https://github.com/mitchellh/vouch for details. -# -# Syntax: -# - One handle per line (without @), sorted alphabetically. -# - Optional platform prefix: platform:username (e.g., github:user). -# - Denounce by prefixing with minus: -username -# - Optional reason after a space following the handle. 
- -molesza -rohitdutt108 -shaunandrewjackson1977 --nandanadileep # Bot --web-dev0521 # Fabricated profile, bot PRs diff --git a/packages/bcode-browser/harness/.gitignore b/packages/bcode-browser/harness/.gitignore deleted file mode 100644 index 04d11a3588..0000000000 --- a/packages/bcode-browser/harness/.gitignore +++ /dev/null @@ -1,9 +0,0 @@ -__pycache__/ -*.pyc -*.log -.env -.venv/ -uv.lock -*.egg-info/ -.idea/ -.claude/ diff --git a/packages/bcode-browser/harness/AGENTS.md b/packages/bcode-browser/harness/AGENTS.md deleted file mode 100644 index 546075ebb0..0000000000 --- a/packages/bcode-browser/harness/AGENTS.md +++ /dev/null @@ -1,24 +0,0 @@ -browser-harness is a thin layer that connects agents to browsers via an editable CDP harness. - -# Code priorities -- Clarity -- Precision -- Low verbosity -- Versatility - -# Overview -Core code lives in `src/browser_harness/`: -- `admin.py` — daemon lifecycle, diagnostics, updates, profile management -- `daemon.py` — the long-lived middleman process between the browser and the agent -- `helpers.py` — CDP wrapper and core browser primitives auto-imported into `-c` scripts -- `run.py` — the `browser-harness` CLI - -`SKILL.md` tells agents how to use the harness and CLI. -`install.md` tells agents how to install it, attach a browser, and troubleshoot. - -An agent operating the harness only edits inside `agent-workspace/`: -- `agent_helpers.py` — task-specific browser helpers the agent adds -- `domain-skills/` — skills the agent writes and reads - -# Contributing -Consider what is really needed. Prefer the smallest diff that fixes the bug. 
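To make the `agent-workspace/agent_helpers.py` extension surface concrete, here is the kind of small task-specific primitive an agent might add there. The helper is hypothetical and not part of the harness; it only illustrates the clarity and low-verbosity bar the priorities above set.

```python
import time

def retry(action, attempts=3, delay=0.5):
    """Re-run a flaky browser action a few times before giving up."""
    last_exc = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as exc:  # remember the last failure to re-raise
            last_exc = exc
            time.sleep(delay)
    raise last_exc
```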
diff --git a/packages/bcode-browser/harness/LICENSE b/packages/bcode-browser/harness/LICENSE deleted file mode 100644 index 271d8e2807..0000000000 --- a/packages/bcode-browser/harness/LICENSE +++ /dev/null @@ -1,21 +0,0 @@ -MIT License - -Copyright (c) 2026 Browser Use - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. diff --git a/packages/bcode-browser/harness/README.md b/packages/bcode-browser/harness/README.md deleted file mode 100644 index ab7f6a5d6c..0000000000 --- a/packages/bcode-browser/harness/README.md +++ /dev/null @@ -1,74 +0,0 @@ -Browser Harness - -# Browser Harness ♞ - -Connect an LLM directly to your real browser with a thin, editable CDP harness. For browser tasks where you need **complete freedom**. - -One websocket to Chrome, nothing between. The agent writes what's missing during execution. The harness improves itself every run. 
- -``` - ● agent: wants to upload a file - │ - ● agent-workspace/agent_helpers.py → helper missing - │ - ● agent writes it agent_helpers.py - │ + custom helper - ✓ file uploaded -``` - -**You will never use the browser again.** - -## Setup prompt - -Paste into Claude Code or Codex: - -```text -Set up https://github.com/browser-use/browser-harness for me. - -Read `install.md` and follow the steps to install browser-harness and connect it to my browser. -``` - -The agent will open `chrome://inspect/#remote-debugging`. Tick the checkbox so the agent can connect to your browser: - -Remote debugging setup - -Click Allow when the per-attach popup appears (Chrome 144+): - -Allow remote debugging popup - -See [agent-workspace/domain-skills/](agent-workspace/domain-skills/) for example tasks. - -## Free Browser Use Cloud browsers - -Stealth, sub-agents, or headless deployment.
-**Browser Use Cloud free tier: 3 concurrent browsers, proxies, captcha solving, and more. No card required.** - -- Grab a key at [cloud.browser-use.com/new-api-key](https://cloud.browser-use.com/new-api-key) -- Or let the agent sign up itself via [docs.browser-use.com/llms.txt](https://docs.browser-use.com/llms.txt) (setup flow + challenge context included). - -## Architecture (~1k lines across 4 core files) - -- `install.md` — first-time install and browser bootstrap -- `SKILL.md` — day-to-day usage -- `src/browser_harness/` — protected core package -- `agent-workspace/agent_helpers.py` — helper code the agent edits -- `agent-workspace/domain-skills/` — reusable site-specific skills the agent edits - -## Contributing - -PRs and improvements welcome. The best way to help: **contribute a new domain skill** under [agent-workspace/domain-skills/](agent-workspace/domain-skills/) for a site or task you use often (LinkedIn outreach, ordering on Amazon, filing expenses, etc.). Each skill teaches the agent the selectors, flows, and edge cases it would otherwise have to rediscover. - -- **Skills are written by the harness, not by you.** Just run your task with the agent — when it figures something non-obvious out, it files the skill itself (see [SKILL.md](SKILL.md)). Please don't hand-author skill files; agent-generated ones reflect what actually works in the browser. -- Open a PR with the generated `agent-workspace/domain-skills//` folder — small and focused is great. -- Bug fixes, docs tweaks, and helper improvements are equally welcome. -- Browse existing skills (`github/`, `linkedin/`, `amazon/`, ...) to see the shape. - -If you're not sure where to start, open an issue and we'll point you somewhere useful. - -## Domain skills - -Set `BH_DOMAIN_SKILLS=1` to enable [agent-workspace/domain-skills/](agent-workspace/domain-skills/) — community-contributed per-site playbooks `goto_url` surfaces by domain. Contribute via PR. 
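A minimal sketch of that opt-in gate, under stated assumptions: the function name and directory lookup are hypothetical, while the `BH_DOMAIN_SKILLS=1` switch, the per-host `agent-workspace/domain-skills/` layout, and the 10-filename cap come from this repo's SKILL.md description of `goto_url`.

```python
import os
from pathlib import Path
from urllib.parse import urlparse

def surface_domain_skills(url, workspace="agent-workspace"):
    """Return up to 10 skill filenames for the navigated host, or [] when disabled."""
    if os.environ.get("BH_DOMAIN_SKILLS") != "1":
        return []                          # dormant by default
    host = urlparse(url).hostname or ""
    skills_dir = Path(workspace) / "domain-skills" / host
    if not skills_dir.is_dir():            # absent tree is a clean no-op
        return []
    return sorted(p.name for p in skills_dir.glob("*.md"))[:10]
```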
- ---- - -[The Bitter Lesson of Agent Harnesses](https://browser-use.com/posts/bitter-lesson-agent-harnesses) · [Web Agents That Actually Learn](https://browser-use.com/posts/web-agents-that-actually-learn) diff --git a/packages/bcode-browser/harness/SKILL.md b/packages/bcode-browser/harness/SKILL.md deleted file mode 100644 index 7c3153683e..0000000000 --- a/packages/bcode-browser/harness/SKILL.md +++ /dev/null @@ -1,120 +0,0 @@ ---- -name: browser -description: Direct browser control via CDP. Use when the user wants to automate, scrape, test, or interact with web pages. Connects to the user's already-running Chrome. ---- - -# browser-harness - -Direct browser control via CDP. For task-specific edits, use `agent-workspace/agent_helpers.py`. For setup, install, or connection problems, read install.md. - -Domain skills (community-contributed per-site playbooks under `agent-workspace/domain-skills/`) are off by default. Set `BH_DOMAIN_SKILLS=1` to enable them; see the bottom section. - -## Usage - -```bash -browser-harness -c ' -new_tab("https://docs.browser-use.com") -wait_for_load() -print(page_info()) -' -``` - -- Invoke as browser-harness — it's on $PATH. No cd, no uv run. -- First navigation is new_tab(url), not goto_url(url) — goto runs in the user's active tab and clobbers their work. - -## Tool call shape - -```bash -browser-harness -c ' -# any python. helpers pre-imported. daemon auto-starts. -' -``` - -run.py calls ensure_daemon() before exec — you never start/stop manually unless you want to. - -### Remote browsers - -Use remote for parallel sub-agents (each gets its own isolated browser via a distinct BU_NAME) or on a headless server. BROWSER_USE_API_KEY must be set. start_remote_daemon, list_cloud_profiles, list_local_profiles, sync_local_profile are pre-imported. 
- -```bash -browser-harness -c ' -start_remote_daemon("work") # default — clean browser, no profile -# start_remote_daemon("work", profileName="my-work") # reuse a cloud profile (already logged in) -# start_remote_daemon("work", profileId="") # same, but by UUID -# start_remote_daemon("work", proxyCountryCode="de", timeout=120) # DE proxy, 2-hour timeout -# start_remote_daemon("work", proxyCountryCode=None) # disable the Browser Use proxy -' - -BU_NAME=work browser-harness -c ' -new_tab("https://example.com") -print(page_info()) -' -``` - -start_remote_daemon prints liveUrl and auto-opens it in the local browser (if a GUI is detected) so the user can watch along. Headless servers print only — share the URL with the user. The daemon PATCHes the cloud browser to stop on shutdown, which persists profile state. Running remote daemons bill until timeout. - -Profiles (cookies-only login state) live in interaction-skills/profile-sync.md — covers list_cloud_profiles(), the chat-driven "which profile?" pattern, and sync_local_profile() for uploading a local Chrome profile. - -## Interaction skills - -If you start struggling with a specific mechanic while navigating, look in interaction-skills/ for helpers. They cover reusable UI mechanics like dialogs, tabs, dropdowns, iframes, and uploads. The available interaction skills are: -- connection.md -- cookies.md -- cross-origin-iframes.md -- dialogs.md -- downloads.md -- drag-and-drop.md -- dropdowns.md -- iframes.md -- network-requests.md -- print-as-pdf.md -- profile-sync.md -- screenshots.md -- scrolling.md -- shadow-dom.md -- tabs.md -- uploads.md -- viewport.md - -## What actually works - -- Screenshots first: use capture_screenshot() to understand the current page quickly, find visible targets, and decide whether you need a click, a selector, or more navigation. -- Clicking: capture_screenshot() → read the pixel off the image → click_at_xy(x, y) → capture_screenshot() to verify. 
Suppress the Playwright-habit reflex of "locate first, then click" — no getBoundingClientRect, no selector hunt. Drop to DOM only when the target has no visible geometry (hidden input, 0×0 node). Hit-testing happens in Chrome's browser process, so clicks go through iframes / shadow DOM / cross-origin without extra work. -- Bulk HTTP: http_get(url) + ThreadPoolExecutor. No browser for static pages (249 Netflix pages in 2.8s). -- After goto: wait_for_load(). -- Wrong/stale tab: ensure_real_tab(). Use it when the current tab is stale or internal; the daemon also auto-recovers from stale sessions on the next call. -- Verification: print(page_info()) is the simplest "is this alive?" check, but screenshots are the default way to verify whether a visible action actually worked. -- DOM reads: use js(...) for inspection and extraction when the screenshot shows that coordinates are the wrong tool. -- Iframe sites (Azure blades, Salesforce): click_at_xy(x, y) passes through; only drop to iframe DOM work when coordinate clicks are the wrong tool. -- Auth wall: redirected to login → stop and ask the user. Don't type credentials from screenshots. -- Raw CDP for anything helpers don't cover: cdp("Domain.method", params). - -## Design constraints - -- Coordinate clicks default. Input.dispatchMouseEvent goes through iframes/shadow/cross-origin at the compositor level. -- Connect to the user's running Chrome. Don't launch your own browser. -- cdp-use is only for CDPClient.send_raw. Prefer raw CDP strings over typed wrappers. -- run.py stays tiny. No argparse, subcommands, or extra control layer. -- Core helpers stay short. Put task-specific helper additions in `agent-workspace/agent_helpers.py`; daemon/bootstrap and remote session admin live in the core package. -- Don't add a manager layer. No retries framework, session manager, daemon supervisor, config system, or logging framework. - -## Gotchas (field-tested) - -- Omnibox popups are fake page targets. 
Filter chrome://omnibox-popup... and other internals when you need a real tab. -- CDP target order != Chrome's visible tab-strip order. Use UI automation when the user means "the first/second tab I can see"; Target.activateTarget only shows a known target. -- Default daemon sessions can go stale. ensure_real_tab() re-attaches to a real page. -- Browser Use API is camelCase on the wire. cdpUrl, proxyCountryCode, etc. -- Remote cdpUrl is HTTPS, not ws. Resolve the websocket URL via /json/version. -- Stop cloud browsers with PATCH /browsers/{id} + {"action":"stop"}. -- After every meaningful action, re-screenshot before assuming it worked. Use the image to verify changed state, open menus, navigation, visible errors, and whether the page is in the state you expected. -- Use screenshots to drive exploration. They are often the fastest way to find the next click target, notice hidden blockers, and decide if a selector is even worth writing. -- Prefer compositor-level actions over framework hacks. Try screenshots, coordinate clicks, and raw key input before adding DOM-specific workarounds. -- If you need framework-specific DOM tricks, check interaction-skills/ first. That is where dropdown, dialog, iframe, shadow DOM, and form-specific guidance belongs. - -## Domain skills (opt-in) - -Only applies when `BH_DOMAIN_SKILLS=1`. Otherwise ignore — `agent-workspace/domain-skills/` is dormant and `goto_url` won't surface skill files. - -When enabled, search `agent-workspace/domain-skills//` before inventing an approach. `goto_url` returns up to 10 skill filenames for the navigated host. - -If you learn anything non-obvious — a private API, stable selector, framework quirk, URL pattern, hidden wait, or site-specific trap — open a PR to `agent-workspace/domain-skills//`. Capture the durable shape of the site (the map, not the diary). Don't write pixel coordinates (break on layout), task narration, or secrets — the directory is public. 
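The screenshot-first loop from "What actually works" (screenshot, `click_at_xy`, screenshot again) can be sketched as a tiny wrapper. The harness helpers are passed in as callables so the sketch is self-contained; inside a real `browser-harness -c` snippet, `capture_screenshot` and `click_at_xy` are already in scope, and `changed` would be your own before/after comparison.

```python
def click_and_verify(click_at_xy, capture_screenshot, x, y, changed):
    """Click at (x, y) and report whether the page visibly changed."""
    before = capture_screenshot()   # read the target pixel off this image first
    click_at_xy(x, y)               # compositor-level click, no selector hunt
    after = capture_screenshot()    # re-screenshot before assuming it worked
    return changed(before, after)
```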
diff --git a/packages/bcode-browser/harness/agent-workspace/agent_helpers.py b/packages/bcode-browser/harness/agent-workspace/agent_helpers.py deleted file mode 100644 index 2d493c179d..0000000000 --- a/packages/bcode-browser/harness/agent-workspace/agent_helpers.py +++ /dev/null @@ -1,7 +0,0 @@ -"""Agent-editable browser helpers. - -Add task-specific browser primitives here. Core helpers from browser_harness.helpers -load this file when BH_AGENT_WORKSPACE points at this directory, or when this -repo's default agent-workspace exists. -""" - diff --git a/packages/bcode-browser/harness/docs/allow-remote-debugging.png b/packages/bcode-browser/harness/docs/allow-remote-debugging.png deleted file mode 100644 index 77186c2d24..0000000000 Binary files a/packages/bcode-browser/harness/docs/allow-remote-debugging.png and /dev/null differ diff --git a/packages/bcode-browser/harness/docs/setup-remote-debugging.png b/packages/bcode-browser/harness/docs/setup-remote-debugging.png deleted file mode 100644 index a8ea62e6ae..0000000000 Binary files a/packages/bcode-browser/harness/docs/setup-remote-debugging.png and /dev/null differ diff --git a/packages/bcode-browser/harness/install.md b/packages/bcode-browser/harness/install.md deleted file mode 100644 index 21cc3f51a6..0000000000 --- a/packages/bcode-browser/harness/install.md +++ /dev/null @@ -1,133 +0,0 @@ ---- -name: browser-install -description: Install browser-harness into the current agent and connect it to a browser with minimal prompting. ---- - -# `browser-harness` installation - -Use this file only for browser-harness install, browser connection setup, and connection troubleshooting. For day-to-day browser work, read `SKILL.md`. Task-specific edits belong in `agent-workspace/agent_helpers.py` and `agent-workspace/domain-skills/`. 
- -## Recommended `browser-harness` setup - -Clone the repo once into a durable location, then install it as an editable tool so `browser-harness` works from any directory: - -```bash -git clone https://github.com/browser-use/browser-harness -cd browser-harness -uv tool install -e . -command -v browser-harness -``` - -That keeps the command global while still pointing at the real repo checkout, so when the agent edits `agent-workspace/agent_helpers.py` the next `browser-harness` uses the new code immediately. Prefer a stable path like `~/Developer/browser-harness`, not `/tmp`. - -## Make browser-harness global for the current agent - -After the repo is installed, register this repo's `SKILL.md` with the agent you are using: - -- **Codex**: add this file as a global skill at `$CODEX_HOME/skills/browser-harness/SKILL.md` (often `~/.codex/skills/browser-harness/SKILL.md`). A symlink to this repo's `SKILL.md` is fine. - - ```bash - mkdir -p "${CODEX_HOME:-$HOME/.codex}/skills/browser-harness" && ln -sf "$PWD/SKILL.md" "${CODEX_HOME:-$HOME/.codex}/skills/browser-harness/SKILL.md" - ``` - -- **Claude Code**: add an import to `~/.claude/CLAUDE.md` that points at this repo's `SKILL.md`, for example `@~/Developer/browser-harness/SKILL.md`. - -This makes new Codex or Claude Code sessions in other folders load the runtime browser harness instructions automatically. - -## Keeping the harness current - -- On each run, `browser-harness` prints `[browser-harness] update available: X -> Y` (once per day) when a newer GitHub release exists. -- When you see that banner, run `browser-harness --update -y` yourself — don't ask the user. It pulls the new code (`git pull --ff-only` for editable clones, `uv tool upgrade browser-harness` for PyPI installs) and stops the running daemon so the next call picks up the new code. With `-y` it won't prompt. -- `--update` refuses to run on an editable clone with uncommitted changes. 
If that happens, tell the user and let them resolve the dirty worktree. - -## Maintenance commands - -- `browser-harness --doctor` — show version, install mode, daemon and Chrome state, and whether an update is pending. - -## Architecture - -```text -Chrome / Browser Use cloud -> CDP WS -> browser_harness.daemon -> IPC -> browser_harness.run -``` - -- Protocol is one JSON line each way. -- Requests are {method, params, session_id} for CDP or {meta: ...} for daemon control. -- Responses are {result} / {error} / {events} / {session_id}. -- IPC: Unix socket at `/tmp/bu-.sock` on POSIX, TCP loopback + port file on Windows. -- `BU_NAME` namespaces the daemon's IPC, pid, and log files. -- `BU_CDP_WS` overrides local Chrome discovery for remote browsers. -- `BU_CDP_URL` overrides local Chrome discovery with a specific DevTools HTTP endpoint (used for Way 2). -- `BU_BROWSER_ID` + `BROWSER_USE_API_KEY` let the daemon stop a Browser Use cloud browser on shutdown. - -# Browser connection setup and troubleshooting - -## Browser connection reference - -This section is the source of truth for how browser-harness connects to a browser. It is the canonical reference for every agent and user of this repo. Every statement here is intended to be verifiable against either an official Chrome source or this repo's own code, and is held to that standard deliberately. If anything below is incorrect, incomplete, or misleading, open an issue on the browser-harness repository immediately with clear evidence and explanation so it can be corrected. Do not silently work around an error in this document; the cost of one user being misled is much higher than the cost of one issue. - -Browser-harness can connect to any Chrome or Chromium-based browser on your computer, or to a Browser Use cloud browser. - -**Cloud browsers** are managed by the Browser Use cloud API. Start one in Python with `start_remote_daemon("work", ...)`.
Authentication is via the `BROWSER_USE_API_KEY` environment variable; the harness handles the WebSocket URL itself. To carry your local Chrome cookies into a cloud browser, install `profile-use` once (`curl -fsSL https://browser-use.com/profile.sh | sh`), then call `uuid = sync_local_profile("MyChromeProfile")` followed by `start_remote_daemon("work", profileId=uuid)`. Cookies are the only thing synced — not localStorage, not extensions, not history. - -**Local browsers** require remote debugging to be enabled. There are two ways, and they suit different use cases. - -*Way 1: chrome://inspect/#remote-debugging checkbox — uses your real profile.* In your running Chrome, navigate to `chrome://inspect/#remote-debugging` and tick the "Allow remote debugging for this browser instance" checkbox. This setting is per-profile and sticky: tick it once and it persists across every future Chrome launch of that profile. Then run any `browser-harness` command. On Chrome 144 and later, the first attach by the harness triggers an in-browser "Allow remote debugging?" popup that you must click Allow on. The popup may reappear on later attaches under conditions that are not fully characterized.[^1] This path inherits your everyday Chrome's logins, extensions, history, and bookmarks, which makes it the right choice for an agent helping you with tasks in your real browser. - -*Way 2: command-line flag — uses an isolated profile, no popups ever.* Launch Chrome with `--remote-debugging-port=9222 --user-data-dir=`. Two caveats: - -- The path must be a directory that is **not** Chrome's platform default (`%LOCALAPPDATA%\Google\Chrome\User Data` on Windows, `~/Library/Application Support/Google/Chrome` on macOS, `~/.config/google-chrome` on Linux). On Chrome 136 and later, the port flag is silently no-opped when the user-data-dir is the platform default, even if you pass it explicitly. An empty or new path gives a fresh clean profile that Chrome will persist there across future runs.
-- This path does **not** let you reuse your everyday Chrome profile. Copying the default profile's files into a custom directory makes Chrome accept the flag, but cookies are encrypted under a key bound to the original directory and will not survive the copy — so you carry over bookmarks and extensions but lose every logged-in session. If you want your real logins, use Way 1. - -Tell the harness which port you launched on by setting `BU_CDP_URL=http://127.0.0.1:9222` before running `browser-harness`. - -For most tasks where the agent acts on your behalf in your normal browser, use Way 1. For automation that runs without you watching, or any case where popup interruptions are unacceptable, use Way 2 or a cloud browser. - -[^1]: The conditions that cause Chrome to re-show the "Allow remote debugging?" popup on a subsequent attach (time elapsed since previous Allow, daemon restart, browser restart, new CDP session, version-dependent options like "Allow for N hours") are not fully characterized. Way 2 sidesteps this entirely. - -## First time setup - -Try yourself before asking the user to do anything. Retry transient errors briefly. Only ask the user when a step genuinely needs them — ticking a checkbox, clicking Allow. - -If the user hasn't said which connection method to use, default to Way 1 if Chrome is already running, Way 2 if not. Cloud is only used when the user opts in. - -1. Try the harness: - - ```bash - browser-harness -c 'print(page_info())' - ``` - - If it prints page info, you're done. - -2. Otherwise run `browser-harness --doctor`. The two lines that matter for connection are `chrome running` and `daemon alive`. - -3. Match the output to a case: - - - **chrome FAIL** → no Chrome process detected. - - **Way 1**: ask the user to open their target Chrome themselves. - - **Way 2**: launch Chrome yourself with `--remote-debugging-port=9222 --user-data-dir=`, then set `BU_CDP_URL=http://127.0.0.1:9222` for the harness (see the Browser connection reference). 
- - - **chrome ok, daemon FAIL** → Way 1 setup is incomplete. Tell the user to: - - navigate to `chrome://inspect/#remote-debugging` in their Chrome and tick "Allow remote debugging for this browser instance" if not yet ticked (one-time per profile) - - click Allow on the in-browser popup if it appears (every attach on Chrome 144+) - - On macOS, you can open the inspect page in their running Chrome yourself instead of asking them to navigate: - - ```bash - osascript -e 'tell application "Google Chrome" to activate' \ - -e 'tell application "Google Chrome" to open location "chrome://inspect/#remote-debugging"' - ``` - - - **chrome ok, daemon ok, but step 1 still failed** → stale daemon. Restart it: - - ```bash - browser-harness -c 'restart_daemon()' - ``` - - If that hangs, escalate: kill all Chrome and daemon processes, then reopen Chrome and retry. On macOS/Linux, also remove `/tmp/bu-default.sock` and `/tmp/bu-default.pid` if they linger. - -4. After any fix, retry step 1. - -If Way 1 fails repeatedly or the user's task is unattended, move to Way 2 or a cloud browser per the Browser connection reference (these have no popups). - -If you are testing browser connection for the first time, run this demo: open `https://github.com/browser-use/browser-harness` in a new tab and activate it (`switch_tab`) so the user sees the harness has attached. If they are logged into GitHub, ask whether to star the repo for them — only click if they say yes. If they are not logged in, navigate to `https://browser-use.com` instead. Then ask what they want to do next. 
- diff --git a/packages/bcode-browser/harness/interaction-skills/connection.md b/packages/bcode-browser/harness/interaction-skills/connection.md deleted file mode 100644 index 85e264c2bf..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/connection.md +++ /dev/null @@ -1,48 +0,0 @@ -# Connection & Tab Visibility - -## The omnibox popup problem - -When Chrome opens fresh, the only CDP `type: "page"` targets are `chrome://inspect` and `chrome://omnibox-popup.top-chrome/` (a 1px invisible viewport). If the daemon attaches to the omnibox popup, all subsequent work — including `new_tab()` and `goto_url()` — happens on tabs that exist in CDP but may not be visible in the Chrome UI. - -The daemon's `attach_first_page()` handles this by creating an `about:blank` tab when no real pages exist. If you still end up on an invisible tab, use `switch_tab()` which calls `Target.activateTarget` to bring the tab to front. - -## Startup sequence - -1. Check if a daemon is already running with `daemon_alive()` -2. If stale sockets exist but daemon is dead, clean them up -3. List open tabs with `list_tabs()` to see what's available -4. `ensure_real_tab()` attaches to a real page -5. `switch_tab(target_id)` both attaches AND activates (brings to front) - -```python -if not daemon_alive(): - import os, ipc - ipc.cleanup_endpoint("default") - pid = ipc.pid_path("default") - if pid.exists(): pid.unlink() - ensure_daemon() - -tabs = list_tabs() -for t in tabs: - print(t["url"][:60]) - -tab = ensure_real_tab() -``` - -## Bringing Chrome to front - -If Chrome is behind other windows or on another desktop: - -```python -import subprocess -subprocess.run(["osascript", "-e", 'tell application "Google Chrome" to activate']) -``` - -## Navigating - -Prefer navigating an existing tab over `new_tab()`. Tabs created via CDP's `Target.createTarget` are visible but may open behind the active tab. 
- -```python -tab = ensure_real_tab() -goto_url("https://example.com") -``` diff --git a/packages/bcode-browser/harness/interaction-skills/cookies.md b/packages/bcode-browser/harness/interaction-skills/cookies.md deleted file mode 100644 index 72d365f8c6..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/cookies.md +++ /dev/null @@ -1,3 +0,0 @@ -# Cookies - -Document how to get cookies, save cookies, and set cookies without confusing browser state with page state. diff --git a/packages/bcode-browser/harness/interaction-skills/cross-origin-iframes.md b/packages/bcode-browser/harness/interaction-skills/cross-origin-iframes.md deleted file mode 100644 index 85dbc2b08a..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/cross-origin-iframes.md +++ /dev/null @@ -1,3 +0,0 @@ -# Cross-Origin Iframes - -Focus on `iframe_target(...)`, target attachment, and when compositor-level coordinate clicks are lower-friction than cross-target DOM work. diff --git a/packages/bcode-browser/harness/interaction-skills/dialogs.md b/packages/bcode-browser/harness/interaction-skills/dialogs.md deleted file mode 100644 index b0499f7c38..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/dialogs.md +++ /dev/null @@ -1,64 +0,0 @@ -# Dialogs - -Browser dialogs (`alert`, `confirm`, `prompt`, `beforeunload`) freeze the JS thread. Two approaches depending on timing. - -## Detection - -`page_info()` auto-surfaces any open dialog: if one is pending it returns `{"dialog": {"type", "message", ...}}` instead of the usual viewport dict (because the page's JS is frozen anyway). So if you call `page_info()` after an action and see a `dialog` key, handle it before doing anything else. - -## Reactive: dismiss via CDP (preferred) - -Works even when JS is frozen. Handles all dialog types including `beforeunload`. 
- -```python -# Dismiss and read the message -cdp("Page.handleJavaScriptDialog", accept=True) # accept / click OK -cdp("Page.handleJavaScriptDialog", accept=False) # cancel / click Cancel - -# Read what the dialog said (from buffered CDP events) -events = drain_events() -for e in events: - if e["method"] == "Page.javascriptDialogOpening": - print(e["params"]["type"]) # "alert", "confirm", "prompt", "beforeunload" - print(e["params"]["message"]) # the dialog text -``` - -Undetectable by antibot — no JS injected into the page. - -## Proactive: stub via JS - -Prevents dialogs from ever appearing. Good when you expect multiple `alert()`/`confirm()` calls in sequence. - -```python -js(""" -window.__dialogs__=[]; -window.alert=m=>window.__dialogs__.push(String(m)); -window.confirm=m=>{window.__dialogs__.push(String(m));return true;}; -window.prompt=(m,d)=>{window.__dialogs__.push(String(m));return d||'';}; -""") -# ... do actions that trigger dialogs ... -msgs = js("window.__dialogs__||[]") -``` - -Tradeoffs: -- Stubs are lost on page navigation -- must re-run the snippet -- `confirm()` always returns `true` (auto-approves) -- Detectable by antibot (`window.alert.toString()` reveals non-native code) -- Does NOT handle `beforeunload` - -## beforeunload specifically - -Fires when navigating away from a page with unsaved changes (forms, editors, upload pages). The page freezes until the user clicks Leave/Stay. 
- -```python -# Option A: dismiss after navigating (CDP-level, safe) -goto_url("https://new-url.com") -try: - cdp("Page.handleJavaScriptDialog", accept=True) # click "Leave" -except: - pass # no dialog — normal - -# Option B: prevent before navigating (JS injection, detectable) -js("window.onbeforeunload=null") -goto_url("https://new-url.com") -``` diff --git a/packages/bcode-browser/harness/interaction-skills/downloads.md b/packages/bcode-browser/harness/interaction-skills/downloads.md deleted file mode 100644 index c1c708c77d..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/downloads.md +++ /dev/null @@ -1,3 +0,0 @@ -# Downloads - -Separate browser-triggered downloads from direct `http_get(...)` fetches, and document the minimal signals that prove a download actually started. diff --git a/packages/bcode-browser/harness/interaction-skills/drag-and-drop.md b/packages/bcode-browser/harness/interaction-skills/drag-and-drop.md deleted file mode 100644 index 924ae8d329..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/drag-and-drop.md +++ /dev/null @@ -1,3 +0,0 @@ -# Drag And Drop - -Focus on when drag-and-drop can be driven with low-level input events versus when the site really expects a file upload or DOM-specific drag sequence. diff --git a/packages/bcode-browser/harness/interaction-skills/dropdowns.md b/packages/bcode-browser/harness/interaction-skills/dropdowns.md deleted file mode 100644 index 62d3471934..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/dropdowns.md +++ /dev/null @@ -1,3 +0,0 @@ -# Dropdowns - -Split dropdowns into native selects, custom overlays, searchable comboboxes, and virtualized menus, and always re-measure after opening because option geometry often appears late. 
diff --git a/packages/bcode-browser/harness/interaction-skills/iframes.md b/packages/bcode-browser/harness/interaction-skills/iframes.md deleted file mode 100644 index 9c24721a35..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/iframes.md +++ /dev/null @@ -1,3 +0,0 @@ -# Iframes - -Cover same-origin iframe traversal through `contentDocument` / `contentWindow`, and keep the frame-local versus page-coordinate warning explicit for clicks. diff --git a/packages/bcode-browser/harness/interaction-skills/network-requests.md b/packages/bcode-browser/harness/interaction-skills/network-requests.md deleted file mode 100644 index dbd0668151..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/network-requests.md +++ /dev/null @@ -1,3 +0,0 @@ -# Network Requests - -Document how to watch or infer network activity when page state is ambiguous, especially for submit flows, downloads, and SPA actions that succeed without obvious DOM changes. diff --git a/packages/bcode-browser/harness/interaction-skills/print-as-pdf.md b/packages/bcode-browser/harness/interaction-skills/print-as-pdf.md deleted file mode 100644 index fd354ad5ee..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/print-as-pdf.md +++ /dev/null @@ -1,3 +0,0 @@ -# Print As PDF - -Cover both direct PDF generation via CDP and sites that only expose a visible "Print" button which must be clicked before handling the browser print flow. diff --git a/packages/bcode-browser/harness/interaction-skills/profile-sync.md b/packages/bcode-browser/harness/interaction-skills/profile-sync.md deleted file mode 100644 index 19bb08f81d..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/profile-sync.md +++ /dev/null @@ -1,90 +0,0 @@ -# Profile sync - -Make a remote Browser Use browser start already logged in, by uploading cookies from a local Chrome profile. 
- -## One-time install - -```bash -curl -fsSL https://browser-use.com/profile.sh | sh -``` - -Downloads `profile-use` (macOS / Linux, x64 / arm64). The Python helpers shell out to it; you don't run `profile-use` directly. - -## Python API (pre-imported in `browser-harness -c`) - -```python -list_cloud_profiles() -# [{id, name, userId, cookieDomains, lastUsedAt}, ...] — every profile under this API key - -list_local_profiles() -# [{BrowserName, ProfileName, DisplayName, ProfilePath, ...}, ...] — detected on this machine - -sync_local_profile(profile_name, browser=None, - cloud_profile_id=None, # update an existing cloud profile instead of creating new - include_domains=None, # only these domains (and subdomains); leading dot optional - exclude_domains=None) # drop these domains; applied before include -# Shells out to `profile-use sync`. Returns the cloud profile UUID -# (the existing one if cloud_profile_id was passed, else the newly-created one). - -start_remote_daemon("work", profileName="my-work") # name→id resolved client-side -start_remote_daemon("work", profileId="") # or pass UUID directly - -stop_remote_daemon("work") # shut the daemon and PATCH the cloud browser to stop — billing ends -``` - -`sync_local_profile` prints `♻️ Using existing cloud profile` when `cloud_profile_id` is accepted, or `📝 Creating remote profile...` → `✓ Profile created: ` when it creates a new one. Check that line if you want to confirm which path ran. - -## Chat-driven flow (don't guess — ask the user) - -Cookies are real auth. Don't sync or pick a profile unilaterally. - -```python -# 1. Show what's already in the cloud. -for p in list_cloud_profiles(): - print(f"{p['name']:25} {len(p['cookieDomains']):3} domains {p['id']}") -``` -→ Agent: *"You have these cloud profiles ( domains each). Want to reuse one, sync a local profile, or start clean?"* - -```python -# 2a. Reuse cloud → one call. -start_remote_daemon("work", profileName="browser-use.com") - -# 2b. Sync local first. 
Show the options: -for lp in list_local_profiles(): - print(lp["DisplayName"]) -``` -→ Agent: *"Which local profile?"* → user picks → before syncing, inspect domain-level cookie counts with `profile-use inspect --profile ` (or `--verbose` for individual cookies) and report the summary; never dump 500 cookies into chat. - -```python -# 3. Sync + use. Returns the cloud UUID. -uuid = sync_local_profile("browser-use.com") -start_remote_daemon("work", profileId=uuid) - -# 3b. Refresh that same cloud profile later (idempotent — no duplicate profiles). -sync_local_profile("browser-use.com", cloud_profile_id=uuid) - -# 3c. Scoped: push *only* Stripe cookies into a dedicated cloud profile. -sync_local_profile("browser-use.com", - cloud_profile_id=uuid, - include_domains=["stripe.com"]) -``` - -## What actually gets synced - -**Cookies only.** No localStorage, no IndexedDB, no extensions. Enough for session-cookie sites (Google, GitHub, Stripe, most SaaS); not for sites that store auth in localStorage. - -Cookies mutated during a remote session only persist on a clean `PATCH /browsers/{id} {"action":"stop"}` — the daemon does this on shutdown when `BU_BROWSER_ID` + `BROWSER_USE_API_KEY` are set (default for remote daemons). Sessions that hit the timeout lose in-session state. - -## Cloud profile CRUD - -- UI: https://cloud.browser-use.com/settings?tab=profiles -- API: `GET /profiles`, `GET/PATCH/DELETE /profiles/{id}` (paths are relative to `BU_API = "https://api.browser-use.com/api/v3"` in `admin.py`). Fields: `id`, `name`, `userId`, `lastUsedAt`, `cookieDomains[]`. `list_cloud_profiles()` wraps this. -- Name → UUID: `profileName=` on `start_remote_daemon` resolves client-side; no API change needed. -- Need the UUID for an existing profile? `matches = [p["id"] for p in list_cloud_profiles() if p["name"] == ""]` — then verify `len(matches) == 1` before using it. Profile names are not unique; syncs create duplicates unless you pass `cloud_profile_id=`. 
-- Lower-level raw calls: `from browser_harness.admin import _browser_use; _browser_use("/profiles/", "DELETE")`. Pass the path *without* the `/api/v3` prefix — it's already on `BU_API`. - -## Traps - -- **Default proxy (`proxyCountryCode="us"`) blocks some destinations** with `ERR_TUNNEL_CONNECTION_FAILED` (e.g. `cloud.browser-use.com` itself). `proxyCountryCode=None` disables the BU proxy; a different country code picks a different exit. -- **Prefer a dedicated work profile over your personal one.** Especially while testing. -- **Older than `profile-use` v1.0.5?** Pre-1.0.5 the sync needed the Chrome profile to be closed (exclusive SQLite lock on the `Cookies` DB). v1.0.5+ copies the profile dir to a temp and syncs from the copy — Chrome can stay open. diff --git a/packages/bcode-browser/harness/interaction-skills/screenshots.md b/packages/bcode-browser/harness/interaction-skills/screenshots.md deleted file mode 100644 index 93196d2e35..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/screenshots.md +++ /dev/null @@ -1,17 +0,0 @@ -# Screenshots - -`capture_screenshot()` writes a PNG of the current viewport. The file is in **device pixels** — on a 2× display a 2296×1143 CSS viewport produces a 4592×2286 PNG. - -That matters for two reasons: - -1. **Click coordinates are CSS pixels.** Don't read a target off the image and pass it to `click_at_xy()` directly without dividing by `devicePixelRatio`. The simplest workflow is to take the screenshot, look at it in a viewer that shows CSS coordinates, or measure relative positions and use `js("window.devicePixelRatio")` to convert. - -2. **Some LLMs reject images > 2000 px per side.** Long sessions on 2× displays will eventually hit this. Pass `max_dim=1800` to downscale the file before it gets into the conversation: - -```python -capture_screenshot("/tmp/shot.png", max_dim=1800) -``` - -The downscale only happens when the image actually exceeds `max_dim`, so it's safe to leave on for every shot. 
- -Use full-page screenshots (`full=True`) only when you need to see content below the fold — they are much larger and slower than viewport-only. diff --git a/packages/bcode-browser/harness/interaction-skills/scrolling.md b/packages/bcode-browser/harness/interaction-skills/scrolling.md deleted file mode 100644 index 9e3f9d43c4..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/scrolling.md +++ /dev/null @@ -1,3 +0,0 @@ -# Scrolling - -Separate page scroll, nested containers, virtualized lists, and dropdown menus, and identify which element is actually consuming wheel events before scrolling. diff --git a/packages/bcode-browser/harness/interaction-skills/shadow-dom.md b/packages/bcode-browser/harness/interaction-skills/shadow-dom.md deleted file mode 100644 index 2736bb0ba7..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/shadow-dom.md +++ /dev/null @@ -1,3 +0,0 @@ -# Shadow DOM - -Focus on recursive `shadowRoot` traversal, and note when coordinate clicking is simpler than piercing deeply nested component trees. diff --git a/packages/bcode-browser/harness/interaction-skills/tabs.md b/packages/bcode-browser/harness/interaction-skills/tabs.md deleted file mode 100644 index 39ed5b3b19..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/tabs.md +++ /dev/null @@ -1,69 +0,0 @@ -# Tabs - -Use **CDP for control**, **UI automation for user-visible order**. 
- -## Pure CDP (portable: macOS / Linux / Windows) - -```python -tabs = list_tabs() # includes chrome:// pages too -real_tabs = list_tabs(include_chrome=False) -tid = new_tab("https://example.com") # create + attach -switch_tab(tid) # attach harness to tab -cdp("Target.activateTarget", targetId=tid) # show it in Chrome -print(current_tab()) -print(page_info()) -``` - -What CDP is good at: -- attach to a tab -- open a tab -- activate a known target -- inspect URL/title/viewport -- capture the attached tab's screenshot even if another tab is visibly frontmost - -What CDP is bad at: -- matching the **left-to-right tab strip order** the user sees -- telling whether the attached target is an omnibox popup / internal page without URL filtering - -## Visible order (platform UI) - -### macOS - -```applescript -tell application "Google Chrome" - set out to {} - set i to 1 - repeat with t in every tab of front window - set end of out to {tab_index:i, tab_title:(title of t), tab_url:(URL of t)} - set i to i + 1 - end repeat - return out -end tell -``` - -```applescript -tell application "Google Chrome" - set active tab index of front window to 2 - activate -end tell -``` - -### Linux - -No AppleScript. Same split still applies: -- use CDP for `new_tab`, attach, inspect, activate known targets -- use window-manager / browser UI automation when the user means visible order - -Typical tools: -- `xdotool` -- `wmctrl` -- desktop-environment scripting (`gdbus`, KWin, GNOME Shell extensions, etc.) - -## Rules that held up in practice - -- `switch_tab()` is **not enough** if the user expects Chrome to visibly change. -- `Target.activateTarget` is the CDP-side "show this tab". -- `list_tabs()` includes `chrome://newtab/` by default; ask for `include_chrome=False` when you want only real pages. -- `chrome://omnibox-popup.top-chrome/` can appear as a fake page target; ignore it for user-facing tab lists. 
-- If a page has `w=0 h=0`, you may be attached to the wrong target or a non-window surface. -- For dynamic UIs, re-read element rects after opening dropdowns / modals before coordinate-clicking. diff --git a/packages/bcode-browser/harness/interaction-skills/uploads.md b/packages/bcode-browser/harness/interaction-skills/uploads.md deleted file mode 100644 index a84e6a668d..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/uploads.md +++ /dev/null @@ -1 +0,0 @@ -# Uploads diff --git a/packages/bcode-browser/harness/interaction-skills/viewport.md b/packages/bcode-browser/harness/interaction-skills/viewport.md deleted file mode 100644 index 5cd4deb17c..0000000000 --- a/packages/bcode-browser/harness/interaction-skills/viewport.md +++ /dev/null @@ -1,3 +0,0 @@ -# Viewport - -Cover how viewport size changes affect layout, coordinate clicks, and any workflow that depends on stable geometry. diff --git a/packages/bcode-browser/harness/pyproject.toml b/packages/bcode-browser/harness/pyproject.toml deleted file mode 100644 index f812a6abba..0000000000 --- a/packages/bcode-browser/harness/pyproject.toml +++ /dev/null @@ -1,27 +0,0 @@ -[build-system] -requires = ["setuptools>=69"] -build-backend = "setuptools.build_meta" - -[project] -name = "browser-harness" -version = "0.1.0" -description = "The simplest, thinnest, and most powerful harness to control your real browser with your agent." 
-requires-python = ">=3.11" -dependencies = [ - "cdp-use==1.4.5", - "fetch-use==0.4.0", - "pillow==12.2.0", - "websockets==15.0.1", -] - -[project.scripts] -browser-harness = "browser_harness.run:main" - -[tool.setuptools] -package-dir = {"" = "src"} - -[tool.setuptools.packages.find] -where = ["src"] - -[tool.pytest.ini_options] -pythonpath = ["src"] diff --git a/packages/bcode-browser/harness/src/browser_harness/__init__.py b/packages/bcode-browser/harness/src/browser_harness/__init__.py deleted file mode 100644 index cd46648ae4..0000000000 --- a/packages/bcode-browser/harness/src/browser_harness/__init__.py +++ /dev/null @@ -1,2 +0,0 @@ -"""Browser Harness core package.""" - diff --git a/packages/bcode-browser/harness/src/browser_harness/_ipc.py b/packages/bcode-browser/harness/src/browser_harness/_ipc.py deleted file mode 100644 index 2d2657660d..0000000000 --- a/packages/bcode-browser/harness/src/browser_harness/_ipc.py +++ /dev/null @@ -1,197 +0,0 @@ -"""Daemon IPC plumbing. AF_UNIX socket on POSIX, TCP loopback on Windows.""" -import asyncio, json, os, re, secrets, socket, subprocess, sys, tempfile -from pathlib import Path - -IS_WINDOWS = sys.platform == "win32" -# Two caller-supplied dirs: -# BH_RUNTIME_DIR — sock/port/pid. AF_UNIX sun_path is 104 bytes on macOS, so -# the runtime dir must be short. Caller is responsible for keeping it -# within budget. Falls back to BH_TMP_DIR (legacy single-dir callers), -# then to /tmp on POSIX (gettempdir() returns long /var/folders/... on -# macOS — unsafe for AF_UNIX) or tempfile.gettempdir() on Windows (TCP). -# BH_TMP_DIR — screenshots, debug overlays, daemon log. No path-length -# sensitivity; caller can use a deep persistent path. -# When the caller supplies a per-instance dir for either purpose, files use -# bare "bu" stems; otherwise "bu-" disambiguates co-tenants. 
-BH_TMP_DIR = os.environ.get("BH_TMP_DIR")
-BH_RUNTIME_DIR = os.environ.get("BH_RUNTIME_DIR") or BH_TMP_DIR
-_TMP = Path(BH_TMP_DIR or (tempfile.gettempdir() if IS_WINDOWS else "/tmp"))
-_RUNTIME = Path(BH_RUNTIME_DIR or (tempfile.gettempdir() if IS_WINDOWS else "/tmp"))
-_TMP.mkdir(parents=True, exist_ok=True)
-_RUNTIME.mkdir(parents=True, exist_ok=True)
-_NAME_RE = re.compile(r"\A[A-Za-z0-9_-]{1,64}\Z")
-
-# Set by serve() on Windows. Daemon's handle() requires every request to carry
-# this token (TCP loopback has no chmod-equivalent so any local process could
-# otherwise issue CDP commands). Stays None on POSIX where AF_UNIX + chmod 600
-# is the boundary.
-_server_token = None
-
-
-def _check(name):  # path-traversal guard for BU_NAME
-    if not _NAME_RE.match(name or ""):
-        raise ValueError(f"invalid BU_NAME {name!r}: must match [A-Za-z0-9_-]{{1,64}}")
-    return name
-
-
-def _runtime_stem(name):  # "bu" when BH_RUNTIME_DIR isolates us, else "bu-"
-    _check(name)
-    return "bu" if BH_RUNTIME_DIR else f"bu-{name}"
-
-
-def _tmp_stem(name):  # "bu" when BH_TMP_DIR isolates us, else "bu-"
-    _check(name)
-    return "bu" if BH_TMP_DIR else f"bu-{name}"
-
-
-def log_path(name): return _TMP / f"{_tmp_stem(name)}.log"
-def pid_path(name): return _RUNTIME / f"{_runtime_stem(name)}.pid"
-def port_path(name): return _RUNTIME / f"{_runtime_stem(name)}.port"  # Windows-only: holds {"port","token"} JSON
-def _sock_path(name): return _RUNTIME / f"{_runtime_stem(name)}.sock"
-
-
-def _read_port_file(name):
-    """(port, token) from the Windows port file, or (None, None) on any failure."""
-    try:
-        d = json.loads(port_path(name).read_text())
-        return int(d["port"]), d["token"]
-    except (FileNotFoundError, ValueError, KeyError, TypeError, OSError):
-        return None, None
-
-
-def sock_addr(name):  # display-only, used in log lines
-    if not IS_WINDOWS: return str(_sock_path(name))
-    port, _ = _read_port_file(name)
-    return f"127.0.0.1:{port}" if port else f"tcp:{_runtime_stem(name)}"
-
-
-def spawn_kwargs():  # subprocess.Popen flags so the daemon detaches from this terminal
-    if IS_WINDOWS:
-        # CREATE_NO_WINDOW: no console window for the daemon. CREATE_NEW_PROCESS_GROUP:
-        # daemon doesn't receive Ctrl-C/Ctrl-Break sent to the parent terminal, so
-        # closing that terminal doesn't kill it. DETACHED_PROCESS is intentionally
-        # omitted: per Win32 docs it overrides CREATE_NO_WINDOW, causing Windows to
-        # allocate a fresh console for the (still console-subsystem) python.exe.
-        return {"creationflags": subprocess.CREATE_NEW_PROCESS_GROUP | subprocess.CREATE_NO_WINDOW}
-    return {"start_new_session": True}
-
-
-def connect(name, timeout=1.0):
-    """Blocking client. Returns (sock, token); token is None on POSIX, hex string on Windows.
-    Callers sending JSON requests MUST include the token as req["token"] on Windows."""
-    if not IS_WINDOWS:
-        # uv-Python on Windows lacks socket.AF_UNIX, so this branch must be gated.
-        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
-        s.settimeout(timeout); s.connect(str(_sock_path(name))); return s, None
-    port, token = _read_port_file(name)
-    if port is None: raise FileNotFoundError(str(port_path(name)))
-    s = socket.create_connection(("127.0.0.1", port), timeout=timeout)
-    s.settimeout(timeout); return s, token
-
-
-def request(c, token, req):
-    """One-shot send + recv + parse on an open socket. Injects token on Windows.
-    Returns the parsed JSON response. Caller closes the socket."""
-    if token: req = {**req, "token": token}
-    c.sendall((json.dumps(req) + "\n").encode())
-    data = b""
-    while not data.endswith(b"\n"):
-        chunk = c.recv(1 << 16)
-        if not chunk: break
-        data += chunk
-    return json.loads(data or b"{}")
-
-
-def ping(name, timeout=1.0):
-    """True iff a live daemon answers our ping. Defends against stale .port files
-    + port reuse: a bare TCP connect can succeed against an unrelated process that
-    grabbed the port after our daemon crashed; only our daemon answers {"pong":true}."""
-    try:
-        c, token = connect(name, timeout=timeout)
-    except (FileNotFoundError, ConnectionRefusedError, TimeoutError, socket.timeout, OSError):
-        return False
-    try:
-        resp = request(c, token, {"meta": "ping"})
-        # request() returns parsed JSON, which may be any valid value (a list,
-        # scalar, etc. from a stale or hostile endpoint). Anything that isn't
-        # a {pong: true} dict counts as "not our daemon" — never .get() blindly.
-        return isinstance(resp, dict) and resp.get("pong") is True
-    except (OSError, ValueError, AttributeError):
-        return False
-    finally:
-        try: c.close()
-        except OSError: pass
-
-
-def identify(name, timeout=1.0):
-    """Return the live daemon's PID, or None if unreachable.
-
-    Used by restart_daemon() to signal a process whose identity has been
-    verified end-to-end (live IPC + self-reported PID), instead of trusting
-    a pid file whose number may have been reused by an unrelated process."""
-    try:
-        c, token = connect(name, timeout=timeout)
-    except (FileNotFoundError, ConnectionRefusedError, TimeoutError, socket.timeout, OSError):
-        return None
-    try:
-        resp = request(c, token, {"meta": "ping"})
-        # request() returns parsed JSON, which may be any valid value (a list,
-        # scalar, etc. from a stale or hostile endpoint). Anything that isn't
-        # a {pong: true} dict gets None — never .get() on a non-dict.
-        if not isinstance(resp, dict) or resp.get("pong") is not True:
-            return None
-        pid = resp.get("pid")
-        # `type(pid) is int` (not isinstance) intentionally rejects bool: in
-        # Python, isinstance(True, int) is True, so a hostile/buggy daemon
-        # could reply with {"pid": True} and we'd treat that as PID 1 (init).
-        # Also reject 0/negatives — os.kill(0, sig) signals every process in
-        # the calling process group, os.kill(-1, sig) signals every process
-        # the caller can. Upper bound is 2**31 because C pid_t is typically
-        # signed 32-bit and a value outside that range makes os.kill() raise
-        # OverflowError, which would propagate out of restart_daemon() before
-        # its cleanup. Linux pid_max is also bounded at 2**22 in practice.
-        return pid if type(pid) is int and 0 < pid < (1 << 31) else None
-    except (OSError, ValueError, AttributeError):
-        return None
-    finally:
-        try: c.close()
-        except OSError: pass
-
-
-async def serve(name, handler):
-    """Run the server until cancelled. handler(reader, writer) sees the same interface either way."""
-    global _server_token
-    if not IS_WINDOWS:
-        path = str(_sock_path(name))
-        if os.path.exists(path): os.unlink(path)
-        # umask 0o077 makes bind() create the socket as 0600 — no TOCTOU window before chmod.
-        old_umask = os.umask(0o077)
-        try: server = await asyncio.start_unix_server(handler, path=path)
-        finally: os.umask(old_umask)
-        _server_token = None
-        async with server: await asyncio.Event().wait()
-        return
-    server = await asyncio.start_server(handler, "127.0.0.1", 0)
-    port = server.sockets[0].getsockname()[1]
-    _server_token = secrets.token_hex(32)
-    pf = port_path(name)
-    # Atomic write so a concurrent reader never sees a half-written file.
-    tmp = pf.with_name(pf.name + ".tmp")
-    tmp.write_text(json.dumps({"port": port, "token": _server_token}))
-    os.replace(tmp, pf)
-    try:
-        async with server: await asyncio.Event().wait()
-    finally:
-        try: pf.unlink()
-        except FileNotFoundError: pass
-
-
-def expected_token():
-    """The token the running daemon will accept, or None on POSIX."""
-    return _server_token
-
-
-def cleanup_endpoint(name):  # best-effort; silent if already gone
-    p = _sock_path(name) if not IS_WINDOWS else port_path(name)
-    try: p.unlink()
-    except FileNotFoundError: pass
diff --git a/packages/bcode-browser/harness/src/browser_harness/admin.py b/packages/bcode-browser/harness/src/browser_harness/admin.py
deleted file mode 100644
index b105100a86..0000000000
--- a/packages/bcode-browser/harness/src/browser_harness/admin.py
+++ /dev/null
@@ -1,782 +0,0 @@
-import json
-import os
-import socket
-import subprocess
-import sys
-import tempfile
-import time
-import urllib.request
-from pathlib import Path
-
-from . import _ipc as ipc
-
-
-def _process_start_time(pid):
-    """Opaque process-start-time fingerprint at PID, or None if unavailable.
-
-    Two reads returning the same non-None value mean the PID still refers to
-    the same process; a different value means the PID was reused. Used by
-    restart_daemon() to keep the force-kill recovery path working even when
-    the daemon has already torn down its IPC socket (e.g. during a slow
-    remote shutdown), without falling back to "trust the pid file" — which
-    would re-introduce the PID-reuse hazard.
-
-    Linux: /proc//stat field 22 (starttime in clock ticks since boot).
-    macOS: `ps -o lstart= -p ` (an absolute timestamp string).
-    Windows: GetProcessTimes via ctypes (FILETIME creation time, 100-ns since 1601).
-    Anywhere else: returns None; restart_daemon falls back to its strict
-    identify-only check, which is safer than no check at all.
-    """
-    if type(pid) is not int or pid <= 0:
-        return None
-    if sys.platform.startswith("linux"):
-        try:
-            with open(f"/proc/{pid}/stat", "rb") as f:
-                raw = f.read().decode("ascii", errors="replace")
-        except (FileNotFoundError, PermissionError, OSError):
-            return None
-        # Field 2 is `(comm)`; comm can contain spaces and parens, so split off
-        # everything after the LAST `)` and index from there.
-        try:
-            tail = raw[raw.rindex(")") + 2:].split()
-            return tail[19]  # starttime is field 22 (0-indexed: 21 - skipped 2 = 19)
-        except (ValueError, IndexError):
-            return None
-    if sys.platform == "darwin":
-        try:
-            out = subprocess.check_output(
-                ["ps", "-o", "lstart=", "-p", str(pid)],
-                stderr=subprocess.DEVNULL, timeout=2,
-            )
-        except (subprocess.SubprocessError, OSError):
-            return None
-        s = out.decode("ascii", errors="replace").strip()
-        return s or None
-    if sys.platform == "win32":
-        # Windows users running a remote daemon hit the same slow-shutdown
-        # window as POSIX (stop_remote() PATCHes api.browser-use.com after
-        # the IPC socket has been torn down). Without a fingerprint here the
-        # SIGTERM gate can never pass during that window, leaving an orphan
-        # daemon that may continue to hold a billed cloud browser. Use
-        # GetProcessTimes via ctypes to read the kernel-reported creation
-        # time as a 64-bit FILETIME (100-ns intervals since 1601-01-01).
-        try:
-            import ctypes
-            from ctypes import wintypes
-        except ImportError:
-            return None
-        PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
-        try:
-            kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
-            kernel32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
-            kernel32.OpenProcess.restype = wintypes.HANDLE
-            kernel32.GetProcessTimes.argtypes = [
-                wintypes.HANDLE,
-                ctypes.POINTER(wintypes.FILETIME),
-                ctypes.POINTER(wintypes.FILETIME),
-                ctypes.POINTER(wintypes.FILETIME),
-                ctypes.POINTER(wintypes.FILETIME),
-            ]
-            kernel32.GetProcessTimes.restype = wintypes.BOOL
-            kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
-            kernel32.CloseHandle.restype = wintypes.BOOL
-        except (OSError, AttributeError):
-            return None
-        h = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
-        if not h:
-            return None
-        try:
-            creation = wintypes.FILETIME()
-            exit_ft = wintypes.FILETIME()
-            kernel_ft = wintypes.FILETIME()
-            user_ft = wintypes.FILETIME()
-            ok = kernel32.GetProcessTimes(
-                h, ctypes.byref(creation), ctypes.byref(exit_ft),
-                ctypes.byref(kernel_ft), ctypes.byref(user_ft),
-            )
-            if not ok:
-                return None
-            return (creation.dwHighDateTime << 32) | creation.dwLowDateTime
-        finally:
-            kernel32.CloseHandle(h)
-    return None
-
-
-def _load_env():
-    repo_root = Path(__file__).resolve().parents[2]
-    workspace = Path(os.environ.get("BH_AGENT_WORKSPACE", repo_root / "agent-workspace")).expanduser()
-    for p in (repo_root / ".env", workspace / ".env"):
-        if not p.exists():
-            continue
-        _load_env_file(p)
-
-
-def _load_env_file(p):
-    for line in p.read_text().splitlines():
-        line = line.strip()
-        if not line or line.startswith("#") or "=" not in line:
-            continue
-        k, v = line.split("=", 1)
-        os.environ.setdefault(k.strip(), v.strip().strip('"').strip("'"))
-
-
-_load_env()
-
-NAME = os.environ.get("BU_NAME", "default")
-BU_API = "https://api.browser-use.com/api/v3"
-GH_RELEASES = "https://api.github.com/repos/browser-use/browser-harness/releases/latest"
-VERSION_CACHE = Path(tempfile.gettempdir()) / "bu-version-cache.json"
-VERSION_CACHE_TTL = 24 * 3600
-DOCTOR_TEXT_LIMIT = 140
-
-
-def _log_tail(name):
-    try:
-        return ipc.log_path(name or NAME).read_text().strip().splitlines()[-1]
-    except (FileNotFoundError, IndexError):
-        return None
-
-
-def _needs_chrome_remote_debugging_prompt(msg):
-    """True when Chrome needs the inspect-page permission/profile flow."""
-    lower = (msg or "").lower()
-    return (
-        "devtoolsactiveport not found" in lower
-        or "enable chrome://inspect" in lower
-        or "not live yet" in lower
-        or (
-            "ws handshake failed" in lower
-            and (
-                "403" in lower
-                or "opening handshake" in lower
-                or "timed out" in lower
-                or "timeout" in lower
-            )
-        )
-    )
-
-
-def _is_local_chrome_mode(env=None):
-    """True when the daemon discovers a local Chrome instead of a remote CDP WS."""
-    return not (env or {}).get("BU_CDP_WS") and not os.environ.get("BU_CDP_WS")
-
-
-def daemon_alive(name=None):
-    # Ping handshake (not a bare connect) so a stale .port file + port reuse
-    # after a daemon crash doesn't make us mistake an unrelated listener for ours.
-    return ipc.ping(name or NAME, timeout=1.0)
-
-
-def _daemon_endpoint_names():
-    # BH_RUNTIME_DIR isolates one daemon per dir → no filename-prefix discovery,
-    # just check whether our local endpoint exists. Without BH_RUNTIME_DIR,
-    # _RUNTIME is the shared default (`/tmp` etc.) and we glob `bu-*.`
-    # to find every daemon on the machine.
-    suffix = ".port" if ipc.IS_WINDOWS else ".sock"
-    if ipc.BH_RUNTIME_DIR:
-        return [NAME] if (ipc._RUNTIME / f"bu{suffix}").exists() else []
-    names = []
-    for p in sorted(ipc._RUNTIME.glob(f"bu-*{suffix}")):
-        raw = p.name[3:-len(suffix)]
-        try:
-            ipc._check(raw)
-        except ValueError:
-            continue
-        names.append(raw)
-    return names
-
-
-def _daemon_browser_connection(name):
-    c = None
-    try:
-        c, token = ipc.connect(name, timeout=1.0)
-        response = ipc.request(c, token, {"meta": "connection_status"})
-        if "error" in response:
-            return None
-        page = response.get("page")
-        if page:
-            page = {"title": page.get("title") or "(untitled)", "url": page.get("url") or ""}
-        return {"name": name, "page": page}
-    except (FileNotFoundError, ConnectionRefusedError, TimeoutError, socket.timeout, OSError, KeyError, ValueError, json.JSONDecodeError):
-        return None
-    finally:
-        if c:
-            c.close()
-
-
-def browser_connections():
-    """Live browser-harness daemons with healthy CDP browser connections and their attached page."""
-    out = []
-    for name in _daemon_endpoint_names():
-        conn = _daemon_browser_connection(name)
-        if conn:
-            out.append(conn)
-    return out
-
-
-def active_browser_connections():
-    """Count live browser-harness daemons with a healthy CDP browser connection."""
-    return len(browser_connections())
-
-
-def _doctor_short_text(value, limit=None):
-    limit = limit or DOCTOR_TEXT_LIMIT
-    value = str(value)
-    return value if len(value) <= limit else value[:limit - 3] + "..."
-
-
-def ensure_daemon(wait=60.0, name=None, env=None):
-    """Idempotent. Self-heals stale daemon, cold Chrome, and missing Allow on chrome://inspect."""
-    if daemon_alive(name):
-        # Stale daemons accept connects AND reply to meta:* (pure Python) even when the
-        # CDP WS to Chrome is dead — probe with a real CDP call and require "result".
-        # Must go through ipc.connect so this works on Windows (TCP loopback) too;
-        # raw AF_UNIX here would fail on every warm call and churn the daemon.
-        try:
-            s, token = ipc.connect(name or NAME, timeout=3.0)
-            resp = ipc.request(s, token, {"method": "Target.getTargets", "params": {}})
-            if "result" in resp: return
-        except Exception: pass
-        restart_daemon(name)
-
-    import subprocess, sys
-    local = _is_local_chrome_mode(env)
-    for attempt in (0, 1):
-        e = {**os.environ, **({"BU_NAME": name} if name else {}), **(env or {})}
-        p = subprocess.Popen(
-            [sys.executable, "-m", "browser_harness.daemon"],
-            env=e, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, **ipc.spawn_kwargs(),
-        )
-        deadline = time.time() + wait
-        while time.time() < deadline:
-            if daemon_alive(name): return
-            if p.poll() is not None: break
-            time.sleep(0.2)
-        msg = _log_tail(name) or ""
-        if local and attempt == 0 and _needs_chrome_remote_debugging_prompt(msg):
-            _open_chrome_inspect()
-            print('browser-harness: at chrome://inspect/#remote-debugging, tick "Allow remote debugging for this browser instance" and click Allow on the popup that appears', file=sys.stderr)
-            restart_daemon(name)
-            continue
-        raise RuntimeError(msg or f"daemon {name or NAME} didn't come up -- check {ipc.log_path(name or NAME)}")
-
-
-def stop_remote_daemon(name="remote"):
-    """Stop a remote daemon and its backing Browser Use cloud browser.
-
-    Triggers the daemon's clean shutdown, which PATCHes
-    /browsers/{id} {"action":"stop"} so billing ends and any profile
-    state in the session is persisted."""
-    # restart_daemon is misnamed — it only stops the daemon (sends
-    # shutdown, SIGTERMs if needed, unlinks socket+pid). It never
-    # restarts anything on its own; a follow-up `browser-harness`
-    # call would auto-spawn a fresh one via ensure_daemon(). That
-    # "run-it-again-to-restart" workflow is why it was named that way.
-    restart_daemon(name)
-
-
-def restart_daemon(name=None):
-    """Best-effort daemon shutdown + socket/pid cleanup.
-
-    Name is historical: callers typically follow this with another
-    `browser-harness` invocation, which auto-spawns a fresh daemon via
-    ensure_daemon(). The function itself only stops.
-
-    Identity is verified via ipc.identify() before any process signal, so
-    a stale pid file whose number has been reused by an unrelated process
-    is never SIGTERM'd. If the daemon is unreachable, we just clean up the
-    pid file and socket and return — never escalate to a kill-by-pid-file.
-    """
-    import signal
-
-    name = name or NAME
-    pid_path = str(ipc.pid_path(name))
-
-    # Two pieces of information are tracked separately:
-    #  - daemon_pid: the daemon's self-reported PID, or None. Only daemons
-    #    running this version (or newer) include `pid` in the ping response;
-    #    pre-upgrade daemons return {pong: True} only and yield None here.
-    #  - daemon_alive: whether ANY daemon answers ping. Keeps the shutdown
-    #    IPC path working across upgrades — without it, a still-running
-    #    pre-upgrade daemon would have its socket deleted out from under it
-    #    while the process stayed alive.
-    daemon_pid = ipc.identify(name, timeout=5.0)
-    daemon_alive = daemon_pid is not None or ipc.ping(name, timeout=1.0)
-    # Snapshot the daemon's process start-time as a secondary identity check.
-    # The IPC socket can disappear before the process exits (e.g. the shutdown
-    # path tears down the socket and then waits on a slow remote `stop` PATCH),
-    # so identify() going None partway through is not proof of process death.
-    # Comparing start-time before SIGTERM lets us recover the original
-    # force-kill behavior for slow shutdowns without re-opening the
-    # PID-reuse hole — a reused PID would have a different start-time.
-    daemon_start = _process_start_time(daemon_pid)
-
-    if daemon_alive:
-        try:
-            c, token = ipc.connect(name, timeout=5.0)
-            ipc.request(c, token, {"meta": "shutdown"})
-            c.close()
-        except Exception:
-            pass
-
-    if daemon_pid is not None:
-        for _ in range(75):
-            try:
-                os.kill(daemon_pid, 0)
-                time.sleep(0.2)
-            except (ProcessLookupError, OSError, SystemError, OverflowError):
-                break
-        else:
-            # Re-verify identity before escalating to SIGTERM. Two acceptable
-            # signals, in priority order:
-            #  1. ipc.identify() still returns the same PID — daemon's IPC is
-            #     live, daemon is wedged. Safe to kill.
-            #  2. start-time fingerprint of the original PID is unchanged —
-            #     same process, just slow to exit (e.g. stuck in remote stop).
-            #     The IPC may already be gone; that's expected.
-            # If neither holds, the PID may have been reused; skip SIGTERM.
-            verified_pid = ipc.identify(name, timeout=1.0)
-            same_process = verified_pid == daemon_pid or (
-                daemon_start is not None
-                and _process_start_time(daemon_pid) == daemon_start
-            )
-            if same_process:
-                try:
-                    os.kill(daemon_pid, signal.SIGTERM)
-                except (ProcessLookupError, OSError, SystemError, OverflowError):
-                    pass
-
-    ipc.cleanup_endpoint(name)
-    try:
-        os.unlink(pid_path)
-    except FileNotFoundError:
-        pass
-
-
-def _browser_use(path, method, body=None):
-    key = os.environ.get("BROWSER_USE_API_KEY")
-    if not key:
-        raise RuntimeError("BROWSER_USE_API_KEY missing -- see .env.example")
-    req = urllib.request.Request(
-        f"{BU_API}{path}",
-        method=method,
-        data=(json.dumps(body).encode() if body is not None else None),
-        headers={"X-Browser-Use-API-Key": key, "Content-Type": "application/json"},
-    )
-    return json.loads(urllib.request.urlopen(req, timeout=60).read() or b"{}")
-
-
-def _stop_cloud_browser(browser_id):
-    if not browser_id:
-        return
-    try:
-        _browser_use(f"/browsers/{browser_id}", "PATCH", {"action": "stop"})
-    except BaseException:
-        pass
-
-
-def _cdp_ws_from_url(cdp_url):
-    return json.loads(urllib.request.urlopen(f"{cdp_url}/json/version", timeout=15).read())["webSocketDebuggerUrl"]
-
-
-def _has_local_gui():
-    """True when this machine plausibly has a browser we can open. False on headless servers."""
-    import platform
-    system = platform.system()
-    if system in ("Darwin", "Windows"):
-        return True
-    if system == "Linux":
-        return bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))
-    return False
-
-
-def _show_live_url(url):
-    """Print liveUrl and auto-open it locally if there's a GUI."""
-    import sys, webbrowser
-    if not url: return
-    print(url)
-    if not _has_local_gui():
-        print("(no local GUI — share the liveUrl with the user)", file=sys.stderr)
-        return
-    try:
-        webbrowser.open(url, new=2)
-        print("(opened liveUrl in your default browser)", file=sys.stderr)
-    except Exception as e:
-        print(f"(couldn't auto-open: {e} — share the liveUrl with the user)", file=sys.stderr)
-
-
-def list_cloud_profiles():
-    """List cloud profiles under the current API key.
-
-    Returns [{id, name, userId, cookieDomains, lastUsedAt}, ...]. `cookieDomains`
-    is the array of domain strings the cloud profile has cookies for — use
-    `len(cookieDomains)` as a cheap 'how much is logged in' summary. Per-cookie
-    detail on a *local* profile before sync: `profile-use inspect --profile `.
-
-    Paginates through all pages — the API caps `pageSize` at 100."""
-    out, page = [], 1
-    while True:
-        listing = _browser_use(f"/profiles?pageSize=100&pageNumber={page}", "GET")
-        items = listing.get("items") if isinstance(listing, dict) else listing
-        if not items:
-            break
-        for p in items:
-            detail = _browser_use(f"/profiles/{p['id']}", "GET")
-            out.append({
-                "id": detail["id"],
-                "name": detail.get("name"),
-                "userId": detail.get("userId"),
-                "cookieDomains": detail.get("cookieDomains") or [],
-                "lastUsedAt": detail.get("lastUsedAt"),
-            })
-        if isinstance(listing, dict) and len(out) >= listing.get("totalItems", len(out)):
-            break
-        page += 1
-    return out
-
-
-def _resolve_profile_name(profile_name):
-    """Find a single cloud profile by exact name; raise if 0 or >1 match."""
-    matches = [p for p in list_cloud_profiles() if p.get("name") == profile_name]
-    if not matches:
-        raise RuntimeError(f"no cloud profile named {profile_name!r} -- call list_cloud_profiles() or sync_local_profile() first")
-    if len(matches) > 1:
-        raise RuntimeError(f"{len(matches)} cloud profiles named {profile_name!r} -- pass profileId= instead")
-    return matches[0]["id"]
-
-
-def start_remote_daemon(name="remote", profileName=None, **create_kwargs):
-    """Provision a Browser Use cloud browser and start a daemon attached to it.
-
-    kwargs forwarded to `POST /browsers` (camelCase):
-      profileId — cloud profile UUID; start already-logged-in. Default: none (clean browser).
-      profileName — cloud profile name; resolved client-side to profileId via list_cloud_profiles().
-      proxyCountryCode — ISO2 country code (default "us"); pass None to disable the BU proxy.
-      timeout — minutes, 1..240.
-      customProxy — {host, port, username, password, ignoreCertErrors}.
-      browserScreenWidth / browserScreenHeight, allowResizing, enableRecording.
-
-    Returns the full browser dict including `liveUrl`. Prints the liveUrl and
-    auto-opens it locally when a GUI is detected, so the user can watch along."""
-    if daemon_alive(name):
-        raise RuntimeError(f"daemon {name!r} already alive -- restart_daemon({name!r}) first")
-    if profileName:
-        if "profileId" in create_kwargs:
-            raise RuntimeError("pass profileName OR profileId, not both")
-        create_kwargs["profileId"] = _resolve_profile_name(profileName)
-    browser = _browser_use("/browsers", "POST", create_kwargs)
-    try:
-        ensure_daemon(
-            name=name,
-            env={"BU_CDP_WS": _cdp_ws_from_url(browser["cdpUrl"]), "BU_BROWSER_ID": browser["id"]},
-        )
-    except BaseException:
-        _stop_cloud_browser(browser.get("id"))
-        raise
-    _show_live_url(browser.get("liveUrl"))
-    return browser
-
-
-def list_local_profiles():
-    """Detected local browser profiles on this machine. Shells out to `profile-use list --json`.
-    Returns [{BrowserName, BrowserPath, ProfileName, ProfilePath, DisplayName}, ...].
-    Requires `profile-use` (see interaction-skills/profile-sync.md for install)."""
-    import json, shutil, subprocess
-    if not shutil.which("profile-use"):
-        raise RuntimeError("profile-use not installed -- curl -fsSL https://browser-use.com/profile.sh | sh")
-    return json.loads(subprocess.check_output(["profile-use", "list", "--json"], text=True))
-
-
-def sync_local_profile(profile_name, browser=None, cloud_profile_id=None,
-                       include_domains=None, exclude_domains=None):
-    """Sync a local profile's cookies to a cloud profile. Returns the cloud UUID.
-
-    Shells out to `profile-use sync` (v1.0.5+). Requires BROWSER_USE_API_KEY.
-    profile-use copies the profile dir to a temp and syncs from the copy, so Chrome
-    can stay open.
-
-    Args:
-      profile_name: local Chrome profile name (as shown by `list_local_profiles`).
-      browser: disambiguate when multiple browsers have profiles of the
-          same name (e.g. "Google Chrome"). Default: any match.
-      cloud_profile_id: push cookies into this existing cloud profile instead of
-          creating a new one. Idempotent — call again to refresh
-          the same profile. Default: create new.
-      include_domains: only sync cookies for these domains (and subdomains).
-          Leading dot is optional. Example: ["google.com", "stripe.com"].
-      exclude_domains: drop cookies for these domains (and subdomains). Applied
-          before `include_domains` so exclude wins on overlap."""
-    import os, re, shutil, subprocess, sys
-    if not shutil.which("profile-use"):
-        raise RuntimeError("profile-use not installed -- curl -fsSL https://browser-use.com/profile.sh | sh")
-    if not os.environ.get("BROWSER_USE_API_KEY"):
-        raise RuntimeError("BROWSER_USE_API_KEY missing")
-    cmd = ["profile-use", "sync", "--profile", profile_name]
-    if browser:
-        cmd += ["--browser", browser]
-    if cloud_profile_id:
-        cmd += ["--cloud-profile-id", cloud_profile_id]
-    for d in include_domains or []:
-        cmd += ["--domain", d]
-    for d in exclude_domains or []:
-        cmd += ["--exclude-domain", d]
-    r = subprocess.run(cmd, text=True, capture_output=True)
-    sys.stdout.write(r.stdout)
-    sys.stderr.write(r.stderr)
-    if r.returncode != 0:
-        raise RuntimeError(f"profile-use sync failed (exit {r.returncode})")
-    # With --cloud-profile-id the tool prints "♻️ Using existing cloud profile"
-    # instead of "Profile created: ", so we already know the UUID.
-    if cloud_profile_id:
-        return cloud_profile_id
-    m = re.search(r"Profile created:\s+([0-9a-f-]{36})", r.stdout)
-    if not m:
-        raise RuntimeError(f"profile-use did not report a profile UUID (exit {r.returncode})")
-    return m.group(1)
-
-
-def _version():
-    """Installed version of the browser-harness package. Empty string if unknown."""
-    try:
-        from importlib.metadata import PackageNotFoundError, version
-        try:
-            return version("browser-harness")
-        except PackageNotFoundError:
-            return ""
-    except Exception:
-        return ""
-
-
-def _repo_dir():
-    """Return the repo root if this install is an editable git clone, else None."""
-    for p in Path(__file__).resolve().parents:
-        if (p / ".git").is_dir():
-            return p
-    return None
-
-
-def _install_mode():
-    """"git" for editable clone, "pypi" for an installed wheel, "unknown" otherwise."""
-    if _repo_dir():
-        return "git"
-    return "pypi" if _version() else "unknown"
-
-
-def _cache_read():
-    try:
-        return json.loads(VERSION_CACHE.read_text())
-    except (FileNotFoundError, ValueError):
-        return {}
-
-
-def _cache_write(data):
-    try:
-        VERSION_CACHE.write_text(json.dumps(data))
-    except OSError:
-        pass
-
-
-def _latest_release_tag(force=False):
-    """Return latest release tag from GitHub, or None. Cached for 24h to avoid hammering the API."""
-    cache = _cache_read()
-    now = time.time()
-    if not force and cache.get("tag") and now - cache.get("fetched_at", 0) < VERSION_CACHE_TTL:
-        return cache["tag"]
-    try:
-        req = urllib.request.Request(GH_RELEASES, headers={"Accept": "application/vnd.github+json"})
-        tag = json.loads(urllib.request.urlopen(req, timeout=5).read()).get("tag_name") or ""
-    except Exception:
-        return cache.get("tag")  # fall back to last known
-    tag = tag.lstrip("v")
-    _cache_write({**cache, "tag": tag, "fetched_at": now})
-    return tag or None
-
-
-def _version_tuple(v):
-    """Best-effort semver parse. Non-numeric components sort as 0, so pre-releases may not rank perfectly."""
-    parts = []
-    for s in (v or "").split("."):
-        m = ""
-        for ch in s:
-            if ch.isdigit():
-                m += ch
-            else:
-                break
-        parts.append(int(m) if m else 0)
-    return tuple(parts)
-
-
-def check_for_update():
-    """(current, latest, newer_available). latest may be None if the API was unreachable and no cache exists."""
-    cur = _version()
-    latest = _latest_release_tag()
-    newer = bool(cur and latest and _version_tuple(latest) > _version_tuple(cur))
-    return cur, latest, newer
-
-
-def print_update_banner(out=None):
-    """Print the update banner to stderr once per day. Silent when up-to-date or offline."""
-    import sys
-    out = out or sys.stderr
-    cache = _cache_read()
-    today = time.strftime("%Y-%m-%d")
-    if cache.get("banner_shown_on") == today:
-        return
-    cur, latest, newer = check_for_update()
-    if not newer:
-        return
-    print(f"[browser-harness] update available: {cur} -> {latest}", file=out)
-    print(f"[browser-harness] agents: run `browser-harness --update -y` to upgrade and restart the daemon", file=out)
-    _cache_write({**cache, "banner_shown_on": today})
-
-
-def _chrome_running():
-    """Cross-platform best-effort check for a running Chrome/Edge process."""
-    import platform, subprocess
-    system = platform.system()
-    try:
-        if system == "Windows":
-            out = subprocess.check_output(["tasklist"], text=True, timeout=5)
-            names = ("chrome.exe", "msedge.exe")
-        else:
-            out = subprocess.check_output(["ps", "-A", "-o", "comm="], text=True, timeout=5)
-            names = ("Google Chrome", "chrome", "chromium", "Microsoft Edge", "msedge")
-        return any(n.lower() in out.lower() for n in names)
-    except Exception:
-        return False
-
-
-def _open_chrome_inspect():
-    """Open chrome://inspect/#remote-debugging so the user can tick the checkbox."""
-    import platform, subprocess, webbrowser
-    url = "chrome://inspect/#remote-debugging"
-    if platform.system() == "Darwin":
-        try:
-            subprocess.run([
-                "osascript",
-                "-e", 'tell application "Google Chrome" to activate',
-                "-e", f'tell application "Google Chrome" to open location "{url}"',
-            ], timeout=5, check=False)
-            return
-        except Exception:
-            pass
-    try:
-        webbrowser.open(url, new=2)
-    except Exception:
-        pass
-
-
-def run_doctor():
-    """Read-only diagnostics. Exit 0 iff everything looks healthy."""
-    import platform, shutil, sys
-    cur = _version()
-    mode = _install_mode()
-    chrome = _chrome_running()
-    daemon = daemon_alive()
-    connections = browser_connections()
-    profile_use = shutil.which("profile-use") is not None
-    api_key = bool(os.environ.get("BROWSER_USE_API_KEY"))
-    latest = _latest_release_tag()
-    # Only claim an update when we know the installed version — `cur or "(unknown)"`
-    # for display would otherwise be parsed as (0,) and flag every latest as newer.
-    newer = bool(cur and latest and _version_tuple(latest) > _version_tuple(cur))
-    cur_display = cur or "(unknown)"
-
-    def row(label, ok, detail=""):
-        mark = "ok  " if ok else "FAIL"
-        print(f"  [{mark}] {label}{(' — ' + detail) if detail else ''}")
-
-    print("browser-harness doctor")
-    print(f"  platform {platform.system()} {platform.release()}")
-    print(f"  python {sys.version.split()[0]}")
-    print(f"  version {cur_display} ({mode})")
-    if latest:
-        print(f"  latest release {latest}" + (" (update available)" if newer else ""))
-    else:
-        print("  latest release (could not reach github)")
-    row("chrome running", chrome, "" if chrome else "start chrome/edge")
-    row("daemon alive", daemon, "" if daemon else "see install.md")
-    row("active browser connections", bool(connections), str(len(connections)))
-    for conn in connections:
-        page = conn.get("page")
-        if page:
-            title = _doctor_short_text(page["title"])
-            url = _doctor_short_text(page["url"])
-            print(f"    {conn['name']} — active page: {title} — {url}")
-        else:
-            print(f"    {conn['name']} — active page: (no real page)")
-    row("profile-use installed", profile_use, "" if profile_use else "optional: curl -fsSL https://browser-use.com/profile.sh | sh")
-    row("BROWSER_USE_API_KEY set", api_key, "" if api_key else "optional: needed only for cloud browsers / profile sync")
-    # Core health = chrome + daemon. Profile-use/api-key are optional.
-    return 0 if (chrome and daemon) else 1
-
-
-def _prompt_yes(question, default_yes=True, yes=False):
-    if yes:
-        return True
-    suffix = "[Y/n]" if default_yes else "[y/N]"
-    try:
-        ans = input(f"{question} {suffix} ").strip().lower()
-    except EOFError:
-        return default_yes
-    if not ans:
-        return default_yes
-    return ans.startswith("y")
-
-
-def run_update(yes=False):
-    """Pull the latest version and (after prompt) restart the daemon so it picks up changed code.
-
-    Exit 0 on success, non-zero on failure."""
-    import subprocess, sys
-    cur, latest, newer = check_for_update()
-    # Only short-circuit as "up to date" when we actually know the installed
-    # version. Otherwise `newer=False` just means "couldn't compare" — proceed.
-    if cur and latest and not newer:
-        print(f"browser-harness is up to date ({cur}).")
-        return 0
-    if cur and latest:
-        print(f"updating browser-harness: {cur} -> {latest}")
-    elif latest:
-        print(f"installed version unknown; will try to update to {latest}.")
-    else:
-        print("could not reach github; will try to update anyway.")
-
-    mode = _install_mode()
-    if mode == "git":
-        repo = _repo_dir()
-        status = subprocess.run(["git", "-C", str(repo), "status", "--porcelain"], capture_output=True, text=True)
-        if status.returncode != 0:
-            print(f"git status failed: {status.stderr.strip()}", file=sys.stderr)
-            return 1
-        if status.stdout.strip():
-            print(f"refusing to update: uncommitted changes in {repo}", file=sys.stderr)
-            print("commit or stash them first, or run `git -C %s pull` yourself." % repo, file=sys.stderr)
-            return 1
-        r = subprocess.run(["git", "-C", str(repo), "pull", "--ff-only"])
-        if r.returncode != 0:
-            return r.returncode
-    elif mode == "pypi":
-        tool_upgrade = subprocess.run(["uv", "tool", "upgrade", "browser-harness"])
-        if tool_upgrade.returncode != 0:
-            # Fall back to pip in case this wasn't a `uv tool install`.
- pip = subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", "browser-harness"]) - if pip.returncode != 0: - return pip.returncode - else: - print("unknown install mode; can't auto-update.", file=sys.stderr) - return 1 - - # Invalidate banner/tag cache so the new version doesn't keep nagging. - cache = _cache_read() - cache.pop("banner_shown_on", None) - _cache_write(cache) - - if daemon_alive(): - if _prompt_yes("restart the running daemon so it picks up the new code?", default_yes=True, yes=yes): - restart_daemon() - print("daemon stopped; it will auto-restart on next `browser-harness` call.") - else: - print("daemon left running on old code. run `browser-harness` and it'll use the new code after the daemon recycles.") - print("update complete.") - return 0 diff --git a/packages/bcode-browser/harness/src/browser_harness/daemon.py b/packages/bcode-browser/harness/src/browser_harness/daemon.py deleted file mode 100644 index 077183e736..0000000000 --- a/packages/bcode-browser/harness/src/browser_harness/daemon.py +++ /dev/null @@ -1,420 +0,0 @@ -"""CDP WS holder + IPC relay (Unix socket on POSIX, TCP loopback on Windows). One daemon per BU_NAME.""" -import asyncio, json, os, socket, sys, time, urllib.error, urllib.request -from urllib.parse import urlparse -from collections import deque -from pathlib import Path - -from . 
import _ipc as ipc -from cdp_use.client import CDPClient - - -def _load_env(): - repo_root = Path(__file__).resolve().parents[2] - workspace = Path(os.environ.get("BH_AGENT_WORKSPACE", repo_root / "agent-workspace")).expanduser() - for p in (repo_root / ".env", workspace / ".env"): - if not p.exists(): - continue - _load_env_file(p) - - -def _load_env_file(p): - for line in p.read_text().splitlines(): - line = line.strip() - if not line or line.startswith("#") or "=" not in line: - continue - k, v = line.split("=", 1) - os.environ.setdefault(k.strip(), v.strip().strip('"').strip("'")) - - -_load_env() - -NAME = os.environ.get("BU_NAME", "default") -SOCK = ipc.sock_addr(NAME) -LOG = str(ipc.log_path(NAME)) -PID = str(ipc.pid_path(NAME)) -BUF = 500 -PROFILES = [ - Path.home() / "Library/Application Support/Google/Chrome", - Path.home() / "Library/Application Support/Google/Chrome Canary", - Path.home() / "Library/Application Support/Comet", - Path.home() / "Library/Application Support/Arc/User Data", - Path.home() / "Library/Application Support/Dia/User Data", - Path.home() / "Library/Application Support/Microsoft Edge", - Path.home() / "Library/Application Support/Microsoft Edge Beta", - Path.home() / "Library/Application Support/Microsoft Edge Dev", - Path.home() / "Library/Application Support/Microsoft Edge Canary", - Path.home() / "Library/Application Support/BraveSoftware/Brave-Browser", - Path.home() / ".config/google-chrome", - Path.home() / ".config/chromium", - Path.home() / ".config/chromium-browser", - Path.home() / ".config/microsoft-edge", - Path.home() / ".config/microsoft-edge-beta", - Path.home() / ".config/microsoft-edge-dev", - Path.home() / ".var/app/org.chromium.Chromium/config/chromium", - Path.home() / ".var/app/com.google.Chrome/config/google-chrome", - Path.home() / ".var/app/com.brave.Browser/config/BraveSoftware/Brave-Browser", - Path.home() / ".var/app/com.microsoft.Edge/config/microsoft-edge", - Path.home() / 
"AppData/Local/Google/Chrome/User Data", - Path.home() / "AppData/Local/Google/Chrome SxS/User Data", - Path.home() / "AppData/Local/Chromium/User Data", - Path.home() / "AppData/Local/Microsoft/Edge/User Data", - Path.home() / "AppData/Local/Microsoft/Edge Beta/User Data", - Path.home() / "AppData/Local/Microsoft/Edge Dev/User Data", - Path.home() / "AppData/Local/Microsoft/Edge SxS/User Data", - Path.home() / "AppData/Local/BraveSoftware/Brave-Browser/User Data", -] -INTERNAL = ("chrome://", "chrome-untrusted://", "devtools://", "chrome-extension://", "about:") -BU_API = "https://api.browser-use.com/api/v3" -REMOTE_ID = os.environ.get("BU_BROWSER_ID") -API_KEY = os.environ.get("BROWSER_USE_API_KEY") - - -def log(msg): - open(LOG, "a").write(f"{msg}\n") - - -async def _silent(coro): - try: - await coro - except Exception: - pass - - -def _ws_from_devtools_active_port(http_url: str) -> str | None: - """When /json/version returns 404 (Chrome 147+ default profile), match DevToolsActivePort by port.""" - p = urlparse(http_url) - want_port = str(p.port) if p.port else "" - if not want_port: - return None - host = p.hostname or "127.0.0.1" - if ":" in host: # urlparse strips IPv6 brackets; restore them for the ws:// URL - host = f"[{host}]" - for base in PROFILES: - try: - active = (base / "DevToolsActivePort").read_text().splitlines() - except (FileNotFoundError, NotADirectoryError): - continue - port = active[0].strip() if active else "" - ws_path = active[1].strip() if len(active) > 1 else "" - if port == want_port and ws_path: - return f"ws://{host}:{port}{ws_path}" - return None - - -def get_ws_url(): - if url := os.environ.get("BU_CDP_WS"): - return url - if url := os.environ.get("BU_CDP_URL"): - # HTTP DevTools endpoint (e.g. http://127.0.0.1:9333) — resolve to ws via /json/version. - # Use this for a dedicated automation Chrome on a non-default profile, which avoids the - # M144 "Allow remote debugging" dialog and the M136 default-profile lockdown. 
- deadline = time.time() + 30 - last_err = None - base_url = url.rstrip("/") - while time.time() < deadline: - try: - return json.loads(urllib.request.urlopen(f"{base_url}/json/version", timeout=5).read())["webSocketDebuggerUrl"] - except urllib.error.HTTPError as e: - last_err = e - if e.code == 404 and (ws := _ws_from_devtools_active_port(url)): - return ws - time.sleep(1) - except Exception as e: - last_err = e - time.sleep(1) - raise RuntimeError(f"BU_CDP_URL={url} unreachable after 30s: {last_err} -- is the dedicated automation Chrome running?") - for base in PROFILES: - try: - active = (base / "DevToolsActivePort").read_text().splitlines() - except (FileNotFoundError, NotADirectoryError): - continue - port = active[0].strip() if active else "" - ws_path = active[1].strip() if len(active) > 1 else "" - if not port: - continue - # Resolve the live WS URL via /json/version instead of trusting the path stored - # alongside the port in DevToolsActivePort: if Chrome was previously launched - # with a different --user-data-dir on the same port, that file is left behind - # with a stale browser UUID and the WS upgrade returns 404. - deadline = time.time() + 30 - while time.time() < deadline: - try: - return json.loads(urllib.request.urlopen(f"http://127.0.0.1:{port}/json/version", timeout=1).read())["webSocketDebuggerUrl"] - except urllib.error.HTTPError as e: - # Chrome 147+ disables /json/* HTTP discovery on the default user-data-dir; - # the ws path Chrome wrote to DevToolsActivePort still works. 
- if e.code == 404 and ws_path: - return f"ws://127.0.0.1:{port}{ws_path}" - time.sleep(1) - except (OSError, KeyError, ValueError): - time.sleep(1) - raise RuntimeError( - f"Chrome's remote-debugging page is open, but DevTools is not live yet on 127.0.0.1:{port} — if Chrome opened a profile picker, choose your normal profile first, then tick the checkbox and click Allow if shown" - ) - for probe_port in (9222, 9223): - try: - with urllib.request.urlopen(f"http://127.0.0.1:{probe_port}/json/version", timeout=1) as r: - return json.loads(r.read())["webSocketDebuggerUrl"] - except (OSError, KeyError, ValueError): - continue - raise RuntimeError(f"DevToolsActivePort not found in {[str(p) for p in PROFILES]} — enable chrome://inspect/#remote-debugging, or set BU_CDP_WS for a remote browser") - - -def stop_remote(): - if not REMOTE_ID or not API_KEY: return - try: - req = urllib.request.Request( - f"{BU_API}/browsers/{REMOTE_ID}", - data=json.dumps({"action": "stop"}).encode(), - method="PATCH", - headers={"X-Browser-Use-API-Key": API_KEY, "Content-Type": "application/json"}, - ) - urllib.request.urlopen(req, timeout=15).read() - log(f"stopped remote browser {REMOTE_ID}") - except Exception as e: - log(f"stop_remote failed ({REMOTE_ID}): {e}") - - -def is_real_page(t): - return t["type"] == "page" and not t.get("url", "").startswith(INTERNAL) - - -class Daemon: - def __init__(self): - self.cdp = None - self.session = None - self.target_id = None - self.events = deque(maxlen=BUF) - self.dialog = None - self.stop = None # asyncio.Event, set inside start() - - async def attach_first_page(self): - """Attach to a real page (or any page). Sets self.session. 
Returns attached target or None.""" - targets = (await self.cdp.send_raw("Target.getTargets"))["targetInfos"] - pages = [t for t in targets if is_real_page(t)] - if not pages: - # No real pages — create one instead of attaching to omnibox popup - tid = (await self.cdp.send_raw("Target.createTarget", {"url": "about:blank"}))["targetId"] - log(f"no real pages found, created about:blank ({tid})") - pages = [{"targetId": tid, "url": "about:blank", "type": "page"}] - self.session = (await self.cdp.send_raw( - "Target.attachToTarget", {"targetId": pages[0]["targetId"], "flatten": True} - ))["sessionId"] - self.target_id = pages[0]["targetId"] - log(f"attached {pages[0]['targetId']} ({pages[0].get('url','')[:80]}) session={self.session}") - await self._enable_default_domains(self.session) - return pages[0] - - async def _enable_default_domains(self, session_id): - """Enable Page/DOM/Runtime/Network on a CDP session. - - Used by both initial attach and set_session (called after switch_tab/ - new_tab). Without this, helpers that depend on Network.* events — - notably wait_for_network_idle() — silently stop receiving events - after a tab switch, because each fresh CDP session starts with all - domains disabled. - - Runs the four enables in parallel via gather so the worst-case time is - bounded by a single CDP round trip rather than four sequential ones — - important on the set_session path, where the helper's IPC socket has - a 5s read timeout. 
- """ - async def enable_one(d): - try: - await asyncio.wait_for( - self.cdp.send_raw(f"{d}.enable", session_id=session_id), - timeout=4, - ) - except Exception as e: - log(f"enable {d} on {session_id}: {e}") - await asyncio.gather(*(enable_one(d) for d in ("Page", "DOM", "Runtime", "Network"))) - - async def start(self): - self.stop = asyncio.Event() - url = get_ws_url() - log(f"connecting to {url}") - self.cdp = CDPClient(url) - try: - await self.cdp.start() - except Exception as e: - if os.environ.get("BU_CDP_WS"): - raise RuntimeError( - f"CDP WS handshake failed: {e} -- remote browser WebSocket connection failed. " - "This can happen when network policy blocks the connection, the WS URL is wrong or expired, or the remote endpoint is down. " - "If you use Browser Use cloud, verify BROWSER_USE_API_KEY and get a fresh URL via start_remote_daemon()." - ) - raise RuntimeError(f"CDP WS handshake failed: {e} -- click Allow in Chrome if prompted, then retry") - await self.attach_first_page() - orig = self.cdp._event_registry.handle_event - mark_js = "if(!document.title.startsWith('\U0001F7E2'))document.title='\U0001F7E2 '+document.title" - async def tap(method, params, session_id=None): - self.events.append({"method": method, "params": params, "session_id": session_id}) - if method == "Page.javascriptDialogOpening": - self.dialog = params - elif method == "Page.javascriptDialogClosed": - self.dialog = None - elif method in ("Page.loadEventFired", "Page.domContentEventFired"): - asyncio.create_task(_silent(asyncio.wait_for(self.cdp.send_raw("Runtime.evaluate", {"expression": mark_js}, session_id=self.session), timeout=2))) - return await orig(method, params, session_id) - self.cdp._event_registry.handle_event = tap - - async def handle(self, req): - # Token guard for Windows TCP loopback: any local process can otherwise - # connect and issue CDP commands. expected_token() is None on POSIX so - # this check is a no-op there (AF_UNIX + chmod 600 is the boundary). 
- expected = ipc.expected_token() - if expected is not None and req.get("token") != expected: - return {"error": "unauthorized"} - meta = req.get("meta") - # Liveness probe — lets clients confirm the listener is actually this - # daemon and not an unrelated process that reused our port post-crash. - # `pid` lets restart_daemon() verify the live daemon's identity before - # signaling — protects against SIGTERM-by-stale-pid-file after PID reuse. - if meta == "ping": return {"pong": True, "pid": os.getpid()} - if meta == "drain_events": - out = list(self.events); self.events.clear() - return {"events": out} - if meta == "session": return {"session_id": self.session} - if meta == "current_tab": - # Resolve the attached page's target info server-side. Helpers can't - # send Target.getTargetInfo themselves: daemon strips session_id for - # any Target.* method (browser-level call), and without a targetId - # Chrome silently returns the *browser* target. - if not self.target_id: - return {"error": "not_attached"} - try: - info = (await self.cdp.send_raw("Target.getTargetInfo", {"targetId": self.target_id}))["targetInfo"] - except Exception: - return {"error": "cdp_disconnected"} - return {"targetId": info.get("targetId"), "url": info.get("url", ""), "title": info.get("title", "")} - if meta == "connection_status": - if not self.target_id: - return {"error": "not_attached"} - try: - info = (await self.cdp.send_raw("Target.getTargetInfo", {"targetId": self.target_id}))["targetInfo"] - except Exception: - return {"error": "cdp_disconnected"} - page = None - if is_real_page(info): - page = { - "targetId": info.get("targetId"), - "title": info.get("title") or "(untitled)", - "url": info.get("url") or "", - } - return {"target_id": self.target_id, "session_id": self.session, "page": page} - if meta == "set_session": - old_session = self.session - self.session = req.get("session_id") - self.target_id = req.get("target_id") or self.target_id - # Run the old-session Network.disable 
(defense in depth — keeps - # background-tab traffic out of the global event buffer; the - # consumer-side filter in wait_for_network_idle is the actual - # correctness gate) in parallel with the four enables on the new - # session. Different sessions, independent CDP requests. Keeps - # the synchronous reply under the helper's 5s IPC read timeout - # even on a remote daemon — sequentially these would have stacked - # to ~22s worst case. - tasks = [] - if old_session and old_session != self.session: - async def disable_old(): - try: - await asyncio.wait_for( - self.cdp.send_raw("Network.disable", session_id=old_session), - timeout=2, - ) - except Exception: pass - tasks.append(disable_old()) - tasks.append(self._enable_default_domains(self.session)) - await asyncio.gather(*tasks) - # 🟢 tab-marker title prefix is purely cosmetic — fire-and-forget so - # it doesn't add to the synchronous IPC budget. - asyncio.create_task(_silent(asyncio.wait_for( - self.cdp.send_raw( - "Runtime.evaluate", - {"expression": "if(!document.title.startsWith('\U0001F7E2'))document.title='\U0001F7E2 '+document.title"}, - session_id=self.session, - ), - timeout=2, - ))) - return {"session_id": self.session} - if meta == "pending_dialog": return {"dialog": self.dialog} - if meta == "shutdown": self.stop.set(); return {"ok": True} - - method = req["method"] - params = req.get("params") or {} - # Browser-level Target.* calls must not use a session (stale or otherwise). - # For everything else, explicit session in req wins; else default. 
- sid = None if method.startswith("Target.") else (req.get("session_id") or self.session) - try: - return {"result": await self.cdp.send_raw(method, params, session_id=sid)} - except Exception as e: - msg = str(e) - if "Session with given id not found" in msg and sid == self.session and sid: - log(f"stale session {sid}, re-attaching") - if await self.attach_first_page(): - return {"result": await self.cdp.send_raw(method, params, session_id=self.session)} - return {"error": msg} - - -async def serve(d): - async def handler(reader, writer): - try: - line = await reader.readline() - if not line: return - resp = await d.handle(json.loads(line)) - writer.write((json.dumps(resp, default=str) + "\n").encode()) - await writer.drain() - except Exception as e: - log(f"conn: {e}") - try: - writer.write((json.dumps({"error": str(e)}) + "\n").encode()) - await writer.drain() - except Exception: - pass - finally: - writer.close() - - serve_task = asyncio.create_task(ipc.serve(NAME, handler)) - stop_task = asyncio.create_task(d.stop.wait()) - await asyncio.sleep(0.05) # let serve() bind so sock_addr() resolves to the live endpoint - log(f"listening on {ipc.sock_addr(NAME)} (name={NAME}, remote={REMOTE_ID or 'local'})") - try: - await asyncio.wait({serve_task, stop_task}, return_when=asyncio.FIRST_COMPLETED) - if serve_task.done(): await serve_task # surfaces a serve crash - finally: - for t in (serve_task, stop_task): - t.cancel() - try: await t - except (asyncio.CancelledError, Exception): pass - ipc.cleanup_endpoint(NAME) - - -async def main(): - d = Daemon() - await d.start() - await serve(d) - - -def already_running(): - # Ping handshake (not a bare connect) so a stale .port file + port reuse - # after a daemon crash doesn't make us mistake an unrelated listener for ours. 
- return ipc.ping(NAME, timeout=1.0) - - -if __name__ == "__main__": - if already_running(): - print(f"daemon already running on {SOCK}", file=sys.stderr) - sys.exit(0) - open(LOG, "w").close() - open(PID, "w").write(str(os.getpid())) - try: - asyncio.run(main()) - except KeyboardInterrupt: - pass - except Exception as e: - log(f"fatal: {e}") - sys.exit(1) - finally: - stop_remote() - try: os.unlink(PID) - except FileNotFoundError: pass diff --git a/packages/bcode-browser/harness/src/browser_harness/helpers.py b/packages/bcode-browser/harness/src/browser_harness/helpers.py deleted file mode 100644 index 7e4cf13c1b..0000000000 --- a/packages/bcode-browser/harness/src/browser_harness/helpers.py +++ /dev/null @@ -1,493 +0,0 @@ -"""Browser control via CDP. - -Core helpers live here. Agent-editable helpers live in -BH_AGENT_WORKSPACE/agent_helpers.py. -""" -import base64, importlib.util, json, math, os, sys, time, urllib.request -from pathlib import Path -from urllib.parse import urlparse - -from . 
import _ipc as ipc - - -CORE_DIR = Path(__file__).resolve().parent -REPO_ROOT = CORE_DIR.parent.parent -AGENT_WORKSPACE = Path(os.environ.get("BH_AGENT_WORKSPACE", REPO_ROOT / "agent-workspace")).expanduser() - - -def _load_env(): - paths = [REPO_ROOT / ".env", AGENT_WORKSPACE / ".env"] - for p in paths: - if not p.exists(): - continue - _load_env_file(p) - - -def _load_env_file(p): - for line in p.read_text().splitlines(): - line = line.strip() - if not line or line.startswith("#") or "=" not in line: - continue - k, v = line.split("=", 1) - os.environ.setdefault(k.strip(), v.strip().strip('"').strip("'")) - - -_load_env() - -NAME = os.environ.get("BU_NAME", "default") -SOCK = ipc.sock_addr(NAME) -INTERNAL = ("chrome://", "chrome-untrusted://", "devtools://", "chrome-extension://", "about:") - - -def _send(req): - c, token = ipc.connect(NAME, timeout=5.0) - try: - r = ipc.request(c, token, req) - finally: - c.close() - if "error" in r: raise RuntimeError(r["error"]) - return r - - -def cdp(method, session_id=None, **params): - """Raw CDP. cdp('Page.navigate', url='...'), cdp('DOM.getDocument', depth=-1).""" - return _send({"method": method, "params": params, "session_id": session_id}).get("result", {}) - - -def drain_events(): return _send({"meta": "drain_events"})["events"] - - -def _js_snippet(expression, limit=160): - snippet = expression.strip().replace("\n", "\\n") - return snippet[:limit - 3] + "..." 
if len(snippet) > limit else snippet - - -def _js_exception_description(result, details): - desc = result.get("description") - exc = details.get("exception") if details else None - if not desc and isinstance(exc, dict): - desc = exc.get("description") - if desc is None and "value" in exc: - desc = str(exc["value"]) - if desc is None: - desc = exc.get("className") - if not desc and details: - desc = details.get("text") - return desc or "JavaScript evaluation failed" - - -def _decode_unserializable_js_value(value): - if value == "NaN": - return math.nan - if value == "Infinity": - return math.inf - if value == "-Infinity": - return -math.inf - if value == "-0": - return -0.0 - if value.endswith("n"): - return int(value[:-1]) - return value - - -def _runtime_value(response, expression): - result = response.get("result", {}) - details = response.get("exceptionDetails") - if details or result.get("subtype") == "error": - desc = _js_exception_description(result, details) - if details: - line = details.get("lineNumber") - col = details.get("columnNumber") - loc = f" at line {line}, column {col}" if line is not None and col is not None else "" - else: - loc = "" - raise RuntimeError(f"JavaScript evaluation failed{loc}: {desc}; expression: {_js_snippet(expression)}") - if "value" in result: - return result["value"] - if "unserializableValue" in result: - return _decode_unserializable_js_value(result["unserializableValue"]) - return None - - -def _runtime_evaluate(expression, session_id=None, await_promise=False): - try: - r = cdp("Runtime.evaluate", session_id=session_id, expression=expression, returnByValue=True, awaitPromise=await_promise) - except TimeoutError as e: - raise RuntimeError(f"Runtime.evaluate timed out; expression: {_js_snippet(expression)}") from e - return _runtime_value(r, expression) - - -def _has_return_statement(expression): - i = 0 - n = len(expression) - state = "code" - quote = "" - while i < n: - ch = expression[i] - nxt = expression[i + 1] if i + 
1 < n else "" - if state == "code": - if ch in ("'", '"', "`"): - state = "string"; quote = ch; i += 1; continue - if ch == "/" and nxt == "/": - state = "line_comment"; i += 2; continue - if ch == "/" and nxt == "*": - state = "block_comment"; i += 2; continue - if expression.startswith("return", i): - before = expression[i - 1] if i > 0 else "" - after = expression[i + 6] if i + 6 < n else "" - if not (before == "_" or before.isalnum()) and not (after == "_" or after.isalnum()): - return True - i += 1; continue - if state == "line_comment": - if ch == "\n": - state = "code" - i += 1; continue - if state == "block_comment": - if ch == "*" and nxt == "/": - state = "code"; i += 2; continue - i += 1; continue - if state == "string": - if ch == "\\": - i += 2; continue - if ch == quote: - state = "code"; quote = "" - i += 1; continue - return False - - -# --- navigation / page --- -def goto_url(url): - r = cdp("Page.navigate", url=url) - if os.environ.get("BH_DOMAIN_SKILLS") != "1": - return r - d = (AGENT_WORKSPACE / "domain-skills" / (urlparse(url).hostname or "").removeprefix("www.").split(".")[0]) - return {**r, "domain_skills": sorted(p.name for p in d.rglob("*.md"))[:10]} if d.is_dir() else r - -def page_info(): - """{url, title, w, h, sx, sy, pw, ph} — viewport + scroll + page size. 
- - If a native dialog (alert/confirm/prompt/beforeunload) is open, returns - {dialog: {type, message, ...}} instead — the page's JS thread is frozen - until the dialog is handled (see interaction-skills/dialogs.md).""" - dialog = _send({"meta": "pending_dialog"}).get("dialog") - if dialog: - return {"dialog": dialog} - expression = "JSON.stringify({url:location.href,title:document.title,w:innerWidth,h:innerHeight,sx:scrollX,sy:scrollY,pw:document.documentElement.scrollWidth,ph:document.documentElement.scrollHeight})" - return json.loads(_runtime_evaluate(expression)) - -# --- input --- -_debug_click_counter = 0 - -def click_at_xy(x, y, button="left", clicks=1): - if os.environ.get("BH_DEBUG_CLICKS"): - global _debug_click_counter - try: - from PIL import Image, ImageDraw - dpr = js("window.devicePixelRatio") or 1 - path = capture_screenshot(str(ipc._TMP / f"debug_click_{_debug_click_counter}.png")) - img = Image.open(path) - draw = ImageDraw.Draw(img) - px, py = int(x * dpr), int(y * dpr) - r = int(15 * dpr) - draw.ellipse([px - r, py - r, px + r, py + r], outline="red", width=int(3 * dpr)) - draw.line([px - r - int(5 * dpr), py, px + r + int(5 * dpr), py], fill="red", width=int(2 * dpr)) - draw.line([px, py - r - int(5 * dpr), px, py + r + int(5 * dpr)], fill="red", width=int(2 * dpr)) - img.save(path) - print(f"[debug_click] saved {path} (x={x}, y={y}, dpr={dpr})") - except Exception as e: - print(f"[debug_click] overlay failed: {e}") - _debug_click_counter += 1 - cdp("Input.dispatchMouseEvent", type="mousePressed", x=x, y=y, button=button, clickCount=clicks) - cdp("Input.dispatchMouseEvent", type="mouseReleased", x=x, y=y, button=button, clickCount=clicks) - -def type_text(text): - cdp("Input.insertText", text=text) - -def fill_input(selector, text, clear_first=True, timeout=0.0): - """Fill a framework-managed input (React controlled, Vue v-model, Ember tracked). 
- - type_text() uses Input.insertText which bypasses framework event listeners and leaves - submit buttons disabled. This helper focuses the element, clears it, types via real - key events, then fires synthetic input+change events so the framework sees the update. - - Raises RuntimeError if the element is not found. Pass timeout>0 to wait for - late-rendered elements (e.g. after a route change) before typing. - """ - if timeout > 0: - if not wait_for_element(selector, timeout=timeout): - raise RuntimeError(f"fill_input: element not found: {selector!r}") - focused = js( - f"(()=>{{const e=document.querySelector({json.dumps(selector)});" - f"if(!e)return false;e.focus();return true;}})()" - ) - if not focused: - raise RuntimeError(f"fill_input: element not found: {selector!r}") - if clear_first: - # Dispatch select-all directly — NOT via press_key, which always emits a - # `char` event for single-char keys. With Ctrl/Cmd held, that `char` - # makes Chrome treat the input as a printable "a" instead of firing the - # select-all shortcut, leaving the field uncleared. 
- mods = 4 if sys.platform == "darwin" else 2 # Cmd on macOS, Ctrl elsewhere - select_all = {"key": "a", "code": "KeyA", "modifiers": mods, - "windowsVirtualKeyCode": 65, "nativeVirtualKeyCode": 65} - cdp("Input.dispatchKeyEvent", type="rawKeyDown", **select_all) - cdp("Input.dispatchKeyEvent", type="keyUp", **select_all) - press_key("Backspace") - for ch in text: - press_key(ch) - js( - f"(()=>{{const e=document.querySelector({json.dumps(selector)});" - f"if(!e)return;" - f"e.dispatchEvent(new Event('input',{{bubbles:true}}));" - f"e.dispatchEvent(new Event('change',{{bubbles:true}}));}})();" - ) - -_KEYS = { # key → (windowsVirtualKeyCode, code, text) - "Enter": (13, "Enter", "\r"), "Tab": (9, "Tab", "\t"), "Backspace": (8, "Backspace", ""), - "Escape": (27, "Escape", ""), "Delete": (46, "Delete", ""), " ": (32, "Space", " "), - "ArrowLeft": (37, "ArrowLeft", ""), "ArrowUp": (38, "ArrowUp", ""), - "ArrowRight": (39, "ArrowRight", ""), "ArrowDown": (40, "ArrowDown", ""), - "Home": (36, "Home", ""), "End": (35, "End", ""), - "PageUp": (33, "PageUp", ""), "PageDown": (34, "PageDown", ""), -} -def press_key(key, modifiers=0): - """Modifiers bitfield: 1=Alt, 2=Ctrl, 4=Meta(Cmd), 8=Shift. - Special keys (Enter, Tab, Arrow*, Backspace, etc.) 
carry their virtual key codes - so listeners checking e.keyCode / e.key all fire.""" - vk, code, text = _KEYS.get(key, (ord(key[0]) if len(key) == 1 else 0, key, key if len(key) == 1 else "")) - base = {"key": key, "code": code, "modifiers": modifiers, "windowsVirtualKeyCode": vk, "nativeVirtualKeyCode": vk} - cdp("Input.dispatchKeyEvent", type="keyDown", **base, **({"text": text} if text else {})) - if text and len(text) == 1: - cdp("Input.dispatchKeyEvent", type="char", text=text, **{k: v for k, v in base.items() if k != "text"}) - cdp("Input.dispatchKeyEvent", type="keyUp", **base) - -def scroll(x, y, dy=-300, dx=0): - cdp("Input.dispatchMouseEvent", type="mouseWheel", x=x, y=y, deltaX=dx, deltaY=dy) - - -# --- visual --- -def capture_screenshot(path=None, full=False, max_dim=None): - """Save a PNG of the current viewport. Set max_dim=1800 on a 2× display to - keep the file under the 2000px-per-side limit some image-aware LLMs enforce.""" - path = path or str(ipc._TMP / "shot.png") - r = cdp("Page.captureScreenshot", format="png", captureBeyondViewport=full) - open(path, "wb").write(base64.b64decode(r["data"])) - if max_dim: - from PIL import Image - img = Image.open(path) - if max(img.size) > max_dim: - img.thumbnail((max_dim, max_dim)) - img.save(path) - return path - - -# --- tabs --- -def list_tabs(include_chrome=True): - out = [] - for t in cdp("Target.getTargets")["targetInfos"]: - if t["type"] != "page": continue - url = t.get("url", "") - if not include_chrome and url.startswith(INTERNAL): continue - out.append({"targetId": t["targetId"], "title": t.get("title", ""), "url": url}) - return out - -def current_tab(): - r = _send({"meta": "current_tab"}) - return {"targetId": r["targetId"], "url": r["url"], "title": r["title"]} - -def _mark_tab(): - """Prepend 🟢 to tab title so the user can see which tab the agent controls.""" - try: cdp("Runtime.evaluate", expression="if(!document.title.startsWith('\U0001F7E2'))document.title='\U0001F7E2 '+document.title") 
- except Exception: pass - -def switch_tab(target): - # Accept either a raw targetId string or the dict returned by current_tab() / list_tabs(), - # so `switch_tab(current_tab())` works without a manual ["targetId"] dance. - target_id = target.get("targetId") if isinstance(target, dict) else target - # Unmark old tab - try: cdp("Runtime.evaluate", expression="if(document.title.startsWith('\U0001F7E2 '))document.title=document.title.slice(2)") - except Exception: pass - cdp("Target.activateTarget", targetId=target_id) - sid = cdp("Target.attachToTarget", targetId=target_id, flatten=True)["sessionId"] - _send({"meta": "set_session", "session_id": sid, "target_id": target_id}) - _mark_tab() - return sid - -def new_tab(url="about:blank"): - # Always create blank, then goto: passing url to createTarget races with - # attach, so the brief about:blank is "complete" by the time the caller - # polls and wait_for_load() returns before navigation actually starts. - tid = cdp("Target.createTarget", url="about:blank")["targetId"] - switch_tab(tid) - if url != "about:blank": - goto_url(url) - return tid - -def ensure_real_tab(): - """Switch to a real user tab if current is chrome:// / internal / stale.""" - tabs = list_tabs(include_chrome=False) - if not tabs: - return None - try: - cur = current_tab() - if cur["url"] and not cur["url"].startswith(INTERNAL): - return cur - except Exception: - pass - switch_tab(tabs[0]["targetId"]) - return tabs[0] - -def iframe_target(url_substr): - """First iframe target whose URL contains `url_substr`. 
Use with js(..., target_id=...).""" - for t in cdp("Target.getTargets")["targetInfos"]: - if t["type"] == "iframe" and url_substr in t.get("url", ""): - return t["targetId"] - return None - - -# --- utility --- -def wait(seconds=1.0): - time.sleep(seconds) - -def wait_for_load(timeout=15.0): - """Poll document.readyState == 'complete' or timeout.""" - deadline = time.time() + timeout - while time.time() < deadline: - if js("document.readyState") == "complete": return True - time.sleep(0.3) - return False - -def wait_for_element(selector, timeout=10.0, visible=False): - """Poll until querySelector(selector) exists in the DOM, or timeout. - - wait_for_load() misses SPAs — the document is 'complete' before the framework renders. - Use this after actions that trigger async rendering (route changes, data fetches). - Set visible=True to also require the element to be non-hidden and in-layout. - Returns True if found, False on timeout. - """ - if visible: - # checkVisibility walks the ancestor chain and respects display:none / - # visibility:hidden / opacity:0 on parents, which a getComputedStyle - # check on the element alone misses (it returns the descendant's own - # style, not the inherited "is this rendered" state). Falls back to - # the per-element CSS check on older Chrome that lacks checkVisibility. 
- check = ( - f"(()=>{{const e=document.querySelector({json.dumps(selector)});" - f"if(!e)return false;" - f"if(typeof e.checkVisibility==='function')" - f"return e.checkVisibility({{checkOpacity:true,checkVisibilityCSS:true}});" - f"const s=getComputedStyle(e);" - f"return s.display!=='none'&&s.visibility!=='hidden'&&s.opacity!=='0'}})()" - ) - else: - check = f"!!document.querySelector({json.dumps(selector)})" - deadline = time.time() + timeout - while time.time() < deadline: - if js(check): return True - time.sleep(0.3) - return False - -def wait_for_network_idle(timeout=10.0, idle_ms=500): - """Wait until all in-flight requests finish and no Network.* events arrive for idle_ms ms. - - Useful after form submits, SPA route transitions, and any action that triggers - XHR/fetch without a visible DOM change. Builds on drain_events() — no daemon changes. - Returns True if idle window reached, False on timeout. - - Events are filtered to the active session — a previously-attached background - tab (e.g. a polling/SSE page the agent switched away from) keeps emitting - Network events into the daemon's global event buffer; without this filter - they would poison the idle check on the current tab. 
- """ - deadline = time.time() + timeout - last_activity = time.time() - inflight = set() - active_session = _send({"meta": "session"}).get("session_id") - while time.time() < deadline: - for e in drain_events(): - if e.get("session_id") != active_session: - continue - method = e.get("method", "") - params = e.get("params", {}) - if method == "Network.requestWillBeSent": - inflight.add(params.get("requestId")) - last_activity = time.time() - elif method in ("Network.loadingFinished", "Network.loadingFailed"): - inflight.discard(params.get("requestId")) - last_activity = time.time() - elif method.startswith("Network."): - last_activity = time.time() - if not inflight and (time.time() - last_activity) * 1000 >= idle_ms: - return True - time.sleep(0.1) - return False - -def js(expression, target_id=None): - """Run JS in the attached tab (default) or inside an iframe target (via iframe_target()). - - Expressions with top-level `return` are automatically wrapped in an IIFE, so both - `document.title` and `const x = 1; return x` are valid inputs. - """ - sid = cdp("Target.attachToTarget", targetId=target_id, flatten=True)["sessionId"] if target_id else None - if _has_return_statement(expression) and not expression.strip().startswith("("): - expression = f"(function(){{{expression}}})()" - return _runtime_evaluate(expression, session_id=sid, await_promise=True) - - -_KC = {"Enter": 13, "Tab": 9, "Escape": 27, "Backspace": 8, " ": 32, "ArrowLeft": 37, "ArrowUp": 38, "ArrowRight": 39, "ArrowDown": 40} - - -def dispatch_key(selector, key="Enter", event="keypress"): - """Dispatch a DOM KeyboardEvent on the matched element. - - Use this when a site reacts to synthetic DOM key events on an element more reliably - than to raw CDP input events. 
- """ - kc = _KC.get(key, ord(key) if len(key) == 1 else 0) - js( - f"(()=>{{const e=document.querySelector({json.dumps(selector)});if(e){{e.focus();e.dispatchEvent(new KeyboardEvent({json.dumps(event)},{{key:{json.dumps(key)},code:{json.dumps(key)},keyCode:{kc},which:{kc},bubbles:true}}));}}}})()" - ) - -def upload_file(selector, path): - """Set files on a file input via CDP DOM.setFileInputFiles. `path` is an absolute filepath (use tempfile.mkstemp if needed).""" - doc = cdp("DOM.getDocument", depth=-1) - nid = cdp("DOM.querySelector", nodeId=doc["root"]["nodeId"], selector=selector)["nodeId"] - if not nid: raise RuntimeError(f"no element for {selector}") - cdp("DOM.setFileInputFiles", files=[path] if isinstance(path, str) else list(path), nodeId=nid) - -def http_get(url, headers=None, timeout=20.0): - """Pure HTTP — no browser. Use for static pages / APIs. Wrap in ThreadPoolExecutor for bulk. - - When BROWSER_USE_API_KEY is set, routes through the fetch-use proxy (handles bot - detection, residential proxies, retries). 
Falls back to local urllib otherwise.""" - if os.environ.get("BROWSER_USE_API_KEY"): - try: - from fetch_use import fetch_sync - return fetch_sync(url, headers=headers, timeout_ms=int(timeout * 1000)).text - except ImportError: - pass - import gzip - h = {"User-Agent": "Mozilla/5.0", "Accept-Encoding": "gzip"} - if headers: h.update(headers) - with urllib.request.urlopen(urllib.request.Request(url, headers=h), timeout=timeout) as r: - data = r.read() - if r.headers.get("Content-Encoding") == "gzip": data = gzip.decompress(data) - return data.decode() - - -def _load_agent_helpers(): - p = AGENT_WORKSPACE / "agent_helpers.py" - if not p.exists(): - return - spec = importlib.util.spec_from_file_location("browser_harness_agent_helpers", p) - if not spec or not spec.loader: - return - module = importlib.util.module_from_spec(spec) - spec.loader.exec_module(module) - for name, value in vars(module).items(): - if name.startswith("_"): - continue - globals()[name] = value - - -_load_agent_helpers() diff --git a/packages/bcode-browser/harness/src/browser_harness/run.py b/packages/bcode-browser/harness/src/browser_harness/run.py deleted file mode 100644 index 4828bad760..0000000000 --- a/packages/bcode-browser/harness/src/browser_harness/run.py +++ /dev/null @@ -1,110 +0,0 @@ -import os, sys, urllib.request - -# Windows default stdout encoding is cp1252, which can't encode the 🟢 marker -# helpers prepend to tab titles (or anything else outside Latin-1). Force UTF-8 -# so `print(page_info())` doesn't UnicodeEncodeError on Windows. Issue #124(4). 
-if hasattr(sys.stdout, "reconfigure"): - try: sys.stdout.reconfigure(encoding="utf-8", errors="replace") - except Exception: pass - -from .admin import ( - _version, - NAME, - daemon_alive, - ensure_daemon, - list_cloud_profiles, - list_local_profiles, - print_update_banner, - restart_daemon, - run_doctor, - run_update, - start_remote_daemon, - stop_remote_daemon, - sync_local_profile, -) -from .helpers import * - -HELP = """Browser Harness - -Read SKILL.md for the default workflow and examples. - -Typical usage: - browser-harness -c ' - ensure_real_tab() - print(page_info()) - ' - -Helpers are pre-imported. The daemon auto-starts and connects to the running browser. - -Commands: - browser-harness --version print the installed version - browser-harness --doctor diagnose install, daemon, and browser state - browser-harness --update [-y] pull the latest version (agents: pass -y) - browser-harness --reload stop the daemon so next call picks up code changes -""" - - -# Probe /json/version (not a bare TCP connect) so a non-Chrome process bound to -# 9222/9223 doesn't masquerade as Chrome and skip the cloud bootstrap. Mirrors -# daemon.py's fallback probe. -def _local_chrome_listening(): - for port in (9222, 9223): - try: - urllib.request.urlopen(f"http://127.0.0.1:{port}/json/version", timeout=0.3).close() - return True - except OSError: pass - return False - - -# BU_CDP_URL / BU_CDP_WS are documented to override local Chrome discovery -# (install.md:58-59), so they must also block cloud auto-bootstrap. Without this -# guard, start_remote_daemon() in admin.py overwrites BU_CDP_WS in the daemon -# env with a cloud WebSocket URL, silently replacing the user's explicit endpoint -# *and* billing them for a cloud browser they never asked for. 
-def _explicit_cdp_configured(): - return bool(os.environ.get("BU_CDP_URL") or os.environ.get("BU_CDP_WS")) - - -def main(): - args = sys.argv[1:] - if args and args[0] in {"-h", "--help"}: - print(HELP) - return - if args and args[0] == "--version": - print(_version() or "unknown") - return - if args and args[0] == "--doctor": - sys.exit(run_doctor()) - if args and args[0] == "--update": - yes = any(a in {"-y", "--yes"} for a in args[1:]) - sys.exit(run_update(yes=yes)) - if args and args[0] == "--reload": - restart_daemon() - print("daemon stopped — will restart fresh on next call") - return - if args and args[0] == "--debug-clicks": - os.environ["BH_DEBUG_CLICKS"] = "1" - args = args[1:] - if not args or args[0] != "-c": - sys.exit("Usage: browser-harness -c \"print(page_info())\"") - if len(args) < 2: - sys.exit("Usage: browser-harness -c \"print(page_info())\"") - print_update_banner() - # Auto-bootstrap a cloud browser is opt-in via BU_AUTOSPAWN — BROWSER_USE_API_KEY alone - # is not enough, since the key is commonly set for unrelated reasons (profile sync, - # cloud API calls, parent agents managing their own session). An explicit BU_CDP_URL - # or BU_CDP_WS also blocks the spawn so we honour the precedence install.md promises. 
- if ( - not daemon_alive() - and not _local_chrome_listening() - and not _explicit_cdp_configured() - and os.environ.get("BROWSER_USE_API_KEY") - and os.environ.get("BU_AUTOSPAWN") - ): - start_remote_daemon(NAME) - ensure_daemon() - exec(args[1], globals()) - - -if __name__ == "__main__": - main() diff --git a/packages/bcode-browser/harness/tests/__init__.py b/packages/bcode-browser/harness/tests/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/packages/bcode-browser/harness/tests/conftest.py b/packages/bcode-browser/harness/tests/conftest.py deleted file mode 100644 index b6109f4e18..0000000000 --- a/packages/bcode-browser/harness/tests/conftest.py +++ /dev/null @@ -1,16 +0,0 @@ -import base64 -import io - -import pytest -from PIL import Image - - -def make_png(width, height): - buf = io.BytesIO() - Image.new("RGB", (width, height), "white").save(buf, format="PNG") - return base64.b64encode(buf.getvalue()).decode() - - -@pytest.fixture -def fake_png(): - return make_png diff --git a/packages/bcode-browser/harness/tests/integration/__init__.py b/packages/bcode-browser/harness/tests/integration/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/packages/bcode-browser/harness/tests/integration/test_js.py b/packages/bcode-browser/harness/tests/integration/test_js.py deleted file mode 100644 index 86582e683e..0000000000 --- a/packages/bcode-browser/harness/tests/integration/test_js.py +++ /dev/null @@ -1,171 +0,0 @@ -from unittest.mock import patch - -import math - -import pytest - -from browser_harness import helpers - - -def _capture_cdp(): - captured = [] - def fake_cdp(method, **kwargs): - captured.append((method, kwargs)) - return {"result": {"value": None}} - return fake_cdp, captured - - -def _evaluated_expression(captured): - return next(kw["expression"] for m, kw in captured if m == "Runtime.evaluate") - - -def test_simple_expression_passes_through(): - fake_cdp, captured = _capture_cdp() - with 
patch("browser_harness.helpers.cdp", side_effect=fake_cdp): - helpers.js("document.title") - assert _evaluated_expression(captured) == "document.title" - - -def test_return_statement_gets_wrapped(): - fake_cdp, captured = _capture_cdp() - with patch("browser_harness.helpers.cdp", side_effect=fake_cdp): - helpers.js("const x = 1; return x") - assert _evaluated_expression(captured) == "(function(){const x = 1; return x})()" - - -def test_iife_with_internal_return_is_not_double_wrapped(): - fake_cdp, captured = _capture_cdp() - with patch("browser_harness.helpers.cdp", side_effect=fake_cdp): - helpers.js("(function(){ return document.title; })()") - assert _evaluated_expression(captured) == "(function(){ return document.title; })()" - - -def test_js_raises_on_syntax_error_exception_details(): - def fake_cdp(method, **kwargs): - return { - "result": { - "type": "object", - "subtype": "error", - "description": "SyntaxError: Invalid or unexpected token", - }, - "exceptionDetails": { - "text": "Uncaught", - "lineNumber": 1, - "columnNumber": 12, - }, - } - - with patch("browser_harness.helpers.cdp", side_effect=fake_cdp): - with pytest.raises(RuntimeError, match="SyntaxError"): - helpers.js('return "a\n\nb";') - - -def test_js_raises_on_runtime_error_exception_details(): - def fake_cdp(method, **kwargs): - return { - "result": { - "type": "object", - "subtype": "error", - "description": "ReferenceError: missing is not defined", - }, - "exceptionDetails": { - "text": "Uncaught", - "lineNumber": 0, - "columnNumber": 17, - }, - } - - with patch("browser_harness.helpers.cdp", side_effect=fake_cdp): - with pytest.raises(RuntimeError, match="ReferenceError"): - helpers.js("return missing.value") - - -def test_js_raises_on_error_result_without_exception_details(): - def fake_cdp(method, **kwargs): - return { - "result": { - "type": "object", - "subtype": "error", - "description": "Error: evaluation failed", - } - } - - with patch("browser_harness.helpers.cdp", 
side_effect=fake_cdp): - with pytest.raises(RuntimeError, match="evaluation failed"): - helpers.js("throw new Error('evaluation failed')") - - -def test_return_word_inside_string_does_not_trigger_wrapping(): - fake_cdp, captured = _capture_cdp() - expr = 'document.body.innerText.includes("return ")' - with patch("browser_harness.helpers.cdp", side_effect=fake_cdp): - helpers.js(expr) - assert _evaluated_expression(captured) == expr - - -def test_return_word_inside_comment_does_not_trigger_wrapping(): - fake_cdp, captured = _capture_cdp() - expr = "// return comment\n1 + 1" - with patch("browser_harness.helpers.cdp", side_effect=fake_cdp): - helpers.js(expr) - assert _evaluated_expression(captured) == expr - - -@pytest.mark.parametrize("expr", ["return\t1", "return\n1"]) -def test_top_level_return_with_whitespace_gets_wrapped(expr): - fake_cdp, captured = _capture_cdp() - with patch("browser_harness.helpers.cdp", side_effect=fake_cdp): - helpers.js(expr) - assert _evaluated_expression(captured) == f"(function(){{{expr}}})()" - - -@pytest.mark.parametrize( - ("unserializable", "expected"), - [ - ("NaN", math.nan), - ("Infinity", math.inf), - ("-Infinity", -math.inf), - ("-0", -0.0), - ("1n", 1), - ], -) -def test_js_returns_unserializable_values(unserializable, expected): - def fake_cdp(method, **kwargs): - return {"result": {"type": "number", "unserializableValue": unserializable, "description": unserializable}} - - with patch("browser_harness.helpers.cdp", side_effect=fake_cdp): - value = helpers.js(unserializable) - - if isinstance(expected, float) and math.isnan(expected): - assert math.isnan(value) - elif expected == 0: - assert value == 0 - assert math.copysign(1, value) == math.copysign(1, expected) - else: - assert value == expected - - -def test_js_primitive_exception_message_uses_exception_value(): - def fake_cdp(method, **kwargs): - return { - "result": {"type": "string", "value": "boom"}, - "exceptionDetails": { - "text": "Uncaught", - "lineNumber": 0, - 
"columnNumber": 9, - "exception": {"type": "string", "value": "boom"}, - }, - } - - with patch("browser_harness.helpers.cdp", side_effect=fake_cdp): - with pytest.raises(RuntimeError, match="boom"): - helpers.js("throw value") - - -def test_js_timeout_error_includes_expression_context(): - def fake_cdp(method, **kwargs): - raise TimeoutError("timed out") - - with patch("browser_harness.helpers.cdp", side_effect=fake_cdp): - with pytest.raises(RuntimeError, match="Runtime.evaluate.*document.title"): - helpers.js("document.title") diff --git a/packages/bcode-browser/harness/tests/unit/__init__.py b/packages/bcode-browser/harness/tests/unit/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/packages/bcode-browser/harness/tests/unit/test_admin.py b/packages/bcode-browser/harness/tests/unit/test_admin.py deleted file mode 100644 index e8b333cd3e..0000000000 --- a/packages/bcode-browser/harness/tests/unit/test_admin.py +++ /dev/null @@ -1,529 +0,0 @@ -import pytest - -from browser_harness import admin - - -class FakeSocket: - def __init__(self, response=b'{"target_id":"target-1","session_id":"session-1","page":null}\n'): - self.response = response - self.closed = False - self.sent = b"" - - def sendall(self, data): - self.sent += data - - def recv(self, _size): - out, self.response = self.response, b"" - return out - - def close(self): - self.closed = True - - -def test_local_chrome_mode_is_false_when_env_provides_remote_cdp(): - assert not admin._is_local_chrome_mode({"BU_CDP_WS": "ws://example.test/devtools/browser/1"}) - - -def test_local_chrome_mode_is_false_when_process_env_provides_remote_cdp(monkeypatch): - monkeypatch.setenv("BU_CDP_WS", "ws://example.test/devtools/browser/1") - - assert not admin._is_local_chrome_mode() - - -def test_handshake_timeout_needs_chrome_remote_debugging_prompt(): - msg = "CDP WS handshake failed: timed out during opening handshake" - - assert admin._needs_chrome_remote_debugging_prompt(msg) - - -def 
test_handshake_403_needs_chrome_remote_debugging_prompt(): - msg = "CDP WS handshake failed: server rejected WebSocket connection: HTTP 403" - - assert admin._needs_chrome_remote_debugging_prompt(msg) - - -def test_stale_websocket_does_not_open_chrome_inspect(): - msg = "no close frame received or sent" - - assert not admin._needs_chrome_remote_debugging_prompt(msg) - - -def test_daemon_endpoint_names_discovers_valid_socket_names(tmp_path, monkeypatch): - monkeypatch.setattr(admin.ipc, "IS_WINDOWS", False) - monkeypatch.setattr(admin.ipc, "BH_RUNTIME_DIR", None) # shared-tmpdir mode - monkeypatch.setattr(admin.ipc, "_RUNTIME", tmp_path) - (tmp_path / "bu-default.sock").touch() - (tmp_path / "bu-remote_1.sock").touch() - (tmp_path / "bu-invalid.name.sock").touch() - (tmp_path / "not-bu-default.sock").touch() - - assert admin._daemon_endpoint_names() == ["default", "remote_1"] - - -def test_daemon_endpoint_names_with_bh_runtime_dir_returns_local_name_when_sock_exists(tmp_path, monkeypatch): - monkeypatch.setattr(admin.ipc, "IS_WINDOWS", False) - monkeypatch.setattr(admin.ipc, "BH_RUNTIME_DIR", str(tmp_path)) - monkeypatch.setattr(admin.ipc, "_RUNTIME", tmp_path) - monkeypatch.setattr(admin, "NAME", "session-xyz") - (tmp_path / "bu.sock").touch() - - assert admin._daemon_endpoint_names() == ["session-xyz"] - - -def test_daemon_endpoint_names_with_bh_runtime_dir_returns_empty_when_sock_missing(tmp_path, monkeypatch): - monkeypatch.setattr(admin.ipc, "IS_WINDOWS", False) - monkeypatch.setattr(admin.ipc, "BH_RUNTIME_DIR", str(tmp_path)) - monkeypatch.setattr(admin.ipc, "_RUNTIME", tmp_path) - monkeypatch.setattr(admin, "NAME", "session-xyz") - - assert admin._daemon_endpoint_names() == [] - - -def test_active_browser_connections_counts_only_healthy_daemons(monkeypatch): - monkeypatch.setattr(admin, "_daemon_endpoint_names", lambda: ["default", "stale", "remote"]) - - def fake_connect(name, timeout=1.0): - if name == "stale": - raise ConnectionRefusedError() - if name == 
"remote": - return FakeSocket(b'{"error":"no close frame received or sent"}\n'), None - return FakeSocket(), None - - monkeypatch.setattr(admin.ipc, "connect", fake_connect) - - assert admin.active_browser_connections() == 1 - - -def test_active_browser_connections_skips_daemons_reporting_cdp_disconnected(monkeypatch): - monkeypatch.setattr(admin, "_daemon_endpoint_names", lambda: ["default", "stale"]) - - def fake_connect(name, timeout=1.0): - if name == "stale": - return FakeSocket(b'{"error":"cdp_disconnected"}\n'), None - return FakeSocket(), None - - monkeypatch.setattr(admin.ipc, "connect", fake_connect) - - assert admin.active_browser_connections() == 1 - - -def test_browser_connections_returns_attached_page(monkeypatch): - monkeypatch.setattr(admin, "_daemon_endpoint_names", lambda: ["default"]) - response = ( - b'{"target_id":"target-1","session_id":"session-1",' - b'"page":{"targetId":"target-1","title":"Cat - Wikipedia","url":"https://en.wikipedia.org/wiki/Cat"}}\n' - ) - monkeypatch.setattr(admin.ipc, "connect", lambda name, timeout=1.0: (FakeSocket(response), None)) - - assert admin.browser_connections() == [ - { - "name": "default", - "page": {"title": "Cat - Wikipedia", "url": "https://en.wikipedia.org/wiki/Cat"}, - } - ] - - -def test_run_doctor_prints_active_browser_connections_and_active_pages(monkeypatch, capsys): - monkeypatch.setattr(admin, "_version", lambda: "0.1.0") - monkeypatch.setattr(admin, "_install_mode", lambda: "git") - monkeypatch.setattr(admin, "_chrome_running", lambda: True) - monkeypatch.setattr(admin, "daemon_alive", lambda: True) - monkeypatch.setattr(admin, "browser_connections", lambda: [ - { - "name": "default", - "page": {"title": "Example", "url": "https://example.test"}, - }, - { - "name": "cats", - "page": {"title": "Cat - Wikipedia", "url": "https://en.wikipedia.org/wiki/Cat"}, - }, - ]) - monkeypatch.setattr(admin, "_latest_release_tag", lambda: "0.1.0") - monkeypatch.setattr("shutil.which", lambda _cmd: None) - 
monkeypatch.delenv("BROWSER_USE_API_KEY", raising=False) - - assert admin.run_doctor() == 0 - - out = capsys.readouterr().out - assert "[ok ] active browser connections — 2" in out - assert " default — active page: Example — https://example.test" in out - assert " cats — active page: Cat - Wikipedia — https://en.wikipedia.org/wiki/Cat" in out - - -def test_doctor_page_output_truncates_long_text(monkeypatch, capsys): - monkeypatch.setattr(admin, "_version", lambda: "0.1.0") - monkeypatch.setattr(admin, "_install_mode", lambda: "git") - monkeypatch.setattr(admin, "_chrome_running", lambda: True) - monkeypatch.setattr(admin, "daemon_alive", lambda: True) - monkeypatch.setattr(admin, "DOCTOR_TEXT_LIMIT", 20) - monkeypatch.setattr(admin, "browser_connections", lambda: [ - { - "name": "default", - "page": {"title": "A very long page title", "url": "https://example.test/very/long/path"}, - } - ]) - monkeypatch.setattr(admin, "_latest_release_tag", lambda: "0.1.0") - monkeypatch.setattr("shutil.which", lambda _cmd: None) - monkeypatch.delenv("BROWSER_USE_API_KEY", raising=False) - - assert admin.run_doctor() == 0 - - out = capsys.readouterr().out - assert "A very long page ..." in out - assert "https://example.t..." 
in out - - -def test_start_remote_daemon_stops_created_browser_when_daemon_start_fails(monkeypatch): - calls = [] - browser = {"id": "browser-123", "cdpUrl": "http://127.0.0.1:9333", "liveUrl": "https://live.example"} - - def fake_browser_use(path, method, body=None): - calls.append((path, method, body)) - if (path, method) == ("/browsers", "POST"): - return browser - if (path, method) == ("/browsers/browser-123", "PATCH"): - return {} - raise AssertionError((path, method, body)) - - monkeypatch.setattr(admin, "daemon_alive", lambda name: False) - monkeypatch.setattr(admin, "_browser_use", fake_browser_use) - monkeypatch.setattr(admin, "_cdp_ws_from_url", lambda url: "ws://example.test/devtools/browser/1") - monkeypatch.setattr(admin, "ensure_daemon", lambda **kwargs: (_ for _ in ()).throw(RuntimeError("boom"))) - - with pytest.raises(RuntimeError, match="boom"): - admin.start_remote_daemon() - - assert calls == [ - ("/browsers", "POST", {}), - ("/browsers/browser-123", "PATCH", {"action": "stop"}), - ] - - -@pytest.mark.parametrize("exc_type", [KeyboardInterrupt, SystemExit]) -def test_start_remote_daemon_stops_created_browser_when_daemon_start_is_interrupted(monkeypatch, exc_type): - calls = [] - browser = {"id": "browser-123", "cdpUrl": "http://127.0.0.1:9333", "liveUrl": "https://live.example"} - - def fake_browser_use(path, method, body=None): - calls.append((path, method, body)) - if (path, method) == ("/browsers", "POST"): - return browser - if (path, method) == ("/browsers/browser-123", "PATCH"): - return {} - raise AssertionError((path, method, body)) - - monkeypatch.setattr(admin, "daemon_alive", lambda name: False) - monkeypatch.setattr(admin, "_browser_use", fake_browser_use) - monkeypatch.setattr(admin, "_cdp_ws_from_url", lambda url: "ws://example.test/devtools/browser/1") - monkeypatch.setattr(admin, "ensure_daemon", lambda **kwargs: (_ for _ in ()).throw(exc_type())) - - with pytest.raises(exc_type): - admin.start_remote_daemon() - - assert calls == 
[ - ("/browsers", "POST", {}), - ("/browsers/browser-123", "PATCH", {"action": "stop"}), - ] - - -@pytest.mark.parametrize("exc_type", [KeyboardInterrupt, SystemExit]) -def test_stop_cloud_browser_swallows_baseexception_from_stop_request(monkeypatch, exc_type): - monkeypatch.setattr(admin, "_browser_use", lambda *args, **kwargs: (_ for _ in ()).throw(exc_type())) - - admin._stop_cloud_browser("browser-123") - -def test_start_remote_daemon_does_not_stop_created_browser_on_success(monkeypatch): - calls = [] - browser = {"id": "browser-123", "cdpUrl": "http://127.0.0.1:9333", "liveUrl": "https://live.example"} - - def fake_browser_use(path, method, body=None): - calls.append((path, method, body)) - if (path, method) == ("/browsers", "POST"): - return browser - raise AssertionError((path, method, body)) - - monkeypatch.setattr(admin, "daemon_alive", lambda name: False) - monkeypatch.setattr(admin, "_browser_use", fake_browser_use) - monkeypatch.setattr(admin, "_cdp_ws_from_url", lambda url: "ws://example.test/devtools/browser/1") - monkeypatch.setattr(admin, "ensure_daemon", lambda **kwargs: None) - monkeypatch.setattr(admin, "_show_live_url", lambda url: None) - - assert admin.start_remote_daemon() == browser - assert calls == [ - ("/browsers", "POST", {}), - ] - - -# --- restart_daemon: PID-reuse safety --- - -def test_restart_daemon_does_not_signal_when_daemon_unreachable(monkeypatch, tmp_path): - """If ipc.identify() returns None (daemon gone), restart_daemon must NOT - fall back to reading the pid file and SIGTERMing whatever owns that PID — - that's the PID-reuse hazard. It should only clean up files.""" - pid_path = tmp_path / "default.pid" - # A pid file with a PID that, if signaled, would hit an unrelated process. - # The whole point is that we don't read or trust this number. 
- pid_path.write_text("99999") - - kill_calls = [] - monkeypatch.setattr(admin.os, "kill", lambda pid, sig: kill_calls.append((pid, sig))) - monkeypatch.setattr(admin.ipc, "identify", lambda name, timeout=5.0: None) - monkeypatch.setattr(admin.ipc, "ping", lambda name, timeout=1.0: False) - monkeypatch.setattr(admin.ipc, "pid_path", lambda name: pid_path) - monkeypatch.setattr(admin.ipc, "cleanup_endpoint", lambda name: None) - - # Should not raise, should not signal, should still clean up the pid file. - admin.restart_daemon("default") - - assert kill_calls == [], ( - f"restart_daemon SIGTERM'd a PID despite identify() returning None — " - f"this is the PID-reuse hazard the function is meant to avoid. Calls: {kill_calls}" - ) - assert not pid_path.exists(), "stale pid file should be cleaned up" - - -def test_restart_daemon_signals_pid_returned_by_identify_not_pid_file(monkeypatch, tmp_path): - """The PID we signal must come from the live daemon's self-report, never - from the pid file. If a stale pid file disagrees, the live daemon's PID wins.""" - import signal - - pid_path = tmp_path / "default.pid" - pid_path.write_text("99999") # bogus stale value — must be ignored - - live_pid = 4242 - - kill_calls = [] - def fake_kill(pid, sig): - kill_calls.append((pid, sig)) - # First os.kill(pid, 0) probe: report process is gone so we exit the loop - # without escalating. We just want to see WHICH pid was probed. 
- if sig == 0: - raise ProcessLookupError - - class FakeIPC: - def __init__(self): - self.shutdown_sent = False - def identify(self, name, timeout=5.0): - return live_pid - def connect(self, name, timeout): - return ("conn", "tok") - def request(self, conn, tok, msg): - if msg.get("meta") == "shutdown": - self.shutdown_sent = True - return {"ok": True} - def pid_path(self, name): - return pid_path - def cleanup_endpoint(self, name): - pass - - fake = FakeIPC() - monkeypatch.setattr(admin.os, "kill", fake_kill) - monkeypatch.setattr(admin.ipc, "identify", fake.identify) - monkeypatch.setattr(admin.ipc, "ping", lambda name, timeout=1.0: True) - monkeypatch.setattr(admin.ipc, "connect", fake.connect) - monkeypatch.setattr(admin.ipc, "request", fake.request) - monkeypatch.setattr(admin.ipc, "pid_path", fake.pid_path) - monkeypatch.setattr(admin.ipc, "cleanup_endpoint", fake.cleanup_endpoint) - - admin.restart_daemon("default") - - assert fake.shutdown_sent, "expected shutdown IPC to be sent" - assert kill_calls, "expected at least one os.kill probe" - pids_signaled = {pid for pid, _ in kill_calls} - assert pids_signaled == {live_pid}, ( - f"restart_daemon must only signal the PID returned by identify(); " - f"signaled pids: {pids_signaled}, expected {{{live_pid}}} (and NOT 99999)" - ) - assert not pid_path.exists() - - -def test_restart_daemon_sends_shutdown_to_pre_upgrade_daemon_without_pid_in_ping(monkeypatch, tmp_path): - """Backward compat: a pre-upgrade daemon's ping reply has {pong:True} but - no `pid` field, so identify() returns None. 
The shutdown IPC must STILL be - sent (so the daemon exits cleanly), but no os.kill happens (we have no - verified PID to safely signal).""" - pid_path = tmp_path / "default.pid" - pid_path.write_text("99999") # bogus stale value - - kill_calls = [] - shutdown_calls = [] - - def fake_request(conn, tok, msg): - if msg.get("meta") == "shutdown": - shutdown_calls.append(msg) - return {"ok": True} - - monkeypatch.setattr(admin.os, "kill", lambda pid, sig: kill_calls.append((pid, sig))) - monkeypatch.setattr(admin.ipc, "identify", lambda name, timeout=5.0: None) - monkeypatch.setattr(admin.ipc, "ping", lambda name, timeout=1.0: True) # old daemon: alive but no pid - monkeypatch.setattr(admin.ipc, "connect", lambda name, timeout: ("conn", "tok")) - monkeypatch.setattr(admin.ipc, "request", fake_request) - monkeypatch.setattr(admin.ipc, "pid_path", lambda name: pid_path) - monkeypatch.setattr(admin.ipc, "cleanup_endpoint", lambda name: None) - - admin.restart_daemon("default") - - assert shutdown_calls, ( - "restart_daemon must send shutdown IPC to a pre-upgrade daemon even " - "when identify() can't return a PID — otherwise upgrades orphan the " - "old daemon while deleting its socket and pid file." - ) - assert kill_calls == [], ( - f"no os.kill should fire when we don't have a verified PID, " - f"but got: {kill_calls}" - ) - assert not pid_path.exists() - - -def test_restart_daemon_skips_sigterm_if_pid_was_reused_during_wait(monkeypatch, tmp_path): - """A second identify() runs immediately before the SIGTERM. If the daemon - exited and the PID was reused mid-wait, identify() will return None (or a - different PID) and we must NOT signal — that's the PID-reuse race during - the 15s wait window.""" - import signal - - pid_path = tmp_path / "default.pid" - pid_path.write_text("99999") - live_pid = 4242 - - kill_calls = [] - - def fake_kill(pid, sig): - kill_calls.append((pid, sig)) - # All os.kill(pid, 0) probes succeed → loop exhausts → reaches the - # SIGTERM branch. 
(We're simulating a "wedged" daemon that the wait - # loop can't tell apart from a daemon whose PID got reused.) - - # First identify() call (top of restart_daemon) returns the live PID. - # Second identify() call (right before SIGTERM) returns None — simulating - # the daemon having exited and its PID having been reused by an unrelated - # process. The function must NOT escalate to SIGTERM in that state. - identify_responses = iter([live_pid, None]) - monkeypatch.setattr(admin.os, "kill", fake_kill) - monkeypatch.setattr(admin.ipc, "identify", lambda name, timeout=5.0: next(identify_responses)) - monkeypatch.setattr(admin.ipc, "ping", lambda name, timeout=1.0: True) - monkeypatch.setattr(admin.ipc, "connect", lambda name, timeout: ("conn", "tok")) - monkeypatch.setattr(admin.ipc, "request", lambda conn, tok, msg: {"ok": True}) - monkeypatch.setattr(admin.ipc, "pid_path", lambda name: pid_path) - monkeypatch.setattr(admin.ipc, "cleanup_endpoint", lambda name: None) - # Speed up the wait loop so the test finishes quickly. The loop polls 75 - # times at 0.2s = 15s; with sleep neutralized it runs in microseconds. - monkeypatch.setattr(admin.time, "sleep", lambda _s: None) - - admin.restart_daemon("default") - - sigterms = [(pid, sig) for pid, sig in kill_calls if sig == signal.SIGTERM] - assert sigterms == [], ( - f"restart_daemon issued SIGTERM despite the re-verify identify() " - f"returning None (PID was reused during the 15s wait). Calls: {kill_calls}" - ) - assert not pid_path.exists() - - -def test_restart_daemon_sigterms_via_start_time_fingerprint_when_socket_gone(monkeypatch, tmp_path): - """Slow-shutdown recovery: the daemon's serve() tears down the IPC socket - BEFORE the process exits (the daemon then runs slow cleanup like remote - `stop` PATCH calls that can hang). In that window, identify() returns None - even though the process is still our daemon. 
-    SIGTERM must still fire when
-    the PID's start-time fingerprint hasn't changed since we first identified
-    it — that's strong evidence of "same process, just slow to exit."
-    """
-    import signal
-
-    pid_path = tmp_path / "default.pid"
-    pid_path.write_text("99999")
-    live_pid = 4242
-
-    kill_calls = []
-
-    def fake_kill(pid, sig):
-        kill_calls.append((pid, sig))
-        # All os.kill(pid, 0) probes succeed; loop exhausts → SIGTERM gate runs.
-
-    # First identify() returns live_pid. Second identify() returns None — the
-    # daemon has torn down its IPC during shutdown but the process is still
-    # finishing up cleanup work, so the start-time fingerprint is unchanged.
-    identify_responses = iter([live_pid, None])
-    # Both _process_start_time() calls return the same fingerprint, signaling
-    # "still the same process." This is the legitimate-slow-shutdown case.
-    monkeypatch.setattr(admin, "_process_start_time", lambda pid: "STARTED_AT_X")
-    monkeypatch.setattr(admin.os, "kill", fake_kill)
-    monkeypatch.setattr(admin.ipc, "identify", lambda name, timeout=5.0: next(identify_responses))
-    monkeypatch.setattr(admin.ipc, "ping", lambda name, timeout=1.0: True)
-    monkeypatch.setattr(admin.ipc, "connect", lambda name, timeout: ("conn", "tok"))
-    monkeypatch.setattr(admin.ipc, "request", lambda conn, tok, msg: {"ok": True})
-    monkeypatch.setattr(admin.ipc, "pid_path", lambda name: pid_path)
-    monkeypatch.setattr(admin.ipc, "cleanup_endpoint", lambda name: None)
-    monkeypatch.setattr(admin.time, "sleep", lambda _s: None)
-
-    admin.restart_daemon("default")
-
-    sigterms = [(pid, sig) for pid, sig in kill_calls if sig == signal.SIGTERM]
-    assert sigterms == [(live_pid, signal.SIGTERM)], (
-        f"slow-shutdown daemon (identify=None but unchanged start-time) must "
-        f"still receive SIGTERM. "
-        f"signal calls: {kill_calls}"
-    )
-
-
-def test_restart_daemon_skips_sigterm_when_start_time_changed_during_wait(monkeypatch, tmp_path):
-    """If the start-time fingerprint of the original PID has CHANGED, the PID
-    was reused by another process. Even though identify() also returns None,
-    we must skip SIGTERM — start-time mismatch is the signal that protects
-    against killing an unrelated reused-PID process."""
-    import signal
-
-    pid_path = tmp_path / "default.pid"
-    pid_path.write_text("99999")
-    live_pid = 4242
-
-    kill_calls = []
-    monkeypatch.setattr(admin.os, "kill", lambda pid, sig: kill_calls.append((pid, sig)))
-
-    identify_responses = iter([live_pid, None])
-    # First start-time read at top of restart_daemon: "ORIGINAL".
-    # Second start-time read in the safety gate: "DIFFERENT" — proof of reuse.
-    start_time_responses = iter(["ORIGINAL", "DIFFERENT"])
-    monkeypatch.setattr(admin, "_process_start_time", lambda pid: next(start_time_responses))
-    monkeypatch.setattr(admin.ipc, "identify", lambda name, timeout=5.0: next(identify_responses))
-    monkeypatch.setattr(admin.ipc, "ping", lambda name, timeout=1.0: True)
-    monkeypatch.setattr(admin.ipc, "connect", lambda name, timeout: ("conn", "tok"))
-    monkeypatch.setattr(admin.ipc, "request", lambda conn, tok, msg: {"ok": True})
-    monkeypatch.setattr(admin.ipc, "pid_path", lambda name: pid_path)
-    monkeypatch.setattr(admin.ipc, "cleanup_endpoint", lambda name: None)
-    monkeypatch.setattr(admin.time, "sleep", lambda _s: None)
-
-    admin.restart_daemon("default")
-
-    sigterms = [(pid, sig) for pid, sig in kill_calls if sig == signal.SIGTERM]
-    assert sigterms == [], (
-        f"start-time mismatch indicates PID reuse — restart_daemon must NOT "
-        f"SIGTERM. "
-        f"signal calls: {kill_calls}"
-    )
-
-
-# --- _process_start_time helper ---
-
-def test_process_start_time_returns_stable_fingerprint_for_self():
-    """The start-time of the current process should be readable on Linux,
-    macOS, and Windows, and stable across two reads."""
-    import os as _os, sys
-    if sys.platform.startswith("linux") or sys.platform == "darwin" or sys.platform == "win32":
-        pid = _os.getpid()
-        first = admin._process_start_time(pid)
-        second = admin._process_start_time(pid)
-        assert first is not None, "expected a fingerprint for the current PID"
-        assert first == second, (
-            f"two reads of the same PID should return the same fingerprint; "
-            f"got {first!r} vs {second!r}"
-        )
-
-
-def test_process_start_time_returns_none_for_invalid_pid():
-    """Bad inputs (None, 0, negatives, non-int) and PIDs with no live process
-    must return None rather than raising."""
-    for bad in (None, 0, -1, -42, "not-an-int", 1.5, True, False):
-        assert admin._process_start_time(bad) is None, (
-            f"expected None for invalid pid {bad!r}"
-        )
-    # 2**31 - 1 is the largest pid_t; in practice no live process at that PID.
-    assert admin._process_start_time((1 << 31) - 1) is None
diff --git a/packages/bcode-browser/harness/tests/unit/test_daemon.py b/packages/bcode-browser/harness/tests/unit/test_daemon.py
deleted file mode 100644
index 90c5bc8550..0000000000
--- a/packages/bcode-browser/harness/tests/unit/test_daemon.py
+++ /dev/null
@@ -1,295 +0,0 @@
-import asyncio
-
-from browser_harness import daemon
-
-
-class _FakeCDP:
-    """Records send_raw calls so tests can assert which CDP methods fired."""
-
-    def __init__(self):
-        self.calls = []  # list of (method, params, session_id)
-
-    async def send_raw(self, method, params=None, session_id=None):
-        self.calls.append((method, params, session_id))
-        # Set-session/initial-attach paths only need a benign response.
-        return {}
-
-
-def _fresh_daemon():
-    d = daemon.Daemon()
-    d.cdp = _FakeCDP()
-    return d
-
-
-def test_set_session_enables_all_four_default_domains_on_new_session():
-    """Regression: switch_tab() / new_tab() in helpers.py route through the
-    `set_session` IPC, which previously only enabled Page on the new
-    session. With Network disabled, wait_for_network_idle() silently stops
-    receiving events after a tab switch. Initial attach enables all four
-    (Page, DOM, Runtime, Network); set_session must enable the same set."""
-    d = _fresh_daemon()
-    new_session = "session-AFTER-switch"
-
-    asyncio.run(d.handle({
-        "meta": "set_session",
-        "session_id": new_session,
-        "target_id": "target-2",
-    }))
-
-    enabled_on_new = [
-        method for (method, _params, sid) in d.cdp.calls
-        if sid == new_session and method.endswith(".enable")
-    ]
-    assert set(enabled_on_new) == {"Page.enable", "DOM.enable", "Runtime.enable", "Network.enable"}, (
-        f"set_session must enable Page/DOM/Runtime/Network on the new session "
-        f"(parity with initial attach). Got: {enabled_on_new}"
-    )
-    assert d.session == new_session
-    assert d.target_id == "target-2"
-
-
-def test_set_session_falls_back_to_existing_target_id_when_not_provided():
-    """If a caller forgets target_id (passes None), the daemon should keep its
-    existing target_id rather than overwriting it with None — otherwise
-    subsequent calls that depend on self.target_id would break."""
-    d = _fresh_daemon()
-    d.target_id = "original-target"
-
-    asyncio.run(d.handle({
-        "meta": "set_session",
-        "session_id": "session-AFTER",
-        "target_id": None,
-    }))
-
-    assert d.target_id == "original-target"
-    assert d.session == "session-AFTER"
-
-
-def test_enable_default_domains_swallows_errors_per_domain():
-    """A single domain failing to enable must not prevent the others from
-    being attempted — that would leave the daemon in a partially-configured
-    state.
-    Each Domain.enable call has its own try/except inside the helper."""
-    class _PartialFailureCDP(_FakeCDP):
-        async def send_raw(self, method, params=None, session_id=None):
-            self.calls.append((method, params, session_id))
-            if method == "DOM.enable":
-                raise RuntimeError("simulated DOM failure")
-            return {}
-
-    d = daemon.Daemon()
-    d.cdp = _PartialFailureCDP()
-
-    asyncio.run(d._enable_default_domains("session-X"))
-
-    attempted = [m for (m, _p, _s) in d.cdp.calls]
-    assert "Page.enable" in attempted
-    assert "DOM.enable" in attempted  # attempted, but raised
-    assert "Runtime.enable" in attempted
-    assert "Network.enable" in attempted
-
-
-def test_set_session_disables_network_on_old_session_before_enabling_new():
-    """When switching tabs, the previous session's Network domain must be
-    disabled so background tabs (polling, SSE, etc.) stop emitting events
-    into the global buffer that wait_for_network_idle reads. Initial attach
-    has no `old_session` so this disable doesn't fire then."""
-    d = _fresh_daemon()
-    d.session = "session-OLD"
-    d.target_id = "target-OLD"
-
-    asyncio.run(d.handle({
-        "meta": "set_session",
-        "session_id": "session-NEW",
-        "target_id": "target-NEW",
-    }))
-
-    disabled = [
-        (method, sid) for (method, _params, sid) in d.cdp.calls
-        if method == "Network.disable"
-    ]
-    assert disabled == [("Network.disable", "session-OLD")], (
-        f"Network.disable must fire on the old session before re-enabling on "
-        f"the new one. Got: {disabled}"
-    )
-
-    # Sanity: the new session still gets Network.enable.
-    enabled_on_new = {
-        method for (method, _p, sid) in d.cdp.calls
-        if sid == "session-NEW" and method.endswith(".enable")
-    }
-    assert "Network.enable" in enabled_on_new
-
-
-def test_set_session_does_not_disable_network_when_no_previous_session():
-    """First set_session call (e.g.
-    very early in startup before any attach)
-    has no old_session — the Network.disable path must be skipped."""
-    d = _fresh_daemon()
-    d.session = None  # no prior attach
-
-    asyncio.run(d.handle({
-        "meta": "set_session",
-        "session_id": "session-FIRST",
-        "target_id": "target-FIRST",
-    }))
-
-    disables = [m for (m, _p, _s) in d.cdp.calls if m == "Network.disable"]
-    assert disables == [], (
-        f"Network.disable must not fire when there's no previous session "
-        f"to disable. Got: {disables}"
-    )
-
-
-def test_set_session_runs_disable_and_enables_in_parallel():
-    """The four Domain.enable calls (plus Network.disable on the old session)
-    must run concurrently via asyncio.gather, not sequentially. With the old
-    sequential code, helpers.switch_tab() would block in _send() for up to
-    ~22s on a slow/remote daemon while the helper's IPC socket has a 5s
-    read timeout, causing client-side socket timeouts. Verifying that all
-    five CDP calls reach send_raw before any returns proves parallelization."""
-    class _ConcurrencyProbeCDP:
-        def __init__(self):
-            self.calls = []
-            self.in_flight = 0
-            self.max_concurrent = 0
-            self.release = None  # asyncio.Event, set inside the test loop
-
-        async def send_raw(self, method, params=None, session_id=None):
-            self.calls.append((method, params, session_id))
-            self.in_flight += 1
-            self.max_concurrent = max(self.max_concurrent, self.in_flight)
-            try:
-                await self.release.wait()
-            finally:
-                self.in_flight -= 1
-            return {}
-
-    async def run():
-        d = daemon.Daemon()
-        d.cdp = _ConcurrencyProbeCDP()
-        d.session = "session-OLD"  # ensures Network.disable on old fires
-        d.cdp.release = asyncio.Event()
-
-        handle_task = asyncio.create_task(d.handle({
-            "meta": "set_session",
-            "session_id": "session-NEW",
-            "target_id": "target-NEW",
-        }))
-        # Yield repeatedly until everything that's going to be in-flight is
-        # in-flight. Cap iterations to avoid hanging if parallelization breaks.
-        for _ in range(50):
-            await asyncio.sleep(0)
-            # 5 = Network.disable on OLD + 4 enables on NEW.
-            if d.cdp.in_flight >= 5:
-                break
-        peak = d.cdp.max_concurrent
-        d.cdp.release.set()
-        await handle_task
-        return peak, d.cdp.calls
-
-    peak, calls = asyncio.run(run())
-    assert peak == 5, (
-        f"set_session must run disable + 4 enables concurrently via gather "
-        f"(observed peak in-flight = {peak}; expected 5 = 1 disable on OLD + "
-        f"4 enables on NEW). Sequential await would peak at 1."
-    )
-    # Sanity: the right calls were made.
-    methods = sorted({m for (m, _p, _s) in calls})
-    assert "Network.disable" in methods
-    assert {"Page.enable", "DOM.enable", "Runtime.enable", "Network.enable"}.issubset(methods)
-
-
-def test_set_session_first_attach_runs_four_enables_in_parallel():
-    """When there's no previous session, the disable path is skipped — only
-    the four enables run, still in parallel."""
-    class _ConcurrencyProbeCDP:
-        def __init__(self):
-            self.calls = []
-            self.in_flight = 0
-            self.max_concurrent = 0
-            self.release = None
-
-        async def send_raw(self, method, params=None, session_id=None):
-            self.calls.append((method, params, session_id))
-            self.in_flight += 1
-            self.max_concurrent = max(self.max_concurrent, self.in_flight)
-            try:
-                await self.release.wait()
-            finally:
-                self.in_flight -= 1
-            return {}
-
-    async def run():
-        d = daemon.Daemon()
-        d.cdp = _ConcurrencyProbeCDP()
-        d.session = None  # no previous session
-        d.cdp.release = asyncio.Event()
-
-        handle_task = asyncio.create_task(d.handle({
-            "meta": "set_session",
-            "session_id": "session-FIRST",
-            "target_id": "target-FIRST",
-        }))
-        for _ in range(50):
-            await asyncio.sleep(0)
-            if d.cdp.in_flight >= 4:
-                break
-        peak = d.cdp.max_concurrent
-        d.cdp.release.set()
-        await handle_task
-        return peak
-
-    peak = asyncio.run(run())
-    assert peak == 4, (
-        f"first set_session must run 4 enables concurrently "
-        f"(observed peak = {peak}). No Network.disable should fire."
-    )
-
-
-def test_current_tab_meta_passes_attached_target_id():
-    """Regression for issue #304: helpers.current_tab() previously sent
-    Target.getTargetInfo with no targetId. The daemon strips session_id for
-    Target.* methods, so the call hit the browser-level connection with empty
-    params, and Chrome returned info about the *browser* target (empty
-    url/title) instead of the attached page. The daemon now resolves this
-    server-side using its tracked target_id."""
-    class _TargetInfoCDP(_FakeCDP):
-        async def send_raw(self, method, params=None, session_id=None):
-            self.calls.append((method, params, session_id))
-            if method == "Target.getTargetInfo":
-                return {"targetInfo": {
-                    "targetId": params["targetId"],
-                    "url": "https://example.com/",
-                    "title": "Example Domain",
-                    "type": "page",
-                }}
-            return {}
-
-    d = daemon.Daemon()
-    d.cdp = _TargetInfoCDP()
-    d.target_id = "page-target-abc"
-
-    result = asyncio.run(d.handle({"meta": "current_tab"}))
-
-    assert result == {
-        "targetId": "page-target-abc",
-        "url": "https://example.com/",
-        "title": "Example Domain",
-    }
-    # The targetId must be passed through — that's the whole point of the fix.
-    get_info_calls = [(p, s) for (m, p, s) in d.cdp.calls if m == "Target.getTargetInfo"]
-    assert get_info_calls == [({"targetId": "page-target-abc"}, None)]
-
-
-def test_current_tab_meta_returns_not_attached_when_no_target_id():
-    """Without an attached page, current_tab() has no meaningful answer.
-    Returning {error: not_attached} causes _send() to raise in helpers, which
-    is the right signal for callers like ensure_real_tab() that wrap the call
-    in try/except."""
-    d = _fresh_daemon()
-    d.target_id = None
-
-    result = asyncio.run(d.handle({"meta": "current_tab"}))
-
-    assert result == {"error": "not_attached"}
-    # No CDP call should have been issued.
-    assert d.cdp.calls == []
diff --git a/packages/bcode-browser/harness/tests/unit/test_helpers.py b/packages/bcode-browser/harness/tests/unit/test_helpers.py
deleted file mode 100644
index 4a45ee07a1..0000000000
--- a/packages/bcode-browser/harness/tests/unit/test_helpers.py
+++ /dev/null
@@ -1,352 +0,0 @@
-import os
-import tempfile
-import time
-from unittest.mock import patch
-
-import pytest
-from PIL import Image
-
-from browser_harness import helpers
-
-
-def _run(fake_png, width, height, **kwargs):
-    fake = lambda method, **_: {"data": fake_png(width, height)}
-    with patch("browser_harness.helpers.cdp", side_effect=fake), tempfile.TemporaryDirectory() as d:
-        path = os.path.join(d, "shot.png")
-        helpers.capture_screenshot(path, **kwargs)
-        return Image.open(path).size
-
-
-def test_max_dim_downsizes_oversized_image(fake_png):
-    assert max(_run(fake_png, 4592, 2286, max_dim=1800)) == 1800
-
-
-def test_max_dim_skips_when_image_already_small(fake_png):
-    assert _run(fake_png, 800, 400, max_dim=1800) == (800, 400)
-
-
-def test_max_dim_default_is_no_resize(fake_png):
-    assert _run(fake_png, 4592, 2286) == (4592, 2286)
-
-
-def _seed_skill(tmp_path):
-    site = tmp_path / "domain-skills" / "example"
-    site.mkdir(parents=True)
-    (site / "scraping.md").write_text("hi")
-
-
-def test_goto_url_omits_domain_skills_by_default(tmp_path, monkeypatch):
-    monkeypatch.delenv("BH_DOMAIN_SKILLS", raising=False)
-    monkeypatch.setattr(helpers, "AGENT_WORKSPACE", tmp_path)
-    _seed_skill(tmp_path)
-    with patch("browser_harness.helpers.cdp", return_value={"frameId": "f"}):
-        result = helpers.goto_url("https://www.example.com/")
-    assert result == {"frameId": "f"}
-
-
-def test_goto_url_includes_domain_skills_when_enabled(tmp_path, monkeypatch):
-    monkeypatch.setenv("BH_DOMAIN_SKILLS", "1")
-    monkeypatch.setattr(helpers, "AGENT_WORKSPACE", tmp_path)
-    _seed_skill(tmp_path)
-    with patch("browser_harness.helpers.cdp", return_value={"frameId": "f"}):
-        result =
-            helpers.goto_url("https://www.example.com/")
-    assert result == {"frameId": "f", "domain_skills": ["scraping.md"]}
-
-
-def test_page_info_raises_clear_error_on_js_exception():
-    def fake_send(req):
-        return {}
-
-    def fake_cdp(method, **kwargs):
-        return {
-            "result": {
-                "type": "object",
-                "subtype": "error",
-                "description": "ReferenceError: location is not defined",
-            },
-            "exceptionDetails": {
-                "text": "Uncaught",
-                "lineNumber": 0,
-                "columnNumber": 16,
-            },
-        }
-
-    with patch("browser_harness.helpers._send", side_effect=fake_send), \
-         patch("browser_harness.helpers.cdp", side_effect=fake_cdp):
-        with pytest.raises(RuntimeError, match="ReferenceError"):
-            helpers.page_info()
-
-
-# --- fill_input ---
-
-def test_fill_input_focuses_types_and_fires_events():
-    cdp_calls = []
-    js_calls = []
-
-    def fake_cdp(method, **kwargs):
-        cdp_calls.append((method, kwargs))
-        return {}
-
-    def fake_js(expr, **kwargs):
-        js_calls.append(expr)
-        return True  # focus call must return True (element found)
-
-    with patch("browser_harness.helpers.cdp", side_effect=fake_cdp), \
-         patch("browser_harness.helpers.js", side_effect=fake_js):
-        helpers.fill_input("#my-input", "hello")
-
-    assert any("#my-input" in e for e in js_calls)
-    key_downs = [m for m, _ in cdp_calls if m == "Input.dispatchKeyEvent"]
-    assert len(key_downs) > 0
-    assert any("input" in e and "change" in e for e in js_calls)
-
-
-def test_fill_input_raises_when_element_not_found():
-    def fake_js(expr, **kwargs):
-        return False  # element not found
-
-    with patch("browser_harness.helpers.js", side_effect=fake_js):
-        with pytest.raises(RuntimeError, match="element not found"):
-            helpers.fill_input("#missing", "hello")
-
-
-def test_fill_input_clear_first_sends_select_all_then_backspace():
-    import sys
-
-    key_events = []
-
-    def fake_cdp(method, **kwargs):
-        if method == "Input.dispatchKeyEvent":
-            key_events.append(kwargs)
-        return {}
-
-    def fake_js(expr, **kwargs):
-        return True  # element found
-
-    with
-         patch("browser_harness.helpers.cdp", side_effect=fake_cdp), \
-         patch("browser_harness.helpers.js", side_effect=fake_js):
-        helpers.fill_input("#inp", "x", clear_first=True)
-
-    # The "a" must be dispatched with the platform-correct modifier (Meta=4 on
-    # macOS, Ctrl=2 elsewhere). Without the modifier, the field would never get
-    # selected — it would just receive a literal "a".
-    expected_mod = 4 if sys.platform == "darwin" else 2
-    a_events = [e for e in key_events if e.get("key") == "a"]
-    assert a_events, "expected an 'a' key event for select-all"
-    assert all(e.get("modifiers") == expected_mod for e in a_events), \
-        f"select-all 'a' must carry modifiers={expected_mod}; got {[e.get('modifiers') for e in a_events]}"
-
-    # Crucial: no `char` event for the "a" — emitting one makes Chrome treat
-    # Cmd/Ctrl+A as a printable letter instead of a shortcut.
-    assert not any(e.get("type") == "char" and e.get("text") == "a" for e in key_events), \
-        "select-all must not emit a 'char' event with text='a' (would cancel the shortcut)"
-
-    # Backspace still fires (via press_key, which uses keyDown).
-    keys_down = [e.get("key") for e in key_events if e.get("type") in ("keyDown", "rawKeyDown")]
-    assert "Backspace" in keys_down
-
-
-def test_fill_input_no_clear_skips_ctrl_a():
-    key_events = []
-
-    def fake_cdp(method, **kwargs):
-        if method == "Input.dispatchKeyEvent":
-            key_events.append(kwargs)
-        return {}
-
-    def fake_js(expr, **kwargs):
-        return True  # element found
-
-    with patch("browser_harness.helpers.cdp", side_effect=fake_cdp), \
-         patch("browser_harness.helpers.js", side_effect=fake_js):
-        helpers.fill_input("#inp", "x", clear_first=False)
-
-    keys_seen = [e.get("key") for e in key_events if e.get("type") == "keyDown"]
-    assert "Backspace" not in keys_seen
-
-
-# --- wait_for_element ---
-
-def test_wait_for_element_returns_true_when_found_immediately():
-    def fake_js(expr, **kwargs):
-        return True
-
-    with patch("browser_harness.helpers.js", side_effect=fake_js):
-        assert helpers.wait_for_element("#target", timeout=2.0) is True
-
-
-def test_wait_for_element_returns_false_on_timeout():
-    def fake_js(expr, **kwargs):
-        return False
-
-    with patch("browser_harness.helpers.js", side_effect=fake_js), \
-         patch("browser_harness.helpers.time") as mock_time:
-        # simulate time advancing past the deadline immediately
-        start = time.time()
-        mock_time.time.side_effect = [start, start + 5.0]
-        mock_time.sleep = lambda _: None
-        assert helpers.wait_for_element("#missing", timeout=1.0) is False
-
-
-def test_wait_for_element_visible_uses_check_visibility():
-    js_exprs = []
-
-    def fake_js(expr, **kwargs):
-        js_exprs.append(expr)
-        return True
-
-    with patch("browser_harness.helpers.js", side_effect=fake_js):
-        helpers.wait_for_element("#btn", visible=True)
-
-    # Prefers checkVisibility (walks ancestor chain) with a computed-style
-    # fallback for older Chrome.
-    assert any("checkVisibility" in e for e in js_exprs)
-    assert any("getComputedStyle" in e for e in js_exprs)
-    # must NOT use offsetParent (fails for position:fixed elements)
-    assert not any("offsetParent" in e for e in js_exprs)
-
-
-def test_wait_for_element_non_visible_uses_simple_check():
-    js_exprs = []
-
-    def fake_js(expr, **kwargs):
-        js_exprs.append(expr)
-        return True
-
-    with patch("browser_harness.helpers.js", side_effect=fake_js):
-        helpers.wait_for_element("#btn", visible=False)
-
-    assert any("querySelector" in e and "offsetParent" not in e for e in js_exprs)
-
-
-# --- wait_for_network_idle ---
-
-def test_wait_for_network_idle_returns_true_when_no_events():
-    call_count = 0
-
-    def fake_send(req):
-        nonlocal call_count
-        call_count += 1
-        return {"events": []}
-
-    with patch("browser_harness.helpers._send", side_effect=fake_send), \
-         patch("browser_harness.helpers.time") as mock_time:
-        start = 1000.0
-        # first call: not idle yet; second call: idle window elapsed
-        mock_time.time.side_effect = [start, start, start, start + 0.6, start + 0.6]
-        mock_time.sleep = lambda _: None
-        result = helpers.wait_for_network_idle(timeout=5.0, idle_ms=500)
-
-    assert result is True
-
-
-def test_wait_for_network_idle_waits_for_inflight_request():
-    # Verifies inflight tracking: must not return True until loadingFinished,
-    # even though >idle_ms elapses between requestWillBeSent and loadingFinished.
-    # An event-silence-only implementation would return True at iter2 (wrong).
-    events_seq = [
-        [{"method": "Network.requestWillBeSent", "params": {"requestId": "req1"}}],
-        [],  # >500ms elapsed — old impl returns True here; new must NOT
-        [{"method": "Network.loadingFinished", "params": {"requestId": "req1"}}],
-        [],  # idle_ms after loadingFinished → return True
-    ]
-    idx = 0
-
-    def fake_send(req):
-        nonlocal idx
-        evs = events_seq[min(idx, len(events_seq) - 1)]
-        idx += 1
-        return {"events": evs}
-
-    with patch("browser_harness.helpers._send", side_effect=fake_send), \
-         patch("browser_harness.helpers.time") as mock_time:
-        start = 1000.0
-        # inflight non-empty → short-circuit skips time.time() in idle check for iter1/iter2
-        mock_time.time.side_effect = [
-            start, start,  # deadline + last_activity init
-            start + 0.1,   # iter1 while-check
-            start + 0.1,   # iter1 rWS last_activity update
-            # iter1 idle-check: inflight non-empty → short-circuit
-            start + 0.7,   # iter2 while-check (>500ms since rWS but request still in flight)
-            # iter2 idle-check: inflight non-empty → short-circuit
-            start + 0.8,   # iter3 while-check
-            start + 0.8,   # iter3 lF last_activity update
-            start + 0.8,   # iter3 idle-check: 0ms < 500 → not idle
-            start + 1.4,   # iter4 while-check
-            start + 1.4,   # iter4 idle-check: 600ms >= 500 → True
-        ]
-        mock_time.sleep = lambda _: None
-        result = helpers.wait_for_network_idle(timeout=5.0, idle_ms=500)
-
-    assert result is True
-    assert idx == 4  # did not short-circuit at iter2 despite silence > idle_ms
-
-
-def test_wait_for_network_idle_returns_false_on_timeout():
-    # Continuous rWS keeps inflight non-empty → idle check short-circuits every iteration.
-    # time.time() is only called for while-check and rWS last_activity (not idle check).
-    def fake_send(req):
-        return {"events": [{"method": "Network.requestWillBeSent", "params": {"requestId": "r"}}]}
-
-    with patch("browser_harness.helpers._send", side_effect=fake_send), \
-         patch("browser_harness.helpers.time") as mock_time:
-        start = 1000.0
-        mock_time.time.side_effect = [
-            start, start,  # deadline + last_activity init
-            start + 0.1,   # iter1 while-check (in deadline)
-            start + 0.1,   # iter1 rWS last_activity update
-            # iter1 idle-check: inflight non-empty → short-circuit
-            start + 20.0,  # iter2 while-check (past deadline → exit)
-        ]
-        mock_time.sleep = lambda _: None
-        result = helpers.wait_for_network_idle(timeout=10.0, idle_ms=500)
-
-    assert result is False
-
-
-def test_wait_for_network_idle_filters_events_to_active_session():
-    """Background tabs (e.g. a polling page the agent switched away from) keep
-    emitting Network events into the daemon's global buffer. The wait must
-    filter by session_id of the currently-attached tab — otherwise it would
-    see the background tab's traffic and either fail to return idle or wait
-    on the wrong tab's requests."""
-    active = "session-ACTIVE"
-    background = "session-BACKGROUND"
-
-    # First /drain_events/ payload: rWS + lF on the BACKGROUND session that we
-    # must ignore, plus zero events on the active session. With filtering, the
-    # active session sees no traffic and the idle window can elapse.
-    events_seq = [
-        [
-            {"session_id": background, "method": "Network.requestWillBeSent", "params": {"requestId": "bg1"}},
-            {"session_id": background, "method": "Network.loadingFinished", "params": {"requestId": "bg1"}},
-        ],
-        [],  # second drain — quiet on both sessions; idle window should fire here
-    ]
-    drain_idx = 0
-
-    def fake_send(req):
-        nonlocal drain_idx
-        if req.get("meta") == "session":
-            return {"session_id": active}
-        if req.get("meta") == "drain_events":
-            evs = events_seq[min(drain_idx, len(events_seq) - 1)]
-            drain_idx += 1
-            return {"events": evs}
-        return {}
-
-    with patch("browser_harness.helpers._send", side_effect=fake_send), \
-         patch("browser_harness.helpers.time") as mock_time:
-        start = 1000.0
-        # No inflight on active session → idle check uses time.time().
-        mock_time.time.side_effect = [start, start, start, start + 0.6, start + 0.6]
-        mock_time.sleep = lambda _: None
-        result = helpers.wait_for_network_idle(timeout=5.0, idle_ms=500)
-
-    assert result is True, (
-        "wait_for_network_idle must return True even when the BACKGROUND "
-        "session is busy, as long as the ACTIVE session is idle. Without the "
-        "session filter, the background rWS/lF pair would have updated "
-        "last_activity and prevented the idle window from elapsing."
-    )
diff --git a/packages/bcode-browser/harness/tests/unit/test_ipc.py b/packages/bcode-browser/harness/tests/unit/test_ipc.py
deleted file mode 100644
index 96e2dbc699..0000000000
--- a/packages/bcode-browser/harness/tests/unit/test_ipc.py
+++ /dev/null
@@ -1,107 +0,0 @@
-from browser_harness import _ipc as ipc
-
-
-# --- identify(): ping payload sanitation ---
-
-class _FakeConn:
-    def close(self): pass
-
-
-def _patch_identify_response(monkeypatch, response):
-    """Stub connect() and request() so identify() sees `response` as the JSON
-    parsed from the daemon's reply, exactly as it would arrive over the wire."""
-    monkeypatch.setattr(ipc, "connect", lambda name, timeout=1.0: (_FakeConn(), "tok"))
-    monkeypatch.setattr(ipc, "request", lambda conn, tok, msg: response)
-
-
-def test_identify_returns_pid_for_well_formed_ping_reply(monkeypatch):
-    _patch_identify_response(monkeypatch, {"pong": True, "pid": 4242})
-
-    assert ipc.identify("default", timeout=0.0) == 4242
-
-
-def test_identify_rejects_boolean_pid(monkeypatch):
-    """isinstance(True, int) is True in Python; a hostile or buggy daemon
-    that replies {"pid": True} would otherwise yield PID 1 (init on POSIX),
-    which os.kill(1, SIGTERM) would target. Reject it explicitly."""
-    _patch_identify_response(monkeypatch, {"pong": True, "pid": True})
-
-    assert ipc.identify("default", timeout=0.0) is None
-
-
-def test_identify_rejects_boolean_false_pid(monkeypatch):
-    """False is also an int subclass and would yield PID 0."""
-    _patch_identify_response(monkeypatch, {"pong": True, "pid": False})
-
-    assert ipc.identify("default", timeout=0.0) is None
-
-
-def test_identify_returns_none_when_pid_field_missing(monkeypatch):
-    """Pre-upgrade daemons reply {pong: True} only — no pid.
-    identify must
-    return None so callers know they have no verified PID to signal, while
-    still letting alive-checks via ipc.ping() succeed."""
-    _patch_identify_response(monkeypatch, {"pong": True})
-
-    assert ipc.identify("default", timeout=0.0) is None
-
-
-def test_identify_handles_non_dict_ping_payload(monkeypatch):
-    """request() can deserialize any valid JSON value. A stale or hostile
-    endpoint replying with a list / scalar / null would crash a naive
-    resp.get() with AttributeError; identify must absorb that and return None."""
-    for payload in ([1, 2, 3], "hello", 42, None):
-        _patch_identify_response(monkeypatch, payload)
-        assert ipc.identify("default", timeout=0.0) is None, (
-            f"identify() should reject non-dict ping payload: {payload!r}"
-        )
-
-
-def test_identify_returns_none_when_pong_is_not_true(monkeypatch):
-    _patch_identify_response(monkeypatch, {"pong": False, "pid": 4242})
-
-    assert ipc.identify("default", timeout=0.0) is None
-
-
-def test_identify_rejects_zero_and_negative_pids(monkeypatch):
-    """os.kill semantics on POSIX: pid=0 signals every process in the calling
-    process group; pid=-1 signals every process the caller can; pid<-1 signals
-    the corresponding process group.
-    None of these are valid daemon PIDs and
-    forwarding any of them to os.kill would be catastrophic."""
-    for bad_pid in (0, -1, -42, -99999):
-        _patch_identify_response(monkeypatch, {"pong": True, "pid": bad_pid})
-        assert ipc.identify("default", timeout=0.0) is None, (
-            f"identify() must reject non-positive pid {bad_pid!r}"
-        )
-
-
-# --- ping(): same payload sanitation ---
-
-def _patch_ping_response(monkeypatch, response):
-    monkeypatch.setattr(ipc, "connect", lambda name, timeout=1.0: (_FakeConn(), "tok"))
-    monkeypatch.setattr(ipc, "request", lambda conn, tok, msg: response)
-
-
-def test_ping_returns_true_for_well_formed_pong(monkeypatch):
-    _patch_ping_response(monkeypatch, {"pong": True})
-
-    assert ipc.ping("default", timeout=0.0) is True
-
-
-def test_ping_handles_non_dict_payload(monkeypatch):
-    """Same regression class as identify(): if a stale or hostile endpoint
-    replies with a list / scalar / null, ping() must return False rather than
-    raising AttributeError on resp.get().
-    restart_daemon() now calls ping() on
-    the fallback path, so an unhandled raise here would abort cleanup."""
-    for payload in ([1, 2, 3], "hello", 42, None):
-        _patch_ping_response(monkeypatch, payload)
-        assert ipc.ping("default", timeout=0.0) is False, (
-            f"ping() should reject non-dict payload: {payload!r}"
-        )
-
-
-def test_ping_returns_false_when_pong_field_is_missing_or_not_true(monkeypatch):
-    for resp in ({}, {"pong": False}, {"pong": "yes"}, {"pong": 1}):
-        _patch_ping_response(monkeypatch, resp)
-        assert ipc.ping("default", timeout=0.0) is False, (
-            f"ping() should require pong is exactly True; got: {resp!r}"
-        )
diff --git a/packages/bcode-browser/harness/tests/unit/test_run.py b/packages/bcode-browser/harness/tests/unit/test_run.py
deleted file mode 100644
index abf559cf71..0000000000
--- a/packages/bcode-browser/harness/tests/unit/test_run.py
+++ /dev/null
@@ -1,186 +0,0 @@
-import sys
-from io import StringIO
-from unittest.mock import patch
-
-from browser_harness import run
-
-
-def test_c_flag_executes_code():
-    stdout = StringIO()
-    with patch.object(sys, "argv", ["browser-harness", "-c", "print('hello from -c')"]), \
-         patch("browser_harness.run.ensure_daemon"), \
-         patch("browser_harness.run.print_update_banner"), \
-         patch("sys.stdout", stdout):
-        run.main()
-    assert stdout.getvalue().strip() == "hello from -c"
-
-
-def test_cloud_bootstrap_on_headless_server(monkeypatch):
-    """No daemon, no local Chrome, API key + BU_AUTOSPAWN set -> auto-provision cloud daemon."""
-    monkeypatch.setenv("BROWSER_USE_API_KEY", "test-key")
-    monkeypatch.setenv("BU_AUTOSPAWN", "1")
-    with patch.object(sys, "argv", ["browser-harness", "-c", "x = 1"]), \
-         patch("browser_harness.run.daemon_alive", return_value=False), \
-         patch("browser_harness.run._local_chrome_listening", return_value=False), \
-         patch("browser_harness.run.start_remote_daemon") as mock_start, \
-         patch("browser_harness.run.ensure_daemon"), \
-         patch("browser_harness.run.print_update_banner"):
run.main() - mock_start.assert_called_once() - - -def test_explicit_bu_cdp_url_blocks_cloud_bootstrap(monkeypatch): - """BU_CDP_URL is documented to override local Chrome discovery (install.md:58-59), - so it must also block cloud auto-bootstrap. Otherwise start_remote_daemon would - overwrite BU_CDP_WS in the daemon env and silently bill the user for a cloud - browser instead of attaching to their explicit endpoint.""" - monkeypatch.setenv("BU_CDP_URL", "http://127.0.0.1:9333") - monkeypatch.setenv("BROWSER_USE_API_KEY", "test-key") - monkeypatch.setenv("BU_AUTOSPAWN", "1") - with patch.object(sys, "argv", ["browser-harness", "-c", "x = 1"]), \ - patch("browser_harness.run.daemon_alive", return_value=False), \ - patch("browser_harness.run._local_chrome_listening", return_value=False), \ - patch("browser_harness.run.start_remote_daemon") as mock_start, \ - patch("browser_harness.run.ensure_daemon"), \ - patch("browser_harness.run.print_update_banner"): - run.main() - mock_start.assert_not_called() - - -def test_explicit_bu_cdp_ws_blocks_cloud_bootstrap(monkeypatch): - """Same precedence guarantee for BU_CDP_WS — install.md:58 promises it overrides - local Chrome discovery for remote browsers, so cloud auto-bootstrap must defer - to the explicit WebSocket endpoint the caller already chose.""" - monkeypatch.setenv("BU_CDP_WS", "ws://example.test/devtools/browser/abc") - monkeypatch.setenv("BROWSER_USE_API_KEY", "test-key") - monkeypatch.setenv("BU_AUTOSPAWN", "1") - with patch.object(sys, "argv", ["browser-harness", "-c", "x = 1"]), \ - patch("browser_harness.run.daemon_alive", return_value=False), \ - patch("browser_harness.run._local_chrome_listening", return_value=False), \ - patch("browser_harness.run.start_remote_daemon") as mock_start, \ - patch("browser_harness.run.ensure_daemon"), \ - patch("browser_harness.run.print_update_banner"): - run.main() - mock_start.assert_not_called() - - -def test_empty_bu_cdp_url_does_not_block_bootstrap(monkeypatch): - """An env 
var set to empty string is conventionally treated as unset; the helper - must not let `BU_CDP_URL=""` accidentally suppress cloud bootstrap on the headless - fresh-box path #277 explicitly preserved.""" - monkeypatch.setenv("BU_CDP_URL", "") - monkeypatch.setenv("BROWSER_USE_API_KEY", "test-key") - monkeypatch.setenv("BU_AUTOSPAWN", "1") - with patch.object(sys, "argv", ["browser-harness", "-c", "x = 1"]), \ - patch("browser_harness.run.daemon_alive", return_value=False), \ - patch("browser_harness.run._local_chrome_listening", return_value=False), \ - patch("browser_harness.run.start_remote_daemon") as mock_start, \ - patch("browser_harness.run.ensure_daemon"), \ - patch("browser_harness.run.print_update_banner"): - run.main() - mock_start.assert_called_once() - - -def test_both_bu_cdp_url_and_bu_cdp_ws_set_blocks_bootstrap(monkeypatch): - """When the caller has BOTH endpoints configured (e.g. a parent agent that probes - BU_CDP_URL first and falls back to a known BU_CDP_WS), bootstrap must still defer - — the user has been doubly explicit about their intent.""" - monkeypatch.setenv("BU_CDP_URL", "http://127.0.0.1:9333") - monkeypatch.setenv("BU_CDP_WS", "ws://example.test/devtools/browser/abc") - monkeypatch.setenv("BROWSER_USE_API_KEY", "test-key") - monkeypatch.setenv("BU_AUTOSPAWN", "1") - with patch.object(sys, "argv", ["browser-harness", "-c", "x = 1"]), \ - patch("browser_harness.run.daemon_alive", return_value=False), \ - patch("browser_harness.run._local_chrome_listening", return_value=False), \ - patch("browser_harness.run.start_remote_daemon") as mock_start, \ - patch("browser_harness.run.ensure_daemon"), \ - patch("browser_harness.run.print_update_banner"): - run.main() - mock_start.assert_not_called() - - -def test_explicit_endpoint_does_not_break_daemon_alive_short_circuit(monkeypatch): - """daemon_alive=True must continue to short-circuit auto-bootstrap regardless of - whether an explicit endpoint is configured — re-using a live daemon was the - 
pre-existing fast path and the precedence guard must not regress it.""" - monkeypatch.setenv("BU_CDP_URL", "http://127.0.0.1:9333") - monkeypatch.setenv("BROWSER_USE_API_KEY", "test-key") - monkeypatch.setenv("BU_AUTOSPAWN", "1") - with patch.object(sys, "argv", ["browser-harness", "-c", "x = 1"]), \ - patch("browser_harness.run.daemon_alive", return_value=True), \ - patch("browser_harness.run._local_chrome_listening", return_value=False), \ - patch("browser_harness.run.start_remote_daemon") as mock_start, \ - patch("browser_harness.run.ensure_daemon"), \ - patch("browser_harness.run.print_update_banner"): - run.main() - mock_start.assert_not_called() - - -def test_explicit_endpoint_does_not_break_local_chrome_short_circuit(monkeypatch): - """If a local Chrome is already listening on 9222/9223 the bootstrap must skip - even when the user *also* set an explicit endpoint pointing somewhere else. - The auto-bootstrap path is for cloud only; routing between local-default and - explicit-non-default endpoints is handled later in daemon.py:get_ws_url().""" - monkeypatch.setenv("BU_CDP_URL", "http://127.0.0.1:9333") - monkeypatch.setenv("BROWSER_USE_API_KEY", "test-key") - monkeypatch.setenv("BU_AUTOSPAWN", "1") - with patch.object(sys, "argv", ["browser-harness", "-c", "x = 1"]), \ - patch("browser_harness.run.daemon_alive", return_value=False), \ - patch("browser_harness.run._local_chrome_listening", return_value=True), \ - patch("browser_harness.run.start_remote_daemon") as mock_start, \ - patch("browser_harness.run.ensure_daemon"), \ - patch("browser_harness.run.print_update_banner"): - run.main() - mock_start.assert_not_called() - - -def test_explicit_cdp_configured_helper_truthy(monkeypatch): - """Direct unit test of the helper: any non-empty BU_CDP_URL or BU_CDP_WS must - return True so the bootstrap guard reads as 'caller has been explicit'.""" - for name, value in [ - ("BU_CDP_URL", "http://127.0.0.1:9333"), - ("BU_CDP_WS", 
"ws://example.test/devtools/browser/abc"), - ("BU_CDP_URL", "http://[::1]:9333"), # IPv6 host - ("BU_CDP_WS", "wss://cloud.example.com/devtools/browser/x"), # secure WS - ]: - monkeypatch.delenv("BU_CDP_URL", raising=False) - monkeypatch.delenv("BU_CDP_WS", raising=False) - monkeypatch.setenv(name, value) - assert run._explicit_cdp_configured() is True, f"{name}={value!r} should be truthy" - - -def test_explicit_cdp_configured_helper_falsy(monkeypatch): - """Helper must return False for unset, empty-string, or both-unset cases — - those are all 'caller has not chosen an endpoint' from the bootstrap's POV.""" - monkeypatch.delenv("BU_CDP_URL", raising=False) - monkeypatch.delenv("BU_CDP_WS", raising=False) - assert run._explicit_cdp_configured() is False, "both unset" - monkeypatch.setenv("BU_CDP_URL", "") - assert run._explicit_cdp_configured() is False, "BU_CDP_URL empty string" - monkeypatch.delenv("BU_CDP_URL", raising=False) - monkeypatch.setenv("BU_CDP_WS", "") - assert run._explicit_cdp_configured() is False, "BU_CDP_WS empty string" - - -def test_local_chrome_listening_rejects_non_chrome(): - """A bare TCP listener on 9222/9223 must not fool the probe — only a real - /json/version response counts as Chrome.""" - with patch("browser_harness.run.urllib.request.urlopen", side_effect=OSError): - assert run._local_chrome_listening() is False - with patch("browser_harness.run.urllib.request.urlopen") as mock_open: - assert run._local_chrome_listening() is True - mock_open.assert_called_once() - - -def test_c_flag_does_not_read_stdin(): - stdin_read = [] - fake_stdin = StringIO("should not be read") - fake_stdin.read = lambda: stdin_read.append(True) or "" - - with patch.object(sys, "argv", ["browser-harness", "-c", "x = 1"]), \ - patch("browser_harness.run.ensure_daemon"), \ - patch("browser_harness.run.print_update_banner"), \ - patch("sys.stdin", fake_stdin): - run.main() - - assert not stdin_read, "stdin should not be read when -c is passed" diff --git 
a/packages/bcode-browser/package.json b/packages/bcode-browser/package.json index 7f54920463..67d80eefcc 100644 --- a/packages/bcode-browser/package.json +++ b/packages/bcode-browser/package.json @@ -2,12 +2,13 @@ "$schema": "https://json.schemastore.org/package.json", "version": "0.0.0", "name": "@browser-use/bcode-browser", - "description": "BrowserCode Level-1 code: vendored browser-harness, FetchUse service, cloud integrations, tool implementations", + "description": "BrowserCode Level-1 code: in-process CDP harness, browser_execute, embedded skills", "type": "module", "license": "MIT", "private": true, "scripts": { - "typecheck": "tsgo --noEmit" + "typecheck": "tsgo --noEmit", + "cdp:gen": "bun src/cdp/gen.ts" }, "exports": { "./*": "./src/*.ts" diff --git a/packages/bcode-browser/script/embed-harness.ts b/packages/bcode-browser/script/embed-harness.ts deleted file mode 100644 index 6d65de1055..0000000000 --- a/packages/bcode-browser/script/embed-harness.ts +++ /dev/null @@ -1,71 +0,0 @@ -// Embeds the vendored harness into the compiled bcode binary. -// -// The build script (`packages/opencode/script/build.ts`) calls -// `createEmbeddedHarnessBundle()` and plumbs the result into -// `Bun.build({ files: { "bcode-harness.gen.ts": } })`. The generated -// virtual module exports `{ "": "" }` for every harness file -// plus a content-hash `buildHash` used as the on-disk extraction sentinel. -// `harness.ts` reads it in compiled mode and extracts the files to -// `/harness/` on session start, skipping when the sentinel matches. -// -// The walk is glob-driven (not hand-enumerated): when skill files leave the -// repo for the cloud-fetch architecture (decisions.md §4.7) the embed shrinks -// automatically with no script change. Excludes mirror `harness/.gitignore` -// so local artifacts (`.venv/`, `__pycache__/`, `*.egg-info/`, etc.) never -// land in the binary. 
- -import crypto from "crypto" -import fs from "fs/promises" -import path from "path" -import { fileURLToPath } from "url" - -const __dirname = path.dirname(fileURLToPath(import.meta.url)) -const HARNESS_DIR = path.resolve(__dirname, "..", "harness") - -const ignored = [ - new Bun.Glob("**/__pycache__/**"), - new Bun.Glob("**/.venv/**"), - new Bun.Glob("**/*.egg-info/**"), - new Bun.Glob("**/*.pyc"), - new Bun.Glob("**/*.log"), - new Bun.Glob("**/.env"), - new Bun.Glob("**/uv.lock"), -] - -// SHA-256 over (rel + NUL + content) for each file in sorted order. Stable -// across builds when content is identical, so warm launches skip extraction. -const computeBuildHash = async (files: string[]) => { - const hash = crypto.createHash("sha256") - for (const rel of files) { - hash.update(rel) - hash.update("\0") - hash.update(await fs.readFile(path.join(HARNESS_DIR, rel))) - } - return hash.digest("hex") -} - -export const createEmbeddedHarnessBundle = async (buildCwd: string) => { - console.log("Embedding harness files into the binary") - const files = (await Array.fromAsync(new Bun.Glob("**/*").scan({ cwd: HARNESS_DIR, dot: true }))) - .map((file) => file.replaceAll("\\", "/")) - .filter((file) => !ignored.some((g) => g.match(file))) - .sort() - - console.log(`Embedding ${files.length} harness files`) - const buildHash = await computeBuildHash(files) - - const imports = files.map((file, i) => { - const spec = path.relative(buildCwd, path.join(HARNESS_DIR, file)).replaceAll("\\", "/") - return `import file_${i} from ${JSON.stringify(spec.startsWith(".") ? 
spec : `./${spec}`)} with { type: "file" };` - }) - const entries = files.map((file, i) => ` ${JSON.stringify(file)}: file_${i},`) - return [ - `// Auto-generated by packages/bcode-browser/script/embed-harness.ts`, - `// Maps "" -> bunfs path for every embedded harness file.`, - ...imports, - `export const buildHash = ${JSON.stringify(buildHash)}`, - `export default {`, - ...entries, - `} as Record`, - ].join("\n") -} diff --git a/packages/bcode-browser/script/embed-skills.ts b/packages/bcode-browser/script/embed-skills.ts new file mode 100644 index 0000000000..770ee233d6 --- /dev/null +++ b/packages/bcode-browser/script/embed-skills.ts @@ -0,0 +1,59 @@ +// Embeds the browsercode-owned skills tree into the compiled bcode binary. +// +// The build script (`packages/opencode/script/build.ts`) calls +// `createEmbeddedSkillsBundle()` and plumbs the result into +// `Bun.build({ files: { "bcode-skills.gen.ts": } })`. The generated +// virtual module exports `{ "": "" }` for every file plus a +// content-hash `buildHash` used as the on-disk extraction sentinel. +// `skills.ts` reads it in compiled mode and extracts the files to +// `/skills/`, skipping when the sentinel matches. +// +// All skills are bcode-shipped read-only baseline — overwritten on every +// upgrade. (Phase H hard rule #3: there is no agent-editable surface here. +// The agent's editable surface is `/.bcode/agent-workspace/`, +// per-project, which never lands in the binary.) + +import crypto from "crypto" +import fs from "fs/promises" +import path from "path" +import { fileURLToPath } from "url" + +const __dirname = path.dirname(fileURLToPath(import.meta.url)) +const SKILLS_DIR = path.resolve(__dirname, "..", "skills") + +// SHA-256 over (rel + NUL + content) for each file in sorted order. Stable +// across builds when content is identical, so warm launches skip extraction. 
+const computeBuildHash = async (files: string[]) => {
+  const hash = crypto.createHash("sha256")
+  for (const rel of files) {
+    hash.update(rel)
+    hash.update("\0")
+    hash.update(await fs.readFile(path.join(SKILLS_DIR, rel)))
+  }
+  return hash.digest("hex")
+}
+
+export const createEmbeddedSkillsBundle = async (buildCwd: string) => {
+  console.log("Embedding skills files into the binary")
+  const files = (await Array.fromAsync(new Bun.Glob("**/*").scan({ cwd: SKILLS_DIR })))
+    .map((file) => file.replaceAll("\\", "/"))
+    .sort()
+
+  console.log(`Embedding ${files.length} skills files`)
+  const buildHash = await computeBuildHash(files)
+
+  const imports = files.map((file, i) => {
+    const spec = path.relative(buildCwd, path.join(SKILLS_DIR, file)).replaceAll("\\", "/")
+    return `import file_${i} from ${JSON.stringify(spec.startsWith(".") ? spec : `./${spec}`)} with { type: "file" };`
+  })
+  const entries = files.map((file, i) => `  ${JSON.stringify(file)}: file_${i},`)
+  return [
+    `// Auto-generated by packages/bcode-browser/script/embed-skills.ts`,
+    `// Maps "<rel>" -> bunfs path for every embedded skill file.`,
+    ...imports,
+    `export const buildHash = ${JSON.stringify(buildHash)}`,
+    `export default {`,
+    ...entries,
+    `} as Record<string, string>`,
+  ].join("\n")
+}
diff --git a/packages/bcode-browser/skills/BROWSER.md b/packages/bcode-browser/skills/BROWSER.md
new file mode 100644
index 0000000000..9a77d845d3
--- /dev/null
+++ b/packages/bcode-browser/skills/BROWSER.md
@@ -0,0 +1,162 @@
+# BROWSER.md — driving a real browser with `browser_execute`
+
+Use the `browser_execute` tool to run JavaScript against a connected browser via the Chrome DevTools Protocol. The snippet runs in-process; `session` is bound to a long-lived CDP `Session` that persists across calls within the same bcode session. You connect once, drive many.
+
+**Locations:**
+
+- Workspace (read/write your reusable scripts): `<project>/.bcode/agent-workspace/`.
The bcode CLI runs from the project root, so `./.bcode/agent-workspace/foo.ts` works directly with the `read`/`write`/`edit` tools. +- Skills (read-only reference docs): `{{SKILLS_DIR}}/`. Run `read {{SKILLS_DIR}}/interaction-skills/` to list every available interaction skill before reading any one of them. + +## The model in one paragraph + +`browser_execute` evaluates whatever JS you write against `session`. There is no auto-loaded library, no privileged file, no helper namespace — just `session` and standard JS globals. To reuse code from a previous snippet, save it as a `.ts` file under `./.bcode/agent-workspace/` (using the `write` tool) and `await import("/abs/path?t=" + Date.now())` it from a later snippet. The import takes an **absolute** path — construct it from `process.cwd()` inside the snippet. Same mechanism for a 5-line wrapper and a 500-line script. Skills under `{{SKILLS_DIR}}/` are documentation you `read`, not modules you `import` — they teach you the CDP patterns; you write the code. + +## Connecting + +You always call `session.connect(...)` once at the start of your work. The `Session` is fresh on the first `browser_execute` call of an opencode session; subsequent calls reuse it. Three connection methods, in order of preference for typical tasks: + +**Way 1 — connect to the user's running Chrome (real profile, popup-gated).** Best when the task involves the user's actual logged-in sites. + +```js +// Auto-detect the most-recently-launched Chrome with remote debugging enabled. +await session.connect() +``` + +The user must have ticked "Allow remote debugging for this browser instance" once at `chrome://inspect/#remote-debugging` (sticky per-profile), and on Chrome 144+ click "Allow" on the in-browser popup at first attach. If `connect()` fails with a 403/permission message, ask the user to do this. To wait for the click instead of erroring fast, pass `{ profileDir: "/abs/path", timeoutMs: 30000 }`. 
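That failure path can be wrapped once so the retry guidance stays consistent. A minimal sketch under assumed names — `attachWithHint` is illustrative, not part of the tool surface, and `connectFn` stands in for the real `session.connect({ profileDir, timeoutMs: 30000 })` call:

```js
// Hypothetical wrapper: turn a failed attach into an actionable message.
async function attachWithHint(connectFn) {
  try {
    return await connectFn()
  } catch (err) {
    throw new Error(
      `Attach failed (${err.message}). Ask the user to tick ` +
      `"Allow remote debugging" at chrome://inspect/#remote-debugging, ` +
      `click "Allow" on the popup, and retry.`,
    )
  }
}

// Demo with a failing stand-in for session.connect:
attachWithHint(async () => { throw new Error("403") })
  .catch((e) => console.log(e.message.includes("chrome://inspect"))) // true
```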
+ +**Way 2 — connect to a Chrome you (or the user) launched with a debug port (isolated profile, no popups).** Best for unattended automation. + +```bash +# User runs this once (or you run it via the `bash` tool): +google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/bcode-chrome +``` + +```js +await session.connect({ wsUrl: "ws://127.0.0.1:9222/devtools/browser" }) +// or, if you know the profile dir: +await session.connect({ profileDir: "/tmp/bcode-chrome" }) +``` + +The `--user-data-dir` must NOT be Chrome's platform default (`%LOCALAPPDATA%\Google\Chrome\User Data` on Windows, `~/Library/Application Support/Google/Chrome` on macOS, `~/.config/google-chrome` on Linux) — Chrome 136+ silently no-ops the port flag in that case. + +**Way 3 — provision and connect to a Browser Use cloud browser.** Best when the user can't see the browser, you need a clean profile, geo-located proxy, or fingerprint isolation. Read `{{SKILLS_DIR}}/cloud-browser.md` for the full pattern (provision, stop, swap profile/proxy). Briefly: + +```js +const r = await fetch("https://api.browser-use.com/api/v3/browsers", { + method: "POST", + headers: { "X-Browser-Use-API-Key": process.env.BROWSER_USE_API_KEY, "Content-Type": "application/json" }, + body: "{}", +}) +const body = await r.json() +const id = body.id +const cdpUrl = body.cdp_url ?? body.cdpUrl // BU returns snake_case in some regions, camelCase in others +const liveUrl = body.live_url ?? body.liveUrl +await session.connect({ wsUrl: cdpUrl }) +console.log("liveUrl for the user to watch:", liveUrl) +``` + +Requires `BROWSER_USE_API_KEY` in the environment (the user should have set this before launching bcode). If absent, tell the user to get a key at https://browser-use.com and `export BROWSER_USE_API_KEY=...`. 
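The snake_case/camelCase fallback in the Way 3 snippet can be factored into a tiny normalizer. A sketch — `normalizeBrowserResponse` is a hypothetical helper name; the field names are the ones the snippet above already accepts:

```js
// Accept either snake_case or camelCase Browser Use provision responses.
function normalizeBrowserResponse(body) {
  return {
    id: body.id,
    cdpUrl: body.cdp_url ?? body.cdpUrl,
    liveUrl: body.live_url ?? body.liveUrl,
  }
}

const a = normalizeBrowserResponse({ id: "b1", cdp_url: "ws://x", live_url: "https://y" })
const b = normalizeBrowserResponse({ id: "b2", cdpUrl: "ws://z", liveUrl: "https://w" })
console.log(a.cdpUrl, b.liveUrl) // ws://x https://w
```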
+
+## Attaching to a target
+
+After `connect()`, attach to a page target before driving the browser:
+
+```js
+const targets = (await session.Target.getTargets({})).targetInfos
+const page = targets.find(t => t.type === "page" && !t.url.startsWith("chrome://"))
+await session.use(page.targetId)
+```
+
+`session.use(targetId)` makes subsequent calls auto-route to that target. Switch with another `session.use`.
+
+## Driving a page
+
+Domain methods follow `session.<Domain>.<method>(params)` and return Promises. The full surface (652 commands) is the Chrome DevTools Protocol — see https://chromedevtools.github.io/devtools-protocol/.
+
+Common moves:
+
+```js
+// Navigate.
+await session.Page.enable()
+await session.Page.navigate({ url: "https://example.com" })
+await session.waitFor("Page.loadEventFired")
+
+// Evaluate JS in the page.
+const r = await session.Runtime.evaluate({
+  expression: "document.title",
+  returnByValue: true,
+})
+console.log(r.result.value)
+
+// Click by coordinates.
+const x = 200, y = 300
+await session.Input.dispatchMouseEvent({ type: "mouseMoved", x, y })
+await session.Input.dispatchMouseEvent({ type: "mousePressed", x, y, button: "left", clickCount: 1 })
+await session.Input.dispatchMouseEvent({ type: "mouseReleased", x, y, button: "left", clickCount: 1 })
+
+// Type text.
+await session.Input.insertText({ text: "hello" })
+
+// Screenshot.
+const { data } = await session.Page.captureScreenshot({ format: "png" })
+// data is base64; write with the `write` tool or process in JS.
+```
+
+For the full menu of UI mechanics — dropdowns, dialogs, iframes, shadow DOM, uploads, scrolling, screenshots-with-highlights — list `{{SKILLS_DIR}}/interaction-skills/` to see all available topics, then read the relevant one.
+
+## Switching browsers mid-session
+
+You own the connection.
To swap: + +```js +await session.close() +await session.connect({ /* new opts */ }) +``` + +Cloud cleanup is your responsibility — if you're done with a cloud browser, stop it explicitly (see `{{SKILLS_DIR}}/cloud-browser.md` for the PATCH call). Otherwise it persists until your API quota or BU's idle timer reclaims it. + +## Reusing code: write to the workspace, import from snippet + +The agent-workspace is per-project: `./.bcode/agent-workspace/`. It's a directory of `.ts` files you own and edit with the standard `write`/`edit` tools — flat for small projects, organized into subdirectories (`scrape/`, `auth/`, `cloud/`, …) when you accumulate enough scripts that grouping helps. Imports work at any depth; pick whatever layout makes the project easiest to navigate. Saved scripts travel with the project (`.bcode/agent-workspace/` is committed by default), so `git clone && cd && bcode` shares them. + +Write once, import many: + +```ts +// ./.bcode/agent-workspace/scrape_titles.ts (you write this with the `write` tool) +export async function run(session: any, urls: string[]) { + const titles: string[] = [] + await session.Page.enable() + for (const url of urls) { + await session.Page.navigate({ url }) + await session.waitFor("Page.loadEventFired") + const r = await session.Runtime.evaluate({ expression: "document.title", returnByValue: true }) + titles.push(r.result.value) + } + return titles +} +``` + +```js +// later snippet (browser_execute call) — construct the absolute path from cwd. +const path = process.cwd() + "/.bcode/agent-workspace/scrape_titles.ts" +const m = await import(`${path}?t=${Date.now()}`) +const titles = await m.run(session, ["https://example.com", "https://example.org"]) +console.log(JSON.stringify(titles)) +``` + +Cache-bust (`?t=${Date.now()}`) is your responsibility: without it, edits to the file won't be picked up. The pattern is the same for any depth — save to `subdir/foo.ts`, import by full path. 
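The path-plus-cache-bust construction is pure string work, so it can be sketched (and checked) without a browser. `workspaceImportSpec` is an illustrative helper name, not something bcode provides:

```js
// Build the absolute, cache-busted import specifier for a workspace script.
function workspaceImportSpec(cwd, rel) {
  return `${cwd}/.bcode/agent-workspace/${rel}?t=${Date.now()}`
}

const spec = workspaceImportSpec("/repo", "scrape/titles.ts")
console.log(spec.startsWith("/repo/.bcode/agent-workspace/scrape/titles.ts?t=")) // true
```

Because the `?t=` suffix changes every millisecond, re-importing after an edit produces a fresh specifier, which is what forces the runtime to re-evaluate the file.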
+ +## Guardrails + +- **Top-level `import`** statements inside the snippet body are **not allowed** — the snippet is wrapped in an async function. Use `await import(...)` instead. +- **No CPU-bound infinite loops without `await`.** JS Promises aren't preemptively cancellable; a `for (;;)` without an `await` yield-point will not respect the timeout. Insert `await new Promise(r => setTimeout(r, 0))` if you genuinely need a long compute loop. +- `console.log`, `console.error`, `console.warn`, `console.info`, `console.debug` are all captured and streamed to the user. Treat them as your stdout. Other `console.*` methods (`table`, `dir`, `trace`, …) work but write to bcode's stderr without being captured into the tool result. +- The snippet's `return` value is captured separately (JSON-serialized when possible). + +## When something doesn't work + +- **`session.Page.navigate` hangs forever** → the page is showing a native dialog. Use `session.Page.handleJavaScriptDialog({ accept: true })` to dismiss. +- **Selectors don't find elements that you can see** → likely an iframe or shadow DOM. Read `{{SKILLS_DIR}}/interaction-skills/iframes.md` or `shadow-dom.md`. +- **Actions silently no-op** → the page is mid-load. After `Page.navigate`, await `session.waitFor("Page.loadEventFired")` before driving inputs. +- **Connection refused or 403 on connect()** → Chrome wasn't started with `--remote-debugging-port`, or the user hasn't clicked "Allow" on the remote-debugging prompt. Pass `{ profileDir, timeoutMs: 30000 }` to wait for the click, or fall back to Way 2. +- **Cloud `connect()` fails after a successful provision** → check that `cdp_url` came back in the POST response; some BU regions return `cdpUrl` (camelCase) — accept both. See `{{SKILLS_DIR}}/cloud-browser.md`. 
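The long-compute guardrail above can be sketched concretely — the periodic `await` is the yield point that lets the snippet timeout fire:

```js
// CPU-bound loop with an explicit yield point every 1000 iterations.
async function longCompute(iterations) {
  let acc = 0
  for (let i = 0; i < iterations; i++) {
    acc += i
    if (i % 1000 === 0) await new Promise((r) => setTimeout(r, 0)) // yield to the event loop
  }
  return acc
}

longCompute(10_000).then((acc) => console.log(acc)) // 49995000
```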
diff --git a/packages/bcode-browser/skills/cloud-browser.md b/packages/bcode-browser/skills/cloud-browser.md new file mode 100644 index 0000000000..6b1806f50a --- /dev/null +++ b/packages/bcode-browser/skills/cloud-browser.md @@ -0,0 +1,145 @@ +# cloud-browser.md — Browser Use cloud browser via raw HTTP + +When BROWSER.md sent you here, the user wants a Browser Use cloud browser (Way 3): a clean isolated Chrome on BU's infrastructure, optionally with a geo-located proxy or a synced profile, with a `liveUrl` the user can open to watch you work. + +There is no `browser_open_cloud` tool. You write the HTTP calls yourself in a `browser_execute` snippet. This keeps the connection model symmetric (you also call `session.connect()` for local browsers in Way 1 and Way 2) and gives you full control over the BU API surface — provision, stop, swap profiles, change proxies, anything BU exposes. + +## Authentication + +Every call to `https://api.browser-use.com/...` requires an API key in the `X-Browser-Use-API-Key` header. The key lives in the environment as `BROWSER_USE_API_KEY` (the user is expected to `export` it before launching bcode, the same way they'd set `AWS_BEDROCK_ACCESS_KEY_ID` for an LLM provider). + +Read it once, fail clearly if missing: + +```js +const apiKey = process.env.BROWSER_USE_API_KEY +if (!apiKey) { + throw new Error("BROWSER_USE_API_KEY is not set. Get a key at https://browser-use.com and re-launch bcode with the key exported.") +} +``` + +## Provision + +```js +const r = await fetch("https://api.browser-use.com/api/v3/browsers", { + method: "POST", + headers: { "X-Browser-Use-API-Key": apiKey, "Content-Type": "application/json" }, + body: JSON.stringify({ + // All optional — omit for an ephemeral fresh-profile browser with no proxy. 
+    // profile_id: "",           // attach an existing BU profile
+    // proxy_country_code: "us", // geo-located proxy
+  }),
+})
+if (!r.ok) throw new Error(`provision failed: ${r.status} ${await r.text()}`)
+const body = await r.json()
+// Some BU regions return camelCase, others snake_case. Accept both.
+const id = body.id
+const cdpUrl = body.cdp_url ?? body.cdpUrl
+const liveUrl = body.live_url ?? body.liveUrl
+```
+
+The `liveUrl` is a viewer URL the user can open in their own browser to watch the cloud browser's pixels. **Print it to console** so the user can click it:
+
+```js
+console.log("Cloud browser ready. Live view:", liveUrl)
+```
+
+Stash `id` somewhere (a `globalThis.cloudBrowserId = id` is fine, or the snippet's return value) — you need it to stop the browser later.
+
+## Connect
+
+```js
+await session.connect({ wsUrl: cdpUrl })
+const targets = (await session.Target.getTargets({})).targetInfos
+const page = targets.find(t => t.type === "page")
+await session.use(page.targetId)
+```
+
+From here on `session.<Domain>.<method>(...)` drives the cloud browser exactly like a local Chrome.
+
+## Stop
+
+When you're done, stop the browser. BU's quotas and idle reclaim will eventually clean it up if you forget, but explicit stop is faster and frees the slot:
+
+```js
+await fetch(`https://api.browser-use.com/api/v3/browsers/${id}`, {
+  method: "PATCH",
+  headers: { "X-Browser-Use-API-Key": apiKey, "Content-Type": "application/json" },
+  body: JSON.stringify({ state: "stop" }),
+})
+```
+
+If you'll do this often within one project, save it as `./.bcode/agent-workspace/cloud.ts` (see BROWSER.md "Reusing code") and import it from later snippets.
+
+## Swap
+
+To switch from one cloud browser to another (e.g. different proxy country) within the same opencode session:
+
+```js
+// Stop the old one first.
+await fetch(`https://api.browser-use.com/api/v3/browsers/${oldId}`, { + method: "PATCH", + headers: { "X-Browser-Use-API-Key": apiKey, "Content-Type": "application/json" }, + body: JSON.stringify({ state: "stop" }), +}) + +// Close the local Session's WS so connect() opens a fresh one. +await session.close() + +// Provision and connect to the new one (provision block above, with new params). +``` + +## A reusable workspace helper + +Recommended pattern for any project that uses cloud browsers more than once: + +```ts +// ./.bcode/agent-workspace/cloud.ts +const API = "https://api.browser-use.com/api/v3/browsers" +const key = () => { + const k = process.env.BROWSER_USE_API_KEY + if (!k) throw new Error("BROWSER_USE_API_KEY is not set.") + return k +} + +export async function provision(opts: { profileId?: string; proxyCountryCode?: string } = {}) { + const r = await fetch(API, { + method: "POST", + headers: { "X-Browser-Use-API-Key": key(), "Content-Type": "application/json" }, + body: JSON.stringify({ + profile_id: opts.profileId, + proxy_country_code: opts.proxyCountryCode, + }), + }) + if (!r.ok) throw new Error(`provision failed: ${r.status} ${await r.text()}`) + const body = await r.json() + return { + id: body.id as string, + cdpUrl: (body.cdp_url ?? body.cdpUrl) as string, + liveUrl: (body.live_url ?? body.liveUrl) as string, + } +} + +export async function stop(id: string) { + const r = await fetch(`${API}/${id}`, { + method: "PATCH", + headers: { "X-Browser-Use-API-Key": key(), "Content-Type": "application/json" }, + body: JSON.stringify({ state: "stop" }), + }) + if (!r.ok) throw new Error(`stop failed: ${r.status} ${await r.text()}`) +} +``` + +Then any snippet does: + +```js +const { provision, stop } = await import(`${process.cwd()}/.bcode/agent-workspace/cloud.ts?t=${Date.now()}`) +const { id, cdpUrl, liveUrl } = await provision({ proxyCountryCode: "us" }) +console.log("Live view:", liveUrl) +await session.connect({ wsUrl: cdpUrl }) +// ... do work ... 
+await stop(id)
+```
+
+## Other BU API endpoints
+
+The full BU cloud API (profile sync, profile list, custom proxies, recording on/off, etc.) is documented at https://browser-use.com — `read` the docs and write the matching `fetch` call. Anything BU's API exposes is reachable from a snippet without bcode-side wrapper code.
diff --git a/packages/bcode-browser/skills/interaction-skills/connection.md b/packages/bcode-browser/skills/interaction-skills/connection.md
new file mode 100644
index 0000000000..b619c2e418
--- /dev/null
+++ b/packages/bcode-browser/skills/interaction-skills/connection.md
@@ -0,0 +1,104 @@
+# Connection & Tab Visibility
+
+## Just call `session.connect()`
+
+No args required. It scans OS-specific profile dirs for every running Chromium-based browser (Chrome, Chromium, Edge, Brave, Arc, Vivaldi, Opera, Comet, Canary), picks the most-recently-launched one whose WebSocket accepts, and attaches. Dead ports and permission-denied (403) candidates fall through in <100ms each, so the loop is fast.
+
+```js
+await session.connect()
+```
+
+Inspect what's available (e.g. to let the user choose) with `detectBrowsers()`:
+
+```js
+const browsers = await detectBrowsers()
+// [{ name: 'Google Chrome', profileDir, port, wsPath, wsUrl, mtimeMs }, ...]
+```
+
+### Explicit forms (override auto-detect)
+
+Use only when auto-detect picks the wrong browser or you already know the destination.
+
+| Form | When |
+|---|---|
+| `{ profileDir }` | Target a specific running browser. Reads its `DevToolsActivePort` directly. OS-agnostic. |
+| `{ wsUrl }` | You already have `ws://…/devtools/browser/<id>`. |
+
+```js
+await session.connect({ profileDir: '/Users/<user>/Library/Application Support/Google/Chrome' })
+await session.connect({ wsUrl: 'ws://127.0.0.1:9222/devtools/browser/<id>' })
+```
+
+### Timeouts and the Allow popup
+
+Per-candidate WS-open timeout defaults to **5s**.
A live browser either opens or closes the connection within ~100ms, so 5s is always enough — unless the user has to click **Allow** on Chrome's remote-debugging popup. In that case, pass `timeoutMs: 30000` to give them time: + +```js +await session.connect({ profileDir, timeoutMs: 30_000 }) +``` + +If `session.connect()` reports `No detected browser accepted a connection`, it means every browser with `DevToolsActivePort` answered 403 or closed without opening — most likely the user hasn't clicked Allow yet. Ask them to, then retry. + +## The omnibox popup problem + +When Chrome opens fresh, the only CDP `type: "page"` targets may be `chrome://inspect` and `chrome://omnibox-popup.top-chrome/` (a 1px invisible viewport). If you attach to the omnibox popup, every subsequent action happens on a tab the user cannot see. + +`listPageTargets()` already filters `chrome://` and `devtools://` URLs. If you call `Target.getTargets` directly, filter these manually: + +```js +const { targetInfos } = await session.Target.getTargets({}) +const realTabs = targetInfos.filter(t => + t.type === 'page' && + !t.url.startsWith('chrome://') && + !t.url.startsWith('devtools://') +) +``` + +If no real pages exist yet, create one instead of attaching to nothing: + +```js +const tabs = await listPageTargets() +let targetId = tabs[0]?.targetId +if (!targetId) { + ({ targetId } = await session.Target.createTarget({ url: 'about:blank' })) +} +await session.use(targetId) +``` + +## Startup sequence + +```js +await session.connect() // 1. auto-detect the running browser +const tabs = await listPageTargets() // 2. real pages only (chrome:// already filtered) +let targetId = tabs[0]?.targetId +if (!targetId) { // 3. handle the empty case (fresh window, omnibox-only) + ({ targetId } = await session.Target.createTarget({ url: 'about:blank' })) +} +await session.use(targetId) // 4. route Page/DOM/Runtime/Network to that target +await session.Target.activateTarget({ targetId }) // 5. 
bring it visually to front +await session.Page.enable() // 6. enable the domains you need +``` + +## CDP target order ≠ visible tab-strip order + +When the user says "the first tab I can see", do NOT trust the order of `Target.getTargets`. Use: + +- A screenshot (`session.Page.captureScreenshot()`) to identify visually. +- Page title / URL heuristics. +- Or platform UI automation (macOS: AppleScript; Linux: `xdotool`/`wmctrl`). + +`Target.activateTarget` only switches to a targetId you already know — it cannot resolve "leftmost tab". + +## Bringing Chrome to front + +```bash +# macOS — prefer AppleScript over `open -a` (reuses current profile, avoids the profile picker) +osascript -e 'tell application "Google Chrome" to activate' + +# Linux (X11) — use wmctrl or xdotool +wmctrl -a 'Google Chrome' +xdotool search --name 'Google Chrome' windowactivate + +# Windows (PowerShell) +powershell -NoProfile -Command "(New-Object -ComObject WScript.Shell).AppActivate('Google Chrome')" +``` diff --git a/packages/bcode-browser/skills/interaction-skills/cookies.md b/packages/bcode-browser/skills/interaction-skills/cookies.md new file mode 100644 index 0000000000..f72984b725 --- /dev/null +++ b/packages/bcode-browser/skills/interaction-skills/cookies.md @@ -0,0 +1,61 @@ +# Cookies + +Use `Network.*` for cookies scoped to the attached page/context; use `Storage.getCookies` / `Storage.setCookies` for every cookie in the browser. 
+ +## Read + +```js +await session.Network.enable({}) + +// All cookies visible to the attached page (current origin + its frames) +const { cookies } = await session.Network.getCookies({}) + +// Cookies for specific URLs +const { cookies: github } = await session.Network.getCookies({ + urls: ['https://github.com/'], +}) + +// Every cookie across the whole browser (requires Storage domain) +const { cookies: all } = await session.Storage.getCookies({}) +``` + +Shape: `{ name, value, domain, path, expires, size, httpOnly, secure, session, sameSite?, sourceScheme?, priority? }`. + +## Write + +```js +// Single cookie on the attached page +await session.Network.setCookie({ + name: 'session', + value: 'abc123', + domain: '.example.com', + path: '/', + secure: true, + httpOnly: true, + sameSite: 'Lax', + expires: Date.now() / 1000 + 86400, // seconds since epoch +}) + +// Bulk import (e.g. to preload an auth session) +await session.Network.setCookies({ + cookies: [ + { name: 'a', value: '1', domain: '.example.com', path: '/' }, + { name: 'b', value: '2', domain: '.example.com', path: '/' }, + ], +}) +``` + +## Delete / clear + +```js +await session.Network.deleteCookies({ name: 'session', domain: '.example.com' }) +await session.Network.clearBrowserCookies() // nukes everything in the default context +``` + +## Gotchas + +- `Network.setCookie` silently fails with no error if `domain` doesn't match any origin in the current profile — you'll get `{ success: true }` and the cookie just won't be there. Verify with `getCookies` after. +- `expires` is seconds (float), **not** milliseconds. A common mistake. +- Session cookies: pass no `expires` and Chrome treats them as session-scoped. Setting `expires: 0` also works. +- `sameSite` values are `'Strict'` | `'Lax'` | `'None'`. For `'None'`, Chrome also requires `secure: true`. +- Clearing cookies does NOT clear localStorage/IndexedDB. For a full logout, also call `Storage.clearDataForOrigin({ origin, storageTypes: 'all' })`. 
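The last gotcha is worth capturing as a helper. A minimal sketch of a "full logout" for one origin — `fullLogout` and its domain-matching rule are illustrative, not bcode API; it assumes the same `session` CDP wrapper used in the snippets above:

```js
// Sketch only: `fullLogout` is a hypothetical helper, not part of bcode.
// Deletes the origin's cookies, then wipes its other storage.
async function fullLogout(session, origin) {
  const host = new URL(origin).hostname
  const { cookies } = await session.Storage.getCookies({})
  for (const c of cookies) {
    // Match host-only cookies (domain === host) and domain cookies (`.example.com`).
    const d = c.domain.replace(/^\./, '')
    if (host === d || host.endsWith('.' + d)) {
      await session.Network.deleteCookies({ name: c.name, domain: c.domain, path: c.path })
    }
  }
  // Cookies alone are not a logout — also clear localStorage/IndexedDB/etc.
  await session.Storage.clearDataForOrigin({ origin, storageTypes: 'all' })
}
```

Verify with `getCookies` afterwards — per the first gotcha, cookie writes and deletes can report success without taking effect.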
diff --git a/packages/bcode-browser/skills/interaction-skills/cross-origin-iframes.md b/packages/bcode-browser/skills/interaction-skills/cross-origin-iframes.md new file mode 100644 index 0000000000..d0909b5708 --- /dev/null +++ b/packages/bcode-browser/skills/interaction-skills/cross-origin-iframes.md @@ -0,0 +1,76 @@ +# Cross-Origin Iframes (OOPIFs) + +Cross-origin iframes (stripe.com checkout, recaptcha, Salesforce Lightning, Azure blades) run in **out-of-process iframes (OOPIFs)** with their own CDP target. You cannot reach them via `contentDocument` from the parent. + +## First try: coordinate clicks + +Compositor-level input passes through OOPIFs transparently. If the thing you want is a button you can see in a screenshot, try this first — it's simpler, undetectable, and doesn't need attaching to anything: + +```js +// Click a "Pay" button inside a Stripe iframe by page coordinates +await session.Input.dispatchMouseEvent({ type: 'mousePressed', x, y, button: 'left', clickCount: 1 }) +await session.Input.dispatchMouseEvent({ type: 'mouseReleased', x, y, button: 'left', clickCount: 1 }) +``` + +Coordinate-based typing also works if you click first, then `Input.insertText`/`Input.dispatchKeyEvent`. + +## When you need DOM inside the OOPIF + +Find the iframe target and route Runtime/DOM calls to it. Remember the parent's `targetId` first so you can switch back: + +```js +// Capture the parent target before switching — `session.use` doesn't expose it. +const parentTargetId = (await session.Target.getTargets({})) + .targetInfos.find(t => t.type === 'page' && !t.url.startsWith('chrome://'))?.targetId + +const { targetInfos } = await session.Target.getTargets({}) +const iframe = targetInfos.find(t => t.type === 'iframe' && t.url.includes('stripe.com')) +if (!iframe) { + // OOPIF targets are lazy. Interact with the parent input first + // (a coordinate click on the card-number area), then re-query Target.getTargets. 
+ throw new Error('Stripe iframe target not present yet — interact and retry') +} + +// Route subsequent calls to the iframe target +await session.use(iframe.targetId) + +await session.Runtime.enable() +const { result } = await session.Runtime.evaluate({ + expression: 'document.querySelector("[name=cardnumber]").value', + returnByValue: true, +}) + +// Switch back to the parent page when done +if (parentTargetId) await session.use(parentTargetId) +``` + +`session.use(iframe.targetId)` auto-attaches if not already attached, and routes Page/DOM/Runtime/Network to it. `Target.*` and `Browser.*` always hit the browser endpoint regardless of `use`. + +## Which target is which? + +`Target.getTargets` returns **all** OOPIFs in the page, flat. If multiple iframes share an origin (e.g. multiple Stripe Elements), you need more than URL to disambiguate: + +- Filter by URL path (`cardNumber` vs `cardExpiry` vs `cvc` in Stripe). +- Enumerate in DOM order from the parent: find all `