diff --git a/UPSTREAM.md b/UPSTREAM.md index fc4ae601a..8f418a3d5 100644 --- a/UPSTREAM.md +++ b/UPSTREAM.md @@ -87,6 +87,7 @@ Each upstream has its own append-only table. Add a row every time you pull. | 2026-04-28 | `fefca43` | `04f7716` | bcode | 7 upstream commits. Windows fixes (PRs #232, #240) + skill rename (PR #242). Files: `src/browser_harness/_ipc.py` (BH_TMP_DIR override for sock/port/pid/log/screenshot dir; drop DETACHED_PROCESS to suppress empty Windows console window), `src/browser_harness/admin.py` (route `ensure_daemon` warm probe through `ipc.connect` so Windows TCP loopback works; new `_open_inspect=False` flag on `ensure_daemon` used by `run_setup` to prevent chrome://inspect tab flooding; drop unused `_paths()` helper), `src/browser_harness/helpers.py` (`capture_screenshot` and click-debug overlay route through `ipc._TMP` instead of `tempfile.gettempdir()` so BH_TMP_DIR covers them too), `SKILL.md` (`name: browser-harness` → `name: browser`), `install.md` (`name: browser-harness-install` → `name: browser-install`). All in protected `src/browser_harness/*.py` zone — taken verbatim. SKILL/install frontmatter rename only affects how end-users invoke the skill (`/browser` vs `/browser-harness`); our `browser-execute.txt` references SKILL.md by file path, so no integration code changes. Divergences touched: none. PR #240 e2e tested separately on Linux against headless Chrome before sync. | | 2026-04-28 | `04f7716` | `2125cea` | bcode | 1 upstream commit (PR #243). `src/browser_harness/_ipc.py`: `_TMP.mkdir(parents=True, exist_ok=True)` at module load so a caller-supplied `BH_TMP_DIR` pointing at a non-existent directory no longer fails the first sock/port/pid/log/screenshot write. Prerequisite for browsercode's per-session scratch-dir use case. Protected zone — taken verbatim. Divergences touched: none. | | 2026-04-29 | `2125cea` | `997ee45` | bcode | 6 upstream commits (PRs #241, #244, #245). 
`src/browser_harness/_ipc.py`: when `BH_TMP_DIR` is set, drop the `bu-` filename prefix (caller-isolated dir means no shared-tmpdir disambiguation needed); without `BH_TMP_DIR` the original `bu-` scheme is unchanged. `src/browser_harness/admin.py`: `_daemon_endpoint_names` short-circuits to the local NAME when `BH_TMP_DIR` is set (no glob); plus catch `SystemError` from `os.kill` on Windows during `restart_daemon`. `src/browser_harness/daemon.py`: discover DevToolsActivePort in Comet and Arc profiles on macOS. `tests/unit/test_admin.py`: 2 new tests for the `BH_TMP_DIR` discovery path. All in protected `src/browser_harness/*.py` + tests — taken verbatim. Smoke test + 12 admin unit tests pass. The `_ipc` filename change pairs with our recent per-session BH_TMP_DIR work (browsercode PR #22) — caller isolation now extends to filenames as well as the dir. Divergences touched: none. | +| 2026-04-30 | `997ee45` | `660827d` | bcode | 11 upstream commits (PRs #246, #247, #251, #254, #256, #260). `src/browser_harness/daemon.py`: resolve WS via `/json/version` to avoid stale `DevToolsActivePort` path (PR #260) + report `cdp_disconnected` on stale CDP probe in `connection_status` (PR #254) + cleanup remote browser when daemon startup fails (PR #251). `src/browser_harness/admin.py`: companion changes for the daemon fixes. `tests/unit/test_admin.py`: 7 new tests. New domain skills: `agent-workspace/domain-skills/xiaohongshu/scraping.md` (PR #246), and a top-level `domain-skills/shopify-admin/` tree (PR #247: README, embedded-apps, knowledge-base, polaris-inputs). Note: PR #247 added skills at the top-level `domain-skills/` path, not under `agent-workspace/domain-skills/` as the post-#229 layout would suggest — vendored verbatim to match upstream layout. Doc updates: README operator framing (PR #255), install.md heredoc → `-c` flag (PR #256), profile-sync.md same. All files outside divergences — taken verbatim. Smoke test + 19 admin unit tests pass. Divergences touched: none. 
| --- diff --git a/packages/bcode-browser/harness/README.md b/packages/bcode-browser/harness/README.md index 21061f577..ccfc32942 100644 --- a/packages/bcode-browser/harness/README.md +++ b/packages/bcode-browser/harness/README.md @@ -2,9 +2,9 @@ # Browser Harness ♞ -The simplest, thinnest, **self-healing** harness that gives LLM **complete freedom** to complete any browser task. Built directly on CDP. +Connect an LLM directly to your real browser with a thin, editable CDP harness. For browser tasks where you need **complete freedom**. -The agent writes what's missing, mid-task, inside `agent-workspace/`. No framework, no recipes, no rails. One websocket to Chrome, nothing between. +One websocket to Chrome, nothing between. The agent writes what's missing during execution. The harness improves itself every run. ``` ● agent: wants to upload a file diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/xiaohongshu/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/xiaohongshu/scraping.md new file mode 100644 index 000000000..1c149fce2 --- /dev/null +++ b/packages/bcode-browser/harness/agent-workspace/domain-skills/xiaohongshu/scraping.md @@ -0,0 +1,84 @@ +# Xiaohongshu — Search and Sort + +URL patterns: +- Home / discovery: `https://www.xiaohongshu.com/explore` +- Search results: `https://www.xiaohongshu.com/search_result?keyword=...` + +## Search flow + +- Prefer direct navigation to the desktop search results page over automating the home-page search box. +- Reliable primary path: `https://www.xiaohongshu.com/search_result?keyword=&source=web_explore_feed` +- This route loads the normal desktop results page and avoids home-page input flakiness. +- The search results page can also appear with variants such as `type=51` or other `source` values after in-app navigation; do not treat those as suspicious if the rendered results are correct. 
+- The top search box on `explore` can work, and searching from the home page has transitioned to `search_result` without a login wall in some sessions.
+- The page exposes duplicate search inputs in the DOM with the same placeholder `搜索小红书`.
+- The home-page search input can behave like a tightly controlled app field: direct DOM value assignment may be cleared immediately, and harness `type_text()` may fail to populate it even when the input is focused.
+- Treat the home-page input as best-effort only. Use it when a human-like interactive flow matters, but for automation default to constructing the `search_result` URL directly.
+
+## Sort behavior
+
+- On the current desktop results layout, `最新` is **not** a top-level tab beside `综合`.
+- Open the `筛选` control in the upper-right of the results header to access sort options.
+- Inside `筛选`, `排序依据` contains:
+  - `综合`
+  - `最新`
+  - `最多点赞`
+  - `最多评论`
+  - `最多收藏`
+- The `排序依据` row can render duplicate DOM nodes for the same pill text, including non-interactive clones.
+- Raw global text search for `最新` can hit the wrong node first. Scope to the `排序依据` section and then choose the visible interactive `.tags` node.
+- Prefer semantic filtering such as `aria-hidden != "true"` or section-scoped visible `.tags` selection over style-specific checks.
+- When `最新` is active, the `筛选` trigger changes to `已筛选`.
+- The rendered feed and the `已筛选` / active-pill UI are more reliable than `window.__INITIAL_STATE__.search.searchContext.sort` for confirming latest sort.
+
+## Stable cues
+
+- Search channel tabs near the top: `全部`, `图文`, `视频`, `用户`
+- Sort panel labels: `筛选`, `排序依据`, `最新`
+- Filter sections also visible in the panel: `笔记类型`, `发布时间`, `搜索范围`, `位置距离`
+
+## Interaction notes
+
+- DOM `.click()` opened the `筛选` panel reliably.
+- DOM `.click()` on the visible `最新` pill inside the open `排序依据` section reliably activated latest sort.
+- The reliable DOM pattern was:
+  - find the `排序依据` section / `.filters` block
+  - search within that block for `.tags`
+  - choose the one whose text is `最新` and which is the visible interactive node
+  - call `.click()` on that visible node
+- Example selector strategy:
+  - find `.filters` whose first label is `排序依据`
+  - inside it, pick `.tags` where `textContent.trim() === "最新"` and `el.getAttribute("aria-hidden") !== "true"`
+- `getClientRects().length > 0` alone may be insufficient to distinguish the working node from a duplicate.
+- A broad `document.querySelectorAll("*")` text match for `最新` is not reliable on this page because it may click the hidden duplicate instead of the visible control.
+- Coordinate click on the visible `最新` pill also worked and remains a valid fallback if DOM targeting gets confused by future UI changes.
+- After selecting `最新`, the grid briefly showed skeleton placeholders before the refreshed results appeared.
+- The search page stores the currently rendered note cards in `window.__INITIAL_STATE__.search.feeds._value` as an array of feed entries. For ordinary note cards, the useful fields were:
+  - `id`
+  - `xsecToken`
+  - `noteCard.displayTitle`
+  - `noteCard.user.nickname`
+- The feed array can contain non-note inserts such as hot-query modules. Filter for entries with `noteCard` before treating an item as a note result.
+
+## Post opening
+
+- Do **not** assume a raw results link like `https://www.xiaohongshu.com/explore/` is directly openable.
+- Opening that raw `/explore/` URL in a fresh tab can redirect to the web `404` / app-only gate even when the same post is openable from search results.
+- To open a post from search results, click the visible card image / card in-page first.
+- That click navigation can land on a tokenized URL like `https://www.xiaohongshu.com/explore/?xsec_token=...&xsec_source=pc_search`, which is a more reliable note URL than the raw `/explore/` form.
+- Once the tokenized URL is obtained from the click flow, it can be revisited in-session for extraction.
+- If the search results state is already loaded, you can reconstruct the tokenized note URL directly from a feed item without re-clicking:
+  - `https://www.xiaohongshu.com/explore/?xsec_token=&xsec_source=pc_search`
+
+## Post extraction
+
+- On tokenized post pages opened via `pc_search`, `document.body.innerText` can be a useful first-pass extraction source because it often includes the rendered note text, hashtags, timestamp, engagement counts, and visible comments.
+- Verify that the note content actually rendered before trusting `document.body.innerText`, because the page can also include substantial navigation, footer, and comment noise.
+- Prefer `document.body.innerText` as a fallback or initial probe before writing fragile per-element selectors for post content.
+
+## Gotchas
+
+- Do not assume `Enter` alone finished the workflow until you verify the URL changed to `search_result` or the result grid appeared.
+- Do not assume the visible `综合` tab controls all sorting; on this layout, time ordering is hidden inside `筛选`.
+- Do not assume the first DOM node whose text is `最新` is the clickable one; this panel duplicates pills and the hidden clone can absorb naive text-based targeting without changing state.
+- Do not assume a successfully opened post can be reproduced by stripping query params; preserve the `xsec_token` when reopening results-derived post URLs.
diff --git a/packages/bcode-browser/harness/domain-skills/shopify-admin/README.md b/packages/bcode-browser/harness/domain-skills/shopify-admin/README.md
new file mode 100644
index 000000000..7904e2cc7
--- /dev/null
+++ b/packages/bcode-browser/harness/domain-skills/shopify-admin/README.md
@@ -0,0 +1,36 @@
+# shopify-admin
+
+Browser-harness patterns for `admin.shopify.com` and embedded Shopify apps.
+
+## Files in this folder
+
+- `embedded-apps.md` — every Shopify app runs in an iframe; how to target it
+- `polaris-inputs.md` — Polaris React inputs reject synthetic value setters; use CDP type_text
+- `knowledge-base.md` — automating the Shopify Knowledge Base App for FAQ entries
+
+## When to use these
+
+You're driving Shopify admin and need to add / edit / configure something. The Shopify admin UI is large and many surfaces are embedded apps — first check whether what you need is in an embedded app (most apps under `admin.shopify.com/store//apps//...` are).
+
+## When to skip
+
+- If the operation is read-only product / inventory data → use the **Storefront API** (HTTP) instead, much faster
+- If the store has a custom admin app with API token provisioned → use the **Admin API** (GraphQL or REST) instead, no UI scraping
+- If you're editing theme code → use the **Shopify CLI** (`shopify theme push`) — don't touch the theme editor UI
+
+The browser is the right tool only when:
+- The setting / app exposes no API
+- The change is one-time or rare enough not to justify scripting
+- You're discovering / exploring the admin (e.g., finding selectors for a future automation)
+
+## Authentication
+
+Mike (or the human owner) must be logged into `admin.shopify.com` in the Chrome session that browser-harness attaches to. The harness does NOT log in — it inherits the human's session.
+
+If you hit `accounts.shopify.com` redirect, stop and ask the human to log in. Don't type credentials.
+
+## Polaris is in transition (Jan 2026 onward)
+
+Shopify is migrating its design system from React-based Polaris to Web-Components-based Polaris. Most legacy admin surfaces are still React. Newer surfaces (Catalog Mapping, parts of Settings) may be web components.
+
+Screenshot first. If you see `` or `` web component tags → use the web component pattern. If you see `[class*="Polaris-"]` React class names → use the CDP keystrokes pattern in `polaris-inputs.md`.
diff --git a/packages/bcode-browser/harness/domain-skills/shopify-admin/embedded-apps.md b/packages/bcode-browser/harness/domain-skills/shopify-admin/embedded-apps.md
new file mode 100644
index 000000000..82a777b9d
--- /dev/null
+++ b/packages/bcode-browser/harness/domain-skills/shopify-admin/embedded-apps.md
@@ -0,0 +1,72 @@
+# Shopify embedded apps run in iframes
+
+Every Shopify app surfaced in the admin (first-party like Knowledge Base, third-party like Okendo) renders inside a sandboxed iframe. Your top-level `document` queries find the Shopify chrome (sidebar, header, search bar) but **none of the app's UI**.
+
+## How to target the iframe
+
+```python
+from helpers import iframe_target, js, type_text
+
+# 1. Find the iframe by URL substring
+tid = iframe_target("qa-pairs-app")  # Knowledge Base App
+
+# 2. Run JS inside the iframe by passing target_id
+result = js("""
+(() => {
+  const button = Array.from(document.querySelectorAll('button')).find(b => b.textContent.trim() === 'Add FAQ');
+  if (button) { button.click(); return {clicked: true}; }
+  return {clicked: false};
+})()
+""", target_id=tid)
+```
+
+## Finding the URL substring
+
+The iframe's URL contains the app slug. Run:
+
+```python
+import json
+for t in cdp("Target.getTargets")["targetInfos"]:
+    if t["type"] == "iframe" and "shopify" in t.get("url", "").lower():
+        print(t["url"])
+```
+
+Then pick a substring unique to your target app.
+
+## Known Shopify app iframe slugs
+
+| App | iframe URL substring |
+|---|---|
+| Shopify Knowledge Base (qa-pairs-app) | `qa-pairs-app` |
+| Shopify Online Store editor | `online-store-web.shopifyapps.com` |
+| Shopify Hydrogen Storefront | `hydrogen-storefronts` (or similar — verify) |
+
+Add to this table when you discover new ones.
+
+## Why iframes
+
+Shopify uses App Bridge to embed third-party apps with isolation.
Your top-level page CAN'T directly access app DOM for security reasons — you need iframe targeting (which the harness does via CDP `Target.attachToTarget`).
+
+## Coordinate clicks vs JS clicks
+
+Coordinate clicks (`click(x, y)`) pass through iframes at the compositor level — they work. But JS clicks scoped to the iframe target are more reliable for routine button taps because:
+
+- Element text content is stable across UI redesigns
+- DPR scaling on retina is automatic
+- React event handlers are guaranteed to fire (vs. CDP mouse events which sometimes hit a transparent layer above the button)
+
+## Gotcha — multiple iframes from same app
+
+The Online Store editor renders the storefront preview AND the editor toolbar in two separate iframes. Pick the right one by URL substring; don't assume the first match is correct.
+
+```python
+# WRONG — picks first match
+tid = iframe_target("online-store-web")
+
+# RIGHT — disambiguate
+for t in cdp("Target.getTargets")["targetInfos"]:
+    url = t.get("url", "")
+    if "online-store-web" in url and "editor" in url:
+        tid = t["targetId"]
+        break
+```
diff --git a/packages/bcode-browser/harness/domain-skills/shopify-admin/knowledge-base.md b/packages/bcode-browser/harness/domain-skills/shopify-admin/knowledge-base.md
new file mode 100644
index 000000000..b4fbba921
--- /dev/null
+++ b/packages/bcode-browser/harness/domain-skills/shopify-admin/knowledge-base.md
@@ -0,0 +1,109 @@
+# Shopify Knowledge Base App — automating FAQ entries
+
+The Knowledge Base App (Shopify Winter '26 Edition) lets merchants control how AI agents (ChatGPT, Perplexity, Claude, Copilot, Gemini) answer questions about their brand. Each entry is a Question / Answer pair. The app currently has no public API and is English-only as of Winter '26 — browser automation is the canonical path.
+
+## URL pattern
+
+```
+https://admin.shopify.com/store//apps/shopify-knowledge-base/app
+```
+
+Sub-routes:
+- `/app` — overview (FAQ list, top unanswered questions, query log)
+- `/app/new` — Add FAQ form
+- `/app/pairs/` — entry detail / edit
+
+## Iframe slug
+
+The app runs at iframe URL containing `qa-pairs-app`:
+
+```python
+tid = iframe_target("qa-pairs-app")
+```
+
+## Adding a single FAQ
+
+See `polaris-inputs.md` for the full canonical pattern. Quick version:
+
+```python
+def add_faq(question, answer):
+    tid = iframe_target("qa-pairs-app")
+    # focus question input via JS, type via CDP, focus answer, type, click Save
+    # poll URL for /pairs/ success signal
+```
+
+## Batching multiple FAQs
+
+After saving an entry, the success page shows "FAQ created. Add another FAQ" link. Click it via JS to skip navigating back to overview:
+
+```python
+def click_add_another():
+    tid = iframe_target("qa-pairs-app")
+    js("""
+    (() => {
+      const link = Array.from(document.querySelectorAll('a, button'))
+        .find(x => x.textContent.trim() === 'Add another FAQ');
+      if (link) link.click();
+    })()
+    """, target_id=tid)
+```
+
+Loop:
+
+```python
+ENTRIES = [(q1, a1), (q2, a2), ...]
+for q, a in ENTRIES:
+    click_add_another()
+    time.sleep(1.5)  # wait for form to render
+    ok, info = add_faq(q, a)
+    print(f"{q[:40]} -> {ok} ({info})")
+    if not ok: break
+```
+
+## Brand voice — what to put in answers
+
+This is application-specific (depends on the merchant). For JING the rule was Aesop founder-letter tone — sentence case, no exclamation points, "JING" not "we", specific over generic.
+
+The Shopify guidance "Provide a brief answer in 1 or 2 sentences" is a soft hint. The textarea accepts longer text and AI agents prefer specific multi-sentence answers. Aim for 2-4 short sentences with concrete details.
+
+## What to put in the Knowledge Base
+
+Categories that materially shape AI agent answers about your brand:
+
+1. **Brand voice / DNA** — "What is your brand?"
/ "What's your tone?"
+2. **Specs** — exact materials, dimensions, weights, sizes (NOT marketing prose)
+3. **Comparisons** — "How does X compare to ?" with concrete differences
+4. **Policies** — returns, shipping, care, warranty, contact (in brand voice)
+5. **Origin** — founder, where made, why brand exists
+6. **Limitations** — what you DON'T do (V1 scope, US-only, etc.) — agents that hallucinate availability hurt conversion
+
+Skip: anything marketing-speak. The Knowledge Base is for **truth, in voice**, not pitch copy.
+
+## Top unanswered questions
+
+The overview shows up to 7 "Top unanswered questions" Shopify auto-detected from query logs. **Answer these first** — they're real shopper queries hitting your store right now. Once answered, the section empties.
+
+## Query log
+
+`/admin/apps/shopify-knowledge-base/app/queries` (or "Query log" in app sidebar) shows what shoppers actually asked AI agents about your brand. Read weekly. New patterns become new FAQ entries.
+
+## Verifying entries surface in AI
+
+After adding an entry, allow 24 hours for AI provider indexing, then test:
+
+- ChatGPT: "Tell me about 's return policy" → check if your exact wording surfaces
+- Perplexity: same
+- Claude: "Compare vs " → see if your comparison framing appears
+
+If the answer doesn't surface, the entry might be too long, too vague, or contradicted by another source (your homepage, an outdated blog post). Tighten the answer.
+
+## Limits
+
+As of Winter '26 Edition:
+- English-only
+- No bulk import / CSV upload
+- No API for read or write
+- Each entry maximum ~500 words (soft cap; UI shows guidance "1 or 2 sentences")
+- No version history visible to the merchant
+
+Watch Shopify changelogs for API exposure — likely in Spring '26 or Summer '26 Edition. When it ships, switch to API-driven population.
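Reviewer sketch (not part of the vendored file): since there is no bulk import and each entry is typed through the UI, it may be worth checking a batch against the limits above before driving the browser. The helper name `check_faq_entry` and the 500-word constant are assumptions for illustration, taken from the approximate soft cap and the "2-4 short sentences" guidance in this document; Shopify does not publish an exact threshold here.

```python
import re

MAX_WORDS = 500  # assumed from the "~500 words" soft cap quoted above


def check_faq_entry(question: str, answer: str) -> list[str]:
    """Return human-readable warnings for one Question / Answer pair."""
    warnings = []
    if not question.strip():
        warnings.append("question is empty")
    if not answer.strip():
        warnings.append("answer is empty")
    words = len(answer.split())
    if words > MAX_WORDS:
        warnings.append(f"answer has {words} words, above the ~{MAX_WORDS}-word soft cap")
    # Rough sentence count: split on ., !, ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", answer.strip()) if s]
    if len(sentences) > 4:
        warnings.append(f"answer has {len(sentences)} sentences; the note suggests 2-4")
    return warnings
```

Running this over the entry list first would let a batch fail fast on an over-long answer instead of discovering it mid-run in the form.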
diff --git a/packages/bcode-browser/harness/domain-skills/shopify-admin/polaris-inputs.md b/packages/bcode-browser/harness/domain-skills/shopify-admin/polaris-inputs.md
new file mode 100644
index 000000000..5d4fdf0d6
--- /dev/null
+++ b/packages/bcode-browser/harness/domain-skills/shopify-admin/polaris-inputs.md
@@ -0,0 +1,137 @@
+# Polaris React inputs require CDP-native keystrokes
+
+Shopify admin uses Polaris (their design system). Until January 2026 it was React-based. Polaris React text inputs and textareas are controlled components that **reject the standard "React-friendly" synthetic value setter pattern.**
+
+## The trap
+
+This pattern looks like it works — the field's `value` shows the right text:
+
+```js
+const setter = Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, 'value').set;
+setter.call(inputEl, "my text");
+inputEl.dispatchEvent(new Event('input', { bubbles: true }));
+```
+
+But the **Save / Submit button stays disabled**. Polaris's onChange handler reads from React's internal state, which the synthetic event chain doesn't fully update.
+
+## What works
+
+CDP-native keystrokes via `Input.insertText`:
+
+```python
+from helpers import js, type_text
+
+# 1. Focus the input via JS — this works fine
+js("""
+(() => {
+  const input = Array.from(document.querySelectorAll('input[type="text"], input:not([type])'))
+    .find(x => { const r = x.getBoundingClientRect(); return r.width > 100 && r.height > 0; });
+  if (input) input.focus();
+})()
+""", target_id=tid)
+
+# 2. Type via CDP — fires Input.insertText which is the lowest-level
+#    text-entry signal. React's controlled-input subscriber catches this.
+type_text("My question text")
+```
+
+For textareas, same pattern with `document.querySelectorAll('textarea')`.
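Reviewer sketch (not part of the vendored file): the textarea variant can be expressed by parameterizing the selector in the focus snippet. `build_focus_js` is a hypothetical helper, not something the harness ships; it only builds the JavaScript string, reusing the same visibility filter as the input example above.

```python
import json


def build_focus_js(selector: str, min_width: int = 100) -> str:
    """Return a JS snippet that focuses the first visible element matching `selector`.

    Hypothetical helper for illustration; the harness does not provide it.
    """
    # json.dumps yields a safely quoted JS string literal for the selector.
    sel = json.dumps(selector)
    return f"""
(() => {{
  const el = Array.from(document.querySelectorAll({sel}))
    .find(x => {{ const r = x.getBoundingClientRect(); return r.width > {min_width} && r.height > 0; }});
  if (el) {{ el.focus(); return true; }}
  return false;
}})()
"""


textarea_js = build_focus_js("textarea")
```

Inside a harness session this string would presumably be passed as `js(textarea_js, target_id=tid)` and followed by `type_text(...)`, matching the two-step pattern the document describes.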
+
+## Full add-FAQ pattern (Knowledge Base App)
+
+```python
+import time
+from helpers import iframe_target, js, type_text, page_info, screenshot
+
+def add_faq(question: str, answer: str) -> tuple[bool, str]:
+    tid = iframe_target("qa-pairs-app")
+
+    # 1. Make sure the form is rendered
+    for _ in range(15):
+        ready = js("""
+        (() => {
+          const i = Array.from(document.querySelectorAll('input[type="text"], input:not([type])'))
+            .find(x => { const r = x.getBoundingClientRect(); return r.width > 100; });
+          const t = Array.from(document.querySelectorAll('textarea'))
+            .find(x => { const r = x.getBoundingClientRect(); return r.width > 100; });
+          if (i && t) { i.focus(); return true; }
+          return false;
+        })()
+        """, target_id=tid)
+        if ready: break
+        time.sleep(0.3)
+
+    # 2. Type question (input has focus from step 1)
+    type_text(question)
+    time.sleep(0.2)
+
+    # 3. Focus textarea, type answer
+    js("""
+    (() => {
+      const t = Array.from(document.querySelectorAll('textarea'))
+        .find(x => { const r = x.getBoundingClientRect(); return r.width > 100; });
+      if (t) t.focus();
+    })()
+    """, target_id=tid)
+    time.sleep(0.2)
+    type_text(answer)
+    time.sleep(0.4)
+
+    # 4. Click Save (now enabled because Polaris saw real keystrokes)
+    saved = js("""
+    (() => {
+      const btn = Array.from(document.querySelectorAll('button')).find(b => b.textContent.trim() === 'Save');
+      if (!btn || btn.disabled) return {clicked: false, disabled: btn?.disabled};
+      btn.click();
+      return {clicked: true};
+    })()
+    """, target_id=tid)
+    if not saved.get("clicked"):
+        return False, "save_button_disabled"
+
+    # 5. Poll URL for save success — Shopify redirects to /pairs/
+    for _ in range(20):
+        time.sleep(0.3)
+        url = page_info().get("url", "")
+        if "/pairs/" in url and "/new" not in url:
+            return True, url.split("/pairs/")[-1]
+    return False, "save_timeout"
+```
+
+## Why this works
+
+Polaris React components subscribe to native `inputType` events (e.g., `insertText` from IME / accessibility tools / paste).
The synthetic React-friendly setter fires `input` events but skips the lower-level `inputType` signal that Polaris validates against to enable Save buttons.
+
+CDP `Input.insertText` (which the harness's `type_text()` calls) emits the full native event chain, including `inputType: 'insertText'`, which React catches via its synthetic event system.
+
+## Polaris Web Components (post January 2026)
+
+The `polaris-react` repo was archived January 6, 2026. New Polaris is web-component-based. For new admin surfaces (Catalog Mapping, parts of Settings), the pattern shifts:
+
+```js
+// Web components expose value setter on the element itself
+const wc = document.querySelector('s-text-field');
+wc.value = 'my text';
+wc.dispatchEvent(new CustomEvent('input', { bubbles: true, detail: { value: 'my text' } }));
+```
+
+But until Shopify completes the migration (probably late 2026), **always test the React pattern first** — most legacy surfaces still use it.
+
+## How to know which pattern to use
+
+Screenshot the form first. Then JS-introspect:
+
+```js
+// Check if React-based (Polaris-* class names) or web-component-based (s-* tags)
+const hasReact = document.querySelector('[class*="Polaris-"]');
+const hasWC = document.querySelector('s-text-field, s-button, s-textarea');
+return { hasReact: !!hasReact, hasWC: !!hasWC };
+```
+
+If both, lean web component (the surface is mid-migration and the WC will be authoritative).
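Reviewer sketch (not part of the vendored file): the decision rule above is small enough to write down as a function over the two flags the JS probe returns. The function name and the returned labels are illustrative assumptions, and the "neither marker found" branch is not covered by the document, so it is only flagged rather than resolved.

```python
def choose_polaris_pattern(has_react: bool, has_wc: bool) -> str:
    """Map the introspection flags to an input strategy (labels are illustrative)."""
    if has_wc:
        # Covers "web components only" and the mid-migration "both" case.
        return "web-component"
    if has_react:
        return "react-cdp-keystrokes"
    # Neither marker found: the document does not cover this case, so flag it.
    return "unknown"


probe = {"hasReact": True, "hasWC": True}  # same shape as the JS probe's return value
print(choose_polaris_pattern(probe["hasReact"], probe["hasWC"]))  # prints: web-component
```

Keeping the rule in one place makes it harder for the README's "screenshot first" guidance and this file's "if both, lean web component" rule to drift apart.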
+ +## Avoid + +- Coordinate-based typing via `Input.dispatchKeyEvent` keypress-by-keypress — slower, more brittle, no real benefit over `Input.insertText` +- `el.value = 'x'` without the setter prototype trick — won't even fill the visible field on Polaris React +- `dispatchEvent(new Event('change', ...))` only — Polaris listens for `input`, not `change`, on text fields diff --git a/packages/bcode-browser/harness/install.md b/packages/bcode-browser/harness/install.md index 7b19e24cc..589ebe15a 100644 --- a/packages/bcode-browser/harness/install.md +++ b/packages/bcode-browser/harness/install.md @@ -48,9 +48,7 @@ Prefer `browser-harness --setup` — it runs the full attach-and-escalate flow b 2. First try the harness directly. If this works, skip manual browser setup: ```bash -uv run browser-harness <<'PY' -print(page_info()) -PY +uv run browser-harness -c 'print(page_info())' ``` Reuse an existing healthy daemon if it is already responding. Do not kill it during setup unless the attach is clearly stale and you are confident no other agent is using the same `BU_NAME`. For parallel agents, use distinct `BU_NAME`s so they do not fight over the same default session. @@ -77,11 +75,12 @@ osascript -e 'tell application "Google Chrome" to activate' \ 7. 
Verify with: ```bash -uv run browser-harness <<'PY' +uv run browser-harness -c "$(cat <<'PY' goto_url("https://github.com/browser-use/browser-harness") wait_for_load() print(page_info()) PY +)" ``` If that fails with a stale websocket or stale socket, restart the daemon once and retry: diff --git a/packages/bcode-browser/harness/interaction-skills/profile-sync.md b/packages/bcode-browser/harness/interaction-skills/profile-sync.md index a67ab2826..961695a76 100644 --- a/packages/bcode-browser/harness/interaction-skills/profile-sync.md +++ b/packages/bcode-browser/harness/interaction-skills/profile-sync.md @@ -10,7 +10,7 @@ curl -fsSL https://browser-use.com/profile.sh | sh Downloads `profile-use` (macOS / Linux / Windows, x64 / arm64). The Python helpers shell out to it; you don't run `profile-use` directly. -## Python API (pre-imported in `browser-harness <<'PY'`) +## Python API (pre-imported in `browser-harness -c`) ```python list_cloud_profiles() diff --git a/packages/bcode-browser/harness/src/browser_harness/admin.py b/packages/bcode-browser/harness/src/browser_harness/admin.py index 8404c9d06..6a387c0da 100644 --- a/packages/bcode-browser/harness/src/browser_harness/admin.py +++ b/packages/bcode-browser/harness/src/browser_harness/admin.py @@ -248,6 +248,15 @@ def _browser_use(path, method, body=None): return json.loads(urllib.request.urlopen(req, timeout=60).read() or b"{}") +def _stop_cloud_browser(browser_id): + if not browser_id: + return + try: + _browser_use(f"/browsers/{browser_id}", "PATCH", {"action": "stop"}) + except BaseException: + pass + + def _cdp_ws_from_url(cdp_url): return json.loads(urllib.request.urlopen(f"{cdp_url}/json/version", timeout=15).read())["webSocketDebuggerUrl"] @@ -338,10 +347,14 @@ def start_remote_daemon(name="remote", profileName=None, **create_kwargs): raise RuntimeError("pass profileName OR profileId, not both") create_kwargs["profileId"] = _resolve_profile_name(profileName) browser = _browser_use("/browsers", "POST", 
create_kwargs) - ensure_daemon( - name=name, - env={"BU_CDP_WS": _cdp_ws_from_url(browser["cdpUrl"]), "BU_BROWSER_ID": browser["id"]}, - ) + try: + ensure_daemon( + name=name, + env={"BU_CDP_WS": _cdp_ws_from_url(browser["cdpUrl"]), "BU_BROWSER_ID": browser["id"]}, + ) + except BaseException: + _stop_cloud_browser(browser.get("id")) + raise _show_live_url(browser.get("liveUrl")) return browser diff --git a/packages/bcode-browser/harness/src/browser_harness/daemon.py b/packages/bcode-browser/harness/src/browser_harness/daemon.py index 5737195f0..4042519d7 100644 --- a/packages/bcode-browser/harness/src/browser_harness/daemon.py +++ b/packages/bcode-browser/harness/src/browser_harness/daemon.py @@ -93,25 +93,22 @@ def get_ws_url(): raise RuntimeError(f"BU_CDP_URL={url} unreachable after 30s: {last_err} -- is the dedicated automation Chrome running?") for base in PROFILES: try: - port, path = (base / "DevToolsActivePort").read_text().strip().split("\n", 1) + port = (base / "DevToolsActivePort").read_text().strip().split("\n", 1)[0].strip() except (FileNotFoundError, NotADirectoryError): continue + # Resolve the live WS URL via /json/version instead of trusting the path stored + # alongside the port in DevToolsActivePort: if Chrome was previously launched + # with a different --user-data-dir on the same port, that file is left behind + # with a stale browser UUID and the WS upgrade returns 404. 
         deadline = time.time() + 30
-        while True:
-            probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-            probe.settimeout(1)
+        while time.time() < deadline:
             try:
-                probe.connect(("127.0.0.1", int(port.strip())))
-                break
-            except OSError:
-                if time.time() >= deadline:
-                    raise RuntimeError(
-                        f"Chrome's remote-debugging page is open, but DevTools is not live yet on 127.0.0.1:{port.strip()} — if Chrome opened a profile picker, choose your normal profile first, then tick the checkbox and click Allow if shown"
-                    )
+                return json.loads(urllib.request.urlopen(f"http://127.0.0.1:{port}/json/version", timeout=1).read())["webSocketDebuggerUrl"]
+            except (OSError, KeyError, ValueError):
                 time.sleep(1)
-            finally:
-                probe.close()
-        return f"ws://127.0.0.1:{port.strip()}{path.strip()}"
+        raise RuntimeError(
+            f"Chrome's remote-debugging page is open, but DevTools is not live yet on 127.0.0.1:{port} — if Chrome opened a profile picker, choose your normal profile first, then tick the checkbox and click Allow if shown"
+        )
     for probe_port in (9222, 9223):
         try:
             with urllib.request.urlopen(f"http://127.0.0.1:{probe_port}/json/version", timeout=1) as r:
@@ -209,18 +206,19 @@ async def handle(self, req):
             return {"events": out}
         if meta == "session": return {"session_id": self.session}
         if meta == "connection_status":
+            if not self.target_id:
+                return {"error": "not_attached"}
+            try:
+                info = (await self.cdp.send_raw("Target.getTargetInfo", {"targetId": self.target_id}))["targetInfo"]
+            except Exception:
+                return {"error": "cdp_disconnected"}
             page = None
-            if self.target_id:
-                try:
-                    info = (await self.cdp.send_raw("Target.getTargetInfo", {"targetId": self.target_id}))["targetInfo"]
-                    if is_real_page(info):
-                        page = {
-                            "targetId": info.get("targetId"),
-                            "title": info.get("title") or "(untitled)",
-                            "url": info.get("url") or "",
-                        }
-                except Exception:
-                    page = None
+            if is_real_page(info):
+                page = {
+                    "targetId": info.get("targetId"),
+                    "title": info.get("title") or "(untitled)",
+                    "url": info.get("url") or "",
+                }
             return {"target_id": self.target_id, "session_id": self.session, "page": page}
         if meta == "set_session":
             self.session = req.get("session_id")
diff --git a/packages/bcode-browser/harness/tests/unit/test_admin.py b/packages/bcode-browser/harness/tests/unit/test_admin.py
index 9b9590199..ee298086c 100644
--- a/packages/bcode-browser/harness/tests/unit/test_admin.py
+++ b/packages/bcode-browser/harness/tests/unit/test_admin.py
@@ -1,3 +1,5 @@
+import pytest
+
 from browser_harness import admin
 
 
@@ -92,6 +94,19 @@ def fake_connect(name, timeout=1.0):
     assert admin.active_browser_connections() == 1
 
 
+def test_active_browser_connections_skips_daemons_reporting_cdp_disconnected(monkeypatch):
+    monkeypatch.setattr(admin, "_daemon_endpoint_names", lambda: ["default", "stale"])
+
+    def fake_connect(name, timeout=1.0):
+        if name == "stale":
+            return FakeSocket(b'{"error":"cdp_disconnected"}\n')
+        return FakeSocket()
+
+    monkeypatch.setattr(admin.ipc, "connect", fake_connect)
+
+    assert admin.active_browser_connections() == 1
+
+
 def test_browser_connections_returns_attached_page(monkeypatch):
     monkeypatch.setattr(admin, "_daemon_endpoint_names", lambda: ["default"])
     response = (
@@ -156,3 +171,84 @@ def test_doctor_page_output_truncates_long_text(monkeypatch, capsys):
     out = capsys.readouterr().out
     assert "A very long page ..." in out
     assert "https://example.t..." in out
+
+
+def test_start_remote_daemon_stops_created_browser_when_daemon_start_fails(monkeypatch):
+    calls = []
+    browser = {"id": "browser-123", "cdpUrl": "http://127.0.0.1:9333", "liveUrl": "https://live.example"}
+
+    def fake_browser_use(path, method, body=None):
+        calls.append((path, method, body))
+        if (path, method) == ("/browsers", "POST"):
+            return browser
+        if (path, method) == ("/browsers/browser-123", "PATCH"):
+            return {}
+        raise AssertionError((path, method, body))
+
+    monkeypatch.setattr(admin, "daemon_alive", lambda name: False)
+    monkeypatch.setattr(admin, "_browser_use", fake_browser_use)
+    monkeypatch.setattr(admin, "_cdp_ws_from_url", lambda url: "ws://example.test/devtools/browser/1")
+    monkeypatch.setattr(admin, "ensure_daemon", lambda **kwargs: (_ for _ in ()).throw(RuntimeError("boom")))
+
+    with pytest.raises(RuntimeError, match="boom"):
+        admin.start_remote_daemon()
+
+    assert calls == [
+        ("/browsers", "POST", {}),
+        ("/browsers/browser-123", "PATCH", {"action": "stop"}),
+    ]
+
+
+@pytest.mark.parametrize("exc_type", [KeyboardInterrupt, SystemExit])
+def test_start_remote_daemon_stops_created_browser_when_daemon_start_is_interrupted(monkeypatch, exc_type):
+    calls = []
+    browser = {"id": "browser-123", "cdpUrl": "http://127.0.0.1:9333", "liveUrl": "https://live.example"}
+
+    def fake_browser_use(path, method, body=None):
+        calls.append((path, method, body))
+        if (path, method) == ("/browsers", "POST"):
+            return browser
+        if (path, method) == ("/browsers/browser-123", "PATCH"):
+            return {}
+        raise AssertionError((path, method, body))
+
+    monkeypatch.setattr(admin, "daemon_alive", lambda name: False)
+    monkeypatch.setattr(admin, "_browser_use", fake_browser_use)
+    monkeypatch.setattr(admin, "_cdp_ws_from_url", lambda url: "ws://example.test/devtools/browser/1")
+    monkeypatch.setattr(admin, "ensure_daemon", lambda **kwargs: (_ for _ in ()).throw(exc_type()))
+
+    with pytest.raises(exc_type):
+        admin.start_remote_daemon()
+
+    assert calls == [
+        ("/browsers", "POST", {}),
+        ("/browsers/browser-123", "PATCH", {"action": "stop"}),
+    ]
+
+
+@pytest.mark.parametrize("exc_type", [KeyboardInterrupt, SystemExit])
+def test_stop_cloud_browser_swallows_baseexception_from_stop_request(monkeypatch, exc_type):
+    monkeypatch.setattr(admin, "_browser_use", lambda *args, **kwargs: (_ for _ in ()).throw(exc_type()))
+
+    admin._stop_cloud_browser("browser-123")
+
+def test_start_remote_daemon_does_not_stop_created_browser_on_success(monkeypatch):
+    calls = []
+    browser = {"id": "browser-123", "cdpUrl": "http://127.0.0.1:9333", "liveUrl": "https://live.example"}
+
+    def fake_browser_use(path, method, body=None):
+        calls.append((path, method, body))
+        if (path, method) == ("/browsers", "POST"):
+            return browser
+        raise AssertionError((path, method, body))
+
+    monkeypatch.setattr(admin, "daemon_alive", lambda name: False)
+    monkeypatch.setattr(admin, "_browser_use", fake_browser_use)
+    monkeypatch.setattr(admin, "_cdp_ws_from_url", lambda url: "ws://example.test/devtools/browser/1")
+    monkeypatch.setattr(admin, "ensure_daemon", lambda **kwargs: None)
+    monkeypatch.setattr(admin, "_show_live_url", lambda url: None)
+
+    assert admin.start_remote_daemon() == browser
+    assert calls == [
+        ("/browsers", "POST", {}),
+    ]