sync: harness 660827d #24
Merged
84 changes: 84 additions & 0 deletions
...ges/bcode-browser/harness/agent-workspace/domain-skills/xiaohongshu/scraping.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| # Xiaohongshu — Search and Sort | ||
|
|
||
| URL patterns: | ||
| - Home / discovery: `https://www.xiaohongshu.com/explore` | ||
| - Search results: `https://www.xiaohongshu.com/search_result?keyword=...` | ||
|
|
||
| ## Search flow | ||
|
|
||
| - Prefer direct navigation to the desktop search results page over automating the home-page search box. | ||
| - Reliable primary path: `https://www.xiaohongshu.com/search_result?keyword=<url-encoded keyword>&source=web_explore_feed` | ||
| - This route loads the normal desktop results page and avoids home-page input flakiness. | ||
| - The search results page can also appear with variants such as `type=51` or other `source` values after in-app navigation; do not treat those as suspicious if the rendered results are correct. | ||
| - The top search box on `explore` can work, and searching from the home page has transitioned to `search_result` without a login wall in some sessions. | ||
| - The page exposes duplicate search inputs in the DOM with the same placeholder `搜索小红书`. | ||
| - The home-page search input can behave like a tightly controlled app field: direct DOM value assignment may be cleared immediately, and harness `type_text()` may fail to populate it even when the input is focused. | ||
| - Treat the home-page input as best-effort only. Use it when a human-like interactive flow matters, but for automation default to constructing the `search_result` URL directly. | ||
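The direct-navigation path above can be sketched as a small URL builder. This is a minimal sketch, not the harness's own helper: `build_search_url` and `is_search_results_url` are hypothetical names, and the only facts taken from the notes are the `search_result` path and the `keyword` and `source=web_explore_feed` parameters.

```python
from urllib.parse import parse_qs, urlencode, urlparse

SEARCH_BASE = "https://www.xiaohongshu.com/search_result"


def build_search_url(keyword: str, source: str = "web_explore_feed") -> str:
    """Return the desktop search results URL for a keyword."""
    # urlencode percent-encodes the keyword as UTF-8, so Chinese text is safe.
    query = urlencode({"keyword": keyword, "source": source})
    return f"{SEARCH_BASE}?{query}"


def is_search_results_url(url: str) -> bool:
    """True when the URL is a search_result page that carries a keyword."""
    parts = urlparse(url)
    on_results_path = parts.path.rstrip("/") == "/search_result"
    # Extra parameters such as type=51 or other source values are tolerated.
    return on_results_path and "keyword" in parse_qs(parts.query)
```

Navigating to `build_search_url("咖啡")` and then checking `is_search_results_url()` on the resulting page URL gives a cheap confirmation that the search actually landed, before looking for the result grid.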
|
|
||
| ## Sort behavior | ||
|
|
||
| - On the current desktop results layout, `最新` is **not** a top-level tab beside `综合`. | ||
| - Open the `筛选` control in the upper-right of the results header to access sort options. | ||
| - Inside `筛选`, `排序依据` contains: | ||
| - `综合` | ||
| - `最新` | ||
| - `最多点赞` | ||
| - `最多评论` | ||
| - `最多收藏` | ||
| - The `排序依据` row can render duplicate DOM nodes for the same pill text, including non-interactive clones. | ||
| - Raw global text search for `最新` can hit the wrong node first. Scope to the `排序依据` section and then choose the visible interactive `.tags` node. | ||
| - Prefer semantic filtering such as `aria-hidden != "true"` or section-scoped visible `.tags` selection over style-specific checks. | ||
| - When `最新` is active, the `筛选` trigger changes to `已筛选`. | ||
| - The rendered feed and the `已筛选` / active-pill UI are more reliable than `window.__INITIAL_STATE__.search.searchContext.sort` for confirming latest sort. | ||
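The section-scoped targeting described above might look like the following. It is a sketch under assumptions: `build_sort_click_js` is a hypothetical helper, the `.filters` and `.tags` class names and the `aria-hidden` check come from these notes, and matching the section by the start of its text content is a guess at what "first label" means in the DOM. The `筛选` panel has to be open before the returned snippet is evaluated in the page.

```python
import json


def build_sort_click_js(sort_label: str = "最新", section_label: str = "排序依据") -> str:
    """Return a JS expression that clicks one sort pill inside the sort section."""
    # json.dumps yields valid JS string literals for both labels.
    label = json.dumps(sort_label, ensure_ascii=False)
    section = json.dumps(section_label, ensure_ascii=False)
    return f"""
(() => {{
  // Assumption: the section label is the first text inside its .filters block.
  const block = Array.from(document.querySelectorAll('.filters'))
    .find(b => b.textContent.trim().startsWith({section}));
  if (!block) return {{clicked: false, reason: 'section not found'}};
  // Skip clones that are hidden from the accessibility tree.
  const pill = Array.from(block.querySelectorAll('.tags'))
    .find(el => el.textContent.trim() === {label}
      && el.getAttribute('aria-hidden') !== 'true');
  if (!pill) return {{clicked: false, reason: 'pill not found'}};
  pill.click();
  return {{clicked: true}};
}})()
"""
```

Returning a small status object instead of clicking silently makes it easier to fall back to a coordinate click when the DOM lookup fails.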
|
|
||
| ## Stable cues | ||
|
|
||
| - Search channel tabs near the top: `全部`, `图文`, `视频`, `用户` | ||
| - Sort panel labels: `筛选`, `排序依据`, `最新` | ||
| - Filter sections also visible in the panel: `笔记类型`, `发布时间`, `搜索范围`, `位置距离` | ||
|
|
||
| ## Interaction notes | ||
|
|
||
| - DOM `.click()` opened the `筛选` panel reliably. | ||
| - DOM `.click()` on the visible `最新` pill inside the open `排序依据` section reliably activated latest sort. | ||
| - The reliable DOM pattern was: | ||
| - find the `排序依据` section / `.filters` block | ||
| - search within that block for `.tags` | ||
| - choose the one whose text is `最新` and which is the visible interactive node | ||
| - call `.click()` on that visible node | ||
| - Example selector strategy: | ||
| - find `.filters` whose first label is `排序依据` | ||
| - inside it, pick `.tags` where `textContent.trim() === "最新"` and `el.getAttribute("aria-hidden") !== "true"` | ||
| - `getClientRects().length > 0` alone may be insufficient to distinguish the working node from a duplicate. | ||
| - A broad `document.querySelectorAll("*")` text match for `最新` is not reliable on this page because it may click the hidden duplicate instead of the visible control. | ||
| - Coordinate click on the visible `最新` pill also worked and remains a valid fallback if DOM targeting gets confused by future UI changes. | ||
| - After selecting `最新`, the grid briefly showed skeleton placeholders before the refreshed results appeared. | ||
| - The search page stores the currently rendered note cards in `window.__INITIAL_STATE__.search.feeds._value` as an array of feed entries. For ordinary note cards, the useful fields were: | ||
| - `id` | ||
| - `xsecToken` | ||
| - `noteCard.displayTitle` | ||
| - `noteCard.user.nickname` | ||
| - The feed array can contain non-note inserts such as hot-query modules. Filter for entries with `noteCard` before treating an item as a note result. | ||
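Filtering the feed array can be sketched as below. `extract_notes` is a hypothetical helper; it assumes the array read from `window.__INITIAL_STATE__.search.feeds._value` has already been brought into Python as a list of dicts, and it only touches the fields listed above.

```python
from typing import Any


def extract_notes(feeds: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep ordinary note cards and flatten the fields worth extracting."""
    notes = []
    for entry in feeds:
        card = entry.get("noteCard")
        if not isinstance(card, dict):
            continue  # skip hot-query modules and other non-note inserts
        notes.append({
            "id": entry.get("id"),
            "xsec_token": entry.get("xsecToken"),
            "title": card.get("displayTitle"),
            "author": (card.get("user") or {}).get("nickname"),
        })
    return notes
```

Missing fields come back as `None` rather than raising, so one malformed entry does not abort the whole extraction.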
|
|
||
| ## Post opening | ||
|
|
||
| - Do **not** assume a raw results link like `https://www.xiaohongshu.com/explore/<id>` is directly openable. | ||
| - Opening that raw `/explore/<id>` URL in a fresh tab can redirect to the web `404` / app-only gate even when the same post is openable from search results. | ||
| - To open a post from search results, click the visible card image / card in-page first. | ||
| - That click navigation can land on a tokenized URL like `https://www.xiaohongshu.com/explore/<id>?xsec_token=...&xsec_source=pc_search`, which is a more reliable note URL than the raw `/explore/<id>` form. | ||
| - Once the tokenized URL is obtained from the click flow, it can be revisited in-session for extraction. | ||
| - If the search results state is already loaded, you can reconstruct the tokenized note URL directly from a feed item without re-clicking: | ||
| - `https://www.xiaohongshu.com/explore/<id>?xsec_token=<xsecToken>&xsec_source=pc_search` | ||
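Rebuilding the tokenized URL from a feed item can be sketched as follows. `build_note_url` is a hypothetical name; percent-encoding the token is an assumption about what the site accepts, based on standard query-string handling rather than anything observed in these notes.

```python
from urllib.parse import quote, urlencode


def build_note_url(note_id: str, xsec_token: str, xsec_source: str = "pc_search") -> str:
    """Return the tokenized note URL for a search-result feed item."""
    if not note_id or not xsec_token:
        # The raw /explore/<id> form is unreliable, so refuse to build it.
        raise ValueError("both note_id and xsec_token are required")
    query = urlencode({"xsec_token": xsec_token, "xsec_source": xsec_source})
    return f"https://www.xiaohongshu.com/explore/{quote(note_id, safe='')}?{query}"
```

Failing loudly on a missing token keeps a caller from silently falling back to the raw URL that tends to hit the `404` / app-only gate.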
|
|
||
| ## Post extraction | ||
|
|
||
| - On tokenized post pages opened via `pc_search`, `document.body.innerText` can be a useful first-pass extraction source because it often includes the rendered note text, hashtags, timestamp, engagement counts, and visible comments. | ||
| - Verify that the note content actually rendered before trusting `document.body.innerText`, because the page can also include substantial navigation, footer, and comment noise. | ||
| - Prefer `document.body.innerText` as a fallback or initial probe before writing fragile per-element selectors for post content. | ||
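One way to check that the note rendered before trusting the text is sketched below. `note_text_rendered` is a hypothetical helper, and both of its checks are assumptions: that the `displayTitle` from the search feed appears verbatim in the page text, and that 50 characters is a sensible minimum length.

```python
def note_text_rendered(inner_text: str, display_title: str, min_chars: int = 50) -> bool:
    """Heuristic: the page text contains the expected title and is not near-empty."""
    # Collapse whitespace so line breaks in the page text do not break matching.
    text = " ".join(inner_text.split())
    title = " ".join(display_title.split())
    return bool(title) and title in text and len(text) >= min_chars
```

A `False` result is a cue to wait and re-read the page, or to fall back to per-element selectors, rather than proof that the post is unavailable.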
|
|
||
| ## Gotchas | ||
|
|
||
| - Do not assume `Enter` alone finished the workflow until you verify the URL changed to `search_result` or the result grid appeared. | ||
| - Do not assume the visible `综合` tab controls all sorting; on this layout, time ordering is hidden inside `筛选`. | ||
| - Do not assume the first DOM node whose text is `最新` is the clickable one; this panel duplicates pills and the hidden clone can absorb naive text-based targeting without changing state. | ||
| - Do not assume a successfully opened post can be reproduced by stripping query params; preserve the `xsec_token` when reopening results-derived post URLs. |
36 changes: 36 additions & 0 deletions
packages/bcode-browser/harness/domain-skills/shopify-admin/README.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| # shopify-admin | ||
|
|
||
| Browser-harness patterns for `admin.shopify.com` and embedded Shopify apps. | ||
|
|
||
| ## Files in this folder | ||
|
|
||
| - `embedded-apps.md` — every Shopify app runs in an iframe; how to target it | ||
| - `polaris-inputs.md` — Polaris React inputs reject synthetic value setters; use CDP type_text | ||
| - `knowledge-base.md` — automating the Shopify Knowledge Base App for FAQ entries | ||
|
|
||
| ## When to use these | ||
|
|
||
| You're driving Shopify admin and need to add / edit / configure something. The Shopify admin UI is large and many surfaces are embedded apps — first check whether what you need is in an embedded app (most apps under `admin.shopify.com/store/<store>/apps/<app-slug>/...` are). | ||
|
|
||
| ## When to skip | ||
|
|
||
| - If the operation is read-only product / inventory data → use the **Storefront API** (HTTP) instead, much faster | ||
| - If the store has a custom admin app with API token provisioned → use the **Admin API** (GraphQL or REST) instead, no UI scraping | ||
| - If you're editing theme code → use the **Shopify CLI** (`shopify theme push`) — don't touch the theme editor UI | ||
|
|
||
| The browser is the right tool only when: | ||
| - The setting / app exposes no API | ||
| - The change is one-time or rare enough not to justify scripting | ||
| - You're discovering / exploring the admin (e.g., finding selectors for a future automation) | ||
|
|
||
| ## Authentication | ||
|
|
||
| Mike (or the human owner) must be logged into `admin.shopify.com` in the Chrome session that browser-harness attaches to. The harness does NOT log in — it inherits the human's session. | ||
|
|
||
| If you hit an `accounts.shopify.com` redirect, stop and ask the human to log in. Don't type credentials. | ||
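A guard for the login redirect can be sketched like this. `needs_human_login` is a hypothetical helper, and how the current page URL is obtained from the harness is left open.

```python
from urllib.parse import urlparse


def needs_human_login(current_url: str) -> bool:
    """True when the session was bounced to the Shopify accounts host."""
    host = (urlparse(current_url).hostname or "").lower()
    return host == "accounts.shopify.com"
```

Comparing the parsed hostname rather than searching the whole URL avoids false positives when `accounts.shopify.com` only appears inside a query parameter.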
|
|
||
| ## Polaris is in transition (Jan 2026 onward) | ||
|
|
||
| Shopify is migrating its design system from React-based Polaris to Web-Components-based Polaris. Most legacy admin surfaces are still React. Newer surfaces (Catalog Mapping, parts of Settings) may be web components. | ||
|
|
||
| Screenshot first. If you see `<s-text-field>` or `<s-button>` web component tags → use the web component pattern. If you see `[class*="Polaris-"]` React class names → use the CDP keystrokes pattern in `polaris-inputs.md`. | ||
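As a complement to the screenshot, the same decision can be made from a DOM probe. This is a sketch, not part of the harness: `POLARIS_PROBE_JS` and `choose_input_pattern` are hypothetical names, the `"mixed"` outcome is an added assumption for pages that show both styles, and for embedded apps the probe would need to run inside the app's iframe target rather than the top-level page.

```python
# JS expression to evaluate in the page (or iframe) being automated.
POLARIS_PROBE_JS = """
(() => ({
  webComponents: document.querySelector('s-text-field, s-button') !== null,
  reactPolaris: document.querySelector('[class*="Polaris-"]') !== null,
}))()
"""


def choose_input_pattern(probe: dict) -> str:
    """Map the probe result to the input pattern named in this README."""
    has_wc = bool(probe.get("webComponents"))
    has_react = bool(probe.get("reactPolaris"))
    if has_wc and has_react:
        return "mixed"  # inspect the specific field before choosing
    if has_wc:
        return "web-components"
    if has_react:
        return "cdp-keystrokes"  # see polaris-inputs.md
    return "unknown"
```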
72 changes: 72 additions & 0 deletions
packages/bcode-browser/harness/domain-skills/shopify-admin/embedded-apps.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| # Shopify embedded apps run in iframes | ||
|
|
||
| Every Shopify app surfaced in the admin (first-party like Knowledge Base, third-party like Okendo) renders inside a sandboxed iframe. Your top-level `document` queries find the Shopify chrome (sidebar, header, search bar) but **none of the app's UI**. | ||
|
|
||
| ## How to target the iframe | ||
|
|
||
| ```python | ||
| from helpers import iframe_target, js, type_text | ||
|
|
||
| # 1. Find the iframe by URL substring | ||
| tid = iframe_target("qa-pairs-app") # Knowledge Base App | ||
|
|
||
| # 2. Run JS inside the iframe by passing target_id | ||
| result = js(""" | ||
|   (() => { | ||
|     const button = Array.from(document.querySelectorAll('button')).find(b => b.textContent.trim() === 'Add FAQ'); | ||
|     if (button) { button.click(); return {clicked: true}; } | ||
|     return {clicked: false}; | ||
|   })() | ||
| """, target_id=tid) | ||
| ``` | ||
|
|
||
| ## Finding the URL substring | ||
|
|
||
| The iframe's URL contains the app slug. Run: | ||
|
|
||
| ```python | ||
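| from helpers import cdp  # assumption: cdp is exported by the same helpers module as iframe_target and js | ||
| # Print the URL of every iframe target that looks Shopify-related | ||
| for t in cdp("Target.getTargets")["targetInfos"]: | ||
|     if t["type"] == "iframe" and "shopify" in t.get("url", "").lower(): | ||
|         print(t["url"]) | ||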
| ``` | ||
|
|
||
| Then pick a substring unique to your target app. | ||
|
|
||
| ## Known Shopify app iframe slugs | ||
|
|
||
| | App | iframe URL substring | | ||
| |---|---| | ||
| | Shopify Knowledge Base (qa-pairs-app) | `qa-pairs-app` | | ||
| | Shopify Online Store editor | `online-store-web.shopifyapps.com` | | ||
| | Shopify Hydrogen Storefront | `hydrogen-storefronts` (or similar — verify) | | ||
|
|
||
| Add to this table when you discover new ones. | ||
|
|
||
| ## Why iframes | ||
|
|
||
| Shopify uses App Bridge to embed third-party apps with isolation. Your top-level page CAN'T directly access app DOM for security reasons — you need iframe targeting (which the harness does via CDP `Target.attachToTarget`). | ||
|
|
||
| ## Coordinate clicks vs JS clicks | ||
|
|
||
| Coordinate clicks (`click(x, y)`) pass through iframes at the compositor level — they work. But JS clicks scoped to the iframe target are more reliable for routine button taps because: | ||
|
|
||
| - Element text content is stable across UI redesigns | ||
| - DPR scaling on retina is automatic | ||
| - React event handlers are guaranteed to fire (vs. CDP mouse events which sometimes hit a transparent layer above the button) | ||
|
|
||
| ## Gotcha — multiple iframes from same app | ||
|
|
||
| The Online Store editor renders the storefront preview AND the editor toolbar in two separate iframes. Pick the right one by URL substring; don't assume the first match is correct. | ||
|
|
||
| ```python | ||
| # WRONG — picks first match | ||
| tid = iframe_target("online-store-web") | ||
|
|
||
| # RIGHT — disambiguate | ||
| for t in cdp("Target.getTargets")["targetInfos"]: | ||
|     url = t.get("url", "") | ||
|     if "online-store-web" in url and "editor" in url: | ||
|         tid = t["targetId"] | ||
|         break | ||
| ``` |
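The disambiguation above can be made stricter by refusing to guess when the match is not unique. A minimal sketch, assuming the `targetInfos` list from CDP `Target.getTargets` is passed in; `find_iframe_target` is a hypothetical helper, not part of the harness.

```python
def find_iframe_target(target_infos, *needles):
    """Return the targetId of the single iframe whose URL contains every needle."""
    matches = [
        t for t in target_infos
        if t.get("type") == "iframe"
        and all(n in t.get("url", "") for n in needles)
    ]
    if len(matches) != 1:
        urls = [t.get("url", "") for t in matches]
        raise LookupError(
            f"expected exactly one iframe matching {needles}, found {len(matches)}: {urls}"
        )
    return matches[0]["targetId"]
```

Called as `find_iframe_target(infos, "online-store-web", "editor")`, it raises instead of silently picking the preview iframe when the substrings are not specific enough.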
109 changes: 109 additions & 0 deletions
packages/bcode-browser/harness/domain-skills/shopify-admin/knowledge-base.md
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,109 @@ | ||||||
| # Shopify Knowledge Base App — automating FAQ entries | ||||||
|
|
||||||
| The Knowledge Base App (Shopify Winter '26 Edition) lets merchants control how AI agents (ChatGPT, Perplexity, Claude, Copilot, Gemini) answer questions about their brand. Each entry is a Question / Answer pair. The app currently has no public API and is English-only as of Winter '26 — browser automation is the canonical path. | ||||||
|
|
||||||
| ## URL pattern | ||||||
|
|
||||||
| ``` | ||||||
| https://admin.shopify.com/store/<store-handle>/apps/shopify-knowledge-base/app | ||||||
| ``` | ||||||
|
|
||||||
| Sub-routes: | ||||||
| - `/app` — overview (FAQ list, top unanswered questions, query log) | ||||||
| - `/app/new` — Add FAQ form | ||||||
| - `/app/pairs/<id>` — entry detail / edit | ||||||
|
|
||||||
| ## Iframe slug | ||||||
|
|
||||||
| The app runs in an iframe whose URL contains `qa-pairs-app`: | ||||||
|
|
||||||
| ```python | ||||||
| tid = iframe_target("qa-pairs-app") | ||||||
| ``` | ||||||
|
|
||||||
| ## Adding a single FAQ | ||||||
|
|
||||||
| See `polaris-inputs.md` for the full canonical pattern. Quick version: | ||||||
|
|
||||||
| ```python | ||||||
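| def add_faq(question, answer): | ||||||
|     """Add one FAQ entry; return (ok, info) as the batching loop expects.""" | ||||||
|     tid = iframe_target("qa-pairs-app") | ||||||
|     # focus question input via JS, type via CDP, focus answer, type, click Save | ||||||
|     # poll URL for /pairs/<id> success signal | ||||||
|     raise NotImplementedError("see polaris-inputs.md for the full pattern") | ||||||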
| ``` | ||||||
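The final polling step of `add_faq` can be sketched on its own. This is a sketch under assumptions: `wait_for_pair_id` is a hypothetical helper, `get_url` stands for whatever the harness offers to read the current app URL, and the regex assumes the `/app/pairs/<id>` sub-route listed above.

```python
import re
import time
from typing import Callable, Optional

PAIR_URL_RE = re.compile(r"/app/pairs/([^/?#]+)")


def wait_for_pair_id(
    get_url: Callable[[], str],
    timeout: float = 10.0,
    interval: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll the URL until it shows /app/pairs/<id>; return the id or None on timeout."""
    deadline = clock() + timeout
    while True:
        match = PAIR_URL_RE.search(get_url())
        if match:
            return match.group(1)
        if clock() >= deadline:
            return None
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the helper testable without real delays. A caller could turn the result into the `(ok, info)` pair that the batching loop prints.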
|
|
||||||
| ## Batching multiple FAQs | ||||||
|
|
||||||
| After saving an entry, the success page shows an "FAQ created. Add another FAQ" link. Click it via JS to skip navigating back to the overview: | ||||||
|
|
||||||
| ```python | ||||||
| def click_add_another(): | ||||||
|     tid = iframe_target("qa-pairs-app") | ||||||
|     js(""" | ||||||
|       (() => { | ||||||
|         const link = Array.from(document.querySelectorAll('a, button')) | ||||||
|           .find(x => x.textContent.trim() === 'Add another FAQ'); | ||||||
|         if (link) link.click(); | ||||||
|       })() | ||||||
|     """, target_id=tid) | ||||||
| ``` | ||||||
|
|
||||||
| Loop: | ||||||
|
|
||||||
| ```python | ||||||
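| import time | ||||||
| # Placeholder pairs; replace with the merchant's real questions and answers | ||||||
| ENTRIES = [("<question 1>", "<answer 1>"), ("<question 2>", "<answer 2>")] | ||||||
| for q, a in ENTRIES: | ||||||
|     click_add_another() | ||||||
|     time.sleep(1.5)  # wait for form to render | ||||||
|     ok, info = add_faq(q, a) | ||||||
|     print(f"{q[:40]} -> {ok} ({info})") | ||||||
|     if not ok: | ||||||
|         break | ||||||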
| ``` | ||||||
|
|
||||||
| ## Brand voice — what to put in answers | ||||||
|
|
||||||
| This is application-specific (depends on the merchant). For JING the rule was Aesop founder-letter tone — sentence case, no exclamation points, "JING" not "we", specific over generic. | ||||||
|
|
||||||
| The Shopify guidance "Provide a brief answer in 1 or 2 sentences" is a soft hint. The textarea accepts longer text and AI agents prefer specific multi-sentence answers. Aim for 2-4 short sentences with concrete details. | ||||||
|
|
||||||
| ## What to put in the Knowledge Base | ||||||
|
|
||||||
| Categories that materially shape AI agent answers about your brand: | ||||||
|
|
||||||
| 1. **Brand voice / DNA** — "What is your brand?" / "What's your tone?" | ||||||
| 2. **Specs** — exact materials, dimensions, weights, sizes (NOT marketing prose) | ||||||
| 3. **Comparisons** — "How does X compare to <competitor>?" with concrete differences | ||||||
| 4. **Policies** — returns, shipping, care, warranty, contact (in brand voice) | ||||||
| 5. **Origin** — founder, where made, why brand exists | ||||||
| 6. **Limitations** — what you DON'T do (V1 scope, US-only, etc.) — agents that hallucinate availability hurt conversion | ||||||
|
|
||||||
| Skip: anything marketing-speak. The Knowledge Base is for **truth, in voice**, not pitch copy. | ||||||
|
|
||||||
| ## Top unanswered questions | ||||||
|
|
||||||
| The overview shows up to 7 "Top unanswered questions" Shopify auto-detected from query logs. **Answer these first** — they're real shopper queries hitting your store right now. Once answered, the section empties. | ||||||
|
|
||||||
| ## Query log | ||||||
|
|
||||||
| `/admin/apps/shopify-knowledge-base/app/queries` (or "Query log" in app sidebar) shows what shoppers actually asked AI agents about your brand. Read weekly. New patterns become new FAQ entries. | ||||||
|
P2: The Query log URL uses an inconsistent path (...)
|
||||||
|
|
||||||
| ## Verifying entries surface in AI | ||||||
|
|
||||||
| After adding an entry, allow 24 hours for AI provider indexing, then test: | ||||||
|
|
||||||
| - ChatGPT: "Tell me about <your brand>'s return policy" → check if your exact wording surfaces | ||||||
| - Perplexity: same | ||||||
| - Claude: "Compare <your brand> vs <competitor>" → see if your comparison framing appears | ||||||
|
|
||||||
| If the answer doesn't surface, the entry might be too long, too vague, or contradicted by another source (your homepage, an outdated blog post). Tighten the answer. | ||||||
|
|
||||||
| ## Limits | ||||||
|
|
||||||
| As of Winter '26 Edition: | ||||||
| - English-only | ||||||
| - No bulk import / CSV upload | ||||||
| - No API for read or write | ||||||
| - Each entry maximum ~500 words (soft cap; UI shows guidance "1 or 2 sentences") | ||||||
| - No version history visible to the merchant | ||||||
|
|
||||||
| Watch Shopify changelogs for API exposure — likely in Spring '26 or Summer '26 Edition. When it ships, switch to API-driven population. | ||||||
P2: The recommendation is too broad: Storefront API is fine for storefront-facing product reads, but not general inventory reads. Inventory workflows typically require Admin API data (for example, per-location inventory levels).