
Launch feedback: what browser-agent failures should BrowserTrace capture? #3


BrowserTrace has reached its first public, launch-ready state. This issue collects workflow feedback from people building browser agents, not stars or upvotes.

Browser Use is the primary feedback path now. Stagehand, Skyvern, Playwright + LLM, and computer-use workflows are still supported as secondary integrations.

Try the current demo story first:

Browser Use tries to upload file:///tmp/browsertrace-report.html, navigates to the local file path instead, and the upload preview never appears. BrowserTrace replays the screenshot, URL, action, model output, status, and first red step so the failure is inspectable.
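
To see that exact failure locally, the demo command captures it and the local viewer replays it. A minimal sketch, assuming the bare browsertrace command opens the viewer as the install sequences further down suggest:

browsertrace demo    # capture the failing upload run described above
browsertrace list    # note the run ID of the demo trace
browsertrace         # replay it: screenshot, URL, action, model output, status, first red step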

Current comparison path:

browsertrace compare <failed_run_id> <success_run_id>
browsertrace compare <failed_run_id> <success_run_id> --json
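
If you paste comparison output into a comment, the JSON form can be trimmed first. The screenshot field below is only an assumption about the schema; check the real structure with jq before filtering:

browsertrace compare <failed_run_id> <success_run_id> --json > compare.json
jq 'keys' compare.json                                      # inspect the top-level fields first
jq 'del(.. | .screenshot?)' compare.json > shareable.json   # recursively drop any screenshot field before sharing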

Fastest local path from PyPI:

uvx --from "browsertrace[ui]" browsertrace doctor
uvx --from "browsertrace[ui]" browsertrace demo
uvx --from "browsertrace[ui]" browsertrace list
uvx --from "browsertrace[ui]" browsertrace

Persistent install from PyPI:

pip install "browsertrace[ui]"
browsertrace doctor
browsertrace demo
browsertrace

For public-safe trace sharing:

browsertrace export <run_id> --public -o public.html

--public omits prompt/model I/O, screenshots, and URLs from the standalone HTML export. For security-sensitive reports or changes, or anything that includes private trace data, follow the Security Policy before sharing details publicly: https://github.com/aaronlab/browsertrace/blob/main/SECURITY.md
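
One quick spot check before attaching a public export, assuming the demo run above: the redacted HTML should not contain the run's URLs, such as the local file path from the demo story.

browsertrace export <run_id> --public -o public.html
grep -c "file:///tmp/browsertrace-report.html" public.html   # expect 0 matches in the redacted export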

If you build with Browser Use, Stagehand, Skyvern, Playwright + LLMs, or computer-use stacks, please share:

  • What failed run was hardest for you to debug?
  • What data did you wish you had captured?
  • Which framework/runtime should BrowserTrace support better first?
  • Would a portable HTML export be enough for sharing failures, or do you need hosted links?
  • Which fields should be redacted before public sharing?
  • For failed-vs-good Browser Use runs, which final-result fields should comparison separate: extracted content, tool output, final result, retry/repair attempts, or one normalized summary?

Concrete traces, screenshots, sanitized snippets, and anonymized examples are especially useful.

For small docs fixes while discussing feedback, see the current good-first-issue queue and the First PR Recipe.
