fix(core): declare CUA screenshot media type at capture boundary (#2300) #2306

Open
seanmcguire12 wants to merge 5 commits into main from evals/external-contrib-yawbtng-cua-media-type

seanmcguire12 commented Jul 2, 2026

Member

thanks @yawbtng for the contribution here!

why

Closes #2046. This is the reshaped version of #2159, following the approach @seanmcguire12 outlined when closing that PR.

setScreenshotProvider returned a bare base64 string, so every CUA client had to independently infer or hardcode the media type — all four assumed image/png. A non-PNG screenshot (e.g. a JPEG from a custom provider) was then mislabeled as PNG in the provider function-response payload, which is the root of #2046. Clients also stripped a hardcoded data:image/png;base64, prefix by regex, so any other prefix silently broke.
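
To make the failure mode concrete, here is a minimal sketch of the old behavior described above. It is a reconstruction for illustration, not the actual client code: `legacyToImageBlock` and the sample payload are hypothetical, and only the regex and the hardcoded `image/png` label come from the description.

```typescript
// Hypothetical reconstruction of the pre-fix client handling (illustration only).
const PNG_PREFIX = /^data:image\/png;base64,/;

function legacyToImageBlock(screenshot: string): { media_type: string; data: string } {
  return {
    // The media type was assumed rather than declared by the provider.
    media_type: "image/png",
    // Only a PNG data-URL prefix is stripped; any other prefix is left in place.
    data: screenshot.replace(PNG_PREFIX, ""),
  };
}

// Placeholder JPEG payload from a custom provider (not a full image).
const jpegDataUrl = "data:image/jpeg;base64,/9j/4AAQSkZJRg==";
const legacyBlock = legacyToImageBlock(jpegDataUrl);

console.log(legacyBlock.media_type); // "image/png" (mislabeled)
console.log(legacyBlock.data.startsWith("data:image/jpeg;base64,")); // true (prefix not stripped)
```

Both problems show up at once: the label is wrong, and the `data` field still carries the data-URL prefix instead of bare base64.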

what changed

Move the media-type declaration to the capture boundary. setScreenshotProvider now returns an explicit payload:

export interface ScreenshotProviderResult {
  base64: string;
  mediaType: "image/png" | "image/jpeg";
}
  • Default handler (v3CuaAgentHandler) captures PNG explicitly (type: "png") and returns { base64, mediaType: "image/png" }, so the default is unchanged.
  • Anthropic: media_type: screenshot.mediaType, data: screenshot.base64 (drops the .replace(/^data:image\/png;base64,/, "")).
  • Google: mimeType: screenshot.mediaType (drops the PNG-only prefix strip).
  • OpenAI / Microsoft: build data:${screenshot.mediaType};base64,${screenshot.base64}.
  • options.base64Image (caller-supplied) still defaults to image/png, preserving existing behavior.

ScreenshotProviderResult is exported from the public entrypoint.
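
The per-client bullets above can be sketched as three small mappers. This is a sketch under assumptions, not the code in this PR: the helper names are hypothetical, the interface is redeclared locally so the snippet is self-contained, and the `data` field on the Google shape is assumed (the description only names `mimeType`).

```typescript
// Redeclared locally so the sketch is self-contained; mirrors the interface above.
interface ScreenshotProviderResult {
  base64: string;
  mediaType: "image/png" | "image/jpeg";
}

// Anthropic-style: pass the declared media type and bare base64 straight through.
function toAnthropicFields(s: ScreenshotProviderResult) {
  return { media_type: s.mediaType, data: s.base64 };
}

// Google-style: `mimeType` is from the description; the `data` field is an assumption.
function toGoogleFields(s: ScreenshotProviderResult) {
  return { mimeType: s.mediaType, data: s.base64 };
}

// OpenAI / Microsoft-style: build the data URL from the declared media type.
function toDataUrl(s: ScreenshotProviderResult): string {
  return `data:${s.mediaType};base64,${s.base64}`;
}

// Placeholder JPEG payload (not a full image).
const jpegShot: ScreenshotProviderResult = {
  base64: "/9j/4AAQSkZJRg==",
  mediaType: "image/jpeg",
};

console.log(toAnthropicFields(jpegShot).media_type); // "image/jpeg"
console.log(toDataUrl(jpegShot)); // "data:image/jpeg;base64,/9j/4AAQSkZJRg=="
```

Because the media type travels with the bytes, none of the mappers needs a regex or a default.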

testing

  • New cua-screenshot-mediatype.test.ts: asserts a non-PNG (image/jpeg) media type is honored by all four clients' captureScreenshot(), and that the options.base64Image path still defaults to png.
  • Updated the public API type test for setScreenshotProvider(...) and the Anthropic/Microsoft CUA client tests to the new provider shape.
  • pnpm --filter @browserbasehq/stagehand run typecheck passes; the CUA + public-API unit suites are green (55 tests).

Summary by cubic

Declare the screenshot media type at the capture boundary and thread it through all CUA clients to fix mislabeled images and remove PNG-only prefix handling. Non‑PNG screenshots (e.g. JPEG) now work end-to-end.

  • Bug Fixes

    • setScreenshotProvider now returns ScreenshotProviderResult ({ base64, mediaType }) instead of a string.
    • Default handler captures PNG (type: "png") and returns image/png.
    • Clients: Anthropic/Google pass mediaType through; OpenAI/Microsoft build data:${mediaType};base64,${base64}; removed PNG-only prefix stripping.
    • captureScreenshot({ base64Image }) accepts optional mediaType; defaults to image/png.
    • Added tests covering JPEG across clients and updated public API type tests.
  • Migration

    • If you use a custom setScreenshotProvider, return { base64, mediaType: "image/png" | "image/jpeg" } instead of a base64 string. No changes needed with the built-in handler.
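
For the migration step, one possible shape is a small adapter around an existing string-returning provider. This is a sketch under assumptions: `LegacyProvider`, `ScreenshotProvider`, and `withMediaType` are hypothetical names, the provider is assumed to be an async zero-argument function, and how it is registered via `setScreenshotProvider` is not shown.

```typescript
// Redeclared locally so the sketch is self-contained.
interface ScreenshotProviderResult {
  base64: string;
  mediaType: "image/png" | "image/jpeg";
}

// Hypothetical type aliases: the old and new provider shapes.
type LegacyProvider = () => Promise<string>;
type ScreenshotProvider = () => Promise<ScreenshotProviderResult>;

// Wrap a legacy provider and declare the media type it actually produces.
function withMediaType(
  legacy: LegacyProvider,
  mediaType: ScreenshotProviderResult["mediaType"],
): ScreenshotProvider {
  return async () => ({ base64: await legacy(), mediaType });
}

// Before: a custom provider returning a bare base64 JPEG (placeholder payload).
const legacyJpegProvider: LegacyProvider = async () => "/9j/4AAQSkZJRg==";

// After: the same capture, now labeled at the capture boundary.
const migratedProvider = withMediaType(legacyJpegProvider, "image/jpeg");

migratedProvider().then((shot) => console.log(shot.mediaType));
```

The caller states the media type once, where the image is captured, which is the point of the change.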

Written for commit 42fb085. Summary will update on new commits.

changeset-bot Bot commented Jul 2, 2026

🦋 Changeset detected

Latest commit: 42fb085

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server-v3 | Patch |

cubic-dev-ai Bot left a comment

Contributor

3 issues found across 12 files

Confidence score: 2/5

  • The riskiest issue is a versioning mismatch in .changeset/fix-screenshot-provider-mediatype.md: this appears to introduce a breaking setScreenshotProvider return-shape change but is labeled like a patch, which could surprise downstream consumers with unplanned breakage—mark the changeset as major (or at least minor) and add migration guidance before merging.
  • In packages/core/lib/v3/agent/AnthropicCUAClient.ts, captureScreenshot() now assumes { base64, mediaType }, so legacy providers returning a string can fail at runtime and break screenshot-dependent flows—add legacy string normalization (or a clear shape validation error) before merging.
  • In packages/core/lib/v3/agent/OpenAICUAClient.ts, legacy string screenshot results can become malformed data URLs, creating silent bad screenshot payloads instead of actionable failures—normalize old string outputs to a PNG object shape or throw an explicit error in captureScreenshot() before merging.
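
The normalization suggested in the last two points could look roughly like the sketch below. It illustrates the reviewer's suggestion only and is not code from this PR: `normalizeScreenshot` is a hypothetical helper, and treating a legacy string as PNG is an assumption carried over from the old default.

```typescript
type MediaType = "image/png" | "image/jpeg";

interface ScreenshotProviderResult {
  base64: string;
  mediaType: MediaType;
}

// Hypothetical helper: accept the new shape, adapt the legacy string shape,
// and fail loudly on anything else instead of building a bad payload.
function normalizeScreenshot(result: unknown): ScreenshotProviderResult {
  if (typeof result === "string") {
    // Legacy shape: assume PNG and strip the old PNG data-URL prefix if present.
    return {
      base64: result.replace(/^data:image\/png;base64,/, ""),
      mediaType: "image/png",
    };
  }
  if (typeof result === "object" && result !== null) {
    const { base64, mediaType } = result as { base64?: unknown; mediaType?: unknown };
    if (
      typeof base64 === "string" &&
      (mediaType === "image/png" || mediaType === "image/jpeg")
    ) {
      return { base64, mediaType };
    }
  }
  throw new TypeError(
    'screenshot provider must return { base64, mediaType: "image/png" | "image/jpeg" }',
  );
}

console.log(normalizeScreenshot("data:image/png;base64,AAAA"));
// { base64: "AAAA", mediaType: "image/png" }
```

Whether to adapt legacy strings or reject them is a design choice: adapting keeps old providers working, while throwing makes the shape change visible immediately.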

Comment thread .changeset/fix-screenshot-provider-mediatype.md
Comment thread packages/core/lib/v3/agent/AnthropicCUAClient.ts
Comment thread packages/core/lib/v3/agent/OpenAICUAClient.ts
@seanmcguire12

Member Author

@cubic-dev-ai review. keep in mind that the versioning decision is intentional based on this comment

cubic-dev-ai Bot commented Jul 2, 2026

Contributor

@cubic-dev-ai review. keep in mind that the versioning decision is intentional based on this comment

@seanmcguire12 I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai Bot left a comment

Contributor

No issues found across 13 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Development

Successfully merging this pull request may close these issues.

core(cua): Google function-response image handling hardcodes PNG mimeType

2 participants