ci(wayland-e2e): poll for a warm screenshot instead of a single fixed sleep#379

Draft
latekvo wants to merge 1 commit into main from fix/wayland-screenshot-retry

@latekvo latekvo commented Jun 19, 2026

Member

Problem

The "Boot + drive AVD on headless Weston" job's "Screenshot returns real pixels" step takes a single screenshot 15s after boot_completed and asserts that it is larger than 20 KB. That single shot races two independent warm-up delays:

  1. SurfaceFlinger painting the lockscreen UI, and
  2. the simulator-server's screenshot stream, which only starts warming on the first capture request.

On slow/contended runners the one-and-only capture comes back as either:

  • {"error":"...no image to export"}: the stream is still cold and the response has no .data envelope, so the step's json.load(...)['data'] raises KeyError: 'data'; or
  • an all-zero framebuffer that PNG-compresses to ~3–7 KB and fails the > 20 KB check.

Either way the step fails even though the boot was healthy. The failure is intermittent: the same commit passes on most runs and fails on some.
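The two failure shapes can be told apart before indexing into the response. The following is a minimal Python sketch, not the step's actual script: `classify_capture` and its status names are hypothetical, and it assumes the `data` field carries a base64-encoded PNG (the description above only says the envelope is `.data`).

```python
import base64
import json

MIN_BYTES = 20 * 1024  # the step's "> 20 KB" threshold


def classify_capture(body: str, min_bytes: int = MIN_BYTES) -> tuple[str, int]:
    """Return (status, decoded_size) for one screenshot response body.

    status is "cold" (no usable data envelope), "blank" (image too small
    to be a real frame) or "ok".
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "cold", 0
    # .get() avoids the KeyError that ['data'] raises on an error response.
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, str):
        return "cold", 0
    try:
        # Assumption: data is base64-encoded image bytes.
        size = len(base64.b64decode(data))
    except ValueError:
        return "cold", 0
    if size <= min_bytes:
        return "blank", size
    return "ok", size
```

With this shape, `classify_capture('{"error":"...no image to export"}')` reports a cold stream instead of raising, and a few-KB image is reported as blank rather than accepted.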

Fix

Replace the fixed sleep 15 followed by a single capture with a bounded poll (15 attempts, 4s apart). Each request also warms the stream, and the loop exits as soon as a frame exceeds 20 KB. A cold-stream error response is tolerated and retried rather than aborting the step. Genuinely broken framebuffers still fail, now with per-attempt diagnostics in the log.

  • Healthy boots pass on the first attempt or two (no slower than before in the common case).
  • Worst-case added headroom is ~60s, spent only when the framebuffer never warms, which is a real failure worth surfacing.
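The bounded poll can be sketched as follows. This is a Python illustration of the control flow only, not the workflow's `run:` script: `poll_for_frame` and its parameters are hypothetical names, the log wording is made up, and skipping the sleep after the final attempt is an assumption.

```python
import time
from typing import Callable, Optional


def poll_for_frame(
    capture_size: Callable[[], Optional[int]],
    attempts: int = 15,
    interval_s: float = 4.0,
    min_bytes: int = 20 * 1024,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> bool:
    """Poll until a capture exceeds min_bytes or the attempts run out.

    capture_size() returns the image size in bytes, or None when the
    server answered with an error (cold stream).
    """
    for attempt in range(1, attempts + 1):
        size = capture_size()
        if size is None:
            log(f"attempt {attempt}/{attempts}: no image yet, retrying")
        elif size > min_bytes:
            log(f"attempt {attempt}/{attempts}: {size} bytes, ok")
            return True
        else:
            log(f"attempt {attempt}/{attempts}: {size} bytes, too small")
        if attempt < attempts:
            sleep(interval_s)  # assumption: no sleep after the final attempt
    return False
```

Under that assumption, 15 attempts give at most 14 sleeps of 4s, or 56s of waiting, which is in line with the ~60s worst case quoted above. Injecting `sleep` and `log` keeps the loop easy to exercise without real delays.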

Verification

The workflow file is unchanged elsewhere; only this step's run: script changed.

  • prettier --check passes on the workflow file.
  • Extracted the exact run: script and ran it against a mock curl covering three scenarios: cold-error → all-zero frame → real frame (exits 0 on the real frame); always-cold (exits 1 after 15 attempts); and always all-zero framebuffer (exits 1). Exit codes confirmed as 0, 1, and 1 respectively.
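The three mock scenarios can be replayed in a few lines. This sketch mocks a capture function rather than curl, and `scripted_capture`, `step_exit_code`, and the byte sizes are hypothetical stand-ins chosen for illustration (5,000 bytes sits in the few-KB all-zero range described above).

```python
from itertools import chain, repeat
from typing import Callable, Iterable, Optional

MIN_BYTES = 20 * 1024  # the step's "> 20 KB" threshold


def scripted_capture(sizes: Iterable[Optional[int]]) -> Callable[[], Optional[int]]:
    """Replay the given capture results, then repeat the last one forever."""
    items = list(sizes)
    stream = chain(items, repeat(items[-1]))
    return lambda: next(stream)


def step_exit_code(capture: Callable[[], Optional[int]], attempts: int = 15) -> int:
    """Stand-in for the step: 0 once a frame exceeds MIN_BYTES, else 1."""
    for _ in range(attempts):
        size = capture()  # None models a cold-stream error response
        if size is not None and size > MIN_BYTES:
            return 0
    return 1


SCENARIOS = {
    "cold, then all-zero, then real": [None, 5_000, 120_000],
    "always cold": [None],
    "always all-zero": [5_000],
}

results = {
    name: step_exit_code(scripted_capture(seq)) for name, seq in SCENARIOS.items()
}
```

No sleeps are modelled here, since only the exit codes are of interest.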

… sleep

The 'Screenshot returns real pixels' step took one screenshot 15s after
boot_completed and required it to be >20 KB. That races two independent
warm-up delays: SurfaceFlinger painting the lockscreen, and the
simulator-server's screenshot stream only starting to warm on the first
capture request. On slow/contended runners the first (and only) capture
comes back either as a '{"error":"...no image to export"}' response
with no .data envelope (KeyError: 'data') or an all-zero framebuffer that
PNG-compresses to a few KB, failing the step even though the boot was
healthy.

Replace the single attempt with a bounded poll (15 attempts, 4s apart):
each request also warms the stream, and we stop as soon as a frame
exceeds 20 KB. Healthy boots still pass in the first attempt or two;
genuinely broken framebuffers still fail, now with per-attempt diagnostics.