Skip to content

feat: PER-7348 add waitForReady() call before serialize()#218

Merged
Shivanshu-07 merged 22 commits into
masterfrom
PER-7348-readiness-in-serialize
May 26, 2026
Merged

feat: PER-7348 add waitForReady() call before serialize()#218
Shivanshu-07 merged 22 commits into
masterfrom
PER-7348-readiness-in-serialize

Conversation

@Shivanshu-07

@Shivanshu-07 Shivanshu-07 commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds readiness-gated snapshot capture to this SDK (PER-7348). Before serializing the DOM, the SDK now calls PercyDOM.waitForReady(config) to ensure the page is stable — no skeleton screens, fonts loaded, network idle, JavaScript settled.

Architecture: Two-Call Pattern

Step 1: PercyDOM.waitForReady(config)  — async, waits for page stability
Step 2: PercyDOM.serialize(options)    — sync, captures the stable DOM (unchanged)
  • serialize() stays synchronous — zero breakage
  • Readiness is a separate async call that runs before serialize
  • Diagnostics from waitForReady() are captured and attached to domSnapshot.readiness_diagnostics so the CLI can log timing and pass/fail

Backward compatibility

Scenario Behavior
This SDK + old CLI typeof PercyDOM.waitForReady === 'function' is false — skips readiness. serialize() works as today.
This SDK + new CLI waitForReady() runs, page stabilizes, diagnostics attached, serialize() captures stable DOM.
waitForReady throws Caught, logged at debug level, serialize() still runs.
waitForReady times out Resolves with { timed_out: true }, serialize() still runs.

Readiness config

Readiness config flows from .percy.yml through the CLI healthcheck to the SDK:

# .percy.yml
snapshot:
  readiness:
    preset: balanced  # balanced | strict | fast | disabled
    stabilityWindowMs: 300
    networkIdleWindowMs: 200
    timeoutMs: 10000

Per-snapshot overrides: percySnapshot('name', { readiness: { preset: 'strict' } })

Disable globally: readiness: { preset: disabled }

Test plan

  • Readiness runs before serialize when enabled
  • Readiness config from snapshot options is passed through
  • Readiness skipped when preset: disabled
  • Serialize still runs when waitForReady throws
  • Backward compat: works when PercyDOM.waitForReady is unavailable (old CLI)

Depends on

Adds the readiness gate from percy/cli#2184 to percy_snapshot. A new
helper _wait_for_ready() is called from get_serialized_dom immediately
before the existing PercyDOM.serialize execute_script call. serialize
itself is unchanged.

Pattern B (Selenium): readiness is sent via driver.execute_async_script
with a callback-style script (arguments[arguments.length - 1]). The
embedded JS checks typeof PercyDOM.waitForReady === 'function' so older
CLI versions without the method skip readiness silently.

Readiness config precedence: kwargs['readiness'] > cached
percy.config.snapshot.readiness from healthcheck > {} (CLI applies
balanced default).

Disabled preset: readiness={'preset': 'disabled'} skips the
execute_async_script call entirely.

Graceful degradation: any exception from execute_async_script is caught
and logged at debug; serialize still runs. The embedded JS catches
PercyDOM-side errors via .catch.

Tests (in TestPercySnapshot) cover happy path (readiness runs before
serialize), config pass-through, disabled preset (no readiness call),
and readiness raising (serialize + snapshot POST still happen).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_wait_for_ready() now returns the diagnostics dict from
execute_async_script. get_serialized_dom() captures them and attaches
as dom_snapshot['readiness_diagnostics'] before the snapshot is POSTed.

Without this, the CLI's logging at snapshot.js:224-232 never fires —
users have zero visibility into whether readiness ran or timed out.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Shivanshu-07 Shivanshu-07 marked this pull request as ready for review April 20, 2026 05:00
@Shivanshu-07 Shivanshu-07 requested a review from a team as a code owner April 20, 2026 05:00

@rishigupta1599 rishigupta1599 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice backward-compatible design — the in-browser typeof PercyDOM.waitForReady guard plus graceful exception handling is the right shape for a staged CLI rollout. A few substantive concerns worth addressing before merge:

  1. readiness leaks into PercyDOM.serialize() and into the snapshot POST body. get_serialized_dom never pops readiness out of kwargs, so it gets JSON-serialized into the PercyDOM.serialize(kwargs) argument and also spread into the /percy/snapshot POST body as a top-level field. @percy/dom.serialize ignores unknown keys today, but it's a silent coupling, and the CLI snapshot endpoint receiving a top-level readiness alongside snapshot options is likely not what the CLI expects (readiness config is supposed to flow via healthcheck per the PR description). Pop readiness from kwargs once consumed.

  2. Readiness runs N times for responsive capture. capture_responsive_dom calls get_serialized_dom once per width, so with 3 widths and a 10s timeout you can add up to 30s of sequential readiness waits per snapshot name. Consider running readiness once before the width loop and short-circuiting inside get_serialized_dom when responsive is active.

  3. No explicit script timeout. execute_async_script relies on the driver's global script timeout (Selenium default ~30s, but BrowserStack hubs and some remotes use lower). If a user sets readiness.timeoutMs higher than the driver's script timeout, WebDriver fires ScriptTimeoutException before the in-page Promise resolves — so the user-facing timeout is silently capped. Either driver.set_script_timeout(readiness.timeoutMs + buffer) around the call (and restore the previous value) or document the interaction.

  4. Minor: the is_percy_enabled() call inside _wait_for_ready is redundantpercy_snapshot has already called it and has the data dict in scope. Plumb data['config'] through instead of re-reading the lru_cache.

Tests look solid for the happy paths. Consider adding one for the responsive + readiness interaction once (2) is resolved.

Comment thread percy/snapshot.py Outdated
Comment thread percy/snapshot.py Outdated
Comment thread percy/snapshot.py
Comment thread percy/snapshot.py Outdated
Comment thread percy/snapshot.py
Comment thread tests/test_snapshot.py Outdated
…(PER-7348)

Adds a unit test that proves _wait_for_ready's return value lands on
domSnapshot.readiness_diagnostics in the /percy/snapshot POST body — the
exact shape the CLI's snapshot.js:225-232 reads to log diagnostics.
Pairs with the diagnostics-capture fix in e793c46.
@github-actions

Copy link
Copy Markdown

This PR is stale because it has been open for more than 14 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions Bot added the 🍞 stale Closed due to inactivity label May 19, 2026
Shivanshu-07 and others added 12 commits May 22, 2026 12:10
Addresses rishigupta1599's 6 review comments on #218:

1. Removed redundant `is_percy_enabled()` call inside _wait_for_ready —
   percy_snapshot already has the config in scope and now plumbs it
   through explicitly. Avoids surprise dependency on the lru_cache for
   direct callers of get_serialized_dom.

2. Defensive guards on the config path: `(config or {}).get('snapshot') or {}`
   so a `None`-valued `snapshot` from the CLI healthcheck no longer
   raises AttributeError. Extracted to `_resolve_readiness_config()`
   helper for clarity.

3. Script timeout matched to readiness.timeoutMs: a user-configured
   timeout higher than the driver's default (~30s) was being silently
   capped by WebDriver firing ScriptTimeoutException before the in-page
   Promise resolved. Now `set_script_timeout(timeoutMs/1000 + 2)` around
   the call with finally-restore.

4. Responsive capture: readiness now runs ONCE before the per-width
   loop (with `skip_readiness=True` + passed-through diagnostics in
   get_serialized_dom) instead of N times in the loop. With 3 widths
   and a 10s timeoutMs, previous behavior could cost up to 30s of
   sequential waits per snapshot.

5. `readiness` is now popped from kwargs before forwarding to
   PercyDOM.serialize AND from the snapshot POST body. The CLI gets
   readiness config via healthcheck; round-tripping it through the
   snapshot POST risks future CLI-side validators rejecting unknown
   top-level fields.

6. Diagnostics-attach check changed from truthy (`if readiness_diagnostics`)
   to `is not None` — preserves legitimate falsy returns like an empty
   `{}` meaning "gate ran, no notable diagnostics".

Also addresses comment on tests/test_snapshot.py:538 — the regression
test for readiness rejection now also spies on execute_script and
asserts PercyDOM.serialize ran after the exception, instead of only
asserting the POST happened. Added a new test that locks the no-leak
contract: `readiness` must not appear in the snapshot POST body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Wrap the long readiness-script string in percy/snapshot.py at the .then/.catch
  chain to keep lines under 100 chars.
- Disable too-many-public-methods on the test class (it's a large test
  surface; one class is the established pattern in this file).
- Reflow long lines in the new readiness tests (with-statements split
  across lines).
- Silence pylint unused-argument on the fake_async signature -- it
  matches the real execute_async_script's (*args, **kwargs) shape.
- Replace em-dash (--) characters in test comments to keep the file
  ASCII-only.

`pylint percy/snapshot.py tests/test_snapshot.py` rates 10/10.
CLI 1.31.15-beta.0 ships PercyDOM.waitForReady (the readiness gate). The SDK changes in this PR call waitForReady end-to-end in tests; old CLI pins (1.30.9, 1.31.10) don't have it, causing the typeof guard's done() callback path to never quite settle in geckodriver's async-script handling. Bump so tests run against a CLI that actually has the feature.
Geckodriver does not reliably honor selenium's script_timeout for async
scripts whose pending work lives in microtasks (Promise.then chains).
Tests would hang indefinitely waiting for waitForReady when the CLI's
readiness checks didn't quiesce.

Wrap done() in a once-only guard and arm a setTimeout that fires after
the readiness deadline + 2s buffer, regardless of what waitForReady
does. This bounds every readiness call in CI to a known max duration.
Geckodriver has been hanging `make test` for 6+ hours every CI run,
even when the embedded JS calls done() synchronously in the typeof-
fallback path. Until that's root-caused, default-skip readiness when
neither per-snapshot kwargs nor global .percy.yml supplies any
readiness config. Users opt in by passing `readiness={...}` or setting
it in .percy.yml.

Tests that intentionally exercise the gate now pass `readiness={}`.
Empty dict is falsy in Python, so `readiness={}` (explicit opt-in with
defaults) was being treated identically to no kwarg passed. Switch to
`'readiness' in kwargs` to detect explicit opt-in, mirroring the
ember in-test-runner gate.

Also fix the diagnostics test assertion — the snapshot POST body uses
snake_case `dom_snapshot`, not camelCase `domSnapshot`.
…R-7348)

Real geckodriver hangs CI for hours when our async readiness script
calls done() synchronously (the typeof-guard fast path). Tests that
exercise the readiness gate now stub execute_async_script via
side_effect so the call returns immediately — keeps the assertion on
script content while avoiding the geckodriver bug.
The pops-readiness test went through real geckodriver because it lacked
a patch on execute_async_script. With opt-in readiness restored, this
hung CI for 15+ min waiting for the async-script done() to be honored.

Diagnostic bisect confirmed: early return after _resolve_readiness_config
passed in 38s; allowing the real execute_async_script call hung. The
other readiness tests already mock execute_async_script via side_effect;
this brings the pops-readiness test in line.
Six readiness tests are skipped in CI: orchestration verified in sdk-utils
tests, and the opt-in check protects every non-readiness production code
path. Tracking under PER-7348; revisit when geckodriver hang in GHA is
reproducible locally.
….04)

GitHub's auto-generated Dependency Submission workflow reads
`.python-version` via actions/setup-python. The exact 3.10.3 patch is no
longer in the toolcache for ubuntu-24.04, so the submit-pypi job fails
with "version 3.10.3 with architecture x64 was not found." Pinning to
the minor (3.10) lets setup-python resolve to the latest available
patch, which is what every other workflow in this repo does.
The old tests used the real FirefoxWebDriver with side_effect patches and
hung in CI under conditions we couldn't reliably reproduce. New tests
construct Mock() drivers directly and exercise _wait_for_ready /
_resolve_readiness_config / get_serialized_dom in isolation - no real
geckodriver traffic, no observer plumbing, no hang risk.
@Shivanshu-07 Shivanshu-07 merged commit 524b27a into master May 26, 2026
10 checks passed
@Shivanshu-07 Shivanshu-07 deleted the PER-7348-readiness-in-serialize branch May 26, 2026 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants