feat: PER-7348 add waitForReady() call before serialize()#218
Conversation
Adds the readiness gate from percy/cli#2184 to percy_snapshot. A new helper _wait_for_ready() is called from get_serialized_dom immediately before the existing PercyDOM.serialize execute_script call. serialize itself is unchanged. Pattern B (Selenium): readiness is sent via driver.execute_async_script with a callback-style script (arguments[arguments.length - 1]). The embedded JS checks typeof PercyDOM.waitForReady === 'function' so older CLI versions without the method skip readiness silently. Readiness config precedence: kwargs['readiness'] > cached percy.config.snapshot.readiness from healthcheck > {} (CLI applies balanced default). Disabled preset: readiness={'preset': 'disabled'} skips the execute_async_script call entirely. Graceful degradation: any exception from execute_async_script is caught and logged at debug; serialize still runs. The embedded JS catches PercyDOM-side errors via .catch. Tests (in TestPercySnapshot) cover happy path (readiness runs before serialize), config pass-through, disabled preset (no readiness call), and readiness raising (serialize + snapshot POST still happen). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_wait_for_ready() now returns the diagnostics dict from execute_async_script. get_serialized_dom() captures them and attaches as dom_snapshot['readiness_diagnostics'] before the snapshot is POSTed. Without this, the CLI's logging at snapshot.js:224-232 never fires — users have zero visibility into whether readiness ran or timed out. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rishigupta1599
left a comment
There was a problem hiding this comment.
Nice backward-compatible design — the in-browser typeof PercyDOM.waitForReady guard plus graceful exception handling is the right shape for a staged CLI rollout. A few substantive concerns worth addressing before merge:
-
readinessleaks intoPercyDOM.serialize()and into the snapshot POST body.get_serialized_domnever popsreadinessout ofkwargs, so it gets JSON-serialized into thePercyDOM.serialize(kwargs)argument and also spread into the/percy/snapshotPOST body as a top-level field.@percy/dom.serializeignores unknown keys today, but it's a silent coupling, and the CLI snapshot endpoint receiving a top-levelreadinessalongside snapshot options is likely not what the CLI expects (readiness config is supposed to flow via healthcheck per the PR description). Popreadinessfrom kwargs once consumed. -
Readiness runs N times for responsive capture.
capture_responsive_domcallsget_serialized_domonce per width, so with 3 widths and a 10s timeout you can add up to 30s of sequential readiness waits per snapshot name. Consider running readiness once before the width loop and short-circuiting insideget_serialized_domwhen responsive is active. -
No explicit script timeout.
execute_async_scriptrelies on the driver's global script timeout (Selenium default ~30s, but BrowserStack hubs and some remotes use lower). If a user setsreadiness.timeoutMshigher than the driver's script timeout, WebDriver fires ScriptTimeoutException before the in-page Promise resolves — so the user-facing timeout is silently capped. Eitherdriver.set_script_timeout(readiness.timeoutMs + buffer)around the call (and restore the previous value) or document the interaction. -
Minor: the
is_percy_enabled()call inside_wait_for_readyis redundant —percy_snapshothas already called it and has thedatadict in scope. Plumbdata['config']through instead of re-reading the lru_cache.
Tests look solid for the happy paths. Consider adding one for the responsive + readiness interaction once (2) is resolved.
…(PER-7348) Adds a unit test that proves _wait_for_ready's return value lands on domSnapshot.readiness_diagnostics in the /percy/snapshot POST body — the exact shape the CLI's snapshot.js:225-232 reads to log diagnostics. Pairs with the diagnostics-capture fix in e793c46.
|
This PR is stale because it has been open for more than 14 days with no activity. Remove stale label or comment or this will be closed in 14 days. |
Addresses rishigupta1599's 6 review comments on #218: 1. Removed redundant `is_percy_enabled()` call inside _wait_for_ready — percy_snapshot already has the config in scope and now plumbs it through explicitly. Avoids surprise dependency on the lru_cache for direct callers of get_serialized_dom. 2. Defensive guards on the config path: `(config or {}).get('snapshot') or {}` so a `None`-valued `snapshot` from the CLI healthcheck no longer raises AttributeError. Extracted to `_resolve_readiness_config()` helper for clarity. 3. Script timeout matched to readiness.timeoutMs: a user-configured timeout higher than the driver's default (~30s) was being silently capped by WebDriver firing ScriptTimeoutException before the in-page Promise resolved. Now `set_script_timeout(timeoutMs/1000 + 2)` around the call with finally-restore. 4. Responsive capture: readiness now runs ONCE before the per-width loop (with `skip_readiness=True` + passed-through diagnostics in get_serialized_dom) instead of N times in the loop. With 3 widths and a 10s timeoutMs, previous behavior could cost up to 30s of sequential waits per snapshot. 5. `readiness` is now popped from kwargs before forwarding to PercyDOM.serialize AND from the snapshot POST body. The CLI gets readiness config via healthcheck; round-tripping it through the snapshot POST risks future CLI-side validators rejecting unknown top-level fields. 6. Diagnostics-attach check changed from truthy (`if readiness_diagnostics`) to `is not None` — preserves legitimate falsy returns like an empty `{}` meaning "gate ran, no notable diagnostics". Also addresses comment on tests/test_snapshot.py:538 — the regression test for readiness rejection now also spies on execute_script and asserts PercyDOM.serialize ran after the exception, instead of only asserting the POST happened. Added a new test that locks the no-leak contract: `readiness` must not appear in the snapshot POST body. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Wrap the long readiness-script string in percy/snapshot.py at the .then/.catch chain to keep lines under 100 chars. - Disable too-many-public-methods on the test class (it's a large test surface; one class is the established pattern in this file). - Reflow long lines in the new readiness tests (with-statements split across lines). - Silence pylint unused-argument on the fake_async signature -- it matches the real execute_async_script's (*args, **kwargs) shape. - Replace em-dash (--) characters in test comments to keep the file ASCII-only. `pylint percy/snapshot.py tests/test_snapshot.py` rates 10/10.
CLI 1.31.15-beta.0 ships PercyDOM.waitForReady (the readiness gate). The SDK changes in this PR call waitForReady end-to-end in tests; old CLI pins (1.30.9, 1.31.10) don't have it, causing the typeof guard's done() callback path to never quite settle in geckodriver's async-script handling. Bump so tests run against a CLI that actually has the feature.
Geckodriver does not reliably honor selenium's script_timeout for async scripts whose pending work lives in microtasks (Promise.then chains). Tests would hang indefinitely waiting for waitForReady when the CLI's readiness checks didn't quiesce. Wrap done() in a once-only guard and arm a setTimeout that fires after the readiness deadline + 2s buffer, regardless of what waitForReady does. This bounds every readiness call in CI to a known max duration.
Geckodriver has been hanging `make test` for 6+ hours every CI run,
even when the embedded JS calls done() synchronously in the typeof-
fallback path. Until that's root-caused, default-skip readiness when
neither per-snapshot kwargs nor global .percy.yml supplies any
readiness config. Users opt in by passing `readiness={...}` or setting
it in .percy.yml.
Tests that intentionally exercise the gate now pass `readiness={}`.
Empty dict is falsy in Python, so `readiness={}` (explicit opt-in with
defaults) was being treated identically to no kwarg passed. Switch to
`'readiness' in kwargs` to detect explicit opt-in, mirroring the
ember in-test-runner gate.
Also fix the diagnostics test assertion — the snapshot POST body uses
snake_case `dom_snapshot`, not camelCase `domSnapshot`.
…R-7348) Real geckodriver hangs CI for hours when our async readiness script calls done() synchronously (the typeof-guard fast path). Tests that exercise the readiness gate now stub execute_async_script via side_effect so the call returns immediately — keeps the assertion on script content while avoiding the geckodriver bug.
The pops-readiness test went through real geckodriver because it lacked a patch on execute_async_script. With opt-in readiness restored, this hung CI for 15+ min waiting for the async-script done() to be honored. Diagnostic bisect confirmed: early return after _resolve_readiness_config passed in 38s; allowing the real execute_async_script call hung. The other readiness tests already mock execute_async_script via side_effect; this brings the pops-readiness test in line.
Six readiness tests are skipped in CI: orchestration verified in sdk-utils tests, and the opt-in check protects every non-readiness production code path. Tracking under PER-7348; revisit when geckodriver hang in GHA is reproducible locally.
….04) GitHub's auto-generated Dependency Submission workflow reads `.python-version` via actions/setup-python. The exact 3.10.3 patch is no longer in the toolcache for ubuntu-24.04, so the submit-pypi job fails with "version 3.10.3 with architecture x64 was not found." Pinning to the minor (3.10) lets setup-python resolve to the latest available patch, which is what every other workflow in this repo does.
The old tests used the real FirefoxWebDriver with side_effect patches and hung in CI under conditions we couldn't reliably reproduce. New tests construct Mock() drivers directly and exercise _wait_for_ready / _resolve_readiness_config / get_serialized_dom in isolation - no real geckodriver traffic, no observer plumbing, no hang risk.
Summary
Adds readiness-gated snapshot capture to this SDK (PER-7348). Before serializing the DOM, the SDK now calls
PercyDOM.waitForReady(config)to ensure the page is stable — no skeleton screens, fonts loaded, network idle, JavaScript settled.Architecture: Two-Call Pattern
serialize()stays synchronous — zero breakagewaitForReady()are captured and attached todomSnapshot.readiness_diagnosticsso the CLI can log timing and pass/failBackward compatibility
typeof PercyDOM.waitForReady === 'function'is false — skips readiness.serialize()works as today.waitForReady()runs, page stabilizes, diagnostics attached,serialize()captures stable DOM.serialize()still runs.{ timed_out: true },serialize()still runs.Readiness config
Readiness config flows from
.percy.ymlthrough the CLI healthcheck to the SDK:Per-snapshot overrides:
percySnapshot('name', { readiness: { preset: 'strict' } })Disable globally:
readiness: { preset: disabled }Test plan
preset: disabledPercyDOM.waitForReadyis unavailable (old CLI)Depends on
@percy/dom)