Wpcomsh fatal-error: log recovery-mode entry from screen click#48525

Open
taipeicoder wants to merge 3 commits into trunk from
add/wpcomsh-fatal-recovery-redirect-log

@taipeicoder
Contributor

Proposed changes

  • Adds a first-party signed redirect endpoint (fatal-recovery-redirect.php) that the fatal-error screen's "Enter recovery mode" link now points at instead of core's bare recovery URL. The endpoint verifies an HMAC bound to AUTH_SALT + LOGGED_IN_COOKIE, dedups on the signed URL, validates the user (live cookie + resume_plugins / resume_themes cap), generates a fresh core recovery URL, emits a wpcomsh_fatal_recovery logstash event, and redirects.
  • The recovery-mode email continues to carry the bare core URL, so email-originated entries don't hit our endpoint and don't log — the new event measures screen-originated recovery-link uptake exclusively.
  • Recovery key minting moves from fatal-screen render time to click time, so the recovery_keys option no longer accumulates a row per visiting admin during a sustained outage.
  • Wrapper TTL mirrors core's WP_Recovery_Mode::get_link_ttl() flow (recovery_mode_email_rate_limit → recovery_mode_email_link_ttl → max($valid_for, $rate_limit)), each filter guarded against re-fataling.
  • Refactors out four helpers in fatal-error-helpers.php shared with the existing deactivator path: wpcomsh_fatal_sign_payload, wpcomsh_fatal_verify_payload, wpcomsh_fatal_dedup_acquire, wpcomsh_fatal_emit_logstash_event. The deactivator endpoint and the screen-render path adopt them, dropping ~50 lines of inline duplication.
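
A minimal sketch of the sign/verify flow described in the list above, in plain PHP. The function names, the payload layout (`action|expiry`), and the way the secret is derived from a salt plus the cookie value are assumptions for illustration; the actual `wpcomsh_fatal_sign_payload` / `wpcomsh_fatal_verify_payload` helpers may differ.

```php
<?php
// Hypothetical stand-ins for the PR's helpers; not the actual wpcomsh code.

/** Sign "action|expiry" with a secret derived from a salt and the session cookie. */
function sketch_sign_payload( string $action, int $expires, string $salt, string $cookie ): string {
	// Deriving the secret from the cookie binds the signature to one session.
	$secret = hash_hmac( 'sha256', $cookie, $salt );
	return hash_hmac( 'sha256', $action . '|' . $expires, $secret );
}

/** Verify a signature: reject expired payloads, then compare in constant time. */
function sketch_verify_payload( string $action, int $expires, string $sig, string $salt, string $cookie, int $now ): bool {
	if ( $expires < $now ) {
		return false;
	}
	return hash_equals( sketch_sign_payload( $action, $expires, $salt, $cookie ), $sig );
}

$now = 1700000000;
$exp = $now + 300;
$sig = sketch_sign_payload( 'recover', $exp, 'example-salt', 'cookie-value' );

var_dump( sketch_verify_payload( 'recover', $exp, $sig, 'example-salt', 'cookie-value', $now ) );    // bool(true): same session
var_dump( sketch_verify_payload( 'recover', $exp, $sig, 'example-salt', 'tampered-cookie', $now ) ); // bool(false): cookie changed
```

Because the cookie feeds the secret, replaying a signed URL after the cookie value changes fails verification, which is the behavior testing step 8 expects.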

Related product discussion/links

  • Internal: extend the fatal-error telemetry from signature + deactivate events (already shipped) to include screen-link recovery uptake.

Does this pull request change what data or activity we track or use?

Yes — adds a single new logstash event slug (wpcomsh_fatal_recovery) under the existing atomic_extension_conflict feature bucket. Properties shipped: site_url, atomic_site_id (when defined). Per-site dedup at 5-minute window. No new PII.
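
To illustrate the 5-minute dedup window, here is a rough sketch using an in-memory array as the store. The function name, the key derivation, and the array store are hypothetical; the real `wpcomsh_fatal_dedup_acquire` presumably uses a shared store and adds the fail-open handling mentioned in the commit message, which this sketch does not model.

```php
<?php
// Hypothetical in-memory dedup; illustrates only the time-window logic.

/** Returns true when the caller may emit (no sighting inside the window). */
function sketch_dedup_acquire( array &$store, string $key, int $now, int $window = 300 ): bool {
	if ( isset( $store[ $key ] ) && ( $now - $store[ $key ] ) < $window ) {
		return false; // Already logged inside the window; skip the event.
	}
	$store[ $key ] = $now;
	return true;
}

$store = array();
$key   = hash( 'sha256', 'https://example.com/' ); // Assumed key: hash of the site URL.

var_dump( sketch_dedup_acquire( $store, $key, 1000 ) ); // bool(true): first click
var_dump( sketch_dedup_acquire( $store, $key, 1100 ) ); // bool(false): 100 s later
var_dump( sketch_dedup_acquire( $store, $key, 1300 ) ); // bool(true): window elapsed
```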

Testing instructions

  1. On a non-multisite Atomic-style site, force a fatal in a plugin (e.g. drop a throwaway mu-plugin: add_action('init', function () { trigger_error('boom', E_USER_ERROR); });).
  2. Visit the front-end as a logged-in admin. Confirm the fatal-error screen renders with Enter recovery mode in the "What you can try next" list.
  3. Click the link. Expected:
    • Browser navigates through /?wpcomsh_recover=1&wpcomsh_exp=…&wpcomsh_sig=… → core recovery URL → wp-login.php → wp-admin in recovery mode.
    • One wpcomsh_fatal_recovery row in logstash with the site's URL.
  4. Refresh the same signed URL within 5 minutes. Expected: redirect still works (you re-enter recovery mode), but no second log row.
  5. Open the recovery email and click the link from there. Expected: enters recovery mode, no wpcomsh_fatal_recovery log row (email continues to use the bare core URL).
  6. On multisite, repeat step 2. Expected: the recovery link is not rendered on the screen.
  7. As a non-admin / logged-out user, manually craft a ?wpcomsh_recover=1&wpcomsh_sig=junk&wpcomsh_exp=… URL. Expected: silent no-op, no log, no redirect, no recovery-mode entry.
  8. As an admin, in DevTools modify the cookie value and replay a previously-valid signed URL. Expected: HMAC mismatch → silent no-op.
  9. Click the Deactivate button for a fatal plugin to confirm the deactivator path still works (it now shares the verify helper). Expected: same behavior as before — plugin removed from active_plugins, redirect to wp-admin/plugins.php.

🤖 Generated with Claude Code

@github-actions
Contributor

github-actions Bot commented May 5, 2026

Thank you for your PR!

When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:

  • ✅ Include a description of your PR changes.
  • ✅ Add a "[Status]" label (In Progress, Needs Review, ...).
  • ✅ Add testing instructions.
  • ✅ Specify whether this PR includes any changes to data or privacy.
  • ✅ Add changelog entries to affected projects

This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖


Follow this PR Review Process:

  1. Ensure all required checks appearing at the bottom of this PR are passing.
  2. Make sure to test your changes on all platforms that it applies to. You're responsible for the quality of the code you ship.
  3. You can use GitHub's Reviewers functionality to request a review.
  4. When it's reviewed and merged, you will be pinged in Slack to deploy the changes to WordPress.com simple once the build is done.

If you have questions about anything, reach out in #jetpack-developers for guidance!


Wpcomsh plugin:

  • Next scheduled release: Atomic deploys happen twice daily on weekdays (p9o2xV-2EN-p2)

If you have any questions about the release process, please ask in the #jetpack-releases channel on Slack.

@github-actions github-actions Bot added the [Status] Needs Author Reply label May 5, 2026
Routes the fatal-error screen's "Enter recovery mode" link through a
first-party signed redirect endpoint that emits `wpcomsh_fatal_recovery`
to logstash before forwarding to a freshly-generated core recovery URL.
Email-originated entries continue to use the bare core URL and don't log,
so the new event measures only screen-originated recovery-link uptake.

Refactors out shared helpers used by both this endpoint and the
deactivator: HMAC sign/verify (`wpcomsh_fatal_sign_payload` /
`wpcomsh_fatal_verify_payload`), per-key dedup with fail-open
semantics (`wpcomsh_fatal_dedup_acquire`), and the WPCOMSH_Log dispatch
funnel (`wpcomsh_fatal_emit_logstash_event`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@taipeicoder taipeicoder force-pushed the add/wpcomsh-fatal-recovery-redirect-log branch from 4b895fa to 8ccf415 May 5, 2026 12:42
@taipeicoder taipeicoder added the [Status] Needs Review label and removed the [Status] Needs Author Reply label May 5, 2026
@github-actions
Contributor

github-actions Bot commented May 5, 2026

Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.

  • To test on WoA, go to the Plugins menu on a WoA dev site. Click on the "Upload" button and follow the upgrade flow to be able to upload, install, and activate the Jetpack Beta plugin. Once the plugin is active, go to Jetpack > Jetpack Beta, select your plugin (WordPress.com Site Helper), and enable the add/wpcomsh-fatal-recovery-redirect-log branch.

Interested in more tips and information?

  • In your local development environment, use the jetpack rsync command to sync your changes to a WoA dev blog.
  • Read more about our development workflow here: PCYsg-eg0-p2
  • Figure out when your changes will be shipped to customers here: PCYsg-eg5-p2

@jp-launch-control

jp-launch-control Bot commented May 5, 2026

Code Coverage Summary

This PR did not change code coverage!

That could be good or bad, depending on the situation. Everything covered before, and still is? Great! Nothing was covered before? Not so great. 🤷

Full summary · PHP report

@arthur791004
Contributor

One thought on the HMAC piece: the sign_payload / verify_payload / dedup_acquire helpers are essentially a hand-rolled CSRF token (since real WP nonces aren't available at mu-plugin time). That makes sense for the deactivator endpoint, where a forged click would actually rewrite active_plugins and persist damage. But the recovery-redirect endpoint doesn't mutate anything — it just checks the user's cap, asks core for a recovery URL, and redirects.

So the worst case the HMAC currently prevents on the recovery side is: an admin gets tricked into clicking a link, and lands in their own recovery mode. That's reversible, gives the attacker nothing, and doesn't affect anyone else. Worth weighing whether that justifies the ~80 lines of crypto + cookie-bootstrap.

For reference, WordPress core itself doesn't HMAC the recovery URL it ships in the email — it relies on the single-use rm_key plus a cap check.

So one alternative worth considering: drop the HMAC machinery on the recovery side only, render the wrapper as a plain ?wpcomsh_recover=1, and gate it on the cookie + cap check that's already there. Keep the deactivator's HMAC exactly as is. Same two behavior wins, roughly half the diff, and the destructive vs non-destructive split shows up in the code.

Totally fine to land as-is too — just wanted to share the observation.

@arthur791004
Contributor

One thought worth sharing on the recovery wrapper specifically: would it be better to use wp_create_nonce / wp_verify_nonce instead of the custom HMAC?

The two are essentially equivalent in security model — WP nonces bind to (user_id, session_token, action, tick), and the session token is derived from LOGGED_IN_COOKIE, which is the same thing sign_payload / verify_payload are HMACing over. Using the primitive directly would let us drop those helpers plus the recovery_mode_email_* TTL mirroring entirely.

The one caveat is that wp_verify_nonce needs wp_set_current_user( $user_id ) after the existing cookie validation, which is a side effect the current code carefully avoids. At mu-plugin file-load time though, regular plugins haven't iterated yet — the only callbacks that could hook set_current_user come from other mu-plugins, so the realistic re-fatal risk is small and a try/catch around the call would cover it.

Net effect: ~5 lines of nonce verification in place of ~95 lines of HMAC machinery, the recovery link reads as a normal authenticated WP action URL (?wpcomsh_recover=1&_wpnonce=…), and the deactivator keeps its HMAC unchanged since destructive actions deserve the heavier guard.

WDYT?
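
The nonce binding described in the comment above can be modeled roughly as follows. This is a simplified, standalone approximation of how a WordPress-style nonce ties together (tick, action, user_id, session token); it is not core's implementation, and the function names and salt are made up for the sketch.

```php
<?php
// Simplified model of a WordPress-style nonce; not core's actual code.

/** A tick advances every half nonce lifetime. */
function sketch_nonce_tick( int $now, int $life = 86400 ): int {
	return (int) ceil( $now / ( $life / 2 ) );
}

function sketch_create_nonce( string $action, int $user_id, string $token, int $tick, string $salt ): string {
	$data = $tick . '|' . $action . '|' . $user_id . '|' . $token;
	return substr( hash_hmac( 'md5', $data, $salt ), -12, 10 );
}

/** Accept a nonce minted in the current tick or the previous one. */
function sketch_verify_nonce( string $nonce, string $action, int $user_id, string $token, int $now, string $salt ): bool {
	$tick = sketch_nonce_tick( $now );
	foreach ( array( $tick, $tick - 1 ) as $candidate ) {
		if ( hash_equals( sketch_create_nonce( $action, $user_id, $token, $candidate, $salt ), $nonce ) ) {
			return true;
		}
	}
	return false;
}

$now   = 1700000000;
$nonce = sketch_create_nonce( 'wpcomsh_recover', 7, 'session-a', sketch_nonce_tick( $now ), 'example-salt' );

var_dump( sketch_verify_nonce( $nonce, 'wpcomsh_recover', 7, 'session-a', $now + 3600, 'example-salt' ) );   // Same session, one hour later.
var_dump( sketch_verify_nonce( $nonce, 'wpcomsh_recover', 7, 'session-b', $now + 3600, 'example-salt' ) );   // Different session token.
var_dump( sketch_verify_nonce( $nonce, 'wpcomsh_recover', 7, 'session-a', $now + 172800, 'example-salt' ) ); // Two days later.
```

Only the first call is expected to pass. The tick depends on the nonce lifetime, so the mint and verify sides have to agree on that value for the same nonce to validate.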

Replace the custom HMAC on the recovery click endpoint with a WP nonce
(`wpcomsh_recover` action). The recovery endpoint runs after
pluggable.php loads via `wpcomsh_fatal_current_user_id`, so
`wp_verify_nonce` is callable there; the deactivator endpoint keeps its
HMAC because it has to verify before pluggable.php is available. Pin
`nonce_life` for the recovery action via a `PHP_INT_MAX`-priority
filter registered lazily at the mint and verify call sites, so the
tick used at mint time (screen render, plugins loaded) and verify
time (mu-plugin file load, plugins not yet loaded) always agree. Mint
under the cookie-resolved user via `wp_set_current_user` so an ambient
`$current_user` mismatch can't sign for the wrong session.

Failure paths now 302 to a path on the WP install root with the
recovery query args stripped, instead of letting WP render a normal
page response under `?wpcomsh_recover=1&…`. The redirect base comes
from `wp_parse_url( site_url('/'), PHP_URL_PATH )`, not
`$_SERVER['REQUEST_URI']`, so a crafted `//attacker.com/path` request
URI can't produce a cross-origin `Location:` header. Subdirectory
installs (e.g. example.com/wp/) bounce back into the install root.
Unrelated query params survive via `remove_query_arg` over
`$_SERVER['QUERY_STRING']`. The header is sent raw rather than via
`wp_redirect` so early bails (empty nonce, multisite) don't pull
pluggable.php just to bounce.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
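
A rough sketch of the failure-path redirect target described in the commit message above: the base path comes from the configured site URL rather than the request URI, and only the recovery args are stripped. The helper name is hypothetical, and plain `parse_url` / `parse_str` / `http_build_query` stand in for `wp_parse_url` and `remove_query_arg` so the example runs without WordPress.

```php
<?php
// Hypothetical helper; plain-PHP stand-in for the WordPress functions named in the commit.

function sketch_failure_redirect_target( string $site_url, string $query_string, array $strip ): string {
	$path = parse_url( $site_url, PHP_URL_PATH );
	if ( ! is_string( $path ) || '' === $path ) {
		$path = '/';
	}
	// Force exactly one leading slash so the result is never scheme-relative ("//host").
	$path = '/' . ltrim( $path, '/' );

	// Drop only the recovery args; unrelated params survive.
	parse_str( $query_string, $args );
	foreach ( $strip as $name ) {
		unset( $args[ $name ] );
	}
	$query = http_build_query( $args );

	return '' === $query ? $path : $path . '?' . $query;
}

echo sketch_failure_redirect_target(
	'https://example.com/wp/',
	'wpcomsh_recover=1&_wpnonce=abc&utm_source=mail',
	array( 'wpcomsh_recover', '_wpnonce' )
), "\n"; // /wp/?utm_source=mail
```

Since the request URI never feeds the `Location:` value, a request for `//attacker.com/path` has no influence on where the bounce lands.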
@taipeicoder
Contributor Author

Thanks for the suggested approach. I don't personally have a strong opinion on either approach, so I went with yours, although I don't think it ended up being simpler 🙂

Phan's PhanPluginNeverReturnFunction caught the closure: it always exits
and has no return path. wpcomsh requires PHP 8.1+, so the native `never`
return type is available — keeps the closure's exit-only contract
visible to the type checker rather than relying on inferred void.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
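
For readers unfamiliar with the `never` return type mentioned in the commit message above, a small illustration (PHP 8.1+). The closure name is hypothetical, and it throws instead of sending a header and calling `exit` so the sketch can run in isolation; both are valid ways for a `never` function to terminate.

```php
<?php
// Requires PHP 8.1+. Illustrative only; the real closure exits after sending a header.

$bail = static function ( string $target ): never {
	// A `never` function must not return: it has to exit or throw.
	throw new RuntimeException( 'redirect:' . $target );
};

try {
	$bail( '/wp/' );
} catch ( RuntimeException $e ) {
	echo $e->getMessage(), "\n"; // redirect:/wp/
}
```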

2 participants