RTC: Sanitize untrusted remote CRDT changes with safeHTML#77207
RTC: Sanitize untrusted remote CRDT changes with safeHTML#77207maxschmeling wants to merge 16 commits intotrunkfrom
Conversation
When a remote RTC collaborator sends malicious content (e.g. `<script>` tags, event handler attributes, `javascript:` URIs) through the shared CRDT document, it could execute in other collaborators' browsers when written to the local store. This adds DOMPurify sanitization at the single gateway where remote CRDT content enters the local entity store (`_updateEntityRecord` in the sync manager). All string values in the changes object are recursively sanitized before being passed to `editRecord`, while preserving WordPress block comment delimiters and safe HTML. Made-with: Cursor
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
|
Size Change: +249 B (0%) Total Size: 7.75 MB 📦 View Changed
ℹ️ View Unchanged
|
|
Flaky tests detected in e459ac9. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/24477670449
|
Track which users submit sync updates via a contributor list stored in post meta on the sync storage post. On each poll response the server checks every contributor against the `unfiltered_html` capability and returns a `trustworthy` flag. The list is cleared on post save so the check resets for subsequent editing sessions. On the client side the polling provider relays the flag through a new `trustworthy` provider event. The sync manager stores it per entity and only runs DOMPurify sanitization when `trustworthy` is false, avoiding unnecessary filtering when all collaborators are trusted. Made-with: Cursor
| $storage = new WP_Sync_Post_Meta_Storage(); | ||
| $storage->clear_contributors( $room ); | ||
| } | ||
| add_action( 'save_post', 'gutenberg_clear_sync_contributors_on_save', 10, 2 ); |
There was a problem hiding this comment.
Are there any other actions we might want to clear out contributors for? Maybe on wp_delete_post() we'd want to check for and clean up contributor meta? Not immediately important, though.
There was a problem hiding this comment.
Actually not sure on delete. I would think we might want to just leave the standard WP behavior there.
Move contributor tracking to the top of the per-room loop in handle_request so that every authenticated access is recorded, including awareness-only polls. This is simpler and more conservative than gating on whether the request carries document updates. Made-with: Cursor
Replace the read-modify-write pattern (get list, append, update row) with add_post_meta inserting one row per contributor per access. Concurrent requests from different users no longer risk overwriting each other. Duplicates across rows are deduplicated in get_contributors() via array_unique. Made-with: Cursor
Check for an existing row with the same meta key and user ID value before inserting. This prevents unbounded row growth from repeated polls by the same user. A rare race between concurrent requests can produce at most one extra row per user, which get_contributors() already handles via array_unique. Made-with: Cursor
Go back to plain add_post_meta without the direct DB dedup query. Made-with: Cursor
Check existing meta rows via get_post_meta before inserting to avoid growing the table on every poll. A rare race can still produce one duplicate per user, handled by array_unique in get_contributors(). Made-with: Cursor
| export function sanitizeRemoteChanges( | ||
| changes: Partial< ObjectData > | ||
| ): Partial< ObjectData > { | ||
| return sanitizeObjectData( changes ); |
There was a problem hiding this comment.
Does this cause any issues with rich text HTML?
There was a problem hiding this comment.
I'm having trouble testing this PR due to some build errors, but looking at how DOMPurify works I don't think rich text will be an issue. There's a huge default whitelist which includes tags like <strong>.
| 'room' => $room, | ||
| 'should_compact' => $should_compact, | ||
| 'total_updates' => $total_updates, | ||
| 'trustworthy' => $trustworthy, |
There was a problem hiding this comment.
Instead of trustworthy, can we have this 1:1 match the capability it represents? E.g. trust_unfiltered_html. I can imagine we might inspect other capabilities that affect syncing in the future, and "trustworthy" might mean something else in that context.
There was a problem hiding this comment.
Or, even more generally, we could use the capability directly:
{
sync_capabilities: [ 'unfiltered_html' ] // When all users have the capability
// or
sync_capabilities: [] // When an untrusted user joins
}
Summary
@wordpress/syncpackage via@wordpress/dom'ssafeHTML, mitigating XSS risk from RTC collaborators (without theunfiltered_htmlcapability) injecting dangerous content (e.g.<script>tags,on*event-handler attributes) into the shared CRDT document.permissions.unfilteredHtmlflag computed server-side: when all contributors can use unfiltered HTML, remote changes are passed through. When any contributor lacks it, all string values in the changes object are recursively sanitized before reaching the local entity store.How it works
When a remote update arrives via the Yjs CRDT document, the sync manager extracts the changed properties through
getChangesFromCRDTDocand writes them to the local store viahandlers.editRecord().The new
sanitizeRemoteChanges()helper intercepts this flow when the room is untrusted, recursively walking the changes object and runningsafeHTMLon every string value.safeHTMLstrips<script>tags andon*event-handler attributes viabody.innerHTML, which naturally preserves WordPress block comment delimiters (<!-- wp:paragraph -->) and the rest of the safe HTML allowlist.The trust signal comes from the server. PHP computes
permissions: { unfiltered_html: bool }per room based on each tracked contributor, the HTTP polling provider relays it as apermissionsevent, and the sync manager toggles sanitization per entity.