Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion packages/@glimmer/runtime/lib/dom/sanitized-values.ts
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,11 @@ function findProtocolForURL() {
let protocol = null;

if (typeof url === 'string') {
protocol = nodeURL.parse(url).protocol;
// browsers strip ASCII tab/newline/CR from urls before navigating, so
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't know this! do you have a link to a spec or something here? or... why don't we use URL's parse?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the WHATWG URL parser itself: step 2 of the basic URL parser says "Remove all ASCII tab or newline from input," where ASCII tab or newline is U+0009, U+000A, U+000D (https://url.spec.whatwg.org/#url-parsing). So new URL('java\nscript:alert(1)').protocol already comes back as javascript:. Node's legacy url.parse predates that and keeps the characters, so it reports a null protocol instead.

On using URL's parse here: we can't on this branch. This is the fastboot path, and fastboot sets the URL global to node's require('url') module, so URL isn't the WHATWG constructor anymore (that's what the comment above and the weirdURL.parse object check are detecting). The non-fastboot else branch does use new URL() and already gets this for free. Stripping the chars here just mirrors that behavior until fastboot stops shadowing the global.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so URL isn't the WHATWG constructor anym

would that make node's URL constructor wrong? I already have an RFC around some phrasing around "SSR environments need to properly implement a rendering env"

emberjs/rfcs#1178

would love your feedback.

the main goal I have is to remove all non-standard codepaths in the codebase, so that we can:

  • be leaner
  • force rendering environments to not push their quirks back on to us

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node's WHATWG URL constructor isn't wrong, it strips those per spec. The non-standard bits are the legacy url.parse and the fact that fastboot swaps the URL global for the old require('url') module, so this branch can't reach the spec-compliant parser. If the RFC gets SSR envs to stop shadowing URL, this whole legacy branch and this patch go away with it, so the direction lines up with what you want. I'll leave some notes on the RFC. Happy to keep this as a stopgap until then.

// `java\nscript:` runs as `javascript:`. `url.parse` keeps them and reports
// a null protocol, slipping past the badProtocols check. Strip them here to
// match the WHATWG `URL` parser used on the non-fastboot path.
protocol = nodeURL.parse(url.replace(/[\t\n\r]/gu, '')).protocol;
}

return protocol === null ? ':' : protocol;
Expand Down