fix: support non-base64 data URIs per RFC 2397#1556
Conversation
Per RFC 2397, the ;base64 flag is optional. When absent, the data portion is URL-encoded text, not base64. The validator previously checked all data URI payloads against the base64 regex, incorrectly rejecting valid URIs like data:text/plain,hello. This fix: - Only validates the data portion as base64 when ;base64 is present - Adds a prefix structure check to reject partial regex matches on invalid mediatypes (e.g. data:text,...) - Uses HasSuffix for ;base64 detection since RFC 2397 defines it as a terminal flag Fixes go-playground#518
There was a problem hiding this comment.
Pull request overview
Updates the datauri validator to align more closely with RFC 2397 by supporting non-base64 data URIs when the ;base64 flag is absent.
Changes:
- Validate the data payload as base64 only when the prefix ends with
;base64. - Add a prefix-structure guard to prevent partial
data:regex matches from accepting invalid mediatypes. - Expand
datauritest coverage for non-base64 data URIs and fix a malformed test case.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| baked_in.go | Adjusts isDataURI logic to conditionally validate base64 and adds prefix structure validation. |
| validator_test.go | Adds/updates datauri test cases for non-base64 payloads and corrects a malformed case. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Only validate the data portion as base64 when ;base64 is specified. | ||
| // Per RFC 2397, ;base64 is a terminal flag that appears immediately | ||
| // before the comma, so it is always a suffix of the prefix portion. | ||
| if strings.HasSuffix(uri[0], ";base64") { | ||
| return base64Regex().MatchString(uri[1]) | ||
| } | ||
|
|
||
| return true |
There was a problem hiding this comment.
This function now conditionally validates base64 only when ;base64 is present, but the public docs currently state that datauri "will also validate that the data portion is valid base64" unconditionally (see doc.go around the datauri tag documentation). Please update the docs (and any related comment/docstrings) to reflect the new behavior so consumers aren’t misled.
| {"data:text/plain,hello world", true}, | ||
| {"data:text/html,<h1>Hello</h1>", true}, |
There was a problem hiding this comment.
The new “valid” non-base64 data URI examples include unescaped spaces and angle brackets (data:text/plain,hello world, data:text/html,<h1>Hello</h1>). Those characters are not valid in a URI unless percent-encoded, and the PR description says the non-base64 form is URL-encoded text. Consider updating these cases to percent-encoded equivalents (e.g. hello%20world, %3Ch1%3EHello%3C%2Fh1%3E) or clarifying (and testing) that the validator intentionally accepts non-URI characters in the data portion.
| {"data:text/plain,hello world", true}, | |
| {"data:text/html,<h1>Hello</h1>", true}, | |
| {"data:text/plain,hello%20world", true}, | |
| {"data:text/html,%3Ch1%3EHello%3C%2Fh1%3E", true}, |
There was a problem hiding this comment.
I would change this, to disallow the non-URL encoded values, and allow URL encoded.
zemzale
left a comment
There was a problem hiding this comment.
I would like to see changes to docs, and fix around the data portion.
If before we only allowed base64, that would mean displaying this content as raw HTML is okay, now if we allow non-base64 stuff, we have to make sure it's URL encoded.
It's a very silly example, but could be reality if anyone using this library expect validated content from here be safe to use as raw HTML.
<!DOCTYPE html>
<html>
<body>
<button onclick="show('url')">URL-encoded</button>
<button onclick="show('b64')">Base64</button>
<button onclick="show('raw')">Raw</button>
<pre id="out"></pre>
<div id="raw"></div>
<script>
const payload = '<img src=x onerror=alert(1)>';
const values = {
url: "data:text/html," + encodeURIComponent(payload),
b64: "data:text/html;base64," + btoa(payload),
raw: payload,
};
function show(type) {
out.textContent = values[type];
raw.innerHTML = type === "raw" ? values[type] : "";
}
</script>
</body>
</html>
Summary
Fixes #518
Per RFC 2397, the
;base64flag in a data URI is optional. When absent, the data portion is URL-encoded text — not base64. Thedataurivalidator currently checks all data URI payloads against the base64 regex, incorrectly rejecting valid URIs likedata:text/plain,hello world.Changes
;base64is present in the prefixdata:text,...where the regex only matchesdata:)strings.HasSuffixfor;base64detection since RFC 2397 defines it as a terminal flag immediately before the commaBefore
After
Test plan
go test ./...)data:,ohai,data:text/plain,hello world,data:text/html,<h1>Hello</h1>);base64;sdfgsdfgsdfasdfa=sprefixdata:text,:;base85,...)