Skip to content

fix: support non-base64 data URIs per RFC 2397#1556

Open
lawrence3699 wants to merge 1 commit into
go-playground:masterfrom
lawrence3699:fix/datauri-rfc2397-non-base64
Open

fix: support non-base64 data URIs per RFC 2397#1556
lawrence3699 wants to merge 1 commit into
go-playground:masterfrom
lawrence3699:fix/datauri-rfc2397-non-base64

Conversation

@lawrence3699
Copy link
Copy Markdown

Summary

Fixes #518

Per RFC 2397, the ;base64 flag in a data URI is optional. When absent, the data portion is URL-encoded text — not base64. The datauri validator currently checks all data URI payloads against the base64 regex, incorrectly rejecting valid URIs like data:text/plain,hello world.

Changes

  • Only validate the data portion as base64 when ;base64 is present in the prefix
  • Add a prefix structure check to reject partial regex matches on invalid mediatypes (e.g. data:text,... where the regex only matches data:)
  • Use strings.HasSuffix for ;base64 detection since RFC 2397 defines it as a terminal flag immediately before the comma
  • Update test cases to cover non-base64 data URIs and fix a malformed test case

Before

// Always checked base64 — rejected valid non-base64 data URIs
validate.Var("data:text/plain,hello world", "datauri") // ❌ error
validate.Var("data:,ohai", "datauri")                   // ❌ error

After

validate.Var("data:text/plain,hello world", "datauri") // ✅ valid
validate.Var("data:,ohai", "datauri")                   // ✅ valid
validate.Var("data:image/png;base64,iVBOR...", "datauri") // ✅ still validated as base64

Test plan

  • All existing tests pass (go test ./...)
  • New test cases added for non-base64 data URIs (data:,ohai, data:text/plain,hello world, data:text/html,<h1>Hello</h1>)
  • Fixed malformed test case that had invalid ;base64;sdfgsdfgsdfasdfa=s prefix
  • Verified invalid data URIs are still rejected (e.g. data:text,:;base85,...)

Per RFC 2397, the ;base64 flag is optional. When absent, the data
portion is URL-encoded text, not base64. The validator previously
checked all data URI payloads against the base64 regex, incorrectly
rejecting valid URIs like data:text/plain,hello.

This fix:
- Only validates the data portion as base64 when ;base64 is present
- Adds a prefix structure check to reject partial regex matches on
  invalid mediatypes (e.g. data:text,...)
- Uses HasSuffix for ;base64 detection since RFC 2397 defines it as
  a terminal flag

Fixes go-playground#518
@lawrence3699 lawrence3699 requested a review from a team as a code owner April 12, 2026 09:32
Copilot AI review requested due to automatic review settings April 12, 2026 09:32
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the datauri validator to align more closely with RFC 2397 by supporting non-base64 data URIs when the ;base64 flag is absent.

Changes:

  • Validate the data payload as base64 only when the prefix ends with ;base64.
  • Add a prefix-structure guard to prevent partial data: regex matches from accepting invalid mediatypes.
  • Expand datauri test coverage for non-base64 data URIs and fix a malformed test case.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
baked_in.go Adjusts isDataURI logic to conditionally validate base64 and adds prefix structure validation.
validator_test.go Adds/updates datauri test cases for non-base64 payloads and corrects a malformed case.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread baked_in.go
Comment thread baked_in.go
Comment on lines +617 to +624
// Only validate the data portion as base64 when ;base64 is specified.
// Per RFC 2397, ;base64 is a terminal flag that appears immediately
// before the comma, so it is always a suffix of the prefix portion.
if strings.HasSuffix(uri[0], ";base64") {
return base64Regex().MatchString(uri[1])
}

return true
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function now conditionally validates base64 only when ;base64 is present, but the public docs currently state that datauri "will also validate that the data portion is valid base64" unconditionally (see doc.go around the datauri tag documentation). Please update the docs (and any related comment/docstrings) to reflect the new behavior so consumers aren’t misled.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs should be updated indeed.

Comment thread validator_test.go
Comment on lines +4003 to +4004
{"data:text/plain,hello world", true},
{"data:text/html,<h1>Hello</h1>", true},
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new “valid” non-base64 data URI examples include unescaped spaces and angle brackets (data:text/plain,hello world, data:text/html,<h1>Hello</h1>). Those characters are not valid in a URI unless percent-encoded, and the PR description says the non-base64 form is URL-encoded text. Consider updating these cases to percent-encoded equivalents (e.g. hello%20world, %3Ch1%3EHello%3C%2Fh1%3E) or clarifying (and testing) that the validator intentionally accepts non-URI characters in the data portion.

Suggested change
{"data:text/plain,hello world", true},
{"data:text/html,<h1>Hello</h1>", true},
{"data:text/plain,hello%20world", true},
{"data:text/html,%3Ch1%3EHello%3C%2Fh1%3E", true},

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change this, to disallow the non-URL encoded values, and allow URL encoded.

Copy link
Copy Markdown
Member

@zemzale zemzale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see changes to docs, and fix around the data portion.

If before we only allowed base64, that would mean displaying this content as raw HTML is okay, now if we allow non-base64 stuff, we have to make sure it's URL encoded.

It's a very silly example, but could be reality if anyone using this library expect validated content from here be safe to use as raw HTML.

<!DOCTYPE html>
<html>
<body>
  <button onclick="show('url')">URL-encoded</button>
  <button onclick="show('b64')">Base64</button>
  <button onclick="show('raw')">Raw</button>
  <pre id="out"></pre>
  <div id="raw"></div>

  <script>
    const payload = '<img src=x onerror=alert(1)>';
    const values = {
      url: "data:text/html," + encodeURIComponent(payload),
      b64: "data:text/html;base64," + btoa(payload),
      raw: payload,
    };

    function show(type) {
      out.textContent = values[type];
      raw.innerHTML = type === "raw" ? values[type] : "";
    }
  </script>
</body>
</html>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

datauri fails to accept valid data URIs.

3 participants