fix non-ASCII filename mangling caused by NBSP in page titles by Self-Perfection · Pull Request #1930 · gildas-lormeau/SingleFile

Self-Perfection · 2026-04-09T22:04:02Z

Firefox's browser.downloads.download() rejects filenames containing NBSP (U+00A0) and narrow NBSP (U+202F) with "illegal characters", even though both are valid on every modern filesystem and accepted by Chromium (see https://bugzilla.mozilla.org/show_bug.cgi?id=2030811).

The existing catch-block in download-util.js had a fallback that, on any "illegal characters" error, replaced ALL non-ASCII runs with the replacement character. As a result, a single invisible NBSP from typographic markup would silently destroy an entire Cyrillic, CJK, or Arabic filename.
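A minimal sketch of such a legacy fallback shows how much one NBSP costs. It assumes the strip collapses each run of non-ASCII characters into `_`, the character seen in the repro's Before output. The actual pattern and replacement in download-util.js may differ, and `legacyStripNonAscii` is a hypothetical name.

```javascript
// Hypothetical sketch of the legacy fallback: every run of non-ASCII
// characters (anything outside U+0000..U+007F) collapses into a single "_".
// The real code in download-util.js may use a different pattern/replacement.
function legacyStripNonAscii(filename) {
  return filename.replace(/[^\x00-\x7F]+/g, "_");
}

// Firefox only reaches this fallback because of the NBSP after "w",
// but once it runs, the Polish letters ł, ó and ż are replaced as well.
const title = "Pies w\u00A0łóżku";
console.log(legacyStripNonAscii(title)); // "Pies w_ku"
```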

Add a targeted retry branch (before the legacy non-ASCII strip) that handles six specific codepoints rejected by browser engines:

- U+00A0 NBSP, U+202F narrow NBSP → replaced with a regular space
- U+00AD soft hyphen, U+200B ZWSP, U+FEFF BOM, U+2060 word joiner → removed

The first two are Gecko-specific rejections; the other four are rejected by both Gecko and Chromium. Chromium accepts NBSP and narrow NBSP, so for titles like the repro below it throws no error and the new branch never fires; behavior on Chromium is unchanged.
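The retry order (targeted fix first, lossy strip last) could look roughly like the following. It is an illustration under assumptions, not the diff from this PR: `saveWithFallbacks`, `fixRejectedCodePoints`, `stripNonAscii`, and `isIllegalCharactersError` are hypothetical names, the download call is injected as a plain async function rather than calling `browser.downloads.download()` directly, and the error is recognized by the substring "illegal characters" in its message.

```javascript
// Hypothetical sketch of the retry order; not the code from the PR.
// NBSP (U+00A0) and narrow NBSP (U+202F) become a regular space.
const SPACE_LIKE = /[\u00A0\u202F]/g;
// Soft hyphen (U+00AD), ZWSP (U+200B), word joiner (U+2060), BOM (U+FEFF) are dropped.
const INVISIBLE = /[\u00AD\u200B\u2060\uFEFF]/g;

function fixRejectedCodePoints(filename) {
  return filename.replace(SPACE_LIKE, " ").replace(INVISIBLE, "");
}

// Legacy last resort: collapse each run of non-ASCII characters into "_".
function stripNonAscii(filename) {
  return filename.replace(/[^\x00-\x7F]+/g, "_");
}

function isIllegalCharactersError(error) {
  return /illegal characters/i.test(String(error && error.message));
}

// `download` stands in for a wrapper around browser.downloads.download():
// an async function that takes a filename and rejects on failure.
async function saveWithFallbacks(download, filename) {
  try {
    return await download(filename);
  } catch (error) {
    if (!isIllegalCharactersError(error)) throw error;
  }
  // Targeted retry first: only the six listed code points are touched.
  const fixed = fixRejectedCodePoints(filename);
  if (fixed !== filename) {
    try {
      return await download(fixed);
    } catch (error) {
      if (!isIllegalCharactersError(error)) throw error;
    }
  }
  // Only if that was not enough, fall back to the lossy non-ASCII strip.
  return download(stripNonAscii(fixed));
}

// Usage with a stand-in that rejects the six code points, like Firefox does.
// The error text is illustrative; only the "illegal characters" part matters here.
const fakeFirefoxDownload = async (name) => {
  if (/[\u00A0\u202F\u00AD\u200B\u2060\uFEFF]/.test(name)) {
    throw new Error("filename must not contain illegal characters");
  }
  return name;
};

saveWithFallbacks(fakeFirefoxDownload, "Pies w\u00A0łóżku.html").then(console.log);
// logs "Pies w łóżku.html"
```

Touching only the six listed code points keeps every other character of the title intact, and because the lossy strip runs only after the targeted retry also fails, it remains available as a last resort.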

Minimal repro (save via SingleFile on Firefox):

<title>Pies w&nbsp;łóżku</title>

Before: "Pies w_ku (...).html" (the NBSP and ł, ó, ż collapsed into one "_" by the non-ASCII strip)
After: "Pies w łóżku (...).html" (NBSP → space, Polish letters intact)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Self-Perfection wants to merge 1 commit into gildas-lormeau:master from Self-Perfection:fix/nbsp-filename-mangling

Self-Perfection commented Apr 9, 2026
