Add Unicode support for usernames - Allow international characters and emojis in usernames #2535

coderabbitai · 2025-11-28T18:03:57Z

⚠️ Potential issue | 🟠 Major

Address homoglyph security risk.

Unicode support introduces homoglyph attacks where visually identical characters from different scripts enable username impersonation. For example, Cyrillic 'а' (U+0430) looks identical to Latin 'a' (U+0061), allowing "Admin" and "Аdmin" to coexist as different usernames.

While resolveConfusablesTransformer() handles this for profanity detection (line 16), it doesn't prevent duplicate-looking usernames during registration or display.
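The gap can be seen with a few lines of TypeScript. This is a minimal sketch that assumes only standard string methods; `codePoints` and `fold` are hypothetical helpers, not part of the PR.

```typescript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(s: string): string[] {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  );
}

const latin: string = "Admin"; // starts with Latin "A" (U+0041)
const cyrillic: string = "\u0410dmin"; // starts with Cyrillic "А" (U+0410)

// The two names render alike but are different strings.
console.log(latin === cyrillic); // false
console.log(codePoints(latin)[0], codePoints(cyrillic)[0]); // U+0041 U+0410

// NFKC plus lowercasing folds compatibility forms such as fullwidth letters,
// but it does not map Cyrillic letters onto Latin ones.
const fold = (s: string): string => s.normalize("NFKC").toLowerCase();
console.log(fold("\uFF21dmin") === fold(latin)); // true: fullwidth "Ａ" folds to "a"
console.log(fold(cyrillic) === fold(latin)); // false: U+0430 stays Cyrillic
```

The last line is why Unicode normalization on its own is not a homoglyph defence: a separate confusables mapping is still needed.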

Consider one of these approaches:

Restrict to specific scripts (recommended for gaming context):

// Allow only Latin, spaces, underscores, brackets, common diacritics, and emojis
const validPattern = /^[\p{Script=Latin}\p{N}_[\] \p{Emoji}]+$/u;

Normalize usernames before storage/comparison:

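function normalizeUsername(username: string): string {
  // NFKC folds compatibility forms (e.g. fullwidth letters); toLowerCase() folds case.
  // This alone does not map cross-script confusables such as Cyrillic "а" to Latin "a";
  // a confusables mapping like the one used for the profanity check is still needed.
  return username.normalize('NFKC').toLowerCase();
}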

Add visual similarity check during registration to prevent confusable usernames.
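The third approach could look roughly like the sketch below. `UsernameRegistry` and `demoSkeleton` are hypothetical names, and the skeleton function is deliberately tiny (NFKC, lowercase, one Cyrillic mapping); a real check would plug in a full confusables mapping instead.

```typescript
// Hypothetical registry: rejects a name whose folded form ("skeleton") is taken.
class UsernameRegistry {
  private readonly taken = new Map<string, string>();
  private readonly skeleton: (name: string) => string;

  constructor(skeleton: (name: string) => string) {
    this.skeleton = skeleton;
  }

  // Returns the existing look-alike name, or null after registering `name`.
  register(name: string): string | null {
    const key = this.skeleton(name);
    const existing = this.taken.get(key);
    if (existing !== undefined) return existing;
    this.taken.set(key, name);
    return null;
  }
}

// Illustrative skeleton only: NFKC, lowercase, and a single Cyrillic "а" mapping.
const demoSkeleton = (name: string): string =>
  name.normalize("NFKC").toLowerCase().replace(/\u0430/g, "a");

const registry = new UsernameRegistry(demoSkeleton);
console.log(registry.register("Admin")); // null (registered)
console.log(registry.register("\u0410dmin")); // "Admin" (collides with the first name)
console.log(registry.register("Player")); // null (registered)
```

Keying on the skeleton while storing the original spelling lets the displayed name stay as the user typed it, with only the uniqueness check working on the folded form.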

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Unicode property escapes \p{L} \p{N} \p{Emoji} browser support Chrome Firefox Safari Edge 2024 2025

💡 Result:

Short answer

Unicode property escapes (\p{...}/\P{...}) — including \p{L} and \p{N} — are supported in modern browsers: Chrome 64+, Firefox 78+, Safari 11.1+, and current Chromium-based Edge (Edge 79+). Use the /u flag. [1][2]

Notes about \p{Emoji}

\p{Emoji} (and related emoji properties) are available as Unicode properties, but emoji detection is tricky: some characters classified as Emoji include digits/symbols, and many “emoji” are multi-codepoint sequences. For reliably matching “visible emojis” prefer \p{Extended_Pictographic} (or combine Emoji_Presentation / Emoji_Modifier etc.) and be careful about sequence handling (use /v where available or handle ZWJ sequences). [2][3][4]
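A short sketch of the difference, assuming a runtime with Unicode property escape support. The `emojiSequence` pattern is an illustrative assumption that covers ZWJ-joined pictographs with optional variation selectors or skin-tone modifiers; it does not attempt flags or keycap sequences.

```typescript
const emoji = /^\p{Emoji}$/u;
const pictographic = /^\p{Extended_Pictographic}$/u;

// Digits and "#" carry the Emoji property because they can start keycap sequences.
console.log(emoji.test("7"), pictographic.test("7")); // true false
console.log(emoji.test("#"), pictographic.test("#")); // true false
console.log(emoji.test("\u{1F408}"), pictographic.test("\u{1F408}")); // true true

// A ZWJ sequence is several code points, so a single-character pattern misses it.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // man + ZWJ + woman + ZWJ + girl
console.log(pictographic.test(family)); // false

// Sequence-aware sketch: pictographs, each optionally followed by a variation
// selector (U+FE0F) or skin-tone modifier, joined by ZWJ (U+200D).
const emojiSequence =
  /^\p{Extended_Pictographic}[\uFE0F\p{Emoji_Modifier}]?(?:\u200D\p{Extended_Pictographic}[\uFE0F\p{Emoji_Modifier}]?)*$/u;
console.log(emojiSequence.test(family)); // true
console.log(emojiSequence.test("7")); // false
```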

Practical guidance

If you need broad compatibility, transpile (e.g. regexpu/Babel) or feature-detect before using. Test on target browser versions (see Can I Use for exact version tables). [1][3]
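Feature detection might be sketched as follows. The function name `supportsUnicodePropertyEscapes` is hypothetical; the two pattern sources are the old and new `validPattern` values from this PR's diff, written as strings so that an engine without `\p{...}` support never has to parse them as a literal.

```typescript
// Probe with the RegExp constructor: an unsupported regex *literal* is a syntax
// error for the whole file, while the constructor throws at runtime instead.
function supportsUnicodePropertyEscapes(): boolean {
  try {
    new RegExp("\\p{L}", "u");
    return true;
  } catch {
    return false;
  }
}

// Legacy source: the pre-change validPattern from the diff.
const legacySource = "^[a-zA-Z0-9_[\\] 🐈🍀üÜ]+$";
// Modern source: the post-change validPattern from the diff.
const modernSource = "^[\\p{L}\\p{N}_[\\] \\p{Emoji}\\u{1F300}-\\u{1F9FF}]+$";

const usernamePattern = new RegExp(
  supportsUnicodePropertyEscapes() ? modernSource : legacySource,
  "u",
);

// Accepted only when the modern pattern is in use.
console.log(usernamePattern.test("Jos\u00E9"));
```

Falling back to the ASCII-oriented pattern keeps old clients working, at the cost of the client and server possibly disagreeing about which names are valid, so server-side validation would still need to be authoritative.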

Sources

1: Can I Use, Unicode character class escape (\p{...}): https://caniuse.com/mdn-javascript_regular_expressions_unicode_character_class_escape

2: MDN, Unicode character class escape documentation (examples, compatibility, Extended_Pictographic notes): https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape

3: Mathias Bynens, notes on Unicode property escapes and browser/engine history: https://mathiasbynens.be/notes/es-unicode-property-escapes

4: Community notes and examples (StackOverflow, GitHub gists) discussing \p{Emoji} vs \p{Extended_Pictographic} and practical regexes: https://stackoverflow.com/questions/18862256/how-to-detect-emoji-using-javascript

🏁 Script executed:

#!/bin/bash
# Check for browser compatibility configuration
echo "=== Checking browserslist configuration ==="
find . -type f -name ".browserslistrc" -o -name "browserslist" -o -name ".eslintrc*" -o -name "tsconfig.json" -o -name "babel.config.*" -o -name "package.json" | head -20

Repository: openfrontio/OpenFrontIO

Length of output: 190

Unicode property escapes have solid modern browser support, but transpilation may be needed for legacy support.

Unicode property escapes (\p{L}, \p{N}) are supported in all modern browsers—Chrome 64+, Firefox 78+, Safari 11.1+, and Edge 79+. However:

If your project targets ES2018+ and modern browsers only, no action is needed.

If you support older browsers or need broad compatibility, use a transpiler (e.g., regexpu via Babel) to convert Unicode property escapes to equivalent character classes.

Note: \p{Emoji} is supported but has limitations for complex emoji (multi-codepoint sequences, ZWJ sequences). For reliable emoji detection, consider \p{Extended_Pictographic} or combine Emoji_Presentation / Emoji_Modifier properties with sequence handling.

To determine whether transpilation is required, check the compile target in tsconfig.json and the browser targets in .browserslistrc or babel.config.js.

🤖 Prompt for AI Agents

In src/core/validations/username.ts around lines 25-27, the regex uses Unicode property escapes (\p{L}, \p{N}, \p{Emoji}) which may break on older runtimes and \p{Emoji} misses complex emoji sequences; either ensure your build targets ES2018+ and supported browsers in tsconfig/browserslist so no transpilation is needed, or enable regex transpilation (e.g., Babel regexpu/polyfill or the regexpu-loader/plugin) so \p{...} is converted for legacy environments; additionally replace \p{Emoji} with \p{Extended_Pictographic} (and consider handling ZWJ/emoji modifier sequences separately) to more reliably match emoji sequences.

Otzie2023 · 2025-11-28

Firstly, this is a video game, where such issues are relatively harmless. To protect against homoglyph attacks, simply standardise characters that look the same: if a user enters a Cyrillic "а", replace it with a Latin "a" in the background. Deal with all other problematic characters in the same way.
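As a rough illustration of this replace-in-the-background idea, here is a minimal sketch. The table `CYRILLIC_TO_LATIN` and the helper `replaceLookalikes` are hypothetical, and the table is a small hand-picked subset rather than a complete confusables mapping.

```typescript
// Illustrative subset only: Cyrillic letters that look like Latin ones.
const CYRILLIC_TO_LATIN = new Map<string, string>([
  ["\u0410", "A"], ["\u0430", "a"], // А а
  ["\u0415", "E"], ["\u0435", "e"], // Е е
  ["\u041E", "O"], ["\u043E", "o"], // О о
  ["\u0420", "P"], ["\u0440", "p"], // Р р
  ["\u0421", "C"], ["\u0441", "c"], // С с
  ["\u0425", "X"], ["\u0445", "x"], // Х х
]);

// Hypothetical helper: swap each mapped character, keep everything else.
function replaceLookalikes(username: string): string {
  return Array.from(username, (ch) => CYRILLIC_TO_LATIN.get(ch) ?? ch).join("");
}

console.log(replaceLookalikes("\u0410dmin")); // "Admin", now all Latin

// Side effect on a genuinely Cyrillic name ("Саша"): only the look-alike
// letters are swapped, so the result mixes Latin and Cyrillic characters.
console.log(replaceLookalikes("\u0421\u0430\u0448\u0430") === "Ca\u0448a"); // true
```

The second call shows the trade-off of unconditional replacement: names written entirely in Cyrillic come out as mixed-script strings, so it may be preferable to apply the folded form only for comparison and keep the original for display.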

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check what obscenity exports and if confusables utilities are available
echo "=== Checking obscenity imports and usage ==="
rg -n "from ['\"]obscenity['\"]" -A 2 -B 2

echo -e "\n=== Checking package.json for obscenity version and related libraries ==="
cat package.json | jq '.dependencies, .devDependencies' | grep -i "obscen\|confus"

echo -e "\n=== Checking if there's a confusables library already in use ==="
rg -n "confusables" --type ts --type js -g '!node_modules'

echo -e "\n=== Checking for any existing normalization utilities ==="
rg -n "normalize|normaliz" src/core/validations/username.ts

Length of output: 653

🌐 Web query:

obscenity npm library confusables API exports javascript

💡 Result:

Short answer — obscenity supports confusable/unicode-homoglyph normalization as a transformer in its matching pipeline. Import the library exports (matchers, censor, datasets and the recommended transformers) and include the recommended transformers (which contain confusables normalization) when you build a matcher.

Example (CommonJS / ESM):

Import top-level exports: RegExpMatcher, TextCensor, englishDataset, englishRecommendedTransformers. Use them to create a matcher that includes confusables handling:
const { RegExpMatcher, TextCensor, englishDataset, englishRecommendedTransformers } = require('obscenity');
const matcher = new RegExpMatcher({ ...englishDataset.build(), ...englishRecommendedTransformers });
const censor = new TextCensor();
const matches = matcher.getAllMatches('ʃ𝐟ʃὗƈｋ ỹоứ'); // confusable/form-mangled text
console.log(censor.applyTo('ʃ𝐟ʃὗƈｋ ỹоứ', matches));

Sources:

obscenity README / package (exports and usage examples). [1]

confusables (the npm library used for Unicode confusable handling / background on confusables normalization). [2]

References:
[1] obscenity (npm / GitHub) — package README and API examples.
[2] confusables (npm) — library for removing/injecting confusable characters.

🌐 Web query:

confusables npm library normalize unicode homoglyphs javascript

💡 Result:

Short answer — yes.

The npm package "confusables" (gc/confusables) normalizes/removes Unicode homoglyphs for English; API includes remove(...) and obfuscate(...). Example:
import remove from 'confusables'; remove('fück') // -> 'fuck'. [1]

Alternatives that use Unicode's confusables data:

unicode-confusables / @ensdomains/unicode-confusables — provides isConfusing, confusables, rectifyConfusion. Good for detection + rectification. [2]

homoglypher — utility to normalize homoglyphs to ASCII using Unicode's confusables.txt. [3]

Underlying standard: these libraries are based on Unicode's confusables mappings (UTS #39 / confusables.txt). For robust/security-sensitive use (IDNs, identifiers), follow UTS #39 guidance rather than ad-hoc rules. [4]

Sources:
[1] npm "confusables" (gc/confusables).
[2] npm "unicode-confusables" / @ensdomains.
[3] npm "homoglypher".
[4] Unicode Technical Standard UTS #39 (confusables.txt).

@Otzie2023, that's a pragmatic approach for the gaming context! Here's how to implement confusables normalization:

Recommended Implementation

Install the confusables package, which normalizes visually similar Unicode characters to their Latin equivalents:

npm install confusables

Then update src/core/validations/username.ts:

import { remove as removeConfusables } from 'confusables';

// Add a new normalization function
export function normalizeUsername(username: string): string {
  // Convert confusables (Cyrillic "а" → Latin "a", etc.)
  return removeConfusables(username);
}

// Update sanitizeUsername to apply normalization
export function sanitizeUsername(str: string): string {
  const normalized = normalizeUsername(str);
  const sanitized = Array.from(normalized)
    .filter((ch) => validPattern.test(ch))
    .join("")
    .slice(0, MAX_USERNAME_LENGTH);
  return sanitized.padEnd(MIN_USERNAME_LENGTH, "x");
}

// Optionally normalize in validation for consistency
export function validateUsername(username: string): {
  isValid: boolean;
  error?: string;
} {
  if (typeof username !== "string") {
    return { isValid: false, error: translateText("username.not_string") };
  }

  // Normalize before validating
  const normalizedUsername = normalizeUsername(username);

  if (normalizedUsername.length < MIN_USERNAME_LENGTH) {
    return {
      isValid: false,
      error: translateText("username.too_short", {
        min: MIN_USERNAME_LENGTH,
      }),
    };
  }

  if (normalizedUsername.length > MAX_USERNAME_LENGTH) {
    return {
      isValid: false,
      error: translateText("username.too_long", {
        max: MAX_USERNAME_LENGTH,
      }),
    };
  }

  if (!validPattern.test(normalizedUsername)) {
    return {
      isValid: false,
      error: translateText("username.invalid_chars", {
        max: MAX_USERNAME_LENGTH,
      }),
    };
  }

  return { isValid: true };
}

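This way, when a user enters "Аdmin" (with a Cyrillic "А"), it gets normalized to "Admin" (Latin) in the background, which prevents duplicate-looking usernames. One caveat: as the remove('fück') → 'fuck' example above shows, the package also folds accented Latin letters, so a name like "José" would likely be stored as "Jose" rather than pass through unchanged. Names in other scripts, such as "李明", should be tested against the package before relying on this behaviour.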
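One detail in sanitizeUsername above is worth checking once emojis are allowed: `.slice(0, MAX_USERNAME_LENGTH)` on a string counts UTF-16 code units, so it can cut an emoji in half. The sketch below demonstrates this and shows a possible alternative; `truncateByCodePoints` is a hypothetical helper, not part of the PR.

```typescript
const MAX_USERNAME_LENGTH = 27; // value from the diff in this PR

// 26 ASCII letters followed by a cat emoji (one code point, two code units).
const name = "x".repeat(26) + "\u{1F408}";

// String.prototype.slice counts code units, so the emoji is split.
const byUnits = name.slice(0, MAX_USERNAME_LENGTH);
console.log(byUnits.length); // 27
console.log(byUnits.charCodeAt(26).toString(16)); // d83d, a lone high surrogate

// Hypothetical alternative: truncate by code points instead.
function truncateByCodePoints(s: string, max: number): string {
  return Array.from(s).slice(0, max).join("");
}

const byPoints = truncateByCodePoints(name, MAX_USERNAME_LENGTH);
console.log(byPoints.endsWith("\u{1F408}")); // true, the emoji survives intact
console.log(byPoints.length); // 28 code units for 27 code points
```

Note that validateUsername compares `.length`, which also counts code units, so the 27-code-point result here would still be reported as too long. Whichever unit is chosen, the length check and the truncation probably need to agree on it.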

🧬 Code Graph Analysis Results

<src/core/Util.ts> (lines 64-72)

export function simpleHash(str: string): number {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    const char = str.charCodeAt(i);
    hash = (hash << 5) - hash + char;
    hash = hash & hash; // Convert to 32-bit integer
  }
  return Math.abs(hash);
}

<src/core/Util.ts> (lines 173-177)

export function sanitize(name: string): string {
  return Array.from(name)
    .join("")
    .replace(/[^\p{L}\p{N}\s\p{Emoji}\p{Emoji_Component}[\]_]/gu, "");
}

<src/core/Util.ts> (lines 347-350)

export function getClanTagOriginalCase(name: string): string | null {
  const clanTag = clanMatch(name);
  return clanTag ? clanTag[1] : null;
}

🧠 Learnings used

Learnt from: Aotumuri
Repo: openfrontio/OpenFrontIO
PR: 1534
File: src/client/LangSelector.ts:97-106
Timestamp: 2025-07-23T12:36:35.354Z
Learning: In OpenFrontIO's LangSelector.ts, the getClosestSupportedLang method always joins language code parts with underscores ("_") because all keys in the languageMap use underscore format (e.g., pt_BR, sv_SE, zh_CN). This normalization ensures consistency regardless of whether the input language code uses hyphens or underscores as delimiters.

@@ -251,7 +251,7 @@
         "not_string": "Username must be a string.",
         "too_short": "Username must be at least {min} characters long.",
         "too_long": "Username must not exceed {max} characters.",
-        "invalid_chars": "Username can only contain letters, numbers, spaces, underscores, and [square brackets]."
+        "invalid_chars": "Username can only contain letters (including Unicode), numbers, spaces, underscores, emojis, and [square brackets]."
       },
       "host_modal": {
         "title": "Private Lobby",

@@ -22,7 +22,9 @@ const matcher = new RegExpMatcher({
     export const MIN_USERNAME_LENGTH = 3;
     export const MAX_USERNAME_LENGTH = 27;
-    const validPattern = /^[a-zA-Z0-9_[\] 🐈🍀üÜ]+$/u;
+    // Allow Unicode letters, numbers, spaces, underscores, brackets, and common symbols/emojis
+    // \p{L} = any Unicode letter, \p{N} = any Unicode number, \p{Emoji} = emojis
+    const validPattern = /^[\p{L}\p{N}_[\] \p{Emoji}\u{1F300}-\u{1F9FF}]+$/u;
     const shadowNames = [
       "NicePeopleOnly",
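To make the effect of the new pattern concrete, the sketch below runs the post-change `validPattern` from the diff against a few inputs. The sample names are illustrative assumptions, not test cases from the PR.

```typescript
// validPattern as it appears on the added line of the diff.
const validPattern = /^[\p{L}\p{N}_[\] \p{Emoji}\u{1F300}-\u{1F9FF}]+$/u;

console.log(validPattern.test("Jos\u00E9")); // true: precomposed "é" is a letter
console.log(validPattern.test("\u674E\u660E")); // true: 李明
console.log(validPattern.test("\u0410dmin")); // true: Cyrillic "А" is also a letter
console.log(validPattern.test("a#b*c")); // true: "#" and "*" carry the Emoji property
console.log(validPattern.test("a.b")); // false
console.log(validPattern.test("a-b")); // false

// A decomposed "é" (e + combining acute, U+0301) is rejected, because combining
// marks are \p{M}, not \p{L}. Normalizing to NFC first makes it pass.
const decomposed = "Jose\u0301";
console.log(validPattern.test(decomposed)); // false
console.log(validPattern.test(decomposed.normalize("NFC"))); // true
```

Two consequences follow: the pattern admits look-alike letters from any script as well as "#" and "*", and input that arrives in decomposed form is rejected unless it is normalized to NFC before validation.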
