fix(windows): UTF-16 boundary when stripping trailing dots/spaces #22

Open: fibibot wants to merge 1 commit into denoland:main from fibibot:fix/windows-trailing-dot-wide-slice

fibibot commented Apr 25, 2026

Summary

`normalize_path` strips trailing `.` and ` ` (space) from each path component on Windows. The strip range was computed in UTF-8 bytes via `as_encoded_bytes()`, then applied as a slice index into the UTF-16 `encode_wide()` vector. For components containing multi-byte characters (Cyrillic, CJK, emoji, etc.) the UTF-8 byte length exceeds the UTF-16 unit count, so the slice indexes out of bounds and panics:

range end index 70 out of range for slice of length 55
   0: deno_path_util::normalize_path::inner
             at .../deno_path_util-0.6.4/src/lib.rs:267:66
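
The length mismatch behind this panic can be reproduced without Windows. The following is a minimal sketch, not code from `deno_path_util`: `encoded_lengths` is a hypothetical helper, and `str::encode_utf16()` stands in for the Windows-only `encode_wide()` so the example runs on any platform.

```rust
/// Hypothetical helper (not part of deno_path_util): returns the UTF-8 byte
/// length and the UTF-16 code unit count of a path component.
/// `encode_utf16()` is a portable stand-in for `OsStrExt::encode_wide()`.
fn encoded_lengths(component: &str) -> (usize, usize) {
    let utf8_len = component.len(); // length in bytes
    let utf16_len = component.encode_utf16().count(); // length in u16 units
    (utf8_len, utf16_len)
}

fn main() {
    // ASCII, Cyrillic, CJK, and emoji components, each with a trailing dot.
    for component in ["docs.", "документы.", "文書.", "📁."] {
        let (bytes, units) = encoded_lengths(component);
        println!("{component:?}: {bytes} UTF-8 bytes, {units} UTF-16 units");
    }
}
```

For pure ASCII the two lengths agree, which is presumably why the bug went unnoticed; for any other component the byte length is larger, so a byte-derived end index can overshoot the wide vector.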

Both characters being trimmed (`.` and ` `) are ASCII, so each trailing byte corresponds to exactly one UTF-16 unit. The fix computes `trim_count` from the byte-side trimming and uses `wide.len() - trim_count` as the end index on the UTF-16 side.
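
The idea can be sketched as follows. This is an illustration under stated assumptions, not the patched code: `strip_trailing_wide` is a hypothetical name, the input is taken as `&str` rather than `OsStr`, and `encode_utf16()` again stands in for `encode_wide()`. Special components such as `.` and `..` are not handled here.

```rust
/// Hypothetical sketch of the approach described above: count the trailing
/// '.' and ' ' bytes, then shorten the UTF-16 vector by that same count.
fn strip_trailing_wide(component: &str) -> Vec<u16> {
    let bytes = component.as_bytes();

    // Trailing '.' (0x2E) and ' ' (0x20) are single ASCII bytes. Bytes inside
    // a multi-byte UTF-8 sequence are always >= 0x80, so they never match.
    let trim_count = bytes
        .iter()
        .rev()
        .take_while(|&&b| b == b'.' || b == b' ')
        .count();

    // Portable stand-in for `encode_wide()` (assumption for this sketch).
    let mut wide: Vec<u16> = component.encode_utf16().collect();

    // Each trimmed ASCII byte is exactly one UTF-16 unit, so the count
    // carries over unchanged; trim_count can never exceed wide.len().
    wide.truncate(wide.len() - trim_count);
    wide
}

fn main() {
    let wide = strip_trailing_wide("документы. .");
    let round_trip = String::from_utf16(&wide).expect("valid UTF-16");
    println!("{round_trip:?}");
}
```

Counting from the end avoids translating a byte offset into a UTF-16 offset at all; only the number of trimmed units crosses between the two encodings.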

Fixes denoland/deno#31761.

Test plan

  • Added regression cases in `test_normalize_path_win` covering Cyrillic, CJK, and emoji components with a trailing `.`; these would panic with the byte index but slice cleanly with the UTF-16 index.
  • Ran `cargo fmt --check` and `cargo clippy --lib --tests -- -D warnings`.
