Soundness issue: BytesString::split_off can break UTF-8 invariant and cause UB via safe APIs #90

@yilin0518

Hello, I found a soundness issue in bytes-str 0.2.7 that allows triggering undefined behavior (UB) using only safe Rust.

Description

BytesString::split_off does not validate UTF-8 character boundaries, but BytesString::as_str uses from_utf8_unchecked and assumes the internal buffer is always valid UTF-8.

Relevant code paths:

  • split_off: byte_string.rs (no char boundary check)
  • as_str: byte_string.rs (uses from_utf8_unchecked)
  • safe constructor: from_utf8_slice validates initial UTF-8

This allows constructing a valid BytesString first, then breaking the UTF-8 invariant with split_off at a non-char boundary, and finally obtaining an invalid &str through as_str.
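To make the boundary problem concrete without depending on bytes-str, here is a minimal std-only sketch. The helper name `split_keeps_utf8` is hypothetical and exists only for illustration. It cuts a string's bytes at an offset and checks whether both halves still validate as UTF-8, which is the property `split_off` would need to preserve.

```rust
/// Hypothetical helper (not part of bytes-str): returns true if cutting `s`
/// at byte offset `at` leaves two halves that are each valid UTF-8.
/// Panics if `at > s.len()`, because `split_at` does.
fn split_keeps_utf8(s: &str, at: usize) -> bool {
    let (head, tail) = s.as_bytes().split_at(at);
    std::str::from_utf8(head).is_ok() && std::str::from_utf8(tail).is_ok()
}

/// The same question answered by the standard library's boundary check.
fn is_safe_split_point(s: &str, at: usize) -> bool {
    s.is_char_boundary(at)
}
```

For "é" (bytes 0xC3 0xA9), offsets 0 and 2 are safe split points, while offset 1 falls inside the code point and leaves a lone 0xC3 byte that is not valid UTF-8.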

PoC:

use bytes_str::BytesString;

fn main() {
    // Valid UTF-8 for "é" = [0xC3, 0xA9]
    let mut s = BytesString::from_utf8_slice(b"\xC3\xA9").unwrap();

    // Split inside a multi-byte code point (safe call)
    let _ = s.split_off(1);

    // as_str assumes UTF-8 invariant and returns &str
    let corrupted = s.as_str();

    // This triggers UB under Miri
    let _ = corrupted.chars().next();
}

Miri Result:

Corrupted bytes: [195]
error: Undefined Behavior: entering unreachable code
  --> /home/yilin/.rustup/toolchains/nightly-2025-12-06-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/validations.rs:48:23
   |
48 |     let y = unsafe { *bytes.next().unwrap_unchecked() };
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Undefined Behavior occurred here
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE:
   = note: inside `core::str::validations::next_code_point::<'_, std::slice::Iter<'_, u8>>` at /home/yilin/.rustup/toolchains/nightly-2025-12-06-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/validations.rs:48:23: 48:54
   = note: inside `<std::str::Chars<'_> as std::iter::Iterator>::next` at /home/yilin/.rustup/toolchains/nightly-2025-12-06-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/iter.rs:42:18: 42:49
note: inside `main`
  --> src/main.rs:13:46
   |
13 | ...ng chars: {:?}", corrupted_str.chars().next());
   |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error

Suggested fix:

Please enforce character boundary checks in methods that split/truncate by index and can affect UTF-8 validity, especially:

  • split_off should assert or return an error when at is not a char boundary.

For consistency with String, whose split_off panics when the index is not on a char boundary, panicking on non-char-boundary indices would also be reasonable.
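A rough sketch of what such a check could look like follows. `Utf8Buf` is a hypothetical stand-in backed by a `Vec<u8>`, not the real `BytesString`, and `try_split_off` is an assumed name for a non-panicking variant. The crate's actual internals may differ.

```rust
/// Hypothetical stand-in for a UTF-8 buffer type; NOT the real `BytesString`.
struct Utf8Buf {
    bytes: Vec<u8>,
}

impl Utf8Buf {
    /// Validates the input once, establishing the UTF-8 invariant.
    fn from_utf8_slice(bytes: &[u8]) -> Result<Self, std::str::Utf8Error> {
        std::str::from_utf8(bytes)?;
        Ok(Self { bytes: bytes.to_vec() })
    }

    fn as_str(&self) -> &str {
        // SAFETY: the constructor validates UTF-8, and `split_off` only
        // cuts on char boundaries, so `bytes` is always valid UTF-8.
        unsafe { std::str::from_utf8_unchecked(&self.bytes) }
    }

    /// Panics if `at` is out of bounds or not on a char boundary,
    /// mirroring the behavior of `String::split_off`.
    fn split_off(&mut self, at: usize) -> Self {
        assert!(
            self.as_str().is_char_boundary(at),
            "split index {} is not a char boundary",
            at
        );
        Self { bytes: self.bytes.split_off(at) }
    }

    /// Non-panicking variant: returns `None` and leaves `self` untouched
    /// when `at` is not a valid split point.
    fn try_split_off(&mut self, at: usize) -> Option<Self> {
        if self.as_str().is_char_boundary(at) {
            Some(self.split_off(at))
        } else {
            None
        }
    }
}
```

With this check in place, the `split_off(1)` call from the PoC would panic (or return `None` via the fallible variant) instead of leaving a buffer that `as_str` hands out as an invalid `&str`.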

Thanks for maintaining this crate.
