Hello, I found a soundness issue in bytes-str 0.2.7 that allows triggering undefined behavior (UB) using only safe Rust.
Description
BytesString::split_off does not validate UTF-8 character boundaries, but BytesString::as_str uses from_utf8_unchecked and assumes the internal buffer is always valid UTF-8.
Relevant code paths:
- split_off: byte_string.rs (no char boundary check)
- as_str: byte_string.rs (uses from_utf8_unchecked)
- safe constructor: from_utf8_slice validates initial UTF-8
This allows constructing a valid BytesString first, then breaking the UTF-8 invariant with split_off at a non-char boundary, and finally obtaining an invalid &str through as_str.
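To make the boundary problem concrete without depending on the crate, here is a small std-only sketch. The helper `boundary_report` is a hypothetical name introduced for illustration. It lists which byte indices of "é" are char boundaries and shows that the single byte left in front after a split at index 1 fails UTF-8 validation.

```rust
/// Hypothetical helper: reports, for every byte index of `text`,
/// whether that index is a UTF-8 char boundary.
fn boundary_report(text: &str) -> Vec<(usize, bool)> {
    (0..=text.len())
        .map(|i| (i, text.is_char_boundary(i)))
        .collect()
}

fn main() {
    let text = "é"; // encoded as [0xC3, 0xA9]
    for (index, ok) in boundary_report(text) {
        println!("index {}: char boundary = {}", index, ok);
    }

    // The byte that would remain in front after splitting at index 1.
    let head = &text.as_bytes()[..1];
    println!("{:?} is valid UTF-8: {}", head, std::str::from_utf8(head).is_ok());
}
```

Index 1 is the only non-boundary here, which is exactly the index the PoC passes to split_off.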
PoC:
    use bytes_str::BytesString;

    fn main() {
        // Valid UTF-8 for "é" = [0xC3, 0xA9]
        let mut s = BytesString::from_utf8_slice(b"\xC3\xA9").unwrap();
        // Split inside a multi-byte code point (safe call)
        let _ = s.split_off(1);
        // as_str assumes UTF-8 invariant and returns &str
        let corrupted = s.as_str();
        // This triggers UB under Miri
        let _ = corrupted.chars().next();
    }
Miri Result:
    Corrupted bytes: [195]
    error: Undefined Behavior: entering unreachable code
      --> /home/yilin/.rustup/toolchains/nightly-2025-12-06-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/validations.rs:48:23
       |
    48 |     let y = unsafe { *bytes.next().unwrap_unchecked() };
       |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Undefined Behavior occurred here
       |
       = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
       = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
       = note: BACKTRACE:
       = note: inside `core::str::validations::next_code_point::<'_, std::slice::Iter<'_, u8>>` at /home/yilin/.rustup/toolchains/nightly-2025-12-06-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/validations.rs:48:23: 48:54
       = note: inside `<std::str::Chars<'_> as std::iter::Iterator>::next` at /home/yilin/.rustup/toolchains/nightly-2025-12-06-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/iter.rs:42:18: 42:49
    note: inside `main`
      --> src/main.rs:13:46
       |
    13 | ...ng chars: {:?}", corrupted_str.chars().next());
       |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
    error: aborting due to 1 previous error
Suggested fix:
Please enforce char boundary checks in every method that splits or truncates by byte index and can therefore break UTF-8 validity, especially:
- split_off should panic (assert) or return an error when at is not a char boundary.
Panicking would be consistent with String::split_off, which panics when at is not on a char boundary.
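A minimal sketch of what such a check could look like, under stated assumptions: `Utf8Buf` is a hypothetical stand-in backed by `Vec<u8>`, not the crate's actual type, and `try_split_off` is an illustrative name for the error-returning variant. The real fix would apply the same boundary test to the crate's own buffer.

```rust
/// Hypothetical stand-in for a UTF-8 byte buffer; NOT the bytes-str type.
struct Utf8Buf {
    bytes: Vec<u8>, // invariant: always valid UTF-8
}

impl Utf8Buf {
    /// Validating constructor, analogous in spirit to `from_utf8_slice`.
    fn from_utf8_slice(bytes: &[u8]) -> Result<Self, std::str::Utf8Error> {
        std::str::from_utf8(bytes)?;
        Ok(Self { bytes: bytes.to_vec() })
    }

    fn as_str(&self) -> &str {
        // Checked conversion keeps this sketch free of unsafe code.
        std::str::from_utf8(&self.bytes).expect("invariant: valid UTF-8")
    }

    /// Splits at `at` and returns the tail; panics if `at` is not a char
    /// boundary (or is past the end), like `String::split_off`.
    fn split_off(&mut self, at: usize) -> Self {
        assert!(
            self.as_str().is_char_boundary(at),
            "split_off: index {} is not a char boundary",
            at
        );
        Self { bytes: self.bytes.split_off(at) }
    }

    /// Non-panicking variant: returns `None` and leaves `self` untouched
    /// when `at` is not a char boundary.
    fn try_split_off(&mut self, at: usize) -> Option<Self> {
        if self.as_str().is_char_boundary(at) {
            Some(self.split_off(at))
        } else {
            None
        }
    }
}

fn main() {
    // "aé" is [0x61, 0xC3, 0xA9]; index 2 falls inside "é".
    let mut s = Utf8Buf::from_utf8_slice("aé".as_bytes()).unwrap();
    assert!(s.try_split_off(2).is_none());

    let tail = s.split_off(1); // index 1 is a valid boundary
    println!("{} | {}", s.as_str(), tail.as_str());
}
```

Either variant keeps the invariant that as_str relies on: the buffer is only ever split where both halves remain valid UTF-8.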
Thanks for maintaining this crate.