Skip to content

refactor: reduce unsafe, fix Miri violations, and deduplicate code#219

Open
liuq19 wants to merge 7 commits intomainfrom
refactor/reduce-unsafe-and-dedup
Open

refactor: reduce unsafe, fix Miri violations, and deduplicate code#219
liuq19 wants to merge 7 commits intomainfrom
refactor/reduce-unsafe-and-dedup

Conversation

@liuq19
Copy link
Collaborator

@liuq19 liuq19 commented Mar 18, 2026

Summary

  • Reduce unsafe usage without performance impact — convert Meta union to struct, narrow unsafe scopes, replace unchecked constructors with safe alternatives
  • Fix Miri Stacked Borrows violations — DocumentVisitor *mut Shared, exposed provenance for Arc/bump allocations, provenance-preserving copies
  • Deduplicate parser code (-232 lines) — merge 10 checked/unchecked pairs and 5 inplace/owned pairs via parameter
  • Deduplicate serializer code (-188 lines) — extract repeated methods into macros (reject_raw_value!, impl_serialize_int!, etc.)
  • Net reduction: -390 lines (599 added, 989 removed across 14 files)

Changes by Area

Area Key Changes
value/node.rs Meta union→struct, narrowed unsafe in unpack_ref/Drop, Miri provenance fixes in DocumentVisitor/forward_find_shared/visit_root
parser.rs Merged 10 checked/unchecked function pairs, 5 inplace/owned pairs, unified get_escaped_branchless via macro
serde/ser.rs reject_raw_value!, impl_serialize_int!, impl_serialize_key_int!, forward_sorted_key!, impl_serialize_float_key! macros
serde/de.rs fix_position() helper replacing 12 identical match blocks
value/array.rs Safe slice::swap replacing unsafe ptr::swap
reader.rs Safe slice indexing, full-slice provenance in PaddedSliceRead::new

Test plan

  • cargo test — 146 passed (default, sort_keys, arbitrary_precision)
  • cargo clippy --all-targets --all-features — zero warnings
  • Miri — 98/99 passed (1 skipped: large file test timeout under Miri interpreter)
  • Benchmark — no regression (most cases 1-5% faster due to better inlining)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 18, 2026 12:07
@codecov
Copy link

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 80.34188% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.42%. Comparing base (32668d6) to head (ca0715a).

Files with missing lines Patch % Lines
src/parser.rs 81.63% 36 Missing ⚠️
src/value/node.rs 85.39% 13 Missing ⚠️
src/serde/ser.rs 42.85% 12 Missing ⚠️
src/value/array.rs 0.00% 3 Missing ⚠️
src/lazyvalue/owned.rs 33.33% 2 Missing ⚠️
src/reader.rs 85.71% 1 Missing ⚠️
src/value/from.rs 0.00% 1 Missing ⚠️
src/value/ser.rs 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #219      +/-   ##
==========================================
+ Coverage   69.97%   70.42%   +0.45%     
==========================================
  Files          42       42              
  Lines        9917     9524     -393     
==========================================
- Hits         6939     6707     -232     
+ Misses       2978     2817     -161     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@liuq19
Copy link
Collaborator Author

liuq19 commented Mar 18, 2026

@codex

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors core parsing/value/serde code to reduce unsafe usage, address Miri/Stacked Borrows provenance issues, and deduplicate repeated parser/serializer logic across the crate.

Changes:

  • Reduced unsafe surface area and adjusted pointer provenance handling in DOM/value construction paths.
  • Deduplicated parser and serializer code via shared helpers/macros, consolidating checked/unchecked and owned/inplace variants.
  • Replaced some manual pointer/slice manipulation with safe standard-library equivalents.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/value/visitor.rs Adjusts visitor trait defaults and reserves key-visitor hook.
src/value/tls_buffer.rs Refactors TLS buffer allocation to avoid raw-box pointer creation.
src/value/ser.rs Deduplicates integer serializer forwarding; narrows unsafe use for floats.
src/value/partial_eq.rs Deduplicates Value numeric/bool equality helpers via macro.
src/value/node.rs Large refactor: Meta union→struct, narrower union-field unsafe, provenance fixes in DOM visitor/allocation flow.
src/value/from.rs Uses the (now safe) new_f64_unchecked constructor.
src/value/array.rs Replaces unsafe swaps/slice creation with safe slice APIs.
src/serde/ser.rs Extracts repeated serialization patterns into macros; adds safety comments for UTF-8 unchecked conversions.
src/serde/de.rs Adds fix_position() helper; exposes Arc provenance for pack_shared use.
src/reader.rs Uses safe slicing for peek_n; adjusts provenance handling for padded reads.
src/parser.rs Consolidates owned/inplace paths via dispatch_value, parse_string_visit, skip_one_checked, and shared checked/unchecked traversal helpers.
src/lazyvalue/owned.rs Removes an unnecessary unsafe around as_faststr().
src/lazyvalue/get.rs Routes get APIs through unified get_from_with_iter(..., checked) entrypoint.
src/input.rs Makes JsonSlice::as_faststr safe (noted issue: potential UB if bytes aren’t UTF-8).
Comments suppressed due to low confidence (2)

src/input.rs:21

  • JsonSlice::as_faststr is now a safe function but calls as_str(sub), which uses from_utf8_unchecked internally. Since JsonSlice::Raw can be constructed from an arbitrary &[u8] (via From<&[u8]>), calling as_faststr on non‑UTF‑8 bytes can trigger UB. Either make as_faststr unsafe again, or validate UTF‑8 (e.g., via std::str::from_utf8) before constructing the FastStr, or change the API to return a Result/fallback behavior on invalid UTF‑8.
    pub(crate) fn as_faststr(&self) -> FastStr {
        match self {
            JsonSlice::Raw(sub) => FastStr::new(as_str(sub)),
            JsonSlice::FastStr(f) => f.clone(),
        }

src/value/node.rs:972

  • new_f64_unchecked was changed from unsafe fn to a safe function but still only guards finiteness with debug_assert!. In release builds this permits constructing Value with NaN/±Inf, which can break internal assumptions (e.g., later Number::try_from(f64).unwrap() paths) and makes the "unchecked" naming misleading. Consider either making this function unsafe again, or enforcing the invariant in all builds (e.g., assert!(is_finite)/returning Option/using new_f64).
    pub(crate) fn new_f64_unchecked(fval: f64) -> Self {
        debug_assert!(fval.is_finite(), "f64 must be finite");
        Value {
            meta: Meta::new(Meta::F64),
            data: Data { fval },
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

## Unsafe Reduction
- Convert Meta from union to struct (eliminates 6 unsafe field accesses)
- Narrow unsafe scopes in unpack_ref/Drop/as_mut to per-arm union access
- Replace NonNull::new_unchecked with safe NonNull::new().expect()
- Replace unsafe ptr::swap with safe slice::swap in Array::swap_remove
- Replace unsafe from_raw_parts with safe slice indexing in reader/array
- Use safe NonNull construction in TlsBuf

## Miri Stacked Borrows Fixes
- Change DocumentVisitor.shared from &mut Shared to *mut Shared to
  prevent tag invalidation across mutable reborrows
- Use integer arithmetic in forward_find_shared to avoid narrow
  provenance when navigating to MetaNode in bump allocations
- Expose Arc allocation provenance for pack_shared's
  Arc::increment_strong_count
- Use ptr::copy_nonoverlapping in visit_root to preserve pointer
  provenance in Data union
- Fix PaddedSliceRead::new provenance (as_mut_ptr vs &mut [0])

## Parser Deduplication (-232 lines)
- Merge 10 checked/unchecked function pairs via bool parameter:
  skip_one, get_from_object, get_from_array, get_from_with_iter,
  get_many_keys, get_many_index
- Merge inplace/owned parsing pairs via Option<&mut Vec<u8>>:
  parse_string_visit, parse_array, parse_object, parse_dom,
  dispatch_value
- Unify get_escaped_branchless_u32/u64 via macro
- Remove wrapper functions, clean up _inner naming

## Serializer/Deserializer Deduplication (-188 lines)
- Extract 21 identical RawValue error methods into reject_raw_value! macro
- Extract 10+10+6 integer serialize methods into impl_serialize_int!/
  impl_serialize_key_int!/forward_sorted_key! macros
- Extract 12 fix_position error wrappers into helper method
- Extract serialize_f32/f64 key into impl_serialize_float_key! macro
- Deduplicate eq_* helpers in partial_eq.rs via impl_eq_fn! macro
- Deduplicate serialize forwarding in value/ser.rs via forward_to! macro

## Other Cleanup
- Remove incorrect #[allow(dead_code)] from JsonVisitor trait methods
- Remove dead parse_value/skip_string_unchecked2 functions

All changes verified: 146 tests pass across default/sort_keys/
arbitrary_precision features, clippy zero warnings, Miri 98/99 pass
(1 skipped due to large file timeout), benchmark shows no regression
(most cases 1-5% faster).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@liuq19 liuq19 force-pushed the refactor/reduce-unsafe-and-dedup branch from 59d992a to fce4c3a Compare March 18, 2026 12:13
liuq19 and others added 2 commits March 18, 2026 20:13
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. Hooray!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

liuq19 and others added 2 commits March 20, 2026 14:08
Change Meta from a plain u64 struct to a union of u64 and *const Shared.
This preserves pointer provenance through storage for root_node variants,
making pack_shared/unpack_root strict-provenance compatible without
needing expose/recover roundtrips.

- Meta union stores tagged pointer via ptr field (root_node) or integer
  via val field (all other kinds)
- Replace implicit `as usize` casts with explicit expose_provenance() /
  with_exposed_provenance() for remaining exposed provenance sites
- Remove redundant double expose/recover in unpack_shared (MetaNode
  already stores pointer with full provenance)

Miri results:
- Default (permissive provenance): all tests pass
- Strict provenance: all tests pass except those exercising
  forward_find_shared (bump allocation navigation requires exposed
  provenance by design)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@liuq19
Copy link
Collaborator Author

liuq19 commented Mar 20, 2026

@codex

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2efba2da93

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

liuq19 and others added 2 commits March 20, 2026 14:30
On 32-bit targets (e.g. wasm32-wasip1), *const Shared is 4 bytes but
u64 is 8 bytes. Writing through the union ptr field would only
initialize the lower 4 bytes, leaving the upper half uninitialized.
Subsequent reads through read_val() would then be UB.

Fix: use cfg(target_pointer_width = "64") to gate the union approach.
On 32-bit targets, Meta falls back to a plain u64 struct with
exposed provenance for the root_node round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace self-hosted runner with ubuntu-latest to avoid queue delays
when no self-hosted runner is available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants