Implement LOAD_ATTR inline caching with adaptive specialization#7292

Merged
youknowone merged 6 commits into RustPython:main from youknowone:load-attr
Mar 2, 2026
Conversation


@youknowone (Member) commented Mar 1, 2026

Add type version counter (tp_version_tag) to PyType with subclass invalidation cascade. Add cache read/write methods (u16/u32/u64) to CodeUnits. Implement adaptive specialization in load_attr that replaces the opcode with specialized variants on first execution:

  • LoadAttrMethodNoDict: cached method lookup for slotted types
  • LoadAttrMethodWithValues: cached method with dict shadow check
  • LoadAttrInstanceValue: direct dict lookup skipping descriptors

Specialized opcodes guard on type_version_tag and deoptimize back to generic LOAD_ATTR with backoff counter on cache miss.
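The warmup/backoff counter scheme described above can be sketched as a small standalone model. This is illustrative only: `ADAPTIVE_WARMUP_VALUE` and `ADAPTIVE_BACKOFF_VALUE` are names from the PR, but the concrete values and the `AdaptiveSite` shape here are assumptions, not the actual `CodeUnits` layout.

```rust
// Simplified model of one adaptive instruction site, assuming hypothetical
// warmup/backoff values; the real PR stores the counter in CACHE code units.
const ADAPTIVE_WARMUP_VALUE: u16 = 1;
const ADAPTIVE_BACKOFF_VALUE: u16 = 64;

struct AdaptiveSite {
    counter: u16,
    specialized: bool,
}

impl AdaptiveSite {
    fn new() -> Self {
        Self { counter: ADAPTIVE_WARMUP_VALUE, specialized: false }
    }

    /// Generic opcode path: count down, then signal "attempt specialization".
    fn tick(&mut self) -> bool {
        if self.counter > 0 {
            self.counter -= 1;
            false
        } else {
            self.specialized = true;
            true
        }
    }

    /// Specialized guard failed: revert to generic and back off before retrying.
    fn deoptimize(&mut self) {
        self.specialized = false;
        self.counter = ADAPTIVE_BACKOFF_VALUE;
    }
}

fn main() {
    let mut site = AdaptiveSite::new();
    assert!(!site.tick()); // warmup tick: still generic
    assert!(site.tick());  // counter hit zero: specialize in place
    assert!(site.specialized);
    site.deoptimize();     // cache miss: generic again, with backoff
    assert_eq!(site.counter, ADAPTIVE_BACKOFF_VALUE);
    println!("ok");
}
```

The backoff value being much larger than the warmup value means a site that repeatedly fails its guard spends most of its time on the cheap generic path instead of thrashing between specialization attempts.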

Summary by CodeRabbit

  • New Features

    • Adaptive bytecode caching with warmup/backoff and lazy quickening for hot instructions
    • Per-type and per-function versioning to manage cache invalidation
    • Instruction specialization and deoptimization pathways to revert optimized opcodes
    • Fast-path call/load and specialization fallback for improved runtime performance
  • Refactor

    • Bytecode serialization now uses original instruction bytes directly
  • Chores

    • Updated spell-checker dictionary and ignore list entries


coderabbitai bot commented Mar 1, 2026

📝 Walkthrough


Implements adaptive specialization and inline-caching: adds per-function/type versioning, adaptive counters and quickening, instruction deoptimization, cache read/write helpers, and integrates these into frame execution, code objects, serialization, and de-instrumentation paths.

Changes

  • Spell-checker — .cspell.dict/cpython.txt, .cspell.json: Added tokens deopt, deoptimize, and kwonly to the spell-check configuration.
  • Bytecode helpers & constants — crates/compiler-core/src/bytecode.rs: Added ADAPTIVE_WARMUP_VALUE and ADAPTIVE_BACKOFF_VALUE; added CodeUnits cache read/write helpers, adaptive counter accessors, original_bytes(), quicken(), and improved TryFrom<&[u8]> error propagation.
  • Instruction deoptimization — crates/compiler-core/src/bytecode/instruction.rs: Added Instruction::deoptimize() mapping specialized/instrumented opcodes back to base variants; adjusted cache_entries to compute from base opcodes.
  • Codegen cache emission — crates/codegen/src/ir.rs: Guarded emission of CACHE code units: emit only when cache_count > 0.
  • Serialization — crates/compiler-core/src/marshal.rs: Replaced unsafe transmute extraction with code.instructions.original_bytes() during serialize_code.
  • Code object adaptive view — crates/vm/src/builtins/code.rs: Added quickened: AtomicBool to PyCode (initialized to false) and _co_code_adaptive() to return the in-memory (possibly quickened) bytecode.
  • Function versioning & fast paths — crates/vm/src/builtins/function.rs: Added a per-function func_version and global counter, a func_version() accessor, can_specialize_call() eligibility, the invoke_exact_args() fast path, and invalidation on code/defaults mutations.
  • Type versioning & invalidation — crates/vm/src/builtins/type.rs, crates/vm/src/object/core.rs: Added tp_version_tag: AtomicU32 to PyType, the NEXT_TYPE_VERSION counter, assign_version_tag() and modified() for cascading invalidation; wired modified() into attribute and bases mutation paths and initialized the new field in core type payloads.
  • Frame execution (adaptive quickening) — crates/vm/src/frame.rs: Introduced adaptive/quickening logic for hot instructions (LOAD_ATTR, BinaryOp, CALL, RESUME), adaptive counters, specializer/deoptimizer helpers, descriptor/type-version caching, and extensive opcode handler updates to emit/consume specialized opcodes and fall back to base opcodes.
  • Monitoring de-instrumentation — crates/vm/src/stdlib/sys/monitoring.rs: Phase 3 de-instrumentation now uses deoptimize() and explicitly clears per-opcode CACHE entries, advancing by 1 + caches per deoptimized instruction.
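The CodeUnits cache read/write helpers listed above store values wider than one 16-bit code unit across consecutive CACHE slots. The sketch below illustrates that idea for u32 with the bounds checks the review asked for; the function names, slot ordering, and endianness here are assumptions, not the PR's actual encoding.

```rust
// Illustrative split-cache access over 16-bit code units (layout assumed).
fn write_cache_u32(units: &mut [u16], idx: usize, value: u32) {
    units[idx] = (value & 0xFFFF) as u16;  // low half in the first cache unit
    units[idx + 1] = (value >> 16) as u16; // high half in the next one
}

// Bounds-checked read, mirroring the review note about read_cache_u16/u32/u64.
fn read_cache_u32(units: &[u16], idx: usize) -> Option<u32> {
    let lo = *units.get(idx)? as u32;
    let hi = *units.get(idx + 1)? as u32;
    Some(lo | (hi << 16))
}

fn main() {
    let mut units = [0u16; 4];
    write_cache_u32(&mut units, 1, 0xDEAD_BEEF);
    assert_eq!(read_cache_u32(&units, 1), Some(0xDEAD_BEEF));
    assert_eq!(read_cache_u32(&units, 3), None); // would read past the end
    println!("ok");
}
```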

Sequence Diagram

sequenceDiagram
    participant Frame as ExecutingFrame
    participant Instr as Instruction
    participant Code as CodeObject
    participant Cache as CacheStorage
    participant Type as PyType/FuncVersion

    Code->>Frame: provide bytecode (_co_code_adaptive)
    Frame->>Instr: fetch opcode (adaptive)
    Instr->>Cache: read adaptive counter
    alt counter > 0
        Cache->>Frame: decrement counter, execute generic opcode
    else counter == 0
        Frame->>Type: check version_tag / func_version
        Cache->>Frame: read cached descriptor/target
        alt cache hit & version matches
            Frame->>Instr: specialize opcode in-place
            Frame->>Type: invoke specialized fast-path (invoke_exact_args / fast attr)
        else cache miss or mismatch
            Frame->>Cache: store new cache entry, reset warmup
            Instr->>Instr: replace with specialized variant or deopt back to base
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • ShaharNaveh
  • coolreader18

Poem

🐰 I hop through bytecode, counters in paw,
warming caches, then specialize with awe.
If types shift or versions rescind,
I deopt and tidy what I pinned.
Quick paths bloom — a rabbit’s small win. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description check — ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed: the title 'Implement LOAD_ATTR inline caching with adaptive specialization' directly and clearly summarizes the main objective of the pull request.
  • Docstring coverage — ✅ Passed: docstring coverage is 100.00%, above the required threshold of 80.00%.




github-actions bot (Contributor) commented Mar 1, 2026

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all

Please pull the latest changes before pushing again:

    git pull origin load-attr

@youknowone force-pushed the load-attr branch 5 times, most recently from e136814 to ddb5528, on March 2, 2026 at 01:12
@youknowone marked this pull request as ready for review on March 2, 2026 at 02:11

@coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/builtins/type.rs (1)

1923-1945: ⚠️ Potential issue | 🟠 Major

Descriptor-setter path skips modified() invalidation.

At Line 1926, setattro returns immediately after descr_set, so Line 1944 never runs. Descriptor-backed setters that mutate type attributes (for example set___module__) can leave inline caches stale.

🛠️ Proposed fix
 if let Some(attr) = zelf.get_class_attr(attr_name) {
     let descr_set = attr.class().slots.descr_set.load();
     if let Some(descriptor) = descr_set {
-        return descriptor(&attr, zelf.to_owned().into(), value, vm);
+        descriptor(&attr, zelf.to_owned().into(), value, vm)?;
+        zelf.modified();
+        return Ok(());
     }
 }
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)

2679-2687: Extract duplicated deopt blocks into helpers.

These branches all perform the same deopt sequence (replace_op + write_adaptive_counter). A small helper per opcode family would reduce repetition and drift risk.

As per coding guidelines "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".

Also applies to: 2719-2726, 2751-2758, 2884-2894, 2922-2932

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 555-571: The function quicken holds a shared borrow (let units =
unsafe { &*self.0.get() }) across the loop while calling write_adaptive_counter
which performs a mutable borrow, causing aliasing UB; fix by limiting the shared
borrow's lifetime so you only read what you need (e.g., copy units[i].op, call
op.cache_entries() and op.is_instrumented() into locals) while the shared borrow
is active, then drop that borrow (end the unsafe &*self.0.get() scope) before
invoking write_adaptive_counter(cache_base, ADAPTIVE_WARMUP_VALUE); in short,
ensure quicken does not keep units alive across the mutable call to
write_adaptive_counter.

In `@crates/compiler-core/src/bytecode/instruction.rs`:
- Around line 515-629: deoptimize() currently falls through for JumpBackwardJit
and JumpBackwardNoJit which leaves them specialized instead of mapping back to a
base opcode; add explicit match arms for Self::JumpBackwardJit and
Self::JumpBackwardNoJit to return the appropriate base (e.g. Self::JumpBackward
{ target: Arg::marker() } or Self::JumpBackwardNoInterrupt { target:
Arg::marker() } depending on intended semantics), similar to other
specialization branches, so marshaling via original_bytes() and PEP 669
monitoring see only base opcodes; reference deoptimize(), JumpBackwardJit,
JumpBackwardNoJit, JumpBackward, JumpBackwardNoInterrupt and to_base() when
making this change and verify mapping choice against NoJit semantics.

In `@crates/vm/src/builtins/function.rs`:
- Around line 611-621: can_specialize_call currently allows exact-args fast-path
for code objects that may rely on module globals when NEWLOCALS is unset,
changing semantics; update can_specialize_call to also require the
bytecode::CodeFlags::NEWLOCALS flag (i.e., only return true when NEWLOCALS is
set) so invoke_exact_args and invoke_with_locals remain equivalent, and apply
the same NEWLOCALS guard to the similar specialization logic in the other block
referenced around lines 627-660 (where the same flags and arg count checks are
performed).

In `@crates/vm/src/builtins/type.rs`:
- Around line 202-210: The current assign_version_tag uses
NEXT_TYPE_VERSION.fetch_add and treats the returned previous value v as the
assigned tag, which fails on wraparound because tags can become 0 or be reused;
change logic to compute the new tag as v.wrapping_add(1) (call it new_v), check
if new_v == 0 (wrap occurred) and in that case store 0 into self.tp_version_tag
and return 0, otherwise store new_v with Ordering::Release and return new_v;
update references in assign_version_tag to use new_v instead of v and keep
atomic ordering semantics.

In `@crates/vm/src/frame.rs`:
- Around line 2702-2704: The instance-dict lookup currently swallows errors via
`.ok().flatten()` which hides lookup exceptions and can wrongly take the fast
cached path; change the `shadowed` computation (the `if let Some(dict) =
owner.dict() { dict.get_item_opt(attr_name, vm).ok().flatten().is_some() } else
{ ... }` code) to let lookup errors propagate instead of being dropped — e.g.
call `dict.get_item_opt(attr_name, vm)?` and check `.is_some()` (or explicitly
match `Ok(Some(_)) / Ok(None) / Err(e) => return Err(e)`), so attribute lookup
failures from `get_item_opt` bubble up rather than being ignored.


ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b89d82 and 821e648.

📒 Files selected for processing (12)
  • .cspell.dict/cpython.txt
  • .cspell.json
  • crates/codegen/src/ir.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/bytecode/instruction.rs
  • crates/compiler-core/src/marshal.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/stdlib/sys/monitoring.rs

Comment on lines +202 to +210
    /// Assign a fresh version tag. Returns 0 on overflow (all caches invalidated).
    pub fn assign_version_tag(&self) -> u32 {
        let v = NEXT_TYPE_VERSION.fetch_add(1, Ordering::Relaxed);
        if v == 0 {
            return 0;
        }
        self.tp_version_tag.store(v, Ordering::Release);
        v
    }

⚠️ Potential issue | 🟡 Minor

Version-tag overflow can reintroduce stale tag matches.

After counter wraparound, non-zero tags can be reused. Returning 0 only when fetch_add returns 0 does not prevent later tag reuse.

🛠️ Proposed fix
 pub fn assign_version_tag(&self) -> u32 {
     let v = NEXT_TYPE_VERSION.fetch_add(1, Ordering::Relaxed);
-    if v == 0 {
+    if v == 0 || v == u32::MAX {
+        NEXT_TYPE_VERSION.store(0, Ordering::Relaxed);
+        self.tp_version_tag.store(0, Ordering::Release);
         return 0;
     }
     self.tp_version_tag.store(v, Ordering::Release);
     v
 }

BINARY_OP: Specialize int add/subtract/multiply and float
add/subtract/multiply with type guards and deoptimization.
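The type-guarded fast path from the BINARY_OP commit can be sketched as follows. The `Value` and `Deopt` types are stand-ins, not RustPython's actual object model; the point is only that a specialized handler checks both operands are still ints and signals deoptimization on a mismatch.

```rust
// Hypothetical int-add specialization guard (shapes are illustrative).
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
}

/// Marker returned when the guard fails and the generic BinaryOp must run.
struct Deopt;

fn binary_add_int_specialized(lhs: &Value, rhs: &Value) -> Result<Value, Deopt> {
    match (lhs, rhs) {
        // Guard passed: both operands still ints, take the fast path.
        (Value::Int(a), Value::Int(b)) => Ok(Value::Int(a.wrapping_add(*b))),
        // Operand types changed since specialization: deoptimize.
        _ => Err(Deopt),
    }
}

fn main() {
    assert_eq!(
        binary_add_int_specialized(&Value::Int(2), &Value::Int(3)).ok(),
        Some(Value::Int(5))
    );
    assert!(binary_add_int_specialized(&Value::Int(2), &Value::Float(1.0)).is_err());
    println!("ok");
}
```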

CALL: Add func_version to PyFunction, specialize simple
function calls (CallPyExactArgs, CallBoundMethodExactArgs)
with invoke_exact_args fast path that skips FuncArgs
allocation and fill_locals_from_args.
Move counter initialization from compile-time to RESUME execution,
matching CPython's _PyCode_Quicken pattern. Store counter in CACHE
entry's arg byte to preserve op=Instruction::Cache for dis/JIT.
Add PyCode.quickened flag for one-time initialization.

- deoptimize() maps specialized opcodes back to their base adaptive variant
- original_bytes() produces deoptimized bytecode with zeroed CACHE entries
- co_code now returns deoptimized bytes, _co_code_adaptive returns current bytes
- Marshal serialization uses original_bytes() instead of raw transmute

- cache_entries() returns correct count for instrumented opcodes
- deoptimize() maps instrumented opcodes back to base
- quicken() skips adaptive counter for instrumented opcodes
- instrument_code Phase 3 deoptimizes specialized opcodes and
  clears CACHE entries to prevent stale pointer dereferences

- Add bounds checks to read_cache_u16/u32/u64
- Fix quicken() aliasing UB by using &mut directly
- Add JumpBackwardJit/JumpBackwardNoJit to deoptimize()
- Guard can_specialize_call with NEWLOCALS flag check
- Use compare_exchange_weak for version tag to prevent wraparound
- Propagate dict lookup errors in LoadAttrMethodWithValues
- Apply adaptive backoff on version tag assignment failure
- Remove duplicate imports in frame.rs
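The version-tag guard named throughout these commits can be modeled roughly as below: a specialized site caches a (version, payload) pair, the guard compares against the type's live tag, and modified() zeroes the tag so every cached entry misses. The struct shapes are illustrative, not RustPython's actual PyType.

```rust
// Rough model of the tp_version_tag guard; tag 0 means "never matches".
use std::sync::atomic::{AtomicU32, Ordering};

struct Ty {
    tp_version_tag: AtomicU32,
}

struct AttrCache {
    version: u32,
    cached: &'static str,
}

impl Ty {
    /// Invalidate all inline caches keyed on this type's version.
    fn modified(&self) {
        self.tp_version_tag.store(0, Ordering::Release);
    }
}

fn cached_load(ty: &Ty, cache: &AttrCache) -> Option<&'static str> {
    let tag = ty.tp_version_tag.load(Ordering::Acquire);
    if tag != 0 && tag == cache.version {
        Some(cache.cached) // guard passed: fast path
    } else {
        None // cache miss: caller deoptimizes with backoff
    }
}

fn main() {
    let ty = Ty { tp_version_tag: AtomicU32::new(7) };
    let cache = AttrCache { version: 7, cached: "method" };
    assert_eq!(cached_load(&ty, &cache), Some("method"));
    ty.modified(); // e.g. an attribute was set on the type
    assert_eq!(cached_load(&ty, &cache), None);
    println!("ok");
}
```

Because the tag only ever moves forward (or to the invalid 0), a stale cache entry can never accidentally match after a type mutation, which is the property the review comments below probe at the wraparound boundary.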

@coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
crates/vm/src/frame.rs (1)

2722-2738: ⚠️ Potential issue | 🟠 Major

Propagate dict lookup errors instead of retrying via slow path.

The Err(_) arm deoptimizes and re-runs lookup through load_attr_slow, which drops the original error and can duplicate side effects.

💡 Proposed fix
-                            Err(_) => {
-                                // Dict lookup error → deoptimize to safe path
-                                unsafe {
-                                    self.code.instructions.replace_op(
-                                        instr_idx,
-                                        Instruction::LoadAttr { idx: Arg::marker() },
-                                    );
-                                    self.code
-                                        .instructions
-                                        .write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE);
-                                }
-                                return self.load_attr_slow(vm, oparg);
-                            }
+                            Err(e) => {
+                                // Dict lookup error should propagate; keep deopt for future runs.
+                                unsafe {
+                                    self.code.instructions.replace_op(
+                                        instr_idx,
+                                        Instruction::LoadAttr { idx: Arg::marker() },
+                                    );
+                                    self.code
+                                        .instructions
+                                        .write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE);
+                                }
+                                return Err(e);
+                            }
🧹 Nitpick comments (4)
crates/vm/src/builtins/type.rs (3)

54-56: Avoid exposing tp_version_tag: AtomicU32 as a public writable field (easy invariant break).

Making the raw atomic public means any module can store arbitrary values (including non-0 “invalid” states) and bypass invalidation rules. Prefer pub(crate) + accessor(s), or keep it private and expose only load_tp_version_tag() / invalidate_tp_version_tag() / assign_version_tag().


219-228: modified() should avoid unwrap() + consider iterative traversal to prevent deep recursion.

sub.downcast_ref::<PyType>().unwrap() will panic if subclasses ever contains a non-PyType (even transiently). Also, recursive subclass traversal can stack overflow in pathological hierarchies. An iterative stack/VecDeque walk is safer.

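The iterative traversal suggested in this nitpick can be sketched with Rc/Weak stand-ins for the weak subclass references (RustPython's real PyType and subclass storage differ):

```rust
// Iterative, panic-free cascade invalidation over a subclass graph.
use std::cell::RefCell;
use std::rc::{Rc, Weak};
use std::sync::atomic::{AtomicU32, Ordering};

struct Ty {
    tp_version_tag: AtomicU32,
    subclasses: RefCell<Vec<Weak<Ty>>>,
}

fn modified(root: &Rc<Ty>) {
    // Explicit stack instead of recursion: no overflow on deep hierarchies.
    let mut stack = vec![Rc::clone(root)];
    while let Some(ty) = stack.pop() {
        ty.tp_version_tag.store(0, Ordering::Release);
        for sub in ty.subclasses.borrow().iter() {
            // Skip dead entries instead of panicking on them.
            if let Some(sub) = sub.upgrade() {
                stack.push(sub);
            }
        }
    }
}

fn main() {
    let base = Rc::new(Ty {
        tp_version_tag: AtomicU32::new(1),
        subclasses: RefCell::new(vec![]),
    });
    let sub = Rc::new(Ty {
        tp_version_tag: AtomicU32::new(2),
        subclasses: RefCell::new(vec![]),
    });
    base.subclasses.borrow_mut().push(Rc::downgrade(&sub));
    modified(&base);
    assert_eq!(base.tp_version_tag.load(Ordering::Acquire), 0);
    assert_eq!(sub.tp_version_tag.load(Ordering::Acquire), 0);
    println!("ok");
}
```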

202-217: Simplify assign_version_tag() with fetch_update to reduce spin-loop complexity.

The current CAS loop pattern is correct, but fetch_update reduces boilerplate and makes the code less error-prone:

Proposed simplification
 pub fn assign_version_tag(&self) -> u32 {
-    loop {
-        let current = NEXT_TYPE_VERSION.load(Ordering::Relaxed);
-        let Some(next) = current.checked_add(1) else {
-            return 0; // Overflow: version space exhausted
-        };
-        if NEXT_TYPE_VERSION
-            .compare_exchange_weak(current, next, Ordering::Relaxed, Ordering::Relaxed)
-            .is_ok()
-        {
-            self.tp_version_tag.store(current, Ordering::Release);
-            return current;
-        }
-    }
+    let assigned = NEXT_TYPE_VERSION
+        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |current| current.checked_add(1))
+        .ok()
+        .unwrap_or(0);
+
+    if assigned == 0 {
+        return 0;
+    }
+    self.tp_version_tag.store(assigned, Ordering::Release);
+    assigned
 }

Note: Overflow (returning 0) is intentional—unversioned types don't use inline caching per the guard checks in frame.rs, and the backoff mechanism prevents re-attempting on overflow.

crates/vm/src/frame.rs (1)

2699-2707: Extract duplicated LOAD_ATTR deopt sequence into a helper.

The same deopt/backoff block is repeated in multiple branches; consolidating it will reduce drift risk.

♻️ Refactor sketch
+    #[inline]
+    fn deoptimize_load_attr(&mut self, instr_idx: usize, cache_base: usize) {
+        unsafe {
+            self.code
+                .instructions
+                .replace_op(instr_idx, Instruction::LoadAttr { idx: Arg::marker() });
+            self.code
+                .instructions
+                .write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE);
+        }
+    }
...
-                unsafe {
-                    self.code
-                        .instructions
-                        .replace_op(instr_idx, Instruction::LoadAttr { idx: Arg::marker() });
-                    self.code
-                        .instructions
-                        .write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE);
-                }
+                self.deoptimize_load_attr(instr_idx, cache_base);

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."

Also applies to: 2755-2762, 2787-2794

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 602-663: invoke_exact_args currently assumes preconditions and can
OOB panic; add debug-only guards to assert those preconditions (or make the
function non-public if only internal). In the body of invoke_exact_args (and/or
at its start) add debug_asserts: debug_assert_eq!(args.len(), code.arg_count as
usize); debug_assert!(code.flags.contains(bytecode::CodeFlags::NEWLOCALS));
debug_assert_eq!(code.kwonlyarg_count, 0); and
debug_assert!(!code.flags.intersects(bytecode::CodeFlags::VARARGS |
bytecode::CodeFlags::VARKEYWORDS | bytecode::CodeFlags::GENERATOR |
bytecode::CodeFlags::COROUTINE)); this will prevent OOB writes during
development (or alternatively change pub fn invoke_exact_args to pub(crate) if
it is not externally used).
- Around line 684-686: The cache invalidation currently sets func_version to 0
in setters like set___code__, set___defaults__, and set___kwdefaults__, which
permanently disables re-validation; instead either (A) update those setters to
assign a fresh non-zero version by using FUNC_VERSION_COUNTER.fetch_add(1,
Relaxed) when invalidating func_version so subsequent validations can succeed,
or (B) add a helper fn assign_func_version(&self) -> u32 that generates/sets a
non-zero version (using FUNC_VERSION_COUNTER) and call that helper from the
hot-path cache validation code whenever func_version == 0 so the function can be
re-validated; update references to func_version, set___code__, set___defaults__,
set___kwdefaults__, and add/use assign_func_version or FUNC_VERSION_COUNTER
accordingly.

In `@crates/vm/src/builtins/type.rs`:
- Around line 843-845: When updating a class's __bases__ you already call
zelf.modified() for IC invalidation, but you must also remove the class from the
old bases' subclasses lists and add it to the new bases' subclasses to avoid
stale subclass entries; update the code path that changes __bases__ (the
function that mutates __bases__) to iterate old_bases and call something like
old_base.subclasses.remove(zelf) and then for each new_base add zelf to
new_base.subclasses, then call zelf.modified() and ensure any TODO about
removing from old bases' subclasses is implemented accordingly.
- Line 58: The global counter NEXT_TYPE_VERSION currently uses checked_add in
assign_version_tag which will stop advancing once it hits u32::MAX and then
always return 0; change the overflow policy to a defined behavior (e.g., wrap
back to 1) by replacing the checked_add logic with an atomic update that
advances the counter and, on overflow, resets it to 1; implement this using
AtomicU32::fetch_update or a loop with compare_exchange on NEXT_TYPE_VERSION in
assign_version_tag so that when the current value is u32::MAX you store 1,
otherwise you store current + 1, ensuring the counter never permanently becomes
0 and the doc comment (“all caches invalidated”) is upheld.
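The wrap-to-1 policy can be sketched with `AtomicU32::fetch_update`; the closure always returns `Some`, so the counter advances atomically and never yields 0:

```rust
use std::sync::atomic::{AtomicU32, Ordering::Relaxed};

static NEXT_TYPE_VERSION: AtomicU32 = AtomicU32::new(1);

/// Advance the global tag counter; on overflow, wrap back to 1 instead of
/// sticking at u32::MAX, so tag assignment never permanently returns 0.
fn next_type_version() -> u32 {
    NEXT_TYPE_VERSION
        .fetch_update(Relaxed, Relaxed, |v| {
            Some(if v == u32::MAX { 1 } else { v + 1 })
        })
        .expect("closure always returns Some")
}

fn main() {
    assert_ne!(next_type_version(), 0);
    NEXT_TYPE_VERSION.store(u32::MAX, Relaxed);
    assert_eq!(next_type_version(), u32::MAX); // returns the pre-update value
    assert_eq!(next_type_version(), 1); // wrapped, not stuck at 0
}
```

One caveat with wrapping: a reused tag could let a stale cache validate against a new type, which may be exactly why the existing code prefers to stop assigning tags at exhaustion; either way the policy should be explicit.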

---

Duplicate comments:
In `@crates/vm/src/frame.rs`:
- Around line 2722-2738: The dict lookup Err branch currently swallows the
original error, deoptimizes, and re-runs via load_attr_slow; instead capture and
propagate the original error from dict.get_item_opt(attr_name, vm) so callers
see the original failure and side effects are not duplicated. Replace the Err(_)
arm in the match over dict.get_item_opt with Err(err) => return Err(err) (or use
the ? operator) rather than calling self.code.instructions.replace_op /
write_adaptive_counter and return self.load_attr_slow; leave the deoptimization
path only for non-error conditions where you intentionally want to fall back to
the safe path.

---

Nitpick comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 54-56: The field tp_version_tag: AtomicU32 is currently public and
allows external code to break invariants by writing arbitrary values; change its
visibility to non-public (make it private or pub(crate)) and add controlled
accessors such as load_tp_version_tag(&self) -> u32, assign_version_tag(&self,
new: u32) to set a valid tag, and invalidate_tp_version_tag(&self) to set it to
0, ensuring writers use AtomicU32 methods internally and enforcing any
validation/ordering semantics; update call sites that reference tp_version_tag
directly to use these new methods.
- Around line 219-228: In modified(), avoid unwrap and recursion: replace the
recursive call to sub.downcast_ref::<PyType>().unwrap().modified() with an
iterative traversal using a stack/VecDeque of upgraded PyType refs; for each
upgraded object, call downcast_ref::<PyType>() and if it returns Some, store its
tp_version_tag with Ordering::Release and push its subclasses' weak refs
(upgrading them when needed) onto the deque, otherwise skip that entry (do not
panic). Keep the initial tp_version_tag.store(0, Ordering::Release) behavior for
self and ensure you only call modified-equivalent logic on successfully
downcasted PyType instances.
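The shape of that iterative traversal can be shown on a toy arena; indices here stand in for the real code's weak refs and `downcast_ref::<PyType>()`, which this sketch only mimics:

```rust
use std::cell::Cell;
use std::collections::VecDeque;

// Toy stand-in for PyType: a tag plus a subclass list of arena indices.
struct TypeNode {
    version_tag: Cell<u32>,
    subclasses: Vec<usize>,
}

/// Iterative modified(): invalidate this type's tag and every transitive
/// subclass's tag without recursing, so deep hierarchies cannot overflow
/// the call stack (a dead weak ref or failed downcast would be skipped).
fn modified(arena: &[TypeNode], root: usize) {
    let mut queue = VecDeque::from([root]);
    while let Some(idx) = queue.pop_front() {
        let node = &arena[idx];
        // replace() returns the old tag; an already-zero node's subtree is
        // already invalidated, so don't re-enqueue it (also dedups shared
        // subclasses in diamond hierarchies).
        if node.version_tag.replace(0) == 0 {
            continue;
        }
        queue.extend(node.subclasses.iter().copied());
    }
}

fn main() {
    // 0 -> {1, 2}, 1 -> {2}: node 2 is reachable twice.
    let arena = vec![
        TypeNode { version_tag: Cell::new(7), subclasses: vec![1, 2] },
        TypeNode { version_tag: Cell::new(8), subclasses: vec![2] },
        TypeNode { version_tag: Cell::new(9), subclasses: vec![] },
    ];
    modified(&arena, 0);
    assert!(arena.iter().all(|t| t.version_tag.get() == 0));
}
```

The already-zero early exit relies on the invariant that a type without a valid tag cannot have populated any subclass caches; if that does not hold, a separate visited set would be needed instead.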
- Around line 202-217: Replace the explicit CAS spin-loop in
assign_version_tag() with AtomicU32::fetch_update on NEXT_TYPE_VERSION: call
fetch_update(Relaxed, Relaxed, |current| current.checked_add(1).ok_or(current))
and handle the Result—on Ok(prev) store prev into self.tp_version_tag with
Ordering::Release and return prev; on Err(_) treat as overflow and return 0.
Keep the same ordering semantics and the overflow behavior (return 0) and
reference NEXT_TYPE_VERSION and tp_version_tag in your change.

In `@crates/vm/src/frame.rs`:
- Around line 2699-2707: Duplicate deopt/backoff logic for LOAD_ATTR should be
extracted into a single helper to avoid repetition: create a method (e.g.,
deopt_load_attr or write_load_attr_deopt) on the same type as self that takes
the differing values (instr_idx and cache_base), and move the unsafe block that
calls self.code.instructions.replace_op(... Instruction::LoadAttr { idx:
Arg::marker() }) and self.code.instructions.write_adaptive_counter(cache_base,
ADAPTIVE_BACKOFF_VALUE) into that helper; then replace each repeated block with
a call to this helper passing instr_idx and cache_base. Ensure the helper
preserves the unsafe semantics and references Instruction::LoadAttr,
Arg::marker(), write_adaptive_counter, and ADAPTIVE_BACKOFF_VALUE.
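A minimal sketch of the extracted helper, with the PR's `CodeUnits`, `replace_op`, and `write_adaptive_counter` internals mimicked by a plain byte buffer; the opcode and counter values are placeholders, not the real encoding:

```rust
const LOAD_ATTR: u8 = 0x10; // placeholder for the generic opcode
const ADAPTIVE_BACKOFF_VALUE: u16 = 0x0035; // placeholder backoff counter

struct Code {
    units: Vec<u8>,
}

impl Code {
    fn replace_op(&mut self, instr_idx: usize, op: u8) {
        self.units[instr_idx] = op;
    }
    fn write_adaptive_counter(&mut self, cache_base: usize, value: u16) {
        self.units[cache_base..cache_base + 2].copy_from_slice(&value.to_le_bytes());
    }
    /// The extracted helper: one place that de-specializes back to the
    /// generic opcode and arms the backoff counter, instead of repeating
    /// both writes at every deopt site.
    fn deopt_load_attr(&mut self, instr_idx: usize, cache_base: usize) {
        self.replace_op(instr_idx, LOAD_ATTR);
        self.write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE);
    }
}

fn main() {
    let mut code = Code { units: vec![0xFF; 8] };
    code.deopt_load_attr(0, 2);
    assert_eq!(code.units[0], LOAD_ATTR);
    assert_eq!(
        u16::from_le_bytes([code.units[2], code.units[3]]),
        ADAPTIVE_BACKOFF_VALUE
    );
}
```

In the real code the helper body would keep the `unsafe` block around both writes, so the safety comment justifying the in-place bytecode mutation lives in exactly one place.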

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 821e648 and 79ca9c1.

📒 Files selected for processing (12)
  • .cspell.dict/cpython.txt
  • .cspell.json
  • crates/codegen/src/ir.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/bytecode/instruction.rs
  • crates/compiler-core/src/marshal.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/stdlib/sys/monitoring.rs
🚧 Files skipped from review as they are similar to previous changes (7)
  • crates/compiler-core/src/marshal.rs
  • .cspell.dict/cpython.txt
  • crates/codegen/src/ir.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/stdlib/sys/monitoring.rs
  • crates/vm/src/object/core.rs
  • .cspell.json

Comment on lines +602 to 663
/// Returns the function version, or 0 if invalidated.
#[inline]
pub fn func_version(&self) -> u32 {
self.func_version.load(Relaxed)
}

/// Check if this function is eligible for exact-args call specialization.
/// Returns true if: no VARARGS, no VARKEYWORDS, no kwonly args, not generator/coroutine,
/// and effective_nargs matches co_argcount.
pub(crate) fn can_specialize_call(&self, effective_nargs: u32) -> bool {
let code = self.code.lock();
let flags = code.flags;
flags.contains(bytecode::CodeFlags::NEWLOCALS)
&& !flags.intersects(
bytecode::CodeFlags::VARARGS
| bytecode::CodeFlags::VARKEYWORDS
| bytecode::CodeFlags::GENERATOR
| bytecode::CodeFlags::COROUTINE,
)
&& code.kwonlyarg_count == 0
&& code.arg_count == effective_nargs
}

/// Fast path for calling a simple function with exact positional args.
/// Skips FuncArgs allocation, prepend_arg, and fill_locals_from_args.
/// Only valid when: no VARARGS, no VARKEYWORDS, no kwonlyargs, not generator/coroutine,
/// and nargs == co_argcount.
pub fn invoke_exact_args(&self, args: &[PyObjectRef], vm: &VirtualMachine) -> PyResult {
let code = self.code.lock().clone();

let locals = ArgMapping::from_dict_exact(vm.ctx.new_dict());

let frame = Frame::new(
code.clone(),
Scope::new(Some(locals), self.globals.clone()),
self.builtins.clone(),
self.closure.as_ref().map_or(&[], |c| c.as_slice()),
Some(self.to_owned().into()),
vm,
)
.into_ref(&vm.ctx);

// Copy args directly into fastlocals
{
let fastlocals = unsafe { frame.fastlocals.borrow_mut() };
for (i, arg) in args.iter().enumerate() {
fastlocals[i] = Some(arg.clone());
}
}

// Handle cell2arg
if let Some(cell2arg) = code.cell2arg.as_deref() {
let fastlocals = unsafe { frame.fastlocals.borrow_mut() };
for (cell_idx, arg_idx) in cell2arg.iter().enumerate().filter(|(_, i)| **i != -1) {
let x = fastlocals[*arg_idx as usize].take();
frame.set_cell_contents(cell_idx, x);
}
}

vm.run_frame(frame)
}
}

⚠️ Potential issue | 🟡 Minor

invoke_exact_args() should enforce (at least in debug) its preconditions to avoid OOB writes/panics.

Since this is pub fn, add debug_assert! guards (or reduce visibility to pub(crate) if only used by the interpreter). At minimum:

  • debug_assert_eq!(args.len(), code.arg_count as usize)
  • debug_assert!(code.flags.contains(bytecode::CodeFlags::NEWLOCALS))
  • debug_assert!(code.kwonlyarg_count == 0) and no VARARGS/VARKEYWORDS/GENERATOR/COROUTINE
Minimal defensive diff (debug-only)
 pub fn invoke_exact_args(&self, args: &[PyObjectRef], vm: &VirtualMachine) -> PyResult {
     let code = self.code.lock().clone();
+
+    debug_assert_eq!(args.len(), code.arg_count as usize);
+    debug_assert!(code.flags.contains(bytecode::CodeFlags::NEWLOCALS));
+    debug_assert!(
+        !code.flags.intersects(
+            bytecode::CodeFlags::VARARGS
+                | bytecode::CodeFlags::VARKEYWORDS
+                | bytecode::CodeFlags::GENERATOR
+                | bytecode::CodeFlags::COROUTINE
+        )
+    );
+    debug_assert_eq!(code.kwonlyarg_count, 0);

     let locals = ArgMapping::from_dict_exact(vm.ctx.new_dict());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/function.rs` around lines 602 - 663, invoke_exact_args
currently assumes preconditions and can OOB panic; add debug-only guards to
assert those preconditions (or make the function non-public if only internal).
In the body of invoke_exact_args (and/or at its start) add debug_asserts:
debug_assert_eq!(args.len(), code.arg_count as usize);
debug_assert!(code.flags.contains(bytecode::CodeFlags::NEWLOCALS));
debug_assert_eq!(code.kwonlyarg_count, 0); and
debug_assert!(!code.flags.intersects(bytecode::CodeFlags::VARARGS |
bytecode::CodeFlags::VARKEYWORDS | bytecode::CodeFlags::GENERATOR |
bytecode::CodeFlags::COROUTINE)); this will prevent OOB writes during
development (or alternatively change pub fn invoke_exact_args to pub(crate) if
it is not externally used).

Comment on lines 684 to 686
*self.code.lock() = code;
// TODO: jit support
// #[cfg(feature = "jit")]
// {
// // If available, clear cached compiled code.
// let _ = self.jitted_code.take();
// }
self.func_version.store(0, Relaxed);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's look at the specific lines mentioned in the review comment
echo "=== Code at lines 684-686 and related lines (693-696, 703-706) ==="
sed -n '680,710p' crates/vm/src/builtins/function.rs

echo -e "\n=== Search for func_version usage patterns ==="
rg -n --type=rust '\bfunc_version\b' crates/vm/src/builtins/function.rs | head -30

echo -e "\n=== Search for func_version.store() calls ==="
rg -n --type=rust 'func_version\.store\(' crates/vm/src/builtins/function.rs

echo -e "\n=== Search for func_version increment/assignment patterns ==="
rg -n --type=rust 'func_version\s*[=+]' crates/vm/src

Repository: RustPython/RustPython

Length of output: 1810


🏁 Script executed:

#!/bin/bash
echo "=== Search for func_version() calls (reading the value) ==="
rg -n --type=rust 'func_version\(\)' crates/vm/src --max-count=20

echo -e "\n=== Search for FUNC_VERSION_COUNTER ==="
rg -n --type=rust 'FUNC_VERSION_COUNTER' crates/vm/src

echo -e "\n=== Search for version checks (== 0, != 0) ==="
rg -n --type=rust 'func_version.*[!=]=' crates/vm/src

echo -e "\n=== Check function.rs for caching logic ==="
rg -n --type=rust 'cache|version' crates/vm/src/builtins/function.rs | head -20

Repository: RustPython/RustPython

Length of output: 1551


🏁 Script executed:

#!/bin/bash
echo "=== Cache validation and storage context in frame.rs ==="
sed -n '2900,2950p' crates/vm/src/frame.rs
echo ""
sed -n '4680,4695p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 3484


🏁 Script executed:

#!/bin/bash
echo "=== Search for set___code__, set___defaults__, set___kwdefaults__ usage ==="
rg -n --type=rust 'set___(code|defaults|kwdefaults)' crates/vm/src --max-count=20

echo -e "\n=== Check if there's any doc or comment about version 0 behavior ==="
sed -n '600,610p' crates/vm/src/builtins/function.rs

echo -e "\n=== Look for any version reassignment logic ==="
rg -n --type=rust 'FUNC_VERSION_COUNTER|fetch_add' crates/vm/src/builtins/function.rs

Repository: RustPython/RustPython

Length of output: 1097


Fix cache invalidation design: func_version set to 0 has no mechanism to re-validate after function mutation.

The func_version field is documented as "0 if invalidated" and used as a cache key. Currently, setting it to 0 (in set___code__, set___defaults__, and set___kwdefaults__) is permanent: no mechanism ever reassigns a fresh non-zero version, so once a function object has been mutated, caches can never re-validate against it.

Options:

  • Store a fresh counter value instead of 0 by calling FUNC_VERSION_COUNTER.fetch_add(1, Relaxed) when invalidating, OR
  • Keep 0 as the sentinel but add a helper fn assign_func_version(&self) -> u32 and call it from cache validation hot paths when seeing 0.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/function.rs` around lines 684 - 686, The cache
invalidation currently sets func_version to 0 in setters like set___code__,
set___defaults__, and set___kwdefaults__, which permanently disables
re-validation; instead either (A) update those setters to assign a fresh
non-zero version by using FUNC_VERSION_COUNTER.fetch_add(1, Relaxed) when
invalidating func_version so subsequent validations can succeed, or (B) add a
helper fn assign_func_version(&self) -> u32 that generates/sets a non-zero
version (using FUNC_VERSION_COUNTER) and call that helper from the hot-path
cache validation code whenever func_version == 0 so the function can be
re-validated; update references to func_version, set___code__, set___defaults__,
set___kwdefaults__, and add/use assign_func_version or FUNC_VERSION_COUNTER
accordingly.

pub tp_version_tag: AtomicU32,
}

static NEXT_TYPE_VERSION: AtomicU32 = AtomicU32::new(1);

⚠️ Potential issue | 🟡 Minor

Define/verify global overflow policy for NEXT_TYPE_VERSION.

With the current checked_add, the counter will eventually reach u32::MAX and then assign_version_tag() will return 0 forever (effectively disabling new tags). That may be totally acceptable, but the doc comment says “all caches invalidated” — which is stronger than what the code actually enforces.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` at line 58, The global counter
NEXT_TYPE_VERSION currently uses checked_add in assign_version_tag which will
stop advancing once it hits u32::MAX and then always return 0; change the
overflow policy to a defined behavior (e.g., wrap back to 1) by replacing the
checked_add logic with an atomic update that advances the counter and, on
overflow, resets it to 1; implement this using AtomicU32::fetch_update or a loop
with compare_exchange on NEXT_TYPE_VERSION in assign_version_tag so that when
the current value is u32::MAX you store 1, otherwise you store current + 1,
ensuring the counter never permanently becomes 0 and the doc comment (“all
caches invalidated”) is upheld.

Comment on lines +843 to +845
// Invalidate inline caches
zelf.modified();


⚠️ Potential issue | 🟡 Minor

Calling modified() after __bases__ changes is correct; watch for stale subclass lists.

This makes sense for IC invalidation. One caveat: the method still has a TODO about removing the class from old bases' subclasses lists; stale entries inflate those lists and amplify both the invalidation cascades and the work they do.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 843 - 845, When updating a
class's __bases__ you already call zelf.modified() for IC invalidation, but you
must also remove the class from the old bases' subclasses lists and add it to
the new bases' subclasses to avoid stale subclass entries; update the code path
that changes __bases__ (the function that mutates __bases__) to iterate
old_bases and call something like old_base.subclasses.remove(zelf) and then for
each new_base add zelf to new_base.subclasses, then call zelf.modified() and
ensure any TODO about removing from old bases' subclasses is implemented
accordingly.

@youknowone youknowone merged commit c315033 into RustPython:main Mar 2, 2026
22 of 24 checks passed
@youknowone youknowone deleted the load-attr branch March 2, 2026 07:36
@coderabbitai coderabbitai bot mentioned this pull request Mar 3, 2026