Implement LOAD_ATTR inline caching with adaptive specialization#7292
youknowone merged 6 commits into RustPython:main from load-attr
Conversation
📝 Walkthrough

Implements adaptive specialization and inline caching: adds per-function/type versioning, adaptive counters and quickening, instruction deoptimization, cache read/write helpers, and integrates these into frame execution, code objects, serialization, and de-instrumentation paths.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Frame as ExecutingFrame
    participant Instr as Instruction
    participant Code as CodeObject
    participant Cache as CacheStorage
    participant Type as PyType/FuncVersion
    Code->>Frame: provide bytecode (_co_code_adaptive)
    Frame->>Instr: fetch opcode (adaptive)
    Instr->>Cache: read adaptive counter
    alt counter > 0
        Cache->>Frame: decrement counter, execute generic opcode
    else counter == 0
        Frame->>Type: check version_tag / func_version
        Cache->>Frame: read cached descriptor/target
        alt cache hit & version matches
            Frame->>Instr: specialize opcode in-place
            Frame->>Type: invoke specialized fast-path (invoke_exact_args / fast attr)
        else cache miss or mismatch
            Frame->>Cache: store new cache entry, reset warmup
            Instr->>Instr: replace with specialized variant or deopt back to base
        end
    end
```
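The warmup/specialize/deoptimize cycle in the diagram above can be modeled as a tiny standalone state machine. This is a toy sketch: the enum, constants, and return strings are all illustrative and do not correspond to RustPython's actual `Instruction` type or cache layout.

```rust
// Toy model of the adaptive-specialization loop: a generic opcode warms up
// via a counter, specializes in place against a type version tag, and
// deoptimizes with a backoff counter when the version guard fails.

const ADAPTIVE_WARMUP: u8 = 2; // executions before attempting to specialize
const ADAPTIVE_BACKOFF: u8 = 8; // cool-down after a deopt

#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    LoadAttrAdaptive { counter: u8 },
    LoadAttrSpecialized { cached_version: u32 },
}

fn execute(op: &mut Op, current_version: u32) -> &'static str {
    match *op {
        // Still warming up: count down and take the generic path.
        Op::LoadAttrAdaptive { counter } if counter > 0 => {
            *op = Op::LoadAttrAdaptive { counter: counter - 1 };
            "generic"
        }
        // Counter exhausted: specialize against the current version tag.
        Op::LoadAttrAdaptive { .. } => {
            *op = Op::LoadAttrSpecialized { cached_version: current_version };
            "specialized-now"
        }
        // Guard holds: cached fast path.
        Op::LoadAttrSpecialized { cached_version } if cached_version == current_version => {
            "fast-path"
        }
        // Guard failed (type was mutated): deopt back with backoff.
        Op::LoadAttrSpecialized { .. } => {
            *op = Op::LoadAttrAdaptive { counter: ADAPTIVE_BACKOFF };
            "deopt"
        }
    }
}

fn main() {
    let mut op = Op::LoadAttrAdaptive { counter: ADAPTIVE_WARMUP };
    assert_eq!(execute(&mut op, 7), "generic");
    assert_eq!(execute(&mut op, 7), "generic");
    assert_eq!(execute(&mut op, 7), "specialized-now");
    assert_eq!(execute(&mut op, 7), "fast-path");
    // A version bump (type mutation) invalidates the cached entry.
    assert_eq!(execute(&mut op, 8), "deopt");
}
```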
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Code has been automatically formatted. To sync your local branch, run `git pull origin load-attr`.
Force-pushed e136814 to ddb5528
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/builtins/type.rs (1)
1923-1945: ⚠️ Potential issue | 🟠 Major: Descriptor-setter path skips `modified()` invalidation.

At line 1926, `setattro` returns immediately after `descr_set`, so line 1944 never runs. Descriptor-backed setters that mutate type attributes (for example `set___module__`) can leave inline caches stale.

🛠️ Proposed fix
```diff
 if let Some(attr) = zelf.get_class_attr(attr_name) {
     let descr_set = attr.class().slots.descr_set.load();
     if let Some(descriptor) = descr_set {
-        return descriptor(&attr, zelf.to_owned().into(), value, vm);
+        descriptor(&attr, zelf.to_owned().into(), value, vm)?;
+        zelf.modified();
+        return Ok(());
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 1923 - 1945, The descriptor-path in setattro uses an early return when calling the descriptor (via descr_set and descriptor(&attr, zelf.to_owned().into(), value, vm)), so zelf.modified() is never executed for descriptor-backed setters; change that return into capturing the descriptor call result (e.g. let res = descriptor(...) ), call zelf.modified() before returning, and then return res (or propagate the error) so that any mutations performed by descriptor setters still trigger the inline-cache invalidation; ensure this touches the block using zelf.get_class_attr, descr_set, descriptor(...), and the existing zelf.modified() call.
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)
2679-2687: Extract duplicated deopt blocks into helpers.

These branches all perform the same deopt sequence (`replace_op` + `write_adaptive_counter`). A small helper per opcode family would reduce repetition and drift risk.

As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
Also applies to: 2719-2726, 2751-2758, 2884-2894, 2922-2932
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 2679 - 2687, Several branches perform the same deopt sequence (calling self.code.instructions.replace_op(...) followed by write_adaptive_counter(...)) and should be consolidated: add a small helper on the Frame (or Instructions) such as deopt_replace_and_backoff(&mut self, instr_idx: usize, new_op: Instruction, cache_base: usize) that calls self.code.instructions.replace_op(instr_idx, new_op) and then self.code.instructions.write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE); then replace the duplicated blocks (e.g., the places that call replace_op with Instruction::LoadAttr { idx: Arg::marker() } and similar variants) to compute the differing value (the Instruction variant and cache_base) and call the helper instead to remove duplication and keep the same behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 555-571: The function quicken holds a shared borrow (let units =
unsafe { &*self.0.get() }) across the loop while calling write_adaptive_counter
which performs a mutable borrow, causing aliasing UB; fix by limiting the shared
borrow's lifetime so you only read what you need (e.g., copy units[i].op, call
op.cache_entries() and op.is_instrumented() into locals) while the shared borrow
is active, then drop that borrow (end the unsafe &*self.0.get() scope) before
invoking write_adaptive_counter(cache_base, ADAPTIVE_WARMUP_VALUE); in short,
ensure quicken does not keep units alive across the mutable call to
write_adaptive_counter.
In `@crates/compiler-core/src/bytecode/instruction.rs`:
- Around line 515-629: deoptimize() currently falls through for JumpBackwardJit
and JumpBackwardNoJit which leaves them specialized instead of mapping back to a
base opcode; add explicit match arms for Self::JumpBackwardJit and
Self::JumpBackwardNoJit to return the appropriate base (e.g. Self::JumpBackward
{ target: Arg::marker() } or Self::JumpBackwardNoInterrupt { target:
Arg::marker() } depending on intended semantics), similar to other
specialization branches, so marshaling via original_bytes() and PEP 669
monitoring see only base opcodes; reference deoptimize(), JumpBackwardJit,
JumpBackwardNoJit, JumpBackward, JumpBackwardNoInterrupt and to_base() when
making this change and verify mapping choice against NoJit semantics.
In `@crates/vm/src/builtins/function.rs`:
- Around line 611-621: can_specialize_call currently allows exact-args fast-path
for code objects that may rely on module globals when NEWLOCALS is unset,
changing semantics; update can_specialize_call to also require the
bytecode::CodeFlags::NEWLOCALS flag (i.e., only return true when NEWLOCALS is
set) so invoke_exact_args and invoke_with_locals remain equivalent, and apply
the same NEWLOCALS guard to the similar specialization logic in the other block
referenced around lines 627-660 (where the same flags and arg count checks are
performed).
In `@crates/vm/src/builtins/type.rs`:
- Around line 202-210: The current assign_version_tag uses
NEXT_TYPE_VERSION.fetch_add and treats the returned previous value v as the
assigned tag, which fails on wraparound because tags can become 0 or be reused;
change logic to compute the new tag as v.wrapping_add(1) (call it new_v), check
if new_v == 0 (wrap occurred) and in that case store 0 into self.tp_version_tag
and return 0, otherwise store new_v with Ordering::Release and return new_v;
update references in assign_version_tag to use new_v instead of v and keep
atomic ordering semantics.
In `@crates/vm/src/frame.rs`:
- Around line 2702-2704: The instance-dict lookup currently swallows errors via
`.ok().flatten()` which hides lookup exceptions and can wrongly take the fast
cached path; change the `shadowed` computation (the `if let Some(dict) =
owner.dict() { dict.get_item_opt(attr_name, vm).ok().flatten().is_some() } else
{ ... }` code) to let lookup errors propagate instead of being dropped — e.g.
call `dict.get_item_opt(attr_name, vm)?` and check `.is_some()` (or explicitly
match `Ok(Some(_)) / Ok(None) / Err(e) => return Err(e)`), so attribute lookup
failures from `get_item_opt` bubble up rather than being ignored.
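The `.ok().flatten()` hazard described for the shadow check can be shown with a small standalone sketch. The types here are simplified stand-ins, not the VM's actual `get_item_opt` signature: the point is only that swallowing the `Err` makes a raising lookup indistinguishable from "attribute absent".

```rust
// A lookup that can genuinely fail (e.g. a raising __eq__/__hash__ during
// dict probing), modeled with a String error.
fn get_item_opt(key: &str) -> Result<Option<i32>, String> {
    match key {
        "boom" => Err("lookup raised".to_string()),
        "x" => Ok(Some(1)),
        _ => Ok(None),
    }
}

// Buggy pattern: errors collapse into "not shadowed", so the fast cached
// path could be taken even though the lookup actually raised.
fn shadowed_swallowing(key: &str) -> bool {
    get_item_opt(key).ok().flatten().is_some()
}

// Fixed pattern: `?` propagates the error to the caller.
fn shadowed_propagating(key: &str) -> Result<bool, String> {
    Ok(get_item_opt(key)?.is_some())
}

fn main() {
    assert!(shadowed_swallowing("x"));
    assert!(!shadowed_swallowing("boom")); // error silently dropped
    assert_eq!(shadowed_propagating("x"), Ok(true));
    assert!(shadowed_propagating("boom").is_err()); // error surfaces
}
```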
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- .cspell.dict/cpython.txt
- .cspell.json
- crates/codegen/src/ir.rs
- crates/compiler-core/src/bytecode.rs
- crates/compiler-core/src/bytecode/instruction.rs
- crates/compiler-core/src/marshal.rs
- crates/vm/src/builtins/code.rs
- crates/vm/src/builtins/function.rs
- crates/vm/src/builtins/type.rs
- crates/vm/src/frame.rs
- crates/vm/src/object/core.rs
- crates/vm/src/stdlib/sys/monitoring.rs
```rust
/// Assign a fresh version tag. Returns 0 on overflow (all caches invalidated).
pub fn assign_version_tag(&self) -> u32 {
    let v = NEXT_TYPE_VERSION.fetch_add(1, Ordering::Relaxed);
    if v == 0 {
        return 0;
    }
    self.tp_version_tag.store(v, Ordering::Release);
    v
}
```
Version-tag overflow can reintroduce stale tag matches.
After counter wraparound, non-zero tags can be reused. Returning 0 only when fetch_add returns 0 does not prevent later tag reuse.
🛠️ Proposed fix

```diff
 pub fn assign_version_tag(&self) -> u32 {
     let v = NEXT_TYPE_VERSION.fetch_add(1, Ordering::Relaxed);
-    if v == 0 {
+    if v == 0 || v == u32::MAX {
+        NEXT_TYPE_VERSION.store(0, Ordering::Relaxed);
+        self.tp_version_tag.store(0, Ordering::Release);
         return 0;
     }
     self.tp_version_tag.store(v, Ordering::Release);
     v
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
/// Assign a fresh version tag. Returns 0 on overflow (all caches invalidated).
pub fn assign_version_tag(&self) -> u32 {
    let v = NEXT_TYPE_VERSION.fetch_add(1, Ordering::Relaxed);
    if v == 0 || v == u32::MAX {
        NEXT_TYPE_VERSION.store(0, Ordering::Relaxed);
        self.tp_version_tag.store(0, Ordering::Release);
        return 0;
    }
    self.tp_version_tag.store(v, Ordering::Release);
    v
}
```
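As a runnable illustration of overflow-safe tag assignment, here is a sketch using `fetch_update`: the closure returns `None` once the counter is exhausted, so the counter freezes at `u32::MAX` and tags are never reused. Names mirror the snippet above, but this is a standalone sketch, not the PR's final code.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

static NEXT_TYPE_VERSION: AtomicU32 = AtomicU32::new(1);

// Returns a fresh non-zero tag, or 0 once the version space is exhausted
// ("unversioned": callers must skip inline caching for such types).
fn assign_version_tag(counter: &AtomicU32) -> u32 {
    counter
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| cur.checked_add(1))
        // fetch_update yields Err(current) when the closure returns None,
        // i.e. on overflow; map that to the sentinel 0.
        .unwrap_or(0)
}

fn main() {
    assert_eq!(assign_version_tag(&NEXT_TYPE_VERSION), 1);
    assert_eq!(assign_version_tag(&NEXT_TYPE_VERSION), 2);

    // Near the end of the space, the counter stops handing out tags
    // instead of wrapping around and reusing old ones.
    let near_max = AtomicU32::new(u32::MAX - 1);
    assert_eq!(assign_version_tag(&near_max), u32::MAX - 1);
    assert_eq!(assign_version_tag(&near_max), 0); // exhausted
    assert_eq!(assign_version_tag(&near_max), 0); // stays exhausted
}
```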
Add type version counter (tp_version_tag) to PyType with subclass invalidation cascade. Add cache read/write methods (u16/u32/u64) to CodeUnits. Implement adaptive specialization in load_attr that replaces the opcode with specialized variants on first execution:
- LoadAttrMethodNoDict: cached method lookup for slotted types
- LoadAttrMethodWithValues: cached method with dict shadow check
- LoadAttrInstanceValue: direct dict lookup skipping descriptors

Specialized opcodes guard on type_version_tag and deoptimize back to generic LOAD_ATTR with backoff counter on cache miss.
BINARY_OP: Specialize int add/subtract/multiply and float add/subtract/multiply with type guards and deoptimization.

CALL: Add func_version to PyFunction, specialize simple function calls (CallPyExactArgs, CallBoundMethodExactArgs) with invoke_exact_args fast path that skips FuncArgs allocation and fill_locals_from_args.
Move counter initialization from compile-time to RESUME execution, matching CPython's _PyCode_Quicken pattern. Store counter in CACHE entry's arg byte to preserve op=Instruction::Cache for dis/JIT. Add PyCode.quickened flag for one-time initialization.
- deoptimize() maps specialized opcodes back to their base adaptive variant
- original_bytes() produces deoptimized bytecode with zeroed CACHE entries
- co_code now returns deoptimized bytes, _co_code_adaptive returns current bytes
- Marshal serialization uses original_bytes() instead of raw transmute
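The deoptimization mapping this commit describes can be sketched as a standalone toy. The opcode names below are simplified stand-ins for the real `Instruction` enum; the point is only the shape of the mapping: every specialized variant must return its base opcode so marshaling and monitoring never observe quickened forms.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Instr {
    LoadAttr,
    LoadAttrInstanceValue, // specialized LOAD_ATTR variant
    JumpBackward,
    JumpBackwardJit,   // specialized jump variant
    JumpBackwardNoJit, // specialized jump variant
}

impl Instr {
    // Map any specialized variant back to its base opcode; base opcodes
    // map to themselves.
    fn deoptimize(self) -> Instr {
        match self {
            Instr::LoadAttrInstanceValue => Instr::LoadAttr,
            // Without explicit arms like these, specialized jumps would
            // fall through and survive deoptimization.
            Instr::JumpBackwardJit | Instr::JumpBackwardNoJit => Instr::JumpBackward,
            base => base,
        }
    }
}

fn main() {
    assert_eq!(Instr::JumpBackwardJit.deoptimize(), Instr::JumpBackward);
    assert_eq!(Instr::JumpBackwardNoJit.deoptimize(), Instr::JumpBackward);
    assert_eq!(Instr::LoadAttrInstanceValue.deoptimize(), Instr::LoadAttr);
    assert_eq!(Instr::LoadAttr.deoptimize(), Instr::LoadAttr);
}
```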
- cache_entries() returns correct count for instrumented opcodes
- deoptimize() maps instrumented opcodes back to base
- quicken() skips adaptive counter for instrumented opcodes
- instrument_code Phase 3 deoptimizes specialized opcodes and clears CACHE entries to prevent stale pointer dereferences
- Add bounds checks to read_cache_u16/u32/u64
- Fix quicken() aliasing UB by using &mut directly
- Add JumpBackwardJit/JumpBackwardNoJit to deoptimize()
- Guard can_specialize_call with NEWLOCALS flag check
- Use compare_exchange_weak for version tag to prevent wraparound
- Propagate dict lookup errors in LoadAttrMethodWithValues
- Apply adaptive backoff on version tag assignment failure
- Remove duplicate imports in frame.rs
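The "counter stored in the CACHE entry's arg byte" scheme mentioned in the commits above can be sketched as follows. The struct layout, constant, and big-endian split across two entries are assumptions for illustration, not RustPython's actual encoding; the sketch only shows why the op byte survives while the counter lives in the arg bytes.

```rust
#[derive(Clone, Copy)]
struct CodeUnit {
    op: u8,  // stays the CACHE opcode even while holding counter bits
    arg: u8, // one byte of the adaptive counter
}

const CACHE_OPCODE: u8 = 0; // hypothetical opcode value for CACHE entries

// Split a u16 counter across the arg bytes of two consecutive CACHE entries.
fn write_counter(units: &mut [CodeUnit], base: usize, counter: u16) {
    units[base].arg = (counter >> 8) as u8;
    units[base + 1].arg = (counter & 0xff) as u8;
}

fn read_counter(units: &[CodeUnit], base: usize) -> u16 {
    ((units[base].arg as u16) << 8) | units[base + 1].arg as u16
}

fn main() {
    let mut units = [CodeUnit { op: CACHE_OPCODE, arg: 0 }; 4];
    write_counter(&mut units, 1, 0xABCD);
    assert_eq!(read_counter(&units, 1), 0xABCD);
    // The op bytes are untouched, so dis/JIT still see CACHE entries.
    assert!(units.iter().all(|u| u.op == CACHE_OPCODE));
}
```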
Actionable comments posted: 4
♻️ Duplicate comments (1)
crates/vm/src/frame.rs (1)
2722-2738: ⚠️ Potential issue | 🟠 Major: Propagate dict lookup errors instead of retrying via slow path.

The `Err(_)` arm deoptimizes and re-runs lookup through `load_attr_slow`, which drops the original error and can duplicate side effects.

💡 Proposed fix
```diff
-Err(_) => {
-    // Dict lookup error → deoptimize to safe path
-    unsafe {
-        self.code.instructions.replace_op(
-            instr_idx,
-            Instruction::LoadAttr { idx: Arg::marker() },
-        );
-        self.code
-            .instructions
-            .write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE);
-    }
-    return self.load_attr_slow(vm, oparg);
-}
+Err(e) => {
+    // Dict lookup error should propagate; keep deopt for future runs.
+    unsafe {
+        self.code.instructions.replace_op(
+            instr_idx,
+            Instruction::LoadAttr { idx: Arg::marker() },
+        );
+        self.code
+            .instructions
+            .write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE);
+    }
+    return Err(e);
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 2722 - 2738, The dict lookup Err branch currently swallows the original error, deoptimizes, and re-runs via load_attr_slow; instead capture and propagate the original error from dict.get_item_opt(attr_name, vm) so callers see the original failure and side effects are not duplicated. Replace the Err(_) arm in the match over dict.get_item_opt with Err(err) => return Err(err) (or use the ? operator) rather than calling self.code.instructions.replace_op / write_adaptive_counter and return self.load_attr_slow; leave the deoptimization path only for non-error conditions where you intentionally want to fall back to the safe path.
🧹 Nitpick comments (4)
crates/vm/src/builtins/type.rs (3)
54-56: Avoid exposing `tp_version_tag: AtomicU32` as a public writable field (easy invariant break).

Making the raw atomic public means any module can store arbitrary values (including non-0 "invalid" states) and bypass invalidation rules. Prefer `pub(crate)` + accessor(s), or keep it private and expose only `load_tp_version_tag()` / `invalidate_tp_version_tag()` / `assign_version_tag()`.
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 54 - 56, The field tp_version_tag: AtomicU32 is currently public and allows external code to break invariants by writing arbitrary values; change its visibility to non-public (make it private or pub(crate)) and add controlled accessors such as load_tp_version_tag(&self) -> u32, assign_version_tag(&self, new: u32) to set a valid tag, and invalidate_tp_version_tag(&self) to set it to 0, ensuring writers use AtomicU32 methods internally and enforcing any validation/ordering semantics; update call sites that reference tp_version_tag directly to use these new methods.
219-228: `modified()` should avoid `unwrap()` + consider iterative traversal to prevent deep recursion.

`sub.downcast_ref::<PyType>().unwrap()` will panic if `subclasses` ever contains a non-`PyType` (even transiently). Also, recursive subclass traversal can stack overflow in pathological hierarchies. An iterative stack/VecDeque walk is safer.
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 219 - 228, In modified(), avoid unwrap and recursion: replace the recursive call to sub.downcast_ref::<PyType>().unwrap().modified() with an iterative traversal using a stack/VecDeque of upgraded PyType refs; for each upgraded object, call downcast_ref::<PyType>() and if it returns Some, store its tp_version_tag with Ordering::Release and push its subclasses' weak refs (upgrading them when needed) onto the deque, otherwise skip that entry (do not panic). Keep the initial tp_version_tag.store(0, Ordering::Release) behavior for self and ensure you only call modified-equivalent logic on successfully downcasted PyType instances.
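The iterative traversal suggested above might look like the following standalone sketch. Plain arena indices stand in for the real weak `PyType` references, and a plain `u32` stands in for the atomic `tp_version_tag`; only the worklist shape carries over.

```rust
use std::collections::VecDeque;

struct Ty {
    version: u32,
    subclasses: Vec<usize>, // indices into the arena, standing in for weak refs
}

// Invalidate a type and its entire subclass cascade without recursion,
// so arbitrarily deep hierarchies cannot overflow the call stack.
fn modified(arena: &mut Vec<Ty>, root: usize) {
    let mut work = VecDeque::from([root]);
    while let Some(idx) = work.pop_front() {
        arena[idx].version = 0; // invalidate this type's caches
        let subs = arena[idx].subclasses.clone();
        work.extend(subs); // enqueue children instead of recursing
    }
}

fn main() {
    // Chain A -> B -> C
    let mut arena = vec![
        Ty { version: 10, subclasses: vec![1] },
        Ty { version: 11, subclasses: vec![2] },
        Ty { version: 12, subclasses: vec![] },
    ];
    modified(&mut arena, 0);
    assert!(arena.iter().all(|t| t.version == 0));
}
```

In the real code, a failed `downcast_ref::<PyType>()` on a worklist entry would simply be skipped rather than unwrapped, which removes the panic path as well.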
202-217: Simplify `assign_version_tag()` with `fetch_update` to reduce spin-loop complexity.

The current CAS loop pattern is correct, but `fetch_update` reduces boilerplate and makes the code less error-prone:

Proposed simplification

```diff
 pub fn assign_version_tag(&self) -> u32 {
-    loop {
-        let current = NEXT_TYPE_VERSION.load(Ordering::Relaxed);
-        let Some(next) = current.checked_add(1) else {
-            return 0; // Overflow: version space exhausted
-        };
-        if NEXT_TYPE_VERSION
-            .compare_exchange_weak(current, next, Ordering::Relaxed, Ordering::Relaxed)
-            .is_ok()
-        {
-            self.tp_version_tag.store(current, Ordering::Release);
-            return current;
-        }
-    }
+    let assigned = NEXT_TYPE_VERSION
+        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |current| current.checked_add(1))
+        .ok()
+        .unwrap_or(0);
+
+    if assigned == 0 {
+        return 0;
+    }
+    self.tp_version_tag.store(assigned, Ordering::Release);
+    assigned
 }
```

Note: Overflow (returning 0) is intentional: unversioned types don't use inline caching per the guard checks in frame.rs, and the backoff mechanism prevents re-attempting on overflow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 202 - 217, Replace the explicit CAS spin-loop in assign_version_tag() with AtomicU32::fetch_update on NEXT_TYPE_VERSION: call fetch_update(Relaxed, Relaxed, |current| current.checked_add(1).ok_or(current)) and handle the Result—on Ok(prev) store prev into self.tp_version_tag with Ordering::Release and return prev; on Err(_) treat as overflow and return 0. Keep the same ordering semantics and the overflow behavior (return 0) and reference NEXT_TYPE_VERSION and tp_version_tag in your change.

crates/vm/src/frame.rs (1)
2699-2707: Extract duplicated LOAD_ATTR deopt sequence into a helper.

The same deopt/backoff block is repeated in multiple branches; consolidating it will reduce drift risk.

♻️ Refactor sketch

```diff
+    #[inline]
+    fn deoptimize_load_attr(&mut self, instr_idx: usize, cache_base: usize) {
+        unsafe {
+            self.code
+                .instructions
+                .replace_op(instr_idx, Instruction::LoadAttr { idx: Arg::marker() });
+            self.code
+                .instructions
+                .write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE);
+        }
+    }
 ...
-            unsafe {
-                self.code
-                    .instructions
-                    .replace_op(instr_idx, Instruction::LoadAttr { idx: Arg::marker() });
-                self.code
-                    .instructions
-                    .write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE);
-            }
+            self.deoptimize_load_attr(instr_idx, cache_base);
```

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
Also applies to: 2755-2762, 2787-2794
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 2699 - 2707, Duplicate deopt/backoff logic for LOAD_ATTR should be extracted into a single helper to avoid repetition: create a method (e.g., deopt_load_attr or write_load_attr_deopt) on the same type as self that takes the differing values (instr_idx and cache_base), and move the unsafe block that calls self.code.instructions.replace_op(... Instruction::LoadAttr { idx: Arg::marker() }) and self.code.instructions.write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE) into that helper; then replace each repeated block with a call to this helper passing instr_idx and cache_base. Ensure the helper preserves the unsafe semantics and references Instruction::LoadAttr, Arg::marker(), write_adaptive_counter, and ADAPTIVE_BACKOFF_VALUE.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 602-663: invoke_exact_args currently assumes preconditions and can
OOB panic; add debug-only guards to assert those preconditions (or make the
function non-public if only internal). In the body of invoke_exact_args (and/or
at its start) add debug_asserts: debug_assert_eq!(args.len(), code.arg_count as
usize); debug_assert!(code.flags.contains(bytecode::CodeFlags::NEWLOCALS));
debug_assert_eq!(code.kwonlyarg_count, 0); and
debug_assert!(!code.flags.intersects(bytecode::CodeFlags::VARARGS |
bytecode::CodeFlags::VARKEYWORDS | bytecode::CodeFlags::GENERATOR |
bytecode::CodeFlags::COROUTINE)); this will prevent OOB writes during
development (or alternatively change pub fn invoke_exact_args to pub(crate) if
it is not externally used).
- Around line 684-686: The cache invalidation currently sets func_version to 0
in setters like set___code__, set___defaults__, and set___kwdefaults__, which
permanently disables re-validation; instead either (A) update those setters to
assign a fresh non-zero version by using FUNC_VERSION_COUNTER.fetch_add(1,
Relaxed) when invalidating func_version so subsequent validations can succeed,
or (B) add a helper fn assign_func_version(&self) -> u32 that generates/sets a
non-zero version (using FUNC_VERSION_COUNTER) and call that helper from the
hot-path cache validation code whenever func_version == 0 so the function can be
re-validated; update references to func_version, set___code__, set___defaults__,
set___kwdefaults__, and add/use assign_func_version or FUNC_VERSION_COUNTER
accordingly.
In `@crates/vm/src/builtins/type.rs`:
- Around line 843-845: When updating a class's __bases__ you already call
zelf.modified() for IC invalidation, but you must also remove the class from the
old bases' subclasses lists and add it to the new bases' subclasses to avoid
stale subclass entries; update the code path that changes __bases__ (the
function that mutates __bases__) to iterate old_bases and call something like
old_base.subclasses.remove(zelf) and then for each new_base add zelf to
new_base.subclasses, then call zelf.modified() and ensure any TODO about
removing from old bases' subclasses is implemented accordingly.
- Line 58: The global counter NEXT_TYPE_VERSION currently uses checked_add in
assign_version_tag which will stop advancing once it hits u32::MAX and then
always return 0; change the overflow policy to a defined behavior (e.g., wrap
back to 1) by replacing the checked_add logic with an atomic update that
advances the counter and, on overflow, resets it to 1; implement this using
AtomicU32::fetch_update or a loop with compare_exchange on NEXT_TYPE_VERSION in
assign_version_tag so that when the current value is u32::MAX you store 1,
otherwise you store current + 1, ensuring the counter never permanently becomes
0 and the doc comment (“all caches invalidated”) is upheld.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- .cspell.dict/cpython.txt
- .cspell.json
- crates/codegen/src/ir.rs
- crates/compiler-core/src/bytecode.rs
- crates/compiler-core/src/bytecode/instruction.rs
- crates/compiler-core/src/marshal.rs
- crates/vm/src/builtins/code.rs
- crates/vm/src/builtins/function.rs
- crates/vm/src/builtins/type.rs
- crates/vm/src/frame.rs
- crates/vm/src/object/core.rs
- crates/vm/src/stdlib/sys/monitoring.rs
🚧 Files skipped from review as they are similar to previous changes (7)
- crates/compiler-core/src/marshal.rs
- .cspell.dict/cpython.txt
- crates/codegen/src/ir.rs
- crates/vm/src/builtins/code.rs
- crates/vm/src/stdlib/sys/monitoring.rs
- crates/vm/src/object/core.rs
- .cspell.json
```rust
    /// Returns the function version, or 0 if invalidated.
    #[inline]
    pub fn func_version(&self) -> u32 {
        self.func_version.load(Relaxed)
    }

    /// Check if this function is eligible for exact-args call specialization.
    /// Returns true if: no VARARGS, no VARKEYWORDS, no kwonly args, not generator/coroutine,
    /// and effective_nargs matches co_argcount.
    pub(crate) fn can_specialize_call(&self, effective_nargs: u32) -> bool {
        let code = self.code.lock();
        let flags = code.flags;
        flags.contains(bytecode::CodeFlags::NEWLOCALS)
            && !flags.intersects(
                bytecode::CodeFlags::VARARGS
                    | bytecode::CodeFlags::VARKEYWORDS
                    | bytecode::CodeFlags::GENERATOR
                    | bytecode::CodeFlags::COROUTINE,
            )
            && code.kwonlyarg_count == 0
            && code.arg_count == effective_nargs
    }

    /// Fast path for calling a simple function with exact positional args.
    /// Skips FuncArgs allocation, prepend_arg, and fill_locals_from_args.
    /// Only valid when: no VARARGS, no VARKEYWORDS, no kwonlyargs, not generator/coroutine,
    /// and nargs == co_argcount.
    pub fn invoke_exact_args(&self, args: &[PyObjectRef], vm: &VirtualMachine) -> PyResult {
        let code = self.code.lock().clone();

        let locals = ArgMapping::from_dict_exact(vm.ctx.new_dict());

        let frame = Frame::new(
            code.clone(),
            Scope::new(Some(locals), self.globals.clone()),
            self.builtins.clone(),
            self.closure.as_ref().map_or(&[], |c| c.as_slice()),
            Some(self.to_owned().into()),
            vm,
        )
        .into_ref(&vm.ctx);

        // Copy args directly into fastlocals
        {
            let fastlocals = unsafe { frame.fastlocals.borrow_mut() };
            for (i, arg) in args.iter().enumerate() {
                fastlocals[i] = Some(arg.clone());
            }
        }

        // Handle cell2arg
        if let Some(cell2arg) = code.cell2arg.as_deref() {
            let fastlocals = unsafe { frame.fastlocals.borrow_mut() };
            for (cell_idx, arg_idx) in cell2arg.iter().enumerate().filter(|(_, i)| **i != -1) {
                let x = fastlocals[*arg_idx as usize].take();
                frame.set_cell_contents(cell_idx, x);
            }
        }

        vm.run_frame(frame)
    }
}
```
invoke_exact_args() should enforce (at least in debug) its preconditions to avoid OOB writes/panics.
Since this is pub fn, add debug_assert! guards (or reduce visibility to pub(crate) if only used by the interpreter). At minimum:
- `debug_assert_eq!(args.len(), code.arg_count as usize)`
- `debug_assert!(code.flags.contains(bytecode::CodeFlags::NEWLOCALS))`
- `debug_assert!(code.kwonlyarg_count == 0)`
- no VARARGS/VARKEYWORDS/GENERATOR/COROUTINE flags
Minimal defensive diff (debug-only)
```diff
 pub fn invoke_exact_args(&self, args: &[PyObjectRef], vm: &VirtualMachine) -> PyResult {
     let code = self.code.lock().clone();
+
+    debug_assert_eq!(args.len(), code.arg_count as usize);
+    debug_assert!(code.flags.contains(bytecode::CodeFlags::NEWLOCALS));
+    debug_assert!(
+        !code.flags.intersects(
+            bytecode::CodeFlags::VARARGS
+                | bytecode::CodeFlags::VARKEYWORDS
+                | bytecode::CodeFlags::GENERATOR
+                | bytecode::CodeFlags::COROUTINE
+        )
+    );
+    debug_assert_eq!(code.kwonlyarg_count, 0);
     let locals = ArgMapping::from_dict_exact(vm.ctx.new_dict());
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/function.rs` around lines 602 - 663, invoke_exact_args
currently assumes preconditions and can OOB panic; add debug-only guards to
assert those preconditions (or make the function non-public if only internal).
In the body of invoke_exact_args (and/or at its start) add debug_asserts:
debug_assert_eq!(args.len(), code.arg_count as usize);
debug_assert!(code.flags.contains(bytecode::CodeFlags::NEWLOCALS));
debug_assert_eq!(code.kwonlyarg_count, 0); and
debug_assert!(!code.flags.intersects(bytecode::CodeFlags::VARARGS |
bytecode::CodeFlags::VARKEYWORDS | bytecode::CodeFlags::GENERATOR |
bytecode::CodeFlags::COROUTINE)); this will prevent OOB writes during
development (or alternatively change pub fn invoke_exact_args to pub(crate) if
it is not externally used).
```rust
*self.code.lock() = code;
// TODO: jit support
// #[cfg(feature = "jit")]
// {
//     // If available, clear cached compiled code.
//     let _ = self.jitted_code.take();
// }
self.func_version.store(0, Relaxed);
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# First, let's look at the specific lines mentioned in the review comment
echo "=== Code at lines 684-686 and related lines (693-696, 703-706) ==="
sed -n '680,710p' crates/vm/src/builtins/function.rs
echo -e "\n=== Search for func_version usage patterns ==="
rg -n --type=rust '\bfunc_version\b' crates/vm/src/builtins/function.rs | head -30
echo -e "\n=== Search for func_version.store() calls ==="
rg -n --type=rust 'func_version\.store\(' crates/vm/src/builtins/function.rs
echo -e "\n=== Search for func_version increment/assignment patterns ==="
rg -n --type=rust 'func_version\s*[=+]' crates/vm/src
```

Repository: RustPython/RustPython
Length of output: 1810
🏁 Script executed:

```bash
#!/bin/bash
echo "=== Search for func_version() calls (reading the value) ==="
rg -n --type=rust 'func_version\(\)' crates/vm/src --max-count=20
echo -e "\n=== Search for FUNC_VERSION_COUNTER ==="
rg -n --type=rust 'FUNC_VERSION_COUNTER' crates/vm/src
echo -e "\n=== Search for version checks (== 0, != 0) ==="
rg -n --type=rust 'func_version.*[!=]=' crates/vm/src
echo -e "\n=== Check function.rs for caching logic ==="
rg -n --type=rust 'cache|version' crates/vm/src/builtins/function.rs | head -20
```

Repository: RustPython/RustPython
Length of output: 1551
🏁 Script executed:

```bash
#!/bin/bash
echo "=== Cache validation and storage context in frame.rs ==="
sed -n '2900,2950p' crates/vm/src/frame.rs
echo ""
sed -n '4680,4695p' crates/vm/src/frame.rs
```

Repository: RustPython/RustPython
Length of output: 3484
🏁 Script executed:

```bash
#!/bin/bash
echo "=== Search for set___code__, set___defaults__, set___kwdefaults__ usage ==="
rg -n --type=rust 'set___(code|defaults|kwdefaults)' crates/vm/src --max-count=20
echo -e "\n=== Check if there's any doc or comment about version 0 behavior ==="
sed -n '600,610p' crates/vm/src/builtins/function.rs
echo -e "\n=== Look for any version reassignment logic ==="
rg -n --type=rust 'FUNC_VERSION_COUNTER|fetch_add' crates/vm/src/builtins/function.rs
```

Repository: RustPython/RustPython
Length of output: 1097
Fix cache invalidation design: func_version set to 0 has no mechanism to re-validate after function mutation.
The func_version field is documented as "0 if invalidated" and used as a cache key. Currently, setting it to 0 (in set___code__, set___defaults__, set___kwdefaults__) is permanent: no code path ever reassigns a fresh non-zero version, so once a function object is mutated, caches for it can never be validated again.

Options:
- Store a fresh counter value instead of 0 by calling `FUNC_VERSION_COUNTER.fetch_add(1, Relaxed)` when invalidating, OR
- Keep 0 as the sentinel but add a helper `fn assign_func_version(&self) -> u32` and call it from cache-validation hot paths when seeing 0.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/function.rs` around lines 684 - 686, The cache
invalidation currently sets func_version to 0 in setters like set___code__,
set___defaults__, and set___kwdefaults__, which permanently disables
re-validation; instead either (A) update those setters to assign a fresh
non-zero version by using FUNC_VERSION_COUNTER.fetch_add(1, Relaxed) when
invalidating func_version so subsequent validations can succeed, or (B) add a
helper fn assign_func_version(&self) -> u32 that generates/sets a non-zero
version (using FUNC_VERSION_COUNTER) and call that helper from the hot-path
cache validation code whenever func_version == 0 so the function can be
re-validated; update references to func_version, set___code__, set___defaults__,
set___kwdefaults__, and add/use assign_func_version or FUNC_VERSION_COUNTER
accordingly.
```rust
    pub tp_version_tag: AtomicU32,
}

static NEXT_TYPE_VERSION: AtomicU32 = AtomicU32::new(1);
```
Define/verify global overflow policy for NEXT_TYPE_VERSION.
With the current checked_add, the counter will eventually reach u32::MAX and then assign_version_tag() will return 0 forever (effectively disabling new tags). That may be totally acceptable, but the doc comment says “all caches invalidated” — which is stronger than what the code actually enforces.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/type.rs` at line 58, The global counter
NEXT_TYPE_VERSION currently uses checked_add in assign_version_tag which will
stop advancing once it hits u32::MAX and then always return 0; change the
overflow policy to a defined behavior (e.g., wrap back to 1) by replacing the
checked_add logic with an atomic update that advances the counter and, on
overflow, resets it to 1; implement this using AtomicU32::fetch_update or a loop
with compare_exchange on NEXT_TYPE_VERSION in assign_version_tag so that when
the current value is u32::MAX you store 1, otherwise you store current + 1,
ensuring the counter never permanently becomes 0 and the doc comment (“all
caches invalidated”) is upheld.
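The wrap-to-1 policy suggested above can be written with `AtomicU32::fetch_update`. A standalone sketch, with the static mirroring the PR's `NEXT_TYPE_VERSION` name; the wrap rule itself is the proposed change, not the current code:

```rust
use std::sync::atomic::{AtomicU32, Ordering::Relaxed};

static NEXT_TYPE_VERSION: AtomicU32 = AtomicU32::new(1);

/// Advance the global counter, wrapping back to 1 on overflow so the
/// counter never permanently degrades to handing out 0.
fn assign_version_tag() -> u32 {
    NEXT_TYPE_VERSION
        .fetch_update(Relaxed, Relaxed, |v| {
            Some(if v == u32::MAX { 1 } else { v + 1 })
        })
        // fetch_update returns the *previous* value on success; since the
        // closure always returns Some, unwrap cannot fail.
        .expect("closure always returns Some")
}

fn main() {
    let a = assign_version_tag();
    let b = assign_version_tag();
    assert_eq!(b, a + 1);

    // Simulate the counter sitting at u32::MAX: the tag handed out is MAX,
    // and the stored counter wraps to 1 instead of sticking at the ceiling.
    NEXT_TYPE_VERSION.store(u32::MAX, Relaxed);
    let c = assign_version_tag();
    assert_eq!(c, u32::MAX);
    assert_eq!(NEXT_TYPE_VERSION.load(Relaxed), 1);
    println!("ok: {a} {b} {c}");
}
```

Note the tradeoff: wrapping means a very long-lived process could, in principle, reuse a tag still held by a stale cache entry, which is why CPython instead stops assigning tags at exhaustion.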
```rust
// Invalidate inline caches
zelf.modified();
```
Calling modified() after __bases__ changes is correct; watch for stale subclass lists.
This makes sense for IC invalidation. One caveat: the method still has a TODO about removing the class from old bases’ subclasses lists; stale entries can amplify invalidation cascades and work.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/type.rs` around lines 843 - 845, When updating a
class's __bases__ you already call zelf.modified() for IC invalidation, but you
must also remove the class from the old bases' subclasses lists and add it to
the new bases' subclasses to avoid stale subclass entries; update the code path
that changes __bases__ (the function that mutates __bases__) to iterate
old_bases and call something like old_base.subclasses.remove(zelf) and then for
each new_base add zelf to new_base.subclasses, then call zelf.modified() and
ensure any TODO about removing from old bases' subclasses is implemented
accordingly.
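The bookkeeping the comment asks for amounts to moving the class between subclass sets before bumping versions. A toy model with illustrative names, not RustPython's actual `PyType` API:

```rust
use std::collections::{HashMap, HashSet};

/// Maps each base-class name to the set of its direct subclasses.
struct Registry {
    subclasses: HashMap<&'static str, HashSet<&'static str>>,
}

impl Registry {
    /// On a __bases__ change: detach from every old base, attach to every
    /// new base, then caches would be invalidated (the PR's `modified()`).
    fn rebase(&mut self, class: &'static str, old: &[&'static str], new: &[&'static str]) {
        for b in old {
            if let Some(set) = self.subclasses.get_mut(b) {
                set.remove(class);
            }
        }
        for b in new {
            self.subclasses.entry(*b).or_default().insert(class);
        }
        // ...followed by version-tag invalidation for `class` and its subtree.
    }
}

fn main() {
    let mut reg = Registry { subclasses: HashMap::new() };
    reg.rebase("C", &[], &["A"]); // class C created with base A
    reg.rebase("C", &["A"], &["B"]); // C.__bases__ reassigned to (B,)
    assert!(!reg.subclasses["A"].contains("C")); // no stale entry left behind
    assert!(reg.subclasses["B"].contains("C"));
    println!("ok");
}
```

Without the removal step, `modified()` on A would keep cascading into C even after C stops being its subclass, which is exactly the amplification the comment warns about.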
Add type version counter (tp_version_tag) to PyType with subclass invalidation cascade. Add cache read/write methods (u16/u32/u64) to CodeUnits. Implement adaptive specialization in load_attr that replaces the opcode with specialized variants on first execution.
Specialized opcodes guard on type_version_tag and deoptimize back to generic LOAD_ATTR with backoff counter on cache miss.
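The warm-up / specialize / deoptimize-with-backoff cycle described above can be modeled as a tiny state machine. This is a sketch of the idea only; the field widths, thresholds, and the real counter encoding in the PR are assumptions:

```rust
/// A single bytecode site: either still counting down to specialization,
/// or already running the specialized fast path.
#[derive(Debug, PartialEq)]
enum Site {
    Adaptive { counter: u8, backoff: u8 },
    Specialized { backoff: u8 },
}

/// One execution of the instruction while its guard would pass.
fn hit(site: Site) -> Site {
    match site {
        // Warm-up complete: quicken into the specialized opcode.
        Site::Adaptive { counter: 0, backoff } => Site::Specialized { backoff },
        // Still warming up: burn one tick of the counter.
        Site::Adaptive { counter, backoff } => Site::Adaptive { counter: counter - 1, backoff },
        // Fast path: stays specialized as long as the version guard holds.
        s @ Site::Specialized { .. } => s,
    }
}

/// Guard failed (e.g. tp_version_tag mismatch): fall back to the generic
/// opcode and wait exponentially longer before the next attempt, so a
/// megamorphic site does not thrash between states.
fn deopt(site: Site) -> Site {
    match site {
        Site::Specialized { backoff } | Site::Adaptive { backoff, .. } => {
            let next = backoff.saturating_mul(2).max(1);
            Site::Adaptive { counter: next, backoff: next }
        }
    }
}

fn main() {
    let mut s = Site::Adaptive { counter: 2, backoff: 1 };
    s = hit(s); // counter 1
    s = hit(s); // counter 0
    s = hit(s); // specializes
    assert_eq!(s, Site::Specialized { backoff: 1 });
    s = deopt(s); // back to adaptive, with doubled wait
    assert_eq!(s, Site::Adaptive { counter: 2, backoff: 2 });
    println!("{s:?}");
}
```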
Summary by CodeRabbit
New Features
Refactor
Chores