Releases: microsoft/bocpy
v0.12.0 - Matrix capabilities
A Matrix capability release. The dense matrix type gains NumPy-style fancy
indexing, comparison masks and lexicographic comparison operators, a
`where` selector, a single-rounding `fma` with vector broadcasting, and
`sqrt`. Two older methods move to NumPy conventions: `select` is renamed
`take`, and `clip` adopts `min` / `max` keyword bounds.
New Features
- Fancy indexing: `m[[r0, r1]]` / `m[[r0, r1], :]` gather rows and
  `m[:, [c0, c1]]` gathers columns, returning a :class:`Matrix`; the
  matching assignment forms scatter into rows or columns with last-write-wins
  duplicates and all-or-nothing validation. New :meth:`Matrix.take` and
  :meth:`Matrix.put` expose the same gather/scatter as methods, with `put`
  offering an `accumulate=True` mode that folds duplicate indices.
- Comparison masks: :meth:`Matrix.less`, `less_equal`, `greater`,
  `greater_equal`, `equal`, and `not_equal` return a `1.0`/`0.0`
  mask matrix, accepting a same-shape matrix, a scalar (including `bool`), a
  `1x1` matrix, a broadcasting row/column vector, or a list/tuple of
  numbers. These are distinct from the comparison operators, which return a
  single `bool`.
- Lexicographic comparison operators: `<`, `<=`, `>`, `>=`, `==`, and
  `!=` compare element by element in row-major order and return a single
  :class:`bool`. `==`/`!=` are total: a shape mismatch or an
  uncoercible list/tuple yields `False`/`True` rather than raising, so
  `matrix in some_list` works. A `NaN` never decides the comparison, so an
  all-`NaN` matrix compares `==` equal to itself. Defining value equality
  makes :class:`Matrix` unhashable.
- `Matrix.where(mask, a, b)`: a NumPy-style selector taking `a` where the
  mask is non-zero (`NaN` counts as non-zero) and `b` elsewhere; `a` and `b`
  may each be a scalar, a same-shape matrix, or a list/tuple of numbers.
- `Matrix.fma(b, c)`: fused multiply-add computing the single-rounding
  `self * b + c`; `b` and `c` may be a same-shape matrix, a `1x1` matrix,
  a scalar, or a row/column vector that broadcasts against `self`. The
  contraction kernel is preserved so hardware FMA still applies. Use it as an
  accuracy primitive: compare results with :meth:`Matrix.allclose`, never
  `==`.
- `Matrix.sqrt()`: element-wise square root (negative inputs map to
  `NaN`), with an `in_place=True` form.
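The comparison and selection rules above can be modelled in a few lines of pure Python on list-of-rows matrices. This is a sketch, not bocpy's implementation: `lex_cmp`, `lex_eq`, and `where` are hypothetical names, and the sketch assumes that a position where either element is `NaN` is simply skipped, which is one reading of "a `NaN` never decides the comparison".

```python
import math


def _flat(m):
    # Row-major flattening of a list-of-rows matrix.
    return [v for row in m for v in row]


def _shape(m):
    return (len(m), len(m[0]) if m else 0)


def lex_cmp(a, b) -> int:
    """Return -1, 0 or 1 for two same-shape matrices, compared row-major.

    Assumption: a position where either element is NaN is skipped, so a
    NaN never decides the outcome.
    """
    for x, y in zip(_flat(a), _flat(b)):
        if math.isnan(x) or math.isnan(y):
            continue
        if x < y:
            return -1
        if x > y:
            return 1
    return 0


def lex_eq(a, b) -> bool:
    # Total equality: a shape mismatch means "not equal", never an error.
    return _shape(a) == _shape(b) and lex_cmp(a, b) == 0


def _pick(v, i, j):
    # Scalars broadcast; matrices are indexed element by element.
    return v if isinstance(v, (int, float)) else v[i][j]


def where(mask, a, b):
    """Take a where mask is non-zero (NaN counts as non-zero), else b."""
    return [
        [_pick(a, i, j) if m != 0 else _pick(b, i, j) for j, m in enumerate(row)]
        for i, row in enumerate(mask)
    ]
```

Under that assumption an all-`NaN` matrix compares equal to itself and a shape mismatch is simply unequal. In `where`, the fact that `NaN != 0` is true in Python is what makes a `NaN` mask entry select `a`.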
Breaking Changes
- `Matrix.select` renamed to `Matrix.take`. The gather method is now
  spelled :meth:`Matrix.take` to match NumPy and pair with the new
  :meth:`Matrix.put`. Replace `m.select(indices, axis)` with
  `m.take(indices, axis)`; the signature and semantics are otherwise
  unchanged.
- `Matrix.clip` bounds are now `min`/`max` keywords. The signature
  changes from `clip(min_or_maxval, maxval=None)` to
  `clip(min=None, max=None)`, matching :func:`numpy.clip`. Either bound may
  be omitted to leave that side unbounded: `m.clip(min=0.0)` clamps only
  below, `m.clip(max=255.0)` only above.
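The gather/scatter and `clip` semantics can be sketched on plain nested lists. The helpers below are hypothetical and handle rows only (the equivalent of `axis=0`). In particular, `accumulate=True` is assumed to add each duplicate's values into the existing row, which is one way of folding duplicate indices. `clip` deliberately uses keyword-only `min`/`max` names to mirror the new signature, shadowing the builtins inside the function.

```python
def _check_rows(indices, n):
    # Validate every index up front so a bad index leaves the target untouched.
    for i in indices:
        if not -n <= i < n:
            raise IndexError(f"row index {i} out of range for {n} rows")


def take_rows(m, indices):
    """Gather copies of rows m[i] for each i in indices."""
    _check_rows(indices, len(m))
    return [list(m[i]) for i in indices]


def put_rows(m, indices, rows, accumulate=False):
    """Scatter rows into m in place, all-or-nothing.

    Without accumulate, duplicate indices are last-write-wins; with
    accumulate=True (assumed semantics) duplicates are summed into the row.
    """
    if len(indices) != len(rows):
        raise ValueError("indices and rows must have the same length")
    _check_rows(indices, len(m))
    for i, row in zip(indices, rows):
        if accumulate:
            m[i] = [old + new for old, new in zip(m[i], row)]
        else:
            m[i] = list(row)
    return m


def clip(m, *, min=None, max=None):
    """Clamp every element; an omitted bound leaves that side unbounded."""
    def one(v):
        if min is not None and v < min:
            v = min
        if max is not None and v > max:
            v = max
        return v
    return [[one(v) for v in row] for row in m]
```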
Documentation
- Expanded the :doc:`api` matrix surface for the new indexing, masking,
  comparison, `where`, `fma`, and `sqrt` methods via the
  `__init__.pyi` stub docstrings, including the totality, `NaN`, and
  broadcasting rules.
Tests
- Extensive `test_matrix.py` additions covering fancy-index gather/scatter,
  `take`/`put` (including accumulate and all-or-nothing validation),
  the comparison masks, lexicographic operators (totality, `NaN`,
  reflected-scalar, and list/tuple/`bool` coercion edge cases), `where`
  selection and value propagation, and `fma` row/column broadcasting.
Internal
- New `bench_fma` and `bench_take` micro-benchmarks in
  `scripts/bench_matrix.py`. The `examples/boids.py` demo migrates from
  `select` to `take`.
v0.11.0 - Cross-module behaviors
A behavior-dispatch release. @when becomes a runtime decorator backed by
a content-addressed marshalled-code registry instead of a transpile-time
call-site rewrite. A behavior's code object is marshalled once, stored under a
hex content-hash key, and resolved by value inside each worker
sub-interpreter — so behaviors may now live in any importable module, not
just __main__. The transpiler shrinks to a bindings reducer that exposes the
defining module's imports, classes, functions, and constants to workers, and
captures are now declared explicitly as trailing default parameters.
New Features
- Cross-module behaviors: a `@when` body defined in any importable
  module now resolves on workers via the marshalled-code registry, lifting
  the previous `__main__`-only restriction. A new worker-importable
  `bocpy_testfixtures` package exercises cross-module resolution, dunder
  handling, key collisions, and chained-global cases.
- Behaviors in the REPL, `python -c`, and piped stdin: a `@when` body
  defined in a `__main__` with no source file on disk now runs on workers.
  The runtime reduces the live interactive namespace to its imported modules
  (each guarded so an environment module that cannot load in a
  sub-interpreter, such as `readline`, is skipped rather than fatal), and an
  interactive behavior is validated at decoration to reference only builtins,
  imported modules, and explicit captures.
- `whencall(func, args, captures)`: the lower-level escape hatch behind
  `@when`. It registers a behavior function against cowns and explicit
  capture values without the decorator sugar.
Breaking Changes
- `@when` behaviors must declare their captures explicitly. Implicit
  capture of enclosing-scope variables as free variables is no longer
  supported: a `@when` body may only reference its cown parameters, the
  values it captures as trailing parameters, and names resolvable at the
  defining module's scope (imports, module-level classes/functions,
  constants, and builtins). Capture an enclosing local by adding a
  trailing parameter with a same-named default, the canonical
  `name=name` recipe:

  ```python
  from bocpy import when

  # before: factor captured implicitly from the enclosing frame
  @when(x)
  def b(x):
      return x.value * factor

  # after: factor captured explicitly
  @when(x)
  def b(x, factor=factor):
      return x.value * factor
  ```

  The loop-snapshot form `def b(c, i=i)` and the rename form
  `def b(c, x=y)` continue to work unchanged. This is the migration that
  lets a behavior's code object be marshalled and resolved by value
  across worker sub-interpreters.
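Why the `name=name` recipe works is ordinary Python, independent of bocpy: a closure reads an enclosing variable when it is called, while a default parameter is evaluated once when the `def` (or `lambda`) runs. The standalone sketch below shows both effects, and that a default-captured function has no free variables, so it can be rebuilt from its code object plus a globals dictionary with no closure cells.

```python
# Late binding versus default-argument snapshots.
late = []
snapshot = []
for i in range(3):
    late.append(lambda: i)          # reads i when called: sees the final value
    snapshot.append(lambda i=i: i)  # default evaluated now: keeps this iteration's i

print([f() for f in late])      # [2, 2, 2]
print([f() for f in snapshot])  # [0, 1, 2]


def make(factor):
    def implicit(x):
        return x * factor  # factor is a free variable (closure cell)

    def explicit(x, factor=factor):
        return x * factor  # factor is an ordinary local parameter

    return implicit, explicit


implicit, explicit = make(3)
print(implicit.__code__.co_freevars)  # ('factor',)
print(explicit.__code__.co_freevars)  # ()
```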
Improvements
- Decoration-time capture validation: `@when` now rejects malformed
  behaviors where the mistake is made, with actionable messages: a bare
  trailing parameter (no default) raises `TypeError` naming the cown/param
  counts; a body closing over an enclosing-function local raises
  `SyntaxError` naming the variable and suggesting the `name=name` fix; and
  `async def` / generator behaviors raise `SyntaxError`. Computed defaults
  (`k=expensive()`) are allowed and snapshotted once at schedule time.
- Interactive traceback labels: behaviors defined interactively are
  relabelled `<behavior:hash>` so two distinct interactive behaviors no
  longer collide under a shared `<stdin>`/`<string>` filename.
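The three decoration-time checks can be approximated with the standard `inspect` module. `validate_behavior` and its `n_cowns` argument are hypothetical, and the messages only paraphrase the ones described above; bocpy's real validator may detect these cases differently.

```python
import inspect


def validate_behavior(func, n_cowns: int) -> None:
    """Hypothetical decoration-time checks modelled on the rules above."""
    name = func.__name__
    if (inspect.iscoroutinefunction(func)
            or inspect.isgeneratorfunction(func)
            or inspect.isasyncgenfunction(func)):
        raise SyntaxError(f"{name}: async def / generator behaviors are not allowed")

    # Names the body reads from an enclosing function appear as free variables.
    if func.__code__.co_freevars:
        var = func.__code__.co_freevars[0]
        raise SyntaxError(
            f"{name} closes over enclosing local {var!r}; "
            f"capture it explicitly as a trailing parameter {var}={var}"
        )

    # Every parameter after the cown parameters must carry a default.
    params = list(inspect.signature(func).parameters.values())
    for p in params[n_cowns:]:
        if p.default is inspect.Parameter.empty:
            raise TypeError(
                f"{name} has {len(params)} parameters for {n_cowns} cown(s); "
                f"trailing parameter {p.name!r} needs a default"
            )
```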
Documentation
- Reworded the :doc:`c_abi` and `messaging` pages and the downstream
  consumer template to describe the worker bindings module and bindings
  reducer, replacing the retired transpile-and-rewrite vocabulary.
Tests
- New `test_registry.py` covering registry round-trips, key derivation,
  capture validation, and `Resolver` dispatch. `test_transpiler.py` was
  rewritten for the bindings reducer; the dead constant-tracking machinery
  and its tests were removed.
Internal
- New `boc_registry.c`/`boc_registry.h` C subsystem storing marshalled code
  objects under opaque hex keys. The transpiler is reduced to a bindings
  reducer (`MainBindings`/`bind_*`); the legacy call-site rewriter, skeleton
  fallback, and `export_module.py` were removed in a clean cut. Behavior keys
  are content-addressed with length-prefixed framing and 128-bit truncation.
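A content-addressed code registry of this kind can be sketched with `marshal`, `hashlib`, and `struct` from the standard library. Everything here is an assumption for illustration: the notes do not name the hash function (SHA-256 is used below) or the exact fields that are framed (the module name and the marshalled code bytes are used), and `Registry`, `content_key`, and `_frame` are hypothetical names.

```python
import hashlib
import marshal
import struct
import types


def _frame(*parts: bytes) -> bytes:
    # Prefix each part with its length (8-byte little-endian) so the
    # concatenation is unambiguous: ("ab", "c") and ("a", "bc") differ.
    out = bytearray()
    for part in parts:
        out += struct.pack("<Q", len(part))
        out += part
    return bytes(out)


def content_key(*parts: bytes) -> str:
    # Hash the framed bytes and truncate to 128 bits (32 hex characters).
    return hashlib.sha256(_frame(*parts)).digest()[:16].hex()


class Registry:
    """Toy registry: marshalled code objects stored under hex content keys."""

    def __init__(self) -> None:
        self._store: dict = {}

    def register(self, func: types.FunctionType) -> str:
        payload = marshal.dumps(func.__code__)
        key = content_key((func.__module__ or "").encode(), payload)
        # Identical content maps to the same key, so storing it again is a no-op.
        self._store.setdefault(key, payload)
        return key

    def resolve(self, key: str, namespace: dict) -> types.FunctionType:
        # Rebuild the function by value from its code object and a globals
        # dict. Defaults (explicit captures) are not part of the code object
        # and would have to be shipped separately.
        code = marshal.loads(self._store[key])
        return types.FunctionType(code, namespace)
```

The length prefix keeps distinct inputs from colliding after concatenation, and truncating the digest to 16 bytes yields a 32-character hex key.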
v0.10.0
A result-reading and documentation release. Cown.unwrap() replaces
ad-hoc context-manager reads of behavior results with a single
quiescence-guarded call lowered to the C capsule, and the test suite
moves wholesale to the quiesce() + unwrap() pattern. Matrix
gains arg-reductions (argmin / argmax) and an explicit PRNG
seed, and its matmul kernel is re-ordered for cache-friendly
auto-vectorization (bit-for-bit identical output). The legacy
notice_sync shim is removed in favour of
quiesce(noticeboard=True) for reads and the new notice_seed
for synchronous main-thread seeding.
New Features
- `notice_seed(key, value)`: a synchronous, main-interpreter-only
  noticeboard write that commits under the noticeboard mutex before it
  returns, so every behavior scheduled afterwards observes it. Unlike
  the fire-and-forget `notice_write`, it gives read-your-writes ordering
  for installing read-mostly configuration before scheduling the
  behaviors that read it, and it starts the runtime if called first, so
  seeding can be a program's first bocpy call with no explicit `start()`.
  It is a plain overwrite and does not provide `notice_update`'s
  read-modify-write atomicity. Calling it from a worker raises
  `RuntimeError`.
- `Cown.unwrap()`: return the cown's stored value, or re-raise a
  captured behavior exception on the caller's thread (the Rust
  `Result::unwrap` shape). It acquires the cown for the read and requires
  global quiescence (`quiesce`/`wait`) first, raising
  `RuntimeError` otherwise so a result is never read while its
  producer is still in flight. It is lowered to a C-level
  `CownCapsule.unwrap`, so a behavior that returns a `Cown`
  (surfacing downstream as a bare `CownCapsule`) unwraps the same way
  without rewrapping. `unwrap()` consumes the cown: it takes the
  stored payload by reference and resets the cown to hold `None` before
  releasing it, so the returned object is never re-serialized back into
  the cown. This matters for move-typed payloads such as `Matrix`,
  whose ownership would otherwise be flipped away from the caller on
  release, leaving an unreadable result. Because the payload is removed,
  a captured exception is not re-reported when the cown is dropped, and a
  second `unwrap()` returns `None`. The emptied cown stays schedulable,
  so a later behavior can refill it.
- `Matrix.argmin(axis=None)` / `Matrix.argmax(axis=None)`: index
  of the minimum / maximum element, first occurrence on ties. Flat
  (`axis=None`) returns a row-major `int`; `axis=0` / `axis=1`
  return per-column / per-row index vectors. `NaN` elements are skipped
  unless the running extreme starts at `NaN`, which pins the result to
  that position (this differs from NumPy, which propagates `NaN`).
- `Matrix.seed(value)`: classmethod seeding the process-global C
  PRNG used by `normal()` / `uniform()`, making subsequent draws
  reproducible when generation stays on a single thread.
- `Matrix` pickling: `Matrix` now supports `pickle` (all
  protocols) and `copy.deepcopy` via `__reduce__`, so a matrix nested in
  a pickled container (`dict`, `list`, …) round-trips with its neighbours
  instead of raising `TypeError`. Serialization copies the raw,
  native-endian, row-major `double` buffer in one block, so the cost is
  linear in the element count with no per-element Python object churn and
  every value (including `NaN`, `±inf`, `-0.0`, and subnormals) is
  preserved bit-for-bit. The current interpreter must own the matrix:
  pickling one that has been released into a `Cown` raises
  `RuntimeError`. The encoding is native-endian, so a pickle is not
  portable across architectures of differing byte order.
- `examples/fanout_benchmark.py`: a dispatch-rate microbenchmark
  for the fanout workload (a producer that allocates fresh consumer
  cowns it does not hold and dispatches one `@when` each), surfacing
  per-worker queue contention (`enqueue_cas_retries`) as the gating
  signal. It complements the chain workload in `examples/benchmark.py`.
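The block-copy pickling strategy can be illustrated with a toy matrix backed by `array.array('d')`. `Mat` and `_mat_from_bytes` are hypothetical names; the point is that `__reduce__` hands pickle one `bytes` object holding the native-endian, row-major doubles, so `pickle`, `copy.deepcopy`, and nesting inside containers all work, and every bit pattern (including `NaN`, `-0.0`, infinities, and subnormals) survives.

```python
import copy
import pickle
from array import array


def _mat_from_bytes(rows: int, cols: int, raw: bytes) -> "Mat":
    # Module-level reconstructor so pickle can find it by name.
    m = Mat.__new__(Mat)
    m.rows, m.cols = rows, cols
    m.data = array("d")
    m.data.frombytes(raw)  # one block copy, no per-element float objects
    return m


class Mat:
    """Toy row-major matrix backed by a contiguous buffer of C doubles."""

    def __init__(self, rows: int, cols: int, values) -> None:
        if len(values) != rows * cols:
            raise ValueError("values do not match shape")
        self.rows, self.cols = rows, cols
        self.data = array("d", values)

    def __reduce__(self):
        # Native-endian raw bytes: linear cost and bit-exact, but not
        # portable to a machine with a different byte order.
        return (_mat_from_bytes, (self.rows, self.cols, self.data.tobytes()))


src = Mat(1, 4, [float("nan"), -0.0, float("inf"), 5e-324])
box = {"m": src, "tag": "x"}
out = pickle.loads(pickle.dumps(box, protocol=pickle.HIGHEST_PROTOCOL))
print(out["m"].data.tobytes() == src.data.tobytes())  # True
```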
Improvements
- matmul cache-friendly reorder: `impl_matmul` is re-ordered from
  `ijk` to `ikj` so the inner loop walks contiguous rows of the
  right-hand operand and the output, enabling compiler
  auto-vectorization. Output is bit-for-bit identical (each inner
  product still accumulates `k` in ascending order); measured ~2.9–3.2×
  faster on square shapes, ~1.5–1.8× on rectangular ones. A
  bitwise-reproducibility regression test pins the accumulation order.
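The two loop orders can be compared in pure Python. Python lists will not show the cache or auto-vectorisation benefit; the sketch (with hypothetical `matmul_ijk` / `matmul_ikj` helpers) only illustrates the reordering and why the output is bit-for-bit identical: for every `c[i][j]`, both versions start from `0.0` and add the products for ascending `k` in the same order.

```python
import random


def matmul_ijk(a, b):
    n, inner, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for k in range(inner):  # strided walk down column j of b
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_ikj(a, b):
    n, inner, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        c_row = c[i]
        for k in range(inner):
            a_ik = a[i][k]
            b_row = b[k]  # contiguous row of b
            for j in range(m):  # contiguous walk over b_row and c_row
                c_row[j] += a_ik * b_row[j]
    return c


rng = random.Random(1234)
a = [[rng.uniform(-1.0, 1.0) for _ in range(7)] for _ in range(5)]
b = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(7)]
print(matmul_ijk(a, b) == matmul_ikj(a, b))  # True: exact equality, not approximate
```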
Bug Fixes
A warm welcome and thank-you to first-time contributor Shivanand
Mishra (@xemishra), who tracked down and fixed a subtle transpiler
bug this release — exactly the kind of sharp-eyed catch that makes the
project better.
- `@when` result assignment dropped for module-level behaviors
  (#30, thanks @xemishra): a behavior defined at module level
  transpiled without its result cown, so the exported module silently
  dropped the return value and downstream behaviors could not schedule
  over it. Fixed, with a regression test guarding the exported-module
  shape.
- Nested `@when` capture: the transpiler now correctly surfaces a
  nested `@when`'s free names as the outer behavior's captures and
  resolves its cown arguments in the outer frame, instead of leaving
  them to Python's closure machinery where they could not be reached
  from the worker interpreter.
- `Matrix` range/return checks: added overflow and return-value
  checks on the `range_read` path, gaps uncovered while migrating the
  matrix tests.
Breaking Changes
- `notice_sync` removed: the noticeboard-sync shim is gone from
  `bocpy.__all__`. Use `quiesce(noticeboard=True)` instead, which
  blocks until in-flight behaviors complete and returns a noticeboard
  snapshot without tearing the runtime down.
Documentation
- Removed the `notice_sync` references from `noticeboard` and the
  type stubs; documented the NaN tie-break behavior of
  `argmin`/`argmax`; corrected the happens-after example in the
  `thinking-in-boc` skill to order across genuinely unrelated data;
  added a `fanout_benchmark.py` section to the examples README.
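The documented NaN rule for `argmin`/`argmax` falls out of a plain running-extreme loop with strict comparisons, since every comparison involving `NaN` is false. This is a sketch over a flat list (the `axis=None` case) with hypothetical helper names, not bocpy's C code.

```python
def argmin_flat(values):
    """Index of the first minimum; NaN candidates never win."""
    best_i, best = 0, values[0]
    for i in range(1, len(values)):
        # Every comparison involving NaN is False: a NaN candidate is
        # skipped, and a NaN starting extreme is never replaced.
        if values[i] < best:  # strict < keeps the first occurrence on ties
            best_i, best = i, values[i]
    return best_i


def argmax_flat(values):
    """Index of the first maximum, with the same NaN rule."""
    best_i, best = 0, values[0]
    for i in range(1, len(values)):
        if values[i] > best:
            best_i, best = i, values[i]
    return best_i


nan = float("nan")
print(argmin_flat([3.0, 1.0, nan, 1.0]))  # 1: NaN skipped, first tie wins
print(argmin_flat([nan, -5.0, 2.0]))      # 0: a leading NaN pins the result
```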
Tests
- Migrated `test_boc.py`, `test_noticeboard.py`, and the scheduler /
  pinned-pump suites to the `quiesce()` + `Cown.unwrap()` pattern.
  Added matmul bitwise-reproducibility and `argmin`/`argmax` NaN
  regression tests.
Dependencies
- Bumped the `github-actions` group (#31, #27, dependabot):
  `actions/checkout` 6.0.2 → 6.0.3 and `pypa/cibuildwheel`
  3.4.1 → 4.0.0.
Internal
- Large comment scrub across the C extensions, Python runtime, scripts,
  and tests, followed by a remediation pass that restored load-bearing
  rationale (memory-ordering fences, UAF guards, deliberate-leak notes,
  and the vendored Apache-2.0 provenance header) as condensed
  summaries.
- Ignored Sphinx-related updates in `dependabot.yml` to keep the docs
  toolchain pinned.
v0.9.0 - Main Pinned Cowns
Main-pinned cowns — a new PinnedCown subclass holds its
value as a plain PyObject * on the main interpreter, never
round-tripped through XIData. Behaviors whose request set contains
any pinned cown are routed by the scheduler to a single-consumer
main-thread queue and drained by the new pump entry point
(or implicitly by wait, which auto-pumps when pinned cowns
exist). Designed for objects that cannot survive cross-interpreter
shipping — pyglet shapes, Tk widgets, GPU contexts, open file
handles, ctypes pointers. The companion examples/boids.py
rewrite demonstrates the coarse-grained pinned-dispatch pattern:
per-cell physics stays on workers, and one @when(PinnedCown)
per frame batches the write-back into main-thread matrices.
Also in this release: quiesce, a non-tearing-down
checkpoint primitive.
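The seed-up / seed-down idea behind `quiesce` can be modelled with a counter and a condition variable. This is a hypothetical, simplified model (`Terminator` here is not bocpy's C terminator): a running runtime holds one seed count, so waiting for zero requires dropping the seed first, and re-adding it afterwards turns the wait into a checkpoint rather than a shutdown.

```python
import threading


class Terminator:
    """Toy in-flight counter that includes one 'seed' while running."""

    def __init__(self) -> None:
        self._count = 1  # the seed held by the running runtime
        self._cv = threading.Condition()

    def inc(self) -> None:
        with self._cv:
            self._count += 1

    def dec(self) -> None:
        with self._cv:
            self._count -= 1
            if self._count == 0:
                self._cv.notify_all()

    def quiesce(self, timeout=None) -> None:
        self.dec()  # seed-down: the count may now reach zero once work drains
        try:
            with self._cv:
                if not self._cv.wait_for(lambda: self._count == 0, timeout):
                    raise TimeoutError("quiescence not reached within timeout")
        finally:
            self.inc()  # seed-up: restore the seed so the runtime keeps running

    @property
    def count(self) -> int:
        with self._cv:
            return self._count
```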
New Features
- `quiesce(timeout=None, *, stats=False, noticeboard=False)`:
  blocks until every in-flight behavior completes, without tearing
  down workers or the noticeboard thread. Implemented via a new
  `terminator_seed_inc` peer of `terminator_seed_dec`
  (Pyrona-style seed-up / seed-down pairing) so quiescence becomes
  a checkpoint rather than a shutdown. Useful for parallel-search
  patterns that need to inspect a best-so-far cown between rounds
  and for tests that must read a worker-produced `send` queue
  before its producer interpreter is destroyed. The `stats` and
  `noticeboard` flags mirror `wait`: it returns `None` by
  default, a per-worker stats `list[dict]` when `stats=True`,
  a noticeboard `dict[str, Any]` when `noticeboard=True`, or a
  `WaitResult` when both are set. Raises `TimeoutError`
  if quiescence is not reached within `timeout`. Exported from
  `bocpy.__all__`.
- `PinnedCown(Cown[T])`: a cown whose value lives
  permanently on the main interpreter. Constructible only from the
  main interpreter (raises `RuntimeError` from workers);
  the value is never picklable, never reified twice, and never
  reconstructed in a worker. The capsule handle remains a
  first-class cross-interpreter shareable: workers may hold it,
  embed it in a regular `Cown` value graph, and place it in
  noticeboard entries, but only the main thread may acquire the
  value. See the new `pinned_cowns` page for the full
  contract and the coarse-grained-dispatch pattern.
- `pump(deadline_ms=None, max_behaviors=None, raise_on_error=False)`:
  drains the main-thread queue of behaviors whose request sets
  contain a `PinnedCown`. Call it from your event loop's
  idle / on-tick hook (pyglet `schedule_interval`, Tk `after`,
  asyncio task, …); script-mode programs need not call it
  explicitly because `wait` pumps internally. Non-preemptive:
  `deadline_ms` gates starting the next behavior, not
  interrupting one already running. Body exceptions default to
  landing on the result cown's `.exception`;
  `raise_on_error=True` re-raises the first body exception after
  drain. Returns a new `PumpResult` `NamedTuple`
  (`executed`, `deadline_reached`, `raised`).
- `set_pump_watchdog(warn_ms=1000, raise_ms=None, on_starve=None)`:
  configure the pinned-queue starvation watchdog. Both thresholds
  gate on queue-non-empty time, not raw last-pump time, so
  programs running only unpinned work never trip them. The default is
  warn-only; users opt into fail-fast via an explicit `raise_ms`
  so interactive debugger sessions are not wedged by a breakpoint.
- `set_wait_pump_poll(ms=50)`: set the poll cadence for
  `wait`'s auto-pump loop. It is re-read every iteration so a
  concurrent call updates the active `wait` immediately.
- `bocpy.PumpResult`: three-field `NamedTuple` returned by
  `pump`. `executed` counts pinned behaviors whose lifecycle
  completed (including acquire-failure paths whose MCS chain still
  drained). `deadline_reached` is `True` only when the
  `deadline_ms` budget tripped before the queue drained.
  `raised` counts only body exceptions captured to a result cown
  (cleanup-path failures use `PyErr_WriteUnraisable` and do not
  count). Exported from `bocpy.__all__`.
- Coarse-grained pinned-dispatch `examples/boids.py`: the
  per-cell `send("update")` / main-thread `receive("update")`
  barrier is replaced by per-cell physics on workers plus one
  pinned `@when` per frame that captures every per-cell result
  cown together with the two main-thread `PinnedCown` matrices
  and performs the batched write-back. Same visual output, fully
  worker-parallel per-cell work, single main-thread touchpoint.
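The non-preemptive deadline rule for `pump` can be sketched as a loop that checks the budget only before starting each behavior. This is a toy model under stated assumptions: the queue is a Python list of callables, captured exceptions go to an optional `errors` list instead of a result cown, `raise_on_error` is omitted, and the injectable `clock` parameter exists only to make the sketch testable. The local `PumpResult` merely mirrors the three documented fields.

```python
import time
from typing import Callable, List, NamedTuple, Optional


class PumpResult(NamedTuple):
    executed: int
    deadline_reached: bool
    raised: int


def pump(queue: List[Callable[[], object]],
         deadline_ms: Optional[float] = None,
         max_behaviors: Optional[int] = None,
         errors: Optional[list] = None,
         clock: Callable[[], float] = time.monotonic) -> PumpResult:
    """Toy non-preemptive drain of a main-thread queue (hypothetical model)."""
    start = clock()
    executed = raised = 0
    while queue:
        if max_behaviors is not None and executed >= max_behaviors:
            break
        # The budget gates *starting* a behavior; a running one is never cut off.
        if deadline_ms is not None and (clock() - start) * 1000 >= deadline_ms:
            return PumpResult(executed, True, raised)
        behavior = queue.pop(0)
        try:
            behavior()
        except Exception as exc:  # body exceptions are captured, not propagated
            raised += 1
            if errors is not None:
                errors.append(exc)
        executed += 1
    return PumpResult(executed, False, raised)
```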
Public C ABI
- `bocpy_main_interpid()`: new `static inline` helper in
  `<bocpy/bocpy.h>` returning
  `PyInterpreterState_GetID(PyInterpreterState_Main())` pre-typed as
  `int_least64_t` to match `bocpy_interpid` for owner-field equality checks.
  Safe to call from a worker sub-interpreter for diagnostic /
  assert use. Additive: existing consumers recompile unchanged;
  `BOCPY_ABI` is unchanged at 1. The
  `templates/c_abi_consumer` `bocpy~=` pin moves to
  `~=0.9` to signal the new ABI surface it was authored against.
Improvements
- `@when` loop-variable snapshot via default arg: the
  transpiler now accepts `def b(c, i=i)` as an explicit
  loop-snapshot idiom in addition to the existing implicit form
  (just reference the loop variable in the body). Trailing
  positional parameters beyond the cown count are also
  auto-captured by name (`def b(c, factor)` captures
  `factor`).
- `@when` alias decorators: the transpiler now recognises
  `from bocpy import when as boc_when` and `import bocpy [as alias]`
  followed by `@bocpy.when(...)` or `@alias.when(...)`, provided the
  aliasing import is at module level. Previously only the bare `@when`
  form was detected.
- `Behaviors.start()` compiles the export module on main: the
  transpiler's rewritten module is now also instantiated as an
  in-memory `types.ModuleType` on the main thread (plus a
  `linecache` entry for traceback fidelity) so `pump` can
  resolve `__behavior__N` the same way workers do via their
  bootstrap.
- Scheduler-owned behavior pre-header: `bq_node` and the
  new `pinned` OR-fold byte moved out of the opaque
  `BOCBehavior` into a scheduler-owned `boc_behavior_prehdr_t`
  allocated immediately before each behavior (CPython
  `_PyGC_Head` style). `boc_sched.c` no longer needs any
  knowledge of `BOCBehavior`'s internal layout; layout drift
  between the scheduler and its users is impossible by
  construction.
- `terminator_wait_pumpable`: a new entry in
  `boc_terminator.{c,h}` lets the auto-pump loop wake on either
  count-zero or main-pinned-depth-becoming-non-zero, both wired
  through the existing single condition variable. Single-pumper
  enforcement on free-threaded builds (`Py_GIL_DISABLED`) lives
  alongside via a `MAIN_PUMP_THREAD` CAS that raises
  `RuntimeError` if a second thread tries to pump
  concurrently, cleared on every exit path including
  `BaseException`.
Bug Fixes
- CWE-401: inheriting INCREF leak in `cown_decref_inline`.
  `CownCapsule_reduce` packs an encoded `XIData` payload by
  taking an inheriting `COWN_INCREF` per embedded
  `CownCapsule`, normally balanced when the bytes are
  unpickled inside a worker. On the orphan-death path (the
  consumer side never deserialised the payload) the matching
  `COWN_DECREF`s never fired and every embedded cown leaked.
  `cown_decref_inline` now feeds the encoded bytes through
  `pickle.loads` and immediately drops the result, which lets
  CPython's GC fire the matching `COWN_DECREF`s recursively.
  Gated on the `pickled` flag so native `XIData` round-trips
  (e.g. `Matrix`) skip the work entirely.
- Main-pump behavior reference leak: both
  `_core_main_pump_bounded` and `_core_main_pump_drain_all`
  popped a `BehaviorCapsule` from `MAIN_PINNED_QUEUE` but
  never released the strong reference the capsule held on the
  underlying `BOCBehavior`. Each pinned behavior leaked
  one reference until the runtime was torn down. The pump
  helpers now `BEHAVIOR_DECREF` the behavior immediately after
  the worker-equivalent cleanup runs.
- MSVC `<stdatomic.h>` compatibility: Microsoft's
  `<stdatomic.h>` (used by CPython's headers on Windows) does
  not expose the unsigned `atomic_uint_least64_t` or
  `atomic_uintptr_t` forms that the pinned-pump bookkeeping
  used. `MAIN_PINNED_DEPTH`, `MAIN_PINNED_NONEMPTY_SINCE_NS`,
  `LAST_PUMP_NS`, `WATCHDOG_WARN_MS`, `WATCHDOG_LAST_WARN_NS`,
  `WATCHDOG_ON_STARVE` and `MAIN_PUMP_THREAD` are now
  `atomic_int_least64_t` / `atomic_intptr_t`. Depth never
  goes negative; pointer bits round-trip losslessly through the
  signed atomic boundary.
- CPython 3.10/3.11 `PyErr_SetRaisedException` polyfill:
  added to `include/bocpy/xidata.h` alongside the existing
  `PyErr_GetRaisedException` polyfill so the public C ABI's
  exception-stash pattern compiles on Python versions before
  3.12. `BOCPY_ABI` is unchanged.
- Portable `boc_max_align_t`: added to `boc_compat.h` as
  a union of the most-strictly-aligned fundamental types
  (`long long`, `long double`, `void *`, function pointer).
  MSVC exposes the C11 `max_align_t` only under `/std:c11`,
  which the CPython build does not pass; the
  `boc_behavior_prehdr_t` size assertion now uses
  `alignof(boc_max_align_t)` so the alignment contract holds on
  every supported toolchain.
- PEP 678 `add_note` 3.10 fallback: the new
  `Behaviors.quiesce` exception-context shim attaches a note
  describing the seed-inc / seed-dec balance on failure. CPython
  3.10 predates `BaseException.add_note`; the shim now
  writes to `BaseException.__notes__` directly when `add_note`
  is missing.
- Transpiler `except ... as X` mis-classification:
  `ExceptHandler` binds `X` on the handler node
itself rather than viaName`Stor...
v0.8.0 - Matrix/Vector methods and optimisation
Vector-oriented Matrix API — six new methods (vecdot,
cross, normalize, perpendicular, angle,
magnitude_squared), two new read-only properties (size,
length), and a unified in_place= keyword on every unary
method round out Matrix as a first-class vector and
batch-of-vectors type — plus an internal X-macro template refactor
of every _math.c op family that restores the compiler's
auto-vectoriser. 44 of 71 benched rows improved by ≥10%, with
representative wins of −50% to −88% on aggregates, broadcast
arithmetic, and normalize. The _math extension now ships
with -O3 (Linux/macOS) / /O2 (Windows) so end users pick
up the wins by default.
New Features
- Vector-oriented `Matrix` methods: six new methods designed
  for the `Nx2` / `2xN` / `Nx3` / `3xN` vector and
  batch-of-vectors shapes that show up in `examples/boids.py` and
  similar simulation code:
  - `magnitude_squared(axis=None)`: squared L2 norm without the
    `sqrt` step. Cheaper than `magnitude()` and safe for
    sub-normal thresholding.
  - `vecdot(other, axis=None)`: axis-aware inner product matching
    `numpy.linalg.vecdot`. Not equivalent to `numpy.dot`;
    use `@` for matrix multiplication. Same-shape, row-broadcast
    (`1xN` vs `MxN`), and column-broadcast (`Mx1` vs `MxN`)
    operands are all supported.
  - `cross(other, axis=None)`: 2D scalar z-component or 3D cross
    product. Five shape paths share one method: `1x2` / `2x1`
    returns a float; `1x3` / `3x1` returns a same-orientation
    `Matrix`; `Nx2` / `2xN` batches collect per-vector
    scalars; `Nx3` / `3xN` batches return same-shape `Matrix`
    results. `axis=` disambiguates the square `2x2` / `3x3`
    shapes (default per-row).
  - `normalize(axis=None, in_place=False)`: divide every element
    by its magnitude. Zero-magnitude rows / columns are returned as
    exact zeros (no NaN, no division by zero). `axis=` selects
    per-row, per-column, or total normalisation.
  - `perpendicular(axis=None, in_place=False)`: rotate every 2D
    vector 90° counter-clockwise: `(x, y) -> (-y, x)`. Accepts a
    single 2D vector, an `Nx2` row batch, or a `2xN` column
    batch.
  - `angle(axis=None)`: polar angle `atan2(y, x)` of every 2D
    vector. Returns a float for a single 2D vector input,
    otherwise a `Matrix` of per-vector angles.
- `Matrix.size` property: total element count
  (`rows * columns`). Matches `numpy.ndarray.size`.
- `Matrix.length` property: Frobenius (L2) magnitude as a
  read-only `@property` so vector-like code reads naturally
  (`direction.length`, `velocity.length`) without the
  parentheses of a method call. Equivalent to `magnitude()` with
  no axis argument.
- `in_place=` keyword on every unary `Matrix` method:
  `transpose`, `ceil`, `floor`, `round`, `negate`,
  `abs`, plus the new `normalize` and `perpendicular`, all
  accept `in_place=True` to mutate `self` and return it.
  This replaces the older `transpose_in_place()` method (see
  Breaking Changes below).
- `axis=` keyword on aggregate methods: `sum`, `mean`,
  `min`, `max`, `magnitude`, and the new `magnitude_squared`
  now share a tri-state `axis=` argument (`None` / `0` / `1`)
  decoded through a single classifier. Negative axes (`-1` /
  `-2`) are accepted for NumPy parity.
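For reference, the per-vector formulas behind several of these methods look like this in pure Python. The helpers are hypothetical and operate on tuples or on lists of rows (the per-row case); they pin down the maths, such as the zero-magnitude rule for `normalize` and the z-component returned by a 2D `cross`, not bocpy's batched kernels.

```python
import math


def normalize_rows(m):
    """Per-row normalisation; zero-magnitude rows come back as exact zeros."""
    out = []
    for row in m:
        mag = math.sqrt(sum(v * v for v in row))
        out.append([v / mag for v in row] if mag != 0.0 else [0.0] * len(row))
    return out


def perpendicular(v):
    x, y = v
    return (-y, x)  # rotate 90 degrees counter-clockwise


def angle(v):
    x, y = v
    return math.atan2(y, x)  # polar angle in radians


def cross2(a, b):
    # z-component of the 3D cross product of (ax, ay, 0) and (bx, by, 0).
    return a[0] * b[1] - a[1] * b[0]


def cross3(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def vecdot_rows(a, b):
    # Per-row inner product, one scalar per row pair.
    return [sum(x * y for x, y in zip(ra, rb)) for ra, rb in zip(a, b)]
```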
Improvements
- Auto-vectorised `_math.c` op kernels: the binary,
  aggregate, unary, and two-operand-aggregate op families inside
  `_math.c` are now stamped from per-family descriptor tables,
  one kernel per (op, shape) combination. Each per-element body is
  literally substituted into its own monomorphic inner loop,
  restoring the precondition for GCC's / Clang's auto-vectoriser.
  Representative wins (lower is better):

  | Bench row | Shape | 0.7.0 (ns) | 0.8.0 (ns) | Δ |
  |---|---|---:|---:|---:|
  | `mean()` | (1000, 100) | 44179.6 | 9001.6 | −79.6% |
  | `mean(1)` | (1000, 100) | 51699.4 | 7058.5 | −86.3% |
  | `max(1)` | (1000, 100) | 97184.2 | 11322.7 | −88.3% |
  | `magnitude()` | (1000, 3) | 1098.2 | 306.8 | −72.1% |
  | add col-bcast | (1000, 100) | 37823.4 | 20172.5 | −46.7% |
  | div same-shape | (1000, 100) | 80134.2 | 45458.9 | −43.3% |
  | `normalize()`, `axis=None` | (1000, 3) | 3644.6 | 1775.5 | −51.3% |

  Four rows in code paths untouched by the refactor regressed by
  5–15% from layout drift (`_math.so` `.text` grew +125% from
  kernel specialisation); none are on a hot path. No behavioural
  change; `test_matrix.py` passes unchanged.
- `-O3` / `/O2` on `bocpy._math`: the math extension now
  sets per-platform `extra_compile_args` in `setup.py`
  (`-O3 -fno-plt` on Linux/macOS, `/O2` on Windows) so end-user
  wheels and editable installs both pick up the auto-vectoriser
  wins above. Other `bocpy` extensions are unaffected. The SBOM
  hash for `_math.*.so` will drift accordingly; see
  :doc:`sbom` for the auditor-facing note.
Breaking Changes
- `Matrix.transpose_in_place()` removed: superseded by
  `Matrix.transpose(in_place=True)`, which returns `self` and
  so composes the same way every other unary method does.
  Migration is mechanical: replace `m.transpose_in_place()` with
  `m.transpose(in_place=True)`.
Documentation
- New `Matrix` API entries in :doc:`api` for `size`, `length`,
  `magnitude_squared`, `vecdot`, `cross`, `normalize`,
  `perpendicular`, and `angle`, plus updated `in_place=`
  keyword signatures on the existing unary methods.
Tests
- 234 new test cases for the new `Matrix` methods and
  properties (1571 → 1805 passed). Coverage includes a stub-guard
  test that greps `__init__.pyi` for every new C-level name and
  in-cown coverage exercising each new method inside `@when`.
- Portable overflow regex + cross 2x3/3x2 contract pinning:
  the cross-product test for the doubly-valid `2x3` / `3x2`
  shapes now pins the 2D-batch interpretation explicitly, locking
  the documented behaviour.
Internal
- `scripts/bench_matrix.py`: bench harness used to gate the
  refactor, with `--json` append mode, `--report-median` per-row
  merge, 200 ms warmup, and batch-size auto-tuning.
- `scripts/validate_wheel.py` +
  `scripts/_vendored_warehouse_wheel.py`: a stdlib-only wheel
  `RECORD` validator and a vendored slice of Warehouse's wheel
  parser, used by the PR gate to catch `RECORD` regressions
  before PyPI does.
CI / build
- `cibuildwheel` v3.4.0 → v3.4.1 and the `clang-format-action`
  pin normalised to the underlying commit SHA (Dependabot's
  preferred format). Both pins move in lock-step with the
  github-actions Dependabot group.
- `idna` 3.16 → 3.17 in `ci/constraints-docs.txt`. Five
  other Dependabot proposals (`docutils` 0.23, `ruamel-yaml`
  0.19, `sphinx-tabs` 3.4.7+, `sphinx-toolbox` 4.2, and
  `standard-imghdr` 3.13) require Python ≥3.11 and so cannot
  enter a universal lock that still includes Python 3.10; a
  comment above `requires-python = ">=3.10"` in
  `pyproject.toml` lists them for the post-3.10-EOL bump.
- `flake8` `extend-exclude` for `.copilot/`, `build/`,
  `sphinx/build/`, and the scratch `.env*` venvs so the walker
  no longer trips on generated or vendored Python files.
0.7.0 - SBOM and Dependency Auditing
Cown-lifecycle correctness fixes — three use-after-free paths in the
CownCapsule pickle / acquire / noticeboard machinery now hold the
inner BOCCown alive across the writer's wrapper drop — plus
supply-chain hardening: pinned and hash-verified Python dependencies,
SHA-pinned GitHub Actions, dependabot coverage, vulnerability scanning,
and PEP 770 SBOMs embedded in every wheel.
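Because a wheel is a zip archive, an embedded SBOM can be read back with the standard library alone. This sketch assumes only that the archive member name ends in `.dist-info/sboms/bocpy.cdx.json`; `read_embedded_sbom` is a hypothetical helper, and `bomFormat` / `specVersion` are standard top-level fields of a CycloneDX JSON document.

```python
import json
import zipfile

SBOM_SUFFIX = ".dist-info/sboms/bocpy.cdx.json"


def read_embedded_sbom(wheel) -> dict:
    """Return the CycloneDX SBOM embedded in a wheel (a path or file object)."""
    with zipfile.ZipFile(wheel) as zf:
        names = [n for n in zf.namelist() if n.endswith(SBOM_SUFFIX)]
        if len(names) != 1:
            raise LookupError(f"expected one embedded SBOM, found {len(names)}")
        return json.loads(zf.read(names[0]).decode("utf-8"))


# Usage (path is illustrative):
# sbom = read_embedded_sbom("dist/bocpy-0.7.0-<tags>.whl")
# print(sbom["bomFormat"], sbom["specVersion"])
```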
New Features
- PEP 770 SBOMs in every wheel: every wheel built by
  `.github/workflows/build_wheels.yml` now embeds a
  `CycloneDX 1.6 <https://cyclonedx.org/specification/overview/>`_
  JSON SBOM under `<dist>-<version>.dist-info/sboms/bocpy.cdx.json`.
  Generation runs inside cibuildwheel's repair step on every platform
  (Linux `auditwheel`, macOS `delocate`, Windows direct injection)
  via the new stdlib-only `scripts/build_sbom.py`. The
  `inject` subcommand rewrites the wheel's `RECORD` atomically
  (temp file + rename).
- SBOM verification in CI: the new `verify_sboms` job in
  `build_wheels.yml` re-downloads the extracted SBOM artifact and
  runs two checks: `scripts/validate_sbom.py` (stdlib-only
  structural validator pinning bocpy's wire format) and
  `grype <https://github.com/anchore/grype>`_ (third-party SBOM
  scanner) with `--fail-on high`. A separate `sboms` artifact is
  also uploaded by the `merge` job for downstream consumers.
- `bocpy.__version__`: a runtime version attribute derived
  from `importlib.metadata.version("bocpy")`, with a
  `PackageNotFoundError` fallback. Exported from `bocpy.__all__`
  and documented in `__init__.pyi`. `pyproject.toml` remains the
  single source of truth for the version.
- New documentation: :doc:`sbom` walk-through covering the
  embedded SBOM format, extraction recipes, and verification commands.
- `wait(noticeboard=True)` final-state capture: :func:`wait`
  now accepts a `noticeboard` keyword that returns the final
  noticeboard contents as a plain `dict` at shutdown (after the
  noticeboard thread exits, before the entries are freed). Useful
  for surfacing an early-stopping result, last error, or aggregated
  counter that a behavior deposited just before the runtime
  quiesced, replacing the older `send`/`receive` handshake
  that earlier examples used. Combined with `stats=True` it
  returns a new :class:`WaitResult` `NamedTuple` (also exported
  from `bocpy.__all__`) carrying both snapshots. The
  `examples/prime_factor.py` example was migrated to the new
  pattern.
Bug Fixes
- Cown-in-cown use-after-free — a
Cownembedded inside
another cown's value, a message-queue payload, or a noticeboard
snapshot was previously freed when the writer's local wrapper
dropped, because pickle bytes carry no refcount on their own.
CownCapsule_reducenow takes an inheritingCOWN_INCREFthat
_cown_capsule_from_pointer_inheritingconsumes on unpickle, so
the innerBOCCownsurvives until the consumer drops its
decoded wrapper. Affects every cross-cown reference shape — see
the newTestCownInCownclass for the full container-shape fuzz. - Acquire-failure poisoned-state — when
`pickle.loads` failed partway through `cown_acquire`, the cown was left in a
half-acquired state with the encoded bytes still in place. A retry
would re-run pickle against bytes whose embedded inherited refs
had already been partially consumed by pickle's error path,
risking dereferences of freed `BOCCown*` pointers. The cown's
`xidata` is now recycled on the failure path, and a guard at the
top of `cown_acquire` rejects any future acquire with a
deterministic `RuntimeError`; the worker recovery arm surfaces
it on the failing behavior's result cown.
- Noticeboard hidden-cown audit — when a noticeboard value
reached a `Cown` via a route the pin walker cannot see (custom
`__reduce__`/`__getstate__`, `copyreg.dispatch_table`,
closure capture, or a module-level cache), the borrowing reconstructor
produced a token whose inner `BOCCown` was not held alive by
the entry's pin set, exposing the next reader to a use-after-free once the
writer's wrapper dropped. A per-thread borrowing context
(`BOC_NB_CTX`) now audits every `CownCapsule_reduce` against
the caller's pin set during the noticeboard write pickle and
fails the whole `notice_write`/`notice_update` closed if
any cown is unaccounted for.
- `UnicodeDecodeError` on non-UTF-8 Windows locales —
`Behaviors.start` read `worker.py` with `open(path)`, which
picks up `locale.getpreferredencoding(False)`. On cp1252
(English Windows) the UTF-8 em-dashes in the worker source were
silently garbled into mojibake; on cp949 (Korean Windows) the read failed
with `UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2`
and `bocpy` could not start at all (reported in
[#14](https://github.com/microsoft/bocpy/issues/14) by
[@Forthoney](https://github.com/Forthoney)). Fixed by passing
`encoding="utf-8"` explicitly in `Behaviors.start`, and the
same fix was applied to every other `open()` site in the repo
that reads or writes text known to contain non-ASCII bytes
(`sphinx/source/conf.py`, `examples/sketches.py` (two sites),
`export_module.py`).
- Silent worker-startup failures —
`Behaviors.start_workers` ran `interpreters.create()` and `interpreters.run_string()`
on the worker thread without a try/except, so a failure in either
killed the thread without ever replying on `boc_behavior`. The
parent's bounded `receive()` then timed out with no diagnostic.
Both calls are now wrapped, and every failure path sends a
formatted traceback over `boc_behavior` so the parent sees a
structured error instead of a timeout.
- Silent worker bootstrap import failures — the generated
bootstrap script that loads the user module into each worker
sub-interpreter is now wrapped in a top-level try/except. Any
`BaseException` is formatted with the user module name and sent
over `boc_behavior` (falling back to `sys.stderr` if the
message-queue `send` itself raises), then re-raised so
`run_string` reports it as well. Module-import failures that
previously surfaced only as a worker-startup timeout now arrive
as a proper traceback.
- `boc_sched_worker_pop_slow` skipped `popped_local` — the
slow-path pending-fallback and WSQ-dequeue branches returned
work without bumping `popped_local` (the fast path always
did), so the documented producer/consumer identity in
:c:type:`boc_sched_stats_t` was violated whenever the fairness
arm fired or a worker entered the slow path directly. Both
branches now increment `popped_local` and reset the batch
budget, matching the fast path. The header's reconciliation
paragraph was also tightened to a "near-identity" that explicitly
accounts for fairness-token pops (which are re-enqueued via raw
`boc_wsq_enqueue` rather than `boc_sched_dispatch`, leaving
consumer-side counters without a matching producer-side bump).
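The two silent-failure fixes come down to one rule: every path out of the worker's setup code must reply to the parent. A rough sketch of that rule using only the standard library follows, with `queue.Queue` standing in for the `boc_behavior` channel and a plain thread standing in for a sub-interpreter; `start_worker` and `wait_for_worker` are hypothetical names, not bocpy APIs.

```python
import queue
import threading
import traceback


def start_worker(setup, replies: queue.Queue) -> threading.Thread:
    """Run setup() on a new thread and always report the outcome on replies.

    `replies` stands in for the `boc_behavior` channel and `setup` for the
    interpreter create/run calls; both are hypothetical stand-ins.
    """

    def run() -> None:
        try:
            setup()
        except BaseException:
            # Every failure path replies with a formatted traceback, so the
            # thread never dies without telling the parent why.
            replies.put(("error", traceback.format_exc()))
            return
        replies.put(("ready", None))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def wait_for_worker(replies: queue.Queue, timeout: float = 5.0) -> None:
    """Bounded wait in the parent that turns a reported failure into an error."""
    try:
        status, detail = replies.get(timeout=timeout)
    except queue.Empty:
        # Now only reached if the worker truly hung, not if it crashed.
        raise TimeoutError("worker did not reply") from None
    if status == "error":
        raise RuntimeError(f"worker startup failed:\n{detail}")
```

With this shape a crash in `setup` reaches the parent as a `RuntimeError` carrying the original traceback, and a timeout means the worker genuinely never finished.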
Supply Chain
- Hashed and pinned Python dependencies — every CI dependency is
resolved into a `ci/constraints-<extra>.txt` file via
`uv pip compile --universal --generate-hashes` and installed with
`pip install --require-hashes`. This covers the `test`, `linting`,
`docs`, and new `audit` extras. `bocpy` itself is then
installed via `pip install -e . --no-deps` so an editable build
cannot smuggle in an unpinned transitive dependency.
- Vulnerability scanning — a new `audit` job in `pr_gate.yml`
runs `pip-audit --strict` against every constraints file on every
PR. `pip-audit` itself is pinned via `ci/constraints-audit.txt`
and self-checked. A new `.github/workflows/nightly_audit.yml`
re-runs the audit nightly against `main`.
- SHA-pinned GitHub Actions — every
`uses:` line in `.github/workflows/` is now pinned to a full 40-char commit SHA
with a trailing `# vX.Y.Z` comment.
- Dependabot coverage — a new `.github/dependabot.yml` covers
three update entries (`pip` rooted at `/ci`, `github-actions`
rooted at `/`, and `pip` rooted at
`/templates/c_abi_consumer`), grouped weekly per ecosystem.
- Downstream template pinned — `templates/c_abi_consumer`
pins `bocpy~=MAJOR.MINOR` as both a build requirement and a
runtime dependency. The `finalize-pr` skill bumps it in
lock-step with the root version.
- New `SUPPLY_CHAIN.md` — a top-level policy doc describing
everything above with the exact regeneration commands.
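Hash-checking mode means an install is refused unless the downloaded archive's digest matches one of the `--hash=` values pinned for that requirement. The sketch below illustrates that check with `hashlib`; it is not pip's implementation, the function names are hypothetical, and it assumes a single logical requirements line with any backslash continuations already joined.

```python
import hashlib
import re

# Matches pip-style hash options such as --hash=sha256:<hex digest>.
_HASH_RE = re.compile(r"--hash=(?P<algo>sha256|sha384|sha512):(?P<digest>[0-9a-f]+)")


def allowed_hashes(requirement_line: str) -> dict[str, set[str]]:
    """Collect the --hash=algo:digest options from one requirements line."""
    found: dict[str, set[str]] = {}
    for match in _HASH_RE.finditer(requirement_line):
        found.setdefault(match["algo"], set()).add(match["digest"])
    return found


def verify_archive(requirement_line: str, archive: bytes) -> bool:
    """Return True only if the archive matches one of the pinned hashes.

    An unhashed requirement is rejected outright, mirroring the idea behind
    `pip install --require-hashes`.
    """
    hashes = allowed_hashes(requirement_line)
    if not hashes:
        return False
    return any(
        hashlib.new(algo, archive).hexdigest() in digests
        for algo, digests in hashes.items()
    )
```

Because every transitive dependency must carry a hash, an unpinned package slipping in fails loudly, which is also why `bocpy` itself is installed with `--no-deps`.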
Documentation
- Cown pickle-leak note — :class:`Cown` now documents that
`pickle.dumps` on a cown produces bytes that carry one strong
reference per embedded cown; orphan bytes (never unpickled in the
producing process) leak one strong ref per byte string. The bocpy
runtime never produces orphan bytes; the leak surface only
applies to third-party code that calls `pickle.dumps(cown)`
directly.
- Noticeboard cown-lifetime guarantee — :func:`notice_write` and
:func:`notice_update` now document that values may embed
:class:`Cown` references and that the noticeboard keeps each
embedded cown alive for as long as the entry remains. The new
paragraph in :doc:`noticeboard` mirrors this guarantee for
readers.
- Noticeboard final-state capture guide — :doc:`noticeboard`
gained a "Reading the Final State at Shutdown" section covering
the `wait(noticeboard=True)` contract, the combined
`wait(stats=True, noticeboard=True)` form returning
:class:`WaitResult`, the empty-dict fallbacks for the
never-started and never-written cases, and the recommendation
to use `snap.get(key)` since :func:`wait` quiesces as soon as
every behavior ...
v0.6.0 - C ABI
Public C ABI for downstream extensions, enabling C-level participation
in behavior-oriented concurrency across worker sub-interpreters.
New Features
- Decorator composition with `@when` — decorators stacked below
`@when` are now preserved on the generated behavior function and
compose with the behavior body on the worker. Decorators placed
above `@when` raise a `SyntaxError` at transpile time with
actionable guidance. `async def` functions with `@when` are
also explicitly rejected.
- Public C ABI (`<bocpy/bocpy.h>`) — downstream C extensions can
now link against bocpy to register custom Python types as
cross-interpreter shareable so :class:`Cown` can carry instances of
them across worker interpreters. The header is C-only and version-gated
via the `BOCPY_ABI` macro, which is bumped on any incompatible change
to `bocpy.h` or `xidata.h`. Wheels remain CPython-version-tagged
so a runtime ABI mismatch cannot occur.
- `bocpy.get_include()`/`bocpy.get_sources()` — Python-level
helpers that downstream `setup.py` files use to locate the bocpy
headers and the small set of C sources that must be compiled into
the consuming extension.
- `templates/c_abi_consumer/` — a ready-to-copy template for
building a C extension against the bocpy ABI, including a
`setup.py`, a probe extension exercising the public surface, and
a pytest suite (`test_public_c_abi.py`) that validates the ABI
end-to-end.
- C source reorganisation — the per-subsystem translation units
introduced in 0.5.0 have been renamed with a `boc_` prefix
(`boc_compat.[ch]`, `boc_sched.[ch]`, `boc_tags.[ch]`,
`boc_terminator.[ch]`, `boc_noticeboard.[ch]`, `boc_cown.h`)
to give the public ABI a stable, namespaced identity. `xidata.h`
has moved under `include/bocpy/` alongside `bocpy.h`.
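The decorator-ordering rule can be checked on the AST before any code runs, because `decorator_list` is stored in source order (topmost decorator first). The following is a hypothetical sketch of such a transpile-time check, not bocpy's transpiler; it assumes the decorator is spelled literally as `when` or `when(...)` and ignores aliased imports.

```python
import ast


def _is_when(node: ast.expr) -> bool:
    """True for `@when` or `@when(...)`; the name `when` is matched literally."""
    target = node.func if isinstance(node, ast.Call) else node
    return isinstance(target, ast.Name) and target.id == "when"


def check_when_decorators(source: str) -> None:
    """Reject decorator placements that cannot compose with @when.

    A hypothetical stand-in for a transpile-time check, not bocpy's code.
    """
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        positions = [i for i, d in enumerate(node.decorator_list) if _is_when(d)]
        if not positions:
            continue
        if isinstance(node, ast.AsyncFunctionDef):
            raise SyntaxError(f"{node.name}: async def cannot be used with @when")
        # Index 0 is the topmost decorator, so any other index means
        # at least one decorator sits above @when.
        if positions[0] != 0:
            raise SyntaxError(
                f"{node.name}: decorators above @when are not supported; "
                "move them below @when so they wrap the behavior body"
            )
```

Decorators below `@when` pass the check untouched, which matches the rule that they are preserved and composed with the behavior body.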
Documentation
- New :doc:`c_abi`, :doc:`messaging`, and :doc:`noticeboard` pages
in the Sphinx site; the API reference has been expanded to cover
the public ABI surface.
Breaking Changes
- `noticeboard_version` removed — the global monotonic version
counter introduced in 0.4.0 has been removed. It exposed an
implementation detail of the snapshot cache that did not survive
the C ABI review and had no use case that was not better served
by `notice_sync` plus an explicit `noticeboard()` read.
v0.5.0
Highlights
This release delivers a Verona-RT-style work-stealing scheduler, a global noticeboard (shared key-value store), removal of the central scheduler thread in favour of direct dispatch, and a major C source refactor into per-subsystem translation units with a portable atomics layer.
New Features
- Work-stealing scheduler — the single behavior queue is replaced with a distributed scheduler. Each worker owns an MPMC behavior queue, pops locally first, and steals from peers when idle. Idle workers park on per-worker condition variables and are signalled directly by producer/victim.
- Per-worker fairness tokens — a token node advances through each worker's queue so long-running behaviors cannot monopolise dispatch slots; also drives cooperative shutdown.
- Noticeboard — a shared key-value store (up to 64 keys) readable/writable without acquiring cowns. Writes are non-blocking; reads return a cached per-behavior snapshot. Includes `notice_write`, `notice_read`, `notice_update`, `notice_delete`, `notice_sync`, `noticeboard_version`, and the `REMOVED` sentinel.
- Distributed scheduler — two-phase locking, request linking, and dispatch run directly on the caller's thread in C; cown release runs on the executing worker. Each cown uses an MCS-style intrusive linked list for zero-bounce handoff.
- `Cown.exception` property — indicates whether the held value is from an unhandled exception.
- `compat.h`/`compat.c` portability layer — uniform `BOCMutex`, `BOCCond`, `boc_atomic_*_explicit`, monotonic-time, and sleep primitives across MSVC, pthreads, and C11 `<threads.h>`.
- `xidata.h` cross-interpreter shim — centralised `_PyXIData_*`/`_PyCrossInterpreterData_*` version ladders for CPython 3.12–3.15 (including free-threaded builds).
- `fanout_benchmark` example — fan-out/fan-in benchmark exercising scheduler throughput under heavy producer load.
- Prime factor example (`examples/prime_factor.py`) — parallel factorisation via Pollard's rho with noticeboard-coordinated early termination.
- Benchmark harness (`examples/benchmark.py`) — micro-benchmarks for scheduling throughput, message-queue latency, and noticeboard contention.
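The local-pop-then-steal policy of the work-stealing scheduler can be modelled in a few lines. This is a deliberately simplified, single-threaded sketch: the real per-worker queues are MPMC structures in C, parking and fairness tokens are omitted, and which end the owner and a thief take from is an assumption of the sketch rather than something these notes state.

```python
from collections import deque


class WorkerQueue:
    """Per-worker queue of runnable tasks (a simplified, hypothetical model)."""

    def __init__(self) -> None:
        self.tasks: deque = deque()

    def push(self, task) -> None:
        self.tasks.append(task)


def next_task(queues: list[WorkerQueue], me: int):
    """Pick the next task for worker `me`: local work first, then steal.

    Not thread-safe as written (check-then-pop races). Returns None when every
    queue is empty, which is when a real worker would park until signalled.
    """
    own = queues[me].tasks
    if own:
        return own.popleft()  # local pop: oldest task in the worker's own queue
    # Idle: scan peers in round-robin order, starting just after `me`.
    n = len(queues)
    for offset in range(1, n):
        victim = queues[(me + offset) % n].tasks
        if victim:
            return victim.pop()  # steal from the opposite end to the owner
    return None
```

Stealing from the opposite end keeps thieves away from the task the owner is about to take next, which is the usual motivation for that choice in work-stealing designs.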
Bug Fixes
- Transpiler aliased imports — `visit_Import`/`visit_ImportFrom` now track alias names (`import X as Y`), preventing spurious "name not found" errors and duplicate `when` call injection.
- Global variable capture — `@when` closure capture falls back to `frame.f_globals` when a name is not in any local scope, fixing `NameError` for module-level variables.
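Tracking aliases means recording the name an import actually binds in the local scope, which is `Y` for `import X as Y`. A hypothetical `ast.NodeVisitor` that does this is sketched below; it only illustrates the idea and is not the visitor in bocpy's transpiler.

```python
import ast


class ImportedNames(ast.NodeVisitor):
    """Map each local name bound by an import to what it refers to.

    A hypothetical illustration of alias tracking, not bocpy's transpiler code.
    """

    def __init__(self) -> None:
        self.bound: dict[str, str] = {}

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            if alias.asname:
                # `import numpy as np` binds `np`, not `numpy`.
                self.bound[alias.asname] = alias.name
            else:
                # `import os.path` binds only the top-level name `os`.
                top = alias.name.split(".")[0]
                self.bound[top] = top

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        prefix = "." * node.level + (node.module or "")
        for alias in node.names:
            if alias.name == "*":
                continue  # star imports cannot be resolved statically
            if node.module:
                target = f"{prefix}.{alias.name}"
            else:
                target = f"{prefix}{alias.name}"  # e.g. `from . import x`
            self.bound[alias.asname or alias.name] = target
```

Looking names up in `bound` by their local spelling is what avoids treating `np` as an unknown name.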
Improvements
- In-memory transpiled-module loading — workers `exec` the transpiled source from a string literal instead of writing to disk, eliminating filesystem round-trips and leftover `.py` files.
- Nested `@when` capture — the transpiler recurses into nested `@when`-decorated functions when computing outer captures, so child behaviors can close over the outer frame.
- C extension split — `_core.c` reduced from ~5,000 to ~3,500 lines by extracting `sched.{c,h}`, `noticeboard.{c,h}`, `terminator.{c,h}`, `tags.{c,h}`, `cown.h`, `compat.{c,h}`, and `xidata.h`.
- Direct dispatch on cown release — `behavior_release_all` hands resolved successors directly to workers via `boc_sched_dispatch`, removing one queue hop per handoff.
- Cooperative worker shutdown — `boc_sched_worker_request_stop_all`/`boc_sched_unpause_all` provide a clean stop/drain protocol.
- Matrix docstrings — all `Matrix` C methods now carry built-in docstrings.
- Examples package relocated — moved to the top-level `examples/` directory (still importable as `bocpy.examples`).
- Filtered PyPI README — `setup.py` strips `<!-- pypi-skip-start -->` regions before publishing.
- Documentation refresh — expanded coverage of noticeboard, distributed scheduler, and new APIs.
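Loading a module from a source string instead of a temporary file can be done with the standard library alone. The sketch below shows one way to do it; `load_module_from_source` is a hypothetical helper, and bocpy's actual worker bootstrap is not described in these notes.

```python
import sys
import types


def load_module_from_source(name: str, source: str) -> types.ModuleType:
    """Create and register a module from a source string, without touching disk.

    A minimal sketch; the helper name and pseudo-filename are illustrative.
    """
    module = types.ModuleType(name)
    # A pseudo-filename keeps tracebacks readable even though no file exists.
    code = compile(source, f"<transpiled {name}>", "exec")
    sys.modules[name] = module  # register first so self-imports resolve
    try:
        exec(code, module.__dict__)
    except BaseException:
        del sys.modules[name]  # do not leave a half-initialised module behind
        raise
    return module
```

Nothing is written to the filesystem, so there are no leftover `.py` files to clean up when a worker exits.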
Internal Test Modules (opt-in via `BOCPY_BUILD_INTERNAL_TESTS=1`)
- `_internal_test_atomics` — correctness tests for `compat.h` typed atomics.
- `_internal_test_bq` — torture tests for the MPMC behavior queue.
- `_internal_test_wsq` — tests for work-stealing primitives (fast pop, slow pop, steal, park/unpark).
Test Suite
- `test_noticeboard.py` — snapshot semantics, `notice_update` atomicity, `REMOVED`, `notice_sync`, version monotonicity.
- `test_scheduler_integration.py`, `test_scheduler_stats.py`, `test_scheduler_steal.py` — end-to-end and per-primitive scheduler tests.
- `test_compat_atomics.py` — portable atomics smoke tests.
- `test_stop_retry_composition.py` — `stop()`/`start()`/`wait()` retry composition.
- `test_scheduling_stress.py` — expanded with fan-out, work-stealing, and shutdown stress scenarios.
- `test_transpiler.py` — AST extraction, capture rewriting, aliased imports, module export.
Full changelog: v0.3.1...v0.5.0
v0.3.1
`CownCapsule` serialization support for nested cowns.
Bug Fixes
- Removed the ownership check in `_cown_shared` that prevented a
`CownCapsule` from being serialized to XIData when it was the value
of another `Cown`. The check was unnecessary — `_cown_shared` only
stores a pointer and ownership is enforced at acquire time.
Improvements
- Added `CownCapsule.__reduce__` with `COWN_INCREF` pinning so that a
`CownCapsule` embedded in a container (dict, list, etc.) can survive
the pickle round-trip used by `object_to_xidata`. A module-level
reconstructor (`_cown_capsule_from_pointer`) inherits the pin without
a redundant `COWN_INCREF`, and validates the process ID on unpickle to
guard against cross-process misuse.
v0.3.0
Improvements
- Added `CownCapsule.disown()` — abandons a cown's value without
serializing it and resets ownership to `NO_OWNER`. Used during worker
cleanup to safely discard orphan cowns before the owning interpreter
is destroyed, preventing dangling Python object references.
- Rewrote `receive` to use a two-phase spin-then-park strategy for
single-tag untimed receives. Phase 1 spins for `BOC_SPIN_COUNT`
iterations; Phase 2 parks the thread on a per-queue condvar, eliminating
busy-wait CPU burn. Timed receives and multi-tag receives use
spin-then-backoff with exponential sleep (1 µs → 1 ms cap).
- Added platform-abstracted condvar primitives (`BOCParkMutex`/`BOCParkCond`)
with implementations for Windows (`SRWLOCK`/`CONDITION_VARIABLE`),
macOS (pthreads), and Linux (C11 threads).
- Each `BOCQueue` now carries a `waiters` counter, a `park_mutex`, and a
`park_cond`. Producers signal parked receivers after enqueue;
`drain` and `set_tags` broadcast to wake all parked threads.
- Replaced the fixed `thrd_sleep` in `send` with `sched_yield`/`SwitchToThread`,
reducing send-side latency.
- Refactored the monolithic
`_core_receive` into `receive_single_tag`
and `receive_multi_tag`, each with its own backoff/parking logic.
- Moved the `BOC_QUEUE_DISABLED` check earlier in `get_queue_for_tag`
so callers skip disabled queues instead of returning NULL after
tag resolution.
- Added Windows-compatible `atomic_load_explicit`/`atomic_fetch_add_explicit`/`atomic_fetch_sub_explicit`
macros using `InterlockedExchangeAdd64`.
- Declared `Py_mod_gil = Py_MOD_GIL_NOT_USED` in both the `_core` and
`_math` C extensions so that importing bocpy on a free-threaded
Python build (3.13t+) does not re-enable the GIL.
- Replaced `PyDict_GetItem` (borrowed reference) with
`PyDict_GetItemRef` (strong reference) in `BOCRecycleQueue_recycle`
on Python 3.13+, improving forward-compatibility with free-threaded
builds.
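The two receive strategies can be sketched with `threading.Condition`. This is an illustrative model, not bocpy's C implementation: `SPIN_COUNT` stands in for `BOC_SPIN_COUNT` with an arbitrary value, and a single waiter counter plays the role of the per-queue `waiters` field so that `send` only signals when a receiver is actually parked.

```python
import collections
import threading
import time

SPIN_COUNT = 100  # arbitrary stand-in for BOC_SPIN_COUNT


def backoff_delays(start: float = 1e-6, cap: float = 1e-3):
    """Yield sleep intervals that double from 1 µs up to a 1 ms cap."""
    delay = start
    while True:
        yield delay
        delay = min(delay * 2, cap)


class ParkingQueue:
    """Single-tag queue with spin-then-park and spin-then-backoff receives."""

    def __init__(self) -> None:
        self._items = collections.deque()
        self._cond = threading.Condition()
        self._waiters = 0  # receivers currently parked on _cond

    def send(self, item) -> None:
        with self._cond:
            self._items.append(item)
            if self._waiters:
                self._cond.notify()  # signal a parked receiver after enqueue

    def _spin(self):
        """Phase 1: poll briefly without blocking. Returns (found, item)."""
        for _ in range(SPIN_COUNT):
            try:
                return True, self._items.popleft()
            except IndexError:
                pass
        return False, None

    def receive(self):
        """Untimed receive: spin, then park until a sender signals."""
        found, item = self._spin()
        if found:
            return item
        with self._cond:
            self._waiters += 1
            try:
                # Re-check under the lock so a send between spin and park is not lost.
                while not self._items:
                    self._cond.wait()
                return self._items.popleft()
            finally:
                self._waiters -= 1

    def receive_timed(self, timeout: float):
        """Timed receive: spin, then sleep with exponential backoff until the deadline."""
        deadline = time.monotonic() + timeout
        found, item = self._spin()
        if found:
            return item
        for delay in backoff_delays():
            try:
                return self._items.popleft()
            except IndexError:
                pass
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("no message before the deadline")
            time.sleep(min(delay, remaining))
```

Parking trades a little wake-up latency for zero CPU use while idle; the backoff path keeps timed receives simple at the cost of up to 1 ms of extra latency once the cap is reached.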
Bug Fixes
- Fixed a deadlock when the same cown is passed multiple times to
`@when` (e.g. `@when(c, c)`). Duplicate requests for the same cown caused the
MCS-queue-based two-phase locking to spin-wait on itself. Requests are
now deduplicated by target cown in `Behavior.__init__`, with
compensating `resolve_one` calls to maintain the behavior count
invariant.
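The deduplication fix can be modelled as follows. `dedupe_requests` and `BehaviorModel` are hypothetical names, and the model assumes the pending count starts at one per argument, duplicates included. Deduplicating removes the self-wait, and one compensating resolution per dropped duplicate keeps the count consistent.

```python
def dedupe_requests(cowns):
    """Drop repeated cowns, keeping first-occurrence order.

    Returns (unique, dropped), where dropped is the number of removed duplicates.
    """
    seen = set()
    unique = []
    for cown in cowns:
        # Identity, not equality: two distinct cowns may hold equal values.
        if id(cown) in seen:
            continue
        seen.add(id(cown))
        unique.append(cown)
    return unique, len(cowns) - len(unique)


class BehaviorModel:
    """Toy behavior that becomes runnable once every request is resolved."""

    def __init__(self, *cowns) -> None:
        self.count = len(cowns)  # one pending resolution per argument
        self.requests, dropped = dedupe_requests(cowns)
        for _ in range(dropped):
            # Compensate: a dropped duplicate will never be resolved by the scheduler.
            self.resolve_one()

    def resolve_one(self) -> bool:
        """Record one resolution; True means the behavior is ready to run."""
        self.count -= 1
        return self.count == 0
```

With `@when(c, c)` modelled as `BehaviorModel(c, c)`, only one request is enqueued and a single resolution makes the behavior runnable.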
Tests
- `TestLostWakeStress`: single-producer random delays, bursty producer,
and repeated single-message wake to detect lost-wake races.
- `TestMultiTagBackoff`: multi-tag receive correctness — second-tag hit,
delayed arrival, per-tag FIFO ordering, timeout, and interleaved
producers.
- `TestTimeoutAccuracy`: lower-bound/upper-bound wall-clock checks and
zero-timeout immediacy.
- Added tests for duplicate cowns in `@when`: same cown twice, thrice,
non-adjacent duplicates, duplicates within a group, and mutation
aliasing semantics.
CI
- Added a `free-threaded` CI job that tests against Python 3.13t and
3.14t on Linux, with explicit assertions that the GIL remains disabled
after import.
Full Changelog: v0.2.2...v0.3.0