Merged
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
- **Context-firewall sizing, budgeting & summary fidelity.** A grouped pass over
  how the firewall measures, bounds, and represents payloads:
  - **Allocation-free size estimation (#207).** `firewall.estimated_size` walks
    a value to approximate its serialised length instead of `json.dumps`-ing the
    whole payload just to measure it; the raw-mode budget check now uses it, so
    multi-MB results are no longer fully serialised for sizing.
  - **Byte-aware handle budgeting (#211).** `HandleStore` accepts optional
    `max_total_bytes` (evict oldest-first) and `max_entry_bytes` (reject an
    over-cap payload with the new `HandleTooLarge` error). Both default to
    `None`, leaving existing behaviour unchanged; `current_bytes` exposes
    resident usage.
  - **tiktoken token counter (#218).** `firewall.make_tiktoken_counter()`
    implements the `TokenCounter` seam with a real tokenizer (configurable
    encoding, default `cl100k_base`), wired via `BudgetManager(token_counter=…)`.
    `tiktoken` is imported lazily so the base install stays dependency-light;
    install the existing `weaver-kernel[tiktoken]` extra to use it.
  - **Honest summaries (#174).** Boolean columns are summarised as true/false
    counts instead of being averaged as numbers, and truncated fact lists carry
    an explicit "N more facts omitted" marker.
- **CI / supply-chain hardening.** A focused pass over the build pipeline and
  repository automation:
  - **Bare-install CI job (#208).** Installs `weaver-kernel` with no extras,
66 changes: 55 additions & 11 deletions docs/context_firewall.md
@@ -19,6 +19,12 @@ Budgets(
)
```

The character size used for budget comparisons is computed by an allocation-free
estimator (`weaver_kernel.firewall.estimated_size`) that walks the structure
rather than serialising it with `json.dumps` — so a multi-MB raw result is never
fully serialised just to measure it. The estimate is deterministic and tracks
the serialised length closely; only threshold comparisons depend on it.
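
As a rough illustration of the walk described above, here is a condensed, standalone sketch. `estimate_chars` is a hypothetical stand-in written for this example, not the shipped `estimated_size`: it omits the early-exit `limit` and assumes the per-type widths of `json.dumps` with default separators.

```python
import json
from typing import Any


def estimate_chars(value: Any) -> int:
    """Hypothetical stand-in: sum approximate JSON widths without serialising."""
    total = 0
    seen: set[int] = set()  # ids of containers already counted (breaks cycles)
    stack: list[Any] = [value]
    while stack:
        cur = stack.pop()
        if cur is None or cur is True:
            total += 4  # "null" / "true"
        elif cur is False:
            total += 5  # "false"
        elif isinstance(cur, str):
            total += len(cur) + 2  # surrounding quotes, escapes ignored
        elif isinstance(cur, (int, float)):
            total += len(str(cur))
        elif isinstance(cur, (dict, list, tuple)):
            if id(cur) in seen:
                continue
            seen.add(id(cur))
            # Two delimiters plus ", " between items.
            total += 2 + max(0, len(cur) - 1) * 2
            if isinstance(cur, dict):
                for key, val in cur.items():
                    total += len(str(key)) + 2 + 2  # quoted key plus ": "
                    stack.append(val)
            else:
                stack.extend(cur)
        else:
            total += len(str(cur)) + 2  # mirrors default=str: a quoted string
    return total


rows = [
    {"id": 1, "name": "Ada", "active": True},
    {"id": 2, "name": "Bob", "active": False},
]
# The two numbers agree for this payload.
print(estimate_chars(rows), len(json.dumps(rows)))
```

For payloads like `rows` (ASCII strings, integers, booleans) the sketch agrees with `len(json.dumps(...))`. Strings containing characters that JSON escapes, such as newlines or quotes, are undercounted, which is one source of the approximation.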

## Response modes

| Mode | What you get | When to use |
@@ -51,6 +57,31 @@ expanded = kernel.expand(handle, query={"fields": ["id", "name"]}, principal=pri
expanded = kernel.expand(handle, query={"filter": {"status": "unpaid"}}, principal=principal)
```

### Bounding handle memory by size

The store holds *raw, pre-firewall* datasets, and entry count is a poor proxy
for memory — one deployment's 10k entries are kilobytes, another's are
gigabytes. `HandleStore` accepts two optional byte budgets (both `None` =
disabled, so default behaviour is unchanged):

```python
from weaver_kernel import HandleStore

store = HandleStore(
    max_total_bytes=512 * 1024 * 1024,  # evict oldest-first until within budget
    max_entry_bytes=64 * 1024 * 1024,  # reject a single over-cap payload
)
```

Sizes are estimated with the same `estimated_size` walk used for budgets.
`max_total_bytes` evicts oldest-first after each store (never the just-stored
entry); `max_entry_bytes` rejects an over-cap payload with `HandleTooLarge`
rather than truncating it, keeping expansion faithful to the original dataset. A
single entry larger than `max_total_bytes` can never fit, so it is rejected the
same way — `current_bytes` therefore never exceeds `max_total_bytes`. Expanding
an evicted handle raises the usual `HandleNotFound`. Tighter budgets evict
handles sooner, so callers hit `HandleNotFound` more often; tune them for your
workload.
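
The eviction and rejection rules above can be sketched with a toy store. Everything here is hypothetical and written for illustration (`ToyByteBudgetStore`, `ToyTooLarge`, `handles()`); it is not `HandleStore`'s implementation, and it sizes payloads with `json.dumps` for brevity rather than the `estimated_size` walk.

```python
from __future__ import annotations

import json
from collections import OrderedDict
from typing import Any


class ToyTooLarge(Exception):
    """Hypothetical stand-in for HandleTooLarge."""


class ToyByteBudgetStore:
    """Illustrative only: oldest-first eviction under a byte budget."""

    def __init__(
        self,
        max_total_bytes: int | None = None,
        max_entry_bytes: int | None = None,
    ) -> None:
        self._max_total = max_total_bytes
        self._max_entry = max_entry_bytes
        # Insertion order doubles as age: the first key is the oldest entry.
        self._entries: OrderedDict[str, tuple[Any, int]] = OrderedDict()
        self.current_bytes = 0

    def store(self, handle_id: str, data: Any) -> None:
        # Simplification: measure by serialising instead of walking the value.
        size = len(json.dumps(data, default=str))
        ceilings = [c for c in (self._max_entry, self._max_total) if c is not None]
        if ceilings and size > min(ceilings):
            # Reject outright rather than truncate; nothing is retained.
            raise ToyTooLarge(f"{size} exceeds ceiling {min(ceilings)}")
        if handle_id in self._entries:
            _, old_size = self._entries.pop(handle_id)
            self.current_bytes -= old_size
        self._entries[handle_id] = (data, size)
        self.current_bytes += size
        # Evict oldest-first. The new entry fits on its own, so the loop
        # always stops before reaching it.
        while self._max_total is not None and self.current_bytes > self._max_total:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.current_bytes -= evicted_size

    def handles(self) -> list[str]:
        return list(self._entries)


store = ToyByteBudgetStore(max_total_bytes=30, max_entry_bytes=20)
for name in ("h1", "h2", "h3"):
    store.store(name, "a" * 10)  # each payload serialises to 12 characters
print(store.handles(), store.current_bytes)
```

Storing `h3` pushes the total past the budget, so the oldest entry (`h1`) is dropped. Checking the smaller of the two ceilings before inserting is what keeps `current_bytes` at or below `max_total_bytes` in this sketch.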

## Redaction

When a capability has `SensitivityTag.PII` or `SensitivityTag.PCI`:
@@ -84,11 +115,18 @@ never becomes a sensitive-data sink (see `docs/security.md`).
## Summarization

Summaries are produced deterministically:
- **list of dicts** → row count + top keys + numeric stats + categorical distributions
- **list of dicts** → row count + top keys + numeric stats + categorical/boolean
distributions
- **dict** → key list + per-value type/value
- **string** → truncated to 500 chars
- **other** → repr() truncated to 200 chars

Boolean columns are reported as `True`/`False` counts, never averaged (a `bool`
is an `int` subclass in Python, so "mean of `is_active` = 0.7" is nonsense). When
the fact list is capped by `max_facts`, the final fact is an explicit omission
marker (`… (N more facts omitted; full data via handle)`) so a truncated summary
is never mistaken for a complete one.
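
A minimal sketch of both behaviours, under stated assumptions: `describe_column` and `cap_facts` are hypothetical helpers invented for this example, the output strings are illustrative rather than the library's exact format, and the marker is assumed to be appended after the first `max_facts` facts.

```python
from collections import Counter
from statistics import mean
from typing import Any


def describe_column(name: str, values: list[Any]) -> str:
    """Hypothetical helper: one fact per column."""
    # Check bool first: isinstance(True, int) is True in Python, so a plain
    # numeric check would average booleans as 0/1.
    if values and all(isinstance(v, bool) for v in values):
        counts = Counter(values)
        return f"{name}: True={counts[True]}, False={counts[False]}"
    if values and all(isinstance(v, (int, float)) for v in values):
        return f"{name}: min={min(values)}, max={max(values)}, mean={mean(values):.2f}"
    return f"{name}: {len(set(map(str, values)))} distinct values"


def cap_facts(facts: list[str], max_facts: int) -> list[str]:
    """Hypothetical helper: truncate and say so explicitly."""
    if len(facts) <= max_facts:
        return facts
    kept = facts[:max_facts]
    omitted = len(facts) - len(kept)
    # Assumption: the marker is an extra final fact, not one of the max_facts.
    return kept + [f"… ({omitted} more facts omitted; full data via handle)"]


facts = [
    describe_column("is_active", [True, True, False]),
    describe_column("amount", [1, 2, 3]),
    describe_column("status", ["paid", "unpaid", "paid"]),
]
for fact in cap_facts(facts, max_facts=2):
    print(fact)
```

The ordering of the two `isinstance` checks is the whole point: swapping them would silently route boolean columns into the numeric branch.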

## Cross-invocation budgets

The per-invocation `Budgets` above cap a single Frame. A separate
@@ -129,21 +167,27 @@ remaining drops *below* 5% does `handle_only` take over.
`budget_remaining` in the returned `DryRunResult`, so callers can preview
what their next live invocation would actually return.

Plug a different token counter (for example, a `tiktoken`-based one) via the
`TokenCounter` protocol:
The default counter (`default_token_counter`) is a character-based
`len(json.dumps(value)) // 4` approximation with no extra dependencies. For real
token counts, install the `tiktoken` extra and use the shipped factory:

```python
import tiktoken # pip install weaver-kernel[tiktoken]
enc = tiktoken.encoding_for_model("gpt-4o")
from weaver_kernel.firewall import BudgetManager, make_tiktoken_counter

def tiktoken_counter(value):
    return len(enc.encode(str(value)))

manager = BudgetManager(total_budget=128_000, token_counter=tiktoken_counter)
# pip install weaver-kernel[tiktoken]
manager = BudgetManager(
    total_budget=128_000,
    token_counter=make_tiktoken_counter(),  # default cl100k_base
    # token_counter=make_tiktoken_counter("o200k_base"),  # GPT-4o / o-series
)
```

The default counter (`default_token_counter`) is a character-based
`len(json.dumps(value)) // 4` approximation with no extra dependencies.
`make_tiktoken_counter` resolves and caches the encoder eagerly, so a missing
extra (`ImportError`) or an unknown encoding name (`FirewallError`) fails at
construction rather than mid-budgeting. The encoding is explicit because models
tokenize differently — name the one you budget against. `tiktoken` is imported
lazily, so `import weaver_kernel` never pulls the heavyweight dependency. Any
callable matching the `TokenCounter` protocol works too.
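
As an example of a custom counter, the sketch below builds a dependency-free callable with a configurable characters-per-token ratio. It assumes the `TokenCounter` protocol is simply "a callable that takes a value and returns an `int`", as the hand-rolled counter it replaces suggests; `make_ratio_counter` is a hypothetical name, not part of `weaver_kernel`.

```python
import json
from typing import Any, Callable


def make_ratio_counter(chars_per_token: int = 4) -> Callable[[Any], int]:
    """Hypothetical factory: a character-ratio counter, validated up front."""
    # Fail at construction, not mid-budgeting, if the ratio is unusable.
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")

    def count(value: Any) -> int:
        # Same shape as the default approximation, with a tunable divisor.
        return len(json.dumps(value, default=str)) // chars_per_token

    return count


conservative = make_ratio_counter(3)  # assumes fewer characters per token
print(conservative({"id": 1}))

# Wiring, assuming the BudgetManager signature shown above:
# manager = BudgetManager(total_budget=128_000, token_counter=conservative)
```

A smaller divisor overestimates token usage relative to the default, which may be a reasonable hedge when the target model's tokenizer is unknown and `tiktoken` is not installed.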

## Streaming

4 changes: 4 additions & 0 deletions pyproject.toml
@@ -137,3 +137,7 @@ ignore_missing_imports = true
[[tool.mypy.overrides]]
module = "weaver_contracts.*"
ignore_missing_imports = true

[[tool.mypy.overrides]]
module = "tiktoken.*"
ignore_missing_imports = true
8 changes: 4 additions & 4 deletions src/weaver_kernel/__init__.py
@@ -40,9 +40,7 @@
from weaver_kernel import SQLiteTraceStore, JsonlTraceStore
from weaver_kernel import SQLiteRevocationStore, InMemoryRevocationStore
from weaver_kernel import verify_chain, ChainVerificationResult, TraceRecord
from weaver_kernel import (
TraceStoreProtocol, RevocationStoreProtocol, HandleStoreProtocol,
)
from weaver_kernel import TraceStoreProtocol, RevocationStoreProtocol, HandleStoreProtocol

LLM tool-format adapters::

@@ -62,7 +60,7 @@
DriverError, FirewallError, AdapterParseError,
BudgetExhausted, BudgetConfigError,
CapabilityNotFound, CapabilityAlreadyRegistered,
HandleNotFound, HandleExpired, HandleConstraintViolation,
HandleNotFound, HandleExpired, HandleTooLarge, HandleConstraintViolation,
NamespaceNotFound, FederationError, ManifestError, ManifestSignatureError,
TrustPolicyError, DiscoveryError,
)
@@ -91,6 +89,7 @@
HandleConstraintViolation,
HandleExpired,
HandleNotFound,
HandleTooLarge,
ManifestError,
ManifestSignatureError,
NamespaceNotFound,
@@ -246,6 +245,7 @@
"HandleConstraintViolation",
"HandleExpired",
"HandleNotFound",
"HandleTooLarge",
"ManifestError",
"ManifestSignatureError",
"NamespaceNotFound",
23 changes: 20 additions & 3 deletions src/weaver_kernel/errors.py
@@ -78,9 +78,13 @@ class BudgetExhausted(AgentKernelError):


class BudgetConfigError(AgentKernelError):
"""Raised when a :class:`~weaver_kernel.firewall.budget_manager.BudgetManager` is
constructed with invalid parameters, or asked to allocate/record/release
a negative amount.
"""Raised when a budget is constructed with invalid parameters.

Covers the :class:`~weaver_kernel.firewall.budget_manager.BudgetManager`
(non-positive ``total_budget``/``default_request``, or a negative
allocate/record/release amount) and the
:class:`~weaver_kernel.handles.HandleStore` byte budgets (non-positive
``max_total_bytes``/``max_entry_bytes``).

Used in place of bare :class:`ValueError` so callers can catch budget
configuration mistakes without swallowing unrelated stdlib errors.
@@ -128,6 +132,19 @@ class HandleExpired(AgentKernelError):
"""Raised when a handle's TTL has elapsed."""


class HandleTooLarge(AgentKernelError):
    """Raised when a single handle payload exceeds the store's byte ceiling.

    Fires from :meth:`~weaver_kernel.handles.HandleStore.store` when a byte
    budget is configured and the estimated size of the data exceeds the binding
    per-store ceiling — ``max_entry_bytes``, or ``max_total_bytes`` (a single
    entry larger than the whole budget can never fit). The data is *not*
    retained — rejecting an over-cap payload outright (rather than silently
    truncating it) keeps handle expansion faithful to the original dataset and
    bounds resident raw data.
    """


class HandleConstraintViolation(AgentKernelError):
"""Raised when a handle expansion request violates the grant's constraints.

4 changes: 4 additions & 0 deletions src/weaver_kernel/firewall/__init__.py
@@ -3,8 +3,10 @@
from .budget_manager import BudgetManager
from .budgets import Budgets
from .redaction import redact
from .size_estimate import estimated_size
from .summarize import summarize
from .token_counting import TokenCounter, default_token_counter
from .token_counting_tiktoken import make_tiktoken_counter
from .transform import Firewall

__all__ = [
@@ -13,6 +15,8 @@
"Firewall",
"TokenCounter",
"default_token_counter",
"estimated_size",
"make_tiktoken_counter",
"redact",
"summarize",
]
112 changes: 112 additions & 0 deletions src/weaver_kernel/firewall/size_estimate.py
@@ -0,0 +1,112 @@
"""Allocation-free payload size estimation for the firewall hot path.

``estimated_size`` approximates ``len(json.dumps(value, default=str))`` by
walking the structure and summing approximate character contributions, *without*
serialising the value to an intermediate string. The firewall runs on every
egress path and only needs the size to compare against budget thresholds, so a
single allocation-free pass (with an optional early exit) replaces a full JSON
serialisation that would double peak memory on large payloads.

The estimate is intentionally approximate but **deterministic**: the same input
always yields the same number (the total is order-independent). Self-referential
structures are handled gracefully — each container is counted at most once, so a
cycle can never hang the walk (unlike ``json.dumps``, which raises). Counting
each container once also means a structure that legitimately reuses the same
container in several positions (a DAG, not a cycle) is counted once, so the
estimate is a *lower bound* for shared-reference inputs; firewall and handle
payloads are deserialised JSON without shared references, where this does not
arise. It is used by
:mod:`~weaver_kernel.firewall.transform` (raw-mode budget warning, issue #207)
and by :class:`~weaver_kernel.handles.HandleStore` (byte-size budgeting,
issue #211).
"""

from __future__ import annotations

from typing import Any

# Approximate JSON literal widths and structural overhead. json.dumps with the
# default separators emits ``", "`` between items and ``": "`` between a key and
# its value; object/array delimiters add two characters per container. These
# constants keep the estimate close to the real serialised length without
# reproducing json's exact escaping rules (bounded error is acceptable because
# only threshold comparisons depend on the result).
_NULL_LEN = 4 # "null"
_TRUE_LEN = 4 # "true"
_FALSE_LEN = 5 # "false"
_QUOTES = 2 # surrounding "" on a string
_CONTAINER_DELIMS = 2 # {} or []
_ITEM_SEP = 2 # ", "
_KV_SEP = 2 # ": "


def estimated_size(value: Any, *, limit: int | None = None) -> int:
    """Approximate the serialised character size of *value* without serialising.

    Walks *value* iteratively (so deeply nested structures cannot exhaust the
    recursion limit) summing approximate JSON character contributions. ``bool``
    is handled before ``int`` because ``bool`` is an ``int`` subclass in Python.
    Each container is visited at most once, so self-referential inputs terminate
    instead of looping forever.

    Args:
        value: Any value the firewall might serialise. Non-JSON types fall back
            to ``len(str(value))`` plus string quoting, mirroring the
            ``default=str`` behaviour of the previous ``json.dumps`` measurement.
        limit: Optional early-exit threshold. When the running total exceeds
            *limit* the walk stops and returns a value greater than *limit* — use
            this when only the boolean ``size > limit`` decision is needed, not
            the exact size. (Which member tips the total past *limit* is
            unspecified, but the returned value is always ``> limit``.)

    Returns:
        A non-negative integer approximating ``len(json.dumps(value,
        default=str))``.

    Example:
        >>> estimated_size(None)
        4
        >>> estimated_size("hi")
        4
        >>> estimated_size([1, 2, 3])
        9
    """
    total = 0
    seen: set[int] = set()  # container ids already counted, to break cycles
    stack: list[Any] = [value]
    while stack:
        cur = stack.pop()
        if cur is None:
            total += _NULL_LEN
        elif cur is True:
            total += _TRUE_LEN
        elif cur is False:
            total += _FALSE_LEN
        elif isinstance(cur, str):
            total += len(cur) + _QUOTES
        elif isinstance(cur, bool):  # pragma: no cover - True/False caught above
            total += _TRUE_LEN
        elif isinstance(cur, (int, float)):
            total += len(str(cur))
        elif isinstance(cur, dict):
            if id(cur) in seen:
                continue
            seen.add(id(cur))
            n = len(cur)
            total += _CONTAINER_DELIMS + max(0, n - 1) * _ITEM_SEP
            for key, val in cur.items():
                total += len(str(key)) + _QUOTES + _KV_SEP
                stack.append(val)
        elif isinstance(cur, (list, tuple)):
            if id(cur) in seen:
                continue
            seen.add(id(cur))
            n = len(cur)
            total += _CONTAINER_DELIMS + max(0, n - 1) * _ITEM_SEP
            stack.extend(cur)
        else:
            # Mirrors json.dumps(default=str): rendered as a quoted string.
            total += len(str(cur)) + _QUOTES
        if limit is not None and total > limit:
            return total
    return total