dgenio · dgenio · Jun 24, 2026 · Jun 22, 2026 · Jun 22, 2026 · Jun 24, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **Context-firewall sizing, budgeting & summary fidelity.** A grouped pass over
+  how the firewall measures, bounds, and represents payloads:
+  - **Allocation-free size estimation (#207).** `firewall.estimated_size` walks
+    a value to approximate its serialised length instead of `json.dumps`-ing the
+    whole payload just to measure it; the raw-mode budget check now uses it, so
+    multi-MB results are no longer fully serialised for sizing.
+  - **Byte-aware handle budgeting (#211).** `HandleStore` accepts optional
+    `max_total_bytes` (evict oldest-first) and `max_entry_bytes` (reject an
+    over-cap payload with the new `HandleTooLarge` error). Both default to
+    `None`, leaving existing behaviour unchanged; `current_bytes` exposes
+    resident usage.
+  - **tiktoken token counter (#218).** `firewall.make_tiktoken_counter()`
+    implements the `TokenCounter` seam with a real tokenizer (configurable
+    encoding, default `cl100k_base`), wired via `BudgetManager(token_counter=…)`.
+    `tiktoken` is imported lazily so the base install stays dependency-light;
+    install the existing `weaver-kernel[tiktoken]` extra to use it.
+  - **Honest summaries (#174).** Boolean columns are summarised as true/false
+    counts instead of being averaged as numbers, and truncated fact lists carry
+    an explicit "N more facts omitted" marker.
 - **CI / supply-chain hardening.** A focused pass over the build pipeline and
   repository automation:
   - **Bare-install CI job (#208).** Installs `weaver-kernel` with no extras,

diff --git a/docs/context_firewall.md b/docs/context_firewall.md
@@ -19,6 +19,12 @@ Budgets(
 )
 ```
 
+Budget comparisons use a character size from an allocation-free estimator
+(`weaver_kernel.firewall.estimated_size`) that walks the structure instead of
+serialising it with `json.dumps`, so a multi-MB raw result is never fully
+serialised just to measure it. The estimate is deterministic and approximates
+`len(json.dumps(value, default=str))`; only threshold comparisons depend on it.
+
 ## Response modes
 
 | Mode | What you get | When to use |
@@ -51,6 +57,31 @@ expanded = kernel.expand(handle, query={"fields": ["id", "name"]}, principal=pri
 expanded = kernel.expand(handle, query={"filter": {"status": "unpaid"}}, principal=principal)
 ```
 
+### Bounding handle memory by size
+
+The store holds *raw, pre-firewall* datasets, and entry count is a poor proxy
+for memory — one deployment's 10k entries are kilobytes, another's are
+gigabytes. `HandleStore` accepts two optional byte budgets (both `None` =
+disabled, so default behaviour is unchanged):
+
+```python
+from weaver_kernel import HandleStore
+
+store = HandleStore(
+    max_total_bytes=512 * 1024 * 1024,  # evict oldest-first until within budget
+    max_entry_bytes=64 * 1024 * 1024,   # reject a single over-cap payload
+)
+```
+
+Sizes come from the same `estimated_size` walk used for budgets, so they are
+approximate serialised sizes, not exact memory use. `max_total_bytes` evicts
+oldest-first after each store (never the just-stored entry); `max_entry_bytes`
+rejects an over-cap payload with `HandleTooLarge` rather than truncating it,
+keeping expansion faithful to the original dataset. A single entry larger than
+`max_total_bytes` can never fit and is rejected the same way, so `current_bytes`
+never exceeds `max_total_bytes`. Expanding an evicted handle raises the usual
+`HandleNotFound`; tighter budgets evict sooner, so tune for your workload.
+
 ## Redaction
 
 When a capability has `SensitivityTag.PII` or `SensitivityTag.PCI`:
@@ -84,11 +115,18 @@ never becomes a sensitive-data sink (see `docs/security.md`).
 ## Summarization
 
 Summaries are produced deterministically:
-- **list of dicts** → row count + top keys + numeric stats + categorical distributions
+- **list of dicts** → row count + top keys + numeric stats + categorical/boolean
+  distributions
 - **dict** → key list + per-value type/value
 - **string** → truncated to 500 chars
 - **other** → repr() truncated to 200 chars
 
+Boolean columns are reported as `True`/`False` counts, never averaged (`bool` is
+an `int` subclass, so it would otherwise get a meaningless "mean of `is_active`
+= 0.7"). When the fact list is capped by `max_facts`, the final fact is an
+explicit omission marker (`… (N more facts omitted; full data via handle)`) so
+a truncated summary is never mistaken for a complete one.
+
 ## Cross-invocation budgets
 
 The per-invocation `Budgets` above cap a single Frame. A separate
@@ -129,21 +167,27 @@ remaining drops *below* 5% does `handle_only` take over.
 `budget_remaining` in the returned `DryRunResult`, so callers can preview
 what their next live invocation would actually return.
 
-Plug a different token counter (for example, a `tiktoken`-based one) via the
-`TokenCounter` protocol:
+The default counter (`default_token_counter`) is a character-based
+`len(json.dumps(value)) // 4` approximation with no extra dependencies. For real
+token counts, install the `tiktoken` extra and use the shipped factory:
 
 ```python
-import tiktoken                         # pip install weaver-kernel[tiktoken]
-enc = tiktoken.encoding_for_model("gpt-4o")
+from weaver_kernel.firewall import BudgetManager, make_tiktoken_counter
 
-def tiktoken_counter(value):
-    return len(enc.encode(str(value)))
-
-manager = BudgetManager(total_budget=128_000, token_counter=tiktoken_counter)
+# pip install weaver-kernel[tiktoken]
+manager = BudgetManager(
+    total_budget=128_000,
+    token_counter=make_tiktoken_counter(),              # default cl100k_base
+    # token_counter=make_tiktoken_counter("o200k_base"),  # GPT-4o / o-series
+)
 ```
 
-The default counter (`default_token_counter`) is a character-based
-`len(json.dumps(value)) // 4` approximation with no extra dependencies.
+`make_tiktoken_counter` resolves and caches the encoder when it is called, so a
+missing extra (`ImportError`) or an unknown encoding name (`FirewallError`)
+fails at construction rather than mid-budgeting. The encoding is explicit
+because models tokenize differently; name the one you budget against. `tiktoken`
+itself is imported lazily, so `import weaver_kernel` never pulls the heavyweight
+dependency. Any callable matching the `TokenCounter` protocol works too.
 
 ## Streaming
 

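Review note on the Summarization hunk in `docs/context_firewall.md` above: a minimal sketch of the two summary rules in isolation. `summarize_column` and `cap_facts` are hypothetical helpers, not the functions in `weaver_kernel.firewall.summarize`, and the choice to let the omission marker occupy the last `max_facts` slot is an assumption made for illustration.

```python
from collections import Counter
from statistics import mean
from typing import Any


def summarize_column(name: str, values: list[Any]) -> str:
    """Hypothetical helper: describe one column as a single fact string."""
    # Check bool BEFORE numbers: isinstance(True, int) is True in Python,
    # so a numeric-only check would happily average a boolean column.
    if values and all(isinstance(v, bool) for v in values):
        counts = Counter(values)
        return f"{name}: True={counts[True]}, False={counts[False]}"
    if values and all(isinstance(v, (int, float)) for v in values):
        return f"{name}: mean={mean(values):.2f}"
    return f"{name}: {len(values)} values"


def cap_facts(facts: list[str], max_facts: int) -> list[str]:
    """Keep at most max_facts entries; the last one becomes an omission marker."""
    if max_facts < 1:
        raise ValueError("max_facts must be at least 1")
    if len(facts) <= max_facts:
        return facts
    # Assumption: the marker takes the final slot, so one fewer fact is kept.
    kept = facts[: max_facts - 1]
    omitted = len(facts) - len(kept)
    return kept + [f"… ({omitted} more facts omitted; full data via handle)"]


bool_fact = summarize_column("is_active", [True, False, True])
num_fact = summarize_column("amount", [1, 2, 3])
capped = cap_facts(["a", "b", "c", "d"], max_facts=3)
```

The order of the two `isinstance` checks is the point of the first helper: swapping them would send boolean columns down the numeric branch.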
diff --git a/pyproject.toml b/pyproject.toml
@@ -137,3 +137,7 @@ ignore_missing_imports = true
 [[tool.mypy.overrides]]
 module = "weaver_contracts.*"
 ignore_missing_imports = true
+
+[[tool.mypy.overrides]]
+module = "tiktoken.*"
+ignore_missing_imports = true
diff --git a/src/weaver_kernel/__init__.py b/src/weaver_kernel/__init__.py
@@ -40,9 +40,7 @@
     from weaver_kernel import SQLiteTraceStore, JsonlTraceStore
     from weaver_kernel import SQLiteRevocationStore, InMemoryRevocationStore
     from weaver_kernel import verify_chain, ChainVerificationResult, TraceRecord
-    from weaver_kernel import (
-        TraceStoreProtocol, RevocationStoreProtocol, HandleStoreProtocol,
-    )
+    from weaver_kernel import TraceStoreProtocol, RevocationStoreProtocol, HandleStoreProtocol
 
 LLM tool-format adapters::
 
@@ -62,7 +60,7 @@
         DriverError, FirewallError, AdapterParseError,
         BudgetExhausted, BudgetConfigError,
         CapabilityNotFound, CapabilityAlreadyRegistered,
-        HandleNotFound, HandleExpired, HandleConstraintViolation,
+        HandleNotFound, HandleExpired, HandleTooLarge, HandleConstraintViolation,
         NamespaceNotFound, FederationError, ManifestError, ManifestSignatureError,
         TrustPolicyError, DiscoveryError,
     )
@@ -91,6 +89,7 @@
     HandleConstraintViolation,
     HandleExpired,
     HandleNotFound,
+    HandleTooLarge,
     ManifestError,
     ManifestSignatureError,
     NamespaceNotFound,
@@ -246,6 +245,7 @@
     "HandleConstraintViolation",
     "HandleExpired",
     "HandleNotFound",
+    "HandleTooLarge",
     "ManifestError",
     "ManifestSignatureError",
     "NamespaceNotFound",

diff --git a/src/weaver_kernel/errors.py b/src/weaver_kernel/errors.py
@@ -78,9 +78,13 @@ class BudgetExhausted(AgentKernelError):
 
 
 class BudgetConfigError(AgentKernelError):
-    """Raised when a :class:`~weaver_kernel.firewall.budget_manager.BudgetManager` is
-    constructed with invalid parameters, or asked to allocate/record/release
-    a negative amount.
+    """Raised when a budget is configured or used with invalid parameters.
+
+    Covers the :class:`~weaver_kernel.firewall.budget_manager.BudgetManager`
+    (non-positive ``total_budget``/``default_request``, or a negative
+    allocate/record/release amount) and the
+    :class:`~weaver_kernel.handles.HandleStore` byte budgets (non-positive
+    ``max_total_bytes``/``max_entry_bytes``).
 
     Used in place of bare :class:`ValueError` so callers can catch budget
     configuration mistakes without swallowing unrelated stdlib errors.
@@ -128,6 +132,19 @@ class HandleExpired(AgentKernelError):
     """Raised when a handle's TTL has elapsed."""
 
 
+class HandleTooLarge(AgentKernelError):
+    """Raised when a single handle payload exceeds the store's byte ceiling.
+
+    Fires from :meth:`~weaver_kernel.handles.HandleStore.store` when a byte
+    budget is configured and the estimated size of the data exceeds whichever
+    ceiling applies: ``max_entry_bytes``, or ``max_total_bytes`` (a single
+    entry larger than the whole budget can never fit). The data is *not*
+    retained: rejecting an over-cap payload outright (rather than silently
+    truncating it) keeps handle expansion faithful to the original dataset and
+    bounds resident raw data.
+    """
+
+
 class HandleConstraintViolation(AgentKernelError):
     """Raised when a handle expansion request violates the grant's constraints.
 

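Review note on `HandleTooLarge`: the rejection and eviction rules described in the docstring above can be modelled with a small toy store. This is a sketch under stated assumptions, not the `HandleStore` implementation: `ToyByteBudgetStore` and `TooLarge` are hypothetical names, and the caller passes each entry's size explicitly where the real store would estimate it.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Any


class TooLarge(Exception):
    """Stand-in for HandleTooLarge (hypothetical, for this sketch only)."""


class ToyByteBudgetStore:
    """Hypothetical model of a byte-bounded store with oldest-first eviction."""

    def __init__(
        self,
        max_total_bytes: int | None = None,
        max_entry_bytes: int | None = None,
    ) -> None:
        self.max_total_bytes = max_total_bytes
        self.max_entry_bytes = max_entry_bytes
        self.current_bytes = 0
        # Insertion order doubles as age: the first key is the oldest entry.
        self._entries: OrderedDict[str, tuple[Any, int]] = OrderedDict()

    def keys(self) -> list[str]:
        return list(self._entries)

    def store(self, key: str, data: Any, size: int) -> None:
        # Assumption: the caller supplies the size; the real store estimates it.
        ceilings = [
            c for c in (self.max_entry_bytes, self.max_total_bytes) if c is not None
        ]
        if ceilings and size > min(ceilings):
            # Reject outright: nothing is retained and nothing is truncated.
            raise TooLarge(f"{size} exceeds ceiling {min(ceilings)}")
        if key in self._entries:
            # Replacing a key releases the bytes held by its previous value.
            self.current_bytes -= self._entries.pop(key)[1]
        self._entries[key] = (data, size)
        self.current_bytes += size
        if self.max_total_bytes is not None:
            # size <= max_total_bytes, so the loop stops before the new entry.
            while self.current_bytes > self.max_total_bytes:
                _, (_, evicted_size) = self._entries.popitem(last=False)
                self.current_bytes -= evicted_size


store = ToyByteBudgetStore(max_total_bytes=10, max_entry_bytes=6)
for name in ("a", "b", "c"):
    store.store(name, name.upper(), size=4)  # third store evicts "a"

try:
    store.store("d", "D", size=7)  # over max_entry_bytes
    rejected = False
except TooLarge:
    rejected = True
```

Checking the ceilings before touching `_entries` is what guarantees that a rejected payload leaves `current_bytes` unchanged.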
diff --git a/src/weaver_kernel/firewall/__init__.py b/src/weaver_kernel/firewall/__init__.py
@@ -3,8 +3,10 @@
 from .budget_manager import BudgetManager
 from .budgets import Budgets
 from .redaction import redact
+from .size_estimate import estimated_size
 from .summarize import summarize
 from .token_counting import TokenCounter, default_token_counter
+from .token_counting_tiktoken import make_tiktoken_counter
 from .transform import Firewall
 
 __all__ = [
@@ -13,6 +15,8 @@
     "Firewall",
     "TokenCounter",
     "default_token_counter",
+    "estimated_size",
+    "make_tiktoken_counter",
     "redact",
     "summarize",
 ]
diff --git a/src/weaver_kernel/firewall/size_estimate.py b/src/weaver_kernel/firewall/size_estimate.py
@@ -0,0 +1,112 @@
+"""Allocation-free payload size estimation for the firewall hot path.
+
+``estimated_size`` approximates ``len(json.dumps(value, default=str))`` by
+walking the structure and summing approximate character contributions, *without*
+serialising the value to an intermediate string. The firewall runs on every
+egress path and only needs the size to compare against budget thresholds, so a
+single allocation-free pass (with an optional early exit) replaces a full JSON
+serialisation that would double peak memory on large payloads.
+
+The estimate is intentionally approximate but **deterministic**: the same input
+always yields the same number (the total is order-independent). Self-referential
+structures are handled gracefully — each container is counted at most once, so a
+cycle can never hang the walk (unlike ``json.dumps``, which raises). Counting
+each container once also means a structure that legitimately reuses the same
+container in several positions (a DAG, not a cycle) is counted once, so the
+estimate is a *lower bound* for shared-reference inputs; firewall and handle
+payloads are deserialised JSON without shared references, where this does not
+arise. It is used by
+:mod:`~weaver_kernel.firewall.transform` (raw-mode budget warning, issue #207)
+and by :class:`~weaver_kernel.handles.HandleStore` (byte-size budgeting,
+issue #211).
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+# Approximate JSON literal widths and structural overhead. json.dumps with the
+# default separators emits ``", "`` between items and ``": "`` between a key and
+# its value; object/array delimiters add two characters per container. These
+# constants keep the estimate close to the real serialised length without
+# reproducing json's exact escaping rules (bounded error is acceptable because
+# only threshold comparisons depend on the result).
+_NULL_LEN = 4  # "null"
+_TRUE_LEN = 4  # "true"
+_FALSE_LEN = 5  # "false"
+_QUOTES = 2  # surrounding "" on a string
+_CONTAINER_DELIMS = 2  # {} or []
+_ITEM_SEP = 2  # ", "
+_KV_SEP = 2  # ": "
+
+
+def estimated_size(value: Any, *, limit: int | None = None) -> int:
+    """Approximate the serialised character size of *value* without serialising.
+
+    Walks *value* iteratively (so deeply nested structures cannot exhaust the
+    recursion limit) summing approximate JSON character contributions. ``bool``
+    is handled before ``int`` because ``bool`` is an ``int`` subclass in Python.
+    Each container is visited at most once, so self-referential inputs terminate
+    instead of looping forever.
+
+    Args:
+        value: Any value the firewall might serialise. Non-JSON types fall back
+            to ``len(str(value))`` plus string quoting, mirroring the
+            ``default=str`` behaviour of the previous ``json.dumps`` measurement.
+        limit: Optional early-exit threshold. When the running total exceeds
+            *limit* the walk stops and returns a value greater than *limit* — use
+            this when only the boolean ``size > limit`` decision is needed, not
+            the exact size. (Which member tips the total past *limit* is
+            unspecified, but the returned value is always ``> limit``.)
+
+    Returns:
+        A non-negative integer approximating ``len(json.dumps(value,
+        default=str))``.
+
+    Example:
+        >>> estimated_size(None)
+        4
+        >>> estimated_size("hi")
+        4
+        >>> estimated_size([1, 2, 3])
+        9
+    """
+    total = 0
+    seen: set[int] = set()  # container ids already counted, to break cycles
+    stack: list[Any] = [value]
+    while stack:
+        cur = stack.pop()
+        if cur is None:
+            total += _NULL_LEN
+        elif cur is True:
+            total += _TRUE_LEN
+        elif cur is False:
+            total += _FALSE_LEN
+        elif isinstance(cur, str):
+            total += len(cur) + _QUOTES
+        elif isinstance(cur, bool):  # pragma: no cover - True/False caught above
+            total += _TRUE_LEN
+        elif isinstance(cur, (int, float)):
+            total += len(str(cur))
+        elif isinstance(cur, dict):
+            if id(cur) in seen:
+                continue
+            seen.add(id(cur))
+            n = len(cur)
+            total += _CONTAINER_DELIMS + max(0, n - 1) * _ITEM_SEP
+            for key, val in cur.items():
+                total += len(str(key)) + _QUOTES + _KV_SEP
+                stack.append(val)
+        elif isinstance(cur, (list, tuple)):
+            if id(cur) in seen:
+                continue
+            seen.add(id(cur))
+            n = len(cur)
+            total += _CONTAINER_DELIMS + max(0, n - 1) * _ITEM_SEP
+            stack.extend(cur)
+        else:
+            # Mirrors json.dumps(default=str): rendered as a quoted string.
+            total += len(str(cur)) + _QUOTES
+        if limit is not None and total > limit:
+            return total
+    return total
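Review note on the module docstring above: a stdlib-only check of the `json.dumps` behaviours the estimator is calibrated against. It does not call `estimated_size`; the variable names are illustrative, and the `len(s) + 2` expression simply mirrors the string branch shown in the diff.

```python
import json
from datetime import date

# Default separators are ", " between items and ": " between key and value.
list_json = json.dumps([1, 2, 3])
dict_json = json.dumps({"a": None, "b": True})

# default=str renders non-JSON types as quoted strings.
date_json = json.dumps({"d": date(2026, 6, 22)}, default=str)

# Escaping is where a len(s) + 2 estimate drifts from the serialised length:
# with the default ensure_ascii=True, "é" is emitted as the 6-character \u00e9.
plain, accented = "hi", "é"
plain_gap = len(json.dumps(plain)) - (len(plain) + 2)
accented_gap = len(json.dumps(accented)) - (len(accented) + 2)

# json.dumps refuses self-referential containers instead of looping.
cycle: list = []
cycle.append(cycle)
try:
    json.dumps(cycle)
    cycle_rejected = False
except ValueError:
    cycle_rejected = True
```

The non-zero gap for the accented string is the kind of bounded error the module comment accepts, since only threshold comparisons consume the result.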