Summary
Add caps and expiry-based cleanup to the two unbounded in-memory structures: TraceStore (a plain dict with no cap) and HMACTokenProvider._revoked / _principal_tokens (an acknowledged TODO), mirroring the eviction design that HandleStore already has.
Why this matters
Long-lived agent processes accumulate one trace per invocation and one revocation
entry per revoked token forever. That is a slow memory leak and a denial-of-service
vector in high-volume deployments, and it contradicts the polish elsewhere
(HandleStore has max_entries=10_000 plus lazy/interval eviction).
Current evidence
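trace.py: TraceStore stores traces in a plain dict with no cap or eviction.
tokens.py:196: the comment "# TODO: consider TTL-based cleanup to bound growth over long-lived instances" sits above _principal_tokens; _revoked is an unbounded set[str].
handles.py:17-32: documented eviction design (max_entries=10_000, _EVICT_INTERVAL=128, evict_expired()), which shows the repo's preferred pattern.
External context
Not required for this issue.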
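Proposed implementation
TraceStore: add max_entries (generous default, e.g. 10_000) with FIFO (oldest-first) eviction and an evicted_count so audit consumers know truncation occurred; document the interplay with the persistence plans (#126, "Pluggable persistence for TraceStore, HandleStore, and token revocation (SQLite + JSONL backends)", and #127, "Hash-chained, verifiable audit log with retention and export controls").
HMACTokenProvider: revocation entries can be dropped once the underlying token's expires_at has passed (an expired token fails verification anyway). Track expiry alongside revoked ids and sweep on an interval while holding _revocation_lock.
Reuse the HandleStore lazy + interval eviction pattern for consistency.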
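The revocation cleanup described above could look roughly like the sketch below. It is illustrative only: RevocationRegistry, revoke(), is_revoked(), sweep() and the sweep_interval parameter are hypothetical names, not the existing HMACTokenProvider API, and the every-N-operations trigger is an assumption about how the HandleStore interval pattern works. The one property it is meant to show is that an entry is dropped only after its recorded expiry has strictly passed.

```python
import threading
import time
from typing import Callable, Dict


class RevocationRegistry:
    """Hypothetical stand-in for the revocation state inside HMACTokenProvider."""

    def __init__(
        self,
        clock: Callable[[], float] = time.time,
        sweep_interval: int = 128,  # assumed default, echoing _EVICT_INTERVAL
    ) -> None:
        self._clock = clock  # injectable so tests can control time
        self._sweep_interval = sweep_interval
        self._revoked: Dict[str, float] = {}  # token id -> expires_at
        self._revocation_lock = threading.Lock()
        self._ops_since_sweep = 0

    def revoke(self, token_id: str, expires_at: float) -> None:
        with self._revocation_lock:
            # If an id is revoked twice, keep the later expiry to stay safe.
            previous = self._revoked.get(token_id, expires_at)
            self._revoked[token_id] = max(previous, expires_at)
            self._ops_since_sweep += 1
            if self._ops_since_sweep >= self._sweep_interval:
                self._sweep_locked()

    def is_revoked(self, token_id: str) -> bool:
        with self._revocation_lock:
            return token_id in self._revoked

    def sweep(self) -> int:
        """Drop entries whose token has expired; return how many were dropped."""
        with self._revocation_lock:
            return self._sweep_locked()

    def _sweep_locked(self) -> int:
        # Caller must hold _revocation_lock.
        now = self._clock()
        # Strict comparison: an entry at exactly its expiry instant is kept.
        expired = [tid for tid, exp in self._revoked.items() if exp < now]
        for tid in expired:
            del self._revoked[tid]
        self._ops_since_sweep = 0
        return len(expired)
```

Storing expires_at as the dict value keeps the sweep a single pass with no second lookup structure. Whether the real provider should sweep on revoke, on verify, or both is left open here.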
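AI-agent execution notes
trace.py, tokens.py (revocation paths, _revocation_lock), handles.py (pattern to mirror), tests/test_trace.py, tests/test_tokens.py.
Safety edge case: never drop a revocation entry before the token expires; doing so would un-revoke a live token. Test this explicitly.
Determinism: eviction order must be deterministic.
Audit edge case: evicting traces loses audit data, so the cap must be configurable and the eviction observable.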
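One way to satisfy the determinism and audit notes above is an insertion-ordered store that evicts oldest-first, counts evictions, and warns the first time it truncates. The sketch below is a hypothetical BoundedTraceStore, not the current TraceStore interface; the add()/get() methods and the log message are assumptions, while max_entries and evicted_count follow the names used in this proposal.

```python
import logging
from collections import OrderedDict
from typing import Any, Optional

logger = logging.getLogger(__name__)


class BoundedTraceStore:
    """Hypothetical sketch; the real TraceStore interface may differ."""

    def __init__(self, max_entries: int = 10_000) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries  # configurable cap
        self.evicted_count = 0  # lets audit consumers detect truncation
        self._traces: "OrderedDict[str, Any]" = OrderedDict()

    def add(self, trace_id: str, trace: Any) -> None:
        if trace_id in self._traces:
            # Overwriting keeps the entry's original position (its age).
            self._traces[trace_id] = trace
            return
        self._traces[trace_id] = trace
        while len(self._traces) > self.max_entries:
            # popitem(last=False) removes the oldest insertion, so the
            # eviction order depends only on insertion order.
            evicted_id, _ = self._traces.popitem(last=False)
            if self.evicted_count == 0:
                logger.warning(
                    "trace cap of %d reached; evicting oldest traces "
                    "(first evicted id: %s)",
                    self.max_entries,
                    evicted_id,
                )
            self.evicted_count += 1

    def get(self, trace_id: str) -> Optional[Any]:
        return self._traces.get(trace_id)

    def __len__(self) -> int:
        return len(self._traces)
```

For example, with max_entries=2, adding five traces leaves the two newest in the store and sets evicted_count to 3.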
Acceptance criteria
TraceStore never exceeds its configured cap; eviction is oldest-first and counted.
A revoked, unexpired token still fails verification after any sweep.
Revocation entries for expired tokens are eventually removed.
Memory growth under a loop of grant/revoke is bounded (regression test).
Test plan
Unit tests with injected clocks for sweep timing and the revoked-but-unexpired edge
case; cap tests for TraceStore. Run make ci.
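The injected-clock tests mentioned above might be shaped like the following. This is a sketch under assumptions: FakeClock and TinyRevocations are hypothetical stand-ins written only so the tests run on their own, and the repo's real tests would exercise HMACTokenProvider instead. unittest is used here because it is in the standard library; the project's actual test runner may differ.

```python
import unittest


class FakeClock:
    """Hypothetical controllable clock; call it to read the current time."""

    def __init__(self, now: float = 0.0) -> None:
        self.now = now

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class TinyRevocations:
    """Hypothetical minimal stand-in for the provider's revocation state."""

    def __init__(self, clock) -> None:
        self._clock = clock
        self._revoked = {}  # token id -> expires_at

    def revoke(self, token_id: str, expires_at: float) -> None:
        self._revoked[token_id] = expires_at

    def sweep(self) -> None:
        now = self._clock()
        # Keep every entry whose token has not yet expired.
        self._revoked = {t: e for t, e in self._revoked.items() if e >= now}

    def is_revoked(self, token_id: str) -> bool:
        return token_id in self._revoked

    def __len__(self) -> int:
        return len(self._revoked)


class RevocationSweepTests(unittest.TestCase):
    def test_revoked_unexpired_token_survives_sweep(self):
        clock = FakeClock(1_000.0)
        revs = TinyRevocations(clock)
        revs.revoke("tok", expires_at=1_060.0)
        clock.advance(59.0)  # still one second before expiry
        revs.sweep()
        self.assertTrue(revs.is_revoked("tok"))

    def test_expired_entry_is_removed(self):
        clock = FakeClock(1_000.0)
        revs = TinyRevocations(clock)
        revs.revoke("tok", expires_at=1_060.0)
        clock.advance(61.0)  # one second past expiry
        revs.sweep()
        self.assertFalse(revs.is_revoked("tok"))

    def test_grant_revoke_loop_is_bounded(self):
        clock = FakeClock()
        revs = TinyRevocations(clock)
        ttl = 10.0
        for i in range(1_000):
            revs.revoke(f"tok-{i}", expires_at=clock.now + ttl)
            clock.advance(1.0)
            revs.sweep()
        # Only entries still inside the TTL window may remain.
        self.assertLessEqual(len(revs), 11)
```

Saved as a test module, this can be run with python -m unittest. The last test is the regression shape for the bounded-growth criterion: the entry count tracks the TTL window rather than the number of revocations.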
Documentation plan
Document retention defaults in docs/security.md and docs/architecture.md;
CHANGELOG Added/Fixed.
Migration and compatibility notes
Defaults chosen high enough that typical sessions are unaffected; deployments needing
infinite retention should adopt persistence (#126). Not expected to require migration.
Risks and tradeoffs
Trace eviction trades audit completeness for boundedness — make the cap loud
(warning on first eviction). Revocation sweep correctness depends on clock handling;
reuse the injectable-clock pattern.
Suggested labels
security, reliability, performance