feat(instrument): Multi-tenancy hardening — cache + queue + OTel correlation + logging #123
Closed
mmercuri wants to merge 1 commit into
Closes 4 adjacent CLAUDE.md gaps that PR #118 (per-event org_id propagation) did not address. Stacks on feat/instrument-multitenancy-org-id-propagation.

Gap 1 — Cache audit + sweep
Adds tests/instrument/adapters/_base/test_cache_tenant_isolation.py (14 tests). Proves that every BaseAdapter in-memory cache (circuit breaker counters, _trace_events buffer, sink registry, lock, org_id binding) is per-instance and therefore inherits the single-tenant binding established in PR #118. Concurrent stress tests with 2 and 3 tenants and 8 emit threads show no cross-tenant pollution under contention. Two guard tests forbid future introduction of class-level OR module-level mutable containers that would silently merge tenants.

Gap 2 — Per-tenant stream isolation in IngestionPipelineSink
Replaces the single global buffer with dict[org_id, list[event]]. Each tenant gets an independent buffer with its own max_per_tenant_buffer_size cap (default 1000) and FIFO eviction scoped to THAT tenant, so a noisy tenant can never displace a quieter tenant's events. Adds a buffer_size_per_tenant() snapshot for the sink_per_tenant_buffer_size{org_id} gauge and a dropped_per_tenant counter for the sink_per_tenant_dropped{org_id} gauge. flush() now issues one ingest() call per tenant under that tenant's org_id. tests/instrument/adapters/_base/test_sinks_per_tenant.py (10 tests) covers per-tenant partitioning, isolated FIFO eviction under burst contention, defensive snapshot copies, and per-tenant flush call shape.

Gap 3 — OTel ↔ SDK org_id correlation
Adds _set_current_span_org_id(), which stamps layerlens.org_id on the active OTel span on every emit_event / emit_dict_event. It is a no-op when OTel is absent, when the active span is the no-op INVALID_SPAN, or when is_recording() returns False, and it never faults the adapter hot path. tests/instrument/adapters/_base/test_otel_correlation.py (8 tests) covers dict + typed payload paths, missing OTel, non-recording spans, set_attribute failures, and per-span tenant attribution.
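The partitioning and eviction rules from Gap 2 can be illustrated with a small, self-contained sketch. `PerTenantBuffer` is a hypothetical class, not the real `IngestionPipelineSink`. The `ingest(org_id, events)` callback shape is also an assumption. The sketch only mirrors the behaviour described above: one bounded FIFO per `org_id`, a per-tenant drop counter, a defensive size snapshot, and one `ingest` call per tenant on flush.

```python
import threading
from collections import defaultdict, deque
from typing import Any, Callable

Event = dict[str, Any]


class PerTenantBuffer:
    """Hypothetical sketch of per-tenant buffering; not the real IngestionPipelineSink."""

    def __init__(self, max_per_tenant_buffer_size: int = 1000) -> None:
        if max_per_tenant_buffer_size < 1:
            raise ValueError("max_per_tenant_buffer_size must be >= 1")
        self._cap = max_per_tenant_buffer_size
        # One bounded FIFO per org_id; a full deque evicts only that tenant's oldest event.
        self._buffers: dict[str, deque[Event]] = {}
        self.dropped_per_tenant: defaultdict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def append(self, org_id: str, event: Event) -> None:
        with self._lock:
            buf = self._buffers.get(org_id)
            if buf is None:
                buf = self._buffers[org_id] = deque(maxlen=self._cap)
            if len(buf) == self._cap:
                # The append below evicts this tenant's oldest event, never another tenant's.
                self.dropped_per_tenant[org_id] += 1
            buf.append(event)

    def buffer_size_per_tenant(self) -> dict[str, int]:
        # A fresh dict, so a metrics gauge reading it cannot mutate internal state.
        with self._lock:
            return {org_id: len(buf) for org_id, buf in self._buffers.items()}

    def flush(self, ingest: Callable[[str, list[Event]], None]) -> None:
        # Swap the buffers out under the lock, then call ingest() once per tenant
        # outside it, so every batch carries exactly one org_id.
        with self._lock:
            pending, self._buffers = self._buffers, {}
        for org_id, events in pending.items():
            if events:
                ingest(org_id, list(events))


buffer = PerTenantBuffer(max_per_tenant_buffer_size=2)
for i in range(5):
    buffer.append("noisy", {"i": i})
buffer.append("quiet", {"i": 0})
print(buffer.buffer_size_per_tenant())  # {'noisy': 2, 'quiet': 1}
print(dict(buffer.dropped_per_tenant))  # {'noisy': 3}
```

The sketch swaps the buffers out before calling `ingest()`. As a result, an exception from `ingest()` drops the pending batches. The PR does not describe how the real sink handles ingest failures.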
Gap 4 — Tenant-aware logging propagation
New module src/layerlens/instrument/adapters/_base/logging.py with TenantContextLogAdapter and a get_tenant_logger() factory. Every log record carries org_id in its extra fields AND in an '[org_id=...]' message prefix. BaseAdapter exposes self.tlogger bound to its org_id; subclasses can use self.tlogger as a drop-in replacement for logging.getLogger(__name__). Caller-supplied extra={'org_id': ...} cannot impersonate another tenant. tests/instrument/adapters/_base/test_tenant_logger.py (12 tests) covers fail-fast on empty/whitespace/non-string org_id, per-instance binding under shared logger names, and end-to-end wiring through BaseAdapter.

44 new tests, all green (61 tests in tests/instrument/adapters/_base total). mypy --strict and ruff check are clean for all changed source files.
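The following is a minimal sketch of a tenant-bound logger built on the standard library's `logging.LoggerAdapter`, overriding its `process()` hook. `TenantLogAdapter` and `get_tenant_logger_sketch()` are hypothetical names. They are not the PR's `TenantContextLogAdapter` / `get_tenant_logger()`. The sketch reproduces only the three behaviours described above:

- fail-fast validation of `org_id`
- the `[org_id=...]` prefix plus an `org_id` extra field
- a bound `org_id` that caller-supplied `extra` cannot override

```python
import logging
from typing import Any, MutableMapping


class TenantLogAdapter(logging.LoggerAdapter):
    """Hypothetical sketch: a LoggerAdapter bound to exactly one org_id."""

    def __init__(self, logger: logging.Logger, org_id: str) -> None:
        # Fail fast: a missing or blank tenant must never reach a log record.
        if not isinstance(org_id, str) or not org_id.strip():
            raise ValueError("org_id must be a non-empty, non-whitespace string")
        super().__init__(logger, {"org_id": org_id})
        self.org_id = org_id

    def process(
        self, msg: Any, kwargs: MutableMapping[str, Any]
    ) -> tuple[Any, MutableMapping[str, Any]]:
        # Merge the caller's extra first, then apply the bound org_id so it always wins.
        caller_extra = kwargs.get("extra") or {}
        kwargs["extra"] = {**caller_extra, "org_id": self.org_id}
        return f"[org_id={self.org_id}] {msg}", kwargs


def get_tenant_logger_sketch(name: str, org_id: str) -> TenantLogAdapter:
    # Several adapters may share one underlying logger name; each keeps its own org_id.
    return TenantLogAdapter(logging.getLogger(name), org_id)
```

Here is an example. `get_tenant_logger_sketch(__name__, "org-a").info("flushed %d events", 3)` emits a record whose message is `[org_id=org-a] flushed 3 events` and whose `org_id` attribute is `org-a`. The result is the same even if the caller passes `extra={"org_id": "org-b"}`.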
Stacks on PR #118 (per-event `org_id` propagation). Closes 4 adjacent CLAUDE.md gaps that #118 did not address.

Gaps closed
Gap 1 — Cache audit + sweep
Adds `tests/instrument/adapters/_base/test_cache_tenant_isolation.py` (14 tests). Proves every `BaseAdapter` in-memory cache is per-instance and inherits the single-tenant binding from #118:

- `_circuit_open` flag / `_circuit_opened_at`
- `_trace_events` buffer
- `_event_sinks` registry
- `_lock`
- `org_id` binding (immutability check)

Concurrent stress: 2 tenants at 500 events each, 3 tenants at 200 events each, and an 8-thread × 100-event single-tenant burst. All assert zero cross-tenant pollution under contention. Two structural guard tests forbid future class-level OR module-level mutable containers that would silently merge tenants.
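The two structural guard tests can be approximated with plain introspection. This is a minimal sketch under stated assumptions. `class_level_mutables`, `module_level_mutables`, `PerInstanceAdapter`, and `LeakyAdapter` are hypothetical names used only for illustration. The real tests in `test_cache_tenant_isolation.py` may inspect `BaseAdapter` differently, for example with a different set of container types.

```python
import threading
import types
from typing import Any

# Container types that, if stored at class or module scope, are shared by
# every adapter instance and would therefore merge tenants.
MUTABLE_TYPES = (list, dict, set, bytearray)


def _is_dunder(name: str) -> bool:
    return name.startswith("__") and name.endswith("__")


def class_level_mutables(cls: type) -> list[str]:
    """Names of mutable containers defined on cls or any base except object."""
    offenders: list[str] = []
    for klass in cls.__mro__:
        if klass is object:
            continue
        for name, value in vars(klass).items():
            # Skip interpreter-managed attributes such as __annotations__.
            if not _is_dunder(name) and isinstance(value, MUTABLE_TYPES):
                offenders.append(f"{klass.__name__}.{name}")
    return offenders


def module_level_mutables(module: types.ModuleType) -> list[str]:
    """Names of mutable containers bound at module scope (dunders ignored)."""
    return [
        name
        for name, value in vars(module).items()
        if not _is_dunder(name) and isinstance(value, MUTABLE_TYPES)
    ]


class PerInstanceAdapter:
    """Toy adapter: every cache is created in __init__, one set per instance."""

    def __init__(self, org_id: str) -> None:
        self.org_id = org_id
        self._trace_events: list[dict[str, Any]] = []
        self._event_sinks: list[Any] = []
        self._lock = threading.Lock()
        self._circuit_open = False


class LeakyAdapter:
    # Deliberate bug: this single list is shared by every instance (and tenant).
    _trace_events: list[dict[str, Any]] = []


assert class_level_mutables(PerInstanceAdapter) == []
assert class_level_mutables(LeakyAdapter) == ["LeakyAdapter._trace_events"]
```

Treating any class-level or module-level `list`/`dict`/`set` as a failure is deliberately blunt. It accepts a few false positives, such as a constant lookup table, in exchange for catching every shared container of those types.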
Gap 2 — Per-tenant stream isolation in `IngestionPipelineSink`
Replaces the single global buffer with `dict[org_id, list[event]]`. Each tenant gets:

- its own `max_per_tenant_buffer_size` cap (default `1000`)
- a `dropped_per_tenant` counter for the `sink_per_tenant_dropped{org_id}` gauge
- a `buffer_size_per_tenant()` snapshot for the `sink_per_tenant_buffer_size{org_id}` gauge

`flush()` issues one `ingest()` call per tenant under THAT tenant's `org_id`, never a mixed batch.

`tests/instrument/adapters/_base/test_sinks_per_tenant.py` (10 tests) covers per-tenant partitioning, isolated FIFO eviction under burst contention, defensive snapshot copies, and per-tenant flush call shape.

Gap 3 — OTel ↔ SDK `org_id` correlation
Adds `_set_current_span_org_id()`, which stamps `layerlens.org_id` on the active OTel span on every `emit_event` / `emit_dict_event`. No-op when:

- OTel is absent
- the active span is the no-op `INVALID_SPAN`
- `is_recording()` returns False

The hot path never faults on observability failures.
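The no-op rules above map onto the public OpenTelemetry Python API: `trace.get_current_span()`, `trace.INVALID_SPAN`, `Span.is_recording()`, and `Span.set_attribute()`. The following is a hedged sketch of such a helper, not the PR's `_set_current_span_org_id()`. The function name, its `get_span` parameter (added only so the example can be exercised with a fake span), and the boolean return value are all assumptions.

```python
from typing import Any, Callable, Optional

try:  # OpenTelemetry is optional; without it the helper is a no-op.
    from opentelemetry import trace as otel_trace
except ImportError:
    otel_trace = None

ORG_ID_ATTRIBUTE = "layerlens.org_id"


def _current_span() -> Optional[Any]:
    # None signals "OTel is not installed".
    return otel_trace.get_current_span() if otel_trace is not None else None


def set_current_span_org_id(
    org_id: str, get_span: Callable[[], Optional[Any]] = _current_span
) -> bool:
    """Stamp org_id on the active span. Returns True only if the attribute was set."""
    try:
        span = get_span()
        if span is None:
            return False  # OTel absent
        if otel_trace is not None and span is otel_trace.INVALID_SPAN:
            return False  # no active span in the current context
        if not span.is_recording():
            return False  # sampled out or already ended
        span.set_attribute(ORG_ID_ATTRIBUTE, org_id)
        return True
    except Exception:
        # Observability must never raise into the emit hot path.
        return False


class FakeSpan:
    """Test double exposing the two Span methods the helper uses."""

    def __init__(self, recording: bool = True, fail: bool = False) -> None:
        self.recording = recording
        self.fail = fail
        self.attributes: dict[str, Any] = {}

    def is_recording(self) -> bool:
        return self.recording

    def set_attribute(self, key: str, value: Any) -> None:
        if self.fail:
            raise RuntimeError("simulated set_attribute failure")
        self.attributes[key] = value


span = FakeSpan()
assert set_current_span_org_id("org-a", get_span=lambda: span) is True
assert span.attributes == {"layerlens.org_id": "org-a"}
```

The broad `except Exception` is intentional. A misbehaving exporter or span implementation degrades to a missing attribute rather than a failed `emit_event`.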
`tests/instrument/adapters/_base/test_otel_correlation.py` (8 tests) covers dict + typed payload paths, missing OTel, non-recording spans, `set_attribute` failures, and per-span tenant attribution.

Gap 4 — Tenant-aware logging propagation
New module `src/layerlens/instrument/adapters/_base/logging.py` with `TenantContextLogAdapter` and `get_tenant_logger()`. Every log record carries `org_id` in `extra` AND in a `[org_id=...]` message prefix. `BaseAdapter` exposes `self.tlogger` bound to its `org_id`; subclasses can use it as a drop-in replacement for `logging.getLogger(__name__)`.

Caller-supplied `extra={'org_id': ...}` cannot impersonate another tenant; the adapter's binding always wins.

`tests/instrument/adapters/_base/test_tenant_logger.py` (12 tests) covers fail-fast on empty/whitespace/non-string `org_id`, per-instance binding under shared logger names, and end-to-end wiring through `BaseAdapter`.

Test plan
- `pytest tests/instrument/adapters/_base/test_{cache_tenant_isolation,otel_correlation,sinks_per_tenant,tenant_logger}.py -x`: 44 passed
- `pytest tests/instrument/adapters/_base/`: 61 passed (44 new + 17 from #118, "fix(instrument): Propagate org_id through all event emissions (multi-tenancy CLAUDE.md fix)"; no regression)
- `mypy --strict src/.../adapters/_base/{adapter,sinks,logging}.py`: Success: no issues
- `ruff check src/.../adapters/_base/ tests/.../adapters/_base/`: All checks passed

Files modified per gap
- Gap 1: `tests/instrument/adapters/_base/test_cache_tenant_isolation.py` (new), comments in `src/layerlens/instrument/adapters/_base/adapter.py`
- Gap 2: `src/layerlens/instrument/adapters/_base/sinks.py` (per-tenant buffer + cap + metrics), `tests/instrument/adapters/_base/test_sinks_per_tenant.py` (new)
- Gap 3: `src/layerlens/instrument/adapters/_base/adapter.py` (`_set_current_span_org_id` helper + emit hooks), `tests/instrument/adapters/_base/test_otel_correlation.py` (new)
- Gap 4: `src/layerlens/instrument/adapters/_base/logging.py` (new), `src/layerlens/instrument/adapters/_base/adapter.py` (`tlogger` property + wiring), `tests/instrument/adapters/_base/test_tenant_logger.py` (new)