Skip to content

fix(tracing): Fix memory leak in SGP tracing processors#302

Merged
smoreinis merged 2 commits intomainfrom
stas/traces-cleanup
Apr 3, 2026
Merged

fix(tracing): Fix memory leak in SGP tracing processors#302
smoreinis merged 2 commits intomainfrom
stas/traces-cleanup

Conversation

@smoreinis
Copy link
Copy Markdown
Contributor

@smoreinis smoreinis commented Apr 2, 2026

Summary

  • Fix memory leak in SGPSyncTracingProcessor and SGPAsyncTracingProcessor where self._spans dict grew linearly with every request and never shrank
  • on_span_end() used dict.get() (read-only) instead of dict.pop() (read-and-remove), so completed spans were never cleaned up
  • shutdown() was the only cleanup path but is never called anywhere in the codebase
  • After 100k requests this would accumulate 400k orphaned span objects in memory

Changes

  • Changed self._spans.get(span.id)self._spans.pop(span.id, None) in both SGPSyncTracingProcessor.on_span_end() and SGPAsyncTracingProcessor.on_span_end()
  • Added 6 new tests proving the leak existed and is now fixed (3 sync, 3 async)

Test plan

  • New tests fail before fix (100 spans leaked after 100 complete lifecycles)
  • New tests pass after fix (0 spans after 100 complete lifecycles)
  • Full test suite passes (591 passed, 959 skipped)
  • Lint clean

Greptile Summary

This PR fixes a genuine memory leak in SGPSyncTracingProcessor and SGPAsyncTracingProcessor by changing self._spans.get(span.id) to self._spans.pop(span.id, None) in each processor's on_span_end() method. The fix is correct, minimal, and well-targeted — completed spans are now removed from self._spans as soon as they finish rather than accumulating indefinitely.

Key changes:

  • sgp_tracing_processor.py: .get().pop() in both SGPSyncTracingProcessor.on_span_end and SGPAsyncTracingProcessor.on_span_end
  • New test file with 6 tests (3 sync / 3 async) covering the full span lifecycle, mid-lifecycle assertion, and the unknown-span no-op path
  • Three new __init__.py files to make test directories proper packages

Notes:

  • asyncio_mode = "auto" is set in pyproject.toml, so the async test methods run correctly without needing per-test @pytest.mark.asyncio markers
  • Both _make_processor helpers now correctly use with patch(...) context managers (not decorator-style), and test bodies re-patch create_span — the pattern is consistent between sync and async classes

Confidence Score: 5/5

Safe to merge — the fix is a minimal, correct one-method change in each processor with no API surface impact.

The core change is a single well-understood substitution (.get() → .pop()) that directly and correctly addresses the described leak. asyncio_mode = auto is confirmed in pyproject.toml so async tests are actually exercised. All remaining observations are pre-existing or already flagged in prior review threads, leaving no new P0/P1 findings in this revision.

No files require special attention.

Important Files Changed

Filename Overview
src/agentex/lib/core/tracing/processors/sgp_tracing_processor.py Core bug fix: .get() → .pop() in both on_span_end methods; change is correct and complete. Also removes unnecessary multi-line formatting for warning log strings.
tests/lib/core/tracing/processors/test_sgp_tracing_processor.py New test file with 6 tests covering sync and async memory-leak scenarios; asyncio_mode = auto ensures async tests run, and patching patterns are correct in both sync and async helpers.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant P as SGP[Sync|Async]TracingProcessor
    participant S as self._spans (dict)
    participant SGP as SGPClient

    C->>P: on_span_start(span)
    P->>SGP: create_span(...)
    SGP-->>P: sgp_span
    P->>SGP: flush / upsert_batch
    P->>S: self._spans[span.id] = sgp_span

    Note over S: span lives in dict while active

    C->>P: on_span_end(span)
    P->>S: pop(span.id, None)  ← FIX (was .get())
    S-->>P: sgp_span (removed from dict)
    P->>SGP: flush / upsert_batch (with end_time)

    Note over S: dict entry gone — no leak
Loading

Reviews (2): Last reviewed commit: "fix(tests): Use context manager patches ..." | Re-trigger Greptile

SGPSyncTracingProcessor and SGPAsyncTracingProcessor accumulated spans
in self._spans dict on every request but never removed them, since
on_span_end() used dict.get() (read-only) instead of dict.pop()
(read-and-remove). The only cleanup was in shutdown() which is never
called. After this fix, spans are removed from the dict when they
complete, preventing unbounded memory growth.
@smoreinis smoreinis force-pushed the stas/traces-cleanup branch from 0604794 to 97bcaf4 Compare April 2, 2026 21:20
@smoreinis smoreinis marked this pull request as ready for review April 2, 2026 21:29
@patch decorators on _make_processor expired before test bodies ran,
so on_span_start/on_span_end hit the real create_span and flush.
Refactored to @staticmethod with 'with patch(...)' context managers
matching the async test class pattern.
@smoreinis smoreinis merged commit f43dac4 into main Apr 3, 2026
32 checks passed
@smoreinis smoreinis deleted the stas/traces-cleanup branch April 3, 2026 00:53
@stainless-app stainless-app bot mentioned this pull request Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants