fix: capture claude agent sdk session ids by sipercai · Pull Request #222 · alibaba/loongsuite-python

sipercai · 2026-06-18T01:52:20Z

Description

This PR captures Claude Agent SDK session IDs on agent, LLM, and tool spans so traces from resumed or client-managed SDK sessions can be correlated by gen_ai.session.id.

The change propagates session IDs from SDK init/system messages, stream events, ClaudeSDKClient.query(..., session_id=...), standalone query(..., options.resume=...), and result-message fallbacks. When an upstream Entry span has already propagated gen_ai.session.id through OpenTelemetry Baggage, that Entry session takes precedence over the Claude SDK's internal session, so downstream spans keep the request-level LoongSuite identity. The change also preserves the caller's active OpenTelemetry context instead of replacing it with an empty context, matching the Robin fix for broken parent-child trace linkage. Finally, it keeps tool state local to each stream, so concurrent streams that reuse tool IDs do not cross-contaminate session or trace state.
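The precedence rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in patch.py: `resolve_session_id`, the plain dict standing in for OpenTelemetry Baggage entries, and the `Options` class are all hypothetical, and the order in which SDK-side candidates are tried is an assumption.

```python
from typing import Iterable, Mapping, Optional

SESSION_KEY = "gen_ai.session.id"


def resolve_session_id(
    entry_baggage: Mapping[str, str],
    sdk_candidates: Iterable[Optional[str]],
) -> Optional[str]:
    """Pick the session ID to record on agent, LLM, and tool spans.

    entry_baggage is a stand-in for the baggage entries visible in the
    caller's context; sdk_candidates are IDs observed from SDK sources
    (init message, stream event, query argument, resume option, result
    message), listed in whatever order the caller wants them tried.
    """
    entry_session = entry_baggage.get(SESSION_KEY)
    if entry_session:
        # An upstream Entry span already set the request-level identity.
        return entry_session
    for candidate in sdk_candidates:
        if candidate:
            return candidate
    return None


class Options:
    """Hypothetical stand-in for SDK options carrying a resume value."""

    resume = "sdk-resumed-session"


resume_id = getattr(Options(), "resume", None)
with_entry = resolve_session_id({SESSION_KEY: "entry-session"}, [resume_id])
without_entry = resolve_session_id({}, [None, resume_id])
```

Here `with_entry` comes from the Entry baggage, while `without_entry` falls back to the first non-empty SDK-side value.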

Fixes # (N/A)

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

`python $PIPELINE_SKILL_DIR/scripts/check_loongsuite_pr_readiness.py --repo .`
`tox -e precommit`
`OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_session_capture.py -q`
`OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests -q -m "not requires_cli"`
`ANTHROPIC_BASE_URL=https://dashscope.aliyuncs.com/apps/anthropic ANTHROPIC_API_KEY=<configured> OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_attributes.py::test_span_attributes_semantic_conventions -q -s`
`ANTHROPIC_BASE_URL=https://dashscope.aliyuncs.com/apps/anthropic ANTHROPIC_API_KEY=<configured> OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_attributes.py::test_span_attributes_no_sensitive_data -q`
Bounded live Claude Agent SDK smoke with a real provider-compatible endpoint and one Read tool call.
Bounded live Claude Agent SDK concurrency smoke with two simultaneous queries.
Weaver JSON sample live-check with `weaver registry live-check -r <loongsuite-semantic-conventions>/model --advice-profile loongsuite-genai --input-format json`.

Validation Evidence

Spec and Scope

Linked issue/spec: No GitHub issue; customer-reported Claude Agent SDK session-capture gap.
Approved spec/comment: Direct bug-fix request and PR submission approval in the implementation thread.
Changed surface: loongsuite-instrumentation-claude-agent-sdk runtime patch, focused session tests, and plugin changelog.

Local Checks

| Check | Command | Result | Notes |
| --- | --- | --- | --- |
| Static readiness | `python $PIPELINE_SKILL_DIR/scripts/check_loongsuite_pr_readiness.py --repo .` | pass | LoongSuite pipeline static readiness checker passed. |
| Review matrix planning | `python $PIPELINE_SKILL_DIR/scripts/plan_review_matrix.py --repo . --format markdown` | pass | Matrix identified GenAI agent/session telemetry coverage requirements. |
| Precommit | `tox -e precommit` | pass | Repository formatting and lint hooks passed. |
| Focused session tests | `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_session_capture.py -q` | pass | 13 session propagation tests passed. |
| Parent context preservation | `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_session_capture.py::test_wrap_query_preserves_active_parent_context -q` | pass | Regression coverage for the Robin empty-context broken-link fix. |
| Plugin test suite | `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests -q -m "not requires_cli"` | pass | 69 passed, 9 deselected. |
| Live CLI smoke | `ANTHROPIC_BASE_URL=https://dashscope.aliyuncs.com/apps/anthropic ANTHROPIC_API_KEY=<configured> OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_attributes.py::test_span_attributes_semantic_conventions -q -s` | pass | 1 live provider-compatible CLI test passed. |
| No-content live CLI smoke | `ANTHROPIC_BASE_URL=https://dashscope.aliyuncs.com/apps/anthropic ANTHROPIC_API_KEY=<configured> OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_attributes.py::test_span_attributes_no_sensitive_data -q` | pass | 1 live provider-compatible NO_CONTENT test passed. |
| Focused tox env | `tox -c tox-loongsuite.ini -e py312-test-loongsuite-instrumentation-claude-agent-sdk-latest -- -q -m "not requires_cli"` | blocked | Attempted twice; both runs stalled in upstream OpenTelemetry git dependency installation before test execution. Focused pytest and live CLI smoke passed. |
| Claude review | `codex-claude-review-loop` | blocked | Two earlier review rounds completed; first P2 test-coverage finding was fixed, second round had zero remaining findings. After the final Entry-baggage priority change, a third review attempt produced only session heartbeat output and no review content, so this PR is opened as draft with local test evidence. |
| Privacy scan | Secret/local-path scan over changed Claude Agent SDK tests | pass | No credentials or private paths introduced; only existing mock env placeholder remained outside this PR diff. |
| Diff whitespace | `git diff --check` | pass | No whitespace errors. |

Real E2E Matrix

| Scenario | Status | Command or Demo | Evidence |
| --- | --- | --- | --- |
| non-streaming | pass | Live `test_span_attributes_semantic_conventions` consumed a bounded SDK query to completion. | AGENT/LLM spans were produced with semantic attributes. |
| streaming | pass | Focused StreamEvent/session tests plus local telemetry smoke. | Stream-event session IDs populate agent and LLM spans before result finalization. |
| concurrency | pass | Bounded live concurrency smoke with two simultaneous SDK queries. | 2 agent roots, 2 trace IDs, and 2 session IDs; no cross-trace contamination. |
| agent/tool/ReAct | pass | Bounded live SDK smoke using one `Read` tool call. | Produced AGENT, LLM, and TOOL spans with one session. |
| tool-heavy | pass | Focused mock stream and local telemetry smoke with repeated/multiple tool calls. | Tool spans inherit session ID and per-stream tool state remains isolated. |
| error path | pass | `python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests -q -m "not requires_cli"` | Existing edge-case/error tests passed in the plugin suite. |
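For readers who want to repeat the concurrency check on their own exported spans, an isolation check along these lines may help. It is only a sketch: the span dict shape (`name`, `trace_id`, `attributes`) and the `find_session_leaks` helper are hypothetical and not the trace validator used for this PR.

```python
from collections import defaultdict

SESSION_KEY = "gen_ai.session.id"


def find_session_leaks(spans):
    """Return problems found in a list of exported span dicts.

    A span is flagged if it has no session ID; a trace is flagged if its
    spans carry more than one distinct session ID.
    """
    sessions_by_trace = defaultdict(set)
    missing = []
    for span in spans:
        session_id = span.get("attributes", {}).get(SESSION_KEY)
        if session_id is None:
            missing.append(f"{span['name']}: missing {SESSION_KEY}")
        else:
            sessions_by_trace[span["trace_id"]].add(session_id)
    mixed = [
        f"trace {trace_id}: mixed sessions {sorted(ids)}"
        for trace_id, ids in sorted(sessions_by_trace.items())
        if len(ids) > 1
    ]
    return missing + mixed


# Two concurrent queries, each with its own trace and session.
clean_spans = [
    {"name": "agent-a", "trace_id": "t1", "attributes": {SESSION_KEY: "s1"}},
    {"name": "tool-a", "trace_id": "t1", "attributes": {SESSION_KEY: "s1"}},
    {"name": "agent-b", "trace_id": "t2", "attributes": {SESSION_KEY: "s2"}},
    {"name": "tool-b", "trace_id": "t2", "attributes": {SESSION_KEY: "s2"}},
]
problems = find_session_leaks(clean_spans)
```

An empty `problems` list corresponds to the "no cross-trace contamination" evidence in the concurrency row.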

Telemetry and Weaver

| Check | Status | Command or Artifact | Notes |
| --- | --- | --- | --- |
| Span tree / span kinds | pass | Local telemetry smoke plus trace validator | Verified AGENT, LLM, and TOOL span kinds with required `gen_ai.session.id`. |
| Entry baggage identity priority | pass | Focused session propagation test | When Entry baggage has `gen_ai.session.id`, AGENT, LLM, and TOOL spans use it instead of Claude's internal session id. |
| Parent-child trace linkage | pass | Focused parent context preservation regression test | Agent spans keep an active caller span as parent when one exists, instead of starting after an empty context reset. |
| Content capture modes | pass | SPAN_ONLY focused tests and live NO_CONTENT test | SPAN_ONLY captures expected span content; NO_CONTENT live test did not leak sensitive prompt text. |
| Concurrency isolation | pass | Focused parallel stream tests and bounded live concurrency smoke | Same tool ID reused across streams stayed trace/session isolated. |
| Weaver live-check | pass | `weaver registry live-check -r <loongsuite-semantic-conventions>/model --input-source <generated JSON sample> --input-format json --advice-profile loongsuite-genai --skip-policies true` | Generated sample contained AGENT, LLM, and TOOL spans; Weaver returned zero violations. |

CI

GitHub checks: Not run yet; this branch has not been pushed before this PR.
Known unrelated failures: None known.
Follow-up needed: Watch CI after PR creation. The local focused tox env was blocked by upstream dependency installation and should be retried if CI reproduces a dependency-resolution issue. Re-run Claude review when the review CLI produces normal output again.

Does This PR Require a Core Repo Change?

Yes. - Link to PR:
No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

Followed the style guidelines of this project
Changelogs have been updated
Unit tests have been added
Documentation has been updated

ralf0131

Summary

This PR adds session ID capture on agent, LLM, and tool spans for the Claude Agent SDK instrumentation. The core improvement is replacing the module-level global _client_managed_runs dict with a per-invocation local dict, eliminating cross-invocation state leakage. Session IDs are propagated from multiple sources (SystemMessage, StreamEvent, ResultMessage, client query, baggage) with entry baggage taking precedence. 562 lines of new tests provide comprehensive coverage. All CI checks pass.

Findings

[Warning] patch.py ~line 731 — The removal of otel_context.detach(empty_context_token) changes trace topology. Previously, each agent invocation explicitly detached to an empty context to guarantee independent root traces. Now, if a caller has an active span, the agent span becomes a child of that span. Tests test_wrap_query_sequential_calls_create_independent_root_traces and test_wrap_query_preserves_active_parent_context verify both behaviors, so this is intentional. Consider noting this behavior change in the CHANGELOG entry (currently only mentions session ID capture).
[Info] patch.py ~line 940 — session_id = getattr(options, "resume", None) uses the Claude SDK resume field as session_id. This is the correct convention for conversation resumption.
[Info] patch.py ~line 881 — _otel_session_id is set in wrap_claude_client_query and read via getattr with None default in wrap_claude_client_receive_response. The fallback is safe for cases where receive_response is called without a preceding query.
[Info] Replacing the module-level global _client_managed_runs with a per-invocation local dict is a good concurrency-safety improvement. It prevents state leakage between parallel or interleaved agent invocations.
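The concurrency point in the last finding can be illustrated with a small asyncio sketch. It is not the patched code: `process_stream`, the event tuples, and the tool ID `toolu_1` are hypothetical, and the only idea carried over from the PR is that tool state lives in a dict created per invocation rather than at module level.

```python
import asyncio


async def process_stream(session_id, events):
    """Consume one stream, pairing each tool result with its session."""
    tool_sessions = {}  # created per invocation, never shared across streams
    finished = []
    for kind, tool_id in events:
        if kind == "tool_use":
            tool_sessions[tool_id] = session_id
        elif kind == "tool_result":
            finished.append((tool_id, tool_sessions.pop(tool_id)))
        # Yield to the event loop so the two streams interleave.
        await asyncio.sleep(0)
    return finished


async def main():
    # Both streams reuse the same tool ID on purpose.
    events = [("tool_use", "toolu_1"), ("tool_result", "toolu_1")]
    return await asyncio.gather(
        process_stream("session-a", events),
        process_stream("session-b", events),
    )


results = asyncio.run(main())
```

If `tool_sessions` were a single module-level dict, the second stream's `tool_use` would overwrite the first stream's entry before the first `tool_result` arrived, so one result would be attributed to the wrong session and the other lookup would fail.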

Suggestions

The PR is well-structured and ready for merge. The one suggestion is to expand the CHANGELOG entry to mention the context propagation behavior change, since it affects trace topology for downstream consumers.

Automated review by github-manager-bot

fix: capture claude agent sdk session ids

6889e2d

github-actions Bot assigned 123liuziming, Cirilla-zmh and ralf0131 Jun 18, 2026

github-actions Bot requested review from 123liuziming, Cirilla-zmh and ralf0131 June 18, 2026 20:24

sipercai marked this pull request as ready for review June 22, 2026 01:55

ralf0131 reviewed Jun 22, 2026

sipercai wants to merge 1 commit into main from fix/claude-agent-sdk-session-id
