
Add OpenTelemetry v2 integration with enhanced features and comprehensive testing #1314

Open
tconley1428 wants to merge 12 commits into main from opentelemetryv2-improvements


@tconley1428
Contributor

Summary

This PR introduces a new OpenTelemetry v2 integration for the Temporal Python SDK with significant enhancements over the existing OpenTelemetry support. The integration provides deterministic tracing, comprehensive test coverage, and improved maintainability.

Key Features Added:

  • Deterministic ID generation: Uses Temporal's workflow-safe random generator for consistent span/trace IDs across replays
  • Comprehensive operation support: Full tracing for workflows, activities, local activities, child workflows, timers, signals, updates, queries, and Nexus operations
  • Context propagation: Proper trace context propagation across all Temporal operation boundaries
  • Plugin architecture: Clean plugin-based integration using Temporal's SimplePlugin base class
  • Workflow-safe span creation: temporalio.contrib.opentelemetryv2.workflow.start_as_current_span() for user workflow tracing
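The deterministic ID generation described above can be illustrated with a minimal, stdlib-only sketch. It is not the PR's `TemporalIdGenerator`: the class name `SeededIdGenerator` and the helper `ids_for_run` are hypothetical, and a seeded `random.Random` stands in for the generator returned by `workflow.random()`. The method names mirror OpenTelemetry's `IdGenerator` interface (`generate_span_id`, `generate_trace_id`).

```python
import random
from typing import List, Tuple


class SeededIdGenerator:
    """Hypothetical stand-in for the PR's TemporalIdGenerator."""

    def __init__(self, rng: random.Random) -> None:
        # Assumption: inside a workflow this would be workflow.random().
        self._rng = rng

    def _nonzero(self, bits: int) -> int:
        # OpenTelemetry treats an all-zero ID as invalid, so draw again.
        value = self._rng.getrandbits(bits)
        while value == 0:
            value = self._rng.getrandbits(bits)
        return value

    def generate_span_id(self) -> int:
        # Span IDs are 64-bit integers.
        return self._nonzero(64)

    def generate_trace_id(self) -> int:
        # Trace IDs are 128-bit integers.
        return self._nonzero(128)


def ids_for_run(seed: int, count: int = 3) -> List[Tuple[int, int]]:
    """Simulate one workflow execution that draws `count` ID pairs."""
    gen = SeededIdGenerator(random.Random(seed))
    return [(gen.generate_trace_id(), gen.generate_span_id()) for _ in range(count)]


# The same seed models the original execution and a later replay.
first_execution = ids_for_run(seed=1234)
replay = ids_for_run(seed=1234)
```

Because both runs draw from an identically seeded generator in the same order, `first_execution` and `replay` contain the same IDs, which is the property that lets a replayed workflow reproduce the span and trace IDs of the original execution.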

Architecture Improvements:

  • Modular design: Restructured as proper package with separate modules for interceptor, plugin, processor, and workflow utilities
  • Replay-aware processing: TemporalSpanProcessor skips span export during workflow replay to prevent duplicate telemetry
  • Read-only mode detection: Added workflow.unsafe.is_read_only() to handle queries and update validators safely
  • Enhanced interceptor: Comprehensive TracingInterceptor covering all client and worker operations
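As a rough illustration of the replay-aware processing item above, the following stdlib-only sketch drops finished spans while a replay flag is set. All names here (`FakeSpan`, `ReplayState`, `ReplayAwareProcessor`) are hypothetical and are not the PR's `TemporalSpanProcessor`; the `on_end` hook only mirrors the shape of OpenTelemetry's `SpanProcessor.on_end`, and in a real worker the replay check would presumably be `workflow.unsafe.is_replaying()`.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a finished OpenTelemetry span."""

    name: str


@dataclass
class ReplayState:
    """Hypothetical flag holder; models workflow.unsafe.is_replaying()."""

    replaying: bool = False


@dataclass
class ReplayAwareProcessor:
    """Records ended spans, except while the workflow is replaying."""

    is_replaying: Callable[[], bool]
    exported: List[str] = field(default_factory=list)

    def on_end(self, span: FakeSpan) -> None:
        if self.is_replaying():
            # The span was already exported by the original execution.
            return
        self.exported.append(span.name)


state = ReplayState()
processor = ReplayAwareProcessor(is_replaying=lambda: state.replaying)

# Original execution: the span is exported.
processor.on_end(FakeSpan("workflow-span"))

# Replay of the same history: the identical span is skipped.
state.replaying = True
processor.on_end(FakeSpan("workflow-span"))
```

After both calls, `processor.exported` holds a single entry, so the replay did not produce duplicate telemetry.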

Testing & Quality:

  • Comprehensive test suite: Added test_opentelemetryv2_comprehensive_tracing covering all workflow operations with proper span hierarchy validation
  • Test isolation: Pytest fixtures ensure OpenTelemetry state doesn't leak between tests
  • Span hierarchy validation: Tests use dump_spans() for maintainable hierarchy validation similar to existing OpenTelemetry tests
  • Complete linting compliance: All code passes ruff, pyright, mypy, and pydocstyle checks with comprehensive docstrings

Test plan

  • All existing tests continue to pass
  • New comprehensive test validates tracing across all Temporal operations
  • Span hierarchy validation ensures proper parent-child relationships
  • Test isolation prevents OpenTelemetry state conflicts between test runs
  • Integration works with both basic tracing (add_temporal_spans=False) and comprehensive tracing (add_temporal_spans=True)
  • Deterministic span IDs maintained across workflow replays
  • No span export during workflow replay to prevent duplicate telemetry
  • Read-only operations (queries, update validators) handle tracing safely
  • All linting and type checking passes

🤖 Generated with Claude Code

tconley1428 and others added 2 commits January 20, 2026 12:38
This commit adds a new OpenTelemetry interceptor (opentelemetryv2) with enhanced
capabilities for Temporal workflow integration:

Features:
- Deterministic ID generation for spans/traces in workflows using TemporalIdGenerator
- Context propagation across workflow and activity boundaries
- Support for workflow-level span creation via workflow.start_as_current_span
- Enhanced interceptor with context propagation to activities and nexus operations
- Compatible with existing opentelemetry module while providing additional functionality

Implementation:
- New TemporalIdGenerator uses workflow.random() for deterministic IDs in workflows
- TracingInterceptor handles client, worker, activity, workflow, and nexus operations
- Workflow-safe span creation context manager in workflow module
- Comprehensive test coverage for trace propagation scenarios

This is separate from the OpenAI agents OTEL integration and provides
general-purpose OpenTelemetry improvements for Temporal workflows.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…inting fixes

This commit significantly improves the OpenTelemetry v2 integration for the Temporal SDK with the following enhancements:

## Core Features Added:
- **Comprehensive test coverage**: Added `test_opentelemetryv2_comprehensive_tracing` covering all workflow operations including activities, local activities, child workflows, timers, signals, updates, queries, and Nexus operations
- **Read-only mode detection**: Implemented `workflow.unsafe.is_read_only()` to prevent span ID generation errors during queries and update validators
- **Test isolation**: Added pytest fixture to reset OpenTelemetry tracer provider state between test runs
- **Span hierarchy validation**: Refactored tests to use `dump_spans()` hierarchy validation for better maintainability

## Linting and Documentation:
- Fixed all import path issues for OpenTelemetry ID generators
- Added comprehensive docstrings for all public classes and methods
- Fixed type annotations and null handling throughout the codebase
- Resolved Nexus headers access issues with proper type protocols
- Achieved complete pydocstyle compliance

## Technical Improvements:
- Enhanced `TemporalSpanProcessor` with proper replay handling
- Improved `TemporalIdGenerator` with deterministic workflow-safe random generation
- Updated span parenting validation to ensure proper trace relationships
- Added max_cached_workflows=0 to all test workers for deterministic behavior

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@tconley1428 tconley1428 requested a review from a team as a code owner February 3, 2026 01:39

@asynccontextmanager
async def run_context() -> AsyncIterator[None]:
    set_tracer_provider(self._provider)
Member

@cretz cretz Feb 6, 2026


It is scary to mutate a global like this. What happens if I run 40 workers: do they all stomp on each other? What happens if I run a worker at the same time as the rest of my Django app: have we removed our users' ability to use their configured tracer providers?

I am concerned about altering globals; we very intentionally expect to be used safely alongside other libraries and frameworks.

I think we need to step back and see how we can work within a user's setup instead of taking it over. I think we should:

  • Ask users to set our ID generator in their provider if they're creating a provider (which is very common)
  • Raise if the getattr(opentelemetry.trace.get_tracer_provider(), "id_generator") is not ours, but have an option to not raise in that case
  • Expose our ID generator for user use directly, and make sure it can accept one to delegate to (can default that to random though)
  • Do the same accept-or-create tracer concept we do in the interceptor today
  • Do not use a TemporalSpanProcessor to get replay safety; rather, wrap the returned workflow tracer itself, or find some other way to prevent spans during replay that still allows users to configure their own processors
  • Provide a static helper that does basically what the TracerProvider constructor does, but also adds our ID generator and sets the global tracer provider (or could just return the tracer provider and ask the user to set it). This is what many users will call to do one-time init (and likely our samples too)

This allows us to be good library neighbors.
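Two of the suggestions above (an ID generator that can delegate, and a check that raises unless told otherwise) could look roughly like the sketch below. It is stdlib-only and every name in it is hypothetical: `DelegatingIdGenerator`, `RandomFallbackIdGenerator`, `ensure_id_generator`, and `FakeProvider` are illustrations, not SDK or OpenTelemetry APIs. `FakeProvider` stands in for whatever `opentelemetry.trace.get_tracer_provider()` returns, and the `workflow_rng` callable stands in for "return `workflow.random()` when inside a workflow, otherwise `None`".

```python
import random
from typing import Callable, Optional


def _nonzero_bits(rng: random.Random, bits: int) -> int:
    # All-zero IDs are invalid in OpenTelemetry, so draw again.
    value = rng.getrandbits(bits)
    while value == 0:
        value = rng.getrandbits(bits)
    return value


class RandomFallbackIdGenerator:
    """Hypothetical default delegate: plain, non-deterministic IDs."""

    def __init__(self) -> None:
        self._rng = random.Random()

    def generate_span_id(self) -> int:
        return _nonzero_bits(self._rng, 64)

    def generate_trace_id(self) -> int:
        return _nonzero_bits(self._rng, 128)


class DelegatingIdGenerator:
    """Deterministic inside a workflow, delegates everywhere else."""

    def __init__(
        self,
        workflow_rng: Callable[[], Optional[random.Random]],
        delegate: Optional[RandomFallbackIdGenerator] = None,
    ) -> None:
        self._workflow_rng = workflow_rng
        self._delegate = delegate or RandomFallbackIdGenerator()

    def generate_span_id(self) -> int:
        rng = self._workflow_rng()
        if rng is None:
            return self._delegate.generate_span_id()
        return _nonzero_bits(rng, 64)

    def generate_trace_id(self) -> int:
        rng = self._workflow_rng()
        if rng is None:
            return self._delegate.generate_trace_id()
        return _nonzero_bits(rng, 128)


class FakeProvider:
    """Hypothetical stand-in for a user-configured tracer provider."""

    def __init__(self, id_generator: Optional[object] = None) -> None:
        self.id_generator = id_generator


def ensure_id_generator(provider: object, *, raise_if_missing: bool = True) -> bool:
    """Check the user's provider without mutating any global state."""
    generator = getattr(provider, "id_generator", None)
    if isinstance(generator, DelegatingIdGenerator):
        return True
    if raise_if_missing:
        raise RuntimeError("tracer provider does not use the Temporal ID generator")
    return False


configured = FakeProvider(DelegatingIdGenerator(workflow_rng=lambda: None))
unconfigured = FakeProvider()

ok = ensure_id_generator(configured)
tolerated = ensure_id_generator(unconfigured, raise_if_missing=False)
```

The point of this shape is that the integration only inspects the provider the user already configured; it never replaces it, so multiple workers or a surrounding web application keep their own tracing setup.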
