Skip to content

Native instrumentation with OpenTelemetry#329

Draft
rodrigobr-msft wants to merge 10 commits intomainfrom
users/robrandao/otel
Draft

Native instrumentation with OpenTelemetry#329
rodrigobr-msft wants to merge 10 commits intomainfrom
users/robrandao/otel

Conversation

@rodrigobr-msft
Copy link
Contributor

No description provided.

Copilot AI review requested due to automatic review settings February 24, 2026 17:38
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces native OpenTelemetry instrumentation to the Microsoft Agents SDK, enabling comprehensive observability through traces, metrics, and logs. The changes add telemetry hooks at key points in the agent lifecycle including turn processing, adapter operations, and storage interactions, while also providing a test sample demonstrating the integration.

Changes:

  • Added OpenTelemetry integration to core SDK with AgentTelemetry class for instrumentation
  • Integrated telemetry tracking into agent turn processing, adapter operations, and storage methods
  • Created test sample demonstrating OpenTelemetry configuration and usage with aiohttp
  • Enhanced testing framework to support Activity template aliasing (from/from_property compatibility)

Reviewed changes

Copilot reviewed 21 out of 24 changed files in this pull request and generated 27 comments.

Show a summary per file
File Description
test_samples/otel/src/telemetry.py OpenTelemetry configuration helper for test sample
test_samples/otel/src/start_server.py Sample aiohttp server with telemetry integration
test_samples/otel/src/requirements.txt Dependencies for OpenTelemetry sample
test_samples/otel/src/main.py Entry point configuring telemetry before agent initialization
test_samples/otel/src/env.TEMPLATE Environment configuration template with OTLP settings
test_samples/otel/src/agent_metric.py Sample-specific metrics wrapper (has import name mismatch)
test_samples/otel/src/agent.py Sample agent with manual telemetry instrumentation
libraries/microsoft-agents-hosting-core/setup.py Added OpenTelemetry dependencies
libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/observability/_agent_telemetry.py Core telemetry infrastructure with context managers for instrumentation
libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/observability/init.py Exports AgentTelemetry and agent_telemetry singleton
libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/app/agent_application.py Wraps turn processing with telemetry
libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/storage/storage.py Adds telemetry to base storage operations
libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/storage/memory_storage.py Adds telemetry to in-memory storage implementation
libraries/microsoft-agents-storage-cosmos/microsoft_agents/storage/cosmos/cosmos_db_storage.py Minor whitespace change
libraries/microsoft-agents-hosting-fastapi/microsoft_agents/hosting/fastapi/cloud_adapter.py Wraps adapter processing with telemetry
libraries/microsoft-agents-hosting-aiohttp/microsoft_agents/hosting/aiohttp/cloud_adapter.py Wraps adapter processing with telemetry
dev/tests/sdk/observability/test_observability.py Tests for telemetry functionality
dev/tests/scenarios/quickstart.py Renamed init_app to init_agent
dev/tests/scenarios/init.py Updated to use renamed init_agent function
dev/microsoft-agents-testing/tests/core/fluent/test_model_template.py Added tests for from/from_property aliasing
dev/microsoft-agents-testing/microsoft_agents/testing/core/fluent/utils.py Added from/from_property aliasing logic
dev/microsoft-agents-testing/microsoft_agents/testing/core/fluent/model_template.py Integrated aliasing into template classes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +5 to +6
from microsoft_agents.hosting.core import TurnContext

Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import: TurnContext is imported but never used in this file. Consider removing it to keep imports clean.

Suggested change
from microsoft_agents.hosting.core import TurnContext

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +14
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter


Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused imports: TracerProvider, SimpleSpanProcessor, and ConsoleSpanExporter are imported but never used in this file. Remove them to keep imports clean.

Suggested change
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

Copilot uses AI. Check for mistakes.
Comment on lines +14 to +17
def _ts() -> float:
"""Helper function to get current timestamp in milliseconds"""
return datetime.now(timezone.utc).timestamp() * 1000

Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused function: The _ts() helper function is defined but never used in this file. Consider removing it.

Suggested change
def _ts() -> float:
"""Helper function to get current timestamp in milliseconds"""
return datetime.now(timezone.utc).timestamp() * 1000

Copilot uses AI. Check for mistakes.

def __init__(self, tracer: Tracer | None = None, meter: Meter | None = None):
if tracer is None:
tracer = trace.get_tracer("M365.agents", "1.0.0")
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent indentation: Line 33 has an extra space before 'tracer'. Should be a single space like line 35.

Suggested change
tracer = trace.get_tracer("M365.agents", "1.0.0")
tracer = trace.get_tracer("M365.agents", "1.0.0")

Copilot uses AI. Check for mistakes.
with self.tracer.start_as_current_span(span_name) as span:
attributes = self._extract_attributes_from_context(context)
span.set_attributes(attributes)
# span.add_event(f"{span_name} started", attributes)
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented-out code: Line 97 contains a commented-out span.add_event() call. Either remove it if it's not needed or uncomment it if it should be active.

Suggested change
# span.add_event(f"{span_name} started", attributes)

Copilot uses AI. Check for mistakes.
async def entry_point(req: Request) -> Response:

logger.info("Request received at /api/messages endpoint.")
text = await req.text()
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable: The 'text' variable on line 23 is assigned but never used. Remove it if it's not needed.

Suggested change
text = await req.text()

Copilot uses AI. Check for mistakes.
APP["adapter"] = agent_application.adapter

try:
run_app(APP, host="localhost", port=environ.get("PORT", 3978))
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch: environ.get("PORT", 3978) returns a string or int, but the port parameter expects an int. The default value 3978 is an int, but if PORT is set in the environment it will be a string. Convert to int: port=int(environ.get("PORT", 3978)).

Suggested change
run_app(APP, host="localhost", port=environ.get("PORT", 3978))
run_app(APP, host="localhost", port=int(environ.get("PORT", 3978)))

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +29
self.tracer = trace.get_tracer("A365.AgentFramework")
self.meter = metrics.get_meter("A365.AgentFramework", "1.0.0")
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent naming: The test sample uses "A365.AgentFramework" for the tracer/meter name (line 28-29), while the core library uses "M365.agents" (see libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/observability/_agent_telemetry.py line 33). Consider using consistent naming across the codebase for better observability and to avoid confusion.

Suggested change
self.tracer = trace.get_tracer("A365.AgentFramework")
self.meter = metrics.get_meter("A365.AgentFramework", "1.0.0")
self.tracer = trace.get_tracer("M365.agents")
self.meter = metrics.get_meter("M365.agents", "1.0.0")

Copilot uses AI. Check for mistakes.
Comment on lines +138 to +142
success_callback(span, duration)
else:

if failure_callback:
failure_callback(span, exception)
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error handling for callbacks: If success_callback or failure_callback raise an exception in lines 137-142, it will propagate and potentially mask the original exception. Consider wrapping callback invocations in try-except blocks to ensure telemetry errors don't break application logic.

Suggested change
success_callback(span, duration)
else:
if failure_callback:
failure_callback(span, exception)
try:
success_callback(span, duration)
except Exception as callback_exc:
# Ensure telemetry callback failures do not affect application logic
span.record_exception(callback_exc)
else:
if failure_callback:
try:
failure_callback(span, exception)
except Exception as callback_exc:
# Ensure telemetry callback failures do not mask the original exception
span.record_exception(callback_exc)

Copilot uses AI. Check for mistakes.
Comment on lines +10 to +14
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST=".*"
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE=".*"

OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_REQUEST=".*"
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_RESPONSE=".*" No newline at end of file
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_* with value ".*" causes OpenTelemetry to capture and export all HTTP request and response headers, including Authorization, Cookie, and other secret-bearing headers. Anyone with access to the telemetry backend or pipeline could then retrieve credentials or other sensitive data from these exported headers. Limit these variables to a specific allowlist of non-sensitive headers and explicitly exclude authentication and other secret-related headers from capture.

Suggested change
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST=".*"
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE=".*"
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_REQUEST=".*"
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_RESPONSE=".*"
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST="user-agent,x-request-id,x-correlation-id"
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE="content-type,content-length,x-request-id,x-correlation-id"
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_REQUEST="user-agent,x-request-id,x-correlation-id"
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_RESPONSE="content-type,content-length,x-request-id,x-correlation-id"

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants