
OpenTelemetry context not propagating #18572

@estiller

Description

Is there an existing issue for this?

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/nestjs

SDK Version

10.31.0

Framework Version

NestJS 11.1.9

Link to Sentry event

No response

Reproduction Example/SDK Setup

This is the code we use to set up Sentry and OpenTelemetry. It follows the documentation here and is loaded as the first import when the application starts up.

import * as os from 'os';

import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import {
  CompositePropagator,
  W3CTraceContextPropagator,
} from '@opentelemetry/core';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { Resource, resourceFromAttributes } from '@opentelemetry/resources';
import * as opentelemetry from '@opentelemetry/sdk-node';
import {
  BatchSpanProcessor,
  ParentBasedSampler,
} from '@opentelemetry/sdk-trace-node';
import {
  ATTR_SERVICE_NAME,
  ATTR_SERVICE_VERSION,
} from '@opentelemetry/semantic-conventions';
import * as Sentry from '@sentry/nestjs';
import { SentryPropagator, SentrySpanProcessor } from '@sentry/opentelemetry';

import { CustomSampler } from '@application/common-util';

if (process.env.TELEMETRY_ENABLED === 'true') {
  const sentryClient = Sentry.init({
    dsn: 'REMOVED_XXX',
    environment: process.env.DEPLOYMENT_ENV || 'unknown',
    skipOpenTelemetrySetup: true,
    integrations: [Sentry.httpIntegration({ spans: false })],
    tracesSampleRate: 1.0,
    enabled: process.env.ERROR_MONITORING_ENABLED === 'true',
  });

  const resource: Resource = resourceFromAttributes({
    [ATTR_SERVICE_NAME]: 'cloud-backend',
    [ATTR_SERVICE_VERSION]: sentryClient?.getOptions().release, // Set by sentryWebpackPlugin
    'service.instance.id':
      process.env.SERVICE_INSTANCE_ID || os.hostname() || 'unknown',
  });

  const traceExporter = new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
  });
  const customSampler = new ParentBasedSampler({
    root: new CustomSampler() /** Sampler called for spans with no parent */,
  });
  const sdk = new opentelemetry.NodeSDK({
    resource: resource,
    traceExporter,
    textMapPropagator: new CompositePropagator({
      propagators: [new W3CTraceContextPropagator(), new SentryPropagator()],
    }),
    contextManager: new Sentry.SentryContextManager(),
    sampler: customSampler,
    spanProcessors: [
      new BatchSpanProcessor(traceExporter),
      new SentrySpanProcessor(),
    ],
    instrumentations: [
      getNodeAutoInstrumentations({
        '@opentelemetry/instrumentation-fs': {
          enabled: false,
        },
        '@opentelemetry/instrumentation-dns': {
          enabled: false,
        },
        '@opentelemetry/instrumentation-http': {
          enabled: true,
        },
        '@opentelemetry/instrumentation-nestjs-core': {
          enabled: true,
        },
        '@opentelemetry/instrumentation-winston': {
          enabled: true,
        },
        '@opentelemetry/instrumentation-pg': {
          enabled: true,
          enhancedDatabaseReporting: true,
        },
        '@opentelemetry/instrumentation-grpc': {
          enabled: true,
        },
      }),
    ],
  });

  sdk.start();

  // Validate that the setup is correct
  Sentry.validateOpenTelemetrySetup();
}

These are the current Sentry, Nest & OTel versions we are using:

    "@nestjs/cache-manager": "3.0.1",
    "@nestjs/common": "11.1.9",
    "@nestjs/config": "4.0.2",
    "@nestjs/core": "11.1.9",
    "@nestjs/microservices": "11.1.9",
    "@nestjs/platform-express": "11.1.9",
    "@nestjs/swagger": "11.2.3",
    "@nestjs/terminus": "11.0.0",
    "@opentelemetry/api": "1.9.0",
    "@opentelemetry/auto-instrumentations-node": "0.67.2",
    "@opentelemetry/core": "2.2.0",
    "@opentelemetry/exporter-trace-otlp-grpc": "0.208.0",
    "@opentelemetry/resources": "2.2.0",
    "@opentelemetry/sdk-node": "0.208.0",
    "@opentelemetry/sdk-trace-node": "2.2.0",
    "@opentelemetry/semantic-conventions": "1.38.0",
    "@sentry/nestjs": "10.31.0",
    "@sentry/opentelemetry": "10.31.0",
    "@sentry/webpack-plugin": "4.6.1",

Steps to Reproduce

We have Sentry set up in a NestJS app alongside custom OpenTelemetry instrumentation whose traces we export to a third-party collector. We only want to use Sentry for error monitoring, not for tracing or profiling.

This worked fine up to Sentry SDK version 10.25.0. After upgrading to 10.27.0, our OpenTelemetry instrumentation started breaking in strange ways. The core symptom is that context is randomly not propagated correctly within the service: spans that are clearly internal to the process appear as root spans, so a single logical trace is split into multiple partial traces at random. We have not been able to reproduce this locally, but it occurs frequently in our production environment.
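
For anyone trying to reproduce: a quick way to check whether the active context is present inside a handler is something like the following (an illustrative sketch, not code from our app; the helper name is made up):

import { context, trace } from '@opentelemetry/api';

// Simplified sketch of the check: log the active span (if any) at a given point.
// logActiveSpanContext is an illustrative helper name, not real code from our app.
export function logActiveSpanContext(label: string): void {
  const activeSpan = trace.getSpan(context.active());
  if (!activeSpan) {
    // If this branch is hit, any span started next becomes a new root span.
    console.warn(`[${label}] no active span - context appears to be lost`);
    return;
  }
  const { traceId, spanId } = activeSpan.spanContext();
  console.log(`[${label}] traceId=${traceId} spanId=${spanId}`);
}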

After browsing the changelogs and previous issues, we suspect it's related to this issue and this PR, released in 10.27.0, but we can't be sure.

We know for sure that this is related to Sentry: if we disable Sentry without changing any code (by setting ERROR_MONITORING_ENABLED to false in the setup above), everything OpenTelemetry-related works perfectly and no context propagation is lost.

Also worth noting: while we have multiple services configured with the code above (Sentry plus a custom OpenTelemetry collector), we only see these issues in a service with many gRPC endpoints. Services that are HTTP-only work fine, so I suspect it has something to do with how gRPC works in NestJS.
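
For illustration, a minimal NestJS gRPC handler of the kind where we see these broken traces might look like this (names are placeholders, not our real code):

import { Controller } from '@nestjs/common';
import { GrpcMethod } from '@nestjs/microservices';

// Illustrative gRPC handler; ExampleService / GetItem are placeholder names.
@Controller()
export class ExampleGrpcController {
  @GrpcMethod('ExampleService', 'GetItem')
  async getItem(data: { id: string }): Promise<{ id: string; name: string }> {
    // Spans created by work done here (validators, pg queries via the pg
    // instrumentation, etc.) sometimes show up as root spans instead of
    // children of the incoming gRPC span.
    return { id: data.id, name: 'example' };
  }
}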

At the moment, our workaround is to disable Sentry altogether on that service to get tracing working properly.

Is our setup correct? Should we disable the Sentry Context Manager? The Sentry docs state that it is required.
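
To make the question concrete, the variant we're wondering about would look roughly like the following (reusing the imports and objects from the setup above; we haven't verified whether this is a supported configuration):

// Hypothetical variant (unverified): same setup as above, but without the
// Sentry-specific context manager, letting the NodeSDK fall back to its
// default AsyncLocalStorage-based context manager.
const sdkWithoutSentryContextManager = new opentelemetry.NodeSDK({
  resource,
  textMapPropagator: new CompositePropagator({
    propagators: [new W3CTraceContextPropagator(), new SentryPropagator()],
  }),
  sampler: customSampler,
  spanProcessors: [
    new BatchSpanProcessor(traceExporter),
    new SentrySpanProcessor(),
  ],
  // instrumentations omitted here for brevity - same as in the setup above.
  // contextManager intentionally omitted - this is exactly what we're unsure
  // about, since the Sentry docs state that Sentry.SentryContextManager() is required.
});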

Expected Result

OpenTelemetry traces shouldn't randomly be broken into multiple partial traces. Context should be propagated correctly across sections of the code.

Here is an example of a proper trace. As you can see, it starts with an incoming web request (and is even distributed across services), as expected:

[Screenshot: a complete trace starting from the incoming web request]

Actual Result

Context is randomly not propagated correctly within the service: spans that are clearly internal to the process appear as root spans, and a single logical trace is broken into multiple partial traces at random.

Here are examples of "broken" traces, where various validators and internal NestJS components appear as the root component of a trace:

[Screenshots: three broken traces rooted at validators and internal NestJS components]

Additional Context

No response

Priority

React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1 or me too, to help us triage it.
