From d494b8454e47f805c2fda568efabce13b969dbcc Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Fri, 31 Oct 2025 15:49:00 +0000 Subject: [PATCH 01/18] OTEP: Process Context: Sharing Resource Attributes with External Readers This OTEP introduces a standard mechanism for OpenTelemetry SDKs to publish process-level resource attributes for access by out-of-process readers such as the OpenTelemetry eBPF Profiler. External readers like the OpenTelemetry eBPF Profiler operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. We propose a mechanism for OpenTelemetry SDKs to publish process-level resource attributes, through a standard format based on Linux anonymous memory mappings. When an SDK initializes (or updates its resource attributes) it publishes this information to a small, fixed-size memory region that external processes can discover and read. The OTEL eBPF profiler will then, upon observing a previously-unseen process, probe and read this information, associating it with any profiling samples taken from a given process. 
_I'm opening this PR as a draft with the intention of sharing with the Profiling SIG for an extra round of feedback before asking for a wider review._ _This OTEP is based on [Sharing Process-Level Resource Attributes with the OpenTelemetry eBPF Profiler](https://docs.google.com/document/d/1-4jo29vWBZZ0nKKAOG13uAQjRcARwmRc4P313LTbPOE/edit?tab=t.0), big thanks to everyone that provided feedback and helped refine the idea so far._ --- oteps/profiles/0000-process-ctx.md | 334 +++++++++++++++++++++++++++++ 1 file changed, 334 insertions(+) create mode 100644 oteps/profiles/0000-process-ctx.md diff --git a/oteps/profiles/0000-process-ctx.md b/oteps/profiles/0000-process-ctx.md new file mode 100644 index 00000000000..6a1bbfb339a --- /dev/null +++ b/oteps/profiles/0000-process-ctx.md @@ -0,0 +1,334 @@ +# Process Context: Sharing Resource Attributes with External Readers + +Introduce a standard mechanism for OpenTelemetry SDKs to publish process-level resource attributes for access by out-of-process readers such as the OpenTelemetry eBPF Profiler. + +## Motivation + +External readers like the OpenTelemetry eBPF Profiler operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. This creates several problems: + +- **Missing cross-signal correlation identifiers**: Runtime-generated attributes ([`service.instance.id`](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id) being a key example) are often inaccessible to external readers, making it hard to correlate profiles with other telemetry (such as traces and spans!) from the same service instance (especially in runtimes that employ multiple processes). 
+ +- **Inconsistent resource attributes across signals**: Configuration such as `service.name`, `deployment.environment.name`, and `service.version` is not always available or resolves consistently between the OpenTelemetry SDKs and external readers, leading to configuration drift and inconsistent tagging. + +- **Correlation is dependent on process activity**: If a service is idle or not emitting other signals, external readers have difficulty identifying it, since resource attributes or identifiers are only sent along when signals are reported. + +## Explanation + +We propose a mechanism for OpenTelemetry SDKs to publish process-level resource attributes, through a standard format based on Linux anonymous memory mappings. + +When an SDK initializes (or updates its resource attributes) it publishes this information to a small, fixed-size memory region that external processes can discover and read. + +The OTEL eBPF profiler will then, upon observing a previously-unseen process, probe and read this information, associating it with any profiling samples taken from a given process. + +## Internal details + +The process context is split between a header (stored in an anonymous mapping) and a payload. + +### Header Structure + +The header is stored in a fixed-size anonymous memory mapping of 2 pages with the following format: + +| Field | Type | Description | +|-------------------|-----------|----------------------------------------------------------------------| +| `signature` | `char[8]` | Set to `"OTEL_CTX"` when the payload is ready (written last) | +| `version` | `uint32` | Format version. 
Currently `2` (`1` was used for development) | +| `published_at_ns` | `uint64` | Timestamp when the context was published, in nanoseconds since epoch | +| `payload_size` | `uint32` | Number of bytes of the encoded payload | +| `payload` | `char*` | Pointer to payload, in protobuf format | + +**Why 2 pages**: On Linux kernels prior to 5.17, readers cannot filter mappings by name and must scan anonymous mappings. Using a fixed size allows readers to quickly filter candidate mappings by size (among other attributes) before checking the signature, avoiding the need to check most mappings in a process. + +The `payload` can optionally be placed after the header (with the `payload` pointer field correctly pointing at it) or optionally elsewhere in the process memory. + +### Payload Format + +The payload uses protobuf with the [following schema](https://github.com/open-telemetry/sig-profiling/pull/13): + +The implementation distinguishes between: + +* **First-class fields** correspond to recommended OpenTelemetry semantic conventions. If a key in the `resources` map matches a first-class field name, the first-class field takes precedence. Readers MAY fall back to `resources` if the corresponding first-class field is empty, or they MAY ignore it entirely. + +* **Resources map** allows for arbitrary additional attributes. This enables easily adding more information that needs to be carried over, as well as vendor-specific extensions and experimentation. + +```protobuf +syntax = "proto3"; + +package otel_process_ctx.v1development; + +message OtelProcessCtx { + // Additional key/value pairs as resources https://opentelemetry.io/docs/specs/otel/resource/sdk/ + // Similar to baggage https://opentelemetry.io/docs/concepts/signals/baggage/ / https://opentelemetry.io/docs/specs/otel/overview/#baggage-signal + // + // Providing resources is optional. + // + // If a key in this field would match one of the attributes already defined as a first-class field below (e.g. 
`service.name`), + // the first-class field must always take priority. + // Readers MAY choose to fall back to a value in `resources` if its corresponding first-class field is empty, or they MAY ignore it. + map<string, string> resources = 1; + + // We strongly recommend that the following first-class fields are provided, but they can be empty if needed. + // In particular, `deployment_environment_name` and `service_version` often need to be configured for a given application + // and cannot be inferred. For the others, see the semantic conventions documentation for recommended ways of setting them. + + // https://opentelemetry.io/docs/specs/semconv/registry/attributes/deployment/#deployment-environment-name + string deployment_environment_name = 2; + // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id + string service_instance_id = 3; + // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-name + string service_name = 4; + // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-version + string service_version = 5; + // https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-language + string telemetry_sdk_language = 6; + // https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-version + string telemetry_sdk_version = 7; + // https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-name + string telemetry_sdk_name = 8; + + // New first-class fields should be added only if: + // * Providing them is strongly recommended + // * They match a new or existing OTEL semantic convention + // + // Otherwise, `resources` should be used instead. +} +``` + +### Publication Protocol + +Publishing the context should follow these steps: + +1. **Drop existing mapping**: If a previous context was published, unmap/free it +2.
**Allocate new mapping**: Create a 2-page anonymous mapping via `mmap()` (These pages are always zeroed by Linux) +3. **Prevent fork inheritance**: Apply `madvise(..., MADV_DONTFORK)` to prevent child processes from inheriting stale data +4. **Encode payload**: Serialize the `OtelProcessCtx` message using protobuf (storing it either following the header OR in a regular memory block) +5. **Write header fields**: Populate `version`, `published_at_ns`, `payload_size`, `payload` +7. **Memory barrier**: Use language/compiler-specific techniques to ensure all previous writes complete before proceeding +8. **Write signature**: Write `OTEL_CTX` to the signature field last +9. **Set read-only**: Apply `mprotect(..., PROT_READ)` to mark the mapping as read-only +10. **Name mapping** (Linux ≥5.17): Use `prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ..., "OTEL_CTX")` to name the mapping + +The signature MUST be written last to ensure readers never observe incomplete or invalid data. Once the signature is present and the mapping set to read-only, the entire mapping is considered valid and immutable. + +If resource attributes are updated during the process lifetime, the previous mapping should be removed and a new one published following the same steps. + +If any of the steps above fail (other than naming the mapping on older Linux versions), publication is considered to have failed, and the process context will not be available. + +The process context is treated as a singleton: there SHOULD NOT be more than one process context active for the same process. + +The context MAY be dropped during SDK shutdown, or kept around until the process itself terminates and the OS takes care of cleaning the process memory. + +### Reading Protocol + +External readers (such as the OpenTelemetry eBPF Profiler) discover and read process context as follows: + +1. 
**Locate mapping**: + - **Preferred** (Linux ≥5.17): Parse `/proc/<pid>/maps` and search for read-only entries with name `[anon:OTEL_CTX]` + - **Fallback** (older kernels): Parse `/proc/<pid>/maps` and search for anonymous read-only mappings exactly 2 pages in size, then read the first 8 bytes to check for the `"OTEL_CTX"` signature + +2. **Validate signature and version**: + - Read the header and verify that the first 8 bytes match `OTEL_CTX` + - Read the version field and verify it is supported (currently `2`) + - If either check fails, skip this mapping + +3. **Read payload**: Read `payload_size` bytes starting at the address stored in the header's `payload` field (directly after the header, or elsewhere in process memory) + +4. **Re-read header**: If the header has not changed, the read of header + payload is consistent. This ensures there were no concurrent changes to the process context. If the header changed, restart at 1. + +5. **Decode payload**: Deserialize the bytes as a Protocol Buffer `OtelProcessCtx` message + +6. **Apply attributes**: Use the decoded resource attributes to enrich telemetry collected from this process + +Readers SHOULD gracefully handle missing, incomplete, or invalid mappings. If a process does not publish context or if decoding fails, readers SHOULD fall back to default resource detection mechanisms. + +### Interaction with Existing Functionality + +This mechanism is additive and does not modify existing OpenTelemetry SDK behavior: + +- Resource attributes continue to work as before for exporters within the process +- The mapping is process-scoped and does not affect thread-local context propagation + +SDKs that do not implement this feature continue to function normally; external readers simply will not have access to their runtime-generated resource attributes. 
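As an illustration of the "Locate mapping" step of the reading protocol, here is a minimal sketch (not part of the proposal or of any reference implementation) of how a reader might filter the text of `/proc/<pid>/maps` for candidate mappings. The names `Mapping`, `parse_maps`, and `find_candidates` are hypothetical, and the 4096-byte page size is an assumption; a real reader would query the system page size.

```python
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 4096              # assumption: typical page size, not guaranteed
CTX_NAME = "[anon:OTEL_CTX]"  # mapping name as it appears on Linux >= 5.17


@dataclass
class Mapping:
    start: int
    end: int
    perms: str
    name: str


def parse_maps(text: str) -> List[Mapping]:
    """Parse the text of /proc/<pid>/maps into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Columns: address range, perms, offset, dev, inode, optional name
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, _, end_s = fields[0].partition("-")
        name = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(int(start_s, 16), int(end_s, 16), fields[1], name))
    return mappings


def find_candidates(mappings: List[Mapping], page_size: int = PAGE_SIZE) -> List[Mapping]:
    """Return read-only mappings that may hold the process context header."""
    read_only = [m for m in mappings if m.perms.startswith("r--")]
    # Preferred path: the mapping was named by the publisher.
    named = [m for m in read_only if m.name == CTX_NAME]
    if named:
        return named
    # Fallback path: unnamed, read-only, exactly two pages in size.
    return [m for m in read_only
            if m.name == "" and m.end - m.start == 2 * page_size]
```

Candidates returned by the fallback path are only guesses: the reader still has to check the `OTEL_CTX` signature and the version before trusting the header, as described in step 2.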
+ +## Trade-offs and mitigations + +### Host and Permission Requirements + +This mechanism requires that the external reader (such as an eBPF profiler) is running on the same host as the instrumented process and has sufficient privileges to access the memory mappings exposed by the process. + +The OpenTelemetry eBPF profiler, by design, has the necessary permissions and operates on the same machine to read this metadata. This approach does **not** support remote or cross-host correlation of process context, and attempts to access the process context mappings without appropriate permissions (e.g., from an unprivileged user) will fail. + +### Process Forking + +When a process forks, child processes do not inherit the parent's process context mapping. This is accomplished through the `madvise(MADV_DONTFORK)` flag, which explicitly marks the memory region as non-inheritable across `fork()`. + +**Why this matters**: Without this protection, child processes would inherit stale resource attributes from the parent. For example, if a parent process has `service.instance.id=uuid-parent` and forks a child that initializes its own OpenTelemetry SDK with `service.instance.id=uuid-child`, the child would initially expose the parent's UUID until it publishes its own context. This could lead to misattribution of telemetry in backend systems. + +**Behavior**: +- Child processes that initialize their own OpenTelemetry SDK will publish their own process context mapping with their own resource attributes +- Child processes that do not initialize an OpenTelemetry SDK will simply not have a process context mapping, which readers handle gracefully + +### Complexity for SDK Implementers + +Creating memory mappings and managing them adds complexity to SDK implementations. 
+ +**Mitigation**: We've created a reference implementation in [C/C++](https://github.com/ivoanjo/proc-level-demo), as well as a [demo OTEL Java SDK extension](https://github.com/ivoanjo/proc-level-demo/tree/main/otel-java-extension-demo) and a [Go port as well](https://github.com/DataDog/dd-trace-go/pull/3937). + +For Go as well as modern versions of Java it's possible to create an implementation that doesn't rely on third-party libraries or native code (e.g. by directly calling into the OS or libc). Older versions of Java will need to rely on building the C/C++ implementation into a Java native library. + +### Platform Limitations + +This mechanism relies on Linux-specific features (`mmap`, `prctl`, `/proc`). + +**Mitigation**: The feature is optional. SDKs on other platforms or environments where these features are unavailable can simply not implement it. In the future, we may explore similar mechanisms for other operating systems. + +One specific part of the design takes advantage of a Linux 5.17+ feature: adding a name to the anonymous mapping. For older Linux versions, the design includes a fallback that accommodates legacy kernels. + +The OTEL eBPF Profiler currently [requires Linux 5.4+](https://github.com/open-telemetry/opentelemetry-ebpf-profiler?tab=readme-ov-file#supported-linux-kernel-version) (old versions are Linux 4.19+). + +### Protocol Evolution + +As requirements evolve, we may need to extend the payload format. + +**Mitigation**: The design includes both versioning as well as allowing extension points +1. **Additional `resources` keys**: For experimentation and vendor extensions +2. **New protobuf fields**: For adding recommended fields following protobuf compatibility practices +3. **Version number**: For incompatible changes (not expected to change frequently) + +### Memory Overhead + +Each process publishes a 2-page (typically 8KB) mapping per SDK instance + the amount of memory needed for the payload, expected to also be in the KB range. 
+ +### Payload Format Choice + +The proposal uses protobuf. Not all SDKs may want to carry a protobuf encoder dependency. + +**Mitigation**: Our reference implementations optionally include a limited protobuf implementation that implements only the feature set needed to emit the `OtelProcessCtx` message in < 500 LoC (C/C++ and Java). Alternatively, existing protobuf encoders can be used. + +Aside from protobuf, msgpack was also considered (see [this earlier msgpack-based reference implementation](https://github.com/ivoanjo/proc-level-demo/tree/main/anonmapping-clib)). Like for protobuf, it's possible to provide a small encoder with similar complexity. We're hoping that the community can make the final choice during specification review. + +### Trace Correlation + +The proposed mechanism only supports sharing process-level resource attributes. + +In particular, it does not support carrying trace and span ids, which would be required to provide finer-grained correlation. Prior art by Elastic and Polar Signals (see below) provide such thread-level context sharing, and there's a working doc [for supporting thread-level context sharing in the OTEL eBPF Profiler](https://docs.google.com/document/d/1eatbHpEXXhWZEPrXZpfR58-5RIx-81mUgF69Zpn3Rz4/edit?tab=t.0#heading=h.fvztn3xtjxxm) under development for this. We expect that in the future, such correlation would be proposed as a separate OTEP. + +Process-level and thread-level context are complementary: The process-level mechanism proposed in this OTEP can be generically adopted by SDKs, and allows for flexibility in publishing metadata and in parsing it. Thread-level mechanisms, in contrast, may need specific support for individual languages/runtimes, and because they would be updated for every span, will need careful performance work. + +## Prior art and alternatives + +### Prior Art + +**Elastic apmint**: Elastic's apmint, described in and uses global variables to share process-level data. 
This is currently used by the Elastic Java trace agent. + +**Polar Signals Custom Labels**: Parca uses a [global variable](https://github.com/polarsignals/custom-labels/blob/master/custom-labels-v1.md#custom_labels_abi_version) to share ABI version information. + +Both approaches demonstrate the need for process-level data sharing and validate the use case, but they rely on ELF symbols which have some limitations (discussed below). + +### Alternatives Considered and Rejected + +**1. Global Variables (Current Approach in OTEL eBPF Profiler)** + +Use global variables with well-known symbol names to store process context. + +**Pros**: Low overhead, straightforward access +**Cons**: +- Symbols may not be accessible in stripped binaries +- Difficult to expose in managed languages (Java, Python) without native libraries +- Static linking can hide symbols +- Child processes inherit the parent's global variables after `fork()`, potentially exposing stale resource attributes until overwritten +- Requires polling to detect context publishing and changes + +**Why rejected**: The anonymous mapping approach is more universally accessible across languages and build configurations. + +**2. Environment Variables** + +Share resource attributes via environment variables using `setenv()`. + +**Cons**: +- `/proc/<pid>/environ` is not updated by `setenv()` at runtime +- `setenv()` is not thread-safe and can cause crashes in multithreaded applications +- Some runtimes (e.g., Java) don't expose APIs to modify environment variables +- Difficult to guarantee safe timing in managed runtimes with background threads +- Child processes inherit the parent's environment, potentially exposing stale resource attributes until overwritten +- Requires polling to detect context publishing and changes + +**Why rejected**: Technical limitations make this approach non-viable. + +**3. Collector-Based Enrichment** + +Have the OpenTelemetry Collector correlate resource attributes across signals. 
+ +**Cons**: +- Adds significant complexity and statefulness to the Collector +- Conflicts with the Collector's stateless design philosophy +- Would require tracking resource attributes for each signal and correlating by process/container IDs + +**Why rejected**: Wide impact on collector for all OTEL signals. + +**4. Custom ELF Sections** + +Use custom ELF sections (via section attribute in C/C++ or link_section in Rust). + +**Pros**: Fast lookup without searching all mappings +**Cons**: +- Not supported in all languages (e.g., Go, Java) +- Requires build-time configuration +- Still faces challenges with stripped binaries +- Requires polling to detect context publishing and changes + +**Why rejected**: Limited language support and similar limitations to global variables. + +**5. Dynamic Symbol Export** + +Ensure symbols are preserved with `-Wl,--export-dynamic` linker flags. + +**Pros**: Similar to Custom ELF sections. +**Cons**: +- Similar to Custom ELF sections +- Requires users to modify build configurations + +**Why rejected**: Creates adoption barriers and doesn't work for key languages. + +**6. File/Socket-Based Communication** + +Write resource attributes to a file or socket. + +**Pros**: Can use regular file/socket based APIs +**Cons**: +- File/socket lifecycle management (creation, cleanup, permissions) +- Also needs to deal with `fork()` and how child processes inherit the parent's open files/sockets +- Requires polling to detect context publishing and changes +- File-based not compatible with services deployed on read-only filesystems + +**Why rejected**: The technical and operational complexities (especially regarding lifecycle, `fork()` and access control) outweigh the benefits over anonymous memory mappings. + +## Open questions + +1. **Protobuf vs. msgpack vs. 
other**: Should the payload use protobuf or msgpack, or something else entirely (such as [Type, Length, Value](https://docs.google.com/document/d/1Ij6SYfv0lHOhTNsXNGVFpra3ZCfz-WC7QBXdB_OaoYc/edit?tab=t.0#heading=h.llbgke6lmlbd))? In our experiments, they all work well, the choice is primarily about ease of implementation in the ecosystem and standardization. + +2. **SDK implementation requirements**: Should SDKs publish this information by default whenever possible, or be opt-in? + +3. **First-class fields**: Should we expand the set of first-class fields beyond the current seven? The current set covers the most critical attributes, but we may discover others during implementation. + +## Prototypes + +The following proof-of-concept implementations demonstrate feasibility across multiple languages: + +- **[anonmapping-clib](https://github.com/ivoanjo/proc-level-demo/tree/main/anonmapping-clib)**: Complete reference implementation in C/C++ with protobuf payload +- **[otel-java-extension-demo](https://github.com/ivoanjo/proc-level-demo/tree/main/otel-java-extension-demo)**: OTEL Java SDK extension for automatic publication +- **[anonmapping-java](https://github.com/ivoanjo/proc-level-demo/tree/main/anonmapping-java)**: Pure Java implementation using FFM (no dependencies) +- **[ebpf-program](https://github.com/ivoanjo/proc-level-demo/tree/main/ebpf-program)**: Example eBPF program demonstrating event-driven publishing detection +- **[OpenTelemetry eBPF Profiler PR](https://github.com/DataDog/dd-otel-host-profiler/pull/210)**: Integration in Datadog's experimental fork + +Additional implementations have been tested with: +- [Datadog Java SDK](https://github.com/DataDog/java-profiler/pull/266) +- [Datadog Ruby SDK](https://github.com/DataDog/dd-trace-rb/pull/4865) +- [Datadog Go SDK](https://github.com/DataDog/dd-trace-go/pull/3937) + +These prototypes validate that the approach works across different languages and runtimes. 
+ +## Future possibilities + +Supporting thread-level context sharing, to enable correlation of outside activity (e.g. profiles) with traces/spans is highly desired. + +The process context could also be used for entity detection as detailed in [OTEP 264](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/entities/0264-resource-and-entities.md). From f1c93f0f1d132e8068c7aa2ca0f2233cfee7421a Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Wed, 5 Nov 2025 12:35:31 +0000 Subject: [PATCH 02/18] Markdownlint fixes (almost all whitespace) --- oteps/profiles/0000-process-ctx.md | 121 ++++++++++++++++------------- 1 file changed, 65 insertions(+), 56 deletions(-) diff --git a/oteps/profiles/0000-process-ctx.md b/oteps/profiles/0000-process-ctx.md index 6a1bbfb339a..54bf8712e34 100644 --- a/oteps/profiles/0000-process-ctx.md +++ b/oteps/profiles/0000-process-ctx.md @@ -102,10 +102,10 @@ Publishing the context should follow these steps: 3. **Prevent fork inheritance**: Apply `madvise(..., MADV_DONTFORK)` to prevent child processes from inheriting stale data 4. **Encode payload**: Serialize the `OtelProcessCtx` message using protobuf (storing it either following the header OR in a regular memory block) 5. **Write header fields**: Populate `version`, `published_at_ns`, `payload_size`, `payload` -7. **Memory barrier**: Use language/compiler-specific techniques to ensure all previous writes complete before proceeding -8. **Write signature**: Write `OTEL_CTX` to the signature field last -9. **Set read-only**: Apply `mprotect(..., PROT_READ)` to mark the mapping as read-only -10. **Name mapping** (Linux ≥5.17): Use `prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ..., "OTEL_CTX")` to name the mapping +6. **Memory barrier**: Use language/compiler-specific techniques to ensure all previous writes complete before proceeding +7. **Write signature**: Write `OTEL_CTX` to the signature field last +8. 
**Set read-only**: Apply `mprotect(..., PROT_READ)` to mark the mapping as read-only +9. **Name mapping** (Linux ≥5.17): Use `prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ..., "OTEL_CTX")` to name the mapping The signature MUST be written last to ensure readers never observe incomplete or invalid data. Once the signature is present and the mapping set to read-only, the entire mapping is considered valid and immutable. @@ -164,6 +164,7 @@ When a process forks, child processes do not inherit the parent's process contex **Why this matters**: Without this protection, child processes would inherit stale resource attributes from the parent. For example, if a parent process has `service.instance.id=uuid-parent` and forks a child that initializes its own OpenTelemetry SDK with `service.instance.id=uuid-child`, the child would initially expose the parent's UUID until it publishes its own context. This could lead to misattribution of telemetry in backend systems. **Behavior**: + - Child processes that initialize their own OpenTelemetry SDK will publish their own process context mapping with their own resource attributes - Child processes that do not initialize an OpenTelemetry SDK will simply not have a process context mapping, which readers handle gracefully @@ -190,6 +191,7 @@ The OTEL eBPF Profiler currently [requires Linux 5.4+](https://github.com/open-t As requirements evolve, we may need to extend the payload format. **Mitigation**: The design includes both versioning as well as allowing extension points + 1. **Additional `resources` keys**: For experimentation and vendor extensions 2. **New protobuf fields**: For adding recommended fields following protobuf compatibility practices 3. **Version number**: For incompatible changes (not expected to change frequently) @@ -226,81 +228,87 @@ Both approaches demonstrate the need for process-level data sharing and validate ### Alternatives Considered and Rejected -**1. Global Variables (Current Approach in OTEL eBPF Profiler)** +1. 
Global Variables (Current Approach in OTEL eBPF Profiler) + + Use global variables with well-known symbol names to store process context. + + **Pros**: Low overhead, straightforward access + **Cons**: + + - Symbols may not be accessible in stripped binaries + - Difficult to expose in managed languages (Java, Python) without native libraries + - Static linking can hide symbols + - Child processes inherit the parent's global variables after `fork()`, potentially exposing stale resource attributes until overwritten + - Requires polling to detect context publishing and changes -Use global variables with well-known symbol names to store process context. + **Why rejected**: The anonymous mapping approach is more universally accessible across languages and build configurations. -**Pros**: Low overhead, straightforward access -**Cons**: -- Symbols may not be accessible in stripped binaries -- Difficult to expose in managed languages (Java, Python) without native libraries -- Static linking can hide symbols -- Child processes inherit the parent's global variables after `fork()`, potentially exposing stale resource attributes until overwritten -- Requires polling to detect context publishing and changes +2. Environment Variables -**Why rejected**: The anonymous mapping approach is more universally accessible across languages and build configurations. + Share resource attributes via environment variables using `setenv()`. -**2. Environment Variables** + **Cons**: -Share resource attributes via environment variables using `setenv()`. 
+ - `/proc/<pid>/environ` is not updated by `setenv()` at runtime + - `setenv()` is not thread-safe and can cause crashes in multithreaded applications + - Some runtimes (e.g., Java) don't expose APIs to modify environment variables + - Difficult to guarantee safe timing in managed runtimes with background threads + - Child processes inherit the parent's environment, potentially exposing stale resource attributes until overwritten + - Requires polling to detect context publishing and changes -**Cons**: -- `/proc/<pid>/environ` is not updated by `setenv()` at runtime -- `setenv()` is not thread-safe and can cause crashes in multithreaded applications -- Some runtimes (e.g., Java) don't expose APIs to modify environment variables -- Difficult to guarantee safe timing in managed runtimes with background threads -- Child processes inherit the parent's environment, potentially exposing stale resource attributes until overwritten -- Requires polling to detect context publishing and changes + **Why rejected**: Technical limitations make this approach non-viable. -**Why rejected**: Technical limitations make this approach non-viable. +3. Collector-Based Enrichment -**3. Collector-Based Enrichment** + Have the OpenTelemetry Collector correlate resource attributes across signals. -Have the OpenTelemetry Collector correlate resource attributes across signals. + **Cons**: -**Cons**: -- Adds significant complexity and statefulness to the Collector -- Conflicts with the Collector's stateless design philosophy -- Would require tracking resource attributes for each signal and correlating by process/container IDs + - Adds significant complexity and statefulness to the Collector + - Conflicts with the Collector's stateless design philosophy + - Would require tracking resource attributes for each signal and correlating by process/container IDs -**Why rejected**: Wide impact on collector for all OTEL signals. + **Why rejected**: Wide impact on collector for all OTEL signals. -**4.
Custom ELF Sections** +4. Custom ELF Sections -Use custom ELF sections (via section attribute in C/C++ or link_section in Rust). + Use custom ELF sections (via section attribute in C/C++ or link_section in Rust). -**Pros**: Fast lookup without searching all mappings -**Cons**: -- Not supported in all languages (e.g., Go, Java) -- Requires build-time configuration -- Still faces challenges with stripped binaries -- Requires polling to detect context publishing and changes + **Pros**: Fast lookup without searching all mappings + **Cons**: -**Why rejected**: Limited language support and similar limitations to global variables. + - Not supported in all languages (e.g., Go, Java) + - Requires build-time configuration + - Still faces challenges with stripped binaries + - Requires polling to detect context publishing and changes -**5. Dynamic Symbol Export** + **Why rejected**: Limited language support and similar limitations to global variables. -Ensure symbols are preserved with `-Wl,--export-dynamic` linker flags. +5. Dynamic Symbol Export -**Pros**: Similar to Custom ELF sections. -**Cons**: -- Similar to Custom ELF sections -- Requires users to modify build configurations + Ensure symbols are preserved with `-Wl,--export-dynamic` linker flags. -**Why rejected**: Creates adoption barriers and doesn't work for key languages. + **Pros**: Similar to Custom ELF sections. + **Cons**: -**6. File/Socket-Based Communication** + - Similar to Custom ELF sections + - Requires users to modify build configurations -Write resource attributes to a file or socket. + **Why rejected**: Creates adoption barriers and doesn't work for key languages. 
-**Pros**: Can use regular file/socket based APIs -**Cons**: -- File/socket lifecycle management (creation, cleanup, permissions) -- Also needs to deal with `fork()` and how child processes inherit the parent's open files/sockets -- Requires polling to detect context publishing and changes -- File-based not compatible with services deployed on read-only filesystems +6. File/Socket-Based Communication -**Why rejected**: The technical and operational complexities (especially regarding lifecycle, `fork()` and access control) outweigh the benefits over anonymous memory mappings. + Write resource attributes to a file or socket. + + **Pros**: Can use regular file/socket based APIs + **Cons**: + + - File/socket lifecycle management (creation, cleanup, permissions) + - Also needs to deal with `fork()` and how child processes inherit the parent's open files/sockets + - Requires polling to detect context publishing and changes + - File-based not compatible with services deployed on read-only filesystems + + **Why rejected**: The technical and operational complexities (especially regarding lifecycle, `fork()` and access control) outweigh the benefits over anonymous memory mappings. 
## Open questions @@ -321,6 +329,7 @@ The following proof-of-concept implementations demonstrate feasibility across mu - **[OpenTelemetry eBPF Profiler PR](https://github.com/DataDog/dd-otel-host-profiler/pull/210)**: Integration in Datadog's experimental fork Additional implementations have been tested with: + - [Datadog Java SDK](https://github.com/DataDog/java-profiler/pull/266) - [Datadog Ruby SDK](https://github.com/DataDog/dd-trace-rb/pull/4865) - [Datadog Go SDK](https://github.com/DataDog/dd-trace-go/pull/3937) From 967067a7aac470284a108b8a83f7b84cdaa751de Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Wed, 5 Nov 2025 12:37:03 +0000 Subject: [PATCH 03/18] Update OTEP number based on PR number --- oteps/profiles/{0000-process-ctx.md => 4719-process-ctx.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename oteps/profiles/{0000-process-ctx.md => 4719-process-ctx.md} (100%) diff --git a/oteps/profiles/0000-process-ctx.md b/oteps/profiles/4719-process-ctx.md similarity index 100% rename from oteps/profiles/0000-process-ctx.md rename to oteps/profiles/4719-process-ctx.md From e823eb4d27aed3ab1aac311c0b929884864234ac Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Wed, 19 Nov 2025 17:07:51 +0000 Subject: [PATCH 04/18] Apply suggestions from code review Co-authored-by: Florian Lehner --- oteps/profiles/4719-process-ctx.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index 54bf8712e34..c5216a1db33 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -4,13 +4,13 @@ Introduce a standard mechanism for OpenTelemetry SDKs to publish process-level r ## Motivation -External readers like the OpenTelemetry eBPF Profiler operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. 
This creates several problems: +External readers like OpenTelemetry eBPF Profiler or OpenTelemetry eBPF Instrumentation operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. This creates several problems: - **Missing cross-signal correlation identifiers**: Runtime-generated attributes ([`service.instance.id`](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id) being a key example) are often inaccessible to external readers, making it hard to correlate profiles with other telemetry (such as traces and spans!) from the same service instance (especially in runtimes that employ multiple processes). -- **Inconsistent resource attributes across signals**: Configuration such as `service.name`, `deployment.environment.name`, and `service.version` is not always available or resolves consistently between the OpenTelemetry SDKs and external readers, leading to configuration drift and inconsistent tagging. +- **Inconsistent resource attributes across signals**: Running in different scopes, configuration such as `service.name`, `deployment.environment.name`, and `service.version` are not always available or resolves consistently between the OpenTelemetry SDKs and external readers, leading to configuration drift and inconsistent tagging. -- **Correlation is dependent on process activity**: If a service is idle or not emitting other signals, external readers have difficulty identifying it, since resource attributes or identifiers are only sent along when signals are reported. +- **Correlation is dependent on process activity**: If a service is blocked (such as when doing slow I/O, or threads are actually deadlocked) and not emitting other signals, external readers have difficulty identifying it, since resource attributes or identifiers are only sent along when signals are reported. 
## Explanation @@ -30,7 +30,7 @@ The header is stored in a fixed-size anonymous memory mapping of 2 pages with th | Field | Type | Description | |-------------------|-----------|----------------------------------------------------------------------| -| `signature` | `char[8]` | Set to `"OTEL_CTX"` when the payload is ready (written last) | +| `signature` | `char[8]` | Always set to `"OTEL_CTX"`| | `version` | `uint32` | Format version. Currently `2` (`1` was used for development) | | `published_at_ns` | `uint64` | Timestamp when the context was published, in nanoseconds since epoch | | `payload_size` | `uint32` | Number of bytes of the encoded payload | @@ -109,7 +109,7 @@ Publishing the context should follow these steps: The signature MUST be written last to ensure readers never observe incomplete or invalid data. Once the signature is present and the mapping set to read-only, the entire mapping is considered valid and immutable. -If resource attributes are updated during the process lifetime, the previous mapping should be removed and a new one published following the same steps. +If resource attributes are updated during the process lifetime, the previous mapping should be removed before publishing new ones following the same steps. If any of the steps above fail (other than naming the mapping on older Linux versions), publication is considered to have failed, and the process context will not be available. 
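As a reviewer aid, the header layout touched by the patch above can be sketched in Python. This is only a sketch under stated assumptions: the OTEP text quoted here does not specify padding, byte order, or pointer width, so the packed little-endian layout with a 64-bit `payload` pointer is an assumption, and `pack_header` / `parse_header` are hypothetical helper names rather than part of any reference implementation.

```python
import struct
import time

# ASSUMPTION (not confirmed by the OTEP text): fields are packed in table order,
# little-endian, with no padding and a 64-bit pointer:
#   signature char[8] | version uint32 | published_at_ns uint64 |
#   payload_size uint32 | payload pointer (uint64)  -> 32 bytes total
HEADER = struct.Struct("<8sIQIQ")
SIGNATURE = b"OTEL_CTX"
VERSION = 2  # "Currently 2" per the header table


def pack_header(payload_size, payload_addr, published_at_ns=None):
    """Build the header bytes in one go (hypothetical helper).

    A real publisher writes the other fields first and the signature last;
    this helper only illustrates the final byte layout.
    """
    if published_at_ns is None:
        published_at_ns = time.time_ns()
    return HEADER.pack(SIGNATURE, VERSION, published_at_ns, payload_size, payload_addr)


def parse_header(raw):
    """Return the header fields as a dict, or None if raw is not a valid context."""
    if len(raw) < HEADER.size:
        return None
    signature, version, published_at_ns, payload_size, payload_addr = HEADER.unpack_from(raw)
    # A freshly mmap'ed (zeroed) page has no signature yet, so it is rejected here.
    if signature != SIGNATURE or version != VERSION:
        return None
    return {
        "version": version,
        "published_at_ns": published_at_ns,
        "payload_size": payload_size,
        "payload_addr": payload_addr,
    }
```

Because a new anonymous mapping starts out zeroed, a reader using a check like `parse_header` would reject a mapping whose signature has not been written yet, which is the property the "signature MUST be written last" rule relies on.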
From 3fee35250ff753cbd44a4e50674af14f54cd4572 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Wed, 26 Nov 2025 10:31:19 +0000 Subject: [PATCH 05/18] Document "loose coordination" intent --- oteps/profiles/4719-process-ctx.md | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index c5216a1db33..1c816b90238 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -10,7 +10,7 @@ External readers like OpenTelemetry eBPF Profiler or OpenTelemetry eBPF Instrume - **Inconsistent resource attributes across signals**: Running in different scopes, configuration such as `service.name`, `deployment.environment.name`, and `service.version` are not always available or resolves consistently between the OpenTelemetry SDKs and external readers, leading to configuration drift and inconsistent tagging. -- **Correlation is dependent on process activity**: If a service is blocked (such as when doing slow I/O, or threads are actually deadlocked) and not emitting other signals, external readers have difficulty identifying it, since resource attributes or identifiers are only sent along when signals are reported. +- **Correlation is dependent on process activity**: If a service is blocked (such as when doing slow I/O, or threads are actually deadlocked) and not emitting other signals, external readers have difficulty identifying it, since resource attributes or identifiers are only sent along when signals are reported. ## Explanation @@ -18,7 +18,14 @@ We propose a mechanism for OpenTelemetry SDKs to publish process-level resource When an SDK initializes (or updates its resource attributes) it publishes this information to a small, fixed-size memory region that external processes can discover and read. 
-The OTEL eBPF profiler will then, upon observing a previously-unseen process, probe and read this information, associating it with any profiling samples taken from a given process. +This mechanism is designed to support loose coordination between the publishing process and external readers: + +- **Publisher-first deployment**: The publishing process can start and publish its context before any readers are running, with readers discovering it later +- **Reader flexibility**: Readers are not limited to eBPF-based implementations; any external process with sufficient system permissions to read `/proc/<pid>/maps` and process memory can access this information +- **Runtime compatibility**: The mechanism works even in environments where eBPF function hooking is unavailable or restricted +- **Independent of process activity**: The context can be read at any time, including when the application is deadlocked, blocked on I/O, or otherwise idle, without relying on active hook points or the process emitting signals + +External readers such as the OpenTelemetry eBPF Profiler will, upon observing a previously-unseen process, probe and read this information, associating it with any profiling samples or other telemetry collected from that process. ## Internal details From 0d7334538a33ec6e23b4c39ad9476e655ad23a21 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Wed, 26 Nov 2025 11:47:54 +0000 Subject: [PATCH 06/18] Rework payload to use `Resource` message Following discussion so far, we can probably avoid having our home-grown `OtelProcessCtx` and instead use the common OTEL `Resource` message. 
--- oteps/profiles/4719-process-ctx.md | 99 ++++++++++++++++-------------- 1 file changed, 54 insertions(+), 45 deletions(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index 1c816b90238..466013c6a8c 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -49,57 +49,66 @@ The `payload` can optionally be placed after the header (with the `payload` poin ### Payload Format -The payload uses protobuf with the [following schema](https://github.com/open-telemetry/sig-profiling/pull/13): +The payload uses protobuf with the OTEL [`opentelemetry.proto.resource.v1.Resource`](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/resource/v1/resource.proto) schema (reproduced below for quick reference). -The implementation distinguishes between: +```protobuf +// Copyright 2019, OpenTelemetry Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. -* **First-class fields** correspond to recommended OpenTelemetry semantic conventions. If a key in the `resources` map matches a first-class field name, the first-class field takes precedence. Readers MAY fall back to `resources` if the corresponding first-class field is empty, or they MAY ignore it entirely. +syntax = "proto3"; -* **Resources map** allows for arbitrary additional attributes. 
This enables easily adding more information that needs to be carried over, as well as vendor-specific extensions and experimentation. +package opentelemetry.proto.resource.v1; -```protobuf -syntax = "proto3"; +import "opentelemetry/proto/common/v1/common.proto"; -package otel_process_ctx.v1development; +option csharp_namespace = "OpenTelemetry.Proto.Resource.V1"; +option java_multiple_files = true; +option java_package = "io.opentelemetry.proto.resource.v1"; +option java_outer_classname = "ResourceProto"; +option go_package = "go.opentelemetry.io/proto/otlp/resource/v1"; -message OtelProcessCtx { - // Additional key/value pairs as resources https://opentelemetry.io/docs/specs/otel/resource/sdk/ - // Similar to baggage https://opentelemetry.io/docs/concepts/signals/baggage/ / https://opentelemetry.io/docs/specs/otel/overview/#baggage-signal - // - // Providing resources is optional. +// Resource information. +message Resource { + // Set of attributes that describe the resource. + // Attribute keys MUST be unique (it is not allowed to have more than one + // attribute with the same key). + // The behavior of software that receives duplicated keys can be unpredictable. + repeated opentelemetry.proto.common.v1.KeyValue attributes = 1; + + // The number of dropped attributes. If the value is 0, then + // no attributes were dropped. + uint32 dropped_attributes_count = 2; + + // Set of entities that participate in this Resource. // - // If a key in this field would match one of the attributes already defined as a first-class field below (e.g. `service.name`), - // the first-class field must always take priority. - // Readers MAY choose to fallback to a value in `resources` if its corresponding first-class field is empty, or they CAN ignore it. - map resources = 1; - - // We strongly recommend that the following first-class fields are provided, but they can be empty if needed. 
- // In particular for `deployment_environment_name` and `service_version` often need to be configured for a given application - // and cannot be inferred. For the others, see the semantic conventions documentation for recommended ways of setting them. - - // https://opentelemetry.io/docs/specs/semconv/registry/attributes/deployment/#deployment-environment-name - string deployment_environment_name = 2; - // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id - string service_instance_id = 3; - // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-name - string service_name = 4; - // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-version - string service_version = 5; - // https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-language - string telemetry_sdk_language = 6; - // https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-version - string telemetry_sdk_version = 7; - // https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-name - string telemetry_sdk_name = 8; - - // New first-class fields should be added only if: - // * Providing them is strongly recommended - // * They match a new or existing OTEL semantic convention + // Note: keys in the references MUST exist in attributes of this message. // - // Otherwise, `resources` should be used instead. 
+ // Status: [Development] + repeated opentelemetry.proto.common.v1.EntityRef entity_refs = 3; } ``` +The following attributes are especially relevant to the OTEL eBPF Profiler and thus are recommended to be provided by publishers: + +* [service.instance.id](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id) +* [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/registry/attributes/deployment/#deployment-environment-name) +* [service.name](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-name) +* [service.version](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-version) +* [telemetry.sdk.language](https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-language) +* [telemetry.sdk.version](https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-version) +* [telemetry.sdk.name](https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-name) + ### Publication Protocol Publishing the context should follow these steps: @@ -107,7 +116,7 @@ Publishing the context should follow these steps: 1. **Drop existing mapping**: If a previous context was published, unmap/free it 2. **Allocate new mapping**: Create a 2-page anonymous mapping via `mmap()` (These pages are always zeroed by Linux) 3. **Prevent fork inheritance**: Apply `madvise(..., MADV_DONTFORK)` to prevent child processes from inheriting stale data -4. **Encode payload**: Serialize the `OtelProcessCtx` message using protobuf (storing it either following the header OR in a regular memory block) +4. **Encode payload**: Serialize the payload message using protobuf (storing it either following the header OR in a separate memory allocation) 5. **Write header fields**: Populate `version`, `published_at_ns`, `payload_size`, `payload` 6. 
**Memory barrier**: Use language/compiler-specific techniques to ensure all previous writes complete before proceeding 7. **Write signature**: Write `OTEL_CTX` to the signature field last @@ -141,7 +150,7 @@ External readers (such as the OpenTelemetry eBPF Profiler) discover and read pro 4. **Re-read header**: If the header has not changed, the read of header + payload is consistent. This ensures there were no concurrent changes to the process context. If the header changed, restart at 1. -5. **Decode payload**: Deserialize the bytes as a Protocol Buffer `OtelProcessCtx` message +5. **Decode payload**: Deserialize the bytes as a Protocol Buffer payload message 6. **Apply attributes**: Use the decoded resource attributes to enrich telemetry collected from this process @@ -199,7 +208,7 @@ As requirements evolve, we may need to extend the payload format. **Mitigation**: The design includes both versioning as well as allowing extension points -1. **Additional `resources` keys**: For experimentation and vendor extensions +1. **Additional `attributes` keys**: These can be used for experimentation and vendor extensions 2. **New protobuf fields**: For adding recommended fields following protobuf compatibility practices 3. **Version number**: For incompatible changes (not expected to change frequently) @@ -211,7 +220,7 @@ Each process publishes a 2-page (typically 8KB) mapping per SDK instance + the a The proposal uses protobuf. Not all SDKs may want to carry a protobuf encoder dependency. -**Mitigation**: Our reference implementations optionally include a limited protobuf implementation that implements only the feature set needed to emit the `OtelProcessCtx` message in < 500 LoC (C/C++ and Java). Alternatively, existing protobuf encoders can be used. +**Mitigation**: Our reference implementations optionally include a limited protobuf implementation that implements only the feature set needed to emit a minimal payload message in < 500 LoC (C/C++ and Java). 
Alternatively, existing protobuf encoders can be used. Aside from protobuf, msgpack was also considered (see [this earlier msgpack-based reference implementation](https://github.com/ivoanjo/proc-level-demo/tree/main/anonmapping-clib)). Like for protobuf, it's possible to provide a small encoder with similar complexity. We're hoping that the community can make the final choice during specification review. @@ -323,7 +332,7 @@ Both approaches demonstrate the need for process-level data sharing and validate 2. **SDK implementation requirements**: Should SDKs publish this information by default whenever possible, or be opt-in? -3. **First-class fields**: Should we expand the set of first-class fields beyond the current seven? The current set covers the most critical attributes, but we may discover others during implementation. +3. **Protobuf format**: Is the `opentelemetry.proto.resource.v1.Resource` message the right one to use for this? Do we want or need to wrap it in some other envelope? ## Prototypes From 5fbcb8cd1309ce01999590427d736a0f12cdddfd Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 1 Dec 2025 17:00:38 +0000 Subject: [PATCH 07/18] Omit recommended attributes As pointed out during review, these don't necessarily exist for some resources so let's streamline the spec for now. 
--- oteps/profiles/4719-process-ctx.md | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index 466013c6a8c..3ca1f190d2f 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -99,16 +99,6 @@ message Resource { } ``` -The following attributes are especially relevant to the OTEL eBPF Profiler and thus are recommended to be provided by publishers: - -* [service.instance.id](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id) -* [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/registry/attributes/deployment/#deployment-environment-name) -* [service.name](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-name) -* [service.version](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-version) -* [telemetry.sdk.language](https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-language) -* [telemetry.sdk.version](https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-version) -* [telemetry.sdk.name](https://opentelemetry.io/docs/specs/semconv/registry/attributes/telemetry/#telemetry-sdk-name) - ### Publication Protocol Publishing the context should follow these steps: From c5989b84619118ea1f3c1145edad1ee0bfac74b8 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 1 Dec 2025 17:06:33 +0000 Subject: [PATCH 08/18] Update link to C/C++ example implementation --- oteps/profiles/4719-process-ctx.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index 3ca1f190d2f..1e4d80494ab 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -178,7 +178,7 @@ When a process forks, child processes do not inherit the parent's process contex Creating memory 
mappings and managing them adds complexity to SDK implementations. -**Mitigation**: We've created a reference implementation in [C/C++](https://github.com/ivoanjo/proc-level-demo), as well as a [demo OTEL Java SDK extension](https://github.com/ivoanjo/proc-level-demo/tree/main/otel-java-extension-demo) and a [Go port as well](https://github.com/DataDog/dd-trace-go/pull/3937). +**Mitigation**: We've created a reference implementation in [C/C++](https://github.com/open-telemetry/sig-profiling/pull/23), as well as a [demo OTEL Java SDK extension](https://github.com/ivoanjo/proc-level-demo/tree/main/otel-java-extension-demo) and a [Go port as well](https://github.com/DataDog/dd-trace-go/pull/3937). For Go as well as modern versions of Java it's possible to create an implementation that doesn't rely on third-party libraries or native code (e.g. by directly calling into the OS or libc). Older versions of Java will need to rely on building the C/C++ into a Java native library. @@ -212,7 +212,7 @@ The proposal uses protobuf. Not all SDKs may want to carry a protobuf encoder de **Mitigation**: Our reference implementations optionally include a limited protobuf implementation that implements only the feature set needed to emit a minimal payload message in < 500 LoC (C/C++ and Java). Alternatively, existing protobuf encoders can be used. -Aside from protobuf, msgpack was also considered (see [this earlier msgpack-based reference implementation](https://github.com/ivoanjo/proc-level-demo/tree/main/anonmapping-clib)). Like for protobuf, it's possible to provide a small encoder with similar complexity. We're hoping that the community can make the final choice during specification review. +Aside from protobuf, msgpack was also trialed; similarly to protobuf, it's possible to provide a small msgpack encoder with low complexity. We're hoping that the community can make the final choice during specification review. 
### Trace Correlation @@ -328,7 +328,7 @@ Both approaches demonstrate the need for process-level data sharing and validate The following proof-of-concept implementations demonstrate feasibility across multiple languages: -- **[anonmapping-clib](https://github.com/ivoanjo/proc-level-demo/tree/main/anonmapping-clib)**: Complete reference implementation in C/C++ with protobuf payload +- **[anonmapping-clib](https://github.com/open-telemetry/sig-profiling/pull/23)**: Complete reference implementation in C/C++ with protobuf payload - **[otel-java-extension-demo](https://github.com/ivoanjo/proc-level-demo/tree/main/otel-java-extension-demo)**: OTEL Java SDK extension for automatic publication - **[anonmapping-java](https://github.com/ivoanjo/proc-level-demo/tree/main/anonmapping-java)**: Pure Java implementation using FFM (no dependencies) - **[ebpf-program](https://github.com/ivoanjo/proc-level-demo/tree/main/ebpf-program)**: Example eBPF program demonstrating event-driven publishing detection From b1583c6f942c1efea0b49e117afb374c18f55272 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 8 Dec 2025 09:42:20 +0000 Subject: [PATCH 09/18] Tweak description of cross-signal identifiers --- oteps/profiles/4719-process-ctx.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index 1e4d80494ab..d879fa0fb3a 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -6,7 +6,7 @@ Introduce a standard mechanism for OpenTelemetry SDKs to publish process-level r External readers like OpenTelemetry eBPF Profiler or OpenTelemetry eBPF Instrumentation operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. 
This creates several problems: -- **Missing cross-signal correlation identifiers**: Runtime-generated attributes ([`service.instance.id`](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id) being a key example) are often inaccessible to external readers, making it hard to correlate profiles with other telemetry (such as traces and spans!) from the same service instance (especially in runtimes that employ multiple processes). +- **Missing cross-signal correlation identifiers**: Runtime-generated attributes ([`service.instance.id`](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id) being a key example) are often inaccessible to external readers, making it hard to correlate various signals with each other (especially in runtimes that employ multiple processes). - **Inconsistent resource attributes across signals**: Running in different scopes, configuration such as `service.name`, `deployment.environment.name`, and `service.version` are not always available or resolves consistently between the OpenTelemetry SDKs and external readers, leading to configuration drift and inconsistent tagging. 
From d84adeac46f08d7fe430343842df7e17336e2280 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 8 Dec 2025 10:14:14 +0000 Subject: [PATCH 10/18] Apply suggestions from code review Co-authored-by: Christos Kalkanis --- oteps/profiles/4719-process-ctx.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index d879fa0fb3a..88729bb7ad6 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -21,7 +21,7 @@ When an SDK initializes (or updates its resource attributes) it publishes this i This mechanism is designed to support loose coordination between the publishing process and external readers: - **Publisher-first deployment**: The publishing process can start and publish its context before any readers are running, with readers discovering it later -- **Reader flexibility**: Readers are not limited to eBPF-based implementations; any external process with sufficient system permissions to read `/proc/<pid>/maps` and process memory can access this information +- **Reader flexibility**: Readers are not limited to eBPF-based implementations; any external process with sufficient system permissions to read `/proc/<pid>/maps` and read target process memory can access this information - **Runtime compatibility**: The mechanism works even in environments where eBPF function hooking is unavailable or restricted - **Independent of process activity**: The context can be read at any time, including when the application is deadlocked, blocked on I/O, or otherwise idle, without relying on active hook points or the process emitting signals @@ -159,7 +159,7 @@ SDKs that do not implement this feature continue to function normally; external ### Host and Permission Requirements -This mechanism requires that the external reader (such as an eBPF profiler) is running on the same host as the instrumented process and has sufficient privileges to access the memory mappings exposed 
by the process. +This mechanism requires that the external reader (such as an eBPF profiler) is running on the same host as the instrumented process and has sufficient privileges to both access the memory mappings exposed by the process and read target process memory. The OpenTelemetry eBPF profiler, by design, has the necessary permissions and operates on the same machine to read this metadata. This approach does **not** support remote or cross-host correlation of process context, and attempts to access the process context mappings without appropriate permissions (e.g., from an unprivileged user) will fail. From 9c8d9ed90766922c0057db0a43ba37175c38fb41 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 8 Dec 2025 10:51:33 +0000 Subject: [PATCH 11/18] Clarify limitations regarding OBI --- oteps/profiles/4719-process-ctx.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index 88729bb7ad6..d6daf3462d0 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -4,7 +4,7 @@ Introduce a standard mechanism for OpenTelemetry SDKs to publish process-level r ## Motivation -External readers like OpenTelemetry eBPF Profiler or OpenTelemetry eBPF Instrumentation operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. This creates several problems: +External readers like the OpenTelemetry eBPF Profiler operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. 
This creates several problems: - **Missing cross-signal correlation identifiers**: Runtime-generated attributes ([`service.instance.id`](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id) being a key example) are often inaccessible to external readers, making it hard to correlate various signals with each other (especially in runtimes that employ multiple processes). @@ -222,6 +222,14 @@ In particular, it does not support carrying trace and span ids, which would be r Process-level and thread-level context are complementary: The process-level mechanism proposed in this OTEP can be generically adopted by SDKs, and allows for flexibility in publishing metadata and in parsing it. Thread-level mechanisms, in contrast, may need specific support for individual languages/runtimes, and because they would be updated for every span, will need careful performance work. +### Applicability to OpenTelemetry eBPF Instrumentation + +The [OpenTelemetry eBPF Instrumentation (OBI)](https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation) auto-instrumentation tool, when used in the application observability mode, uses a combination of Linux uprobes and [userspace writes](https://opentelemetry.io/docs/zero-code/obi/security/) to emit traces and metrics from otherwise unmodified applications. + +The protocol proposed by this specification requires the ability to, inside the target application, allocate (small amounts of) memory, as well as invoking system calls to set up the naming and the inheritance permissions. This is not something that can currently be done with an eBPF-based approach and thus this spec can't currently be implemented using OBI. + +**Mitigation**: For OBI-to-OTEL eBPF Profiler communication, we can separately introduce an out-of-band channel using the existing kernel eBPF primitives, given both tools operate in kernel space. 
+ ## Prior art and alternatives ### Prior Art From 4c871acff81c8d3742631a2dba6cf40c7cdb8665 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 8 Dec 2025 11:07:38 +0000 Subject: [PATCH 12/18] Update spec to reflect that named mappings are not always available on Linux 5.17+ See https://github.com/open-telemetry/sig-profiling/pull/23 for a wider discussion of this. --- oteps/profiles/4719-process-ctx.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index d6daf3462d0..d857042f8d0 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -43,7 +43,7 @@ The header is stored in a fixed-size anonymous memory mapping of 2 pages with th | `payload_size` | `uint32` | Number of bytes of the encoded payload | | `payload` | `char*` | Pointer to payload, in protobuf format | -**Why 2 pages**: On Linux kernels prior to 5.17, readers cannot filter mappings by name and must scan anonymous mappings. Using a fixed size allows readers to quickly filter candidate mappings by size (among other attributes) before checking the signature, avoiding the need to check most mappings in a process. +**Why 2 pages**: On Linux kernels prior to 5.17 as well as those without the `CONFIG_ANON_VMA_NAME` feature, readers cannot filter mappings by name and must scan anonymous mappings. Using a fixed size allows readers to quickly filter candidate mappings by a combination of attributes before checking the signature, avoiding the need to check most mappings in a process. The `payload` can optionally be placed after the header (with the `payload` pointer field correctly pointing at it) or optionally elsewhere in the process memory. @@ -111,7 +111,7 @@ Publishing the context should follow these steps: 6. **Memory barrier**: Use language/compiler-specific techniques to ensure all previous writes complete before proceeding 7. 
**Write signature**: Write `OTEL_CTX` to the signature field last 8. **Set read-only**: Apply `mprotect(..., PROT_READ)` to mark the mapping as read-only -9. **Name mapping** (Linux ≥5.17): Use `prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ..., "OTEL_CTX")` to name the mapping +9. **Name mapping**: Use `prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ..., "OTEL_CTX")` to name the mapping. This step should be done unconditionally, although naming mappings is not always supported by the kernel. The signature MUST be written last to ensure readers never observe incomplete or invalid data. Once the signature is present and the mapping set to read-only, the entire mapping is considered valid and immutable. @@ -123,13 +123,17 @@ The process context is treated as a singleton: there SHOULD NOT be more than one The context MAY be dropped during SDK shutdown, or kept around until the process itself terminates and the OS takes care of cleaning the process memory. +Naming the mapping is only available on Linux 5.17+ when the `CONFIG_ANON_VMA_NAME` feature on the kernel is enabled. Many Linux distributions such as Ubuntu and Arch enable it. On earlier kernel versions or kernels without the feature, the `prctl` call with return an error which should be ignored. The reading protocol specified below is able to work regardless on naming being available. + +Note that on legacy kernels and those without `CONFIG_ANON_VMA_NAME` it's possible, using eBPF, [to optionally hook on `prctl`](https://github.com/ivoanjo/proc-level-demo/tree/main/ebpf-program) naming attempts as a way of detecting new mappings being published. For this reason, this step should always be done even if the publisher somehow is aware that naming will not be successful on the current system. + ### Reading Protocol External readers (such as the OpenTelemetry eBPF Profiler) discover and read process context as follows: 1. 
**Locate mapping**: - - **Preferred** (Linux ≥5.17): Parse `/proc//maps` and search for read-only entries with name `[anon:OTEL_CTX]` - - **Fallback** (older kernels): Parse `/proc//maps` and search for anonymous read-only mappings exactly 2 pages in size, then read the first 8 bytes to check for the `"OTEL_CTX"` signature + - **Preferred** (Linux 5.17+ with `CONFIG_ANON_VMA_NAME`): Parse `/proc//maps` and search for read-only entries with name `[anon:OTEL_CTX]` + - **Fallback**: Parse `/proc//maps` and search for anonymous read-only mappings exactly 2 pages in size, then read the first 8 bytes to check for the `"OTEL_CTX"` signature 2. **Validate signature and version**: - Read the header and verify first 8 bytes matches `OTEL_CTX` @@ -188,10 +192,6 @@ This mechanism relies on Linux-specific features (`mmap`, `prctl`, `/proc`). **Mitigation**: The feature is optional. SDKs on other platforms or environments where these features are unavailable can simply not implement it. In the future, we may explore similar mechanisms for other operating systems. -One specific part of the design takes advantage of a Linux 5.17+ feature: adding a name to the anonymous mapping. For older Linux versions, the design includes a fallback to accommodates legacy kernels. - -The OTEL eBPF Profiler currently [requires Linux 5.4+](https://github.com/open-telemetry/opentelemetry-ebpf-profiler?tab=readme-ov-file#supported-linux-kernel-version) (old versions are Linux 4.19+). - ### Protocol Evolution As requirements evolve, we may need to extend the payload format. 
From a647b8ae27ac78cb9e38645dc3d059b02bd205eb Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 8 Dec 2025 11:09:31 +0000 Subject: [PATCH 13/18] Minor: Linting fix --- oteps/profiles/4719-process-ctx.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index d857042f8d0..9916d31b427 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -37,7 +37,7 @@ The header is stored in a fixed-size anonymous memory mapping of 2 pages with th | Field | Type | Description | |-------------------|-----------|----------------------------------------------------------------------| -| `signature` | `char[8]` | Always set to `"OTEL_CTX"`| +| `signature` | `char[8]` | Always set to `"OTEL_CTX"` | | `version` | `uint32` | Format version. Currently `2` (`1` was used for development) | | `published_at_ns` | `uint64` | Timestamp when the context was published, in nanoseconds since epoch | | `payload_size` | `uint32` | Number of bytes of the encoded payload | From f00d9f9aaf6b3d4fdca0177e67a940592286285d Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 8 Dec 2025 11:52:17 +0000 Subject: [PATCH 14/18] Reorder fields to make sure `published_at_ns` and `payload` are 8-byte aligned --- oteps/profiles/4719-process-ctx.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index 9916d31b427..5aa6a563b54 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -39,8 +39,8 @@ The header is stored in a fixed-size anonymous memory mapping of 2 pages with th |-------------------|-----------|----------------------------------------------------------------------| | `signature` | `char[8]` | Always set to `"OTEL_CTX"` | | `version` | `uint32` | Format version. 
Currently `2` (`1` was used for development) | -| `published_at_ns` | `uint64` | Timestamp when the context was published, in nanoseconds since epoch | | `payload_size` | `uint32` | Number of bytes of the encoded payload | +| `published_at_ns` | `uint64` | Timestamp when the context was published, in nanoseconds since epoch | | `payload` | `char*` | Pointer to payload, in protobuf format | **Why 2 pages**: On Linux kernels prior to 5.17 as well as those without the `CONFIG_ANON_VMA_NAME` feature, readers cannot filter mappings by name and must scan anonymous mappings. Using a fixed size allows readers to quickly filter candidate mappings by a combination of attributes before checking the signature, avoiding the need to check most mappings in a process. From 17ec933d0f4059e0f1ac6c384eaff86abc4f7188 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Wed, 10 Dec 2025 10:06:33 +0000 Subject: [PATCH 15/18] Add mention to custom attributes + suggest following existing semconv --- oteps/profiles/4719-process-ctx.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index 5aa6a563b54..31114a627af 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -99,6 +99,9 @@ message Resource { } ``` +Note that attributes mean not ["process attributes"](https://opentelemetry.io/docs/specs/semconv/resource/process/) in particular, but any attribute that the publisher wants to share with external readers, including custom attributes. +If applicable, these should follow [existing semantic conventions](https://opentelemetry.io/docs/specs/semconv/). 
+ ### Publication Protocol Publishing the context should follow these steps: From 81169a1bef33122e623e999d016b5716a9673496 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Wed, 10 Dec 2025 11:19:59 +0000 Subject: [PATCH 16/18] Minor: Tighten intro by focusing on resources and less on what they are --- oteps/profiles/4719-process-ctx.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index 31114a627af..dcab3ab62b7 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -1,6 +1,6 @@ # Process Context: Sharing Resource Attributes with External Readers -Introduce a standard mechanism for OpenTelemetry SDKs to publish process-level resource attributes for access by out-of-process readers such as the OpenTelemetry eBPF Profiler. +Introduce a standard mechanism for OpenTelemetry SDKs to publish resource attributes for access by out-of-process readers such as the OpenTelemetry eBPF Profiler. ## Motivation From 3caecfba222bdec728b6e7363800f27547c3cf8d Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Thu, 18 Dec 2025 15:23:54 +0000 Subject: [PATCH 17/18] Create mapping from memfd, use in-place updates and remove probing to find mappings After discussion in the PR and great suggestions/experiments from @christos68k, the specification has been updated as such: * Instead of always using an anonymous mapping, try first to create a memfd and create a mapping from the memfd. If due to security restrictions memfd is not available, fall back to an anonymous mapping instead. * Remove probing as a fallback for when naming a mapping fails. Because the name of a memfd also shows up in `/proc//maps`, we expect that having `memfd` naming as a fallback for when `prctl` is not available is enough. * Drop requirement for 2-page size and read-only permissions on the header memory pages. 
These were intended to support the "probing as a fallback for naming failure", so they are no longer needed. * Document "Updating Protocol" for in-place updates to process context. This allows efficient updates. In particular, it makes it easier for the reader to detect updates and avoids reparsing `/proc//maps` for updates. --- oteps/profiles/4719-process-ctx.md | 77 +++++++++++++++++------------- 1 file changed, 44 insertions(+), 33 deletions(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index dcab3ab62b7..2c126e9c2b5 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -14,16 +14,16 @@ External readers like the OpenTelemetry eBPF Profiler operate outside the instru ## Explanation -We propose a mechanism for OpenTelemetry SDKs to publish process-level resource attributes, through a standard format based on Linux anonymous memory mappings. +We propose a mechanism for OpenTelemetry SDKs to publish process-level resource attributes, through a standard format based on Linux memory mappings. -When an SDK initializes (or updates its resource attributes) it publishes this information to a small, fixed-size memory region that external processes can discover and read. +When an SDK initializes (or updates its resource attributes) it publishes this information to a small memory region that external processes can discover and read.
This mechanism is designed to support loose coordination between the publishing process and external readers: - **Publisher-first deployment**: The publishing process can start and publish its context before any readers are running, with readers discovering it later - **Reader flexibility**: Readers are not limited to eBPF-based implementations; any external process with sufficient system permissions to read `/proc//maps` and read target process memory can access this information - **Runtime compatibility**: The mechanism works even in environments where eBPF function hooking is unavailable or restricted -- **Independent of process activity**: The context can be read at any time, including when the application is deadlocked, blocked on I/O, or otherwise idle, without relying on active hook points or the process emitting signals +- **Independent of process activity**: The context can be read at any time, including while the application is deadlocked, blocked on I/O, or otherwise idle, without relying on active hook points or the process emitting signals External readers such as the OpenTelemetry eBPF Profiler will, upon observing a previously-unseen process, probe and read this information, associating it with any profiling samples or other telemetry collected from that process. @@ -33,19 +33,17 @@ The process context is split between a header (stored in an anonymous mapping) a ### Header Structure -The header is stored in a fixed-size anonymous memory mapping of 2 pages with the following format: +The header is stored in a memory mapping with the following format: | Field | Type | Description | |-------------------|-----------|----------------------------------------------------------------------| | `signature` | `char[8]` | Always set to `"OTEL_CTX"` | -| `version` | `uint32` | Format version. Currently `2` (`1` was used for development) | +| `version` | `uint32` | Format version. 
Currently `2` (`1` can be used for development) | | `payload_size` | `uint32` | Number of bytes of the encoded payload | | `published_at_ns` | `uint64` | Timestamp when the context was published, in nanoseconds since epoch | | `payload` | `char*` | Pointer to payload, in protobuf format | -**Why 2 pages**: On Linux kernels prior to 5.17 as well as those without the `CONFIG_ANON_VMA_NAME` feature, readers cannot filter mappings by name and must scan anonymous mappings. Using a fixed size allows readers to quickly filter candidate mappings by a combination of attributes before checking the signature, avoiding the need to check most mappings in a process. - -The `payload` can optionally be placed after the header (with the `payload` pointer field correctly pointing at it) or optionally elsewhere in the process memory. +The `payload` can optionally be placed after the header (with the `payload` pointer field correctly pointing at it) or elsewhere in the process memory. ### Payload Format @@ -106,27 +104,28 @@ If applicable, these should follow [existing semantic conventions](https://opent Publishing the context should follow these steps: -1. **Drop existing mapping**: If a previous context was published, unmap/free it -2. **Allocate new mapping**: Create a 2-page anonymous mapping via `mmap()` (These pages are always zeroed by Linux) -3. **Prevent fork inheritance**: Apply `madvise(..., MADV_DONTFORK)` to prevent child processes from inheriting stale data -4. **Encode payload**: Serialize the payload message using protobuf (storing it either following the header OR in a separate memory allocation) -5. **Write header fields**: Populate `version`, `published_at_ns`, `payload_size`, `payload` -6. **Memory barrier**: Use language/compiler-specific techniques to ensure all previous writes complete before proceeding -7. **Write signature**: Write `OTEL_CTX` to the signature field last -8. 
**Set read-only**: Apply `mprotect(..., PROT_READ)` to mark the mapping as read-only -9. **Name mapping**: Use `prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ..., "OTEL_CTX")` to name the mapping. This step should be done unconditionally, although naming mappings is not always supported by the kernel. +1. **Check for existing mapping**: If a previous context was published, follow the "Updating Protocol" instead +2. **Allocate new memfd and size it**: Create a new memfd using `memfd_create("OTEL_CTX", ...)`, size it with `ftruncate` +3. **Allocate a new mmap from the memfd then close the memfd**: This makes the memfd show up in `/proc//maps`; afterwards the file descriptor can be closed +4. **If memfd is not available (step 2)**: Fall back to creating a new anonymous mapping using `mmap` and use that instead +5. **Prevent fork inheritance**: Apply `madvise(..., MADV_DONTFORK)` to prevent child processes from inheriting stale data +6. **Encode payload**: Serialize the payload message using protobuf (storing it either following the header OR in a separate memory allocation) +7. **Write header fields**: Populate `version`, `published_at_ns`, `payload_size`, `payload` +8. **Memory barrier**: Use language/compiler-specific techniques to ensure all previous writes complete before proceeding +9. **Write signature**: Write `"OTEL_CTX"` to the signature field last +10. **Name mapping**: Use `prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ..., "OTEL_CTX")` to name the mapping. This step should be done unconditionally, although naming mappings is not always supported by the kernel. -The signature MUST be written last to ensure readers never observe incomplete or invalid data. Once the signature is present and the mapping set to read-only, the entire mapping is considered valid and immutable. +The signature MUST be written last to ensure readers never observe incomplete or invalid data. Once the signature is present (and thus all fields are non-zero), the entire mapping is considered valid. 
-If resource attributes are updated during the process lifetime, the previous mapping should be removed before publishing new ones following the same steps. +If resource attributes are updated during the process lifetime, the "Updating Protocol" should be followed. -If any of the steps above fail (other than naming the mapping on older Linux versions), publication is considered to have failed, and the process context will not be available. +If any of the steps above fail (other than naming or allocating a new memfd), publication is considered to have failed, and the process context will not be available. The process context is treated as a singleton: there SHOULD NOT be more than one process context active for the same process. The context MAY be dropped during SDK shutdown, or kept around until the process itself terminates and the OS takes care of cleaning the process memory. -Naming the mapping is only available on Linux 5.17+ when the `CONFIG_ANON_VMA_NAME` feature on the kernel is enabled. Many Linux distributions such as Ubuntu and Arch enable it. On earlier kernel versions or kernels without the feature, the `prctl` call with return an error which should be ignored. The reading protocol specified below is able to work regardless on naming being available. +Naming the mapping is only available on Linux 5.17+ when the `CONFIG_ANON_VMA_NAME` feature on the kernel is enabled. Many Linux distributions such as Ubuntu and Arch enable it. On earlier kernel versions or kernels without the feature, the `prctl` call will return an error which should be ignored. The reading protocol specified below is able to work regardless of naming being available. Note that on legacy kernels and those without `CONFIG_ANON_VMA_NAME` it's possible, using eBPF, [to optionally hook on `prctl`](https://github.com/ivoanjo/proc-level-demo/tree/main/ebpf-program) naming attempts as a way of detecting new mappings being published. 
For this reason, this step should always be done even if the publisher somehow is aware that naming will not be successful on the current system. @@ -134,18 +133,17 @@ Note that on legacy kernels and those without `CONFIG_ANON_VMA_NAME` it's possib External readers (such as the OpenTelemetry eBPF Profiler) discover and read process context as follows: -1. **Locate mapping**: - - **Preferred** (Linux 5.17+ with `CONFIG_ANON_VMA_NAME`): Parse `/proc//maps` and search for read-only entries with name `[anon:OTEL_CTX]` - - **Fallback**: Parse `/proc//maps` and search for anonymous read-only mappings exactly 2 pages in size, then read the first 8 bytes to check for the `"OTEL_CTX"` signature +1. **Locate mapping**: Parse `/proc//maps` and search for entries with name starting with `[anon_shmem:OTEL_CTX]` or `/memfd:OTEL_CTX` 2. **Validate signature and version**: - Read the header and verify first 8 bytes matches `OTEL_CTX` - Read the version field and verify it is supported (currently `2`) + - Verify that all other fields are non-zero (a zero field indicates the context is currently being written) - If either check fails, skip this mapping 3. **Read payload**: Read `payload_size` bytes starting after the header -4. **Re-read header**: If the header has not changed, the read of header + payload is consistent. This ensures there were no concurrent changes to the process context. If the header changed, restart at 1. +4. **Re-read header**: If the header has not changed, the read of header + payload is consistent. This ensures there were no concurrent changes to the process context. If the header changed, restart at 2. 5. **Decode payload**: Deserialize the bytes as a Protocol Buffer payload message @@ -153,6 +151,20 @@ External readers (such as the OpenTelemetry eBPF Profiler) discover and read pro Readers SHOULD gracefully handle missing, incomplete, or invalid mappings.
If a process does not publish context or if decoding fails, readers SHOULD fall back to default resource detection mechanisms. +### Updating Protocol + +When the resource attributes change, the process context mapping should be updated following these steps: + +1. **Prepare new payload**: Serialize the new payload message +2. **Signal update start**: Write `0` to the `published_at_ns` field. This signals to readers that an update is in progress (readers verify this field is non-zero). +3. **Memory barrier**: Ensure the write to `published_at_ns` is visible before proceeding. +4. **Update payload fields**: Update the `payload` pointer and `payload_size` fields to point to the new payload. +5. **Memory barrier**: Ensure the payload fields are updated before finalizing the timestamp. +6. **Signal update complete**: Write the new timestamp to `published_at_ns`. +7. **Name mapping**: Re-issue the `prctl(PR_SET_VMA, ...)` call to name the mapping. This step should be done unconditionally, although naming mappings is not always supported by the kernel. + +As the reader checks `published_at_ns` before and after reading the payload, it will detect concurrent updates and avoid concurrency issues. + ### Interaction with Existing Functionality This mechanism is additive and does not modify existing OpenTelemetry SDK behavior: @@ -185,7 +197,7 @@ When a process forks, child processes do not inherit the parent's process contex Creating memory mappings and managing them adds complexity to SDK implementations. -**Mitigation**: We've created a reference implementation in [C/C++](https://github.com/open-telemetry/sig-profiling/pull/23), as well as a [demo OTEL Java SDK extension](https://github.com/ivoanjo/proc-level-demo/tree/main/otel-java-extension-demo) and a [Go port as well](https://github.com/DataDog/dd-trace-go/pull/3937). 
+**Mitigation**: We've created a reference implementation in [C/C++](https://github.com/open-telemetry/sig-profiling/tree/main/process-context/c-and-cpp), as well as a [demo OTEL Java SDK extension](https://github.com/ivoanjo/proc-level-demo/tree/main/otel-java-extension-demo) and a [Go port as well](https://github.com/DataDog/dd-trace-go/pull/3937). For Go as well as modern versions of Java it's possible to create an implementation that doesn't rely on third-party libraries or native code (e.g. by directly calling into the OS or libc). Older versions of Java will need to rely on building the C/C++ into a Java native library. @@ -201,13 +213,12 @@ As requirements evolve, we may need to extend the payload format. **Mitigation**: The design includes both versioning as well as allowing extension points -1. **Additional `attributes` keys**: These can be used for experimentation and vendor extensions -2. **New protobuf fields**: For adding recommended fields following protobuf compatibility practices -3. **Version number**: For incompatible changes (not expected to change frequently) +1. **Additional `attributes` keys**: The `Resource` message can carry custom attributes +2. **Version number**: For incompatible changes (not expected to change frequently) ### Memory Overhead -Each process publishes a 2-page (typically 8KB) mapping per SDK instance + the amount of memory needed for the payload, expected to also be in the KB range. +Each process publishes a small (typically one memory page) mapping + the amount of memory needed for the payload, expected to also be in the KB range. 
### Payload Format Choice @@ -258,7 +269,7 @@ Both approaches demonstrate the need for process-level data sharing and validate - Child processes inherit the parent's global variables after `fork()`, potentially exposing stale resource attributes until overwritten - Requires polling to detect context publishing and changes - **Why rejected**: The anonymous mapping approach is more universally accessible across languages and build configurations. + **Why rejected**: The mapping approach is more universally accessible across languages and build configurations. 2. Environment Variables @@ -325,7 +336,7 @@ Both approaches demonstrate the need for process-level data sharing and validate - Requires polling to detect context publishing and changes - File-based not compatible with services deployed on read-only filesystems - **Why rejected**: The technical and operational complexities (especially regarding lifecycle, `fork()` and access control) outweigh the benefits over anonymous memory mappings. + **Why rejected**: The technical and operational complexities (especially regarding lifecycle, `fork()` and access control) outweigh the benefits over memory mappings. 
## Open questions @@ -339,7 +350,7 @@ Both approaches demonstrate the need for process-level data sharing and validate The following proof-of-concept implementations demonstrate feasibility across multiple languages: -- **[anonmapping-clib](https://github.com/open-telemetry/sig-profiling/pull/23)**: Complete reference implementation in C/C++ with protobuf payload +- **[process-context-c-and-cpp](https://github.com/open-telemetry/sig-profiling/tree/main/process-context/c-and-cpp)**: Complete reference implementation in C/C++ with protobuf payload - **[otel-java-extension-demo](https://github.com/ivoanjo/proc-level-demo/tree/main/otel-java-extension-demo)**: OTEL Java SDK extension for automatic publication - **[anonmapping-java](https://github.com/ivoanjo/proc-level-demo/tree/main/anonmapping-java)**: Pure Java implementation using FFM (no dependencies) - **[ebpf-program](https://github.com/ivoanjo/proc-level-demo/tree/main/ebpf-program)**: Example eBPF program demonstrating event-driven publishing detection From 5e1956a5f21f7c819b2910e8299fa6297703d697 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Thu, 18 Dec 2025 15:35:09 +0000 Subject: [PATCH 18/18] Minor: Make it clear why the fallback is there --- oteps/profiles/4719-process-ctx.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/oteps/profiles/4719-process-ctx.md b/oteps/profiles/4719-process-ctx.md index 2c126e9c2b5..a413369c133 100644 --- a/oteps/profiles/4719-process-ctx.md +++ b/oteps/profiles/4719-process-ctx.md @@ -107,7 +107,7 @@ Publishing the context should follow these steps: 1. **Check for existing mapping**: If a previous context was published, follow the "Updating Protocol" instead 2. **Allocate new memfd and size it**: Create a new memfd using `memfd_create("OTEL_CTX", ...)`, size it with `ftruncate` 3. **Allocate a new mmap from the memfd then close the memfd**: This makes the memfd show up in `/proc//maps`; afterwards the file descriptor can be closed -4. 
**If memfd is not available (step 2)**: Fall back to creating a new anonymous mapping using `mmap` and use that instead +4. **If memfd is not available (step 2)**: If system security restrictions disallow memfd, fall back to creating a new anonymous mapping using `mmap` and use that instead 5. **Prevent fork inheritance**: Apply `madvise(..., MADV_DONTFORK)` to prevent child processes from inheriting stale data 6. **Encode payload**: Serialize the payload message using protobuf (storing it either following the header OR in a separate memory allocation) 7. **Write header fields**: Populate `version`, `published_at_ns`, `payload_size`, `payload`