From 3060d57a0091fa0559cfb73356b2f775a9710e47 Mon Sep 17 00:00:00 2001 From: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> Date: Thu, 2 Jul 2026 16:48:05 -0700 Subject: [PATCH 1/2] Fix duplicate guest trace batches The host stamped the TraceBatch magic in r8 on every OUT and polled r8 on every VM exit. r8 is sticky, so the host re-read the same batch on later exits and emitted duplicate spans. Deliver batches on a dedicated TraceBatch port. The host reads guest registers only on that exit, so each batch is read once. Aborts flush the buffer over the same port so the final batch still reaches the host. Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> --- docs/hyperlight-metrics-logs-and-traces.md | 29 +++--- src/hyperlight_guest/src/arch/amd64/exit.rs | 40 +------- src/hyperlight_guest/src/exit.rs | 13 ++- src/hyperlight_guest_tracing/src/lib.rs | 91 +------------------ src/hyperlight_guest_tracing/src/state.rs | 22 +---- .../src/hypervisor/hyperlight_vm/mod.rs | 12 ++- .../src/sandbox/trace/context.rs | 5 - 7 files changed, 37 insertions(+), 175 deletions(-) diff --git a/docs/hyperlight-metrics-logs-and-traces.md b/docs/hyperlight-metrics-logs-and-traces.md index 8120e2d44..a4b203abb 100644 --- a/docs/hyperlight-metrics-logs-and-traces.md +++ b/docs/hyperlight-metrics-logs-and-traces.md @@ -137,27 +137,30 @@ runs `no_std` which `opentelemetry` doesn't know. ##### Guest -When the guest starts executing the `entrypoint` function, it receives a `max_log_level` parameter that tells the guest what kind of logging level is expected from it. +When the guest starts executing the `entrypoint` function, it receives a `max_log_level` parameter that tells the guest what logging level the host expects. -The `trace_guest` logic takes advantage of this parameter and when the `max_log_level` is `trace`, it allocates a custom made `GuestSubscriber` that implements the `Subscriber` -trait from `tracing_core` that allows defining a subscriber for the `tracing` crate to handle new spans and events. +The `trace_guest` logic uses this parameter. When `max_log_level` is not `Off`, the guest installs a custom `GuestSubscriber` that implements the `Subscriber` trait from `tracing_core`, so it receives every span and event created through the `tracing` crate. -This custom subscriber stores the spans and events in a buffer initialized only when tracing is enabled. For each new span and event, a method is called on the custom subscriber which not only stores the data, but also keeps track of the hierarchy and dependencies between the other spans/events. -**NOTE**: The spans/events attributes are truncated to fit in the allocated buffer. +The subscriber writes spans and events into a fixed-size buffer in guest memory. Each record keeps its parent link so the host can rebuild the hierarchy. Every span and event also records the current `TSC` value, which the host later turns into wall-clock timestamps. -When the storage space is filled, the guest triggers a VM Exit that sends the guest pointers to the host. The host can access the guest memory, get the data and parse it to create the `spans` and `events` using the `opentelemetry` crate which allows specifying the starting and ending timestamps -which are captured in the guest using the `TSC`. - -To improve performance, for each VMExit, the guest adds metadata for the host to be able to report the tracing data and free space. +The guest delivers a batch by triggering a VM exit on the dedicated `TraceBatch` port. It passes the buffer pointer in `r9`, its length in `r10`, and the `TraceBatch` magic in `r8`. A batch is sent in two cases: +* The buffer fills up. +* The guest calls `flush()`. This happens after guest initialization, at the end of each guest function call, and on a deliberate abort. ##### Host -When a guest exits, the host checks for metadata from the guest reporting tracing data. -If tracing data is found, the host starts parsing it and reconstructing a tree which represents the spans hierarchy. +The host reads a batch only on the VM exit that carries it, the `TraceBatch` port exit. It reads the pointer and length from the guest registers, walks the guest page tables to translate the guest virtual address to a physical address (needed because copy-on-write means guest pages are not identity-mapped), and reads the buffer. + +The host decodes the batch and rebuilds the span hierarchy. It creates one `opentelemetry` span or event per guest record, using the `TSC` values to set explicit start and end times. It also inserts a `call-to-host` span for each exit into the host, so a call across the VM boundary shows host work as a child of the active guest span. + +A batch that fails to decode is logged and skipped. Trace data is advisory, so the guest keeps running. -Additionally, the host also adds new children `span`s to the guest's reported active span, emphasizing the spans created on the host as a result of a temporary VM Exit. This helps visualize a call into the guest with context propagated across the VM boundary. +#### Limitations -The host creates `opentelemetry` spans and events for each guest span and event reported. +* Batches are delivered only on the `TraceBatch` exit, on buffer overflow or `flush()`. Normal function returns, guest initialization, deliberate aborts, and out-of-memory aborts flush. Panic and exception aborts use a raw abort path that does not flush, so their final buffered batch is lost. +* A record larger than the buffer produces a batch that exceeds the host size limit, so the host drops that whole batch. +* Guest tracing is gated at runtime by `max_log_level` (`Off` disables it) and at compile time by the `trace_guest` feature. +* The host reconstructs guest spans with `opentelemetry`, so enabling `trace_guest` on the host pulls in the `opentelemetry` crates. ### Inspecting Guest memory Trace Files (for mem_profile) diff --git a/src/hyperlight_guest/src/arch/amd64/exit.rs b/src/hyperlight_guest/src/arch/amd64/exit.rs index a680cb359..e494cd966 100644 --- a/src/hyperlight_guest/src/arch/amd64/exit.rs +++ b/src/hyperlight_guest/src/arch/amd64/exit.rs @@ -16,44 +16,10 @@ limitations under the License. use core::arch::asm; -#[cfg(feature = "trace_guest")] -use hyperlight_common::outb::OutBAction; - -/// OUT function for sending a 32-bit value to the host. -/// `out32` can be called from an exception context, so we must be careful -/// with the tracing state that might be locked at that time. -/// The tracing state calls `try_lock` internally to avoid deadlocks. -/// Furthermore, the instrument macro is not used here to avoid creating spans -/// in exception contexts. Because if the trace state is already locked, trying to create a span -/// would cause a panic, which is undesirable in exception handling. +/// OUT function for sending a 32-bit value to the host on the given port. +/// A pure I/O function that issues a single OUT instruction, safe to call +/// from an exception context. pub(crate) unsafe fn out32(port: u16, val: u32) { - #[cfg(feature = "trace_guest")] - { - if let Some((ptr, len)) = hyperlight_guest_tracing::serialized_data() { - // If tracing is enabled and there is data to send, send it along with the OUT action - unsafe { - asm!("out dx, eax", - in("dx") port, - in("eax") val, - in("r8") OutBAction::TraceBatch as u64, - in("r9") ptr, - in("r10") len, - options(preserves_flags, nostack) - ) - }; - - // Reset the trace state after sending the batch - // This clears all existing spans/events ensuring a clean state for the next operations - // The trace state is expected to be flushed before this call - hyperlight_guest_tracing::reset(); - } else { - // If tracing is not enabled, just send the value - unsafe { - asm!("out dx, eax", in("dx") port, in("eax") val, options(preserves_flags, nostack)) - }; - } - } - #[cfg(not(feature = "trace_guest"))] unsafe { asm!("out dx, eax", in("dx") port, in("eax") val, options(preserves_flags, nostack)); } diff --git a/src/hyperlight_guest/src/exit.rs b/src/hyperlight_guest/src/exit.rs index c7e724791..535d0378d 100644 --- a/src/hyperlight_guest/src/exit.rs +++ b/src/hyperlight_guest/src/exit.rs @@ -31,9 +31,11 @@ pub extern "C" fn abort() -> ! { /// Exits the VM with an Abort OUT action and a specific code. pub fn abort_with_code(code: &[u8]) -> ! { - // End any ongoing trace before aborting + // Send the buffered trace batch to the host before aborting. `flush` + // closes open spans, which allocates. An abort caused by allocation + // failure may lose the batch here. #[cfg(all(feature = "trace_guest", target_arch = "x86_64"))] - hyperlight_guest_tracing::end_trace(); + hyperlight_guest_tracing::flush(); outb(OutBAction::Abort as u16, code); outb(OutBAction::Abort as u16, &[0xFF]); // send abort terminator (if not included in code) unreachable!() @@ -44,9 +46,11 @@ pub fn abort_with_code(code: &[u8]) -> ! { /// # Safety /// This function is unsafe because it dereferences a raw pointer. pub unsafe fn abort_with_code_and_message(code: &[u8], message_ptr: *const c_char) -> ! { - // End any ongoing trace before aborting + // Send the buffered trace batch to the host before aborting. `flush` + // closes open spans, which allocates. An abort caused by allocation + // failure may lose the batch here. #[cfg(all(feature = "trace_guest", target_arch = "x86_64"))] - hyperlight_guest_tracing::end_trace(); + hyperlight_guest_tracing::flush(); unsafe { // Step 1: Send abort code (typically 1 byte, but `code` allows flexibility) outb(OutBAction::Abort as u16, code); @@ -75,7 +79,6 @@ pub fn write_abort(code: &[u8]) { /// OUT bytes to the host through multiple exits. pub(crate) fn outb(port: u16, data: &[u8]) { - // Ensure all tracing data is flushed before sending OUT bytes unsafe { let mut i = 0; while i < data.len() { diff --git a/src/hyperlight_guest_tracing/src/lib.rs b/src/hyperlight_guest_tracing/src/lib.rs index 6fae94925..c219a3b5e 100644 --- a/src/hyperlight_guest_tracing/src/lib.rs +++ b/src/hyperlight_guest_tracing/src/lib.rs @@ -32,14 +32,8 @@ mod subscriber; #[cfg(feature = "trace")] mod visitor; -/// Type to get the relevant information from the internal state -/// and expose it to the host #[cfg(feature = "trace")] -pub use state::TraceBatchInfo; -#[cfg(feature = "trace")] -pub use trace::{ - end_trace, flush, init_guest_tracing, is_trace_enabled, new_call, reset, serialized_data, -}; +pub use trace::{flush, init_guest_tracing, is_trace_enabled, new_call}; /// This module is gated because some of these types are also used on the host, but we want /// only the guest to allocate and allow the functionality intended for the guest. @@ -77,36 +71,6 @@ mod trace { let _ = tracing_core::dispatcher::set_global_default(tracing_core::Dispatch::new(sub)); } - /// Ends the current trace by ending all active spans in the - /// internal state and storing the end timestamps. - /// - /// This expects an outb call to send the spans to the host. - /// After calling this function, the internal state is marked - /// for cleaning on the next access. - /// - /// NOTE: Panics if unable to lock the guest state. - pub fn end_trace() { - if let Some(w) = GUEST_STATE.get() - && let Some(state_mutex) = w.upgrade() - { - // We want to protect against re-entrancy issues produced by tracing code that locks - // the state and then causes an exception that tries to lock the state again. - // - // For example: - // - 1. A span is created, locking the state - // - 2. An exception occurs while the span is being created (e.g. not enough memory, etc.) - // - 3. The exception handler uses the tracing API to send the trace data to the host - // or just create spans/events for logging purposes. - // - 4. The tracing API tries to lock the state again, causing a deadlock. - // To avoid this, we use try_lock and if we cannot acquire the lock, we panic to signal - // the issue. - let mut state = state_mutex - .try_lock() - .expect("guest_tracing: Unable to lock guest tracing state in `end_trace`"); - state.end_trace(); - } - } - /// Flushes the current trace data to prepare it for reading by the host. /// NOTE: Panics if unable to lock the guest state. pub fn flush() { @@ -158,59 +122,6 @@ mod trace { } } - /// Cleans the internal trace state by removing closed spans and events. - /// This ensures that after a VM exit, we keep the spans that - /// are still active (in the stack) and remove all other spans and events. - /// NOTE: Panics if unable to lock the guest state. - pub fn reset() { - if let Some(w) = GUEST_STATE.get() - && let Some(state_mutex) = w.upgrade() - { - // We want to protect against re-entrancy issues produced by tracing code that locks - // the state and then causes an exception that tries to lock the state again. - // - // For example: - // - 1. A span is created, locking the state - // - 2. An exception occurs while the span is being created (e.g. not enough memory, etc.) - // - 3. The exception handler uses the tracing API to send the trace data to the host - // or just create spans/events for logging purposes. - // - 4. The tracing API tries to lock the state again, causing a deadlock. - // To avoid this, we use try_lock and if we cannot acquire the lock, we panic to signal - // the issue. - let mut state = state_mutex - .try_lock() - .expect("Unable to lock GuestState in `reset`"); - - state.reset(); - } - } - - /// Returns information about the current trace state needed by the host to read the spans. - pub fn serialized_data() -> Option<(u64, u64)> { - if let Some(w) = GUEST_STATE.get() - && let Some(state_mutex) = w.upgrade() - { - // We want to protect against re-entrancy issues produced by tracing code that locks - // the state and then causes an exception that tries to lock the state again. - // - // For example: - // - 1. A span is created, locking the state - // - 2. An exception occurs while the span is being created (e.g. not enough memory, etc.) - // - 3. The exception handler uses the tracing API to send the trace data to the host - // or just create spans/events for logging purposes. - // - 4. The tracing API tries to lock the state again, causing a deadlock. - // To avoid this, we use try_lock and if we cannot acquire the lock, we panic to signal - // the issue. - let state = state_mutex - .try_lock() - .expect("Unable to lock GuestState in `serialized_data`"); - - state.serialized_data() - } else { - None - } - } - /// Returns true if tracing is enabled (the guest tracing state is initialized). pub fn is_trace_enabled() -> bool { GUEST_STATE diff --git a/src/hyperlight_guest_tracing/src/state.rs b/src/hyperlight_guest_tracing/src/state.rs index 01ad896c8..77cbf5491 100644 --- a/src/hyperlight_guest_tracing/src/state.rs +++ b/src/hyperlight_guest_tracing/src/state.rs @@ -29,10 +29,6 @@ use tracing_core::span::{Attributes, Id, Record}; use crate::invariant_tsc; use crate::visitor::FieldsVisitor; -pub struct TraceBatchInfo { - pub serialized_data: Vec, -} - /// Internal state of the tracing subscriber pub(crate) struct GuestState { /// Encoder for events @@ -57,6 +53,7 @@ fn send_to_host(data: &[u8]) { in("r8") OutBAction::TraceBatch as u64, in("r9") data.as_ptr() as u64, in("r10") data.len() as u64, + options(nostack, preserves_flags), ); } } @@ -104,12 +101,6 @@ impl GuestState { .encode(&GuestEvent::GuestStart { tsc: start_tsc }); } - /// Reset the trace state, clearing all existing spans and events - /// This is called after the trace has been flushed to the host - pub(crate) fn reset(&mut self) { - self.encoder.reset(); - } - /// Closes the trace by ending all spans /// NOTE: This expects an outb call to send the spans to the host. pub(crate) fn end_trace(&mut self) { @@ -126,17 +117,6 @@ impl GuestState { } } - /// Return (ptr, len) for serialized data if any is available - pub(crate) fn serialized_data(&self) -> Option<(u64, u64)> { - let data = self.encoder.finish(); - - if data.is_empty() { - None - } else { - Some((data.as_ptr() as u64, data.len() as u64)) - } - } - /// Create a new span and push it on the stack pub(crate) fn new_span(&mut self, attrs: &Attributes) -> Id { let (idn, id) = self.alloc_id(); diff --git a/src/hyperlight_host/src/hypervisor/hyperlight_vm/mod.rs b/src/hyperlight_host/src/hypervisor/hyperlight_vm/mod.rs index 56f7d635d..3300c913c 100644 --- a/src/hyperlight_host/src/hypervisor/hyperlight_vm/mod.rs +++ b/src/hyperlight_host/src/hypervisor/hyperlight_vm/mod.rs @@ -622,11 +622,15 @@ impl HyperlightVm { #[cfg(feature = "trace_guest")] { tc.end_host_trace(); - // Handle the guest trace data if any - let regs = self.vm.regs().map_err(RunVmError::GetRegs)?; - // Only parse the trace if it has reported - if tc.has_trace_data(®s) { + // Guest trace batches arrive on the dedicated `TraceBatch` + // port. Read the batch only on the exit that carries it. A + // parse failure is logged and execution continues, since + // trace data is advisory to the guest. + if let Ok(VmExit::IoOut(port, _)) = &result + && *port == hyperlight_common::outb::OutBAction::TraceBatch as u16 + { + let regs = self.vm.regs().map_err(RunVmError::GetRegs)?; let root_pt = self.get_root_pt().map_err(RunVmError::PageTableAccess)?; // If something goes wrong with parsing the trace data, we log the error and diff --git a/src/hyperlight_host/src/sandbox/trace/context.rs b/src/hyperlight_host/src/sandbox/trace/context.rs index fbf5817b3..a9e897242 100644 --- a/src/hyperlight_host/src/sandbox/trace/context.rs +++ b/src/hyperlight_host/src/sandbox/trace/context.rs @@ -198,11 +198,6 @@ impl TraceContext { + Duration::from_micros(rel_start_us as u64)) } - /// Check if the registers indicate that there is trace data to be handled. - pub fn has_trace_data(&self, regs: &CommonRegisters) -> bool { - regs.r8 == OutBAction::TraceBatch as u64 - } - pub fn handle_trace( &mut self, regs: &CommonRegisters, From 9c871f5f00b50ce856e3b8eff1168eadd83c4bbf Mon Sep 17 00:00:00 2001 From: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> Date: Thu, 2 Jul 2026 18:27:57 -0700 Subject: [PATCH 2/2] docs: correct call-to-host span and OOM flush wording Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> --- docs/hyperlight-metrics-logs-and-traces.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/hyperlight-metrics-logs-and-traces.md b/docs/hyperlight-metrics-logs-and-traces.md index a4b203abb..03e28b684 100644 --- a/docs/hyperlight-metrics-logs-and-traces.md +++ b/docs/hyperlight-metrics-logs-and-traces.md @@ -151,13 +151,13 @@ The guest delivers a batch by triggering a VM exit on the dedicated `TraceBatch` The host reads a batch only on the VM exit that carries it, the `TraceBatch` port exit. It reads the pointer and length from the guest registers, walks the guest page tables to translate the guest virtual address to a physical address (needed because copy-on-write means guest pages are not identity-mapped), and reads the buffer. -The host decodes the batch and rebuilds the span hierarchy. It creates one `opentelemetry` span or event per guest record, using the `TSC` values to set explicit start and end times. It also inserts a `call-to-host` span for each exit into the host, so a call across the VM boundary shows host work as a child of the active guest span. +The host decodes the batch and rebuilds the span hierarchy. It creates one `opentelemetry` span or event per guest record, using the `TSC` values to set explicit start and end times. After decoding a batch that leaves a guest span open, the host opens one `call-to-host` span parented to that guest span. It captures host work until the next VM exit, so work done between guest exits shows as a child of the active guest span. A batch that fails to decode is logged and skipped. Trace data is advisory, so the guest keeps running. #### Limitations -* Batches are delivered only on the `TraceBatch` exit, on buffer overflow or `flush()`. Normal function returns, guest initialization, deliberate aborts, and out-of-memory aborts flush. Panic and exception aborts use a raw abort path that does not flush, so their final buffered batch is lost. +* Batches are delivered only on the `TraceBatch` exit, on buffer overflow or `flush()`. Normal function returns and guest initialization flush. Deliberate and out-of-memory aborts attempt a flush, but `flush()` allocates while closing spans, so an out-of-memory abort may lose its final batch. Panic and exception aborts use a raw abort path that does not flush, so their final buffered batch is lost. * A record larger than the buffer produces a batch that exceeds the host size limit, so the host drops that whole batch. * Guest tracing is gated at runtime by `max_log_level` (`Off` disables it) and at compile time by the `trace_guest` feature. * The host reconstructs guest spans with `opentelemetry`, so enabling `trace_guest` on the host pulls in the `opentelemetry` crates.