diff --git a/docs/hyperlight-metrics-logs-and-traces.md b/docs/hyperlight-metrics-logs-and-traces.md index 8120e2d44..03e28b684 100644 --- a/docs/hyperlight-metrics-logs-and-traces.md +++ b/docs/hyperlight-metrics-logs-and-traces.md @@ -137,27 +137,30 @@ runs `no_std` which `opentelemetry` doesn't know. ##### Guest -When the guest starts executing the `entrypoint` function, it receives a `max_log_level` parameter that tells the guest what kind of logging level is expected from it. +When the guest starts executing the `entrypoint` function, it receives a `max_log_level` parameter that tells the guest what logging level the host expects. -The `trace_guest` logic takes advantage of this parameter and when the `max_log_level` is `trace`, it allocates a custom made `GuestSubscriber` that implements the `Subscriber` -trait from `tracing_core` that allows defining a subscriber for the `tracing` crate to handle new spans and events. +The `trace_guest` logic uses this parameter. When `max_log_level` is not `Off`, the guest installs a custom `GuestSubscriber` that implements the `Subscriber` trait from `tracing_core`, so it receives every span and event created through the `tracing` crate. -This custom subscriber stores the spans and events in a buffer initialized only when tracing is enabled. For each new span and event, a method is called on the custom subscriber which not only stores the data, but also keeps track of the hierarchy and dependencies between the other spans/events. -**NOTE**: The spans/events attributes are truncated to fit in the allocated buffer. +The subscriber writes spans and events into a fixed-size buffer in guest memory. Each record keeps its parent link so the host can rebuild the hierarchy. Every span and event also records the current `TSC` value, which the host later turns into wall-clock timestamps. -When the storage space is filled, the guest triggers a VM Exit that sends the guest pointers to the host. The host can access the guest memory, get the data and parse it to create the `spans` and `events` using the `opentelemetry` crate which allows specifying the starting and ending timestamps -which are captured in the guest using the `TSC`. - -To improve performance, for each VMExit, the guest adds metadata for the host to be able to report the tracing data and free space. +The guest delivers a batch by triggering a VM exit on the dedicated `TraceBatch` port. It passes the buffer pointer in `r9`, its length in `r10`, and the `TraceBatch` magic in `r8`. A batch is sent in two cases: +* The buffer fills up. +* The guest calls `flush()`. This happens after guest initialization, at the end of each guest function call, and on a deliberate abort. ##### Host -When a guest exits, the host checks for metadata from the guest reporting tracing data. -If tracing data is found, the host starts parsing it and reconstructing a tree which represents the spans hierarchy. +The host reads a batch only on the VM exit that carries it, the `TraceBatch` port exit. It reads the pointer and length from the guest registers, walks the guest page tables to translate the guest virtual address to a physical address (needed because copy-on-write means guest pages are not identity-mapped), and reads the buffer. + +The host decodes the batch and rebuilds the span hierarchy. It creates one `opentelemetry` span or event per guest record, using the `TSC` values to set explicit start and end times. After decoding a batch that leaves a guest span open, the host opens one `call-to-host` span parented to that guest span. It captures host work until the next VM exit, so work done between guest exits shows as a child of the active guest span. + +A batch that fails to decode is logged and skipped. Trace data is advisory, so the guest keeps running. -Additionally, the host also adds new children `span`s to the guest's reported active span, emphasizing the spans created on the host as a result of a temporary VM Exit. This helps visualize a call into the guest with context propagated across the VM boundary. +#### Limitations -The host creates `opentelemetry` spans and events for each guest span and event reported. +* Batches are delivered only on the `TraceBatch` exit, on buffer overflow or `flush()`. Normal function returns and guest initialization flush. Deliberate and out-of-memory aborts attempt a flush, but `flush()` allocates while closing spans, so an out-of-memory abort may lose its final batch. Panic and exception aborts use a raw abort path that does not flush, so their final buffered batch is lost. +* A record larger than the buffer produces a batch that exceeds the host size limit, so the host drops that whole batch. +* Guest tracing is gated at runtime by `max_log_level` (`Off` disables it) and at compile time by the `trace_guest` feature. +* The host reconstructs guest spans with `opentelemetry`, so enabling `trace_guest` on the host pulls in the `opentelemetry` crates. ### Inspecting Guest memory Trace Files (for mem_profile) diff --git a/src/hyperlight_guest/src/arch/amd64/exit.rs b/src/hyperlight_guest/src/arch/amd64/exit.rs index a680cb359..e494cd966 100644 --- a/src/hyperlight_guest/src/arch/amd64/exit.rs +++ b/src/hyperlight_guest/src/arch/amd64/exit.rs @@ -16,44 +16,10 @@ limitations under the License. use core::arch::asm; -#[cfg(feature = "trace_guest")] -use hyperlight_common::outb::OutBAction; - -/// OUT function for sending a 32-bit value to the host. -/// `out32` can be called from an exception context, so we must be careful -/// with the tracing state that might be locked at that time. -/// The tracing state calls `try_lock` internally to avoid deadlocks. -/// Furthermore, the instrument macro is not used here to avoid creating spans -/// in exception contexts. Because if the trace state is already locked, trying to create a span -/// would cause a panic, which is undesirable in exception handling. +/// OUT function for sending a 32-bit value to the host on the given port. +/// A pure I/O function that issues a single OUT instruction, safe to call +/// from an exception context. pub(crate) unsafe fn out32(port: u16, val: u32) { - #[cfg(feature = "trace_guest")] - { - if let Some((ptr, len)) = hyperlight_guest_tracing::serialized_data() { - // If tracing is enabled and there is data to send, send it along with the OUT action - unsafe { - asm!("out dx, eax", - in("dx") port, - in("eax") val, - in("r8") OutBAction::TraceBatch as u64, - in("r9") ptr, - in("r10") len, - options(preserves_flags, nostack) - ) - }; - - // Reset the trace state after sending the batch - // This clears all existing spans/events ensuring a clean state for the next operations - // The trace state is expected to be flushed before this call - hyperlight_guest_tracing::reset(); - } else { - // If tracing is not enabled, just send the value - unsafe { - asm!("out dx, eax", in("dx") port, in("eax") val, options(preserves_flags, nostack)) - }; - } - } - #[cfg(not(feature = "trace_guest"))] unsafe { asm!("out dx, eax", in("dx") port, in("eax") val, options(preserves_flags, nostack)); } diff --git a/src/hyperlight_guest/src/exit.rs b/src/hyperlight_guest/src/exit.rs index c7e724791..535d0378d 100644 --- a/src/hyperlight_guest/src/exit.rs +++ b/src/hyperlight_guest/src/exit.rs @@ -31,9 +31,11 @@ pub extern "C" fn abort() -> ! { /// Exits the VM with an Abort OUT action and a specific code. pub fn abort_with_code(code: &[u8]) -> ! { - // End any ongoing trace before aborting + // Send the buffered trace batch to the host before aborting. `flush` + // closes open spans, which allocates. An abort caused by allocation + // failure may lose the batch here. #[cfg(all(feature = "trace_guest", target_arch = "x86_64"))] - hyperlight_guest_tracing::end_trace(); + hyperlight_guest_tracing::flush(); outb(OutBAction::Abort as u16, code); outb(OutBAction::Abort as u16, &[0xFF]); // send abort terminator (if not included in code) unreachable!() @@ -44,9 +46,11 @@ pub fn abort_with_code(code: &[u8]) -> ! { /// # Safety /// This function is unsafe because it dereferences a raw pointer. pub unsafe fn abort_with_code_and_message(code: &[u8], message_ptr: *const c_char) -> ! { - // End any ongoing trace before aborting + // Send the buffered trace batch to the host before aborting. `flush` + // closes open spans, which allocates. An abort caused by allocation + // failure may lose the batch here. #[cfg(all(feature = "trace_guest", target_arch = "x86_64"))] - hyperlight_guest_tracing::end_trace(); + hyperlight_guest_tracing::flush(); unsafe { // Step 1: Send abort code (typically 1 byte, but `code` allows flexibility) outb(OutBAction::Abort as u16, code); @@ -75,7 +79,6 @@ pub fn write_abort(code: &[u8]) { /// OUT bytes to the host through multiple exits. pub(crate) fn outb(port: u16, data: &[u8]) { - // Ensure all tracing data is flushed before sending OUT bytes unsafe { let mut i = 0; while i < data.len() { diff --git a/src/hyperlight_guest_tracing/src/lib.rs b/src/hyperlight_guest_tracing/src/lib.rs index 6fae94925..c219a3b5e 100644 --- a/src/hyperlight_guest_tracing/src/lib.rs +++ b/src/hyperlight_guest_tracing/src/lib.rs @@ -32,14 +32,8 @@ mod subscriber; #[cfg(feature = "trace")] mod visitor; -/// Type to get the relevant information from the internal state -/// and expose it to the host #[cfg(feature = "trace")] -pub use state::TraceBatchInfo; -#[cfg(feature = "trace")] -pub use trace::{ - end_trace, flush, init_guest_tracing, is_trace_enabled, new_call, reset, serialized_data, -}; +pub use trace::{flush, init_guest_tracing, is_trace_enabled, new_call}; /// This module is gated because some of these types are also used on the host, but we want /// only the guest to allocate and allow the functionality intended for the guest. @@ -77,36 +71,6 @@ mod trace { let _ = tracing_core::dispatcher::set_global_default(tracing_core::Dispatch::new(sub)); } - /// Ends the current trace by ending all active spans in the - /// internal state and storing the end timestamps. - /// - /// This expects an outb call to send the spans to the host. - /// After calling this function, the internal state is marked - /// for cleaning on the next access. - /// - /// NOTE: Panics if unable to lock the guest state. - pub fn end_trace() { - if let Some(w) = GUEST_STATE.get() - && let Some(state_mutex) = w.upgrade() - { - // We want to protect against re-entrancy issues produced by tracing code that locks - // the state and then causes an exception that tries to lock the state again. - // - // For example: - // - 1. A span is created, locking the state - // - 2. An exception occurs while the span is being created (e.g. not enough memory, etc.) - // - 3. The exception handler uses the tracing API to send the trace data to the host - // or just create spans/events for logging purposes. - // - 4. The tracing API tries to lock the state again, causing a deadlock. - // To avoid this, we use try_lock and if we cannot acquire the lock, we panic to signal - // the issue. - let mut state = state_mutex - .try_lock() - .expect("guest_tracing: Unable to lock guest tracing state in `end_trace`"); - state.end_trace(); - } - } - /// Flushes the current trace data to prepare it for reading by the host. /// NOTE: Panics if unable to lock the guest state. pub fn flush() { @@ -158,59 +122,6 @@ mod trace { } } - /// Cleans the internal trace state by removing closed spans and events. - /// This ensures that after a VM exit, we keep the spans that - /// are still active (in the stack) and remove all other spans and events. - /// NOTE: Panics if unable to lock the guest state. - pub fn reset() { - if let Some(w) = GUEST_STATE.get() - && let Some(state_mutex) = w.upgrade() - { - // We want to protect against re-entrancy issues produced by tracing code that locks - // the state and then causes an exception that tries to lock the state again. - // - // For example: - // - 1. A span is created, locking the state - // - 2. An exception occurs while the span is being created (e.g. not enough memory, etc.) - // - 3. The exception handler uses the tracing API to send the trace data to the host - // or just create spans/events for logging purposes. - // - 4. The tracing API tries to lock the state again, causing a deadlock. - // To avoid this, we use try_lock and if we cannot acquire the lock, we panic to signal - // the issue. - let mut state = state_mutex - .try_lock() - .expect("Unable to lock GuestState in `reset`"); - - state.reset(); - } - } - - /// Returns information about the current trace state needed by the host to read the spans. - pub fn serialized_data() -> Option<(u64, u64)> { - if let Some(w) = GUEST_STATE.get() - && let Some(state_mutex) = w.upgrade() - { - // We want to protect against re-entrancy issues produced by tracing code that locks - // the state and then causes an exception that tries to lock the state again. - // - // For example: - // - 1. A span is created, locking the state - // - 2. An exception occurs while the span is being created (e.g. not enough memory, etc.) - // - 3. The exception handler uses the tracing API to send the trace data to the host - // or just create spans/events for logging purposes. - // - 4. The tracing API tries to lock the state again, causing a deadlock. - // To avoid this, we use try_lock and if we cannot acquire the lock, we panic to signal - // the issue. - let state = state_mutex - .try_lock() - .expect("Unable to lock GuestState in `serialized_data`"); - - state.serialized_data() - } else { - None - } - } - /// Returns true if tracing is enabled (the guest tracing state is initialized). pub fn is_trace_enabled() -> bool { GUEST_STATE diff --git a/src/hyperlight_guest_tracing/src/state.rs b/src/hyperlight_guest_tracing/src/state.rs index 01ad896c8..77cbf5491 100644 --- a/src/hyperlight_guest_tracing/src/state.rs +++ b/src/hyperlight_guest_tracing/src/state.rs @@ -29,10 +29,6 @@ use tracing_core::span::{Attributes, Id, Record}; use crate::invariant_tsc; use crate::visitor::FieldsVisitor; -pub struct TraceBatchInfo { - pub serialized_data: Vec, -} - /// Internal state of the tracing subscriber pub(crate) struct GuestState { /// Encoder for events @@ -57,6 +53,7 @@ fn send_to_host(data: &[u8]) { in("r8") OutBAction::TraceBatch as u64, in("r9") data.as_ptr() as u64, in("r10") data.len() as u64, + options(nostack, preserves_flags), ); } } @@ -104,12 +101,6 @@ impl GuestState { .encode(&GuestEvent::GuestStart { tsc: start_tsc }); } - /// Reset the trace state, clearing all existing spans and events - /// This is called after the trace has been flushed to the host - pub(crate) fn reset(&mut self) { - self.encoder.reset(); - } - /// Closes the trace by ending all spans /// NOTE: This expects an outb call to send the spans to the host. pub(crate) fn end_trace(&mut self) { @@ -126,17 +117,6 @@ impl GuestState { } } - /// Return (ptr, len) for serialized data if any is available - pub(crate) fn serialized_data(&self) -> Option<(u64, u64)> { - let data = self.encoder.finish(); - - if data.is_empty() { - None - } else { - Some((data.as_ptr() as u64, data.len() as u64)) - } - } - /// Create a new span and push it on the stack pub(crate) fn new_span(&mut self, attrs: &Attributes) -> Id { let (idn, id) = self.alloc_id(); diff --git a/src/hyperlight_host/src/hypervisor/hyperlight_vm/mod.rs b/src/hyperlight_host/src/hypervisor/hyperlight_vm/mod.rs index 56f7d635d..3300c913c 100644 --- a/src/hyperlight_host/src/hypervisor/hyperlight_vm/mod.rs +++ b/src/hyperlight_host/src/hypervisor/hyperlight_vm/mod.rs @@ -622,11 +622,15 @@ impl HyperlightVm { #[cfg(feature = "trace_guest")] { tc.end_host_trace(); - // Handle the guest trace data if any - let regs = self.vm.regs().map_err(RunVmError::GetRegs)?; - // Only parse the trace if it has reported - if tc.has_trace_data(®s) { + // Guest trace batches arrive on the dedicated `TraceBatch` + // port. Read the batch only on the exit that carries it. A + // parse failure is logged and execution continues, since + // trace data is advisory to the guest. + if let Ok(VmExit::IoOut(port, _)) = &result + && *port == hyperlight_common::outb::OutBAction::TraceBatch as u16 + { + let regs = self.vm.regs().map_err(RunVmError::GetRegs)?; let root_pt = self.get_root_pt().map_err(RunVmError::PageTableAccess)?; // If something goes wrong with parsing the trace data, we log the error and diff --git a/src/hyperlight_host/src/sandbox/trace/context.rs b/src/hyperlight_host/src/sandbox/trace/context.rs index fbf5817b3..a9e897242 100644 --- a/src/hyperlight_host/src/sandbox/trace/context.rs +++ b/src/hyperlight_host/src/sandbox/trace/context.rs @@ -198,11 +198,6 @@ impl TraceContext { + Duration::from_micros(rel_start_us as u64)) } - /// Check if the registers indicate that there is trace data to be handled. - pub fn has_trace_data(&self, regs: &CommonRegisters) -> bool { - regs.r8 == OutBAction::TraceBatch as u64 - } - pub fn handle_trace( &mut self, regs: &CommonRegisters,