Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 16 additions & 13 deletions docs/hyperlight-metrics-logs-and-traces.md
Original file line number Diff line number Diff line change
Expand Up @@ -137,27 +137,30 @@ runs `no_std` which `opentelemetry` doesn't know.

##### Guest

When the guest starts executing the `entrypoint` function, it receives a `max_log_level` parameter that tells the guest what kind of logging level is expected from it.
When the guest starts executing the `entrypoint` function, it receives a `max_log_level` parameter that tells the guest what logging level the host expects.

The `trace_guest` logic takes advantage of this parameter and when the `max_log_level` is `trace`, it allocates a custom made `GuestSubscriber` that implements the `Subscriber`
trait from `tracing_core` that allows defining a subscriber for the `tracing` crate to handle new spans and events.
The `trace_guest` logic uses this parameter. When `max_log_level` is not `Off`, the guest installs a custom `GuestSubscriber` that implements the `Subscriber` trait from `tracing_core`, so it receives every span and event created through the `tracing` crate.

This custom subscriber stores the spans and events in a buffer initialized only when tracing is enabled. For each new span and event, a method is called on the custom subscriber which not only stores the data, but also keeps track of the hierarchy and dependencies between the other spans/events.
**NOTE**: The spans/events attributes are truncated to fit in the allocated buffer.
The subscriber writes spans and events into a fixed-size buffer in guest memory. Each record keeps its parent link so the host can rebuild the hierarchy. Every span and event also records the current `TSC` value, which the host later turns into wall-clock timestamps.

When the storage space is filled, the guest triggers a VM Exit that sends the guest pointers to the host. The host can access the guest memory, get the data and parse it to create the `spans` and `events` using the `opentelemetry` crate which allows specifying the starting and ending timestamps
which are captured in the guest using the `TSC`.

To improve performance, for each VMExit, the guest adds metadata for the host to be able to report the tracing data and free space.
The guest delivers a batch by triggering a VM exit on the dedicated `TraceBatch` port. It passes the buffer pointer in `r9`, its length in `r10`, and the `TraceBatch` magic in `r8`. A batch is sent in two cases:
* The buffer fills up.
* The guest calls `flush()`. This happens after guest initialization, at the end of each guest function call, and on a deliberate abort.

##### Host

When a guest exits, the host checks for metadata from the guest reporting tracing data.
If tracing data is found, the host starts parsing it and reconstructing a tree which represents the spans hierarchy.
The host reads a batch only on the VM exit that carries it, the `TraceBatch` port exit. It reads the pointer and length from the guest registers, walks the guest page tables to translate the guest virtual address to a physical address (needed because copy-on-write means guest pages are not identity-mapped), and reads the buffer.

The host decodes the batch and rebuilds the span hierarchy. It creates one `opentelemetry` span or event per guest record, using the `TSC` values to set explicit start and end times. After decoding a batch that leaves a guest span open, the host opens one `call-to-host` span parented to that guest span. It captures host work until the next VM exit, so work done between guest exits shows as a child of the active guest span.

A batch that fails to decode is logged and skipped. Trace data is advisory, so the guest keeps running.

Additionally, the host also adds new children `span`s to the guest's reported active span, emphasizing the spans created on the host as a result of a temporary VM Exit. This helps visualize a call into the guest with context propagated across the VM boundary.
#### Limitations

The host creates `opentelemetry` spans and events for each guest span and event reported.
* Batches are delivered only on the `TraceBatch` exit, on buffer overflow or `flush()`. Normal function returns and guest initialization flush. Deliberate and out-of-memory aborts attempt a flush, but `flush()` allocates while closing spans, so an out-of-memory abort may lose its final batch. Panic and exception aborts use a raw abort path that does not flush, so their final buffered batch is lost.
* A record larger than the buffer produces a batch that exceeds the host size limit, so the host drops that whole batch.
* Guest tracing is gated at runtime by `max_log_level` (`Off` disables it) and at compile time by the `trace_guest` feature.
* The host reconstructs guest spans with `opentelemetry`, so enabling `trace_guest` on the host pulls in the `opentelemetry` crates.

### Inspecting Guest memory Trace Files (for mem_profile)

Expand Down
40 changes: 3 additions & 37 deletions src/hyperlight_guest/src/arch/amd64/exit.rs
Original file line number Diff line number Diff line change
Expand Up @@ -16,44 +16,10 @@ limitations under the License.

use core::arch::asm;

#[cfg(feature = "trace_guest")]
use hyperlight_common::outb::OutBAction;

/// OUT function for sending a 32-bit value to the host.
/// `out32` can be called from an exception context, so we must be careful
/// with the tracing state that might be locked at that time.
/// The tracing state calls `try_lock` internally to avoid deadlocks.
/// Furthermore, the instrument macro is not used here to avoid creating spans
/// in exception contexts. Because if the trace state is already locked, trying to create a span
/// would cause a panic, which is undesirable in exception handling.
/// OUT function for sending a 32-bit value to the host on the given port.
/// A pure I/O function that issues a single OUT instruction, safe to call
/// from an exception context.
pub(crate) unsafe fn out32(port: u16, val: u32) {
#[cfg(feature = "trace_guest")]
{
if let Some((ptr, len)) = hyperlight_guest_tracing::serialized_data() {
// If tracing is enabled and there is data to send, send it along with the OUT action
unsafe {
asm!("out dx, eax",
in("dx") port,
in("eax") val,
in("r8") OutBAction::TraceBatch as u64,
in("r9") ptr,
in("r10") len,
options(preserves_flags, nostack)
)
};

// Reset the trace state after sending the batch
// This clears all existing spans/events ensuring a clean state for the next operations
// The trace state is expected to be flushed before this call
hyperlight_guest_tracing::reset();
} else {
// If tracing is not enabled, just send the value
unsafe {
asm!("out dx, eax", in("dx") port, in("eax") val, options(preserves_flags, nostack))
};
}
}
#[cfg(not(feature = "trace_guest"))]
unsafe {
asm!("out dx, eax", in("dx") port, in("eax") val, options(preserves_flags, nostack));
}
Expand Down
13 changes: 8 additions & 5 deletions src/hyperlight_guest/src/exit.rs
Original file line number Diff line number Diff line change
Expand Up @@ -31,9 +31,11 @@ pub extern "C" fn abort() -> ! {

/// Exits the VM with an Abort OUT action and a specific code.
pub fn abort_with_code(code: &[u8]) -> ! {
// End any ongoing trace before aborting
// Send the buffered trace batch to the host before aborting. `flush`
// closes open spans, which allocates. An abort caused by allocation
// failure may lose the batch here.
#[cfg(all(feature = "trace_guest", target_arch = "x86_64"))]
hyperlight_guest_tracing::end_trace();
hyperlight_guest_tracing::flush();
outb(OutBAction::Abort as u16, code);
outb(OutBAction::Abort as u16, &[0xFF]); // send abort terminator (if not included in code)
unreachable!()
Expand All @@ -44,9 +46,11 @@ pub fn abort_with_code(code: &[u8]) -> ! {
/// # Safety
/// This function is unsafe because it dereferences a raw pointer.
pub unsafe fn abort_with_code_and_message(code: &[u8], message_ptr: *const c_char) -> ! {
// End any ongoing trace before aborting
// Send the buffered trace batch to the host before aborting. `flush`
// closes open spans, which allocates. An abort caused by allocation
// failure may lose the batch here.
#[cfg(all(feature = "trace_guest", target_arch = "x86_64"))]
hyperlight_guest_tracing::end_trace();
hyperlight_guest_tracing::flush();
unsafe {
// Step 1: Send abort code (typically 1 byte, but `code` allows flexibility)
outb(OutBAction::Abort as u16, code);
Expand Down Expand Up @@ -75,7 +79,6 @@ pub fn write_abort(code: &[u8]) {

/// OUT bytes to the host through multiple exits.
pub(crate) fn outb(port: u16, data: &[u8]) {
// Ensure all tracing data is flushed before sending OUT bytes
unsafe {
let mut i = 0;
while i < data.len() {
Expand Down
91 changes: 1 addition & 90 deletions src/hyperlight_guest_tracing/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -32,14 +32,8 @@ mod subscriber;
#[cfg(feature = "trace")]
mod visitor;

/// Type to get the relevant information from the internal state
/// and expose it to the host
#[cfg(feature = "trace")]
pub use state::TraceBatchInfo;
#[cfg(feature = "trace")]
pub use trace::{
end_trace, flush, init_guest_tracing, is_trace_enabled, new_call, reset, serialized_data,
};
pub use trace::{flush, init_guest_tracing, is_trace_enabled, new_call};

/// This module is gated because some of these types are also used on the host, but we want
/// only the guest to allocate and allow the functionality intended for the guest.
Expand Down Expand Up @@ -77,36 +71,6 @@ mod trace {
let _ = tracing_core::dispatcher::set_global_default(tracing_core::Dispatch::new(sub));
}

/// Ends the current trace by ending all active spans in the
/// internal state and storing the end timestamps.
///
/// This expects an outb call to send the spans to the host.
/// After calling this function, the internal state is marked
/// for cleaning on the next access.
///
/// NOTE: Panics if unable to lock the guest state.
pub fn end_trace() {
if let Some(w) = GUEST_STATE.get()
&& let Some(state_mutex) = w.upgrade()
{
// We want to protect against re-entrancy issues produced by tracing code that locks
// the state and then causes an exception that tries to lock the state again.
//
// For example:
// - 1. A span is created, locking the state
// - 2. An exception occurs while the span is being created (e.g. not enough memory, etc.)
// - 3. The exception handler uses the tracing API to send the trace data to the host
// or just create spans/events for logging purposes.
// - 4. The tracing API tries to lock the state again, causing a deadlock.
// To avoid this, we use try_lock and if we cannot acquire the lock, we panic to signal
// the issue.
let mut state = state_mutex
.try_lock()
.expect("guest_tracing: Unable to lock guest tracing state in `end_trace`");
state.end_trace();
}
}

/// Flushes the current trace data to prepare it for reading by the host.
/// NOTE: Panics if unable to lock the guest state.
pub fn flush() {
Expand Down Expand Up @@ -158,59 +122,6 @@ mod trace {
}
}

/// Cleans the internal trace state by removing closed spans and events.
/// This ensures that after a VM exit, we keep the spans that
/// are still active (in the stack) and remove all other spans and events.
/// NOTE: Panics if unable to lock the guest state.
pub fn reset() {
if let Some(w) = GUEST_STATE.get()
&& let Some(state_mutex) = w.upgrade()
{
// We want to protect against re-entrancy issues produced by tracing code that locks
// the state and then causes an exception that tries to lock the state again.
//
// For example:
// - 1. A span is created, locking the state
// - 2. An exception occurs while the span is being created (e.g. not enough memory, etc.)
// - 3. The exception handler uses the tracing API to send the trace data to the host
// or just create spans/events for logging purposes.
// - 4. The tracing API tries to lock the state again, causing a deadlock.
// To avoid this, we use try_lock and if we cannot acquire the lock, we panic to signal
// the issue.
let mut state = state_mutex
.try_lock()
.expect("Unable to lock GuestState in `reset`");

state.reset();
}
}

/// Returns information about the current trace state needed by the host to read the spans.
pub fn serialized_data() -> Option<(u64, u64)> {
if let Some(w) = GUEST_STATE.get()
&& let Some(state_mutex) = w.upgrade()
{
// We want to protect against re-entrancy issues produced by tracing code that locks
// the state and then causes an exception that tries to lock the state again.
//
// For example:
// - 1. A span is created, locking the state
// - 2. An exception occurs while the span is being created (e.g. not enough memory, etc.)
// - 3. The exception handler uses the tracing API to send the trace data to the host
// or just create spans/events for logging purposes.
// - 4. The tracing API tries to lock the state again, causing a deadlock.
// To avoid this, we use try_lock and if we cannot acquire the lock, we panic to signal
// the issue.
let state = state_mutex
.try_lock()
.expect("Unable to lock GuestState in `serialized_data`");

state.serialized_data()
} else {
None
}
}

/// Returns true if tracing is enabled (the guest tracing state is initialized).
pub fn is_trace_enabled() -> bool {
GUEST_STATE
Expand Down
22 changes: 1 addition & 21 deletions src/hyperlight_guest_tracing/src/state.rs
Original file line number Diff line number Diff line change
Expand Up @@ -29,10 +29,6 @@ use tracing_core::span::{Attributes, Id, Record};
use crate::invariant_tsc;
use crate::visitor::FieldsVisitor;

pub struct TraceBatchInfo {
pub serialized_data: Vec<u8>,
}

/// Internal state of the tracing subscriber
pub(crate) struct GuestState {
/// Encoder for events
Expand All @@ -57,6 +53,7 @@ fn send_to_host(data: &[u8]) {
in("r8") OutBAction::TraceBatch as u64,
in("r9") data.as_ptr() as u64,
in("r10") data.len() as u64,
options(nostack, preserves_flags),
);
}
}
Expand Down Expand Up @@ -104,12 +101,6 @@ impl GuestState {
.encode(&GuestEvent::GuestStart { tsc: start_tsc });
}

/// Reset the trace state, clearing all existing spans and events
/// This is called after the trace has been flushed to the host
pub(crate) fn reset(&mut self) {
self.encoder.reset();
}

/// Closes the trace by ending all spans
/// NOTE: This expects an outb call to send the spans to the host.
pub(crate) fn end_trace(&mut self) {
Expand All @@ -126,17 +117,6 @@ impl GuestState {
}
}

/// Return (ptr, len) for serialized data if any is available
pub(crate) fn serialized_data(&self) -> Option<(u64, u64)> {
let data = self.encoder.finish();

if data.is_empty() {
None
} else {
Some((data.as_ptr() as u64, data.len() as u64))
}
}

/// Create a new span and push it on the stack
pub(crate) fn new_span(&mut self, attrs: &Attributes) -> Id {
let (idn, id) = self.alloc_id();
Expand Down
12 changes: 8 additions & 4 deletions src/hyperlight_host/src/hypervisor/hyperlight_vm/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -622,11 +622,15 @@ impl HyperlightVm {
#[cfg(feature = "trace_guest")]
{
tc.end_host_trace();
// Handle the guest trace data if any
let regs = self.vm.regs().map_err(RunVmError::GetRegs)?;

// Only parse the trace if it has reported
if tc.has_trace_data(&regs) {
// Guest trace batches arrive on the dedicated `TraceBatch`
// port. Read the batch only on the exit that carries it. A
// parse failure is logged and execution continues, since
// trace data is advisory to the guest.
if let Ok(VmExit::IoOut(port, _)) = &result
&& *port == hyperlight_common::outb::OutBAction::TraceBatch as u16
{
let regs = self.vm.regs().map_err(RunVmError::GetRegs)?;
let root_pt = self.get_root_pt().map_err(RunVmError::PageTableAccess)?;

// If something goes wrong with parsing the trace data, we log the error and
Expand Down
5 changes: 0 additions & 5 deletions src/hyperlight_host/src/sandbox/trace/context.rs
Original file line number Diff line number Diff line change
Expand Up @@ -198,11 +198,6 @@ impl TraceContext {
+ Duration::from_micros(rel_start_us as u64))
}

/// Check if the registers indicate that there is trace data to be handled.
pub fn has_trace_data(&self, regs: &CommonRegisters) -> bool {
regs.r8 == OutBAction::TraceBatch as u64
}

pub fn handle_trace(
&mut self,
regs: &CommonRegisters,
Expand Down
Loading