The kernel can't do this because the kernel is part of the system under test.
- Trust boundary problem: If you modify the kernel to be deterministic, you're no longer testing with a real kernel. Kernel bugs (scheduler races, memory pressure, OOM killer behavior) affect real distributed systems. You want to test with those, not around them.
- Instruction-level control: True determinism requires controlling CPU-level non-determinism - RDTSC, RDRAND, instruction timing, memory ordering, interrupt delivery. The kernel can intercept syscalls, but it can't intercept what the CPU does between syscalls. A hypervisor (especially in TCG/emulation mode) controls every instruction.
- Unmodified binaries: The hypervisor approach tests your actual production binary. Kernel approaches require recompilation, special libraries, or LD_PRELOAD hacks - so you're not testing what you ship.
- Multi-node is natural: A hypervisor manages multiple VMs with virtual networks trivially. Kernel-based approaches struggle with true multi-node simulation, where you need to control message ordering across processes.
- Time virtualization: Hypervisors virtualize TSC/HPET completely. The kernel can hook clock_gettime(), but applications can still issue RDTSC directly, and you can't fake time for the kernel's own timers.
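The limits of library-level time hooks can be sketched with a userspace analogy. In the toy below, patching one clock API stands in for LD_PRELOAD-ing clock_gettime(), while a second, unpatched clock stands in for an application issuing RDTSC directly - an illustration of the gap, not the actual mechanism:

```python
import time
from unittest import mock

# Hooking one time API (analogous to interposing on clock_gettime) freezes
# only that path; code reading a different clock still sees real time,
# much as an application issuing RDTSC bypasses a libc-level hook entirely.
with mock.patch("time.time", return_value=0.0):
    assert time.time() == 0.0        # hooked path: fake, frozen time
    assert time.monotonic() > 0.0    # unhooked path: real time leaks through
```

A hypervisor sits below both paths, so there is no unhooked clock left to leak.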
This argument assumes you can swap out non-determinism cleanly. But the bugs you're hunting ARE the non-deterministic behaviors.
- Race conditions manifest FROM scheduler non-determinism. If you replace the scheduler with a deterministic one, you're not triggering the same interleavings that cause production bugs. You want to explore different scheduler behaviors, not eliminate them.
- The kernel can't control its own inputs. The hypervisor controls what the kernel sees (time, interrupts, I/O completion order) while letting the kernel respond naturally. A "deterministic kernel" changes how the kernel responds - a different thing entirely.
- You'd be testing a different system. Replace the scheduler, timer subsystem, random pool, and memory allocator with deterministic versions and you've written a new kernel. Your tests pass, production still fails.
The right mental model:
- Hypervisor = control the environment (inputs), observe real system behavior
- Kernel hooks = change the system, hope it's "close enough"
The hypervisor approach is: "let me replay exactly what happened." The kernel approach is: "let me prevent it from happening non-deterministically." Only one of those finds production bugs.
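The "replay exactly what happened" model can be sketched in a few lines: record the environment's non-deterministic inputs while the system runs naturally, then feed the identical trace back to the unmodified system. (A toy sketch; the functions and values are illustrative, not Bloodhound's interface.)

```python
import random

def system_step(rng_value, state):
    """A toy 'system' whose behavior depends on a non-deterministic input."""
    return state + rng_value

def record_run(seed, steps=5):
    """Record: capture the environment's inputs as the system runs naturally."""
    rng = random.Random(seed)
    trace, state = [], 0
    for _ in range(steps):
        value = rng.randrange(100)   # stands in for a timer tick / packet arrival
        trace.append(value)
        state = system_step(value, state)
    return trace, state

def replay_run(trace):
    """Replay: feed back the identical inputs; the unmodified system repeats itself."""
    state = 0
    for value in trace:
        state = system_step(value, state)
    return state

trace, original = record_run(seed=42)
assert replay_run(trace) == original   # same inputs -> same behavior
```

The system code is never touched; only its inputs are captured and replayed.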
In TLA+, prophecy variables allow modeling non-deterministic choices by predetermining outcomes. Theoretically, you could modify the kernel scheduler (and other non-deterministic components) to accept these prophecy variables as inputs.
The hypervisor approach achieves the same result more elegantly: the kernel does NOT need to be modified.
The hypervisor controls inputs at the hardware boundary:
- Timer interrupts (TSC, HPET) delivered at exact virtual cycles
- RDRAND/RDSEED trapped and replaced with seeded values
- I/O completion interrupts delivered in controlled order
- Network packets delivered deterministically
Given identical inputs, the unmodified kernel scheduler makes identical decisions. The "prophecy" is encoded in the controlled hardware inputs, and the kernel's normal algorithm follows deterministically.
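The "prophecy lives in the inputs" idea can be sketched with a toy round-robin scheduler. The scheduler's code is never modified; its decisions depend only on which timer interrupts it receives, so a fixed interrupt sequence replays a fixed schedule. (An illustrative sketch; the encoding of interrupts as a boolean list is an assumption of this example.)

```python
from collections import deque

def schedule(tasks, interrupt_at):
    """A toy, unmodified round-robin scheduler. Its only non-deterministic
    input is the timer interrupt sequence, which the 'hypervisor' controls."""
    ready = deque(tasks)
    timeline = []
    for tick in range(len(interrupt_at)):
        timeline.append(ready[0])       # the front task runs this tick
        if interrupt_at[tick]:          # hypervisor-controlled timer interrupt
            ready.rotate(-1)            # preempt: move current task to the back
    return timeline

interrupts = [False, True, False, True, True, False]   # the 'prophecy'
run1 = schedule(["A", "B", "C"], interrupts)
run2 = schedule(["A", "B", "C"], interrupts)
assert run1 == run2   # identical inputs, identical scheduling decisions
```

Varying the interrupt sequence explores different interleavings; fixing it replays one exactly.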
Why hypervisor beats kernel-level prophecy variables:
| Approach | Modification | Completeness |
|---|---|---|
| Hypervisor | Minimal, at hardware boundary | Complete by construction |
| Kernel prophecy | Every non-deterministic subsystem | Must identify all sources |
The hypervisor approach is the minimum modification that achieves complete determinism. Kernel prophecy variables would require maximum modification for the same result.
Yes, but subtractively rather than additively.
What we do (subtractive):
```
CONFIG_SMP=n           # Single CPU only
CONFIG_PREEMPT_NONE=y  # No preemption
# No hardware RNG - use hypervisor-provided randomness
```
We remove kernel features that are hard to control from the hypervisor.
What prophecy variables would require (additive):
- Modify scheduler to accept schedule input
- Modify allocator to accept allocation decisions
- Thread prophecy variables through every subsystem
Why subtractive is more practical:
- Still running real kernel code paths, just fewer of them
- Config change vs. code change - much simpler
- Production can run the same kernel with CONFIG_SMP=y - the algorithms are identical, just with more parallelism
The hypervisor still does the heavy lifting. We took pragmatic shortcuts that limit what we can test (no SMP bugs) but kept the kernel largely unmodified.
SMP = Symmetric Multi-Processing: multiple CPU cores that share memory and can execute code simultaneously.
```
Single CPU:          SMP (4 cores):
┌─────┐              ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐
│ CPU │              │ CPU │ │ CPU │ │ CPU │ │ CPU │
└──┬──┘              └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘
   │                    └───────┴───┬───┴───────┘
   │                                │
┌──┴──┐              ┌──────────────┴──────────────┐
│ RAM │              │         Shared RAM          │
└─────┘              └─────────────────────────────┘
```
Why SMP introduces hard-to-control non-determinism:
- Memory ordering: Core 1 writes A then B. Core 2 might see B before A due to CPU caching/store buffers.
- Cache coherence races: Both cores read X, both increment, both write back. You get X+1 instead of X+2.
- Atomic operation timing: compare_and_swap on two cores hitting the same cache line - who wins?
This happens inside the CPU, below the instruction level. Even a hypervisor can't see it without simulating the entire CPU microarchitecture.
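The lost-update race above can be made concrete by enumerating interleavings explicitly. This toy models each "core" as two micro-steps (read, then write-back) and shows that the result depends entirely on the interleaving - the non-determinism a real CPU hides below the instruction level. (A sketch for illustration; real hardware interleaves at cache-coherence granularity, not neat micro-steps.)

```python
import itertools

def run_interleaving(order):
    """Simulate two 'cores' each doing read -> increment -> write on shared X.
    `order` lists which core executes its next micro-step at each point."""
    x = 0
    regs = {0: None, 1: None}   # per-core register holding the value it read
    step = {0: 0, 1: 0}         # next micro-step per core: 0 = read, 1 = write
    for core in order:
        if step[core] == 0:
            regs[core] = x              # read shared X into a register
        else:
            x = regs[core] + 1          # write back register + 1
        step[core] += 1
    return x

# Each core runs its read then its write; try every possible interleaving.
results = {run_interleaving(o) for o in set(itertools.permutations([0, 0, 1, 1]))}
assert results == {1, 2}   # some interleavings lose an update: X+1, not X+2
```

With a single vCPU these micro-steps can never interleave, which is exactly why CONFIG_SMP=n sidesteps the problem.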
No. This is a fundamental limitation of the hypervisor approach without a full CPU simulator.
What Bloodhound can test:
| Bug Type | Supported |
|---|---|
| Distributed systems (network, timing) | Yes |
| Single-core race conditions (thread interleaving) | Yes |
| SMP memory ordering, lock-free bugs | No |
Why SMP is so hard: You'd need to simulate the CPU's memory model - cache coherence, store buffers, memory barriers. That's not hypervisor-level anymore, that's a full CPU simulator. The non-determinism is below the instruction level.
The pragmatic choice: Run single vCPU (CONFIG_SMP=n). You can still test thread interleaving (scheduler switches between threads on one core), just not true parallel memory access bugs. This covers the vast majority of distributed systems bugs.
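Single-core thread interleaving is still explorable because the scheduler decides who runs at each switch point. The sketch below models threads as generators and drives them with a seeded scheduler, so each seed is one exact interleaving and any failing seed replays its bug. (Illustrative only; Bloodhound's actual scheduling control is at the hypervisor, not in Python.)

```python
import random

def run_with_seed(seed):
    """Two logical 'threads' on one vCPU, modeled as generators. A seeded
    scheduler picks which one runs at every yield point, so a seed fully
    determines the interleaving - and the same seed replays it exactly."""
    rng = random.Random(seed)
    shared = {"x": 0}

    def thread():
        seen = shared["x"]      # check-then-act: racy under some interleavings
        yield                   # yield point: the scheduler may switch here
        shared["x"] = seen + 1

    live = [thread(), thread()]
    while live:
        t = rng.choice(live)    # deterministic 'scheduler' decision
        try:
            next(t)
        except StopIteration:
            live.remove(t)
    return shared["x"]

# Explore seeds; a seed that loses an update reproduces the bug on every re-run.
failing = [s for s in range(100) if run_with_seed(s) != 2]
assert all(run_with_seed(s) == run_with_seed(s) for s in failing)
```

This is "same seed = same bug" in miniature: exploration is random, reproduction is exact.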
TLB (Translation Lookaside Buffer) = cache of virtual-to-physical address translations. On real hardware:
- TLB state depends on access history
- TLB miss → page table walk → different timing
- Eviction policy is microarchitectural (not visible to software)
- Can cause non-deterministic page faults, timing variations
Antithesis reportedly had to patch their guest kernel to work around TLB non-determinism.
QEMU TCG mode solves this automatically.
TCG (Tiny Code Generator) = software CPU emulation. There's no real hardware TLB - QEMU does address translation in software (softmmu). Same inputs → same translations → deterministic.
```
Real Hardware:         QEMU TCG:
┌─────────────────┐    ┌─────────────────┐
│       CPU       │    │    Host CPU     │
│ ┌─────────────┐ │    │                 │
│ │ Hardware TLB│ │    │ ┌─────────────┐ │
│ │ (opaque,    │ │    │ │ Software TLB│ │
│ │  non-det)   │ │    │ │ (code, det) │ │
│ └─────────────┘ │    │ └─────────────┘ │
└─────────────────┘    └─────────────────┘
```
The tradeoff: TCG is ~10-100x slower than KVM, but we get determinism "for free" on things like TLB behavior, without kernel patches.
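Why a software TLB is deterministic can be shown in miniature: it is ordinary code over explicit data structures, so identical inputs always produce identical translations - there is no hidden microarchitectural eviction state. (A toy model of the idea; QEMU's softmmu is far more elaborate.)

```python
def translate(page_table, vaddr, tlb, page_size=4096):
    """Toy softmmu: translate a virtual address through a software TLB.
    A miss triggers a deterministic page-table walk; all state is visible."""
    vpn, offset = divmod(vaddr, page_size)
    if vpn not in tlb:                  # software 'TLB miss'
        tlb[vpn] = page_table[vpn]      # deterministic page-table walk
    return tlb[vpn] * page_size + offset

page_table = {0: 7, 1: 3}               # virtual page -> physical frame
tlb = {}
assert translate(page_table, 4100, tlb) == 3 * 4096 + 4   # miss, then fill
assert translate(page_table, 4100, tlb) == 3 * 4096 + 4   # hit, same answer
```

Because fills and hits are plain dictionary operations, replaying the same access sequence replays the same TLB state - no kernel patches required.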
KVM (Kernel-based Virtual Machine) runs guest code directly on the CPU using hardware virtualization (VT-x/AMD-V). It's fast but:
- Real CPU non-determinism leaks through: TLB behavior, branch prediction, speculative execution timing
- Can't control instruction-level timing: KVM doesn't count instructions
- Hardware RNG accessible: Guest can still execute RDRAND
QEMU TCG mode interprets every guest instruction in software. Slower, but:
- Full control over timing (instruction counting)
- No hardware microarchitecture non-determinism
- Can trap and replace any instruction (RDRAND, RDTSC)
For deterministic simulation testing, correctness trumps performance. Simulating 60 seconds of cluster time in 6 wall-clock seconds (TCG) instead of 0.6 seconds (KVM) is an acceptable cost when it guarantees reproducibility.
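Instruction counting is the key capability TCG adds. The toy interpreter below shows the shape of it: because every "instruction" passes through software, a timer interrupt can be delivered at an exact instruction count, identically on every run. (A sketch of the principle; QEMU's actual mechanism is its icount mode, not this loop.)

```python
def run(program, interrupt_every):
    """Toy TCG-style interpreter: executes one 'instruction' at a time,
    counts them, and delivers a timer interrupt at exact counts - something
    direct execution (KVM) cannot do, since it never sees each instruction."""
    acc, count, interrupts = 0, 0, []
    for instr in program:
        acc += instr                    # 'execute' the instruction
        count += 1
        if count % interrupt_every == 0:
            interrupts.append(count)    # interrupt at a precise virtual cycle
    return acc, interrupts

acc, irqs = run([1, 2, 3, 4, 5, 6], interrupt_every=2)
assert irqs == [2, 4, 6]   # interrupts land at identical points on every run
```

Under KVM, interrupt delivery points drift with real-time hardware behavior; here they are a pure function of the program and the interrupt schedule.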
| Aspect | Traditional Testing | Jepsen | Bloodhound (DST) |
|---|---|---|---|
| Timing bugs | Hard to reproduce | Statistical | Deterministic |
| Coverage | Random/manual | Random with analysis | Systematic exploration |
| Reproduction | "Works on my machine" | Sometimes | Same seed = same bug |
| Detection | Manual assertions | Consistency checks | Automatic property checks |
| Kernel tested | Real | Real | Real (mostly) |
| Speed | Fast | Medium | Slower (TCG overhead) |
When to use Bloodhound:
- Distributed systems with complex failure modes
- Need to reproduce timing-sensitive bugs
- Want systematic exploration of failure scenarios
When NOT to use Bloodhound:
- Low-level lock-free data structures (need SMP)
- Pure performance testing
- Simple unit tests
- FoundationDB Testing - Simulation testing at Apple
- TigerBeetle Simulation
- Jepsen - Distributed systems testing and analysis
- CLAUDE.md - Bloodhound project philosophy