OnyxMesh Testing Guide

This guide covers testing OnyxMesh networks using simulation, containers, and real hardware.


Table of Contents

  1. Unit and Integration Tests
  2. Simulation Testing
  3. Container-Based Testing
  4. Real Hardware Testing
  5. Security Audit Testing
  6. Performance Benchmarking

Unit and Integration Tests

Run All Tests

# Full test suite with race detection
make test

# Unit tests only (faster)
make test-unit

# Integration tests
make test-integration

# Simulation tests
make test-simulation

# Generate coverage report
make test-coverage
open coverage.html  # macOS; on Linux use: xdg-open coverage.html

Fuzz Testing

# Run built-in fuzz tests (30 seconds each)
make fuzz

# Run extended wire format fuzzing (1 minute each)
make fuzz-wire

# Extended fuzzing session (recommended before release)
go test -fuzz=FuzzPulseFrame -fuzztime=10m ./test/fuzz/
go test -fuzz=FuzzPacketHeader -fuzztime=10m ./test/fuzz/
go test -fuzz=FuzzHandshakeAnnounce -fuzztime=10m ./test/fuzz/
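
The three extended runs above can be scripted as a loop. The sketch below uses only the target names and package path already shown. The `run_fuzz_targets` helper and the `GO_CMD` override (handy for a dry run with `echo`) are hypothetical names introduced for illustration. `go test -fuzz` refuses to run when its pattern matches more than one fuzz test, so each target is passed as an anchored pattern.

```shell
# Run each fuzz target in turn and stop at the first failure.
# run_fuzz_targets and GO_CMD are illustrative names, not part of OnyxMesh.
run_fuzz_targets() {
    duration="$1"; shift
    for target in "$@"; do
        # -run='^$' skips ordinary unit tests; the anchored -fuzz pattern selects one target.
        "${GO_CMD:-go}" test -run='^$' -fuzz="^${target}\$" \
            -fuzztime="$duration" ./test/fuzz/ || return 1
    done
}

# Usage:
#   run_fuzz_targets 10m FuzzPulseFrame FuzzPacketHeader FuzzHandshakeAnnounce
```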

Simulation Testing

The onyx-sim tool provides a complete network simulation environment without requiring real hardware.

Quick Start

# Build the simulator
make build

# Create a 5-node mesh network
./bin/onyx-sim create --nodes 5 --topology mesh
# Output includes simulation ID, e.g., sim-1770596481378879623

# Run simulation for 5 minutes (use the ID from create output)
./bin/onyx-sim run --id sim-1770596481378879623 --duration 5m
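
To avoid copying the ID by hand, the output of `create` can be filtered for it. This is a minimal sketch that assumes the ID always appears in the `sim-<digits>` form shown above; the exact output format of `onyx-sim create` is not documented here, and `extract_sim_id` is a hypothetical helper.

```shell
# Print the first simulation ID (sim-<digits>) found on stdin.
# Assumes the ID format shown in the example above.
extract_sim_id() {
    grep -oE 'sim-[0-9]+' | head -n 1
}

# Usage:
#   SIM_ID="$(./bin/onyx-sim create --nodes 5 --topology mesh | extract_sim_id)"
#   ./bin/onyx-sim run --id "$SIM_ID" --duration 5m
```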

Simulation Scenarios

Basic Mesh Formation

# Create network with different topologies
./bin/onyx-sim create --nodes 10 --topology mesh    # Fully connected mesh
./bin/onyx-sim create --nodes 10 --topology ring    # Ring topology
./bin/onyx-sim create --nodes 10 --topology star    # Star topology
./bin/onyx-sim create --nodes 10 --topology random  # Random connections

# Start simulation (use --id from create output)
./bin/onyx-sim run --id <sim-id> --duration 10m --metrics

Network Partition Testing

# Create a network and simulate partition
./bin/onyx-sim create --nodes 10 --topology mesh
# Note the simulation ID from output, e.g., sim-xxx

# Run simulation
./bin/onyx-sim run --id <sim-id> --duration 2m

# Partition the network (nodes 0-4 vs 5-9)
./bin/onyx-sim partition --id <sim-id> --groups "0,1,2,3,4" "5,6,7,8,9"

# Keep running while partitioned and observe partition detection
./bin/onyx-sim run --id <sim-id> --duration 2m

# Heal the partition
./bin/onyx-sim heal --id <sim-id>

# Verify recovery
./bin/onyx-sim run --id <sim-id> --duration 2m --verify-connectivity
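
For other network sizes, the two `--groups` arguments can be generated rather than typed out. The sketch below splits node indices 0..N-1 into two halves, matching the 0-4 / 5-9 split used above. `split_in_half` is a hypothetical helper, not an onyx-sim feature.

```shell
# Print node indices 0..N-1 as two comma-separated groups, one group per line.
# Requires N >= 2. split_in_half is an illustrative helper.
split_in_half() {
    n="$1"
    [ "$n" -ge 2 ] || return 1
    half=$((n / 2))
    seq 0 $((half - 1)) | paste -s -d, -
    seq "$half" $((n - 1)) | paste -s -d, -
}

# Usage:
#   groups="$(split_in_half 10)"
#   ./bin/onyx-sim partition --id "$SIM_ID" --groups \
#       "$(echo "$groups" | sed -n 1p)" "$(echo "$groups" | sed -n 2p)"
```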

Chaos Testing

# Create the network
./bin/onyx-sim create --nodes 20 --topology mesh
# Note the simulation ID from output

# Run with chaos mode enabled (random link failures, latency spikes, node failures)
./bin/onyx-sim run --id <sim-id> --duration 10m --chaos \
    --chaos-link-failure-rate 0.1 \
    --chaos-latency-spike-rate 0.05 \
    --chaos-node-failure-rate 0.02

Device Class Simulation

# Simulate heterogeneous fleet
./bin/onyx-sim create --nodes 20 --topology mesh \
    --device-classes "EMBEDDED:8,STANDARD:6,POWERFUL:4,COMPUTE:2"
# Note the simulation ID from output

# Run with device-aware routing
./bin/onyx-sim run --id <sim-id> --duration 10m --verify-routing
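
In the example above the per-class counts add up to the `--nodes` value (8 + 6 + 4 + 2 = 20). Whether onyx-sim enforces this is not stated here, so a quick check before calling `create` may save a failed run. `device_class_total` is a hypothetical helper.

```shell
# Sum the counts in a "CLASS:count,CLASS:count" spec.
device_class_total() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F: '{ total += $2 } END { print total + 0 }'
}

# Usage:
#   spec="EMBEDDED:8,STANDARD:6,POWERFUL:4,COMPUTE:2"
#   [ "$(device_class_total "$spec")" -eq 20 ] || echo "spec does not match --nodes" >&2
```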

Simulation Metrics

# Enable metrics collection (use --id from create output)
./bin/onyx-sim run --id <sim-id> --duration 5m --metrics --metrics-file sim-metrics.json

# View real-time metrics
./bin/onyx-sim run --id <sim-id> --duration 5m --metrics --metrics-interval 10s

# Watch metrics in real time (in another terminal)
./bin/onyx-sim metrics --watch

Metrics collected:

  • Message delivery rate
  • Average latency
  • Route convergence time
  • Partition detection time
  • Recovery time after partition heal

Viewing Simulation Logs

# View recent events (hides tick events by default)
./bin/onyx-sim logs --tail 50

# Follow logs in real time
./bin/onyx-sim logs --follow

# Verbose mode: show message, discovery, and routing events
./bin/onyx-sim logs --verbose --follow

# Filter by event type
./bin/onyx-sim logs --type message_sent --verbose
./bin/onyx-sim logs --type route_added --verbose
./bin/onyx-sim logs --type discovery_pulse --verbose

# Show all events including tick (noisy)
./bin/onyx-sim logs --no-tick=false

Event types:

  • Standard events (always visible):

    • link_failure, node_slowdown, node_failure
    • partition_created, partition_healed
    • packet_spike
  • Verbose events (use --verbose to see):

    • message_sent, message_received, message_dropped
    • discovery_pulse, neighbor_discovered, neighbor_lost
    • route_added, route_updated, route_removed
    • handshake_start, handshake_complete

Color coding in terminal:

  • Green: message sent/received
  • Red: message dropped, failures
  • Cyan: discovery events
  • Yellow: routing events
  • Magenta: partition events
  • Blue: handshake events

Container-Based Testing

Container-based testing provides isolated, reproducible environments that closely match production deployments. Two pre-built 5-node clusters are available: an open-mode cluster for general testing and a restricted-mode cluster for testing quorum-based admission.

Prerequisites

# Install Podman and podman-compose (rootless container runtime)
sudo pacman -S podman podman-compose  # Arch/Manjaro
# or
sudo apt install podman podman-compose  # Debian/Ubuntu

# Verify installation
podman --version

Open-Mode Cluster

A 5-node heterogeneous mesh on an IPv6-enabled bridge network (172.28.0.0/24). Nodes discover each other via Echo-Pulse multicast on ff02::1.

Container    Device Class  Role
onyx-node-1  POWERFUL      Aggregation hub
onyx-node-2  STANDARD      Mesh relay
onyx-node-3  EMBEDDED      Leaf sensor
onyx-node-4  STANDARD      Mesh relay
onyx-node-5  COMPUTE       Fleet coordinator

# Build containers and start the cluster
make mesh-up

# Rebuild only onyxd and restart (fast iteration)
make mesh-quick

# Check cluster health
make mesh-status

# Follow logs from all nodes
make mesh-logs

# Live packet capture on a node (tcpdump-like)
make mesh-traffic NODE=2

# Interactive TUI traffic monitor (iptraf-like, with topology diagram)
make mesh-traffic-ui NODE=1

# Traffic stats for a node
podman exec onyx-node-1 onyxd stats

# Data-plane flood test
make mesh-flood FLOOD_RATE=5000 FLOOD_SIZE=256 FLOOD_DUR=10s

# Chaos/stress test (partition, heal, flood, RPC throughput)
make mesh-stress

# Stop cluster and remove volumes
make mesh-down

Restricted-Mode Cluster

A separate 5-node cluster (172.29.0.0/24) for testing quorum-based admission. Seed nodes (1-3) bootstrap with genesis certificates, then joining nodes (4-5) go through the admission protocol requiring 2/3 quorum approval.

# Start all 5 nodes with staged admission (seeds first, then joiners)
make mesh-admission-up

# Follow logs
make mesh-admission-logs

# Live packet capture / TUI on a restricted node
make mesh-admission-traffic NODE=1
make mesh-admission-traffic-ui NODE=1

# Data-plane flood on restricted cluster
make mesh-admission-flood FLOOD_RATE=5000 FLOOD_SIZE=256 FLOOD_DUR=10s

# Chaos/stress test (expects pre-running cluster)
make mesh-admission-stress

# Full end-to-end admission test (seed + join + verify + cleanup)
make mesh-admission

# Stop restricted-mode cluster
make mesh-admission-down

Container Network Fault Injection

# Simulate network partition (disconnect a node from the bridge)
podman network disconnect onyxnet onyx-node-2

# Wait for partition detection
sleep 30

# Observe partition detection (onyx-node-2 should no longer be listed)
podman exec onyx-node-1 onyxd neighbors
podman exec onyx-node-3 onyxd neighbors

# Reconnect
podman network connect onyxnet onyx-node-2

# Verify recovery
podman exec onyx-node-1 onyxd neighbors

The mesh-stress and mesh-admission-stress targets automate this: they disconnect a node, verify the partition, reconnect, verify recovery, then run flood and RPC throughput tests.
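
For manual runs, the fixed `sleep 30` can be replaced by polling until the condition holds, which also gives a rough detection time. The sketch below is a generic retry loop. `wait_until` and `node_2_gone` are hypothetical names, and the check assumes `onyxd neighbors` prints one line per neighbor containing its address, which is not confirmed here.

```shell
# Retry a command every INTERVAL seconds until it succeeds or TIMEOUT seconds pass.
# wait_until is an illustrative helper.
wait_until() {
    timeout="$1"; interval="$2"; shift 2
    elapsed=0
    while ! "$@"; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}

# Usage (NODE_2_ADDR must be set to the mesh address of onyx-node-2):
#   node_2_gone() { ! podman exec onyx-node-1 onyxd neighbors | grep -q "$NODE_2_ADDR"; }
#   wait_until 60 2 node_2_gone && echo "partition detected"
```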

Container-Based Security Audit

# Quick audit against container cluster
make forge-quick

# Standard audit (1 hour)
make forge-standard

# Full audit (8 hours, recommended before production)
make forge-full

Real Hardware Testing

Supported Hardware

Device                 Device Class  RAM     Notes
Raspberry Pi Zero 2 W  EMBEDDED      512 MB  Leaf nodes, sensors
Raspberry Pi 3B/3B+    STANDARD      1 GB    General mesh nodes
Raspberry Pi 4B        POWERFUL      2-8 GB  Aggregation hubs
Raspberry Pi 5         POWERFUL      4-8 GB  High-performance hubs
Jetson Orin Nano       COMPUTE       8 GB    AI inference nodes
Jetson Orin NX         COMPUTE       16 GB   Fleet coordinator
BeagleBone AI-64       POWERFUL      4 GB    Industrial applications

Flashing Yocto Images

# Build Yocto image for RPi 4
cd yocto
source oe-init-build-env
MACHINE=raspberrypi4-64 bitbake onyxmesh-image-standard

# Flash to SD card
sudo dd if=tmp/deploy/images/raspberrypi4-64/onyxmesh-image-standard.wic \
    of=/dev/sdX bs=4M status=progress
sync
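
Because `dd` overwrites its target without asking, a guard in front of it can be worthwhile. The sketch below is Linux-only and assumes `/proc/mounts` lists mounted partitions by device path. `check_flash_target` is a hypothetical helper, and `/dev/sdX` remains a placeholder for the real SD card device.

```shell
# Succeed only if TARGET is a block device with no mounted partitions (Linux only).
check_flash_target() {
    target="$1"
    [ -b "$target" ] || { echo "not a block device: $target" >&2; return 1; }
    if grep -q "^${target}" /proc/mounts 2>/dev/null; then
        echo "refusing: ${target} has mounted partitions" >&2
        return 1
    fi
}

# Usage:
#   check_flash_target /dev/sdX && sudo dd \
#       if=tmp/deploy/images/raspberrypi4-64/onyxmesh-image-standard.wic \
#       of=/dev/sdX bs=4M status=progress
```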

Manual Binary Installation

For testing on existing Linux installations:

# Cross-compile for target architecture
# ARM64 (RPi 4/5, Jetson)
make build-arm64

# ARMv7 (RPi 3, BeagleBone)
make build-armv7

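# Copy to device (stage in /tmp; the pi user normally cannot write to /usr/local/bin or /etc)
scp bin/onyxd-arm64 bin/onyxctl-arm64 configs/onyxd.toml pi@raspberrypi:/tmp/

# On the device, install the files and start the daemon
ssh pi@raspberrypi
sudo install -m 755 /tmp/onyxd-arm64 /usr/local/bin/onyxd
sudo install -m 755 /tmp/onyxctl-arm64 /usr/local/bin/onyxctl
sudo install -D -m 644 /tmp/onyxd.toml /etc/onyxmesh/onyxd.toml
sudo systemctl enable --now onyxd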

Hardware Test Network Setup

Minimal 3-Node Test (WiFi)

Required hardware:

  • 3x Raspberry Pi (any model)
  • WiFi network or ad-hoc mode

# On each Pi, configure WiFi
sudo nmcli dev wifi connect "TestNetwork" password "password123"

# Edit configuration
sudo nano /etc/onyxmesh/onyxd.toml

Node 1 (POWERFUL - RPi 4):

[identity]
fleet_id = "test-fleet-001"
device_class = "POWERFUL"

[discovery]
pulse_interval = "2s"

[link.wifi]
enabled = true
interface = "wlan0"

Node 2 (STANDARD - RPi 3):

[identity]
fleet_id = "test-fleet-001"
device_class = "STANDARD"

[discovery]
pulse_interval = "2s"

[link.wifi]
enabled = true
interface = "wlan0"

Node 3 (EMBEDDED - RPi Zero 2 W):

[identity]
fleet_id = "test-fleet-001"
device_class = "EMBEDDED"

[discovery]
pulse_interval = "3s"  # Slower for embedded

[link.wifi]
enabled = true
interface = "wlan0"

# Start daemon on each node
sudo systemctl start onyxd

# Verify mesh formation (on any node)
onyxctl status
onyxctl neighbors
onyxctl routes
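
The three node configurations above differ only in `device_class` and `pulse_interval`, so they can be generated from one template. This is a minimal sketch using only the keys shown above. `render_node_config` is a hypothetical helper, and a real deployment may need more sections than these.

```shell
# Print an onyxd.toml for one node of the 3-node WiFi test.
# render_node_config is an illustrative helper; the keys are the ones shown above.
render_node_config() {
    device_class="$1"
    pulse_interval="$2"
    cat <<EOF
[identity]
fleet_id = "test-fleet-001"
device_class = "${device_class}"

[discovery]
pulse_interval = "${pulse_interval}"

[link.wifi]
enabled = true
interface = "wlan0"
EOF
}

# Usage (on the RPi Zero 2 W):
#   render_node_config EMBEDDED 3s | sudo tee /etc/onyxmesh/onyxd.toml
```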

Mixed-Link Test (WiFi + Serial)

For testing multi-adapter scenarios:

# Connect two Pis via serial (UART)
# Pi 1 TX -> Pi 2 RX
# Pi 1 RX -> Pi 2 TX
# Pi 1 GND -> Pi 2 GND

Configuration with serial link:

[link.wifi]
enabled = true
interface = "wlan0"

[link.serial]
enabled = true
device = "/dev/ttyAMA0"
baud_rate = 115200

LoRa Link Test

For long-range testing with LoRa modules:

# Hardware: SX1276/SX1278 LoRa module connected via SPI
[link.lora]
enabled = true
spi_device = "/dev/spidev0.0"
frequency = 915000000  # 915 MHz (US), 868 MHz (EU)
spreading_factor = 7
bandwidth = 125000
coding_rate = 5
tx_power = 17
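
As a rough guide to what these radio settings mean for throughput, the nominal LoRa bit rate is SF × (BW / 2^SF) × code rate. The sketch below assumes that `coding_rate = 5` denotes a 4/5 code rate, which is a common convention but is not confirmed for OnyxMesh. `lora_bitrate` is a hypothetical helper.

```shell
# Nominal LoRa bit rate in bits/s: SF * (BW / 2^SF) * (4 / CR_DENOMINATOR).
# Assumes coding_rate = 5 means a 4/5 code rate.
lora_bitrate() {
    awk -v sf="$1" -v bw="$2" -v cr="$3" \
        'BEGIN { printf "%.2f\n", sf * (bw / (2 ^ sf)) * (4 / cr) }'
}

# Usage:
#   lora_bitrate 7 125000 5    # prints 5468.75
```

With the same bandwidth and code rate, raising `spreading_factor` to 12 drops the nominal rate to about 293 bits/s, so test message sizes and rates need to be chosen accordingly.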

Hardware Test Scenarios

Scenario 1: Mesh Formation

# Start all nodes
# On coordinator (POWERFUL node)
onyxctl fleet status

# Expected output:
# Fleet: test-fleet-001
# Nodes: 3
# - fc00::1234 (POWERFUL) - online
# - fc00::5678 (STANDARD) - online
# - fc00::9abc (EMBEDDED) - online

Scenario 2: Message Routing

# On node 3, subscribe first so the subscriber is listening when the message is published
onyxctl sub onyx.test.hello

# On node 1, publish message
onyxctl pub onyx.test.hello --data '{"msg":"Hello from node 1"}'

# Verify multi-hop routing
onyxctl routes show fc00::9abc

Scenario 3: Link Failure Recovery

# Disable WiFi on middle node (simulating failure)
ssh pi@node2 "sudo ip link set wlan0 down"

# On node 1, check routing
onyxctl neighbors  # Node 2 should disappear
onyxctl routes     # Routes should update

# Verify messages still reach node 3 (if alternate path exists)
onyxctl pub onyx.test.failover --data '{"test":"failover"}'

# Re-enable WiFi
ssh pi@node2 "sudo ip link set wlan0 up"

# Verify recovery
onyxctl neighbors

Scenario 4: Partition and Recovery

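# Create partition by isolating node 3. Run these on node 3's console (or over a
# management link outside 192.168.1.0/24): if SSH reaches node 3 over that subnet,
# the DROP rules cut the SSH session as well.
sudo iptables -A INPUT -s 192.168.1.0/24 -j DROP
sudo iptables -A OUTPUT -d 192.168.1.0/24 -j DROP

# Monitor on node 1
watch -n 1 onyxctl neighbors

# Wait for partition detection (30-60 seconds)

# Heal partition on node 3 by deleting only the rules added above
sudo iptables -D INPUT -s 192.168.1.0/24 -j DROP
sudo iptables -D OUTPUT -d 192.168.1.0/24 -j DROP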

# Verify recovery
onyxctl fleet status

Scenario 5: Container Deployment

# On COMPUTE or POWERFUL node
onyxctl swarm skills list

# Deploy a container
onyxctl swarm deploy my-skill:latest --target fc00::1234

# Check deployment
onyxctl swarm status

# Migrate workload
onyxctl swarm migrate <container-id> --to fc00::5678

Hardware Performance Testing

# Run benchmarks on device
go test -bench=. -benchmem ./pkg/crypto/...

# Expected results (RPi 4):
# BenchmarkMLDSASign-4         150     8.2ms/op
# BenchmarkMLDSAVerify-4       180     6.5ms/op
# BenchmarkAESGCMEncrypt-4    5000   245us/op
# BenchmarkSHA3256-4          8000   148us/op

Security Audit Testing

Quick Audit (5 minutes)

# Against local daemon
make forge-quick

# Against remote fleet
./bin/onyx-forge run --profile forge-quick \
    --target 192.168.1.10:9100,192.168.1.11:9100

Standard Audit (1 hour)

make forge-standard

# Or with custom options
./bin/onyx-forge run --profile forge-standard \
    --modules bruteforce,fuzzer,replay,timing,dos \
    --output forge-results.json

Full Audit (8 hours)

make forge-full

# Recommended before production deployment
./bin/onyx-forge run --profile forge-full \
    --target production-fleet \
    --output forge-full-results.json \
    --html-report forge-full-report.html

Continuous Monitoring

# Enable in onyxd.toml
[forge]
enabled = true
profile = "forge-continuous"

# Or run standalone
./bin/onyx-forge run --profile forge-continuous \
    --alert-webhook https://alerts.example.com/webhook

Interpreting Results

# Check for failures
./bin/onyx-forge report check forge-results.json --fail-on high

# Generate HTML report
./bin/onyx-forge report --format html --output report.html forge-results.json

# View summary
./bin/onyx-forge report summary forge-results.json

Expected passing criteria:

  • Key entropy: > 250 bits of effective entropy
  • Nonce uniqueness: 0 collisions in 10,000 samples
  • Fuzzer crashes: 0
  • Replay success rate: 0%
  • Timing variance: < 5% for crypto operations
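
The criteria above can be expressed as a single pass/fail check, for example in a CI step. The schema of `forge-results.json` is not documented here, so this sketch takes the five values as arguments and leaves extracting them to the reader. `check_forge_criteria` is a hypothetical helper.

```shell
# Check audit numbers against the passing criteria listed above.
# Arguments: entropy_bits nonce_collisions fuzzer_crashes replay_success_pct timing_variance_pct
check_forge_criteria() {
    awk -v e="$1" -v n="$2" -v c="$3" -v r="$4" -v t="$5" 'BEGIN {
        ok = (e > 250) && (n == 0) && (c == 0) && (r == 0) && (t < 5)
        exit ok ? 0 : 1
    }'
}

# Usage:
#   check_forge_criteria 256 0 0 0 3.2 && echo "audit criteria met"
```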

Performance Benchmarking

Crypto Benchmarks

# Run crypto benchmarks
make bench

# Extended benchmarks
go test -bench=. -benchmem -benchtime=10s ./pkg/crypto/...
go test -bench=. -benchmem -benchtime=10s ./test/bench/...

Routing Benchmarks

go test -bench=BenchmarkRouteCompute -benchmem ./test/bench/...
go test -bench=BenchmarkRouteLookup -benchmem ./test/bench/...

Network Throughput

# Using simulation
./bin/onyx-sim create --nodes 10 --topology mesh
# Note the simulation ID from output

./bin/onyx-sim bench --id <sim-id> --duration 60s --message-size 1024 --rate 1000

# Example output:
# Throughput: 850 msg/s
# Latency p50: 12ms
# Latency p99: 45ms
# Delivery rate: 99.2%
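
To turn a benchmark run into a pass/fail gate, the delivery rate can be parsed from the output and compared with a threshold. This sketch assumes the `Delivery rate: <number>%` line format shown above. The threshold is the caller's choice, not an OnyxMesh target, and `delivery_rate_at_least` is a hypothetical helper.

```shell
# Read bench output on stdin; succeed only if "Delivery rate" is at least MIN percent.
# Assumes the output format shown above.
delivery_rate_at_least() {
    awk -v min="$1" -F': *' '
        /Delivery rate/ { gsub(/%/, "", $2); rate = $2 + 0; found = 1 }
        END { exit (found && rate >= min) ? 0 : 1 }'
}

# Usage:
#   ./bin/onyx-sim bench --id "$SIM_ID" --duration 60s --message-size 1024 --rate 1000 \
#       | delivery_rate_at_least 99 || echo "delivery rate below threshold" >&2
```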

Memory Profiling

# Run with memory profiling
go test -bench=. -memprofile=mem.prof ./pkg/crypto/...
go tool pprof mem.prof

# Check RSS during operation (pgrep -o picks the oldest onyxd process if several match)
./bin/onyxd --config configs/onyxd.toml &
watch -n 1 "ps -o rss= -p $(pgrep -o onyxd) | awk '{print \$1/1024 \" MB\"}'"

Troubleshooting

Common Issues

Mesh not forming:

# Check network connectivity
ping -c 3 <other-node-ip>

# Check firewall
sudo iptables -L -n

# Check daemon logs
journalctl -u onyxd -f

High latency:

# Check link quality
onyxctl neighbors --detailed

# Check route path
onyxctl routes trace <destination-address>

Container deployment fails:

# Check Podman
podman info

# Check image availability
onyxctl swarm images

# Check target node resources
onyxctl fleet resources <target-address>

Debug Mode

# Run daemon in debug mode
ONYX_LOG_LEVEL=debug ./bin/onyxd --config configs/onyxd.toml

# Live packet capture (tcpdump-like)
onyxd traffic

# Filtered packet capture (by layer, type, source, destination)
onyxd traffic -layer routing -src fc00:a1b2

# Interactive TUI with layer stats, top talkers, and topology diagram
onyxd traffic -ui

# Traffic counters
onyxd stats
onyxd stats -json

Test Checklist

Before deployment, verify:

  • Unit tests pass: make test-unit
  • Integration tests pass: make test-integration
  • Fuzz tests pass: make fuzz
  • Simulation tests pass: make test-simulation
  • Container cluster forms mesh correctly
  • Hardware nodes discover each other
  • Messages route across multiple hops
  • Partition detection and recovery works
  • OnyxForge quick audit passes
  • Performance meets targets for device class
  • Memory usage stays within limits
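
The automated items in this checklist can be chained so that the first failure stops the run. The sketch below is a generic step runner. `run_steps` is a hypothetical helper, and the manual items (hardware discovery, multi-hop routing, memory limits) still need to be verified by hand.

```shell
# Run each command string in order; stop and report at the first failure.
run_steps() {
    for step in "$@"; do
        echo "==> $step"
        sh -c "$step" || { echo "FAILED: $step" >&2; return 1; }
    done
}

# Usage:
#   run_steps "make test-unit" "make test-integration" "make fuzz" \
#             "make test-simulation" "make forge-quick"
```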