A lightweight Linux platform validation and stress-testing automation framework built in Python. It uses stress-ng to exercise CPU, memory, and disk/IO subsystems, evaluates results against configurable pass/fail thresholds, and produces structured JSON reports with CI-ready exit codes.
This project simulates the type of platform validation and automation work performed by Systems Test Engineers validating SoC platforms (e.g., Qualcomm Snapdragon) in production environments.
- Project Goal
- Architecture
- Module Reference
- Prerequisites
- Installation
- Usage
- Configuration
- How Each Test Works
- Report Format
- Recorded Test Outputs
- Observations and Learnings
- CI/CD Integration
- Extending the Framework
Build a modular, production-quality platform validation framework that:
- Stress-tests core hardware subsystems (CPU, memory, disk/IO) using stress-ng.
- Parses performance metrics (bogo ops/sec) robustly using regex — not fragile string splits.
- Evaluates results against configurable pass/fail thresholds.
- Reports structured JSON results to disk for automated consumption.
- Integrates with CI/CD pipelines via proper exit codes (0 = PASS, 1 = FAIL).
- Collects system hardware and runtime information as part of each report.
- Maintains clean modular architecture — each test, utility, and concern lives in its own module.
The framework mirrors real-world validation workflows where hardware platforms are stress-tested after manufacturing, firmware updates, or kernel changes, and results are evaluated automatically.
linux-platform-validator/
│
├── main.py # CLI entry point — argparse, orchestration, exit codes
├── config.py # Thresholds, worker counts, default parameters
├── report.py # JSON report generation and disk persistence
├── requirements.txt # Dependencies (stress-ng is the only external dep)
├── README.md # This file
│
├── tests/ # Stress test modules (one per subsystem)
│ ├── __init__.py
│ ├── cpu_test.py # CPU stress test → run_cpu_test()
│ ├── memory_test.py # Memory stress test → run_memory_test()
│ └── disk_test.py # Disk/IO stress test → run_disk_test()
│
├── utils/ # Shared utilities
│ ├── __init__.py
│ ├── runner.py # Safe subprocess execution wrapper
│ ├── parser.py # Regex-based stress-ng output parser
│ └── system_info.py # Structured system information collector
│
└── reports/ # Generated at runtime
└── report.json # Latest validation report
CLI args (main.py)
│
├─→ run_cpu_test() ─→ runner.py ─→ stress-ng --cpu ─→ parser.py ─→ result dict
├─→ run_memory_test() ─→ runner.py ─→ stress-ng --vm ─→ parser.py ─→ result dict
├─→ run_disk_test() ─→ runner.py ─→ stress-ng --hdd ─→ parser.py ─→ result dict
│
└─→ generate_report()
├─→ get_system_info()
├─→ Aggregate results + overall PASS/FAIL
├─→ Write reports/report.json
└─→ Return report dict → print to console → exit code
Entry point. Parses CLI arguments with argparse, selects which tests to run, orchestrates execution, invokes the report generator, prints a summary table to the console, and returns a CI-ready exit code via sys.exit().
Central configuration. All thresholds, worker counts, allocation sizes, and report paths are defined here. No magic numbers exist anywhere else in the codebase.
| Parameter | Value | Description |
|---|---|---|
| `CPU_MIN_SCORE` | 1500 | Minimum CPU bogo ops/sec to PASS |
| `MEMORY_MIN_SCORE` | 1000 | Minimum memory bogo ops/sec to PASS |
| `DISK_MIN_SCORE` | 500 | Minimum disk bogo ops/sec to PASS |
| `DEFAULT_DURATION` | 10 | Default test duration (seconds) |
| `CPU_WORKERS` | 4 | Number of CPU stressor workers |
| `VM_WORKERS` | 2 | Number of VM (memory) stressor workers |
| `VM_BYTES` | 70% | Memory allocation per VM worker |
| `HDD_WORKERS` | 2 | Number of HDD stressor workers |
| `HDD_BYTES` | 70% | Write size per HDD worker |
Accepts a list of test result dicts, collects system info, determines overall PASS/FAIL (all tests must pass), writes reports/report.json to disk, and returns the complete report dict.
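That flow can be sketched as follows (system-info collection is elided; in the repo it comes from `utils/system_info.py`, and details such as key ordering may differ):

```python
import json
from datetime import datetime
from pathlib import Path

def generate_report(results: list[dict],
                    report_path: str = "reports/report.json") -> dict:
    """Aggregate test results, persist them as JSON, and return the report dict."""
    report = {
        "timestamp": datetime.now().isoformat(),
        # "system_info": get_system_info(),  # from utils/system_info.py
        "results": results,
        # Overall PASS only if every individual test passed
        "overall_status": "PASS"
        if all(r["status"] == "PASS" for r in results) else "FAIL",
    }
    path = Path(report_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create reports/ on first run
    path.write_text(json.dumps(report, indent=2))
    return report
```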
Runs stress-ng --cpu {workers} --timeout {duration}s --metrics-brief. Parses the cpu stressor line for bogo ops/sec. Compares against CPU_MIN_SCORE.
Runs stress-ng --vm {workers} --vm-bytes {bytes} --timeout {duration}s --metrics-brief. Parses the vm stressor line for bogo ops/sec. Compares against MEMORY_MIN_SCORE.
Runs stress-ng --hdd {workers} --hdd-bytes {bytes} --timeout {duration}s --metrics-brief. Parses the hdd stressor line for bogo ops/sec. Compares against DISK_MIN_SCORE.
Wraps subprocess.run() with:
- try/except for TimeoutExpired and general exceptions
- Structured return dict: { success, stdout, stderr, returncode }
- Debug-level command logging
- Warning-level logging on non-zero exit codes
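A sketch of the wrapper's shape, assuming the return-dict contract above (the exact log messages and exception handling in the repo's runner.py may differ):

```python
import logging
import shlex
import subprocess

log = logging.getLogger(__name__)

def run_command(command: str, timeout: int = 60) -> dict:
    """Execute a command safely, returning a structured result dict."""
    log.debug("Running: %s", command)
    try:
        proc = subprocess.run(
            shlex.split(command),
            capture_output=True, text=True, timeout=timeout,
        )
        if proc.returncode != 0:
            log.warning("Non-zero exit (%d): %s", proc.returncode, command)
        return {"success": proc.returncode == 0, "stdout": proc.stdout,
                "stderr": proc.stderr, "returncode": proc.returncode}
    except subprocess.TimeoutExpired:
        log.warning("Timed out after %ss: %s", timeout, command)
        return {"success": False, "stdout": "", "stderr": "timeout",
                "returncode": -1}
    except OSError as exc:  # e.g. stress-ng binary not found
        log.warning("Failed to launch %s: %s", command, exc)
        return {"success": False, "stdout": "", "stderr": str(exc),
                "returncode": -1}
```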
Generic regex parser for stress-ng --metrics-brief output. Accepts any stressor name (cpu, vm, hdd, etc.) and extracts the bogo-ops/s (real time) column. Handles both stdout and stderr since stress-ng behavior varies by version.
The regex pattern matches:
<stressor> <bogo-ops> <real-time> <usr-time> <sys-time> <bogo-ops/s>
Collects structured hardware and runtime information:
- CPU: model, architecture, core count, threads/core, max MHz (from lscpu)
- Memory: total, free, available, swap (from /proc/meminfo, converted to MB)
- Disk: total, used, available, usage % (from df -h /)
- Uptime: raw seconds and human-readable string (from /proc/uptime)
All functions return Python dicts, not raw command output.
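For example, the /proc/meminfo conversion can be isolated as a pure helper (`parse_meminfo` is an illustrative name; the repo's system_info.py may structure this differently):

```python
def parse_meminfo(text: str) -> dict:
    """Convert the relevant /proc/meminfo fields (reported in kB) to MB strings."""
    fields = {"MemTotal": "total", "MemFree": "free",
              "MemAvailable": "available", "SwapTotal": "swap_total"}
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in fields:
            kb = int(rest.split()[0])          # values look like "8030584 kB"
            info[fields[key]] = f"{kb // 1024} MB"
    return info

# In the collector: parse_meminfo(open("/proc/meminfo").read())
```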
- Python: 3.10+ (uses float | None union type syntax)
- stress-ng: Must be installed and available in $PATH
- Operating System: Linux (tested on Ubuntu)
# Install stress-ng
sudo apt update && sudo apt install -y stress-ng
# Clone the project
git clone <repository-url>
cd linux-platform-validator
# No Python package dependencies required — stdlib only

All commands must be run from the project root directory.
# Run all tests (CPU + Memory + Disk) with 30-second duration
python3 main.py --all --duration 30
# Run only the CPU test
python3 main.py --cpu --duration 30
# Run only the memory test
python3 main.py --memory --duration 15
# Run only the disk/IO test
python3 main.py --disk --duration 15
# Run CPU and memory together
python3 main.py --cpu --memory --duration 20
# Enable debug logging
python3 main.py --all --duration 10 --verbose

| Flag | Description |
|---|---|
| `--cpu` | Run CPU stress test |
| `--memory` | Run memory stress test |
| `--disk` | Run disk/IO stress test |
| `--all` | Run all tests |
| `--duration N` | Test duration in seconds (default: 10) |
| `--verbose`, `-v` | Enable debug-level logging |
If no test flags are provided, the program exits with code 1 and an error message.
Edit config.py to adjust thresholds and parameters for your target platform.
Thresholds determine PASS/FAIL. If a test's bogo ops/sec score is below the threshold, it fails. The overall status is FAIL if any individual test fails.
Threshold values should be calibrated for the specific hardware under test. A VM will produce different scores than bare metal, and a mobile SoC will differ from a server CPU. The recommended approach is:
- Run with --all --duration 60 on your target hardware.
- Note the baseline scores.
- Set thresholds to ~70-80% of baseline to allow for normal variance.
Command: stress-ng --cpu 4 --timeout {duration}s --metrics-brief
Spawns 4 CPU worker processes that exercise a variety of CPU-intensive operations (integer math, floating point, bit manipulation, etc.). The --metrics-brief flag produces a summary table at the end with bogo-ops/sec for each stressor.
What it measures: Raw computational throughput of the CPU under full load across all cores.
Why it matters: Detects CPU defects, thermal throttling, frequency scaling issues, and hypervisor overhead. A score that is significantly lower than expected for a given CPU model indicates a problem.
Command: stress-ng --vm 2 --vm-bytes 70% --timeout {duration}s --metrics-brief
Spawns 2 VM worker processes, each allocating and exercising 70% of available memory through continuous write/read/verify cycles.
What it measures: Memory bandwidth and subsystem throughput under sustained allocation pressure.
Why it matters: Detects faulty memory modules, memory controller issues, NUMA misconfigurations, and kernel memory management regressions. The vm stressor specifically tests mmap, write, read, and verify patterns.
Command: stress-ng --hdd 2 --hdd-bytes 70% --timeout {duration}s --metrics-brief
Spawns 2 HDD worker processes that perform sequential write/read operations to temporary files.
What it measures: Disk I/O throughput for sequential write operations.
Why it matters: Detects storage subsystem degradation, filesystem driver issues, I/O scheduler misconfigurations, and storage controller problems. Particularly important for platforms with eMMC/UFS storage where firmware updates can regress I/O performance.
After every run, a JSON report is written to reports/report.json and printed to the console.
{
"timestamp": "2026-02-20T13:32:17.407676",
"system_info": {
"cpu": {
"model": "Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz",
"architecture": "x86_64",
"cores": "4",
"threads_per_core": "2",
"max_mhz": "3300.0000"
},
"memory": {
"total": "7842 MB",
"free": "2160 MB",
"available": "3305 MB",
"swap_total": "4095 MB"
},
"disk": {
"total": "457G",
"used": "12G",
"available": "422G",
"use_pct": "3%"
},
"uptime": {
"seconds": 3446.37,
"human": "0h 57m"
}
},
"results": [
{
"test": "cpu",
"score": 1936.67,
"unit": "bogo ops/sec",
"threshold": 1500,
"status": "PASS"
},
{
"test": "memory",
"score": 92566.69,
"unit": "bogo ops/sec",
"threshold": 1000,
"status": "PASS"
},
{
"test": "disk",
"score": 21296.91,
"unit": "bogo ops/sec",
"threshold": 500,
"status": "PASS"
}
],
"overall_status": "PASS"
}

Each test result contains:

- test: Subsystem name
- score: Measured bogo ops/sec
- unit: Always "bogo ops/sec"
- threshold: The configured minimum score for PASS
- status: "PASS" or "FAIL"
overall_status is "PASS" only if every individual test passed.
The following are actual recorded runs on the development machine.
- CPU: Intel Core i5-5287U @ 2.90GHz (4 cores, 2 threads/core, turbo to 3.3 GHz)
- RAM: 7,842 MB total
- Disk: 457 GB total, 3% used
- OS: Ubuntu Linux (running in a VM)
$ python3 main.py --all --duration 5
=== Test Summary ===
cpu 1954.74 bogo ops/sec [FAIL] (threshold: 3000)
memory 5796.68 bogo ops/sec [PASS] (threshold: 1000)
Overall status: FAIL
Exit code: 1
Observation: CPU scored 1,954 against the initial threshold of 3,000 — a FAIL. The 5-second duration was too short for the CPU stressor to fully stabilize. Memory passed comfortably at 5.8x the threshold.
Action taken: Lowered CPU_MIN_SCORE from 3,000 to 1,500 to match the hardware class (5th-gen mobile i5 in a VM).
$ python3 main.py --cpu --duration 30
=== Test Summary ===
cpu 2268.97 bogo ops/sec [PASS] (threshold: 1500)
Overall status: PASS
Exit code: 0
Observation: With 30 seconds, the CPU score rose ~16% to 2,268.97 and comfortably passed the adjusted threshold. Longer duration allows workers to fully saturate cores and reach steady-state throughput.
$ python3 main.py --all --duration 30
=== Test Summary ===
cpu 1936.67 bogo ops/sec [PASS] (threshold: 1500)
memory 92566.69 bogo ops/sec [PASS] (threshold: 1000)
disk 21296.91 bogo ops/sec [PASS] (threshold: 500)
Overall status: PASS
Exit code: 0
Observation: All three subsystems passed. CPU scored slightly lower (1,936 vs 2,268 in isolation) because the tests ran sequentially and the system had already been under load. Memory scored dramatically higher at 30s (92,566 vs 5,796 at 5s) — the vm stressor benefits significantly from longer runtimes. Disk scored 21,296, which is 42x the threshold.
| Test | 5s Run | 30s Run (isolated) | 30s Run (combined) |
|---|---|---|---|
| CPU | 1,954.74 | 2,268.97 | 1,936.67 |
| Memory | 5,796.68 | — | 92,566.69 |
| Disk | — | — | 21,296.91 |
Short stress tests (5s) produce noisy, lower scores. Workers need time to initialize, allocate resources, and reach steady-state. A 30-second minimum is recommended for reliable measurements. For production validation, 60 seconds or more is standard.
The --metrics-brief output may appear on stdout or stderr depending on the stress-ng version. The framework handles this by combining both streams before parsing. This was a key bug fix — the original codebase only checked result.stdout, which would silently fail on newer stress-ng versions.
The original code used fragile column-index string splits to parse stress-ng output. Different versions of stress-ng may produce slightly different column spacing, extra columns, or variant stressor names (e.g., vm vs vm-rw). The regex pattern {stressor}[\w-]*\s+\d+\s+... handles all these variants robustly.
The initial CPU threshold of 3,000 was appropriate for a modern desktop CPU on bare metal, but too aggressive for a 5th-gen mobile i5 running inside a VM. Thresholds should be baselined per hardware platform. In a real SoC validation environment, each platform SKU would have its own threshold profile.
The VM environment introduces hypervisor scheduling overhead, which reduces CPU-bound throughput by roughly 20-40% compared to bare metal. This is visible in the CPU scores (1,900-2,200) which are lower than what a bare-metal i5-5287U would produce (~3,000+).
The vm stressor showed a 16x improvement from 5s to 30s (5,796 → 92,566). This is because at 5s, the stressor is still in its allocation and initialization phase. At 30s, it reaches steady-state mmap/write/verify cycling. Memory tests should always use ≥30s duration.
When running all tests together, the CPU score was slightly lower than when run in isolation (1,936 vs 2,268). Even though tests run sequentially (not in parallel), system state from prior tests (warm caches, memory fragmentation, kernel state) can influence subsequent results.
Raw console output is not machine-parseable. By producing a structured JSON report with a deterministic schema, the framework can be consumed by CI pipelines, test management systems, and automated alerting without any output scraping. The exit code provides immediate pass/fail signaling.
Each test runs in its own try/except boundary via utils/runner.py. If one stressor fails (e.g., stress-ng is not installed, or the disk is full), it returns a score of 0.0 and FAIL status without crashing the entire framework. Other tests continue to execute.
Replacing print() with Python's logging module provides:
- Timestamped output for correlating with system events
- Log levels (DEBUG, INFO, WARNING, ERROR) for filtering
- Module-namespaced loggers for tracing which component produced each message
- `--verbose` flag support without code changes
The framework is designed for integration into CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, etc.).
| Exit Code | Meaning |
|---|---|
| 0 | All selected tests PASSED |
| 1 | One or more tests FAILED, or no tests selected |
- name: Run platform validation
run: python3 main.py --all --duration 30
- name: Upload report
if: always()
uses: actions/upload-artifact@v4
with:
name: platform-report
path: reports/report.json

stage('Platform Validation') {
sh 'python3 main.py --all --duration 30'
}
post {
always {
archiveArtifacts artifacts: 'reports/report.json'
}
}

The non-zero exit code on failure will automatically fail the CI stage, preventing promotion of a platform image that doesn't meet performance thresholds.
To add a new stress test (e.g., network):
1. Add threshold and parameters in config.py:

   NETWORK_MIN_SCORE = 100

2. Create the test module at tests/network_test.py:

   from config import NETWORK_MIN_SCORE
   from utils.parser import parse_stress_ng_output
   from utils.runner import run_command

   def run_network_test(duration: int = 10) -> dict:
       command = f"stress-ng --sock 2 --timeout {duration}s --metrics-brief"
       result = run_command(command, timeout=duration + 30)
       combined_output = result["stdout"] + "\n" + result["stderr"]
       score = parse_stress_ng_output(combined_output, stressor="sock")
       status = "PASS" if score and score >= NETWORK_MIN_SCORE else "FAIL"
       return {
           "test": "network",
           "score": score or 0.0,
           "unit": "bogo ops/sec",
           "threshold": NETWORK_MIN_SCORE,
           "status": status,
       }

3. Wire into main.py: add a --network argument and call run_network_test().
The parser (utils/parser.py) is generic — it works with any stress-ng stressor name. No changes needed there.
This project is provided as-is for educational and portfolio purposes.