# OpenVINO NPU Inference Benchmark Suite

Unlock the full potential of your AI PC. A professional-grade benchmarking framework designed to validate, test, and optimize AI inference performance on Intel NPU (Neural Processing Unit), CPU, and GPU.
## Table of Contents

- Overview
- Key Features
- Demo Results
- Supported Models
- Installation
- User Guide
- CLI Reference
- Hardware Support
- Troubleshooting
- Contributing
- License
## Overview

The OpenVINO NPU Inference Benchmark Suite is a comprehensive tool for developers, researchers, and hardware enthusiasts to measure the AI inference capability of their systems. It specifically targets Intel Core Ultra processors with integrated NPUs, providing deep insights into latency, throughput, and efficiency gains compared to traditional CPU inference.
Modern AI workloads (Generative AI, Computer Vision, LLMs) require specialized hardware. The NPU is a dedicated accelerator for these tasks, but measuring its real-world performance can be complex. This suite simplifies the process, offering:
- Direct Comparisons: CPU vs. GPU vs. NPU side-by-side.
- Optimized Pipelines: Automated conversion of PyTorch/ONNX models to NPU-friendly OpenVINO IR formats.
- Visual Analytics: Interactive web dashboards and professional HTML reports.
## Key Features

- **Multi-Device Support**: Seamlessly benchmark across CPU, Intel Arc/Integrated GPU, and Intel AI Boost NPU.
- **Interactive Dashboard**: A glassmorphic web UI to run tests and visualize real-time performance.
- **Professional Reports**: Generate detailed HTML reports with hardware specs and speedup metrics.
- **Model Zoo**: Curated collection of industry-standard models (ResNet, YOLO, BERT) pre-configured for the NPU.
- **Advanced Optimization**:
  - **INT8 Quantization**: Compress models for faster NPU inference with negligible accuracy loss.
  - **Static Shape Enforcement**: Automatically handles NPU-specific input requirements.
  - **Batch Sweeps**: Find the optimal batch size for maximum throughput.
- **Python API & CLI**: Flexible usage for both researchers (Python) and quick testers (CLI).
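The batch-sweep feature boils down to a simple optimization: throughput is batch size divided by per-batch latency, and the sweep picks the batch size that maximizes it. A minimal, self-contained sketch of that logic (the latency numbers below are made up for illustration, not real measurements):

```python
# Hypothetical sketch of batch-sweep selection logic: pick the batch size
# whose measured throughput (items per second) is highest.

def best_batch_size(latency_ms_by_batch: dict[int, float]) -> tuple[int, float]:
    """Return (batch_size, throughput_items_per_sec) with the highest throughput."""
    throughputs = {
        batch: batch / (latency_ms / 1000.0)  # items per second
        for batch, latency_ms in latency_ms_by_batch.items()
    }
    best = max(throughputs, key=throughputs.get)
    return best, throughputs[best]

# Illustrative per-batch latencies in milliseconds for a hypothetical model:
measured = {1: 2.0, 2: 3.0, 4: 5.5, 8: 12.0}
batch, throughput = best_batch_size(measured)
```

Note that larger batches often raise throughput at the cost of per-request latency, which is why the suite exposes both latency and throughput modes.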
## Demo Results

Real benchmark results from an Intel Core Ultra 7 255H with Intel AI Boost NPU:
| Model | CPU (ms) | NPU (ms) | Speedup |
|---|---|---|---|
| ResNet-50 | 73.1 | 8.5 | 8.6x |
| EfficientNet-B0 | 17.8 | 3.5 | 5.1x |
| MobileNetV3-Small | 3.6 | 1.2 | 3.0x |
Key Metrics:
- **8.9x** maximum NPU speedup
- **5.3x** average NPU speedup
- **1.1 ms** fastest NPU latency
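The speedup column is simply the ratio of mean CPU latency to mean NPU latency. A minimal sketch of how such metrics can be derived from raw timing samples (the sample values below are illustrative, not the suite's actual data):

```python
import statistics

def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize raw latency samples (milliseconds)."""
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "min_ms": min(samples_ms),
        "p50_ms": statistics.median(samples_ms),
    }

def speedup(baseline_mean_ms: float, accelerated_mean_ms: float) -> float:
    """Speedup of the accelerated device relative to the baseline."""
    return baseline_mean_ms / accelerated_mean_ms

# Illustrative samples only:
cpu = latency_summary([72.0, 73.0, 74.0])
npu = latency_summary([8.4, 8.5, 8.6])
ratio = speedup(cpu["mean_ms"], npu["mean_ms"])  # mean CPU ms / mean NPU ms
```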
## Supported Models

The suite includes a diverse set of state-of-the-art models covering major AI domains.

### Image Classification
| Model | Description | Use Case |
|---|---|---|
| ResNet-50 | Deep residual network with 50 layers. | Image classification standard benchmark. |
| MobileNetV3 | Optimized for mobile/edge devices. | Low-latency mobile apps. |
| EfficientNet-B0 | Balanced accuracy and efficiency. | General purpose vision tasks. |
### Object Detection

| Model | Description | Use Case |
|---|---|---|
| YOLOv8 (Nano/Small) | "You Only Look Once" - Real-time object detection. | Security, autonomous systems, robotics. |
| YOLOv11 | Latest iteration of YOLO architecture. | Cutting-edge detection performance. |
### Transformers (NLP & Vision)

| Model | Description | Use Case |
|---|---|---|
| BERT Base | Bidirectional Encoder Representations from Transformers. | Text classification, QA, sentiment analysis. |
| DistilBERT | Smaller, faster version of BERT. | Efficient text processing. |
| ViT (Vision Transformer) | Transformer architecture applied to images. | Advanced image recognition. |
All models are automatically downloaded, converted to OpenVINO IR (Intermediate Representation), and optimized for the NPU.
## Installation

### Prerequisites

- Windows 10/11 or Linux
- Python 3.10 or higher
- Intel Core Ultra processor (Series 1 "Meteor Lake" or Series 2 "Lunar Lake") for NPU benchmarks
```bash
# Clone the repository
git clone https://github.com/singhraghvendra2104/OpenVINO-NPU-Inference-Benchmark-Suite.git
cd OpenVINO-NPU-Inference-Benchmark-Suite

# Install in editable mode
pip install -e .

# For INT8 quantization support
pip install -e ".[quantization]"

# For HuggingFace transformers support
pip install -e ".[transformers]"

# Install all optional dependencies
pip install -e ".[all]"
```

## User Guide

After installation, the `npu-benchmark` command becomes available. Here's how to get started:
```bash
# Step 1: Verify your system has NPU support
npu-benchmark verify

# Step 2: View available models
npu-benchmark models

# Step 3: Launch the web dashboard (easiest way)
npu-benchmark web

# Or run a quick benchmark from the CLI
npu-benchmark run resnet50 --iterations 100
```

### Web Dashboard

The easiest and most visual way to use the benchmark suite. Launches a local web server with a glassmorphic UI:

```bash
npu-benchmark web
```

Features:
- Open `http://127.0.0.1:5000` in your browser
- Select models from the Model Zoo
- Choose devices (CPU, NPU, GPU)
- View real-time benchmark progress
- Interactive performance charts
- One-click HTML report download
Options:

```bash
npu-benchmark web --host 0.0.0.0 --port 8080  # Custom host/port
npu-benchmark web --no-browser                # Don't auto-open browser
```

### Command-Line Interface

For automation, scripting, and headless environments.
```bash
# Basic benchmark
npu-benchmark run resnet50

# With options
npu-benchmark run yolov8n --iterations 200 --warmup 20 --device CPU --device NPU

# Specify batch size
npu-benchmark run efficientnet_b0 --batch-size 4

# Choose mode (latency or throughput)
npu-benchmark run mobilenet_v3_small --mode throughput
```

Compare models:

```bash
# Compare all classification models
npu-benchmark compare --category classification

# Compare specific models
npu-benchmark compare --models resnet50 --models yolov8n --models efficientnet_b0
```

Find the optimal batch size:

```bash
npu-benchmark batch-sweep yolov8n --batch-sizes 1,2,4,8,16
```

Quantize a model to INT8:

```bash
npu-benchmark quantize models/resnet50.xml --samples 100 --output resnet50_int8
```

Generate an HTML report:

```bash
npu-benchmark report --input ./benchmarks --output ./reports --theme dark
```

### Python API

Integrate benchmarking into your own applications:
```python
from npu_benchmark import BenchmarkRunner, BenchmarkConfig, DeviceType

# Configure benchmark
config = BenchmarkConfig(
    devices=[DeviceType.CPU, DeviceType.NPU],
    num_iterations=100,
    warmup_iterations=10,
    batch_size=1,
    save_results=True,
)

# Run benchmark
runner = BenchmarkRunner()
results = runner.run_benchmark("yolov8n", config)

# Access results
for device, metrics in results.results.items():
    print(f"{device}: {metrics.latency.mean_ms:.2f}ms")

# Print speedup
speedup = results.get_speedup("CPU", "NPU")
print(f"NPU Speedup: {speedup:.2f}x")
```

Work with the Model Zoo:

```python
from npu_benchmark.models import ModelZoo, ModelCategory

# List available models
zoo = ModelZoo()
models = zoo.list_models(category=ModelCategory.CLASSIFICATION)
for model in models:
    print(f"{model.name}: {model.description}")

# Get a specific model
resnet = zoo.get_model("resnet50")
print(f"Input shape: {resnet.input_shape}")
```

## CLI Reference

| Command | Description | Example |
|---|---|---|
| `info` | Show system and device information | `npu-benchmark info` |
| `verify` | Verify NPU availability and run a quick test | `npu-benchmark verify` |
| `models` | List available benchmark models | `npu-benchmark models --category detection` |
| `run` | Run a benchmark on a model | `npu-benchmark run resnet50 --iterations 100` |
| `compare` | Compare multiple models | `npu-benchmark compare --category classification` |
| `batch-sweep` | Find the optimal batch size | `npu-benchmark batch-sweep yolov8n` |
| `quantize` | Quantize a model to INT8 | `npu-benchmark quantize model.xml` |
| `report` | Generate an HTML report | `npu-benchmark report --theme dark` |
| `dashboard` | Launch the terminal dashboard | `npu-benchmark dashboard` |
| `web` | Launch the web dashboard | `npu-benchmark web` |
```bash
npu-benchmark --verbose <command>  # Enable verbose/debug output
npu-benchmark --help               # Show help
```

## Hardware Support

This suite is optimized for the Intel® Core™ Ultra processor family.
| Component | Description |
|---|---|
| NPU (Series 1 - Meteor Lake) | ~10 TOPS, ideal for sustained background AI workloads |
| NPU (Series 2 - Lunar Lake) | ~45+ TOPS, capable of heavy generative AI tasks |
| iGPU (Intel Arc Graphics) | High-throughput parallel processing |
| CPU | Fallback and baseline comparison standard |
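The CPU row above reflects a common selection pattern: prefer the NPU when present, fall back to the GPU, and use the CPU as the baseline of last resort. A hypothetical helper illustrating that order (device names follow OpenVINO's conventions; this is not the suite's actual code):

```python
def pick_device(available: list[str],
                preference: tuple[str, ...] = ("NPU", "GPU", "CPU")) -> str:
    """Pick the most preferred available device; CPU is the usual baseline."""
    for device in preference:
        if device in available:
            return device
    raise RuntimeError("No supported device found")

# Example inputs of the kind openvino's Core().available_devices returns:
assert pick_device(["CPU", "GPU", "NPU"]) == "NPU"
assert pick_device(["CPU", "GPU"]) == "GPU"
assert pick_device(["CPU"]) == "CPU"
```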
## Troubleshooting

### NPU Not Detected

- Windows: Download and install the latest Intel NPU driver.
- Linux: Install the `intel-npu-driver` package.
- Run `npu-benchmark verify` to check status.
### Model Fails to Load or Runs Slowly

- Ensure you have the latest OpenVINO version: `pip install openvino --upgrade`
- Check model compatibility in the Model Zoo.
- For transformer models (BERT), ensure `transformers` and `optimum[openvino]` are installed.
- Use INT8 quantization for better NPU performance: `npu-benchmark quantize`
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

- Fork the repository.
- Create your feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
## License

Distributed under the MIT License. See LICENSE for more information.
Built with ❤️ for the AI Community
