perf-cpp lets you profile specific parts of your code, not the entire program.
Tools like Linux Perf, Intel® VTune™, and AMD uProf profile everything: application startup, configuration parsing, data loading, and all your helper functions.
perf-cpp is different: place start() and stop() around exactly the code you want to measure.
Profile one sorting algorithm, or count cache misses in a single hash table lookup.
Wrap two memory allocators separately, and you get a fair comparison.
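Because measurement is bounded by explicit start() and stop() calls, a small RAII guard is one way to guarantee that every start() is paired with a stop(), even on early returns. The `ScopedMeasurement` template below is a hypothetical helper, not part of perf-cpp. It only assumes the wrapped object exposes start() and stop(), as the perf::EventCounter and perf::Sampler objects in the examples below do. Their return types are not assumed; any return value is discarded.

```cpp
// Hypothetical helper (not part of perf-cpp): starts a measurement when
// constructed and stops it when the enclosing scope ends. Works with any
// type that has start() and stop() member functions.
template <typename Measurement>
class ScopedMeasurement
{
public:
    explicit ScopedMeasurement(Measurement& measurement) : measurement_(measurement)
    {
        // Discard the return value, whatever start() returns (void works too).
        static_cast<void>(measurement_.start());
    }

    ~ScopedMeasurement()
    {
        static_cast<void>(measurement_.stop());
    }

    // The guard owns exactly one start/stop pair, so copying is disabled.
    ScopedMeasurement(const ScopedMeasurement&) = delete;
    ScopedMeasurement& operator=(const ScopedMeasurement&) = delete;

private:
    Measurement& measurement_;
};

// Usage sketch (allocator_a_benchmark is a hypothetical function):
//   auto counter_a = perf::EventCounter{};
//   counter_a.add({"cycles", "cache-misses"});
//   {
//       ScopedMeasurement guard{counter_a};  // C++17 deduces the template argument
//       allocator_a_benchmark();
//   }                                        // counter_a.stop() runs here
```

Repeating the same scope with a second counter around the other allocator gives the side-by-side comparison described above. Note that the destructor is implicitly noexcept, so if stop() can throw in your setup, call it explicitly instead.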
Built around Linux's perf subsystem, perf-cpp supports counting and sampling hardware events for specific code blocks:
- Record hardware events like perf stat, but only around the code you care about (documentation)
- Calculate metrics like cycles per instruction or cache miss ratios from the counters (documentation)
- Read counter values without stopping the counter, for low-overhead measurements in tight loops (documentation)
- Sample instructions and memory accesses like perf record and perf mem record, but targeted at specific functions (documentation)
- Export and visualize results: write samples to CSV, generate flame graphs, or correlate memory accesses with specific data structures
- Mix built-in events like cycles and cache misses with processor-specific PMU events (documentation)
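Reading a counter without stopping it works because a Linux perf event file descriptor can be read while the event is still enabled. The sketch below illustrates that mechanism with the raw perf_event_open interface, not with perf-cpp's own API (see the linked documentation for that). It opens a user-space instruction counter for the calling thread, reads it before and after a piece of work while it keeps running, and returns the difference. The function name `instructions_while_running` is made up for illustration.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstdint>
#include <optional>

// Illustrative only: counts the user-space instructions retired by `work`
// by reading a running counter twice. Returns std::nullopt if the kernel
// refuses the event (permissions, containers, VMs without a PMU).
template <typename Work>
std::optional<std::uint64_t> instructions_while_running(Work&& work)
{
    perf_event_attr attr{};                   // zero-initialize every field
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;                        // create stopped, enable below
    attr.exclude_kernel = 1;                  // user space only
    attr.exclude_hv = 1;

    // pid = 0: calling thread, cpu = -1: any CPU, no group leader, no flags.
    const int fd = static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0));
    if (fd < 0)
    {
        return std::nullopt;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    // With the default read_format, each read() returns the current 64-bit
    // count; the counter is never disabled between the two reads.
    std::uint64_t before = 0;
    std::uint64_t after = 0;
    const bool ok_before = read(fd, &before, sizeof(before)) == static_cast<ssize_t>(sizeof(before));
    work();
    const bool ok_after = read(fd, &after, sizeof(after)) == static_cast<ssize_t>(sizeof(after));

    close(fd);
    if (!ok_before || !ok_after)
    {
        return std::nullopt;
    }
    return after - before;
}
```

A plain read() is a system call, so each read has a cost of its own. Cheaper paths exist (for example, user-space rdpmc on x86), which this sketch does not cover.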
See the examples and full documentation for details.
#include <iostream>
#include <perfcpp/event_counter.hpp>
// Initialize the counter
auto event_counter = perf::EventCounter{};
// Specify events to count
event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});
// Run the workload
event_counter.start();
code_to_profile(); // <-- Statistics recorded during execution
event_counter.stop();
// Print the result to the console
const auto result = event_counter.result();
for (const auto& [event_name, value] : result)
{
    std::cout << event_name << ": " << value << std::endl;
}
Example output:
seconds: 0.0955897
instructions: 5.92087e+07
cycles: 4.70254e+08
cache-misses: 1.35633e+07
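Derived metrics such as cycles per instruction are plain ratios of the raw counts. Using the example output above: cycles per instruction = 4.70254e+08 / 5.92087e+07 ≈ 7.94, and cache misses per 1,000 instructions = 1.35633e+07 / 5.92087e+07 × 1,000 ≈ 229. The structs and function below are illustrative names, not perf-cpp's API; the linked documentation covers the metrics perf-cpp can compute itself.

```cpp
// Illustrative types (not perf-cpp API) holding raw counter values.
struct CounterValues
{
    double instructions;
    double cycles;
    double cache_misses;
};

struct DerivedMetrics
{
    double cycles_per_instruction;
    double instructions_per_cycle;
    double cache_misses_per_kilo_instruction;
};

// Computes common ratios from raw counts. A zero instruction or cycle count
// yields inf or NaN, so check the inputs if the measured region can be empty.
DerivedMetrics derive_metrics(const CounterValues& values)
{
    return DerivedMetrics{
        values.cycles / values.instructions,
        values.instructions / values.cycles,
        values.cache_misses / values.instructions * 1000.0,
    };
}
```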
Note: See the guides on recording event statistics and event statistics on multiple CPUs/threads. Check out the hardware events documentation for built-in and processor-specific events.
#include <iostream>
#include <perfcpp/sampler.hpp>
// Create the sampler
auto sampler = perf::Sampler{};
// Specify when a sample is recorded: every 50,000th cycle
sampler.trigger("cycles", perf::Period{50000U});
// Specify what data is included in a sample: time, CPU ID, instruction
sampler.values()
.timestamp(true)
.cpu_id(true)
.logical_instruction_pointer(true);
// Run the workload
sampler.start();
code_to_profile(); // <-- Samples recorded during execution
sampler.stop();
const auto samples = sampler.result();
// Export samples to CSV.
samples.to_csv("samples.csv");
// Or iterate samples directly.
for (const auto& record : samples)
{
    const auto timestamp = record.metadata().timestamp().value();
    const auto cpu_id = record.metadata().cpu_id().value();
    const auto instruction = record.instruction_execution().logical_instruction_pointer().value();
    std::cout
        << "Time = " << timestamp << " | CPU = " << cpu_id
        << " | Instruction = 0x" << std::hex << instruction << std::dec
        << std::endl;
}
Example output:
Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c
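A common first step with instruction samples is a flat profile: count how often each instruction address was sampled, because addresses that show up more often account for more of the sampled cycles. The `hottest_instructions` function below is a hypothetical helper, not part of perf-cpp. It assumes you collected `logical_instruction_pointer().value()` from each record into a vector inside the loop shown above. In the four sample lines above, for instance, each of the two addresses appears twice.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: counts how often each instruction address was sampled
// and returns (address, count) pairs, most frequently sampled first.
std::vector<std::pair<std::uint64_t, std::size_t>> hottest_instructions(
    const std::vector<std::uint64_t>& instruction_pointers)
{
    std::unordered_map<std::uint64_t, std::size_t> hits;
    for (const auto ip : instruction_pointers)
    {
        ++hits[ip];
    }

    std::vector<std::pair<std::uint64_t, std::size_t>> ranked(hits.begin(), hits.end());
    std::sort(ranked.begin(), ranked.end(), [](const auto& lhs, const auto& rhs) {
        // Higher count first; ties broken by address for a deterministic order.
        return lhs.second != rhs.second ? lhs.second > rhs.second : lhs.first < rhs.first;
    });
    return ranked;
}
```

Mapping the resulting addresses back to functions or source lines requires symbolization (for example with addr2line), which this sketch does not do.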
Note: See the sampling guide for the data you can record. Also check out the guide on sampling on multiple CPUs/threads for parallel sampling.
perf-cpp can be built as a static or shared library.
git clone https://github.com/jmuehlig/perf-cpp.git
cd perf-cpp
cmake . -B build
cmake --build build
Note: See the building guide for CMake integration and build options.
The full documentation is available at jmuehlig.github.io/perf-cpp.
See also: Examples | Changelog
- GCC 11 or newer, or Clang 14 or newer, with C++17 support.
- CMake version 3.10 or higher.
- Linux Kernel 4.0 or newer (some features require a newer kernel).
- perf_event_paranoid setting: Adjust as needed to allow access to performance counters (see the perf paranoid documentation).
- Python 3, if you use processor-specific hardware event generation.
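The perf_event_paranoid setting is stored in /proc/sys/kernel/perf_event_paranoid. A minimal sketch, assuming you want to check it from C++ before creating counters: read the value and interpret it according to the kernel's sysctl documentation. For users without CAP_PERFMON (or CAP_SYS_ADMIN), a value of 1 or higher disallows CPU-wide events, and 2 or higher disallows kernel profiling. The function names are hypothetical. Some distributions accept higher values that restrict access further, so treat this as a rough check.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Parses the integer stored in the perf_event_paranoid sysctl file.
std::optional<int> parse_paranoid_level(const std::string& text)
{
    std::istringstream input{text};
    int level = 0;
    if (input >> level)
    {
        return level;
    }
    return std::nullopt;
}

// Reads the current level; std::nullopt if the file is missing or unreadable.
std::optional<int> read_paranoid_level(
    const std::string& path = "/proc/sys/kernel/perf_event_paranoid")
{
    std::ifstream file{path};
    std::string line;
    if (!std::getline(file, line))
    {
        return std::nullopt;
    }
    return parse_paranoid_level(line);
}

// Interpretation for unprivileged users, following the kernel documentation.
bool allows_cpu_wide_events(int level) { return level < 1; }
bool allows_kernel_profiling(int level) { return level < 2; }
```

To lower the level until the next reboot, run sudo sysctl -w kernel.perf_event_paranoid=<level>.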
Contributions are welcome. Open an issue or submit a pull request.
To build and run the tests:
cmake . -B build -DBUILD_TESTS=ON
cmake --build build --target tests
./build/bin/tests
Note: Most tests require access to hardware performance counters via perf_event_open. If your system restricts access (e.g., in containers or VMs), some tests will fail. See the perf paranoid documentation.
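To find out in advance whether a machine or container lets unprivileged code open hardware counters, you can attempt the same system call directly. The sketch below is a hypothetical helper, not part of perf-cpp or its test suite. It tries to open a user-space-only instruction counter for the calling thread and returns 0 on success, or the errno reported by perf_event_open otherwise: typically EACCES or EPERM when access is restricted, and ENOENT when the hardware event is not available (common in VMs without a virtual PMU).

```cpp
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cerrno>

// Hypothetical probe: returns 0 if a hardware instruction counter can be
// opened for the calling thread, otherwise the errno from perf_event_open.
int probe_perf_event_open()
{
    perf_event_attr attr{};                   // zero-initialize every field
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;                        // never enabled; we only open it
    attr.exclude_kernel = 1;                  // user space only, like paranoid level 2 permits
    attr.exclude_hv = 1;

    // pid = 0: calling thread, cpu = -1: any CPU, no group leader, no flags.
    const long fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0)
    {
        return errno;
    }
    close(static_cast<int>(fd));
    return 0;
}
```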
For questions or feedback: jan.muehlig@tu-dortmund.de.
Other profiling tools:
- PAPI monitors CPU counters, GPUs, I/O, and more.
- Likwid is a set of command-line tools for benchmarking with an extensive wiki.
- PerfEvent is a lightweight wrapper for performance counters.
- Intel's Instrumentation and Tracing Technology lets you control Intel VTune Profiler from your code.
- Want to go lower-level? Use perf_event_open directly.
Research papers:
- Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis (2017)
- Analyzing memory accesses with modern processors (2020)
- On the Precision of Precise Event Based Sampling (2020)
- CachePerf: A Unified Cache Miss Classifier via Hybrid Hardware Sampling (2022)
- Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison (2023)
- Efficient Cross-platform Multiplexing of Hardware Performance Counters via Adaptive Grouping (2024)
- Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE (2024)
- Breaking the Cycle - A Short Overview of Memory-Access Sampling Differences on Modern x86 CPUs (2025)
Articles and blog posts:
- C2C - False Sharing Detection in Linux Perf (2016)
- PMU counters and profiling basics (2018)
- Advanced profiling topics. PEBS and LBR (2018)
- The Linux perf Event Scheduling Algorithm (2019)
- Performance Speed Limits (2019)
- Detect false sharing with Data Address Profiling (2019)
- Data-type profiling for perf (2023)
- Analyze cache behavior with Perf C2C on Arm (2023)
- How Small Can a Measured Region Be Before perf Counters Lie? (2026)