perf-cpp: Hardware Performance Monitoring for C++

License: LGPL-3.0 | Linux Kernel: >= 4.0 | C++17

Quick Start | Building | Documentation | System Requirements

perf-cpp lets you profile specific parts of your code, not the entire program.

Tools like Linux Perf, Intel® VTune™, and AMD uProf profile everything: application startup, configuration parsing, data loading, and all your helper functions. perf-cpp is different: place start() and stop() around exactly the code you want to measure. Profile one sorting algorithm, or count cache misses in a single hash table lookup. Wrap two memory allocators separately, and you get a fair comparison.

Features

Built around Linux's perf subsystem, perf-cpp counts and samples hardware events for specific code blocks.

See the examples and full documentation for details.

Quick Start

Record Hardware Event Statistics

#include <iostream>
#include <perfcpp/event_counter.hpp>

// Initialize the counter
auto event_counter = perf::EventCounter{};

// Specify events to count
event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

// Run the workload
event_counter.start();
code_to_profile(); // <-- Statistics recorded during execution
event_counter.stop();

// Print the result to the console
const auto result = event_counter.result();
for (const auto& [event_name, value] : result)
{
    std::cout << event_name << ": " << value << std::endl;
}

Example output:

seconds:      0.0955897
instructions: 5.92087e+07
cycles:       4.70254e+08
cache-misses: 1.35633e+07

Note

See the guides on recording event statistics and event statistics on multiple CPUs/threads. Check out the hardware events documentation for built-in and processor-specific events.
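
Raw counts are often easier to compare once they are turned into ratios. The following is a minimal sketch of that arithmetic, not part of perf-cpp: `DerivedMetrics`, `derive_metrics()`, and `print_metrics()` are hypothetical helpers that take the four values from the example above as plain doubles.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical helper type (not provided by perf-cpp): ratios derived from raw counts.
struct DerivedMetrics
{
    double instructions_per_cycle;
    double cache_misses_per_kilo_instruction;
    double cycles_per_second;
};

// Turns the four raw values from the example into ratios.
DerivedMetrics derive_metrics(const double seconds,
                              const double instructions,
                              const double cycles,
                              const double cache_misses)
{
    // Every denominator has to be positive for the ratios to be meaningful.
    assert(seconds > 0.0 && instructions > 0.0 && cycles > 0.0);

    return DerivedMetrics{
        instructions / cycles,                // instructions per cycle (IPC)
        cache_misses / instructions * 1000.0, // cache misses per 1,000 instructions
        cycles / seconds                      // average cycles per second
    };
}

// Prints the derived ratios, one per line.
void print_metrics(const DerivedMetrics& metrics)
{
    std::cout << "IPC: " << metrics.instructions_per_cycle << '\n'
              << "cache misses / 1000 instructions: " << metrics.cache_misses_per_kilo_instruction << '\n'
              << "cycles / second: " << metrics.cycles_per_second << '\n';
}
```

Calling `derive_metrics(0.0955897, 5.92087e+07, 4.70254e+08, 1.35633e+07)` with the example output yields roughly 0.13 instructions per cycle and about 229 cache misses per 1,000 instructions.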

Record Samples

#include <iostream>
#include <perfcpp/sampler.hpp>

// Create the sampler
auto sampler = perf::Sampler{};

// Specify when a sample is recorded: every 50,000th cycle
sampler.trigger("cycles", perf::Period{50000U});

// Specify what data is included in a sample: timestamp, CPU ID, instruction pointer
sampler.values()
    .timestamp(true)
    .cpu_id(true)
    .logical_instruction_pointer(true);

// Run the workload
sampler.start();
code_to_profile(); // <-- Samples recorded during execution
sampler.stop();

const auto samples = sampler.result();

// Export samples to CSV.
samples.to_csv("samples.csv");

// Or iterate samples directly.
for (const auto& record : samples)
{
    const auto timestamp = record.metadata().timestamp().value();
    const auto cpu_id = record.metadata().cpu_id().value();
    const auto instruction = record.instruction_execution().logical_instruction_pointer().value();

    std::cout
        << "Time = " << timestamp << " | CPU = " << cpu_id
        << " | Instruction = 0x" << std::hex << instruction << std::dec
        << std::endl;
}

Example output:

Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c

Note

See the sampling guide for what data you can record. Also check out the sampling on multiple CPUs/threads guide for parallel sampling.
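
A common next step after sampling is to count how often each instruction pointer appears, since frequently sampled addresses hint at hot code. Below is a minimal sketch of that aggregation under stated assumptions: `SampleRecord` is a hypothetical plain struct holding the three fields recorded above (it is not perf-cpp's own sample type), and `hottest_instructions()` is an illustrative helper, not a library function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical plain-data copy of one sample (not perf-cpp's sample type).
struct SampleRecord
{
    std::uint64_t timestamp;
    std::uint32_t cpu_id;
    std::uint64_t instruction_pointer;
};

// Pair of (instruction pointer, number of samples that hit it).
using InstructionCount = std::pair<std::uint64_t, std::size_t>;

// Returns up to top_n instruction pointers, most frequently sampled first.
std::vector<InstructionCount> hottest_instructions(const std::vector<SampleRecord>& samples,
                                                   const std::size_t top_n)
{
    assert(top_n > 0U);

    // Count the samples per instruction pointer.
    std::map<std::uint64_t, std::size_t> counts;
    for (const auto& sample : samples)
    {
        ++counts[sample.instruction_pointer];
    }

    // Rank by count (descending); ties keep ascending address order.
    std::vector<InstructionCount> ranked(counts.begin(), counts.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const InstructionCount& left, const InstructionCount& right) {
                         return left.second > right.second;
                     });

    if (ranked.size() > top_n)
    {
        ranked.resize(top_n);
    }
    return ranked;
}
```

For the four example samples above, both addresses are counted twice. With a real workload, the ranked addresses could then be mapped back to source lines using a symbolizer of your choice.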

Building

perf-cpp can be built as a static or shared library.

git clone https://github.com/jmuehlig/perf-cpp.git
cd perf-cpp
cmake . -B build
cmake --build build

Note

See the building guide for CMake integration and build options.

Documentation

The full documentation is available at jmuehlig.github.io/perf-cpp.

See also: Examples | Changelog

System Requirements

  • GCC 11 or newer, or Clang 14 or newer, with C++17 support.
  • CMake version 3.10 or higher.
  • Linux Kernel 4.0 or newer (some features require a newer kernel).
  • perf_event_paranoid setting: Adjust as needed to allow access to performance counters (see the perf paranoid documentation).
  • Python 3, if you use processor-specific hardware event generation.
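
To check the perf_event_paranoid setting from within a program, one option is to read the kernel's sysctl file directly. The sketch below assumes the usual Linux location `/proc/sys/kernel/perf_event_paranoid`; `parse_paranoid_level()` and `read_paranoid_level()` are hypothetical helpers and not part of perf-cpp.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Parses the textual content of the sysctl file (e.g. "2\n") into an integer level.
std::optional<int> parse_paranoid_level(const std::string& text)
{
    std::istringstream stream{text};
    int level = 0;
    if (stream >> level)
    {
        return level;
    }
    return std::nullopt; // Content was not a number.
}

// Reads the current level; returns std::nullopt if the file is missing or unreadable
// (for example on non-Linux systems or in restricted containers).
std::optional<int> read_paranoid_level(
    const std::string& path = "/proc/sys/kernel/perf_event_paranoid")
{
    assert(!path.empty());

    std::ifstream file{path};
    std::string line;
    if (!file.is_open() || !std::getline(file, line))
    {
        return std::nullopt;
    }
    return parse_paranoid_level(line);
}
```

Lower values are more permissive. Which level a given measurement needs depends on the events and the privileges of the process, so treat the value as a diagnostic hint and consult the perf paranoid documentation.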

Contributing

Contributions are welcome. Open an issue or submit a pull request.

To build and run the tests:

cmake . -B build -DBUILD_TESTS=ON
cmake --build build --target tests
./build/bin/tests

Note

Most tests require access to hardware performance counters via perf_event_open. If your system restricts access (e.g., in containers or VMs), some tests will fail. See the perf paranoid documentation.

For questions or feedback: jan.muehlig@tu-dortmund.de.

