diff --git a/.github/workflows/deploy-docs.yml b/.github/workflows/deploy-docs.yml new file mode 100755 index 00000000..a7916244 --- /dev/null +++ b/.github/workflows/deploy-docs.yml @@ -0,0 +1,32 @@ +name: Deploy Documentation + +on: + push: + branches: + - master + paths: + - 'docs/**' # Trigger only when the source docs files are modified + +permissions: + contents: write + +jobs: + build_and_deploy: + name: Build and deploy documentation + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1 + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install sphinx sphinx_rtd_theme + - name: Build documentation + run: python -m sphinx docs build + - name: Deploy to GitHub Pages + uses: peaceiris/actions-gh-pages@373f7f263a76c20808c831209c920827a82a2847 # v3.9.2 + with: + github_token: ${{ secrets.GITHUB_TOKEN }} + publish_branch: gh-pages + publish_dir: build + force_orphan: true diff --git a/README.md b/README.md index cdb2ac50..37ea32fe 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ -Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) API -================================================================================== +Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) APIs +=============================================================================== [![Build Status](https://github.com/intel/ittapi/actions/workflows/main.yml/badge.svg?branch=master&event=push)](https://github.com/intel/ittapi/actions) [![CodeQL](https://github.com/intel/ittapi/actions/workflows/codeql.yml/badge.svg?branch=master)](https://github.com/intel/ittapi/security/code-scanning/tools/CodeQL/status) @@ -8,28 +8,38 @@ Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) API [![Package on crates.io](https://img.shields.io/crates/v/ittapi.svg)](https://crates.io/crates/ittapi) [![OpenSSF 
Scorecard](https://api.securityscorecards.dev/projects/github.com/intel/ittapi/badge)](https://securityscorecards.dev/viewer/?uri=github.com/intel/ittapi) -This ITT/JIT open source profiling API includes: +This repository contains the following profiling APIs: - - Instrumentation and Tracing Technology (ITT) API - - Just-In-Time (JIT) Profiling API +- **Instrumentation and Tracing Technology (ITT) API** + Enables your application to generate and control the collection of trace data + during its execution, seamlessly integrating with Intel tools. +- **Just-In-Time (JIT) Profiling API** + Reports detailed information about just-in-time (JIT) compiled code, enabling + you to profile the performance of dynamically generated code. -The Instrumentation and Tracing Technology (ITT) API enables your application -to generate and control the collection of trace data during its execution -across different Intel tools. +The ITT/JIT APIs consist of two parts: -ITT API consists of two parts: a _static part_ and a _dynamic part_. The -_dynamic part_ is specific for a tool and distributed only with a particular -tool. The _static part_ is a common part shared between tools. Currently, the -static part of ITT API is distributed as a static library and released under -a BSD/GPLv2 dual license with every tool supporting ITT API. +- **Static Part** + An open-source static library that you compile and link with your application. +- **Dynamic Part** + A tool-specific shared library that collects and writes trace data. You can + find the reference implementation of the dynamic part as a *Reference Collector* + [here](./src/ittnotify_refcol/README.md).
### Build To build the library: - - On Windows, Linux, FreeBSD and OSX: requires [cmake](https://cmake.org) to be set in `PATH` - - Windows: requires Visual Studio installed or requires [Ninja](https://github.com/ninja-build/ninja/releases) to be set in `PATH` - - To enable fortran support requires [Intel Fortran Compiler](https://www.intel.com/content/www/us/en/docs/fortran-compiler/get-started-guide/current/overview.html) installed - - To list available build options execute: `python buildall.py -h` + +- Get general development tools, including C/C++ Compiler +- Install [Python](https://python.org) 3.6 or later +- Install [CMake](https://cmake.org) 3.5 or later +- For a Windows* system, install one of these: + - [Microsoft Visual Studio](https://visualstudio.microsoft.com) 2015 or later + - [Ninja](https://github.com/ninja-build/ninja/releases) 1.9 or later +- To enable support for Fortran, install the + [Intel Fortran Compiler](https://www.intel.com/content/www/us/en/docs/fortran-compiler/get-started-guide/current/overview.html) +- To list available build options execute: `python buildall.py -h` + ``` usage: buildall.py [-h] [-d] [-c] [-v] [-pt] [-ft] [--force_bits] @@ -44,15 +54,21 @@ optional arguments: --vs specify visual studio version (Windows only) --cmake_gen specify cmake build generator (Windows only) ``` + +### Documentation + +Find complete documentation for ITT/JIT APIs on the +[ITT/JIT APIs Documentation Page](https://intel.github.io/ittapi) + ### License All code in the repo is dual licensed under GPLv2 and 3-Clause BSD licenses -### Contributing +### Make Contributions -To contribute, please see our [contributing guide](CONTRIBUTING.md) -To report bugs or request enhancements, please use the "Issues" page on GitHub +Learn how to contribute using our [contribution guide](CONTRIBUTING.md) +To report bugs or request enhancements, please use the [Issues page on GitHub](https://github.com/intel/ittapi/issues) ### Security -Please refer to the [security 
policy](SECURITY.md) for reporting vulnerabilties. +To report vulnerabilities, refer to our [security policy](SECURITY.md) diff --git a/docs/README.md b/docs/README.md new file mode 100755 index 00000000..28b31df2 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,18 @@ +# ITT/JIT APIs Documentation + +## Overview + +This repository contains the source files for the ITT/JIT APIs online documentation, +which is hosted on GitHub Pages. [View the documentation here](link). + +## Build Documentation from Sources + +1. Install Sphinx and the required Sphinx theme: + ```bash + pip install sphinx sphinx_rtd_theme +2. Navigate to the Documentation source folder: + ```bash + cd /docs +3. Build the Documentation with the following command: + ```bash + python -m sphinx . build diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 00000000..831c7b99 --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,38 @@ +# +# Copyright (C) 2025 Intel Corporation +# +# SPDX-License-Identifier: GPL-2.0-only OR BSD-3-Clause +# + +# Configuration file for the Sphinx documentation builder. 
+# +# For the full list of built-in configuration values, see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Project information ----------------------------------------------------- + +project = 'ITT/JIT APIs Documentation' +copyright = '2025 Intel Corporation' +author = 'Intel Corporation' + +# -- General configuration --------------------------------------------------- + +extensions = [ + 'sphinx_rtd_theme', # ReadTheDocs theme + 'sphinx.ext.githubpages', # Support for GitHub Pages + 'sphinx.ext.ifconfig', # Conditional inclusion of content +] + +templates_path = ['_templates'] +exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] + +# -- Options for HTML output ------------------------------------------------- + +import sphinx_rtd_theme + +html_theme = 'sphinx_rtd_theme' +html_theme_options = { + 'style_external_links': True, +} + +html_baseurl = 'https://intel.github.io/ittapi/' diff --git a/docs/index.rst b/docs/index.rst new file mode 100644 index 00000000..971bf227 --- /dev/null +++ b/docs/index.rst @@ -0,0 +1,24 @@ +.. _index: + +The Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) APIs +=================================================================================== + + +The Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) APIs +are open source profiling APIs that you use with Intel software tools, to collect +and manage trace data during performance analysis. You can profile with ITT/JIT APIs +when you use Intel® VTune Profiler and Intel® Graphics Performance Analyzers (Intel® GPA). + +This repository contains documentation that explains the use of these APIs. + + +.. 
toctree:: + :maxdepth: 1 + :caption: Contents: + + src/overview + src/build + src/api-support + src/ref_collector + GitHub Project + diff --git a/docs/requirements.txt b/docs/requirements.txt new file mode 100755 index 00000000..856b3f4c --- /dev/null +++ b/docs/requirements.txt @@ -0,0 +1,2 @@ +sphinx +sphinx_rtd_theme \ No newline at end of file diff --git a/docs/src/api-support.rst b/docs/src/api-support.rst new file mode 100644 index 00000000..57c78903 --- /dev/null +++ b/docs/src/api-support.rst @@ -0,0 +1,33 @@ +.. _api-support: + +Use the ITT/JIT APIs +==================== + + +This section describes how you use ITT/JIT APIs with various environments. +The ITT/JIT APIs are a set of C/C++ functions and use no Java* or .NET* code. +For support with your runtime environment, use a Java Native Interface (JNI) +or C/C++ function call from the managed code. + + +C/C++ API Usage and Reference: +------------------------------ + +.. toctree:: + :maxdepth: 1 + + + itt-api-support + jit-api-support + + +Other Language API Bindings: +---------------------------- + +.. toctree:: + :maxdepth: 1 + + + Rust ITT API Bindings + Python ITT API Bindings + diff --git a/docs/src/build.rst b/docs/src/build.rst new file mode 100644 index 00000000..62f6999f --- /dev/null +++ b/docs/src/build.rst @@ -0,0 +1,61 @@ +.. 
_build: + +Build from Source Code +====================== + +Technical Requirements +---------------------- + +Before you build the ITT/JIT APIs, make sure you have the following hardware and software tools: + +- Get general development tools, including C/C++ Compiler +- Install `Python `__ 3.6 or later +- Install `CMake `__ 3.5 or later +- For a Windows* system, install one of these: + + - `Microsoft Visual Studio `__ 2015 or later + - `Ninja `__ 1.9 or later + +- To enable support for Fortran, install the `Intel Fortran Compiler + `__ + + +Get the ITT/JIT APIs Source Code +-------------------------------- + +To get the source code for the ITT/JIT APIs, do one of the following: + +- Download it from `the latest public Release `__ +- Clone the repository: + +.. code-block:: console + + git clone https://github.com/intel/ittapi.git + + +Build the ITT/JIT APIs +---------------------- + +To build the ITT/JIT APIs static library, run this command: + +.. code-block:: console + + python buildall.py + + +Use these options to configure the build process: + +.. code-block:: console + + usage: python buildall.py [-h] [-d] [-c] [-v] [-pt] [-ft] [--force_bits] + + optional arguments: + -h, --help show this help message and exit + -d, --debug specify debug build configuration (release by default) + -c, --clean delete any intermediate and output files + -v, --verbose enable verbose output from build process + -pt, --ptmark enable anomaly detection support + -ft, --fortran enable fortran support + --force_bits specify bit version for the target + --vs specify visual studio version (Windows only) + --cmake_gen specify cmake build generator (Windows only) diff --git a/docs/src/compile-and-link-with-itt-api.rst b/docs/src/compile-and-link-with-itt-api.rst new file mode 100644 index 00000000..0dd188bd --- /dev/null +++ b/docs/src/compile-and-link-with-itt-api.rst @@ -0,0 +1,167 @@ +.. 
_compile-and-link-with-itt-api: + +Compile and Link with ITT API +============================= + + +Step 1: Configure Your Build System +----------------------------------- + + +Before instrumenting your application with ITT API, configure your build +system to establish access to the headers and libraries of the API: + + +- Add ``\include`` to your ``INCLUDE`` path +- Add ``\build_\\bin`` + to your ``LIBRARIES`` path + + +Step 2: Include the ITT API Header/Module in Your Application +------------------------------------------------------------- + + +**C/C++ Applications** + + +For every source file that you want to instrument, add the following +``#include`` statement: + + +.. code-block:: cpp + + + #include <ittnotify.h> + + +The ``ittnotify.h`` header contains definitions for ITT API routines +and important macros that provide the correct logic to invoke the API +from your application. + + +When tracing is disabled, the ITT API incurs almost zero overhead. To achieve +completely zero overhead, you can compile out all ITT API calls from your +application. To do this, prior to including the ``ittnotify.h`` file, define +the ``INTEL_NO_ITTNOTIFY_API`` macro in your project at compile time. You can +do this from the compiler command line or in your source file. + + +**Fortran Applications** + + +Add the ``ITTNOTIFY`` module to your source files. Use the following +source line: + + +.. code-block:: cpp + + + USE ITTNOTIFY + + +Step 3: Insert ITT Notifications in Your Application +---------------------------------------------------- + + +To insert ITT notifications in your application, use: + ++-----------+-------------------+-----------------------------+ +| Language | Notification | Example | ++===========+===================+=============================+ +| C/C++ | .. code:: cpp | .. code:: cpp | +| | | | +| | __itt_* | __itt_pause(); | ++-----------+-------------------+-----------------------------+ +| Fortran | .. code:: cpp | ..
code:: cpp | +| | | | +| | ITT_* | CALL ITT_PAUSE() | ++-----------+-------------------+-----------------------------+ + + +To learn more, open: + + +- `Instrumenting Your Application `__ +- `ITT API Reference `__ + + +Step 4: Link the libittnotify Static Library to Your Application +---------------------------------------------------------------- + + +Once you finish inserting ITT notifications in your application, the next step +is to link the libittnotify static library. This library is ``libittnotify.a`` +in Linux* and FreeBSD* systems and ``libittnotify.lib`` in Windows* systems. + +If you have enabled tracing, the static library loads the dynamic collector and +forwards the instrumentation data from ITT API calls to that collector. + +If you have disabled tracing, the static library ignores ITT API calls, resulting in +near-zero instrumentation overhead. + + +Step 5: Load the Dynamic Library +-------------------------------- + + +After you instrument your application and link the static library, you must +load the dynamic library of the ITT API to your application. To do this, +depending on your system architecture, set the ``INTEL_LIBITTNOTIFY32`` or +the ``INTEL_LIBITTNOTIFY64`` environment variable. + + +**Windows OS:** + + +.. code-block:: bash + + + set INTEL_LIBITTNOTIFY32=\bin32\runtime\ittnotify_collector.dll + set INTEL_LIBITTNOTIFY64=\bin64\runtime\ittnotify_collector.dll + + +**Linux OS:** + + +.. code-block:: bash + + + export INTEL_LIBITTNOTIFY32=/lib32/runtime/libittnotify_collector.so + export INTEL_LIBITTNOTIFY64=/lib64/runtime/libittnotify_collector.so + + +**FreeBSD OS:** + + +.. code-block:: bash + + + setenv INTEL_LIBITTNOTIFY64 /lib64/runtime/libittnotify_collector.so + + +Additional Information: Unicode Support +--------------------------------------- + + +All API functions that take parameters of type ``__itt_char`` follow the +Windows OS Unicode convention.
+ +When compilation happens on a Windows system, if the ``UNICODE`` macro is +defined, ``__itt_char`` is set to ``wchar_t``. If the ``UNICODE`` macro is +not defined, ``__itt_char`` is set to ``char``. + +The actual function names are suffixed with ``A`` for the ASCII APIs and +``W`` for the Unicode APIs. Both types of functions are defined in the +DLL that implements the API. + +Strings that contain only ASCII characters are internally equivalent for both +the Unicode and ASCII API versions. For example, the following +calls are equivalent: + + +.. code-block:: cpp + + + __itt_sync_createA( addr, "OpenMP Scheduler", "Critical Section", 0); + __itt_sync_createW( addr, L"OpenMP Scheduler", L"Critical Section", 0); + diff --git a/docs/src/instrument-your-application.rst b/docs/src/instrument-your-application.rst new file mode 100644 index 00000000..ba0267dd --- /dev/null +++ b/docs/src/instrument-your-application.rst @@ -0,0 +1,102 @@ +.. _instrument-your-application: + +Instrument Your Application +=========================== + + +When you collect performance data with the ITT/JIT APIs, for optimum results, +add API calls in your code to designate logical tasks. You can then visualize +the relationship between tasks in your code (for example, when they start and +end) relative to other CPU and GPU tasks. + + +At the highest level, a task is a logical group of work that executes on a +specific thread. A task can correspond to any grouping of code within your +program that you consider important. You can mark up your code by +identifying the beginning and end of each logical task with +``__itt_task_begin`` and ``__itt_task_end`` calls. + + +To get started, use the following API calls: + +- ``__itt_domain_create()`` creates a domain that is required in most ITT + API calls. Define at least one domain. +- ``__itt_string_handle_create()`` creates string handles to identify your + tasks. String handles identify traces more efficiently than strings.
+- ``__itt_task_begin()`` marks the beginning of a task. +- ``__itt_task_end()`` marks the end of a task. + + +Example +------- + + +This example shows how you use four basic ITT API functions in a +multi-threaded application: + + +- `Domain API `__ +- `String Handle API `__ +- `Task API `__ +- `Thread Naming API `__ + + +.. code-block:: cpp + + + #include <windows.h> + #include <ittnotify.h> + + + // Forward declaration of a thread function. + DWORD WINAPI workerthread(LPVOID); + bool g_done = false; + + // Create a domain that is visible globally: we will use it in our example. + __itt_domain* domain = __itt_domain_create("Example.Domain.Global"); + // Create string handles that are associated with the "main" and "CreateThread" tasks. + __itt_string_handle* handle_main = __itt_string_handle_create("main"); + __itt_string_handle* handle_createthread = __itt_string_handle_create("CreateThread"); + + void main(int, char* argv[]) + { + // Create a task associated with the "main" routine. + __itt_task_begin(domain, __itt_null, __itt_null, handle_main); + + // Now we'll create 4 worker threads + for (int i = 0; i < 4; i++) + { + // We might be curious about the cost of CreateThread. We add tracing to do the measurement. + __itt_task_begin(domain, __itt_null, __itt_null, handle_createthread); + CreateThread(NULL, 0, workerthread, (LPVOID)i, 0, NULL); + __itt_task_end(domain); + } + + // Wait a while,... + Sleep(5000); + g_done = true; + + // Mark the end of the main task + __itt_task_end(domain); + } + + // Create string handle for the work task.
+ __itt_string_handle* handle_work = __itt_string_handle_create("work"); + DWORD WINAPI workerthread(LPVOID data) + { + // Set the name of this thread so it shows up in the UI as something meaningful + char threadname[32]; + wsprintf(threadname, "Worker Thread %d", data); + __itt_thread_set_name(threadname); + + // Each worker thread does some number of "work" tasks + while(!g_done) + { + __itt_task_begin(domain, __itt_null, __itt_null, handle_work); + Sleep(150); + __itt_task_end(domain); + } + + return 0; + } + diff --git a/docs/src/itt-api-reference.rst b/docs/src/itt-api-reference.rst new file mode 100644 index 00000000..d65236b4 --- /dev/null +++ b/docs/src/itt-api-reference.rst @@ -0,0 +1,26 @@ +.. _itt-api-reference: + +ITT API Reference +================= + +.. toctree:: + :maxdepth: 1 + + + ittapi/clock-domain-api + ittapi/collection-control-api + ittapi/context-metadata-api + ittapi/counter-api + ittapi/domain-api + ittapi/event-api + ittapi/frame-api + ittapi/histogram-api + ittapi/load-module-api + ittapi/marker-api + ittapi/memory-allocation-apis + ittapi/metadata-api + ittapi/relation-api + ittapi/string-handle-api + ittapi/task-api + ittapi/thread-naming-api + ittapi/user-defined-synchronization-api diff --git a/docs/src/itt-api-support.rst b/docs/src/itt-api-support.rst new file mode 100644 index 00000000..f27f1dbd --- /dev/null +++ b/docs/src/itt-api-support.rst @@ -0,0 +1,39 @@ +.. _itt-api-support: + +Instrumentation and Tracing Technology (ITT) API +================================================ + + +Use the Intel® Instrumentation and Tracing Technology (ITT) API to generate +trace data and control its collection during the execution of your application. + + +Use the ITT API to: + + +- Control application performance overhead based on the amount of + trace data that you collect. +- Enable trace collection without having to recompile your application. +- Enable code annotation for deeper analysis.
+ +You can use the ITT API to collect trace data from C, C++, or Fortran +applications that run on Windows*, Linux* or FreeBSD* systems. + + +The ITT API has **static** and **dynamic** library components. The applications +and modules you link to the static library do not have a runtime dependency +on the dynamic library. Therefore, you can run them independently of the dynamic library. + + +ITT API Usage and Reference +--------------------------- + +.. toctree:: + :maxdepth: 1 + + + compile-and-link-with-itt-api + instrument-your-application + minimize-itt-api-overhead + itt-api-reference + diff --git a/docs/src/ittapi/clock-domain-api.rst b/docs/src/ittapi/clock-domain-api.rst new file mode 100644 index 00000000..f8382c9c --- /dev/null +++ b/docs/src/ittapi/clock-domain-api.rst @@ -0,0 +1,46 @@ +.. _clock-domain-api: + +Clock Domain API +================ + + +Some applications require the capability to trace events with user-defined +timestamps and frequencies that are different from the ones generated by a +CPU. For example, you may want to instrument events that occur on a GPU. +To do this, you can create a clock domain. + + +To create a clock domain, use the following primitive: + +.. code:: cpp + + + __itt_clock_domain * ITTAPI __itt_clock_domain_create(__itt_get_clock_info_fn fn, void* fn_data) + + +**Parameters of the primitive:** + + ++-------+-------------+-------------------------------------------------------------+ +| [in] | ``fn`` | Pointer to a callback function that retrieves alternative | +| | | CPU timestamps and frequencies and stores them in the | +| | | clock domain structure field ``__itt_clock_info``. | ++-------+-------------+-------------------------------------------------------------+ +| [in] | ``fn_data`` | Argument passed to the callback function. Can be ``NULL``. | ++-------+-------------+-------------------------------------------------------------+ + + +Tasks issued from different clock domains display on the same timeline.
This +happens by the synchronization of a referenced clock domain base timestamp +(captured at the instant when the clock domain was created) and a CPU timestamp +(captured in the same instant). + +To recalculate clock domain base timestamps and frequencies, if necessary, +for example, when a GPU frequency changes, use the following primitive: + + +.. code:: cpp + + + __itt_clock_domain_reset() + diff --git a/docs/src/ittapi/collection-control-api.rst b/docs/src/ittapi/collection-control-api.rst new file mode 100644 index 00000000..fac9f632 --- /dev/null +++ b/docs/src/ittapi/collection-control-api.rst @@ -0,0 +1,113 @@ +.. _collection-control-api: + +Collection Control API +====================== + + +Use Collection Control APIs in your code to manage how and when Intel® VTune™ +Profiler collects data for your applications. By calling these APIs, you can +pause, resume, or detach data collection to focus analysis on specific code +regions, reduce profiling overhead, or exclude unimportant sections from your +performance results. + + ++-------------------------------+----------------------------------------------+ +| Use This Primitive | To Do This | ++===============================+==============================================+ +| .. code:: cpp | Run the application without collecting data. | +| | VTune Profiler reduces the overhead of | +| void __itt_pause(void) | collection by collecting only critical | +| | information, like thread and process | +| | creation. | ++-------------------------------+----------------------------------------------+ +| .. code:: cpp | Resume data collection. | +| | | +| void __itt_resume(void) | | ++-------------------------------+----------------------------------------------+ +| .. code:: cpp | Detach data collection. VTune Profiler | +| | detaches all collectors from all processes. 
| +| void __itt_detach(void) | Your application continues to work but no | +| | data is collected for the running collection.| ++-------------------------------+----------------------------------------------+ + + +Pause data collection +--------------------- + + +When you pause the data collection in any thread, you pause the collection for +the entire program and not just the active thread. Also, pausing a data collection +can reduce the overhead from runtime analysis. + + +Unaffected APIs: + + - Domain API + - String Handle API + - Thread Naming API + + +Affected APIs (No Data Collection in Paused State): + + - Task API + - Frame API + - Event API + - User-Defined Synchronization API + + +.. note:: + + + A reasonable frequency for Pause/Resume API calls is about 1 Hz. + Each call pauses or resumes data collection in all processes in the + analysis run and sends the corresponding collection state notification + to the GUI. For small workloads, do not call this operation + frequently. Use `Frame APIs `__ instead. + + +Usage Example: Focus on a Specific Code Section +----------------------------------------------- + + +In this code example, the pause/resume calls help to focus data collection +on a specific section of code. The application starts with data collection +paused. + + +.. code:: cpp + + + int main(int argc, char* argv[]) + { + // Do initialization work here + __itt_resume(); + // Do profiling work here + __itt_pause(); + // Do finalization work here + return 0; + } + + +Usage Example: Hide Sections of Code +------------------------------------ + + +This example shows how you use pause/resume calls to hide intensive work that +may not need attention for a brief period. + + +..
code:: cpp + + + int main(int argc, char* argv[]) + { + // Do work here + __itt_pause(); + // Do uninteresting work here + __itt_resume(); + // Do work here + __itt_detach(); + // Do uninteresting work here + return 0; + } + diff --git a/docs/src/ittapi/context-metadata-api.rst b/docs/src/ittapi/context-metadata-api.rst new file mode 100644 index 00000000..f96ce38d --- /dev/null +++ b/docs/src/ittapi/context-metadata-api.rst @@ -0,0 +1,213 @@ +.. _context-metadata-api: + +Context Metadata API +==================== + + +Use the Context Metadata API to define custom counters in your code with +special attributes. You can also get a set of metrics for the collected data +in any classical form of data representation (bandwidth/latency/utilization +metrics) in Intel® VTune™ Profiler. + + +You can use Context Metadata API to collect counter-based metrics and +attribute these metrics to hardware topology like: + +- PCIe devices +- Block devices +- CPU cores +- Threads + + +**Define and create a counter object** + + +Use this structure to store context metadata: + + +.. code-block:: cpp + + + __itt_context_metadata + { + __itt_context_type type; /*!< Type of the context metadata value */ + void* value; /*!< Pointer to context metadata value itself */ + } + + +The structure accepts the following types of context metadata: + + ++-------------------------------+------------------------------------+-----------------------------------------------+ +| __itt_context_type | Value | Description | ++===============================+====================================+===============================================+ +| .. code-block:: cpp | ASCII string char* / | The name of the counter-based metric. | +| | Unicode string wchar_t* | This value is required. | +| __itt_context_name | | | +| | | | ++-------------------------------+------------------------------------+-----------------------------------------------+ +| .. 
code-block:: cpp | ASCII string char* / | Statistics subdomain to break down the | +| | Unicode string wchar_t* | counter samples (for example, network port | +| __itt_context_device | | ID, disk partition, etc.) | +| | | | ++-------------------------------+------------------------------------+-----------------------------------------------+ +| .. code-block:: cpp | ASCII string char* / | Units of measurement. For measurement of | +| | Unicode string wchar_t* | time, use the ns/us/ms/s units to correct | +| __itt_context_units | | data representation in VTune Profiler. | +| | | | ++-------------------------------+------------------------------------+-----------------------------------------------+ +| .. code-block:: cpp | ASCII string char* / | PCI address of device to associate with | +| | Unicode string wchar_t* | the counter. | +| __itt_context_pci_addr | | | +| | | | ++-------------------------------+------------------------------------+-----------------------------------------------+ +| .. code-block:: cpp | Unsigned 64-bit integer type | | +| | | Thread ID to associate with the counter. | +| __itt_context_tid | | | +| | | | ++-------------------------------+------------------------------------+-----------------------------------------------+ +| .. code-block:: cpp | Unsigned 64-bit integer type (0,1) | If this flag is set to 1, calculate latency | +| | | histogram and counter/sec timeline | +| __itt_context_bandwidth_flag| | distribution. | +| | | | ++-------------------------------+------------------------------------+-----------------------------------------------+ +| .. code-block:: cpp | Unsigned 64-bit integer type (0,1) | If this flag is set to 1, calculate the | +| | | throughput histogram and counter/sec | +| __itt_context_latency_flag | | timeline distribution. | +| | | | ++-------------------------------+------------------------------------+-----------------------------------------------+ +| .. 
code-block:: cpp | Unsigned 64-bit integer type (0,1) | If this flag is set to 1, show the counter | +| | | on top of the Thread graph as percentage | +| __itt_context_on_thread_flag| | of the CPU Time distribution. | +| | | | ++-------------------------------+------------------------------------+-----------------------------------------------+ + + +Before you associate context metadata with a counter, make sure to create +an ITT API Domain and ITT API Counter Instances first. + + +The domain name provides a heading for the section of metrics for the +counters in the results of VTune Profiler. A single domain can combine +data from any number of counters. However, the name of the counters must +be unique within the same domain. + + +You can combine different counters under a single metric of the Context +Metadata. + + +**Add context information** + + +Once you have created all objects, you can add context information for +the selected counters. Use these primitives: + + +.. code-block:: cpp + + + __itt_bind_context_metadata_to_counter( + __itt_counter counter, size_t length, __itt_context_metadata* metadata); + + +**Parameters of the primitive:** + + ++--------+-------------------------------+-----------------------------------------------------+ +| Type | Parameter | Description | ++========+===============================+=====================================================+ +| [in] | .. code-block:: cpp | Pointer to the counter instance associated with the | +| | | | +| | __itt_counter counter | context metadata | ++--------+-------------------------------+-----------------------------------------------------+ +| [in] | .. code-block:: cpp | Number of elements in the array of context metadata | +| | | | +| | size_t length | | ++--------+-------------------------------+-----------------------------------------------------+ +| [in] | .. 
code-block:: cpp | Pointer to the array of context metadata | +| | | | +| | __itt_context_metadata* | | +| | metadata | | ++--------+-------------------------------+-----------------------------------------------------+ + + +To create counter instances and submit counter data, use: + + +.. code-block:: cpp + + + __itt_counter_create_v3(__itt_domain* domain, const char* name, __itt_metadata_type type); + __itt_counter_set_value_v3(__itt_counter counter, void *value_ptr); + + +Usage Example +------------- + + +This example creates counters with context metadata that measure random +read operation metrics for an NVMe SSD device: + + +.. code:: cpp + + + #include <stdint.h> + #include "ittnotify.h" + #include "ittnotify_types.h" + + + // Create domain and counters: + __itt_domain* domain = + __itt_domain_create("ITT API collected data"); + __itt_counter counter_read_op = + __itt_counter_create_v3(domain, "Read Operations", __itt_metadata_u64); + __itt_counter counter_read_mb = + __itt_counter_create_v3(domain, "Read Megabytes", __itt_metadata_u64); + __itt_counter counter_spin_time = + __itt_counter_create_v3(domain, "Spin Time", __itt_metadata_u64); + + + // Values referenced by the context metadata below: + uint64_t true_flag = 1; + uint64_t thread_id = get_user_thread_id(); // hypothetical user-defined helper + + + // Create context metadata: + __itt_context_metadata metadata_read_op[] = { + {__itt_context_name, "Reads"}, + {__itt_context_device, "NVMe SSD Intel DC 660p"}, + {__itt_context_units, "Operations"}, + {__itt_context_pci_addr, "0001:10:00.1"}, + {__itt_context_latency_flag, &true_flag} + }; + __itt_context_metadata metadata_read_mb[] = { + {__itt_context_name, "Read"}, + {__itt_context_device, "NVMe SSD Intel DC 660p"}, + {__itt_context_units, "MB"}, + {__itt_context_pci_addr, "0001:10:00.1"}, + {__itt_context_bandwidth_flag, &true_flag} + }; + __itt_context_metadata metadata_spin_time[] = { + {__itt_context_name, "Spin Time"}, + {__itt_context_device, "NVMe SSD Intel DC 660p"}, + {__itt_context_units, "ms"}, + {__itt_context_tid, &thread_id} + }; + + + // Bind context metadata to counters: + 
__itt_bind_context_metadata_to_counter(counter_read_op, sizeof(metadata_read_op) / sizeof(metadata_read_op[0]), metadata_read_op); + __itt_bind_context_metadata_to_counter(counter_read_mb, sizeof(metadata_read_mb) / sizeof(metadata_read_mb[0]), metadata_read_mb); + __itt_bind_context_metadata_to_counter(counter_spin_time, sizeof(metadata_spin_time) / sizeof(metadata_spin_time[0]), metadata_spin_time); + + + while(1) + { + // Get collected data: + uint64_t read_op = get_user_read_operation_num(); + uint64_t read_mb = get_user_read_megabytes_num(); + uint64_t spin_time = get_user_spin_time(); + + + // Submit collected data: + __itt_counter_set_value_v3(counter_read_op, &read_op); + __itt_counter_set_value_v3(counter_read_mb, &read_mb); + __itt_counter_set_value_v3(counter_spin_time, &spin_time); + } + diff --git a/docs/src/ittapi/counter-api.rst b/docs/src/ittapi/counter-api.rst new file mode 100644 index 00000000..83c7e263 --- /dev/null +++ b/docs/src/ittapi/counter-api.rst @@ -0,0 +1,161 @@ +.. _counter-api: + +Counter API +=========== + + +A counter is a user-defined characteristic or metric of hardware or software +behavior that you use to collect information about execution breakdown. You can +also use counters to correlate this information with tasks, events, and markers. + +For example, the development of system-on-a-chip (SoC) benefits from several +counters that represent different parts of the SoC to count some hardware +characteristics. + + +**Define and create a counter object** + + +Use these primitives: + +.. 
code-block:: cpp + + + __itt_counter __itt_counter_create(const char *name, const char *domain); + + __itt_counter __itt_counter_createA(const char *name, const char *domain); + + __itt_counter __itt_counter_createW(const wchar_t *name, const wchar_t *domain); + + __itt_counter __itt_counter_create_typed(const char *name, const char *domain, __itt_metadata_type type); + + __itt_counter __itt_counter_create_typedA(const char *name, const char *domain, __itt_metadata_type type); + + __itt_counter __itt_counter_create_typedW(const wchar_t *name, const wchar_t *domain, __itt_metadata_type type); + + __itt_counter __itt_counter_create_v3(__itt_domain* domain, const char* name,__itt_metadata_type type); + + +You must specify a counter name and domain name. To load a specialized type of +data, specify the counter type. The default counter type is ``uint64_t``. + + +**Parameters of the primitives:** + + ++--------+--------------------------+-------------------+ +| Type | Parameter | Description | ++========+==========================+===================+ +| [in] | .. code-block:: cpp | Counter domain | +| | | | +| | domain | | ++--------+--------------------------+-------------------+ +| [in] | .. code-block:: cpp | Counter name | +| | | | +| | name | | ++--------+--------------------------+-------------------+ +| [in] | .. code-block:: cpp | Counter type | +| | | | +| | type | | ++--------+--------------------------+-------------------+ + + +**Increment/decrement a counter value** + + +Use these primitives: + + +.. code-block:: cpp + + + void __itt_counter_inc (__itt_counter id); + + void __itt_counter_inc_delta(__itt_counter id, unsigned long long value); + + void __itt_counter_dec(__itt_counter id); + + void __itt_counter_dec_delta(__itt_counter id, unsigned long long value); + + +.. note:: + + + These primitives are applicable to uint64 counters only. + + +**Directly set the counter value** + + +Use: + +.. 
code-block:: cpp + + + void __itt_counter_set_value(__itt_counter id, void *value_ptr); + + void __itt_counter_set_value_v3(__itt_counter counter, void *value_ptr); + + +**Parameters of the primitives:** + + ++--------+--------------------------+------------------+ +| Type | Parameter | Description | ++========+==========================+==================+ +| [in] | .. code-block:: cpp | Counter ID | +| | | | +| | id | | ++--------+--------------------------+------------------+ +| [in] | .. code-block:: cpp | Pointer to the | +| | | counter value | +| | value_ptr | | ++--------+--------------------------+------------------+ + + +**Remove an existing counter** + + +Use: + +.. code-block:: cpp + + + void __itt_counter_destroy(__itt_counter id); + + +Usage Example +------------- + + +This example creates two counters that measure temperature and memory +usage: + + +.. code:: cpp + + + #include <stdint.h> + #include "ittnotify.h" + + + __itt_counter temperatureCounter = __itt_counter_create("Temperature", "Domain"); + __itt_counter memoryUsageCounter = __itt_counter_create("Memory Usage", "Domain"); + uint64_t temperature; + + + while (...) + { + ... + temperature = getTemperature(); + __itt_counter_set_value(temperatureCounter, &temperature); + + + __itt_counter_inc_delta(memoryUsageCounter, getAllocatedMemSize()); + __itt_counter_dec_delta(memoryUsageCounter, getDeallocatedMemSize()); + ... + } + + + __itt_counter_destroy(temperatureCounter); + __itt_counter_destroy(memoryUsageCounter); + diff --git a/docs/src/ittapi/domain-api.rst b/docs/src/ittapi/domain-api.rst new file mode 100644 index 00000000..3de66fca --- /dev/null +++ b/docs/src/ittapi/domain-api.rst @@ -0,0 +1,64 @@ +.. _domain-api: + +Domain API +========== + + +A ``domain`` enables you to tag trace data for different modules or +libraries in a program. You specify domains using unique character +strings. 
+ +Each domain is represented by an opaque ``__itt_domain`` structure, +which you can use to tag each of the ITT API calls in your code. + +You can selectively enable or disable specific domains in your +application in order to filter the subsets of instrumentation that are +collected into the output trace capture file. + +To disable a domain, set its ``flags`` field to 0. This action disables tracing +for a particular domain without affecting other code portions. The overhead +of a disabled domain is a single ``if`` check. + + +**To create a domain, use the following primitive:** + +.. code:: cpp + + + __itt_domain *ITTAPI __itt_domain_create(const char *name); + + +To create a domain name, use the URI naming convention. For example, +"com.my_company.my_application" is an acceptable format for a domain name. +The set of domains is expected to be static over the execution time of the +application. Therefore, there is no mechanism to destroy a domain. + +Any thread in the process can access any domain in the code, regardless of +the thread that created the domain. This call is thread-safe. + + +**Parameters of the primitive:** + + ++--------+--------------------------+-------------------+ +| Type | Parameter | Description | ++========+==========================+===================+ +| [in] | .. code-block:: cpp | Name of domain | +| | | | +| | name | | ++--------+--------------------------+-------------------+ + + +Usage Example +------------- + + +.. code:: cpp + + + #include "ittnotify.h" + + + __itt_domain* pD = __itt_domain_create("My Domain"); + pD->flags = 0; /* disable domain */ + diff --git a/docs/src/ittapi/event-api.rst b/docs/src/ittapi/event-api.rst new file mode 100644 index 00000000..8f89f24f --- /dev/null +++ b/docs/src/ittapi/event-api.rst @@ -0,0 +1,132 @@ +.. _event-api: + +Event API +========= + + +Use the event API to: + +- Observe when demarcated events occur in your application +- Find out the time taken to execute demarcated regions of code. 
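The second use case, timing a demarcated region, can be sketched as a small scope guard that starts an event on entry and ends it on exit. This is only an illustration under stated assumptions: the `itt_stub` namespace, the `ScopedEvent` class, and `process_frame` are hypothetical names that are not part of the ITT API. The stubs stand in for `__itt_event_create`, `__itt_event_start`, and `__itt_event_end` so that the sketch compiles without the ITT headers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the ITT event primitives (assumption: in a real
// build you would include "ittnotify.h" and call __itt_event_create,
// __itt_event_start, and __itt_event_end instead of these stubs).
namespace itt_stub {

using event = int;

// Names of the event types created so far; the index serves as the handle.
inline std::vector<std::string>& names() {
    static std::vector<std::string> n;
    return n;
}

// Records every start/end call so the behavior can be inspected.
inline std::vector<std::string>& log() {
    static std::vector<std::string> l;
    return l;
}

inline event event_create(const char* name, int namelen) {
    assert(name != nullptr && namelen >= 0);
    names().push_back(std::string(name, static_cast<std::size_t>(namelen)));
    return static_cast<event>(names().size() - 1);
}

inline int event_start(event) { log().push_back("start"); return 0; }
inline int event_end(event)   { log().push_back("end");   return 0; }

}  // namespace itt_stub

// Hypothetical scope guard: starts the event in the constructor and ends it
// in the destructor, so every start is paired with an end on the same thread,
// including on early return.
class ScopedEvent {
public:
    explicit ScopedEvent(itt_stub::event e) : event_(e) {
        itt_stub::event_start(event_);
    }
    ~ScopedEvent() { itt_stub::event_end(event_); }

    ScopedEvent(const ScopedEvent&) = delete;
    ScopedEvent& operator=(const ScopedEvent&) = delete;

private:
    itt_stub::event event_;
};

// Example region: the event spans the whole function body.
inline void process_frame(itt_stub::event frame_event) {
    ScopedEvent guard(frame_event);
    // ... work to be measured goes here ...
}
```

Because the guard ends the event in its destructor, nested guards end in reverse order of creation, which keeps start and end calls properly nested.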
+ +Set annotations in the application to demarcate areas where events of +interest occur. After you run the analysis, you can see the events marked +in the Timeline pane. + +The event API is a per-thread function that works in the resumed state. This +function does not work in the paused state. + + +.. note:: + + + - On Windows\* OS platforms you can define Unicode to use a wide + character version of APIs that pass strings. However, these + strings are internally converted to ASCII strings. + + + - On Linux\* OS platforms only a single variant of the API exists. + + ++----------------------------------------------------------+--------------------------------------------------------------------------+ +| Use This Primitive | To Do This | ++==========================================================+==========================================================================+ +| .. code-block:: cpp | Create an event type with the specified name and length. This API | +| | returns a handle to the event type. The handle should be passed into | +| __itt_event __itt_event_create(const __itt_char *name, | the following event start and event end APIs as a parameter. The | +| int namelen ); | namelen parameter refers to the name length in number of characters. | ++----------------------------------------------------------+--------------------------------------------------------------------------+ +| .. code-block:: cpp | Call this API with the event type handle to register an instance of the | +| | event. The Event start appears in the Timeline pane display as a tick | +| int __itt_event_start(__itt_event event); | mark. | ++----------------------------------------------------------+--------------------------------------------------------------------------+ +| .. code-block:: cpp | Call this API after a call to __itt_event_start() to show the event as a | +| | check mark with a duration line from start to end. 
If this API is not | +| int __itt_event_end(__itt_event event); | called, the event appears in the Timeline pane as a single check mark. | ++----------------------------------------------------------+--------------------------------------------------------------------------+ + + +Usage Guidelines +---------------- + + +- An __itt_event_end() call is matched with the nearest preceding + unmatched __itt_event_start() call. + Any intervening events are nested. + +- You can nest user events of the same type or different types within each + other. In the case of nested events, the time is considered to have + been spent only in the most deeply nested user event region. + +- You can overlap different ITT API events. In the case of overlapping + events, the time is considered to have been spent only in the event + region with the later __itt_event_start(). Unmatched + __itt_event_end() calls are ignored. + + +.. note:: + + + To see events and user tasks in your results, `create a custom + analysis `__ (based + on the pre-defined analysis you are interested in) and select the + **Analyze user tasks, events and counters** checkbox in the analysis + settings. + + +Usage Example: Creating and Marking Single Events +------------------------------------------------- + + +The \__itt_event_create API returns a new event handle that you can +subsequently use to mark user events with the \__itt_event_start API. In +this example, two event type handles are created and used to set the +start points for tracking two different types of events. + + +.. code:: cpp + + + #include "ittnotify.h" + + + __itt_event mark_event = __itt_event_create( "User Mark", 9 ); + __itt_event frame_event = __itt_event_create( "Frame Completed", 15 ); + ... + __itt_event_start( mark_event ); + ... 
+ for( int f ; fflags = 1; /* enable domain */ + + for (int i = 0; i < getItemCount(); ++i) + { + __itt_frame_begin_v3(pD, NULL); + do_foo(); + __itt_frame_end_v3(pD, NULL); + } + + //... + + __itt_frame_begin_v3(pD, NULL); + do_foo_1(); + __itt_frame_end_v3(pD, NULL); + + //... + + __itt_frame_begin_v3(pD, NULL); + do_foo_2(); + __itt_frame_end_v3(pD, NULL); + diff --git a/docs/src/ittapi/histogram-api-schema.png b/docs/src/ittapi/histogram-api-schema.png new file mode 100644 index 00000000..d4b8293c Binary files /dev/null and b/docs/src/ittapi/histogram-api-schema.png differ diff --git a/docs/src/ittapi/histogram-api.rst b/docs/src/ittapi/histogram-api.rst new file mode 100644 index 00000000..de840351 --- /dev/null +++ b/docs/src/ittapi/histogram-api.rst @@ -0,0 +1,188 @@ +.. _histogram-api: + +Histogram API +============= + + +Use the Histogram API to define histograms that display arbitrary data +in Intel® VTune™ Profiler. + +Histograms are particularly useful to display statistics that can be +split by individual units for cross-comparison. + + +You can use the Histogram API to: + + +- Track load distribution +- Track resource utilization +- Identify oversubscribed or underutilized worker nodes + + +Any thread in the process can access any instance of a histogram, regardless +of the thread that created it. The Histogram API call is thread-safe. + + +Define and Create a Histogram +----------------------------- + + +Before you create a histogram, you must create an `ITT API Domain +`__ . The pointer to this domain is then passed +to the primitive. + +In the result display in VTune Profiler, the domain name provides a heading +for the histogram section in the **Summary** tab. + +One domain can combine any number of histograms. However, the name of +the histogram must be unique within the same domain. 
+ + +**Parameters of the primitives:** + + ++--------+--------------------------+-----------------------------------------+ +| Type | Parameter | Description | ++========+==========================+=========================================+ +| [in] | .. code-block:: cpp | Domain controlling the call | +| | | | +| | domain | | ++--------+--------------------------+-----------------------------------------+ +| [in] | .. code-block:: cpp | Histogram name | +| | | | +| | name | | ++--------+--------------------------+-----------------------------------------+ +| [in] | .. code-block:: cpp | Type of X axis data | +| | | | +| | x_axis_type | | ++--------+--------------------------+-----------------------------------------+ +| [in] | .. code-block:: cpp | Type of Y axis data | +| | | | +| | y_axis_type | | ++--------+--------------------------+-----------------------------------------+ + + +.. container:: fignone + :name: GUID-788CEBA6-9355-4E6D-ADF7-9ED7BD8441A1 + + + |image1| + + +**Primitives:** + + ++----------------------------------------------+--------------------------------------------------------------------------+ +| Use This Primitive | To Do This | ++==============================================+==========================================================================+ +| .. code-block:: cpp | Create a histogram instance with the specified domain, name, and data | +| | type on Linux* and Android* OS. | +| __itt_histogram* __itt_histogram_create( | | +| __itt_domain* domain, | | +| const char* name, | | +| __itt_metadata_type x_axis_type, | | +| __itt_metadata_type y_axis_type); | | ++----------------------------------------------+--------------------------------------------------------------------------+ +| .. code-block:: cpp | Create a histogram instance with the specified domain, name, and data | +| | type on Windows* OS for ASCII strings (char). 
| +| __itt_histogram* __itt_histogram_createA( | | +| __itt_domain* domain, | | +| const char* name, | | +| __itt_metadata_type x_axis_type, | | +| __itt_metadata_type y_axis_type); | | ++----------------------------------------------+--------------------------------------------------------------------------+ +| .. code-block:: cpp | Create a histogram instance with the specified domain, name, and data | +| | type on Windows* OS for UNICODE strings (wchar_t). | +| __itt_histogram* __itt_histogram_createW( | | +| __itt_domain* domain, | | +| const wchar_t* name, | | +| __itt_metadata_type x_axis_type, | | +| __itt_metadata_type y_axis_type); | | ++----------------------------------------------+--------------------------------------------------------------------------+ + + +Submit Data to Histogram +------------------------ + + +**Parameters of the primitives:** + + ++--------+--------------------------+-------------------------------------------------+ +| Type | Parameter | Description | ++========+==========================+=================================================+ +| [in] | .. code-block:: cpp | Histogram instance to submit data to | +| | | | +| | histogram | | ++--------+--------------------------+-------------------------------------------------+ +| [in] | .. code-block:: cpp | Number of elements in submitted axis data array | +| | | | +| | length | | ++--------+--------------------------+-------------------------------------------------+ +| [in] | .. code-block:: cpp | Array containing X axis data (may be ``NULL``). | +| | | If ``x_axis_data`` is ``NULL``, VTune Profiler | +| | x_axis_data | uses the indices of the ``y_axis_data`` array. | ++--------+--------------------------+-------------------------------------------------+ +| [in] | .. code-block:: cpp | Array containing Y axis data. 
| +| | | | +| | y_axis_data | | ++--------+--------------------------+-------------------------------------------------+ + + +**Primitives:** + + ++-------------------------------------+--------------------------------------------------------------------------+ +| Use This Primitive | To Do This | ++=====================================+==========================================================================+ +| .. code-block:: cpp | Submit user statistics for the selected instance of the histogram. | +| | Just like the coordinates of a point on a 2D plane, the array | +| void __itt_histogram_submit( | data for the Y-axis is mapped to the array data for the X-axis. | +| __itt_histogram* histogram, | Data submitted during a workload run is summarized into one common | +| size_t length, | histogram for all calls of this primitive. To lower collection overhead, | +| void* x_axis_data, | determine an efficient interval between data submissions. | +| void* y_axis_data); | | ++-------------------------------------+--------------------------------------------------------------------------+ + + + + +Usage Example +------------- + + +The following example creates a histogram to store worker thread +statistics: + + +.. code:: cpp + + + #include "ittnotify.h" + #include "ittnotify_types.h" + + + void submit_stats() + { + // Create domain + __itt_domain* domain = __itt_domain_create("Histogram statistics domain"); + + + // Create histogram + __itt_histogram* histogram = __itt_histogram_create(domain, "Worker TID 13454", __itt_metadata_u64, __itt_metadata_u64); + + + // Fill the statistics arrays with profiling data. + // get_worker_stats() is a user-defined helper that is assumed to take + // its arguments by reference: + uint64_t* x_stats = NULL; + uint64_t* y_stats = NULL; + size_t array_size = 0; + get_worker_stats(x_stats, y_stats, array_size); + + + // Submit histogram statistics: + __itt_histogram_submit(histogram, array_size, x_stats, y_stats); + } + + +.. 
|image1| image:: histogram-api-schema.png + :width: 600px + diff --git a/docs/src/ittapi/load-module-api.rst b/docs/src/ittapi/load-module-api.rst new file mode 100644 index 00000000..cdc2055e --- /dev/null +++ b/docs/src/ittapi/load-module-api.rst @@ -0,0 +1,48 @@ +.. _load-module-api: + +Load Module API +=============== + + +Use the Load Module API in your code to analyze a module that +was loaded at an alternate location and cannot otherwise be tracked by +Intel® VTune™ Profiler. For example, you could use the Load Module API to analyze code +that is typically executed in an isolated environment that provides +no visibility into the code. Use this API to explicitly set the +module location in an address space for analysis by VTune Profiler. + + ++-----------------------------------------------+------------------------------------------------------------------------+ +| Use This Primitive | To Do This | ++===============================================+========================================================================+ +| .. code-block:: cpp | Call this function after the relocation of a module. Provide the new | +| | start and end addresses for the module and the full path to the module | +| void __itt_module_loadW(void* start_addr, | on the local drive. | +| void* end_addr, | | +| const wchar_t* path); | | ++-----------------------------------------------+------------------------------------------------------------------------+ +| .. code-block:: cpp | Call this function after the relocation of a module. Provide the new | +| | start and end addresses for the module and the full path to the module | +| void __itt_module_loadA(void* start_addr, | on the local drive. | +| void* end_addr, | | +| const char* path); | | ++-----------------------------------------------+------------------------------------------------------------------------+ +| .. code-block:: cpp | Call this function after the relocation of a module. 
Provide the new | +| | start and end addresses for the module and the full path to the module | +| void __itt_module_load(void* start_addr, | on the local drive. | +| void* end_addr, | | +| const char* path); | | ++-----------------------------------------------+------------------------------------------------------------------------+ + + +Usage Example +------------- + + +.. code-block:: cpp + + + #include "ittnotify.h" + + __itt_module_load(relocatedBaseModuleAddress, relocatedEndModuleAddress, "/path/to/dynamic/library.so"); + diff --git a/docs/src/ittapi/marker-api.rst b/docs/src/ittapi/marker-api.rst new file mode 100644 index 00000000..2d20f070 --- /dev/null +++ b/docs/src/ittapi/marker-api.rst @@ -0,0 +1,42 @@ +.. _marker-api: + +Marker API +========== + + +A marker is an instant event on a timeline that can be associated with a +particular process, a thread, or specified in a global scope. + + +**To create a marker, use the following primitive:** + +.. code-block:: cpp + + + void __itt_marker(const __itt_domain *domain, __itt_id id, + __itt_string_handle *name, __itt_scope scope); + + +**Parameters of the primitive:** + + ++--------+-----------------------+---------------------------------------------------------+ +| Type | Parameter | Description | ++========+=======================+=========================================================+ +| [in] | .. code-block:: cpp | Marker domain | +| | | | +| | domain | | ++--------+-----------------------+---------------------------------------------------------+ +| [in] | .. code-block:: cpp | Marker name | +| | | | +| | name | | ++--------+-----------------------+---------------------------------------------------------+ +| [in] | .. code-block:: cpp | Optional parameter. Marker ID, or ``__itt_null``. | +| | | Markers with different domains cannot have the same IDs.| +| | id | | ++--------+-----------------------+---------------------------------------------------------+ +| [in] | .. 
code-block:: cpp | Marker scope: process, thread, and global | +| | | | +| | scope | | ++--------+-----------------------+---------------------------------------------------------+ + diff --git a/docs/src/ittapi/memory-allocation-apis.rst b/docs/src/ittapi/memory-allocation-apis.rst new file mode 100644 index 00000000..101fdbcb --- /dev/null +++ b/docs/src/ittapi/memory-allocation-apis.rst @@ -0,0 +1,173 @@ +.. _memory-allocation-apis: + +Memory Allocation APIs +====================== + + +Intel® VTune™ Profiler contains a set of APIs that help identify the +semantics of your ``malloc``-like heap management functions. + +Annotating your code with these APIs enables VTune Profiler to correctly +determine memory objects as part of the **Memory Access Analysis**. + + +Usage Guidelines +---------------- + + +When using the Memory Allocation APIs, follow these guidelines: + + +- Create *wrapper* functions for your routines. Put ``__itt_heap_*_begin`` + and ``__itt_heap_*_end`` calls in these functions. +- When your application calls ``__itt_heap_function_create``, allocate a + unique domain for each pair of allocate/free functions. This enables + VTune Profiler to verify that a matching free function gets called for + every allocate function call. +- Annotate the beginning and end of every allocate function and free + function. +- Call all function pairs from the same stack frame. Otherwise, + VTune Profiler assumes that an exception occurred, and that the allocation + attempt failed. +- Do not call an end function without calling the matching begin + function first. + + +Using Memory Allocation APIs in Your Code +----------------------------------------- + + ++----------------------------------+-----------------------------------------------------------------+ +| Use This | To Do This | ++==================================+=================================================================+ +| .. 
code-block:: cpp | Declare a handle type to match begin and end calls and domains. | +| | | +| typedef void* | name = Name of the function you want to annotate. | +| __itt_heap_function; | domain = String identifying a matching set of functions. | +| | | +| __itt_heap_function | For example, if there are three functions that all work with | +| __itt_heap_function_create( | my_struct, such as alloc_my_structs, free_my_structs, and | +| const __itt_char* name, | realloc_my_structs, pass the same domain to all three | +| const __itt_char* domain | __itt_heap_function_create() calls. | +| ); | | ++----------------------------------+-----------------------------------------------------------------+ +| .. code-block:: cpp | Identify allocation functions. | +| | | +| void | h = Handle returned when this function's name was passed | +| __itt_heap_allocate_begin( | to __itt_heap_function_create(). | +| __itt_heap_function h, | | +| size_t size, | size = Size in bytes of the requested memory region. | +| int initialized | | +| ); | initialized = Flag indicating if the memory region will be | +| | initialized by this routine. | +| void | | +| __itt_heap_allocate_end( | addr = Pointer to the address of the memory region this | +| __itt_heap_function h, | function has allocated, or 0 if the allocation failed. | +| void** addr, | | +| size_t size, | | +| int initialized | | +| ); | | ++----------------------------------+-----------------------------------------------------------------+ +| .. code-block:: cpp | Identify deallocation functions. | +| | | +| void | h = Handle returned when this function's name was passed | +| __itt_heap_free_begin( | to __itt_heap_function_create(). | +| __itt_heap_function h, | | +| void* addr | addr = Pointer to the address of the memory region this | +| ); | function is deallocating. | +| | | +| void | | +| __itt_heap_free_end( | | +| __itt_heap_function h, | | +| void* addr | | +| ); | | ++----------------------------------+-----------------------------------------------------------------+ +| .. 
code-block:: cpp | Identify reallocation functions. | +| | | +| void | Note that __itt_heap_reallocate_end() must be called after | +| __itt_heap_reallocate_begin( | the attempt even if no memory is returned. VTune Profiler | +| __itt_heap_function h, | assumes C-runtime realloc semantics. | +| void* addr, | | +| size_t new_size, | h = Handle returned when this function's name was passed | +| int initialized | to __itt_heap_function_create(). | +| ); | | +| | addr = Pointer to the address of the memory region this | +| void | function is reallocating. If addr is NULL, VTune Profiler | +| __itt_heap_reallocate_end( | treats this as if it is an allocation. | +| __itt_heap_function h, | | +| void* addr, | new_addr = Pointer to a pointer to hold the address of the | +| void** new_addr, | reallocated memory region. | +| size_t new_size, | | +| int initialized | new_size = Size in bytes of the requested memory region. If | +| ); | new_size is 0, VTune Profiler treats this as if it is | +| | a deallocation. | ++----------------------------------+-----------------------------------------------------------------+ + + +Usage Example: Heap Allocation +------------------------------ + + +.. 
code-block:: cpp + + + #include "ittnotify.h" + + + void* user_defined_malloc(size_t size); + void user_defined_free(void *p); + void* user_defined_realloc(void *p, size_t s); + + + __itt_heap_function my_allocator; + __itt_heap_function my_reallocator; + __itt_heap_function my_freer; + + + void* my_malloc(size_t s) + { + void* p; + + + __itt_heap_allocate_begin(my_allocator, s, 0); + p = user_defined_malloc(s); + __itt_heap_allocate_end(my_allocator, &p, s, 0); + + + return p; + } + + + + + void my_free(void *p) + { + __itt_heap_free_begin (my_freer, p); + user_defined_free(p); + __itt_heap_free_end (my_freer, p); + } + + + void* my_realloc(void *p, size_t s) + { + void *np; + + + __itt_heap_reallocate_begin (my_reallocator, p, s, 0); + np = user_defined_realloc(p, s); + __itt_heap_reallocate_end(my_reallocator, p, &np, s, 0); + + + return(np); + } + + + // Make sure to call this init routine before any calls to + // user defined allocators. + void init_itt_calls() + { + my_allocator = __itt_heap_function_create("my_malloc", "mydomain"); + my_reallocator = __itt_heap_function_create("my_realloc", "mydomain"); + my_freer = __itt_heap_function_create("my_free", "mydomain"); + } + diff --git a/docs/src/ittapi/metadata-api.rst b/docs/src/ittapi/metadata-api.rst new file mode 100644 index 00000000..a07f83d3 --- /dev/null +++ b/docs/src/ittapi/metadata-api.rst @@ -0,0 +1,75 @@ +.. _metadata-api: + +Metadata API +============ + + +Metadata is additional information or generic data that can be attached to a +task, a thread, a process, etc. Metadata has a type, name, and value. +The value encoding depends on the metadata type. The encoding may contain +either string data or a number of integer or floating point values. + + +To create metadata, use the following primitives: + + +.. 
code-block:: cpp + + + void __itt_metadata_add(const __itt_domain *domain, __itt_id id, __itt_string_handle *key, + __itt_metadata_type type, size_t count, void *data); + + void __itt_metadata_str_addA(const __itt_domain *domain, __itt_id id, __itt_string_handle *key, + const char *data, size_t length); + + void __itt_metadata_str_addW(const __itt_domain *domain, __itt_id id, __itt_string_handle *key, + const wchar_t *data, size_t length); + + void __itt_metadata_add_with_scope(const __itt_domain *domain, __itt_scope scope, + __itt_string_handle *key, __itt_metadata_type type, + size_t count, void *data); + + void __itt_metadata_str_add_with_scopeA(const __itt_domain *domain, __itt_scope scope, + __itt_string_handle *key, const char *data, size_t length); + + void __itt_metadata_str_add_with_scopeW(const __itt_domain *domain, __itt_scope scope, + __itt_string_handle *key, const wchar_t *data, size_t length); + + +The following table defines the parameters used in the Metadata API primitives. + + ++--------+------------------------------+----------------------------------------------------+ +| Type | Parameter | Description | ++========+==============================+====================================================+ +| [in] | .. code-block:: cpp | Metadata domain | +| | | | +| | __itt_domain* domain | | ++--------+------------------------------+----------------------------------------------------+ +| [in] | .. code-block:: cpp | Metadata scope: task, thread, process, or global. | +| | | If a scope is undefined, metadata belongs to the | +| | __itt_scope scope | last task in the thread. | ++--------+------------------------------+----------------------------------------------------+ +| [in] | .. code-block:: cpp | Metadata name | +| | | | +| | __itt_string_handle* key | | ++--------+------------------------------+----------------------------------------------------+ +| [in] | .. 
code-block:: cpp | Metadata type; used only for numeric metadata | +| | | | +| | __itt_metadata_type type | | ++--------+------------------------------+----------------------------------------------------+ +| [in] | .. code-block:: cpp | Number of numeric metadata items | +| | | | +| | size_t count | | ++--------+------------------------------+----------------------------------------------------+ +| [in] | .. code-block:: cpp | Number of symbols in a metadata string | +| | | | +| | size_t length | | ++--------+------------------------------+----------------------------------------------------+ +| [in] | .. code-block:: cpp | Actual metadata (array of numerics or string) | +| | | | +| | void *data | | +| | const char *data | | +| | const wchar_t *data | | ++--------+------------------------------+----------------------------------------------------+ + diff --git a/docs/src/ittapi/relation-api.rst b/docs/src/ittapi/relation-api.rst new file mode 100644 index 00000000..d10f94ac --- /dev/null +++ b/docs/src/ittapi/relation-api.rst @@ -0,0 +1,56 @@ +.. _relation-api: + +Relation API +============ + + +The Relation API binds two named instances, like tasks, with a +relation attribute. You can add relations before or after +the actual instances are created. These relations exist independently +of the instances. + +To group tasks logically, you can use different types of relations: + + +.. 
code-block:: cpp + + + void ITTAPI __itt_relation_add(const __itt_domain *domain, __itt_id head, + __itt_relation relation, __itt_id tail); + + void ITTAPI __itt_relation_add_ex(const __itt_domain *domain, __itt_clock_domain* clock_domain, + unsigned long long timestamp, __itt_id head, + __itt_relation relation, __itt_id tail); + + +**Parameters of the primitives:** + + ++--------+-------------------------------------+---------------------------------------+ +| Type | Parameter | Description | ++========+=====================================+=======================================+ +| [in] | .. code-block:: cpp | Relation domain | +| | | | +| | __itt_domain* domain | | ++--------+-------------------------------------+---------------------------------------+ +| [in] | .. code-block:: cpp | User-defined logical relation between | +| | | two named instances | +| | __itt_relation relation | | ++--------+-------------------------------------+---------------------------------------+ +| [in] | .. code-block:: cpp | ID of the first named related | +| | | instance | +| | __itt_id head | | ++--------+-------------------------------------+---------------------------------------+ +| [in] | .. code-block:: cpp | ID of the second named related | +| | | instance | +| | __itt_id tail | | ++--------+-------------------------------------+---------------------------------------+ +| [in] | .. code-block:: cpp | User-defined clock domain | +| | | | +| | __itt_clock_domain* clock_domain | | ++--------+-------------------------------------+---------------------------------------+ +| [in] | .. 
code-block:: cpp | User-defined timestamp for the | +| | | corresponding clock domain | +| | unsigned long long timestamp | | ++--------+-------------------------------------+---------------------------------------+ + diff --git a/docs/src/ittapi/string-handle-api.rst b/docs/src/ittapi/string-handle-api.rst new file mode 100644 index 00000000..e5efc6f5 --- /dev/null +++ b/docs/src/ittapi/string-handle-api.rst @@ -0,0 +1,40 @@ +.. _string-handle-api: + +String Handle API +================= + + +Many API calls require names to identify API objects. String handles are +pointers to these names. String handles enable efficient handling of named +objects during runtime. The handles also make the collected trace data more +compact. + + +**To create and return a handle value that can be associated with a +string, use the following primitive:** + + +.. code-block:: cpp + + + __itt_string_handle* __itt_string_handle_create(const char *name); + + +Consecutive calls to ``__itt_string_handle_create`` with the same name return +the same value. The set of string handles is expected to remain static during +the execution time of the application. Therefore, there is no mechanism to +destroy a string handle. Any thread in the process can access any string handle, +irrespective of the thread that created the string handle. This call is thread-safe. + + +**Parameters of the primitive:** + + ++--------+------------------------+-------------------+ +| Type | Parameter | Description | ++========+========================+===================+ +| [in] | .. code-block:: cpp | The input string | +| | | | +| | name | | ++--------+------------------------+-------------------+ + diff --git a/docs/src/ittapi/task-api.rst b/docs/src/ittapi/task-api.rst new file mode 100644 index 00000000..a208ec8e --- /dev/null +++ b/docs/src/ittapi/task-api.rst @@ -0,0 +1,251 @@ +.. _task-api: + +Task API +======== + + +A task is a logical unit of work that is performed by a particular thread. 
+Tasks can nest; thus, tasks typically correspond to functions, scopes, +or a case block in a switch statement. + +Use the Task API to assign tasks to threads. + +The Task API does not enable a thread to perform: + +- Task switching, where a thread suspends the current task and switches to + a different task. +- Task stealing, where a thread moves a task to a different thread. + + +A task instance represents a piece of work performed by a particular +thread for a period of time. The task is defined by the bracketing of +``__itt_task_begin()`` and ``__itt_task_end()`` on the same thread. + + +Tasks can be simple or overlapped. + + +Simple tasks implicitly support the concept of embedded execution. The call +``__itt_task_end()`` finalizes the most recent ``__itt_task_begin()`` call. +For example, the following pseudocode is a valid sequence, and the execution time +of "a" tasks incorporates the execution time of "b" tasks: + + +.. code-block:: cpp + + + __itt_task_begin(a); + __itt_task_begin(b); + __itt_task_end(b); + __itt_task_end(a); + + +The execution regions of overlapped tasks may intersect. For example, the +following pseudocode is a valid sequence. A "b" task that started after an +"a" task can finish after the "a" task completes: + + +.. code-block:: cpp + + + __itt_task_begin_overlapped(a); + __itt_task_begin_overlapped(b); + __itt_task_end_overlapped(a); + __itt_task_end_overlapped(b); + + +Task API Functions +------------------ + + +**To create a simple task instance on a thread, use the following functions:** + + +.. code-block:: cpp + + + void ITTAPI __itt_task_begin(const __itt_domain *domain, __itt_id taskid, + __itt_id parentid, __itt_string_handle *name); + + void ITTAPI __itt_task_begin_fn (const __itt_domain *domain, __itt_id taskid, + __itt_id parentid, void* fn); + + void ITTAPI __itt_task_end (const __itt_domain *domain); + + +**To create a simple task instance in a different clock domain, use the +following functions:** + +.. 
code-block:: cpp + + + void ITTAPI __itt_task_begin_ex(const __itt_domain* domain, __itt_clock_domain* clock_domain, + unsigned long long timestamp, __itt_id taskid, __itt_id parentid, + __itt_string_handle* name); + + void ITTAPI __itt_task_begin_fn_ex(const __itt_domain* domain, __itt_clock_domain* clock_domain, + unsigned long long timestamp, __itt_id taskid, + __itt_id parentid, void* fn); + + void ITTAPI __itt_task_end_ex(const __itt_domain* domain, __itt_clock_domain* clock_domain, + unsigned long long timestamp); + + +**To create an overlapped task instance on a thread, use the following +functions:** + + +.. code-block:: cpp + + + void ITTAPI __itt_task_begin_overlapped(const __itt_domain* domain, __itt_id taskid, + __itt_id parentid, __itt_string_handle* name); + + void ITTAPI __itt_task_end_overlapped(const __itt_domain *domain, __itt_id taskid); + + +The argument ``taskid`` is mandatory for these functions. + + +**To create an overlapped task instance in a different clock domain, use +the following functions:** + + +.. code-block:: cpp + + + void ITTAPI __itt_task_begin_overlapped_ex(const __itt_domain* domain, __itt_clock_domain* clock_domain, + unsigned long long timestamp, __itt_id taskid, + __itt_id parentid, __itt_string_handle* name); + + void ITTAPI __itt_task_end_overlapped_ex(const __itt_domain* domain, __itt_clock_domain* clock_domain, + unsigned long long timestamp, __itt_id taskid); + + +The argument ``taskid`` is mandatory for these functions. + + +.. _task-api-IJIT_NOTIFYEVENT_FUNCTION: + + +ITTAPI__itt_task_* Function Parameters +------------------------------------------ + + +The following table defines the parameters used in the Task API +primitives. + + ++--------+---------------------------------+---------------------------------------------------------------------+ +| Type | Parameter | Description | ++========+=================================+=====================================================================+ +| [in] | .. 
code-block:: cpp | The domain of the task. | +| | | | +| | __itt_domain | | ++--------+---------------------------------+---------------------------------------------------------------------+ +| [in] | .. code-block:: cpp | User-defined ID optional for all task instances, | +| | | except for overlapped task instances. | +| | __itt_id taskid | ``__itt_null`` can be used as default for undefined task instances. | +| | | Task ID is used to define relations between task instances. | ++--------+---------------------------------+---------------------------------------------------------------------+ +| [in] | .. code-block:: cpp | Optional parameter. Parent instance ID, to which | +| | | the task belongs, or ``__itt_null``. | +| | __itt_id parentid | | ++--------+---------------------------------+---------------------------------------------------------------------+ +| [in] | .. code-block:: cpp | The task string handle. | +| | | | +| | __itt_string_handle | | ++--------+---------------------------------+---------------------------------------------------------------------+ +| [in] | .. code-block:: cpp | Function address that can be used instead of the name. | +| | | For example, the function address can be resolved | +| | void* fn | into the function name by using debug symbol information. | ++--------+---------------------------------+---------------------------------------------------------------------+ +| [in] | .. code-block:: cpp | User-defined clock domain. | +| | | | +| | __itt_clock_domain | | ++--------+---------------------------------+---------------------------------------------------------------------+ +| [in] | .. code-block:: cpp | User-defined timestamp for the corresponding clock domain. 
| +| | | | +| | unsigned long long timestamp | | ++--------+---------------------------------+---------------------------------------------------------------------+ + + +Usage Example +------------- + + +The following code snippet creates a domain and a couple of tasks at +global scope. + + +.. code-block:: cpp + + + #include "ittnotify.h" + + + void do_foo(double seconds); + + + __itt_domain* domain = __itt_domain_create("MyTraces.MyDomain"); + __itt_string_handle* shMyTask = __itt_string_handle_create("My Task"); + __itt_string_handle* shMySubtask = __itt_string_handle_create("My SubTask"); + + + void BeginFrame() { + __itt_task_begin(domain, __itt_null, __itt_null, shMyTask); + do_foo(1); + } + + + void DoWork() { + __itt_task_begin(domain, __itt_null, __itt_null, shMySubtask); + do_foo(1); + __itt_task_end(domain); + } + void EndFrame() { + do_foo(1); + __itt_task_end(domain); + } + + + int main() { + BeginFrame(); + DoWork(); + EndFrame(); + return 0; + } + + + #ifdef _WIN32 + #include <time.h> + + + void do_foo(double seconds) { + clock_t goal = (clock_t)((double)clock() + seconds * CLOCKS_PER_SEC); + while (goal > clock()); + } + #else + #include <time.h> + + + #define NSEC 1000000000 + #define TYPE long + + + void do_foo(double sec) { + struct timespec start_time; + struct timespec current_time; + + + clock_gettime(CLOCK_REALTIME, &start_time); + while(1) { + clock_gettime(CLOCK_REALTIME, &current_time); + TYPE cur_nsec=(long)((current_time.tv_sec-start_time.tv_sec-sec)*NSEC + + current_time.tv_nsec - start_time.tv_nsec); + if(cur_nsec>=0) + break; + } + } + #endif + diff --git a/docs/src/ittapi/thread-naming-api.rst b/docs/src/ittapi/thread-naming-api.rst new file mode 100644 index 00000000..9844f72a --- /dev/null +++ b/docs/src/ittapi/thread-naming-api.rst @@ -0,0 +1,84 @@ +.. _thread-naming-api: + +Thread Naming API +================= + + +By default, each thread in your application displays in the **timeline** track. 
+Each thread gets a default label that is either the OS thread name or is generated +from the process ID and the thread ID. To give meaningful names to your +threads, use the Thread Naming API. + +The Thread Naming API is a per-thread function that works in all states +(paused or resumed). You must call this API from within the thread. + +To set the thread name using a char or Unicode string, use the primitive: + +.. code-block:: cpp + + + void __itt_thread_set_name (const __itt_char *name); + + +**Parameters of the primitive:** + + ++--------+------------------------+---------------------+ +| Type | Parameter | Description | ++========+========================+=====================+ +| [in] | .. code-block:: cpp | The thread name | +| | | | +| | name | | ++--------+------------------------+---------------------+ + + +**To indicate that this thread should be excluded from analysis, use the primitive:** + +.. code-block:: cpp + + + void __itt_thread_ignore (void); + + +Calling ``__itt_thread_ignore()`` does not affect the application's +concurrency. After this call, the current thread will not be visible in the +**Timeline** pane. + + +If the thread name is set multiple times, only the last name is used. + + +Usage Example +------------- + + +This example shows how to assign a meaningful name to a specific thread +and ignore the service thread. + +.. code-block:: cpp + + + DWORD WINAPI service_thread(LPVOID lpArg) + { + __itt_thread_ignore(); + // Do service work here. This thread will not be displayed. 
+ return 0; + } + + + DWORD WINAPI thread_function(LPVOID lpArg) + { + __itt_thread_set_name("My worker thread"); + // Do thread work here + return 0; + } + + + int main(int argc, char* argv[]) + { + CreateThread(NULL, 0, service_thread, NULL, 0, NULL); + CreateThread(NULL, 0, thread_function, NULL, 0, NULL); + + return 0; + } + diff --git a/docs/src/ittapi/user-defined-synchronization-api.rst b/docs/src/ittapi/user-defined-synchronization-api.rst new file mode 100644 index 00000000..e73f2a64 --- /dev/null +++ b/docs/src/ittapi/user-defined-synchronization-api.rst @@ -0,0 +1,336 @@ +.. _user-defined-synchronization-api: + +User-Defined Synchronization API +================================ + + +Although Intel® VTune™ Profiler supports several Windows* OS and POSIX* APIs, +you may find it useful to define your own synchronization constructs. VTune +Profiler does not typically track the custom constructs that you create. +However, you can use the Synchronization API to collect statistical information +about the synchronization constructs you have defined. + +The User-Defined Synchronization API is a per-thread function that works +in the resumed profiling state only. + +Synchronization constructs may generally be modeled as a series of signals. +One or several threads may wait for a signal from another group of threads to +inform them to proceed with specific action. The synchronization API tracks +from the instant when a thread begins to wait for a signal and then notes the +arrival of the signal. This information can help you understand your code +better. This API uses memory handles along with a set of primitives to gather +statistics on the user-defined synchronization object. + + +.. note:: + + + The User-Defined Synchronization API works with the **Threading** + analysis type. 
+ + +Using User-Defined Synchronization API in Your Code +--------------------------------------------------- + + +The following table describes the user-defined synchronization API primitives +that are available for use on Windows* and Linux* operating systems: + + ++------------------------------------+-------------------------------------------+ +| Use This Primitive | To Do This | ++====================================+===========================================+ +| .. code-block:: cpp | Register the creation of a | +| | sync object using char or Unicode string. | +| void | | +| __itt_sync_create( | | +| void *addr, | | +| const __itt_char *objtype, | | +| const __itt_char *objname, | | +| int attribute) | | ++------------------------------------+-------------------------------------------+ +| .. code-block:: cpp | Assign a name to a sync object using char | +| | or Unicode string, after it was created. | +| void | | +| __itt_sync_rename( | | +| void *addr, | | +| const __itt_char *name) | | ++------------------------------------+-------------------------------------------+ +| .. code-block:: cpp | Track lifetime of the destroyed object. | +| | | +| void | | +| __itt_sync_destroy( | | +| void *addr) | | ++------------------------------------+-------------------------------------------+ +| .. code-block:: cpp | Enter spin loop on user-defined | +| | sync object. | +| void | | +| __itt_sync_prepare( | | +| void *addr) | | ++------------------------------------+-------------------------------------------+ +| .. code-block:: cpp | Quit spin loop without acquiring | +| | spin object. | +| void | | +| __itt_sync_cancel( | | +| void *addr) | | ++------------------------------------+-------------------------------------------+ +| .. code-block:: cpp | Define successful spin loop completion | +| | (sync object acquired). | +| void | | +| __itt_sync_acquired( | | +| void *addr) | | ++------------------------------------+-------------------------------------------+ +| .. 
code-block:: cpp | Start sync object releasing code. | +| | This primitive is called | +| void | before the lock release call. | +| __itt_sync_releasing( | | +| void *addr) | | ++------------------------------------+-------------------------------------------+ + + +Each API call has a single parameter called addr. The address is used to +differentiate between two or more distinct custom synchronization objects. +Each unique address enables VTune Profiler to track a separate custom object. +Therefore, to use the same custom object to protect access in different parts +of your code, use the same addr parameter around each API call that operates +on that object. + +When properly embedded in your code, the primitives inform VTune Profiler +when the code attempts to perform some type of synchronization. Each prepare +primitive must be paired with a cancel or acquired primitive. + +A synchronization construct you define may involve any number of +synchronization objects. Each synchronization object must be triggered off +of a unique memory handle, which the user-defined synchronization API uses +to track the object. Any number of synchronization objects may be tracked +simultaneously using the user-defined synchronization API, as long as each +object uses a unique memory pointer. This action is similar to modeling +objects in the WaitForMultipleObjects function in the Windows* OS API. + +You can create more complex synchronization constructs out of a group of +synchronization objects. However, avoid interlacing different user-defined +synchronization constructs as this may cause incorrect behavior. + + +API Usage Tips +-------------- + +The user-defined synchronization API requires proper placement of the +primitives within your code. Follow these guidelines: + +- Put a prepare primitive immediately before the code that attempts to + obtain access to a synchronization object. 
+ +- Put either a cancel primitive or an acquired primitive immediately + after your code is no longer waiting for a synchronization object. + +- Use the releasing primitive immediately before the code signals + that it no longer holds a synchronization object. + +- When using multiple prepare primitives to simulate any construct that waits + for multiple objects, the result is determined by the last cancel or + acquired primitive called for any object in the group of objects. + + +Key Considerations and Performance Impact: + +- The time between a prepare primitive and an acquired primitive may be + considered as impact time. + +- The time between a prepare primitive and a cancel primitive is + considered blocking time, even though the processor does not + necessarily block. + +- Improper use of the user-defined synchronization API results in + incorrect statistical data. + + +Usage Example: User-Defined Spin-Waits +-------------------------------------- + + +The prepare API indicates to VTune Profiler that the current thread +is about to begin waiting for a signal on a memory location. This call +must occur before you invoke the user synchronization construct. The +prepare API must always be paired with a call to either the acquired or +cancel API. + + +This example shows the use of the prepare and acquired API in conjunction +with a user-defined spin-wait construct: + + +.. code-block:: cpp + + + long spin = 1; + + __itt_sync_prepare((void *) &spin ); + while(ResourceBusy); + // spin wait; + __itt_sync_acquired((void *) &spin ); + + +You may want to use the cancel API in scenarios where the current thread tests +the user synchronization construct and chooses to focus on a different task +instead of waiting for a signal from another thread. See this example: + + +.. code-block:: cpp + + + long spin = 1; + + __itt_sync_prepare((void *) &spin ); + while(ResourceBusy) + { + __itt_sync_cancel((void *) &spin ); + // + // Do useful work + // + // ... 
+ // + // Once done with the useful work, this construct will test the + // lock variable and try to acquire it again. Before this can + // be done, a call to the prepare API is required. + // + __itt_sync_prepare((void *) &spin ); + } + __itt_sync_acquired((void *) &spin); + + +After you acquire a lock, you must call the releasing API before the +current thread releases the lock. The following example shows how to use +the releasing API: + + +.. code-block:: cpp + + + long spin = 1; + + __itt_sync_releasing((void *) &spin ); + // Code here should free the resource + + +Usage Example: User-Defined Synchronized Critical Section +--------------------------------------------------------- + + +This example shows how to create a critical section construct that can be +tracked using the user-defined synchronization API: + + +.. code-block:: cpp + + + CSEnter() + { + __itt_sync_prepare((void*) &cs); + while(LockIsUsed) + { + if(LockIsFree) + { + // Code to actually acquire the lock goes here + __itt_sync_acquired((void*) &cs); + } + if(timeout) + { + __itt_sync_cancel((void*) &cs ); + } + } + } + CSLeave() + { + if(LockIsMine) + { + __itt_sync_releasing((void*) &cs); + // Code to actually release the lock goes here + } + } + + +This critical section example demonstrates how to use the user-defined +synchronization primitives. Note the following points: + + +- Each prepare primitive is paired with an acquired primitive or a + cancel primitive. + +- The prepare primitive is placed immediately before the user code + begins waiting for the user lock. + +- The acquired primitive is placed immediately after the user code + actually obtains the user lock. + +- The releasing primitive is placed before the user code actually + releases the user lock. This ensures that another thread does not + call the acquired primitive before VTune Profiler realizes that + this thread has released the lock. 
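The placement rules above can also be sketched as a small, compilable C++ lock. This is only an illustration under stated assumptions: ``USE_ITTNOTIFY``, ``ITT_CALL``, ``TrackedLock``, the ``note_*`` wrappers, ``g_trail``, and ``demo_trail`` are hypothetical names that do not come from the ITT API. Without ``USE_ITTNOTIFY`` the wrappers only record a one-letter trail, so the pairing of calls can be inspected without linking the ITT static library.

```cpp
#include <atomic>
#include <cassert>
#include <string>

#ifdef USE_ITTNOTIFY                       // hypothetical build flag, not part of ITT
#include <ittnotify.h>
#define ITT_CALL(fn, addr) fn(addr)        // forward to the real primitive
#else
#define ITT_CALL(fn, addr) ((void)(addr))  // no-op when ITT is not linked in
#endif

// Event trail kept only so this sketch can show how the calls pair up.
static std::string g_trail;

static void note_prepare(void* a)   { ITT_CALL(__itt_sync_prepare, a);   g_trail += 'P'; }
static void note_acquired(void* a)  { ITT_CALL(__itt_sync_acquired, a);  g_trail += 'A'; }
static void note_cancel(void* a)    { ITT_CALL(__itt_sync_cancel, a);    g_trail += 'C'; }
static void note_releasing(void* a) { ITT_CALL(__itt_sync_releasing, a); g_trail += 'R'; }

class TrackedLock {
public:
    // Spins up to max_spins times; returns true when the lock was obtained.
    bool try_enter(int max_spins) {
        note_prepare(this);                      // immediately before waiting
        for (int i = 0; i < max_spins; ++i) {
            if (!flag_.test_and_set(std::memory_order_acquire)) {
                note_acquired(this);             // immediately after obtaining the lock
                return true;
            }
        }
        note_cancel(this);                       // gave up: pairs with the prepare above
        return false;
    }

    void leave() {
        note_releasing(this);                    // before the actual release
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Exercises one successful wait, one abandoned wait, and one release.
std::string demo_trail() {
    g_trail.clear();
    TrackedLock lock;
    bool first = lock.try_enter(8);    // lock is free: prepare, acquired
    bool second = lock.try_enter(8);   // lock is held: prepare, cancel
    assert(first && !second);
    lock.leave();                      // releasing, then the real release
    return g_trail;
}
```

In this sketch every ``P`` in the trail is followed by exactly one ``A`` or ``C``, and ``R`` is recorded before the flag is cleared, which mirrors the points listed above. The object address (``this``) plays the role of the unique memory handle.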
+ + +Usage Example: User-Level Synchronized Barrier +---------------------------------------------- + + +You can use the synchronization API to model higher-level constructs, like +barriers. This example shows how to create a barrier construct that you can +track using the synchronization API: + + +.. code-block:: cpp + + + Barrier() + { + teamflag = false; + __itt_sync_releasing((void *) &counter); + InterlockedIncrement(&counter); // Use the atomic increment primitive + + if( counter == thread_count ) + { + __itt_sync_acquired((void *) &counter); + __itt_sync_releasing((void *) &teamflag); + teamflag = true; + counter = 0; + } + else + { + __itt_sync_prepare((void *) &teamflag); + // Wait for team flag + __itt_sync_acquired((void *) &teamflag); + } + } + + +Note the following points: + + +- There are two synchronization objects in this barrier code. The + counter object is used to do a gather-like signaling from all the + threads to the final thread, indicating that each thread has entered + the barrier. + Once the last thread hits the barrier, the thread uses the + teamflag object to signal to all the other threads that they may + proceed. + +- A thread entering the barrier calls ``__itt_sync_releasing()`` to + inform VTune Profiler that it is about to signal the last thread by + incrementing counter. + +- The last thread to enter the barrier calls ``__itt_sync_acquired()`` to + inform VTune Profiler that it was successfully signaled by all the + other threads. + +- The last thread to enter the barrier calls ``__itt_sync_releasing()`` to + inform VTune Profiler that it is going to signal the barrier + completion to all the other threads by setting teamflag. + +- With the exception of the last thread, every other thread calls + ``__itt_sync_prepare()`` to inform VTune Profiler that it is about to + start waiting for the teamflag signal from the last thread. 
+ +- Finally, before leaving the barrier, each thread calls the + ``__itt_sync_acquired()`` primitive to inform VTune Profiler that it + received the end-of-barrier signal successfully. + diff --git a/docs/src/jit-api-reference.rst b/docs/src/jit-api-reference.rst new file mode 100644 index 00000000..b5b18406 --- /dev/null +++ b/docs/src/jit-api-reference.rst @@ -0,0 +1,13 @@ +.. _jit-api-reference: + +JIT API Reference +================= + +.. toctree:: + :maxdepth: 1 + + + jitapi/ijit_notifyevent + jitapi/ijit_isprofilingactive + jitapi/ijit_-getnewmethodid + diff --git a/docs/src/jit-api-support.rst b/docs/src/jit-api-support.rst new file mode 100644 index 00000000..e0a25873 --- /dev/null +++ b/docs/src/jit-api-support.rst @@ -0,0 +1,283 @@ +.. _jit-api-support: + +Just-In-Time (JIT) API +====================== + + +Use the Just-In-Time (JIT) Profiling API to enable performance tools to collect +information about just-in-time generated code. You must insert JIT Profiling +API calls in the code generator to report information before the JIT-compiled +code is executed. This information is collected at runtime and used by +tools like Intel® VTune™ Profiler to display performance metrics associated +with JIT-compiled code. + +You can use the JIT Profiling API to profile scenarios like: + +- Dynamic JIT compilation of JavaScript code traces +- JIT execution in OpenCL™ applications +- Java*/.NET* managed execution environments +- Custom ISV JIT engines + +The JIT engine generates code during runtime and communicates through the +static part with a profiler object (Collector). During runtime, the JIT engine +reports the information about JIT-compiled code that is stored in a trace file +by the profiler object. 
After collection, the profiling tool uses the generated +trace file to resolve the JIT-compiled code. + + +Use the JIT Profiling API to: + + +- :ref:`Profile trace-based and method-based JIT-compiled code` + +- :ref:`Analyze split functions` + +- :ref:`Explore inline functions` + + +Environment Variables in the JIT Profiling API +---------------------------------------------- + + +The JIT Profiling API contains two environment variables: + +- ``INTEL_JIT_PROFILER32`` +- ``INTEL_JIT_PROFILER64`` + +In turn, these variables contain paths to specific runtime libraries. + +These variables are used to signal the replacement of the stub +implementation of the JIT API with the JIT API collector. +After you instrument your code with the JIT API and link it to the +JIT API stub (``libjitprofiling.lib/libjitprofiling.a``), when the +environment variables are set, your code loads the libraries defined +in the variables. + +Make sure to set these environment variables for the ``ittnotify_collector`` +to enable data collection: + +On Windows*: + +.. code-block:: bash + + INTEL_JIT_PROFILER32=\bin32\runtime\ittnotify_collector.dll + INTEL_JIT_PROFILER64=\bin64\runtime\ittnotify_collector.dll + +On Linux*: + +.. code-block:: bash + + INTEL_JIT_PROFILER32=/lib32/runtime/libittnotify_collector.so + INTEL_JIT_PROFILER64=/lib64/runtime/libittnotify_collector.so + +On FreeBSD*: + +.. code-block:: bash + + INTEL_JIT_PROFILER64=/lib64/runtime/libittnotify_collector.so + + +.. _Profile trace-based and method-based JIT-compiled code : + +Profile Trace-based and Method-based JIT-compiled Code +------------------------------------------------------ + + +This is the most common scenario for using JIT Profiling API to profile +trace-based and method-based JIT-compiled code: + + +.. 
code-block:: cpp + + + #include <jitprofiling.h> + + + if (iJIT_IsProfilingActive() != iJIT_SAMPLING_ON) { + return; + } + + iJIT_Method_Load jmethod = {0}; + jmethod.method_id = iJIT_GetNewMethodID(); + jmethod.method_name = "method_name"; + jmethod.class_file_name = "class_name"; + jmethod.source_file_name = "source_file_name"; + jmethod.method_load_address = code_addr; + jmethod.method_size = code_size; + + iJIT_NotifyEvent(iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED, (void*)&jmethod); + iJIT_NotifyEvent(iJVM_EVENT_TYPE_SHUTDOWN, NULL); + + +**Usage Tips** + + +- If any ``iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED`` event overwrites a method + that has already been reported, that method becomes invalid. The memory + region of the method is treated as unloaded. + +- If the line number information that was provided contains multiple source + lines for the same assembly instruction (code location), the profiling tool + selects the first line number. + +- You can associate dynamically generated code with a module name. Use the + ``iJIT_Method_Load_V2`` structure for this purpose. + +- If you register a function with the same method ID multiple times and you + specify different module names, the profiling tool selects the module name + that was registered first. If you want to distinguish the same function + between different JIT engines, provide different method IDs for each + function. Other symbolic information, like source file, can be identical. + + +.. _Analyze split functions : + +Analyze Split Functions +----------------------- + + +You can use the JIT Profiling API to analyze split functions. This scenario +often occurs in resource-limited environments where the code for the same +function is generated or updated in separate segments. Sometimes this code +generation can happen with overlapping lifetimes. + + +.. 
code-block:: cpp + + + #include <jitprofiling.h> + + + unsigned int method_id = iJIT_GetNewMethodID(); + + + iJIT_Method_Load a = {0}; + a.method_id = method_id; + a.method_load_address = (void*)0x100; + a.method_size = 0x20; + + + iJIT_Method_Load b = {0}; + b.method_id = method_id; + b.method_load_address = (void*)0x200; + b.method_size = 0x30; + + + iJIT_NotifyEvent(iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED, (void*)&a); + iJIT_NotifyEvent(iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED, (void*)&b); + + +**Usage Tips** + + +- If an ``iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED`` event overwrites a method + that was already reported, that method becomes invalid and its memory + region is treated as unloaded. + +- All code regions that are reported with the same method ID are + considered to belong to the same method. Symbolic information + (method name, source file name) is taken from the first notification. + All subsequent notifications with the same method ID are processed only + for the information in the line number table. + +- If you register a second code region with a different source file + name and the same method ID, this information is saved and is not + considered as an extension of the first code region. However, the + profiling tool uses the source file of the first code region and + can map performance metrics incorrectly. + +- If you register a second code region with the same source file as + the one used for the first region and you use the same method ID, + the source file is discarded but the profiling tool maps metrics to + the source file correctly. + +- If you register a second code region with a null source file and + the same method ID, the provided line number information is associated + with the source file of the first code region. + + +.. _Explore inline functions: + +Explore Inline Functions +------------------------ + + +You can use the JIT Profiling API to explore inline functions including +the multilevel hierarchy of nested inline methods that shows the distribution +of performance metrics. 
+ + +.. code-block:: cpp + + + #include <jitprofiling.h> + + + // method_id parent_id + // [-- c --] 3000 2000 + // [---- d -----] 2001 1000 + // [---- b ----] 2000 1000 + // [------------ a ----------------] 1000 n/a + + + iJIT_Method_Load a = {0}; + a.method_id = 1000; + + + iJIT_Method_Inline_Load b = {0}; + b.method_id = 2000; + b.parent_method_id = 1000; + + + iJIT_Method_Inline_Load c = {0}; + c.method_id = 3000; + c.parent_method_id = 2000; + + + iJIT_Method_Inline_Load d = {0}; + d.method_id = 2001; + d.parent_method_id = 1000; + + + iJIT_NotifyEvent(iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED, (void*)&a); + iJIT_NotifyEvent(iJVM_EVENT_TYPE_METHOD_INLINE_LOAD_FINISHED, (void*)&b); + iJIT_NotifyEvent(iJVM_EVENT_TYPE_METHOD_INLINE_LOAD_FINISHED, (void*)&c); + iJIT_NotifyEvent(iJVM_EVENT_TYPE_METHOD_INLINE_LOAD_FINISHED, (void*)&d); + + +**Usage Tips** + + +- Each inline (``iJIT_Method_Inline_Load``) method should be associated + with two method IDs: one for itself and one for its immediate parent. + +- Address regions of inline methods of the same parent method cannot + overlap each other. + +- Execution of the parent method must not start until the parent method + and all its inline methods are reported. + +- For nested inline methods, the order of + ``iJVM_EVENT_TYPE_METHOD_INLINE_LOAD_FINISHED`` events is not important. + +- If any event overwrites either an inline method or the top parent method, + then the parent, including its inline methods, becomes invalid and their + memory region is treated as unloaded. + + +Learn More +---------- + + +.. toctree:: + :maxdepth: 1 + + + using-jit-api + jit-api-reference + diff --git a/docs/src/jitapi/ijit_-getnewmethodid.rst b/docs/src/jitapi/ijit_-getnewmethodid.rst new file mode 100644 index 00000000..b9dcca6b --- /dev/null +++ b/docs/src/jitapi/ijit_-getnewmethodid.rst @@ -0,0 +1,41 @@ +.. _ijit_-getnewmethodid: + +iJIT_GetNewMethodID +=================== + + +Generates a new unique method ID. + + +Syntax +------ + +.. 
code-block:: cpp + + + unsigned int iJIT_GetNewMethodID(void); + + +Description +----------- + + +Upon each call, the ``iJIT_GetNewMethodID`` function generates a new method ID. +Use this API to obtain unique and valid method IDs for methods or traces reported +to the agent if you do not have your own mechanism to generate unique method IDs. + + +Input Parameters +---------------- + + +None + + +Return Values +------------- + + +A new unique method ID. When out of unique method IDs, this API function +returns 0. + diff --git a/docs/src/jitapi/ijit_isprofilingactive.rst b/docs/src/jitapi/ijit_isprofilingactive.rst new file mode 100644 index 00000000..2cf0911f --- /dev/null +++ b/docs/src/jitapi/ijit_isprofilingactive.rst @@ -0,0 +1,39 @@ +.. _ijit_isprofilingactive: + +iJIT_IsProfilingActive +====================== + + +Returns the current mode of the agent. + + +Syntax +------ + +.. code-block:: cpp + + + iJIT_IsProfilingActiveFlags JITAPI iJIT_IsProfilingActive(void); + + +Description +----------- + + +The ``iJIT_IsProfilingActive`` function returns the current mode of the agent. + + +Input Parameters +---------------- + + +None + + +Return Values +------------- + + +``iJIT_SAMPLING_ON``, indicating that an agent is running, or +``iJIT_NOTHING_RUNNING`` if no agent is running. + diff --git a/docs/src/jitapi/ijit_notifyevent.rst b/docs/src/jitapi/ijit_notifyevent.rst new file mode 100644 index 00000000..9e9c4096 --- /dev/null +++ b/docs/src/jitapi/ijit_notifyevent.rst @@ -0,0 +1,334 @@ +.. _ijit_notifyevent: + +iJIT_NotifyEvent +================ + + +Reports information about JIT-compiled code to the agent. + + +Syntax +------ + +.. code-block:: cpp + + + int iJIT_NotifyEvent(iJIT_JVM_EVENT event_type, void *EventSpecificData); + + +Description +----------- + + +The ``iJIT_NotifyEvent`` function sends a notification of +``event_type`` with the data pointed to by ``EventSpecificData`` to the +agent.
The reported information is used to attribute samples obtained +from any profiling tool collector. Make sure to call this API after +JIT compilation and before the first entry into the JIT-compiled code. + + +Input Parameters +---------------- + + ++-------------------------------+--------------------------------------------+ +| Parameter | Description | ++===============================+============================================+ +| .. code-block:: cpp | Notification code sent to the agent. | +| | See a complete list of event types below. | +| iJIT_JVM_EVENT event_type | | ++-------------------------------+--------------------------------------------+ +| .. code-block:: cpp | Pointer to event-specific data. | +| | | +| void *EventSpecificData | | ++-------------------------------+--------------------------------------------+ + + +The following values are acceptable for ``event_type``: + + ++---------------------------------------------+---------------------------------------------------------------+ +| Value | Description | ++=============================================+===============================================================+ +| .. code-block:: cpp | Send this notification after a JITted method has been loaded | +| | into memory, and possibly JIT compiled, but before the code | +| iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED | is executed. Use the iJIT_Method_Load structure for | +| | EventSpecificData. The return value of iJIT_NotifyEvent is | +| | undefined. | ++---------------------------------------------+---------------------------------------------------------------+ +| .. code-block:: cpp | Send this notification to terminate profiling. Use NULL for | +| | EventSpecificData. iJIT_NotifyEvent returns 1 on success. | +| iJVM_EVENT_TYPE_SHUTDOWN | | ++---------------------------------------------+---------------------------------------------------------------+ +| .. 
code-block:: cpp | Send this notification to provide new content for dynamic | +| | code that was reported previously. The previous content is | +| iJVM_EVENT_TYPE_METHOD_UPDATE | invalidated, starting from the time of the notification. | +| | Use the iJIT_Method_Load structure for EventSpecificData | +| | with the following required fields: | ++---------------------------------------------+---------------------------------------------------------------+ +| .. code-block:: cpp | Send this notification when inline dynamic code is JIT | +| | compiled and loaded into memory by the JIT engine, but before | +| iJVM_EVENT_TYPE_METHOD_INLINE_LOAD_FINISHED | the parent code region starts executing. Use the | +| | iJIT_Method_Inline_Load structure for EventSpecificData. | ++---------------------------------------------+---------------------------------------------------------------+ +| .. code-block:: cpp | Send this notification when dynamic code is JIT compiled | +| | and loaded into memory by the JIT engine, but before the code | +| iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED_V2 | is executed. Use the iJIT_Method_Load_V2 structure for | +| | EventSpecificData. | ++---------------------------------------------+---------------------------------------------------------------+ + + +You can use the following structures for ``EventSpecificData``: + + +**iJIT_Method_Inline_Load Structure** + + +When you use the ``iJIT_Method_Inline_Load`` structure to describe the +JIT compiled method, use ``iJVM_EVENT_TYPE_METHOD_INLINE_LOAD_FINISHED`` +as an event type to report it. The ``iJIT_Method_Inline_Load`` +structure has the following fields: + + ++------------------------------+------------------------------------------------+ +| Field | Description | ++==============================+================================================+ +| .. code-block:: cpp | Unique method ID. | +| | The Method ID cannot be smaller than 999.
| +| unsigned int method_id | Use the API function | +| | ``iJIT_GetNewMethodID`` to get a valid and | +| | unique method ID, or choose to manage the | +| | uniqueness and range of the ID. | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | Unique method ID of the immediate parent. | +| | The Method ID cannot be smaller than 999. | +| unsigned int | Use the API function | +| parent_method_id | ``iJIT_GetNewMethodID`` to get a valid and | +| | unique method ID, or choose to manage the | +| | uniqueness and range of the ID. | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The name of the method, optionally prefixed | +| | with its class name and appended with its | +| char *method_name | complete signature. This argument cannot be | +| | set to NULL. | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The base address of the method code. | +| | Can be NULL if the method is not JITted. | +| void *method_load_address | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The code size of the method in memory. | +| | | +| unsigned int method_size | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The number of entries in the line number | +| | table. 0 if none. | +| unsigned int | | +| line_number_size | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | Pointer to the line numbers info array. | +| | Can be NULL if ``line_number_size`` is 0. | +| pLineNumberInfo | See ``LineNumberInfo`` structure for a | +| line_number_table | description of a single entry in the line | +| | number info array.
| ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | Class name. | +| | Can be NULL. | +| char *class_file_name | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | Source file name. | +| | Can be NULL. | +| char *source_file_name | | ++------------------------------+------------------------------------------------+ + + +**iJIT_Method_Load Structure** + + +When you use the ``iJIT_Method_Load`` structure to describe the JIT +compiled method, use ``iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED`` as an +event type to report it. The ``iJIT_Method_Load`` structure has the +following fields: + ++------------------------------+------------------------------------------------+ +| Field | Description | ++==============================+================================================+ +| .. code-block:: cpp | Unique method ID. | +| | Method ID cannot be smaller than 999. | +| unsigned int method_id | You must either use the API function | +| | ``iJIT_GetNewMethodID`` to get a valid and | +| | unique method ID, or else manage ID | +| | uniqueness and correct range by yourself. | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The name of the method, optionally prefixed | +| | with its class name and appended with its | +| char *method_name | complete signature. This argument cannot be | +| | set to NULL. | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The base address of the method code. | +| | Can be NULL if the method is not JITted. | +| void *method_load_address | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The code size of the method in memory. | +| | | +| unsigned int method_size |
| ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The number of entries in the line number | +| | table. 0 if none. | +| unsigned int | | +| line_number_size | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | Pointer to the line numbers info array. | +| | Can be NULL if ``line_number_size`` is 0. | +| pLineNumberInfo | See ``LineNumberInfo`` structure for a | +| line_number_table | description of a single entry in the line | +| | number info array. | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | This field is obsolete. | +| | | +| unsigned int class_id | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | Class name. | +| | Can be NULL. | +| char *class_file_name | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | Source file name. | +| | Can be NULL. | +| char *source_file_name | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | This field is obsolete. | +| | | +| void *user_data | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | This field is obsolete. | +| | | +| unsigned int | | +| user_data_size | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | This field is obsolete. | +| | | +| iJDEnvironmentType env | | ++------------------------------+------------------------------------------------+ + + +**iJIT_Method_Load_V2 Structure** + + +When you use the ``iJIT_Method_Load_V2`` structure to describe the JIT +compiled method, use ``iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED_V2`` as an +event type to report it. 
The ``iJIT_Method_Load_V2`` structure has the +following fields: + ++------------------------------+------------------------------------------------+ +| Field | Description | ++==============================+================================================+ +| .. code-block:: cpp | Unique method ID. | +| | Method ID cannot be smaller than 999. You must | +| unsigned int method_id | either use the API function | +| | ``iJIT_GetNewMethodID`` to get a valid and | +| | unique method ID, or else manage ID | +| | uniqueness and correct range by yourself. | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The name of the method, optionally prefixed | +| | with its class name and appended with its | +| char *method_name | complete signature. This argument cannot be | +| | set to NULL. | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The base address of the method code. | +| | Can be NULL if the method is not JITted. | +| void *method_load_address | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The code size of the method in memory. | +| | | +| unsigned int method_size | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | The number of entries in the line number | +| | table. 0 if none. | +| unsigned int | | +| line_number_size | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | Pointer to the line numbers info array. | +| | Can be NULL if ``line_number_size`` is 0. | +| pLineNumberInfo | See ``LineNumberInfo`` structure for a | +| line_number_table | description of a single entry in the line | +| | number info array.
| ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | Class name. | +| | Can be NULL. | +| char *class_file_name | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | Source file name. | +| | Can be NULL. | +| char *source_file_name | | ++------------------------------+------------------------------------------------+ +| .. code-block:: cpp | Module name. | +| | Can be NULL. The module name can be useful for | +| char *module_name | distinguishing among different JIT engines. | ++------------------------------+------------------------------------------------+ + + +**LineNumberInfo Structure** + + +Use the ``LineNumberInfo`` structure to describe a single entry in the +line number information of a code region. A table of line number entries +provides information about how the reported code region is mapped to +the source file. The profiling tool uses line number information to attribute +the samples (virtual address) to a line number.
You can report different +code addresses for the same source line: + + ++------------+-----------------+ +| **Offset** | **Line Number** | ++============+=================+ +| 1 | 2 | ++------------+-----------------+ +| 12 | 4 | ++------------+-----------------+ +| 15 | 2 | ++------------+-----------------+ +| 18 | 1 | ++------------+-----------------+ +| 21 | 30 | ++------------+-----------------+ + + +Profilers construct the following table using the client data: + + ++-------------------+-----------------+ +| **Code sub-range**| **Line Number** | ++===================+=================+ +| 0-1 | 2 | ++-------------------+-----------------+ +| 1-12 | 4 | ++-------------------+-----------------+ +| 12-15 | 2 | ++-------------------+-----------------+ +| 15-18 | 1 | ++-------------------+-----------------+ +| 18-21 | 30 | ++-------------------+-----------------+ + + +The ``LineNumberInfo`` structure has the following fields: + + ++------------------------------+----------------------------------------------+ +| Field | Description | ++==============================+==============================================+ +| .. code-block:: cpp | Opcode byte offset from the | +| | beginning of the method. | +| unsigned int Offset | | ++------------------------------+----------------------------------------------+ +| .. code-block:: cpp | Matching source line number offset | +| | (from beginning of source file). | +| unsigned int LineNumber | | ++------------------------------+----------------------------------------------+ + + +Return Values +------------- + + +The return values are dependent on the particular ``iJIT_JVM_EVENT``. + diff --git a/docs/src/minimize-itt-api-overhead.rst b/docs/src/minimize-itt-api-overhead.rst new file mode 100644 index 00000000..e248b50c --- /dev/null +++ b/docs/src/minimize-itt-api-overhead.rst @@ -0,0 +1,103 @@ +.. 
_minimize-itt-api-overhead: + +Minimize ITT API Overhead +========================= + + +The extent of instrumentation you add to your application determines the +amount of overhead introduced by the ITT API and its impact on application +performance. To minimize this overhead, aim for a balance between desired +application performance and the amount of performance data you want to collect. + +Use these guidelines: + +- Add instrumentation to only those paths in your application that are + important for analysis. +- Create ITT domains and string handles outside the critical paths. +- Filter data collection by different aspects of your application that + can be analyzed separately. The overhead of a disabled API call + (one whose data is filtered out) is always less than 10 clock + ticks. + + +Conditional Compilation +----------------------- + + +In the release version of your code, use conditional compilation to turn off +annotations. Before you include ``ittnotify.h`` during compilation, define the +macro ``INTEL_NO_ITTNOTIFY_API`` to eliminate all ``__itt_*`` functions from +your code. + +By defining this macro, you can also remove the static library from the +linking stage. + + +Usage Example +------------- + + +The ITT APIs include a subset of functions that create domains and string +handles. These functions always return identical handles for the same domain +names and strings. To do so, they perform string comparisons and table +lookups, which can incur serious performance penalties. Additionally, the +cost of these functions grows with the logarithm of the number of created +domains or string handles. +A good practice is to create domains and string handles in the global scope, +or during application startup. + +The following code section creates two domains in the global scope. You can use +these domains to control the level of detail that is written to the trace file. + + +.. 
code-block:: cpp + + + #include <stdlib.h> + #include "ittnotify.h" + + // Create domains at global scope. + __itt_domain* basic = __itt_domain_create(L"MyFunction.Basic"); + __itt_domain* detailed = __itt_domain_create(L"MyFunction.Detailed"); + + // Declare the helpers before MyFunction calls them. + void Foo(int arg); + void FooEx(); + + // Create string handles at global scope. + __itt_string_handle* h_my_function = __itt_string_handle_create(L"MyFunction"); + void MyFunction(int arg) + { + __itt_task_begin(basic, __itt_null, __itt_null, h_my_function); + Foo(arg); + FooEx(); + __itt_task_end(basic); + } + + __itt_string_handle* h_foo = __itt_string_handle_create(L"Foo"); + void Foo(int arg) + { + // Skip tracing detailed data if the detailed domain is disabled. + __itt_task_begin(detailed, __itt_null, __itt_null, h_foo); + // Do some work here... + __itt_task_end(detailed); + } + + __itt_string_handle* h_foo_ex = __itt_string_handle_create(L"FooEx"); + void FooEx() + { + // Skip tracing detailed data if the detailed domain is disabled. + __itt_task_begin(detailed, __itt_null, __itt_null, h_foo_ex); + // Do some work here... + __itt_task_end(detailed); + } + + // This is my entry point. + int main(int argc, char** argv) + { + if(argc < 2) + { + // Disable detailed domain if we do not need tracing from that + // in this application run. + detailed->flags = 0; + } + + // Fall back to 0 when no argument is given, so argv[1] is never read as null. + MyFunction(argc < 2 ? 0 : atoi(argv[1])); + return 0; + } + diff --git a/docs/src/overview.rst b/docs/src/overview.rst new file mode 100644 index 00000000..64063939 --- /dev/null +++ b/docs/src/overview.rst @@ -0,0 +1,37 @@ +.. _overview: + +Overview +======== + + +When you use Intel analyzer tools to improve the performance of your software +application, use the Intel® Instrumentation and Tracing +Technology (ITT) and Just-In-Time (JIT) APIs to instrument your code so that it +generates trace data during execution and controls its collection. Use the +ITT/JIT APIs to identify and measure specific areas of code and gain insight +into performance bottlenecks and resource utilization.
+ + +Components +---------- + + +- **ITT API**: Enables your application to generate and control the collection + of trace data during its execution, seamlessly integrating with Intel tools. +- **JIT API**: Reports detailed information about just-in-time (JIT) compiled + code, enabling you to profile the performance of dynamically generated code. + + +Architecture +------------ + + +The ITT/JIT APIs consist of two parts: + +- **Static Part**: An open-source static library + (`ittapi `__) that you compile and link + with your application to enable tracing features. +- **Dynamic Part**: A tool-specific shared library that collects and writes + trace data. You can find the reference implementation of the dynamic part + as a *Reference Collector* `here `__. + diff --git a/docs/src/ref_collector.rst b/docs/src/ref_collector.rst new file mode 100644 index 00000000..5afa5da8 --- /dev/null +++ b/docs/src/ref_collector.rst @@ -0,0 +1,67 @@ +.. _ref_collector: + +ITT API Reference Collector +=========================== + + +This is a reference implementation of the ITT API **dynamic** part that +writes trace data from ITT API function calls to log files. + + +To use this solution, build the collector as a shared library and set the +``INTEL_LIBITTNOTIFY64`` or ``INTEL_LIBITTNOTIFY32`` environment variable +to the full library path: + + +**On Linux** + + +.. code-block:: bash + + make + export INTEL_LIBITTNOTIFY64=/libittnotify_refcol.so + + +**On FreeBSD** + + +.. code-block:: bash + + make + setenv INTEL_LIBITTNOTIFY64 /libittnotify_refcol.so + + +By default, log files are saved in the temp directory. To change the location, +use the ``INTEL_LIBITTNOTIFY_LOG_DIR`` environment variable: + + +**On Linux** + + +.. code-block:: bash + + + export INTEL_LIBITTNOTIFY_LOG_DIR= + + +**On FreeBSD** + + +.. code-block:: bash + + + setenv INTEL_LIBITTNOTIFY_LOG_DIR + + +This implementation adds logging of some of the ITT API function calls.
Contributions that add +logging of other ITT API function calls are welcome. The solution provides four +functions with different log levels, each taking a ``printf``-style format string: + + +.. code-block:: cpp + + LOG_FUNC_CALL_INFO(const char *msg_format, ...); + LOG_FUNC_CALL_WARN(const char *msg_format, ...); + LOG_FUNC_CALL_ERROR(const char *msg_format, ...); + LOG_FUNC_CALL_FATAL(const char *msg_format, ...); + diff --git a/docs/src/using-jit-api.rst b/docs/src/using-jit-api.rst new file mode 100644 index 00000000..8302d21c --- /dev/null +++ b/docs/src/using-jit-api.rst @@ -0,0 +1,56 @@ +.. _using-jit-api: + +Compile and Link with JIT API +============================= + + +To include JIT Profiling support, do one of the following: + + +- Include the following files in your source tree: + + #. ``jitprofiling.h``, located under the ``\include`` directory, + in your code. This header file provides all API function prototypes + and type definitions. + #. ``ittnotify_config.h``, ``ittnotify_types.h`` and ``jitprofiling.c``, + located under the ``/src/ittnotify`` directory. + +- Link the jitprofiling static library: + + #. Include ``jitprofiling.h``, located under the ``\include`` directory, + in your code. This header file provides all API function prototypes + and type definitions. + #. Link to ``jitprofiling.lib`` (Windows*) or ``jitprofiling.a`` (Linux*), + located under the ``\build_\\bin`` + directory. + + ++----------------------------------------------------------------+-------------------------------------------------------------------------------+ +| Use This Primitive | To Do This | ++================================================================+===============================================================================+ +| .. code-block:: cpp | Use this API to send a notification of ``event_type`` with the data pointed | +| | to by ``EventSpecificData`` to the agent.
The reported information is used to | +| int iJIT_NotifyEvent( iJIT_JVM_EVENT event_type, | attribute samples obtained from any profiling tool collector. | +| void *EventSpecificData ); | | ++----------------------------------------------------------------+-------------------------------------------------------------------------------+ +| .. code-block:: cpp | Generate a new method ID. Use this function to assign unique and valid | +| | method IDs to methods reported to the profiler, unless you generate unique | +| unsigned int iJIT_GetNewMethodID( void ); | IDs yourself. When out of unique method IDs, this API function returns 0. | ++----------------------------------------------------------------+-------------------------------------------------------------------------------+ +| .. code-block:: cpp | Returns the current mode of the profiler: off, or sampling, using the | +| | ``iJIT_IsProfilingActiveFlags`` enumeration. This API returns | +| iJIT_IsProfilingActiveFlags iJIT_IsProfilingActive( void ); | ``iJIT_SAMPLING_ON`` by default, indicating that Sampling is running. | +| | It returns ``iJIT_NOTHING_RUNNING`` if no profiler is running. | ++----------------------------------------------------------------+-------------------------------------------------------------------------------+ + + +Lifetime of Allocated Data +-------------------------- + + +You send an event notification to the agent (Collector) with event-specific +data, which is a structure. The pointers in the structure refer to memory that +you allocated. You are responsible for releasing the allocated memory. The +``iJIT_NotifyEvent`` function uses these pointers to copy your data into a trace +file. The pointers are not used after the ``iJIT_NotifyEvent`` function returns.
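The ownership rule above can be sketched as follows. This is a minimal illustration, not the library's implementation: ``MethodLoadSketch``, ``notify_sketch``, ``g_recorded_name`` and ``report_method`` are hypothetical stand-ins for ``iJIT_Method_Load`` and ``iJIT_NotifyEvent`` so that the sketch builds without the ITT sources. It assumes only what the text states, namely that the notification copies the data and does not keep the pointers.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for iJIT_Method_Load (only a few fields shown).
// With the real library, include jitprofiling.h instead.
struct MethodLoadSketch {
    unsigned int method_id;
    char*        method_name;
    void*        method_load_address;
    unsigned int method_size;
};

// Hypothetical stand-in for the agent's trace file: the agent keeps a copy.
static std::string g_recorded_name;

// Hypothetical stand-in for iJIT_NotifyEvent: copies the data before it
// returns and keeps no pointer into caller-owned memory.
static int notify_sketch(const MethodLoadSketch* m) {
    if (m == nullptr || m->method_name == nullptr) return 0;
    g_recorded_name = m->method_name;
    return 1;
}

// Reports one method, then releases the caller-owned name immediately.
int report_method(unsigned int id, const char* name,
                  void* address, unsigned int size) {
    // The caller allocates the memory that the structure points to.
    char* owned = static_cast<char*>(std::malloc(std::strlen(name) + 1));
    if (owned == nullptr) return 0;
    std::strcpy(owned, name);

    MethodLoadSketch m = {};
    m.method_id = id;
    m.method_name = owned;
    m.method_load_address = address;
    m.method_size = size;

    int ok = notify_sketch(&m);

    // The pointers are not used after the notification returns, so the
    // caller can free its memory right away.
    std::free(owned);
    return ok;
}
```

With the real API, the same pattern would apply around the call to ``iJIT_NotifyEvent``: fill the structure, send the event, then release any buffers you allocated for it.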
+ diff --git a/histogram-api-schema.png b/histogram-api-schema.png new file mode 100644 index 00000000..d4b8293c Binary files /dev/null and b/histogram-api-schema.png differ diff --git a/src/ittnotify_refcol/README.md b/src/ittnotify_refcol/README.md index 4905a7bf..ee30fc53 100644 --- a/src/ittnotify_refcol/README.md +++ b/src/ittnotify_refcol/README.md @@ -1,27 +1,31 @@ -# Instrumentation and Tracing Technology API (ITT API) Reference Collector +# Instrumentation and Tracing Technology (ITT) API Reference Collector -This is a reference implementation of the ITT API _dynamic part_ -that performs tracing data from ITT API functions calls to log files. +This is a reference implementation of the ITT API *dynamic* part that +writes trace data from ITT API function calls to log files. -To use this solution it is required to build it like a shared library and add -full library path to the `INTEL_LIBITTNOTIFY64/INTEL_LIBITTNOTIFY32` environment variable: +To use this solution, build the collector as a shared library and set the +`INTEL_LIBITTNOTIFY64` or `INTEL_LIBITTNOTIFY32` environment variable +to the full library path: **On Linux** + ``` make export INTEL_LIBITTNOTIFY64=/libittnotify_refcol.so ``` **On FreeBSD** + ``` make setenv INTEL_LIBITTNOTIFY64 /libittnotify_refcol.so ``` -Temp directory is used by default to save log files. -To change log directory use the `INTEL_LIBITTNOTIFY_LOG_DIR` environment variable: +By default, log files are saved in the `tmp` directory. To change the location, +use the `INTEL_LIBITTNOTIFY_LOG_DIR` environment variable: **On Linux** + ``` export INTEL_LIBITTNOTIFY_LOG_DIR= ``` @@ -31,10 +35,10 @@ export INTEL_LIBITTNOTIFY_LOG_DIR= setenv INTEL_LIBITTNOTIFY_LOG_DIR ``` -This implementation adds logging of some of the ITT API functions calls. -Adding logging of the other ITT API functions calls are welcome.
-The solution provides 4 functions with different log levels -that takes printf format for logging: +This implementation adds logging of some of the ITT API function calls. Adding +logging of other ITT API function calls is welcome. The solution provides 4 +functions with different log levels that take `printf` format for logging: + ``` LOG_FUNC_CALL_INFO(const char *msg_format, ...); LOG_FUNC_CALL_WARN(const char *msg_format, ...);