diff --git a/doc/user_guide/profiling.rst b/doc/user_guide/profiling.rst index 5bf781da97..7025377a2d 100644 --- a/doc/user_guide/profiling.rst +++ b/doc/user_guide/profiling.rst @@ -35,6 +35,7 @@ .. Modified by A. R. Porter, STFC Daresbury Lab .. Modified by R. W. Ford, STFC Daresbury Lab .. Modified by I. Kavcic, Met Office +.. Modified by T. H. Gibson, Advanced Micro Devices, Inc. .. _userguide-profiling: @@ -52,7 +53,8 @@ transformation within a transformation script. PSyclone can be used with a variety of existing profiling tools. It currently supports dl_timer, TAU, Vernier, Dr Hook, the NVIDIA GPU -profiling tools and it comes with a simple stand-alone timer library. +profiling tools (NVTX), the AMD ROCm profiling tools (ROCTx), and it +comes with a simple stand-alone timer library. The :ref:`PSyData API ` (see also the :ref:`Developer Guide `) is utilised to implement wrapper libraries that connect the PSyclone @@ -78,11 +80,11 @@ Interface to Third Party Profiling Tools PSyclone comes with :ref:`wrapper libraries ` to support usage of TAU, Vernier, Dr Hook, dl_timer, NVTX (NVIDIA Tools Extension -library), and a simple non-thread-safe timing library. Support for further -profiling libraries will be added in the future. To compile the -wrapper libraries, change into the directory ``lib/profiling`` -of PSyclone and type ``make`` to compile all wrappers. If only -some of the wrappers are required, you can either use +library), ROCTx (AMD library for code instrumentation), and a simple non-thread-safe timing +library. Support for further profiling libraries will be added in the +future. To compile the wrapper libraries, change into the directory +``lib/profiling`` of PSyclone and type ``make`` to compile all wrappers. +If only some of the wrappers are required, you can either use ``make wrapper-name`` (e.g. ``make drhook``), or change into the corresponding directory and use ``make``. 
The corresponding ``README.md`` files contain additional parameters @@ -131,6 +133,11 @@ libraries that come with PSyclone: to the NVIDIA Tools Extension library (NVTX). This library is available from https://developer.nvidia.com/cuda-toolkit. +``lib/profiling/amd`` + This is a wrapper library that maps the PSyclone profiling API + to the AMD ROCTx library. ROCTx documentation is available + from https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/how-to/using-rocprofiler-sdk-roctx.html. + ``lib/profiling/lfric_timer`` This profile wrapper uses the timer functionality provided by LFRic, and it comes in two different versions: @@ -160,7 +167,7 @@ wrapper provided by the tool which will provide the required additional compiler parameters. The exceptions are the template and simple_timing libraries, which are stand alone. The profiling example in ``examples/gocean/eg5/profile`` can be used with any of the -wrapper libraries (except ``nvidia``) to see how they work. +wrapper libraries (except ``nvidia`` and ``amd``) to see how they work. .. _required_profiling_calls: @@ -168,7 +175,10 @@ Required Modifications to the Program ------------------------------------- In order to guarantee that any profiling library is properly initialised, PSyclone's profiling wrappers utilise two additional -function calls that the user must manually insert into the program: +function calls that the user must manually insert into the program +(the NVIDIA NVTX wrapper in ``lib/profiling/nvidia`` and the AMD ROCTx +wrapper in ``lib/profiling/amd`` are exceptions and do not require these +calls): profile_PSyDataInit() ~~~~~~~~~~~~~~~~~~~~~ @@ -249,9 +259,10 @@ cannot be used as there is no concept of `kernels`. GPU execution). .. note:: It is still the responsibility of the user to manually - add the calls to ``profile_PSyDataInit`` and - ``profile_PSyDataShutdown`` to the - code base (see :ref:`required_profiling_calls`). 
+ add the calls to ``profile_PSyDataInit`` and + ``profile_PSyDataShutdown`` to the code base (see + :ref:`required_profiling_calls`), unless using the NVIDIA NVTX or + AMD ROCTx wrapper. PSyclone will modify the schedule of each invoke to insert the profiling regions. Below we show an example of a schedule created diff --git a/examples/nemo/README.md b/examples/nemo/README.md index 0ed3998b1a..cfccafd95e 100644 --- a/examples/nemo/README.md +++ b/examples/nemo/README.md @@ -149,3 +149,13 @@ supported for generic transformations. A simple stand-alone example that shows verification that read-only data is not modified, e.g. by out-of-bounds accesses to other variables. This uses the PSyData interface to instrument generic Fortran code. + +## Example 7 + +OpenMP parallelisation (for CPU and GPU) of `tra_adv` over levels, using +`nowait` and minimisation of introduced barriers. + +## Example 8 + +A simple profiling example that shows OpenMP offloading transformations +with profiling hooks enabled. diff --git a/examples/nemo/eg8/Makefile b/examples/nemo/eg8/Makefile new file mode 100644 index 0000000000..99651a8fe1 --- /dev/null +++ b/examples/nemo/eg8/Makefile @@ -0,0 +1,81 @@ +# ----------------------------------------------------------------------------- +# BSD 3-Clause License +# +# Copyright (c) 2026, Science and Technology Facilities Council. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are met: +# +# * Redistributions of source code must retain the above copyright notice, this +# list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright notice, +# this list of conditions and the following disclaimer in the documentation +# and/or other materials provided with the distribution. 
+# +# * Neither the name of the copyright holder nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +# COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +# POSSIBILITY OF SUCH DAMAGE. +# ------------------------------------------------------------------------------ +# Author: T. H. Gibson, Advanced Micro Devices, Inc. + +# Set the compiler and flags in your environment before running make. +# This Makefile assumes F90FLAGS contains all compiler options needed for +# OpenMP and target offload (e.g. -fopenmp and --offload-arch=...). +# It also assumes LDFLAGS contains all linker options and vendor runtime +# libraries needed by your compiler/runtime stack. 
+ +# Example for AMD GPU offload using amdflang: +# export F90=amdflang +# export F90FLAGS="-O3 -fopenmp --offload-arch=gfx942" +# export LDFLAGS="-fopenmp --offload-arch=gfx942 -L${ROCM_PATH}/lib -lrocprofiler-sdk-roctx" + +# Then run: +# make clean compile run + +include ../../common.mk + +GENERATED_FILES = traadv_instrumented.F90 \ + traadv_instrumented.o \ + traadv.exe \ + output.dat + +PROFILER ?= rocprofv3 +PROFILER_FLAGS ?= --runtime-trace --output-format pftrace + +# Profiling wrapper settings +PSYCLONE_PROFILING_DIR ?= $(PSYCLONE_DIR)/lib/profiling/amd +PSYCLONE_PROFILING_INCLUDE ?= ${PSYCLONE_PROFILING_DIR} +PSYCLONE_PROFILING_LIB ?= ${PSYCLONE_PROFILING_DIR}/libroctx_prof.a +PSYCLONE_PROFILING_LIBS ?= -L${PSYCLONE_PROFILING_DIR} -lroctx_prof + +transform: + ENABLE_PROFILING=1 ${PSYCLONE} -s ./omp_gpu_profile_trans.py ../code/tra_adv.F90 -o traadv_instrumented.F90 + +compile: transform traadv.exe + +run: traadv.exe + IT=10 JPI=64 JPJ=64 JPK=32 ${PROFILER} ${PROFILER_FLAGS} -- ./traadv.exe + +traadv.exe: traadv_instrumented.o ${PSYCLONE_PROFILING_LIB} + ${F90} ${F90FLAGS} traadv_instrumented.o -o traadv.exe ${LDFLAGS} ${PSYCLONE_PROFILING_LIBS} + +traadv_instrumented.o: traadv_instrumented.F90 ${PSYCLONE_PROFILING_LIB} + ${F90} ${F90FLAGS} -I${PSYCLONE_PROFILING_INCLUDE} -c $< -o $@ + +${PSYCLONE_PROFILING_LIB}: + ${MAKE} -C ${PSYCLONE_PROFILING_DIR} F90=${F90} diff --git a/examples/nemo/eg8/README.md b/examples/nemo/eg8/README.md new file mode 100644 index 0000000000..887a443cd0 --- /dev/null +++ b/examples/nemo/eg8/README.md @@ -0,0 +1,90 @@ +# PSyclone NEMO Example 8 + +**Author:** T. H. Gibson, Advanced Micro Devices, Inc. + +This example demonstrates a simple profiling workflow for OpenMP target +offloading, using the tracer advection demo. It processes `../code/tra_adv.F90` and +generates `traadv_instrumented.F90` with OpenMP target offload directives plus +profiling hooks. 
The transformation script `omp_gpu_profile_trans.py` is a +small local transform script that uses shared helpers from `../scripts` and +inserts profile regions around *all* OpenMP target regions. + +## Running + +```sh +make transform +``` + +or explicitly: + +```sh +ENABLE_PROFILING=1 ${PSYCLONE} -s ./omp_gpu_profile_trans.py ../code/tra_adv.F90 -o traadv_instrumented.F90 +``` + +This emits transformed Fortran code with PSyData profiling around OpenMP target +regions. + +## Compiling and Running + +This example supports compilation and execution using the AMD ROCTx profiling +wrapper in `../../../lib/profiling/amd` and the ROCm profiler (`rocprofv3`) +by default. It can also be tested with NVIDIA tooling by overriding the relevant +Makefile variables (compiler/flags and profiling wrapper variables such as +`PSYCLONE_PROFILING_DIR`, `PSYCLONE_PROFILING_LIB`, and +`PSYCLONE_PROFILING_LIBS`). + +Typical compiler settings for AMD GPU offloading are: + +```sh +export F90=amdflang +export F90FLAGS="-O3 -fopenmp --offload-arch=" +export LDFLAGS="-fopenmp --offload-arch= -L${ROCM_PATH}/lib -lrocprofiler-sdk-roctx" +``` + +Then build and run: + +```sh +make compile +make run +``` + +For more information on profiling wrappers and profiler-specific options, see the +[profiling wrappers README](../../../lib/profiling/README.md). + +## Licence + +----------------------------------------------------------------------------- + +BSD 3-Clause License + +Copyright (c) 2026, Science and Technology Facilities Council. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. 
+ +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +* Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. + +----------------------------------------------------------------------------- diff --git a/examples/nemo/eg8/omp_gpu_profile_trans.py b/examples/nemo/eg8/omp_gpu_profile_trans.py new file mode 100644 index 0000000000..106b0a15ff --- /dev/null +++ b/examples/nemo/eg8/omp_gpu_profile_trans.py @@ -0,0 +1,116 @@ +#!/usr/bin/env python +# ----------------------------------------------------------------------------- +# BSD 3-Clause License +# +# Copyright (c) 2026, Science and Technology Facilities Council. +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are met: +# +# * Redistributions of source code must retain the above copyright notice, this +# list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright notice, +# this list of conditions and the following disclaimer in the documentation +# and/or other materials provided with the distribution. +# +# * Neither the name of the copyright holder nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +# COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +# POSSIBILITY OF SUCH DAMAGE. 
+# ----------------------------------------------------------------------------- + +import os +import pathlib +import sys +from typing import List, Union +from psyclone.psyir.nodes import ( + Assignment, IfBlock, Node, OMPDirective, ProfileNode, Routine, Schedule) +from psyclone.psyir.transformations import OMPTargetTrans +from psyclone.transformations import OMPLoopTrans + +# Add examples/nemo/scripts to python path; needed to import utils.py +SCRIPT_DIR = pathlib.Path(__file__).resolve().parent +NEMO_SCRIPTS_DIR = SCRIPT_DIR.parent / "scripts" +if str(NEMO_SCRIPTS_DIR) not in sys.path: + sys.path.insert(0, str(NEMO_SCRIPTS_DIR)) + + +PROFILING_ENABLED = os.environ.get("ENABLE_PROFILING", "0") not in ("", "0") + + +def add_omp_region_profiling_markers(children: Union[List[Node], Schedule]): + """Insert profiling markers around all top-level OpenMP directives. + + :param children: a Schedule or sibling nodes in the PSyIR to which to + attempt to add profiling regions. + """ + from utils import add_profile_region + + if children and isinstance(children, Schedule): + # If we are given a Schedule, we look at its children. + children = children.children + # If we are given an empty list, we return. + if not children: + return + # We do not want profiling calipers inside functions (such as the + # PSyclone-generated comparison functions). + parent_routine = children[0].ancestor(Routine) + if parent_routine and parent_routine.return_symbol: + return + # Iterate over the children and wrap top-level OpenMP directives. + for child in children[:]: + if isinstance(child, OMPDirective): + # Only wrap top-level OpenMP directives and not + # nested directives or profiling markers. + if (not child.ancestor(OMPDirective) and + not child.ancestor(ProfileNode)): + add_profile_region([child]) + if isinstance(child, IfBlock): + # Recursively wrap any nested OpenMP kernels in if/else constructs. 
+ add_omp_region_profiling_markers(child.if_body) + add_omp_region_profiling_markers(child.else_body) + elif not isinstance(child, Assignment): + add_omp_region_profiling_markers(child.children) + + +def trans(psyir): + """Apply OpenMP offloading and insert profiling around target regions.""" + from utils import normalise_loops, insert_explicit_loop_parallelism + + omp_target_trans = OMPTargetTrans() + omp_loop_trans = OMPLoopTrans(omp_schedule="none") + omp_loop_trans.omp_directive = "teamsloop" + + for subroutine in psyir.walk(Routine): + normalise_loops( + subroutine, + hoist_local_arrays=False, + convert_array_notation=True, + loopify_array_intrinsics=True, + convert_range_loops=True, + increase_array_ranks=False, + hoist_expressions=True + ) + insert_explicit_loop_parallelism( + subroutine, + region_directive_trans=omp_target_trans, + loop_directive_trans=omp_loop_trans, + collapse=True, + enable_reductions=True + ) + if PROFILING_ENABLED: + add_omp_region_profiling_markers(subroutine.children) diff --git a/lib/profiling/Makefile b/lib/profiling/Makefile index eb4dc2a2d9..b28887599a 100644 --- a/lib/profiling/Makefile +++ b/lib/profiling/Makefile @@ -48,10 +48,10 @@ NO_DEP_LIBS = lfric_timer simple_timing template # The list with all libraries (include the ones that have additional # dependencies): -ALL_LIBS = $(NO_DEP_LIBS) dl_timer drhook nvidia tau vernier +ALL_LIBS = $(NO_DEP_LIBS) amd dl_timer drhook nvidia tau vernier .PHONY: default all $(NO_DEP_LIBS) clean allclean \ - dl_timer drhook nvidia tau vernier + amd dl_timer drhook nvidia tau vernier # By default, compile all libraries that do not have additional dependencies # The 'all' target is used by the compilation tests, so this also can only diff --git a/lib/profiling/README.md b/lib/profiling/README.md index f328027747..2708d61ab2 100644 --- a/lib/profiling/README.md +++ b/lib/profiling/README.md @@ -6,7 +6,7 @@ https://psyclone.readthedocs.io/en/latest/user_guide/profiling.html#profiling). 
profiling-library interfaces use the the [PSyData API]( https://psyclone.readthedocs.io/en/latest/user_guide/psy_data.html). The profiling wrappers included in PSyclone are: ``template``, -``simple_timing``, ``dl_timer``, ``drhook``, ``nvidia``, ``tau`` and +``simple_timing``, ``dl_timer``, ``drhook``, ``nvidia``, ``amd``, ``tau`` and ``lfric_timer``. The overview is given below (for more information please refer to the linked individual ``README.md`` documents). @@ -175,6 +175,16 @@ Example output (from ``nvprof``): 6.17% 12.729us 3 4.2430us 2.3700us 7.7330us cuMemsetD32Async ``` +### [AMD](./amd) + +This wrapper library maps the PSyclone profiling API to the AMD ROCTx +library, providing code annotations for profiling GPU applications with +AMD's ROCm tools (e.g. ``rocprofv3``). This is very useful for identifying +regions which have not been offloaded yet. + +Detailed building and linking instructions are in +[``amd/README.md``](./amd/README.md). + ### [LFRic timer](./lfric_timer) This wrapper library uses the LFRic timer object. It can not only be @@ -220,7 +230,7 @@ In general is recommended to first pack the profiling output files into one file The top level ``Makefile`` can be used to compile the profiling-library interfaces included in PSyclone. The command ``make TARGET`` where ``TARGET`` is one of ``template``, ``simple_timing``, ``dl_timer``, ``drhook``, ``nvidia``, -``lfric_timer`` or ``tau``, will only compile the corresponding library interface. +``amd``, ``lfric_timer`` or ``tau``, will only compile the corresponding library interface. The target ``make all``, which is also the default, will compile all libraries that do not need additional software or libraries to be installed, i.e. ``lfric_timer``, ``simple_timing`` and ``template``. 
diff --git a/lib/profiling/amd/Makefile b/lib/profiling/amd/Makefile new file mode 100644 index 0000000000..a9dbc10283 --- /dev/null +++ b/lib/profiling/amd/Makefile @@ -0,0 +1,58 @@ +# ----------------------------------------------------------------------------- +# BSD 3-Clause License +# +# Copyright (c) 2024-2026, Science and Technology Facilities Council. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are met: +# +# * Redistributions of source code must retain the above copyright notice, this +# list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright notice, +# this list of conditions and the following disclaimer in the documentation +# and/or other materials provided with the distribution. +# +# * Neither the name of the copyright holder nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +# COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +# POSSIBILITY OF SUCH DAMAGE. +# ----------------------------------------------------------------------------- +# Author: T. H. 
Gibson, Advanced Micro Devices, Inc. + +# Makefile for the PSyclone wrapper to the AMD ROCTx profiling library. + +# ----------- Default "make" values, can be overwritten by the user ----------- +# Compiler and compiler flags +F90 ?= gfortran +F90FLAGS ?= -g +# ----------------------------------------------------------------------------- + +PSYDATA_LIB_NAME = roctx_prof +PSYDATA_LIB = lib$(PSYDATA_LIB_NAME).a + +default: $(PSYDATA_LIB) + +.PHONY: default clean allclean + +$(PSYDATA_LIB): $(PSYDATA_LIB_NAME).f90 + $(F90) -c $(F90FLAGS) $< + ar rs $(PSYDATA_LIB) $(PSYDATA_LIB_NAME).o + +clean: + rm -f *.o *.mod $(PSYDATA_LIB) + +allclean: clean diff --git a/lib/profiling/amd/README.md b/lib/profiling/amd/README.md new file mode 100644 index 0000000000..77cdd0040a --- /dev/null +++ b/lib/profiling/amd/README.md @@ -0,0 +1,86 @@ +# AMD ROCTx Wrapper + +This is a wrapper library that maps the [PSyclone profiling API]( +https://psyclone.readthedocs.io/en/latest/user_guide/profiling.html#profiling) to the +AMD ROCTx library. ROCTx provides code annotation capabilities for profiling GPU applications +with AMD's ROCm profiling tools. + +Unlike some of the other profiling tools, the use of this library does *not* require +that calls to ``profile_PSyDataInit()`` and ``profile_PSyDataShutdown()`` be inserted +into the application. + +This wrapper supports the ``profile_PSyDataStart()`` and ``profile_PSyDataStop()`` API +calls that may be used in order to limit the region of code that is profiled at runtime. +These use the ROCTx ``roctxProfilerPause()`` and ``roctxProfilerResume()`` functions. + +## Dependencies + +This wrapper uses the **rocprofiler-sdk-roctx** library from ROCm, which provides +the full ROCTx API including profiler control functions. The library +(``librocprofiler-sdk-roctx.so``) is required at link time and runtime. 
+ +The ROCTx library is typically located at: +- ``$ROCM_PATH/lib/librocprofiler-sdk-roctx.so`` + +For documentation on ROCTx, see: +- https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/how-to/using-rocprofiler-sdk-roctx.html + +## Compilation + +A ``Makefile`` is provided and just executing `make` should build the wrapper +library. By default the ``gfortran`` compiler is used but ``amdflang`` is recommended in general. +Running ``make`` will produce ``libroctx_prof.a`` and ``profile_psy_data_mod.mod``. + +When compiling the application that has been instrumented for profiling, the +location of the ``profile_psy_data_mod.mod`` file must be provided as an +include/module path, e.g. ``-I/path/to/psyclone/lib/profiling/amd``. + +### Linking the wrapper library + +At the link stage, the location of the wrapper library AND the ROCTx library +must be provided: + +```shell +amdflang \ + -L/lib/profiling/amd -lroctx_prof \ + -L$ROCM_PATH/lib -lrocprofiler-sdk-roctx +``` + +**Note**: The ```` differs depending on whether the +wrapper library is compiled in a clone of the PSyclone repository or in a +PSyclone [installation](./../../README.md#installation). + +## Profiling Your Application + +Once the application has been built with ROCTx instrumentation, it may be +profiled using AMD's profiling tools: + +### Using rocprofv3 (recommended for ROCm 6.0+) + +```shell +# Trace marker regions +rocprofv3 --marker-trace --output-format csv -- ./your_app +``` + +This generates a ``marker_api_trace.csv`` file (prefixed with process ID) containing: +- Domain: MARKER_CORE_API +- Function: The region name (module:region format) +- Process_Id, Thread_Id +- Start_Timestamp, End_Timestamp (in nanoseconds) + +To collect not just ROCTx markers, but also kernel, memory copy, and HIP API traces, use `--runtime-trace` instead. 
For more +detailed usage of AMD profiling tools, refer to the AMD profiling guide series: +- [Performance Profiling on AMD GPUs – Part 1: Foundations](https://rocm.blogs.amd.com/software-tools-optimization/profiling-guide/intro/README.html) +- [Performance Profiling on AMD GPUs – Part 2: Basic Usage](https://rocm.blogs.amd.com/software-tools-optimization/profiling-guide/novice/README.html) +- [Performance Profiling on AMD GPUs – Part 3: Advanced Usage](https://rocm.blogs.amd.com/software-tools-optimization/profiling-guide/advanced/README.html) + +## ROCTx API Reference + +The wrapper uses the following ROCTx functions: + +| ROCTx Function | PSyclone API | Description | +|---------------|--------------|-------------| +| `roctxRangePushA()` | `PreStart()` | Start a named profiling range | +| `roctxRangePop()` | `PostEnd()` | End the current profiling range | +| `roctxProfilerPause()` | `profile_PSyDataStop()` | Pause profiling | +| `roctxProfilerResume()` | `profile_PSyDataStart()` | Resume profiling | diff --git a/lib/profiling/amd/roctx_prof.f90 b/lib/profiling/amd/roctx_prof.f90 new file mode 100644 index 0000000000..7177ad966c --- /dev/null +++ b/lib/profiling/amd/roctx_prof.f90 @@ -0,0 +1,186 @@ +! ----------------------------------------------------------------------------- +! BSD 3-Clause License +! +! Copyright (c) 2024-2026, Science and Technology Facilities Council. +! All rights reserved. +! +! Redistribution and use in source and binary forms, with or without +! modification, are permitted provided that the following conditions are met: +! +! * Redistributions of source code must retain the above copyright notice, this +! list of conditions and the following disclaimer. +! +! * Redistributions in binary form must reproduce the above copyright notice, +! this list of conditions and the following disclaimer in the documentation +! and/or other materials provided with the distribution. +! +! * Neither the name of the copyright holder nor the names of its +! 
contributors may be used to endorse or promote products derived from +! this software without specific prior written permission. +! +! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +! AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +! IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +! DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +! FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +! DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +! SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +! CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +! OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +! OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +! ----------------------------------------------------------------------------- +! Author: T. H. Gibson, Advanced Micro Devices, Inc. + +module profile_psy_data_mod + + use iso_c_binding, only : C_CHAR, C_INT, C_INT64_T, C_NULL_CHAR + + implicit none + + private + + !> The derived type passed to us from the profiled application. Required for + !! consistency with the PSyclone Profiling interface and to prevent repeated string + !! operations. + type, public :: profile_PSyDataType + !> Whether or not we've seen this region before + logical :: initialised = .false. + !> Name assigned to the region + character(kind=C_CHAR, len=256) :: name = "" + contains + ! The profiling API uses only the two following calls: + procedure :: PreStart + procedure :: PostEnd + end type profile_PSyDataType + + ! ROCTx C API bindings (rocprofiler-sdk-roctx) + ! See: https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/how-to/using-rocprofiler-sdk-roctx.html + interface + ! Push a new nested range with a name string + ! 
NOTE: roctxRangePush is a macro that expands to roctxRangePushA in the rocprofiler-sdk-roctx library. + ! This is why we use roctxRangePushA here and not roctxRangePush. + integer(C_INT) function roctxRangePushA(name) bind(C, name='roctxRangePushA') + use iso_c_binding + character(kind=C_CHAR), intent(in) :: name(*) + end function roctxRangePushA + + ! Pop the current nested range + integer(C_INT) function roctxRangePop() bind(C, name='roctxRangePop') + use iso_c_binding + end function roctxRangePop + + ! Get the current thread ID (returns 0 on success) + integer(C_INT) function roctxGetThreadId(tid) bind(C, name='roctxGetThreadId') + use iso_c_binding + integer(C_INT64_T), intent(out) :: tid + end function roctxGetThreadId + + ! Request profiling tool to pause data collection (tid=0 for all threads) + integer(C_INT) function roctxProfilerPause(tid) bind(C, name='roctxProfilerPause') + use iso_c_binding + integer(C_INT64_T), value, intent(in) :: tid + end function roctxProfilerPause + + ! Request profiling tool to resume data collection (tid=0 for all threads) + integer(C_INT) function roctxProfilerResume(tid) bind(C, name='roctxProfilerResume') + use iso_c_binding + integer(C_INT64_T), value, intent(in) :: tid + end function roctxProfilerResume + end interface + + ! Only the routines making up the PSyclone profiling API are public + public profile_PSyDataInit, profile_PSyDataShutdown, & + profile_PSyDataStart, profile_PSyDataStop + +contains + + !> An optional initialisation subroutine. This is not used for the ROCTx + !! library. + subroutine profile_PSyDataInit() + implicit none + return + end subroutine profile_PSyDataInit + + !> Enables profiling (if it is not already enabled). May be manually added + !! to source code in order to limit the amount of profiling performed at + !! run time. Uses roctxProfilerResume to request the profiling tool to + !! resume data collection. 
+ subroutine profile_PSyDataStart() + implicit none + integer(C_INT64_T) :: tid + integer(C_INT) :: ierr + + ierr = roctxGetThreadId(tid) + ierr = roctxProfilerResume(tid) + + end subroutine profile_PSyDataStart + + !> Turns off profiling. All subsequent calls to the profiling API + !! will have no effect. Use in combination with profile_PSyDataStart() to + !! limit the amount of profiling performed at runtime. Uses roctxProfilerPause + !! to request the profiling tool to pause data collection. + subroutine profile_PSyDataStop() + implicit none + integer(C_INT64_T) :: tid + integer(C_INT) :: ierr + + ierr = roctxGetThreadId(tid) + ierr = roctxProfilerPause(tid) + + end subroutine profile_PSyDataStop + + !> Starts a profiling area. The module and region name can be used to create + !! a unique name for each region. + !! Parameters: + !! @param[in,out] this This PSyData instance. + !! @param[in] module_name Name of the module in which the region is + !! @param[in] region_name Name of the region (could be name of an invoke, or + !! subroutine name). + !! @param[in] num_pre_vars The number of variables that are declared and + !! written before the instrumented region. + !! @param[in] num_post_vars The number of variables that are also declared + !! before an instrumented region of code, but are written after + !! this region. + subroutine PreStart(this, module_name, region_name, num_pre_vars, & + num_post_vars) + + implicit none + + class(profile_PSyDataType), target, intent(inout) :: this + character(len=*), intent(in) :: module_name, region_name + integer, intent(in) :: num_pre_vars, num_post_vars + ! Locals + integer(C_INT) :: range_id + + if (.not. this%initialised) then + ! This is the first time we've seen this region. Construct and + ! save its name to save on future string operations. + this%initialised = .true. 
+ this%name = trim(module_name)//":"//trim(region_name)//C_NULL_CHAR + end if + + range_id = roctxRangePushA(this%name) + + end subroutine PreStart + + !> Ends a profiling area. + !! @param[in,out] this: Persistent data, not used in this case. + subroutine PostEnd(this) + + implicit none + + class(profile_PSyDataType), target :: this + integer(C_INT) :: range_id + + range_id = roctxRangePop() + + end subroutine PostEnd + + !> The finalise function would normally print the results. However, this + !> is unnecessary for the ROCTx library so we do nothing. + subroutine profile_PSyDataShutdown() + implicit none + return + end subroutine profile_PSyDataShutdown + +end module profile_psy_data_mod
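
Note on post-processing the marker trace: `lib/profiling/amd/README.md` in this patch lists the fields of the `marker_api_trace.csv` file (`Function`, `Start_Timestamp`, `End_Timestamp`, with timestamps in nanoseconds). A minimal sketch of summing the time spent in each `module:region` range is given below. It assumes the CSV header uses exactly those field names; the sample rows, region names and the helper `region_totals` are made up for illustration and are not part of PSyclone or ROCm.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample laid out as described in lib/profiling/amd/README.md.
# Region names, IDs and timestamps are invented for illustration only.
SAMPLE = """\
"Domain","Function","Process_Id","Thread_Id","Start_Timestamp","End_Timestamp"
"MARKER_CORE_API","tra_adv:r0",4242,4242,1000,4000
"MARKER_CORE_API","tra_adv:r1",4242,4242,4500,5000
"MARKER_CORE_API","tra_adv:r0",4242,4242,6000,8000
"""


def region_totals(stream):
    """Return {region name: total nanoseconds} for a marker trace CSV.

    Assumes the header contains the fields Function, Start_Timestamp
    and End_Timestamp (names taken from the README, not verified here).
    """
    totals = defaultdict(int)
    for row in csv.DictReader(stream):
        start = int(row["Start_Timestamp"])
        end = int(row["End_Timestamp"])
        # Accumulate the duration of every occurrence of a region.
        totals[row["Function"]] += end - start
    return dict(totals)


totals = region_totals(io.StringIO(SAMPLE))
for name, nanoseconds in sorted(totals.items()):
    print(f"{name}: {nanoseconds} ns")
```

For a real trace, the same function could be called on an opened file, e.g. `with open(path, newline="") as handle: region_totals(handle)`, where `path` points at the CSV written by the profiler.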