Commit 590e8da

Merge branch 'KernelTuner:master' into HIPbackend
2 parents 598ca39 + b3ff4cd commit 590e8da

File tree

16 files changed: +187 -114 lines changed


CHANGELOG.md

Lines changed: 10 additions & 0 deletions

@@ -4,6 +4,16 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 
 ## Unreleased
 
+## [0.4.5] - 2023-06-01
+### Added
+- PMTObserver to measure power and energy on various platforms
+
+### Changed
+- Improved functionality for storing output and metadata files
+- Updated PowerSensorObserver to support PowerSensor3
+- Refactored internal interfaces of runners and backends
+- Bugfix in interface to set objective and optimization direction
+
 ## [0.4.4] - 2023-03-09
 ### Added
 - Support for using time_limit in simulation mode

MANIFEST.in

Lines changed: 1 addition & 0 deletions

@@ -1,3 +1,4 @@
 include setup.py
 include LICENSE
 include README.rst
+include kernel_tuner/schema/T4/1.0.0/*

doc/source/conf.py

Lines changed: 2 additions & 2 deletions

@@ -59,9 +59,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = u'0.4.4'
+version = u'0.4.5'
 # The full version, including alpha/beta/rc tags.
-release = u'0.4.4'
+release = u'0.4.5'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

doc/source/matrix_multiplication.ipynb

Lines changed: 7 additions & 2 deletions

@@ -314,6 +314,11 @@
 " int k, kb;\n",
 "\n",
 " float sum[tile_size_y][tile_size_x];\n",
+" for (int i=0; i < tile_size_y; i++) {\n",
+" for (int j=0; j < tile_size_x; j++) {\n",
+" sum[i][j] = 0.0;\n",
+" }\n",
+" }\n",
 "\n",
 " for (k = 0; k < WIDTH; k += block_size_x) {\n",
 "\n",
@@ -430,7 +435,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },
@@ -444,7 +449,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.9"
+"version": "3.9.12"
 }
 },
 "nbformat": 4,
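The loops added to the notebook's kernel zero-initialize the per-thread accumulator tile before the reduction over k (reading an uninitialized `sum` array is undefined behavior in CUDA C). A standalone Python sketch of the same accumulation pattern, with made-up tile sizes:

```python
# Toy illustration of the accumulator pattern fixed in the notebook diff:
# a tile_size_y x tile_size_x tile of partial sums must be zeroed before
# accumulating products over the k dimension. Sizes and inputs are invented.
tile_size_y, tile_size_x, width = 2, 2, 4

a = [[1.0] * width for _ in range(tile_size_y)]        # tile of A rows
b = [[1.0] * tile_size_x for _ in range(width)]        # tile of B columns

# zero-initialize the accumulator tile (the step the diff adds)
sum_tile = [[0.0 for _ in range(tile_size_x)] for _ in range(tile_size_y)]

# accumulate over k, as the kernel's k-loop does per thread
for k in range(width):
    for i in range(tile_size_y):
        for j in range(tile_size_x):
            sum_tile[i][j] += a[i][k] * b[k][j]

print(sum_tile)  # each entry is 4.0: the sum of width = 4 products of 1.0
```

Without the zeroing step, each `+=` would fold whatever garbage was in the accumulator into the result, which is exactly the bug the notebook change guards against.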

doc/source/observers.rst

Lines changed: 10 additions & 3 deletions

@@ -12,7 +12,7 @@ observer will be called when the event takes place.
 
 Kernel Tuner implements an abstract BenchmarkObserver with methods that may be overwritten by classes extending
 the BenchmarkObserver class, shown below. The only mandatory method to implement
-is ``get\_results``, which is used to return the resulting observations at the end of benchmarking a
+is ``get_results``, which is used to return the resulting observations at the end of benchmarking a
 particular kernel configuration and usually returns aggregated results over multiple iterations of kernel
 execution. Before tuning starts, each observer is given a reference to the lower-level backend that is used for
 compiling and benchmarking the kernel configurations. In this way, the observer can inspect the compiled module,
@@ -53,7 +53,7 @@ the user to record power and/or energy consumption of kernel configurations duri
 Kernel Tuner to accurately determine the power and energy consumption of all kernel configurations it benchmarks
 during auto-tuning.
 
-.. autoclass:: kernel_tuner.observers.PowerSensorObserver
+.. autoclass:: kernel_tuner.observers.powersensor.PowerSensorObserver
 
 
 NVMLObserver
@@ -74,7 +74,7 @@ time it takes to benchmark different kernel configurations. However, NVML can be
 almost all Nvidia GPUs, so this method is much more accessible to end-users compared to solutions that require
 custom hardware, such as PowerSensor2.
 
-.. autoclass:: kernel_tuner.nvml.NVMLObserver
+.. autoclass:: kernel_tuner.observers.nvml.NVMLObserver
 
 
 Tuning execution parameters with NVML
@@ -101,7 +101,14 @@ the path where you are allowed to run nvidia-smi with privileges. This allows yo
 limits will be done through nvidia-smi.
 
 
+PMTObserver
+~~~~~~~~~~~
 
+The PMTObserver can be used to measure power and energy on various platforms including Nvidia Jetson, Nvidia NVML,
+the RAPL interface, AMD ROCm, and Xilinx. It requires PMT to be installed, as well as PMT's Python interface.
+More information about PMT can be found here: https://git.astron.nl/RD/pmt/
+
+.. autoclass:: kernel_tuner.observers.pmt.PMTObserver
 
 
 

examples/cuda/expdist.py

Lines changed: 3 additions & 2 deletions

@@ -4,6 +4,7 @@
 import numpy
 
 from kernel_tuner import tune_kernel
+from kernel_tuner import util
 
 def tune_expdist():
 
@@ -43,7 +44,7 @@ def tune_expdist():
         json.dump(kernel1, fp)
 
     #get the number of blocks used by the best configuration in the first kernel
-    best_config1 = min(kernel1[0], key=lambda x:x['time'])
+    best_config1 = util.get_best_config(kernel1[0], 'time')
     nblocks = numpy.int32( numpy.ceil(size / float(best_config1["block_size_x"]*best_config1["tile_size_x"])) *
                            numpy.ceil(size / float(best_config1["block_size_y"]*best_config1["tile_size_y"])) )
 
@@ -56,7 +57,7 @@ def tune_expdist():
     kernel2 = tune_kernel("reduce_cross_term", kernel_string, 1, arguments, tune_params,
                           grid_div_x=[], verbose=True)
 
-    best_config2 = min(kernel2[0], key=lambda x:x['time'])
+    best_config2 = util.get_best_config(kernel2[0], 'time')
     print("best GPU configuration, total time=", best_config1['time'] + best_config2['time'])
     print(best_config1)
    print(best_config2)
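The replaced lines picked the fastest configuration from the results list that `tune_kernel` returns by minimizing the `'time'` key; `util.get_best_config` wraps that selection. A self-contained sketch of the old pattern, using an invented toy results list in place of real tuning output:

```python
# Toy stand-in for the list of benchmark results returned by tune_kernel:
# each dict maps tunable parameters and the measured objective ('time').
# Values here are invented for illustration.
results = [
    {"block_size_x": 32,  "tile_size_x": 1, "time": 1.8},
    {"block_size_x": 64,  "tile_size_x": 2, "time": 0.9},
    {"block_size_x": 128, "tile_size_x": 4, "time": 1.2},
]

# the pattern the old code used: minimize the objective key directly
best_config = min(results, key=lambda x: x["time"])
print(best_config["block_size_x"])  # 64, the fastest configuration
```

Centralizing this in `util.get_best_config(results, objective)` lets the library also handle objectives that should be maximized rather than minimized, instead of hard-coding `min` at every call site.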

kernel_tuner/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
 from kernel_tuner.integration import store_results, create_device_targets
 from kernel_tuner.interface import tune_kernel, run_kernel
 
-__version__ = "0.4.4"
+__version__ = "0.4.5"

0 commit comments
