18 commits
7c0e231
Tutorials/Accelerated Python/Kernel Authoring: Improve book histogram…
brycelelbach Mar 3, 2026
7c88734
Tutorials/Accelerated Python/Kernel Authoring: Add configurable corre…
brycelelbach Mar 3, 2026
166df50
Tutorials/Accelerated Python/Kernel Authoring: Display book histogram…
brycelelbach Mar 3, 2026
fc9cdf7
Tutorials/Accelerated Python/Kernel Authoring: Add correctness check …
brycelelbach Mar 3, 2026
c713c1e
Tutorials/Accelerated Python/Power Iteration: Fix typos in example us…
brycelelbach Mar 3, 2026
2f0ff1c
Tutorials/Accelerated Python/Kernel Authoring: Apply title, TOC, sect…
brycelelbach Mar 3, 2026
0870f94
Tutorials/Accelerated Python/Kernel Authoring: Add output mode to cop…
brycelelbach Mar 3, 2026
85d3955
Tutorials/Accelerated Python/Memory Spaces: Re-add checkpoint I/O to …
brycelelbach Mar 3, 2026
dbb7c13
Tutorials/Accelerated Python/Memory Spaces: Remove unnecessary isinst…
brycelelbach Mar 3, 2026
9032673
Tutorials/Accelerated Python/Memory Spaces: Rename estimate_device_ex…
brycelelbach Mar 3, 2026
f949362
Tutorials/Accelerated Python/Asynchrony: Limit warmup calls to 1 step…
brycelelbach Mar 3, 2026
7ace634
Tutorials/Accelerated Python/Memory Spaces: Revert benchmarking to us…
brycelelbach Mar 3, 2026
756d3c8
Tutorials/Accelerated Python: Unify benchmark reporting to use millis…
brycelelbach Mar 3, 2026
6a1c179
Tutorials/Accelerated Python/Memory Spaces: Separate eigvals into its…
brycelelbach Mar 3, 2026
e3a5d94
Tutorials/Accelerated Python/Asynchrony: Fix compute step NVTX annota…
brycelelbach Mar 3, 2026
2555b01
Tutorials/Accelerated Python/Memory Spaces: Remove inline benchmarkin…
brycelelbach Mar 3, 2026
b5d19ee
Tutorials/Accelerated Python/Memory Spaces: Remove outdated note abou…
brycelelbach Mar 3, 2026
7f7e7a4
Tutorials/Accelerated Python/Memory Spaces: Fix benchmark to compare …
brycelelbach Mar 3, 2026
@@ -31,11 +31,9 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rO4kOPuP_0JG"
},
"outputs": [],
"source": [
"import os\n",
"\n",
@@ -46,7 +44,9 @@
" !pip install \"nvtx\" \"nsightful[notebook] @ git+https://github.com/brycelelbach/nsightful.git\" > /dev/null 2>&1\n",
"\n",
"print(\"Environment setup complete.\")"
]
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
@@ -77,18 +77,17 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "sEicxhLO_9G9"
},
"outputs": [],
"source": [
"%%writefile power_iteration__baseline.py\n",
"\n",
"import numpy as np\n",
"import cupy as cp\n",
"import cupyx as cpx\n",
"import nvtx\n",
"import time\n",
"from dataclasses import dataclass\n",
"\n",
"@dataclass\n",
@@ -140,17 +139,18 @@
"A_device = generate_device()\n",
"\n",
"# Warmup to ensure modules are loaded and code is JIT compiled before timing.\n",
"estimate_device(A_device, cfg=PowerIterationConfig(progress=False))\n",
"estimate_device(A_device, cfg=PowerIterationConfig(max_steps=1, check_frequency=1, progress=False))\n",
"cp.cuda.get_current_stream().synchronize()\n",
"\n",
"start = cp.cuda.get_current_stream().record()\n",
"start = time.perf_counter()\n",
"lam_est_device = estimate_device(A_device).item()\n",
"stop = cp.cuda.get_current_stream().record()\n",
"\n",
"duration = cp.cuda.get_elapsed_time(start, stop) / 1e3\n",
"stop = time.perf_counter()\n",
"\n",
"print()\n",
"print(f\"GPU Execution Time: {duration:.3f} s\")"
]
"print(f\"{(stop - start) * 1000:.3f} ms\")"
],
"execution_count": null,
"outputs": []
},
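The hunk above swaps CUDA-event timing for host wall-clock timing with `time.perf_counter`, reporting milliseconds. A minimal sketch of that pattern, using a NumPy eigenvalue solve as a hypothetical stand-in for the tutorial's `estimate_device` call (no GPU assumed):

```python
import time
import numpy as np

def time_ms(fn, *args):
    """Wall-clock a callable and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    stop = time.perf_counter()
    return result, (stop - start) * 1000

# Stand-in workload; the tutorial instead times estimate_device() on a CuPy array.
A = np.random.default_rng(0).standard_normal((256, 256))
_, elapsed = time_ms(np.linalg.eigvalsh, A)
print(f"{elapsed:.3f} ms")
```

Note that wall-clock timing only captures GPU work if the timed call synchronizes with the device; in the diff, `.item()` forces a device-to-host copy, which provides that synchronization.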
{
"cell_type": "markdown",
@@ -165,14 +165,14 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1HU5p1IhAkTA"
},
"outputs": [],
"source": [
"!nsys profile --cuda-event-trace=false --force-overwrite true -o power_iteration__baseline python power_iteration__baseline.py"
]
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
@@ -189,17 +189,17 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "s6VVOnGQR3Ph"
},
"outputs": [],
"source": [
"import nsightful\n",
"\n",
"!nsys export --type sqlite --quiet true --force-overwrite true power_iteration__baseline.nsys-rep\n",
"nsightful.display_nsys_sqlite_file_in_notebook(\"power_iteration__baseline.sqlite\", title=\"Power Iteration - Baseline\")"
]
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
Expand All @@ -216,7 +216,7 @@
"The first is to limit when we start and stop profiling in the program. In Python, we can do this with `cupyx.profiler.profile()`, which gives us a Python context manager. Any CUDA code used within its scope will be included in the profile.\n",
"\n",
"```\n",
"not_in_the profile()\n",
"not_in_the_profile()\n",
"with cpx.profiler.profile():\n",
" in_the_profile()\n",
"not_in_the_profile()\n",
@@ -227,7 +227,7 @@
"We can also annotate specific regions of our code, which will show up in the profiler. We can even add categories, domains, and colors to these regions, and they can be nested. To add these annotations, we use `nvtx.annotate()`, another Python context manager, this time from a library called NVTX.\n",
"\n",
"```\n",
"with nvtx.annotate(\"Loop\")\n",
"with nvtx.annotate(\"Loop\"):\n",
" for i in range(20):\n",
" with nvtx.annotate(f\"Step {i}\"):\n",
" pass\n",
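The annotations shown above are ordinary context managers, so they nest and compose with normal control flow. A small runnable sketch of the same pattern, with a hypothetical no-op fallback so it runs even where the `nvtx` package is not installed:

```python
import contextlib

try:
    import nvtx
    annotate = nvtx.annotate
except ImportError:
    # Hypothetical no-op stand-in: same call shape, no profiler markers.
    def annotate(message=None, **kwargs):
        return contextlib.nullcontext()

total = 0
with annotate("Loop", color="blue"):
    for i in range(20):
        with annotate(f"Step {i}"):
            total += i
print(total)  # 190
```

Under `nsys`, the outer "Loop" range would contain twenty nested "Step i" ranges on the NVTX timeline; without a profiler attached, the annotations cost almost nothing.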
@@ -269,16 +269,15 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile power_iteration__async.py\n",
"\n",
"import numpy as np\n",
"import cupy as cp\n",
"import cupyx as cpx\n",
"import nvtx\n",
"import time\n",
"from dataclasses import dataclass\n",
"\n",
"@dataclass\n",
@@ -305,17 +304,18 @@
"A_device = generate_device()\n",
"\n",
"# Warmup to ensure modules are loaded and code is JIT compiled before timing.\n",
"estimate_device(A_device, cfg=PowerIterationConfig(progress=False))\n",
"estimate_device(A_device, cfg=PowerIterationConfig(max_steps=1, check_frequency=1, progress=False))\n",
"cp.cuda.get_current_stream().synchronize()\n",
"\n",
"start = cp.cuda.get_current_stream().record()\n",
"start = time.perf_counter()\n",
"lam_est_device = estimate_device(A_device).item()\n",
"stop = cp.cuda.get_current_stream().record()\n",
"\n",
"duration = cp.cuda.get_elapsed_time(start, stop) / 1e3\n",
"stop = time.perf_counter()\n",
"\n",
"print()\n",
"print(f\"GPU Execution Time: {duration:.3f} s\")"
]
"print(f\"{(stop - start) * 1000:.3f} ms\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
@@ -326,14 +326,14 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pszz-k8cDfqy"
},
"outputs": [],
"source": [
"!python power_iteration__async.py"
]
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
@@ -348,24 +348,22 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "uSPFNIb9KcPb"
},
"outputs": [],
"source": [
"power_iteration_baseline_output = !python power_iteration__baseline.py\n",
"power_iteration_baseline_duration = float(power_iteration_baseline_output[-1].split()[-2])\n",
"power_iteration_baseline_duration = float(power_iteration_baseline_output[-1].split()[0])\n",
"power_iteration_async_output = !python power_iteration__async.py\n",
"power_iteration_async_duration = float(power_iteration_async_output[-1].split()[-2])\n",
"power_iteration_async_duration = float(power_iteration_async_output[-1].split()[0])\n",
"speedup = power_iteration_baseline_duration / power_iteration_async_duration\n",
"\n",
"print(f\"GPU Execution Time\")\n",
"print()\n",
"print(f\"power_iteration_baseline: {power_iteration_baseline_duration:.3f} s\")\n",
"print(f\"power_iteration_async: {power_iteration_async_duration:.3f} s\")\n",
"print(f\"power_iteration_baseline: {power_iteration_baseline_output[-1]}\")\n",
"print(f\"power_iteration_async: {power_iteration_async_output[-1]}\")\n",
"print(f\"power_iteration_async speedup over power_iteration_baseline: {speedup:.2f}\")"
]
],
"execution_count": null,
"outputs": []
},
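The hunk above updates the output parsing to match the new `<value> ms` format: the duration is now the first whitespace-separated token of the last output line rather than the second-to-last. A sketch of that parsing and the speedup computation, with hypothetical captured output lines:

```python
# Hypothetical captured output in the new "<value> ms" format.
baseline_output = ["12.345 ms"]
async_output = ["8.210 ms"]

# Duration is the first token of the last line.
baseline_ms = float(baseline_output[-1].split()[0])
async_ms = float(async_output[-1].split()[0])
speedup = baseline_ms / async_ms

print(f"baseline: {baseline_ms:.3f} ms")
print(f"async:    {async_ms:.3f} ms")
print(f"speedup:  {speedup:.2f}")
```

Printing the raw last line for each run, as the diff does, keeps the report consistent with whatever the scripts actually emitted.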
{
"cell_type": "markdown",
@@ -378,14 +376,14 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BtQR4CHikWFK"
},
"outputs": [],
"source": [
"!nsys profile --cuda-event-trace=false --capture-range=cudaProfilerApi --capture-range-end=stop --force-overwrite true -o power_iteration__async python power_iteration__async.py"
]
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
@@ -398,15 +396,15 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mWXBvi-hFGhU"
},
"outputs": [],
"source": [
"!nsys export --type sqlite --quiet true --force-overwrite true power_iteration__async.nsys-rep\n",
"nsightful.display_nsys_sqlite_file_in_notebook(\"power_iteration__async.sqlite\", title=\"Power Iteration - Async Event\")"
]
],
"execution_count": null,
"outputs": []
}
],
"metadata": {
@@ -423,4 +421,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}