Tutorials/Accelerated Python: Fixes and improvements to memory spaces, asynchrony, and kernel authoring notebooks #148
Open
brycelelbach wants to merge 18 commits into main
❌ Commit Signature Check Failed: found 1 unsigned commit(s). All commits must be signed.
/ok to test e9e650d
❌ Link Check Failed: broken links were detected in this PR. Please check the workflow run logs for details on which links are broken.
To test links locally: ./brev/test-links.bash .
Force-pushed from 1a8ecc0 to bc7da88
/ok to test eab9896
… plot formatting, add title and dataset size display.
…ctness check to copy kernel launch function.
… dataset size in megabytes instead of bytes. Made-with: Cursor
…cell before profiling copy_blocked kernel. Add a verification cell that runs the script immediately after the %%writefile cell, matching the pattern used in the book histogram notebook. Made-with: Cursor
…age of cupyx.profiler.profile and NVTX.
…ion header, and text updates to solution notebooks. These changes were made to the exercise notebooks in 2f5c4fc but were not applied to the corresponding solution notebooks. Made-with: Cursor
…y kernel scripts to print problem size and dtype. Made-with: Cursor
…power iteration. The savetxt checkpoint I/O was removed from the 05 memory spaces notebooks in 2f5c4fc. This I/O is needed to set up the narrative for Notebook 06 (Asynchrony), whose baseline is the synchronous device-to-host copy + file write pattern introduced here. Made-with: Cursor
…ance check before cp.asarray(). The isinstance(A, np.ndarray) guard added in 2f5c4fc is unnecessary because cp.asarray() already handles both cases: it copies a host array to the GPU, and is a no-op when the array is already on the GPU. Teaching users to call cp.asarray() unconditionally is the intended lesson. Made-with: Cursor
…ercise and generate_device_exercise back to estimate_device and generate_device. The _exercise suffix was added in 2f5c4fc but breaks the naming symmetry with estimate_host and generate_host. The host/device naming convention is cleaner and mirrors the pattern used in the Notebook 06 (Asynchrony) notebooks. Made-with: Cursor
… to avoid unnecessary computation. Made-with: Cursor
…e the same input matrix for host and device. Different matrices converge at different rates, so it's only valid to benchmark on the same inputs. Use A_host for both host and device benchmarks instead of comparing A_host (CPU) against A_device (GPU-generated). Made-with: Cursor
…econds with mean ± relative stdev format.
- 05 Memory Spaces: Use cupyx.profiler.benchmark with mean/stdev/runs format.
- 06 Asynchrony: Use time.perf_counter for single-run timing in ms.
- 40/41 Kernel Authoring: Convert benchmark output from seconds to ms.
- Rename timing variable from D to T across all notebooks.
Made-with: Cursor
… own cell, fix capitalization, restore eigvals timing.
- Split expensive np.linalg.eigvals call into a dedicated cell timed with time.perf_counter.
- Report eigvals timing alongside host/device benchmarks.
- Capitalize print labels consistently (Power Iteration, Relative Error).
- Use "Timing Host"/"Timing Device" instead of "Timing CPU"/"Timing GPU".
Made-with: Cursor
…tions with accurate step ranges and per-step regions. Made-with: Cursor
…g from sections 3 and 4. Restore the original style from before 2f5c4fc: just call the functions, print the estimates, and show both matrices side by side. Benchmarking belongs in section 5 where cupyx.profiler.benchmark is used properly. Made-with: Cursor
…t cupy.linalg.eigvals not being implemented. Made-with: Cursor
…CPU wall-clock times for both host and device. The benchmarking cell was using .gpu_times[0] for the device benchmark but .cpu_times for the host benchmark, which is an apples-to-oranges comparison. The GPU time measures only device execution, excluding kernel launch overhead, synchronization, and other CPU-side costs. The CPU time (wall-clock) is the end-to-end time, which is the fair metric for both. This was introduced in 0c365ed. Made-with: Cursor
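The timing commits above settle on two conventions: report milliseconds as a mean with a relative standard deviation, and compare host and device on the same wall-clock metric. A minimal standard-library sketch of that reporting style follows. The helper names `time_runs` and `summarize_ms` and the exact output string are assumptions made for illustration; they are not taken from the notebooks.

```python
import statistics
import time


def time_runs(fn, n_runs=5):
    """Collect end-to-end wall-clock samples (in seconds) for fn()."""
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize_ms(times_s):
    """Format samples given in seconds as 'mean ms ± relative stdev % (runs)'."""
    mean_s = statistics.fmean(times_s)
    # The sample standard deviation needs at least two runs.
    stdev_s = statistics.stdev(times_s) if len(times_s) > 1 else 0.0
    rel = stdev_s / mean_s if mean_s else 0.0
    return f"{mean_s * 1e3:.3f} ms ± {rel * 100:.1f}% ({len(times_s)} runs)"


if __name__ == "__main__":
    # Hypothetical workload standing in for one benchmarked call.
    samples = time_runs(lambda: sum(i * i for i in range(100_000)))
    print("Timing Host:", summarize_ms(samples))
```

With CuPy, the wall-clock samples for both the host and the device function would come from the `cpu_times` attribute of the result returned by `cupyx.profiler.benchmark`, as the commit describes. Wrapping a device call in `time.perf_counter` alone does not wait for outstanding GPU work unless the call synchronizes.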
Force-pushed from eab9896 to 7f7e7a4
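The commit that switches both benchmarks to A_host rests on the claim that different matrices converge at different rates. The sketch below illustrates that claim with a pure-Python power iteration on two small symmetric matrices. It is a stand-in written for illustration, not the notebooks' NumPy/CuPy implementation; the function names, the Rayleigh-quotient stopping rule, and the example matrices are all assumptions.

```python
def matvec(A, x):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def power_iteration(A, tol=1e-10, max_steps=10_000):
    """Return (dominant eigenvalue estimate, steps taken).

    Stops when successive Rayleigh-quotient estimates differ by less than tol.
    """
    x = [1.0] * len(A)
    previous = float("inf")
    estimate = previous
    for step in range(1, max_steps + 1):
        y = matvec(A, x)
        # Rayleigh quotient x.Ax / x.x as the eigenvalue estimate.
        estimate = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        # Normalize by the infinity norm to keep the iterate bounded.
        norm = max(abs(v) for v in y)
        x = [v / norm for v in y]
        if abs(estimate - previous) < tol:
            return estimate, step
        previous = estimate
    return estimate, max_steps


if __name__ == "__main__":
    # Same dominant eigenvalue (2.0), different gap to the second eigenvalue.
    well_separated = [[2.0, 0.0], [0.0, 1.0]]
    nearly_tied = [[2.0, 0.0], [0.0, 1.9]]
    for name, A in [("well separated", well_separated), ("nearly tied", nearly_tied)]:
        estimate, steps = power_iteration(A)
        print(f"{name}: estimate={estimate:.8f}, steps={steps}")
```

Because the step count depends on the ratio between the two largest eigenvalues, timing one matrix on the host and a different one on the device mixes convergence behavior into the hardware comparison.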
Summary
Fixes and improvements across four Accelerated Python tutorial notebooks (and their solutions): Memory Spaces, Asynchrony, Copy Kernel Authoring, and Book Histogram Kernel Authoring.
Memory Spaces (Power Iteration)
- Split eigvals into its own cell, fix capitalization, and restore eigvals timing.
- … cupy.linalg.eigvals not being implemented.
- Remove isinstance check before cp.asarray().
- Rename estimate_device_exercise/generate_device_exercise back to estimate_device/generate_device.
Asynchrony (Power Iteration)
Cross-Cutting (Memory Spaces + Asynchrony)
- … cupyx.profiler.profile and NVTX.
Kernel Authoring (Copy)
- … copy_blocked kernel.
Kernel Authoring (Book Histogram)