@@ -214,7 +214,11 @@ interactions in detail, Nsight Compute is the tool for you. It is again possible
 profiler with an interactive session of Julia, and debug or profile only those sections of
 your application that are marked with `CUDA.@profile`.
 
-Start with launching Julia under the Nsight Compute CLI tool:
+First, ensure that all (CUDA) packages that are involved in your application have been
+precompiled. Otherwise, you'll end up profiling the precompilation process, instead of
+the process where the actual work happens.
+
+Then, launch Julia under the Nsight Compute CLI tool as follows:
 
 ```
 $ ncu --mode=launch julia
@@ -224,23 +228,25 @@ You will get an interactive REPL, where you can execute whatever code you want:
 
 ```julia
 julia> using CUDA
-
-julia> CUDA.driver_version()
-
 # Julia hangs!
 ```
 
 As soon as you use CUDA.jl, your Julia process will hang. This is expected, as the tool
 breaks upon the very first call to the CUDA API, at which point you are expected to launch
-the Nsight Compute GUI utility and attach to the running session:
+the Nsight Compute GUI utility, select `Interactive Profile` under `Activity`, and attach
+to the running session by selecting it in the list in the `Attach` pane:
 
 !["NVIDIA Nsight Compute - Attaching to a session"](nsight_compute-attach.png)
 
-You will see that the tool has stopped execution on the call to `cuInit`. Now check
-`Profile > Auto Profile` to make Nsight Compute gather statistics on our kernels, and clock
-`Debug > Resume` to resume your session.
+Note that this even works with remote systems, i.e., you can have NSight Compute connect
+over ssh to a remote system where you run Julia under `ncu`.
 
-Now our CLI session comes to life again, and we can enter the rest of our script:
+Once you've successfully attached to a Julia process, you will see that the tool has stopped
+execution on the call to `cuInit`. Now check `Profile > Auto Profile` to make Nsight Compute
+gather statistics on our kernels, uncheck `Debug > Break On API Error` to avoid halting the
+process when innocuous errors happen, and click `Debug > Resume` to resume your application.
+
+After doing so, our CLI session comes to life again, and we can execute the rest of our script:
 
 ```julia
 julia> a = CUDA.rand(1024,1024,1024);
@@ -254,6 +260,12 @@ Once that's finished, the Nsight Compute GUI window will have plenty details on
 
 !["NVIDIA Nsight Compute - Kernel profiling"](nsight_compute-kernel.png)
 
+By default, this only collects a basic set of metrics. If you need more information on a
+specific kernel, select `detailed` or `full` in the `Metric Selection` pane and re-run
+your kernels. Note that collecting more metrics is also more expensive, sometimes even
+requiring multiple executions of your kernel. As such, it is recommended to only collect
+basic metrics by default, and only detailed or full metrics for kernels of interest.
+
 At any point in time, you can also pause your application from the debug menu, and inspect
 the API calls that have been made:
 