Time Breakdown by NVTX Annotation - V2 #157

Open
nkasica wants to merge 3 commits into hpcgroup:develop from nkasica:ann-breakdown

nkasica commented Apr 7, 2026

Implements ann_time_breakdown_v2() from Prajwal's fork of Pipit.

ann_time_breakdown_v2() (named simply ann_time_breakdown() in this PR) builds on #149, which introduced a time breakdown function by NVTX annotation, and adds the following enhancements:

  1. a mapper parameter, which lets users group kernels into categories and get per-category breakdowns
  2. more efficient time calculations
  3. grouping by row index rather than by annotation name; repeated annotation names are calculated separately, which gives a more detailed breakdown
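
To illustrate the third point, here is a minimal sketch of how grouping by row index differs from grouping by name. The DataFrame, the `duration_ns` column, and the index values are made up for illustration and are not taken from this PR's implementation.

```python
import pandas as pd

# Hypothetical toy data: two annotations share the name "forward".
# The index stands in for row indices in the events DataFrame.
annotations = pd.DataFrame(
    {
        "Name": ["forward", "backward", "forward"],
        "duration_ns": [100, 250, 140],
    },
    index=[3, 7, 12],
)

# Grouping by name merges both "forward" instances into one row.
by_name = annotations.groupby("Name")["duration_ns"].sum()

# Grouping by row index keeps every annotation instance separate.
by_index = annotations.groupby(level=0)["duration_ns"].sum()

print(by_name)
print(by_index)
```

With name-based grouping the two "forward" instances collapse into a single total, while index-based grouping reports each one on its own row.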

Additional Notes:

  • When testing, I found an apparent discrepancy in parent/child relationships: events in a parent's _children column did not have that parent listed in their own _parent column. I initially suspected nsight_sqlite_reader, but the cause was on my end: I was using .iloc instead of .loc, and since the DataFrame was already filtered, this returned the wrong row. There are no issues with the parent/child relationships.
  • To create tests for this function, we need an example GPU trace. Ideally, it will be simple enough to review visually and will include NVTX annotations, communication, etc. It will also be helpful for testing future functions for GPU traces.
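
The .iloc/.loc mix-up described in the first note can be reproduced with a small sketch. The event names and index values below are hypothetical; only the pandas behavior is the point.

```python
import pandas as pd

# Hypothetical events; the default index 0..4 acts as the row label.
events = pd.DataFrame(
    {"Name": ["ann_a", "kernel_x", "ann_b", "kernel_y", "ann_c"]}
)

# Filtering keeps the original labels (0, 2, 4) but the positions become 0, 1, 2.
annotations = events[events["Name"].str.startswith("ann")]

parent_label = 2  # label stored in a child's _parent column, for example

# .loc looks up by label and finds the intended row.
right = annotations.loc[parent_label, "Name"]

# .iloc looks up by position, so on the filtered frame it lands on a different row.
wrong = annotations.iloc[parent_label]["Name"]

print(right, wrong)
```

Because _parent and _children hold index labels, label-based lookup with .loc stays correct after filtering, whereas positional lookup silently returns another row.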

Comment thread pipit/trace.py
ts = child_row["Timestamp (ns)"]
mts = child_row["_matching_timestamp"]

if not pd.isna(child_type) and child_type in ("kernel", "comm"):

"comm" is not really an existing event type. My CSV generation code, which uses the v2 API, had this hacky logic:
trace.events.loc[(trace.events["Name"].str.contains("nccl")) & (trace.events["type"] == "kernel"), "type"] = "comm"

This isn't really needed, because the regex can determine which kernels belong to communication.
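
As a rough sketch of that idea, kernels can be categorized by regex without rewriting the "type" column. The mapper shape (category name to regex), the `categorize` helper, the kernel names, and the `duration_ns` column are all assumptions for illustration; the actual mapper format in this PR may differ.

```python
import re

import pandas as pd

# Hypothetical mapper: category name -> regex matched against kernel names.
mapper = {"comm": r"nccl", "gemm": r"gemm|cutlass"}


def categorize(name, mapper, default="other"):
    """Return the first category whose regex matches the kernel name."""
    for category, pattern in mapper.items():
        if re.search(pattern, name, flags=re.IGNORECASE):
            return category
    return default


# Hypothetical kernel events; every row keeps type == "kernel".
kernels = pd.DataFrame(
    {
        "Name": ["ncclKernel_AllReduce", "sgemm_128x64", "elementwise_kernel"],
        "type": ["kernel", "kernel", "kernel"],
        "duration_ns": [500, 300, 50],
    }
)

kernels["category"] = kernels["Name"].map(lambda n: categorize(n, mapper))
per_category = kernels.groupby("category")["duration_ns"].sum()
print(per_category)
```

The communication kernels end up in their own category while the event type stays "kernel", so no "comm" type needs to be introduced.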

Also, this check should include the "cuda_memcpy" type so that memcpy time doesn't get counted as idle time.
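
A small sketch of why this matters, assuming idle time is the annotation duration minus the merged busy intervals of its children. The `busy_time_ns` helper and the `start_ns`/`end_ns` columns are hypothetical stand-ins, not the code in this PR.

```python
import pandas as pd


def busy_time_ns(events, busy_types=("kernel", "cuda_memcpy")):
    """Total time covered by events of the given types, merging overlaps."""
    rows = events[events["type"].isin(busy_types)]
    intervals = sorted(zip(rows["start_ns"], rows["end_ns"]))
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping interval: extend the current one.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return int(total)


# Hypothetical children of one annotation that spans 0..100 ns.
children = pd.DataFrame(
    {
        "type": ["kernel", "cuda_memcpy", "kernel"],
        "start_ns": [0, 30, 80],
        "end_ns": [40, 60, 100],
    }
)
annotation_duration = 100

idle_with_memcpy = annotation_duration - busy_time_ns(children)
idle_kernels_only = annotation_duration - busy_time_ns(children, ("kernel",))
print(idle_with_memcpy, idle_kernels_only)
```

In this toy case, leaving out "cuda_memcpy" doubles the reported idle time, because the memcpy that runs past the end of the first kernel is treated as a gap.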

calls_that_launch["index_x"].to_numpy()
)

trace.events = trace_df

Why this change?
