MiroMindAI/trace-blame

trace-blame

A Go CLI for analyzing PyTorch Profiler traces.

It reimplements most of the workflows in HolisticTraceAnalysis (HTA), with the following features:

  1. An install-skills subcommand that sets up Claude skills for agent use.
  2. Ships as a single Go binary.
  3. Stores intermediate state in SQLite tables.
  4. Produces Markdown output for CLI use.
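
To illustrate what Markdown output from a CLI analysis can look like, here is a minimal Go sketch that renders tabular results as a GitHub-flavored Markdown table. The helper name renderMarkdownTable and the column names are hypothetical; trace-blame's actual output format may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// renderMarkdownTable is a hypothetical helper (not trace-blame's API) that
// formats a header row and data rows as a GitHub-flavored Markdown table.
func renderMarkdownTable(headers []string, rows [][]string) string {
	var b strings.Builder
	writeRow := func(cells []string) {
		b.WriteString("|")
		for _, c := range cells {
			// Escape pipes so cell text cannot break the table layout.
			b.WriteString(" " + strings.ReplaceAll(c, "|", `\|`) + " |")
		}
		b.WriteString("\n")
	}
	writeRow(headers)
	// The separator row marks the first row as the header.
	sep := make([]string, len(headers))
	for i := range sep {
		sep[i] = "---"
	}
	writeRow(sep)
	for _, r := range rows {
		writeRow(r)
	}
	return b.String()
}

func main() {
	// Illustrative numbers only, not real profiler output.
	fmt.Print(renderMarkdownTable(
		[]string{"rank", "idle_pct", "compute_pct"},
		[][]string{{"0", "12.5", "70.1"}, {"1", "9.8", "73.4"}},
	))
}
```

Plain Markdown tables stay readable in a terminal and are easy for an agent to parse, which is presumably why a CLI aimed at agents would favor them over a custom text layout.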

Quick Start

go build -o trace-blame ./cmd/trace-blame/

# Install the accompanying Claude skills
trace-blame install-skills

# Parse raw traces into a SQLite database
trace-blame pre-process --trace-dir ./traces --output trace.db

# Run analyses
trace-blame temporal-breakdown --db trace.db
trace-blame gpu-kernel-breakdown --db trace.db
trace-blame idle-time-breakdown --db trace.db --ranks 0,1

Subcommands

| Category | Subcommands |
| --- | --- |
| Preprocessing | pre-process |
| Overview | temporal-breakdown, comm-comp-overlap, profiler-steps, potential-stragglers |
| GPU Kernels | gpu-kernel-breakdown, gpu-kernels-with-annotations, cuda-kernel-launch-stats, aten-op-kernels-and-delay, frequent-cuda-kernel-sequences |
| Counters | queue-length-summary, queue-length-time-series, blocked-on-full-queue, memory-bw-summary, memory-bw-time-series, generate-trace-with-counters |
| Idle Time | idle-time-breakdown |
| Critical Path | critical-path |
| CUPTI | cupti-counter-data |

Run trace-blame with no arguments for usage, or trace-blame <subcommand> -h for flag details.

Roadmap

  • Expand debugging workflows beyond the current HTA coverage
  • Navigate up and down the operator call stack from within the agent
  • Traverse forward and backward along a CUDA stream in the agent
  • Support memory-profiling workflows

Ideas and contributions are welcome! See CONTRIBUTING.md to get started.

Other Awesome Tools

Check out these tools, which make debugging PyTorch training jobs easier:

  • hta analyzes PyTorch profiler traces.
  • tlparse analyzes the torch.compile process.
  • mosaic analyzes PyTorch memory snapshots.

About

Lets agents analyze PyTorch profiler dumps. A Go rewrite of Meta's HolisticTraceAnalysis, updated for agent use.
