16 commits
8d1730a
Add distributed launcher support for linex and metrix
mawad-amd Mar 23, 2026
f5ac2f0
Fix command construction: launcher wraps rocprofv3, not the reverse
mawad-amd Mar 23, 2026
a6406da
Replace auto-detection with explicit launcher parameter
mawad-amd Mar 23, 2026
cb1ee05
Fix lint: remove duplicate launcher kwarg, fix parenthesized with for…
mawad-amd Mar 23, 2026
9b07645
Fix lint: add missing launcher param to analyze_instruction_hotspots
mawad-amd Mar 23, 2026
22f8fad
Fix ruff formatting
mawad-amd Mar 23, 2026
f308ae1
Address review feedback: fix docstring placement, forward launcher in…
mawad-amd Mar 23, 2026
06ba4de
Fix launcher forwarding in CounterBackend batch path
mawad-amd Mar 23, 2026
cac6dbd
Fix distributed launcher: rocprofv3 wraps launcher instead of vice versa
mawad-amd Mar 25, 2026
74e5e38
Handle TypeError in CSV parsing for multi-process rocprofv3 traces
mawad-amd Mar 25, 2026
510b79d
Fix unit tests and ruff formatting for new launcher command order
mawad-amd Mar 25, 2026
45b7f5c
feat: per-rank profiling via wrapper script for distributed launchers
mawad-amd Mar 25, 2026
3ddfd8e
fix: three bugs in per-rank wrapper from hardware testing
mawad-amd Mar 25, 2026
3060277
fix: include global_rank in multi-pass merge key to prevent rank coll…
mawad-amd Mar 25, 2026
9a4e0f3
Merge remote-tracking branch 'origin/main' into muhaawad/distributed-…
mawad-amd Mar 25, 2026
feaa833
fix: add dispatch_index and atol/rtol/equal-nan to accordo CLI
mawad-amd Mar 25, 2026
33 changes: 23 additions & 10 deletions accordo/accordo/cli.py
@@ -9,6 +9,7 @@
     --ref-binary ./ref \\
     --opt-binary ./opt \\
     [--tolerance 1e-6] \\
+    [--atol 1e-6] [--rtol 1e-5] [--equal-nan] \\
     [--timeout 30] \\
     [--working-dir .] \\
     [--kernel-args "input:const float*,output:float*"] \\
@@ -59,7 +60,15 @@ def _build_validate_parser(subparsers: argparse._SubParsersAction) -> None:
         help="Path to optimized executable (single path; use API or a wrapper for argv)",
     )
     p.add_argument(
-        "--tolerance", type=float, default=1e-6, help="Absolute tolerance (default: 1e-6)"
+        "--tolerance", type=float, default=None, help="Legacy alias for --atol (default: 1e-6)"
     )
+    p.add_argument("--atol", type=float, default=None, help="Absolute tolerance (default: 1e-6)")
+    p.add_argument("--rtol", type=float, default=0.0, help="Relative tolerance (default: 0.0)")
+    p.add_argument(
+        "--equal-nan",
+        action="store_true",
+        default=False,
+        help="Treat NaN values as equal (default: False)",
+    )
     p.add_argument(
         "--timeout", type=int, default=30, help="Timeout per snapshot in seconds (default: 30)"
@@ -122,19 +131,23 @@ def _run_validate(args: argparse.Namespace) -> int:
         ref_snapshot,
         opt_snapshot,
         tolerance=args.tolerance,
+        atol=args.atol,
+        rtol=args.rtol,
+        equal_nan=args.equal_nan,
     )
 
     mismatches_serialized = []
     for m in result.mismatches or []:
-        mismatches_serialized.append(
-            {
-                "arg_index": m.arg_index,
-                "arg_name": m.arg_name,
-                "arg_type": m.arg_type,
-                "max_difference": m.max_difference,
-                "mean_difference": m.mean_difference,
-            }
-        )
+        entry = {
+            "arg_index": m.arg_index,
+            "arg_name": m.arg_name,
+            "arg_type": m.arg_type,
+            "max_difference": m.max_difference,
+            "mean_difference": m.mean_difference,
+        }
+        if m.dispatch_index is not None:
+            entry["dispatch_index"] = m.dispatch_index
+        mismatches_serialized.append(entry)
 
     output = {
         "is_valid": result.is_valid,
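The serialization change in `_run_validate` makes `dispatch_index` an optional key in each mismatch entry. The sketch below reproduces that behavior in isolation to show what a JSON consumer would see. The `Mismatch` dataclass is a hypothetical stand-in carrying only the fields visible in the diff; it is not accordo's actual result type.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Mismatch:
    """Hypothetical stand-in with only the fields the diff serializes."""

    arg_index: int
    arg_name: str
    arg_type: str
    max_difference: float
    mean_difference: float
    dispatch_index: Optional[int] = None


def serialize_mismatch(m: Mismatch) -> Dict[str, Any]:
    """Mirror the diff: emit dispatch_index only when it is set."""
    entry: Dict[str, Any] = {
        "arg_index": m.arg_index,
        "arg_name": m.arg_name,
        "arg_type": m.arg_type,
        "max_difference": m.max_difference,
        "mean_difference": m.mean_difference,
    }
    # `is not None` (rather than truthiness) keeps dispatch 0 in the output.
    if m.dispatch_index is not None:
        entry["dispatch_index"] = m.dispatch_index
    return entry


mismatches = [
    Mismatch(1, "output", "float*", 0.25, 0.01),
    Mismatch(1, "output", "float*", 0.5, 0.02, dispatch_index=0),
]
rows = [serialize_mismatch(m) for m in mismatches]

if __name__ == "__main__":
    print(json.dumps(rows, indent=2))
```

Because the key can be absent, a consumer would read it with `row.get("dispatch_index")` rather than indexing directly.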
25 changes: 25 additions & 0 deletions linex/README.md
@@ -24,6 +24,29 @@ for line in profiler.source_lines[:5]:
    print(f" {line.total_cycles:,} cycles ({line.stall_percent:.1f}% stalled)")
```

## Distributed Launchers

Linex supports distributed profiling with launchers like `torchrun`, `mpirun`,
`srun`, and `horovodrun`. Pass the launcher separately so Linex builds the
correct command order (`launcher rocprofv3 ... -- app`).

```python
profiler = Linex()
profiler.profile(
    command="train.py",
    launcher="torchrun --nproc_per_node=8",
    output_dir="linex_sqtt",
)

print(profiler.distributed_context.global_rank)
for rank_key, rank_profile in profiler.rank_profiles.items():
    print(rank_key, len(rank_profile.source_lines))
```

In distributed mode, Linex writes traces into rank-specific subdirectories
(`.../rank0000`, `.../rank0001`, ...) to avoid collisions. Rank metadata is
automatically detected from environment variables set by the launcher.
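The README text does not say which environment variables are consulted or how the directory name is built. As a rough illustration only, the sketch below checks the global-rank variables that `torchrun` (`RANK`), Open MPI's `mpirun` (`OMPI_COMM_WORLD_RANK`), and Slurm's `srun` (`SLURM_PROCID`) are known to export, and formats a zero-padded `rankNNNN` subdirectory to match the names shown above. The lookup order, the helper names `detect_global_rank` and `rank_output_dir`, and the four-digit padding rule are assumptions, not Linex's actual implementation.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Global-rank variables exported by common launchers. Whether Linex reads
# these exact names, or in this order, is an assumption.
RANK_VARS = ("RANK", "OMPI_COMM_WORLD_RANK", "SLURM_PROCID")


def detect_global_rank(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the first usable global rank found in env, or None if absent."""
    for name in RANK_VARS:
        value = env.get(name)
        if value is not None and value.isdigit():
            return int(value)
    return None


def rank_output_dir(base: str, rank: int) -> Path:
    """Build a per-rank subdirectory such as <base>/rank0003."""
    return Path(base) / f"rank{rank:04d}"


if __name__ == "__main__":
    rank = detect_global_rank()
    if rank is None:
        print("not running under a recognized launcher")
    else:
        print(rank_output_dir("linex_sqtt", rank))
```

Giving each rank its own subdirectory means processes that start at the same time never write to the same trace files.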

## What You Get

**Instruction-level metrics mapped to source lines:**
@@ -66,6 +89,8 @@ profiler = Linex(
**Properties:**
- `source_lines` - List[SourceLine] sorted by total_cycles
- `instructions` - List[InstructionData]
- `rank_profiles` - Per-rank profiling data for distributed runs
- `distributed_context` - Detected launcher/rank metadata

### SourceLine

4 changes: 2 additions & 2 deletions linex/src/linex/__init__.py
@@ -8,7 +8,7 @@
 providing cycle counts and performance metrics per source line.
 """
 
-from .api import Linex, SourceLine, InstructionData
+from .api import InstructionData, Linex, RankProfile, SourceLine
 
 __version__ = "0.1.0"
-__all__ = ["Linex", "SourceLine", "InstructionData"]
+__all__ = ["Linex", "SourceLine", "InstructionData", "RankProfile"]