Conversation

@kaiming-cheng (Contributor) commented:

Summary:

  • Implements the main optimization loop: profile → analyze bottleneck → generate → verify → benchmark (sketched below)
  • Filters NCU metrics to target Triton kernels
  • Tracks optimization metadata (best round, bottleneck category, NCU metrics)
  • Integrates with BottleneckAnalyzer, VerificationWorker, and PromptManager
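
A minimal sketch of that loop, for orientation only; the callables below (profile, analyze, generate, verify, benchmark) are stand-ins for the PR's NCU profiling, BottleneckAnalyzer, PromptManager-driven generation, VerificationWorker, and Benchmark components, not their actual APIs:

  # Illustrative only: stand-in callables, not the PR's actual class/method names.
  def optimize(problem_file, initial_kernel, profile, analyze, generate, verify, benchmark, max_rounds=5):
      best = {"kernel": initial_kernel, "time_ms": float("inf"), "round": None}
      for round_idx in range(max_rounds):
          ncu_metrics = profile(best["kernel"], problem_file)      # profile with NCU
          bottleneck = analyze(ncu_metrics)                        # classify the bottleneck
          candidate = generate(best["kernel"], bottleneck)         # prompt for a new kernel
          if not verify(candidate, problem_file):                  # correctness gate
              continue
          time_ms = benchmark(candidate, problem_file)             # measure runtime
          if time_ms < best["time_ms"]:
              best = {"kernel": candidate, "time_ms": time_ms, "round": round_idx}
      return best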

Kaiming Cheng and others added 30 commits January 15, 2026 11:44
Consolidates previous kernel_benchmark.py and pytorch_benchmark.py into a
streamlined 3-file architecture with clear separation of concerns:

Architecture:
- benchmark.py (299 lines): Main Benchmark class with simplified API
  - benchmark_kernel(): Always uses subprocess for crash protection (isolation idea sketched after the API example below)
  - benchmark_pytorch(): Always uses direct mode for stable code
  - BenchmarkLockManager: GPU lock management for multi-worker scenarios

- timing.py (437 lines): Complete timing infrastructure
  - Timing: time_with_cuda_events(), time_with_triton_do_bench()
  - Loading: prepare_pytorch_model(), load_kernel_function()
  - Stats: compute_timing_stats() with essential metrics (mean/std/min/max)

- kernel_subprocess.py (442 lines): Subprocess runner for kernel isolation
  - Crash protection for potentially buggy kernels
  - Clean CUDA state between runs
  - Timeout handling

Key improvements:
- Eliminated string code generation (was generating Python as strings)
- Removed unnecessary statistics (median, p25/p75/p95/p99)
- Removed confusing use_subprocess parameter (behavior now deterministic)
- Fixed dtype bug causing incorrect speedup measurements
- Reduced from 5 files to 3 files with clearer naming
- Code reduction: ~1,400 lines → 1,178 lines

Simple API:
  bench = Benchmark(logger, temp_dir, lock, worker_id)
  pytorch_result = bench.benchmark_pytorch(problem_file)
  kernel_result = bench.benchmark_kernel(kernel_file, problem_file)
  speedup = pytorch_result['stats']['mean'] / kernel_result['time_ms']
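
For context, a rough illustration of the subprocess-isolation idea behind benchmark_kernel(); this is a sketch only, not the actual kernel_subprocess.py, and the run_one_kernel.py runner script is hypothetical:

  # Sketch of subprocess isolation; the runner script name is hypothetical.
  import json
  import subprocess
  import sys

  def run_kernel_in_subprocess(kernel_file, problem_file, timeout_s=120):
      """Benchmark a possibly-buggy kernel in a child process so a crash or
      hang cannot take down the worker, and CUDA state stays clean per run."""
      cmd = [sys.executable, "run_one_kernel.py", kernel_file, problem_file]
      try:
          proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
      except subprocess.TimeoutExpired:
          return {"ok": False, "error": "timeout"}
      if proc.returncode != 0:
          return {"ok": False, "error": proc.stderr[-2000:]}
      return {"ok": True, **json.loads(proc.stdout)}  # runner prints timing stats as JSON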
@meta-cla bot added the CLA Signed label on Feb 2, 2026
@kaiming-cheng requested review from Jack-Khuu and Laurawly and removed the request for Laurawly on February 3, 2026 22:58

@Jack-Khuu (Contributor) left a comment:

Reviewed just OptOrchest since I assume that is the main delta

from kernel_perf_agent.kernel_opt.roofline.ncu_roofline import RooflineAnalyzer
from triton_kernel_agent.prompt_manager import PromptManager
from triton_kernel_agent.worker import VerificationWorker
from triton_kernel_agent.worker_util import (

Ditto, this file doesn't exist anymore

the functions*

Comment on lines +64 to +65
# Fallback: return first kernel if no Triton kernel found
return next(iter(ncu_metrics.values()), {})

Why would we return the first kernel if there are no Triton kernels? This seems like unexpected behavior.
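
For illustration only (not code from this PR, and the name-matching heuristic is an assumption), an explicit alternative would be to warn and return nothing rather than silently fall back:

  # Sketch of an explicit alternative; not code from this PR.
  def select_triton_kernel_metrics(ncu_metrics, logger):
      triton = {name: m for name, m in ncu_metrics.items() if "triton" in name.lower()}
      if not triton:
          logger.warning("No Triton kernel found in NCU metrics; skipping")
          return {}
      return next(iter(triton.values()))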

Comment on lines +355 to +362
if self.pytorch_baseline_time is not None:
    pytorch_baseline_time = self.pytorch_baseline_time
    if pytorch_baseline_time != float("inf"):
        self.logger.info(
            f"📊 PyTorch baseline: {pytorch_baseline_time:.4f} ms (pre-computed)"
        )
else:
    pytorch_baseline_time = None

Suggested change

-if self.pytorch_baseline_time is not None:
-    pytorch_baseline_time = self.pytorch_baseline_time
-    if pytorch_baseline_time != float("inf"):
-        self.logger.info(
-            f"📊 PyTorch baseline: {pytorch_baseline_time:.4f} ms (pre-computed)"
-        )
-else:
-    pytorch_baseline_time = None
+if self.pytorch_baseline_time is not None and self.pytorch_baseline_time != float("inf"):
+    pytorch_baseline_time = self.pytorch_baseline_time
+    self.logger.info(
+        f"📊 PyTorch baseline: {pytorch_baseline_time:.4f} ms (pre-computed)"
+    )
+else:
+    pytorch_baseline_time = None
