# Introduce Judger Prompt Component #89
## Conversation
### Pull request overview
This PR introduces a structured bottleneck-diagnosis pipeline around Nsight Compute metrics, including roofline analysis, GPU spec lookup, and an LLM-oriented prompt/response interface.
Changes:
- Add an NCU SOL-based roofline analysis module (`RooflineConfig`, `RooflineResult`, `RooflineAnalyzer`, `format_roofline_summary`) and a corresponding `roofline` package scaffold (see the sketch after this list).
- Introduce a diagnose prompt subsystem with metric schemas, a GPU specs database/accessor, and the `BottleneckResult` dataclass plus prompt builder/response parser (`build_bottleneck_prompt`, `parse_bottleneck_response`).
- Tidy up NCU profiling utilities (selection policy handling), update profiler package docstrings, and slightly simplify dtype handling in the Triton kernel benchmarking subprocess.
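For orientation, a minimal sketch of how these pieces could fit together; the `analyze` entry point and the metrics-dict input shape are assumptions, only the class and helper names come from this PR:

```python
# Hypothetical wiring sketch; only the RooflineAnalyzer / format_roofline_summary
# names come from this PR -- the analyze() signature is an assumption.
from kernel_perf_agent.kernel_opt.roofline.ncu_roofline import (
    RooflineAnalyzer,
    format_roofline_summary,
)

ncu_metrics = {
    "sm__throughput.avg.pct_of_peak_sustained_elapsed": 18.4,  # Compute SOL (%)
    "gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed": 72.9,  # Memory SOL (%)
}

analyzer = RooflineAnalyzer()
result = analyzer.analyze(ncu_metrics)  # assumed to return a RooflineResult
print(format_roofline_summary(result))  # e.g. classifies this kernel as memory-bound
```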
### Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `triton_kernel_agent/opt_worker_component/benchmarking/kernel_subprocess.py` | Simplifies CLI dtype selection by inlining the string→`torch.dtype` mapping where the dtype is constructed. |
| `kernel_perf_agent/kernel_opt/roofline/ncu_roofline.py` | Adds SOL-based roofline analysis (`RooflineAnalyzer`, `RooflineResult`, `NCU_ROOFLINE_METRICS`) and text summary formatting for kernel efficiency and bottleneck classification. |
| `kernel_perf_agent/kernel_opt/roofline/__init__.py` | Declares the roofline subpackage for roofline-related analysis components. |
| `kernel_perf_agent/kernel_opt/profiler/ncu_profiler.py` | Refactors metric selection policy handling, tightening the `select` parameter type and simplifying `_apply_selection_policy`/`load_ncu_metrics` control flow. |
| `kernel_perf_agent/kernel_opt/profiler/__init__.py` | Updates the profiler package docstring to specifically describe NCU profiling responsibilities. |
| `kernel_perf_agent/kernel_opt/diagnose_prompt/metric_schema.py` | Defines canonical metric schemas (`NCU_METRIC_SECTIONS`, GPU spec fields) used to format NCU metrics and GPU specs for prompts. |
| `kernel_perf_agent/kernel_opt/diagnose_prompt/judger_prompt.py` | Implements `BottleneckResult`, a structured bottleneck analysis prompt template, formatting helpers, and robust JSON response parsing into `BottleneckResult` objects. |
| `kernel_perf_agent/kernel_opt/diagnose_prompt/gpu_specs_database.py` | Provides a curated GPU hardware spec database (A100/H100 SKUs, RTX cards) for use in bottleneck and roofline contextualization. |
| `kernel_perf_agent/kernel_opt/diagnose_prompt/gpu_specs.py` | Exposes `GPU_SPECS_DATABASE` and `get_gpu_specs()` with logging and a simple CLI-style demonstration entrypoint. |
| `kernel_perf_agent/kernel_opt/diagnose_prompt/__init__.py` | Declares the `diagnose_prompt` package and its documentation string for bottleneck analysis helpers. |
| `kernel_perf_agent/__init__.py` | Cleans up the top-level package by removing an unused comment and keeping `__all__` empty. |
Comments suppressed due to low confidence (1)
`kernel_perf_agent/kernel_opt/profiler/ncu_profiler.py:320`
`load_ncu_metrics` now types the `select` argument as `MetricSelectionPolicy` and no longer converts string values, but existing internal callers (e.g. `kernel_perf_agent/kernel_opt/profiler/kernel_profiler.py:198` passes `select="last"`) still use strings, which now rely on the generic `else` fallback path in `_apply_selection_policy` rather than an explicit policy. To keep the API consistent and avoid subtle behavior changes for non-enum values, either (1) restore explicit string-to-enum conversion with validation, or (2) update all call sites to pass a `MetricSelectionPolicy` (e.g. `MetricSelectionPolicy.LAST`) and consider raising for unknown policies instead of silently treating them as `LAST`. A sketch of option (1) follows the snippet below.
```python
    select: MetricSelectionPolicy = MetricSelectionPolicy.LAST,
) -> pd.DataFrame:
    """
    Load and parse NCU metrics from CSV file.
```
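A minimal sketch of option (1), assuming `MetricSelectionPolicy` is (or can be made) a string-valued enum; members other than `LAST` are assumptions:

```python
from enum import Enum


class MetricSelectionPolicy(str, Enum):
    FIRST = "first"  # assumed member; only LAST appears in this PR
    LAST = "last"


def _coerce_policy(select: "MetricSelectionPolicy | str") -> MetricSelectionPolicy:
    """Accept both enum and legacy string values, rejecting unknown policies."""
    if isinstance(select, MetricSelectionPolicy):
        return select
    try:
        return MetricSelectionPolicy(select)
    except ValueError:
        valid = ", ".join(p.value for p in MetricSelectionPolicy)
        raise ValueError(
            f"Unknown selection policy {select!r}; expected one of: {valid}"
        ) from None
```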
| "gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed", # Memory SOL | ||
| "sm__throughput.avg.pct_of_peak_sustained_elapsed", # Compute SOL | ||
| # Tensor core detection | ||
| "sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active", |
**Copilot** (AI) commented on Feb 4, 2026:
`NCU_ROOFLINE_METRICS` and `_is_using_tensor_cores` use the metric key `"sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active"`, but the profiler (`ncu_profiler.METRICS`) and `NCU_METRIC_SECTIONS` both use `"sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_elapsed"`. This mismatch means tensor-core activity will always appear as 0 when analyzing metrics produced by the existing profiler; align the key here with the profiler/schema (or vice versa) so tensor-core detection works correctly.
| "sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active", | |
| "sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_elapsed", |
```python
# Note: The profiler (ncu_profiler.py) collects these and more metrics.
# This list documents the minimum required for roofline decisions.
```
**Copilot** (AI) commented on Feb 4, 2026:
The comment stating that "The profiler (ncu_profiler.py) collects these and more metrics" is currently inaccurate: `kernel_perf_agent/kernel_opt/profiler/ncu_profiler.METRICS` does not include the SOL metrics `"sm__throughput.avg.pct_of_peak_sustained_elapsed"` or `"gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed"`. Either extend `METRICS` to include `NCU_ROOFLINE_METRICS` (so roofline analysis can run directly on profiler output) or update this docstring to clarify that additional metrics are required for SOL-based analysis. A sketch of the first option follows the suggestion below.
Suggested change:
```diff
- # Note: The profiler (ncu_profiler.py) collects these and more metrics.
- # This list documents the minimum required for roofline decisions.
+ # Note: These are the minimum metrics required for SOL-based roofline decisions.
+ # The default profiler configuration (ncu_profiler.py: METRICS) may need to be
+ # extended to include these NCU_ROOFLINE_METRICS for roofline analysis to run
+ # directly on profiler output.
```
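If the first option is preferred, a sketch of the profiler-side change, assuming `METRICS` is a plain list of metric-name strings:

```python
# ncu_profiler.py (sketch) -- assumes METRICS is a plain list of strings.
from kernel_perf_agent.kernel_opt.roofline.ncu_roofline import NCU_ROOFLINE_METRICS

METRICS = [
    # ... existing profiler metrics ...
]
# Append the roofline-required SOL metrics, de-duplicating while keeping order.
METRICS = list(dict.fromkeys([*METRICS, *NCU_ROOFLINE_METRICS]))
```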
```text
## Output (JSON array, no markdown fence)
[
  {{
    "category": "memory" | "compute" | "underutilized",
```
**Copilot** (AI) commented on Feb 4, 2026:
In the JSON output template, the example line `"category": "memory" | "compute" | "underutilized",` is not valid JSON and may encourage models to emit the literal `|` syntax, which `parse_bottleneck_response` will then fail to decode. To make it easier for the LLM to produce parseable output, use a concrete example value (e.g., `"memory"`) and move the enumeration of allowed categories into surrounding natural-language instructions instead of inside the JSON snippet.
| "category": "memory" | "compute" | "underutilized", | |
| "category": "memory", |
```python
print(f"\n{'=' * 60}")
example_gpu = "NVIDIA A100"
specs = get_gpu_specs(example_gpu)
```
**Copilot** (AI) commented on Feb 4, 2026:
The docstring and `__main__` example use `"NVIDIA A100"` as the GPU name, but `GPU_SPECS_DATABASE` only contains more specific keys like `"NVIDIA A100 SXM4 40GB"` and `"NVIDIA A100 PCIe 80GB"`, so `get_gpu_specs("NVIDIA A100")` will always return `None`. Update the example (and/or relax the key-matching logic) so that the documented usage actually resolves to an entry in `GPU_SPECS_DATABASE`.
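A sketch of the relaxed lookup; the case-insensitive substring strategy is an assumption, and the spec field/values shown are illustrative:

```python
from typing import Any, Optional

# Illustrative subset; the real GPU_SPECS_DATABASE lives in gpu_specs_database.py.
GPU_SPECS_DATABASE: dict[str, dict[str, Any]] = {
    "NVIDIA A100 SXM4 40GB": {"memory_bandwidth_gb_s": 1555},
    "NVIDIA A100 PCIe 80GB": {"memory_bandwidth_gb_s": 1935},
}


def get_gpu_specs(gpu_name: str) -> Optional[dict[str, Any]]:
    """Exact match first, then fall back to a case-insensitive substring match."""
    if gpu_name in GPU_SPECS_DATABASE:
        return GPU_SPECS_DATABASE[gpu_name]
    needle = gpu_name.lower()
    for key, specs in GPU_SPECS_DATABASE.items():
        if needle in key.lower():
            return specs  # e.g. "NVIDIA A100" resolves to the first A100 SKU
    return None
```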
```python
    data = json.loads(array_match.group())
    if isinstance(data, list):
        return _parse_bottleneck_list(data, fallback_category)
except json.JSONDecodeError:
```
**Copilot** (AI) commented on Feb 4, 2026:
The `except` clause does nothing but `pass`, and there is no explanatory comment.
```python
    data = json.loads(obj_match.group())
    if isinstance(data, dict):
        return _parse_bottleneck_list([data], fallback_category)
except json.JSONDecodeError:
```
**Copilot** (AI) commented on Feb 4, 2026:
The `except` clause does nothing but `pass`, and there is no explanatory comment.
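An explanatory comment addresses both findings; a self-contained sketch, assuming the intent is to try the array-level parse first and fall back to a single-object strategy (`_extract_json_array` is a hypothetical name):

```python
import json
import re
from typing import Any, Optional


def _extract_json_array(response: str) -> Optional[list[Any]]:
    """Best-effort extraction of a JSON array from an LLM response."""
    array_match = re.search(r"\[.*\]", response, re.DOTALL)
    if array_match:
        try:
            data = json.loads(array_match.group())
            if isinstance(data, list):
                return data
        except json.JSONDecodeError:
            # Bracketed span wasn't valid JSON: fall through to the
            # single-object strategy instead of aborting the whole parse.
            pass
    return None
```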
```python
root_causes: list[dict[str, Any]] = field(default_factory=list)
recommended_fixes: list[dict[str, Any]] = field(default_factory=list)


def to_dict(self) -> dict[str, Any]:
```
If we aren't doing any custom logic, we can just drop `to_dict` in favor of `dataclasses.asdict`.
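i.e., something like the following, using a stand-in with the same field shapes (`_Example` is hypothetical):

```python
from dataclasses import asdict, dataclass, field


@dataclass
class _Example:  # stand-in mirroring the BottleneckResult fields above
    root_causes: list = field(default_factory=list)
    recommended_fixes: list = field(default_factory=list)


# asdict recurses into nested dataclasses/lists, so a hand-written
# to_dict adds nothing unless custom serialization logic is needed.
assert asdict(_Example()) == {"root_causes": [], "recommended_fixes": []}
```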
```python
compute_sol = ncu_metrics.get(compute_key, 0)
memory_sol = ncu_metrics.get(memory_key, 0)
```
Does 0 mean that something went wrong/wasn't measured? Or is that an error itself?
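If a missing key should be treated as an error rather than a genuine 0% SOL reading, one alternative sketch (helper name hypothetical):

```python
def _require_sol_metrics(
    ncu_metrics: dict[str, float], compute_key: str, memory_key: str
) -> tuple[float, float]:
    """Fail loudly when SOL metrics are absent instead of defaulting to 0."""
    try:
        return ncu_metrics[compute_key], ncu_metrics[memory_key]
    except KeyError as exc:
        raise ValueError(
            f"SOL metric {exc.args[0]!r} missing from NCU output; "
            "was the profiler run with the roofline metric set?"
        ) from None
```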
Force-pushed from `d75c96a` to `ff72d51`.
This PR adds bottleneck analysis prompt building and response parsing to the diagnose module.
## Core Components
1. `BottleneckResult` (`judger_prompt.py`)
Single dataclass representing a bottleneck analysis:
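A sketch of its shape, inferred from the fields visible in the review diffs above (the default `category` value is an assumption):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class BottleneckResult:
    """One diagnosed bottleneck, as parsed from the judger LLM's JSON output."""

    category: str = "underutilized"  # "memory" | "compute" | "underutilized"
    root_causes: list[dict[str, Any]] = field(default_factory=list)
    recommended_fixes: list[dict[str, Any]] = field(default_factory=list)
```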
2. Prompt Builder (`judger_prompt.py`)
Constructs structured LLM prompts from NCU metrics and GPU hardware specs, formatted via the `metric_schema.py` schemas.
## Example Usage
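A minimal sketch, assuming `build_bottleneck_prompt` takes the NCU metrics and GPU specs and that `parse_bottleneck_response` returns a list of `BottleneckResult` objects (`call_llm` is a hypothetical client, not part of this PR):

```python
from kernel_perf_agent.kernel_opt.diagnose_prompt.gpu_specs import get_gpu_specs
from kernel_perf_agent.kernel_opt.diagnose_prompt.judger_prompt import (
    build_bottleneck_prompt,
    parse_bottleneck_response,
)

ncu_metrics = {
    "sm__throughput.avg.pct_of_peak_sustained_elapsed": 18.4,
    "gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed": 72.9,
}
specs = get_gpu_specs("NVIDIA A100 SXM4 40GB")

prompt = build_bottleneck_prompt(ncu_metrics, specs)  # assumed signature
response = call_llm(prompt)  # hypothetical LLM client
for bottleneck in parse_bottleneck_response(response):
    print(bottleneck.category, bottleneck.recommended_fixes)
```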
More end-to-end testing will come in a future PR.