Conversation

@kaiming-cheng
Contributor

This PR adds bottleneck analysis prompt building and response parsing to the diagnose module.

Core Components

1. BottleneckResult (judger_prompt.py)

Single dataclass representing a bottleneck analysis:

  • `category`: memory, compute, or underutilized
  • `summary`: one-line description
  • `reasoning`: explanation citing metrics
  • `root_causes`: list of causes with metric evidence
  • `recommended_fixes`: actionable fixes with rationale
  • Configurable analysis parameters:
    • `num_bottlenecks`: how many bottlenecks to identify (default: 2)
    • `num_causes`: root causes per bottleneck (default: 2)
    • `num_fixes`: fixes per bottleneck (default: 1)

2. Prompt Builder (judger_prompt.py)

Constructs structured LLM prompts from:

  • Kernel source code
  • NCU profiling metrics (formatted via metric_schema)
  • Roofline analysis results (via ncu_roofline)
  • GPU hardware specifications (via gpu_spec)

Example Usage

  prompt = build_bottleneck_prompt(
      kernel_code=kernel_src,
      ncu_metrics=ncu_data,
      roofline=roofline_result,
      gpu_specs=gpu_specs,
      num_bottlenecks=2,
      num_causes=2,
      num_fixes=1,
  )

  # After the LLM call...
  results = parse_bottleneck_response(llm_response)

More end-to-end testing will come in a future PR.

@meta-cla meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Jan 31, 2026
@Jack-Khuu Jack-Khuu requested a review from Copilot February 4, 2026 03:26

Copilot AI left a comment


Pull request overview

This PR introduces a structured bottleneck-diagnosis pipeline around Nsight Compute metrics, including roofline analysis, GPU spec lookup, and an LLM-oriented prompt/response interface.

Changes:

  • Add an NCU SOL-based roofline analysis module (RooflineConfig, RooflineResult, RooflineAnalyzer, format_roofline_summary) and a corresponding roofline package scaffold.
  • Introduce a diagnose prompt subsystem with metric schemas, GPU specs database/accessor, and the BottleneckResult + prompt builder/response parser (build_bottleneck_prompt, parse_bottleneck_response).
  • Tidy up NCU profiling utilities (selection policy handling), update profiler package docstrings, and slightly simplify dtype handling in the Triton kernel benchmarking subprocess.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

Show a summary per file

  • triton_kernel_agent/opt_worker_component/benchmarking/kernel_subprocess.py: Simplifies CLI dtype selection by inlining the string→torch.dtype mapping where dtype is constructed.
  • kernel_perf_agent/kernel_opt/roofline/ncu_roofline.py: Adds SOL-based roofline analysis (RooflineAnalyzer, RooflineResult, NCU_ROOFLINE_METRICS) and text summary formatting for kernel efficiency and bottleneck classification.
  • kernel_perf_agent/kernel_opt/roofline/__init__.py: Declares the roofline subpackage for roofline-related analysis components.
  • kernel_perf_agent/kernel_opt/profiler/ncu_profiler.py: Refactors metric selection policy handling, tightening the select parameter type and simplifying _apply_selection_policy/load_ncu_metrics control flow.
  • kernel_perf_agent/kernel_opt/profiler/__init__.py: Updates the profiler package docstring to specifically describe NCU profiling responsibilities.
  • kernel_perf_agent/kernel_opt/diagnose_prompt/metric_schema.py: Defines canonical metric schemas (NCU_METRIC_SECTIONS, GPU spec fields) used to format NCU metrics and GPU specs for prompts.
  • kernel_perf_agent/kernel_opt/diagnose_prompt/judger_prompt.py: Implements BottleneckResult, a structured bottleneck analysis prompt template, formatting helpers, and robust JSON response parsing into BottleneckResult objects.
  • kernel_perf_agent/kernel_opt/diagnose_prompt/gpu_specs_database.py: Provides a curated GPU hardware spec database (A100/H100 SKUs, RTX cards) for use in bottleneck and roofline contextualization.
  • kernel_perf_agent/kernel_opt/diagnose_prompt/gpu_specs.py: Exposes GPU_SPECS_DATABASE and get_gpu_specs() with logging and a simple CLI-style demonstration entrypoint.
  • kernel_perf_agent/kernel_opt/diagnose_prompt/__init__.py: Declares the diagnose_prompt package and its documentation string for bottleneck analysis helpers.
  • kernel_perf_agent/__init__.py: Cleans up the top-level package by removing an unused comment and keeping __all__ empty.
Comments suppressed due to low confidence (1)

kernel_perf_agent/kernel_opt/profiler/ncu_profiler.py:320

  • load_ncu_metrics now types the select argument as MetricSelectionPolicy and no longer converts string values, but existing internal callers (e.g. kernel_perf_agent/kernel_opt/profiler/kernel_profiler.py:198 passes select="last") still use strings, which now rely on the generic else fallback path in _apply_selection_policy rather than an explicit policy. To keep the API consistent and avoid subtle behavior changes for non-enum values, either (1) restore explicit string-to-enum conversion with validation, or (2) update all call sites to pass a MetricSelectionPolicy (e.g. MetricSelectionPolicy.LAST) and consider raising for unknown policies instead of silently treating them as LAST.
    select: MetricSelectionPolicy = MetricSelectionPolicy.LAST,
) -> pd.DataFrame:
    """
    Load and parse NCU metrics from CSV file.



"gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed", # Memory SOL
"sm__throughput.avg.pct_of_peak_sustained_elapsed", # Compute SOL
# Tensor core detection
"sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active",

Copilot AI Feb 4, 2026


NCU_ROOFLINE_METRICS and _is_using_tensor_cores use the metric key "sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active", but the profiler (ncu_profiler.METRICS) and NCU_METRIC_SECTIONS both use "sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_elapsed". This mismatch means tensor-core activity will always appear as 0 when analyzing metrics produced by the existing profiler; align the key here with the profiler/schema (or vice versa) so tensor-core detection works correctly.

Suggested change
"sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active",
"sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_elapsed",

Comment on lines 34 to 35
# Note: The profiler (ncu_profiler.py) collects these and more metrics.
# This list documents the minimum required for roofline decisions.

Copilot AI Feb 4, 2026


The comment stating that "The profiler (ncu_profiler.py) collects these and more metrics" is currently inaccurate: kernel_perf_agent/kernel_opt/profiler/ncu_profiler.METRICS does not include the SOL metrics "sm__throughput.avg.pct_of_peak_sustained_elapsed" or "gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed". Either extend METRICS to include NCU_ROOFLINE_METRICS (so roofline analysis can run directly on profiler output) or update this docstring to clarify that additional metrics are required for SOL-based analysis.

Suggested change
# Note: The profiler (ncu_profiler.py) collects these and more metrics.
# This list documents the minimum required for roofline decisions.
# Note: These are the minimum metrics required for SOL-based roofline decisions.
# The default profiler configuration (ncu_profiler.py: METRICS) may need to be
# extended to include these NCU_ROOFLINE_METRICS for roofline analysis to run
# directly on profiler output.
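The first option (extending METRICS) might look like the following. Only the SOL metric names come from the review; the existing contents of METRICS shown here are illustrative:

```python
NCU_ROOFLINE_METRICS = [
    "gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed",  # Memory SOL
    "sm__throughput.avg.pct_of_peak_sustained_elapsed",                  # Compute SOL
]

# Hypothetical stand-in for the profiler's default metric list:
METRICS = [
    "sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_elapsed",
]

# Append the roofline metrics, deduplicating while preserving order, so
# roofline analysis can run directly on profiler output:
METRICS = list(dict.fromkeys(METRICS + NCU_ROOFLINE_METRICS))
```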

## Output (JSON array, no markdown fence)
[
{{
"category": "memory" | "compute" | "underutilized",

Copilot AI Feb 4, 2026


In the JSON output template, the example line "category": "memory" | "compute" | "underutilized", is not valid JSON and may encourage models to emit the literal | syntax, which parse_bottleneck_response will then fail to decode. To make it easier for the LLM to produce parseable output, use a concrete example value (e.g., "memory") and move the enumeration of allowed categories into surrounding natural-language instructions instead of inside the JSON snippet.

Suggested change
"category": "memory" | "compute" | "underutilized",
"category": "memory",
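A template following this suggestion, with the category enumeration moved into the surrounding prose so the JSON span stays parseable. The example field values are assumptions, and a template fed through str.format would still need doubled braces as in the original:

```python
import json
import re

OUTPUT_TEMPLATE = """## Output (JSON array, no markdown fence)
The "category" field must be one of: "memory", "compute", "underutilized".
[
  {
    "category": "memory",
    "summary": "Kernel is DRAM-bandwidth bound",
    "reasoning": "Memory SOL is 85% while compute SOL is 20%."
  }
]
"""

# Unlike the "|"-style enumeration, the embedded array round-trips cleanly:
example = json.loads(re.search(r"\[.*\]", OUTPUT_TEMPLATE, re.DOTALL).group())
```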

Comment on lines 88 to 90
print(f"\n{'=' * 60}")
example_gpu = "NVIDIA A100"
specs = get_gpu_specs(example_gpu)

Copilot AI Feb 4, 2026


The docstring and __main__ example use "NVIDIA A100" as the GPU name, but GPU_SPECS_DATABASE only contains more specific keys like "NVIDIA A100 SXM4 40GB" and "NVIDIA A100 PCIe 80GB", so get_gpu_specs("NVIDIA A100") will always return None. Update the example (and/or relax the key-matching logic) so that the documented usage actually resolves to an entry in GPU_SPECS_DATABASE.
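Relaxing the key matching could look like the sketch below. The database keys come from the comment, but the spec values and the function body are hypothetical, not the PR's actual get_gpu_specs():

```python
# Illustrative database entries; real entries carry full hardware specs.
GPU_SPECS_DATABASE = {
    "NVIDIA A100 SXM4 40GB": {"memory_gb": 40},
    "NVIDIA A100 PCIe 80GB": {"memory_gb": 80},
}


def get_gpu_specs(name):
    """Exact match first, then fall back to the first (sorted) key that the
    query is a prefix of, so "NVIDIA A100" resolves to a concrete SKU."""
    if name in GPU_SPECS_DATABASE:
        return GPU_SPECS_DATABASE[name]
    matches = sorted(k for k in GPU_SPECS_DATABASE if k.startswith(name))
    return GPU_SPECS_DATABASE[matches[0]] if matches else None
```

Prefix fallback keeps the documented "NVIDIA A100" usage working; picking the sorted-first SKU is arbitrary, so logging which key was chosen would be sensible.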

data = json.loads(array_match.group())
if isinstance(data, list):
return _parse_bottleneck_list(data, fallback_category)
except json.JSONDecodeError:

Copilot AI Feb 4, 2026


'except' clause does nothing but pass and there is no explanatory comment.
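A self-contained sketch of the same fallthrough with the silent pass replaced by a comment and a debug log. Function and variable names are illustrative, not the actual parser:

```python
import json
import logging
import re

logger = logging.getLogger(__name__)


def extract_json_array(text):
    """Pull the first JSON array out of free-form LLM text, or return None."""
    array_match = re.search(r"\[.*\]", text, re.DOTALL)
    if array_match:
        try:
            data = json.loads(array_match.group())
            if isinstance(data, list):
                return data
        except json.JSONDecodeError:
            # The bracketed span was not valid JSON (e.g. prose that happens
            # to contain brackets); log and fall through to other strategies.
            logger.debug("array candidate failed to parse", exc_info=True)
    return None
```

The same pattern applies to the single-object branch flagged in the next comment.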

data = json.loads(obj_match.group())
if isinstance(data, dict):
return _parse_bottleneck_list([data], fallback_category)
except json.JSONDecodeError:

Copilot AI Feb 4, 2026


'except' clause does nothing but pass and there is no explanatory comment.

root_causes: list[dict[str, Any]] = field(default_factory=list)
recommended_fixes: list[dict[str, Any]] = field(default_factory=list)

def to_dict(self) -> dict[str, Any]:
Contributor


If we aren't doing any custom logic we can just drop to_dict in favor of dataclass asdict
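Concretely, dataclasses.asdict covers this as suggested. The field set below mirrors the snippet above; defaults are illustrative:

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class BottleneckResult:
    category: str = "underutilized"
    summary: str = ""
    reasoning: str = ""
    root_causes: list[dict[str, Any]] = field(default_factory=list)
    recommended_fixes: list[dict[str, Any]] = field(default_factory=list)


# With no custom serialization logic, asdict() replaces a hand-written
# to_dict() and also deep-converts nested dataclasses for free:
d = asdict(BottleneckResult(category="memory", summary="DRAM bound"))
```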

Comment on lines 167 to 168
compute_sol = ncu_metrics.get(compute_key, 0)
memory_sol = ncu_metrics.get(memory_key, 0)
Contributor


Does 0 mean that something went wrong/wasn't measured? Or is that an error itself?
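One way to make that distinction explicit, since .get(key, 0) conflates "not measured" with a genuine 0% SOL reading. The helper name and behavior are a suggestion, not the PR's code:

```python
def get_sol(ncu_metrics, key):
    """Return a SOL percentage, refusing to treat a missing metric as 0%."""
    value = ncu_metrics.get(key)
    if value is None:
        # A true 0% SOL is a valid (if suspicious) measurement; a missing key
        # means the metric was never collected, so classification based on it
        # would be meaningless and should fail loudly.
        raise KeyError(f"NCU metric {key!r} was not collected")
    return float(value)
```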
