benchmarks hardening! #1078
Conversation
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Workflows reduced self-hosted timeouts and made job names include interface; bench step now runs a parallel orchestration script. Added SLURM submit/monitor orchestration scripts. Strengthened bench/test runners with timeouts, early-abort, traceback logging, and per-case validation. Minor SBATCH header and .gitignore updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant Runner as run_parallel_benchmarks.sh
    participant Submit as submit_and_monitor_bench.sh
    participant Monitor as monitor_slurm_job.sh
    participant SLURM as SLURM (sbatch / squeue / sacct / scontrol)
    participant FS as Filesystem (bench outputs / YAMLs)
    GH->>Runner: start parallel bench (device, interface, cluster)
    Runner->>Submit: launch PR bench (pr dir, device, interface, cluster) & background PID
    Runner->>Submit: launch Master bench (master dir, device, interface, cluster) & background PID
    Submit->>SLURM: sbatch bench job
    SLURM-->>Submit: job_id
    Submit->>Monitor: monitor_slurm_job.sh job_id output_file
    Monitor->>FS: wait for output file (retry/backoff)
    FS-->>Monitor: file appears
    Monitor->>FS: tail -f while job present in squeue
    loop job running
        Monitor->>SLURM: squeue check
        SLURM-->>Monitor: running
        Monitor->>FS: stream appended lines
    end
    alt job no longer in squeue
        Monitor->>SLURM: sacct / scontrol fallback -> terminal state & exitcode
    end
    Monitor->>FS: wait for output file growth to stabilize
    Monitor-->>Submit: return exitcode
    Submit-->>Runner: report exitcode and YAML path
    Runner->>FS: verify PR/master YAMLs exist
    Runner-->>GH: final status (non-zero if any failure or missing YAML)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)

🧰 Additional context used
🧬 Code graph analysis (1)
toolchain/bootstrap/python.sh (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
🔇 Additional comments (1)
PR Reviewer Guide 🔍

Here are some key observations to aid the review process:
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
This PR introduces "benchmarking hardening" features to improve the robustness and reliability of MFC's testing and benchmarking infrastructure. The changes add timeout mechanisms, early failure detection, enhanced error handling, and better job monitoring for SLURM-based workflows.
Key changes:
- Added per-test timeout (1 hour) and early abort on high failure rates (30% after 20 tests) to the test suite; a sketch of the abort rule follows this list
- Enhanced benchmark validation with comprehensive error checking and failed case tracking
- Improved GitHub Actions workflows with better job monitoring and reduced timeout (from 1400 to 480 minutes)
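To make the abort rule concrete, here is a minimal sketch of the threshold check described above; the constant and function names are hypothetical, not the toolchain's actual identifiers:

```python
# Hypothetical sketch of the early-abort rule (30% failures after 20 tests).
MIN_TESTS_BEFORE_ABORT = 20   # don't judge the failure rate too early
MAX_FAILURE_RATE = 0.30       # abort once more than 30% of completed tests failed

def should_abort(completed: int, failed: int) -> bool:
    """Return True when the run looks systemically broken."""
    if completed < MIN_TESTS_BEFORE_ABORT:
        return False
    return failed / completed > MAX_FAILURE_RATE
```

For example, 7 failures out of 20 completed tests (35%) would trip the abort, while 6 out of 20 (exactly 30%) would not.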
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| toolchain/mfc/test/test.py | Added signal-based timeout handler (1 hour per test) and early abort logic when failure rate exceeds 30% after 20 tests |
| toolchain/mfc/bench.py | Added comprehensive validation (file existence, return codes, summary validation) and improved error reporting with failed case tracking |
| .github/workflows/test.yml | Updated job name formatting to include interface and reduced timeout from 1400 to 480 minutes |
| .github/workflows/bench.yml | Reduced timeout to 480 minutes and added detailed SLURM job monitoring with output streaming and exit code validation |
Actionable comments posted: 3
♻️ Duplicate comments (2)
.github/workflows/bench.yml (2)
105-164: Consider extracting submit_and_monitor to a standalone script. This ~60-line function inline in YAML is hard to test and maintain. Extracting it to `.github/scripts/monitor_slurm_job.sh` would improve reusability and testability. This was also noted in a previous review.
146-149: Transient squeue failures could cause premature exit. If `squeue` fails transiently (network blip, scheduler hiccup), the loop exits immediately even though the job may still be running. Consider adding a brief grace period or retry.

```diff
 # Wait for job to complete (will wait up to GitHub Actions timeout)
-while squeue -j "$job_id" &>/dev/null; do
+squeue_failures=0
+while true; do
+  if squeue -j "$job_id" &>/dev/null; then
+    squeue_failures=0
+  else
+    squeue_failures=$((squeue_failures + 1))
+    # Allow a few transient failures before concluding job is done
+    if [ $squeue_failures -ge 3 ]; then
+      break
+    fi
+  fi
   sleep 5
 done
```
🧹 Nitpick comments (2)
toolchain/mfc/test/test.py (1)
337-339: Remove extraneous `f` prefix from strings without placeholders.

```diff
-        cons.print(f"[bold red]This suggests a systemic issue (bad build, broken environment, etc.)[/bold red]")
-        cons.print(f"[bold red]Aborting remaining tests to fail fast.[/bold red]\n")
+        cons.print("[bold red]This suggests a systemic issue (bad build, broken environment, etc.)[/bold red]")
+        cons.print("[bold red]Aborting remaining tests to fail fast.[/bold red]\n")
```

toolchain/mfc/bench.py (1)
112-128: `cons.unindent()` may be skipped if validation fails without explicit continue. If an exception is raised inside the try block (lines 124-126), `cons.unindent()` at line 128 is still called. However, if validation_failed is True (line 112), you call `cons.unindent()` at 114 then `continue`, skipping line 128, which is correct. The logic is sound but fragile. Consider using a `finally` block to guarantee unindent:

```diff
 try:
+    # ... existing try block code ...
+except Exception as e:
+    cons.print(f"[bold red]ERROR[/bold red]: Unexpected error running {case.slug}: {e}")
+    failed_cases.append(case.slug)
+finally:
     cons.unindent()
-
-cons.unindent()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .github/workflows/bench.yml (3 hunks)
- .github/workflows/test.yml (1 hunks)
- toolchain/mfc/bench.py (5 hunks)
- toolchain/mfc/test/test.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Add and update focused tests for changes; ensure no regressions in golden test outputs without clear justification
Applied to files:
toolchain/mfc/bench.py
🧬 Code graph analysis (2)
toolchain/mfc/bench.py (3)
toolchain/mfc/common.py (3)
  MFCException (31-32), system (34-41), file_load_yaml (65-70)
toolchain/mfc/state.py (1)
  ARG (85-93)
toolchain/mfc/printer.py (1)
  unindent (19-24)
toolchain/mfc/test/test.py (6)
toolchain/mfc/common.py (3)
  MFCException (31-32), file_write (44-54), find (165-174)
toolchain/mfc/test/case.py (1)
  compute_tolerance (248-276)
toolchain/mfc/state.py (2)
  ARG (85-93), items (33-34)
toolchain/mfc/packer/pack.py (3)
  has_NaNs (90-96), save (46-88), find (34-35)
toolchain/mfc/packer/tol.py (1)
  compare (50-97)
toolchain/mfc/test/cases.py (1)
  modify_example_case (1035-1052)
🪛 Ruff (0.14.7)
toolchain/mfc/bench.py
42-42: Avoid specifying long messages outside the exception class
(TRY003)
124-124: Do not catch blind exception: Exception
(BLE001)
137-137: Avoid specifying long messages outside the exception class
(TRY003)
211-211: Do not catch blind exception: Exception
(BLE001)
toolchain/mfc/test/test.py
36-36: Unused function argument: signum
(ARG001)
36-36: Unused function argument: frame
(ARG001)
37-37: Avoid specifying long messages outside the exception class
(TRY003)
224-224: Avoid specifying long messages outside the exception class
(TRY003)
228-228: Avoid specifying long messages outside the exception class
(TRY003)
231-231: Avoid specifying long messages outside the exception class
(TRY003)
239-239: Avoid specifying long messages outside the exception class
(TRY003)
256-256: Avoid specifying long messages outside the exception class
(TRY003)
270-270: Avoid specifying long messages outside the exception class
(TRY003)
277-277: Avoid specifying long messages outside the exception class
(TRY003)
280-280: Avoid specifying long messages outside the exception class
(TRY003)
283-283: Avoid specifying long messages outside the exception class
(TRY003)
295-299: Avoid specifying long messages outside the exception class
(TRY003)
338-338: f-string without any placeholders
Remove extraneous f prefix
(F541)
339-339: f-string without any placeholders
Remove extraneous f prefix
(F541)
340-342: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Build & Publish
🔇 Additional comments (12)
.github/workflows/test.yml (3)
96-96: Dynamic job naming is clear and informative. The conditional formatting nicely handles the interface suffix, making job names like "Georgia Tech | Phoenix (gpu-acc)" or "Oak Ridge | Frontier (cpu)" more descriptive for debugging CI failures.

100-100: Significant timeout reduction from 1400 to 480 minutes. This is a substantial reduction (from ~23 hours to 8 hours). Verify this is sufficient for the longest-running test configurations, particularly on Frontier with GPU tests.

101-115: Matrix expansion and runner configuration look correct. The new matrix properly covers CPU and GPU variants across both clusters. The `runs-on` configuration correctly uses the `phoenix` group with cluster-specific labels.

.github/workflows/bench.yml (2)
26-26: Job naming and timeout updates align with test.yml. Consistent naming pattern and timeout reduction across both workflows improves maintainability.

Also applies to: 76-76

173-174: Exit status capture pattern has subtle issues. The pattern `wait $pid && var=$? || var=$?` always sets `var` to 0 when the process succeeds because `&&` short-circuits. However, since you check both values at the end, this works correctly for the failure case.

toolchain/mfc/test/test.py (2)
294-301: Good use of try/except/finally for alarm cleanup. The finally block ensures the alarm is always cancelled, preventing spurious timeouts on subsequent tests. The exception chaining with `from exc` preserves the original traceback.

26-37: Timeout constants and exception class are well-defined. The 1-hour timeout and 30% failure threshold are reasonable defaults. Consider making these configurable via command-line arguments for flexibility in different environments.
toolchain/mfc/bench.py (5)
40-42: Good early validation of benchmark case files. Failing fast with a clear error message is better than failing deep in the execution with a cryptic error.

64-128: Robust error handling with comprehensive validation. The try/except block properly handles:
- Non-zero return codes
- Missing summary files
- Missing target data
- Unexpected exceptions
The structure ensures `failed_cases` is always updated on failure. However, there's a potential issue with `cons.unindent()` placement.

124-126: Broad exception catch is acceptable here for robustness. In benchmark orchestration code, catching all exceptions ensures one failing case doesn't abort the entire benchmark run. The error is logged with context, and the case is tracked as failed.

131-137: Clear failure summary with actionable output. Listing all failed cases at the end and raising with the count helps quickly identify systemic issues. Good practice for CI diagnostics.

211-213: Improved error logging in `diff()` replaces silent pass. This is a significant improvement: previously, speedup computation failures were silently ignored. Now they're logged with context and properly set the error flag.
Actionable comments posted: 0
♻️ Duplicate comments (3)
toolchain/mfc/test/test.py (3)
202-204: SIGALRM in worker threads is still a critical issue. This has been flagged in previous reviews but appears unaddressed. The test scheduler uses `WorkerThread` (see sched.py), and `signal.signal()`/`signal.alarm()` only work in the main thread; calling them from worker threads raises `ValueError: signal only works in main thread`. Consider one of these approaches (a sketch of the second follows this list):

- Guard with a thread check and skip timeout in worker threads
- Use `threading.Timer` with a flag-based approach
- Move timeout handling to the subprocess level with `subprocess.run(timeout=...)`
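As a rough illustration of the second option, a `threading.Timer` can set a flag that the worker inspects once the blocking call returns; this is a hedged sketch with hypothetical names, not MFC's actual code:

```python
import threading

TEST_TIMEOUT_SECONDS = 3600  # hypothetical constant mirroring the 1-hour limit

def run_with_timeout(run_case) -> None:
    """Run one case; raise if it exceeded the timeout (sketch)."""
    timed_out = threading.Event()
    timer = threading.Timer(TEST_TIMEOUT_SECONDS, timed_out.set)
    timer.start()
    try:
        run_case()          # blocking call; a truly hung call is NOT interrupted
    finally:
        timer.cancel()      # always cancel so the timer thread doesn't linger
    if timed_out.is_set():
        raise TimeoutError(f"case exceeded {TEST_TIMEOUT_SECONDS}s")
```

Note the caveat: unlike SIGALRM, this detects a timeout only after the blocking call returns.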
264-266: Directory existence check still missing. This was flagged in previous reviews. `os.listdir()` will raise `FileNotFoundError` if the `silo_hdf5/p0` directory doesn't exist (which may happen for test cases that don't produce silo outputs).

```diff
+silo_dir = os.path.join(case.get_dirpath(), 'silo_hdf5', 'p0')
+if not os.path.isdir(silo_dir):
+    # No silo outputs for this case; skip post-processing check
+    pass
+else:
-    for silo_filepath in os.listdir(os.path.join(case.get_dirpath(), 'silo_hdf5', 'p0')):
+    for silo_filepath in os.listdir(silo_dir):
```
342-344: Exception propagation from worker task may break scheduler. As noted in previous reviews, raising `MFCException` here propagates through `sched.sched`, potentially leaving worker threads in an undefined state. Consider using a shared abort flag that the scheduler checks between tasks for a cleaner shutdown.
🧹 Nitpick comments (1)
toolchain/mfc/test/test.py (1)
340-341: Remove unnecessary f-string prefixes. These strings have no placeholders, so the `f` prefix is extraneous.

```diff
-        cons.print(f"[bold red]This suggests a systemic issue (bad build, broken environment, etc.)[/bold red]")
-        cons.print(f"[bold red]Aborting remaining tests to fail fast.[/bold red]\n")
+        cons.print("[bold red]This suggests a systemic issue (bad build, broken environment, etc.)[/bold red]")
+        cons.print("[bold red]Aborting remaining tests to fail fast.[/bold red]\n")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- toolchain/mfc/test/test.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
toolchain/mfc/test/test.py (2)
toolchain/mfc/common.py (1)
  MFCException (31-32)
toolchain/mfc/state.py (1)
  ARG (85-93)
🪛 Ruff (0.14.7)
toolchain/mfc/test/test.py
36-36: Unused function argument: signum
(ARG001)
36-36: Unused function argument: frame
(ARG001)
37-37: Avoid specifying long messages outside the exception class
(TRY003)
224-224: Avoid specifying long messages outside the exception class
(TRY003)
228-228: Avoid specifying long messages outside the exception class
(TRY003)
231-231: Avoid specifying long messages outside the exception class
(TRY003)
239-239: Avoid specifying long messages outside the exception class
(TRY003)
256-256: Avoid specifying long messages outside the exception class
(TRY003)
270-270: Avoid specifying long messages outside the exception class
(TRY003)
277-277: Avoid specifying long messages outside the exception class
(TRY003)
280-280: Avoid specifying long messages outside the exception class
(TRY003)
283-283: Avoid specifying long messages outside the exception class
(TRY003)
295-299: Avoid specifying long messages outside the exception class
(TRY003)
340-340: f-string without any placeholders
Remove extraneous f prefix
(F541)
341-341: f-string without any placeholders
Remove extraneous f prefix
(F541)
342-344: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Build & Publish
🔇 Additional comments (2)
toolchain/mfc/test/test.py (2)
26-38: LGTM on timeout and abort constants. The timeout infrastructure is well-structured. The `TestTimeoutError` properly extends `MFCException`, and the constants are reasonable (1-hour timeout, 30% failure threshold after 20 tests).

Note: The unused `signum` and `frame` parameters in `timeout_handler` are required by the signal handler signature, so the Ruff warnings can be safely ignored.

294-301: Good timeout exception handling with cleanup. The `finally` block ensures the alarm is always cancelled, preventing lingering signals. The re-raised exception provides helpful context including the log file path.
2 issues found across 4 files
Prompt for AI agents (all 2 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="toolchain/mfc/test/test.py">
<violation number="1" location="toolchain/mfc/test/test.py:203">
P0: `signal.signal()` and `signal.alarm()` cannot be called from worker threads - they only work in the main thread. Since `_handle_case` is executed via `sched.sched()` which uses `WorkerThread`, this will raise `ValueError: signal only works in main thread`. Consider using `threading.Timer` or a timeout wrapper that works across threads instead.</violation>
</file>
<file name=".github/workflows/bench.yml">
<violation number="1" location=".github/workflows/bench.yml:159">
P2: Slurm reports exit codes as `X:Y`, but this regex only captures the first component. If a job finishes with `0:9` (success but terminated by signal), `$exit_code` becomes `0` and the failure check is bypassed, so signaled jobs are treated as success. Capture the full `X:Y` pair (or parse both fields) and ensure any non-`0:0` result fails the workflow.</violation>
</file>
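As an illustration of the fix the bench.yml violation asks for, a check that accepts only `0:0` might look like this hedged sketch (a hypothetical helper, not the workflow's actual code):

```python
def slurm_exit_ok(exitcode: str) -> bool:
    """True only for a clean '0:0'; Slurm reports 'returncode:signal' pairs (sketch)."""
    rc, _, sig = exitcode.strip().partition(":")
    return rc == "0" and sig in ("", "0")

# slurm_exit_ok("0:0") -> True; slurm_exit_ok("0:9") -> False (terminated by signal 9)
```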
2 issues found across 4 files (changes from recent commits).
Prompt for AI agents (all 2 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="toolchain/mfc/bench.py">
<violation number="1" location="toolchain/mfc/bench.py:127">
P2: The `finally` block will cause double `cons.unindent()` calls. Lines 81 and 89 still have explicit `cons.unindent()` before `continue`, but since `finally` always executes (even on `continue`), the unindent will be called twice in those error paths. Remove the `cons.unindent()` calls at lines 81 and 89, similar to how it was removed for the `validation_failed` case.</violation>
</file>
<file name=".github/scripts/monitor_slurm_job.sh">
<violation number="1" location=".github/scripts/monitor_slurm_job.sh:57">
P2: Exit code defaults to `0:0` (success) if `scontrol` or `grep` fails. This could mask real job failures when job information is unavailable. Consider failing explicitly if exit code cannot be determined.</violation>
</file>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
toolchain/mfc/bench.py (2)
58-83: Fix double `cons.unindent()` on error paths to avoid printer stack underflow. In the non-zero return-code and missing-summary branches you call `cons.unindent()` and then `continue`, but `cons.unindent()` is also executed in the `finally` block. This will unindent twice for failing cases and can corrupt the printer's indentation stack or raise underflow errors. Rely on the `finally` block for unindent and drop the inner calls:

```diff
 if rc != 0:
     cons.print(f"[bold red]ERROR[/bold red]: Case {case.slug} failed with exit code {rc}")
     cons.print(f"[bold red]  Check log at: {log_filepath}[/bold red]")
     failed_cases.append(case.slug)
-    cons.unindent()
     continue
 @@
 if not os.path.exists(summary_filepath):
     cons.print(f"[bold red]ERROR[/bold red]: Summary file not created for {case.slug}")
     cons.print(f"[bold red]  Expected: {summary_filepath}[/bold red]")
     failed_cases.append(case.slug)
-    cons.unindent()
     continue
```

Also applies to: 124-129
1-10: Move `traceback` import to the module top to satisfy lint and avoid repeated imports. Pylint is flagging `import traceback` inside the `except` block (C0415 Import outside toplevel). Since this code path is not particularly hot, it's simpler and clearer to import at the top level and just use it in the handler. Suggested change:

```diff
-import os, sys, uuid, subprocess, dataclasses, typing, math
+import os, sys, uuid, subprocess, dataclasses, typing, math, traceback
 @@
     except Exception as e:
-        import traceback
         cons.print(
             f"[bold red]ERROR[/bold red]: Failed to compute speedup for {target.name} in {slug}: {e}\n"
             f"{traceback.format_exc()}"
         )
```

Also applies to: 211-217
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/scripts/monitor_slurm_job.sh (1 hunks)
- .github/workflows/bench.yml (3 hunks)
- .github/workflows/test.yml (1 hunks)
- toolchain/mfc/bench.py (5 hunks)
- toolchain/mfc/test/test.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
.github/workflows/bench.yml (1)
.github/workflows/phoenix/submit-bench.sh (1)
usage(5-7)
toolchain/mfc/test/test.py (2)
toolchain/mfc/common.py (4)
MFCException (31-32), file_write (44-54), find (165-174), does_command_exist (127-138)
toolchain/mfc/state.py (1)
ARG(85-93)
toolchain/mfc/bench.py (2)
toolchain/mfc/common.py (3)
MFCException (31-32), system (34-41), file_load_yaml (65-70)
toolchain/mfc/printer.py (1)
unindent(19-24)
🪛 actionlint (1.7.9)
.github/workflows/test.yml
105-105: label "gt" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
109-109: label "gt" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
113-113: label "gt" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
118-118: label "frontier" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
122-122: label "frontier" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
126-126: label "frontier" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🪛 GitHub Actions: Lint Toolchain
toolchain/mfc/test/test.py
[error] 230-230: pylint: R1702 Too many nested blocks (6/5).
toolchain/mfc/bench.py
[error] 212-212: pylint: C0415 Import outside toplevel ('traceback').
🪛 Ruff (0.14.7)
toolchain/mfc/test/test.py
184-186: Avoid specifying long messages outside the exception class
(TRY003)
233-233: Abstract raise to an inner function
(TRY301)
233-233: Avoid specifying long messages outside the exception class
(TRY003)
238-238: Abstract raise to an inner function
(TRY301)
238-238: Avoid specifying long messages outside the exception class
(TRY003)
246-246: Avoid specifying long messages outside the exception class
(TRY003)
250-250: Avoid specifying long messages outside the exception class
(TRY003)
253-253: Avoid specifying long messages outside the exception class
(TRY003)
261-261: Avoid specifying long messages outside the exception class
(TRY003)
278-278: Avoid specifying long messages outside the exception class
(TRY003)
294-294: Avoid specifying long messages outside the exception class
(TRY003)
301-301: Avoid specifying long messages outside the exception class
(TRY003)
304-304: Avoid specifying long messages outside the exception class
(TRY003)
307-307: Avoid specifying long messages outside the exception class
(TRY003)
319-323: Avoid specifying long messages outside the exception class
(TRY003)
toolchain/mfc/bench.py
42-42: Avoid specifying long messages outside the exception class
(TRY003)
124-124: Do not catch blind exception: Exception
(BLE001)
137-137: Avoid specifying long messages outside the exception class
(TRY003)
211-211: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Code Cleanliness Check
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (cpu)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Build & Publish
🔇 Additional comments (4)
.github/workflows/bench.yml (1)
25-27: Nice extraction of submit/monitor logic and parallel PR/master runs. The new `submit_and_monitor` helper plus `.github/scripts/monitor_slurm_job.sh` integration gives you much clearer control over submission errors, job IDs, and exit-code handling, while preserving parallel PR/master execution and avoiding `set -e` pitfalls in the `wait` logic. The dynamic job name and reduced timeout also look reasonable.

Also applies to: 76-77, 100-143
.github/workflows/test.yml (1)
95-101: Matrix reshaping and dynamic self-hosted job naming look consistent. Explicit Phoenix/Frontier variants, the `lbl`-driven labels, and the dynamic name format align cleanly with the later build/test steps and log handling. The reduced timeout is also appropriate for CI hardening, assuming your self-hosted runners are configured with these labels.

Also applies to: 104-129
toolchain/mfc/test/test.py (2)
26-37: Early-abort mechanism via `abort_tests` event is structured cleanly. The combination of:

- global `abort_tests` event,
- per-case pre-check in `handle_case`,
- failure-rate computation gated on a minimum number of completed tests, and
- a post-scheduler check in `test()` that surfaces a clear aggregated failure,

gives you a deterministic, thread-safe way to fast-fail runs with systemic breakage without throwing from worker threads. This is a solid improvement over raising directly inside `_handle_case`. A sketch of this pattern follows below.

Also applies to: 333-373
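A minimal self-contained sketch of this coordination pattern (hypothetical names mirroring the description above, not the repository's actual code):

```python
import threading

abort_tests = threading.Event()        # shared abort flag, set once breakage looks systemic
_lock = threading.Lock()
_completed = 0
_failed = 0
MIN_COMPLETED, MAX_RATE = 20, 0.30     # thresholds mirroring the PR description

def _record_result(ok: bool) -> None:
    """Update counters and trip the abort flag on a high failure rate."""
    global _completed, _failed
    with _lock:
        _completed += 1
        if not ok:
            _failed += 1
        if _completed >= MIN_COMPLETED and _failed / _completed > MAX_RATE:
            abort_tests.set()          # Event.set() is thread-safe from any worker

def handle_case(run) -> None:
    """Per-case wrapper: skip once aborted, otherwise run and record."""
    if abort_tests.is_set():
        return                         # remaining cases drain without running
    try:
        run()
    except Exception:
        _record_result(False)
        return
    _record_result(True)

# After the scheduler drains, the main thread checks abort_tests.is_set()
# and raises one aggregated failure instead of throwing from workers.
```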
30-41: Timer-based timeout only detects timeouts after `case.run` returns; add note and fix pylint nesting error. Using `threading.Timer` with a flag is safe in worker threads, but it can't interrupt a blocked `case.run` call: if `case.run` hangs (e.g., stuck MPI job), the worker thread will remain blocked and the timeout will only be noticed after (or if) `case.run` returns. That means this mechanism detects slow tests but does not strictly bound wall-clock time for truly hung runs.

Two concrete suggestions:

1. Document/verify behavior of `case.run`. Confirm (and, if possible, document) that `case.run` itself uses a timeout-capable mechanism (e.g., subprocess with `timeout=` or equivalent), or that hung simulations are otherwise impossible/handled elsewhere; otherwise the 1-hour timeout expectation is misleading.

2. Resolve pylint R1702 ("too many nested blocks") cheaply. Given this function is already structurally complex and you're explicitly disabling other size checks, the quickest unblock is to extend the existing pylint disable:

```diff
-# pylint: disable=too-many-locals, too-many-branches, too-many-statements, trailing-whitespace
+# pylint: disable=too-many-locals, too-many-branches, too-many-statements, too-many-nested-blocks, trailing-whitespace
 def _handle_case(case: TestCase, devices: typing.Set[int]):
```

If you prefer not to disable, you'd need to extract parts of `_handle_case` (e.g., golden-file comparison and `test_all` post-processing) into helpers to reduce nesting.
/improve
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```diff
@@           Coverage Diff            @@
##           master    #1078    +/-   ##
=======================================
  Coverage   44.16%   44.16%
=======================================
  Files          71       71
  Lines       20197    20197
  Branches     1970     1970
=======================================
  Hits         8921     8921
  Misses      10148    10148
  Partials     1128     1128
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/scripts/run_parallel_benchmarks.sh (1 hunks)
- .github/scripts/submit_and_monitor_bench.sh (1 hunks)
- .github/workflows/bench.yml (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
.github/scripts/submit_and_monitor_bench.sh (3)
.github/workflows/phoenix/submit-bench.sh (1)
  usage (5-7)
.github/workflows/frontier/submit.sh (1)
  usage (5-7)
toolchain/mfc/bench.py (1)
  bench (19-76)
.github/scripts/run_parallel_benchmarks.sh (2)
toolchain/mfc/bench.py (1)
  bench (19-76)
.github/workflows/phoenix/submit-bench.sh (1)
  usage (5-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (cpu)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Georgia Tech | Phoenix (gpu-acc)
- GitHub Check: Georgia Tech | Phoenix (cpu)
- GitHub Check: Oak Ridge | Frontier (gpu-omp)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Georgia Tech | Phoenix (gpu-omp)
- GitHub Check: Build & Publish
🔇 Additional comments (5)
.github/scripts/run_parallel_benchmarks.sh (1)
1-78: Script structure and exit code capture look sound. The parallel job orchestration and exit code capture pattern (`if ! wait "$pid"; then exit=$?; fi`) is correct. The file path expectations (lines 60-61) for YAML files in `pr/` and `master/` directories should align with the centralized orchestration model.

Confirm that `submit_and_monitor_bench.sh` creates output files in the expected locations (pr/bench--.yaml and master/bench--.yaml) matching the paths referenced on lines 60–61.

.github/workflows/bench.yml (3)
26-26: Dynamic job naming with conditional interface suffix looks good. The formatted job name string now conditionally appends the interface: `${{ matrix.interface != 'none' && format('-{0}', matrix.interface) || '' }}`. This improves job labeling in the UI.

76-76: Timeout reduction to 480 minutes documented and aligned with PR objectives. The timeout has been reduced from 1400 to 480 minutes as part of hardening. Verify that this is adequate for all benchmark configurations (especially Frontier GPU jobs on lines 65–72) by checking historical run times.

100-101: Centralized orchestration script integration is clean. The bench step now delegates to a single script, replacing inline SLURM monitoring logic. The parameter order (device, interface, cluster) must match the expectations in `run_parallel_benchmarks.sh`. Confirm that script parameters on line 101 match the positional argument order expected by `.github/scripts/run_parallel_benchmarks.sh` (currently: device, interface, cluster).

.github/scripts/submit_and_monitor_bench.sh (1)

24-26: Job ID extraction is portable and robust. The sed pattern `'s/.*Submitted batch job \([0-9][0-9]*\).*/\1/p'` is POSIX-portable and correctly extracts the job ID from sbatch output. The job slug construction and output file naming are clear.
```sh
# Verify the YAML output file was created
yaml_file="${job_slug}.yaml"
if [ ! -f "$yaml_file" ]; then
    echo "[$dir] ERROR: Expected output file not found: $yaml_file"
    echo "[$dir] Directory contents:"
    ls -la *.yaml 2>/dev/null || echo "  No YAML files found"
    exit 1
fi

echo "[$dir] Verified output file exists: $yaml_file ($(stat -f%z "$yaml_file" 2>/dev/null || stat -c%s "$yaml_file" 2>/dev/null) bytes)"
```
YAML file existence check is incomplete — gap vs. PR objectives.
The script verifies the YAML output file exists (lines 41–48) but does not validate its contents. According to the PR objectives, the hardening should include "Check summary file existence and required target-data fields (execution and simulation metrics such as grind)". The current validation only checks for file presence, not schema or required fields.
Consider parsing the YAML and checking for required fields (e.g., execution time, simulation metrics) to align with the stated objectives.
Would you like me to generate a helper function to validate the YAML structure and required fields? This could be added to the script or extracted as a reusable utility.
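For instance, a hedged sketch of such a check using PyYAML; the required key names below are assumptions, not the benchmark schema's confirmed fields:

```python
import sys
import yaml  # PyYAML

REQUIRED_KEYS = ("exec", "grind")  # assumed metric names; adjust to the real schema

def validate_summary(path: str) -> None:
    """Exit non-zero if the bench summary YAML lacks the expected metrics (sketch)."""
    with open(path, encoding="utf-8") as f:
        data = yaml.safe_load(f)
    if not isinstance(data, dict):
        sys.exit(f"{path}: not a YAML mapping")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        sys.exit(f"{path}: missing required fields: {', '.join(missing)}")

if __name__ == "__main__":
    validate_summary(sys.argv[1])
```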
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/scripts/submit_and_monitor_bench.sh (1)
44-54: Output validation is appropriately scoped; cross-platform stat flags are functional but inelegant. File existence check for the YAML output is correct (lines 46–51). Line 49's fallback directory listing with error suppression aids debugging.

Line 53 uses conditional `stat` flags (`-f%z` for macOS, `-c%s` for Linux) to report file size. While this works, it's a common cross-platform portability pattern that can be simplified:

```sh
# Simpler alternative using wc
echo "[$dir] Verified output file exists: $yaml_file ($(wc -c < "$yaml_file") bytes)"
```

This is a minor style point; the current approach is functional.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/scripts/run_parallel_benchmarks.sh (1 hunks)
- .github/scripts/submit_and_monitor_bench.sh (1 hunks)
- .github/workflows/bench.yml (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/bench.yml
🧰 Additional context used
🧬 Code graph analysis (1)
.github/scripts/run_parallel_benchmarks.sh (3)
.github/workflows/phoenix/submit-bench.sh (1)
  usage (5-7)
toolchain/mfc/bench.py (1)
  bench (19-76)
.github/workflows/frontier/submit.sh (1)
  usage (5-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (gpu-acc)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Georgia Tech | Phoenix (gpu-acc)
- GitHub Check: Georgia Tech | Phoenix (gpu-omp)
- GitHub Check: Georgia Tech | Phoenix (cpu)
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Build & Publish
🔇 Additional comments (2)
.github/scripts/run_parallel_benchmarks.sh (1)
1-81: Solid parallel orchestrator with robust error handling and exit-code tracking. The script correctly launches both benchmark jobs in parallel, captures exit codes reliably using the `if ! wait` pattern, and validates output files with helpful debugging output. Strict error handling via `set -euo pipefail` and `|| true` guards in the `ls` fallbacks ensure the script doesn't mask real failures.

.github/scripts/submit_and_monitor_bench.sh (1)

17-41: Correctly captures script directory before changing workdir; critical fix in place. Line 18 captures `SCRIPT_DIR` before the `cd "$dir"` on line 21, then line 40 uses `"${SCRIPT_DIR}/monitor_slurm_job.sh"` to reference the monitoring script from the PR location. This ensures the script works correctly whether benchmarking the PR or master checkout, since `monitor_slurm_job.sh` only exists in the PR branch. This resolves the critical path-resolution issue flagged in the previous review.
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/scripts/submit_and_monitor_bench.sh (1)
46-56: YAML validation still incomplete — gap vs. PR objectives persists.This duplicates a concern from the previous review: the script only validates file existence but does not check required fields. Per PR objectives, hardening should include "Check summary file existence and required target-data fields (execution and simulation metrics such as grind)".
Consider parsing the YAML and asserting that required fields exist (e.g., `execution`, `simulation_metrics`, or similar keys defined by the benchmark output schema).
🧹 Nitpick comments (2)
.github/scripts/submit_and_monitor_bench.sh (2)
40-40: Missing error handling for monitoring script return code. The `monitor_slurm_job.sh` script is invoked but its exit status is not checked. If monitoring fails, the script continues and may produce misleading success messages. With `set -euo pipefail`, the script should exit on failure, but explicit error handling here would be clearer. Consider capturing the return code and providing context if monitoring fails:

```diff
 # Use the monitoring script from PR (where this script lives)
-bash "${SCRIPT_DIR}/monitor_slurm_job.sh" "$job_id" "$output_file"
+if ! bash "${SCRIPT_DIR}/monitor_slurm_job.sh" "$job_id" "$output_file"; then
+    echo "[$dir] ERROR: Monitoring script failed for job $job_id"
+    exit 1
+fi
```
24-25: Optional: Pre-validate submit script path for clarity. The relative path `.github/workflows/$cluster/submit-bench.sh` assumes the directory structure exists in `$dir`. Adding an early existence check could catch configuration errors sooner:

```diff
 # Submit job
+submit_script=".github/workflows/$cluster/submit-bench.sh"
+if [ ! -f "$submit_script" ]; then
+    echo "[$dir] ERROR: Submit script not found: $submit_script"
+    exit 1
+fi
-submit_output=$(bash .github/workflows/$cluster/submit-bench.sh \
+submit_output=$(bash "$submit_script" \
     .github/workflows/$cluster/bench.sh "$device" "$interface" 2>&1)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- .github/scripts/submit_and_monitor_bench.sh (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
.github/scripts/submit_and_monitor_bench.sh (3)
.github/workflows/phoenix/submit-bench.sh (1)
  usage (5-7)
.github/workflows/frontier/submit.sh (1)
  usage (5-7)
toolchain/mfc/bench.py (1)
  bench (19-76)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Oak Ridge | Frontier (gpu-omp)
- GitHub Check: Oak Ridge | Frontier (cpu)
- GitHub Check: Georgia Tech | Phoenix (gpu-acc)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (cpu)
- GitHub Check: Build & Publish
🔇 Additional comments (2)
.github/scripts/submit_and_monitor_bench.sh (2)
17-18: LGTM: Script directory handling correctly addresses prior critical path issue. The use of `SCRIPT_DIR` (lines 17–18) before the `cd "$dir"` (line 21) and its reference at line 40 properly resolves the previous critical issue where `.github/scripts/monitor_slurm_job.sh` would fail on the master branch. This is a solid fix.

Also applies to: 40-40

58-58: `stat` format strings are already portable; current syntax is correct. The unquoted format syntax `stat -f%z` and `stat -c%s` works fine on both BSD and GNU systems for simple format specifiers like `%z` and `%s`. Quoting is only necessary when format strings contain spaces, tabs, or shell-special characters. While adding quotes (`stat -f '%z'` and `stat -c '%s'`) is reasonable for defensive programming and readability, it is not required to fix a portability issue; the current code is portable as-is.

Likely an incorrect or invalid review comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
.github/scripts/monitor_slurm_job.sh (3)
137-137: Consider portability of the `stat` command. Line 137 uses `stat -c%s` (GNU), which will fail on BSD/macOS (where the equivalent is `stat -f%z`). While this script targets SLURM clusters (typically Linux), past reviewers flagged this as a portability concern. If the script may run on heterogeneous systems, consider a portable wrapper or helper function as suggested in prior reviews. If portability across BSD/Linux is required, you can wrap the file-size check:

```sh
get_file_size() {
    stat -c%s "$1" 2>/dev/null || stat -f%z "$1" 2>/dev/null || echo -1
}
# Then later:
size=$(get_file_size "$output_file")
```
56-75: Sophisticated tail multiplexing is correct but complex; ensure edge cases are tested. The `exec 3< <(stdbuf -oL -eL tail -f ...)` pattern with non-blocking reads (line 67, timeout 0.1s) and burst-read capping (lines 72–74) is a clever approach to stream output while monitoring job status without blocking. However, this sophistication introduces multiple tail process references (lines 57, 130, 153) and cleanup responsibilities. The drain logic (lines 115–126) with a 10k-line safety limit helps prevent infinite loops, but edge cases (e.g., rapid log spikes, stdbuf buffering delays) should be validated in testing.

Consider testing scenarios where:
- Output file grows faster than the 0.1s read timeout can consume
- The drain loop hits the 10k-line safety limit during a large flush
- stdbuf buffering delays output availability during job completion
Verify that no output is lost or truncated in these scenarios by comparing final file size against streamed lines.
Also applies to: 115-126
84-95: Exit code retrieval handles simple jobs well; consider filtering for array jobs. Lines 84 and 171 use `head -n1` to extract state/exit code from sacct. For non-array jobs or single-step jobs, this works correctly. However, if benchmarks use SLURM job arrays or multi-step jobs, sacct may return multiple lines, and `head -n1` could select the wrong entry (e.g., an array element step instead of the parent job). Past reviewers suggested filtering by terminal state or by parent job ID before extraction. If job arrays are out of scope for this monitoring script, this is fine. Otherwise, consider the more selective sacct query suggested in earlier reviews:

```sh
# Filter for terminal states before extracting
sacct_line=$(sacct -j "$job_id" --format=JobID,State,ExitCode --noheader --parsable2 2>/dev/null | \
    awk -F'|' '$2 ~ /COMPLETED|FAILED|CANCELLED|TIMEOUT|OUT_OF_MEMORY/ {print; exit}')
exit_code=$(echo "$sacct_line" | awk -F'|' '{print $3}')
```

Confirm whether benchmarks use SLURM job arrays. If yes, test with an array job to verify the script correctly reports the parent job's exit status, not a sub-step's exit code.
Also applies to: 171-174
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .github/scripts/monitor_slurm_job.sh (1 hunks)
- .github/workflows/phoenix/submit-bench.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: Oak Ridge | Frontier (cpu)
- GitHub Check: Oak Ridge | Frontier (gpu-acc)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Georgia Tech | Phoenix (gpu-omp)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Georgia Tech | Phoenix (gpu-acc)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Oak Ridge | Frontier (CCE) (gpu-omp)
- GitHub Check: Oak Ridge | Frontier (CCE) (gpu-acc)
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (gpu-omp)
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (gpu-acc)
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (cpu)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Build & Publish
🔇 Additional comments (2)
.github/workflows/phoenix/submit-bench.sh (1)
45-45: Good alignment with benchmarking hardening objectives. The SLURM job time limit increase (03:00:00 → 04:00:00) provides necessary headroom for extended benchmark case execution while staying within the reduced GitHub workflow timeout (480 minutes). This complements the suite-level timeout reductions.
.github/scripts/monitor_slurm_job.sh (1)
5-13: Well-implemented hardening with comprehensive error handling and retry logic. The script successfully incorporates past feedback: exponential backoff for transient failures (lines 36–48), explicit failure when exit codes cannot be determined (lines 178–182), and dual-fallback exit code resolution (scontrol → sacct, lines 163–175). The cleanup trap and `set -euo pipefail` prevent orphaned processes and catch unexpected errors. Job completion detection now properly guards against both immediate squeue failures and false positives via sacct.
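As a rough Python rendering of that dual-fallback idea (the actual script does this in bash; the helper below is hypothetical):

```python
import subprocess

def job_exit_code(job_id: str) -> str | None:
    """Best-effort 'rc:signal' exit code for a finished SLURM job (sketch)."""
    # Primary: scontrol output contains a field like "ExitCode=0:0".
    out = subprocess.run(["scontrol", "show", "job", job_id],
                         capture_output=True, text=True).stdout
    for token in out.split():
        if token.startswith("ExitCode="):
            return token.partition("=")[2]
    # Fallback: sacct prints one parsable record per job step; take the first.
    out = subprocess.run(["sacct", "-j", job_id, "--format=ExitCode",
                          "--noheader", "--parsable2"],
                         capture_output=True, text=True).stdout
    lines = [line for line in out.splitlines() if line.strip()]
    return lines[0].strip() if lines else None  # None: caller should treat as failure
```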
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/scripts/run_parallel_benchmarks.sh (1)
7-14: Input validation is adequate but could be explicit. The script validates argument count but doesn't explicitly validate that `device`, `interface`, and `cluster` are non-empty. Consider adding a check to prevent silent failures if arguments are passed as empty strings.

```diff
 if [ $# -ne 3 ]; then
     echo "Usage: $0 <device> <interface> <cluster>"
     exit 1
 fi

 device="$1"
 interface="$2"
 cluster="$3"
+
+if [ -z "$device" ] || [ -z "$interface" ] || [ -z "$cluster" ]; then
+    echo "ERROR: device, interface, and cluster cannot be empty"
+    exit 1
+fi
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- .github/scripts/run_parallel_benchmarks.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Oak Ridge | Frontier (gpu-omp)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (gpu-omp)
- GitHub Check: Oak Ridge | Frontier (CCE) (gpu-omp)
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (cpu)
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (gpu-acc)
- GitHub Check: Oak Ridge | Frontier (CCE) (gpu-acc)
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Build & Publish
🔇 Additional comments (1)
.github/scripts/run_parallel_benchmarks.sh (1)
64-80: File verification is robust with good debugging output. The verification logic correctly checks for YAML file existence and provides helpful debugging information (directory listing, log tail) if files are missing. The use of `|| true` to prevent script failure on `ls` errors is appropriate.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/scripts/run_parallel_benchmarks.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Build & Publish
🔇 Additional comments (2)
.github/scripts/run_parallel_benchmarks.sh (2)
25-25: Good: set -o pipefail correctly applied in subshells. The prior critical issue regarding lost exit codes in pipelines has been properly addressed by adding set -o pipefail at the start of each subshell. This ensures the pipeline returns the exit status of submit_and_monitor_bench.sh, leaving the while-loop output prefixing intact.

Also applies to: 29-29
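Reduced to its essentials, the approved pattern looks like this (the script name comes from the review; the argument and the [PR] prefix are placeholders):

# Without pipefail, the pipeline's status would be that of the last command
# (the while loop), so a failing submit script would be reported as success.
(
    set -o pipefail
    ./submit_and_monitor_bench.sh "$pr_dir" | while IFS= read -r line; do
        echo "[PR] $line"
    done
) &
pr_pid=$!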
63-80: Verify log file naming convention. Line 78 assumes the master benchmark log file uses the extension .out (e.g., master/bench-${device}-${interface}.out). Confirm this filename matches the actual output generated by submit_and_monitor_bench.sh.
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/scripts/run_parallel_benchmarks.sh (1)
39-51: Critical: Exit code from wait is still lost due to the ! operator (unresolved from prior review). The pattern if ! wait "$pr_pid"; then pr_exit=$? captures the exit status of the ! operator (which is 0 after successful inversion), not the actual exit code of the background process. This causes the error-handling logic to always report both jobs as successful, even if they fail. The first critical issue (pipefail in subshells) was correctly addressed on lines 25–29, but this second issue remains unfixed. The prior review flagged this at lines 39–51, and it is still present.
Apply this diff to capture exit codes directly:

 # Wait and capture exit codes reliably
 pr_exit=0
 master_exit=0
-if ! wait "$pr_pid"; then
-    pr_exit=$?
-    echo "PR job exited with code: $pr_exit"
-else
-    echo "PR job completed successfully"
-fi
+wait "$pr_pid"
+pr_exit=$?
+if [ "$pr_exit" -ne 0 ]; then
+    echo "PR job exited with code: $pr_exit"
+else
+    echo "PR job completed successfully"
+fi

-if ! wait "$master_pid"; then
-    master_exit=$?
-    echo "Master job exited with code: $master_exit"
-else
-    echo "Master job completed successfully"
-fi
+wait "$master_pid"
+master_exit=$?
+if [ "$master_exit" -ne 0 ]; then
+    echo "Master job exited with code: $master_exit"
+else
+    echo "Master job completed successfully"
+fi
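The underlying pitfall is easy to reproduce in any bash shell, with false standing in for a failing wait:

if ! false; then
    echo "captured: $?"   # prints "captured: 0" -- the status of !, not of false
fi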
🧹 Nitpick comments (1)
.gitignore (1)
94-95: Clarify the gitignore pattern syntax for .supercode/. Line 95 uses **.supercode/, which matches prefix.supercode/ (any path ending with .supercode/). However, given the naming convention (starting with a dot), the intent is likely to match hidden .supercode/ directories at any depth. The pattern should be **/.supercode/ instead. Apply this diff to clarify the pattern:
 **isolation_rules/
-**.supercode/
+**/.supercode/
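Either variant can be checked from inside the repository with git's own matcher; the paths below are hypothetical:

mkdir -p demo/.supercode nested/foo.supercode
git check-ignore -v demo/.supercode nested/foo.supercode
# With **.supercode/ both paths are ignored; with **/.supercode/ only demo/.supercode is.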
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/scripts/run_parallel_benchmarks.sh (1 hunks)
.gitignore (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
.github/scripts/run_parallel_benchmarks.sh (3)
.github/workflows/phoenix/submit-bench.sh (1)
usage (5-7)
.github/workflows/frontier/submit-bench.sh (1)
usage (5-7)
toolchain/mfc/bench.py (1)
bench (19-76)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Oak Ridge | Frontier (gpu-acc)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Georgia Tech | Phoenix (cpu)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Oak Ridge | Frontier (cpu)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Georgia Tech | Phoenix (gpu-acc)
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (gpu-omp)
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (gpu-acc)
- GitHub Check: Oak Ridge | Frontier (CCE) (gpu-omp)
- GitHub Check: Georgia Tech | Phoenix (NVHPC) (cpu)
- GitHub Check: Oak Ridge | Frontier (CCE) (gpu-acc)
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Code Cleanliness Check
- GitHub Check: Build & Publish
🔇 Additional comments (3)
.github/scripts/run_parallel_benchmarks.sh (3)
1-24: Setup and argument handling look good. The shebang, set -euo pipefail, argument validation, and SCRIPT_DIR calculation are all correct and follow best practices.
25-31: Subshell pipefail handling is correct. The set -o pipefail added to both subshells (lines 25 and 29) correctly addresses the prior critical issue where exit codes from submit_and_monitor_bench.sh were lost in pipelines. This ensures the background processes' exit statuses propagate correctly.
53-88: File verification logic is sound, but depends on fixing the exit code bug. The output verification (checking for YAML files) and detailed error reporting (directory listings and log tails) are well-designed. However, this entire error-handling path is unreachable due to the broken exit code capture in lines 39–51: since both pr_exit and master_exit will always be 0, the check on line 54 will never trigger even if the jobs actually fail.

Once the exit code handling is corrected, this section will work as intended.
User description
PR Type
Enhancement, Tests
Description
Add comprehensive error handling and validation to benchmark execution
Implement per-test timeout mechanism (1 hour) with signal handling
Add early abort logic when the failure rate exceeds a 30% threshold (see the sketch after this list)
Improve job monitoring in CI workflows with real-time output streaming
Reduce benchmark workflow timeout from 1400 to 480 minutes
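In shell terms, the timeout and early-abort mechanics amount to roughly the following (the toolchain implements this in Python; the case list, runner script, and exact accounting are illustrative assumptions):

failures=0
completed=0
for case in "${cases[@]}"; do
    # Cap each test at one hour; timeout(1) signals the process on expiry.
    if ! timeout 3600 ./run_case.sh "$case"; then
        failures=$((failures + 1))
    fi
    completed=$((completed + 1))
    # Early abort: once 20 tests have finished, stop if more than 30% failed.
    if [ "$completed" -ge 20 ] && [ $((failures * 100 / completed)) -gt 30 ]; then
        echo "Aborting early: $failures/$completed tests failed (>30%)" >&2
        exit 1
    fi
done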
Diagram Walkthrough
File Walkthrough
bench.py
Add comprehensive error handling and validation to benchmarks (toolchain/mfc/bench.py): validation of each case during execution, error handling, checks on required result fields, clearer error messages, and no failures silently passing.
test.py
Add timeout mechanism and early abort on high failure rate (toolchain/mfc/test/test.py): per-test timeout thresholds, a signal-based timeout alarm, an early-abort check after 20 completed tests, and updated references.
bench.yml
Improve job monitoring with real-time output and error handling (.github/workflows/bench.yml): a submit_and_monitor function and strict handling of job exit codes.
test.yml
Update job naming and reduce workflow timeout (.github/workflows/test.yml).
CodeAnt-AI Description
Add per-test 1-hour timeout, fail-fast on high failure rates, stricter benchmark validation, and improved CI bench monitoring
What Changed
Impact
✅ Fewer hung tests
✅ Shorter CI runs on systemic failures
✅ Clearer benchmark failure logs
💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Behavior Changes
✏️ Tip: You can customize this high-level summary in your review settings.
Note
Adds robust benchmark validation, per-test 1h timeouts with fail-fast, and real-time SLURM job monitoring in CI with shorter timeouts.
toolchain/mfc/bench.py: validates that required result fields (exec, simulation grind) exist.
toolchain/mfc/test/test.py: per-test 1 h timeouts with fail-fast.
.github/workflows/bench.yml: dynamic job names incl. interface; submit-and-monitor both runs; real-time output via new script; strict exit-code handling; artifacts on Frontier.
.github/workflows/test.yml: clearer self-hosted job names; expanded matrix; reduced timeout-minutes to 480.
Adds .github/scripts/monitor_slurm_job.sh to stream SLURM output, detect completion, and verify exit status.
Written by Cursor Bugbot for commit b30fea3.
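The monitoring flow that note describes — wait for the output file, stream it while the job lives, then read the terminal state — reduces to roughly this sketch (not the script itself; polling intervals are arbitrary):

job_id="$1"; out_file="$2"

# Wait (bounded) for SLURM to create the output file.
for _ in $(seq 1 60); do
    [ -f "$out_file" ] && break
    sleep 5
done

# Stream output until the job leaves the queue.
tail -f "$out_file" &
tail_pid=$!
while squeue -j "$job_id" -h 2>/dev/null | grep -q .; do
    sleep 30
done
kill "$tail_pid" 2>/dev/null || true

# Give buffered output a moment to flush, then report the terminal state.
sleep 10
sacct -j "$job_id" -n -o State,ExitCode | head -n 1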