fix(skills): enqueue script polls until merge or ejection#2516
fix(skills): enqueue script polls until merge or ejection#2516ralphbean wants to merge 3 commits into
Conversation
The enqueue script reported success even when GitHub immediately dequeued the PR for failed checks. Add a polling loop after the enqueue mutation to confirm the PR remains in the queue, and surface the dequeue reason if it was removed. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
Site previewPreview: https://fb92d8f5-site.fullsend-ai.workers.dev Commit: |
Instead of a short verification window, the enqueue script now polls until the PR either merges or gets ejected from the queue. Reports the dequeue reason on ejection. This replaces the fire-and-forget behavior that silently reported success even when GitHub immediately dequeued the PR. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
|
🤖 Finished Review · ✅ Success · Started 4:10 PM UTC · Completed 4:23 PM UTC |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
ReviewFindingsHigh
Medium
Low
Labels: PR is a bug fix to skill scripts, adding type/bug to match the fix() commit prefix. Previous runReviewFindingsHigh
Medium
Low
Labels: PR modifies skill scripts and documentation under skills/merge-queue/. |
| sleep "$POLL_INTERVAL" | ||
|
|
||
| state="$(gh api graphql -f query=' | ||
| query($owner: String!, $name: String!, $number: Int!) { |
There was a problem hiding this comment.
[medium] error-handling
The polling GraphQL call (gh api graphql) has no error handling, but the script runs under set -euo pipefail. Any transient failure (network timeout, GitHub rate limit, 5xx) will cause the script to exit immediately with a non-zero status, indistinguishable from a genuine ejection. The second GraphQL call (dequeue reason) correctly uses 2>/dev/null || true, showing the author is aware of the pattern but did not apply it to the main polling call.
Suggested fix: Wrap the polling gh api graphql call in a retry-or-continue pattern so transient failures retry on the next poll iteration rather than terminating the script.
|
|
||
| POLL_INTERVAL="${POLL_INTERVAL:-30}" | ||
|
|
||
| # Resolve owner/repo and number from the URL for GraphQL queries |
There was a problem hiding this comment.
[low] variable-naming
Local variables use leading underscore prefix (_owner, _name, _number) which is not a pattern seen in other scripts in this directory. dequeue-reason.sh and queue-status.sh use plain names (owner, name, number).
| echo "ERROR: could not parse PR URL: $pr_url" >&2 | ||
| exit 1 | ||
| fi | ||
|
|
There was a problem hiding this comment.
[low] edge-case
The polling loop has no maximum iteration count or overall timeout. If a PR remains queued indefinitely, the script will poll forever. This matches the unbounded loop in await-and-enqueue.sh but is worth noting.
The script was aborting on any failed check (e.g. codecov/patch) even when it wasn't a required check. Now only failures and pending status of required checks (commit-lint, e2e, test) block enqueue. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
|
🤖 Finished Review · ✅ Success · Started 6:50 PM UTC · Completed 7:01 PM UTC |
| [$map | to_entries[] | select(.value | test("FAILURE|ERROR|CANCELLED|TIMED_OUT|STARTUP_FAILURE|ACTION_REQUIRED")) | .key + " (" + .value + ")"] as $failures | | ||
| # Check for pending | ||
| [$map | to_entries[] | select(.value | test("SUCCESS|NEUTRAL|SKIPPED|COMPLETED|FAILURE|ERROR|CANCELLED|TIMED_OUT|STARTUP_FAILURE|ACTION_REQUIRED") | not) | .key] as $pending | | ||
| # Check for failures (only required checks matter) |
There was a problem hiding this comment.
[high] logic-error
When $required_json is [] (either because no required checks are configured or because the rules API call on line 29 fails and falls back to echo '[]'), the new $required | index($k) filter causes both $failures and $pending to be empty arrays unconditionally. Combined with $missing also being empty (iterating an empty $required), the script will skip all check validation and immediately proceed to enqueue the PR. This is a silent safety regression: a transient API failure now causes the script to enqueue a PR with failing checks.
Suggested fix: When $required is empty, fall back to evaluating all checks. Change the jq filter to: select(($required | length == 0) or (.key as $k | $required | index($k))) for both failures and pending arrays. Alternatively, treat an empty $required as an error condition and abort.
| sleep "$POLL_INTERVAL" | ||
|
|
||
| state="$(gh api graphql -f query=' | ||
| query($owner: String!, $name: String!, $number: Int!) { |
There was a problem hiding this comment.
[medium] error-handling
The polling GraphQL call (gh api graphql) has no error handling, but the script runs under set -euo pipefail. Any transient failure will cause the script to exit immediately with a non-zero status, indistinguishable from a genuine ejection.
Suggested fix: Wrap the polling gh api graphql call in a retry-or-continue pattern so that transient failures retry on the next poll iteration rather than terminating the script.
| @@ -15,6 +15,10 @@ allowed-tools: Bash(bash skills/merge-queue/scripts/*:*) | |||
| Run `bash skills/merge-queue/scripts/enqueue-pr.sh [PR_NUMBER_OR_URL]` to enqueue a PR. | |||
There was a problem hiding this comment.
[low] missing-doc
The enqueue-pr.sh script now accepts a POLL_INTERVAL environment variable (default: 30 seconds) but this is not documented in the Enqueue a PR section.
| echo "ERROR: could not parse PR URL: $pr_url" >&2 | ||
| exit 1 | ||
| fi | ||
|
|
There was a problem hiding this comment.
[low] edge-case
The polling loop has no maximum iteration count or overall timeout. If a PR remains in a QUEUED/AWAITING_CHECKS state indefinitely, the script will poll forever.
|
|
||
| POLL_INTERVAL="${POLL_INTERVAL:-30}" | ||
|
|
||
| # Resolve owner/repo and number from the URL for GraphQL queries |
There was a problem hiding this comment.
[low] variable-naming
Local variables use leading underscore prefix (_owner, _name, _number) which is not a pattern seen in other scripts in this directory. dequeue-reason.sh and queue-status.sh use plain names.
| Run `bash skills/merge-queue/scripts/enqueue-pr.sh [PR_NUMBER_OR_URL]` to enqueue a PR. | ||
| Omit the argument to enqueue the current branch's PR. | ||
|
|
||
| The script enqueues the PR and then **polls until the PR merges or is ejected |
There was a problem hiding this comment.
[low] behavioral-change
The blocking behavior affects all callers of enqueue-pr.sh, including await-and-enqueue.sh which delegates via exec (line 95).
|
🤖 Finished Retro · ✅ Success · Started 3:58 PM UTC · Completed 4:03 PM UTC |
Retro: PR #2516 —
|
Summary
enqueue-pr.shnow polls after enqueuing until the PR either merges or gets ejected from the queuefailed_checks,merge_conflict) and exits non-zeroTest plan
failed_checks) — confirmed the polling loop runs and the script exits non-zero with reason on ejection🤖 Generated with Claude Code