Skip to content

fix(skills): enqueue script polls until merge or ejection#2516

Closed
ralphbean wants to merge 3 commits into
mainfrom
fix/enqueue-verify-dequeue
Closed

fix(skills): enqueue script polls until merge or ejection#2516
ralphbean wants to merge 3 commits into
mainfrom
fix/enqueue-verify-dequeue

Conversation

@ralphbean

@ralphbean ralphbean commented Jun 22, 2026

Copy link
Copy Markdown
Member

Summary

  • enqueue-pr.sh now polls after enqueuing until the PR either merges or gets ejected from the queue
  • On ejection, reports the dequeue reason (e.g. failed_checks, merge_conflict) and exits non-zero
  • Previously the script reported success immediately after the enqueue mutation, even when GitHub dequeued the PR seconds later
  • Updates SKILL.md to document the new blocking behavior

Test plan

🤖 Generated with Claude Code

The enqueue script reported success even when GitHub immediately
dequeued the PR for failed checks. Add a polling loop after the
enqueue mutation to confirm the PR remains in the queue, and surface
the dequeue reason if it was removed.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown

Site preview

Preview: https://fb92d8f5-site.fullsend-ai.workers.dev

Commit: 75279d98850261c7f9802d0ca1f9279a5f48d715

Instead of a short verification window, the enqueue script now polls
until the PR either merges or gets ejected from the queue. Reports the
dequeue reason on ejection. This replaces the fire-and-forget behavior
that silently reported success even when GitHub immediately dequeued
the PR.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
@ralphbean ralphbean changed the title fix(skills): verify PR stays in merge queue after enqueue fix(skills): enqueue script polls until merge or ejection Jun 22, 2026
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 4:10 PM UTC · Completed 4:23 PM UTC
Commit: f48898a · View workflow run →

@codecov

codecov Bot commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review

Findings

High

  • [logic-error] skills/merge-queue/scripts/await-and-enqueue.sh:49 — When $required_json is [] (either because no required checks are configured or because the rules API call on line 29 fails and falls back to echo '[]'), the new $required | index($k) filter causes both $failures and $pending to be empty arrays unconditionally. Combined with $missing also being empty (iterating an empty $required), the script will skip all check validation and immediately proceed to enqueue the PR. Before this change, failures in ANY check would block enqueuing. This is a silent safety regression: a transient API failure on line 29 now causes the script to enqueue a PR with failing checks rather than blocking.
    Remediation: When $required is empty, fall back to the previous behavior of evaluating all checks (not filtering by required). For example, change the jq filter to: select(($required | length == 0) or (.key as $k | $required | index($k))) for both the failures and pending arrays. Alternatively, treat an empty $required as an error condition and abort rather than silently proceeding.

  • [protected-path] skills/merge-queue/SKILL.md, skills/merge-queue/scripts/await-and-enqueue.sh, skills/merge-queue/scripts/enqueue-pr.sh — All changed files are under the skills/ protected path, which requires human approval. The PR has no linked issue and the description does not justify why these governance/infrastructure files need to be modified.
    Remediation: Link this PR to an issue that authorizes the change to protected files, or add a detailed explanation in the PR description for why the protected path change is necessary.

Medium

  • [error-handling] skills/merge-queue/scripts/enqueue-pr.sh:68 — The polling GraphQL call (gh api graphql) has no error handling, but the script runs under set -euo pipefail. Any transient failure (network timeout, GitHub rate limit, 5xx) will cause the script to exit immediately with a non-zero status, indistinguishable from a genuine ejection. The second GraphQL call (dequeue reason) correctly uses 2>/dev/null || true, showing the author is aware of the pattern but did not apply it to the main polling call.
    Remediation: Wrap the polling gh api graphql call in a retry-or-continue pattern (e.g., state="$(gh api graphql ...)" || { echo "API call failed, retrying..." >&2; continue; }) so that transient failures retry on the next poll iteration rather than terminating the script.

Low

  • [missing-doc] skills/merge-queue/SKILL.md:15 — The enqueue-pr.sh script now accepts a POLL_INTERVAL environment variable (default: 30 seconds) to control polling frequency, but this is not documented in the "Enqueue a PR" section. The variable is already documented for await-and-enqueue.sh.

  • [edge-case] skills/merge-queue/scripts/enqueue-pr.sh:62 — The polling loop has no maximum iteration count or overall timeout. If a PR remains in a QUEUED/AWAITING_CHECKS state indefinitely, the script will poll forever. This matches the unbounded loop in await-and-enqueue.sh but is worth noting.

  • [variable-naming] skills/merge-queue/scripts/enqueue-pr.sh:53 — Local variables use leading underscore prefix (_owner, _name, _number) which is not a pattern seen in other scripts in this directory. dequeue-reason.sh and queue-status.sh use plain names (owner, name, number).

  • [behavioral-change] skills/merge-queue/SKILL.md:18 — The blocking behavior affects all callers of enqueue-pr.sh, including await-and-enqueue.sh which delegates via exec (line 95). Two scripts now form a pipeline: await-and-enqueue waits for checks, then enqueue-pr waits for merge outcome.


Labels: PR is a bug fix to skill scripts, adding type/bug to match the fix() commit prefix.

Previous run

Review

Findings

High

  • [protected-path] skills/merge-queue/scripts/enqueue-pr.sh, skills/merge-queue/SKILL.md — Both changed files are under the skills/ protected path, which requires human approval. The PR has no linked issue and the description does not justify why these governance/infrastructure files need to be modified.
    Remediation: Link this PR to an issue that authorizes the change to protected files, or add a detailed explanation in the PR description for why the protected path change is necessary.

Medium

  • [missing-authorization] skills/merge-queue/scripts/enqueue-pr.sh — This PR adds 82 lines of non-trivial polling logic without a linked issue. Non-trivial changes require explicit authorization through a linked issue.
    Remediation: Link this PR to an issue that authorizes the behavioral change from fire-and-forget enqueue to blocking poll-until-merged.

  • [error-handling] skills/merge-queue/scripts/enqueue-pr.sh:68 — The polling GraphQL call (gh api graphql) has no error handling, but the script runs under set -euo pipefail. Any transient failure (network timeout, GitHub rate limit, 5xx) will cause the script to exit immediately with a non-zero status, indistinguishable from a genuine ejection. The second GraphQL call (dequeue reason) correctly uses 2>/dev/null || true, showing the author is aware of the pattern but did not apply it to the main polling call.
    Remediation: Wrap the polling gh api graphql call in a retry-or-continue pattern (e.g., state="$(gh api graphql ...)" || { echo "API call failed, retrying..." >&2; continue; }) so that transient failures retry on the next poll iteration rather than terminating the script.

Low

  • [scope-coherence] skills/merge-queue/scripts/enqueue-pr.sh — The change alters the script's behavior from fire-and-forget to blocking. await-and-enqueue.sh delegates to enqueue-pr.sh via exec (line 95), so the two scripts form a pipeline: await-and-enqueue waits for checks, then enqueue-pr enqueues and waits for merge outcome. The scripts serve complementary roles, but the behavioral shift is worth noting.

  • [naming-alignment] skills/merge-queue/scripts/enqueue-pr.sh — The script name enqueue-pr.sh suggests a single action (add to queue), but the implementation now performs two distinct operations (enqueue + poll until outcome). The SKILL.md documentation is updated to describe the new behavior, but the name no longer fully reflects the script's contract.

  • [variable-naming] skills/merge-queue/scripts/enqueue-pr.sh:53 — Local variables use leading underscore prefix (_owner, _name, _number) which is not a pattern seen in other scripts in this directory. dequeue-reason.sh and queue-status.sh use plain names (owner, name, number).

  • [edge-case] skills/merge-queue/scripts/enqueue-pr.sh:62 — The polling loop has no maximum iteration count or overall timeout. If a PR remains in a QUEUED/AWAITING_CHECKS state indefinitely, the script will poll forever. This matches the unbounded loop in await-and-enqueue.sh but is worth noting.


Labels: PR modifies skill scripts and documentation under skills/merge-queue/.

@fullsend-ai-review fullsend-ai-review Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the review comment for full details.

sleep "$POLL_INTERVAL"

state="$(gh api graphql -f query='
query($owner: String!, $name: String!, $number: Int!) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[medium] error-handling

The polling GraphQL call (gh api graphql) has no error handling, but the script runs under set -euo pipefail. Any transient failure (network timeout, GitHub rate limit, 5xx) will cause the script to exit immediately with a non-zero status, indistinguishable from a genuine ejection. The second GraphQL call (dequeue reason) correctly uses 2>/dev/null || true, showing the author is aware of the pattern but did not apply it to the main polling call.

Suggested fix: Wrap the polling gh api graphql call in a retry-or-continue pattern so transient failures retry on the next poll iteration rather than terminating the script.


POLL_INTERVAL="${POLL_INTERVAL:-30}"

# Resolve owner/repo and number from the URL for GraphQL queries

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] variable-naming

Local variables use leading underscore prefix (_owner, _name, _number) which is not a pattern seen in other scripts in this directory. dequeue-reason.sh and queue-status.sh use plain names (owner, name, number).

echo "ERROR: could not parse PR URL: $pr_url" >&2
exit 1
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] edge-case

The polling loop has no maximum iteration count or overall timeout. If a PR remains queued indefinitely, the script will poll forever. This matches the unbounded loop in await-and-enqueue.sh but is worth noting.

@fullsend-ai-review fullsend-ai-review Bot added the component/harness Agent harness, config, and skills loading label Jun 22, 2026
The script was aborting on any failed check (e.g. codecov/patch) even
when it wasn't a required check. Now only failures and pending status
of required checks (commit-lint, e2e, test) block enqueue.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 6:50 PM UTC · Completed 7:01 PM UTC
Commit: 75279d9 · View workflow run →

@fullsend-ai-review fullsend-ai-review Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the review comment for full details.

[$map | to_entries[] | select(.value | test("FAILURE|ERROR|CANCELLED|TIMED_OUT|STARTUP_FAILURE|ACTION_REQUIRED")) | .key + " (" + .value + ")"] as $failures |
# Check for pending
[$map | to_entries[] | select(.value | test("SUCCESS|NEUTRAL|SKIPPED|COMPLETED|FAILURE|ERROR|CANCELLED|TIMED_OUT|STARTUP_FAILURE|ACTION_REQUIRED") | not) | .key] as $pending |
# Check for failures (only required checks matter)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[high] logic-error

When $required_json is [] (either because no required checks are configured or because the rules API call on line 29 fails and falls back to echo '[]'), the new $required | index($k) filter causes both $failures and $pending to be empty arrays unconditionally. Combined with $missing also being empty (iterating an empty $required), the script will skip all check validation and immediately proceed to enqueue the PR. This is a silent safety regression: a transient API failure now causes the script to enqueue a PR with failing checks.

Suggested fix: When $required is empty, fall back to evaluating all checks. Change the jq filter to: select(($required | length == 0) or (.key as $k | $required | index($k))) for both failures and pending arrays. Alternatively, treat an empty $required as an error condition and abort.

sleep "$POLL_INTERVAL"

state="$(gh api graphql -f query='
query($owner: String!, $name: String!, $number: Int!) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[medium] error-handling

The polling GraphQL call (gh api graphql) has no error handling, but the script runs under set -euo pipefail. Any transient failure will cause the script to exit immediately with a non-zero status, indistinguishable from a genuine ejection.

Suggested fix: Wrap the polling gh api graphql call in a retry-or-continue pattern so that transient failures retry on the next poll iteration rather than terminating the script.

@@ -15,6 +15,10 @@ allowed-tools: Bash(bash skills/merge-queue/scripts/*:*)
Run `bash skills/merge-queue/scripts/enqueue-pr.sh [PR_NUMBER_OR_URL]` to enqueue a PR.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] missing-doc

The enqueue-pr.sh script now accepts a POLL_INTERVAL environment variable (default: 30 seconds) but this is not documented in the Enqueue a PR section.

echo "ERROR: could not parse PR URL: $pr_url" >&2
exit 1
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] edge-case

The polling loop has no maximum iteration count or overall timeout. If a PR remains in a QUEUED/AWAITING_CHECKS state indefinitely, the script will poll forever.


POLL_INTERVAL="${POLL_INTERVAL:-30}"

# Resolve owner/repo and number from the URL for GraphQL queries

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] variable-naming

Local variables use leading underscore prefix (_owner, _name, _number) which is not a pattern seen in other scripts in this directory. dequeue-reason.sh and queue-status.sh use plain names.

Run `bash skills/merge-queue/scripts/enqueue-pr.sh [PR_NUMBER_OR_URL]` to enqueue a PR.
Omit the argument to enqueue the current branch's PR.

The script enqueues the PR and then **polls until the PR merges or is ejected

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] behavioral-change

The blocking behavior affects all callers of enqueue-pr.sh, including await-and-enqueue.sh which delegates via exec (line 95).

@fullsend-ai-review fullsend-ai-review Bot added the type/bug Confirmed defect in existing behavior label Jun 22, 2026
@ralphbean ralphbean closed this Jul 2, 2026
@fullsend-ai-retro

fullsend-ai-retro Bot commented Jul 2, 2026

Copy link
Copy Markdown

🤖 Finished Retro · ✅ Success · Started 3:58 PM UTC · Completed 4:03 PM UTC
Commit: 75279d9 · View workflow run →

@fullsend-ai-retro

Copy link
Copy Markdown

Retro: PR #2516fix(skills): enqueue script polls until merge or ejection

Outcome: Draft PR opened June 22, closed without merging July 2. Two review agent runs, zero human reviews.

Key findings

  1. Token waste on draft PR reviews — The review agent ran twice (Opus model) on a draft PR that was never merged. The dispatch routing logic (opened|synchronize|ready_for_review) does not filter out draft PRs, so every draft PR opened event triggers a review. This is by-design oversight: ADR 0002 lists ready_for_review as a trigger type but doesn't explicitly exclude opened for drafts.

  2. Review agent quality was strong — The agent caught a genuine HIGH-severity logic error: when $required_json is empty (e.g., transient API failure), the required-check filter makes both $failures and $pending empty, silently enqueuing a PR with failing checks. This is a real safety regression catch. The MEDIUM error-handling finding about unguarded GraphQL calls under set -euo pipefail was also practical.

  3. No human review delta — No human reviewed this PR, so autonomy-readiness comparison is not possible for this PR.

Proposals filed

  1. Skip review agent on draft PRs — Add github.event.pull_request.draft == false guard to dispatch routing
  2. Add shell tests for merge-queue skill scripts — The logic error the agent caught would be preventable by tests

Proposals filed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/harness Agent harness, config, and skills loading type/bug Confirmed defect in existing behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant