fix(hooks): eliminate macOS PTY output race condition by raballew · Pull Request #837 · jumpstarter-dev/jumpstarter

raballew · 2026-06-24T09:36:51Z

Summary

Fixes the root cause of flaky PTY output tests on macOS (#821, previously addressed as symptoms in #560, #733, #826).

The bug has two compounding races:

Event-loop scheduling starvation: The reader task checks while not pty_state.reader_stop: at loop entry. For fast-completing subprocesses, wait_for_process can set reader_stop = True before the reader ever gets scheduled, causing it to skip all reads entirely.
macOS PTY revocation: start_new_session=True makes the subprocess a session leader. On macOS (BSD), when a session leader exits, revoke(2) is called on the terminal device, which can discard buffered data -- making the drain (the only remaining capture path after Race 1) unreliable.

Changes

start_new_session=True -> process_group=0: Creates a new process group without a new session. No session leader means no PTY revocation on macOS. The code never uses session-based operations (os.killpg, session signals) -- only process.terminate()/process.kill() -- so nothing else changes.
Reader loop restructured to read-before-stop: Changed from while not reader_stop (can skip the entire loop) to while True with reader_stop checked only inside the except BlockingIOError handler. The reader always attempts os.read() before considering the stop flag.
Removed DRAIN_MAX_EMPTY_POLLS retry logic: With the root causes fixed, the drain retry mechanism from fix: retry PTY drain on empty select() to prevent data loss on macOS #826 is unnecessary. Reverted the drain to a simple single-pass safety net.

Test plan

All 53 hooks tests pass locally on Linux (Python 3.12)
Linter passes (ruff check)
macOS CI passes consistently (the previously flaky tests)

Closes #821

🤖 Generated with Claude Code

coderabbitai · 2026-06-24T09:37:05Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Updates PTY hook execution and drain handling, adjusts the PTY tests to match the new behavior, and expands the pull request Python test matrix.

Changes

PTY Drain Logic Simplification

Layer / File(s)	Summary
PTY reader and drain behavior `python/packages/jumpstarter/jumpstarter/exporter/hooks.py`	Changes subprocess spawn to `process_group=0`, refactors the PTY reader loop to handle `reader_stop` during non-blocking reads, and simplifies final drain termination to use `select()` timeout with byte/time bounds.

PTY Drain Test Updates

Layer / File(s)	Summary
Drain tests and PTY mock behavior `python/packages/jumpstarter/jumpstarter/exporter/hooks_test.py`	Removes the `DRAIN_MAX_EMPTY_POLLS` import, updates PTY mock EOF handling, and replaces drain retry tests with a stop-flag test and constant checks.

Python Test Matrix Update

Layer / File(s)	Summary
PR matrix outputs `.github/workflows/python-tests.yaml`	The `pull_request` matrix now outputs Python 3.11, 3.12, and 3.13 on `ubuntu-24.04` and `macos-15`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

jumpstarter-dev/jumpstarter#561: Modifies the same PTY reading and draining code path in hooks.py and related tests.
jumpstarter-dev/jumpstarter#826: Also changes the PTY drain constants and the associated drain tests.
jumpstarter-dev/jumpstarter#830: Also updates .github/workflows/python-tests.yaml for the pull request test matrix.

Suggested reviewers

bennyz
kirkbrauer
evakhoni

Poem

🐇 I hopped through PTYs, light and spry,
With bytes and time to catch each cry.
New tests now tumble, clear and neat,
While runners stretch across the fleet.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is concise and accurately summarizes the macOS PTY race fix in hooks.
Description check	✅ Passed	The description clearly matches the PTY race fix and test/workflow changes in the PR.
Linked Issues check	✅ Passed	The changes address the reported macOS PTY race by fixing reader timing, avoiding session-leader revocation, and updating tests/CI.
Out of Scope Changes check	✅ Passed	The workflow and test updates are directly tied to validating the macOS PTY fix, with no obvious unrelated changes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/python-tests.yaml:
- Around line 44-45: Update the stale matrix comments in the workflow so they
match the current behavior. The comments near the matrix setup and downstream
job reference only Python 3.12 on Linux, but the actual matrix in the workflow
now includes Python 3.11, 3.12, and 3.13 on both ubuntu-24.04 and macos-15.
Adjust the comments in the relevant workflow sections to accurately describe the
expanded PR test coverage, using the matrix/output symbols already present in
the job definition.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c151c11a-1d80-4cf7-920c-31f30c29e97b

📥 Commits

Reviewing files that changed from the base of the PR and between 90695e1 and c2ed2a6.

📒 Files selected for processing (1)

.github/workflows/python-tests.yaml

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/python-tests.yaml:
- Around line 56-60: The matrix job artifact upload is using the same name for
every run, which causes conflicts when multiple `run-index` executions upload in
parallel. Update the workflow job that uses the `matrix` entries for
`python-version` and `run-index` so the `actions/upload-artifact` name includes
`run-index` (and any other matrix identifier needed) to make each artifact name
unique. Use the existing `run-index` matrix symbol to parameterize the artifact
name in the upload step.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0c1a6b71-1179-478b-942a-4db5a8d36733

📥 Commits

Reviewing files that changed from the base of the PR and between 3a84cd7 and 2b33f9a.

📒 Files selected for processing (2)

.github/workflows/python-tests.yaml
python/packages/jumpstarter/jumpstarter/exporter/hooks.py

🚧 Files skipped from review as they are similar to previous changes (1)

python/packages/jumpstarter/jumpstarter/exporter/hooks.py

coderabbitai · 2026-06-24T12:05:09Z

+      fail-fast: false
      matrix:
        runs-on: ${{ fromJson(needs.changes.outputs.runners) }}
-        # Floor: oldest Python in supported platforms (RHEL 9 appstream)
-        # Ceiling: newest Python in latest Fedora
-        # Review on each RHEL/Fedora release
-        # PRs run only 3.12 on Linux; merge queue runs all versions
-        # on both Linux and macOS.
        python-version: ${{ fromJson(needs.changes.outputs.python-versions) }}
+        run-index: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]


🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

sed -n '1,220p' .github/workflows/python-tests.yaml

Repository: jumpstarter-dev/jumpstarter

Length of output: 7886

🌐 Web query:

actions/upload-artifact repeated artifact name same workflow run matrix jobs behavior v7

💡 Result:

In GitHub Actions, using the same artifact name across multiple jobs in a matrix (or within the same workflow run) is not supported in actions/upload-artifact version 4 and later (including v7) [1][2][3]. Key points regarding this behavior: - Immutability and Conflict Errors: Unlike earlier versions (e.g., v3), which allowed multiple uploads to overwrite or merge into a single artifact, v4+ artifacts are immutable [1][2][3]. Attempting to upload multiple artifacts with the same name during a single workflow run will trigger a conflict error (HTTP 409) [4][2][3]. - Recommended Solution: You must ensure that each artifact name is unique within the workflow run [1][2][3]. A common practice is to suffix or prefix the artifact name with matrix variables, such as name: artifact-${{ matrix.os }} [1][4][5]. - Downloading Merged Artifacts: To reconstruct a single collection of files from these uniquely named artifacts, use actions/download-artifact with the pattern and merge-multiple inputs [1][4][6]. Setting merge-multiple: true allows you to download all artifacts matching a specific pattern (e.g., pattern: artifact-*) into a single directory [4][6]. This change was introduced to prevent accidental data corruption and undefined behavior caused by concurrent, conflicting writes to the same artifact storage [1][2][3].

Citations:

1: https://github.blog/news-insights/product-news/get-started-with-v4-of-github-actions-artifacts/

2: https://raw.githubusercontent.com/actions/upload-artifact/main/README.md

3: https://github.com/marketplace/actions/upload-a-build-artifact

4: [bug] (v4) Unable to upload to same artifact name from multiple jobs actions/upload-artifact#478

5: Uploading to the same named Artifact multiple times -> Why the breaking change? actions/upload-artifact#469

6: https://github.com/actions/upload-artifact/blob/main/docs/MIGRATION.md

Include run-index in the artifact name. Each matrix job currently uploads to the same actions/upload-artifact name, which conflicts in v4/v7 and fails the later uploads.

Suggested fix

- name: Upload test logs if: always() uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7 with: - name: test-logs-${{ matrix.runs-on }}-py${{ matrix.python-version }} + name: test-logs-${{ matrix.runs-on }}-py${{ matrix.python-version }}-run${{ matrix.run-index }} path: ${{ runner.temp }}/test-logs/ if-no-files-found: ignore retention-days: 7

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/python-tests.yaml around lines 56 - 60, The matrix job artifact upload is using the same name for every run, which causes conflicts when multiple `run-index` executions upload in parallel. Update the workflow job that uses the `matrix` entries for `python-version` and `run-index` so the `actions/upload-artifact` name includes `run-index` (and any other matrix identifier needed) to make each artifact name unique. Use the existing `run-index` matrix symbol to parameterize the artifact name in the upload step.

mangelajo · 2026-06-24T16:34:27Z

+            echo 'python-versions=["3.11", "3.12", "3.13"]' >> "$GITHUB_OUTPUT"
+            echo 'runners=["macos-15"]' >> "$GITHUB_OUTPUT"


Suggested change

echo 'python-versions=["3.11", "3.12", "3.13"]' >> "$GITHUB_OUTPUT"

echo 'runners=["macos-15"]' >> "$GITHUB_OUTPUT"

echo 'python-versions=["3.12"]' >> "$GITHUB_OUTPUT"

echo 'runners=["ubuntu-24.04"]' >> "$GITHUB_OUTPUT"

let's just put this back

mangelajo · 2026-06-24T16:35:19Z

    if: needs.changes.outputs.should_run == 'true' || github.event_name == 'workflow_dispatch'
    runs-on: ${{ matrix.runs-on }}
    strategy:
+      fail-fast: false


let's keep fail fast as this... it's very annoying to not get to know if the other variants work or not when one fails.

mangelajo · 2026-06-24T16:35:28Z

-        # PRs run only 3.12 on Linux; merge queue runs all versions
-        # on both Linux and macOS.
        python-version: ${{ fromJson(needs.changes.outputs.python-versions) }}
+        run-index: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]


Suggested change

run-index: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

mangelajo · 2026-06-24T16:37:58Z

+    async def test_reader_grace_constant_is_reasonable(self) -> None:
+        assert READER_GRACE_SECONDS == 2.0


Suggested change

async def test_reader_grace_constant_is_reasonable(self) -> None:

assert READER_GRACE_SECONDS == 2.0

mangelajo

looks good, much simpler. let's just remove the remains of forced retesting

mangelajo

looks good only those changes needed

mangelajo · 2026-06-25T08:56:54Z

oh no, it's really introducing regressions in E2E/linux for the hooks

https://github.com/jumpstarter-dev/jumpstarter/actions/runs/28114462001/job/83374717893?pr=837#step:10:1194

Hooks E2E Tests Group H: PR Regression Tests H2: beforeLease fail+exit does NOT run afterLease hook [hooks]
/home/runner/work/jumpstarter/jumpstarter/e2e/test/hooks_test.go:410
  > Enter [It] H2: beforeLease fail+exit does NOT run afterLease hook @ 06/25/26 07:33:35.26
  Started exporter test-exporter-hooks (PID 23141)
  [TIMEDOUT] in [It] - /home/runner/work/jumpstarter/jumpstarter/e2e/test/hooks_test.go:410 @ 06/25/26 07:37:06.917
  < Exit [It] H2: beforeLease fail+exit does NOT run afterLease hook @ 06/25/26 07:37:06.918 (3m31.658s)
  > Enter [AfterEach] Hooks E2E Tests @ 06/25/26 07:37:06.918
  < Exit [AfterEach] Hooks E2E Tests @ 06/25/26 07:37:06.918 (0s)
  > Enter [AfterAll] Hooks E2E Tests @ 06/25/26 07:37:06.918
  < Exit [AfterAll] Hooks E2E Tests @ 06/25/26 07:37:08.649 (1.732s)
• [TIMEDOUT] [213.389 seconds]
Hooks E2E Tests Group H: PR Regression Tests [It] H2: beforeLease fail+exit does NOT run afterLease hook [hooks]
/home/runner/work/jumpstarter/jumpstarter/e2e/test/hooks_test.go:410

  [TIMEDOUT] A suite timeout occurred
  In [It] at: /home/runner/work/jumpstarter/jumpstarter/e2e/test/hooks_test.go:410 @ 06/25/26 07:37:06.917

  Full Stack Trace

  This is the Progress Report generated when the suite timeout occurred:
    Hooks E2E Tests Group H: PR Regression Tests H2: beforeLease fail+exit does NOT run afterLease hook (Spec Runtime: 3m31.657s)
      /home/runner/work/jumpstarter/jumpstarter/e2e/test/hooks_test.go:410
      In [It] (Node Runtime: 3m31.657s)
        /home/runner/work/jumpstarter/jumpstarter/e2e/test/hooks_test.go:410

      Spec Goroutine
      goroutine 833 [syscall, 3 minutes]
        syscall.Syscall6(0xf7, 0x3, 0xc, 0x161486d7fbb0, 0x4, 0x161486dc8240, 0x0)
          /opt/hostedtoolcache/go/1.26.3/x64/src/syscall/syscall_linux.go:96
        internal/syscall/unix.Waitid(0x161486d7fd00?, 0x56976d?, 0x161486dee340?, 0x1?, 0x161486b74510?)
          /opt/hostedtoolcache/go/1.26.3/x64/src/internal/syscall/unix/waitid_linux.go:18
        os.(*Process).pidfdWait.func1(...)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/pidfd_linux.go:108
        os.ignoringEINTR(...)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/file_posix.go:256
        os.(*Process).pidfdWait(0x161486d9c400)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/pidfd_linux.go:107
        os.(*Process).wait(0x569a40?)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/exec_unix.go:25
        os.(*Process).Wait(...)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/exec.go:346
        os/exec.(*Cmd).Wait(0x161486c28000)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/exec/exec.go:930
        os/exec.(*Cmd).Run(0x161486c28000)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/exec/exec.go:632
      > github.com/jumpstarter-dev/jumpstarter/e2e/test.RunCmd({0xa5bf5a?, 0xa81ccf?}, {0x161486d7fe08?, 0x161486aa4520?, 0x161486d21450?})
          /home/runner/work/jumpstarter/jumpstarter/e2e/test/utils.go:156
            | 	cmd.Stdout = &out
            | 	cmd.Stderr = &out
            > 	err := cmd.Run()
            | 	return strings.TrimSpace(out.String()), err
            | }
      > github.com/jumpstarter-dev/jumpstarter/e2e/test.Jmp({0x161486d7fe08, 0x8, 0x8})
          /home/runner/work/jumpstarter/jumpstarter/e2e/test/utils.go:213
            | // Jmp runs a jmp CLI command and returns the output.
            | func Jmp(args ...string) (string, error) {
            > 	return RunCmd(JmpPath(), args...)
            | }
            | 
      > github.com/jumpstarter-dev/jumpstarter/e2e/test.init.func6.14.2()
          /home/runner/work/jumpstarter/jumpstarter/e2e/test/hooks_test.go:413
            | startHooksExporterSingle("exporter-hooks-before-fail-exit-with-after.yaml")
            | 
            > out, err := Jmp("shell", "--client", "test-client-hooks",
            | 	"--selector", "example.com/board=hooks", "j", "power", "on")
            | Expect(err).To(HaveOccurred())
        github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3({0x161486b11f68?, 0x0?})
          /home/runner/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.28.1/internal/node.go:585
        github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
          /home/runner/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.28.1/internal/suite.go:946
        github.com/onsi/ginkgo/v2/internal.(*Suite).runNode in goroutine 7
          /home/runner/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.28.1/internal/suite.go:911

      Goroutines of Interest
      goroutine 853 [syscall, 3 minutes]
        syscall.Syscall6(0xf7, 0x3, 0xb, 0x161486bb3e68, 0x4, 0x161486dc8000, 0x0)
          /opt/hostedtoolcache/go/1.26.3/x64/src/syscall/syscall_linux.go:96
        internal/syscall/unix.Waitid(0x0?, 0x161486bb3e98?, 0x46c3d7?, 0x0?, 0xea4a30?)
          /opt/hostedtoolcache/go/1.26.3/x64/src/internal/syscall/unix/waitid_linux.go:18
        os.(*Process).pidfdWait.func1(...)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/pidfd_linux.go:108
        os.ignoringEINTR(...)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/file_posix.go:256
        os.(*Process).pidfdWait(0x161486d9c140)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/pidfd_linux.go:107
        os.(*Process).wait(0x0?)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/exec_unix.go:25
        os.(*Process).Wait(...)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/exec.go:346
        os/exec.(*Cmd).Wait(0x161486d37a00)
          /opt/hostedtoolcache/go/1.26.3/x64/src/os/exec/exec.go:930
      > github.com/jumpstarter-dev/jumpstarter/e2e/test.(*ProcessTracker).StartExporterSingle.func1()
          /home/runner/work/jumpstarter/jumpstarter/e2e/test/utils.go:422
            | // Reap the child process in the background so it doesn't become a zombie.
            | go func() {
            > 	_ = cmd.Wait()
            | }()
            | 
      > github.com/jumpstarter-dev/jumpstarter/e2e/test.(*ProcessTracker).StartExporterSingle in goroutine 833
          /home/runner/work/jumpstarter/jumpstarter/e2e/test/utils.go:421
            | 
            | // Reap the child process in the background so it doesn't become a zombie.
            > go func() {
            | 	_ = cmd.Wait()
            | }()
------------------------------
SSSSSSSSSS

Summarizing 1 Failure:
  [TIMEDOUT] Hooks E2E Tests Group H: PR Regression Tests [It] H2: beforeLease fail+exit does NOT run afterLease hook [hooks]
  /home/runner/work/jumpstarter/jumpstarter/e2e/test/hooks_test.go:410

Ran 69 of 91 Specs in 1794.185 seconds
FAIL! - Suite Timeout Elapsed -- 68 Passed | 1 Failed | 0 Pending | 22 Skipped

Replace start_new_session=True with process_group=0 to prevent macOS PTY revocation on subprocess exit. Restructure the reader loop to always attempt os.read() before checking the stop flag, preventing event-loop scheduling starvation from skipping all reads. These two changes address the root cause of the flaky macOS PTY tests (jumpstarter-dev#560, jumpstarter-dev#733, jumpstarter-dev#821, jumpstarter-dev#826) rather than the symptoms. The drain retry logic from jumpstarter-dev#826 is no longer needed and is removed. Closes jumpstarter-dev#821 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Run the full test matrix (all Python versions, Linux + macOS) on PRs to confirm the PTY race condition fix passes consistently on macOS. This will be reverted once 20 consecutive macOS passes are confirmed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add test for the grandchild scenario where the PTY slave is held open after the direct child exits. The reader gets BlockingIOError (no data) and exits via the stop flag. This covers lines 357-358 in hooks.py to satisfy the diff-coverage threshold. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The previous test mocked os.read to simulate BlockingIOError, but on Linux the reader gets EOF before BlockingIOError. Use a real backgrounded subprocess (sleep &) to hold the PTY slave open, which forces the reader to exit via the reader_stop + BlockingIOError path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

BlockingIOError means "no data yet" (transient), not EOF. Setting eof_seen on BlockingIOError caused the mock to inject drain data during the main reader loop on macOS, where BlockingIOError can occur before the real EOF arrives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Temporarily run only macOS runners with 20 repetitions via matrix index to confirm the PTY race condition fix is stable. Will be reverted after validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… empty On macOS, PTY internal buffer delivery can lag behind slave closure for very fast commands. The drain must keep polling (bounded by DRAIN_TIMEOUT_SECONDS) rather than breaking on the first empty select(), giving the kernel time to deliver remaining data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace the 0.2s grace period with DRAIN_TIMEOUT_SECONDS (2s) after subprocess exit. This gives the macOS PTY kernel buffer sufficient time to deliver data to the master fd before the reader_stop flag forces the reader to exit. The 0.2s window was the fundamental cause of the remaining failures: on loaded macOS CI runners, the PTY buffer delivery can lag behind slave closure by more than 200ms. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The PTY fix adds a 2s grace period after each hook subprocess exits, giving the macOS PTY kernel buffer time to deliver data. With ~24 hook e2e tests running both beforeLease and afterLease hooks, this adds ~2-3 minutes total. Bump the timeout to accommodate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove the xfail markers added by upstream for the macOS PTY race condition. The fix in this PR (process_group=0 + read-before-stop + drain with continue + extended grace period) addresses the root cause, making these markers unnecessary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

Comment thread .github/workflows/python-tests.yaml Outdated

mangelajo reviewed Jun 24, 2026

View reviewed changes

Comment thread python/packages/jumpstarter/jumpstarter/exporter/hooks.py

Comment thread python/packages/jumpstarter/jumpstarter/exporter/hooks.py

Comment thread .github/workflows/python-tests.yaml Outdated

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

mangelajo mentioned this pull request Jun 24, 2026

test: mark PTY-dependent hooks tests as xfail on macOS (#821) #836

Merged

mangelajo requested changes Jun 24, 2026

View reviewed changes

raballew force-pushed the fix/pty-read-before-stop branch from ca2530d to 2b33f9a Compare June 24, 2026 16:43

mangelajo reviewed Jun 24, 2026

View reviewed changes

mangelajo added this to the 0.9.0 milestone Jun 24, 2026

mangelajo modified the milestones: 0.9.0, 0.10.0 Jun 25, 2026

raballew and others added 11 commits June 25, 2026 16:16

ci: retrigger jumpstarter-dev#2

a0a4890

ci: run macOS pytest 20x to validate PTY fix

4930a56

Temporarily run only macOS runners with 20 repetitions via matrix index to confirm the PTY race condition fix is stable. Will be reverted after validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

raballew force-pushed the fix/pty-read-before-stop branch from 6e9fa74 to e06ac52 Compare June 25, 2026 14:18

		echo 'python-versions=["3.11", "3.12", "3.13"]' >> "$GITHUB_OUTPUT"
		echo 'runners=["macos-15"]' >> "$GITHUB_OUTPUT"

		async def test_reader_grace_constant_is_reasonable(self) -> None:
		assert READER_GRACE_SECONDS == 2.0

Uh oh!

Conversation

raballew commented Jun 24, 2026

Summary

Changes

Test plan

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

mangelajo Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

mangelajo Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

mangelajo Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

mangelajo Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

mangelajo Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

mangelajo left a comment

Choose a reason for hiding this comment

Uh oh!

mangelajo left a comment

Choose a reason for hiding this comment

Uh oh!

mangelajo commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

mangelajo commented Jun 25, 2026 •

edited

Loading