Skip to content

Exception deduplication#374

Closed
Copilot wants to merge 4 commits intomasterfrom
copilot/fix-suppress-workflow-logs
Closed

Exception deduplication#374
Copilot wants to merge 4 commits intomasterfrom
copilot/fix-suppress-workflow-logs

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 3, 2026

  • Reproduce and inspect the fresh-worker context crash and child-workflow regression from PR feedback
  • Remove the probe changes from Workflow.php so exception dedupe no longer depends on static context initialization
  • Move replay-probe state out of workflow context and only record the first missing replay frontier
  • Preserve child-workflow stale-sibling suppression while keeping consecutive caught failures and signal-advanced failures covered
  • Replace the stale-sibling unit regression with the child-workflow shape and extend child-saga feature assertions to check compensation stays on the parent workflow
  • Run targeted validation with composer ecs and vendor/bin/phpunit tests/Unit/ExceptionTest.php
  • Finish validating feature-suite behavior in this runner and reply on the PR comment

Copilot AI changed the title [WIP] Fix suppression of Workflow\Exception replay logs in signal-retry workflows Narrow Workflow\Exception replay-log deduplication to preserve later retry-stage failures Apr 3, 2026
Copilot AI requested a review from rmcdaniel April 3, 2026 06:00
Copy link
Copy Markdown
Member

@rmcdaniel rmcdaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is closer, but I still found one real regression in the dedupe heuristic.

src/Exception.php:56 now skips writing whenever the nearest prior log is also Workflow\\Exception. That fixes the signal-retry shape, but it still suppresses a legitimate second exception if a workflow catches one failure and immediately yields another failing activity with no signal/timer/success log in between.

I verified this with the real sample app and real queue workers:

  • 1.0.69: a workflow that catches failure A and then immediately catches failure B completes, with Workflow\\Exception logs at indexes 0 and 1.
  • 1.0.70: the same workflow gets stuck after the first exception.
  • This PR: the same workflow still gets stuck after the first exception, because the second real failure is treated like a stale sibling duplicate.

So I think the current previousLogIsException() check is still too broad. The remaining distinction is not just "previous log was an exception"; it is more like "this exception belongs to a stale parallel branch the workflow has not advanced past".

I’d also want a regression test for this consecutive-caught-failures shape in addition to the new signal test, because tests/Unit/ExceptionTest.php currently only proves the signal-advance case.

@rmcdaniel rmcdaniel changed the title Narrow Workflow\Exception replay-log deduplication to preserve later retry-stage failures Exception deduplication Apr 3, 2026
Copilot AI requested a review from rmcdaniel April 3, 2026 06:47
Copy link
Copy Markdown
Member

@rmcdaniel rmcdaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues still look blocking on the latest probe version.

  1. src/Workflow.php now calls WorkflowStub::getContext() unconditionally from setContext(). On a fresh worker process there is no existing static context yet, so the first workflow job can fail before any exception-dedupe logic runs. I reproduced this in the real sample app on the exact PR branch: the first ExceptionLoggingRetryWorkflow execution failed immediately with TypeError: Workflow\WorkflowStub::getContext(): Return value must be of type stdClass, null returned, and the workflow moved straight to failed with no replay logs.

  2. After locally null-guarding that crash just to continue the experiment, the child-workflow case is still not preserved. The real child-failure saga in the sample app leaves the parent workflow in waiting with only logs at indexes 0 and 3, no parent compensation log, and the compensation activity log ending up under child workflow 62 instead of parent 61. So the updated probe fixes the signal-retry and consecutive-caught-activity cases, but it still regresses the original child-workflow behavior that #348 was fixing.

I think the new unit coverage is directionally good, but it still misses the live child-workflow shape from #347/#348: tests/Unit/ExceptionTest.php exercises parallel activities, not parallel child workflows.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 3, 2026

Two issues still look blocking on the latest probe version.

  1. src/Workflow.php now calls WorkflowStub::getContext() unconditionally from setContext(). On a fresh worker process there is no exi...

Addressed in 5c97e9c. I removed the Workflow.php context helper that could crash fresh workers, moved the replay probe into dedicated WorkflowStub state so it no longer depends on static context initialization, and changed the probe to lock onto the first missing replay frontier so later parallel branches can’t overwrite it. I also swapped the stale-sibling regression to the child-workflow shape and extended the child saga coverage to assert compensation stays on the parent workflow. No UI changes.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • db
    • Triggering command: /usr/lib/postgresql/16/bin/psql /usr/lib/postgresql/16/bin/psql -h db -U laravel -d laravel -c SELECT id,class,status,created_at FROM workflows ORDER BY id DESC LIMIT 10; 787e.pid p/bin/stty 653/f375e72b1924/tmp/runc-process3382530669 a0ab�� y runtime-runc/mob/run/containerd/io.containerd.runtime.v2.task/moby/24a4a425bc1a045a7229a23f61f8d--norc ntime.v2.task/mo24a4a425bc1a045a7229a23f61f8da4be3c88a4122dd60c8270bf39a9629c47b io.containerd.ru/bin/sh 117a7b48153f8a02-c (dns block)
    • Triggering command: /usr/lib/postgresql/16/bin/psql /usr/lib/postgresql/16/bin/psql -h db -U laravel -d laravel -c SELECT stored_workflow_id,index,class FROM workflow_logs ORDER BY id DESC LIMIT 30; ntime.v2.task/mo/usr/bin/runc f868a51dc77206a3--root a0ab8b7b963cdc10/var/run/docker/runtime-runc/moby /usr�� --root runtime-runc/mob--log-format ready io.containerd.ru/usr/bin/runc.original 117a7b48153f8a02init (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@rmcdaniel rmcdaniel closed this Apr 3, 2026
Copilot AI requested a review from rmcdaniel April 3, 2026 07:23
Copilot stopped work on behalf of rmcdaniel due to an error April 3, 2026 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1.0.70 suppresses later Workflow\Exception replay logs in signal-retry workflows

2 participants