1.0.70 suppresses later Workflow\Exception replay logs in signal-retry workflows #373
Summary
After upgrading from 1.0.69 to 1.0.70, later handled activity failures in the same workflow lifecycle are no longer recorded in workflow_logs as Workflow\Exception rows.
I reproduced this in the durable-workflow/sample-app project with a real queue worker and a Redis queue, not with feature tests.
This looks like the same underlying problem described in discussion #372.
What I tested
I used the sample app with a real worker:
php -d opcache.enable_cli=0 artisan queue:work --sleep=1 --tries=1 --timeout=0 -v
php -d opcache.enable_cli=0 artisan app:exception-logging-repro --timeout=90
The repro workflow shape is:
class ExceptionLoggingRetryActivity extends Activity
{
    public $tries = 1;

    public function execute(string $step): string
    {
        return match ($step) {
            'first' => throw new RuntimeException('first failure from activity'),
            'second' => throw new InvalidArgumentException('second failure from activity'),
            default => "success on {$step}",
        };
    }
}

class ExceptionLoggingRetryWorkflow extends Workflow
{
    protected int $retryRequests = 0;

    #[SignalMethod]
    public function requestRetry(): void
    {
        $this->retryRequests++;
    }

    public function execute(): Generator
    {
        $caught = [];
        $stage = 0;
        while (true) {
            try {
                $result = yield activity(
                    ExceptionLoggingRetryActivity::class,
                    match ($stage) {
                        0 => 'first',
                        1 => 'second',
                        default => 'success',
                    }
                );
                return [
                    'caught' => $caught,
                    'result' => $result,
                ];
            } catch (Throwable $throwable) {
                $caught[] = get_class($throwable).': '.$throwable->getMessage();
                $requiredRetries = $stage + 1;
                yield await(fn () => $this->retryRequests >= $requiredRetries);
                $stage++;
            }
        }
    }
}
The command just starts the workflow, waits 3 seconds, sends requestRetry(), waits another 3 seconds, sends requestRetry() again, then waits for completion.
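For reference, the intended control flow can be modelled in plain PHP without the workflow engine. This is only a sketch: runStep() and runWorkflow() are hypothetical names, durability and replay are ignored, and the await is approximated by consuming one pending retry signal per caught failure.

```php
<?php
declare(strict_types=1);

// Same branching as ExceptionLoggingRetryActivity::execute() in the repro.
function runStep(string $step): string
{
    return match ($step) {
        'first' => throw new RuntimeException('first failure from activity'),
        'second' => throw new InvalidArgumentException('second failure from activity'),
        default => "success on {$step}",
    };
}

// Hypothetical plain-loop stand-in for the workflow body.
// $signalsAvailable is how many requestRetry() signals will ever arrive.
// Returns null when the workflow would still be waiting for a signal.
function runWorkflow(int $signalsAvailable): ?array
{
    $caught = [];
    $stage = 0;
    $retryRequests = 0;

    while (true) {
        try {
            $result = runStep(match ($stage) {
                0 => 'first',
                1 => 'second',
                default => 'success',
            });

            return ['caught' => $caught, 'result' => $result];
        } catch (Throwable $throwable) {
            $caught[] = get_class($throwable).': '.$throwable->getMessage();
            $requiredRetries = $stage + 1;

            // Stand-in for the await: consume signals until the condition holds.
            while ($retryRequests < $requiredRetries) {
                if ($signalsAvailable === 0) {
                    return null; // no more signals: the workflow stays waiting
                }
                $signalsAvailable--;
                $retryRequests++;
            }
            $stage++;
        }
    }
}

$outcome = runWorkflow(2);
```

With two retry signals the model catches both failures and then completes, which is the shape of the successful 1.0.69 run described below.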
Expected behavior
The second handled failure happens at a new workflow index, so it should produce a second Workflow\Exception row in workflow_logs, just like 1.0.69 does.
Actual behavior
On 1.0.69
The workflow completes successfully.
workflow_logs for the run:
[0] Workflow\\Exception
[1] Workflow\\Signal
[2] Workflow\\Exception
[3] Workflow\\Signal
[4] App\\Workflows\\Repro\\ExceptionLoggingRetryActivity
workflow_exceptions for the run:
RuntimeException: first failure from activity
InvalidArgumentException: second failure from activity
On 1.0.70
The workflow gets stuck in WorkflowWaitingStatus.
workflow_logs for the run:
[0] Workflow\\Exception
[1] Workflow\\Signal
workflow_exceptions for the run:
RuntimeException: first failure from activity
InvalidArgumentException: second failure from activity
InvalidArgumentException: second failure from activity
The worker output shows the later Workflow\Exception jobs being dispatched, but they do not create new replay-log rows. Because index 2 never gets a Workflow\Exception row, the workflow keeps replaying the same second failing activity on later retry signals.
Why this seems to happen
src/Exception.php in 1.0.70 now does:
if ($this->storedWorkflow->hasLogByIndex($this->index)) {
    $workflow->resume();
} elseif (! $this->storedWorkflow->logs()->where('class', self::class)->exists()) {
    $workflow->next($this->index, $this->now, self::class, $this->exception);
}
That global exists() check looks fine for suppressing stale sibling exception logs in a parallel fan-out, but it also suppresses later legitimate exceptions at new indexes in the same workflow lifecycle.
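The effect of the guard can be illustrated with a small standalone model. This is not the library's code: handleException() and replay() are hypothetical names, $logs stands in for the workflow_logs rows keyed by index, and the "no global check" branch is my assumption about how the pre-1.0.70 behavior differed.

```php
<?php
declare(strict_types=1);

const EXCEPTION_CLASS = 'Workflow\\Exception';
const SIGNAL_CLASS = 'Workflow\\Signal';

// Simplified stand-in for the guard quoted above.
// $logs maps a workflow index to the class stored in that log row.
function handleException(array $logs, int $index, bool $globalCheck): array
{
    if (array_key_exists($index, $logs)) {
        return $logs; // a row already exists at this index: resume only
    }

    if ($globalCheck && in_array(EXCEPTION_CLASS, $logs, true)) {
        return $logs; // an exception row exists at ANY index: nothing is written
    }

    $logs[$index] = EXCEPTION_CLASS; // record the exception at its own index

    return $logs;
}

// Replays the first three log positions of the repro:
// failure at index 0, signal at index 1, failure at index 2.
function replay(bool $globalCheck): array
{
    $logs = [];
    $logs = handleException($logs, 0, $globalCheck);
    $logs[1] = SIGNAL_CLASS;
    $logs = handleException($logs, 2, $globalCheck);

    return $logs;
}

$withGlobalCheck = replay(true);
$withoutGlobalCheck = replay(false);
```

In this model, $withGlobalCheck ends with rows at indexes 0 and 1 only, matching the 1.0.70 workflow_logs above, while $withoutGlobalCheck also gets the Workflow\Exception row at index 2.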
Why I think this is a real bug
workflow_exceptions keeps growing, so the later activity failures are definitely happening.
What stops working is the replay log in workflow_logs, which means the workflow cannot deterministically advance past the later failed stage.
So this is not just a visibility/logging issue. In signal-driven manual retry flows, it changes behavior and can leave the workflow stuck replaying the same failing stage.
Related context
- Discussion: Exceptions only being saved once on logs table after version 1.0.70 #372
- PR that appears related: Child exceptions #348