Include transient and speculative WFT events in GetWorkflowExecutionHistoryResponse by spkane31 · Pull Request #9138 · temporalio/temporal

spkane31 · 2026-01-26T18:59:59Z

What changed?

Include transient and speculative WFT events in GetWorkflowExecutionHistoryResponse, unless the request was made by the UI or CLI.

Adds transient_or_speculative_events back to GetMutableStateResponse
Reserves transient_workflow_task in the HistoryContinuation token
Adds validation helpers
Adds query-compare-query for transient events at request start and end

Re-implements #7732

Why?

Fixes "premature end of stream" errors when workers request history after cache eviction while transient or speculative workflow tasks are present. GetWorkflowExecutionHistory now includes transient and speculative WFT events, as PollWorkflowTask already does. Without them, a worker whose cache was evicted while a speculative workflow task was present sees an actual event count that differs from the expected one. #7732 passed transient events through continuation tokens, which could become stale during pagination. This PR instead queries mutable state at both the start and the end of pagination and compares the transient event IDs; if the WFT state changed during pagination, it returns a retryable error.
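
The query-compare-query idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `transientInfo`, `errTransientStateChanged`, `sameTransientEvents`, and `appendIfStable` are hypothetical names, and events are reduced to bare event IDs.

```go
package main

import (
	"errors"
	"fmt"
)

// transientInfo is a hypothetical stand-in for the transient/speculative WFT
// events read from mutable state; only the event IDs matter for the comparison.
type transientInfo struct {
	EventIDs []int64
}

// errTransientStateChanged is a hypothetical retryable error (assumption: the
// real server would return one of its own retryable service errors).
var errTransientStateChanged = errors.New("transient workflow task changed during pagination; retry")

// sameTransientEvents reports whether two snapshots carry the same event IDs
// in the same order.
func sameTransientEvents(a, b transientInfo) bool {
	if len(a.EventIDs) != len(b.EventIDs) {
		return false
	}
	for i := range a.EventIDs {
		if a.EventIDs[i] != b.EventIDs[i] {
			return false
		}
	}
	return true
}

// appendIfStable appends the transient event IDs to the persisted ones only if
// the snapshot taken at the start of pagination matches the one taken at the end.
func appendIfStable(persisted []int64, start, end transientInfo) ([]int64, error) {
	if !sameTransientEvents(start, end) {
		return nil, errTransientStateChanged
	}
	out := append([]int64{}, persisted...)
	return append(out, end.EventIDs...), nil
}

func main() {
	persisted := []int64{1, 2, 3, 4}
	start := transientInfo{EventIDs: []int64{5, 6}}

	// Unchanged WFT state: the transient events are appended.
	ids, err := appendIfStable(persisted, start, transientInfo{EventIDs: []int64{5, 6}})
	fmt.Println(ids, err)

	// WFT state changed between the two reads: the caller is asked to retry.
	_, err = appendIfStable(persisted, start, transientInfo{EventIDs: []int64{7, 8}})
	fmt.Println(err)
}
```

Comparing only event IDs keeps the check cheap; whether IDs alone are sufficient to detect every state change is an assumption of this sketch.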

How did you test it?

Potential risks

Same risks from #7732

spkane31 · 2026-01-29T16:51:31Z

proto/internal/temporal/server/api/historyservice/v1/request_response.proto

    string inherited_build_id = 23;
    repeated temporal.server.api.persistence.v1.VersionedTransition transition_history = 24;
    temporal.api.workflow.v1.WorkflowExecutionVersioningInfo versioning_info = 25;
+    // Transient or speculative workflow task events which are not yet persisted in the history.


For reviewer: this was resurrected from PR #7732

spkane31 · 2026-01-29T16:51:46Z

proto/internal/temporal/server/api/token/v1/message.proto

    bool is_workflow_running = 5;
    bytes persistence_token = 6;
-    temporal.server.api.history.v1.TransientWorkflowTaskInfo transient_workflow_task = 7;
+    reserved 7; // Was: transient_workflow_task - no longer passed through continuation token


For reviewer: resurrected from PR #7732

spkane31 · 2026-01-29T17:42:49Z

tests/advanced_visibility_test.go

  2 WorkflowTaskScheduled
  3 WorkflowTaskStarted
-  4 WorkflowTaskFailed {"Cause":23,"Failure":{"Message":"BadSearchAttributes: search attribute INVALIDKEY is not defined"}}`, historyEvents)
+  4 WorkflowTaskFailed {"Cause":23,"Failure":{"Message":"BadSearchAttributes: search attribute INVALIDKEY is not defined"}}


All of the test files except transient_workflow_task_history_test.go have this added WorkflowTaskScheduled event because speculative/transient events are now included in the response. I let Claude iterate on these tests.

spkane31 · 2026-02-02T19:10:07Z

tests/workflow_task_transient_history_test.go

@@ -0,0 +1,410 @@
+package tests


For reviewer: I used CC to help generate these tests, especially the syntax for pollers, and to validate that the tests exercise the actual premature end of stream issue.

stephanos · 2026-02-04T01:24:47Z

tests/update_workflow_test.go

-  4 WorkflowTaskCompleted
-  5 WorkflowTaskScheduled // Speculative WT2 which was created while completing WT1.
-  6 WorkflowTaskStarted`, task.History)
-				// Message handler rejects update.


AI agents have the unfortunate habit of removing comments; I think we want to keep the comments in this file.

stephanos · 2026-02-04T01:28:57Z

tests/workflow_task_transient_history_test.go

@@ -0,0 +1,410 @@
+package tests


minor thing; but why don't we rename this to workflow_task_transient_history_test.go so it appears next to the other workflow task test file?

stephanos · 2026-02-04T16:50:41Z

service/history/api/get_history_util.go

+					tag.WorkflowRunID(execution.GetRunId()),
+					tag.Error(err))
+				// Don't append events, but don't fail request
+			} else {


Interesting! So before we actually appended the events even if validation failed. (just found that curious)
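
For illustration, the "skip the events but don't fail the request" behavior might look like the sketch below. The validation rule used here (transient event IDs must continue contiguously from the last persisted event ID) is an assumption, as are the names `validateTransientEventIDs` and `withTransientEvents`; the PR's actual helpers may check different things.

```go
package main

import (
	"fmt"
	"log"
)

// validateTransientEventIDs checks a hypothetical rule: transient event IDs
// must continue contiguously from the last persisted event ID.
func validateTransientEventIDs(lastPersistedID int64, transientIDs []int64) error {
	next := lastPersistedID + 1
	for _, id := range transientIDs {
		if id != next {
			return fmt.Errorf("transient event ID %d, expected %d", id, next)
		}
		next++
	}
	return nil
}

// withTransientEvents returns the persisted IDs plus the transient ones, or
// just the persisted IDs if validation fails. The request itself never fails.
func withTransientEvents(persisted, transient []int64) []int64 {
	var last int64
	if n := len(persisted); n > 0 {
		last = persisted[n-1]
	}
	if err := validateTransientEventIDs(last, transient); err != nil {
		// Log and skip: don't append events, but don't fail the request.
		log.Printf("transient event validation failed, skipping events: %v", err)
		return persisted
	}
	out := append([]int64{}, persisted...)
	return append(out, transient...)
}

func main() {
	fmt.Println(withTransientEvents([]int64{1, 2, 3, 4}, []int64{5, 6}))
	fmt.Println(withTransientEvents([]int64{1, 2, 3, 4}, []int64{7, 8}))
}
```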

stephanos · 2026-02-04T16:52:17Z

service/history/api/get_history_util.go

+// clientSupportsTranOrSpecEvents detects if client supports transient events
+// Default to include transient events for clients, only CLI and UI are
+// explicitly excluded for backward compatability


I suppose this was on the original PR by Alex already; I wonder what this would be breaking exactly. Maybe just the user experience as there might be "phantom" events and that's odd.

Anyway safe(r) to keep it this way for now.

stephanos · 2026-02-04T17:15:07Z

service/history/api/get_history_util.go

+				logger := shardContext.GetLogger()
+				metricsHandler := interceptor.GetMetricsHandlerFromContext(ctx, logger).WithTags(metrics.OperationTag(metrics.HistoryGetRawHistoryScope))
+				metrics.ServiceErrIncompleteHistoryCounter.With(metricsHandler).Record(1)
+				logger.Warn("Transient event validation failed, skipping events",


optional: should we put a softassert.Fail here (and for the other validation)? Sounds like this should never happen. Then again, it's logged as warning. Hm. Ignore this, just leaving this as a thought I had.

stephanos · 2026-02-04T17:19:04Z

service/history/api/getworkflowexecutionhistory/api.go

+	useRawHistory bool,
+	history *historypb.History,
+	historyBlob *[]*commonpb.DataBlob,
+) error {


It looks like this never returns an error? Might as well remove the error return.

stephanos · 2026-02-04T17:20:18Z

service/history/api/getworkflowexecutionhistory/api.go

+	// Manually append transient events to the response
+	if useRawHistory {
+		transientEventsBlob, err := shardContext.GetPayloadSerializer().SerializeEvents(transientWorkflowTask.GetHistorySuffix())
+		if err == nil {


optional: this looks like something that should never happen; could add a softassert here (requires logger param).

stephanos · 2026-02-04T17:28:37Z

proto/internal/temporal/server/api/historyservice/v1/request_response.proto

    temporal.api.workflow.v1.WorkflowExecutionVersioningInfo versioning_info = 25;
+    // Transient or speculative workflow task events which are not yet persisted in the history.
+    // These events should be appended to the history when it is returned to the worker.
+    temporal.server.api.history.v1.TransientWorkflowTaskInfo transient_or_speculative_events = 26;


What do you think about the name transient_or_speculative_events? I was a bit surprised by this in the code

transientWorkflowTask := msResp.GetTransientOrSpeculativeEvents()

One thing says "events" another says "task".

The name is revived from a previous PR, I can make a change to say transient_or_speculative_tasks.

stephanos · 2026-02-05T01:43:02Z

service/history/api/get_history_util.go

+	clientName, _ := headers.GetClientNameAndVersion(ctx)
+
+	switch clientName {
+	case headers.ClientNameCLI, headers.ClientNameUI:


Is there a unit or functional test for this? I can see this being changed/removed by accident.
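
A table-driven check along these lines could guard the exclusion list against accidental changes. It is a self-contained sketch: `clientSupportsTransientEvents` mirrors the switch in the diff, but the constants and their string values are assumptions standing in for `headers.ClientNameCLI` and `headers.ClientNameUI`, and the SDK client name in the table is a made-up example.

```go
package main

import "fmt"

// Assumed stand-ins for headers.ClientNameCLI and headers.ClientNameUI;
// the real values live in the server's headers package.
const (
	clientNameCLI = "temporal-cli"
	clientNameUI  = "temporal-ui"
)

// clientSupportsTransientEvents mirrors the switch in the diff: every client
// gets transient events by default, only the CLI and UI are excluded.
func clientSupportsTransientEvents(clientName string) bool {
	switch clientName {
	case clientNameCLI, clientNameUI:
		return false
	default:
		return true
	}
}

func main() {
	cases := []struct {
		client string
		want   bool
	}{
		{clientNameCLI, false},
		{clientNameUI, false},
		{"example-sdk", true}, // hypothetical SDK client name
		{"", true},            // missing header defaults to included
	}
	for _, tc := range cases {
		got := clientSupportsTransientEvents(tc.client)
		fmt.Printf("client=%q got=%v want=%v ok=%v\n", tc.client, got, tc.want, got == tc.want)
	}
}
```

The same table could drive a `testing.T`-based unit test next to the real helper.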

stephanos · 2026-02-05T01:46:26Z

service/history/api/getworkflowexecutionhistory/api.go

+	history *historypb.History,
+	historyBlob *[]*commonpb.DataBlob,
+) error {
+	msResp, err := api.GetOrPollWorkflowMutableState(


I think the 2nd GetOrPollWorkflowMutableState call is problematic. It adds latency since it has to go through the workflow lock again - but more importantly, I think there's a possible race here where the first call found a running workflow, but then it closes, and then we add these regardless.

Include transient and speculative WFT events in GetWorkflowExecutionH…

4a26bbb

…istory response

spkane31 requested review from a team as code owners January 26, 2026 19:00

spkane31 added 9 commits January 26, 2026 12:01

Merge branch 'main' of github.com:temporalio/temporal into spk/update…

d59a941

…-premature-end-stream

fixing linter and unit test

a6c5dea

adding tests and updating functional tests

515d71f

fixing a versioning, update, and wft test suites

d011e33

whackamole with some more tests

29d7fff

Merge branch 'main' of github.com:temporalio/temporal into spk/update…

2c838fc

…-premature-end-stream

two more tests

e862464

cleaning up some tests and helper functions

0e7a1be

reverting a change and adding clarifying comment

a48c68f

spkane31 requested a review from stephanos January 27, 2026 23:36

spkane31 added 5 commits January 28, 2026 16:11

simplifying queries to only a single query on the last page

2158122

Merge branch 'main' of github.com:temporalio/temporal into spk/update…

715fb6b

…-premature-end-stream

self review

fc29fcc

linter

620e3b8

simplifying boolean for including trans/spec events

b61c7cd

spkane31 commented Jan 30, 2026

fixing comments on test

081bfc0

spkane31 commented Feb 2, 2026

spkane31 added 2 commits February 4, 2026 10:38

adding logs

568a6ac

revert logs

8f3a45f

stephanos reviewed Feb 5, 2026

spkane31 added 3 commits February 5, 2026 13:53

renaming test file, renaming mut state resp, adding tests

d7966a7

merge conflict

b3b6d24

use caching in a better way

282aaf1
