Skip to content

Race condition in InMemoryTaskStore causes flaky test failures #1648

@halter73

Description

@halter73

Summary

The in-memory test data store InMemoryTaskStore used by McpServerTaskTests is not thread-safe, resulting in occasional test failures due to field mutation races between test background tasks and server polling.

Symptoms:

  • Flaky failures in CallToolAsync_AsyncTool_FailedTask_ThrowsMcpException (and similar tests)
  • Stack trace shows Nullable object must have a value thrown from GetTask when Status == Failed but Error is null.
  • The client receives a McpProtocolException (internal error) instead of the expected McpException for failed tasks.

How to Reproduce:

  • Run the test suite repeatedly (e.g., on CI, especially macOS)
  • Failing run shows error:
    Assert.Throws() Failure: Exception type was not an exact match
    Expected: typeof(ModelContextProtocol.McpException)
    Actual:   typeof(ModelContextProtocol.McpProtocolException)
    System.InvalidOperationException: Nullable object must have a value.
      at InMemoryTaskStore.GetTask(...)
    

Root Cause

  • The test's FailTask sets entry.Status = Failed before entry.Error, so a concurrent reader (GetTask, called by the server handler) can see a failed status with Error == null.
  • GetTask then dereferences entry.Error!.Value, which throws.
  • Similar races exist in other methods mutating/testing TaskEntry fields.

Solution

  • Make InMemoryTaskStore atomic for each operation: lock around mutations and reads, or replace ConcurrentDictionary + mutable TaskEntry fields with an immutable snapshot approach.
  • For a minimal fix: add a private object _gate lock to synchronize all mutations and reads, and always lock the critical section.
  • Also reorder assignments in FailTask etc. so all fields are valid before status is set.

Code suggestion

private readonly object _gate = new();
// ...
public GetTaskResult GetTask(string taskId)
{
    lock (_gate)
    {
        // ... see full solution in assistant's comment ...
    }
}
// ... lock everywhere else ...

Impact

  • This causes rare but real test failures on CI, especially with concurrent test scheduling.

References

  • Failing job: Build and Test matrix in .github/workflows/ci-build-test.yml
  • Example failure: CallToolAsync_AsyncTool_FailedTask_ThrowsMcpException in McpServerTaskTests.cs

Metadata

Metadata

Labels

No labels
No labels

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions