Race Condition Between Buffer Transport and Transaction Commit #578

@jtnelson

Summary

When a large transaction commits, the commit process can race with the transport job, leaving inconsistent state after crash recovery. Specifically, Buffer pages may be transported to the Database while the Inventory's in-memory state has not yet been synced to disk, so verifications return inaccurate results after recovery.

Problem Description

During a large transaction commit:

  1. A backup of the transaction is taken (standard procedure for all transactions)
  2. The transaction commits to the Buffer using group sync, which does not immediately flush page contents or the Inventory to disk
  3. The Buffer content is placed in a MappedByteBuffer even when not immediately synced
  4. The Inventory content remains only in memory until explicitly synced
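To make steps 3 and 4 concrete, here is a minimal Java sketch of the asymmetry. `ToyInventory`, the file names, and the record format are hypothetical stand-ins, not the project's actual classes. Page bytes written through a `MappedByteBuffer` sit in the mapping, where the OS is free to write them back at any time, while an Inventory held on the heap reaches disk only when it is explicitly synced.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

class GroupSyncSketch {

    // Hypothetical stand-in for the Inventory: entries stay on the heap until sync().
    static final class ToyInventory {
        private final Set<String> dirty = new HashSet<>();
        private final Path file;

        ToyInventory(Path file) { this.file = file; }

        void add(String record) { dirty.add(record); }

        // Persist every known record, one per line.
        void sync() {
            try {
                Files.write(file, dirty);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // What a restarted process would see: only what reached the file.
        static Set<String> reload(Path file) {
            try {
                return new HashSet<>(Files.readAllLines(file));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Returns {record visible on disk before sync, record visible on disk after sync}.
    static boolean[] run() {
        try {
            Path pageFile = Files.createTempFile("page", ".buf");
            Path inventoryFile = Files.createTempFile("inventory", ".dat");
            pageFile.toFile().deleteOnExit();
            inventoryFile.toFile().deleteOnExit();
            ToyInventory inventory = new ToyInventory(inventoryFile);
            try (FileChannel channel = FileChannel.open(pageFile,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer page = channel.map(FileChannel.MapMode.READ_WRITE, 0, 64);
                // Group sync: the write lands in the mapping, but force() is deferred.
                page.put("ADD name AS jeff IN 1".getBytes(StandardCharsets.UTF_8));
                inventory.add("1");
                boolean before = ToyInventory.reload(inventoryFile).contains("1");
                // The deferred sync point: only now are both guaranteed durable.
                page.force();
                inventory.sync();
                boolean after = ToyInventory.reload(inventoryFile).contains("1");
                return new boolean[] { before, after };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] seen = run();
        System.out.println("before sync: " + seen[0] + ", after sync: " + seen[1]);
    }
}
```

A crash between `page.put(...)` and the deferred sync point is the window this issue is about: the page bytes may or may not have been written back, but the Inventory entry is certainly lost.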

The issue occurs in this sequence:

  • A transaction causes Buffer expansion with new Pages
  • The transport job runs concurrently during transaction commit
  • Some Buffer pages are transported to the Database
  • A crash occurs before the commit completes
  • Upon restart, the server attempts to replay the transaction
  • The system detects a partial commit and tries to finish it
  • However, the dirty in-memory Inventory content was never synced to disk
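The sequence above can be replayed in a toy model. This is a sketch under stated assumptions, not the real recovery code: all class and method names are hypothetical, page contents are assumed to have been written back before the crash, and the recovery rule (finish the commit by inserting only the backup writes not already found in the Buffer or Database) is my guess at what "tries to finish it" means.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CrashRecoverySketch {

    // Durable state: survives the simulated crash.
    final List<String> buffer = new ArrayList<>();        // Buffer page contents (assumed written back)
    final Set<String> database = new HashSet<>();         // writes already transported
    final Set<String> inventoryOnDisk = new HashSet<>();  // last synced Inventory

    // Volatile state: the dirty in-memory Inventory, lost on crash.
    Set<String> inventory = new HashSet<>();

    void insert(String record) {
        buffer.add(record);
        inventory.add(record); // dirty until syncInventory()
    }

    void syncInventory() {
        inventoryOnDisk.addAll(inventory);
    }

    // Transport job: moves the oldest Buffer write into the Database.
    void transportOne() {
        if (!buffer.isEmpty()) {
            database.add(buffer.remove(0));
        }
    }

    // Crash and restart: the in-memory Inventory is rebuilt from disk only.
    void crashAndRestart() {
        inventory = new HashSet<>(inventoryOnDisk);
    }

    // ASSUMED recovery rule: insert only the backup writes that are missing.
    void replay(List<String> backup) {
        for (String record : backup) {
            if (!buffer.contains(record) && !database.contains(record)) {
                insert(record);
            }
        }
        syncInventory();
    }

    boolean verifyFast(String record) {
        return inventory.contains(record);
    }

    boolean exists(String record) {
        return buffer.contains(record) || database.contains(record);
    }

    static CrashRecoverySketch simulate() {
        CrashRecoverySketch s = new CrashRecoverySketch();
        List<String> backup = List.of("r1", "r2", "r3", "r4");
        s.insert("r1");
        s.insert("r2");
        s.insert("r3");
        s.transportOne();     // runs concurrently with the commit: r1 reaches the Database
        s.crashAndRestart();  // crash before r4 is inserted and before any Inventory sync
        s.replay(backup);     // partial commit detected, only r4 is inserted
        return s;
    }

    public static void main(String[] args) {
        CrashRecoverySketch s = simulate();
        System.out.println("r1 exists: " + s.exists("r1")
                + ", verifyFast(r1): " + s.verifyFast("r1"));
    }
}
```

In this model `r1` is in the Database after recovery, yet the Inventory contains only `r4`, so an Inventory-based check reports `r1` as absent.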

This creates a scenario where verifications return incorrect results because records that should be in the Inventory don't appear there. This also explains why we're seeing an influx of unoffset writes in iXL production. Because verifyFast (which leverages the inventory) is inaccurate, it is permitting unoffset writes to be inserted after the transaction commits.
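For illustration, this is how a stale Inventory could let an unoffset write through. The sketch rests on two assumptions that should be checked against the real code: that an unoffset write here means a second ADD of the same key/value/record with no REMOVE in between, and that verifyFast short-circuits to "not present" when the record is missing from the Inventory. All names other than `verifyFast` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UnoffsetWriteSketch {

    final List<String> stored = new ArrayList<>(); // every accepted ADD, as "key=value@record"
    final Set<Long> inventory = new HashSet<>();   // records known to contain data

    // ASSUMED fast path: an Inventory miss is treated as proof of absence.
    boolean verifyFast(String key, String value, long record) {
        if (!inventory.contains(record)) {
            return false;
        }
        return stored.contains(key + "=" + value + "@" + record);
    }

    // An ADD is accepted only if the value is not already present.
    boolean add(String key, String value, long record) {
        if (verifyFast(key, value, record)) {
            return false; // rejected: would be unoffset
        }
        stored.add(key + "=" + value + "@" + record);
        inventory.add(record);
        return true;
    }

    // Issues the same ADD twice and returns how many copies were stored.
    static int copiesAfterTwoAdds(boolean loseInventory) {
        UnoffsetWriteSketch s = new UnoffsetWriteSketch();
        s.add("name", "jeff", 1L);
        if (loseInventory) {
            s.inventory.clear(); // stands in for the Inventory state lost in the crash
        }
        s.add("name", "jeff", 1L);
        return s.stored.size();
    }

    public static void main(String[] args) {
        System.out.println("intact inventory: " + copiesAfterTwoAdds(false) + " copy");
        System.out.println("stale inventory:  " + copiesAfterTwoAdds(true) + " copies");
    }
}
```

With an intact Inventory the second ADD is rejected. With the stale one, the guard returns a false negative and the duplicate is stored.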
