-
Notifications
You must be signed in to change notification settings - Fork 106
Fix read-only open during checkpoint from reading inconsistent state #615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
ericyuanhui
wants to merge
1
commit into
LadybugDB:main
Choose a base branch
from
ericyuanhui:main_test
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you describe the thought process here? Why not lock the main database file instead of creating new intent/apply files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the core reason why we cannot rely solely on the primary database file lock is straightforward: this fix targets the consistency window during checkpoints, rather than simply controlling which processes can open the primary data file.
Let’s break down the problem point by point.
A primary file lock can only indicate whether a file is occupied. However, this bug requires us to distinguish three distinct states:
Normal readable state
Dangerous intermediate state while a checkpoint is in progress
Stable state after the checkpoint completes
This change introduces a two-phase locking mechanism: intent lock and apply lock. It explicitly marks the "checkpoint-in-progress" state instead of conflating it with generic file occupancy status. The relevant code is defined in storage_utils.h, and the locking logic is implemented in checkpointer.cpp.
If read-only open operations contend for the primary file lock to avoid race conditions, we end up with two undesirable outcomes:
All read-only openings fail whenever a writer process exists, resulting in overly conservative behavior.
Read-only access is permitted to proceed, yet we lack a way to precisely block only during the risky checkpoint window.
The goal of this patch is to block only unsafe time windows while avoiding unnecessary interruptions in other scenarios. This is why we introduced dedicated checkpoint coordination locks and status checks, rather than lumping all logic into the primary file lock.
As highlighted in your summary, read-only recovery may encounter intermediate artifacts such as shadow files and checkpoint WAL entries.
The inconsistency window spans the main data file, shadow files and checkpoint WAL files. Locking only the primary file cannot verify whether the entire set of files is in an atomically recoverable state.
The corresponding safeguards are implemented in WALReplayer:
First perform checkpoint lock detection for read-only mode (implemented in wal_replayer.cpp).
Immediately trigger fail-fast errors for intermediate states involving shadow files or frozen WALs (also in wal_replayer.cpp).
OS file locks are bound to the process lifecycle. Locks are automatically released when a process crashes, yet partial intermediate files may remain on disk.
To address this, the patch adds checks based on the existence of state files plus validation for metadata page ranges. Any leftover partial state after a crash will result in a controlled failure instead of blind reads that trigger segmentation faults.
Defensive validation for page ranges is added in the Checkpointer read path, converting invalid memory access crashes into managed exceptions. The code resides in checkpointer.cpp.
The final paragraph of your summary already captures the essence:
The fix does not aim to eliminate failures entirely. Instead, it ensures the system fails explicitly whenever it hits the unsafe window, eliminating undefined behavior.
This objective requires four components:
Explicit checkpoint coordination with intent and apply locks
Pre-emptive rejection logic for recovery (fail-fast on read-only opens)
Defensive validation of checkpoint metadata
Resource cleanup on lock acquisition failures to prevent file descriptor and handle leaks (implemented in local_file_system.cpp)
Conclusion:
The primary database file lock is a necessary but insufficient condition. It only enforces mutual exclusion, yet cannot implement the phase protocol, multi-file consistency boundaries, or detection of leftover intermediate states after crashes that this change requires.