fix: Fixes various file/lock delete failures on windows to allow us to unpin lockfile#792
Merged
fix: Fixes various file/lock delete failures on windows to allow us to unpin lockfile#792
Conversation
…reate_dataset The SDK bump in #783 switched to DatasetServiceCreateDatasetBody and added `or ""` fallbacks for all fields. This is valid for string fields but version is a uint64 in protobuf, so "" causes a 400 Bad Request when saving to Lightning storage. Revert to passing None for version when no value is provided, matching the pre-bump behaviour.
for more information, see https://pre-commit.ci
…nto fix/windows-lock-file-cleanup
4 tasks
for more information, see https://pre-commit.ci
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #792 +/- ##
===================================
- Coverage 81% 81% -0%
===================================
Files 53 53
Lines 7565 7588 +23
===================================
+ Hits 6112 6129 +17
- Misses 1453 1459 +6 🚀 New features to boost your workflow:
|
4a7aa37 to
055729e
Compare
for more information, see https://pre-commit.ci
When _apply_delete returns early due to remaining_locks > 0 (a race between decrement and delete on the last chunk), download lock files like chunk-0-3.zstd.bin.lock were left behind. Extract lock cleanup into _cleanup_download_locks (excluding .cnt.lock files still needed for refcounting) and call it in both code paths. Fixes leftover chunk-*.zstd.bin.lock files on Windows CI.
for more information, see https://pre-commit.ci
PyTreeLoader._open_handle is only closed when a subsequent chunk is loaded, which never happens for the final chunk. On Windows this causes os.remove to fail with PermissionError in the prepare thread. Move the item_loader.close() call before the delete request so the file handle is released and the data file can actually be removed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Before submitting
Root cause of issues
Filelock is unpinned in requirements.txt, CI installed the latest version (≥3.24.0), pulling in two key changes:
Change 1: Windows file locks no longer auto-delete
filelock v3.24.0 (released Feb 14, 2026) removed the auto-deletion of lock files on Windows in PR #484. The change removes these lines from WindowsFileLock._release():
Since filelock is unpinned in requirements.txt, CI installed the latest version (≥3.24.0), and lock files started persisting on Windows after release.
Why the existing litdata code didn't handle it:
Why Linux was unaffected: Linux allows unlink() on open files, so litdata's cleanup inside the with block always worked.
Change 2: Linux file locks now do auto-delete
As of v3.21.0 (Feb 12, 2026) and #408, Unix file locks do auto-delete.
The net result of these two changes is:
NoLockCleanuptest which simulates having no lock clean-up and checking fallback global clean-up started failing on Windows. Still working out why it didn't previously fail on Unix.What does this PR do?
I've put in partial fixes for some issues:
Fix manual lock file clean-up on Windows
It moves lock file deletion outside FileLock context for Windows compatibility.
On Windows, files held open by FileLock cannot be deleted. Both
LocalDownloader.download_fileandPrepareChunksThread._decrement_local_lockwere attemptingos.remove()on .lock files from inside thewith FileLock()block, which silently failed on Windows, leaving .zstd.bin.lock and .cnt.lock files behind.We move deletions to after the
withblock exits (releasing the lock first). And we guard with lock_acquired/remove_lock flags to avoid deleting lock files we never successfully acquired (e.g. on timeout); or where the work is not yet done and the lock is still preventing duplicate entries into a critical section.Fix global clean-up in reader.py
When the reader starts clean up, it tries to remove the last chunk before it has closed it.
This causes issues on Windows, where you can't delete an open file. The deletion errors, which means the lock file is never cleared up; causing the test break.
Also adds some debug logs to help in future.
Unfix filelock versions
Now that we've fixed the bugs which prevent proper tidy up on Windows, we handle all lock file clean-up properly on both Windows and Linux, and can unpin out filelock version.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃