Skip to content

fix: Fixes various file/lock delete failures on windows to allow us to unpin lockfile#792

Merged
tchaton merged 18 commits intomainfrom
fix/windows-lock-file-cleanup
Feb 19, 2026
Merged

fix: Fixes various file/lock delete failures on windows to allow us to unpin lockfile#792
tchaton merged 18 commits intomainfrom
fix/windows-lock-file-cleanup

Conversation

@dhedey
Copy link
Collaborator

@dhedey dhedey commented Feb 18, 2026

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

Root cause of issues

Filelock is unpinned in requirements.txt, CI installed the latest version (≥3.24.0), pulling in two key changes:

Change 1: Windows file locks no longer auto-delete

filelock v3.24.0 (released Feb 14, 2026) removed the auto-deletion of lock files on Windows in PR #484. The change removes these lines from WindowsFileLock._release():

  # REMOVED in v3.24.0:
  with suppress(OSError):
      Path(self.lock_file).unlink()

Since filelock is unpinned in requirements.txt, CI installed the latest version (≥3.24.0), and lock files started persisting on Windows after release.

Why the existing litdata code didn't handle it:

  • litdata's os.remove() calls were inside the with FileLock() block, where the file handle is still open. On Windows, you can't delete a file with an open handle — so litdata's cleanup silently failed, and it was relying (unknowingly) on filelock's now-removed auto-delete.

Why Linux was unaffected: Linux allows unlink() on open files, so litdata's cleanup inside the with block always worked.

Change 2: Linux file locks now do auto-delete

As of v3.21.0 (Feb 12, 2026) and #408, Unix file locks do auto-delete.


The net result of these two changes is:

  • Code which cleaned up locks only previously worked on Unix, and doesn't affect Windows --> test breaks on Windows
  • The NoLockCleanup test which simulates having no lock clean-up and checking fallback global clean-up started failing on Windows. Still working out why it didn't previously fail on Unix.

What does this PR do?

I've put in partial fixes for some issues:

Fix manual lock file clean-up on Windows

It moves lock file deletion outside FileLock context for Windows compatibility.

On Windows, files held open by FileLock cannot be deleted. Both LocalDownloader.download_file and PrepareChunksThread._decrement_local_lock were attempting os.remove() on .lock files from inside the with FileLock() block, which silently failed on Windows, leaving .zstd.bin.lock and .cnt.lock files behind.

We move deletions to after the with block exits (releasing the lock first). And we guard with lock_acquired/remove_lock flags to avoid deleting lock files we never successfully acquired (e.g. on timeout); or where the work is not yet done and the lock is still preventing duplicate entries into a critical section.

Fix global clean-up in reader.py

When the reader starts clean up, it tries to remove the last chunk before it has closed it.

This causes issues on Windows, where you can't delete an open file. The deletion errors, which means the lock file is never cleared up; causing the test break.

Also adds some debug logs to help in future.

Unfix filelock versions

Now that we've fixed the bugs which prevent proper tidy up on Windows, we handle all lock file clean-up properly on both Windows and Linux, and can unpin out filelock version.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

dhedey and others added 7 commits February 18, 2026 01:32
…reate_dataset

The SDK bump in #783 switched to DatasetServiceCreateDatasetBody and added `or ""` fallbacks for all fields. This is valid for string fields but version is a uint64 in protobuf, so "" causes a 400 Bad Request when saving to Lightning storage. Revert to passing None for version when no value is provided, matching the pre-bump behaviour.
@codecov
Copy link

codecov bot commented Feb 18, 2026

Codecov Report

❌ Patch coverage is 75.86207% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 81%. Comparing base (6332939) to head (5f9304a).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #792   +/-   ##
===================================
- Coverage    81%    81%   -0%     
===================================
  Files        53     53           
  Lines      7565   7588   +23     
===================================
+ Hits       6112   6129   +17     
- Misses     1453   1459    +6     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dhedey dhedey force-pushed the fix/windows-lock-file-cleanup branch from 4a7aa37 to 055729e Compare February 18, 2026 12:44
dhedey and others added 5 commits February 18, 2026 13:03
When _apply_delete returns early due to remaining_locks > 0 (a race between decrement and delete on the last chunk), download lock files like chunk-0-3.zstd.bin.lock were left behind. Extract lock cleanup into _cleanup_download_locks (excluding .cnt.lock files still needed for refcounting) and call it in both code paths.

Fixes leftover chunk-*.zstd.bin.lock files on Windows CI.
PyTreeLoader._open_handle is only closed when a subsequent chunk is loaded, which never happens for the final chunk.  On Windows this causes os.remove to fail with PermissionError in the prepare thread.

Move the item_loader.close() call before the delete request so the file handle is released and the data file can actually be removed.
@dhedey dhedey changed the title fix: Fixes tests failures on windows fix: Fixes various file/lock delete failures on windows to allow us to unpin lockfile Feb 18, 2026
Copy link
Collaborator

@tchaton tchaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice !

@tchaton tchaton merged commit f4765cf into main Feb 19, 2026
61 of 65 checks passed
@tchaton tchaton deleted the fix/windows-lock-file-cleanup branch February 19, 2026 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants