fix: Ensure uint64 fields are handled correctly in `_create_dataset` by dhedey · Pull Request #791 · Lightning-AI/litData

dhedey · 2026-02-18T01:42:14Z

Before submitting

Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs?
Did you write any new necessary tests?

What does this PR do?

Main Change - utilities.py

The SDK bump in #783 switched to DatasetServiceCreateDatasetBody and added `or ""` fallbacks for the updated fields. This was incorrect: those fields were changed to uint64 in the backend, where "" isn't a valid value.

Sidenote: Protobuf generation maps uint64 to string in client models for JS precision reasons (whereas uint32 can be safely represented as an int). I guess this decision also impacted the Python generation.
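The precision concern can be illustrated without any protobuf tooling. The sketch below assumes only that a JS Number is an IEEE 754 double (as Python's `float` is) and that 64-bit integers travel as decimal strings in JSON; the variable names are illustrative and nothing here comes from the SDK.

```python
import json

big = 2**53 + 1  # fits easily in a uint64, but not exactly in a double

# Round-tripping through a double silently changes the value.
lossy = int(float(big))
print(big, lossy, big == lossy)

# Encoding the value as a decimal string preserves it exactly.
payload = json.dumps({"version": str(big)})
restored = int(json.loads(payload)["version"])
print(restored == big)

# An empty string, by contrast, is not a valid decimal integer.
try:
    int("")
except ValueError as err:
    print("empty string rejected:", err)
```

This is why a string-typed client field can still be rejected by the backend: the string has to parse as an unsigned integer.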

Anyway, this PR ensures we pass None for uint64 fields when no value is provided, which leaves the value unset in the request and avoids 400 responses. This matches the pre-bump behaviour.
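The difference between the two fallbacks can be sketched as follows. This is not the litData or Lightning SDK code: `build_body`, its parameters, and the dict-based body are hypothetical stand-ins, and the sketch assumes the client omits unset (`None`) fields when serializing the request.

```python
from typing import Any, Dict, Optional


def build_body(name: str, version: Optional[int] = None, *, buggy: bool = False) -> Dict[str, Any]:
    """Hypothetical request-body builder; not the real SDK model."""
    if buggy:
        # Post-bump pattern: falls back to "" even for a uint64 field.
        fields = {"name": name or "", "version": version or ""}
    else:
        # Fixed pattern: string fields may fall back to "", uint64 fields stay None.
        fields = {"name": name or "", "version": version}
    # Assumption: unset (None) fields are omitted from the serialized request.
    return {key: value for key, value in fields.items() if value is not None}


print(build_body("demo", buggy=True))  # version is sent as an empty string
print(build_body("demo"))              # version is omitted entirely
print(build_body("demo", version=3))   # version is sent with its value
```

With the buggy variant the backend receives `""` for a uint64 field, which is the case the PR description associates with the 400 responses.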

Other change - requirements utilities

The build was breaking because of problems pulling in the pkg_resources dependency. Thomas added some commits to fix this by vendoring imports.py from the lightning-utilities package.
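For context, a dependency check does not need pkg_resources at all. The sketch below is not the vendored imports.py; it is a minimal standard-library illustration, and `package_available` and `installed_version` as written here are hypothetical helpers for this example only.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def package_available(module_name: str) -> bool:
    """Return True if the module can be located for import."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for missing parent packages or modules without a spec.
        return False


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed distribution's version, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


print(package_available("json"))
print(package_available("surely_not_a_real_module_xyz"))
print(installed_version("surely-not-a-real-distribution-xyz"))
```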

I added some auto-generated tests so codecov could pass.

Other change - Windows lock file clean-up fixes

Pins the version of lockfile so that the lock files are properly cleaned up on Windows and the tests pass. More detail is given in this PR, which may get closed in due course: #792

I'll see in the background if I can solve the issues on that PR and unpin lockfile. But I think pinning is OK.

Testing

I tested this locally by:

Running `pip install "setuptools<82" && pip install -e . --no-build-isolation` to install litData locally
Updating to the latest version of the Lightning SDK with `pip install --upgrade lightning_sdk`
Running `optimize` with an `output_dir` pointing at Lightning storage from a Lightning AI studio

Test script:

from litdata import StreamingDataset, optimize
from tqdm.auto import trange


def should_keep(data):
    if data % 2 == 0:
        yield data


if __name__ == "__main__":
    output_dir = (
        "/teamspace/lightning_storage/cloud-experiments/litdata_optimize_bug/test_0_2"
    )
    optimize(
        fn=should_keep,
        inputs=list(range(1000)),
        output_dir=output_dir,
        chunk_bytes="64MB",
        num_workers=4,
    )
    print("done optimizing dataset")
    dataset = StreamingDataset(output_dir)
    print(f"length of dataset is {len(dataset)}")
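
As a sanity check on what the script should report, the filter can be replayed in plain Python without litdata or Lightning storage. This only reproduces the generator logic; it assumes `optimize` stores every item the function yields, in which case the dataset length would be 500.

```python
def should_keep(data):
    # Same filter as the test script: yield only even inputs.
    if data % 2 == 0:
        yield data


# Collect everything the generator yields across all 1000 inputs.
kept = [item for value in range(1000) for item in should_keep(value)]
print(len(kept))
print(kept[:5])
```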

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

…reate_dataset The SDK bump in Lightning-AI#783 switched to DatasetServiceCreateDatasetBody and added `or ""` fallbacks for all fields. This is valid for string fields but version is a uint64 in protobuf, so "" causes a 400 Bad Request when saving to Lightning storage. Revert to passing None for version when no value is provided, matching the pre-bump behaviour.

codecov · 2026-02-18T09:07:52Z

Codecov Report

❌ Patch coverage is 82.90155% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 81%. Comparing base (c8a94a4) to head (67f92c7).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@         Coverage Diff          @@
##           main   #791    +/-   ##
====================================
+ Coverage    80%    81%    +1%     
====================================
  Files        52     53     +1     
  Lines      7386   7565   +179     
====================================
+ Hits       5920   6112   +192     
+ Misses     1466   1453    -13


dhedey requested review from justusschock, lantiga and tchaton as code owners February 18, 2026 01:42

dhedey changed the title ~~fix: pass None instead of empty string for uint64 version field in _create_dataset~~ fix: Ensure uint64 fields are handled correctly in _create_dataset Feb 18, 2026

tchaton added 4 commits February 18, 2026 08:48

update

968ea25

update

1ba2efb

update

1c82b89

update

623f2e9

dhedey and others added 2 commits February 18, 2026 10:43

fix: Further fixes for uint64 defaults

24bc1c5

[pre-commit.ci] auto fixes from pre-commit.com hooks

a1d2c17

for more information, see https://pre-commit.ci

tchaton approved these changes Feb 18, 2026


dhedey mentioned this pull request Feb 18, 2026

fix: Fixes various file/lock delete failures on windows to allow us to unpin lockfile #792

Merged

4 tasks

dhedey and others added 3 commits February 18, 2026 12:48

fix: Pins lockfile so that it cleans up properly on Windows

4e0d90b

fix(ci): Add auto-generated tests for vendored imports.py and require…

de40eec

…ments.py

[pre-commit.ci] auto fixes from pre-commit.com hooks

67f92c7

for more information, see https://pre-commit.ci

dhedey requested a review from tchaton February 18, 2026 16:20

tchaton merged commit 6332939 into Lightning-AI:main Feb 18, 2026
49 of 51 checks passed

tchaton merged 10 commits into Lightning-AI:main from dhedey:fix/optimize-version-api-request
