fix: Ensure uint64 fields are handled correctly in _create_dataset#791

Merged
tchaton merged 10 commits into Lightning-AI:main from dhedey:fix/optimize-version-api-request
Feb 18, 2026

@dhedey (Collaborator) commented Feb 18, 2026

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Main Change - utilities.py

The SDK bump in #783 switched to DatasetServiceCreateDatasetBody and added `or ""` fallbacks for the updated fields. This was incorrect: these fields were changed to uint64 in the backend, where "" isn't a valid value.

Sidenote: Protobuf generation maps uint64 to string in client models for JS precision reasons (whereas uint32 can be safely represented as an int). I guess this decision also impacted the Python generation.

Anyway, this PR ensures we pass None for uint64 fields when no value is provided, which leaves the value unset in the request and avoids 400 responses. This matches the pre-bump behaviour.
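To illustrate the difference, here is a minimal sketch rather than the actual litdata or SDK code. The helper name `build_create_dataset_fields` and the field names are hypothetical, and a plain dict stands in for DatasetServiceCreateDatasetBody. It assumes, per the sidenote above, that uint64 values are string-encoded in the client model, and it mimics "not set in the request" by dropping None entries.

```python
from typing import Any, Dict, Optional


def build_create_dataset_fields(
    name: str,
    version: Optional[int] = None,
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble request fields, omitting unset uint64 values."""
    fields = {
        "name": name,
        # String field: an empty-string fallback is a valid value here.
        "description": description or "",
        # uint64 field (assumed string-encoded in the client model): use None
        # when unset. An `or ""` fallback would send "" and would also turn a
        # legitimate 0 into "", so the check is an explicit `is not None`.
        "version": str(version) if version is not None else None,
    }
    # Drop None entries so unset fields are left out of the request entirely.
    return {key: value for key, value in fields.items() if value is not None}
```

With no version supplied, the `version` key is absent rather than an empty string; with `version=0`, it is sent as `"0"`.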

Other change - requirements utilities

The build was breaking due to issues pulling in the pkg_resources dependency. Thomas added some commits to fix this, by vendoring imports.py from the lightning-utilities package.

I added some auto-generated tests so codecov could pass.
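For context, the sketch below shows one way to check for modules and installed versions using only the standard library, without pkg_resources. It is not the vendored imports.py; the function names `module_available` and `installed_version` are made up for this illustration.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def module_available(module_name: str) -> bool:
    """Return True if the module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ModuleNotFoundError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package of a
        # dotted name is missing, and ValueError for a module with no spec.
        return False


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed distribution's version string, or None if absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None
```

Both helpers rely on `importlib.util` and `importlib.metadata`, so they do not depend on setuptools being importable at runtime.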

Other change - Windows lock file clean-up fixes

Pins the version of lockfile so that the lock files are properly cleaned up on Windows, and so that the tests pass. More detail is given in this PR, which may get closed in due course: #792

I'll see in the background if I can solve the issues on that PR and unpin lockfile. But I think pinning is OK.

Testing

I tested this locally by:

  • pip install "setuptools<82" && pip install -e . --no-build-isolation to install litData locally
  • Updating to the latest version of the Lightning SDK: pip install --upgrade lightning_sdk
  • Running optimize with a target_dir pointing at Lightning storage from a Lightning AI studio

Test script:

from litdata import StreamingDataset, optimize


def should_keep(data):
    if data % 2 == 0:
        yield data


if __name__ == "__main__":
    output_dir = (
        "/teamspace/lightning_storage/cloud-experiments/litdata_optimize_bug/test_0_2"
    )
    optimize(
        fn=should_keep,
        inputs=list(range(1000)),
        output_dir=output_dir,
        chunk_bytes="64MB",
        num_workers=4,
    )
    print("done optimizing dataset")
    dataset = StreamingDataset(output_dir)
    print(f"length of dataset is {len(dataset)}")

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

…reate_dataset

The SDK bump in Lightning-AI#783 switched to DatasetServiceCreateDatasetBody and added `or ""` fallbacks for all fields. This is valid for string fields but version is a uint64 in protobuf, so "" causes a 400 Bad Request when saving to Lightning storage. Revert to passing None for version when no value is provided, matching the pre-bump behaviour.
@dhedey changed the title from "fix: pass None instead of empty string for uint64 version field in _create_dataset" to "fix: Ensure uint64 fields are handled correctly in _create_dataset" on Feb 18, 2026

codecov bot commented Feb 18, 2026

Codecov Report

❌ Patch coverage is 82.90155% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 81%. Comparing base (c8a94a4) to head (67f92c7).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@         Coverage Diff          @@
##           main   #791    +/-   ##
====================================
+ Coverage    80%    81%    +1%     
====================================
  Files        52     53     +1     
  Lines      7386   7565   +179     
====================================
+ Hits       5920   6112   +192     
+ Misses     1466   1453    -13     

@dhedey requested a review from tchaton on February 18, 2026 16:20
@tchaton merged commit 6332939 into Lightning-AI:main on Feb 18, 2026
49 of 51 checks passed