Removing the requirement for Datasets to implement ids()#704
Removing the requirement for Datasets to implement ids()#704
Conversation
mtauraso
commented
Feb 14, 2026
- HSCDataSet and FitsFileDatasets ids() -> _all_ids() iterators
- InferenceDataset and DataProvider continue to have ids() methods, but they are entirely based on the get_object_id() HyraxQL getter
- Some Iterable Datasets have been removed from tests to simplify a case where ids() was called on all Iterable datasets. More full removal of Iterable Datasets will occur in a later commit
- ids() methods on InferenceDataset and DataProvider now return lists to make the callsites more ergonomic
- Several lookups on list(obj.ids()) have been refactored to call get_object_id for efficiency and clarity.
- HSCDataSet and FitsFileDatasets ids() -> _all_ids() iterators - InferenceDataset and DataProvider continue to have ids() methods, but they are entirely based on the get_object_id() HyraxQL getter - Some Iterable Datasets have been removed from tests to simplify a case where ids() was called on all Iterable datasets. More full removal of Iterable Datasets will occur in a later commit - ids() methods on InferenceDataset and DataProvider now return lists to make the callsites more ergonomic - Several lookups on list(obj.ids()) have been refactored to call get_object_id for efficiency and clarity.
|
This still needs to percolate through example notebooks and docs, which I'm going to try to do on this PR. Things that are showing up for future in this direction:
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #704 +/- ##
==========================================
- Coverage 63.25% 62.59% -0.67%
==========================================
Files 59 59
Lines 5770 5737 -33
==========================================
- Hits 3650 3591 -59
- Misses 2120 2146 +26 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR refactors Hyrax dataset ID handling to remove the expectation that datasets expose an ids() generator, shifting most callsites toward get_object_id(idx) and keeping ids() as an eager list primarily on DataProvider and InferenceDataSet.
Changes:
- Converts multiple callsites (verbs + tests) from
list(x.ids())tox.ids()and/orget_object_id(idx). - Removes iterable dataset implementations/tests (CIFAR + Random) as an interim simplification.
- Introduces/standardizes
get_object_id()onInferenceDataSetandDataProvider, and makes theirids()methods return lists.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/hyrax/test_umap.py | Updates UMAP ID comparisons to use list-returning ids() without extra list(...). |
| tests/hyrax/test_save_to_database.py | Updates ID extraction for DB save test to match list-returning ids(). |
| tests/hyrax/test_inference_dataset.py | Updates ID enumeration and replaces list(result.ids())[i] with get_object_id(i). |
| tests/hyrax/test_infer.py | Updates inference ordering test to rely on list-returning ids(). |
| tests/hyrax/test_fits_image_dataset.py | Adjusts expectation to compare directly against DataProvider.ids() list. |
| tests/hyrax/test_e2e.py | Removes iterable CIFAR dataset from parametrized E2E matrix. |
| tests/hyrax/test_data_provider.py | Renames/adjusts primary dataset test and updates ID expectations. |
| tests/hyrax/conftest.py | Removes iterable random dataset from loopback fixture parameterization. |
| src/hyrax/verbs/visualize.py | Replaces repeated list(ids())[idx] patterns with get_object_id(idx). |
| src/hyrax/verbs/umap.py | Uses list-returning InferenceDataSet.ids() directly when building all_ids. |
| src/hyrax/verbs/lookup.py | Uses list-returning InferenceDataSet.ids() directly for lookup indexing. |
| src/hyrax/pytorch_ignite.py | Simplifies index generation for the no-split dataloader path. |
| src/hyrax/data_sets/random/hyrax_random_dataset.py | Removes iterable random dataset and its ids() generator implementation. |
| src/hyrax/data_sets/inference_dataset.py | Adds get_object_id() and changes ids() to return a list; refactors metadata ID lookup. |
| src/hyrax/data_sets/hyrax_cifar_dataset.py | Removes iterable CIFAR dataset and its ids() implementation. |
| src/hyrax/data_sets/hsc_data_set.py | Switches internal enumeration from ids() to _all_ids(). |
| src/hyrax/data_sets/fits_image_dataset.py | Makes FITS dataset ID iterator private (_all_ids) and updates internal users. |
| src/hyrax/data_sets/downloaded_lsst_dataset.py | Removes ids() generator implementation. |
| src/hyrax/data_sets/data_set_registry.py | Removes default ids() implementation and metadata auto-injection based on ids(). |
| src/hyrax/data_sets/data_provider.py | Adds get_object_id(), makes ids() eager list, and enforces primary dataset presence. |
| src/hyrax/data_sets/init.py | Stops exporting iterable datasets from the public hyrax.data_sets surface. |
| @pytest.fixture(scope="function", params=["HyraxRandomDataset"]) | ||
| def loopback_hyrax(tmp_path_factory, request): |
There was a problem hiding this comment.
The fixture no longer parameterizes HyraxRandomIterableDataset, but the if request.param == "HyraxRandomIterableDataset": ... block below is now dead code. Removing that branch will avoid confusion and keep the fixture aligned with the supported datasets in this PR.
Click here to view all benchmarks. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |