[REG]: Introduce feature files addition to registry #3
Codecov Report
❌ Your patch check has failed because the patch coverage (90.50%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@             Coverage Diff              @@
##              main       #3      +/-   ##
===========================================
- Coverage   100.00%   91.05%   -8.95%
===========================================
  Files            1        3       +2
  Lines           15      190     +175
  Branches         0       33      +33
===========================================
+ Hits            15      173     +158
- Misses           0        8       +8
- Partials         0        9       +9
| return data |
| def _parse_yaml(yaml_path: Path) -> dict: |
We already have this in Junifer; why are we re-parsing the YAML and checking everything again here?
This is a minimal parse-and-validate step that catches malformed YAML and adjusts paths for later operations.
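For illustration, a minimal parse-and-validate step of this kind might look like the sketch below. It works on an already-loaded mapping, so it needs no YAML library. The required keys (`datagrabber`, `markers`, `storage`), the `storage.uri` field, and the function name are assumptions made for the example, not the code in this PR.

```python
from pathlib import Path

# Hypothetical set of top-level keys the registry needs; not taken from the PR.
REQUIRED_KEYS = ("datagrabber", "markers", "storage")


def validate_and_resolve(config: dict, yaml_dir: Path) -> dict:
    """Check a loaded Junifer-style config and make its storage path absolute."""
    if not isinstance(config, dict):
        raise ValueError("YAML root must be a mapping")
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Missing required keys: {', '.join(missing)}")
    if not isinstance(config["storage"], dict) or "uri" not in config["storage"]:
        raise ValueError("'storage' must be a mapping with a 'uri' entry")

    # Shallow-copy the levels we modify so the caller's dict is untouched.
    resolved = dict(config)
    storage = dict(resolved["storage"])
    uri = Path(storage["uri"])
    if not uri.is_absolute():
        # Relative paths are interpreted relative to the YAML file's directory.
        uri = yaml_dir / uri
    storage["uri"] = str(uri)
    resolved["storage"] = storage
    return resolved
```

Keeping the check this small leaves full validation to Junifer itself and only guards the fields the registry touches.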
| y["datagrabber"] = meta["datagrabber"].copy() |
| a = y["datagrabber"].pop("class") |
| y["datagrabber"]["kind"] = a |
| if a not in ("PatternDataGrabber", "PatternDataladDataGrabber"): |
Why are we doing this? I don't understand it.
This is done to correctly regenerate the YAML.
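Judging from the snippet under review, the stored metadata appears to record the datagrabber type under `class`, while the regenerated Junifer YAML needs it under `kind`. A minimal sketch of that renaming, with a hypothetical helper name and made-up sample metadata:

```python
def datagrabber_meta_to_yaml(meta: dict) -> dict:
    """Return a YAML-ready datagrabber section from stored metadata."""
    # Copy so the original metadata is not mutated.
    section = meta["datagrabber"].copy()
    # Metadata stores the type under "class"; the YAML uses "kind".
    section["kind"] = section.pop("class")
    return section


# Sample metadata, made up for this example.
meta = {"datagrabber": {"class": "PatternDataGrabber", "types": ["BOLD"]}}
print(datagrabber_meta_to_yaml(meta))
```

Using a descriptive helper and variable name instead of `a` may also make the intent clearer to reviewers.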
| ) |
| feature_dir = ds.pathobj / "features" |
| feature_dir.mkdir(exist_ok=True) |
| for k, v in tqdm(metadata.items(), desc="Processing features"): |
Is k here the feature MD5? Can we rename the variable to be more explicit?
What do you want it to be called?
Enable users to add features to the registry like so:
This should add the feature file (HDF5), the Junifer computation file (YAML), and its corresponding metadata file (YAML) to the specified registry dataset. If the Junifer YAML file contains multiple features, all of them should be added to the registry, each with its own files. The metadata file should be generated automatically from the information in the Junifer YAML file and from the computed feature file (e.g., size, shape, and data type). If the registry is remote, the command should clone the dataset locally, add the files, and then push the changes back to the remote.
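The multi-feature and metadata requirements could be sketched roughly as follows. This is only an illustration under stated assumptions. The `markers` list with a `name` per entry is assumed to mirror the usual Junifer YAML layout. The function names, the metadata fields, and passing `shape` and `dtype` in as arguments (rather than reading them from the HDF5 file) are choices made for the example. The clone, add, and push steps are not shown.

```python
import hashlib
from pathlib import Path


def split_by_marker(config: dict) -> dict:
    """Return one single-marker config per marker, keyed by marker name."""
    per_feature = {}
    for marker in config["markers"]:
        single = dict(config)         # shallow copy of the top level
        single["markers"] = [marker]  # keep only this feature's marker
        per_feature[marker["name"]] = single
    return per_feature


def build_metadata(feature_path: Path, marker: dict, shape: tuple, dtype: str) -> dict:
    """Collect basic metadata for one computed feature file."""
    # Reads the whole file into memory; fine for a sketch, not for huge files.
    data = feature_path.read_bytes()
    return {
        "name": marker["name"],
        "size_bytes": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "shape": list(shape),
        "dtype": dtype,
    }
```

Splitting the config first means each feature can be written, described, and committed independently, which matches the "each with its own files" requirement.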