Skip to content

Fix dataset prefix matching in STAC server#9

Merged
TheGreatAlgo merged 3 commits into
mainfrom
load-dataset-bug
Jun 16, 2026
Merged

Fix dataset prefix matching in STAC server#9
TheGreatAlgo merged 3 commits into
mainfrom
load-dataset-bug

Conversation

@eloramirez1356

@eloramirez1356 eloramirez1356 commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

This pull request improves the accuracy and reliability of dataset resolution in the dClimate client, specifically when dealing with datasets whose names are prefixes of other datasets (such as precipitation_total and precipitation_total_land). The changes ensure that only exact dataset matches are returned, preventing accidental conflation of similarly named datasets. The update also adds comprehensive tests and clarifies usage in the documentation.

Exact dataset matching and bug fix:

  • Refactored the dataset matching logic in resolve_cid_from_stac_server to use a new helper function _feature_matches_dataset, ensuring that only exact dataset IDs are matched and avoiding prefix collisions (e.g., precipitation_total vs. precipitation_total_land). [1] [2]

Testing improvements:

  • Added new tests to tests/test_stac_server_listing.py that verify resolve_cid_from_stac_server correctly handles datasets with similar prefixes, rejects prefix-only matches, and supports legacy ID fallback, ensuring robust behavior for edge cases.

Documentation update:

  • Updated the README.md to clarify the distinction between ERA5 and ERA5-Land datasets, including usage examples and recommendations to inspect dataset names before loading.

Minor codebase updates:

  • Added resolve_cid_from_stac_server to the import list in tests/test_stac_server_listing.py to support the new tests.

Summary by CodeRabbit

  • Documentation

    • Added a README usage example showing how to load ERA5-Land datasets with the expected load_dataset pattern (including precipitation examples).
  • Bug Fixes

    • Improved STAC CID resolution to use exact dataset matching, preventing collisions with similarly named datasets and improving legacy-id fallback behavior.
  • Tests

    • Added unit tests for exact matching, collision handling, and legacy-id fallback cases in STAC CID resolution.
  • Chores

    • Bumped package version to 0.5.9.

@eloramirez1356 eloramirez1356 self-assigned this Jun 15, 2026
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cd717e14-4736-4140-aff0-5d3e7690697d

📥 Commits

Reviewing files that changed from the base of the PR and between bb25f35 and 7b4e22f.

📒 Files selected for processing (1)
  • pyproject.toml

📝 Walkthrough

Walkthrough

resolve_cid_from_stac_server in stac_server.py gains two private helpers that perform exact dataset-id matching via the collection field and properties["dclimate:dataset_id"] (or a parsed fallback), replacing the old prefix-startswith filter. Three new unit tests cover the updated resolution behavior, README adds an ERA5-Land usage example, and the package version is bumped to 0.5.9.

Changes

STAC Strict Dataset Matching and ERA5-Land Docs

Layer / File(s) Summary
Exact dataset matching implementation
dclimate_client_py/stac_server.py
Two new internal helpers parse the dataset segment from a feature id and validate exact membership by checking collection then dclimate:dataset_id or parsed id. The existing prefix-startswith filter in resolve_cid_from_stac_server is replaced with the new exact-match helper.
Unit tests, README example, and release update
tests/test_stac_server_listing.py, README.md, pyproject.toml
Import statement extended to include resolve_cid_from_stac_server. Three new tests verify exact-match CID resolution, ValueError on prefix-only collisions, and legacy-id fallback CID resolution. README adds an async def main_era5_land() example showing load_dataset calls for ERA5 and ERA5-Land precipitation datasets. Package version is bumped from 0.5.8 to 0.5.9.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐇 Hoppity-hop through the STAC fields I go,
No more confusing "land" from the plain ERA flow!
I sniff the dataset_id, I parse out the name,
Prefix collisions? No more of that game.
With tests and a README to light up the way,
Precipitation is perfectly sorted today! ☁️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix dataset prefix matching in STAC server' directly and concisely describes the main change—resolving a dataset resolution bug caused by prefix-based matching logic.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch load-dataset-bug

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dclimate_client_py/stac_server.py`:
- Around line 78-82: The code in this area has not been formatted according to
Ruff's formatting standards, causing the CI pre-commit check to fail. Run the
pre-commit tool or ruff-format command locally to automatically reformat the
code in the stac_server.py file to comply with the project's formatting
requirements, then commit these formatting changes before merging the pull
request.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 61e36dab-5972-4d68-b1de-07e328f04ad2

📥 Commits

Reviewing files that changed from the base of the PR and between 63f9a1c and 2b52153.

📒 Files selected for processing (3)
  • README.md
  • dclimate_client_py/stac_server.py
  • tests/test_stac_server_listing.py

Comment thread dclimate_client_py/stac_server.py Outdated
@TheGreatAlgo TheGreatAlgo merged commit bc93bb9 into main Jun 16, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants