Skip to content

Fix GFS dataset import when ocf-data-sampler constants are unavailable#110

Open
AswaniSahoo wants to merge 2 commits intoopenclimatefix:mainfrom
AswaniSahoo:add-gfs-dataset-tests
Open

Fix GFS dataset import when ocf-data-sampler constants are unavailable#110
AswaniSahoo wants to merge 2 commits intoopenclimatefix:mainfrom
AswaniSahoo:add-gfs-dataset-tests

Conversation

@AswaniSahoo
Copy link

Pull Request

Description

This PR fixes an import-time crash in open_data_pvnet.nwp.gfs_dataset.

NWP_MEANS and NWP_STDS were introduced in ocf-data-sampler
(openclimatefix/ocf-data-sampler#145) but are not yet part of a PyPI release.
As a result, importing gfs_dataset fails when installing dependencies
from PyPI with ocf-data-sampler==0.2.32.

This change adds a graceful fallback when these constants are unavailable,
allowing the module to be imported and used without normalization.
When the constants are present in a future release, the existing
normalization logic will be used automatically.

Minimal regression tests are included to lock this behavior.

Fixes #


How Has This Been Tested?

The following tests were run locally:

  • pytest tests/test_gfs_dataset.py
  • pytest

These tests verify that:

  • gfs_dataset imports successfully when normalization constants are missing

  • NaN handling behaves as expected

  • init_time is correctly renamed to init_time_utc

  • GFSDataSampler initializes without crashing

  • Yes

If your changes affect data processing, have you plotted any changes?

  • Yes (not applicable — behavior is unchanged when constants are present)

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@AswaniSahoo
Copy link
Author

Hi @siddharth7113 @peterdudfield

Just a gentle ping on this PR when you get a chance.
It fixes an import-time crash in gfs_dataset.py caused by unreleased
ocf-data-sampler constants and adds a small regression test.

Happy to make any changes or follow suggestions if needed.
Thanks!

@peterdudfield
Copy link
Contributor

This was merged in 0.0.52 of data-sampler. I wonder if those constants have been moved somewhere?

Note there is also the task #115 which might go over this

@AswaniSahoo
Copy link
Author

Thanks for the clarification @peterdudfield

Yes , the constants do exist in newer ocf-data-sampler, but
open-data-pvnet is currently pinned to 0.2.32, where
ocf_data_sampler.constants is not available. That’s what causes the
import-time crash .

This PR is intended as a backwards-compatible safeguard, so
gfs_dataset.py doesn’t crash on import while the dependency upgrade
in #115 is still pending.

Happy to:

  • adjust this to conditionally use the new constants once available,
  • keep this as a temporary fix until Update to new ocf-data-sampler + pvnet #115 lands, or
  • close/rework it if you’d prefer everything handled as part of the upgrade.

Just let me know what direction you’d like

@peterdudfield
Copy link
Contributor

Thanks for the clarification @peterdudfield

Yes , the constants do exist in newer ocf-data-sampler, but open-data-pvnet is currently pinned to 0.2.32, where ocf_data_sampler.constants is not available. That’s what causes the import-time crash .

This PR is intended as a backwards-compatible safeguard, so gfs_dataset.py doesn’t crash on import while the dependency upgrade in #115 is still pending.

Happy to:

  • adjust this to conditionally use the new constants once available,
  • keep this as a temporary fix until Update to new ocf-data-sampler + pvnet #115 lands, or
  • close/rework it if you’d prefer everything handled as part of the upgrade.

Just let me know what direction you’d like

In ocf-data-sampler==0.2.32, can you see those values anywhere? Or can you see which version they were removed/moved?

@AswaniSahoo
Copy link
Author

I’ve double-checked this locally against ocf-data-sampler==0.2.32.

In that version:

  • there is no ocf_data_sampler.constants module at all
  • NWP_MEANS / NWP_STDS are not defined or re-exported under any other module
  • a full search of the installed package shows no occurrences of those names

The available top-level modules in 0.2.32 are only:
config, load, numpy_sample, select, and utils.

From what I can see, the constants were introduced later (e.g. in
ocf-data-sampler PR #145) and are only available in newer versions, which
explains the import-time failure while open-data-pvnet is pinned to 0.2.32.

So there isn’t an alternative import path for those values in 0.2.32; the
options are either a defensive fallback (this PR) or upgrading the dependency
as part of #115.

@peterdudfield
Copy link
Contributor

I’ve double-checked this locally against ocf-data-sampler==0.2.32.

In that version:

  • there is no ocf_data_sampler.constants module at all
  • NWP_MEANS / NWP_STDS are not defined or re-exported under any other module
  • a full search of the installed package shows no occurrences of those names

The available top-level modules in 0.2.32 are only: config, load, numpy_sample, select, and utils.

From what I can see, the constants were introduced later (e.g. in ocf-data-sampler PR #145) and are only available in newer versions, which explains the import-time failure while open-data-pvnet is pinned to 0.2.32.

So there isn’t an alternative import path for those values in 0.2.32; the options are either a defensive fallback (this PR) or upgrading the dependency as part of #115.

What version do they come in?

@AswaniSahoo
Copy link
Author

The normalization constants (NWP_MEANS / NWP_STDS in constants.py) were introduced in ocf-data-sampler PR #145, merged on Jan 24, 2025. I also verified this locally: constants.py is present and importable in ocf-data-sampler==0.0.52, and not present in 0.2.32.

Based on the repository tags, the first versions that include this change are v0.0.52 and later (including the v0.1.x series).

@peterdudfield
Copy link
Contributor

The normalization constants (NWP_MEANS / NWP_STDS in constants.py) were introduced in ocf-data-sampler PR #145, merged on Jan 24, 2025. I also verified this locally: constants.py is present and importable in ocf-data-sampler==0.0.52, and not present in 0.2.32.

Based on the repository tags, the first versions that include this change are v0.0.52 and later (including the v0.1.x series).

and can you find when they are removed?

@AswaniSahoo
Copy link
Author

I checked this in detail.

constants.py (and NWP_MEANS / NWP_STDS) was introduced in v0.0.52 (PR #145), and is present through v0.1.16. It was then removed in v0.1.17 by commit e82e3e9 (PR #206: “Move normalisation values to config”). After v0.1.17, those symbols no longer exist under the same API.

So ocf-data-sampler==0.2.32 predates their introduction entirely, while newer versions past v0.1.17 have a changed normalization API.

@siddharth7113
Copy link
Contributor

Hi @AswaniSahoo & @peterdudfield ,

Sorry I should have commented sooner, the current script that you have been working needs to be removed and it is longer used, we made this when were using the old version of ocf-data-sampler , now everything simply needs to be passed through configsspecifically present here.

@siddharth7113
Copy link
Contributor

@AswaniSahoo apologies for all your trouble , this is why we haven't opened #115 for new contirbutors and was assinged to me, since we are working on making proper documentation soon.

Meanwhile thank you for investigating different version, I would recommend to keep a watch on #111 , we'll open more issues for contributor soon.

@AswaniSahoo
Copy link
Author

Thanks for the clarification @siddharth7113 — that makes sense.

Appreciate you and @peterdudfield taking the time to explain the newer
config-based flow and the ongoing work around #115.

Glad the version investigation was still useful for understanding the
failure mode. I’ll keep an eye on #111 / future issues and would be happy
to help once the new structure is ready for contributors.

Thanks again!

@AswaniSahoo AswaniSahoo mentioned this pull request Jan 20, 2026
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants