Improve NWP time-slice logic and optimize performance (#282) by theoteske · Pull Request #390 · openclimatefix/ocf-data-sampler

theoteske · 2026-01-21T09:07:01Z

Pull Request

Description

This pull request addresses and resolves issue #282.

The select_time_slice_nwp function would previously fail when an NWP data source contained init_time_utc coordinates that were not rounded to the hour (e.g., 12:30) due to the assumption in the time-selection logic that init times would align with hourly intervals. This PR makes the code more robust to external datasets that may not have these nice hourly intervals.

The PR simultaneously improves the performance of the time-selection logic by using the vectorized numpy.searchsorted function in place of the previous list comprehension to efficiently find the appropriate init times. This operation is now approximately 1.5x to 2.6x faster based on local benchmarking.

Additionally, we add more descriptive ValueError exceptions to provide clearer feedback to users when no valid NWP data can be found for a requested time range.

Fixes #282

How Has This Been Tested?

Comprehensive tests have been added to tests/select/test_select_time_slice.py to verify the update. These include a new test case that uses a mock NWP DataArray with non-hourly init times to confirm that the correct time slices are selected, as well as tests to ensure that the new ValueError exceptions are raised under the appropriate conditions.

The existing test suite also still passes.

Yes
Yes, a sanity check was performed by confirming that the init_time_utc and step coordinates of the output DataArray correctly align to produce the expected target times.

Checklist:

My code follows OCF's coding style guidelines
I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
I have checked my code and corrected any misspellings

…climatefix#282)

…ix#282)

dfulu · 2026-01-21T17:54:05Z

Thanks for the work on this @theoteske, its looking really good.

If I understand from the original issue #282, the problem setup in that issue might have been more like:

def nwp_data_with_offset_steps():
    data = np.random.rand(2, 3, 2, 4, 5).astype(np.float32)
    init_time = pd.to_datetime(["2023-01-01 12:00", "2023-01-01 13:00"])
    step = pd.to_timedelta([0.5, 1.5, 2.5], unit="h")
    channel = ["t", "dswrf"]
    x = np.arange(4)
    y = np.arange(5)
    da = xr.DataArray(
        data,
        coords=[init_time, step, channel, x, y],
        dims=["init_time_utc", "step", "channel", "x_osgb", "y_osgb"],
    )
    return da

or perhaps it was

def nwp_data_with_offset_init_times():
   data = np.random.rand(2, 3, 2, 4, 5).astype(np.float32)
   init_time = pd.to_datetime(["2023-01-01 12:30", "2023-01-01 13:30"])
   step = pd.to_timedelta([0, 1, 2], unit="h")
   channel = ["t", "dswrf"]
   x = np.arange(4)
   y = np.arange(5)
   da = xr.DataArray(
       data,
       coords=[init_time, step, channel, x, y],
       dims=["init_time_utc", "step", "channel", "x_osgb", "y_osgb"],
   )
   return da

@zaryab-ali, could you help to clear this up?

zaryab-ali · 2026-01-21T18:48:22Z

the second code snippet is more accurate i think because step was rounded down and there was a value missing at the end

theoteske · 2026-01-22T12:48:59Z

Thanks for the clarification @dfulu and @zaryab-ali.

My changes were designed to fix the second scenario (nwp_data_with_offset_init_times), where the init_time_utc coordinates are offset.

To confirm, the test case test_select_time_slice_nwp_success in my PR uses a fixture with non-hourly init times (12:30 and 13:30) and hourly steps, which I believe is functionally identical to the second code snippet you provided. The tests for this case are passing with the new logic.

Let me know if you are understanding it differently or if you would like me to update anything in the PR.

dfulu · 2026-01-22T16:58:08Z

Thanks both. I think the use-case is still slightly vague, mostly because I think it was vague in the original issue.

@theoteske the current fixture nwp_data_array_non_hourly() has half-hourly init-time AND also half-hourly steps. But in test_select_time_slice_nwp_success() its sliced so that the returned array has hourly steps. So essentially every other of the forecast steps of the NWP is used.

Is that the intended behaviour? So users can choose a more coarse forecast step that the underlying data?

dfulu · 2026-01-22T17:05:51Z

I don't think your example is functionally equivalent to nwp_data_with_offset_init_times because nwp_data_with_offset_init_times doesn't have any valid-times which lie on the integer hour. In the select_time_slice_nwp() function target_times would all lie on the integer hours.

So I don't think this is solved. But perhaps I'm wrong

dfulu · 2026-02-04T12:38:24Z

@theoteske just a little prod since I'd like this not to go stale. I think there are some nice changes here, the np.searchsorted() especially.

I appreciate this is a hard one since there was ambiguity in the original issue. I've closed that issue now, but I think this PR can stand on its own. It reproduces the original functionality and it a bit more performant and robust.

I'd suggest to reduce the scope of this to basically remove the nwp_data_array_non_hourly fixture and not aim to support that yet. This could always be the focus of a separate issue once we have scoped it more precisely

theoteske and others added 3 commits January 20, 2026 23:58

Improve performance and error handling of select_time_slice_nwp (open…

af9bb58

…climatefix#282)

Add tests to verify updated select_time_slice_nwp logic (openclimatef…

aa7086b

…ix#282)

Merge branch 'main' into time-slice-bug-fix

1cbc182

theoteske mentioned this pull request Jan 21, 2026

issue in select_time_slice_nwp #282

Closed

Merge branch 'main' into time-slice-bug-fix

5fcf225

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve NWP time-slice logic and optimize performance (#282)#390

Improve NWP time-slice logic and optimize performance (#282)#390
theoteske wants to merge 4 commits intoopenclimatefix:mainfrom
theoteske:time-slice-bug-fix

theoteske commented Jan 21, 2026

Uh oh!

dfulu commented Jan 21, 2026

Uh oh!

zaryab-ali commented Jan 21, 2026

Uh oh!

theoteske commented Jan 22, 2026

Uh oh!

dfulu commented Jan 22, 2026

Uh oh!

dfulu commented Jan 22, 2026 •

edited

Loading

Uh oh!

dfulu commented Feb 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

theoteske commented Jan 21, 2026

Pull Request

Description

How Has This Been Tested?

Checklist:

Uh oh!

dfulu commented Jan 21, 2026

Uh oh!

zaryab-ali commented Jan 21, 2026

Uh oh!

theoteske commented Jan 22, 2026

Uh oh!

dfulu commented Jan 22, 2026

Uh oh!

dfulu commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dfulu commented Feb 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dfulu commented Jan 22, 2026 •

edited

Loading