Skip to content

Improve NWP time-slice logic and optimize performance (#282)#390

Open
theoteske wants to merge 4 commits intoopenclimatefix:mainfrom
theoteske:time-slice-bug-fix
Open

Improve NWP time-slice logic and optimize performance (#282)#390
theoteske wants to merge 4 commits intoopenclimatefix:mainfrom
theoteske:time-slice-bug-fix

Conversation

@theoteske
Copy link
Contributor

Pull Request

Description

This pull request addresses and resolves issue #282.

The select_time_slice_nwp function would previously fail when an NWP data source contained init_time_utc coordinates that were not rounded to the hour (e.g., 12:30) due to the assumption in the time-selection logic that init times would align with hourly intervals. This PR makes the code more robust to external datasets that may not have these nice hourly intervals.

The PR simultaneously improves the performance of the time-selection logic by using the vectorized numpy.searchsorted function in place of the previous list comprehension to efficiently find the appropriate init times. This operation is now approximately 1.5x to 2.6x faster based on local benchmarking.

Additionally, we add more descriptive ValueError exceptions to provide clearer feedback to users when no valid NWP data can be found for a requested time range.

Fixes #282

How Has This Been Tested?

Comprehensive tests have been added to tests/select/test_select_time_slice.py to verify the update. These include a new test case that uses a mock NWP DataArray with non-hourly init times to confirm that the correct time slices are selected, as well as tests to ensure that the new ValueError exceptions are raised under the appropriate conditions.

The existing test suite also still passes.

  • Yes

  • Yes, a sanity check was performed by confirming that the init_time_utc and step coordinates of the output DataArray correctly align to produce the expected target times.

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@dfulu
Copy link
Member

dfulu commented Jan 21, 2026

Thanks for the work on this @theoteske, its looking really good.

If I understand from the original issue #282, the problem setup in that issue might have been more like:

def nwp_data_with_offset_steps():
    data = np.random.rand(2, 3, 2, 4, 5).astype(np.float32)
    init_time = pd.to_datetime(["2023-01-01 12:00", "2023-01-01 13:00"])
    step = pd.to_timedelta([0.5, 1.5, 2.5], unit="h")
    channel = ["t", "dswrf"]
    x = np.arange(4)
    y = np.arange(5)
    da = xr.DataArray(
        data,
        coords=[init_time, step, channel, x, y],
        dims=["init_time_utc", "step", "channel", "x_osgb", "y_osgb"],
    )
    return da

or perhaps it was

def nwp_data_with_offset_init_times():
   data = np.random.rand(2, 3, 2, 4, 5).astype(np.float32)
   init_time = pd.to_datetime(["2023-01-01 12:30", "2023-01-01 13:30"])
   step = pd.to_timedelta([0, 1, 2], unit="h")
   channel = ["t", "dswrf"]
   x = np.arange(4)
   y = np.arange(5)
   da = xr.DataArray(
       data,
       coords=[init_time, step, channel, x, y],
       dims=["init_time_utc", "step", "channel", "x_osgb", "y_osgb"],
   )
   return da

@zaryab-ali, could you help to clear this up?

@zaryab-ali
Copy link
Contributor

the second code snippet is more accurate i think because step was rounded down and there was a value missing at the end

@theoteske
Copy link
Contributor Author

Thanks for the clarification @dfulu and @zaryab-ali.

My changes were designed to fix the second scenario (nwp_data_with_offset_init_times), where the init_time_utc coordinates are offset.

To confirm, the test case test_select_time_slice_nwp_success in my PR uses a fixture with non-hourly init times (12:30 and 13:30) and hourly steps, which I believe is functionally identical to the second code snippet you provided. The tests for this case are passing with the new logic.

Let me know if you are understanding it differently or if you would like me to update anything in the PR.

@dfulu
Copy link
Member

dfulu commented Jan 22, 2026

Thanks both. I think the use-case is still slightly vague, mostly because I think it was vague in the original issue.

@theoteske the current fixture nwp_data_array_non_hourly() has half-hourly init-time AND also half-hourly steps. But in test_select_time_slice_nwp_success() its sliced so that the returned array has hourly steps. So essentially every other of the forecast steps of the NWP is used.

Is that the intended behaviour? So users can choose a more coarse forecast step that the underlying data?

@dfulu
Copy link
Member

dfulu commented Jan 22, 2026

I don't think your example is functionally equivalent to nwp_data_with_offset_init_times because nwp_data_with_offset_init_times doesn't have any valid-times which lie on the integer hour. In the select_time_slice_nwp() function target_times would all lie on the integer hours.

So I don't think this is solved. But perhaps I'm wrong

@dfulu
Copy link
Member

dfulu commented Feb 4, 2026

@theoteske just a little prod since I'd like this not to go stale. I think there are some nice changes here, the np.searchsorted() especially.

I appreciate this is a hard one since there was ambiguity in the original issue. I've closed that issue now, but I think this PR can stand on its own. It reproduces the original functionality and it a bit more performant and robust.

I'd suggest to reduce the scope of this to basically remove the nwp_data_array_non_hourly fixture and not aim to support that yet. This could always be the focus of a separate issue once we have scoped it more precisely

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

issue in select_time_slice_nwp

3 participants