Improve NWP time-slice logic and optimize performance (#282)#390
Improve NWP time-slice logic and optimize performance (#282)#390theoteske wants to merge 4 commits intoopenclimatefix:mainfrom
Conversation
|
Thanks for the work on this @theoteske, its looking really good. If I understand from the original issue #282, the problem setup in that issue might have been more like: def nwp_data_with_offset_steps():
data = np.random.rand(2, 3, 2, 4, 5).astype(np.float32)
init_time = pd.to_datetime(["2023-01-01 12:00", "2023-01-01 13:00"])
step = pd.to_timedelta([0.5, 1.5, 2.5], unit="h")
channel = ["t", "dswrf"]
x = np.arange(4)
y = np.arange(5)
da = xr.DataArray(
data,
coords=[init_time, step, channel, x, y],
dims=["init_time_utc", "step", "channel", "x_osgb", "y_osgb"],
)
return daor perhaps it was def nwp_data_with_offset_init_times():
data = np.random.rand(2, 3, 2, 4, 5).astype(np.float32)
init_time = pd.to_datetime(["2023-01-01 12:30", "2023-01-01 13:30"])
step = pd.to_timedelta([0, 1, 2], unit="h")
channel = ["t", "dswrf"]
x = np.arange(4)
y = np.arange(5)
da = xr.DataArray(
data,
coords=[init_time, step, channel, x, y],
dims=["init_time_utc", "step", "channel", "x_osgb", "y_osgb"],
)
return da@zaryab-ali, could you help to clear this up? |
|
the second code snippet is more accurate i think because step was rounded down and there was a value missing at the end |
|
Thanks for the clarification @dfulu and @zaryab-ali. My changes were designed to fix the second scenario ( To confirm, the test case Let me know if you are understanding it differently or if you would like me to update anything in the PR. |
|
Thanks both. I think the use-case is still slightly vague, mostly because I think it was vague in the original issue. @theoteske the current fixture Is that the intended behaviour? So users can choose a more coarse forecast step that the underlying data? |
|
I don't think your example is functionally equivalent to So I don't think this is solved. But perhaps I'm wrong |
|
@theoteske just a little prod since I'd like this not to go stale. I think there are some nice changes here, the I appreciate this is a hard one since there was ambiguity in the original issue. I've closed that issue now, but I think this PR can stand on its own. It reproduces the original functionality and it a bit more performant and robust. I'd suggest to reduce the scope of this to basically remove the |
Pull Request
Description
This pull request addresses and resolves issue #282.
The
select_time_slice_nwpfunction would previously fail when an NWP data source containedinit_time_utccoordinates that were not rounded to the hour (e.g., 12:30) due to the assumption in the time-selection logic that init times would align with hourly intervals. This PR makes the code more robust to external datasets that may not have these nice hourly intervals.The PR simultaneously improves the performance of the time-selection logic by using the vectorized
numpy.searchsortedfunction in place of the previous list comprehension to efficiently find the appropriate init times. This operation is now approximately 1.5x to 2.6x faster based on local benchmarking.Additionally, we add more descriptive ValueError exceptions to provide clearer feedback to users when no valid NWP data can be found for a requested time range.
Fixes #282
How Has This Been Tested?
Comprehensive tests have been added to
tests/select/test_select_time_slice.pyto verify the update. These include a new test case that uses a mock NWP DataArray with non-hourly init times to confirm that the correct time slices are selected, as well as tests to ensure that the new ValueError exceptions are raised under the appropriate conditions.The existing test suite also still passes.
Yes
Yes, a sanity check was performed by confirming that the
init_time_utcandstepcoordinates of the output DataArray correctly align to produce the expected target times.Checklist: