
Add quantile mapping and associated tests#2264

Open
maxwhitemet wants to merge 15 commits into metoppv:master from maxwhitemet:mobt_1007_quantile_mapping_plugin

Conversation

@maxwhitemet
Contributor

@maxwhitemet maxwhitemet commented Dec 9, 2025

Addresses #1007

This PR implements quantile mapping into the IMPROVER repo, adding a quantile mapping module, CLI, unit tests, and acceptance tests.

A demonstration of the plugin's functionality is available here.

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)

@codecov

codecov bot commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 96.10390% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.19%. Comparing base (84a8944) to head (ae2a5ad).
⚠️ Report is 154 commits behind head on master.

Files with missing lines                   Patch %   Lines
improver/calibration/quantile_mapping.py   96.10%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2264      +/-   ##
==========================================
- Coverage   98.39%   95.19%   -3.20%     
==========================================
  Files         124      150      +26     
  Lines       12212    15323    +3111     
==========================================
+ Hits        12016    14587    +2571     
- Misses        196      736     +540     


@maxwhitemet force-pushed the mobt_1007_quantile_mapping_plugin branch from 73363ed to ae2a5ad on December 10, 2025 at 16:11
Contributor

@gavinevans gavinevans left a comment


Thanks @maxwhitemet 👍

I've added some comments below.

1. Refactoring:
   - Move functionality into QuantileMapping class
   - Remove redundancy
   - Increase variable name clarity
   - Refactor into smaller functions
2. Additions:
   - Improved readability of docstrings
   - Fixed improper masked array handling
@maxwhitemet force-pushed the mobt_1007_quantile_mapping_plugin branch from ae2a5ad to 6ec215f on December 29, 2025 at 16:22
Contributor Author

@maxwhitemet maxwhitemet left a comment


In addition to the feedback received, I have implemented the below modifications:

  • Made substantial changes to docstrings, such that now:
    • The more extensive documentation has moved from private to public methods
    • Redundant Args sections have been removed from private methods where the arguments are documented elsewhere
  • Masked arrays:
    • I was concerned about what would happen if the reference cube and the post-processed forecast cube had differing mask locations, so I have added handling that may require further discussion: the masks are combined such that only points that are valid in both cubes are used to build the CDFs.
    • Removed the redundant use of np.where for non-masked arrays, as I discovered this is handled implicitly by np.ma.where
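A minimal sketch of the combined-mask approach described above (illustrative only; `reference` and `forecast` here are plain masked arrays standing in for the cube data):

```python
import numpy as np

# Hypothetical reference and forecast fields with differing mask locations
reference = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[True, False, False, False])
forecast = np.ma.masked_array([5.0, 6.0, 7.0, 8.0], mask=[False, False, True, False])

# Combine the masks: a point contributes to the CDFs only if it is
# valid (unmasked) in both cubes
combined_mask = np.ma.getmaskarray(reference) | np.ma.getmaskarray(forecast)
valid = ~combined_mask
reference_valid = reference.data[valid]  # array([2., 4.])
forecast_valid = forecast.data[valid]    # array([6., 8.])
```

`np.ma.getmaskarray` is used rather than `.mask` so that a fully unmasked array (whose mask is the scalar `False`) still yields a boolean array of the right shape.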

Contributor

@gavinevans gavinevans left a comment


Thanks for the simplifications @maxwhitemet 👍

I've added a few minor comments.

Contributor Author

@maxwhitemet maxwhitemet left a comment


Thank you @gavinevans. I have now implemented all the requested changes.

Comment on lines 120 to 127
[2.63564289e-07, 8.47503543e-08, 3.35276127e-08],
[4.65661287e-08, 2.14204192e-08, 1.67638063e-08],
[8.38190317e-09, 1.21071935e-08, 2.23517418e-08],
],
[
[5.58793545e-09, 3.81842256e-08, 2.03959644e-07],
[2.51457095e-08, 6.61239028e-08, 1.89989805e-07],
[5.49480319e-08, 9.40635800e-08, 1.64844096e-07],
Contributor Author


Thank you. I have made all examples easier to understand with the method suggested.

gavinevans previously approved these changes Jan 14, 2026
Contributor

@bayliffe bayliffe left a comment


Please have some delightful comments.

For example, if reference_data is [10, 20, 30, 40, 50] and forecast_data
is [5, 15, 25, 35, 45], the forecast systematically underestimates by 5 units.
The corrected values will be [10, 20, 30, 40, 50], mapped to match the
reference distribution.
Contributor


I'm not a major fan of the example given; it demonstrates only a simple uniform linear adjustment to every forecast value.

The application in Gavin's work is about stretching out the data values to recover the peaks (and notionally troughs, but it's zero bounded for precip) that get lost in the smoothing / blending operations that have been applied to the forecast values. Perhaps that provides a nicer example, e.g.

        For example, if reference_data is [10, 20, 30, 40, 50] and forecast_data
        is [20, 25, 30, 35, 40], the forecast values are mapped to the corresponding
        values in the reference data distribution. This stretches the range of the forecast
        data, shifting the extreme values by 10 units in opposing directions. The median
        value is left unchanged as the two distributions are aligned at this point.
        The inter-quartile values are each shifted by 5 units in opposing directions,
        again reflecting the broader distribution found in the reference data.

Something like that? It demonstrates more of the behaviours that we may see in the quantile mapping than a simple bias correction.
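The numbers in the suggested example do fall out of a straightforward empirical-CDF mapping. A quick illustrative check (this is a sketch, not the IMPROVER code; the i/n quantile convention and the "smallest reference value reaching the quantile" inversion are assumptions):

```python
import numpy as np

reference = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
forecast = np.array([20.0, 25.0, 30.0, 35.0, 40.0])

n = forecast.size
# Empirical quantile of each forecast value (i/n convention)
forecast_quantiles = np.interp(
    forecast, np.sort(forecast), np.arange(1, n + 1) / n
)
# Invert the reference ECDF: smallest reference value reaching each quantile
idx = np.searchsorted(np.arange(1, n + 1) / n, forecast_quantiles)
corrected = np.sort(reference)[idx]
# corrected is [10, 20, 30, 40, 50]: the extremes move by 10 units,
# the inter-quartile values by 5, and the median is unchanged
```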

Contributor Author


I have implemented the above suggestion.

"""
sorted_values = np.sort(data)
num_points = sorted_values.shape[0]
quantiles = np.arange(1, num_points + 1) / num_points
Contributor


This could be defined as quantiles = np.arange(1, num_points + 1) / (num_points + 1) which would give a symmetric set of percentiles. As currently defined we assume that the lowest value in our dataset is not equivalent to the zeroth percentile, but that the maximum value is equivalent to the 100th percentile. I'm not specifically advocating for that, just highlighting that we've made a choice; see here for the fact that for large arrays this will not be important.

Contributor Author


Gavin has suggested I keep the current implementation, as it represents how an empirical distribution function is typically defined and matches the ibicus implementation (ultimately built on the statsmodels ECDF construction) that Gavin's work follows.
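To make the choice concrete, here is a small comparison of the two conventions under discussion (illustrative only):

```python
import numpy as np

data = np.sort(np.array([3.0, 1.0, 4.0, 2.0]))
n = data.size

# Convention kept in the PR (typical empirical distribution function): i/n.
# The maximum maps to quantile 1.0, but the minimum does not map to 0.0.
ecdf_quantiles = np.arange(1, n + 1) / n             # [0.25, 0.5, 0.75, 1.0]

# Symmetric alternative raised in the review: i/(n + 1).
# Neither extreme is pinned to 0.0 or 1.0.
symmetric_quantiles = np.arange(1, n + 1) / (n + 1)  # [0.2, 0.4, 0.6, 0.8]
```

For large arrays the two sets of quantiles converge, so as noted the choice matters little in practice.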

# Map the quantiles to values in the reference distribution
corrected_values = self._inverted_cdf(reference_data, forecast_quantiles)

return corrected_values
Contributor


I've spent far too long trying to think of how we can handle extremes and duplicate values here, but I've not come up with anything satisfactory.

Consider the following:

forecast = np.array([0, 0, 0, 0, 0, 0, 0, 0, 10, 20, 30])
reference = np.array([0, 0, 0, 0, 0, 0, 0, 10, 20, 40, 50])

The reference forecast in this confected case has one additional non-zero value; the arrays are the same length. We are going to be applying this to precipitation, so lots of zeroes is a reasonable expectation. I've attempted to draw what the empirical CDF will look like below. In each case the median value is 0, as less than half of the values in each array are non-zero.

[Image: sketch of the empirical CDFs for the forecast and reference arrays]

The result of applying the quantile mapping method to this data is:

[10, 10, 10, 10, 10, 10, 10, 10, 20, 40, 50]

The interpolation step which maps the forecasts into the sorted forecasts and quantiles is confronted with many repeated zero values. The first transition in the forecast data is 0 --> 10 which occurs at the 0.727 quantile. The forecast quantiles for all zero values are set to this 0.727.

As the reference data has one fewer zero value, the transition there from 0 --> 10 occurs at the 0.636 quantile. Mapping from one to the other using the floor method here, all of our 0.727 quantile forecasts take on the value of 10 from the reference data.

In this confected case we are left with no zero value points in the final result. None of this may reflect our use cases. In Gavin's work I think the forecast values will generally have been spread through blending / neighbourhooding etc. to generally cover more grid points than the reference data, so this situation ought not to arise. You've also added a threshold below which data is not modified, which will be important for the zero values. My main concern is that this is a relatively generic technique that might be applied to data apart from our original use case and some of these issues could get buried.

I'd appreciate it if you could think about this a little. Perhaps I've gone down the garden path and none of this is quite right, but if there is anything worth documenting, it would be good to add such documentation as part of this change. This could take the form of some extended documentation (see https://github.com/metoppv/improver/tree/master/doc/source/extended_documentation) so that these issues are recorded. I don't think any of this is unique to our use case so there is probably some discussion somewhere that might be instructive.

Contributor Author


Thank you for pointing this out. I have also enjoyed a garden path.

I have addressed the duplicate problem.

In the latest commit, I have allowed the user the choice of quantile mapping 'method' to address duplicates: 'step' or 'linear'.

  • Step method: repeated forecast values collapse to the same (right-most) quantile, creating the stepwise assignment.
    • quantiles are np.interp(forecast_data, sorted_values, quantiles)
  • Linear method: repeated values are assigned different quantiles.
    • quantiles are (ranks + 0.5) / num_points where 'ranks' are the indices of the sorted values

Consider the example you provided above:

forecast = np.array([0, 0, 0, 0, 0, 0, 0, 0, 10, 20, 30])
reference = np.array([0, 0, 0, 0, 0, 0, 0, 10, 20, 40, 50])

The outputs are now:

  • 'step'
    [10, 10, 10, 10, 10, 10, 10, 10, 20, 40, 50]
  • 'linear'
    [0., 0., 0., 0., 0., 0., 0., 10, 20, 40, 50]
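For reference, a minimal self-contained sketch that reproduces both outputs on this example. The helper names and the inverted-CDF rule (smallest reference value whose ECDF reaches the quantile) are my assumptions for illustration, not necessarily the plugin's internals:

```python
import numpy as np

def ecdf(data):
    """Empirical CDF: sorted values paired with quantiles i/n."""
    sorted_values = np.sort(data)
    n = sorted_values.shape[0]
    return sorted_values, np.arange(1, n + 1) / n

def inverted_cdf(reference, quantiles):
    """Smallest reference value whose ECDF reaches each quantile."""
    sorted_ref, ref_quantiles = ecdf(reference)
    idx = np.minimum(
        np.searchsorted(ref_quantiles, quantiles, side="left"),
        ref_quantiles.size - 1,
    )
    return sorted_ref[idx]

def quantile_map(forecast, reference, method="step"):
    sorted_fc, fc_quantiles = ecdf(forecast)
    if method == "step":
        # Repeated forecast values collapse onto the right-most quantile
        quantiles = np.interp(forecast, sorted_fc, fc_quantiles)
    else:  # "linear": repeated values get distinct quantiles via their ranks
        ranks = np.argsort(np.argsort(forecast, kind="stable"), kind="stable")
        quantiles = (ranks + 0.5) / forecast.size
    return inverted_cdf(reference, quantiles)

forecast = np.array([0, 0, 0, 0, 0, 0, 0, 0, 10, 20, 30], dtype=float)
reference = np.array([0, 0, 0, 0, 0, 0, 0, 10, 20, 40, 50], dtype=float)
# quantile_map(forecast, reference, "step")   -> [10 ×8, 20, 40, 50]
# quantile_map(forecast, reference, "linear") -> [0 ×7, 10, 20, 40, 50]
```

Note that the 'linear' method still maps one zero onto 10 (the eighth zero's rank-based quantile exceeds the reference ECDF at zero), which preserves more of the zeros than the 'step' method while not being a perfect round trip.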

Comment on lines 148 to 149
reference_cube = reference_cube.copy()
reference_cube.convert_units(target_units)
Contributor


Why copy this rather than simply convert the units on the reference_cube? Doing so you would not need to return anything, you could simply act in place. You overwrite the original reference_cube, so you clearly have no need to retain it.

Contributor Author


I have now moved the use of reference_cube.copy() above the conditional logic so that a copy is returned regardless of whether units match.

I think that in-place modification would be a surprise to the user.

separated based on the truth attribute.
truth_attribute:
An attribute and its value in the format of "attribute=value",
which must be present on historical truth cubes.
Contributor


In this CLI you talk about truths, historic data and reference data; elsewhere you talk mostly about reference data (which I prefer). One way or the other, could you choose a common nomenclature?

def test__map_quantiles(
simple_reference_array,
simple_forecast_array,
):
Contributor


Doc-string please.

Contributor Author


Added



@pytest.fixture
def expected_result_no_threshold():
Contributor


Why define this as a fixture? You use it only once in a single test, so it could be defined locally within that test.

Contributor Author


Sorted.

assert isinstance(result, Cube)
assert result.shape == forecast_cube.shape
assert result.data.dtype == np.float32
assert result.data.mask is not False
Contributor


I don't think this is the test you want. Don't you expect the result.data.mask to be False in this case?

Horribly, the iris mask type is a numpy bool (so I discovered while reviewing this), meaning you can have both of the tests below and they will both pass:

    assert result.data.mask is not False
    assert result.data.mask is not True

because the mask is neither of these native python types. I think you want:

assert not np.ma.is_masked(result.data)

Contributor Author


Thanks for pointing this out. Sorted.

@pytest.fixture
def simple_forecast_array():
"""Fixture for creating a simple forecast array"""
return np.array([5, 15, 25, 35, 45])
Contributor


Some more interesting examples than these simple linear cases might be nice, to show more than a simple linear translation.

Contributor Author


I think I've now addressed this.

Contributor Author

@maxwhitemet maxwhitemet left a comment


Thanks for the review @bayliffe. I have implemented your suggested changes.


corrected_values_flat = forecast_data_flat.copy()
corrected_values_flat[valid_mask] = corrected_valid

output_mask = combined_mask
Contributor Author


Now implemented.


