Skip to content

Numerical Stability of Variance #376

@genos

Description

@genos

First off, thank you for statrs! This looks like a fantastic library that took a lot of effort.

Regarding the numerical stability of the functions in statrs::statistics, I think it would be useful to reach for Welford's algorithm for reasons outlined here by John D. Cook and implemented here. It could also improve performance; the original algorithm requires only a single pass over the data, and it can also be parallelized.

For example, running the following on my laptop

Example Code
//! Example from <https://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/>
use rand::prelude::*;
use statrs::statistics::Statistics;

/// <https://www.johndcook.com/skewness_kurtosis.html>
#[derive(Default)]
struct RunningStats(f64, f64, f64);

impl RunningStats {
    fn push(&mut self, x: f64) {
        let n1 = self.0;
        self.0 += 1.0;
        let delta = x - self.1;
        let delta_n = delta / self.0;
        let term1 = delta * delta_n * n1;
        self.1 += delta_n;
        self.2 += term1;
    }
    fn variance(&self) -> Option<f64> {
        if self.0 < 2.0 {
            None
        } else {
            Some(self.2 / (self.0 - 1.0))
        }
    }
}

impl<T: Iterator<Item = f64>> From<T> for RunningStats {
    fn from(xs: T) -> Self {
        let mut r = Self::default();
        for x in xs {
            r.push(x);
        }
        r
    }
}

fn main() {
    let mut rng = SmallRng::seed_from_u64(30_613_700); // seed from random.org
    let mut xs = vec![0f64; 1_000_000];
    // John D. Cook's second example: samples ~ U([0, 1]), shifted by 10^12
    for x in &mut xs {
        *x = 1e12 + rng.random::<f64>();
    }
    println!(
        "statrs:       {}\nRunningStats: {}",
        Statistics::variance(&xs),
        RunningStats::from(xs.into_iter())
            .variance()
            .expect("nonempty")
    );
}

yields

statrs:       0.31179242499195364
RunningStats: 0.08339059221218162

Thanks again!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions