@codeflash-ai codeflash-ai bot commented Dec 19, 2025

📄 804% (8.04x) speedup for _pstdev in unstructured/metrics/utils.py

⏱️ Runtime : 6.59 milliseconds → 729 microseconds (best of 250 runs)
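
The headline figures can be cross-checked with a small calculation. This is a sketch that assumes the reported percentage is defined as (original - optimized) / optimized, which is consistent with the numbers above; `percent_faster` is a hypothetical helper name, not part of Codeflash.

```python
def percent_faster(original: float, optimized: float) -> float:
    """Relative speedup in percent, assuming (original - optimized) / optimized."""
    return (original - optimized) / optimized * 100.0


# Headline runtimes from this PR, both expressed in microseconds.
original_us = 6590.0
optimized_us = 729.0

print(round(percent_faster(original_us, optimized_us)))  # 804
print(round(original_us / optimized_us, 2))  # 9.04, the plain runtime ratio
```

Under this assumption the "8.04x" in the title is the relative gain (804% / 100), while the plain ratio of the two runtimes is about 9.04.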

📝 Explanation and details

The optimized version replaces Python's statistics.pstdev() with a custom Numba-compiled implementation, cutting the measured runtime from 6.59 milliseconds to 729 microseconds.

Key optimizations applied:

  1. Numba JIT compilation: The _numba_pstdev() function uses @njit(cache=True, fastmath=True) to compile the standard deviation calculation to native machine code, eliminating Python interpreter overhead.

  2. Manual computation: Instead of relying on Python's statistics.pstdev(), the algorithm manually computes the population standard deviation using basic loops - calculating the mean first, then the sum of squared differences, and finally taking the square root.

  3. NumPy array conversion: The filtered scores are converted to a np.float64 array, which provides better memory layout and enables Numba's optimizations.
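
The three steps above can be sketched in plain Python. This is not the PR's code: `two_pass_pstdev` and `pstdev_sketch` are hypothetical names, the default `rounding=3` and the `None` return for fewer than two values are inferred from the tests below, and the sketch deliberately omits Numba and NumPy so that it runs with the standard library alone. In the PR, the loop body is what gets decorated with `@njit(cache=True, fastmath=True)` and fed a `np.float64` array.

```python
import math
import statistics
from typing import Iterable, Optional, Sequence


def two_pass_pstdev(values: Sequence[float]) -> float:
    """Population standard deviation using two explicit passes."""
    n = len(values)
    # Pass 1: arithmetic mean.
    total = 0.0
    for v in values:
        total += v
    mean = total / n
    # Pass 2: sum of squared deviations from the mean.
    sq_diff = 0.0
    for v in values:
        d = v - mean
        sq_diff += d * d
    return math.sqrt(sq_diff / n)


def pstdev_sketch(
    scores: Iterable[Optional[float]], rounding: Optional[int] = 3
) -> Optional[float]:
    """Filter None, return early for fewer than two values, then round."""
    valid = [float(s) for s in scores if s is not None]
    if len(valid) <= 1:
        return None  # early return: the loops above never run
    result = two_pass_pstdev(valid)
    if not rounding:  # None, 0, or False means "do not round"
        return result
    return round(result, rounding)


print(pstdev_sketch([1, 2, 3, 4, 5]))        # 1.414
print(pstdev_sketch([None, 2, 4, None, 6]))  # 1.633
print(pstdev_sketch([5]))                    # None
```

Keeping the filtering and rounding outside the numeric loop mirrors the split the PR describes, where only the arithmetic is compiled.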

Why this leads to speedup:

  • JIT compilation eliminates interpreter overhead: Numba compiles the math-heavy computation to machine code, removing the cost of Python bytecode interpretation during the core calculation.
  • Optimized memory access: NumPy arrays provide contiguous memory layout that's more cache-friendly than Python lists.
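  • fastmath optimizations: fastmath=True lets the compiler relax strict IEEE 754 semantics (for example, by reordering floating-point additions), which can speed up the loops at the cost of possible last-digit differences from statistics.pstdev().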
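
One consequence of fast-math style optimizations is worth illustrating: floating-point addition is not associative, so a compiler that is allowed to regroup additions can change the last bits of a sum. The snippet below is a generic pure-Python illustration of that effect, not output from the Numba build in this PR.

```python
import math

values = [0.1, 0.2, 0.3]

# The same three numbers, added with two different groupings.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# The groupings disagree in the last bit...
print(left_to_right == right_to_left)
# ...but agree within a tight relative tolerance...
print(math.isclose(left_to_right, right_to_left, rel_tol=1e-12))
# ...and agree exactly after rounding to 3 decimals.
print(round(left_to_right, 3) == round(right_to_left, 3))
```

This is why comparisons of unrounded results are usually safer with a tolerance than with exact equality.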

Performance characteristics based on test results:

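  • Small datasets (2-10 elements): roughly 300-500% faster in most tests. The timings are best-of-N, so they reflect warm calls and show that the per-call cost of the array conversion and the call into compiled code stays small; the one-time JIT compilation is a separate cost, which cache=True stores on disk for later runs.
  • Large datasets (1000 elements): roughly 600-2000% faster, so the gain grows with input size as the fixed per-call overhead becomes a smaller share of the total.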
  • Edge cases (empty/single element): No performance penalty, as the early returns bypass the expensive computation entirely.
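
A best-of-N measurement like the ones reported here can be approximated with `timeit`. This is a rough sketch, not the Codeflash harness: `float_loop_pstdev` and `best_of` are hypothetical helpers, and the candidate is a plain-Python float computation rather than the Numba version, so the numbers it prints will not match the PR's.

```python
import math
import statistics
import timeit

data = [float(i) for i in range(1000)]


def float_loop_pstdev(values):
    """Population standard deviation using ordinary float arithmetic."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def best_of(fn, repeat=5, number=20):
    """Best per-call time in seconds over `repeat` batches of `number` calls."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


t_reference = best_of(lambda: statistics.pstdev(data))
t_candidate = best_of(lambda: float_loop_pstdev(data))

print(f"statistics.pstdev: {t_reference * 1e6:.1f} us per call")
print(f"float loop:        {t_candidate * 1e6:.1f} us per call")
```

Taking the minimum over several batches reduces the influence of one-off costs and scheduler noise, which is the usual motivation for best-of-N reporting.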

The optimization targets only the numerical computation; None filtering, rounding logic, and edge-case handling behave as in the original implementation.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 5 Passed |
| 🌀 Generated Regression Tests | 69 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| metrics/test_utils.py::test_stats | 18.1μs | 5.29μs | 243%✅ |

🌀 Generated Regression Tests and Runtime
import statistics

# imports
from unstructured.metrics.utils import _pstdev

# unit tests

# 1. Basic Test Cases


def test_basic_integers():
    # Standard deviation of [1, 2, 3, 4, 5] is sqrt(2) ≈ 1.414, rounded to 3 decimals
    codeflash_output = _pstdev([1, 2, 3, 4, 5])  # 14.0μs -> 2.62μs (433% faster)


def test_basic_floats():
    # Standard deviation of [1.0, 2.0, 3.0] is ≈ 0.816, rounded to 3 decimals
    codeflash_output = _pstdev([1.0, 2.0, 3.0])  # 13.2μs -> 2.33μs (468% faster)


def test_basic_with_none():
    # None values are ignored; [None, 2, 4, None, 6] => [2, 4, 6]
    # pstdev([2, 4, 6]) = sqrt(((2-4)^2 + (4-4)^2 + (6-4)^2)/3) = sqrt((4+0+4)/3) = sqrt(8/3) ≈ 1.633
    codeflash_output = _pstdev([None, 2, 4, None, 6])  # 12.9μs -> 2.38μs (442% faster)


def test_basic_with_negative_numbers():
    # Standard deviation of [-1, -2, -3, -4, -5] is sqrt(2) ≈ 1.414
    codeflash_output = _pstdev([-1, -2, -3, -4, -5])  # 13.8μs -> 2.38μs (479% faster)


def test_basic_with_mixed_signs():
    # Standard deviation of [1, -1, 1, -1] is 1.0
    codeflash_output = _pstdev([1, -1, 1, -1])  # 13.3μs -> 2.38μs (460% faster)


def test_basic_with_zeroes():
    # Standard deviation of [0, 0, 0, 0] is 0.0
    codeflash_output = _pstdev([0, 0, 0, 0])  # 13.1μs -> 2.33μs (461% faster)


def test_basic_with_repeated_values():
    # Standard deviation of [3, 3, 3, 3, 3] is 0.0
    codeflash_output = _pstdev([3, 3, 3, 3, 3])  # 13.4μs -> 2.38μs (463% faster)


def test_basic_no_rounding():
    # When rounding is None or 0, return full precision
    # [1, 2, 3] => pstdev = sqrt(((1-2)^2 + (2-2)^2 + (3-2)^2)/3) = sqrt((1+0+1)/3) = sqrt(2/3)
    expected = statistics.pstdev([1, 2, 3])
    codeflash_output = _pstdev([1, 2, 3], rounding=None)  # 8.75μs -> 1.96μs (347% faster)
    codeflash_output = _pstdev([1, 2, 3], rounding=0)  # 7.71μs -> 958ns (705% faster)


def test_basic_custom_rounding():
    # Custom rounding to 1 decimal
    codeflash_output = _pstdev([1, 2, 3], rounding=1)  # 12.6μs -> 2.33μs (439% faster)


# 2. Edge Test Cases


def test_edge_empty_list():
    # Empty list should return None
    codeflash_output = _pstdev([])  # 416ns -> 416ns (0.000% faster)


def test_edge_list_of_none():
    # List of only None should return None
    codeflash_output = _pstdev([None, None])  # 458ns -> 459ns (0.218% slower)


def test_edge_single_element():
    # Single element list should return None
    codeflash_output = _pstdev([5])  # 458ns -> 458ns (0.000% faster)


def test_edge_single_element_with_none():
    # [None, 7, None] -> [7] -> None
    codeflash_output = _pstdev([None, 7, None])  # 541ns -> 500ns (8.20% faster)


def test_edge_two_elements():
    # [2, 4] -> pstdev = sqrt(((2-3)^2 + (4-3)^2)/2) = sqrt((1+1)/2) = 1.0
    codeflash_output = _pstdev([2, 4])  # 12.5μs -> 2.38μs (425% faster)


def test_edge_rounding_zero_and_none():
    # Rounding as 0 or None should return full precision
    expected = statistics.pstdev([10, 20, 30])
    codeflash_output = _pstdev([10, 20, 30], rounding=0)  # 8.58μs -> 1.92μs (348% faster)
    codeflash_output = _pstdev([10, 20, 30], rounding=None)  # 7.62μs -> 916ns (732% faster)


def test_edge_negative_and_positive_floats():
    # [1.5, -2.5, 3.5, -4.5] should work
    codeflash_output = _pstdev([1.5, -2.5, 3.5, -4.5])
    result = codeflash_output  # 14.0μs -> 2.33μs (502% faster)
    expected = round(statistics.pstdev([1.5, -2.5, 3.5, -4.5]), 3)


def test_edge_all_none_and_one_value():
    # [None, None, 42, None] -> [42] -> None
    codeflash_output = _pstdev([None, None, 42, None])  # 541ns -> 500ns (8.20% faster)


def test_edge_large_negative_and_positive():
    # Large magnitude values
    codeflash_output = _pstdev([-1e6, 1e6])
    result = codeflash_output  # 13.5μs -> 2.29μs (489% faster)


def test_edge_all_zeros_and_none():
    # [0, 0, None, 0] -> [0, 0, 0] -> stddev = 0.0
    codeflash_output = _pstdev([0, 0, None, 0])  # 12.8μs -> 2.29μs (458% faster)


def test_edge_float_rounding():
    # Rounding to 2 decimals
    codeflash_output = _pstdev([1.123, 2.234, 3.345], rounding=2)  # 24.9μs -> 2.38μs (949% faster)


def test_edge_rounding_negative():
    # Negative rounding should behave like normal round
    codeflash_output = _pstdev([100, 200, 300], rounding=-1)
    result = codeflash_output  # 13.3μs -> 2.38μs (460% faster)
    expected = round(statistics.pstdev([100, 200, 300]), -1)


# 3. Large Scale Test Cases


def test_large_scale_homogeneous():
    # List of 1000 identical values; stddev should be 0.0
    data = [7.5] * 1000
    codeflash_output = _pstdev(data)  # 348μs -> 39.9μs (772% faster)


def test_large_scale_uniform():
    # List of 1000 values from 1 to 1000
    data = list(range(1, 1001))
    # The population standard deviation can be calculated directly
    expected = round(statistics.pstdev(data), 3)
    codeflash_output = _pstdev(data)  # 322μs -> 41.7μs (674% faster)


def test_large_scale_with_nones():
    # List with 900 valid floats and 100 Nones
    data = [float(i) for i in range(900)] + [None] * 100
    expected = round(statistics.pstdev([float(i) for i in range(900)]), 3)
    codeflash_output = _pstdev(data)  # 348μs -> 34.7μs (905% faster)


def test_large_scale_negative_and_positive():
    # 500 negative, 500 positive numbers
    data = list(range(-500, 0)) + list(range(1, 501))
    expected = round(statistics.pstdev(data), 3)
    codeflash_output = _pstdev(data)  # 318μs -> 41.4μs (668% faster)


def test_large_scale_random_floats():
    # 1000 random floats between 0 and 1
    import random

    random.seed(42)  # Deterministic test
    data = [random.random() for _ in range(1000)]
    expected = round(statistics.pstdev(data), 3)
    codeflash_output = _pstdev(data)  # 571μs -> 39.5μs (1346% faster)


def test_large_scale_all_none():
    # 1000 None values should return None
    data = [None] * 1000
    codeflash_output = _pstdev(data)  # 12.9μs -> 12.9μs (0.008% slower)


def test_large_scale_one_valid_rest_none():
    # 999 None and 1 valid value should return None
    data = [None] * 999 + [5.0]
    codeflash_output = _pstdev(data)  # 13.0μs -> 13.0μs (0.000% faster)


def test_large_scale_two_valid_rest_none():
    # 998 None and 2 valid values
    data = [None] * 998 + [2.0, 4.0]
    codeflash_output = _pstdev(data)  # 25.4μs -> 14.8μs (72.3% faster)


def test_large_scale_custom_rounding():
    # 1000 values, rounding to 2 decimals
    data = list(range(1000))
    expected = round(statistics.pstdev(data), 2)
    codeflash_output = _pstdev(data, rounding=2)  # 322μs -> 41.6μs (677% faster)


def test_large_scale_no_rounding():
    # 1000 values, no rounding
    data = list(range(1000))
    expected = statistics.pstdev(data)
    codeflash_output = _pstdev(data, rounding=None)  # 322μs -> 41.2μs (683% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import statistics

# imports
from unstructured.metrics.utils import _pstdev

# unit tests

# --------- Basic Test Cases ---------


def test_basic_positive_integers():
    # Test with a simple list of positive integers
    scores = [1, 2, 3, 4, 5]
    # Population stdev of [1,2,3,4,5] is sqrt(2)
    codeflash_output = _pstdev(scores)  # 14.0μs -> 2.62μs (435% faster)


def test_basic_negative_integers():
    # Test with negative integers
    scores = [-1, -2, -3, -4, -5]
    codeflash_output = _pstdev(scores)  # 13.8μs -> 2.42μs (469% faster)


def test_basic_mixed_signs():
    # Test with a mix of positive and negative numbers
    scores = [-2, 0, 2]
    codeflash_output = _pstdev(scores)  # 13.2μs -> 2.33μs (464% faster)


def test_basic_floats():
    # Test with floating point numbers
    scores = [1.5, 2.5, 3.5]
    codeflash_output = _pstdev(scores)  # 13.1μs -> 2.29μs (473% faster)


def test_basic_with_none_values():
    # Test with some None values (should be ignored)
    scores = [1, None, 2, None, 3]
    codeflash_output = _pstdev(scores)  # 12.7μs -> 2.29μs (453% faster)


def test_basic_rounding_none():
    # Test with rounding=None, should not round
    scores = [1, 2, 3, 4, 5]
    codeflash_output = _pstdev(scores, rounding=None)
    result = codeflash_output  # 12.8μs -> 2.00μs (540% faster)


def test_basic_rounding_zero():
    # Test with rounding=0, should not round
    scores = [1, 2, 3, 4, 5]
    codeflash_output = _pstdev(scores, rounding=0)
    result = codeflash_output  # 12.6μs -> 1.92μs (557% faster)


def test_basic_custom_rounding():
    # Test with custom rounding
    scores = [1, 2, 3, 4, 5]
    codeflash_output = _pstdev(scores, rounding=1)
    result = codeflash_output  # 13.3μs -> 2.38μs (460% faster)


def test_basic_duplicate_values():
    # Test with all same values, stdev should be 0
    scores = [7, 7, 7, 7]
    codeflash_output = _pstdev(scores)  # 13.0μs -> 2.29μs (469% faster)


def test_basic_two_values():
    # Test with two values, should return their population stdev
    scores = [1, 3]
    expected = round(statistics.pstdev(scores), 3)
    codeflash_output = _pstdev(scores)  # 8.38μs -> 2.08μs (302% faster)


# --------- Edge Test Cases ---------


def test_edge_empty_list():
    # Test with empty list, should return None
    scores = []
    codeflash_output = _pstdev(scores)  # 416ns -> 416ns (0.000% faster)


def test_edge_all_none():
    # Test with all None values, should return None
    scores = [None, None]
    codeflash_output = _pstdev(scores)  # 500ns -> 458ns (9.17% faster)


def test_edge_single_value():
    # Test with a single value, should return None
    scores = [42]
    codeflash_output = _pstdev(scores)  # 458ns -> 417ns (9.83% faster)


def test_edge_single_value_with_none():
    # Test with a single value + Nones, should return None
    scores = [None, 42, None]
    codeflash_output = _pstdev(scores)  # 500ns -> 500ns (0.000% faster)


def test_edge_large_negative_and_positive():
    # Test with large negative and positive values
    scores = [-1e10, 0, 1e10]
    expected = round(statistics.pstdev(scores), 3)
    codeflash_output = _pstdev(scores)  # 10.2μs -> 2.38μs (332% faster)


def test_edge_small_floats():
    # Test with very small float values
    scores = [1e-10, 2e-10, 3e-10]
    expected = round(statistics.pstdev(scores), 3)
    codeflash_output = _pstdev(scores)  # 20.2μs -> 2.21μs (817% faster)


def test_edge_high_precision_rounding():
    # Test with high precision rounding
    scores = [1.123456, 2.654321, 3.789012]
    expected = round(statistics.pstdev(scores), 6)
    codeflash_output = _pstdev(scores, rounding=6)  # 15.4μs -> 2.25μs (583% faster)


def test_edge_rounding_is_zero():
    # Test with rounding=0, should not round (float result)
    scores = [1, 2, 3]
    expected = statistics.pstdev(scores)
    codeflash_output = _pstdev(scores, rounding=0)  # 8.54μs -> 1.92μs (346% faster)


def test_edge_rounding_is_false():
    # Test with rounding=False, should not round
    scores = [1, 2, 3, 4]
    expected = statistics.pstdev(scores)
    codeflash_output = _pstdev(scores, rounding=False)  # 8.88μs -> 1.92μs (363% faster)


def test_edge_rounding_is_none_and_nonint():
    # Test with rounding=None and non-integer values
    scores = [1.5, 2.5, 3.5]
    expected = statistics.pstdev(scores)
    codeflash_output = _pstdev(scores, rounding=None)  # 8.88μs -> 1.83μs (384% faster)


def test_edge_all_values_none():
    # Test with all values None
    scores = [None, None, None]
    codeflash_output = _pstdev(scores)  # 500ns -> 459ns (8.93% faster)


def test_edge_mixed_none_and_values():
    # Test with mixture of None and values, only values used
    scores = [None, 1, None, 2, None, 3]
    expected = round(statistics.pstdev([1, 2, 3]), 3)
    codeflash_output = _pstdev(scores)  # 8.88μs -> 2.25μs (294% faster)


def test_edge_rounding_negative():
    # Test with negative rounding value (should round to left of decimal)
    scores = [10, 20, 30]
    expected = round(statistics.pstdev(scores), -1)
    codeflash_output = _pstdev(scores, rounding=-1)  # 8.83μs -> 2.21μs (300% faster)


def test_edge_rounding_large():
    # Test with large rounding value
    scores = [1.23456789, 2.34567891, 3.45678912]
    expected = round(statistics.pstdev(scores), 10)
    codeflash_output = _pstdev(scores, rounding=10)  # 15.2μs -> 2.21μs (587% faster)


def test_edge_non_float_values():
    # Test with integer and float values mixed
    scores = [1, 2.0, 3]
    expected = round(statistics.pstdev(scores), 3)
    codeflash_output = _pstdev(scores)  # 9.25μs -> 2.08μs (344% faster)


def test_edge_explicit_zero_rounding():
    # Test with rounding=0, should return float, not int
    scores = [1, 2, 3, 4]
    codeflash_output = _pstdev(scores, rounding=0)
    result = codeflash_output  # 12.6μs -> 1.92μs (559% faster)


# --------- Large Scale Test Cases ---------


def test_large_scale_many_elements():
    # Test with a large list of 1000 elements
    scores = list(range(1000))  # [0, 1, ..., 999]
    expected = round(statistics.pstdev(scores), 3)
    codeflash_output = _pstdev(scores)  # 323μs -> 42.0μs (670% faster)


def test_large_scale_many_elements_with_none():
    # Test with a large list of 1000 elements, with some None values
    scores = [i if i % 10 != 0 else None for i in range(1000)]
    filtered_scores = [i for i in range(1000) if i % 10 != 0]
    expected = round(statistics.pstdev(filtered_scores), 3)
    codeflash_output = _pstdev(scores)  # 291μs -> 39.7μs (635% faster)


def test_large_scale_identical_elements():
    # Test with a large list of identical elements, stdev should be 0
    scores = [5.5] * 1000
    codeflash_output = _pstdev(scores)  # 347μs -> 39.6μs (777% faster)


def test_large_scale_alternating_values():
    # Test with alternating values
    scores = [0, 1] * 500  # 1000 elements
    expected = round(statistics.pstdev(scores), 3)
    codeflash_output = _pstdev(scores)  # 318μs -> 42.4μs (650% faster)


def test_large_scale_high_precision():
    # Test with large list and high precision rounding
    scores = [float(i) / 1000 for i in range(1000)]
    expected = round(statistics.pstdev(scores), 8)
    codeflash_output = _pstdev(scores, rounding=8)  # 780μs -> 37.1μs (2006% faster)


def test_large_scale_all_none():
    # Test with large list of all None
    scores = [None] * 1000
    codeflash_output = _pstdev(scores)  # 12.9μs -> 13.0μs (0.316% slower)


def test_large_scale_one_value():
    # Test with large list, only one non-None value
    scores = [None] * 999 + [42]
    codeflash_output = _pstdev(scores)  # 12.9μs -> 13.0μs (0.646% slower)


def test_large_scale_two_values():
    # Test with large list, only two non-None values
    scores = [None] * 998 + [10, 20]
    expected = round(statistics.pstdev([10, 20]), 3)
    codeflash_output = _pstdev(scores)  # 21.1μs -> 14.7μs (43.8% faster)


def test_large_scale_random_values():
    # Test with random values, reproducible
    import random

    random.seed(42)
    scores = [random.uniform(-1000, 1000) for _ in range(1000)]
    expected = round(statistics.pstdev(scores), 3)
    codeflash_output = _pstdev(scores)  # 731μs -> 37.0μs (1875% faster)


def test_large_scale_mixed_none_and_random():
    # Test with random values and some None
    import random

    random.seed(123)
    scores = [random.uniform(-500, 500) if i % 7 != 0 else None for i in range(1000)]
    filtered_scores = [score for score in scores if score is not None]
    expected = round(statistics.pstdev(filtered_scores), 3)
    codeflash_output = _pstdev(scores)  # 634μs -> 34.8μs (1725% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from unstructured.metrics.utils import _pstdev


def test__pstdev():
    _pstdev([], rounding=0)

🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_e8goshnj/tmpmi76rap0/test_concolic_coverage.py::test__pstdev | 500ns | 500ns | 0.000%✅ |

To edit these changes, run `git checkout codeflash/optimize-_pstdev-mjcl3e1a`, commit your changes, and push.

@codeflash-ai codeflash-ai bot requested a review from aseembits93 December 19, 2025 08:05
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 19, 2025