@thisisamirv

Add kernel functions with statistical properties and LOESS support

This PR prepares the foundation for KDE and LOESS implementations.

Summary

Major enhancement to the kernel module adding:

  1. Kernel density estimation (KDE) constants to consts.rs
  2. Complete rewrite of kernel.rs with AMISE efficiency metrics, dual evaluation modes, and LOESS integration
  3. Compile-time validation of statistical properties

Changes to consts.rs

  • Kernel variance (μ₂): ∫ u² K(u) du for 9 kernels
  • Kernel roughness (R(K)): ∫ K(u)² du for 9 kernels
  • Module docs explaining kernel properties and bandwidth conversion
  • References to Silverman (1986) Table 3.1 and Wand & Jones (1995)
  • Added missing SQRT_PI constant
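As a rough illustration of the two integrals listed above, the sketch below approximates μ₂ and R(K) for the Epanechnikov kernel with a midpoint rule. The function names (`epanechnikov`, `integrate`, `kernel_variance`, `kernel_roughness`) are invented for this example and are not taken from consts.rs; the closed-form values 1/5 and 3/5 are the standard ones for this kernel on [-1, 1].

```rust
/// Epanechnikov kernel on [-1, 1]: K(u) = 0.75 * (1 - u^2), zero elsewhere.
fn epanechnikov(u: f64) -> f64 {
    if u.abs() >= 1.0 {
        0.0
    } else {
        0.75 * (1.0 - u * u)
    }
}

/// Midpoint-rule approximation of the integral of `f` over [a, b] using `n` panels.
fn integrate<F: Fn(f64) -> f64>(f: F, a: f64, b: f64, n: usize) -> f64 {
    let h = (b - a) / n as f64;
    (0..n).map(|i| f(a + (i as f64 + 0.5) * h)).sum::<f64>() * h
}

/// Kernel variance: integral of u^2 * K(u) du (expected 1/5 here).
fn kernel_variance() -> f64 {
    integrate(|u| u * u * epanechnikov(u), -1.0, 1.0, 100_000)
}

/// Kernel roughness: integral of K(u)^2 du (expected 3/5 here).
fn kernel_roughness() -> f64 {
    integrate(|u| epanechnikov(u).powi(2), -1.0, 1.0, 100_000)
}
```

A numerical check like this is only a sanity test; the constants themselves would normally be stored as exact closed forms.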

Changes to kernel.rs

1. Statistical Correctness

  • Dual evaluation modes:
    • evaluate(): Normalized for KDE (integrates to 1)
    • evaluate_weight(): Unnormalized for LOESS (local regression weights)
  • AMISE properties: Efficiency, variance, roughness, bandwidth factors from Silverman (1986)
  • Compile-time validation: Ensures efficiency calculations match theoretical values
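The difference between the two evaluation modes can be sketched with the tricube kernel. The free functions below borrow the names `evaluate` and `evaluate_weight` from the description, but their signatures are an assumption; the actual API in kernel.rs may be trait methods with a different shape. `integrate_unit` is a hypothetical helper used only to check normalization.

```rust
/// Unnormalized tricube weight, the form typically used for LOESS:
/// (1 - |x|^3)^3 inside (-1, 1), zero outside. Peaks at 1 for x = 0.
fn evaluate_weight(x: f64) -> f64 {
    let a = x.abs();
    if a >= 1.0 {
        0.0
    } else {
        (1.0 - a.powi(3)).powi(3)
    }
}

/// Normalized tricube density for KDE: the factor 70/81 makes it integrate to 1.
fn evaluate(x: f64) -> f64 {
    (70.0 / 81.0) * evaluate_weight(x)
}

/// Midpoint-rule integral of `f` over [-1, 1] (hypothetical helper).
fn integrate_unit(f: fn(f64) -> f64, n: usize) -> f64 {
    let h = 2.0 / n as f64;
    (0..n).map(|i| f(-1.0 + (i as f64 + 0.5) * h)).sum::<f64>() * h
}
```

The unnormalized form integrates to 81/70, which is where the normalizing constant comes from.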

2. New Kernels & Renaming

  • Renamed Quartic → Bisquare (the more standard name)
  • Added Cosine kernel (high efficiency ≈ 0.9995)
  • Added Logistic kernel (heavy-tailed unbounded)
  • Added Sigmoid kernel (hyperbolic secant)
  • Fixed Sigmoid formula: was exp(πx), now correctly exp(x)
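For reference, the textbook forms of the three kernels named above can be sketched as follows. These are the standard definitions (cosine: (π/4)cos(πu/2) on (-1, 1); logistic: 1/(eᵘ + 2 + e⁻ᵘ); sigmoid: (2/π)/(eᵘ + e⁻ᵘ)), not code copied from the PR, and the function names and the `integrate` helper are assumptions made for this example.

```rust
use std::f64::consts::{FRAC_PI_2, FRAC_PI_4, PI};

/// Cosine kernel: (pi/4) * cos(pi * u / 2) on (-1, 1), zero elsewhere.
fn cosine(u: f64) -> f64 {
    if u.abs() >= 1.0 {
        0.0
    } else {
        FRAC_PI_4 * (FRAC_PI_2 * u).cos()
    }
}

/// Logistic kernel: 1 / (e^u + 2 + e^-u), unbounded support.
fn logistic(u: f64) -> f64 {
    1.0 / (u.exp() + 2.0 + (-u).exp())
}

/// Sigmoid kernel: (2/pi) / (e^u + e^-u), i.e. sech(u) / pi, unbounded support.
fn sigmoid(u: f64) -> f64 {
    (2.0 / PI) / (u.exp() + (-u).exp())
}

/// Midpoint-rule integral of `f` over [a, b] with `n` panels (hypothetical helper).
fn integrate(f: fn(f64) -> f64, a: f64, b: f64, n: usize) -> f64 {
    let h = (b - a) / n as f64;
    (0..n).map(|i| f(a + (i as f64 + 0.5) * h)).sum::<f64>() * h
}
```

With the `exp(x)` form, the sigmoid kernel matches the hyperbolic secant and integrates to 1, which can be checked numerically over a wide interval such as [-50, 50].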

3. Enhanced API

  • KernelType enum: Runtime kernel selection for LOESS and other applications
  • CustomKernel: User-defined kernels with metadata
  • Batch operations: evaluate_batch(), compute_distance_weights()
  • Utilities: robust_reweights(), normalize_weights()
  • Recommendations: recommended_for_kde(), recommended_for_loess(), most_efficient()
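A minimal sketch of how enum-based runtime selection might look, assuming a reduced `KernelType` with three variants. The names `KernelType`, `compute_distance_weights` and `normalize_weights` come from the list above, but the variants, signatures and bodies here are guesses for illustration, not the PR's implementation.

```rust
/// Hypothetical, reduced version of a runtime-selectable kernel enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum KernelType {
    Tricube,
    Bisquare,
    Epanechnikov,
}

impl KernelType {
    /// Unnormalized weight; zero for |x| >= 1 for every variant shown here.
    fn evaluate_weight(self, x: f64) -> f64 {
        let a = x.abs();
        if a >= 1.0 {
            return 0.0;
        }
        match self {
            KernelType::Tricube => (1.0 - a.powi(3)).powi(3),
            KernelType::Bisquare => (1.0 - a * a).powi(2),
            KernelType::Epanechnikov => 1.0 - a * a,
        }
    }
}

/// Maps raw distances to kernel weights after scaling by `bandwidth`.
fn compute_distance_weights(kernel: KernelType, distances: &[f64], bandwidth: f64) -> Vec<f64> {
    distances
        .iter()
        .map(|d| kernel.evaluate_weight(*d / bandwidth))
        .collect()
}

/// Rescales weights in place so they sum to 1; leaves an all-zero slice untouched.
fn normalize_weights(weights: &mut [f64]) {
    let total: f64 = weights.iter().sum();
    if total > 0.0 {
        for w in weights.iter_mut() {
            *w /= total;
        }
    }
}
```

Matching on an enum keeps dispatch static within each arm and avoids trait objects, which may be relevant to the question of dynamic dispatch raised in review.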

4. Boundary Behavior Fix

  • Changed the boundary check so that bounded kernels return 0 when |x| >= 1 (previously the support test was |x| <= 1, so the endpoints were included)
  • Support now returns open interval (-1, 1) as mathematically correct
  • All bounded kernels return 0 at exactly x = ±1
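The uniform kernel is the case where the endpoint convention is easiest to see, since its formula does not vanish at ±1 on its own. The two functions below are a hypothetical contrast written for this note (the names are not from kernel.rs): one uses the closed-interval test, the other the open-interval test described above.

```rust
/// Closed-support variant: non-zero at exactly x = ±1 (the behavior being replaced).
fn uniform_closed(x: f64) -> f64 {
    if x.abs() <= 1.0 {
        0.5
    } else {
        0.0
    }
}

/// Open-support variant: returns 0 whenever |x| >= 1, so the support is (-1, 1).
fn uniform_open(x: f64) -> f64 {
    if x.abs() >= 1.0 {
        0.0
    } else {
        0.5
    }
}
```

The two variants differ only on a set of measure zero, so integrals are unaffected; the change matters for code that evaluates a kernel exactly at the boundary.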

5. Documentation & Testing

  • Comprehensive table: Formula, support, efficiency, R(K), μ₂(K) for all kernels
  • Compile-time tests: Verify efficiency matches Silverman (1986) exact values
  • Many runtime tests: Integration, symmetry, monotonicity, normalization, edge cases
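One way a compile-time efficiency check could be written is with `const` assertions. The sketch below is an assumption about the approach, not the PR's code: it uses the relation eff(K)² = μ₂(Kₑ)·R(Kₑ)² / (μ₂(K)·R(K)²), with Kₑ the Epanechnikov kernel, and compares squared values so that no `sqrt` is needed in a const context. All constant names are hypothetical; 0.9939 and 0.9512 are the efficiencies tabulated by Silverman (1986) for the biweight and Gaussian kernels.

```rust
use std::f64::consts::PI;

// Closed-form moments for unit-support kernels (standard values).
const EPANECHNIKOV_VARIANCE: f64 = 1.0 / 5.0;
const EPANECHNIKOV_ROUGHNESS: f64 = 3.0 / 5.0;
const BISQUARE_VARIANCE: f64 = 1.0 / 7.0;
const BISQUARE_ROUGHNESS: f64 = 5.0 / 7.0;
const GAUSSIAN_VARIANCE: f64 = 1.0;
// R(K) for the Gaussian is 1 / (2 * sqrt(pi)); stored squared to avoid sqrt.
const GAUSSIAN_ROUGHNESS_SQ: f64 = 1.0 / (4.0 * PI);

// mu_2 * R^2 for the reference (Epanechnikov) kernel.
const REFERENCE: f64 =
    EPANECHNIKOV_VARIANCE * EPANECHNIKOV_ROUGHNESS * EPANECHNIKOV_ROUGHNESS;

// Squared efficiencies relative to Epanechnikov.
const BISQUARE_EFF_SQ: f64 =
    REFERENCE / (BISQUARE_VARIANCE * BISQUARE_ROUGHNESS * BISQUARE_ROUGHNESS);
const GAUSSIAN_EFF_SQ: f64 = REFERENCE / (GAUSSIAN_VARIANCE * GAUSSIAN_ROUGHNESS_SQ);

// Tabulated efficiencies (four decimals) and the tolerance used to compare.
const BISQUARE_EXPECTED: f64 = 0.9939;
const GAUSSIAN_EXPECTED: f64 = 0.9512;
const TOL: f64 = 1e-4;

// Compilation fails if a derived efficiency drifts from the tabulated value.
const _: () = assert!(
    BISQUARE_EFF_SQ > (BISQUARE_EXPECTED - TOL) * (BISQUARE_EXPECTED - TOL)
        && BISQUARE_EFF_SQ < (BISQUARE_EXPECTED + TOL) * (BISQUARE_EXPECTED + TOL)
);
const _: () = assert!(
    GAUSSIAN_EFF_SQ > (GAUSSIAN_EXPECTED - TOL) * (GAUSSIAN_EXPECTED - TOL)
        && GAUSSIAN_EFF_SQ < (GAUSSIAN_EXPECTED + TOL) * (GAUSSIAN_EXPECTED + TOL)
);
```

Because the checks live in `const` items, a wrong constant becomes a build error rather than a failing test.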

Breaking Changes

  1. Quartic → Bisquare: Renamed for statistical consistency
  2. Boundary behavior: |x| = 1 now returns 0 (was non-zero)
  3. Sigmoid formula: Fixed to match hyperbolic secant definition
  4. Tricube normalization: Now (70/81)(1-|x|³)³ (was unnormalized)

Fixed

  • Fix unused import warnings for distribution (removed unused import crate::distribution::internal::testing_boiler).
  • Fix Geometric::inverse_cdf platform-dependent behavior on Windows:

Problem

The test_inverse_cdf test was failing on Windows with:

  • Expected: 1
  • Got: 2 for inverse_cdf(0.0)

Root Cause

Floating-point precision differences across platforms caused inverse_cdf(0.0) to compute inconsistent results when using the formula ceil(log(1-p) / log(1-self.p)).

Solution

Added an explicit implementation of inverse_cdf for the Geometric distribution that handles edge cases consistently:

  • Returns min() (1) when input probability p <= 0.0
  • Returns 1 when distribution parameter self.p == 1.0
  • Returns max() (u64::MAX) when input probability p >= 1.0
  • Uses the mathematical formula for other cases

This ensures consistent behavior across all platforms (macOS, Linux, Windows).
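The edge-case handling listed above might look roughly like the following. This is a standalone sketch, assuming a success probability in (0, 1] and a free function named `geometric_inverse_cdf` (a hypothetical name); the actual change lives on the `Geometric` type and may differ in detail.

```rust
/// Smallest k >= 1 such that the geometric CDF at k is at least `x`.
/// `p_success` is the distribution parameter, assumed to lie in (0, 1].
fn geometric_inverse_cdf(p_success: f64, x: f64) -> u64 {
    // Explicit edge cases, so the result does not depend on how the
    // platform's math library rounds log(1 - 0) or log(0).
    if x <= 0.0 {
        return 1; // distribution minimum
    }
    if p_success == 1.0 {
        return 1; // success is certain on the first trial
    }
    if x >= 1.0 {
        return u64::MAX; // distribution maximum
    }
    // General case: ceil(ln(1 - x) / ln(1 - p)), clamped to at least 1.
    let k = ((1.0 - x).ln() / (1.0 - p_success).ln()).ceil();
    (k as u64).max(1)
}
```

Returning early for the boundary inputs means the logarithm formula is only evaluated for 0 < x < 1, where both logarithms are finite and negative.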

@codecov

codecov bot commented Nov 10, 2025

Codecov Report

❌ Patch coverage is 95.62044% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.73%. Comparing base (5da3470) to head (8325b1a).
⚠️ Report is 6 commits behind head on master.

Files with missing lines         Patch %   Lines
src/function/kernel.rs           95.65%    53 Missing ⚠️
src/distribution/geometric.rs    92.30%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #360      +/-   ##
==========================================
- Coverage   94.99%   94.73%   -0.27%     
==========================================
  Files          61       59       -2     
  Lines       13615    14004     +389     
==========================================
+ Hits        12934    13266     +332     
- Misses        681      738      +57     

@YeungOnion
Contributor

Hey, would you be willing to break this one out across a few PRs?

One partition could be,

  • fixes and changes to kernel functions, additions for consts and compile time testing
  • kerneltype enum and custom kernel (happy to discuss this, but we've not implemented a lot in the way of dynamic dispatch)
  • fixing geometric inverse cdf is worked on in feat: add specialised inverse cdf implementation for geometric distribution #343 (I should move that forward and make additional tests into a separate issue)
  • fix for macro imports and use - I don't know why this lint appears in this context.
