rodrgo/geosh

Geosh: Geometric Shapley Data Valuation

This repository implements small-scale active learning experiments that illustrate the ridge leverage score approximation to Shapley data values, as described in the paper "Geometric Data Valuation via Leverage Scores" (arXiv:2511.02100; see Citation). It compares several selection strategies on MNIST, CIFAR-10, and synthetic datasets.
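For readers unfamiliar with ridge leverage scores, the following is a minimal NumPy sketch of the standard definition, l_i(lambda) = x_i^T (X^T X + lambda I)^{-1} x_i, applied to the rows of a feature matrix X. The function name ridge_leverage_scores, the random feature matrix, and the choice lambda = 1.0 are illustrative assumptions; this is not the repository's implementation, which may compute features and lambda differently.

```python
import numpy as np


def ridge_leverage_scores(X: np.ndarray, lam: float) -> np.ndarray:
    """Return l_i = x_i^T (X^T X + lam * I)^{-1} x_i for every row x_i of X.

    Hypothetical helper for illustration; not taken from the repository.
    """
    n, d = X.shape
    gram = X.T @ X + lam * np.eye(d)
    # Solve (X^T X + lam * I) Z = X^T rather than forming the inverse explicitly.
    Z = np.linalg.solve(gram, X.T)  # shape (d, n)
    # Diagonal of X @ Z, computed without building the full n x n matrix.
    return np.einsum("ij,ji->i", X, Z)


# Toy feature matrix (assumption: 200 points with 10 features each).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
scores = ridge_leverage_scores(X, lam=1.0)

# Indices of the five points with the highest scores.
top5 = np.argsort(scores)[-5:][::-1]
```

Each score lies in [0, 1), and the scores sum to at most the feature dimension, so a high score marks a point that is poorly explained by the rest of the data at the chosen regularization level.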

Installation and Command-Line Usage

To install, create a virtual environment and install the dependencies:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

To run a single active learning experiment:

python scripts/run_active_learning.py \
    --dataset mnist \
    --model mlp \
    --selector ridge-leverage \
    --rounds 20 \
    --batch-size 5 \
    --initial-size 100 \
    --pretraining 10 \
    --adaptive-lambda \
    --alpha 0.01 \
    --seed 42 \
    --device cpu

Parameters:

  • --dataset: Dataset to use (mnist, cifar10, synthetic)
  • --model: Model architecture (mlp, cnn)
  • --selector: Selection strategy (ridge-leverage, uniform, kcenter, margin, entropy, loss, egl)
  • --rounds: Number of active learning rounds
  • --batch-size: Samples selected per round
  • --initial-size: Initial labeled set size
  • --pretraining: Pretraining rounds before active learning
  • --adaptive-lambda: Use adaptive lambda calculation
  • --alpha: Scaling factor for adaptive lambda (default: 0.01)
  • --seed: Random seed for reproducibility
  • --device: Device to use (cpu, cuda, mps)
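Two of the selector names above, margin and entropy, usually refer to standard uncertainty-sampling scores computed from predicted class probabilities. The sketch below shows those textbook definitions under that assumption; the function names and the toy probability matrix are hypothetical, and the repository's selectors may differ in detail.

```python
import numpy as np


def entropy_scores(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy per sample; higher means more uncertain."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)


def margin_scores(probs: np.ndarray) -> np.ndarray:
    """Gap between the two largest class probabilities; lower means more uncertain."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def select_batch(scores: np.ndarray, batch_size: int, largest: bool) -> np.ndarray:
    """Return indices of the batch_size samples with the largest (or smallest) scores."""
    order = np.argsort(scores)
    return order[::-1][:batch_size] if largest else order[:batch_size]


# Toy predicted probabilities for three unlabeled samples (illustrative values only).
probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.6, 0.4]])

by_entropy = select_batch(entropy_scores(probs), batch_size=2, largest=True)
by_margin = select_batch(margin_scores(probs), batch_size=2, largest=False)
```

On this toy input both criteria pick samples 0 and 2, the two whose predictions are closest to a coin flip.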

To compare all selection strategies, run python scripts/run_comparison.py with any of the parameters above except --selector. CSV files, plots, and tables are saved to geosh/experiments/output.
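If you want to repeat the comparison over several seeds, one option is to build the command lines in Python. The sketch below only uses the flags documented above; the helper name build_comparison_command and the seed values 0 to 4 are assumptions made for illustration, and the launch line is left commented out so the snippet runs without the repository checked out.

```python
import shlex
import sys


def build_comparison_command(seed: int, dataset: str = "mnist", model: str = "mlp",
                             rounds: int = 20, batch_size: int = 5,
                             initial_size: int = 100, pretraining: int = 10,
                             alpha: float = 0.01, device: str = "cpu") -> list[str]:
    """Build the argument list for scripts/run_comparison.py (no --selector flag)."""
    return [
        sys.executable, "scripts/run_comparison.py",
        "--dataset", dataset,
        "--model", model,
        "--rounds", str(rounds),
        "--batch-size", str(batch_size),
        "--initial-size", str(initial_size),
        "--pretraining", str(pretraining),
        "--adaptive-lambda",
        "--alpha", str(alpha),
        "--seed", str(seed),
        "--device", device,
    ]


# Hypothetical seed values; pick whichever seeds you need.
commands = [build_comparison_command(seed) for seed in range(5)]

for cmd in commands:
    print(shlex.join(cmd))
    # To launch, import subprocess and call:
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues if a value ever contains spaces.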

Reproducing Paper Results

To replicate the figures from the NeurIPS Workshop paper, run:

bash mlxor.sh

This executes the full experimental setup with 40 rounds, 20 pretraining rounds, and 5 random seeds. No GPUs required!

Citation

If you use this code in your research, please cite:

@misc{mendozasmith2025geometricdatavaluationleverage,
      title={Geometric Data Valuation via Leverage Scores}, 
      author={Rodrigo Mendoza-Smith},
      year={2025},
      eprint={2511.02100},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.02100}, 
}
