Geosh: Geometric Shapley Data Valuation

This repository implements small-scale active learning experiments that illustrate the ridge leverage score approximation to Shapley data values described in the paper cited below. It compares different selection strategies on MNIST, CIFAR-10, and synthetic datasets.
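For reference, the standard ridge leverage score of row x_i of a feature matrix X is tau_i(lambda) = x_i^T (X^T X + lambda I)^{-1} x_i. Rows with high scores point in directions that the rest of the data covers poorly. The pure-Python sketch below computes these scores and picks the top-k rows, as one plausible reading of a ridge-leverage selector. It is not the repository's implementation, and the function names (solve, ridge_leverage_scores, top_k) are hypothetical.

```python
# Minimal sketch, assuming the standard ridge leverage score definition
# tau_i(lam) = x_i^T (X^T X + lam * I)^{-1} x_i for each row x_i of X.
# Hypothetical helper names; not the geosh implementation.


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def ridge_leverage_scores(X, lam):
    """Return tau_i for every row of X (lam > 0 keeps the system invertible)."""
    d = len(X[0])
    # Regularized Gram matrix G = X^T X + lam * I, shape d x d
    G = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    scores = []
    for x in X:
        z = solve(G, x)  # z = G^{-1} x
        scores.append(sum(xi * zi for xi, zi in zip(x, z)))
    return scores


def top_k(scores, k):
    """Indices of the k largest scores (greedy batch selection)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

With lambda = 0 and X of full column rank the scores sum to the number of columns d. A positive lambda shrinks every score below 1.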

Installation and Command-Line Usage

To install, create a virtual environment and install the dependencies:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

To run a single active learning experiment:

python scripts/run_active_learning.py \
    --dataset mnist \
    --model mlp \
    --selector ridge-leverage \
    --rounds 20 \
    --batch-size 5 \
    --initial-size 100 \
    --pretraining 10 \
    --adaptive-lambda \
    --alpha 0.01 \
    --seed 42 \
    --device cpu

Parameters:

--dataset: Dataset to use (mnist, cifar10, synthetic)
--model: Model architecture (mlp, cnn)
--selector: Selection strategy (ridge-leverage, uniform, kcenter, margin, entropy, loss, egl)
--rounds: Number of active learning rounds
--batch-size: Samples selected per round
--initial-size: Initial labeled set size
--pretraining: Pretraining rounds before active learning
--adaptive-lambda: Use adaptive lambda calculation
--alpha: Scaling factor for adaptive lambda (default: 0.01)
--seed: Random seed for reproducibility
--device: Device to use (cpu, cuda, mps)
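The parameters --initial-size, --rounds, --batch-size, and --seed interact as in a generic pool-based active learning loop. The sketch below is a hypothetical illustration of that loop, not the code in scripts/run_active_learning.py. It omits model training and the --pretraining phase, and the names active_learning_loop and make_uniform_scorer are invented for illustration. A random scorer makes top-k selection equivalent to the uniform strategy.

```python
import random


def active_learning_loop(pool_size, initial_size, rounds, batch_size, score_fn, seed=42):
    """Hypothetical pool-based loop that grows a labeled index set.

    score_fn(unlabeled, labeled) returns one score per unlabeled index;
    each round the batch_size highest-scoring indices are labeled.
    Model training and pretraining are omitted from this sketch.
    """
    rng = random.Random(seed)                # mirrors --seed
    indices = list(range(pool_size))
    rng.shuffle(indices)
    labeled = indices[:initial_size]         # mirrors --initial-size
    unlabeled = indices[initial_size:]
    for _ in range(rounds):                  # mirrors --rounds
        scores = score_fn(unlabeled, labeled)
        order = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
        chosen = set(order[:batch_size])     # mirrors --batch-size
        labeled += [unlabeled[i] for i in sorted(chosen)]
        unlabeled = [u for i, u in enumerate(unlabeled) if i not in chosen]
    return labeled, unlabeled


def make_uniform_scorer(seed):
    """Random scores, so picking the top scores is uniform sampling."""
    rng = random.Random(seed)
    return lambda unlabeled, labeled: [rng.random() for _ in unlabeled]
```

With the values from the example command (--initial-size 100, --rounds 20, --batch-size 5), this sketch ends with 100 + 20 × 5 = 200 labeled points. How the real script accounts for --pretraining is not documented here.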

To compare all selection strategies, run python scripts/run_comparison.py with any of the parameters above, omitting the --selector flag. CSV files, plots, and tables are saved to geosh/experiments/output.
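When driving comparisons from Python, it can help to build the command programmatically so that --selector is always left out. The helper below is a hypothetical convenience (comparison_command is not part of the repository). It only assumes the flag names listed above and that the script is run from the repository root.

```python
import sys


def comparison_command(**params):
    """Build a run_comparison.py command line from keyword parameters (sketch).

    Keys use underscores (batch_size) and become flags (--batch-size).
    True becomes a bare flag (e.g. --adaptive-lambda); False/None are skipped.
    'selector' is dropped because run_comparison.py compares all strategies.
    """
    cmd = [sys.executable, "scripts/run_comparison.py"]
    for key, value in params.items():
        if key == "selector" or value is None or value is False:
            continue
        flag = "--" + key.replace("_", "-")
        if value is True:
            cmd.append(flag)
        else:
            cmd += [flag, str(value)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root.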

Reproducing Paper Results

To replicate the figures from the NeurIPS Workshop paper, run:

bash mlxor.sh

This executes the full experimental setup with 40 rounds, 20 pretraining rounds, and 5 random seeds. No GPUs required!

Citation

If you use this code in your research, please cite:

@misc{mendozasmith2025geometricdatavaluationleverage,
      title={Geometric Data Valuation via Leverage Scores}, 
      author={Rodrigo Mendoza-Smith},
      year={2025},
      eprint={2511.02100},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.02100}, 
}
