KeyStone

Geometry-Guided Self-Consistency for Physical AI

KeyStone is a drop-in, inference-time, model-free wrapper for diffusion- and flow-matching physical-AI policies (VLAs and world-action models). At each round of the standard sample-and-execute loop it:

draws K candidate action chunks in parallel from a shared model context,
clusters them in continuous action space, and
executes the medoid of the largest cluster — a real sampled chunk, not a synthesized average.

No extra model, no training, no learned judge. Two properties make this practical:

Latency-preserving. Action diffusion runs over compact trajectories and is memory-bandwidth bound, so the shared context is computed once and the K chains run as a single batched forward — adding the K candidates is nearly free.
Judge-free selection. Action chunks are ordered sequences of bounded robot commands, so Euclidean distance between chunks reflects physical similarity. The largest dense region is a principled, model-free notion of consensus.

Across diverse VLAs and WAMs, KeyStone improves task success rates by up to 13.3% over single-trajectory sampling with negligible latency overhead, on par with model-based selectors at no training cost.

Paper: Geometry Guided Self-Consistency for Physical AI

Install

pip install -e .          # from this repo
# or, for development:
pip install -e ".[dev]"

The core library depends only on PyTorch.

Quickstart

import torch
from keystone import cluster_medoid_select

# K candidate action chunks, e.g. drawn from K independent noises.
candidates = torch.randn(16, 50, 7)        # (K, T, action_dim)

# Returns ONE real chunk via the paper's guarded cluster-medoid rule
# (global medoid if unimodal, else the dominant cluster's medoid).
chunk = cluster_medoid_select(candidates)  # (T, action_dim)

Run the end-to-end demo (no model or GPU required):

python examples/minimal_quickstart.py

API

Symbol	Purpose
`cluster_medoid_select(candidates, C=2, tau=0.3, distance="l2", ...)`	KeyStone's selector — the paper's `cluster_medoid_select` (Listing 2). Returns one real chunk. Accepts `(K, T, D)` or `(K, B, T, D)`.
`SelfConsistencyConfig`	All knobs: `num_samples` (K), `aggregation`, `cluster_medoid_num_clusters` (C), `unimodal_tau` (τ), `distance`, `action_dim`.
`aggregate_actions(actions, config, executed_steps=None)`	Lower-level engine over `(K, B, T, D)`.
`select_chunk(candidates, config, ...)`	Like `aggregate_actions` but also accepts unbatched `(K, T, D)`.
`expand_kv_cache(kv, K)`, `repeat_batch(t, K)`	Latency-preserving K-batch expansion of a transformer KV cache / prefix tensors.
`KeystonePolicy(base_policy, config)`	Generic wrapper implementing the paper's Listing 1 pattern.

Selection methods

All return a real sampled chunk — never a synthesized average that could land off the model's action manifold.

`aggregation`	Notes
`cluster_medoid_guarded` (default)	The paper's method (Listing 2). A unimodality guard returns the global medoid when the sample mean lies close to it (`spread = ‖mean − medoid‖ / median_pairwise < unimodal_tau`, default τ=0.3); otherwise it runs k-means (C clusters) and returns the largest cluster's medoid.
`medoid`	The global medoid alone — the chunk minimizing total distance to all others (vector generalization of the median). No clustering (the guard's unimodal branch).
`cluster_medoid`	k-means with a fixed cluster count C (`cluster_medoid_num_clusters`, default 2); return the medoid of the largest cluster. No guard (the guard's multimodal branch).

distance ∈ {l1, l2, cosine}. Set action_dim to ignore zero-padding beyond the real action dimensions. Pass executed_steps to select over only the timesteps that actually run before the next replan.

How it works

                      ┌─ noise ε₁ ─┐
   observation ──► encode context ─┤  ...  ├─► K candidate chunks ─► cluster-medoid ─► aₜ*
   (shared, once)      (expand ×K)  └─ noise ε_K ┘   (one batched forward)   selection

The selector flattens each candidate chunk and computes one K×K pairwise distance matrix. cluster_medoid_guarded (the default) first runs the unimodality guard — it compares the sample mean to the global medoid, normalized by the median pairwise distance; if they nearly agree it returns the global medoid, otherwise it runs k-means (fixed cluster count C), takes the largest cluster, and returns its medoid (the within-cluster chunk closest to the others). The medoid and cluster_medoid methods expose those two branches on their own. Every step is a batched on-GPU tensor op over only K compact chunks (≤ a few hundred µs at K≤16).

Adding KeyStone to your policy

Integration always reduces to encode-once → expand-to-K → batched-sample → select, in one of two styles. The framework-agnostic guide is examples/README_custom.md; each style has a runnable template:

Override the sampler — subclass a policy and replace its action generator (examples/flow_matching_policy.py; the LeRobot SmolVLA / π0.5 pattern).
Monkey-patch generate — intercept a served model's generate_actions without touching the simulator (examples/server_monkeypatch.py; the vla-eval GR00T / X-VLA / StarVLA pattern).

Reproducing the paper

The benchmark harnesses used in the paper import this package directly — their policy / server code calls aggregate_actions and expand_kv_cache from keystone, so the selector is maintained in exactly one place. Each has a self-contained, step-by-step guide (clone the fork, install keystone, run):

examples/README_lerobot.md — SmolVLA & π0.5 on LIBERO (LeRobot fork).
examples/README_vla_eval.md — GR00T N1.6, X-VLA, StarVLA on SimplerEnv (vla-eval fork).

Tests

pytest

Citation

@misc{keystone,
      title={Geometry Guided Self-Consistency for Physical AI},
      author={Yinwei Dai and Zhuofu Chen and Lijie Yang and Ravi Netravali},
      year={2026},
      eprint={2605.08638},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2605.08638},
}

License

Apache 2.0. See LICENSE and NOTICE — the selector was developed alongside forks of LeRobot and vla-evaluation-harness, both Apache 2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
examples		examples
src/keystone		src/keystone
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KeyStone

Install

Quickstart

API

Selection methods

How it works

Adding KeyStone to your policy

Reproducing the paper

Tests

Citation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

KeyStone

Install

Quickstart

API

Selection methods

How it works

Adding KeyStone to your policy

Reproducing the paper

Tests

Citation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages