feat(models): add UMA (fairchem-core) interatomic-potential wrapper by dallasfoster · Pull Request #117 · NVIDIA/nvalchemi-toolkit

dallasfoster · 2026-06-16T20:42:00Z

ALCHEMI Toolkit Pull Request

Description

Add UMAWrapper, a BaseModelMixin-compatible wrapper around fairchem-core's
UMA (Universal Models for Atoms) MLIPPredictUnit, so UMA foundation models can
drive nvalchemi dynamics and inference. UMA is multi-task: one checkpoint
(uma-s-1p1 / uma-s-1p2 / uma-m-1p1) ships heads for OMol, OMat, OC20, ODAC,
and OMC; the wrapper pins a single task at construction (one-wrapper-one-model,
matching MACEWrapper). The conversion is tensor-native (no ASE round trip),
energy is the differentiable primitive, and forces/stress come from autograd.
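For readers less familiar with the "energy is the primitive" design, the relations that autograd evaluates can be written as follows. This is the standard textbook form, not something taken from the wrapper's source: the strain-derivative definition of stress and its sign convention are assumptions, and the PR does not state which convention UMAWrapper reports.

```latex
% E: total energy of one system, r_i: position of atom i,
% V: cell volume, \varepsilon: symmetric strain tensor applied to positions and cell.
\mathbf{F}_i = -\frac{\partial E}{\partial \mathbf{r}_i},
\qquad
\boldsymbol{\sigma} = \frac{1}{V}\,
  \left.\frac{\partial E}{\partial \boldsymbol{\varepsilon}}\right|_{\boldsymbol{\varepsilon}=0}
```

Because both quantities are derivatives of the same scalar, one backward pass through the energy is enough to obtain them, which is presumably why the NPT run in the example is described as exercising the stress path.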

Type of Change

New feature (non-breaking change that adds functionality)

Changes Made

nvalchemi/models/uma.py: UMAWrapper (task-aware model_config, adapt_input/
adapt_output, compute_embeddings, from_checkpoint). from_checkpoint exposes
fairchem's native inference_settings (incl. "turbo"); forward routes the
one-time lazy-init/MoLE-merge through CPU input to dodge a fairchem device-
placement bug under turbo on GPU-resident first batches.
nvalchemi/_optional.py + nvalchemi/models/__init__.py: register/export UMA.
pyproject.toml: uma extra (fairchem-core>=2.0.0); declare uma conflicting
with mace (e3nn pin) and cu12/cu13 (fairchem torch<2.9 vs toolkit-ops
torch>=2.11); pin setuptools<81 for fairchem's torchtnt. uv.lock regenerated.
examples/advanced/09_uma_nve.py: NVE/NVT/NPT MD example driven by built-in
LoggingHook + EnergyDriftMonitorHook; NPT exercises the stress path; turbo
selectable via inference_settings.
test/models/test_uma.py: consolidated suite — structural (mock), forward
equivalence vs FAIRChemCalculator, charged-input response, NVE drift (@slow),
turbo/compile device path (@slow, CUDA).
docs: userguide/models.md (supported-models table + UMA usage / HF-token /
torch-environment notes), modules/models.rst, models/index.md, examples README.
.github/workflows/ci.yml: install a dedicated .venv-uma and run the UMA tests
from it (gated on UMA-file changes or full runs); optional HF_TOKEN secret.

Testing

Unit tests pass locally (UMA suite: 36 passed with --slow against
uma-s-1p1 on CUDA; 33 passed / 3 slow-skipped without --slow)
Linting passes (make lint)
New tests added for new functionality

Equivalence matches FAIRChemCalculator to 1e-4 (OMol energy/forces, OMat
energy/forces/stress); charged OMol runs match the calculator at charge -1;
small (uma-s-1p1/1p2) and medium (uma-m-1p1) checkpoints load and run.

Additional Notes

UMA's deps conflict with the mace/cuXX stack, so it must be installed in its
own environment (uv sync --extra uma); it brings its own CUDA-enabled torch
(2.8, cu12.8) and does not use the cuXX GPU stack. Checkpoints are gated on
HuggingFace (facebook/UMA) — CI structural tests run without a token; the
checkpoint-based tests skip unless an HF_TOKEN secret is provided.
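
The token gating described above can be sketched as a small skip guard. This is a minimal illustration, not the code in test/models/test_uma.py: the helper `has_hf_token` and the decorator name `requires_uma_checkpoint` are hypothetical, and stdlib `unittest` is used only so the sketch has no third-party dependencies.

```python
import os
import unittest
from typing import Mapping, Optional


def has_hf_token(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when a non-empty HF_TOKEN is present in the environment."""
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


# Hypothetical decorator: apply it to tests that download a gated checkpoint.
requires_uma_checkpoint = unittest.skipUnless(
    has_hf_token(), "HF_TOKEN not set; gated facebook/UMA checkpoint unavailable"
)


class StructuralTests(unittest.TestCase):
    """Mock-based tests: no checkpoint, so they always run."""

    def test_runs_without_token(self) -> None:
        self.assertTrue(True)


@requires_uma_checkpoint
class CheckpointTests(unittest.TestCase):
    """Skipped as a whole class when no token is configured."""

    def test_token_is_available(self) -> None:
        self.assertTrue(has_hf_token())
```

Passing the environment as an argument keeps the guard itself testable without mutating `os.environ`.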

Tip

This repository uses Greptile, an AI code review service, to help conduct
pull request reviews. We encourage contributors to read and consider suggestions
made by Greptile, but note that human maintainers will provide the necessary
reviews for merging: Greptile's comments are not a qualitative judgement
of your code, nor are they an indication that the PR will be accepted/rejected.
We encourage the use of emoji reactions to Greptile comments, depending on
their usefulness and accuracy.

Signed-off-by: Dallas Foster <dallasf@nvidia.com>

copy-pr-bot · 2026-06-16T20:42:04Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

dallasfoster · 2026-06-16T20:42:26Z

/ok to test 8344178

greptile-apps · 2026-06-16T20:46:31Z

Greptile Summary

This PR introduces UMAWrapper, a BaseModelMixin-compatible wrapper around fairchem-core's MLIPPredictUnit that makes UMA foundation-model checkpoints (uma-s-1p1, uma-s-1p2, uma-m-1p1) available to nvalchemi dynamics and inference. The implementation is tensor-native (no ASE round-trip), task-pinned at construction, and includes a dedicated turbo/compile first-forward CPU-routing workaround.

nvalchemi/models/uma.py: adapt_input handles charge, spin, and atom_categories→tags conversion correctly; _cpu_route_first_forward cleanly replaces the previously flagged lazy_model_intialized workaround; adapt_output remaps fairchem's energy/forces/stress to nvalchemi shapes.
pyproject.toml: uma extra declared with fairchem-core>=2.0.0; conflicts with mace/cu12/cu13 registered; setuptools<81 pinned in the build group for torchtnt compatibility.
.github/workflows/ci.yml: isolated .venv-uma install step gated on UMA file changes; HF_TOKEN secret threads through to the checkpoint-based tests; coverage appended to the shared file.

Important Files Changed

Filename	Overview
nvalchemi/models/uma.py	New UMAWrapper implementation — tensor-native adapt_input/adapt_output, task-aware model_config, turbo/compile CPU-routing workaround, and from_checkpoint; one module-docstring inconsistency (OMol spin default stated as 1 vs. code default of 0)
test/models/test_uma.py	Comprehensive two-tier test suite: structural mock tests cover adapt_input/adapt_output/forward/config; checkpoint tests cover OMol/OMat equivalence vs FAIRChemCalculator, charged inputs, NVE drift (@slow), and turbo/compile device path (@slow CUDA)
.github/workflows/ci.yml	Adds dedicated .venv-uma step gated on UMA-specific file changes; pyproject.toml missing from the uma-changed detection list means dependency-only changes skip the UMA test job
pyproject.toml	Adds uma extra (fairchem-core>=2.0.0), declares it conflicting with mace/cu12/cu13 extras, and pins setuptools<81 in the build group for fairchem's torchtnt compatibility
examples/advanced/09_uma_nve.py	NVE/NVT/NPT MD example using LoggingHook and EnergyDriftMonitorHook; turbo selectable; NPT exercises stress path; graceful fallback if checkpoint unavailable
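
The gating gap noted in the .github/workflows/ci.yml row can be illustrated with a small path filter. This is a sketch of the idea only: the real detection lives in the workflow file, and the trigger list below is hypothetical (the PR does not show the actual list). The point is that dependency files such as pyproject.toml belong in it.

```python
from fnmatch import fnmatch
from typing import Iterable

# Hypothetical trigger list; the actual ci.yml list is not shown in this PR page.
UMA_TRIGGER_PATTERNS = (
    "nvalchemi/models/uma.py",
    "test/models/test_uma.py",
    "examples/advanced/09_uma_nve.py",
    "pyproject.toml",  # dependency-only changes should also trigger the UMA job
    "uv.lock",
)


def uma_tests_needed(changed_files: Iterable[str], full_run: bool = False) -> bool:
    """Return True when the UMA test job should run for this change set."""
    if full_run:
        return True
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in UMA_TRIGGER_PATTERNS
    )
```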

Reviews (3): Last reviewed commit: "add train=true/false flag to freeze chec..."

greptile-apps · 2026-06-16T20:46:38Z

+
+        fixed = torch.zeros(total_atoms, dtype=torch.long, device=device)
+        tags = torch.zeros(total_atoms, dtype=torch.long, device=device)


tags from input data silently discarded

tags is declared in optional_inputs, yet adapt_input always constructs tags = torch.zeros(...) rather than reading from data. For OC20/ODAC tasks, tags carry semantic meaning — 0 = subsurface, 1 = surface, 2 = adsorbate — and UMA's OC20 head uses them to route per-atom force computation. An OC20 caller who correctly populates tags on the AtomicData will get all-zero tags forwarded to fairchem, producing results that silently diverge from the FAIRChemCalculator reference. Either read tags from the data (similar to how charge/spin are handled) or remove "tags" from optional_inputs and document that OC20 callers must ensure their system tags match fairchem's zero-tag convention.
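
The first option (read tags from the data, falling back to zeros) might look roughly like this. It is a sketch under assumptions: `resolve_tags` is a hypothetical helper, plain Python lists stand in for the `torch.long` tensors used in adapt_input so the example runs without torch, and the `data` object is mimicked with `SimpleNamespace`.

```python
from types import SimpleNamespace
from typing import Any, List


def resolve_tags(data: Any, total_atoms: int) -> List[int]:
    """Use caller-supplied tags when present; otherwise default to all zeros."""
    tags = getattr(data, "tags", None)
    if tags is None:
        # No tags on the input: keep the current all-zero behaviour.
        return [0] * total_atoms
    tags = [int(t) for t in tags]
    if len(tags) != total_atoms:
        raise ValueError(
            f"tags has {len(tags)} entries but the batch has {total_atoms} atoms"
        )
    return tags


# Tag values as described in the comment above:
# 0 = subsurface, 1 = surface, 2 = adsorbate.
slab = SimpleNamespace(tags=[0, 0, 1, 1, 2])
print(resolve_tags(slab, total_atoms=5))
print(resolve_tags(SimpleNamespace(), total_atoms=3))
```

The length check is an addition for illustration; it makes a mis-sized tags field fail loudly rather than being forwarded to fairchem.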

…IA#45 WIP)

Adopt the canonical UMAWrapper from NVIDIA/nvalchemi-toolkit PR NVIDIA#117 (the forked UMAWrapper line of work) as the base, while preserving this branch's distributed `distribution_spec` (halo storage + Triton-kernel OpAdapters) so the DD work (NVIDIA#45) continues on the up-to-date wrapper.

From NVIDIA#117: torch.compile via fairchem InferenceSettings (no compile_model flag) + module docstring; inference_settings typed Any (preset name OR an InferenceSettings instance); turbo/merge_mole CPU-lazy-init workaround in predict; adapt_input uses data.num_nodes_per_graph + cleaner cell/pbc/charge/spin handling; dropped the per-atom / energy debug monkey-patches.

Kept (not in NVIDIA#117): the `distribution_spec` property (SPEC_UMA_HALO + the 5 fairchem Wigner Triton OpAdapters with ScatterOutputs on the edge->node kernel) and the "no distributed_setup needed" note.

Merge is mechanical (take NVIDIA#117's uma.py, splice our distribution_spec before embedding_shapes); imports cleanly in .venv-uma (fairchem) and ast-parses in .venv.

RUNTIME-UNVALIDATED: the facebook/UMA checkpoint is HF-gated, so single-process + eager-DD + compile validation is pending an HF token on the box.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Signed-off-by: Dallas Foster <dallasf@nvidia.com>

dallasfoster · 2026-06-16T21:25:12Z

/ok to test 00044fa

laserkelvin

Generally looks good to me; the spin default value needs to change though

laserkelvin · 2026-06-16T20:51:35Z

+
+# Task names accepted by fairchem / UMA. Kept as a module-level tuple
+# so the wrapper can validate at construction and tests can iterate.
+_UMA_TASKS: tuple[str, ...] = ("omol", "omat", "oc20", "odac", "omc")


Why not a set as well?

laserkelvin · 2026-06-16T20:52:35Z

+        The UMA task this wrapper is pinned to.
+    """
+
+    def __init__(self, predict_unit: Any, task_name: str = "omol") -> None:


Can we tighten up the Any?

laserkelvin · 2026-06-16T20:57:33Z

+    def from_checkpoint(
+        cls,
+        name_or_path: str | Path,
+        task_name: str = "omol",


Should be typed as Literal or point to the _UMA_TASKS set/tuple

laserkelvin · 2026-06-16T21:17:29Z

+            raise ValueError(
+                f"{name_str!r} is neither a registered model name nor a "
+                f"local file path. Known names: "
+                f"{sorted(pretrained_mlip.available_models)[:6]}..."


Why is it truncated to the 6 items? You can just leave an inline comment maybe

laserkelvin · 2026-06-16T22:12:22Z

+
+        spin = getattr(data, "spin", None)
+        if spin is None:
+            spin = torch.zeros(n_systems, dtype=torch.long, device=device)


This should not be zero. A sensible default could be 1, but it really depends on the system's unpaired electrons.
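
A sketch of the suggested default, assuming the spin field holds the spin multiplicity 2S + 1 (so a closed-shell system is 1, and n unpaired electrons give n + 1). The helper names are hypothetical, and plain lists stand in for the `torch.long` tensor so the example runs without torch.

```python
from types import SimpleNamespace
from typing import Any, List


def multiplicity_from_unpaired(n_unpaired: int) -> int:
    """Spin multiplicity 2S + 1 with S = n_unpaired / 2."""
    if n_unpaired < 0:
        raise ValueError("number of unpaired electrons must be non-negative")
    return n_unpaired + 1


def resolve_spin(data: Any, n_systems: int, default: int = 1) -> List[int]:
    """Use per-system spin from the data when present, else a singlet default."""
    spin = getattr(data, "spin", None)
    if spin is None:
        return [default] * n_systems
    spin = [int(s) for s in spin]
    if len(spin) != n_systems:
        raise ValueError(
            f"spin has {len(spin)} entries but the batch has {n_systems} systems"
        )
    return spin
```

A default of 1 is only correct for closed-shell systems; for open-shell cases (two unpaired electrons give a multiplicity of 3) the caller still has to supply spin explicitly.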

laserkelvin · 2026-06-16T22:16:14Z

+# Maxwell-Boltzmann velocities at ``TEMPERATURE_K`` and zero net momentum.
+# Periodic systems carry ``cell`` and ``pbc``.
+
+_PROPANE_POSITIONS = np.array(


Could we make this example batched? You could duplicate the system and just use modified positions to start the simulations
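
The batching suggestion could be sketched as below: replicate one geometry, jitter each copy's positions, and concatenate with a per-atom system index. Everything here is illustrative: `make_replicas` and `flatten_batch` are hypothetical helpers, the three-atom geometry is a placeholder rather than the propane coordinates from the example, and how the result is packed into an nvalchemi batch is left out.

```python
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def make_replicas(
    positions: Sequence[Vec3], n_replicas: int, jitter: float = 0.01, seed: int = 0
) -> List[List[Vec3]]:
    """Copy one geometry n_replicas times, displacing each coordinate by at most jitter."""
    rng = random.Random(seed)
    replicas: List[List[Vec3]] = []
    for _ in range(n_replicas):
        replicas.append(
            [
                (
                    x + rng.uniform(-jitter, jitter),
                    y + rng.uniform(-jitter, jitter),
                    z + rng.uniform(-jitter, jitter),
                )
                for (x, y, z) in positions
            ]
        )
    return replicas


def flatten_batch(replicas: Sequence[Sequence[Vec3]]) -> Tuple[List[Vec3], List[int]]:
    """Concatenate replicas and record which system each atom belongs to."""
    flat: List[Vec3] = []
    batch_index: List[int] = []
    for system_id, replica in enumerate(replicas):
        flat.extend(replica)
        batch_index.extend([system_id] * len(replica))
    return flat, batch_index


# Placeholder geometry (not the propane coordinates from the example).
base = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
replicas = make_replicas(base, n_replicas=4)
flat_positions, batch_index = flatten_batch(replicas)
```

Seeding the generator keeps the perturbed starting points reproducible across runs.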

dallasfoster added 4 commits June 16, 2026 13:35

consolidate tests and improve example

193d6f9

Signed-off-by: Dallas Foster <dallasf@nvidia.com>

linting

c6a11cb

Signed-off-by: Dallas Foster <dallasf@nvidia.com>

update ci workflow to account for uma

1874de4

Signed-off-by: Dallas Foster <dallasf@nvidia.com>

update changelog

8344178

Signed-off-by: Dallas Foster <dallasf@nvidia.com>

dallasfoster requested a review from laserkelvin June 16, 2026 20:42

greptile-apps Bot reviewed Jun 16, 2026


dallasfoster added 2 commits June 16, 2026 14:06

greptile comments

58784e7

Signed-off-by: Dallas Foster <dallasf@nvidia.com>

add train=true/false flag to freeze checkpoint weights

00044fa

laserkelvin requested changes Jun 16, 2026
