feat(models): add UMA (fairchem-core) interatomic-potential wrapper #117
dallasfoster wants to merge 6 commits into
Signed-off-by: Dallas Foster <dallasf@nvidia.com>
/ok to test 8344178
fixed = torch.zeros(total_atoms, dtype=torch.long, device=device)
tags = torch.zeros(total_atoms, dtype=torch.long, device=device)
tags from input data silently discarded
tags is declared in optional_inputs, yet adapt_input always constructs tags = torch.zeros(...) rather than reading from data. For OC20/ODAC tasks, tags carry semantic meaning — 0 = subsurface, 1 = surface, 2 = adsorbate — and UMA's OC20 head uses them to route per-atom force computation. An OC20 caller who correctly populates tags on the AtomicData will get all-zero tags forwarded to fairchem, producing results that silently diverge from the FAIRChemCalculator reference. Either read tags from the data (similar to how charge/spin are handled) or remove "tags" from optional_inputs and document that OC20 callers must ensure their system tags match fairchem's zero-tag convention.
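The first option, reading `tags` when the caller supplied it and falling back to zeros otherwise, could be sketched as follows. `FakeData` and `resolve_tags` are hypothetical stand-ins that use plain Python lists; they are not the wrapper's actual `adapt_input`, which would build a tensor on the batch device instead.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeData:
    """Hypothetical stand-in for the batch object; not the real AtomicData."""

    num_atoms: int
    tags: Optional[Sequence[int]] = None


def resolve_tags(data: FakeData) -> list[int]:
    """Return caller-supplied tags if present, else an all-zero default."""
    tags = getattr(data, "tags", None)
    if tags is None:
        # No tags on the input: keep the current zero-tag behaviour.
        return [0] * data.num_atoms
    tags = list(tags)
    if len(tags) != data.num_atoms:
        raise ValueError(f"expected {data.num_atoms} tags, got {len(tags)}")
    return tags
```

With this shape, an OC20 caller that sets `tags=[0, 1, 2]` gets those values forwarded unchanged, while callers that never set tags see no behaviour change.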
…IA#45 WIP)

Adopt the canonical UMAWrapper from NVIDIA/nvalchemi-toolkit PR NVIDIA#117 (the forked UMAWrapper line of work) as the base, while preserving this branch's distributed `distribution_spec` (halo storage + Triton-kernel OpAdapters) so the DD work (NVIDIA#45) continues on the up-to-date wrapper.

From NVIDIA#117: torch.compile via fairchem InferenceSettings (no compile_model flag) + module docstring; inference_settings typed Any (preset name OR an InferenceSettings instance); turbo/merge_mole CPU-lazy-init workaround in predict; adapt_input uses data.num_nodes_per_graph + cleaner cell/pbc/charge/spin handling; dropped the per-atom / energy debug monkey-patches.

Kept (not in NVIDIA#117): the `distribution_spec` property (SPEC_UMA_HALO + the 5 fairchem Wigner Triton OpAdapters with ScatterOutputs on the edge->node kernel) and the "no distributed_setup needed" note.

Merge is mechanical (take NVIDIA#117's uma.py, splice our distribution_spec before embedding_shapes); imports cleanly in .venv-uma (fairchem) and ast-parses in .venv.

RUNTIME-UNVALIDATED: the facebook/UMA checkpoint is HF-gated, so single-process + eager-DD + compile validation is pending an HF token on the box.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Dallas Foster <dallasf@nvidia.com>
/ok to test 00044fa
laserkelvin left a comment
Generally looks good to me; the spin default value needs to change though
# Task names accepted by fairchem / UMA. Kept as a module-level tuple
# so the wrapper can validate at construction and tests can iterate.
_UMA_TASKS: tuple[str, ...] = ("omol", "omat", "oc20", "odac", "omc")
Why not a set as well?
The UMA task this wrapper is pinned to.
"""

def __init__(self, predict_unit: Any, task_name: str = "omol") -> None:
Can we tighten up the Any?
def from_checkpoint(
    cls,
    name_or_path: str | Path,
    task_name: str = "omol",
Should be typed as Literal or point to the _UMA_TASKS set/tuple
raise ValueError(
    f"{name_str!r} is neither a registered model name nor a "
    f"local file path. Known names: "
    f"{sorted(pretrained_mlip.available_models)[:6]}..."
Why is it truncated to the 6 items? You can just leave an inline comment maybe
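If the truncation is kept, one option is to make it explicit in the message rather than relying on a bare `[:6]`. The helper below is a hypothetical sketch (`format_known_names` does not exist in the PR); it lists every name by default and reports how many were hidden when a limit is set.

```python
from typing import Iterable, Optional


def format_known_names(names: Iterable[str], limit: Optional[int] = None) -> str:
    """Render model names for an error message, optionally truncated."""
    ordered = sorted(names)
    if limit is None or len(ordered) <= limit:
        return ", ".join(ordered)
    hidden = len(ordered) - limit
    # Say how many names were omitted so the reader knows the list is partial.
    return ", ".join(ordered[:limit]) + f", ... ({hidden} more)"
```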
spin = getattr(data, "spin", None)
if spin is None:
    spin = torch.zeros(n_systems, dtype=torch.long, device=device)
This should not be zero. A sensible default could be 1, but it really depends on the system's unpaired electrons.
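A sketch of the suggested default, assuming `spin` holds one spin multiplicity (2S + 1) per system, so that 1 means a closed-shell singlet and 0 is never valid. `default_spin` is a hypothetical helper written with plain lists; the wrapper itself would build a tensor on the batch device.

```python
from typing import Optional, Sequence


def default_spin(n_systems: int, spin: Optional[Sequence[int]] = None) -> list[int]:
    """Return per-system spin multiplicities, defaulting to singlets."""
    if spin is None:
        # Assumption: multiplicity 1 (no unpaired electrons) is the fallback.
        return [1] * n_systems
    spin = list(spin)
    if len(spin) != n_systems:
        raise ValueError(f"expected {n_systems} spin values, got {len(spin)}")
    if any(s < 1 for s in spin):
        raise ValueError("spin multiplicity must be >= 1")
    return spin
```

Systems with unpaired electrons still need an explicit value from the caller; the default only covers the closed-shell case.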
# Maxwell-Boltzmann velocities at ``TEMPERATURE_K`` and zero net momentum.
# Periodic systems carry ``cell`` and ``pbc``.

_PROPANE_POSITIONS = np.array(
Could we make this example batched? You could duplicate the system and just use modified positions to start the simulations
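The batching idea (duplicate one system, perturb each copy) can be sketched without any toolkit API. Everything below is illustrative: `make_replicas` and `flatten_batch` are hypothetical helpers, and `PLACEHOLDER_POSITIONS` is a made-up three-point geometry, not the PR's propane coordinates.

```python
import random

# Placeholder geometry for illustration only (not the PR's propane positions).
PLACEHOLDER_POSITIONS = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]


def make_replicas(positions, n_replicas, jitter=0.01, seed=0):
    """Copy one system n_replicas times, adding small Gaussian displacements."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [
        [tuple(c + rng.gauss(0.0, jitter) for c in atom) for atom in positions]
        for _ in range(n_replicas)
    ]


def flatten_batch(replicas):
    """Concatenate replicas and record which system each atom belongs to."""
    flat = [atom for replica in replicas for atom in replica]
    batch_index = [i for i, replica in enumerate(replicas) for _ in replica]
    return flat, batch_index


replicas = make_replicas(PLACEHOLDER_POSITIONS, n_replicas=4)
flat_positions, batch_index = flatten_batch(replicas)
```

The flat position list plus the per-atom system index is the usual input shape for building a batched object; how that maps onto the toolkit's own batch constructor is left open here.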
ALCHEMI Toolkit Pull Request
Description
Add UMAWrapper, a BaseModelMixin-compatible wrapper around fairchem-core's
UMA (Universal Models for Atoms) MLIPPredictUnit, so UMA foundation models can
drive nvalchemi dynamics and inference. UMA is multi-task: one checkpoint
(uma-s-1p1 / uma-s-1p2 / uma-m-1p1) ships heads for OMol, OMat, OC20, ODAC,
and OMC; the wrapper pins a single task at construction (one-wrapper-one-model,
matching MACEWrapper). The conversion is tensor-native (no ASE round trip),
energy is the differentiable primitive, and forces/stress come from autograd.
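For reference, the standard relations behind "forces/stress come from autograd" are given below. The symbols are notation introduced here rather than taken from the PR: $E$ is the total energy, $\mathbf{r}_i$ the position of atom $i$, $V$ the cell volume, and $\boldsymbol{\varepsilon}$ a symmetric strain applied to both positions and cell.

```latex
\mathbf{F}_i = -\frac{\partial E}{\partial \mathbf{r}_i},
\qquad
\boldsymbol{\sigma} = \frac{1}{V}
\left.\frac{\partial E}{\partial \boldsymbol{\varepsilon}}\right|_{\boldsymbol{\varepsilon}=0}
```

Both quantities are first derivatives of the same scalar, which is why a single differentiable energy suffices as the primitive.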
Type of Change
Changes Made
adapt_output, compute_embeddings, from_checkpoint). from_checkpoint exposes
fairchem's native inference_settings (incl. "turbo"); forward routes the
one-time lazy-init/MoLE-merge through CPU input to dodge a fairchem
device-placement bug under turbo on GPU-resident first batches.
uma extra (fairchem-core>=2.0.0); declare uma conflicting with mace (e3nn pin)
and cu12/cu13 (fairchem torch<2.9 vs toolkit-ops torch>=2.11); pin
setuptools<81 for fairchem's torchtnt. uv.lock regenerated.
LoggingHook + EnergyDriftMonitorHook; NPT exercises the stress path; turbo
selectable via inference_settings.
equivalence vs FAIRChemCalculator, charged-input response, NVE drift (@slow),
turbo/compile device path (@slow, CUDA).
torch-environment notes), modules/models.rst, models/index.md, examples README.
from it (gated on UMA-file changes or full runs); optional HF_TOKEN secret.
Testing
uma-s-1p1 on CUDA; 33 passed / 3 slow-skipped without --slow)
make lint)
Equivalence matches FAIRChemCalculator to 1e-4 (OMol energy/forces, OMat
energy/forces/stress); charged OMol runs match the calculator at charge -1;
small (uma-s-1p1/1p2) and medium (uma-m-1p1) checkpoints load and run.
Additional Notes
UMA's deps conflict with the mace/cuXX stack, so it must be installed in its
own environment (`uv sync --extra uma`); it brings its own CUDA-enabled torch
(2.8, cu12.8) and does not use the cuXX GPU stack. Checkpoints are gated on
HuggingFace (facebook/UMA): CI structural tests run without a token; the
checkpoint-based tests skip unless an HF_TOKEN secret is provided.
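The token gating might look like the following in a test module. This is a sketch using only the standard library's `unittest`; the PR's tests may well use pytest markers instead, and `has_hf_token`, `requires_hf_token`, and both test classes are hypothetical names. Only the `HF_TOKEN` variable name comes from the description.

```python
import os
import unittest


def has_hf_token(env=None) -> bool:
    """True when a non-empty HF_TOKEN is present in the environment mapping."""
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


# Decorator that skips checkpoint-dependent tests when no token is available.
requires_hf_token = unittest.skipUnless(
    has_hf_token(), "HF_TOKEN not set; gated facebook/UMA checkpoint unavailable"
)


class StructuralTests(unittest.TestCase):
    """Runs everywhere: no checkpoint download is needed."""

    def test_placeholder(self):
        self.assertTrue(True)


@requires_hf_token
class CheckpointTests(unittest.TestCase):
    """Skipped as a whole unless HF_TOKEN is provided."""

    def test_placeholder(self):
        self.assertTrue(True)
```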
Tip
This repository uses Greptile, an AI code review service, to help conduct
pull request reviews. We encourage contributors to read and consider suggestions
made by Greptile, but note that human maintainers will provide the necessary
reviews for merging: Greptile's comments are not a qualitative judgement
of your code, nor is it an indication that the PR will be accepted/rejected.
We encourage the use of emoji reactions to Greptile comments, depending on
their usefulness and accuracy.