Conversation
@RMeli could you check this all makes sense to you?
```Dockerfile
# The image path of the nvidia container that you want to use
FROM nvcr.io/nvidia/pytorch:26.01-py3

# Install micromamba and create conda environment
```
What do you install micromamba for? Would this be better using Python from the base container?
Hmm, just to show how to use it in a container, since others may want to install something from a conda channel, like lammps-metatomic?
Ideally people should not use conda on CSCS for this; both plumed-metatomic and lammps-metatomic are available in Spack and should be built as a uenv.
Ah okay, should I try to replace this with a conda-free version?
It would be nice (unless @RMeli disagrees with me and says this is fine!)
Using conda is risky, because you have little control over dependencies. In particular, it is difficult to tap into the network stack, which is essential for using the high-speed network. The same applies to other package managers.
As @Luthaf mentioned, uenv is likely a better option for applications needing MPI (at least for the time being), since it will build from source using the correct dependencies for the network. I don't see i-PI in Spack, so that will need to be added if you want to go down that route (which I understand is annoying).
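For reference, a rough sketch of what checking the Spack side could look like (package and variant names come from the comments above; the i-PI release URL is a placeholder, not a real link):

```bash
# Sketch only: package names come from the discussion above; the tarball URL
# is a placeholder and must be replaced with a real i-PI release archive.
spack info plumed        # inspect the plumed package and its variants (e.g. metatomic support)
spack list ipi           # check whether an i-PI recipe already exists
spack create <url-to-an-i-pi-release-tarball>   # scaffold a new package.py if it does not
```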
RMeli left a comment
Hello. I had a look, and the approach is somewhat distant from what you can find in our documentation. I see two main issues with this approach:
- The use of Podman instead of the Container Engine,
- The use of Nvidia NGC images, which use OpenMPI 4 and therefore do not work with our network stack.
I have no knowledge of i-PI and from what I can see it does not need MPI itself. You also don't seem to consider/use MPI in the example here (i.e. PLUMED without MPI, ...), so this might not be an issue for this particular case. However, if you want to use MPI-enabled clients, or use this approach with MPI applications (you mentioned LAMMPS below), then (2) is a non-starter with the bare NGC images. You can try with our Alps Extended Images, which come with OpenMPI 5 and are compatible with our network stack. (They are focussed on AI/ML applications, and I have yet to test them myself for other applications/MD.)
The other option is to use uenv. This allows you to build the whole software stack from scratch, tapping into the right network stack. However, this requires all the needed dependencies to be available in the Spack package manager, which I understand is not great (but I already contributed plumed+metatomic to Spack and I have a working lammps+metatomic Spack recipe ready).
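As a very rough illustration of what consuming a uenv looks like once such a stack is built and published (image names and versions below are placeholders; the exact CLI may differ per Alps system, so check the CSCS uenv documentation):

```bash
# Sketch only: image name and version are placeholders.
uenv image find                       # list uenv images available on the current system
uenv image pull <stack-name>/<version>
uenv start <stack-name>/<version>     # open a shell with the uenv software stack mounted
```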
Last update 17-02-2026.

CSCS recommends running machine learning-related applications in containers, see [this page](https://docs.cscs.ch/software/ml/#optimizing-data-loading-for-machine-learning) for reference. In this tutorial, we use the [Pytorch NGC container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) provided by Nvidia to fully exploit the power of the GH200s installed on daint.alps.
For ML applications this is indeed recommended. For mixed applications, other considerations apply. For instance, the Nvidia NGC images ship with OpenMPI 4, which is incompatible with our network stack. Therefore, if MPI is needed, things get more complicated.
```bash
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export TORCH_NUM_THREADS=1
```
Why are you asking for 40 CPUs per task and then setting these? MKL is not available for aarch64, so you should probably look into similar variables for OpenBLAS/NVPL if you really want to set this.
I found that setting these variables can speed up the i-PI calculation a lot, but I haven't checked why or which one really works...
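For what it's worth, a sketch of the equivalent MKL-free settings on the aarch64 GH200 nodes (assuming the BLAS in use is OpenMP-threaded, e.g. NVPL, or OpenBLAS):

```bash
# Sketch: on aarch64 there is no MKL, so MKL_NUM_THREADS is likely a no-op.
# OMP_NUM_THREADS is honoured by OpenMP-threaded BLAS libraries and by PyTorch's
# intra-op thread pool; OPENBLAS_NUM_THREADS covers an OpenBLAS build.
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
```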
```bash
srun --cpus-per-task=$SLURM_CPUS_PER_TASK --ntasks=1 --gpus=1 \
    --gpu-bind=single:1 \
    --export=ALL,CUDA_VISIBLE_DEVICES=$GPU,NVIDIA_VISIBLE_DEVICES=$GPU \
    podman run --rm \
```
Running directly with podman on Alps is not officially supported. You should use the Container Engine instead. Additionally, this can only work on a single node and can't scale to multiple nodes.
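For comparison, a sketch of what the Container Engine route looks like (the EDF file name, image reference, and mount paths below are illustrative; the exact fields are described in the CSCS Container Engine documentation):

```bash
# Sketch only: define an Environment Definition File (EDF) and launch via Slurm.
mkdir -p ~/.edf
cat > ~/.edf/ipi.toml <<'EOF'
image = "nvcr.io#nvidia/pytorch:26.01-py3"
mounts = ["/capstor/scratch/cscs/<user>:/capstor/scratch/cscs/<user>"]
workdir = "/capstor/scratch/cscs/<user>"
EOF

# The engine starts the container per task, so this also scales to multi-node jobs.
srun --environment=ipi --ntasks=1 --gpus=1 <command-to-run-inside-the-container>
```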
```Dockerfile
# The image path of the nvidia container that you want to use
FROM nvcr.io/nvidia/pytorch:26.01-py3
```
Nvidia NGC containers come with OpenMPI 4, which is incompatible with our network stack. They work very well when using NCCL (with the hooks enabled by the Container Engine, see below), but do not work with MPI applications.
Our extended Alps images come with OpenMPI 5 and are compatible with our network stack, so they should be used as base images. However, so far they have been mainly tested for ML applications.
Let me test it and see if it makes any difference.
In this case it might not, if you are not using MPI at all. But mine was a more general remark (especially given the general title of the PR). This might work for i-PI, but will not work (efficiently) with LAMMPS+metatomic for example.
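If the container route is kept but MPI becomes necessary, the change would be on the base-image line; a sketch (the extended-image reference is a placeholder, not a real registry path):

```bash
# Sketch only: swap the NGC base image for a CSCS Alps extended image (OpenMPI 5).
# The FROM reference below is a placeholder, not an actual registry path.
cat > Containerfile <<'EOF'
FROM <cscs-alps-extended-pytorch-image>
# ... the rest of the Containerfile stays the same ...
EOF
```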
At this stage, the container is identical to the upstream NGC image. You can enter this container with:

```bash
podman run --rm -it --gpus all <image_name:you_like> bash
```
See comment below about using podman directly for running.
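The interactive equivalent with the Container Engine would be something along these lines (assuming an EDF named `ipi`, as sketched earlier):

```bash
# Sketch: interactive shell inside the container via the Container Engine,
# assuming an EDF named "ipi" exists (see the earlier sketch).
srun --environment=ipi --pty bash
```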
CSCS recommends running machine learning-related applications in containers, see [this page](https://docs.cscs.ch/software/ml/#optimizing-data-loading-for-machine-learning) for reference. In this tutorial, we use the [Pytorch NGC container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) provided by Nvidia to fully exploit the power of the GH200s installed on daint.alps.

## Selecting and Building a Base Container
It would be better not to build and run containers on the login node. You should do this in an allocation. See our documentation on building images with podman.
Yeah, I was building it on an interactive compute node; I'll mention it here.
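A sketch of doing this inside an allocation instead of on the login node (account, partition, and time limit are placeholders):

```bash
# Sketch only: request a node and build the image there, not on the login node.
salloc --nodes=1 --time=00:30:00 --account=<account> --partition=<partition>
srun --pty podman build -t <image_name:you_like> -f Containerfile .
```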
## Making Persistent Modifications

After entering the container, you can start configuring your simulation environment as usual. However, make sure that every command is recorded, because your modifications to the running container are temporary and will be lost after you exit. To make permanent modifications, you should write the commands down in the `Containerfile`. The way to do so is to add them with the `RUN` keyword:
This is not good advice IMO. You should only modify the Containerfile.
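A minimal sketch of keeping everything in the Containerfile (the package names are placeholders):

```bash
# Sketch only: record changes as RUN steps in the Containerfile rather than in a
# running container; the package names are placeholders.
cat >> Containerfile <<'EOF'
RUN pip install --no-cache-dir <your-python-packages>
EOF
podman build -t <image_name:you_like> -f Containerfile .
```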
You can save the modified container:

```bash
podman save -o /path/to/container.tar <image_name:you_like>
```
You should use the Container Engine instead of Podman for running, and therefore import images into the Container Engine instead of using podman save.
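A sketch of that import step (the enroot invocation and paths are from memory and should be checked against the CSCS documentation):

```bash
# Sketch only: export the podman-built image to a squashfs file usable by the
# Container Engine, then point the EDF's image field at that file.
enroot import -o $SCRATCH/ipi.sqsh podman://<image_name:you_like>

cat > ~/.edf/ipi.toml <<'EOF'
image = "/capstor/scratch/cscs/<user>/ipi.sqsh"
EOF
```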