The normflow plugin is developed as part of the itwinai repository, integrating the Lattice QCD use-case of the interTwin project. The plugin allows the normflow developers to continue their development independently in this repository, which is already integrated with itwinai. For further details about the normflow package, please refer to the repository of the use-case.
Integration Authors:
- Rakesh Sarma, Juelich
- Matteo Bunino, CERN
The dependencies in this project are managed with the `pyproject.toml` file, which requires installation of itwinai and of the normflow source files provided in this repository. To get started, clone the repository and follow the steps below. In this project, it is recommended to use the `uv` package manager, which installs Python packages much faster than pip and also resolves overlapping dependencies across different packages.
- Move to the root of the directory and create a Python virtual environment with `python -m venv .venv`.
- Activate the virtual environment: `source .venv/bin/activate`.
- Install the `uv` package manager with `pip install uv`.
- Install the plugin with `uv pip install .`. This installs both the `itwinai` and `normflow` packages. For new developments in the `normflow` package, the source files located under `src/itwinai/plugins/normflow` have to be modified. To install in an HPC environment, please follow the instructions on the required modules in the `itwinai` documentation. Once the modules are loaded, the rest of the installation procedure is the same as above.
These steps will set up the environment for running this plugin. For launching a pipeline or training, further instructions can be found in the `README.md` under `src/itwinai/plugins/normflow`.
If you want to use the remote interTwin MLflow server, remember to run `source mlflow-setup.sh` first.
Note: Remember to set the correct credentials in `mlflow-setup.sh`.
This package provides utilities for implementing the method of normalizing flows as a generative model for lattice field theory. The method of normalizing flows is a powerful generative modeling approach that learns complex probability distributions by transforming samples from a simple distribution through a series of invertible transformations. It has found applications in various domains, including generative image modeling.
The package currently supports scalar theories in any dimension, and we are actively extending it to accommodate gauge theories, broadening its applicability.
In a nutshell, three essential components are required for the method of normalizing flows:
- A prior distribution to draw initial samples.
- A neural network to perform a series of invertible transformations on the samples.
- An action that specifies the target distribution, defining the goal of the generative model.
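These three ingredients are tied together by the usual change-of-variables relation for normalizing flows (written here in generic notation for orientation, not in package-specific symbols): drawing $z$ from the prior density $r(z)$ and pushing it through the invertible network, $x = f(z)$, yields samples with model density

$$
q(x) = r(z)\,\left|\det \frac{\partial f(z)}{\partial z}\right|^{-1}, \qquad x = f(z),
$$

and training tunes $f$ so that $q(x)$ approaches the target distribution $p(x) \propto e^{-S(x)}$ defined by the action $S$.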
The central high-level class of the package is called Model, which can be
instantiated by providing instances of the three objects mentioned above:
the prior, the neural network, and the action.
The Fitter class trains this Model instance, which is created during the initialization of the Fitter class. The Fitter class inherits from the TorchTrainer class provided by itwinai. Importantly, the distributed training logic comes from this base class, allowing the user to seamlessly select any of the distributed training strategies configured in itwinai. It also provides functionalities for profiling, logging, and hyperparameter optimization (HPO), all inherited from the TorchTrainer class.
The training configuration is handled with YAML-based configuration files; an example of such a file is provided in config.yaml. It demonstrates the scalar theory in zero dimensions, i.e., a scenario with one point and one degree of freedom.

```yaml
model:
  _target_: normflow.Model
  net_:
    _target_: normflow.nn.DistConvertor_
    knots_len: 10
    symmetric: True
  prior:
    _target_: normflow.prior.NormalPrior
    shape: [1]
  action:
    _target_: normflow.action.ScalarPhi4Action
    kappa: 0
    m_sq: -1.2
    lambd: 0.5
```

In this example, we have:
- Prior Distribution: A normal distribution is used with a shape of `[1]`.
- Action: A quartic scalar theory is defined with parameters `kappa=0`, `m_sq=-1.2`, and `lambd=0.5`.
- Neural Network: The `DistConvertor_` class is used to create the transformation network, with `knots_len=10` and symmetry enabled. Any instance of this class converts the probability distribution of its inputs using a rational quadratic spline. In this example, the spline has 10 knots, and the distribution is assumed to be symmetric with respect to the origin.
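For illustration, the same model can also be assembled directly in Python. The class paths and keyword names below are taken from the `_target_` entries in the configuration above; treat this as a sketch of how the pieces fit together rather than the plugin's prescribed entry point.

```python
# Sketch: building the model of the example config programmatically.
# Class paths and keyword arguments mirror config.yaml shown above.
from normflow import Model
from normflow.nn import DistConvertor_
from normflow.prior import NormalPrior
from normflow.action import ScalarPhi4Action

# Prior: normal distribution over a single degree of freedom (zero dimensions)
prior = NormalPrior(shape=[1])

# Action: quartic (phi^4) scalar theory with the couplings from the config
action = ScalarPhi4Action(kappa=0, m_sq=-1.2, lambd=0.5)

# Invertible transformation: rational quadratic spline with 10 knots,
# symmetric with respect to the origin
net_ = DistConvertor_(knots_len=10, symmetric=True)

# The Model ties prior, network, and action together
model = Model(net_=net_, prior=prior, action=action)
```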
The other training parameters are specified within the same YAML file. For conciseness, only a few of the options are shown in the snippet below: the config parameters, epochs, and strategy.

```yaml
config:
  optim_lr: 0.001
  weight_decay: 0.01
  batch_size: 128
epochs: 100
strategy: "ddp"
```

These settings train the model for 100 epochs with a batch size of 128, a learning rate (`optim_lr`) of 0.001, and a `weight_decay` of 0.01. The strategy for distributing the training is specified with the `strategy` flag, set here to `ddp`, i.e., the default PyTorch-based Distributed Data Parallel (DDP) strategy.
In order to launch the pipeline, simply run:

```bash
itwinai exec-pipeline --config-name config.yaml +pipe_key=training_pipeline
```

The above command results in an output similar to:

```
>>> Training progress (cpu) <<<
Note: log(q/p) is estimated with normalized p; mean & error are obtained from samples in a batch
Epoch: 1 | loss: -0.5387 | ess: 0.8755
ConsoleLogger: epoch_loss = -0.5386953949928284
Epoch: 10 | loss: -0.4528 | ess: 0.8750
ConsoleLogger: epoch_loss = -0.45278581976890564
Epoch: 20 | loss: -0.7644 | ess: 0.8989
ConsoleLogger: epoch_loss = -0.76436448097229
Epoch: 30 | loss: -0.6655 | ess: 0.8778
ConsoleLogger: epoch_loss = -0.6654551029205322
Epoch: 40 | loss: -0.7596 | ess: 0.8895
ConsoleLogger: epoch_loss = -0.7596336603164673
Epoch: 50 | loss: -0.7449 | ess: 0.8836
ConsoleLogger: epoch_loss = -0.7449398040771484
Epoch: 60 | loss: -0.7312 | ess: 0.8920
ConsoleLogger: epoch_loss = -0.7311607599258423
Epoch: 70 | loss: -0.8115 | ess: 0.8982
ConsoleLogger: epoch_loss = -0.8114994764328003
Epoch: 80 | loss: -0.8218 | ess: 0.9046
ConsoleLogger: epoch_loss = -0.8217660188674927
Epoch: 90 | loss: -0.7733 | ess: 0.9065
ConsoleLogger: epoch_loss = -0.7732580900192261
Epoch 100 | Model Snapshot saved at checkpoint.E100.tar
Epoch: 100 | loss: -0.9057 | ess: 0.9045
ConsoleLogger: epoch_loss = -0.9056745767593384
(cpu) Time = 2.72 sec.
###############################
# 'Fitter' executed in 4.229s #
###############################
#################################
# 'Pipeline' executed in 4.229s #
#################################
```
This output indicates the loss values at specified epochs during the training process, providing insight into the model's performance over time.
Alternatively, the model initialization and training can also be performed using the train.py file, which provides another interface for building the models and launching the training, with the configuration defined within the Python script. To launch the training with this script, run:

```bash
python train.py
```

For working on HPC systems, a startscript.sh file is provided. This can be launched by:

```bash
sbatch startscript.sh
```

In the startscript, `nodes` specifies the number of workers to be used for training.
You can efficiently scale your model training across multiple GPUs, enhancing
performance and reducing training time. This flexibility allows you to tackle
larger datasets and more complex models with ease. When using the train.py file to launch the training, the last line in the startscript.sh file should be modified to run train.py instead.
This example demonstrates the flexibility of using the package to implement scalar field theories in a simplified zero-dimensional setting. It can be generalized to any dimension by changing the shape provided to the prior distribution.
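For example, a two-dimensional theory on an 8×8 lattice (the lattice size here is an arbitrary choice for illustration) only requires a different shape for the prior:

```python
# Sketch: the same setup in two dimensions; only the prior's shape changes.
from normflow.prior import NormalPrior

prior = NormalPrior(shape=[8, 8])  # 8x8 lattice instead of a single point
```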
After training the model, one can draw samples using an attribute called `posterior`. To draw `n` samples from the trained distribution, use the following command:

```python
x = model.posterior.sample(n)
```

Note that the trained distribution is almost never identical to the target distribution, which is specified by the action. To generate samples that are correctly drawn from the target distribution, similar to Markov Chain Monte Carlo (MCMC) simulations, one can employ a Metropolis accept/reject step and discard some of the initial samples. To this end, you can use the following command:

```python
x = model.mcmc.sample(n)
```

This command draws n samples from the trained distribution and applies a Metropolis accept/reject step to ensure that the samples are correctly drawn.
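Putting the two sampling commands together, a minimal post-training sketch might look like this (reusing the `model` instance from the earlier sketch, after it has been trained; the sample count is an arbitrary choice):

```python
# Sketch: drawing samples from a trained model, following the commands above.
n = 10_000

# Samples from the trained (approximate) distribution
x_approx = model.posterior.sample(n)

# Samples passed through a Metropolis accept/reject step, so that they are
# correctly distributed according to the target action
x_exact = model.mcmc.sample(n)
```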
Block diagram for the method of normalizing flows
The TRAIN and GENERATE blocks in the above figure depict the procedures for training the model and generating samples/configurations. For more information see arXiv:2301.01504.
In summary, this package provides a robust and flexible framework for implementing the method of normalizing flows as a generative model for lattice field theory. With its intuitive design and support for scalar theories, you can easily adapt it to various dimensions and leverage GPU acceleration for efficient training. We encourage you to explore the features and capabilities of the package, and we welcome contributions and feedback to help us improve and expand its functionality.
