This is the repository for the Applied Deep Learning in Medicine practical at the chair of AI in Medicine at TUM.
Group Members:
- Andreas Ehrensberger
- Jan Rogalka
- Matteo Wohlrapp
This project implements implicit neural representations for MRI images using modulated SIREN (Sinusoidal Representation Networks). It is designed to reconstruct high-quality MRI images from undersampled k-space data. The network is based on work from Mehta et al. [1], and existing implementations [2].
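To illustrate the idea of a modulated SIREN, here is a minimal sketch of one sine layer whose activations are scaled by a modulation signal derived from a latent code, in the spirit of Mehta et al. [1]. This is an illustration only, not the repository's implementation; the class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class ModulatedSirenLayer(nn.Module):
    """One SIREN layer with a sine activation scaled per-feature by a
    modulation derived from a latent code (sketch, not the repo code)."""

    def __init__(self, dim_in, dim_out, latent_dim, w0=30.0):
        super().__init__()
        self.linear = nn.Linear(dim_in, dim_out)
        # The modulation network maps the latent code to per-feature scales.
        self.modulation = nn.Linear(latent_dim, dim_out)
        self.w0 = w0

    def forward(self, coords, latent):
        # coords: (batch, dim_in) pixel coordinates; latent: (batch, latent_dim)
        mod = torch.relu(self.modulation(latent))
        return mod * torch.sin(self.w0 * self.linear(coords))
```

In the full network, several such layers are stacked and the latent code comes from an image encoder, so one shared MLP can represent many different scans.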
The repository is structured as follows:
In the src folder are all the necessary files to train and test a model. Under src/data, everything related to the dataset and data loading can be found. src/train contains the trainer and auxiliary functions, while src/networks includes the actual neural network. The folder src/configuration contains information about the argument parsing, and src/utils defines helper functions. In configuration under the root directory, you can find several predefined training and testing configurations, including for the experiments we conducted.
We ran experiments to test different model configurations and evaluated our baseline model on various k-space sampling densities. All our models were trained on around 3000 single-coil simulated fastMRI [3] FLAIR brain scans. The tests were performed on a validation set of FLAIR images from the fastMRI dataset as well. We calculated the metrics for the results based on all 940 available files.
In total, six different ablations were tested:
- Baseline: Implementation with a pre-trained encoder trained on the fastMRI brain scans
- Edge: Introducing an additional edge loss to the model based on a Sobel filter
- VGG: Using a VGG encoder trained on ImageNet instead of our own trained one
- Morlet: Exchanging the sine-based activation functions with Morlet-based ones
- Perceptual: Using a perceptual loss based on a VGG encoder. The encoder for the perceptual loss can be downloaded here. Make sure you put it under a model folder in the root directory.
- Residual: Adding residual connections to the MLP layers and increasing the depth of the network while reducing the latent dimension
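The Edge ablation above can be sketched as follows: compute Sobel gradient magnitudes for the prediction and the ground truth and penalize their difference. This is a hedged illustration of the idea, not the repository's exact loss; the function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Sobel gradient magnitude of a (batch, 1, H, W) image tensor."""
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)  # vertical-gradient kernel
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def edge_loss(pred, target):
    """MSE between the Sobel edge maps of prediction and ground truth."""
    return F.mse_loss(sobel_edges(pred), sobel_edges(target))
```

In practice such an edge term is typically added to the pixel-wise reconstruction loss with a weighting factor.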
The configuration for each specific experiment can be found under configuration/ablations. To run and test the residual connections, you will need to check out the residual-connections branch. The trained models can be downloaded from here.
The quantitative results can be found below.
| Configuration | PSNR Mean | PSNR Std | PSNR Min | PSNR Max | SSIM Mean | SSIM Std | SSIM Min | SSIM Max | NRMSE Mean | NRMSE Std | NRMSE Min | NRMSE Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 26.646 | 1.994 | 21.081 | 35.084 | 0.850 | 0.034 | 0.724 | 0.934 | 0.310 | 0.126 | 0.180 | 1.082 |
| Edge | 26.766 | 2.040 | 20.975 | 34.730 | 0.855 | 0.034 | 0.729 | 0.937 | 0.313 | 0.146 | 0.167 | 1.192 |
| VGG | 21.233 | 2.676 | 10.238 | 31.297 | 0.739 | 0.030 | 0.566 | 0.839 | 0.608 | 0.384 | 0.195 | 4.193 |
| Morlet | 25.865 | 2.156 | 20.187 | 33.363 | 0.861 | 0.033 | 0.728 | 0.948 | 0.370 | 0.161 | 0.192 | 1.201 |
| Perceptual | 24.659 | 2.059 | 18.540 | 32.312 | 0.770 | 0.038 | 0.601 | 0.879 | 0.428 | 0.196 | 0.207 | 1.393 |
| Residual | 26.666 | 2.040 | 20.521 | 34.983 | 0.853 | 0.035 | 0.726 | 0.948 | 0.307 | 0.126 | 0.174 | 1.064 |
The qualitative results can be found below.

The fastMRI framework allows different k-space masks to be set, which results in different sampling densities. In this experiment, we varied the acceleration, where higher values mean less k-space information is retained, and the center fraction, which specifies how much information is kept in the center of k-space. In total, three different variations were tested:
- Acceleration 8, center fraction 0.05
- Acceleration 6, center fraction 0.05
- Acceleration 6, center fraction 0.1
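The interplay of these two parameters can be sketched as a simple random Cartesian line mask: keep all lines in the low-frequency center, then randomly sample the remaining lines so that roughly one line in `acceleration` is retained overall. This mirrors the semantics of fastMRI's random mask but is a simplified illustration, not the library's implementation.

```python
import numpy as np

def cartesian_mask(num_cols, acceleration, center_fraction, seed=0):
    """Boolean mask over k-space columns: full center, random outskirts."""
    rng = np.random.default_rng(seed)
    num_center = int(round(num_cols * center_fraction))
    mask = np.zeros(num_cols, dtype=bool)
    pad = (num_cols - num_center) // 2
    mask[pad:pad + num_center] = True  # always keep the low-frequency center
    # Sample the remaining columns so the expected total is num_cols / acceleration.
    prob = (num_cols / acceleration - num_center) / (num_cols - num_center)
    mask[~mask] = rng.random(num_cols - num_center) < prob
    return mask
```

With acceleration 6 and center fraction 0.05 on 320 columns, the mask keeps 16 center lines and samples the rest with probability ≈ 0.12.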
The configuration for each specific experiment can be found under configuration/configuration. The trained models can be downloaded from here.
The quantitative results can be found below.
| Configuration | PSNR Mean | PSNR Std | PSNR Min | PSNR Max | SSIM Mean | SSIM Std | SSIM Min | SSIM Max | NRMSE Mean | NRMSE Std | NRMSE Min | NRMSE Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Acc 8, Cf 0.05 | 26.371 | 1.985 | 19.545 | 34.073 | 0.854 | 0.033 | 0.739 | 0.938 | 0.324 | 0.140 | 0.174 | 1.149 |
| Acc 6, Cf 0.05 | 26.646 | 1.994 | 21.081 | 35.084 | 0.850 | 0.034 | 0.724 | 0.934 | 0.310 | 0.126 | 0.180 | 1.082 |
| Acc 6, Cf 0.10 | 27.878 | 2.415 | 19.412 | 38.951 | 0.882 | 0.029 | 0.773 | 0.971 | 0.269 | 0.116 | 0.143 | 0.782 |
The qualitative results can be found below.

You can download the models from our result section here to continue training or run your own evaluation. The configuration files for the respective models can be found under configuration.
Ensure that you have Python installed on your system (Python 3.8+ recommended). Install all required dependencies by running:
```shell
pip install -r requirements.txt
```

This will install all necessary Python packages as specified in the requirements.txt file.
The project uses YAML files for configuration to specify parameters for training and testing. Modify these files to adjust various parameters like dataset paths, network architecture, learning rates, etc. Alternatively, you can adjust all of the parameters through command line arguments.
Example configuration files are located in the /configuration directory:
- `train_modulated_siren.yml` for training setup.
- `test_modulated_siren.yml` for testing setup.
This section provides a comprehensive guide on preparing the dataset for both training and testing purposes. This is only necessary before running for the first time. Follow the steps below to ensure your dataset is correctly set up and ready for use.
- Initially, the script is configured with a placeholder path for the dataset.
- Action Required: Update the script with the actual path where your dataset is located. This ensures the script can access and process the dataset correctly.
- The dataset preparation involves applying masks to the data. These masks are defined by a list of tuples, with each tuple containing two key parameters:
- Center Fraction: Specifies the fraction of low-frequency k-space data to retain.
- Acceleration: Determines the rate at which data is undersampled.
- Action Required: Add the mask parameters to the script. Each tuple in the list specifies one mask configuration to be applied.
- The script processes .h5 files in the dataset by iterating over each scan.
- Each image within a scan is treated as a separate entity and is transformed from k-space to image space and normalized to [0,1].
- The transformed images are saved in the specified location, ready for further processing or training.
- Upon completion of the data transformation process, a CSV file is generated.
- This CSV file contains essential metadata about the created files, including file locations.
- Utility: The CSV file serves as a directory, enabling efficient file location, filtering, and access during training or testing phases.
- Ensure all the specified paths and parameters in the script are correctly set before running the dataset preparation process.
- The prepared dataset, now in image space and accompanied by a comprehensive CSV file, is ready for use in your machine learning models for training and testing purposes.
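The core transformation in the steps above, from k-space to a normalized image, can be sketched as follows. This is an illustrative NumPy version under common fastMRI conventions (centered k-space, magnitude image, min-max normalization); the repository's preprocessing script may differ in details.

```python
import numpy as np

def kspace_to_image(kspace):
    """Inverse 2D FFT of centered k-space -> magnitude image in [0, 1]."""
    img = np.fft.ifft2(np.fft.ifftshift(kspace))  # undo centering, transform
    img = np.abs(np.fft.fftshift(img))            # magnitude, re-center
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-12)         # min-max normalize to [0, 1]
```

Each slice of an .h5 scan would be passed through such a function before being written out and recorded in the metadata CSV.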
```shell
python preprocessing_script.py -p <path to folder with the original data>
```

To train a custom autoencoder, use the train_encoder.py script. The basic configuration is already set up in the train_encoder.yml file, and you can adjust the parameters there. The parameters are analogous to those used for training the SIREN network.
If you don't want to train your own encoder, or want to use the VGG encoder, you can download the used encoders from here.
To run the application, use one of the two main scripts. You can specify a custom configuration file or use the provided examples.
Training:
```shell
python train_mod_siren.py --config configuration/train_modulated_siren.yml
```

To ensure training works, it is necessary to specify the correct encoder path. You can see visualizations of the training by running `tensorboard --logdir={output_folder}/runs`. Because of the complicated tiling and processing involved, we opted to load all data samples into memory. If that is not possible, you can switch the dataset implementation to the MRIDatasetLowMemory class in src/data/mri_dataset.py.
Testing:
```shell
python test_mod_siren.py --config configuration/test_modulated_siren.yml
```

When testing, make sure to specify the correct folder and model path; a test subfolder will then be generated automatically in the target folder. You can specify the number of visual samples in the configuration file, and list specific test files if you want to visualize particular scans. You can also set the number of samples the metrics are calculated on; if no number is specified, the whole dataset is used.
There are a number of configuration parameters for both training and testing. They all have default values, which can be found in src/configuration/configuration.py. Below is a list of all parameters you can modify.
- model
  - `dim_in`: Input dimension of the model.
  - `dim_hidden`: Dimension of hidden layers.
  - `dim_out`: Output dimension of the model.
  - `latent_dim`: Dimension of the latent space.
  - `num_layers`: Number of layers in the model.
  - `w0`: The omega_0 parameter for SIREN.
  - `w0_initial`: Initial value of omega_0.
  - `use_bias`: Boolean flag to use bias in layers.
  - `dropout`: Dropout rate.
  - `encoder_type`: Type of encoder used. Available options are `vgg` or `custom`.
  - `encoder_path`: Path to a custom encoder model.
  - `outer_patch_size`: Size of the outer patch in the input.
  - `inner_patch_size`: Size of the inner patch in the input.
  - `siren_patch_size`: Size of the patch the SIREN network is actually trained on.
  - `activation`: The type of activation functions used. Options are `sine` and `morlet`.
- data
  - `traindataset`: Path to the training dataset.
  - `num_samples`: Number of samples to use from the training dataset.
  - `mri_type`: Type of MRI images (e.g., FLAIR).
  - `num_workers`: Number of workers for data loading.
  - `valdataset`: Path to the validation dataset.
  - `num_samples`: Number of samples to use from the validation dataset.
  - `acceleration`: Acceleration rate used for the data.
  - `center_fraction`: Fraction of the k-space center that is sampled.
- training
  - `lr`: Learning rate.
  - `batch_size`: Batch size.
  - `epochs`: Total number of epochs to train.
  - `output_dir`: Directory to save output files.
  - `output_name`: Base name for output files.
  - `optimizer`: Type of optimizer to use (`Adam`, `SGD`, etc.).
  - `logging`: Specify if tensorboard should be turned on or off.
  - `criterion`: Specify which criterion to use; options are `MSE`, `Perceptual`, and `Edge`.
  - model
    - `continue_training`: Boolean to indicate whether to continue training from a previous checkpoint.
    - `model_path`: Path to the model checkpoint for resuming training. If not specified while continuing training, the last model with the same name is used.
    - `optimizer_path`: Path to the optimizer checkpoint. If not specified while continuing training, the last optimizer with the same name is used.
- data
  - `dataset`: Path to the testing dataset.
  - `visual_samples`: Number of samples that are visualized.
  - `metric_samples`: Number of samples the metrics are calculated on.
  - `test_files`: Specific files to test within the dataset.
  - `acceleration`: Acceleration rate used for the data.
  - `center_fraction`: Fraction of the k-space center that is sampled.
- testing
  - `output_dir`: Base directory of the output files.
  - `output_name`: Specific folder within the base directory.
  - `model_path`: Path to the model file for testing.
All output files, including saved models and reconstructed images, are stored in a subdirectory named after output_name inside the output_dir directory specified in the configuration file. This allows for easy organization and retrieval of results from different runs. You will find folders for model checkpoints, snapshots of the current training and validation results, tensorboard logs, a copy of the configuration file, information on which files the model was trained on, and a progress overview. When testing, you will get an additional test folder, which includes visual samples, a CSV file, and a summary of the results.
1. Mehta, I., Gharbi, M., Barnes, C., Shechtman, E., Ramamoorthi, R., & Chandraker, M. (2021). Modulated Periodic Activations for Generalizable Local Functional Representations. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 14194-14203.
2. https://github.com/lucidrains/siren-pytorch
3. https://fastmri.med.nyu.edu