This repo provides a full setup for DLMs.
The setup is based on:
- Sisyphus workflow manager. Job classes define individual workflow steps, such as training a model, or evaluating a model, or preparing data, etc. Sisyphus manages dependencies between jobs, and can run jobs locally or on a cluster (e.g. via Slurm).
- Existing Sisyphus recipes / job classes from i6_core, i6_experiments.
- RETURNN.
The main deep learning framework used in this setup,
based on PyTorch,
providing
Tensorwith named dimensions (Dimobjects), and many high-level modules for Transformers and other neural network architectures. We derive our code (modeling, training, decoding, recipes) from existing example code.
In the setup directory (see setup below), run:
python ./sis manager recipe/denoising_lm/sis_recipe/<recipe_name>.pywhere <recipe_name> is one of the recipes provided in src/denoising_lm/sis_recipe/.
The setup directory should have the following structure after following the setup instructions below:
<setup_name>
├── README.md
├── .gitignore
├── sis -> tools/sisyphus/sis
├── work -> <some work directory on a fast file system> # all jobs live here
├── alias/ # created and populated by sisyphus. contains symlinks to jobs
├── output/ # created and populated by sisyphus. contains symlinks to job outputs
├── tools/
│ ├── sisyphus/ # git clone of https://github.com/rwth-i6/sisyphus
│ └── returnn/ # git clone of https://github.com/rwth-i6/returnn
├── repos/
│ └── denoising_lm/ # git clone of <denoising lm repo>
├── recipe/
│ ├── denoising_lm -> ../repos/denoising_lm/src/denoising_lm
│ ├── i6_core/ # git clone of https://github.com/rwth-i6/i6_core
│ ├── i6_experiments/ # git clone of https://github.com/rwth-i6/i6_experiments
│ └── returnn_common/ # git clone of https://github.com/rwth-i6/returnn_common
└── ...
- Create the new setup folder in your user directory, typically like
~/setups/<setup_name>or~/experiments/<setup_name>. This will be your new setup root directory.
mkdir ~/experiments/<setup_name>
cd ~/experiments/<setup_name>Optional: The setup directory should be a Git repo itself, to keep track of the changes. You can do now:
git init .
edit README.md # write a short description about your setup
git add README.md
git commit . -m initial
# some initial content for gitignore
cat << EOF > .gitignore
/output
/alias
/tools
/recipe
.*.swp
*.pyc
__pycache__
.idea
*.history*
.directory
EOF
git add .gitignore
git commit .gitignore -m gitignore- Create a new work folder under a "work" file system and link this as
workinto the Sisyphus setup root (~/experiments/<setup_name>).
mkdir -p /work/<project>/<username>/sisyphus_work_dirs/<setup_name>
ln -s /work/<project>/<username>/sisyphus_work_dirs/<setup_name> work- Create a recipe folder in the Sisyphus setup root (
~/experiments/<setup_name>) and clone the necessary recipe repositories:
mkdir recipe
git clone git@github.com:rwth-i6/i6_core.git recipe/i6_core
git clone git@github.com:rwth-i6/i6_experiments.git recipe/i6_experiments
git clone git@github.com:rwth-i6/returnn_common.git recipe/returnn_common
mkdir repos
git clone <denoising-lm-repo-url> repos/denoising_lm
ln -s repos/denoising_lm/src/denoising_lm recipe/denoising_lmIf the access is denied for the Github repositories, you need to add your public ssh key (usually ~/.ssh/id_rsa.pub) to your Github account.
This can be done by pasting the content (displayed with cat ~/.ssh/id_rsa.pub) into your Github key settings.
More information on adding keys to a Github account can be found here.
- Setup Sisyphus and other setup-wide tools.
mkdir tools
git clone git@github.com:rwth-i6/sisyphus.git tools/sisyphus
ln -s tools/sisyphus/sis
git clone git@github.com:rwth-i6/returnn.git tools/returnn- Add a
settings.pywith Sisyphus settings. For example:
import os
import sys
_root_dir = os.path.dirname(os.path.abspath(__file__))
RETURNN_PYTHON_EXE = sys.executable
RETURNN_ROOT = _root_dir + "/tools/returnn"
sys.path.insert(0, RETURNN_ROOT)
VERBOSE_TRACEBACK_TYPE = 'better_exchook'
USE_SIGNAL_HANDLERS = True
def engine():
from sisyphus.engine import EngineSelector
from sisyphus.localengine import LocalEngine
from sisyphus.simple_linux_utility_for_resource_management_engine import (
SimpleLinuxUtilityForResourceManagementEngine, # Slurm
)
# Other engines are available, see sisyphus documentation.
return EngineSelector(
engines={
"short": LocalEngine(cpus=8),
"local": LocalEngine(cpus=8, gpus=1),
"slurm": SimpleLinuxUtilityForResourceManagementEngine(
default_rqmt={"cpu": 4, "mem": "1G", "gpu": 0, "time": 1}
),
},
default_engine="local", # or "slurm" ...
)You should adapt this file. This file is loaded via sisyphus.global_settings.py, update_global_settings_from_file specifically. See Sisyphus documentation.
- Create some Python environment.
You can use
virtualenvorcondaor other tools. We tested our setup with PyTorch 2.5.1. Install missing dependencies:
numpy
torch
torchaudio
torchcodec
datasets
psutil
sentencepiece
dm-tree
h5py
flask
soundfile
librosa
resampy
ipython
better_exchook
lovely_tensors
background_zmq_ipython