Forgather is a configuration-driven ML framework that uses template inheritance and code generation to eliminate configuration duplication and enable systematic experimentation. Instead of copying and modifying entire config files, you inherit from base templates and specify only what changes.
Key Benefits:
- No Config Duplication - Inherit and override instead of copy-paste
- Types as Hyperparameters - Change optimizers, models, datasets in config files
- Full Reproducibility - Automatic snapshots of code and configs with each run
- Pipeline Parallel Trainer - Pipeline-parallel trainer optimized for training on consumer GPUs
- Extensible Trainers - Easily extensible trainer implementation for modification and experimentation
- Dynamic Models Library - Define and customize model architectures entirely through configuration files
- Templates Library - Extensive templates library for tokenizers, models, trainers, datasets, etc.
- Dec 14, Multiple new features:
- Forgather's models now work with vLLM, in both Tensor and Pipeline parallel mode. See documentation.
- Added support for fused-linear-causal-loss, which significantly reduces peak memory requirements for training models with large vocabularies (see the example usage). We support the following implementations: Liger, CCE, and PyTorch compiled.
- Added a Triton implementation of Forgather's Adafactor Optimizer. This reduces peak memory further and speeds up training.
- Enabled support for loading models with device_map="auto", which allows our inference server to shard models across multiple GPUs.
- Nov 17, Completed a major overhaul of the model conversion tool and added support for Mistral, Qwen3 models, and Llama models with RoPE scaling and tied word embeddings.
- Nov 9, OpenAssistant Dataset - High-quality example demonstrating how to build custom datasets that dynamically generate examples on-the-fly. Features quality-weighted sampling from conversation trees, sequence packing, multi-language support, and deterministic generation. Includes complete Python examples and extensive documentation. There is also a demo finetune project.
- Nov 4, Added support for packed sequences and Flex Attention; the Samantha tutorial is being updated to demonstrate them. Models now support KV cache.
- Oct 21, H.P. Lovecraft Project - Learn how to create workspaces and projects, while training a model to summon the Elder Gods. You can perform full-finetuning (not LoRA) on a 7B model, with a context length of up to 16K on a single 24 GB GPU!
- Oct 19, Samantha -- New tutorial on performing full finetuning of a 7B parameter model on a single 24 GB GPU, using the "Samantha" dataset -- she believes she is sentient!
- Torch Titan integration -- Use Forgather to configure Torch Titan
1. Install Forgather:
# Requires python >= 3.10
# Set up a Python virtual environment for the install
# You can also use conda or whatever you are most comfortable with.
python3 -m venv /path/to/new/venv
# Activate the virtual environment
source /path/to/new/venv/bin/activate
git clone https://github.com/jdinalt/forgather.git
cd forgather
pip install -e .
# Verify install works with CLI
forgather ls -r
Note: We are using bleeding-edge PyTorch features, like flex-attention, which require PyTorch 2.9.0. If you are updating from a previous install, run 'pip install -e .' again to force upgrading to the latest libraries. If in doubt, nuke your venv and rebuild it.
Flex attention also depends upon having a working C compiler and python development packages installed.
sudo apt-get install build-essential python3-dev
2. Try a tutorial project:
See: ./examples/tutorials/tiny_llama/project_index.ipynb
Or, from the command line...
# Optional
forgather -i # Start interactive Forgather shell
forgather ls -r # List all forgather projects
cd examples/tutorials/tiny_llama
forgather index # Show project summary
forgather ls # List available configs
forgather -t train_tiny_llama.yaml pp | less # Show pre-processed configuration
forgather -t train_tiny_llama.yaml train # Train model
3. Monitor and control:
forgather -t train_tiny_llama.yaml tb # Start Tensorboard
forgather control list # List running training jobs
forgather control status JOB_ID # Get status of training job
forgather control [stop|abort|save] JOB_ID # Control training jobs
4. Test Model Inference:
# Start inference server
forgather inf server -c -m /path/to/model
# Perform text completion on prompt
forgather inf client --completion "Once upon a time"
That's it! You've just trained a small language model using Forgather's template system.
Create new experiments by inheriting from existing configs and specifying only the differences:
-- extends 'base_experiment.yaml'
[optimizer]
== super()
lr: 1.0e-3 # Only change learning rate
Use any Python class or function directly in configs:
optimizer: !partial:torch.optim.AdamW
lr: 1.0e-3
weight_decay: 0.01
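For intuition, a `!partial` node like the one above behaves much like `functools.partial` applied to the referenced class. The sketch below is only an illustration of that idea, not Forgather's actual generated code:

```python
# Illustrative sketch only: a deferred AdamW constructor with hyperparameters
# bound, which a trainer can later call with the model's parameters.
from functools import partial

import torch

optimizer_factory = partial(torch.optim.AdamW, lr=1.0e-3, weight_decay=0.01)

# Later, e.g. inside a trainer:
model = torch.nn.Linear(16, 16)  # stand-in model
optimizer = optimizer_factory(model.parameters())
```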
[layer_factory]
# Experiment: Switch from PreLayerNorm to PostLayerNorm
layer_factory: &layer_factory !partial:.post_ln_layer:PostLNLayer@layer_factory
feedforward_factory: *feedforward_factory
attention_factory: *attention_factory
norm_factory: *layer_norm_factory
dropout: !var "layer_dropout"
residual_dropout: !var "residual_dropout"
Models are generated as standalone Python code with no framework dependencies (a schematic sketch follows the trainer list below). Available trainers and optimizers:
- Trainer: Fast single-GPU training for small models.
- AccelTrainer: Multi-GPU with Accelerate
- PipelineTrainer: Pipeline parallelism
- Custom Optimizers: AdamW, AdaFactor, GaLore, Apollo
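To make "standalone" concrete, the sketch below shows the kind of artifact this implies: a plain PyTorch module with no Forgather imports. The class name and layer layout are hypothetical, not actual Forgather output:

```python
# Schematic illustration only: a "standalone" generated model depends on
# PyTorch alone, so it can be imported and run without Forgather installed.
import torch
from torch import nn

class TinyLM(nn.Module):  # hypothetical class name, for illustration
    def __init__(self, vocab_size: int = 256, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.block = nn.TransformerEncoderLayer(d_model, nhead=n_heads, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.block(self.embed(input_ids))
        return self.lm_head(hidden)  # (batch, seq, vocab) logits

logits = TinyLM()(torch.randint(0, 256, (1, 8)))
```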
- cd examples/tutorials/tiny_llama/ - Train a small language model from scratch
- project_composition/ - Template inheritance patterns
- dynamic_lm/ - Dynamic model construction
- projects_overview/ - Overview of Forgather projects
cd forgather
# List all example projects and configurations
forgather ls -r
# cd to example project directory
cd examples/...
# Show project info
forgather index
Run the interactive shell:
forgather -i
Every Forgather experiment is a Project with this structure:
my_project/
├── meta.yaml # Project metadata
├── templates/
│ ├── project.yaml # Main template
│ └── configs/ # Experiment configs
├── output_models/ # Generated code & results
└── project_index.ipynb # Interactive notebook
Forgather uses Jinja2 + YAML with custom syntax:
- `-- extends 'template.yaml'` - Template inheritance
- `[block_name]` - Override sections
- `-- set ns.var = value` - Set variables
- `!partial:module:Class` - Partial function construction
- `!factory:module:Class` - Factory construction
- `!var "variable_name"` - Variable references
- `#---- inline.template.name ----` - Split document into multiple templates
See Syntax Reference
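The `--` directives resemble Jinja2 "line statements", where a line prefix stands in for `{% ... %}` so template logic can sit inside otherwise ordinary YAML. The sketch below illustrates that general idea with stock Jinja2; it is an assumption about the mechanism, not Forgather's actual environment setup, and it does not reproduce the `[block_name]` or `== super()` shorthands:

```python
# Rough illustration (assumption): Jinja2 line statements let '--' act as
# {% ... %}, which is how 'extends' and block overrides can be written inline.
from jinja2 import DictLoader, Environment

env = Environment(
    loader=DictLoader({
        "base_experiment.yaml": (
            "-- block optimizer\n"
            "lr: 1.0e-4\n"
            "-- endblock\n"
        ),
        "my_experiment.yaml": (
            "-- extends 'base_experiment.yaml'\n"
            "-- block optimizer\n"
            "lr: 1.0e-3  # Only change the learning rate\n"
            "-- endblock\n"
        ),
    }),
    line_statement_prefix="--",
)

print(env.get_template("my_experiment.yaml").render())  # -> lr: 1.0e-3 ...
```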
Templates → YAML → Node Graph → Python Code → Executable Objects
Each step can be inspected:
forgather -t config.yaml pp # Preprocess with Jinja2 to YAML
forgather -t config.yaml graph --format yaml # Parsed node graph
forgather -t config.yaml targets # List constructable objects in graph
forgather -t config.yaml code [--target <target>] # [optional] Equivalent Python code for target
forgather -t config.yaml construct [--target <target>] [--call] # Materialize and show constructed object
- src/forgather/ - Core framework
  - project.py - Project management
  - config.py - Template processing
  - codegen.py - Python code generation
  - ml/ - Training infrastructure
- templatelib/ - Reusable templates
  - base/ - Abstract base templates
  - examples/ - Common models, datasets, tokenizers
- modelsrc/ - Modular model components library
- examples/tutorials/ - Learning materials
- examples/tiny_experiments/ - Example experiments
- examples/standalone/ - Self-contained projects
- examples/template_project - Starting point for new projects.
# Generate command to run "my_experiment.yaml" on GPUs 0 and 1
# Print command, but don't execute it.
forgather -t my_experiment.yaml train -d "0,1" --dry-run
# Start Tensorboard to monitor progress on all models in the project; bind to all network interfaces.
forgather tb --all -- --bind_all
# Show running training jobs -- which can be controlled via the CLI
forgather control list
Forgather is actively developed and welcomes contributions:
- Bug Reports & Feature Requests: Open GitHub issues
- Code Contributions: Submit pull requests
- Documentation: Improve tutorials and examples
- Community: Share your experiments and templates