From a0fd395c179b75e51afe3b83d87a1c3ad514d605 Mon Sep 17 00:00:00 2001
From: Ross Wightman
Date: Tue, 16 Dec 2025 10:19:19 -0800
Subject: [PATCH 1/2] Add HParams sections to hfdocs

---
 hfdocs/source/_toctree.yml |  4 +++-
 hfdocs/source/hparams.mdx  | 31 +++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 hfdocs/source/hparams.mdx

diff --git a/hfdocs/source/_toctree.yml b/hfdocs/source/_toctree.yml
index 1ac8d102f6..f1768ed618 100644
--- a/hfdocs/source/_toctree.yml
+++ b/hfdocs/source/_toctree.yml
@@ -11,8 +11,10 @@
 - sections:
   - local: feature_extraction
     title: Using Pretrained Models as Feature Extractors
+  - local: hparams
+    title: Hyper-Parameters (HParams)
   - local: training_script
-    title: Training With The Official Training Script
+    title: Using The Official Training Script
   - local: hf_hub
     title: Share and Load Models from the 🤗 Hugging Face Hub
   title: Tutorials
diff --git a/hfdocs/source/hparams.mdx b/hfdocs/source/hparams.mdx
new file mode 100644
index 0000000000..23c1e54287
--- /dev/null
+++ b/hfdocs/source/hparams.mdx
@@ -0,0 +1,31 @@
+Over the years, many `timm` models have been trained with various hyper-parameters as the libraries and models evolved. I don't have a record of every instance, but have recorded instances of many that can serve as a very good starting point.a
+
+Most `timm` trained models have an identifier in their pretrained tag that relates them (roughly) to a family / version of hparams I've used over the years.
+
+| Tag(s) | Description | Optimizer | LR Schedule | Other Notes |
+|--------|-------------|-----------|-------------|-------------|
+| `a1h` | Based on [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `A1` recipe | LAMB | Cosine with warmup | Stronger dropout, stochastic depth, and RandAugment than paper `A1` recipe |
+| `ah` | Based on [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `A1` recipe | LAMB | Cosine with warmup | No CutMix. Stronger dropout, stochastic depth, and RandAugment than paper `A1` recipe |
+| `a1`, `a2`, `a3` | ResNet Strikes Back `A{1,2,3}` recipe | LAMB with BCE loss | Cosine with warmup | — |
+| `b1`, `b2`, `b1k`, `b2k` | Based on [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `B` recipe (equivalent to `timm` `RA2` recipes) | RMSProp (TF 1.0 behaviour) | Step (exponential decay w/ staircase) with warmup | — |
+| `c`, `c1`, `c2`, `c3` | Based on [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `C` recipes | SGD (Nesterov) with AGC | Cosine with warmup | — |
+| `ch` | Based on [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `C` recipes | SGD (Nesterov) with AGC | Cosine with warmup | Stronger dropout, stochastic depth, and RandAugment than paper `C1`/`C2` recipes |
+| `d`, `d1`, `d2` | Based on [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `D` recipe | AdamW with BCE loss | Cosine with warmup | — |
+| `sw` | Based on Swin Transformer train/pretrain recipe (basis of DeiT and ConvNeXt recipes) | AdamW with gradient clipping, EMA | Cosine with warmup | — |
+| `ra`, `ra2`, `ra3`, `racm`, `raa` | RandAugment recipes. Inspired by EfficientNet RandAugment recipes. Covered by `B` recipe in [ResNet Strikes Back](https://arxiv.org/abs/2110.00476). | RMSProp (TF 1.0 behaviour), EMA | Step (exponential decay w/ staircase) with warmup | — |
+| `ra4` | RandAugment v4. Inspired by MobileNetV4 hparams. | — | — | — |
+| `am` | AugMix recipe | SGD (Nesterov) with JSD loss | Cosine with warmup | — |
+| `ram` | AugMix (with RandAugment) recipe | SGD (Nesterov) with JSD loss | Cosine with warmup | — |
+| `bt` | Bag-of-Tricks recipe | SGD (Nesterov) | Cosine with warmup | — |
+
+I've collected several of the hparam families in a series of gists. These can be downloaded and used with the `--config hparam.yaml` argument with the `timm` train script. Some adjustment is always required for the LR vs effective global batch size.
+
+| Tag | Key Model Architectures | Gist Link |
+|-----|------------------------|-----------|
+| `ra2` | ResNet, EfficientNet, RegNet, NFNet | [Link](https://gist.github.com/rwightman/07839a82d0f50e42840168bc43df70b3) |
+| `ra3` | RegNet | [Link](https://gist.github.com/rwightman/37252f8d7d850a94e43f1fcb7b3b8322) |
+| `ra4` | MobileNetV4 | [Link](https://gist.github.com/rwightman/f6705cb65c03daeebca8aa129b1b94ad) |
+| `sw` | ViT, ConvNeXt, CoAtNet, MaxViT | [Link](https://gist.github.com/rwightman/943c0fe59293b44024bbd2d5d23e6303) |
+| `sbb` | ViT | [Link](https://gist.github.com/rwightman/fb37c339efd2334177ff99a8083ebbc4) |
+| — | Tiny Test Models | [Link](https://gist.github.com/rwightman/9ba8efc39a546426e99055720d2f705f) |
+

From c1425cad036f5e9399824236b9fe88488b0032be Mon Sep 17 00:00:00 2001
From: Ross Wightman
Date: Tue, 16 Dec 2025 10:25:42 -0800
Subject: [PATCH 2/2] Update hparams.mdx

---
 hfdocs/source/hparams.mdx | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hfdocs/source/hparams.mdx b/hfdocs/source/hparams.mdx
index 23c1e54287..906beaafec 100644
--- a/hfdocs/source/hparams.mdx
+++ b/hfdocs/source/hparams.mdx
@@ -1,5 +1,7 @@
-Over the years, many `timm` models have been trained with various hyper-parameters as the libraries and models evolved. I don't have a record of every instance, but have recorded instances of many that can serve as a very good starting point.a
+# HParams
 
+Over the years, many `timm` models have been trained with various hyper-parameters as the libraries and models evolved. I don't have a record of every instance, but have recorded instances of many that can serve as a very good starting point.
+## Tags
 Most `timm` trained models have an identifier in their pretrained tag that relates them (roughly) to a family / version of hparams I've used over the years.
 
 | Tag(s) | Description | Optimizer | LR Schedule | Other Notes |
@@ -18,6 +20,7 @@ Most `timm` trained models have an identifier in their pretrained tag that relat
 | `ram` | AugMix (with RandAugment) recipe | SGD (Nesterov) with JSD loss | Cosine with warmup | — |
 | `bt` | Bag-of-Tricks recipe | SGD (Nesterov) | Cosine with warmup | — |
 
+## Config File Gists
 I've collected several of the hparam families in a series of gists. These can be downloaded and used with the `--config hparam.yaml` argument with the `timm` train script. Some adjustment is always required for the LR vs effective global batch size.
 
 | Tag | Key Model Architectures | Gist Link |
@@ -28,4 +31,3 @@ I've collected several of the hparam families in a series of gists. These can be
 | `sw` | ViT, ConvNeXt, CoAtNet, MaxViT | [Link](https://gist.github.com/rwightman/943c0fe59293b44024bbd2d5d23e6303) |
 | `sbb` | ViT | [Link](https://gist.github.com/rwightman/fb37c339efd2334177ff99a8083ebbc4) |
 | — | Tiny Test Models | [Link](https://gist.github.com/rwightman/9ba8efc39a546426e99055720d2f705f) |
-
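
A quick illustration of the pretrained tag naming described in the new page. This is not part of the patch, just a sketch: it uses `timm.list_pretrained()` (which returns `model.tag` weight names) plus a rough, assumed `<recipe>_<dataset>` split of the tag.

```python
import timm

# List pretrained weight names for one architecture and pull out the recipe part
# of the pretrained tag, e.g. 'resnet50.a1_in1k' -> recipe 'a1', dataset 'in1k'.
# The '<recipe>_<dataset>' split below is a rough assumption, not a timm guarantee;
# some tags (ported weights, multi-stage fine-tunes) won't follow it.
for name in timm.list_pretrained():
    if not name.startswith('resnet50.'):
        continue
    _, _, tag = name.partition('.')
    recipe = tag.split('_')[0]
    print(f'{name:40s} recipe tag: {recipe}')
```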
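
And a rough sketch of the "adjust LR for your effective global batch size" step when reusing one of the gist configs with the train script. Nothing here is prescribed by the patch: the file names, the 8-GPU reference batch, the 2 x 256 target batch, the config key names, and the linear scaling rule are all placeholder assumptions (`--config` is the argument mentioned above; `--data-dir` is assumed from current train script usage).

```python
import yaml  # pip install pyyaml

# Load a config downloaded from one of the gists above (placeholder filename).
# Assumes the yaml uses the train script's argument names such as 'lr' and 'batch_size'.
with open('hparam.yaml') as f:
    cfg = yaml.safe_load(f)

ref_global_batch = cfg['batch_size'] * 8  # assume the recipe was tuned on 8 GPUs
my_global_batch = 256 * 2                 # e.g. per-GPU batch 256 on 2 GPUs

# Linear LR scaling is a common heuristic, not a rule from the recipes themselves;
# sqrt scaling or a small manual sweep may work better for some optimizers.
cfg['lr'] = cfg['lr'] * my_global_batch / ref_global_batch
cfg['batch_size'] = 256

with open('hparam_adjusted.yaml', 'w') as f:
    yaml.safe_dump(cfg, f)

# Then launch, e.g.:
#   ./distributed_train.sh 2 --data-dir /path/to/imagenet --config hparam_adjusted.yaml
```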