26 changes: 26 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,26 @@
name: tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Byte-compile common.smk
        # common.smk is plain Python and carries the memory/length logic.
        run: python -m py_compile workflow/rules/common.smk
      - name: Run resource/length unit tests
        # The suite loads common.smk directly and needs no third-party deps;
        # the SLURM-plugin integration test self-skips when the plugin is absent.
        run: python test/test_memory_resources.py
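
The test step depends on `common.smk` being loadable as plain Python. A minimal sketch of how a test could do that with the standard library follows. It assumes the file contains no Snakemake-only syntax at module level, and `load_smk` is a hypothetical helper, not a function taken from this PR.

```python
import importlib.util
from importlib.machinery import SourceFileLoader


def load_smk(path, module_name="common_smk"):
    """Load a .smk file as an ordinary Python module (hypothetical helper)."""
    path = str(path)
    # importlib does not recognise the .smk suffix by itself, so a
    # SourceFileLoader is supplied explicitly.
    loader = SourceFileLoader(module_name, path)
    spec = importlib.util.spec_from_file_location(module_name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

A test would then call something like `load_smk("workflow/rules/common.smk")` and exercise the returned module's functions directly, which is why no third-party dependency is needed.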
108 changes: 108 additions & 0 deletions README.md
@@ -241,6 +241,27 @@ you hit these.

- **Restrict to one model** with `structure_inference_gpu_model` (e.g. `"A100"`) → the plugin emits
`--gpus=<model>:<count>`. Accepts a single model name; leave `""` for any.
- **Route by complex size (VRAM)** with `structure_inference_gpu_tiers` → list your GPU pool as
tiers of `{min_vram_gb, nodes}`. A complex's estimated peak VRAM (≈ `per_token_sq·N²`) selects the
smallest tier that fits and all *smaller*-GPU nodes are excluded, so the job runs on **any** GPU at
or above that tier — using the whole pool, not one pinned model. A complex larger than every tier
uses the biggest tier and spills to host RAM via unified memory.

```yaml
# Example for EMBL gpu-el8 — replace nodes with your cluster's (nothing is hard-coded):
structure_inference_gpu_vram_headroom: 1.0 # <1.0 tolerates that fraction of host spill
structure_inference_gpu_tiers:
- {min_vram_gb: 24, nodes: "gpu21,gpu22,gpu29,gpu30,gpu31,gpu32,gpu33,gpu34,gpu35,gpu36,gpu37"}
- {min_vram_gb: 40, nodes: "gpu25,gpu26,gpu27,gpu28"}
- {min_vram_gb: 48, nodes: "gpu40,gpu41,gpu42,gpu43,gpu44,gpu45,gpu46,gpu47,gpu48"}
- {min_vram_gb: 80, nodes: "gpu38,gpu39"}
```

When set, this drives `--exclude` per job and **overrides** `structure_inference_gpu_model` (the two
would conflict). It's the practical "fit to GPU" lever: requested host RAM is a separate pool and
does not size GPU VRAM, but excluding too-small GPUs by length does. Use explicit comma-separated
node lists (bracket ranges may be glob-expanded by the shell). Multi-partition routing (e.g. EMBL's
bigger `gpu-training` cards) is out of scope; keep one partition and let unified memory spill the tail.
- **Exclude specific nodes** with `slurm_exclude_nodes` → passed verbatim to `sbatch --exclude`
(e.g. `"gpu50,gpu51"`). Use it for nodes whose GPU the container can't use — e.g. a CUDA compute
capability newer than the container's bundled `ptxas` (fails `ptxas too old` / `UNIMPLEMENTED`).
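
The tier routing described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `nodes_to_exclude` is a hypothetical name, the MB-to-GB conversion uses 1024, and the headroom is assumed to multiply the estimate directly.

```python
def nodes_to_exclude(n_tokens, tiers, per_token_sq_mb=0.0045, headroom=1.0):
    """Return a comma-separated --exclude string for a complex of n_tokens residues.

    tiers: list of {"min_vram_gb": number, "nodes": "a,b,c"} dicts.
    """
    if not tiers:
        return ""
    # Estimated peak VRAM in GB. Both the 1024 divisor and the way headroom
    # is applied are assumptions made for this sketch.
    need_gb = headroom * per_token_sq_mb * n_tokens**2 / 1024
    ordered = sorted(tiers, key=lambda t: t["min_vram_gb"])
    # Smallest tier that fits; if none fits, fall back to the largest tier
    # (the job then spills to host RAM via unified memory).
    chosen = next((t for t in ordered if t["min_vram_gb"] >= need_gb), ordered[-1])
    # Exclude every node that belongs to a strictly smaller tier.
    excluded = [
        node
        for t in ordered
        if t["min_vram_gb"] < chosen["min_vram_gb"]
        for node in t["nodes"].split(",")
    ]
    return ",".join(excluded)
```

With the AF3 coefficient of 0.0045 MB per residue², a 2,500-residue complex is estimated at about 27 GB, so under these assumptions only the 24 GB tier's nodes would be excluded and the job could land on any 40, 48, or 80 GB card.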
@@ -301,6 +322,93 @@ exactly what the fraction is sized against.

</details>

<details>
<summary>Length-aware memory requests (sized automatically from the input sequences)</summary>

Host RAM for both compute stages is requested **from the input sequence length**, so big
complexes get enough memory on the first attempt instead of failing and climbing the retry
ladder, while small jobs are not over-provisioned. The request is computed at scheduling
time by reading the per-chain FASTA(s) the pipeline already stages under
`<output_directory>/data/`:

```
create_features mem = safety * (feature_create_ram_bytes + per_residue * seq_len)
structure_inference mem = safety * (structure_inference_ram_bytes + per_token_sq * N^2)
```

- `seq_len` is the query length; `N` is the **total residues of the complex** (the
AlphaFold token count, summed over chains and copy numbers). AlphaFold's pair
representation is `O(N^2)`, hence the quadratic inference term.
- **The coefficients default by backend** (selected from `--data_pipeline` / `--fold_backend`).
AlphaFold-Multimer (AF2) is heavier than AlphaFold 3 — measured AF2 inference host RSS was
~4× higher than AF3 at the same complex size, and AF2's feature stage runs HHblits (the
main OOM source), whereas the AF3 pipeline is lighter. Defaults:

| backend | feature base (MB) | feature per residue (MB) | inference base (MB) | inference per residue² (MB) |
|---|---|---|---|---|
| `alphafold2` | 64000 | 40 | 24000 | 0.0055 |
| `alphafold3` | 40000 | 25 | 16000 | 0.0045 |

The AF3 inference quadratic is sized to the observed GPU-VRAM demand so that, with unified
memory, the host spill ceiling (`host_mem / gpu_vram`) covers large complexes instead of
OOM-ing.
- The first attempt already includes the `mem_safety_factor` headroom (default `1.25`).
**OOM retries still escalate** on top, multiplying by `..._ram_scaling ** (attempt - 1)`,
so a bad estimate self-heals.
- Override any backend default by setting the matching key in `config/config.yaml`
(`feature_create_ram_bytes`, `feature_create_ram_per_residue_mb`,
`structure_inference_ram_bytes`, `structure_inference_ram_per_token_sq_mb`); an explicit
value applies to all backends. Also tune `mem_safety_factor`, the `..._ram_scaling`
factors, `structure_inference_runtime_minutes`, and `max_mem_mb` (set it to your largest
node's RAM where an over-estimate would otherwise never schedule; `0` = no cap).
- The `..._ram_bytes` keys are the **fixed base** of each model rather than a flat request;
raising a base only raises the floor. Setting `per_residue`/`per_token_sq` to `0`
reproduces the old length-blind behaviour (a flat base × retry scaling).
- **Precomputed features:** when a chain is supplied via `feature_directory`, no
`data/<chain>.fasta` is generated. Length is then recovered from the precomputed
`<chain>_af3_input.json` (AF3) or from the parse-time length cache written by the length
filter below (covers AF2 too). If neither is available the job falls back to the base
allocation plus retry escalation. AF3 ligand atoms are not counted (no sequence), a small
undercount absorbed by the safety margin.
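
The two formulas, the retry escalation, and the optional cap can be combined into a short sketch. The coefficient table is copied from the defaults above; the function names, the rounding, and the point at which `max_mem_mb` is applied are assumptions made for illustration rather than the pipeline's actual code.

```python
# Backend defaults from the table above (all values in MB).
BACKEND_DEFAULTS = {
    "alphafold2": {"feature_base": 64000, "per_residue": 40,
                   "inference_base": 24000, "per_token_sq": 0.0055},
    "alphafold3": {"feature_base": 40000, "per_residue": 25,
                   "inference_base": 16000, "per_token_sq": 0.0045},
}


def request_mb(base_mb, length_term_mb, attempt=1, safety=1.25,
               scaling=1.1, max_mem_mb=0):
    """safety * (base + length term), escalated per retry, then capped."""
    mem = safety * (base_mb + length_term_mb) * scaling ** (attempt - 1)
    if max_mem_mb > 0:  # 0 means "no cap"
        mem = min(mem, max_mem_mb)
    return round(mem)  # rounding to whole MB is an assumption


def feature_mem_mb(seq_len, backend, **kwargs):
    d = BACKEND_DEFAULTS[backend]
    return request_mb(d["feature_base"], d["per_residue"] * seq_len, **kwargs)


def inference_mem_mb(n_tokens, backend, **kwargs):
    d = BACKEND_DEFAULTS[backend]
    return request_mb(d["inference_base"], d["per_token_sq"] * n_tokens**2, **kwargs)
```

As a worked example, a 2,000-residue AF3 complex gives 1.25 × (16000 + 0.0045 × 2000²) = 1.25 × 34000 = 42,500 MB on the first attempt, and each OOM retry multiplies that by a further 1.1.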

</details>

<details>
<summary>Skipping over-large complexes (length filtering)</summary>

Folds that are too large to be worth submitting are **skipped before any job is created**,
so a single oversized complex (or one giant chain) doesn't waste a GPU/feature allocation
that will only OOM. Two configurable limits (in `config/config.yaml`):

```yaml
# Max TOTAL complex length (sum of all chains), per backend — selected by --fold_backend.
max_total_length_alphafold2: 5000 # AF2-Multimer
max_total_length_alphafold3: 7000 # AF3 handles larger inputs
# max_total_length: 6000 # optional single override for both backends
# Max length of any SINGLE protein; 0 = off (issue #33). A protein over this drops every
# fold containing it, so it is never even downloaded.
max_protein_length: 0
length_filter_fetch_uniprot: true # set false for fully offline runs
```
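
A rough sketch of how these limits might be applied to one fold is shown here. `skip_reason` is a hypothetical function, and two behaviours are assumptions: a non-zero `max_total_length` replaces the per-backend value, and a fold with any unknown chain length is not judged on its total.

```python
def skip_reason(chain_lengths, backend, config):
    """Return a reason string if the fold should be skipped, else None.

    chain_lengths: one entry per chain copy; None marks an unknown length.
    """
    # Assumed precedence: a non-zero single override beats the backend limit.
    max_total = (config.get("max_total_length")
                 or config.get(f"max_total_length_{backend}", 0))
    max_protein = config.get("max_protein_length", 0)

    known = [n for n in chain_lengths if n is not None]

    # Per-protein limit: any known chain over the limit drops the fold.
    if max_protein > 0:
        longest = max(known, default=0)
        if longest > max_protein:
            return f"protein length {longest} > max_protein_length {max_protein}"

    # Total limit: only applied when every chain length is known, so an
    # unknown length keeps the fold (fail open).
    if max_total > 0 and len(known) == len(chain_lengths):
        total = sum(known)
        if total > max_total:
            return f"total length {total} > max_total_length {max_total}"
    return None
```

With the defaults shown, a 6,000-residue complex would be skipped for `alphafold2` (limit 5000) but kept for `alphafold3` (limit 7000).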

- Lengths are resolved at **parse time** from, in order: a local FASTA, an
already-downloaded `data/<id>.fasta`, the persistent cache
`<output_directory>/.sequence_lengths.tsv`, and finally the UniProt REST API (cached for
next time). Set a limit to `0` to disable it; if both are `0`, no resolution/fetching
happens at all.
- Skipped folds are listed with reasons in `<output_directory>/skipped_folds.tsv` and logged
as a `[length-filter]` warning. **Unknown lengths fail open** (the fold is kept), so a
UniProt outage never silently drops work.
- The first parse of a large all-UniProt sheet fetches each unique length once (cached
afterwards); already-downloaded inputs and local FASTAs are read without any network call.
- **Applies to every profile, including local/workstation runs** (it runs during workflow
parsing, not in the executor). It's the only length-aware feature that does — the memory
and GPU-routing settings are SLURM resources that local runs ignore. To attempt a complex
larger than the caps on a big workstation, raise or zero the `max_total_length_*` values
(and set `length_filter_fetch_uniprot: false` for offline use).
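
The resolution order in the notes above (local FASTA, downloaded FASTA, persistent cache, then a remote lookup) can be sketched as a simple fallback chain. Everything named here is hypothetical: the cache is assumed to be a two-column `id<TAB>length` file without a header, and `fetch` is a stand-in callable for the UniProt request, passed as `None` when `length_filter_fetch_uniprot` is `false`.

```python
import csv
from pathlib import Path


def fasta_length(path):
    """Residue count of the first record in a FASTA file."""
    n, seen_header = 0, False
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # stop at the second record
                seen_header = True
            else:
                n += len(line)
    return n


def resolve_length(protein_id, out_dir, local_fasta=None, fetch=None):
    """Return the sequence length, or None if it cannot be resolved."""
    out_dir = Path(out_dir)
    cache_path = out_dir / ".sequence_lengths.tsv"
    # 1. local FASTA, 2. already-downloaded data/<id>.fasta
    for candidate in (local_fasta, out_dir / "data" / f"{protein_id}.fasta"):
        if candidate and Path(candidate).is_file():
            return fasta_length(candidate)
    # 3. persistent cache (assumed format: id<TAB>length)
    if cache_path.is_file():
        with open(cache_path, newline="") as fh:
            for row in csv.reader(fh, delimiter="\t"):
                if len(row) >= 2 and row[0] == protein_id and row[1].isdigit():
                    return int(row[1])
    # 4. remote lookup, only when a fetcher is supplied; result is cached
    if fetch is not None:
        length = fetch(protein_id)
        if length is not None:
            out_dir.mkdir(parents=True, exist_ok=True)
            with open(cache_path, "a", newline="") as fh:
                csv.writer(fh, delimiter="\t").writerow([protein_id, length])
            return length
    return None  # unknown: the caller keeps the fold (fail open)
```

Passing the fetcher in as an argument keeps the sketch runnable offline and makes the "no network call" paths easy to test.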

</details>

### Using precomputed features

If you have precomputed protein features, specify the directory:
80 changes: 76 additions & 4 deletions config/config.yaml
@@ -52,10 +52,65 @@ analyze_structure_arguments:

# Memory allocation for feature creation and structure inference.
# NOTE: despite the "_bytes" suffix these values are in MEGABYTES (used directly as
# the SLURM --mem request), so 64000 = 64000 MB ~= 64 GB. They scale with retries.
feature_create_ram_bytes: 64000 # MB
feature_create_ram_scaling: 1.1
structure_inference_ram_bytes: 64000 # MB
# the SLURM --mem request), so 64000 = 64000 MB ~= 64 GB.
#
# Host RAM for both stages is sized automatically from the input sequence length
# with a safety margin, and still escalates on OOM retries. The requested memory is:
# create_features : safety * (feature_create_ram_bytes + per_residue * seq_len)
# structure_inference : safety * (structure_inference_ram_bytes + per_token_sq * N^2)
# where seq_len is the query length, N is the total residues of the complex, and the
# request is multiplied by (scaling ** (attempt - 1)) on each retry. AlphaFold's pair
# representation is O(N^2), hence the quadratic term for inference.
#
# The base / per-length coefficients DEFAULT BY BACKEND (AF2 is heavier than AF3),
# selected automatically from --data_pipeline / --fold_backend:
# feature base feature /res infer base infer /N^2
# alphafold2 (AF2 multimer): 64000 MB 40 MB 24000 MB 0.0055
# alphafold3 (AF3): 40000 MB 25 MB 16000 MB 0.0045
# Leave the keys below commented to use those backend defaults. Uncomment any key
# to override it for ALL backends.

# Safety margin applied to the first-attempt estimate of every length-aware stage.
mem_safety_factor: 1.25
# Optional hard ceiling (MB) on any single memory request; 0 = no cap. Set this to
# your largest node's RAM on clusters where an over-estimate would never schedule.
max_mem_mb: 0

# create_features (CPU/MSA) — base is database/MSA-tool dominated (AF2 HHblits is the
# main OOM source), with a mild linear dependence on query length.
# feature_create_ram_bytes: 64000 # MB, fixed base (default: backend-specific)
# feature_create_ram_per_residue_mb: 30 # MB per query residue (default: backend-specific)
feature_create_ram_scaling: 1.1 # per-retry escalation on OOM

# structure_inference (GPU host RAM) — base plus a quadratic term in complex size.
# With unified memory enabled the XLA spill fraction is derived from this host
# allocation, so this also sizes the effective GPU memory ceiling.
# structure_inference_ram_bytes: 24000 # MB, fixed base (default: backend-specific)
# structure_inference_ram_per_token_sq_mb: 0.0045 # MB per residue^2 (default: backend-specific)
structure_inference_ram_scaling: 1.1 # per-retry escalation on OOM
# Wall-time minutes per attempt for structure_inference (capped by
# structure_inference_max_runtime). Default 1440; AF3 on an adequate GPU finishes far
# sooner, but host-memory spilling can take many hours, so the default stays generous.
# structure_inference_runtime_minutes: 1440

# Length filtering: skip folds that are too large to be worth submitting. Lengths
# are resolved at parse time (local FASTA / already-downloaded data / cache / the
# UniProt REST API) and cached in <output_directory>/.sequence_lengths.tsv; skipped
# folds are listed in <output_directory>/skipped_folds.tsv. Unknown lengths fail
# open (the fold is kept). Set a limit to 0 to disable it.
#
# Max TOTAL complex length (sum of all chains, residues), per backend; selected by
# --fold_backend. AF3 handles larger inputs than AF2-Multimer.
max_total_length_alphafold2: 5000
max_total_length_alphafold3: 7000
# Optional single override applied to BOTH backends (takes precedence if set):
# max_total_length: 6000
# Max length of any SINGLE protein (residues); 0 disables (issue #33). A protein
# over this drops every fold containing it, so it is never downloaded/predicted.
max_protein_length: 0
# Whether parse-time length resolution may query the UniProt REST API for IDs that
# have no local FASTA / cached length yet. Set false for fully offline runs.
length_filter_fetch_uniprot: true

# Number of threads for AlphaFold inference
alphafold_inference_threads: 8
@@ -67,6 +122,23 @@ structure_inference_gpus_per_task: 1
# Restrict structure_inference to one GPU model (sbatch --gpus=<model>:N), e.g. "3090".
# Leave "" to let SLURM pick any GPU in the partition.
structure_inference_gpu_model: "3090"
# Optional length-based GPU routing by VRAM (within this partition). List your GPU
# pool as tiers of {min_vram_gb, nodes}; a complex's estimated peak VRAM
# (~ structure_inference_ram_per_token_sq * N^2) selects the smallest tier that fits
# and all SMALLER-GPU nodes are excluded, so the job runs on ANY GPU >= that tier
# (the whole pool, not one pinned model). A complex larger than every tier uses the
# biggest tier and spills to host RAM via unified memory. When set, this drives
# --exclude and OVERRIDES structure_inference_gpu_model. Use explicit comma-separated node lists
# (avoid bracket ranges, which the shell may glob). The example below is for the EMBL
# gpu-el8 partition - replace nodes with your cluster's; nothing is hard-coded.
# structure_inference_gpu_vram_headroom: 1.0 # <1.0 tolerates that fraction of host spill
# structure_inference_gpu_tiers:
# - {min_vram_gb: 24, nodes: "gpu21,gpu22,gpu29,gpu30,gpu31,gpu32,gpu33,gpu34,gpu35,gpu36,gpu37"} # RTX 3090
# - {min_vram_gb: 40, nodes: "gpu25,gpu26,gpu27,gpu28"} # A100 40GB
# - {min_vram_gb: 48, nodes: "gpu40,gpu41,gpu42,gpu43,gpu44,gpu45,gpu46,gpu47,gpu48"} # L40s/A40 48GB
# - {min_vram_gb: 80, nodes: "gpu38,gpu39"} # H100 PCIe 80GB
# Note: RTX PRO 6000 (gpu50-53, 96GB) are ptxas-incompatible -> keep in slurm_exclude_nodes.
# H100-SXM/H200/B200 live on the separate gpu-training partition (not routed here).
# Optional: comma-separated nodes to keep structure_inference OFF, passed to sbatch
# as --exclude. Useful for GPUs the prediction container cannot use (e.g. a CUDA
# compute capability the bundled ptxas is too old for). Example: