
Commit 44939e4

Format all files
1 parent 2c4d628 commit 44939e4

84 files changed (+11317, -1101 lines)


CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 **Changes**:

 - Introducing **DockGen-E**, a new version of the DockGen benchmark dataset featuring enhanced biomolecular context for docking and co-folding predictions - namely, now all DockGen complexes represent the first (biologically relevant) bioassembly of the corresponding PDB structure
-- For the single-ligand datasets (i.e., Astex Diverse, PoseBusters Benchmark, and DockGen), now providing each baseline method with primary *and cofactor* ligand SMILES strings for prediction, to enhance the biomolecular context of these methods' predicted structures - as a result, for these single-ligand datasets, now the predicted ligand *most similar* to the primary ligand (in terms of both Tanimoto and structural similarity) is selected for scoring (which adds an additional layer of challenges for baseline methods)
+- For the single-ligand datasets (i.e., Astex Diverse, PoseBusters Benchmark, and DockGen), now providing each baseline method with primary _and cofactor_ ligand SMILES strings for prediction, to enhance the biomolecular context of these methods' predicted structures - as a result, for these single-ligand datasets, now the predicted ligand _most similar_ to the primary ligand (in terms of both Tanimoto and structural similarity) is selected for scoring (which adds an additional layer of challenges for baseline methods)
 - Updated Chai-1's inference code to commit `44375d5d4ea44c0b5b7204519e63f40b063e4a7c`, and ran it also with standardized (paired) MSAs
 - Replaced all AlphaFold 3 server predictions of each dataset's protein structures with predictions from AlphaFold 3's local inference code
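The "most similar to the primary ligand" selection described in the changelog entry above can be illustrated with a rough RDKit sketch. This is an assumption-laden example, not the benchmark's actual scoring code: it uses only Morgan-fingerprint Tanimoto similarity (the real procedure also considers structural similarity), and the function and variable names are made up.

```python
# Illustrative sketch only: pick, among a method's predicted ligands, the one
# whose Tanimoto similarity to the primary ligand SMILES is highest.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem


def select_primary_like_ligand(primary_smiles: str, predicted_smiles: list[str]) -> str | None:
    """Return the predicted SMILES most similar to the primary ligand
    (structural-similarity tie-breaking omitted in this sketch)."""
    primary = Chem.MolFromSmiles(primary_smiles)
    primary_fp = AllChem.GetMorganFingerprintAsBitVect(primary, 2, nBits=2048)

    best_smiles, best_sim = None, -1.0
    for smiles in predicted_smiles:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:  # skip unparsable predictions
            continue
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
        sim = DataStructs.TanimotoSimilarity(primary_fp, fp)
        if sim > best_sim:
            best_smiles, best_sim = smiles, sim
    return best_smiles


# Example: the primary ligand is recovered even when cofactor-like predictions are present
print(select_primary_like_ligand("CCO", ["CCO", "c1ccccc1", "CC(=O)O"]))
```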

docs/.docs.environment.yaml

Lines changed: 34 additions & 34 deletions (a whitespace-only reformat of the pip dependency list; the entries themselves are unchanged)

@@ -22,37 +22,37 @@ dependencies:
  - pdbfixer=1.9=pyh1a96a4e_0
  - python=3.10.14=hd12c33a_0_cpython
  - pip:
      - beartype
      - biopandas
      - biopython
      - docutils==0.20.1 # NOTE: currently required due to an `m2r2` bug: https://github.com/CrossNox/m2r2/issues/68
      - furo
      - ipython
      - lxml_html_clean
      - m2r2
      - matplotlib
      - nbsphinx
      - nbsphinx-link
      - nbstripout
      - pandas
      - pandoc
      - pdb4amber
      - pip
      - posebusters
      - prody
      - pydocstyle
      - pypdb
      - rdkit
      - rdkit-pypi
      - rootutils
      - sphinx
      - sphinx-copybutton
      - sphinx-inline-tabs
      - sphinx_mdinclude
      - sphinxext-opengraph
      - sphinxcontrib-gtagjs
      - sphinxcontrib-jquery
      - sphinx_codeautolink
      - wrapt_timeout_decorator
      - tqdm
      - watermark

environments/chai_lab_environment.yaml

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -178,5 +178,4 @@ dependencies:
178178
- wcwidth==0.2.13
179179
- wrapt==1.16.0
180180
- yarl==1.12.1
181-
prefix:
182-
forks/chai-lab/chai-lab
181+
prefix: forks/chai-lab/chai-lab

environments/diffdock_environment.yaml

Lines changed: 2 additions & 4 deletions
@@ -1,5 +1,4 @@
-name:
-  DiffDock
+name: DiffDock
 channels:
   - pyg
   - pytorch
@@ -342,5 +341,4 @@ dependencies:
   - yarl==1.9.2
   - zope-event==5.0
   - zope-interface==6.0
-prefix:
-  forks/DiffDock/DiffDock
+prefix: forks/DiffDock/DiffDock

environments/dynamicbind_environment.yaml

Lines changed: 1 addition & 2 deletions
@@ -261,5 +261,4 @@ dependencies:
   - unicodedata2==15.0.0
   - urllib3==1.26.15
   - wheel==0.38.4
-prefix:
-  forks/DynamicBind/DynamicBind
+prefix: forks/DynamicBind/DynamicBind

environments/fabind_environment.yaml

Lines changed: 1 addition & 2 deletions
@@ -160,5 +160,4 @@ dependencies:
   - tzdata==2023.4
   - werkzeug==3.0.1
   - zipp==3.17.0
-prefix:
-  forks/FABind/FABind
+prefix: forks/FABind/FABind

forks/DiffDock/README.md

Lines changed: 40 additions & 39 deletions
Large diffs are not rendered by default.

forks/DiffDock/app/README.md

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -9,17 +9,16 @@ app_file: main.py
99
pinned: false
1010
---
1111

12-
1312
## How to use this space
1413

1514
This is a simple app intended to showcase [DiffDock](https://github.com/gcorso/DiffDock).
1615
One can upload a protein and ligand, and calculate the predicted structure. The results are visualized in 3D and can be downloaded.
1716

18-
* This app is designed to take 1 protein (in PDB format) and 1 ligand (in SDF format) at a time. For bulk inference, use the [command line interface](https://github.com/gcorso/DiffDock).
17+
- This app is designed to take 1 protein (in PDB format) and 1 ligand (in SDF format) at a time. For bulk inference, use the [command line interface](https://github.com/gcorso/DiffDock).
1918

20-
* Our demonstration space uses a CPU, so it may take a few minutes to run. For faster results, use a GPU.
21-
One can duplicate this space (at their own expense) by selecting "⋮" -> "Duplicate this space" in the top right corner, and then selecting a GPU in the "Settings" tab.
19+
- Our demonstration space uses a CPU, so it may take a few minutes to run. For faster results, use a GPU.
20+
One can duplicate this space (at their own expense) by selecting "⋮" -> "Duplicate this space" in the top right corner, and then selecting a GPU in the "Settings" tab.
2221

23-
----------
22+
---
2423

2524
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

forks/DiffDock/environment.yml

Lines changed: 26 additions & 26 deletions (a whitespace-only reformat of the pip dependency lists; the entries themselves are unchanged)

@@ -10,31 +10,31 @@ dependencies:
  - pip
  # Need to install torch in order to build openfold, so install it first
  - pip:
      - --extra-index-url https://download.pytorch.org/whl/cu117
      - --find-links https://pytorch-geometric.com/whl/torch-1.13.1+cu117.html
      - torch==1.13.1+cu117
  - pip:
      - --extra-index-url https://download.pytorch.org/whl/cu117
      - --find-links https://pytorch-geometric.com/whl/torch-1.13.1+cu117.html
      - dllogger @ git+https://github.com/NVIDIA/dllogger.git
      - e3nn==0.5.0
      - fair-esm[esmfold]==2.0.0
      - networkx==2.8.4
      - openfold @ git+https://github.com/aqlaboratory/openfold.git@4b41059694619831a7db195b7e0988fc4ff3a307
      - pandas==1.5.1
      - prody==2.2.0
      - prody==2.2.0
      - pybind11==2.11.1
      - rdkit==2022.03.3
      - scikit-learn==1.1.0
      - scipy==1.12.0
      - torch==1.13.1+cu117
      - torch-cluster==1.6.0+pt113cu117
      - torch-geometric==2.2.0
      - torch-scatter==2.1.0+pt113cu117
      - torch-sparse==0.6.16+pt113cu117
      - torch-spline-conv==1.2.1+pt113cu117
      - torchmetrics==0.11.0
  - pip:
      - gradio==3.50.*
      - requests

forks/DiffDockv1/README.md

Lines changed: 21 additions & 16 deletions
@@ -1,28 +1,28 @@
 # DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
+
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/diffdock-diffusion-steps-twists-and-turns-for/blind-docking-on-pdbbind)](https://paperswithcode.com/sota/blind-docking-on-pdbbind?p=diffdock-diffusion-steps-twists-and-turns-for)

 ### [Paper on arXiv](https://arxiv.org/abs/2210.01776)

-Implementation of DiffDock, state-of-the-art method for molecular docking, by Gabriele Corso*, Hannes Stark*, Bowen Jing*, Regina Barzilay and Tommi Jaakkola.
-This repository contains all code, instructions and model weights necessary to run the method or to retrain a model.
+Implementation of DiffDock, state-of-the-art method for molecular docking, by Gabriele Corso*, Hannes Stark*, Bowen Jing\*, Regina Barzilay and Tommi Jaakkola.
+This repository contains all code, instructions and model weights necessary to run the method or to retrain a model.
 If you have any question, feel free to open an issue or reach out to us: [gcorso@mit.edu](gcorso@mit.edu), [hstark@mit.edu](hstark@mit.edu), [bjing@mit.edu](bjing@mit.edu).

 ![Alt Text](visualizations/overview.png)

 The repository also contains all the scripts to run the baselines and generate the figures.
 Additionally, there are visualization videos in `visualizations`.

-You might also be interested in this [Google Colab notebook](https://colab.research.google.com/drive/1CTtUGg05-2MtlWmfJhqzLTtkDDaxCDOQ#scrollTo=zlPOKLIBsiPU) to run DiffDock by Brian Naughton.
+You might also be interested in this [Google Colab notebook](https://colab.research.google.com/drive/1CTtUGg05-2MtlWmfJhqzLTtkDDaxCDOQ#scrollTo=zlPOKLIBsiPU) to run DiffDock by Brian Naughton.

 # Dataset

 The files in `data` contain the names for the time-based data split.

-If you want to train one of our models with the data then:
-1. download it from [zenodo](https://zenodo.org/record/6408497)
-2. unzip the directory and place it into `data` such that you have the path `data/PDBBind_processed`
-
+If you want to train one of our models with the data then:

+1. download it from [zenodo](https://zenodo.org/record/6408497)
+2. unzip the directory and place it into `data` such that you have the path `data/PDBBind_processed`

 ## Setup Environment

@@ -45,27 +45,28 @@ Then you need to install ESM that we use both for protein sequence embeddings an
 pip install 'dllogger @ git+https://github.com/NVIDIA/dllogger.git'
 pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@4b41059694619831a7db195b7e0988fc4ff3a307'

-
 # Running DiffDock on your own complexes
+
 We support multiple input formats depending on whether you only want to make predictions for a single complex or for many at once.\
 The protein inputs need to be `.pdb` files or sequences that will be folded with ESMFold. The ligand input can either be a SMILES string or a filetype that RDKit can read like `.sdf` or `.mol2`.

 For a single complex: specify the protein with `--protein_path protein.pdb` or `--protein_sequence GIQSYCTPPYSVLQDPPQPVV` and the ligand with `--ligand ligand.sdf` or `--ligand "COc(cc1)ccc1C#N"`

-For many complexes: create a csv file with paths to proteins and ligand files or SMILES. It contains as columns `complex_name` (name used to save predictions, can be left empty), `protein_path` (path to `.pdb` file, if empty uses sequence), `ligand_description` (SMILE or file path) and `protein_sequence` (to fold with ESMFold in case the protein_path is empty).
+For many complexes: create a csv file with paths to proteins and ligand files or SMILES. It contains as columns `complex_name` (name used to save predictions, can be left empty), `protein_path` (path to `.pdb` file, if empty uses sequence), `ligand_description` (SMILE or file path) and `protein_sequence` (to fold with ESMFold in case the protein_path is empty).
 An example .csv is at `data/protein_ligand_example_csv.csv` and you would use it with `--protein_ligand_csv protein_ligand_example_csv.csv`.

 And you are ready to run inference:

 python -m inference --protein_ligand_csv data/protein_ligand_example_csv.csv --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise

-When providing the `.pdb` files you can run DiffDock also on CPU, however, if possible, we recommend using a GPU as the model runs significantly faster. Note that the first time you run DiffDock on a device the program will precompute and store in cache look-up tables for SO(2) and SO(3) distributions (typically takes a couple of minutes), this won't be repeated in following runs.
-
+When providing the `.pdb` files you can run DiffDock also on CPU, however, if possible, we recommend using a GPU as the model runs significantly faster. Note that the first time you run DiffDock on a device the program will precompute and store in cache look-up tables for SO(2) and SO(3) distributions (typically takes a couple of minutes), this won't be repeated in following runs.

 # Retraining DiffDock
+
 Download the data and place it as described in the "Dataset" section above.

 ### Generate the ESM2 embeddings for the proteins
+
 First run:

 python datasets/pdbbind_lm_embedding_preparation.py

@@ -80,10 +81,11 @@ Then run the command:
 python datasets/esm_embeddings_to_pt.py

 ### Using the provided model weights for evaluation
+
 We first generate the language model embeddings for the testset, then run inference with DiffDock, and then evaluate the files that DiffDock produced:

 python datasets/esm_embedding_preparation.py --protein_ligand_csv data/testset_csv.csv --out_file data/prepared_for_esm_testset.fasta
-git clone https://github.com/facebookresearch/esm
+git clone https://github.com/facebookresearch/esm
 cd esm
 pip install -e .
 cd ..

@@ -92,13 +94,15 @@ We first generate the language model embeddings for the testset, then run infere
 python evaluate_files.py --results_path results/user_predictions_testset --file_to_exclude rank1.sdf --num_predictions 40

 <!--
-To predict binding structures using the provided model weights run:
+To predict binding structures using the provided model weights run:

 python -m evaluate --model_dir workdir/paper_score_model --ckpt best_ema_inference_epoch_model.pt --confidence_ckpt best_model_epoch75.pt --confidence_model_dir workdir/paper_confidence_model --run_name DiffDockInference --inference_steps 20 --split_path data/splits/timesplit_test --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise

 To additionally save the .sdf files of the generated molecules, add the flag `--save_visualisation`
 -->
+
 ### Training a model yourself and using those weights
+
 Train the large score model:

 python -m train --run_name big_score_model --test_sigma_intervals --esm_embeddings_path data/esm2_3billion_embeddings.pt --log_dir workdir --lr 1e-3 --tr_sigma_min 0.1 --tr_sigma_max 19 --rot_sigma_min 0.03 --rot_sigma_max 1.55 --batch_size 16 --ns 48 --nv 10 --num_conv_layers 6 --dynamic_max_cross --scheduler plateau --scale_by_sigma --dropout 0.1 --remove_hs --c_alpha_max_neighbors 24 --receptor_radius 15 --num_dataloader_workers 1 --cudnn_benchmark --val_inference_freq 5 --num_inference_complexes 500 --use_ema --distance_embed_dim 64 --cross_distance_embed_dim 64 --sigma_embed_dim 64 --scheduler_patience 30 --n_epochs 850

@@ -122,22 +126,23 @@ Now everything is trained and you can run inference with:

 python -m evaluate --model_dir workdir/big_score_model --ckpt best_ema_inference_epoch_model.pt --confidence_ckpt best_model_epoch75.pt --confidence_model_dir workdir/confidence_model --run_name DiffDockInference --inference_steps 20 --split_path data/splits/timesplit_test --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise

-Note: the notebook `data/apo_alignment.ipynb` contains the code used to align the ESMFold-generated apo-structures to the holo-structures.
+Note: the notebook `data/apo_alignment.ipynb` contains the code used to align the ESMFold-generated apo-structures to the holo-structures.

 ## Citation
+
 @article{corso2023diffdock,
-title={DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking},
+title={DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking},
 author = {Corso, Gabriele and Stärk, Hannes and Jing, Bowen and Barzilay, Regina and Jaakkola, Tommi},
 journal={International Conference on Learning Representations (ICLR)},
 year={2023}
 }

 ## License
+
 MIT

 ## Acknowledgements

 We thank Wei Lu and Rachel Wu for pointing out some issues with the code.

-
 ![Alt Text](visualizations/example_6agt_symmetric.gif)
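As a side note on the batch-input format quoted in the README diff above (columns `complex_name`, `protein_path`, `ligand_description`, `protein_sequence`), a minimal sketch of building such a CSV with pandas might look like the following; the file name and entries are placeholders, not files shipped with the repository:

```python
# Hypothetical example of the batch-input CSV described in the DiffDock README above.
import pandas as pd

rows = [
    {
        "complex_name": "example_complex",            # optional; used to name the output folder
        "protein_path": "data/example_protein.pdb",   # leave empty to fold protein_sequence with ESMFold
        "ligand_description": "COc(cc1)ccc1C#N",      # a SMILES string or a path to an .sdf/.mol2 file
        "protein_sequence": "",                       # only used when protein_path is empty
    },
]
pd.DataFrame(rows).to_csv("my_protein_ligand.csv", index=False)

# Then, per the README: python -m inference --protein_ligand_csv my_protein_ligand.csv ...
```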
