CHANGELOG.md
+1 -1 (1 addition & 1 deletion)
@@ -11,7 +11,7 @@
 **Changes**:

 - Introducing **DockGen-E**, a new version of the DockGen benchmark dataset featuring enhanced biomolecular context for docking and co-folding predictions - namely, now all DockGen complexes represent the first (biologically relevant) bioassembly of the corresponding PDB structure
-- For the single-ligand datasets (i.e., Astex Diverse, PoseBusters Benchmark, and DockGen), now providing each baseline method with primary *and cofactor* ligand SMILES strings for prediction, to enhance the biomolecular context of these methods' predicted structures - as a result, for these single-ligand datasets, now the predicted ligand *most similar* to the primary ligand (in terms of both Tanimoto and structural similarity) is selected for scoring (which adds an additional layer of challenges for baseline methods)
+- For the single-ligand datasets (i.e., Astex Diverse, PoseBusters Benchmark, and DockGen), now providing each baseline method with primary _and cofactor_ ligand SMILES strings for prediction, to enhance the biomolecular context of these methods' predicted structures - as a result, for these single-ligand datasets, now the predicted ligand _most similar_ to the primary ligand (in terms of both Tanimoto and structural similarity) is selected for scoring (which adds an additional layer of challenges for baseline methods)
 - Updated Chai-1's inference code to commit `44375d5d4ea44c0b5b7204519e63f40b063e4a7c`, and ran it also with standardized (paired) MSAs
 - Replaced all AlphaFold 3 server predictions of each dataset's protein structures with predictions from AlphaFold 3's local inference code
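
The changelog entry above says that, for the single-ligand datasets, the predicted ligand most similar to the primary ligand is the one selected for scoring. A minimal sketch of the Tanimoto half of that selection might look like the following. It assumes fingerprints are already available as sets of on-bit indices; the names `tanimoto` and `pick_most_similar` are hypothetical and not taken from the repository, and the structural-similarity criterion mentioned in the entry is not modeled here.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        # Two empty fingerprints share no information; treat as dissimilar.
        return 0.0
    return len(a & b) / len(union)


def pick_most_similar(primary_fp: Set[int], predicted_fps: List[Set[int]]) -> int:
    """Return the index of the predicted ligand closest to the primary ligand.

    Hypothetical helper: ties are resolved in favor of the earliest index.
    """
    if not predicted_fps:
        raise ValueError("no predicted ligands to choose from")
    return max(
        range(len(predicted_fps)),
        key=lambda i: tanimoto(primary_fp, predicted_fps[i]),
    )
```

With a primary fingerprint `{1, 2, 3, 4}` and predictions `[{1, 2}, {1, 2, 3, 4, 5}, {7, 8}]`, the second prediction shares four of five bits with the primary ligand and would be the one picked for scoring.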
forks/DiffDock/app/README.md
+4 -5 (4 additions & 5 deletions)
@@ -9,17 +9,16 @@ app_file: main.py
 pinned: false
 ---

-
 ## How to use this space

 This is a simple app intended to showcase [DiffDock](https://github.com/gcorso/DiffDock).
 One can upload a protein and ligand, and calculate the predicted structure. The results are visualized in 3D and can be downloaded.

-* This app is designed to take 1 protein (in PDB format) and 1 ligand (in SDF format) at a time. For bulk inference, use the [command line interface](https://github.com/gcorso/DiffDock).
+- This app is designed to take 1 protein (in PDB format) and 1 ligand (in SDF format) at a time. For bulk inference, use the [command line interface](https://github.com/gcorso/DiffDock).

-* Our demonstration space uses a CPU, so it may take a few minutes to run. For faster results, use a GPU.
-One can duplicate this space (at their own expense) by selecting "⋮" -> "Duplicate this space" in the top right corner, and then selecting a GPU in the "Settings" tab.
+- Our demonstration space uses a CPU, so it may take a few minutes to run. For faster results, use a GPU.
+One can duplicate this space (at their own expense) by selecting "⋮" -> "Duplicate this space" in the top right corner, and then selecting a GPU in the "Settings" tab.

-----------
+---

 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ### [Paper on arXiv](https://arxiv.org/abs/2210.01776)

-Implementation of DiffDock, state-of-the-art method for molecular docking, by Gabriele Corso*, Hannes Stark*, Bowen Jing*, Regina Barzilay and Tommi Jaakkola.
-This repository contains all code, instructions and model weights necessary to run the method or to retrain a model.
+Implementation of DiffDock, state-of-the-art method for molecular docking, by Gabriele Corso*, Hannes Stark*, Bowen Jing\*, Regina Barzilay and Tommi Jaakkola.
+This repository contains all code, instructions and model weights necessary to run the method or to retrain a model.
 If you have any question, feel free to open an issue or reach out to us: [gcorso@mit.edu](gcorso@mit.edu), [hstark@mit.edu](hstark@mit.edu), [bjing@mit.edu](bjing@mit.edu).



 The repository also contains all the scripts to run the baselines and generate the figures.
 Additionally, there are visualization videos in `visualizations`.

-You might also be interested in this [Google Colab notebook](https://colab.research.google.com/drive/1CTtUGg05-2MtlWmfJhqzLTtkDDaxCDOQ#scrollTo=zlPOKLIBsiPU) to run DiffDock by Brian Naughton.
+You might also be interested in this [Google Colab notebook](https://colab.research.google.com/drive/1CTtUGg05-2MtlWmfJhqzLTtkDDaxCDOQ#scrollTo=zlPOKLIBsiPU) to run DiffDock by Brian Naughton.

 # Dataset

 The files in `data` contain the names for the time-based data split.

-If you want to train one of our models with the data then:
-1. download it from [zenodo](https://zenodo.org/record/6408497)
-2. unzip the directory and place it into `data` such that you have the path `data/PDBBind_processed`
-
+If you want to train one of our models with the data then:

+1. download it from [zenodo](https://zenodo.org/record/6408497)
+2. unzip the directory and place it into `data` such that you have the path `data/PDBBind_processed`

 ## Setup Environment

@@ -45,27 +45,28 @@ Then you need to install ESM that we use both for protein sequence embeddings an
 We support multiple input formats depending on whether you only want to make predictions for a single complex or for many at once.\
 The protein inputs need to be `.pdb` files or sequences that will be folded with ESMFold. The ligand input can either be a SMILES string or a filetype that RDKit can read like `.sdf` or `.mol2`.

 For a single complex: specify the protein with `--protein_path protein.pdb` or `--protein_sequence GIQSYCTPPYSVLQDPPQPVV` and the ligand with `--ligand ligand.sdf` or `--ligand "COc(cc1)ccc1C#N"`

-For many complexes: create a csv file with paths to proteins and ligand files or SMILES. It contains as columns `complex_name` (name used to save predictions, can be left empty), `protein_path` (path to `.pdb` file, if empty uses sequence), `ligand_description` (SMILE or file path) and `protein_sequence` (to fold with ESMFold in case the protein_path is empty).
+For many complexes: create a csv file with paths to proteins and ligand files or SMILES. It contains as columns `complex_name` (name used to save predictions, can be left empty), `protein_path` (path to `.pdb` file, if empty uses sequence), `ligand_description` (SMILE or file path) and `protein_sequence` (to fold with ESMFold in case the protein_path is empty).
 An example .csv is at `data/protein_ligand_example_csv.csv` and you would use it with `--protein_ligand_csv protein_ligand_example_csv.csv`.
-When providing the `.pdb` files you can run DiffDock also on CPU, however, if possible, we recommend using a GPU as the model runs significantly faster. Note that the first time you run DiffDock on a device the program will precompute and store in cache look-up tables for SO(2) and SO(3) distributions (typically takes a couple of minutes), this won't be repeated in following runs.
-
+When providing the `.pdb` files you can run DiffDock also on CPU, however, if possible, we recommend using a GPU as the model runs significantly faster. Note that the first time you run DiffDock on a device the program will precompute and store in cache look-up tables for SO(2) and SO(3) distributions (typically takes a couple of minutes), this won't be repeated in following runs.

 # Retraining DiffDock
+
 Download the data and place it as described in the "Dataset" section above.
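
The README text in the last hunk describes a CSV with the columns `complex_name`, `protein_path`, `ligand_description` and `protein_sequence` for running many complexes at once. A minimal sketch of writing such a file with Python's standard `csv` module could look like this. The helper name `write_batch_csv`, the validation rules, and the column order (taken from the order of the README sentence) are assumptions for illustration, not part of DiffDock itself.

```python
import csv
import os
import tempfile

# Column names as listed in the README text; the order is an assumption.
COLUMNS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]


def write_batch_csv(path, rows):
    """Write one row per complex; missing columns are left empty.

    Hypothetical helper: it only checks that each row names a ligand and
    gives either a protein file or a protein sequence.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        for row in rows:
            if not row.get("ligand_description"):
                raise ValueError("each row needs a ligand file path or SMILES string")
            if not row.get("protein_path") and not row.get("protein_sequence"):
                raise ValueError("each row needs protein_path or protein_sequence")
            writer.writerow(row)


if __name__ == "__main__":
    # Example values reuse the placeholders from the README text.
    out_path = os.path.join(tempfile.mkdtemp(), "my_complexes.csv")
    write_batch_csv(
        out_path,
        [
            {"complex_name": "from_file", "protein_path": "protein.pdb",
             "ligand_description": "ligand.sdf"},
            {"protein_sequence": "GIQSYCTPPYSVLQDPPQPVV",
             "ligand_description": "COc(cc1)ccc1C#N"},
        ],
    )
    print(out_path)
```

The resulting file would then be passed with the `--protein_ligand_csv` flag mentioned in the README. Leaving `complex_name` empty in the second row follows the README's note that this column can be left empty.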