**NOTE:** The preprocessed Astex Diverse, PoseBusters Benchmark, DockGen, and CASP15 data available via [Zenodo](https://doi.org/10.5281/zenodo.11477766) already include predicted protein structures aligned to the holo (ground-truth) structures for each of these datasets.
dataset: posebusters_benchmark # the dataset to use - NOTE: must be one of (`posebusters_benchmark`, `astex_diverse`, `dockgen`, `casp15`)
data_dir: ${oc.env:PROJECT_ROOT}/data/${dataset}_set/ # where the processed datasets (e.g., PoseBusters Benchmark) are placed
predicted_structures_dir: ${oc.env:PROJECT_ROOT}/data/${dataset}_set/${dataset}_predicted_structures # where the predicted protein structures are placed
output_dir: ${oc.env:PROJECT_ROOT}/data/${dataset}_set/${dataset}_holo_aligned_predicted_structures # where the holo-aligned predicted apo structures should be stored
num_workers: 1 # number of CPU workers for parallel processing
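The config above builds every path from `${oc.env:PROJECT_ROOT}` (OmegaConf's built-in environment-variable resolver) and the `${dataset}` key. The following is a minimal plain-Python sketch of how those interpolations resolve, assuming the allowed `dataset` values listed in the config comment; the function name `resolve_alignment_paths` is hypothetical and is not part of the repository:

```python
from pathlib import Path

# Allowed values copied from the `dataset` comment in the config above.
SUPPORTED_DATASETS = ("posebusters_benchmark", "astex_diverse", "dockgen", "casp15")


def resolve_alignment_paths(project_root: str, dataset: str) -> dict:
    """Hypothetical stand-in for the `${oc.env:PROJECT_ROOT}` / `${dataset}` interpolation."""
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(f"dataset must be one of {SUPPORTED_DATASETS}, got {dataset!r}")
    # data_dir: ${oc.env:PROJECT_ROOT}/data/${dataset}_set/
    data_dir = Path(project_root) / "data" / f"{dataset}_set"
    return {
        "data_dir": data_dir,
        # predicted_structures_dir: .../${dataset}_set/${dataset}_predicted_structures
        "predicted_structures_dir": data_dir / f"{dataset}_predicted_structures",
        # output_dir: .../${dataset}_set/${dataset}_holo_aligned_predicted_structures
        "output_dir": data_dir / f"{dataset}_holo_aligned_predicted_structures",
    }
```

Because all three directories hang off the same `data_dir`, changing `dataset` moves them together, which is the benefit of interpolating `${dataset}` instead of hard-coding each path.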
notebooks/adding_new_dataset_tutorial.ipynb (1 addition, 1 deletion)
@@ -20,7 +20,7 @@
"\n",
"1. Create a new directory under `data/` with the required suffix `_set` (e.g., `data/newest_set/`) and group your (ground-truth) data files by unique IDs within this new directory (e.g., `data/newest_set/1G9V_RQ4/1G9V_RQ4_{protein.pdb,ligand.sdf}`)\n",
"2. Update the config files throughout `configs/analysis/`, `configs/data/`, and `configs/model/` to list your new dataset as a CLI argument (e.g., `dataset: newest`)\n",
-"3. Predict `apo` protein structures for your new dataset using ESMFold by integrating parsing for your dataset into the ESMFold-related source code within `src/data/components/esmfold_fasta_preparation.py` and `src/data/components/esmfold_apo_to_holo_alignment.py`\n",
+"3. Predict `apo` protein structures for your new dataset using a structure predictor of your choice (e.g., ESMFold) by integrating parsing for your dataset into the prediction-related source code within `src/data/components/protein_fasta_preparation.py` and `src/data/components/protein_apo_to_holo_alignment.py`\n",
"4. Using `notebooks/posebusters_astex_inference_results_plotting.ipynb` as a template, add a new Jupyter notebook to `notebooks/` for plotting each method's results on your new dataset (after preparing each method's dataset inputs and running inference with each desired method)"
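Step 1 of the tutorial fixes a directory layout of `data/<name>_set/<ID>/<ID>_{protein.pdb,ligand.sdf}`. A small sketch for checking that layout before running the later steps is shown here; the helper `find_incomplete_complexes` is hypothetical (not part of the repository) and only checks file names, not file contents:

```python
from pathlib import Path


def find_incomplete_complexes(dataset_dir: str) -> list:
    """Hypothetical check of the step-1 layout: <dataset_dir>/<ID>/<ID>_{protein.pdb,ligand.sdf}."""
    root = Path(dataset_dir)
    # Step 1 requires the `_set` suffix on the dataset directory (e.g., `data/newest_set/`).
    if not root.name.endswith("_set"):
        raise ValueError(f"dataset directory must use the `_set` suffix, got {root.name!r}")
    incomplete = []
    for complex_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        complex_id = complex_dir.name  # e.g., "1G9V_RQ4"
        expected = [
            complex_dir / f"{complex_id}_protein.pdb",
            complex_dir / f"{complex_id}_ligand.sdf",
        ]
        # Report any complex that is missing either ground-truth file.
        if not all(path.is_file() for path in expected):
            incomplete.append(complex_id)
    return incomplete
```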
:param smoothing_factor: Smoothing factor controlling the alignment.
:param dataset_calpha_coords: Array of Ca atom coordinates for a dataset's protein structure.
-:param esmfold_calpha_coords: Array of Ca atom coordinates for a dataset's protein structure.
+:param predicted_calpha_coords: Array of Ca atom coordinates for a dataset's protein structure.
:param dataset_ligand_coords: Array of ligand coordinates from a dataset.
:param return_rotation: Whether to return the rotation matrix and centroids (default: `False`).
:return: If return_rotation is `True`, returns a tuple containing rotation matrix (`Rotation`), centroid of CA atoms for a dataset protein (`np.ndarray`),
-and centroid of CA atoms for ESMFold (`np.ndarray`). If return_rotation is `False`, returns the inverse root mean square error of reciprocal distances (`float`).
+and centroid of CA atoms for a prediction (`np.ndarray`). If return_rotation is `False`, returns the inverse root mean square error of reciprocal distances (`float`).
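The docstring above describes a ligand-aware alignment of predicted Cα atoms onto a dataset's Cα atoms. Below is a minimal NumPy sketch of one way such a function could work; it is not the repository's implementation. The exponential weighting `exp(-smoothing_factor * distance_to_nearest_ligand_atom)`, the weighted Kabsch solver, the name `align_prediction_sketch`, and returning a plain 3x3 matrix instead of a SciPy `Rotation` are all assumptions:

```python
import numpy as np


def align_prediction_sketch(
    smoothing_factor: float,
    dataset_calpha_coords: np.ndarray,
    predicted_calpha_coords: np.ndarray,
    dataset_ligand_coords: np.ndarray,
    return_rotation: bool = False,
):
    """Hypothetical weighted-Kabsch sketch of the docstring above (not the repository's code)."""
    # Distance from every dataset CA atom to every ligand atom: shape (n_residues, n_ligand_atoms).
    dataset_dists = np.linalg.norm(
        dataset_calpha_coords[:, None, :] - dataset_ligand_coords[None, :, :], axis=-1
    )
    # Assumed weighting: residues close to the ligand dominate the alignment.
    weights = np.exp(-smoothing_factor * dataset_dists.min(axis=1))
    weights = weights / weights.sum()

    # Weighted centroids of both CA sets, then center each set on its own centroid.
    dataset_centroid = weights @ dataset_calpha_coords
    predicted_centroid = weights @ predicted_calpha_coords
    centered_dataset = dataset_calpha_coords - dataset_centroid
    centered_predicted = predicted_calpha_coords - predicted_centroid

    # Weighted Kabsch: rotation R minimizing sum_i w_i * ||R p_i - q_i||^2.
    covariance = (centered_predicted * weights[:, None]).T @ centered_dataset
    u, _, vt = np.linalg.svd(covariance)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # flip one axis if SVD would yield a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    if return_rotation:
        return rotation, dataset_centroid, predicted_centroid

    # Distances from the aligned predicted CA atoms to the ligand, in the dataset-centroid frame.
    aligned_predicted = centered_predicted @ rotation.T
    centered_ligand = dataset_ligand_coords - dataset_centroid
    aligned_dists = np.linalg.norm(
        aligned_predicted[:, None, :] - centered_ligand[None, :, :], axis=-1
    )
    # Root mean square error between reciprocal CA-ligand distances.
    return float(np.sqrt(np.mean((1.0 / dataset_dists - 1.0 / aligned_dists) ** 2)))
```

If the predicted structure is an exact rigid-body copy of the dataset structure, the returned error is numerically zero and the rotation is the inverse of the applied one. The reciprocal distances emphasize residues near the ligand, which is why a pocket-weighted alignment scores better than a global one under this metric.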