Skip to content

Utilizing the Primary Degrees of Freedom of Protein Backbone for Geometric Modeling and Generation

Notifications You must be signed in to change notification settings

NatureGeorge/FoldDoF

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FoldDoF: Utilizing the Major Degrees of Freedom of Protein Backbone Conformation

Zefeng Zhu1,2,3, Chen Song2,3
{1Peking University-Tsinghua University-National Institute of Biological Sciences Joint Graduate Program, 2Center for Quantitative Biology, 3Peking-Tsinghua Center for Life Sciences } Academy for Advanced Interdisciplinary Studies, Peking University

Citation
@software{Zhu_FoldDoF_2022,
  author  = {Zhu, Zefeng},
  license = {Apache-2.0},
  month   = {8},
  title   = {{`FoldDoF`: Utilizing the Major Degrees of Freedom of Protein Backbone Conformation}},
  url     = {https://github.com/NatureGeorge/FoldField},
  year    = {2022}
}

Installation

git clone https://github.com/NatureGeorge/FoldDoF.git

cd FoldDoF
pip install -e .

Basic Usage

backbone reconstruction from idealized peptide isomers as example

import gemmi
from folddof import to_backbone, to_bb_mode, to_rottrans, to_rottrans_mode
from folddof.frame import PeptideUnitFrame
from folddof.io import get_coords_with_mask, savebb2pdb


st = gemmi.read_structure('3HSF.cif.gz')
st.remove_alternative_conformations()
chain = st[0].get_subchain("A")
full_threeletter_seq = st.get_entity_of(chain).full_sequence
chain = list(chain)
threeletter_seq = full_threeletter_seq[chain[0].label_seq-1:chain[-1].label_seq]
bb_coords, bb_masks = get_coords_with_mask(chain, atoms=('N','CA','C','O'))

global_rots, global_trans, ret_isomer, _, _ = to_rottrans(
    bb_coords, bb_masks, 
    to_rottrans_mode.PeptideUnitFrame, 
    rot_repr_is_q=True)

avg_isomer = PeptideUnitFrame.to_avg_loc_ca_ia1_wrt_n_ia1(ret_isomer)

avg_bb_coords = to_backbone(
    global_rots[None], 
    avg_isomer[None],
    mode=to_bb_mode.Pep_GlobalRots_IsoRots,
    rot_repr_is_q=True,
    ).squeeze(0)

savebb2pdb(threeletter_seq, avg_bb_coords, output_path=f'3HSF.0.A.avg.pdb')

Note

For additional examples, refer to the ./notebooks/. For instance, folddof.pymanopt.ipynb provides a script for optimizing backbone conformations on the $\mathrm{SO(3)}^N$ manifold via $\mathrm{SO(3)}$ connection.

For FrameFlow Variants

Installation

git clone https://github.com/NatureGeorge/PepFrameFlow.git

cd PepFrameFlow
pip install -e .

Training

$\text{FrameFlow}$

python -W ignore experiments/train_se3_flows.py bb_repr=original scope_dataset.csv_path=./metadata/scope_metadata.clean.csv data.dataset=scope experiment.trainer.max_epochs=120 experiment.trainer.log_every_n_steps=10 experiment.checkpointer.monitor=valid/pseudo_score experiment.trainer.check_val_every_n_epoch=1 experiment.num_devices=2 experiment.checkpointer.save_last=False experiment.checkpointer.save_top_k=10 experiment.enable_wandb=False shared.samples_per_eval_length=10 shared.num_eval_lengths=30 interpolant.rots.igso3.sigma_grid.start=1.5 interpolant.rots.igso3.sigma_grid.end=1.5 interpolant.rots.igso3.sigma_grid.steps=1

$\text{FrameFlow}_{\text{Pep}}$

python -W ignore experiments/train_se3_flows.py bb_repr=global_pep scope_dataset.csv_path=./metadata/scope_metadata.clean.csv data.dataset=scope experiment.trainer.max_epochs=120 experiment.trainer.log_every_n_steps=10 experiment.checkpointer.monitor=valid/pseudo_score experiment.trainer.check_val_every_n_epoch=1 experiment.num_devices=2 experiment.checkpointer.save_last=False experiment.checkpointer.save_top_k=10 experiment.enable_wandb=False shared.samples_per_eval_length=10 shared.num_eval_lengths=30 interpolant.rots.igso3.sigma_grid.start=1.5 interpolant.rots.igso3.sigma_grid.end=1.5 interpolant.rots.igso3.sigma_grid.steps=1

$\text{FrameFlow}_{\text{Pep}}^{\text{Rel}}$

python -W ignore experiments/train_se3_flows.py bb_repr=global_pep scope_dataset.csv_path=./metadata/scope_metadata.clean.csv data.dataset=scope experiment.trainer.max_epochs=120 experiment.trainer.log_every_n_steps=10 experiment.checkpointer.monitor=valid/pseudo_score experiment.trainer.check_val_every_n_epoch=1 experiment.num_devices=2 experiment.checkpointer.save_last=False experiment.checkpointer.save_top_k=10 experiment.enable_wandb=False shared.samples_per_eval_length=10 shared.num_eval_lengths=30 interpolant.rots.igso3.sigma_grid.start=1.5 interpolant.rots.igso3.sigma_grid.end=1.5 interpolant.rots.igso3.sigma_grid.steps=1 model.relative_pep_trans_on_ipa_update=True

Inference

Optional Prerequisite

If inference.samples.seq_per_sample > 0, you should install ESMFold and ProteinMPNN. By default, inference.samples.seq_per_sample is 8.

Checkpoint Paths

If you want to use the pretrained model weights, please turn to Zenodo for downloading below files:

export frameflow_ckpt_path=weights/frameflow.ckpt
export frameflow_pep_ckpt_path=weights/frameflow.pep.ckpt
export frameflow_pep_rel_ckpt_path=weights/frameflow.pep.rel.ckpt

Note: you can also specified your own model path.

$\text{FrameFlow}$

for num_timesteps in 10 20 50 100 200 300 400 500; do
    python -W ignore experiments/inference_se3_flows.py -cn inference_unconditional bb_repr=original inference.ckpt_path="$frameflow_ckpt_path" inference.samples.samples_per_length=10 inference.num_gpus=4 inference.samples.seq_per_sample=8 inference.interpolant.sampling.num_timesteps="$num_timesteps" inference.samples.min_length=60 inference.samples.max_length=128 inference.samples.length_step=1 inference.samples.length_subset=null
done

$\text{FrameFlow}_{\text{Pep}}$

for num_timesteps in 10 20 50 100 200 300 400 500; do
    python -W ignore experiments/inference_se3_flows.py -cn inference_unconditional bb_repr=global_pep inference.ckpt_path="$frameflow_pep_ckpt_path" inference.samples.samples_per_length=10 inference.num_gpus=4 inference.samples.seq_per_sample=8 inference.interpolant.sampling.num_timesteps="$num_timesteps" inference.samples.min_length=61 inference.samples.max_length=129 inference.samples.length_step=1 inference.samples.length_subset=null
done

$\text{FrameFlow}_{\text{Pep}}^{\text{Rel}}$

for num_timesteps in 10 20 50 100 200 300 400 500; do
    python -W ignore experiments/inference_se3_flows.py -cn inference_unconditional bb_repr=global_pep inference.ckpt_path="$frameflow_pep_rel_ckpt_path" inference.samples.samples_per_length=10 inference.num_gpus=4 inference.samples.seq_per_sample=8 inference.interpolant.sampling.num_timesteps="$num_timesteps" inference.samples.min_length=61 inference.samples.max_length=129 inference.samples.length_step=1 inference.samples.length_subset=null model.relative_pep_trans_on_ipa_update=True
done

The resulting directories are inference_outputs/hallucination_scope/*/*/unconditional/run_*.

Analysis

modified scripts of ReQFlow.

python analysis/all_metric_calculation.py --inference_dir "$your_result_dir" --script_path analysis/run_foldseek_parallel.sh --dataset_dir $your_pdb100_dir --type FrameFlow

Note that you should install foldseek and prepare the PDB100 database (specify the $your_pdb100_dir) before running above script (https://github.com/steineggerlab/foldseek?tab=readme-ov-file#databases).

Aggregate the Analysis Results:

Please turn to ./notebooks/frameflow.variants.analysis.ipynb.

About

Utilizing the Primary Degrees of Freedom of Protein Backbone for Geometric Modeling and Generation

Resources

Stars

Watchers

Forks