This folder contains Thompson sampling workflows for peptide screening with cluster-aware selection.
replay.py: replay workflow that reads precomputed AF2 metrics from data/dict.csv, combines them with cluster assignments, and writes data.csv plus TS outputs.
Cluster assignments live in ./clusters as CSV files with columns:
- name
- sequence
- cluster
Current clustering result files include:
- cd-0.4.csv, cd-0.5.csv, cd-0.7.csv, cd-0.9.csv
- easy-cluster-0.4.csv, easy-cluster-0.5.csv, easy-cluster-0.7.csv, easy-cluster-0.9.csv
- easy-linclust-0.4.csv, easy-linclust-0.5.csv, easy-linclust-0.7.csv, easy-linclust-0.9.csv
data/dict.csv stores AF2 metrics with columns:
- name
- sequence
- plddt_0, dist_0, rog_0
- plddt_1, dist_1, rog_1
- plddt_2, dist_2, rog_2
- plddt_3, dist_3, rog_3
- plddt_4, dist_4, rog_4
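As an illustration of how the two inputs fit together, here is a minimal sketch that attaches cluster IDs to AF2 metrics by joining on `name`. The function name `merge_metrics_with_clusters`, the join key, and the demo values are assumptions made for this example; replay.py may combine the files differently.

```python
import pandas as pd

N_MODELS = 5  # metric columns carry suffixes _0 .. _4
METRIC_COLS = [f"{m}_{i}" for i in range(N_MODELS) for m in ("plddt", "dist", "rog")]


def merge_metrics_with_clusters(dict_df: pd.DataFrame, cluster_df: pd.DataFrame) -> pd.DataFrame:
    """Attach a cluster ID to each row of AF2 metrics (hypothetical helper)."""
    missing = [c for c in ["name", "sequence", *METRIC_COLS] if c not in dict_df.columns]
    if missing:
        raise ValueError(f"metrics table is missing columns: {missing}")
    missing = [c for c in ["name", "sequence", "cluster"] if c not in cluster_df.columns]
    if missing:
        raise ValueError(f"cluster table is missing columns: {missing}")
    # validate="one_to_one" raises if a peptide name is duplicated on either side.
    return dict_df.merge(
        cluster_df[["name", "cluster"]], on="name", how="inner", validate="one_to_one"
    )


# Made-up demo rows standing in for clusters/cd-0.5.csv and data/dict.csv.
cluster_df = pd.DataFrame(
    {"name": ["pep1", "pep2", "pep3"],
     "sequence": ["ACDKL", "ACDKV", "WYFGH"],
     "cluster": [0, 0, 1]}
)
dict_df = pd.DataFrame(
    {"name": ["pep1", "pep2"],
     "sequence": ["ACDKL", "ACDKV"],
     **{c: [0.0, 1.0] for c in METRIC_COLS}}
)
merged = merge_metrics_with_clusters(dict_df, cluster_df)
```

With the real files, the two frames would come from `pd.read_csv("data/dict.csv")` and `pd.read_csv("clusters/cd-0.5.csv")`.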
For the offline replay workflow, these metrics are converted into a binary binder label and then written to data.csv in the standard format:
- name
- cluster
- label
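The exact labeling rule is not stated here, so the following is only a sketch of one plausible conversion: a peptide counts as a binder if any of the five models passes both a pLDDT floor and a distance ceiling. The thresholds `PLDDT_MIN` and `DIST_MAX`, the "any model" rule, and the choice to ignore the `rog_*` columns are all assumptions for illustration, not the criteria used by replay.py.

```python
import pandas as pd

N_MODELS = 5
PLDDT_MIN = 80.0  # hypothetical threshold, not taken from replay.py
DIST_MAX = 10.0   # hypothetical threshold, not taken from replay.py


def binder_label(df: pd.DataFrame) -> pd.Series:
    """Return 1 if any model passes both thresholds, else 0 (assumed rule)."""
    plddt = df[[f"plddt_{i}" for i in range(N_MODELS)]].to_numpy()
    dist = df[[f"dist_{i}" for i in range(N_MODELS)]].to_numpy()
    # A model passes only if it meets both criteria at once.
    passes = (plddt >= PLDDT_MIN) & (dist <= DIST_MAX)
    return pd.Series(passes.any(axis=1).astype(int), index=df.index, name="label")


# Made-up demo rows: every model fails by default.
demo = pd.DataFrame({"name": ["pep1", "pep2"], "cluster": [0, 1]})
for i in range(N_MODELS):
    demo[f"plddt_{i}"] = [50.0, 50.0]
    demo[f"dist_{i}"] = [20.0, 20.0]
# pep1: model 2 passes both criteria, so it is labeled a binder.
demo.loc[0, ["plddt_2", "dist_2"]] = [90.0, 5.0]
# pep2: the criteria are met by different models, so no single model passes.
demo.loc[1, "plddt_0"] = 90.0
demo.loc[1, "dist_1"] = 5.0

demo["label"] = binder_label(demo)
data = demo[["name", "cluster", "label"]]  # same column layout as data.csv
```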
The ./example folder contains the outputs from:
    python replay.py --config data/config.json --library clusters/cd-0.5.csv --dict_path data/dict.csv --data_out example/data.csv --metrics_out example/af2_metrics.csv --out_prefix example/replay

This produces:
- example/data.csv
- example/af2_metrics.csv
- example/replay_seed.csv
- example/replay_selections.csv
- example/replay_curve.csv
- example/replay_summary.json
- example/replay_plot.png
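To give a feel for what a cluster-aware Thompson sampling replay over data.csv can look like, here is a minimal Beta-Bernoulli sketch in which each cluster is treated as an arm. The function `replay_thompson`, the Beta(1, 1) prior, the random within-cluster pick, and the output column names are assumptions for this example; the selection logic and file schemas of replay.py may differ.

```python
import numpy as np
import pandas as pd


def replay_thompson(data: pd.DataFrame, n_rounds: int, seed: int = 0) -> pd.DataFrame:
    """Replay selections over known labels, one peptide per round (sketch)."""
    rng = np.random.default_rng(seed)
    # Shuffled pool of not-yet-selected row indices for each cluster.
    pools = {c: list(rng.permutation(g.index.to_numpy())) for c, g in data.groupby("cluster")}
    alpha = {c: 1.0 for c in pools}  # Beta prior: 1 + observed binders
    beta = {c: 1.0 for c in pools}   # Beta prior: 1 + observed non-binders
    records, hits = [], 0
    for step in range(min(n_rounds, len(data))):
        # Only clusters that still have unscreened peptides can be chosen.
        open_clusters = [c for c in pools if pools[c]]
        samples = [rng.beta(alpha[c], beta[c]) for c in open_clusters]
        chosen = open_clusters[int(np.argmax(samples))]
        idx = pools[chosen].pop()
        label = int(data.loc[idx, "label"])  # reveal the precomputed label
        alpha[chosen] += label
        beta[chosen] += 1 - label
        hits += label
        records.append(
            {"step": step + 1, "name": data.loc[idx, "name"],
             "cluster": chosen, "label": label, "cumulative_hits": hits}
        )
    return pd.DataFrame(records)


# Made-up demo data in the name/cluster/label layout of data.csv.
data = pd.DataFrame(
    {"name": [f"pep{i}" for i in range(6)],
     "cluster": [0, 0, 0, 1, 1, 1],
     "label": [1, 1, 0, 0, 0, 0]}
)
curve = replay_thompson(data, n_rounds=6, seed=0)
```

Because the replay only reveals labels that already exist, running it for as many rounds as there are peptides always recovers every binder; the interesting quantity is how early in the curve the hits arrive.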
Required Python packages:
- numpy
- pandas
- matplotlib