ARC is a single-model quality assessment method for protein complexes. It uses an ensemble of graph neural networks to assign per-residue scores on modeled chain-chain interfaces, with emphasis on interface-residue identification and local interface quality.
This repository contains the inference code, CASP16 evaluation pipeline, packaged asset workflow, and manuscript figure and table generation used for the ARC study.
The minimal workflow to reproduce ARC results:

```shell
# 1. Install environment
pixi install
pixi run smoke

# 2. Fetch packaged data and predictions
pixi run python -u scripts/assets/fetch_assets.py

# 3. Run evaluation (uses packaged predictions)
pixi run eval-full
```

To run inference instead of using packaged predictions:

```shell
pixi run run-batch
pixi run eval-full
```

ARC produces two complementary outputs per model:

- `LOCAL.json`: per-residue interface quality scores, used for residue-level evaluation and interface discrimination.
- `QSCORE.json`: global interface quality scores, used for model ranking and CASP-style evaluation.
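As a minimal sketch of consuming these two files, one could do the following (the JSON schemas in the demo are assumptions for illustration, not the actual ARC formats):

```python
import json
import tempfile
from pathlib import Path

def load_scores(model_dir: Path):
    """Parse both score files for one predictor directory."""
    local = json.loads((model_dir / "LOCAL.json").read_text())
    qscore = json.loads((model_dir / "QSCORE.json").read_text())
    return local, qscore

# Demo against a throwaway directory with a *hypothetical* schema:
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "LOCAL.json").write_text(json.dumps({"A:1": 0.83, "A:2": 0.71}))
    (d / "QSCORE.json").write_text(json.dumps({"qscore": 0.68}))
    local, qscore = load_scores(d)
    best_residue = max(local, key=local.get)  # highest-scoring interface residue
```

Adapt the key names to the real files once you have inspected one prediction directory.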
The ARC/ directory contains the ensemble prediction used in the manuscript, while ARC_<suffix>/ directories correspond to individual GNN components.
Clone the ARC repository and enter it (all commands assume you are at the repository root):

```shell
git clone https://github.com/zwang-bioinformatics/ARC.git
cd ARC
```

Create the ARC environment with pixi (reads `pixi.toml` in this directory):

```shell
pixi install
pixi run smoke
```

If pixi is not on your PATH, install it from https://pixi.sh/latest/installation/ or use `~/.pixi/bin/pixi`.
> [!IMPORTANT]
> All commands below are meant to be run from the repository root.
Populate the repository data layout.
You have two options:
- Recommended (reproducible): download packaged assets
- Manual: prepare inputs following the required directory structure
The recommended approach is to fetch the packaged tarballs:

```shell
pixi run python -u scripts/assets/fetch_assets.py
```

This extracts each archive (Zstandard `.tar.zst`) at the repository root:
`arc_eval_inputs_core.tar.zst`:

```
data/
├── raw_16/
├── casp16_ema_reference_results/
├── casp16_targets/
├── casp_model_scores.csv
├── casp16_approx_target_sizes.json
├── ema_local_scores_with_lddt_added_mdl_contacts.csv
└── target_margin_scores.csv
```

`arc_graph_data_casp16.tar.zst`:

```
data/
└── CASP16/
    └── <target>/
        └── <model>/
            ├── meta.json
            └── data.st
```

`predictions_apollo.tar.zst`:

```
outputs/
└── predictions/
    └── CASP16/
        └── <target>/
            └── APOLLO/
                └── LOCAL.json
```

`predictions_arc.tar.zst`:

```
outputs/
└── predictions/
    └── CASP16/
        └── <target>/
            ├── ARC/
            │   ├── LOCAL.json
            │   └── QSCORE.json
            └── ARC_<suffix>/
                ├── LOCAL.json
                └── QSCORE.json
```
The APOLLO predictions are included as a baseline for comparison during evaluation.
In `scripts/assets/fetch_assets.py`, each archive is controlled by an `ENABLE_*` flag (all default to `True`). Set a flag to `False` to skip specific downloads.
If you prepare inputs manually, mirror the following layout:

```
data/
├── CASP16/
│   └── <target>/
│       └── <model>/
│           ├── meta.json
│           └── data.st
├── raw_16/
├── casp16_ema_reference_results/
│   └── <model>_<target>.json
├── casp16_targets/
│   └── <target>.pdb
├── casp_model_scores.csv
├── casp16_approx_target_sizes.json
├── ema_local_scores_with_lddt_added_mdl_contacts.csv
└── target_margin_scores.csv
```
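Before running evaluation, it can help to sanity-check this layout. A small sketch (the entry names are taken from the tree above; the helper itself is not part of the repository):

```python
import tempfile
from pathlib import Path

# Required top-level entries under data/, per the layout above.
REQUIRED = [
    "CASP16",
    "raw_16",
    "casp16_ema_reference_results",
    "casp16_targets",
    "casp_model_scores.csv",
    "casp16_approx_target_sizes.json",
    "ema_local_scores_with_lddt_added_mdl_contacts.csv",
    "target_margin_scores.csv",
]

def missing_entries(data_root: Path) -> list[str]:
    """Return the required entries that are absent under data_root."""
    return [name for name in REQUIRED if not (data_root / name).exists()]

# Demo: an empty directory is missing everything.
with tempfile.TemporaryDirectory() as tmp:
    missing = missing_entries(Path(tmp))
```

Run it against `data/` at the repository root; an empty result means all required entries are present.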
For predictions, either use the packaged layouts above or run inference to populate:

```
outputs/predictions/CASP16/
```
> [!NOTE]
> Evaluation requires predictions in `outputs/predictions/`.
> - If you fetched packaged predictions, you can run evaluation directly.
> - If not, you must run inference before evaluation.
Single target:

```shell
pixi run predict -- -t H1202
```

Batch over all targets:

```shell
pixi run run-batch
```

Expected input:

```
data/CASP16/<target>/<model>/
├── meta.json
└── data.st
```
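A hedged helper to enumerate runnable inputs under this layout (illustrative only; not part of the repository, and the demo's `model_1` name is hypothetical):

```python
import tempfile
from pathlib import Path

def list_models(root: Path):
    """Yield (target, model) pairs that have both required input files."""
    for target in sorted(p for p in root.iterdir() if p.is_dir()):
        for model in sorted(p for p in target.iterdir() if p.is_dir()):
            if (model / "meta.json").exists() and (model / "data.st").exists():
                yield target.name, model.name

# Demo on a throwaway tree mirroring data/CASP16/<target>/<model>/:
with tempfile.TemporaryDirectory() as tmp:
    m = Path(tmp) / "H1202" / "model_1"
    m.mkdir(parents=True)
    (m / "meta.json").write_text("{}")
    (m / "data.st").write_text("")
    pairs = list(list_models(Path(tmp)))
```

Point `list_models` at `data/CASP16` to see which target/model pairs are ready for inference.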
Outputs:

```
outputs/predictions/CASP16/<target>/
├── ARC/
│   ├── LOCAL.json
│   └── QSCORE.json
├── ARC_GENConv/
├── ARC_GINEConv/
├── ARC_GLFP/
├── ARC_GeneralConv/
├── ARC_PDNConv/
├── ARC_ResGatedGraphConv/
└── ARC_TransConv/
```
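To compare the individual components against the ensemble, the per-directory `QSCORE.json` files can be collected like this (a sketch; the single-key schema in the demo is an assumption, not the actual ARC format):

```python
import json
import tempfile
from pathlib import Path

def collect_qscores(target_dir: Path) -> dict[str, dict]:
    """Map each predictor directory (ARC, ARC_<suffix>) to its parsed QSCORE.json."""
    scores = {}
    for sub in sorted(target_dir.glob("ARC*")):
        qfile = sub / "QSCORE.json"
        if qfile.is_file():
            scores[sub.name] = json.loads(qfile.read_text())
    return scores

# Demo with two throwaway predictor directories:
with tempfile.TemporaryDirectory() as tmp:
    for name, q in [("ARC", 0.70), ("ARC_GENConv", 0.66)]:
        d = Path(tmp) / name
        d.mkdir()
        (d / "QSCORE.json").write_text(json.dumps({"qscore": q}))
    qscores = collect_qscores(Path(tmp))
```

Point it at `outputs/predictions/CASP16/<target>/` to tabulate ensemble versus per-component scores for one target.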
The ARC/ directory contains the ensemble prediction used in the manuscript. The ARC_<suffix>/ directories correspond to individual GNN components.
Run full CASP16 evaluation and manuscript outputs:

```shell
pixi run eval-full
```

Generate figures:

```shell
pixi run manuscript-figures
```

Generate tables:

```shell
pixi run manuscript-tables
```

Results layout:

```
outputs/
├── predictions/
│   └── CASP16/<target>/...
└── results/
    ├── eval/
    ├── logs/
    └── local_results/
        ├── arc_ensemble/
        ├── manuscript_figures/
        ├── manuscript_tables/
        ├── per_target_analysis/
        └── pooled_analysis/
```
If you use ARC, please cite: [TBD]

```bibtex
@article{shrestha2026arc,
  title={ARC: Assessment of Interface Residue Conformation using an Ensemble of Graph Neural Networks},
  author={Shrestha, Bishal and Siciliano, Andrew J. and Huang, Gabriel and Bao, Yifan and Wang, Zheng},
  journal={Proteins: Structure, Function, and Bioinformatics},
  year={2026},
  publisher={Wiley Online Library},
  status={Submitted}
}
```