Select representative molecules from a candidate set to maximize coverage of the MW × ALogP chemical space.
Given an Excel file of molecules with molecular weight (MW) and ALogP values, MolScope picks a subset that spreads as evenly as possible across the 2D chemical space. It uses greedy farthest-point initialization followed by Metropolis-Hastings simulated annealing to maximize the minimum pairwise distance between selected molecules.
Install in editable mode from the repository root:

    pip install -e .

Then run the CLI:

    molscope molecules.xlsx -n 100

This reads `molecules.xlsx`, selects 100 representative molecules, and outputs:
- `selected_molecules.xlsx`: the selected subset
- `chemical_space.png`: coverage visualization
The same pipeline is available from Python:

    from molscope import run_pipeline

    df_selected, indices, min_dist, history = run_pipeline(
        input_file="molecules.xlsx",
        n_select=100,
    )

MolScope can also run as an MCP server for integration with Claude Code or other MCP clients:
    python molscope/server.py

The CLI accepts the following options:

| Option | Default | Description |
|---|---|---|
| `input_file` | (required) | Input Excel file |
| `-n`, `--n-select` | `100` | Number of molecules to select |
| `-i`, `--iterations` | `100000` | Metropolis-Hastings iterations |
| `-o`, `--output` | `selected_molecules.xlsx` | Output Excel path |
| `-p`, `--plot` | `chemical_space.png` | Output plot path |
| `--mw-col` | `MW` | Column name for molecular weight |
| `--alogp-col` | `ALogP` | Column name for ALogP |
| `--lambda-var` | `0.01` | Weight for spatial variance |
| `--seed` | `42` | Random seed |
| `-q`, `--quiet` | off | Suppress progress output |

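To make the option table concrete, here is a minimal `argparse` sketch that mirrors its flags and defaults. It is only an illustration built from the table; `build_parser` is a hypothetical name, and MolScope's actual CLI code may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the option table (not MolScope's own code)."""
    p = argparse.ArgumentParser(prog="molscope")
    p.add_argument("input_file", help="Input Excel file")
    p.add_argument("-n", "--n-select", type=int, default=100,
                   help="Number of molecules to select")
    p.add_argument("-i", "--iterations", type=int, default=100000,
                   help="Metropolis-Hastings iterations")
    p.add_argument("-o", "--output", default="selected_molecules.xlsx",
                   help="Output Excel path")
    p.add_argument("-p", "--plot", default="chemical_space.png",
                   help="Output plot path")
    p.add_argument("--mw-col", default="MW",
                   help="Column name for molecular weight")
    p.add_argument("--alogp-col", default="ALogP",
                   help="Column name for ALogP")
    p.add_argument("--lambda-var", type=float, default=0.01,
                   help="Weight for spatial variance")
    p.add_argument("--seed", type=int, default=42, help="Random seed")
    p.add_argument("-q", "--quiet", action="store_true",
                   help="Suppress progress output")
    return p


# Equivalent of: molscope molecules.xlsx -n 50 --seed 7
args = build_parser().parse_args(["molecules.xlsx", "-n", "50", "--seed", "7"])
print(args.n_select, args.iterations, args.lambda_var, args.seed, args.quiet)
```

Options that are not passed fall back to the defaults listed in the table.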
The input is an Excel file (`.xlsx`) with at least two numeric columns, one for MW and one for ALogP. Rows with a missing value in either column are dropped automatically.
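The row-dropping behaviour can be pictured with a small pandas sketch. The toy DataFrame stands in for the result of `pd.read_excel("molecules.xlsx")`, and the numeric coercion step is an assumption added for illustration; MolScope's own loading code may differ.

```python
import pandas as pd

# Toy stand-in for a DataFrame loaded with pd.read_excel("molecules.xlsx").
df = pd.DataFrame({
    "Name": ["mol_a", "mol_b", "mol_c", "mol_d"],
    "MW": [180.2, None, 342.3, 250.1],
    "ALogP": [1.3, 0.5, None, 2.8],
})

mw_col, alogp_col = "MW", "ALogP"  # defaults of --mw-col / --alogp-col

# Assumption: coerce to numeric so stray text cells become NaN as well.
for col in (mw_col, alogp_col):
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Keep only rows where both descriptors are present.
clean = df.dropna(subset=[mw_col, alogp_col]).reset_index(drop=True)
print(clean["Name"].tolist())
```

Only rows with both descriptors present survive, so `mol_b` and `mol_c` are removed here.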
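Given $N$ candidate molecules, each described by its MW and ALogP, the goal is to select a subset $S$ of $n$ molecules (set by `--n-select`).

Normalization. Raw descriptors are scaled to a common range so that neither MW nor ALogP dominates the distance.

Each molecule $i$ is then represented as a point $x_i$ in the normalized 2D space.

Objective. The optimization maximizes:

$$f(S) = \min_{i, j \in S,\ i \neq j} d(x_i, x_j) + \lambda \, \mathrm{Var}(S)$$

where $d(x_i, x_j)$ is the distance between normalized points, $\mathrm{Var}(S)$ is the spatial variance of the selected points, and $\lambda$ is the weight set by `--lambda-var`.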
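A minimal sketch of the normalization and objective follows. Several details are assumptions, not taken from MolScope: min-max scaling to $[0, 1]$, Euclidean distance, and "spatial variance" computed as the sum of per-axis population variances. The names `min_max_scale` and `objective` are hypothetical.

```python
import math
from itertools import combinations


def min_max_scale(values):
    """Scale a list of numbers to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]


def objective(points, subset, lambda_var=0.01):
    """Min pairwise distance plus lambda_var times spatial variance (assumed form)."""
    pts = [points[i] for i in subset]
    min_dist = min(math.dist(p, q) for p, q in combinations(pts, 2))
    # Assumed "spatial variance": sum of per-axis population variances.
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    var = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in pts) / n
    return min_dist + lambda_var * var


mw = [100.0, 200.0, 300.0, 500.0]
alogp = [-1.0, 0.0, 1.0, 3.0]
points = list(zip(min_max_scale(mw), min_max_scale(alogp)))

spread = objective(points, [0, 2, 3])     # covers both extremes
clustered = objective(points, [0, 1, 2])  # three neighbouring points
print(spread > clustered)
```

The spread-out subset scores higher on both terms, which is the behaviour the objective is meant to reward.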
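Step 1: Greedy farthest-point initialization. Start with the pair $(i^*, j^*)$ of maximum distance. Iteratively add the point with the largest minimum distance to the current set:

$$k^* = \arg\max_{k \notin S} \min_{j \in S} d(x_k, x_j)$$

Here $d(x_k, x_j)$ is the distance between molecules $k$ and $j$ in the normalized space.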
This gives a 2-approximation to the maximin problem and serves as the initial solution.
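The greedy initialization can be sketched in a few lines of plain Python. This version assumes Euclidean distance on already-normalized points and `n_select` no larger than the number of distinct points; `farthest_point_init` is a hypothetical name, not MolScope's function.

```python
import math
from itertools import combinations


def farthest_point_init(points, n_select):
    """Greedy farthest-point selection; returns indices into `points`."""
    # Seed with the most distant pair (O(N^2) scan, fine for a sketch).
    i, j = max(combinations(range(len(points)), 2),
               key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
    selected = [i, j]
    # nearest[k] = distance from point k to its closest selected point.
    nearest = [min(math.dist(p, points[i]), math.dist(p, points[j]))
               for p in points]
    while len(selected) < n_select:
        # Pick the point whose closest selected neighbour is farthest away.
        k = max(range(len(points)), key=nearest.__getitem__)
        selected.append(k)
        nearest = [min(d, math.dist(p, points[k]))
                   for d, p in zip(nearest, points)]
    return selected


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0),
          (0.0, 1.0), (0.9, 1.0), (0.5, 0.4)]
print(farthest_point_init(points, 4))  # → [0, 2, 3, 5]
```

The near-duplicates at indices 1 and 4 are skipped because each sits next to an already selected point.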
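Step 2: Metropolis-Hastings refinement with simulated annealing. At each iteration $t$:

- Propose a swap: randomly pick $a \in S$ and $b \notin S$, and let $S' = (S \setminus \{a\}) \cup \{b\}$.
- Compute $\Delta = f(S') - f(S)$.
- Accept with probability:

$$P(\text{accept}) = \min\left(1,\ e^{\Delta / T_t}\right)$$

where $T_t$ is the temperature at iteration $t$.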
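As an illustration of the acceptance step, the sketch below assumes the standard Metropolis criterion for a maximization problem: improvements are always accepted, and a worse move is accepted with probability $e^{\Delta / T}$. The helper `accept_swap` is hypothetical.

```python
import math
import random


def accept_swap(delta, temperature, rng=random):
    """Metropolis rule for a maximization problem."""
    if delta >= 0:
        return True  # improvements (and ties) are always accepted
    # Worse moves are accepted with probability exp(delta / T) < 1.
    return rng.random() < math.exp(delta / temperature)


rng = random.Random(0)
trials = 10_000
# Same slightly worse move, evaluated at a high and a low temperature.
hot = sum(accept_swap(-0.05, 1.0, rng) for _ in range(trials)) / trials
cold = sum(accept_swap(-0.05, 0.01, rng) for _ in range(trials)) / trials
print(f"hot: {hot:.2f}, cold: {cold:.3f}")
```

At high temperature the worse move is accepted about 95% of the time ($e^{-0.05}$), while at low temperature it is accepted less than 1% of the time ($e^{-5}$), so the search becomes increasingly greedy as it cools.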
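The temperature follows an exponential decay schedule:

$$T_t = T_0 \, \alpha^{t}$$

where $T_0$ is the initial temperature and $0 < \alpha < 1$ is the decay factor.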
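Putting the swap proposal, acceptance rule, and cooling schedule together gives a loop like the one below. It is a simplified sketch: the objective is the minimum pairwise Euclidean distance only (the variance term is omitted for brevity), and `t0`, `alpha`, and the iteration count are illustrative values, not MolScope's defaults. `min_pairwise` and `anneal` are hypothetical names.

```python
import math
import random
from itertools import combinations


def min_pairwise(points, subset):
    """Smallest Euclidean distance between any two selected points."""
    return min(math.dist(points[i], points[j])
               for i, j in combinations(subset, 2))


def anneal(points, init, iterations=2000, t0=0.1, alpha=0.995, seed=42):
    """Swap-based simulated annealing; returns (best subset, best score)."""
    rng = random.Random(seed)
    current = list(init)
    chosen = set(init)
    outside = [i for i in range(len(points)) if i not in chosen]
    f_cur = min_pairwise(points, current)
    best, f_best = list(current), f_cur
    for t in range(iterations):
        temp = max(t0 * alpha ** t, 1e-12)  # exponential decay, floored
        a = rng.randrange(len(current))     # position of a in S
        b = rng.randrange(len(outside))     # position of b outside S
        current[a], outside[b] = outside[b], current[a]      # tentative swap
        f_new = min_pairwise(points, current)
        delta = f_new - f_cur
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            f_cur = f_new                   # accept the swap
            if f_cur > f_best:
                best, f_best = list(current), f_cur
        else:
            current[a], outside[b] = outside[b], current[a]  # undo the swap
    return best, f_best


rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(60)]
init = list(range(8))  # deliberately naive start for the demo
best, f_best = anneal(points, init)
print(f"{min_pairwise(points, init):.3f} -> {f_best:.3f}")
```

Tracking the best subset seen so far means the returned score is never worse than the starting one, even though individual accepted moves may be downhill.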
The achieved
License: MIT