Command-line tool to search the ChEMBL database for small molecules and export results to Excel.
    pip install -e .

Search by target keyword: find molecules related to cyclin dependent kinase inhibitors:

    chemblfind --text-search "cyclin dependent kinase inhibitors" --min-relevance 0.75 --top-result 1000

Search by structural similarity: find molecules similar to a given SMILES:

    chemblfind --similarity-search "O=Nc1c(-c2c(O)[nH]c3ccccc23)[nH]c2ccccc12" --threshold 40 --top-result 100

Results are saved to `chemblfind_result_<timestamp>.xlsx`.
Read multiple queries from an Excel file and search them in one run. Results are deduplicated by ChEMBL ID.
    # Batch text search (reads first column by default)
    chemblfind --text-search-file targets.xlsx --top-result 100
    # Specify which column to read
    chemblfind --text-search-file targets.xlsx --column "Target Name" --top-result 100
    # Batch similarity search
    chemblfind --similarity-search-file compounds.xlsx --column "SMILES" --threshold 70 --top-result 100

Batch mode generates two output files:
| File | Content |
|---|---|
| `chemblfind_result_<timestamp>.xlsx` | Results sheet (all molecules, with Source Query column) + Summary sheet |
| `chemblfind_summary_<timestamp>.xlsx` | Standalone summary: each query, hit count, and status |
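
Deduplication by ChEMBL ID in batch mode can be pictured with the minimal sketch below. It is an illustration, not chemblfind's actual code: the function name `dedupe_by_chembl_id` is hypothetical, the rows are placeholders rather than real search results, and keeping the first row seen for each ID is an assumption, since the rule the tool applies when two queries return the same molecule is not stated here.

```python
def dedupe_by_chembl_id(rows):
    """Keep the first row seen for each ChEMBL ID, preserving input order."""
    seen = {}
    for row in rows:
        # setdefault stores the row only if this ID has not been seen yet
        seen.setdefault(row["ChEMBL ID"], row)
    return list(seen.values())


# Placeholder rows (hypothetical IDs and queries, not real search results)
rows = [
    {"Source Query": "query 1", "ChEMBL ID": "CHEMBL_A"},
    {"Source Query": "query 2", "ChEMBL ID": "CHEMBL_B"},
    {"Source Query": "query 2", "ChEMBL ID": "CHEMBL_A"},
]

unique_rows = dedupe_by_chembl_id(rows)
print([row["ChEMBL ID"] for row in unique_rows])
```

Under this assumption a molecule found by several queries appears once, attributed to the first query that returned it.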
| Option | Default | Description |
|---|---|---|
| `--text-search TEXT` | (none) | Search by target keyword (single query) |
| `--similarity-search TEXT` | (none) | Search by SMILES similarity (single query) |
| `--text-search-file PATH` | (none) | Excel file with target keywords (batch mode) |
| `--similarity-search-file PATH` | (none) | Excel file with SMILES strings (batch mode) |
| `--column TEXT` | First column | Column name to read from the Excel file |
| `--threshold INT` | 70 | Similarity threshold, 40–100. Only used with similarity search |
| `--top-result INT` | 100 | Max molecules to return. In batch mode, this applies per query |
| `--min-relevance FLOAT` | 0.5 | Minimum keyword match ratio (0–1) for filtering targets. Only used with text search |
| `--output TEXT` | Auto-generated | Custom output file name |
The four search options are mutually exclusive; use exactly one per run.
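
The "exactly one search option" rule can be illustrated with Python's standard `argparse` module. This is a sketch of the constraint only, not chemblfind's actual parser: the program name `chemblfind-sketch` and the function `build_parser` are hypothetical, and only the four search options plus `--top-result` are modelled.

```python
import argparse


def build_parser():
    """Build a parser that requires exactly one of the four search options."""
    parser = argparse.ArgumentParser(prog="chemblfind-sketch")
    # required=True rejects runs with no search option;
    # the group itself rejects runs with more than one.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--text-search")
    group.add_argument("--similarity-search")
    group.add_argument("--text-search-file")
    group.add_argument("--similarity-search-file")
    parser.add_argument("--top-result", type=int, default=100)
    return parser


parser = build_parser()

# One search option: accepted
args = parser.parse_args(["--text-search", "kinase", "--top-result", "10"])
print(args.text_search, args.top_result)

# Two search options: argparse reports an error and exits
rejected = False
try:
    parser.parse_args(["--text-search", "kinase", "--similarity-search", "CCO"])
except SystemExit:
    rejected = True
print("conflict rejected:", rejected)
```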
| Column | Description |
|---|---|
| Source Query | The query that produced this result (batch mode only) |
| ChEMBL ID | ChEMBL molecule identifier |
| Name | Preferred compound name |
| SMILES | Canonical SMILES |
| MW | Molecular weight |
| ALogP | Calculated LogP |
| HBA | Hydrogen bond acceptors |
| HBD | Hydrogen bond donors |
| PSA | Polar surface area |
| RO5 Violations | Lipinski Rule-of-5 violations |
| Similarity | Similarity score (similarity search only) |
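
Once the results are loaded, the columns above lend themselves to simple post-filtering. The sketch below uses plain dictionaries keyed by the column names as stand-ins for spreadsheet rows. The function `filter_rows`, its cut-off values, and the sample rows are all hypothetical, and it assumes `MW` and `RO5 Violations` hold numeric values in every row.

```python
def filter_rows(rows, max_violations=0, max_mw=500.0):
    """Return rows with at most max_violations Rule-of-5 violations and MW <= max_mw."""
    return [
        row
        for row in rows
        if row["RO5 Violations"] <= max_violations and row["MW"] <= max_mw
    ]


# Placeholder rows (made-up values, not real ChEMBL data)
rows = [
    {"ChEMBL ID": "CHEMBL_A", "MW": 320.4, "RO5 Violations": 0},
    {"ChEMBL ID": "CHEMBL_B", "MW": 712.9, "RO5 Violations": 2},
]

kept = filter_rows(rows)
print([row["ChEMBL ID"] for row in kept])
```

The same condition carries over directly to a table library if the workbook is loaded that way; the dictionary form is used here only to keep the example free of dependencies.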