AIB001/ChEMBLFind

chemblfind

Command-line tool to search the ChEMBL database for small molecules and export results to Excel.

Installation

From the root of a cloned copy of the repository, install in editable mode:

pip install -e .

Quick Start

Search by target keyword — find molecules related to cyclin dependent kinase inhibitors:

chemblfind --text-search "cyclin dependent kinase inhibitors" --min-relevance 0.75 --top-result 1000
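
This README does not spell out how the --min-relevance ratio is computed. One plausible reading, shown here purely as an assumption, is the fraction of query words that also appear in a target's name. The function `keyword_match_ratio` and the sample target names are hypothetical and are not taken from chemblfind's source.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_match_ratio(query: str, target_name: str) -> float:
    """Fraction of distinct query words found in the target name (assumed definition)."""
    query_words = _words(query)
    if not query_words:
        return 0.0
    return len(query_words & _words(target_name)) / len(query_words)


query = "cyclin dependent kinase inhibitors"
# Hypothetical target names, for illustration only.
targets = ["Cyclin-dependent kinase 2", "Tyrosine-protein kinase ABL1"]
kept = [t for t in targets if keyword_match_ratio(query, t) >= 0.75]
print(kept)
```

Under this reading, a higher --min-relevance keeps only targets whose names cover most of the query words, which would shrink the set of molecules returned.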

Search by structural similarity — find molecules similar to a given SMILES:

chemblfind --similarity-search "O=Nc1c(-c2c(O)[nH]c3ccccc23)[nH]c2ccccc12" --threshold 40 --top-result 100
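
To give a feel for what a threshold of 40 means, the sketch below computes a Tanimoto score on a 0–100 scale between two toy fingerprints. It assumes the threshold refers to Tanimoto similarity, which is how ChEMBL's similarity search is generally described. The bit sets are made up, and real fingerprints would be computed from the molecular structures rather than written by hand.

```python
def tanimoto_percent(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprint bit sets, scaled to 0-100."""
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


# Toy fingerprints: indices of the bits that are set (made up for illustration).
query_fp = {1, 4, 7, 9, 12}
candidate_fp = {1, 4, 9, 15}

score = tanimoto_percent(query_fp, candidate_fp)  # 3 shared bits / 6 distinct bits
print(f"{score:.1f}")
print(score >= 40)  # would this candidate pass --threshold 40?
```

A lower threshold admits structurally more distant molecules, so it tends to return more hits at the cost of more loosely related ones.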

Results are saved to chemblfind_result_<timestamp>.xlsx.
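
Because the default file name contains a timestamp, scripts that post-process the output need a way to locate the newest file. The helper below is one way to do that. It assumes results land in the directory you pass in (the current directory by default), and `latest_result` is a hypothetical helper, not part of chemblfind.

```python
from pathlib import Path
from typing import Optional


def latest_result(directory: str = ".") -> Optional[Path]:
    """Return the most recently modified chemblfind result workbook, or None."""
    candidates = list(Path(directory).glob("chemblfind_result_*.xlsx"))
    if not candidates:
        return None
    # Pick by modification time so the timestamp format in the name does not matter.
    return max(candidates, key=lambda p: p.stat().st_mtime)


print(latest_result())
```

Passing --output with a fixed file name avoids the lookup entirely.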

Batch Search

Read multiple queries from an Excel file and search them in one run. Results are deduplicated by ChEMBL ID.

# Batch text search (reads first column by default)
chemblfind --text-search-file targets.xlsx --top-result 100

# Specify which column to read
chemblfind --text-search-file targets.xlsx --column "Target Name" --top-result 100

# Batch similarity search
chemblfind --similarity-search-file compounds.xlsx --column "SMILES" --threshold 70 --top-result 100

Batch mode generates two output files:

| File | Content |
| --- | --- |
| chemblfind_result_<timestamp>.xlsx | Results sheet (all molecules, with Source Query column) + Summary sheet |
| chemblfind_summary_<timestamp>.xlsx | Standalone summary: each query, hit count, and status |
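
The deduplication and summary steps can be pictured with the sketch below. It assumes the first query that returns a molecule is recorded as its Source Query and that hit counts are taken before deduplication. The README confirms neither detail. The function `merge_batch`, the status strings, and the placeholder IDs are all hypothetical.

```python
def merge_batch(results_by_query: dict[str, list[dict]]) -> tuple[list[dict], list[dict]]:
    """Combine per-query hits, keeping the first row seen for each ChEMBL ID."""
    merged: dict[str, dict] = {}
    summary: list[dict] = []
    for query, hits in results_by_query.items():
        for hit in hits:
            # setdefault leaves an existing entry alone, so the first query wins.
            merged.setdefault(hit["ChEMBL ID"], {"Source Query": query, **hit})
        status = "ok" if hits else "no hits"  # assumed status labels
        summary.append({"Query": query, "Hits": len(hits), "Status": status})
    return list(merged.values()), summary


# Placeholder IDs, not real ChEMBL records.
batch = {
    "kinase": [{"ChEMBL ID": "CHEMBL_A"}, {"ChEMBL ID": "CHEMBL_B"}],
    "protease": [{"ChEMBL ID": "CHEMBL_B"}],
    "empty query": [],
}
rows, summary = merge_batch(batch)
print(rows)
print(summary)
```

With these assumptions, the results sheet can contain fewer rows than the sum of the hit counts in the summary, because a molecule found by several queries is listed once.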

Options

| Option | Default | Description |
| --- | --- | --- |
| --text-search TEXT | | Search by target keyword (single query) |
| --similarity-search TEXT | | Search by SMILES similarity (single query) |
| --text-search-file PATH | | Excel file with target keywords (batch mode) |
| --similarity-search-file PATH | | Excel file with SMILES strings (batch mode) |
| --column TEXT | First column | Column name to read from the Excel file |
| --threshold INT | 70 | Similarity threshold, 40–100. Only used with similarity search |
| --top-result INT | 100 | Max molecules to return. In batch mode, this applies per query |
| --min-relevance FLOAT | 0.5 | Minimum keyword match ratio (0–1) for filtering targets. Only used with text search |
| --output TEXT | Auto-generated | Custom output file name |

The four search options are mutually exclusive — use exactly one per run.
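
For readers wrapping the tool in their own scripts, the "exactly one" rule and the documented value ranges can be mirrored with the standard library's argparse, as sketched below. This is not chemblfind's actual parser, and the README does not say which CLI library the tool uses. The `bounded` helper and the `chemblfind-sketch` program name are hypothetical.

```python
import argparse


def bounded(kind, low, high):
    """Build an argparse type that rejects values outside [low, high]."""
    def parse(text):
        value = kind(text)
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(f"must be between {low} and {high}")
        return value
    return parse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="chemblfind-sketch")
    # required=True plus mutual exclusion means exactly one search mode per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--text-search")
    mode.add_argument("--similarity-search")
    mode.add_argument("--text-search-file")
    mode.add_argument("--similarity-search-file")
    parser.add_argument("--threshold", type=bounded(int, 40, 100), default=70)
    parser.add_argument("--min-relevance", type=bounded(float, 0.0, 1.0), default=0.5)
    parser.add_argument("--top-result", type=int, default=100)
    return parser


args = build_parser().parse_args(["--text-search", "kinase", "--min-relevance", "0.75"])
print(args.text_search, args.min_relevance, args.threshold)
```

Supplying two search options, none at all, or an out-of-range --threshold makes this sketch exit with a usage error instead of running a search.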

Output Columns

| Column | Description |
| --- | --- |
| Source Query | The query that produced this result (batch mode only) |
| ChEMBL ID | ChEMBL molecule identifier |
| Name | Preferred compound name |
| SMILES | Canonical SMILES |
| MW | Molecular weight |
| ALogP | Calculated LogP |
| HBA | Hydrogen bond acceptors |
| HBD | Hydrogen bond donors |
| PSA | Polar surface area |
| RO5 Violations | Lipinski Rule-of-5 violations |
| Similarity | Similarity score (similarity search only) |
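
The RO5 Violations column relates to four of the other columns through Lipinski's standard cut-offs: MW over 500, LogP over 5, more than 10 acceptors, and more than 5 donors. The sketch below recomputes the count from those columns. The tool presumably reports the value ChEMBL already provides, so this is only for sanity checks, and the descriptor values used are invented.

```python
def ro5_violations(mw: float, alogp: float, hba: int, hbd: int) -> int:
    """Count Lipinski Rule-of-5 violations from four descriptor values."""
    checks = [
        mw > 500,    # molecular weight above 500 Da
        alogp > 5,   # calculated LogP above 5
        hba > 10,    # more than 10 hydrogen bond acceptors
        hbd > 5,     # more than 5 hydrogen bond donors
    ]
    return sum(checks)


# Invented descriptor values, for illustration only.
print(ro5_violations(mw=350.4, alogp=2.1, hba=5, hbd=2))
print(ro5_violations(mw=612.8, alogp=5.6, hba=7, hbd=3))
```

Small differences from the exported column are possible if ChEMBL derives its count from descriptor variants other than the ones shown in the sheet.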

About

ChEMBL Dataset API Toolkit
