Colin-Jay/NMREluBench

🔬 NMREluBench: Benchmarking Molecular Structure Elucidation from Experimental NMR Chemical Shifts

NMREluBench framework

🤗 Hugging Face 📊 Zenodo ⭐ GitHub

🎯 Overview

NMREluBench is a benchmark for evaluating deep learning models on the inverse problem of elucidating molecular structures from experimental 1H and 13C NMR chemical shifts. It provides standardized evaluation protocols for NMR-based structure determination, which this area of computational chemistry has so far lacked.

✅ Key Features

  • 📈 Two Core Tasks: De novo structure generation and library matching
  • 🧪 Experimental Data Focus: Real-world NMR chemical shifts from experimental measurements
  • 🔄 Comparative Analysis: Performance evaluation against computed NMR datasets
  • 📊 Standardized Metrics: Rigorous evaluation protocols for fair model comparison
  • 🌐 Open Source: Publicly available for research and development

🚀 Quick Start

📁 Dataset Structure

NMREluBench/
├── nmr_denovo/         # De novo structure generation task
│   ├── ...             # Task-specific code
│   └── README.md       # Task-specific documentation
├── nmr_retrieval/      # Library matching task
│   ├── ...             # Task-specific code
│   └── README.md       # Task-specific documentation
└── README.md           # NMREluBench documentation

📋 Tasks Overview

🧬 Task 1: De Novo Structure Generation

Generate molecular structures directly from experimental NMR chemical shifts without prior knowledge of potential candidates.

Input: 1H and 13C NMR chemical shifts
Output: Molecular structure (SMILES or SELFIES)
Evaluation: We report the overall molecular validity rate across all generated structures ($R_{\text{valid}}$), together with Top-1 and Top-10 results for structural match rate, MCES distance ($D_{\text{mces}}^{(1)}$, $D_{\text{mces}}^{(10)}$), and Tanimoto similarity ($S_{\text{tani}}^{(1)}$, $S_{\text{tani}}^{(10)}$).
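
As a rough illustration of how these metrics can be computed, the sketch below scores one query in plain Python. It is not the benchmark's own evaluation code. It assumes that each prediction has already been converted to a canonical string identifier by a cheminformatics toolkit, that invalid predictions are stored as `None`, and that fingerprints are given as sets of on-bit indices. Taking the best value among the first k predictions is also an assumption about how Top-k is aggregated. All function names are hypothetical, and MCES distance is omitted because it needs a dedicated solver.

```python
from typing import Optional, Sequence, Set


def validity_rate(predictions: Sequence[Sequence[Optional[str]]]) -> float:
    """Fraction of all generated structures that are valid (not None)."""
    flat = [p for per_query in predictions for p in per_query]
    return sum(p is not None for p in flat) / len(flat) if flat else 0.0


def top_k_match(target: str, ranked: Sequence[Optional[str]], k: int) -> bool:
    """True if the target identifier appears among the first k predictions."""
    return target in ranked[:k]


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = a | b
    # Convention chosen for this sketch: two empty fingerprints score 0.0.
    return len(a & b) / len(union) if union else 0.0


def top_k_tanimoto(target_fp: Set[int], ranked_fps: Sequence[Set[int]], k: int) -> float:
    """Best similarity among the first k predictions (assumed aggregation)."""
    return max((tanimoto(target_fp, fp) for fp in ranked_fps[:k]), default=0.0)


# Toy query: hypothetical identifiers and fingerprints, not benchmark data.
target = "CCO"
ranked = ["CCN", None, "CCO"]          # second prediction is invalid
target_fp = {1, 2, 3, 4}
ranked_fps = [{1, 2}, {1, 2, 3, 4, 5}]

r_valid = validity_rate([ranked])
match_top1 = top_k_match(target, ranked, 1)
match_top10 = top_k_match(target, ranked, 10)
s_tani_1 = top_k_tanimoto(target_fp, ranked_fps, 1)
s_tani_10 = top_k_tanimoto(target_fp, ranked_fps, 10)
```

In this toy query the correct structure is ranked third, so it counts as a Top-10 match but not a Top-1 match. Dataset-level numbers would be averages of these per-query values.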

🔍 Task 2: Library Matching

Identify the most likely molecular structure by matching experimental NMR data against a curated molecular library.

Input: 1H and 13C NMR chemical shifts
Output: Ranked list of candidate structures from library
Evaluation: We report Top-1, Top-3, and Top-10 results for structural match rate and MCES distance.
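
The Top-k match rate for library matching can be sketched as follows. The example assumes that a model assigns each library candidate a score, with higher meaning a better fit to the query spectrum, and that candidates and targets share the same string identifiers. The scoring itself, the function names, and the toy data are hypothetical and stand in for whatever the task code in `nmr_retrieval/` actually does.

```python
from typing import Dict, List, Sequence, Tuple


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Return candidate identifiers sorted by descending score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def top_k_accuracy(queries: Sequence[Tuple[str, Dict[str, float]]], k: int) -> float:
    """Fraction of queries whose true structure is ranked within the top k."""
    if not queries:
        return 0.0
    hits = sum(target in rank_candidates(scores)[:k] for target, scores in queries)
    return hits / len(queries)


# Two toy queries: (true identifier, {candidate identifier: model score}).
queries = [
    ("A", {"A": 0.9, "B": 0.5, "C": 0.1}),
    ("C", {"A": 0.8, "B": 0.7, "C": 0.6, "D": 0.1}),
]

top1 = top_k_accuracy(queries, 1)
top3 = top_k_accuracy(queries, 3)
```

Here the first query is solved at rank 1 and the second at rank 3, so the Top-1 rate is 0.5 and the Top-3 rate is 1.0.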

📚 Citation

If you use our data or code, please cite this work once the paper is published.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

This project is built upon the following open-source works, and we deeply appreciate the contributions of their authors:

✔️ Dataset & Baseline Methods

  • NMRNet - Provided the NMR spectral dataset.

✔️ Core Development Framework

  • MassSpecGym - Our code is extended from this mass spectrometry toolkit.

✔️ Model Architecture

  • CMGNet - The BART-based model was adapted from this repository.

We also thank the broader open-source community for enabling reproducible research.
