🌐 Online Database Browser: https://sunyrain.github.io/OPV2D/
Search, filter, and download the full dataset interactively — no installation required.
This repository hosts OPV2D, a continuously updated dataset for organic photovoltaic (OPV) donor–acceptor (D/A) materials. The dataset integrates previously published OPV datasets and expands them through ongoing manual verification and new data collection. It is designed to support machine learning research, molecular generation, and large-scale OPV screening.
The dataset is available in two formats:
- Active_Database.csv — The full dataset in CSV format. Each entry is annotated with a
checkedfield indicating whether it has been manually validated. - Online Browser — An interactive web interface hosted via GitHub Pages (
docs/folder), supporting search, filtering, sorting, and CSV/JSON export.
This design allows the dataset to grow organically while maintaining transparent data quality tracking.
This dataset is ideal for:
- Machine Learning: Training predictive models to estimate OPV performance (e.g., PCE, HOMO-LUMO levels) and classify OPV-active materials.
- Generative Design: Using the dataset with generative models to create new OPV materials with optimized properties.
- Materials Discovery: Accelerating the discovery of new donor-acceptor pairs for OPVs by leveraging both the active and verified databases.
If you use this dataset in your research, please cite the following paper:
@misc{qiu2025,
title={Accelerating High-Efficiency Organic Photovoltaic Discovery via Pretrained Graph Neural Networks and Generative Reinforcement Learning},
author={Jiangjie Qiu and Hou Hei Lam and Xiuyuan Hu and Wentao Li and Siwei Fu and Fankun Zeng and Hao Zhang and Xiaonan Wang},
year={2025},
eprint={2503.23766},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2503.23766},
}
@misc{lam2025cyclechemistdualprongedmachinelearning,
title={CycleChemist: A Dual-Pronged Machine Learning Framework for Organic Photovoltaic Discovery},
author={Hou Hei Lam and Jiangjie Qiu and Xiuyuan Hu and Wentao Li and Fankun Zeng and Siwei Fu and Hao Zhang and Xiaonan Wang},
year={2025},
eprint={2511.19500},
archivePrefix={arXiv},
primaryClass={cond-mat.mtrl-sci},
url={https://arxiv.org/abs/2511.19500},
}And cite the original data sources:
@dataset{li2025opv,
author = {Li, Yuan-Fang},
title = {Training dataset for predicting PCE of organic solar cells},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.16784519},
url = {https://doi.org/10.5281/zenodo.16784519},
note = {5,628 experimental records of binary OSCs with SMILES and energy levels, CC-BY 4.0}
}
@article{Min2020,
author = {Wu, Yao and Guo, Jie and Sun, Rui and Min, Jie},
title = {Machine learning for accelerating the discovery of high-performance donor/acceptor pairs in non-fullerene organic solar cells},
journal = {npj Computational Materials},
volume = {6},
number = {1},
pages = {120},
year = {2020}
}
@article{Saeki2021,
author = {Miyake, Yuta and Saeki, Akinori},
title = {Machine Learning-Assisted Development of Organic Solar Cell Materials: Issues, Analyses, and Outlooks},
journal = {The Journal of Physical Chemistry Letters},
volume = {12},
number = {51},
pages = {12391-12401},
year = {2021}
}This dataset is made publicly available for academic and research purposes. It is distributed under the terms of the MIT License, allowing free usage for research activities.
For any further inquiries or to collaborate, please contact Me at whilesunny@gmail.com