PubChem as a dynamic dataset #165

@sfluegel05

This originated from the March 26 StrOntEx meeting.

Status Quo

All ChEBI datasets extend the _DynamicDataset class. This means that they produce:

  • a data.pkl file with all molecule-label pairs in ChEBI
  • a splits.csv file matching ChEBI IDs to train/val/test subsets

This improves reproducibility: for example, a dataset split can be reproduced by supplying only the splits file and the chebai version used.
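To illustrate why keeping the split assignment separate from the data helps, here is a minimal sketch of the idea using only the Python standard library. It is not the chebai implementation: the helper names `write_dataset` and `load_split`, the column names `id` and `split`, the record layout, and the toy molecules are all assumptions made for illustration, and the real data.pkl and splits.csv may be structured differently.

```python
import csv
import pickle
import tempfile
from pathlib import Path

# Toy molecule-label records (hypothetical layout, not the real data.pkl schema).
RECORDS = [
    {"id": "mol-1", "smiles": "O", "labels": [True, False]},
    {"id": "mol-2", "smiles": "CCO", "labels": [False, True]},
    {"id": "mol-3", "smiles": "C", "labels": [False, False]},
    {"id": "mol-4", "smiles": "CC", "labels": [True, True]},
]

# Fixed assignment of IDs to subsets; this is the part that gets shared.
SPLITS = {"mol-1": "train", "mol-2": "train", "mol-3": "validation", "mol-4": "test"}


def write_dataset(out_dir, records, splits):
    """Store all records once (data.pkl) and the split assignment separately (splits.csv)."""
    out_dir = Path(out_dir)
    with open(out_dir / "data.pkl", "wb") as f:
        pickle.dump(records, f)
    with open(out_dir / "splits.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "split"])  # hypothetical column names
        for mol_id, split_name in splits.items():
            writer.writerow([mol_id, split_name])


def load_split(out_dir, split_name):
    """Rebuild one subset from data.pkl using only the IDs listed in splits.csv."""
    out_dir = Path(out_dir)
    with open(out_dir / "data.pkl", "rb") as f:
        records = pickle.load(f)
    with open(out_dir / "splits.csv", newline="") as f:
        wanted = {row["id"] for row in csv.DictReader(f) if row["split"] == split_name}
    # Preserve the record order of data.pkl so the result is deterministic.
    return [record for record in records if record["id"] in wanted]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_dataset(tmp, RECORDS, SPLITS)
        for name in ("train", "validation", "test"):
            print(name, [record["id"] for record in load_split(tmp, name)])
```

Because the subsets are derived purely from the stored ID-to-split mapping, anyone who regenerates the same data.pkl and receives the same splits.csv ends up with identical subsets, with no dependence on random seeds.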

Goal

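Implement the same for PubChem, i.e. have the PubChem datasets extend _DynamicDataset so that they also produce a data.pkl file and a splits.csv file.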
