Refactor: Generalize dataset base classes & consolidate dynamic splitting logic #122



**Description:**
The **dynamic splitting** code is currently duplicated: `chebi.py` and the data class in the proteins repo each carry their own, effectively identical, implementation. Any fix or change to the splitting logic therefore has to be made in both places.

**Proposed changes:**

1. **Move common code to a base class** (e.g., `DynamicDataset`) that encapsulates the shared dynamic splitting logic.

   * Both ChEBI and protein dataset classes should inherit from this base class.
   * This will centralize changes and make maintenance easier.

2. **Refactor dataset hierarchy to be more generic**:

   * Certain hyperparameters that are specific to ChEBI, such as

     ```python
     chebi_version: int = 200
     ```

     which is currently defined in `XYBaseDataModule`, should be pushed down into a **ChEBI-specific base class** instead of living in the generic base.

3. **Outcome:**

   * Eliminate duplicate code between `chebi.py` and the proteins repo.
   * Improve maintainability by isolating dataset-specific configurations.
   * Make it easier to introduce new datasets without rewriting the splitting logic.
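The proposed hierarchy could look roughly like the sketch below. It is only an illustration under stated assumptions: `ChEBIDataset`, `ProteinDataset`, `load_ids`, `get_dynamic_splits`, and the `train_frac` / `val_frac` / `seed` / `batch_size` parameters are hypothetical names, and a plain seeded random split stands in for whatever the real dynamic splitting logic does. Only `XYBaseDataModule`, `DynamicDataset`, and `chebi_version` come from this issue.

```python
import random
from abc import ABC, abstractmethod
from typing import Dict, List


class XYBaseDataModule(ABC):
    """Generic base: only dataset-agnostic settings live here (no chebi_version)."""

    def __init__(self, batch_size: int = 32):  # batch_size is an illustrative parameter
        self.batch_size = batch_size


class DynamicDataset(XYBaseDataModule, ABC):
    """Holds the splitting logic once, so ChEBI and protein classes can share it."""

    def __init__(self, train_frac: float = 0.8, val_frac: float = 0.1,
                 seed: int = 42, **kwargs):
        super().__init__(**kwargs)
        self.train_frac = train_frac
        self.val_frac = val_frac
        self.seed = seed

    @abstractmethod
    def load_ids(self) -> List[str]:
        """Return the identifiers of all samples (dataset-specific)."""

    def get_dynamic_splits(self) -> Dict[str, List[str]]:
        # Stand-in for the real logic: a reproducible random split.
        ids = sorted(self.load_ids())
        random.Random(self.seed).shuffle(ids)
        n_train = int(len(ids) * self.train_frac)
        n_val = int(len(ids) * self.val_frac)
        return {
            "train": ids[:n_train],
            "validation": ids[n_train:n_train + n_val],
            "test": ids[n_train + n_val:],
        }


class ChEBIDataset(DynamicDataset):
    """ChEBI-specific base: the only place that knows about chebi_version."""

    def __init__(self, chebi_version: int = 200, **kwargs):
        super().__init__(**kwargs)
        self.chebi_version = chebi_version

    def load_ids(self) -> List[str]:
        return [f"CHEBI:{i}" for i in range(20)]  # dummy data for the sketch


class ProteinDataset(DynamicDataset):
    """Protein dataset: inherits splitting, carries no ChEBI settings."""

    def load_ids(self) -> List[str]:
        return [f"P{i:05d}" for i in range(20)]  # dummy data for the sketch
```

With this layout, a new dataset would only need to implement `load_ids` (or its real equivalent) to get the splitting behaviour, and `chebi_version` would no longer appear on non-ChEBI classes.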


