DAL-Toolbox: A PyTorch Toolbox for Deep Active Learning Research

This toolbox is a modular framework designed to facilitate the implementation and evaluation of active learning (AL) workflows in PyTorch. It includes implementations for the following publications:

| Paper Title | Venue | Code |
|---|---|---|
| ActiveGLAE: A Benchmark for Deep Active Learning with Transformers | ECML-PKDD 2023 | ./publications/active_glae |
| Role of Hyperparameters in Deep Active Learning | IAL ECML-PKDD 2023 | ./publications/hyperparameters |
| Fast Fishing: Approximating Bait for Efficient and Scalable Deep Active Image Classification | ECML-PKDD 2024 | ./publications/fast_fishing |
| The Interplay of Uncertainty Modeling and Deep Active Learning | TMLR 2024 | ./publications/udal |
| Efficient Bayesian Updates for Deep Active Learning via Laplace Approximations | ECML-PKDD 2025 | ./publications/laplace_updates |
| TBD | Under Review 2025 | ./publications/boss_oracle |
| Cleaning the Pool: Progressive Filtering of Unlabeled Pools in Deep Active Learning | CVPR 2026 | ./publications/cleaning_the_pool |

Getting Started

Installation

Setting up the DAL-Toolbox is straightforward. Clone the repository and execute the following commands:

conda create -n dal-toolbox python=3.12
conda activate dal-toolbox
pip install -e .

Afterward, install any additional packages your task requires. The implementations in the ./publications directory typically need further dependencies, which are collected in separate requirements.txt files.

Usage Example

The following snippet demonstrates a basic AL cycle on a two-dimensional toy dataset:

import torch
import lightning as L
from sklearn.datasets import make_moons
from torch.utils.data import TensorDataset

from dal_toolbox.active_learning import ActiveLearningDataModule
from dal_toolbox.active_learning.strategies import LeastConfidentSampling
from dal_toolbox.models.deterministic import DeterministicModel
from dal_toolbox.models.deterministic.simplenet import SimpleNet

# 1. Create the two moons dataset
X, y = make_moons(n_samples=200, noise=.1, random_state=42)
dataset = TensorDataset(torch.tensor(X).float(), torch.tensor(y).long())

# 2. Setup the AL Data Module with 2 initial randomly labeled samples
al_datamodule = ActiveLearningDataModule(dataset, train_batch_size=32)
al_datamodule.random_init(n_samples=2, class_balanced=True)

# 3. Initialize the Model and Strategy
strategy = LeastConfidentSampling()
model = SimpleNet(dropout_rate=0., num_classes=2)
model = DeterministicModel(
    model, 
    optimizer=torch.optim.SGD(model.parameters(), lr=1e-1, momentum=.9)
)

# 4. Perform Active Learning Cycles
for cycle in range(4):
    # Query and update annotations (skip for the initial cycle)
    if cycle != 0:
        indices = strategy.query(model=model, al_datamodule=al_datamodule, acq_size=2)
        al_datamodule.update_annotations(indices)

    # Train the model
    model.reset_states()
    trainer = L.Trainer(max_epochs=50, enable_progress_bar=False)
    trainer.fit(model, al_datamodule)

Note: While this example uses PyTorch Lightning for convenience, it is not strictly required for most strategies. You can easily replace the L.Trainer with a standard PyTorch training function.
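
As a rough illustration of that note, the sketch below shows what a standard PyTorch training function could look like. It is not part of the toolbox: the `train` helper, the toy data, and the small `nn.Sequential` network are all hypothetical stand-ins, and the learning rate is chosen for this toy problem only.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train(model: nn.Module, loader: DataLoader, epochs: int = 30, lr: float = 1e-2) -> float:
    """Hypothetical helper: train `model` with SGD, return the mean loss of the last epoch."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    model.train()
    epoch_loss = float("nan")
    for _ in range(epochs):
        total, count = 0.0, 0
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
            # Weight each batch loss by its size to get a per-sample mean.
            total += loss.item() * len(targets)
            count += len(targets)
        epoch_loss = total / count
    return epoch_loss


# Toy stand-in for the labeled pool: two well-separated 2-D clusters.
torch.manual_seed(0)
features = torch.cat([torch.randn(16, 2) - 2.0, torch.randn(16, 2) + 2.0])
labels = torch.cat([torch.zeros(16), torch.ones(16)]).long()
loader = DataLoader(TensorDataset(features, labels), batch_size=8, shuffle=True)

# Plain network without any Lightning wrapper.
net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
final_loss = train(net, loader)
```

In the AL loop above, a call like `train(net, labeled_loader)` would take the place of `trainer.fit(model, al_datamodule)`. How to obtain `labeled_loader` from the data module, and whether a given strategy's `query` accepts a plain `nn.Module` instead of the toolbox's model wrapper, are assumptions to verify against the toolbox source.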

More Complex Examples

Check out tbd and the ./publications directory for more sophisticated implementations.

Citation

If you find this toolbox useful for your research, please consider citing us.

@inproceedings{huseljic2026refine,
	title = {Cleaning the {Pool}: {Progressive} {Filtering} of {Unlabeled} {Pools} in {Deep} {Active} {Learning}},
	shorttitle = {Cleaning the {Pool}},
	author = {Huseljic, Denis and Herde, Marek and Rauch, Lukas and Hahn, Paul and Sick, Bernhard},
	booktitle = {CVPR},
	year = {2026},
}
