Getting Started with Multiple Instance Learning
MIL is a weak-supervision paradigm suited to inexactly labeled data. A typical example is a whole-slide image (WSI) of tissue (either traditionally stained, as in digital/computational pathology, or native tissue from an infrared microscopy approach, as is likely your target if you are using OpenVibSpec) for which no annotations are available, only a coarse-grained label in the form of the patient status: malignant vs. benign. Under the standard MIL assumption, each sample is treated as a bag of instances, and a bag is positive if it contains at least one positive instance. MIL then tries to find the key elements that distinguish positive bags from negative ones.
This means the data needs to be preprocessed accordingly. Say you have a WSI of size 10000x10000 pixels, which makes for a numpy array of shape (10000, 10000, z), where z is the channel depth. For an ordinary RGB image z would be 3, but in spectral histopathology z is a vector of wavenumbers from the infrared measurement and can be quite large. The data then needs to be tiled into smaller patches of a given size, e.g. 224x224. This can be done with a large number of methods, so pick whatever fits your approach best; a minimal sketch is shown below. After tiling all of your samples you will have a new dataset consisting of the same N WSIs/samples, but now each sample is an array of shape (n, 224, 224, z) containing n tiles of identical dimensions.
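For illustration, a minimal tiling sketch in plain numpy could look as follows (the file name and the channel count of 427 are made up for this example; in practice you might prefer padding or overlapping strides over discarding incomplete edge tiles):

import numpy as np

def tile_wsi(wsi, tile_size=224):
    # wsi: numpy array of shape (H, W, z); returns array of shape (n, tile_size, tile_size, z)
    h, w, _ = wsi.shape
    tiles = [
        wsi[i:i + tile_size, j:j + tile_size, :]
        for i in range(0, h - tile_size + 1, tile_size)
        for j in range(0, w - tile_size + 1, tile_size)
    ]
    # edge regions that do not fill a complete tile are discarded here
    return np.stack(tiles)

# wsi = np.load("sample_wsi.npy")  # hypothetical file, shape (10000, 10000, 427)
# tiles = tile_wsi(wsi)            # -> shape (1936, 224, 224, 427)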
The mil.py submodule contains everything needed to define and train an attention-based MIL model:
import torch
from openvibspec import mil  # assumed import path; adjust to your installation

# instantiate the model with the chosen input channel depth
model = mil.DeepAttentionMIL(spectra=z)

# define your optimizer of choice, e.g.
optim = torch.optim.Adadelta(
    model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0
)

# now construct your torch dataloaders from your prepared datasets, e.g.
train_dl = torch.utils.data.DataLoader(training_ds, batch_size=1, shuffle=True)
validation_dl = torch.utils.data.DataLoader(validation_ds, batch_size=1, shuffle=False)

# and with that you are ready to train the model!
model_savepath = "mil_model.pt"  # where the trained weights are stored
history = mil.fit(model, optim, train_dl, validation_dl, model_savepath)
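Note that batch_size=1 is deliberate: bags generally contain different numbers of tiles n, so they cannot be stacked into larger batches without padding. If your tiled samples live in memory, a minimal bag dataset could look like the following sketch (the variable names, label encoding, and channels-last tile layout are assumptions; check what shape DeepAttentionMIL expects):

import torch
from torch.utils.data import Dataset

class BagDataset(Dataset):
    # bags:   list of numpy arrays of shape (n_i, 224, 224, z); n_i may differ per sample
    # labels: coarse-grained bag labels, e.g. 0 = benign, 1 = malignant
    def __init__(self, bags, labels):
        self.bags = bags
        self.labels = labels

    def __len__(self):
        return len(self.bags)

    def __getitem__(self, idx):
        bag = torch.from_numpy(self.bags[idx]).float()
        label = torch.tensor(self.labels[idx], dtype=torch.float32)
        return bag, label

# training_ds = BagDataset(train_bags, train_labels)
# validation_ds = BagDataset(val_bags, val_labels)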
Because of the computational load, OpenVibSpec lets you distribute the calculations across up to four GPUs. First, determine the hardware the calculations will be performed on (the GPU if available and preferred, otherwise the CPU):
device = get_device(gpu_preferred=True)
Then you need to choose a GPU distribution. You can pick one of the following predefined distributions or create a custom one if you have more than four GPUs:

- device_ordinals_cpu - Runs all calculations on the CPU (not recommended)
- device_ordinals_single_gpu - Runs all calculations on the GPU with device index 0
- device_ordinals_cluster_gpu - Runs all calculations on four GPUs
In this example, we pick the single-GPU distribution:
chosen_ordinals = device_ordinals_single_gpu
To apply this, these two configurations need to be passed to the model from above:
model = mil.DeepAttentionMIL(spectra=z, device=device, device_ordinals=chosen_ordinals)
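One caveat when re-instantiating the model like this: a torch optimizer holds references to the parameters of the model it was created with, so re-create it for the new model before calling fit again:

# re-create the optimizer for the newly instantiated model
optim = torch.optim.Adadelta(
    model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0
)
history = mil.fit(model, optim, train_dl, validation_dl, model_savepath)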

Correspondence:
Prof. Dr. Axel Mosig, Bioinformatics Group, Ruhr-Universität Bochum, Germany