Conditional Sampling using PyTorch Samplers #38
Draft
alexander-koch wants to merge 5 commits into dev from
Implements conditional sampling using PyTorch samplers. Instead of implementing the functionality at the PyTorch dataset level, a scheduled sampler is created during setup. This allows conditional sampling to be added with minimal changes to the codebase and requires no hard-coding. The dataframes can be used directly as conditioning information, and MultiModalDatasets do not need to ship with extra labels that the end user would never use.
Conditionings are created by the user in the following format:
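A minimal sketch of what such a conditioning class could look like. The class name, the column/value arguments, and the call signature (a dataframe plus a `torch.Generator`) are assumptions for illustration, not the PR's actual interface:

```python
import pandas as pd
import torch


class LabelConditioning:
    """Hypothetical conditioning: sample the index of a row matching a target label."""

    def __init__(self, column: str, value: str) -> None:
        self.column = column
        self.value = value

    def __call__(self, df: pd.DataFrame, generator: torch.Generator) -> int:
        # Restrict to the rows that satisfy the conditioning target.
        candidates = df.index[df[self.column] == self.value]
        # Use the provided generator so the draw is random but reproducible.
        pick = torch.randint(len(candidates), (1,), generator=generator).item()
        return int(candidates[pick])
```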
The conditioning can then be provided directly as a class path via jsonargparse in the YAML config file:
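For example, using jsonargparse's standard `class_path`/`init_args` layout. The top-level key, module path, and init arguments below are placeholders and depend on the actual CLI setup:

```yaml
conditioning:
  class_path: my_project.conditioning.LabelConditioning
  init_args:
    column: cell_type
    value: T_cell
```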
Conditioning classes should implement a call function that returns the index to sample from the dataframe.
Ideally, the provided generator should be used so that the process stays somewhat non-deterministic, unless the user explicitly wants a deterministic conditioning.
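Continuing the hypothetical `LabelConditioning` sketch above, calling a conditioning could look like this (dataframe contents are made up):

```python
import pandas as pd
import torch

# Tiny illustrative dataframe; column name and values are invented.
df = pd.DataFrame({"cell_type": ["T_cell", "B_cell", "T_cell"]})

conditioning = LabelConditioning(column="cell_type", value="T_cell")
generator = torch.Generator().manual_seed(0)

# Returns the index of one matching row (here: 0 or 2).
index = conditioning(df, generator)
```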
One restriction of this implementation is that `max_steps` needs to be set, because the random conditional schedule cannot be created on the fly. Under the hood, permutations are created for the main dataset, simulating a virtual "walkthrough" until the dataloader is finished and has to restart with a new permutation. According to this schedule, the auxiliary and augmented datasets are then created using the user-provided conditioning.
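A rough sketch of how such a precomputed schedule could be built; this illustrates the idea (chained permutations truncated to cover `max_steps` batches) rather than the PR's actual implementation, and the function name and parameters are hypothetical:

```python
import torch


def build_schedule(dataset_len: int, max_steps: int, batch_size: int,
                   generator: torch.Generator) -> torch.Tensor:
    """Concatenate fresh permutations of the main dataset until there are
    enough indices to cover max_steps batches: a virtual walkthrough that
    restarts with a new permutation once the dataset is exhausted."""
    needed = max_steps * batch_size
    chunks: list[torch.Tensor] = []
    total = 0
    while total < needed:
        chunks.append(torch.randperm(dataset_len, generator=generator))
        total += dataset_len
    return torch.cat(chunks)[:needed]
```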