
Batch Dataset/DataLoader #3

@NegatioN

Description


Hey @EvenOldridge , I stumbled upon your post about cudf dataloaders speeding up training a long time ago... and recently got around to actually trying my hand at it, so thanks for introducing me to the idea!

I'm just curious: is there a specific reason you implemented the batch dataloaders the way you did in this repo, instead of using a custom sampler plus a regular DataLoader with 0 workers?

I wrote out a quick sketch of the implementation I'm thinking of here: https://gist.github.com/NegatioN/1f63c3a79dfe13b183d413123d37d4fa
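For readers who don't follow the gist link, a minimal sketch of the sampler-plus-DataLoader idea might look like the following. All class and variable names here are hypothetical, and it assumes plain in-memory torch tensors rather than cuDF frames; passing `batch_size=None` disables the DataLoader's auto-batching, so the sampler's lists of indices reach `__getitem__` directly and each batch can be sliced out in one contiguous read:

```python
import torch
from torch.utils.data import DataLoader, Dataset, Sampler


class ContiguousBatchSampler(Sampler):
    """Yields lists of contiguous indices, one list per batch,
    optionally shuffling the order of the batches themselves."""

    def __init__(self, n_rows, batch_size, shuffle=True):
        self.n_rows, self.batch_size, self.shuffle = n_rows, batch_size, shuffle

    def __iter__(self):
        starts = torch.arange(0, self.n_rows, self.batch_size)
        if self.shuffle:
            starts = starts[torch.randperm(len(starts))]
        for s in starts.tolist():
            yield list(range(s, min(s + self.batch_size, self.n_rows)))

    def __len__(self):
        return (self.n_rows + self.batch_size - 1) // self.batch_size


class TensorBatchDataset(Dataset):
    """Holds features/targets as whole tensors; __getitem__ receives
    a list of indices from the sampler and returns one full batch."""

    def __init__(self, X, y):
        self.X, self.y = X, y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idxs):
        # Indices are contiguous, so one slice replaces per-row reads.
        return self.X[idxs[0]:idxs[-1] + 1], self.y[idxs[0]:idxs[-1] + 1]


X, y = torch.randn(1024, 8), torch.randint(0, 2, (1024,))
ds = TensorBatchDataset(X, y)
loader = DataLoader(
    ds,
    sampler=ContiguousBatchSampler(len(ds), 128),
    batch_size=None,  # disable auto-batching; the sampler already batches
    num_workers=0,
)
xb, yb = next(iter(loader))  # one (128, 8) feature batch + (128,) targets
```

The point of the slice in `__getitem__` is that each batch is a single view into contiguous memory, which is (I assume) where most of the speedup over per-row indexing would come from.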

I understand that your implementation may already have changed significantly since you mentioned integrating it with fast.ai, but I was curious whether you ruled this approach out for a specific reason I'm not seeing at the moment. I would think it has the same performance characteristics?

Edit: The biggest difference might be that we can grab each batch as a single read from contiguous memory? Did you test how large the impact of this was?

/Joakim
