
Batch Dataset/DataLoader #3

@NegatioN

Description


Hey @EvenOldridge , I stumbled upon your post about cudf dataloaders speeding up training a long time ago... and recently got around to actually trying my hand at it, so thanks for introducing me to the idea!

I'm just curious: is there a specific reason you implemented the batch dataloaders the way you did in this repo, instead of using a custom sampler plus a regular DataLoader with 0 workers?

I wrote out a quick sketch of the implementation I'm thinking of here: https://gist.github.com/NegatioN/1f63c3a79dfe13b183d413123d37d4fa
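For readers who don't follow the gist link, a minimal sketch of the sampler-plus-DataLoader idea might look like the following. All class and variable names here are hypothetical, and it assumes plain in-memory torch tensors rather than cuDF frames; passing `batch_size=None` disables the DataLoader's auto-batching, so the sampler's lists of indices reach `__getitem__` directly and each batch can be sliced out in one contiguous read:

```python
import torch
from torch.utils.data import DataLoader, Dataset, Sampler


class ContiguousBatchSampler(Sampler):
    """Yields lists of contiguous indices, one list per batch,
    optionally shuffling the order of the batches themselves."""

    def __init__(self, n_rows, batch_size, shuffle=True):
        self.n_rows, self.batch_size, self.shuffle = n_rows, batch_size, shuffle

    def __iter__(self):
        starts = torch.arange(0, self.n_rows, self.batch_size)
        if self.shuffle:
            starts = starts[torch.randperm(len(starts))]
        for s in starts.tolist():
            yield list(range(s, min(s + self.batch_size, self.n_rows)))

    def __len__(self):
        return (self.n_rows + self.batch_size - 1) // self.batch_size


class TensorBatchDataset(Dataset):
    """Holds features/targets as whole tensors; __getitem__ receives
    a list of indices from the sampler and returns one full batch."""

    def __init__(self, X, y):
        self.X, self.y = X, y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idxs):
        # Indices are contiguous, so one slice replaces per-row reads.
        return self.X[idxs[0]:idxs[-1] + 1], self.y[idxs[0]:idxs[-1] + 1]


X, y = torch.randn(1024, 8), torch.randint(0, 2, (1024,))
ds = TensorBatchDataset(X, y)
loader = DataLoader(
    ds,
    sampler=ContiguousBatchSampler(len(ds), 128),
    batch_size=None,  # disable auto-batching; the sampler already batches
    num_workers=0,
)
xb, yb = next(iter(loader))  # one (128, 8) feature batch + (128,) targets
```

The point of the slice in `__getitem__` is that each batch is a single view into contiguous memory, which is (I assume) where most of the speedup over per-row indexing would come from.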

I understand that your implementation may already have changed significantly since you mentioned integrating it with fast.ai, but I was curious whether you ruled this approach out for a specific reason I'm not seeing at the moment. I would think it has the same performance characteristics?

Edit: The biggest difference might be that we can grab each batch as a single read from contiguous memory? Did you test how large the impact of this was?

/Joakim
