Number of shards for ImageNet dataset? #4

@biocyberman

I downloaded the original ImageNet dataset:

1d675b47d978889d74fa0da5fadfb00e ILSVRC2012_img_train.tar
ccaf1013018ac1037801578038d370da ILSVRC2012_img_train_t3.tar
29b22e2961454d5413ddabcf34fc5622 ILSVRC2012_img_val.tar
e1b8681fff3d63731c599df9b4b6fc02 ILSVRC2012_img_test_v10102019.tar

After unpacking, I ran ./run makeshards.
This produced only 147 shards for train and 6 for val. I wonder why nshards is set so high in this line:
https://github.com/webdataset/webdataset-lightning/blob/7b98a6a4e9e8735973f9de29151e6215380e5c9d/run#L3

and this line:
https://github.com/webdataset/webdataset-lightning/blob/7b98a6a4e9e8735973f9de29151e6215380e5c9d/train.py#L116

Which dataset is it expecting, or what is the correct shard size?
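
For reference, here is a minimal sketch of how the shards actually on disk could be counted and turned into a brace-style pattern, so the count can be compared with the configured nshards. The function name `shard_pattern`, the directory, and the `<prefix>-NNNNNN.tar` naming are assumptions for illustration, not taken from the repository.

```python
import re
from pathlib import Path


def shard_pattern(shard_dir, prefix):
    """Count shards named <prefix>-NNNNNN.tar (assumed naming scheme)
    and return (count, brace_pattern) covering the lowest to highest index."""
    rx = re.compile(r"^" + re.escape(prefix) + r"-(\d+)\.tar$")
    indices = []
    width = 6  # default zero-padding; overwritten by what is found on disk
    for path in Path(shard_dir).iterdir():
        m = rx.match(path.name)
        if m:
            indices.append(int(m.group(1)))
            width = len(m.group(1))
    if not indices:
        return 0, None
    first = str(min(indices)).zfill(width)
    last = str(max(indices)).zfill(width)
    # Brace notation, e.g. prefix-{000000..000146}.tar
    return len(indices), prefix + "-{" + first + ".." + last + "}.tar"


if __name__ == "__main__":
    # Hypothetical location and prefix; adjust to where makeshards wrote its output.
    count, pattern = shard_pattern("./shards", "imagenet-train")
    print(count, pattern)
```

If the train shards followed this assumed naming and were numbered from 0, 147 shards would correspond to the range 000000..000146.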