
Change in ImageNet format after the move to Kaggle #3

@Spandan-Madan


Hi,

I'm extremely excited about support for large-scale, fast I/O in PyTorch. I'm trying to run the example and downloaded ImageNet for it. As you may know, ImageNet is no longer available for download from http://www.image-net.org/download and is now hosted on Kaggle. I downloaded the dataset from there, but its format seems to differ from the previous release, so it can no longer be loaded with torchvision's built-in ImageNet dataset class. This in turn causes errors when creating shards.

Here's the error I get:

The archive ILSVRC2012_devkit_t12.tar.gz is not present in the root directory or is corrupted. You need to download it externally and place it in ./data
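For reference, a minimal sketch of telling the two layouts apart before building shards, so a script could pick a loading path instead of failing on the missing archive. The helper `detect_imagenet_layout` is hypothetical; it only checks for the devkit archive named in the error above and for the Kaggle download's `Data/CLS-LOC/train` directory, and it does not reproduce torchvision's own integrity checks.

```python
import os

# Archive name taken from the error message; the Kaggle download does not ship it.
DEVKIT_ARCHIVE = "ILSVRC2012_devkit_t12.tar.gz"


def detect_imagenet_layout(root):
    """Guess which ImageNet distribution is present under `root` (hypothetical helper).

    Returns "devkit" if the original devkit archive is present, "kaggle" if the
    Kaggle-style Data/CLS-LOC/train directory exists, and None otherwise.
    """
    if os.path.isfile(os.path.join(root, DEVKIT_ARCHIVE)):
        return "devkit"
    if os.path.isdir(os.path.join(root, "Data", "CLS-LOC", "train")):
        return "kaggle"
    return None
```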

The downloaded dataset has the following structure:

.
├── Annotations
│   └── CLS-LOC
│       ├── train
│       └── val
├── Data
│   └── CLS-LOC
│       ├── test
│       ├── train
│       └── val
└── ImageSets
    └── CLS-LOC
        ├── test.txt
        ├── train_cls.txt
        ├── train_loc.txt
        └── val.txt
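One possible workaround for the validation split, sketched under assumptions: the val images sit in a single flat `Data/CLS-LOC/val` directory as `<id>.JPEG`, and each `Annotations/CLS-LOC/val/<id>.xml` is a PASCAL VOC-style file whose `<object><name>` holds the synset ID (e.g. `n01751748`). Both helper names are hypothetical. The sketch recovers labels from the XML files and moves each image into a per-synset folder, the same layout the train split already uses.

```python
import os
import xml.etree.ElementTree as ET


def val_labels_from_annotations(root):
    """Map each validation image ID to its synset ID (hypothetical helper).

    Assumes Annotations/CLS-LOC/val/<id>.xml files with <object><name>synset</name>.
    """
    ann_dir = os.path.join(root, "Annotations", "CLS-LOC", "val")
    labels = {}
    for fname in sorted(os.listdir(ann_dir)):
        if not fname.endswith(".xml"):
            continue
        tree = ET.parse(os.path.join(ann_dir, fname))
        # Classification needs one label per image; all <object> entries in an
        # image are assumed to share one synset, so the first one is used.
        name = tree.getroot().find("object/name")
        if name is not None and name.text:
            labels[fname[: -len(".xml")]] = name.text.strip()
    return labels


def arrange_val_into_class_folders(root, labels):
    """Move Data/CLS-LOC/val/<id>.JPEG into Data/CLS-LOC/val/<synset>/<id>.JPEG."""
    val_dir = os.path.join(root, "Data", "CLS-LOC", "val")
    for image_id, synset in labels.items():
        src = os.path.join(val_dir, image_id + ".JPEG")  # .JPEG extension assumed
        if not os.path.isfile(src):
            continue  # already moved, or no image for this annotation
        dst_dir = os.path.join(val_dir, synset)
        os.makedirs(dst_dir, exist_ok=True)
        os.replace(src, os.path.join(dst_dir, image_id + ".JPEG"))
```

Moving files in place avoids copying 50,000 images; running it on a copy of the data first is the safer choice.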

Can we come up with a workaround that works out of the box with the current ImageNet distribution? The original PyTorch ImageNet example works with it because it only needs the image files. I think the error comes from parsing the metadata while creating shards, so a workaround should be possible. Happy to help with this.
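The original example can skip the metadata because it labels images by directory name (it loads data with torchvision's `datasets.ImageFolder`). A stdlib-only sketch of that indexing, which a shard-creation script could consume instead of the devkit; `index_class_folders` is a hypothetical helper, not part of this repo or torchvision, and the extension list is an assumption.

```python
import os

# File extensions treated as images (an assumption; ImageNet ships .JPEG files).
IMG_EXTENSIONS = (".jpeg", ".jpg", ".png")


def index_class_folders(split_dir):
    """Index a <split_dir>/<class>/<image> tree like a folder-per-class loader.

    Classes are the sorted subdirectory names; each label is the position of
    its class in that sorted list.
    """
    classes = sorted(
        entry.name for entry in os.scandir(split_dir) if entry.is_dir()
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        class_dir = os.path.join(split_dir, name)
        for fname in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, fname)
            if os.path.isfile(path) and fname.lower().endswith(IMG_EXTENSIONS):
                samples.append((path, class_to_idx[name]))
    return classes, class_to_idx, samples
```

Pointed at `Data/CLS-LOC/train`, where each synset already has its own folder, this yields `(path, label)` pairs with labels ordered by sorted synset ID, which is also how `ImageFolder` assigns class indices.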

Best,
Spandan
