ncxlib Dataset storage and loader

This repository provides a structured approach for loading, processing, and saving datasets in a binary format using Python. It is designed to work with popular datasets (such as MNIST) stored in binary formats and allows for easy serialization with pickle. The code processes images and labels into structured data, which can be loaded into memory as needed.

This repo is mainly for internal use, but it also provides permanent links to popular datasets that have already been preprocessed and saved as pickle files.
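The repository's own processing code is not reproduced in this README. As a rough sketch of the "binary format to structured data" step described above, the example below assumes the raw files use the IDX layout that the original MNIST distribution ships in (a 4-byte header, big-endian dimension sizes, then unsigned bytes). `parse_idx` is a hypothetical helper written for illustration, not a function from ncxlib.

```python
import math
import struct


def parse_idx(raw: bytes) -> list:
    """Parse IDX-formatted unsigned-byte data into Python lists.

    Hypothetical helper for illustration; not part of ncxlib.
    """
    # Header: two zero bytes, a type code (0x08 = unsigned byte),
    # and the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("expected an IDX file of unsigned bytes")

    # Each dimension size is a big-endian 32-bit unsigned integer.
    header_end = 4 + 4 * ndim
    dims = struct.unpack(f">{ndim}I", raw[4:header_end])
    body = raw[header_end:]
    if len(body) != math.prod(dims):
        raise ValueError("payload size does not match the header dimensions")

    # Labels are one-dimensional: return them as a flat list of ints.
    if ndim == 1:
        return list(body)

    # Images: return one flattened list of pixel values per item.
    item_size = math.prod(dims[1:])
    return [list(body[i:i + item_size]) for i in range(0, len(body), item_size)]


# Synthetic example: a label file holding the three labels 5, 0 and 4.
labels = parse_idx(struct.pack(">HBBI", 0, 8, 1, 3) + bytes([5, 0, 4]))
```

Flattening each image into a single list is a choice made for this sketch; the README does not state whether the stored samples are flat or nested.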

Storage Format

Each data file is named ncxlib..data inside the data// folder. Once loaded, every pickle file contains a dictionary with the following structure:

    {
        "X_train": list[],
        "X_test": list[],
        "y_train": list[],
        "y_test": list[],
    }
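To make the structure above concrete, here is a minimal sketch that writes a dictionary with those four keys to a pickle file and checks that a loaded object has the expected shape. The helper names `save_dataset` and `validate_dataset` are assumptions made for this example and are not taken from the repository.

```python
import pickle

# The four keys every data file is expected to contain.
REQUIRED_KEYS = ("X_train", "X_test", "y_train", "y_test")


def save_dataset(path, X_train, X_test, y_train, y_test):
    """Serialize the four splits into one pickle file (hypothetical helper)."""
    data = {
        "X_train": list(X_train),
        "X_test": list(X_test),
        "y_train": list(y_train),
        "y_test": list(y_test),
    }
    with open(path, "wb") as f:
        pickle.dump(data, f)


def validate_dataset(data):
    """Check that a loaded object has the documented keys and matching lengths."""
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    if len(data["X_train"]) != len(data["y_train"]):
        raise ValueError("X_train and y_train differ in length")
    if len(data["X_test"]) != len(data["y_test"]):
        raise ValueError("X_test and y_test differ in length")
    return data
```

The length checks go slightly beyond what the format description states; they simply assume each sample has exactly one label.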

Getting started

You can directly download the dataset using curl:

curl -o ncxlib.mnist.data <perma-link>
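Once the file is on disk, it can be read back with the standard `pickle` module. The sketch below assumes the file was saved as `ncxlib.mnist.data` (the name used in the curl command above); `load_dataset` is a hypothetical helper, not a function provided by this repository.

```python
import pickle


def load_dataset(path="ncxlib.mnist.data"):
    """Load a downloaded data file and return its four splits (hypothetical helper)."""
    # pickle.load can execute arbitrary code embedded in the file,
    # so only open files obtained from a source you trust.
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["X_train"], data["X_test"], data["y_train"], data["y_test"]


# Typical usage after the download has finished:
# X_train, X_test, y_train, y_test = load_dataset("ncxlib.mnist.data")
```

Returning a tuple in train/test order is a convenience chosen for this example; the raw dictionary can be used directly instead.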

Datasets

Dataset	Description	Permanent Link
MNIST	Handwritten digit images and their labels, derived from data collected by the National Institute of Standards and Technology (NIST).	Link
