The AODN Cloud Optimised library converts oceanographic datasets from IMOS (Integrated Marine Observing System) / AODN (Australian Ocean Data Network) into cloud-optimised formats such as Zarr (for gridded multidimensional data) and Parquet (for tabular data).
Visit the documentation on ReadTheDocs for detailed information.
- Convert CSV or NetCDF (single or multidimensional) to Zarr or Parquet.
- Dataset configuration: YAML-based configuration with inheritance, allowing similar datasets to share settings. Example: Radar ACORN, GHRSST.
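The inheritance idea can be sketched as a recursive dictionary merge: a child dataset configuration overrides only the keys that differ from a shared parent template. The key names below (`cloud_optimised_format`, `run_settings`, `dataset_name`) are illustrative placeholders, not the library's actual configuration schema.

```python
def deep_merge(parent: dict, child: dict) -> dict:
    """Recursively merge child settings over a parent template."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical parent template shared by similar gridded datasets
parent = {
    "cloud_optimised_format": "zarr",
    "run_settings": {"cluster": {"mode": "local"}},
}

# Child config for one dataset overrides only what differs
child = {
    "dataset_name": "radar_acorn_example",
    "run_settings": {"cluster": {"mode": "remote"}},
}

config = deep_merge(parent, child)
```

Keeping shared settings in one parent template means a fleet of similar datasets (e.g. the Radar ACORN sites) can be maintained by editing a single file.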
- Semi-automatic creation of dataset configuration: ReadTheDocs guide.
- Generic handlers for standard datasets: `GenericParquetHandler`, `GenericZarrHandler`
- Custom handlers can inherit from generic handlers: Argo handler, Mooring Timeseries Handler
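A minimal sketch of the inheritance pattern. The real `GenericParquetHandler` is provided by the library and its API may differ; the stub class, the `preprocess_data` hook, and the `JULD` filtering below are assumptions made purely for illustration.

```python
import pandas as pd


class GenericParquetHandler:
    """Stand-in for the library's generic handler (illustrative only)."""

    def __init__(self, dataset_config: dict):
        self.dataset_config = dataset_config

    def preprocess_data(self, df: pd.DataFrame) -> pd.DataFrame:
        return df  # generic pass-through


class ArgoHandler(GenericParquetHandler):
    """Dataset-specific handler overriding only the steps that differ."""

    def preprocess_data(self, df: pd.DataFrame) -> pd.DataFrame:
        df = super().preprocess_data(df)
        # Hypothetical dataset-specific rule: drop profiles without a date
        return df[df["JULD"].notna()]


handler = ArgoHandler(dataset_config={"dataset_name": "argo"})
raw = pd.DataFrame({"JULD": [1.0, None, 3.0], "PRES": [10.0, 20.0, 30.0]})
clean = handler.preprocess_data(raw)
```

Subclassing keeps dataset-specific logic (unit fixes, row filtering, renames) out of the generic pipeline while reusing all of its batching and output machinery.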
- Supports local and remote Dask clusters:
- Cluster behaviour is configuration-driven and can be easily overridden.
- Automatic restart of remote cluster upon Dask failure.
- Zarr: Gridded datasets are processed in batch and in parallel using `xarray.open_mfdataset`.
- Parquet: Tabular files are processed in batch and in parallel as independent tasks, implemented with `concurrent.futures.Future`.
- S3 / S3-Compatible Storage Support:
Support for AWS S3 and S3-compatible endpoints (e.g., MinIO, LocalStack) with configurable input/output buckets and authentication via `s3fs` and `boto3`.
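Pointing `s3fs` (and readers built on it) at an S3-compatible endpoint generally comes down to a storage-options mapping. The credentials, endpoint URL, and bucket names below are placeholders for a local MinIO instance, not values from this library's configuration.

```python
# Placeholder credentials and endpoint for a local S3-compatible store
# (e.g. MinIO on port 9000); substitute real values for your deployment.
storage_options = {
    "key": "minioadmin",
    "secret": "minioadmin",
    "client_kwargs": {"endpoint_url": "http://localhost:9000"},
}

# With a running endpoint, the same options drive s3fs directly:
# import s3fs
# fs = s3fs.S3FileSystem(**storage_options)
# fs.ls("my-input-bucket")

# ...and flow through unchanged to higher-level readers:
# import pandas as pd
# df = pd.read_parquet("s3://my-output-bucket/dataset.parquet",
#                      storage_options=storage_options)
```

Using one options mapping for both the filesystem layer and the readers keeps local (LocalStack/MinIO) and production (AWS) runs identical apart from configuration.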
- Zarr: Reprocessing is achieved by writing to specific slices, including non-contiguous regions.
- Parquet: Reprocessing uses PyArrow internal overwriting; can also be forced when input files change significantly.
- Improves performance for querying and parallel processing.
- Parquet: Partitioned by polygon and timestamp slices. Issue reference
- Zarr: Chunking is defined in dataset configuration.
See doc
- Global Attributes -> variable
- variable attribute -> variable
- filename part -> variable
- ...
- Parquet: Metadata stored as a sidecar `_metadata.parquet` file for faster queries and schema discovery.
This library ships with an MCP (Model Context Protocol) server that exposes the AODN dataset catalogue to AI assistants such as GitHub Copilot CLI, Gemini CLI, and Claude Desktop.
It enables an AI to discover datasets, inspect schemas, verify real S3 data coverage, and generate validated Jupyter notebooks for oceanographic analysis.
See aodn_cloud_optimised/mcp/README.md for installation and usage instructions.
Requirements:
- Python >= 3.11
- AWS SSO configured for pushing files to S3
- Optional: Coiled account for remote clustering
To use the library for data processing pipelines (Zarr/Parquet conversion), no notebook or test dependencies are needed:
```shell
git clone https://github.com/aodn/aodn_cloud_optimised.git
cd aodn_cloud_optimised
make core  # installs core deps via Poetry venv
```

Alternatively, use the bootstrap script:

```shell
curl -s https://raw.githubusercontent.com/aodn/aodn_cloud_optimised/main/install.sh | bash
```

Otherwise, go to the release page.
Full contributor setup (notebooks + tests + docs + tooling):
```shell
git clone https://github.com/aodn/aodn_cloud_optimised.git
cd aodn_cloud_optimised
make dev                              # Poetry venv (recommended)
# or: ./setup_miniforge_venvs.sh dev  # named mamba env alternative
poetry run pre-commit install
```

See ReadTheDocs - Dev for full details.
A curated list of Jupyter notebooks, ready to load in Google Colab or Binder, lets users explore IMOS/AODN datasets converted to cloud-optimised formats. Click on the badge above.