A REST API that provides unified access to dataset management across the National Data Platform (NDP). Users can search the NDP catalog, ingest new datasets, and manage their own data collections through a single, streamlined machine-to-machine (M2M) interface.
The NDP-EP API integrates seamlessly with the National Data Platform ecosystem:
- Unified Authentication: Uses NDP's authentication system - your NDP account works directly with this API
- Multi-Catalog Management: Control and access datasets across three different CKAN environments
- Centralized Discovery: Search the main NDP catalog and other connected data sources
- Streamlined Ingestion: Simplified workflow for adding new datasets to the platform
The National Data Platform uses CKAN as its data catalog management software. This API provides access to three different catalog environments, each with specific access levels and purposes:
You can use your own catalog backend for local dataset management, with your choice of storage (see Adding New Catalog Backends for custom implementations):
CKAN Backend (Traditional):
- Full CKAN compatibility with all extensions
- Ideal if you already have CKAN infrastructure
- Complete administrative access to your catalog
MongoDB Backend (Modern NoSQL):
- Lightweight, no CKAN installation required
- Fast document-based storage
- Easy to deploy and scale
- Perfect for new deployments or cloud-native environments
Both options give you:
- Full Control: Create, read, update, and delete datasets
- Use Case: Personal or organizational data catalogs
- Flexibility: Switch between backends via configuration
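As a quick illustration (the same variables appear in the full Quick Start configuration below), switching backends is a matter of a couple of lines in the `.env` file:

```bash
# Local catalog backend selection: "ckan" or "mongodb"
LOCAL_CATALOG_BACKEND=mongodb

# Only needed when the MongoDB backend is selected
MONGODB_CONNECTION_STRING=mongodb://localhost:27017
MONGODB_DATABASE=ndp_local_catalog
```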
This is the main public catalog of the National Data Platform. Through this API you can:
- Read-Only Access: Search and discover publicly available datasets
- Use Case: Exploring the official NDP data collection
- Permissions: Search and view only - no modifications allowed
This is a staging environment provided by the NDP for dataset submission and review. Here's how it works:
- Ingestion Gateway: Submit new datasets for validation and review
- Use Case: Contributing datasets to the NDP central catalog
- Workflow: Your datasets are analyzed, validated, and if approved, promoted to the central catalog
- NDP Authentication Integration: Seamless login with your National Data Platform credentials
- Pluggable Catalog Backends: Choose between CKAN or MongoDB for your local catalog
- Federated Search: Discover datasets across local, NDP, and staging catalogs
- Specialized Ingestion: Purpose-built endpoints for Kafka topics, S3 resources, web services, and URLs
- MINIO S3 Storage: Direct bucket and object management with secure presigned URLs
- General Dataset Management: Flexible API for managing datasets with custom metadata
- Service Registry: Register and discover other services (such as microservices, APIs, or apps)
- AI Agent Integration: Model Context Protocol (MCP) support for AI assistants to interact with the API
- Pelican Federation: Access distributed scientific data from OSDF and serve your own data to federations
- System Monitoring: Built-in metrics and health monitoring
- RESTful API: Comprehensive OpenAPI/Swagger documentation
- Extensible Architecture: Easy to add new catalog backends (Elasticsearch, PostgreSQL, etc.)
Get the NDP-EP API running with Docker in under 5 minutes:
Before you begin, ensure you have:

- Docker: Container platform for running the API
  - Install from docker.com
  - Verify installation: `docker --version`
- Docker Compose: Container orchestration tool
  - Usually included with Docker Desktop
  - Verify installation: `docker-compose --version`
- CKAN Instance (Optional):
  - Required only if: You want to use local CKAN or Pre-CKAN features
  - Not needed if: You only plan to use the NDP Central Catalog (read-only access)
  - Install CKAN following the official documentation
- S3-Compatible Storage (Optional):
  - Required only if: You want to use S3 object storage features
  - Not needed if: You don't plan to use bucket/object management endpoints
  - Example: MINIO is a popular S3-compatible service - see the MINIO setup guide for Docker installation instructions, or the sketch after this list for a quick local test instance
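If you only need an S3-compatible service for local testing, a throwaway MinIO container is one way to get started. This is a sketch rather than the official setup guide; the credentials mirror the sample `S3_ACCESS_KEY`/`S3_SECRET_KEY` values used later in this guide and must be changed for any real deployment.

```bash
# Start a disposable MinIO instance (S3 API on port 9000, web console on 9001)
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin123 \
  minio/minio server /data --console-address ":9001"
```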
Create a .env file or prepare environment variables with your configuration:
```bash
# ==============================================
# API CONFIGURATION
# ==============================================
# API root path prefix (e.g., "/test" or "" for root)
# If empty or not set, the API will be available at the root path
# This is useful when deploying the API behind a reverse proxy at a subpath
ROOT_PATH=
# ==============================================
# ORGANIZATION SETTINGS
# ==============================================
# Your organization name for identification and metrics
ORGANIZATION="My organization"
# Endpoint name for identification in metrics and monitoring
EP_NAME="EP Name"
# ==============================================
# METRICS CONFIGURATION
# ==============================================
# Interval in seconds for sending metrics (default: 3300 seconds = 55 minutes)
METRICS_INTERVAL_SECONDS=3300
# ==============================================
# AUTHENTICATION CONFIGURATION
# ==============================================
# URL for the authentication API to retrieve user information
# This endpoint is used to validate tokens and fetch user details
AUTH_API_URL=https://idp.nationaldataplatform.org/temp/information
# ==============================================
# ACCESS CONTROL (Optional)
# ==============================================
# Enable group-based access control (True/False)
# When enabled, only users belonging to one of the groups in GROUP_NAMES
# can perform POST, PUT, DELETE operations. Other authenticated users
# will receive 403 Forbidden on write operations.
# GET endpoints remain public regardless of this setting.
ENABLE_GROUP_BASED_ACCESS=False
# Comma-separated list of allowed groups for write operations
# Only used when ENABLE_GROUP_BASED_ACCESS=True
GROUP_NAMES=admins,developers
# ==============================================
# LOCAL CATALOG CONFIGURATION
# ==============================================
# Choose your local catalog backend: "ckan" or "mongodb"
# Global and Pre-CKAN always use CKAN regardless of this setting
LOCAL_CATALOG_BACKEND=ckan
# ==============================================
# LOCAL CKAN CONFIGURATION (if LOCAL_CATALOG_BACKEND=ckan)
# ==============================================
# Enable or disable the local CKAN instance (True/False)
# Set to True if you have your own CKAN installation
CKAN_LOCAL_ENABLED=True
# Base URL of your local CKAN instance (Required if CKAN_LOCAL_ENABLED=True)
# Example: http://192.168.1.134:5000/ or https://your-ckan-domain.com/
CKAN_URL=http://XXX.XXX.XXX.XXX:XXXX/
# API Key for CKAN authentication (Required if CKAN_LOCAL_ENABLED=True)
# Get this from your CKAN user profile -> API Tokens
CKAN_API_KEY=
# ==============================================
# MONGODB CONFIGURATION (if LOCAL_CATALOG_BACKEND=mongodb)
# ==============================================
# MongoDB connection string
MONGODB_CONNECTION_STRING=mongodb://localhost:27017
# MongoDB database name for local catalog
MONGODB_DATABASE=ndp_local_catalog
# ==============================================
# PRE-CKAN CONFIGURATION
# ==============================================
# Enable or disable the Pre-CKAN instance (True/False)
# Set to True if you want to submit datasets to NDP Central Catalog
PRE_CKAN_ENABLED=True
# URL of the Pre-CKAN staging instance (Required if PRE_CKAN_ENABLED=True)
# This is typically provided by the NDP team
PRE_CKAN_URL=http://XX.XX.XX.XXX:5000/
# API key for Pre-CKAN authentication (Required if PRE_CKAN_ENABLED=True)
# Obtain this from the NDP team or your Pre-CKAN user profile
PRE_CKAN_API_KEY=
# ==============================================
# STREAMING CONFIGURATION
# ==============================================
# Enable or disable Kafka connectivity (True/False)
# Set to True if you want to ingest data from Kafka streams
KAFKA_CONNECTION=False
# Kafka broker hostname or IP address (Required if KAFKA_CONNECTION=True)
KAFKA_HOST=
# Kafka broker port number (Required if KAFKA_CONNECTION=True)
# Default Kafka port is 9092
KAFKA_PORT=9092
# ==============================================
# DEVELOPMENT & TESTING
# ==============================================
# Test token for development purposes (Optional)
# Leave blank in production environments for security
TEST_TOKEN=testing_token
# ==============================================
# EXTERNAL SERVICE INTEGRATIONS
# ==============================================
# Enable or disable JupyterLab integration (True/False)
# Set to True if you want to integrate with a JupyterLab instance
USE_JUPYTERLAB=False
# URL to your JupyterLab instance (Required if USE_JUPYTERLAB=True)
# Example: https://jupyter.your-domain.com or http://localhost:8888
JUPYTER_URL=
# ==============================================
# S3 STORAGE CONFIGURATION
# ==============================================
# Enable or disable S3 storage (True/False)
S3_ENABLED=True
# S3 endpoint (host:port) - use your S3-compatible service endpoint
S3_ENDPOINT=XXX.XXX.XXX.XXX:9000
# S3 access credentials
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin123
# Use secure connection (True for HTTPS, False for HTTP)
S3_SECURE=False
# Default region
S3_REGION=us-east-1
```

- Create the `.env` file with your configuration (see step 1)
- Run the container:

```bash
docker run -p 8001:8000 --env-file .env rbardaji/ndp-ep-api
```

The `docker-compose.yml` uses profiles to let you choose which services to start. By default, only the API starts. Use profiles to add optional services:
Available Profiles:
| Profile | Services Included |
|---|---|
| `mongodb` | MongoDB + Mongo Express |
| `kafka` | Kafka + Zookeeper + Kafka UI |
| `s3` | MinIO (S3-compatible storage) |
| `jupyter` | JupyterLab |
| `pelican` | Pelican Federation (Registry, Director, Origin, Cache) |
| `frontend` | NDP-EP Frontend Web UI |
| `full` | All services |
Usage Examples:
```bash
# API only (no additional services)
docker compose up

# API + MongoDB
docker compose --profile mongodb up

# API + MongoDB + Kafka
docker compose --profile mongodb --profile kafka up

# API + all services
docker compose --profile full up
```

Note: When using external services (e.g., your own CKAN or Kafka), just run `docker compose up` and configure the external URLs in your `.env` file.
Once the container is running, verify everything is working:
- API Documentation: http://localhost:8001/docs
- Health Check: http://localhost:8001/status/
- Interactive API Explorer: Available at the docs URL
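For a quick command-line check of the same endpoints (the port matches the `docker run` mapping shown earlier; adjust it if you changed the mapping), something like the following should work:

```bash
# Expect an HTTP 200 response once the API is up
curl -i http://localhost:8001/status/

# Raw OpenAPI schema behind the interactive docs (FastAPI's default path)
curl http://localhost:8001/openapi.json
```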
```bash
# Minimal configuration for read-only access to NDP Central Catalog
ORGANIZATION="Your Organization"
CKAN_LOCAL_ENABLED=False
PRE_CKAN_ENABLED=False
KAFKA_CONNECTION=False
USE_JUPYTERLAB=False
```

```bash
# Configuration for local CKAN development
ORGANIZATION="Your Organization"
LOCAL_CATALOG_BACKEND=ckan
CKAN_LOCAL_ENABLED=True
CKAN_URL=http://localhost:5000/
CKAN_API_KEY=your-local-ckan-api-key
PRE_CKAN_ENABLED=False
TEST_TOKEN=dev_token
```

```bash
# Lightweight setup with MongoDB backend
ORGANIZATION="Your Organization"
LOCAL_CATALOG_BACKEND=mongodb
MONGODB_CONNECTION_STRING=mongodb://localhost:27017
MONGODB_DATABASE=ndp_local_catalog
PRE_CKAN_ENABLED=False
TEST_TOKEN=dev_token
```

```bash
# Complete setup with local CKAN and NDP submission capability
ORGANIZATION="Your Organization"
CKAN_LOCAL_ENABLED=True
CKAN_URL=http://your-ckan-instance:5000/
CKAN_API_KEY=your-local-ckan-api-key
PRE_CKAN_ENABLED=True
PRE_CKAN_URL=https://preckan.nationaldataplatform.org
PRE_CKAN_API_KEY=your-ndp-preckan-api-key
```

The API supports optional group-based access control to restrict write operations (POST, PUT, DELETE) to users belonging to specific groups.
- Authentication: When a user makes a request with a Bearer token, the API validates the token against the configured `AUTH_API_URL`
- Group Retrieval: The authentication service returns user information, including their `groups` array
- Authorization: If `ENABLE_GROUP_BASED_ACCESS=True`, the API checks whether any of the user's groups match the allowed groups in `GROUP_NAMES`
- Access Decision:
  - ✅ User belongs to at least one allowed group → write operation permitted
  - ❌ User does not belong to any allowed group → 403 Forbidden
```bash
# Enable group-based access control
ENABLE_GROUP_BASED_ACCESS=True

# Comma-separated list of groups allowed to perform write operations
GROUP_NAMES=admins,developers,data-managers
```

| Setting | Read (GET) | Write (POST/PUT/DELETE) |
|---|---|---|
| `ENABLE_GROUP_BASED_ACCESS=False` | ✅ Public | ✅ Any authenticated user |
| `ENABLE_GROUP_BASED_ACCESS=True` | ✅ Public | ✅ Only users in `GROUP_NAMES` |
If your authentication service returns the following user information (the `groups` array comes from NDP Keycloak):

```json
{
  "sub": "user123",
  "groups": ["researchers", "data-managers"]
}
```

And your configuration is:

```bash
ENABLE_GROUP_BASED_ACCESS=True
GROUP_NAMES=admins,data-managers
```

The user will be authorized because `data-managers` appears in both the user's groups and `GROUP_NAMES`.
- Group matching is case-insensitive (`Admins` matches `admins`)
- GET endpoints remain public regardless of this setting
- If `ENABLE_GROUP_BASED_ACCESS=True` but `GROUP_NAMES` is empty, all write operations will be denied
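To see the rule in action, a write request needs a Bearer token. The sketch below is illustrative only: it reuses the `/services` endpoint shown later in this README, assumes the default port mapping from the Quick Start, and uses a placeholder token issued by your NDP account.

```bash
# Illustrative write request with an NDP Bearer token (placeholder value).
# If ENABLE_GROUP_BASED_ACCESS=True and none of the token's groups appear in
# GROUP_NAMES, the API responds with 403 Forbidden; otherwise the write proceeds.
curl -X POST http://localhost:8001/services \
  -H "Authorization: Bearer $NDP_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "example-service", "title": "Example Service", "url": "https://example.org"}'
```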
For detailed usage examples and tutorials, please check the documentation in the /docs folder.
The NDP-EP API includes built-in support for the Model Context Protocol (MCP), enabling AI assistants and agents to interact programmatically with all API endpoints.
The Model Context Protocol is an emerging standard that defines how AI agents communicate with applications. It allows AI assistants like Claude, ChatGPT, and custom agents to discover and invoke API operations automatically.
Once the API is running, the MCP server is automatically available at:
http://your-api-host:port/mcp
For example, with the default Docker setup:
http://localhost:8001/mcp
- Zero Configuration: Automatically exposes all existing API endpoints as MCP tools
- AI-Friendly: AI agents can discover available operations and their parameters
- Schema Preservation: Maintains all request/response models and validation
- Secure: Respects existing authentication mechanisms
- Standard Protocol: Compatible with any MCP-compliant AI client
Dataset Management with AI Assistants:
- "Search for oceanography datasets in the NDP catalog"
- "Create a new dataset with these metadata fields"
- "List all my S3 buckets and their contents"
Automated Workflows:
- AI agents can orchestrate complex data ingestion pipelines
- Automated catalog synchronization between environments
- Intelligent data discovery and recommendation
Development & Testing:
- AI-assisted API testing and validation
- Automatic documentation generation
- Code generation for API clients
The MCP endpoint works with any MCP-compatible client. Example clients include:
- Claude Code: Anthropic's AI coding assistant
- Custom MCP Clients: Using the official MCP SDK
- AI Automation Tools: Any tool supporting the MCP protocol
For configuration examples and integration guides, visit the FastAPI-MCP documentation.
⚠️ CAUTION: This API automatically collects and logs system metrics (default: every 55 minutes, configurable via `METRICS_INTERVAL_SECONDS`).
The NDP-EP API automatically collects and logs comprehensive system metrics at configurable intervals (default: 55 minutes). These metrics provide visibility into system health, resource usage, catalog statistics, and service connectivity.
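For example, shortening the reporting interval is a one-line change to the `.env` configuration described earlier:

```bash
# Report metrics every 10 minutes instead of the 55-minute default
METRICS_INTERVAL_SECONDS=600
```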
System Information:
- Public IP Address: External IP of the API instance
- Resource Usage: Real-time CPU percentage, memory (used/total GB), and disk (used/total GB)
- API Version: Current version of the NDP-EP API
- Organization: Configured organization name
- EP Name: Endpoint identifier name
Catalog Statistics:
- Number of Datasets: Total datasets in local catalog
- Number of Services: Total registered services
- Services List: Array of all registered service titles
Service Registry:
- Global CKAN: NDP central catalog connection details
- Pre-CKAN: Staging environment configuration (if enabled)
- Local CKAN: Local catalog instance details (if configured)
- Kafka: Streaming service configuration (if enabled)
- JupyterLab: Notebook service integration (if configured)
```json
{
"public_ip": "203.0.113.45",
"cpu": "5.7%",
"memory": "4.8GB/30.8GB",
"disk": "265.4GB/936.8GB",
"version": "0.3.2",
"organization": "Your Organization",
"ep_name": "Your EP",
"num_datasets": 23,
"num_services": 5,
"services": [
"Service Title 1",
"Service Title 2",
"Service Title 3"
],
"timestamp": "2025-10-09T16:48:09.874843Z"
}
```

The NDP-EP API integrates with the Pelican Platform to enable access to distributed scientific data federations and to serve your own data to the global scientific community.
Pelican is an open-source data federation platform that connects distributed data repositories under a unified architecture. It enables:
- Federated Data Access: Browse and download from 20+ PB of scientific data in the Open Science Data Federation (OSDF)
- Data Sharing: Serve your MinIO/S3 data to the global scientific federation
- Distributed Caching: Automatic caching improves delivery efficiency for popular datasets
- Unified Namespace: Access heterogeneous sources (S3, POSIX, HTTP) through a common pelican:// protocol
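The `pelican://` scheme packs the federation and the object location into a single URL. As an illustrative breakdown (based on the OSDF example used later in this README, not an official reference):

```bash
# Illustrative anatomy of a pelican:// URL:
#
#   pelican://osg-htc.org/ospool/data/temperature.nc
#             ^ federation  ^ namespace and object path
#
# The federation host resolves to a Director, which redirects the client to a
# cache or origin that can actually serve the requested object.
```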
Use dedicated Pelican endpoints to browse and download from external federations like OSDF:
Available Endpoints:
- `GET /pelican/federations` - List available federations (OSDF, PATh-CC, etc.)
- `GET /pelican/browse?path=/ospool/data&federation=osdf` - Browse federation namespaces
- `GET /pelican/info?path=/ospool/file.nc&federation=osdf` - Get file metadata
- `GET /pelican/download?path=/ospool/file.nc&stream=true` - Download/stream files
- `POST /pelican/import-metadata` - Import an external file as a resource in the local catalog
Example Usage:
```bash
# List available federations
curl http://localhost:8002/pelican/federations
# Browse OSDF public data
curl "http://localhost:8002/pelican/browse?path=/ospool/uc-shared/public&detail=true"
# Download file from federation
curl "http://localhost:8002/pelican/download?path=/ospool/data/file.nc&stream=true" -o file.nc
# Import external Pelican file into local catalog
curl -X POST http://localhost:8002/pelican/import-metadata \
-H "Content-Type: application/json" \
-d '{
"pelican_url": "pelican://osg-htc.org/ospool/data/temperature.nc",
"package_id": "my-dataset-id",
"resource_name": "OSDF Temperature Data"
}'
```

Use `pelican://` URLs in your resource definitions - the API automatically handles downloads:
```bash
# Register dataset with Pelican URL
curl -X POST http://localhost:8002/services \
-H "Content-Type: application/json" \
-d '{
"name": "osdf-climate-data",
"title": "Climate Data from OSDF",
"url": "pelican://osg-htc.org/ospool/climate/dataset.nc"
}'
# The download handler automatically detects and uses Pelican
# No changes needed to existing endpoints!
```

The included `docker-compose.yml` sets up a complete local Pelican federation with 4 services:
- Pelican Registry (port 8444): Manages namespace registrations
- Pelican Director (port 8445): Routes client requests to appropriate origins/caches
- Pelican Origin (ports 8446-8447): Serves MinIO data at the federation path `/ndp-demo`
- Pelican Cache (ports 8448-8449): Caches popular objects for faster delivery
Your MinIO data becomes accessible via:
pelican://pelican-origin/ndp-demo/bucket-name/object-key
Enable Pelican in your .env file:
```bash
# Enable Pelican federation access
PELICAN_ENABLED=True
# Default federation (leave empty for OSDF)
PELICAN_FEDERATION_URL=
# Use caching infrastructure (recommended)
PELICAN_DIRECT_READS=False
```

Architecture overview:

```
+------------------------------------------------------+
|                      NDP-EP API                      |
|  +------------------+        +------------------+    |
|  |  Phase 1 Routes  |        |  Phase 2 Handler |    |
|  |  /pelican/*      |        |  pelican:// URLs |    |
|  +---------+--------+        +--------+---------+    |
|            |                          |              |
|            +--------------+-----------+              |
|                           |                          |
|               +----------v----------+                |
|               |  PelicanRepository  |                |
|               |     (pelicanfs)     |                |
|               +----------+----------+                |
+--------------------------|---------------------------+
                           |
          +----------------+----------------+
          |                |                |
    +-----v-----+    +-----v-----+    +-----v-----+
    |   OSDF    |    |  PATh-CC  |    |   Local   |
    |  Director |    |  Director |    |  Director |
    +-----+-----+    +-----+-----+    +-----+-----+
          |                |                |
    +-----v-----+    +-----v-----+    +-----v-----+
    |   Cache   |    |   Cache   |    |   Cache   |
    +-----+-----+    +-----+-----+    +-----+-----+
          |                |                |
    +-----v-----+    +-----v-----+    +-----v-----+
    |  Origin   |    |  Origin   |    |  Origin   |
    | (20+ PB)  |    |           |    |  (MinIO)  |
    +-----------+    +-----------+    +-----------+
```
- ✅ Access 20+ PB of Scientific Data: OSDF provides access to datasets from major research institutions
- ✅ Distributed Caching: Popular datasets are cached closer to compute resources
- ✅ Backward Compatible: Existing endpoints work unchanged with `pelican://` URLs
- ✅ Share Your Data: Expose MinIO datasets to the global scientific federation
- ✅ Unified Protocol: Single API for HTTP, S3, Kafka, and Pelican resources
- Pelican Platform: https://pelicanplatform.org
- OSDF Documentation: https://osg-htc.org/services/osdf.html
- Configuration Guide: pelican-origin.yml
This project is licensed under the MIT License - see the LICENSE file for details.
For more information about the National Data Platform, visit nationaldataplatform.org