A REST API that provides unified access to dataset management across the National Data Platform (NDP). Users can search the NDP catalog, ingest new datasets, and manage their own data collections through a single, streamlined machine-to-machine (M2M) interface.
The NDP-EP API integrates seamlessly with the National Data Platform ecosystem:
- Unified Authentication: Uses NDP's authentication system - your NDP account works directly with this API
- Multi-Catalog Management: Control and access datasets across three different CKAN environments
- Centralized Discovery: Search the main NDP catalog and other connected data sources
- Streamlined Ingestion: Simplified workflow for adding new datasets to the platform
The National Data Platform uses CKAN as its data catalog management software. This API provides access to three different catalog environments, each with specific access levels and purposes:
You can use your own catalog backend for local dataset management, with your choice of storage (see Adding New Catalog Backends for custom implementations):
CKAN Backend (Traditional):
- Full CKAN compatibility with all extensions
- Ideal if you already have CKAN infrastructure
- Complete administrative access to your catalog
MongoDB Backend (Modern NoSQL):
- Lightweight, no CKAN installation required
- Fast document-based storage
- Easy to deploy and scale
- Perfect for new deployments or cloud-native environments
Both options give you:
- Full Control: Create, read, update, and delete datasets
- Use Case: Personal or organizational data catalogs
- Flexibility: Switch between backends via configuration
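As a quick illustration (the same variables appear in the full Quick Start configuration below), switching backends is a matter of a couple of lines in the `.env` file:

```bash
# Local catalog backend selection: "ckan" or "mongodb"
LOCAL_CATALOG_BACKEND=mongodb

# Only needed when the MongoDB backend is selected
MONGODB_CONNECTION_STRING=mongodb://localhost:27017
MONGODB_DATABASE=ndp_local_catalog
```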
This is the main public catalog of the National Data Platform. Through this API you can:
- Read-Only Access: Search and discover publicly available datasets
- Use Case: Exploring the official NDP data collection
- Permissions: Search and view only - no modifications allowed
This is a staging environment provided by the NDP for dataset submission and review. Here's how it works:
- Ingestion Gateway: Submit new datasets for validation and review
- Use Case: Contributing datasets to the NDP central catalog
- Workflow: Your datasets are analyzed, validated, and if approved, promoted to the central catalog
- NDP Authentication Integration: Seamless login with your National Data Platform credentials
- Pluggable Catalog Backends: Choose between CKAN or MongoDB for your local catalog
- Federated Search: Discover datasets across local, NDP, and staging catalogs
- Specialized Ingestion: Purpose-built endpoints for Kafka topics, S3 resources, web services, and URLs
- MINIO S3 Storage: Direct bucket and object management with secure presigned URLs
- General Dataset Management: Flexible API for managing datasets with custom metadata
- Service Registry: Register and discover other services (such as microservices, APIs, or apps)
- AI Agent Integration: Model Context Protocol (MCP) support for AI assistants to interact with the API
- Pelican Federation: Access distributed scientific data from OSDF and serve your own data to federations
- System Monitoring: Built-in metrics and health monitoring
- RESTful API: Comprehensive OpenAPI/Swagger documentation
- Extensible Architecture: Easy to add new catalog backends (Elasticsearch, PostgreSQL, etc.)
Get the NDP-EP API running with Docker in under 5 minutes:
Before you begin, ensure you have:

- Docker: Container platform for running the API
  - Install from docker.com
  - Verify installation: `docker --version`
- Docker Compose: Container orchestration tool
  - Usually included with Docker Desktop
  - Verify installation: `docker-compose --version`
- CKAN Instance (Optional):
  - Required only if: You want to use local CKAN or Pre-CKAN features
  - Not needed if: You only plan to use the NDP Central Catalog (read-only access)
  - Install CKAN following the official documentation
- S3-Compatible Storage (Optional):
  - Required only if: You want to use S3 object storage features
  - Not needed if: You don't plan to use bucket/object management endpoints
  - Example: MINIO is a popular S3-compatible service - see the MINIO setup guide for Docker installation instructions, or the sketch after this list for a quick local test instance
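If you only need an S3-compatible service for local testing, a throwaway MinIO container is one way to get started. This is a sketch rather than the official setup guide; the credentials mirror the sample `S3_ACCESS_KEY`/`S3_SECRET_KEY` values used later in this guide and must be changed for any real deployment.

```bash
# Start a disposable MinIO instance (S3 API on port 9000, web console on 9001)
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin123 \
  minio/minio server /data --console-address ":9001"
```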
Create a .env file or prepare environment variables with your configuration:
```bash
# ==============================================
# API CONFIGURATION
# ==============================================
# API root path prefix (e.g., "/test" or "" for root)
# If empty or not set, the API will be available at the root path
# This is useful when deploying the API behind a reverse proxy at a subpath
ROOT_PATH=
# ==============================================
# ORGANIZATION SETTINGS
# ==============================================
# Your organization name for identification and metrics
ORGANIZATION="My organization"
# Endpoint name for identification in metrics and monitoring
EP_NAME="EP Name"
# ==============================================
# METRICS CONFIGURATION
# ==============================================
# Interval in seconds for sending metrics (default: 3300 seconds = 55 minutes)
METRICS_INTERVAL_SECONDS=3300
# ==============================================
# AUTHENTICATION CONFIGURATION
# ==============================================
# URL for the authentication API to retrieve user information
# This endpoint is used to validate tokens and fetch user details
AUTH_API_URL=https://idp.nationaldataplatform.org/temp/information
# ==============================================
# ACCESS CONTROL (Optional)
# ==============================================
# Enable group-based access control (True/False)
# When enabled, only users belonging to one of the groups in GROUP_NAMES
# can perform POST, PUT, DELETE operations. Other authenticated users
# will receive 403 Forbidden on write operations.
# GET endpoints remain public regardless of this setting.
ENABLE_GROUP_BASED_ACCESS=False
# Comma-separated list of allowed groups for write operations
# Only used when ENABLE_GROUP_BASED_ACCESS=True
GROUP_NAMES=admins,developers
# ==============================================
# LOCAL CATALOG CONFIGURATION
# ==============================================
# Choose your local catalog backend: "ckan" or "mongodb"
# Global and Pre-CKAN always use CKAN regardless of this setting
LOCAL_CATALOG_BACKEND=ckan
# ==============================================
# LOCAL CKAN CONFIGURATION (if LOCAL_CATALOG_BACKEND=ckan)
# ==============================================
# Enable or disable the local CKAN instance (True/False)
# Set to True if you have your own CKAN installation
CKAN_LOCAL_ENABLED=True
# Base URL of your local CKAN instance (Required if CKAN_LOCAL_ENABLED=True)
# Example: http://192.168.1.134:5000/ or https://your-ckan-domain.com/
CKAN_URL=http://XXX.XXX.XXX.XXX:XXXX/
# API Key for CKAN authentication (Required if CKAN_LOCAL_ENABLED=True)
# Get this from your CKAN user profile -> API Tokens
CKAN_API_KEY=
# ==============================================
# MONGODB CONFIGURATION (if LOCAL_CATALOG_BACKEND=mongodb)
# ==============================================
# MongoDB connection string
MONGODB_CONNECTION_STRING=mongodb://localhost:27017
# MongoDB database name for local catalog
MONGODB_DATABASE=ndp_local_catalog
# ==============================================
# PRE-CKAN CONFIGURATION
# ==============================================
# Enable or disable the Pre-CKAN instance (True/False)
# Set to True if you want to submit datasets to NDP Central Catalog
PRE_CKAN_ENABLED=True
# URL of the Pre-CKAN staging instance (Required if PRE_CKAN_ENABLED=True)
# This is typically provided by the NDP team
PRE_CKAN_URL=http://XX.XX.XX.XXX:5000/
# API key for Pre-CKAN authentication (Required if PRE_CKAN_ENABLED=True)
# Obtain this from the NDP team or your Pre-CKAN user profile
PRE_CKAN_API_KEY=
# ==============================================
# STREAMING CONFIGURATION
# ==============================================
# Enable or disable Kafka connectivity (True/False)
# Set to True if you want to ingest data from Kafka streams
KAFKA_CONNECTION=False
# Kafka broker hostname or IP address (Required if KAFKA_CONNECTION=True)
KAFKA_HOST=
# Kafka broker port number (Required if KAFKA_CONNECTION=True)
# Default Kafka port is 9092
KAFKA_PORT=9092
# ==============================================
# DEVELOPMENT & TESTING
# ==============================================
# Test token for development purposes (Optional)
# Leave blank in production environments for security
TEST_TOKEN=testing_token
# ==============================================
# EXTERNAL SERVICE INTEGRATIONS
# ==============================================
# Enable or disable JupyterLab integration (True/False)
# Set to True if you want to integrate with a JupyterLab instance
USE_JUPYTERLAB=False
# URL to your JupyterLab instance (Required if USE_JUPYTERLAB=True)
# Example: https://jupyter.your-domain.com or http://localhost:8888
JUPYTER_URL=
# ==============================================
# S3 STORAGE CONFIGURATION
# ==============================================
# Enable or disable S3 storage (True/False)
S3_ENABLED=True
# S3 endpoint (host:port) - use your S3-compatible service endpoint
S3_ENDPOINT=XXX.XXX.XXX.XXX:9000
# S3 access credentials
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin123
# Use secure connection (True for HTTPS, False for HTTP)
S3_SECURE=False
# Default region
S3_REGION=us-east-1
```

- Create the `.env` file with your configuration (see step 1)
- Run the container:

```bash
docker run -p 8001:8000 --env-file .env rbardaji/ndp-ep-api
```

The `docker-compose.yml` uses profiles to let you choose which services to start. By default, only the API starts. Use profiles to add optional services:
Available Profiles:
| Profile | Services Included |
|---|---|
| `mongodb` | MongoDB + Mongo Express |
| `kafka` | Kafka + Zookeeper + Kafka UI |
| `s3` | MinIO (S3-compatible storage) |
| `jupyter` | JupyterLab |
| `pelican` | Pelican Federation (Registry, Director, Origin, Cache) |
| `frontend` | NDP-EP Frontend Web UI |
| `full` | All services |
Usage Examples:
```bash
# API only (no additional services)
docker compose up

# API + MongoDB
docker compose --profile mongodb up

# API + MongoDB + Kafka
docker compose --profile mongodb --profile kafka up

# API + all services
docker compose --profile full up
```

Note: When using external services (e.g., your own CKAN or Kafka), just run `docker compose up` and configure the external URLs in your `.env` file.
Once the container is running, verify everything is working:
- API Documentation: http://localhost:8001/docs
- Health Check: http://localhost:8001/status/
- Interactive API Explorer: Available at the docs URL
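For a quick command-line check of the same endpoints (the port matches the `docker run` mapping shown earlier; adjust it if you changed the mapping), something like the following should work:

```bash
# Expect an HTTP 200 response once the API is up
curl -i http://localhost:8001/status/

# Raw OpenAPI schema behind the interactive docs (FastAPI's default path)
curl http://localhost:8001/openapi.json
```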
```bash
# Minimal configuration for read-only access to NDP Central Catalog
ORGANIZATION="Your Organization"
CKAN_LOCAL_ENABLED=False
PRE_CKAN_ENABLED=False
KAFKA_CONNECTION=False
USE_JUPYTERLAB=False
```

```bash
# Configuration for local CKAN development
ORGANIZATION="Your Organization"
LOCAL_CATALOG_BACKEND=ckan
CKAN_LOCAL_ENABLED=True
CKAN_URL=http://localhost:5000/
CKAN_API_KEY=your-local-ckan-api-key
PRE_CKAN_ENABLED=False
TEST_TOKEN=dev_token
```

```bash
# Lightweight setup with MongoDB backend
ORGANIZATION="Your Organization"
LOCAL_CATALOG_BACKEND=mongodb
MONGODB_CONNECTION_STRING=mongodb://localhost:27017
MONGODB_DATABASE=ndp_local_catalog
PRE_CKAN_ENABLED=False
TEST_TOKEN=dev_token
```

```bash
# Complete setup with local CKAN and NDP submission capability
ORGANIZATION="Your Organization"
CKAN_LOCAL_ENABLED=True
CKAN_URL=http://your-ckan-instance:5000/
CKAN_API_KEY=your-local-ckan-api-key
PRE_CKAN_ENABLED=True
PRE_CKAN_URL=https://preckan.nationaldataplatform.org
PRE_CKAN_API_KEY=your-ndp-preckan-api-key
```

The API supports optional group-based access control to restrict write operations (POST, PUT, DELETE) to users belonging to specific groups.
- Authentication: When a user makes a request with a Bearer token, the API validates the token against the configured `AUTH_API_URL`
- Group Retrieval: The authentication service returns user information, including their `groups` array
- Authorization: If `ENABLE_GROUP_BASED_ACCESS=True`, the API checks whether any of the user's groups match the allowed groups in `GROUP_NAMES`
- Access Decision:
  - ✅ User belongs to at least one allowed group → write operation permitted
  - ❌ User does not belong to any allowed group → 403 Forbidden
```bash
# Enable group-based access control
ENABLE_GROUP_BASED_ACCESS=True

# Comma-separated list of groups allowed to perform write operations
GROUP_NAMES=admins,developers,data-managers
```

| Setting | Read (GET) | Write (POST/PUT/DELETE) |
|---|---|---|
| `ENABLE_GROUP_BASED_ACCESS=False` | ✅ Public | ✅ Any authenticated user |
| `ENABLE_GROUP_BASED_ACCESS=True` | ✅ Public | ✅ Only users in `GROUP_NAMES` |
If your authentication service returns the following user information (the `groups` array comes from NDP Keycloak):

```json
{
  "sub": "user123",
  "groups": ["researchers", "data-managers"]
}
```

And your configuration is:

```bash
ENABLE_GROUP_BASED_ACCESS=True
GROUP_NAMES=admins,data-managers
```

The user will be authorized because `data-managers` appears in both the user's groups and `GROUP_NAMES`.
- Group matching is case-insensitive (`Admins` matches `admins`)
- GET endpoints remain public regardless of this setting
- If `ENABLE_GROUP_BASED_ACCESS=True` but `GROUP_NAMES` is empty, all write operations will be denied
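To see the rule in action, a write request needs a Bearer token. The sketch below is illustrative only: it reuses the `/services` endpoint shown later in this README, assumes the default port mapping from the Quick Start, and uses a placeholder token issued by your NDP account.

```bash
# Illustrative write request with an NDP Bearer token (placeholder value).
# If ENABLE_GROUP_BASED_ACCESS=True and none of the token's groups appear in
# GROUP_NAMES, the API responds with 403 Forbidden; otherwise the write proceeds.
curl -X POST http://localhost:8001/services \
  -H "Authorization: Bearer $NDP_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "example-service", "title": "Example Service", "url": "https://example.org"}'
```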
For detailed usage examples and tutorials, please check the documentation in the /docs folder.
The NDP-EP API includes built-in support for the Model Context Protocol (MCP), enabling AI assistants and agents to interact programmatically with all API endpoints.
The Model Context Protocol is an emerging standard that defines how AI agents communicate with applications. It allows AI assistants like Claude, ChatGPT, and custom agents to discover and invoke API operations automatically.
Once the API is running, the MCP server is automatically available at:
http://your-api-host:port/mcp
For example, with the default Docker setup:
http://localhost:8001/mcp
- Zero Configuration: Automatically exposes all existing API endpoints as MCP tools
- AI-Friendly: AI agents can discover available operations and their parameters
- Schema Preservation: Maintains all request/response models and validation
- Secure: Respects existing authentication mechanisms
- Standard Protocol: Compatible with any MCP-compliant AI client
Dataset Management with AI Assistants:
- "Search for oceanography datasets in the NDP catalog"
- "Create a new dataset with these metadata fields"
- "List all my S3 buckets and their contents"
Automated Workflows:
- AI agents can orchestrate complex data ingestion pipelines
- Automated catalog synchronization between environments
- Intelligent data discovery and recommendation
Development & Testing:
- AI-assisted API testing and validation
- Automatic documentation generation
- Code generation for API clients
The MCP endpoint works with any MCP-compatible client. Example clients include:
- Claude Code: Anthropic's AI coding assistant
- Custom MCP Clients: Using the official MCP SDK
- AI Automation Tools: Any tool supporting the MCP protocol
For configuration examples and integration guides, visit the FastAPI-MCP documentation.
⚠️ CAUTION: This API automatically collects and logs system metrics (default: every 55 minutes, configurable via `METRICS_INTERVAL_SECONDS`).
The NDP-EP API automatically collects and logs comprehensive system metrics at configurable intervals (default: 55 minutes). These metrics provide visibility into system health, resource usage, catalog statistics, and service connectivity.
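For example, shortening the reporting interval is a one-line change to the `.env` configuration described earlier:

```bash
# Report metrics every 10 minutes instead of the 55-minute default
METRICS_INTERVAL_SECONDS=600
```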
System Information:
- Public IP Address: External IP of the API instance
- Resource Usage: Real-time CPU percentage, memory (used/total GB), and disk (used/total GB)
- API Version: Current version of the NDP-EP API
- Organization: Configured organization name
- EP Name: Endpoint identifier name
Catalog Statistics:
- Number of Datasets: Total datasets in local catalog
- Number of Services: Total registered services
- Services List: Array of all registered service titles
Service Registry:
- Global CKAN: NDP central catalog connection details
- Pre-CKAN: Staging environment configuration (if enabled)
- Local CKAN: Local catalog instance details (if configured)
- Kafka: Streaming service configuration (if enabled)
- JupyterLab: Notebook service integration (if configured)
```json
{
"public_ip": "203.0.113.45",
"cpu": "5.7%",
"memory": "4.8GB/30.8GB",
"disk": "265.4GB/936.8GB",
"version": "0.3.2",
"organization": "Your Organization",
"ep_name": "Your EP",
"num_datasets": 23,
"num_services": 5,
"services": [
"Service Title 1",
"Service Title 2",
"Service Title 3"
],
"timestamp": "2025-10-09T16:48:09.874843Z"
}
```

The NDP-EP API integrates with the Pelican Platform to enable access to distributed scientific data federations and to serve your own data to the global scientific community.
Pelican is an open-source data federation platform that connects distributed data repositories under a unified architecture. It enables:
- Federated Data Access: Browse and download from 20+ PB of scientific data in the Open Science Data Federation (OSDF)
- Data Sharing: Serve your MinIO/S3 data to the global scientific federation
- Distributed Caching: Automatic caching improves delivery efficiency for popular datasets
- Unified Namespace: Access heterogeneous sources (S3, POSIX, HTTP) through a common pelican:// protocol
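The `pelican://` scheme packs the federation and the object location into a single URL. As an illustrative breakdown (based on the OSDF example used later in this README, not an official reference):

```bash
# Illustrative anatomy of a pelican:// URL:
#
#   pelican://osg-htc.org/ospool/data/temperature.nc
#             ^ federation  ^ namespace and object path
#
# The federation host resolves to a Director, which redirects the client to a
# cache or origin that can actually serve the requested object.
```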
Use dedicated Pelican endpoints to browse and download from external federations like OSDF:
Available Endpoints:
- `GET /pelican/federations` - List available federations (OSDF, PATh-CC, etc.)
- `GET /pelican/browse?path=/ospool/data&federation=osdf` - Browse federation namespaces
- `GET /pelican/info?path=/ospool/file.nc&federation=osdf` - Get file metadata
- `GET /pelican/download?path=/ospool/file.nc&stream=true` - Download/stream files
- `POST /pelican/import-metadata` - Import an external file as a resource in the local catalog
Example Usage:
```bash
# List available federations
curl http://localhost:8002/pelican/federations
# Browse OSDF public data
curl "http://localhost:8002/pelican/browse?path=/ospool/uc-shared/public&detail=true"
# Download file from federation
curl "http://localhost:8002/pelican/download?path=/ospool/data/file.nc&stream=true" -o file.nc
# Import external Pelican file into local catalog
curl -X POST http://localhost:8002/pelican/import-metadata \
-H "Content-Type: application/json" \
-d '{
"pelican_url": "pelican://osg-htc.org/ospool/data/temperature.nc",
"package_id": "my-dataset-id",
"resource_name": "OSDF Temperature Data"
}'
```

Use `pelican://` URLs in your resource definitions - the API automatically handles downloads:
```bash
# Register dataset with Pelican URL
curl -X POST http://localhost:8002/services \
-H "Content-Type: application/json" \
-d '{
"name": "osdf-climate-data",
"title": "Climate Data from OSDF",
"url": "pelican://osg-htc.org/ospool/climate/dataset.nc"
}'
# The download handler automatically detects and uses Pelican
# No changes needed to existing endpoints!
```

The included `docker-compose.yml` sets up a complete local Pelican federation with 4 services:
- Pelican Registry (port 8444): Manages namespace registrations
- Pelican Director (port 8445): Routes client requests to appropriate origins/caches
- Pelican Origin (ports 8446-8447): Serves MinIO data at the federation path `/ndp-demo`
- Pelican Cache (ports 8448-8449): Caches popular objects for faster delivery
Your MinIO data becomes accessible via:
pelican://pelican-origin/ndp-demo/bucket-name/object-key
Enable Pelican in your .env file:
```bash
# Enable Pelican federation access
PELICAN_ENABLED=True
# Default federation (leave empty for OSDF)
PELICAN_FEDERATION_URL=
# Use caching infrastructure (recommended)
PELICAN_DIRECT_READS=False
```

Architecture overview:

```
+------------------------------------------------------+
|                      NDP-EP API                      |
|  +------------------+        +------------------+    |
|  |  Phase 1 Routes  |        |  Phase 2 Handler |    |
|  |  /pelican/*      |        |  pelican:// URLs |    |
|  +---------+--------+        +--------+---------+    |
|            |                          |              |
|            +--------------+-----------+              |
|                           |                          |
|               +----------v----------+                |
|               |  PelicanRepository  |                |
|               |     (pelicanfs)     |                |
|               +----------+----------+                |
+--------------------------|---------------------------+
                           |
          +----------------+----------------+
          |                |                |
    +-----v-----+    +-----v-----+    +-----v-----+
    |   OSDF    |    |  PATh-CC  |    |   Local   |
    |  Director |    |  Director |    |  Director |
    +-----+-----+    +-----+-----+    +-----+-----+
          |                |                |
    +-----v-----+    +-----v-----+    +-----v-----+
    |   Cache   |    |   Cache   |    |   Cache   |
    +-----+-----+    +-----+-----+    +-----+-----+
          |                |                |
    +-----v-----+    +-----v-----+    +-----v-----+
    |  Origin   |    |  Origin   |    |  Origin   |
    | (20+ PB)  |    |           |    |  (MinIO)  |
    +-----------+    +-----------+    +-----------+
```
- ✅ Access 20+ PB of Scientific Data: OSDF provides access to datasets from major research institutions
- ✅ Distributed Caching: Popular datasets are cached closer to compute resources
- ✅ Backward Compatible: Existing endpoints work unchanged with `pelican://` URLs
- ✅ Share Your Data: Expose MinIO datasets to the global scientific federation
- ✅ Unified Protocol: Single API for HTTP, S3, Kafka, and Pelican resources
- Pelican Platform: https://pelicanplatform.org
- OSDF Documentation: https://osg-htc.org/services/osdf.html
- Configuration Guide: pelican-origin.yml
This project is licensed under the MIT License - see the LICENSE file for details.
For more information about the National Data Platform, visit nationaldataplatform.org