CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

This is the dp-python-lib repository, a Python client library for the Machine Learning Data Platform (MLDP) gRPC API. It provides Python bindings for interacting with the MLDP services.

Project Structure

src/dp_python_lib/ - Main library source code
- grpc/ - Auto-generated Protocol Buffer and gRPC stub files (DO NOT EDIT)
- client/ - Client wrapper classes (e.g., MldpClient)
- models/ - Data model definitions
tests/ - Test suite with unit/ and integration/ subdirectories
pyproject.toml - Project configuration and dependencies

Development Commands

Testing

# Run all tests
pytest tests/

# Run unit tests only
pytest tests/unit/

# Run specific test file
pytest tests/unit/test_ingestion_client.py -v

Dependencies

Core dependencies are managed in pyproject.toml:

grpcio - gRPC runtime
grpcio-tools - gRPC development tools
protobuf - Protocol Buffers runtime
pydantic-settings - Type-safe configuration with environment variable support
PyYAML - YAML file parsing

Architecture Notes

The grpc/ directory contains auto-generated code from Protocol Buffer definitions
These files are generated from the upstream dp-grpc project and should not be manually edited
Import Fix Process: The gRPC generation process includes a post-processing step to fix relative import paths in the generated files (e.g., converting import common_pb2 to from . import common_pb2)
The main client entry point is MldpClient in src/dp_python_lib/client/mldp_client.py
Client classes like IngestionClient provide user-friendly wrappers around gRPC service calls
The library follows standard Python packaging conventions with pyproject.toml
Type Hints: All framework classes use comprehensive type annotations for better IDE support and error detection
Logging: Built-in logging throughout the framework using Python's standard logging module

Key Files

src/dp_python_lib/client/mldp_client.py - Main client wrapper for the gRPC services
src/dp_python_lib/client/ingestion_client.py - Ingestion service client with methods like register_provider()
tests/unit/test_ingestion_client.py - Unit tests for IngestionClient functionality
pyproject.toml - Project metadata and dependencies
Generated gRPC stubs include services for:
- Ingestion (ingestion_pb2.py, ingestion_pb2_grpc.py)
- Queries (query_pb2.py, query_pb2_grpc.py)
- Annotations (annotation_pb2.py, annotation_pb2_grpc.py)
- Common types (common_pb2.py, common_pb2_grpc.py)

Development Guidelines

Client Implementation Pattern

Follow the standard pattern: user params → build gRPC request → send request → return wrapped result
Always write unit tests for new client methods in tests/unit/
Use parameter classes (e.g., RegisterProviderRequestParams) for user-friendly APIs
Client methods should return result objects that wrap gRPC responses with error handling

gRPC Error Handling

Use synchronous gRPC calls with DpIngestionServiceStub for simplicity
Implement three-tier error handling:
1. gRPC Exceptions (grpc.RpcError) - network/connection errors
2. Business Logic Errors - check response exceptionalResult field
3. General Exceptions - unexpected errors
Check protobuf union fields with response.HasField('fieldName')
Return consistent result objects with is_error flag and appropriate messages

Testing Best Practices

Use @patch decorators to mock gRPC stubs and avoid real network calls
Mock the response behavior with side_effect for conditional logic (e.g., HasField)
Always verify mocks were called correctly with assert_called_once_with()
Test all error scenarios: success, business errors, gRPC exceptions, and unexpected cases

Type Hints and Modern Python

All framework classes use comprehensive type hints with Python 3.5+ syntax
Parameter types: str, bool, Optional[str], List[str], Dict[str, str]
gRPC-specific types: grpc.Channel, ingestion_pb2.RegisterProviderRequest
Return type annotations: -> None, -> RegisterProviderApiResult
Import required types: from typing import Optional, Dict, List

Logging System

Architecture: Uses Python's standard logging module with hierarchical logger names

Implementation Pattern:

import logging

class MyClient:
    def __init__(self):
        self.logger = logging.getLogger(__name__)
        
    def my_method(self):
        self.logger.info("Starting operation")
        self.logger.debug("Technical details: %s", details)
        self.logger.warning("Recoverable issue: %s", issue)
        self.logger.error("Serious problem: %s", error, exc_info=True)

Logger Hierarchy:

dp_python_lib.client.mldp_client - Main client initialization and configuration
dp_python_lib.client.ingestion_client - API operations with detailed request/response logging
dp_python_lib.config.config - Configuration loading and YAML processing
dp_python_lib.config.loader - Config file discovery and priority handling

Log Levels Used:

DEBUG - Technical details (request building, parameter processing)
INFO - Business events (API calls, successful operations)
WARNING - Recoverable issues (business logic errors from API)
ERROR - Serious problems (gRPC errors, unexpected exceptions with stack traces)

Application Usage:

import logging

# Configure logging in your application (not the library!)
logging.basicConfig(level=logging.INFO, 
                   format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# Library will log useful operational information
client = MldpClient(config_file="config.yaml")
result = client.ingestion_client.register_provider(params)

Configuration System

Overview

The library uses a flexible configuration system supporting YAML files and environment variables with pydantic-settings for type safety.

Configuration Files

Default location: mldp-config.yaml in project root

ingestion:
  host: localhost
  port: 50051
  use_tls: false
query:
  host: localhost
  port: 50052
  use_tls: false
annotation:
  host: localhost
  port: 50053
  use_tls: false

Environment Variables

Use pattern: MLDP_<SERVICE>_<SETTING>

# Override specific service settings  
MLDP_INGESTION_HOST=prod-ingestion.example.com
MLDP_INGESTION_PORT=443
MLDP_INGESTION_USE_TLS=true

# Custom config file location
MLDP_CONFIG_FILE=/path/to/custom-config.yaml

Usage Patterns

from dp_python_lib.client import MldpClient
from dp_python_lib.config import MldpConfig, ServiceConfig

# Auto-load from default locations (env vars override YAML)
client = MldpClient()

# Specify config file
client = MldpClient(config_file="custom-config.yaml")

# Direct config object
config = MldpConfig(
    ingestion=ServiceConfig(host="custom-host", port=8080, use_tls=True)
)
client = MldpClient(config=config)

# Backward compatibility - direct channels
import grpc
channel = grpc.insecure_channel("localhost:50051")
client = MldpClient(ingestion_channel=channel)

Configuration Priority (High to Low)

Explicit parameters (direct channels, config objects)
Environment variables (MLDP_*)
YAML configuration file
Built-in defaults

Configuration Implementation

Architecture: Uses flattened pydantic-settings approach for standard environment variable handling:

class MldpConfig(BaseSettings):
    # Flat field structure for better env var support
    ingestion_host: str = "localhost"
    ingestion_port: int = 50051
    ingestion_use_tls: bool = False
    
    query_host: str = "localhost" 
    query_port: int = 50052
    query_use_tls: bool = False
    
    annotation_host: str = "localhost"
    annotation_port: int = 50053
    annotation_use_tls: bool = False
    
    model_config = SettingsConfigDict(
        env_prefix='MLDP_',
        case_sensitive=False
    )
    
    # Properties provide access to grouped ServiceConfig objects
    @property
    def ingestion(self) -> ServiceConfig:
        return ServiceConfig(
            host=self.ingestion_host,
            port=self.ingestion_port, 
            use_tls=self.ingestion_use_tls
        )

Key Configuration Classes

ServiceConfig - Individual service configuration (host, port, use_tls) with gRPC channel creation
MldpConfig - Main config container with flattened fields for environment variable support
load_config() - Configuration loader with priority handling
find_config_file() - Config file discovery (explicit path > env var > project locations)

Dependencies Added

pydantic-settings - Type-safe configuration with environment variable support
PyYAML - YAML file parsing

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLAUDE.md

Overview

Project Structure

Development Commands

Testing

Dependencies

Architecture Notes

Key Files

Development Guidelines

Client Implementation Pattern

gRPC Error Handling

Testing Best Practices

Type Hints and Modern Python

Logging System

Configuration System

Overview

Configuration Files

Environment Variables

Usage Patterns

Configuration Priority (High to Low)

Configuration Implementation

Key Configuration Classes

Dependencies Added

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

CLAUDE.md

Overview

Project Structure

Development Commands

Testing

Dependencies

Architecture Notes

Key Files

Development Guidelines

Client Implementation Pattern

gRPC Error Handling

Testing Best Practices

Type Hints and Modern Python

Logging System

Configuration System

Overview

Configuration Files

Environment Variables

Usage Patterns

Configuration Priority (High to Low)

Configuration Implementation

Key Configuration Classes

Dependencies Added