diff --git a/.github/workflows/sync-docs.yaml b/.github/workflows/sync-docs.yaml
new file mode 100644
index 0000000..95e7848
--- /dev/null
+++ b/.github/workflows/sync-docs.yaml
@@ -0,0 +1,19 @@
+name: Sync Docs to GitBook
+
+on:
+  push:
+    branches: [main]
+    paths: ["docs/**"]
+
+jobs:
+  sync:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v3
+      - name: GitBook Sync
+        uses: gitbook/gitbook-sync@v1
+        with:
+          gitbook-token: ${{ secrets.GITBOOK_TOKEN }}
+          gitbook-space: ${{ secrets.GITBOOK_SPACE_ID }}
+          source-dir: docs/
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..51f04e1
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,59 @@
+# Atlas Python SDK Documentation
+
+Welcome to the official documentation for the Atlas Python SDK. This library provides convenient access to the LayerLens Atlas REST API from any Python 3.8+ application.
+
+## What is Atlas?
+
+Atlas is LayerLens's evaluation platform for benchmarking AI models against datasets and metrics. The Python SDK provides a synchronous HTTP client powered by [httpx](https://github.com/encode/httpx) and [Pydantic](https://pydantic.dev/) models for type-safe API interactions.
+
+## Key Features
+
+- **Simple Authentication**: Secure API key-based authentication
+- **Type Safety**: Full Pydantic model support for all API responses
+- **Comprehensive Error Handling**: Detailed exception hierarchy for different error scenarios
+- **Configurable Timeouts**: Fine-grained timeout control for different operations
+- **Environment Variable Support**: Easy configuration through environment variables
+- **Python 3.8+ Compatibility**: Works with Python 3.8 and later
+
+## Quick Start
+
+```python
+import os
+from atlas import Atlas
+
+# Initialize the client
+client = Atlas(
+    api_key=os.environ.get("LAYERLENS_ATLAS_API_KEY"),
+    organization_id=os.environ.get("LAYERLENS_ATLAS_ORG_ID"),
+    project_id=os.environ.get("LAYERLENS_ATLAS_PROJECT_ID"),
+)
+
+# Create an evaluation
+evaluation = client.evaluations.create(
+    model="gpt-4",
+    benchmark="mmlu"
+)
+
+# Get results
+if evaluation:
+    results = client.results.get(evaluation_id=evaluation.id)
+    print(f"Evaluation completed with {len(results)} results")
+```
+
+## Navigation
+
+- **[Getting Started](getting-started/)** - Installation, setup, and your first API call
+- **[API Reference](api-reference/)** - Complete documentation of all available methods
+- **[Code Examples](examples/)** - Practical examples for common use cases
+- **[Troubleshooting](troubleshooting/)** - Solutions to common issues
+- **[Security](security/)** - Best practices for secure API usage
+
+## Support
+
+- **LayerLens Support**: Contact support through your LayerLens dashboard
+- **Documentation**: Visit [docs.layerlens.com](https://docs.layerlens.com) for additional resources
+- **API Status**: Check the [LayerLens status page](https://status.layerlens.com) for service updates
+
+## License
+
+This SDK is released under the MIT License.
\ No newline at end of file
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
new file mode 100644
index 0000000..40e99b3
--- /dev/null
+++ b/docs/SUMMARY.md
@@ -0,0 +1,33 @@
+# Table of Contents
+
+* [Introduction](README.md)
+
+## Getting Started
+* [Installation](getting-started/installation.md)
+* [Authentication & Configuration](getting-started/authentication.md)
+* [Quick Start Guide](getting-started/quickstart.md)
+
+## API Reference
+* [Client Configuration](api-reference/client.md)
+* [Evaluations](api-reference/evaluations.md)
+* [Results](api-reference/results.md)
+* [Models & Benchmarks](api-reference/models-benchmarks.md)
+* [Error Handling](api-reference/errors.md)
+
+## Code Examples
+* [Creating Evaluations](examples/creating-evaluations.md)
+* [Retrieving Results](examples/retrieving-results.md)
+* [Working with Timeouts](examples/timeouts.md)
+* [Advanced Usage Patterns](examples/advanced-usage.md)
+
+## Troubleshooting
+* [Common Issues](troubleshooting/common-issues.md)
+* [Authentication Problems](troubleshooting/authentication.md)
+* [Error Codes Reference](troubleshooting/error-codes.md)
+
+## Security Best Practices
+* [API Key Management](security/api-key-management.md)
+* [Environment Variables](security/environment-variables.md)
+* [Rate Limiting](security/rate-limiting.md)
+* [Data Privacy](security/data-privacy.md)
+
diff --git a/docs/api-reference/client.md b/docs/api-reference/client.md
new file mode 100644
index 0000000..a2720bb
--- /dev/null
+++ b/docs/api-reference/client.md
@@ -0,0 +1,277 @@
+# Client Configuration
+
+The `Atlas` class is the main entry point for interacting with the LayerLens Atlas API. This page covers client initialization, configuration options, and advanced usage patterns.
+
+## Basic Usage
+
+```python
+from atlas import Atlas
+
+# Using environment variables (recommended)
+client = Atlas()
+
+# Explicit configuration
+client = Atlas(
+    api_key="your_api_key",
+    organization_id="your_org_id",
+    project_id="your_project_id"
+)
+```
+
+## Constructor Parameters
+
+### `Atlas(api_key, organization_id, project_id, base_url, timeout)`
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `api_key` | `str \| None` | Yes* | `None` | Your LayerLens Atlas API key |
+| `organization_id` | `str \| None` | Yes* | `None` | Your organization identifier |
+| `project_id` | `str \| None` | Yes* | `None` | The project you want to work with |
+| `base_url` | `str \| httpx.URL \| None` | No | Atlas API URL | Custom API base URL |
+| `timeout` | `float \| httpx.Timeout \| None` | No | 10 minutes | Request timeout configuration |
+
+*Required unless set via environment variables
+
+## Environment Variable Configuration
+
+The client automatically loads configuration from these environment variables:
+
+```bash
+LAYERLENS_ATLAS_API_KEY="your_api_key_here"
+LAYERLENS_ATLAS_ORG_ID="your_org_id_here"
+LAYERLENS_ATLAS_PROJECT_ID="your_project_id_here"
+LAYERLENS_ATLAS_BASE_URL="https://custom-endpoint.com/api/v1"  # Optional
+```
+
+## Timeout Configuration
+
+### Simple Timeout
+
+```python
+from atlas import Atlas
+
+# 30-second timeout for all requests
+client = Atlas(timeout=30.0)
+```
+
+### Advanced Timeout Configuration
+
+```python
+import httpx
+from atlas import Atlas
+
+client = Atlas(
+    timeout=httpx.Timeout(
+        connect=5.0,    # Connection timeout: 5 seconds
+        read=60.0,      # Read timeout: 60 seconds
+        write=30.0,     # Write timeout: 30 seconds
+        pool=10.0       # Connection pool timeout: 10 seconds
+    )
+)
+```
+
+### Per-Request Timeout Override
+
+```python
+client = Atlas()
+
+# Override timeout for a specific request
+evaluation = client.with_options(timeout=120.0).evaluations.create(
+    model="gpt-4",
+    benchmark="mmlu"
+)
+```
+
+## Client Methods
+
+### `copy(**kwargs)`
+
+Create a new client instance with modified configuration:
+
+```python
+# Base client
+client = Atlas(api_key="key1", organization_id="org1", project_id="project1")
+
+# Create a copy with different project
+project_client = client.copy(project_id="different_project")
+
+# Create a copy with different timeout
+slow_client = client.copy(timeout=300.0)  # 5 minutes
+```
+
+### `with_options(**kwargs)`
+
+Temporarily override client options for a single request chain:
+
+```python
+client = Atlas()
+
+# Use different timeout for this request only
+evaluation = client.with_options(timeout=60.0).evaluations.create(
+    model="gpt-4",
+    benchmark="mmlu"
+)
+
+# Back to original timeout for subsequent requests
+results = client.results.get(evaluation_id=evaluation.id)
+```
+
+## Resource Access
+
+The client provides access to different API resources through properties:
+
+```python
+client = Atlas()
+
+# Access evaluations resource
+client.evaluations.create(model="gpt-4", benchmark="mmlu")
+
+# Access results resource
+client.results.get(evaluation_id="eval_123")
+```
+
+Available resources:
+- `client.evaluations` - Create and manage evaluations
+- `client.results` - Retrieve evaluation results
+- More resources coming soon...
+
+## Error Handling
+
+The client raises specific exceptions for different error conditions:
+
+```python
+import atlas
+from atlas import Atlas
+
+client = Atlas()
+
+try:
+    evaluation = client.evaluations.create(model="invalid", benchmark="invalid")
+except atlas.AuthenticationError:
+    # 401 - Invalid API key
+    print("Authentication failed")
+except atlas.PermissionDeniedError:
+    # 403 - Valid API key, insufficient permissions
+    print("Permission denied")
+except atlas.NotFoundError:
+    # 404 - Resource not found
+    print("Model or benchmark not found")
+except atlas.RateLimitError:
+    # 429 - Too many requests
+    print("Rate limit exceeded")
+except atlas.InternalServerError:
+    # 500+ - Server error
+    print("Server error occurred")
+except atlas.APITimeoutError:
+    # Request timeout (subclass of APIConnectionError, so it must be caught first)
+    print("Request timed out")
+except atlas.APIConnectionError:
+    # Network/connection issues
+    print("Connection failed")
+```
+
+## Authentication Headers
+
+The client automatically handles authentication by adding the required headers:
+
+```python
+# The client adds this header to all requests:
+# x-api-key: your_api_key_value
+```
+
+You don't need to manually handle authentication headers.
+
+## Base URL Configuration
+
+### Default Base URL
+The client uses the default LayerLens Atlas API endpoint unless overridden.
+
+### Custom Base URL
+For enterprise or self-hosted deployments:
+
+```python
+from atlas import Atlas
+
+client = Atlas(
+    base_url="https://your-atlas-instance.com/api/v1"
+)
+
+# Or via environment variable
+# LAYERLENS_ATLAS_BASE_URL="https://your-atlas-instance.com/api/v1"
+client = Atlas()  # Will use custom base URL from environment
+```
+
+## Best Practices
+
+### 1. Use Environment Variables
+```python
+# ✅ Good - secure and flexible
+client = Atlas()
+
+# ❌ Bad - hardcoded credentials
+client = Atlas(api_key="hardcoded_key")
+```
+
+### 2. Configure Appropriate Timeouts
+```python
+# ✅ Good - reasonable timeout for evaluation creation
+client = Atlas(timeout=120.0)  # 2 minutes
+
+# ❌ Bad - too short for long-running operations
+client = Atlas(timeout=5.0)  # 5 seconds might be too short
+```
+
+### 3. Handle Errors Gracefully
+```python
+# ✅ Good - specific error handling (assumes `import time`, `import atlas`, and a configured `logger`)
+try:
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.RateLimitError:
+    time.sleep(60)  # Wait before retrying
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.APIError as e:
+    logger.error(f"API error: {e}")
+    raise
+```
+
+### 4. Reuse Client Instances
+```python
+# ✅ Good - reuse the same client
+client = Atlas()
+eval1 = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+eval2 = client.evaluations.create(model="claude-3", benchmark="hellaswag")
+
+# ❌ Bad - creating new clients unnecessarily
+client1 = Atlas()
+eval1 = client1.evaluations.create(model="gpt-4", benchmark="mmlu")
+client2 = Atlas()  # Unnecessary
+eval2 = client2.evaluations.create(model="claude-3", benchmark="hellaswag")
+```
+
+## Thread Safety
+
+The Atlas client is thread-safe and can be shared across multiple threads:
+
+```python
+import threading
+from atlas import Atlas
+
+client = Atlas()
+
+def create_evaluation(model_name):
+    evaluation = client.evaluations.create(
+        model=model_name,
+        benchmark="mmlu"
+    )
+    print(f"Created evaluation for {model_name}: {evaluation.id}")
+
+# Safe to use the same client across threads
+threads = []
+for model in ["gpt-4", "claude-3", "llama-2"]:
+    thread = threading.Thread(target=create_evaluation, args=(model,))
+    threads.append(thread)
+    thread.start()
+
+for thread in threads:
+    thread.join()
+```
\ No newline at end of file
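Note on handler order in `docs/api-reference/client.md`: Python evaluates `except` clauses from top to bottom, and the exception hierarchy documented in `docs/api-reference/errors.md` makes `APITimeoutError` a subclass of `APIConnectionError`. A handler for the subclass is therefore only reachable when it is listed before the handler for its parent. The sketch below illustrates this with locally defined stand-in classes that reuse the SDK's class names; the stubs and the `classify` helper are assumptions for illustration and are not part of the Atlas SDK.

```python
class APIConnectionError(Exception):
    """Local stand-in for atlas.APIConnectionError (not the SDK class)."""


class APITimeoutError(APIConnectionError):
    """Local stand-in for atlas.APITimeoutError, a subclass per the documented hierarchy."""


def classify(exc: Exception) -> str:
    """Return a label for exc, matching the most specific handler first."""
    try:
        raise exc
    except APITimeoutError:
        # Must precede APIConnectionError, or this branch is never reached.
        return "timeout"
    except APIConnectionError:
        return "connection"
    except Exception:
        return "other"


if __name__ == "__main__":
    for error in (APITimeoutError(), APIConnectionError(), ValueError()):
        print(type(error).__name__, "->", classify(error))
```

Swapping the first two handlers would make `classify(APITimeoutError())` return `"connection"`, which is why the specific clause goes first.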
diff --git a/docs/api-reference/errors.md b/docs/api-reference/errors.md
new file mode 100644
index 0000000..ae172d4
--- /dev/null
+++ b/docs/api-reference/errors.md
@@ -0,0 +1,614 @@
+# Error Handling
+
+The Atlas Python SDK provides a comprehensive exception hierarchy to help you handle different error conditions gracefully. This guide covers all available exception types and best practices for error handling.
+
+## Exception Hierarchy
+
+All Atlas exceptions inherit from the base `AtlasError` class:
+
+```
+AtlasError
+├── APIError
+│   ├── APIConnectionError
+│   │   └── APITimeoutError
+│   ├── APIResponseValidationError
+│   └── APIStatusError
+│       ├── BadRequestError (400)
+│       ├── AuthenticationError (401)
+│       ├── PermissionDeniedError (403)
+│       ├── NotFoundError (404)
+│       ├── ConflictError (409)
+│       ├── UnprocessableEntityError (422)
+│       ├── RateLimitError (429)
+│       └── InternalServerError (500+)
+```
+
+## Exception Types
+
+### Base Exceptions
+
+#### `AtlasError`
+Base exception for all Atlas-related errors.
+
+```python
+import atlas
+
+try:
+    client = atlas.Atlas()
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.AtlasError as e:
+    print(f"Atlas error occurred: {e}")
+```
+
+#### `APIError`
+Base exception for all API-related errors. Contains additional context about the request.
+
+**Properties:**
+- `message`: Error message
+- `request`: The HTTP request that caused the error
+- `body`: Response body (if available)
+
+```python
+import atlas
+
+try:
+    client = atlas.Atlas()
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.APIError as e:
+    print(f"API error: {e.message}")
+    print(f"Request URL: {e.request.url}")
+    print(f"Response body: {e.body}")
+```
+
+### Connection Errors
+
+#### `APIConnectionError`
+Raised when the client cannot connect to the API server.
+
+**Common causes:**
+- Network connectivity issues
+- DNS resolution problems
+- Server is down
+- Firewall blocking requests
+
+```python
+import atlas
+
+try:
+    client = atlas.Atlas()
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.APIConnectionError as e:
+    print("Connection failed - check your network connection")
+    print(f"Error details: {e}")
+```
+
+#### `APITimeoutError`
+Raised when a request times out.
+
+```python
+import atlas
+
+try:
+    client = atlas.Atlas(timeout=0.2)  # Very short timeout
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.APITimeoutError:
+    print("Request timed out - try increasing timeout or check network")
+```
+
+### HTTP Status Errors
+
+All HTTP status errors inherit from `APIStatusError` and include additional properties:
+
+**Properties:**
+- `status_code`: HTTP status code
+- `response`: Full HTTP response object
+- `request_id`: Request ID for tracking (if provided by server)
+
+#### `BadRequestError` (400)
+Request was malformed or contained invalid parameters.
+
+```python
+import atlas
+
+try:
+    client = atlas.Atlas()
+    # Invalid parameters
+    evaluation = client.evaluations.create(model="", benchmark="")
+except atlas.BadRequestError as e:
+    print(f"Bad request: {e}")
+    print(f"Status code: {e.status_code}")
+```
+
+#### `AuthenticationError` (401)
+API key is missing, invalid, or expired.
+
+```python
+import atlas
+
+try:
+    client = atlas.Atlas(api_key="invalid_key")
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.AuthenticationError:
+    print("Authentication failed - check your API key")
+    print("Make sure LAYERLENS_ATLAS_API_KEY is set correctly")
+```
+
+#### `PermissionDeniedError` (403)
+Valid API key but insufficient permissions for the requested operation.
+
+```python
+import atlas
+
+try:
+    client = atlas.Atlas()
+    evaluation = client.evaluations.create(model="restricted-model", benchmark="mmlu")
+except atlas.PermissionDeniedError:
+    print("Permission denied - check your organization/project access")
+    print("Contact your administrator for access to this resource")
+```
+
+#### `NotFoundError` (404)
+Requested resource (model, benchmark, evaluation) does not exist.
+
+```python
+import atlas
+
+try:
+    client = atlas.Atlas()
+    evaluation = client.evaluations.create(model="nonexistent-model", benchmark="mmlu")
+except atlas.NotFoundError:
+    print("Model or benchmark not found")
+    print("Check available models and benchmarks in the Atlas dashboard")
+```
+
+#### `ConflictError` (409)
+Request conflicts with current resource state.
+
+```python
+import atlas
+
+try:
+    client = atlas.Atlas()
+    # Some operation that conflicts with current state
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.ConflictError:
+    print("Request conflicts with current state")
+```
+
+#### `UnprocessableEntityError` (422)
+Request is well-formed but contains semantic errors (for example, an unsupported benchmark), so it cannot be processed.
+
+```python
+import atlas
+
+try:
+    client = atlas.Atlas()
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="invalid-benchmark")
+except atlas.UnprocessableEntityError as e:
+    print(f"Cannot process request: {e}")
+    print("Parameters are valid but operation cannot be completed")
+```
+
+#### `RateLimitError` (429)
+Too many requests sent in a given time period.
+
+```python
+import atlas
+import time
+
+try:
+    client = atlas.Atlas()
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.RateLimitError as e:
+    print("Rate limit exceeded")
+    # Extract retry-after header if available (this example assumes a value in whole seconds)
+    retry_after = e.response.headers.get('retry-after')
+    if retry_after:
+        print(f"Retry after {retry_after} seconds")
+        time.sleep(int(retry_after))
+    else:
+        print("Waiting 60 seconds before retry...")
+        time.sleep(60)
+```
+
+#### `InternalServerError` (500+)
+Server-side error occurred.
+
+```python
+import atlas
+
+try:
+    client = atlas.Atlas()
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.InternalServerError as e:
+    print(f"Server error: {e.status_code}")
+    print("This is a server-side issue - try again later")
+    print(f"Request ID: {e.request_id}")  # For support tickets
+```
+
+## Best Practices
+
+### 1. Handle Specific Exceptions
+
+```python
+import atlas
+import time
+from atlas import Atlas
+
+def robust_create_evaluation(model: str, benchmark: str, max_retries: int = 3):
+    client = Atlas()
+    
+    for attempt in range(max_retries):
+        try:
+            evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+            return evaluation
+            
+        except atlas.AuthenticationError:
+            print("❌ Authentication failed - check your API key")
+            break  # Don't retry auth errors
+            
+        except atlas.PermissionDeniedError:
+            print("❌ Permission denied - contact your administrator")
+            break  # Don't retry permission errors
+            
+        except atlas.NotFoundError:
+            print(f"❌ Model '{model}' or benchmark '{benchmark}' not found")
+            break  # Don't retry not found errors
+            
+        except atlas.RateLimitError as e:
+            retry_after = e.response.headers.get('retry-after', 60)
+            print(f"⏳ Rate limited - waiting {retry_after} seconds...")
+            time.sleep(int(retry_after))
+            continue  # Retry after waiting
+            
+        except atlas.InternalServerError:
+            if attempt < max_retries - 1:
+                wait_time = 2 ** attempt  # Exponential backoff
+                print(f"🔄 Server error - retrying in {wait_time}s (attempt {attempt + 1})")
+                time.sleep(wait_time)
+                continue
+            else:
+                print("❌ Server error - max retries exceeded")
+                break
+                
+        except atlas.APIConnectionError:
+            if attempt < max_retries - 1:
+                wait_time = 2 ** attempt
+                print(f"🔄 Connection error - retrying in {wait_time}s (attempt {attempt + 1})")
+                time.sleep(wait_time)
+                continue
+            else:
+                print("❌ Connection failed - check your network")
+                break
+                
+        except atlas.APIError as e:
+            print(f"❌ Unexpected API error: {e}")
+            break
+    
+    return None
+```
+
+### 2. Graceful Degradation
+
+```python
+import atlas
+from atlas import Atlas
+
+def get_evaluation_results_with_fallback(evaluation_id: str):
+    client = Atlas()
+    
+    try:
+        results = client.results.get(evaluation_id=evaluation_id)
+        
+        if results:
+            return {"success": True, "data": results, "message": "Results retrieved successfully"}
+        else:
+            return {"success": False, "data": None, "message": "No results found"}
+            
+    except atlas.NotFoundError:
+        return {"success": False, "data": None, "message": "Evaluation not found"}
+        
+    except atlas.AuthenticationError:
+        return {"success": False, "data": None, "message": "Authentication required"}
+        
+    except atlas.APIConnectionError:
+        return {"success": False, "data": None, "message": "Service temporarily unavailable"}
+        
+    except atlas.APIError as e:
+        return {"success": False, "data": None, "message": f"Service error: {e}"}
+
+# Usage
+result = get_evaluation_results_with_fallback("eval_123")
+if result["success"]:
+    process_results(result["data"])
+else:
+    print(f"Could not get results: {result['message']}")
+```
+
+### 3. Logging and Monitoring
+
+```python
+import logging
+import atlas
+from atlas import Atlas
+
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+def monitored_api_call():
+    client = Atlas()
+    
+    try:
+        logger.info("Creating evaluation...")
+        evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+        
+        if evaluation:
+            logger.info(f"Evaluation created successfully: {evaluation.id}")
+            return evaluation
+        else:
+            logger.warning("Evaluation creation returned None")
+            return None
+            
+    except atlas.RateLimitError as e:
+        logger.warning(f"Rate limited - request ID: {e.request_id}")
+        raise
+        
+    except atlas.AuthenticationError:
+        logger.error("Authentication failed - check API key configuration")
+        raise
+        
+    except atlas.APIConnectionError:
+        logger.error("Network connection failed")
+        raise
+        
+    except atlas.InternalServerError as e:
+        logger.error(f"Server error: {e.status_code} - request ID: {e.request_id}")
+        raise
+        
+    except atlas.APIError as e:
+        logger.error(f"Unexpected API error: {e} - request ID: {getattr(e, 'request_id', 'N/A')}")
+        raise
+```
+
+### 4. Context Managers for Resource Management
+
+```python
+import atlas
+from contextlib import contextmanager
+from atlas import Atlas
+
+@contextmanager
+def atlas_client():
+    """Context manager for Atlas client with error handling"""
+    client = None
+    try:
+        client = Atlas()
+        yield client
+    except atlas.AuthenticationError:
+        print("Authentication failed")
+        raise
+    except atlas.APIConnectionError:
+        print("Connection failed")
+        raise
+    finally:
+        # Cleanup if needed
+        pass
+
+# Usage
+try:
+    with atlas_client() as client:
+        evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+        results = client.results.get(evaluation_id=evaluation.id)
+except atlas.AtlasError:
+    print("Atlas operation failed")
+```
+
+## Error Response Details
+
+### Status Error Properties
+
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas()
+    evaluation = client.evaluations.create(model="invalid", benchmark="invalid")
+except atlas.APIStatusError as e:
+    print(f"Status Code: {e.status_code}")
+    print(f"Request ID: {e.request_id}")
+    print(f"Response Headers: {dict(e.response.headers)}")
+    print(f"Response Body: {e.body}")
+    print(f"Request URL: {e.request.url}")
+    print(f"Request Method: {e.request.method}")
+```
+
+### Extracting Useful Information
+
+```python
+import atlas
+from atlas import Atlas
+
+def extract_error_info(error: atlas.APIError):
+    info = {
+        "type": type(error).__name__,
+        "message": str(error),
+        "request_url": error.request.url if hasattr(error, 'request') else None,
+        "request_method": error.request.method if hasattr(error, 'request') else None,
+    }
+    
+    if hasattr(error, 'status_code'):
+        info["status_code"] = error.status_code
+        
+    if hasattr(error, 'request_id'):
+        info["request_id"] = error.request_id
+        
+    if hasattr(error, 'response'):
+        info["response_headers"] = dict(error.response.headers)
+        
+    return info
+
+# Usage
+try:
+    client = Atlas()
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.APIError as e:
+    error_info = extract_error_info(e)
+    print(f"Error details: {error_info}")
+```
+
+## Testing Error Handling
+
+```python
+import pytest
+import atlas
+from unittest.mock import Mock, patch
+from atlas import Atlas
+
+def test_authentication_error_handling():
+    """Test that authentication errors are handled properly"""
+    with patch('atlas.Atlas') as mock_atlas:
+        mock_atlas.side_effect = atlas.AuthenticationError(
+            "Invalid API key", 
+            request=Mock(), 
+            response=Mock()
+        )
+        
+        with pytest.raises(atlas.AuthenticationError):
+            client = atlas.Atlas()  # call the patched name; the imported `Atlas` is not patched
+            client.evaluations.create(model="gpt-4", benchmark="mmlu")
+
+def test_rate_limit_retry():
+    """Test that rate limit errors trigger appropriate retry logic"""
+    # Your retry logic test here
+    pass
+```
+
+## Common Error Scenarios
+
+### Invalid Configuration
+
+```python
+# Missing API key (assumes `import atlas` and `from atlas import Atlas`)
+try:
+    client = Atlas(api_key=None)
+except atlas.AtlasError as e:
+    print(f"Configuration error: {e}")
+
+# Invalid organization/project
+try:
+    client = Atlas(organization_id="invalid", project_id="invalid")
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.PermissionDeniedError:
+    print("Invalid organization or project ID")
+```
+
+### Network Issues
+
+```python
+# Connection timeout (assumes `import atlas` and `from atlas import Atlas`)
+try:
+    client = Atlas(timeout=0.1)  # Very short timeout
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.APITimeoutError:
+    print("Request timed out")
+
+# Network connectivity
+try:
+    # Simulate network issues
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.APIConnectionError:
+    print("Network connectivity issue")
+```
+
+## Error Recovery Strategies
+
+### Exponential Backoff
+
+```python
+import time
+import random
+import atlas
+from atlas import Atlas
+
+def exponential_backoff_retry(func, max_retries=3, base_delay=1):
+    """Retry function with exponential backoff"""
+    for attempt in range(max_retries):
+        try:
+            return func()
+        except (atlas.InternalServerError, atlas.APIConnectionError) as e:
+            if attempt == max_retries - 1:
+                raise
+                
+            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
+            print(f"Attempt {attempt + 1} failed, retrying in {delay:.2f}s...")
+            time.sleep(delay)
+
+# Usage
+def create_evaluation():
+    client = Atlas()
+    return client.evaluations.create(model="gpt-4", benchmark="mmlu")
+
+evaluation = exponential_backoff_retry(create_evaluation)
+```
+
+### Circuit Breaker Pattern
+
+```python
+import time
+from enum import Enum
+from atlas import Atlas
+import atlas
+
+class CircuitState(Enum):
+    CLOSED = "closed"
+    OPEN = "open"
+    HALF_OPEN = "half_open"
+
+class CircuitBreaker:
+    def __init__(self, failure_threshold=5, timeout=60):
+        self.failure_threshold = failure_threshold
+        self.timeout = timeout
+        self.failure_count = 0
+        self.last_failure_time = None
+        self.state = CircuitState.CLOSED
+    
+    def call(self, func, *args, **kwargs):
+        if self.state == CircuitState.OPEN:
+            if time.time() - self.last_failure_time < self.timeout:
+                raise atlas.APIConnectionError(message="Circuit breaker is OPEN")
+            else:
+                self.state = CircuitState.HALF_OPEN
+        
+        try:
+            result = func(*args, **kwargs)
+            self.on_success()
+            return result
+        except (atlas.InternalServerError, atlas.APIConnectionError) as e:
+            self.on_failure()
+            raise
+    
+    def on_success(self):
+        self.failure_count = 0
+        self.state = CircuitState.CLOSED
+    
+    def on_failure(self):
+        self.failure_count += 1
+        self.last_failure_time = time.time()
+        if self.failure_count >= self.failure_threshold:
+            self.state = CircuitState.OPEN
+
+# Usage
+breaker = CircuitBreaker()
+client = Atlas()
+
+try:
+    evaluation = breaker.call(
+        client.evaluations.create, 
+        model="gpt-4", 
+        benchmark="mmlu"
+    )
+except atlas.APIError as e:
+    print(f"Circuit breaker prevented call or operation failed: {e}")
+```
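Note on the `RateLimitError` examples in `docs/api-reference/errors.md`: they pass the `retry-after` header to `int()`. HTTP allows that header to carry either a number of seconds or an HTTP-date, and whether the Atlas API ever sends the date form is not stated in these docs. A more defensive parse could look like the sketch below, which uses only the standard library. The helper name `parse_retry_after`, its `now` parameter, and the 60-second fallback are illustrative assumptions, not part of the Atlas SDK.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str],
    default: float = 60.0,
    now: Optional[datetime] = None,
) -> float:
    """Return a non-negative wait in seconds from a Retry-After header value.

    Hypothetical helper (not part of the Atlas SDK). Accepts delta-seconds
    such as "120" or an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    Falls back to `default` when the header is missing or unparseable.
    """
    if value is None:
        return default
    value = value.strip()

    # Case 1: delta-seconds.
    try:
        return max(0.0, float(value))
    except ValueError:
        pass

    # Case 2: HTTP-date. Older Python versions raise TypeError on bad input,
    # newer ones raise ValueError, so both are handled.
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)

    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```

In a retry loop this would be used as `time.sleep(parse_retry_after(e.response.headers.get("retry-after")))`, assuming the error exposes `response.headers` as the examples in the docs do.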
diff --git a/docs/api-reference/evaluations.md b/docs/api-reference/evaluations.md
new file mode 100644
index 0000000..328c29e
--- /dev/null
+++ b/docs/api-reference/evaluations.md
@@ -0,0 +1,284 @@
+# Evaluations
+
+The `evaluations` resource allows you to create and manage AI model evaluations against various benchmarks. This is the core functionality of the Atlas platform.
+
+## Overview
+
+An evaluation runs a specified model against a benchmark dataset and returns comprehensive metrics including accuracy, readability, toxicity, and ethics scores.
+
+## Methods
+
+### `create(model, benchmark, timeout=None)`
+
+Creates a new evaluation for the specified model and benchmark.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `model` | `str` | Yes | The model identifier to evaluate |
+| `benchmark` | `str` | Yes | The benchmark dataset identifier |
+| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
+
+#### Returns
+
+Returns an `Evaluation` object if successful, `None` if the evaluation could not be created.
+
+#### Example
+
+```python
+from atlas import Atlas
+
+client = Atlas()
+
+# Create a basic evaluation
+evaluation = client.evaluations.create(
+    model="gpt-4",
+    benchmark="mmlu"
+)
+
+if evaluation:
+    print(f"Evaluation created: {evaluation.id}")
+    print(f"Status: {evaluation.status}")
+else:
+    print("Failed to create evaluation")
+```
+
+#### With Custom Timeout
+
+```python
+# Create evaluation with custom timeout (5 minutes)
+evaluation = client.evaluations.create(
+    model="gpt-4",
+    benchmark="mmlu",
+    timeout=300.0
+)
+```
+
+## Response Object
+
+The `create` method returns an `Evaluation` object with the following properties:
+
+### Core Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `id` | `str` | Unique evaluation identifier |
+| `status` | `str` | Current evaluation status |
+| `status_description` | `str` | Detailed status description |
+| `submitted_at` | `int` | Unix timestamp when evaluation was submitted |
+| `finished_at` | `int` | Unix timestamp when evaluation finished |
+
+### Model Information
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `model_id` | `str` | Model identifier used in the request |
+| `model_name` | `str` | Human-readable model name |
+| `model_key` | `str` | Internal model key |
+| `model_company` | `str` | Company that created the model |
+
+### Benchmark Information
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `dataset_id` | `str` | Benchmark identifier used in the request |
+| `dataset_name` | `str` | Human-readable benchmark name |
+
+### Performance Metrics
+
+These properties are available once the evaluation is completed:
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `accuracy` | `float` | Overall accuracy score (0.0 to 1.0) |
+| `readability_score` | `float` | Readability assessment score |
+| `toxicity_score` | `float` | Toxicity assessment score |
+| `ethics_score` | `float` | Ethics assessment score |
+| `average_duration` | `int` | Average response time in milliseconds |
+
+## Evaluation Status
+
+The `status` field can have the following values:
+
+| Status | Description |
+|--------|-------------|
+| `"pending"` | Evaluation queued but not yet started |
+| `"running"` | Evaluation currently in progress |
+| `"completed"` | Evaluation finished successfully |
+| `"failed"` | Evaluation failed due to an error |
+| `"cancelled"` | Evaluation was cancelled by user |
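+
+When checking on an evaluation repeatedly, it helps to know which statuses are final. The following is a minimal sketch, assuming the five values in the table above are the complete set and that `completed`, `failed`, and `cancelled` never change again (the table suggests this but does not state it). `TERMINAL_STATUSES` and `is_terminal` are illustrative names, not SDK members.
+
+```python
+# Statuses assumed to be final, based on the table above
+TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})
+
+def is_terminal(status: str) -> bool:
+    """Return True when no further status change is expected."""
+    return status in TERMINAL_STATUSES
+
+for status in ("pending", "running", "completed", "failed", "cancelled"):
+    label = "final" if is_terminal(status) else "in progress"
+    print(f"{status}: {label}")
+```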
+
+## Complete Example
+
+```python
+import atlas
+from atlas import Atlas
+
+def create_and_monitor_evaluation():
+    client = Atlas()
+    
+    try:
+        # Create evaluation
+        evaluation = client.evaluations.create(
+            model="gpt-3.5-turbo",
+            benchmark="mmlu"
+        )
+        
+        if not evaluation:
+            print("❌ Failed to create evaluation")
+            return None
+            
+        print(f"✅ Evaluation created: {evaluation.id}")
+        print(f"📊 Model: {evaluation.model_name} ({evaluation.model_company})")
+        print(f"📋 Benchmark: {evaluation.dataset_name}")
+        print(f"⏰ Submitted at: {evaluation.submitted_at}")
+        print(f"🔄 Status: {evaluation.status}")
+        
+        # Note: In practice, you'd use webhooks or polling to check status
+        # This is just for demonstration
+        if evaluation.status == "completed":
+            print(f"\n📈 Results:")
+            print(f"   Accuracy: {evaluation.accuracy:.2%}")
+            print(f"   Readability: {evaluation.readability_score:.2f}")
+            print(f"   Toxicity: {evaluation.toxicity_score:.2f}")
+            print(f"   Ethics: {evaluation.ethics_score:.2f}")
+            print(f"   Avg Duration: {evaluation.average_duration}ms")
+        
+        return evaluation
+        
+    except atlas.AuthenticationError:
+        print("❌ Authentication failed - check your API key")
+    except atlas.PermissionDeniedError:
+        print("❌ Permission denied - check your organization/project access")
+    except atlas.NotFoundError:
+        print("❌ Model or benchmark not found")
+    except atlas.RateLimitError:
+        print("❌ Rate limit exceeded - please wait and try again")
+    except atlas.APIConnectionError as e:
+        print(f"❌ Connection error: {e}")
+    except atlas.APIError as e:
+        print(f"❌ API error: {e}")
+    
+    return None
+
+if __name__ == "__main__":
+    evaluation = create_and_monitor_evaluation()
+```
+
+## Available Models
+
+Common model identifiers include:
+
+- `"gpt-4"` - OpenAI GPT-4
+- `"gpt-3.5-turbo"` - OpenAI GPT-3.5 Turbo
+- `"claude-3-opus"` - Anthropic Claude 3 Opus
+- `"claude-3-sonnet"` - Anthropic Claude 3 Sonnet
+- `"llama-2-70b"` - Meta Llama 2 70B
+- `"mistral-7b"` - Mistral 7B
+
+> **Note**: Available models may vary based on your organization's access. Check the LayerLens Atlas dashboard for the complete list of available models.
+
+## Available Benchmarks
+
+Common benchmark identifiers include:
+
+- `"mmlu"` - Massive Multitask Language Understanding
+- `"hellaswag"` - HellaSwag commonsense reasoning
+- `"arc-challenge"` - AI2 Reasoning Challenge
+- `"truthfulqa"` - TruthfulQA
+- `"winogrande"` - WinoGrande
+- `"gsm8k"` - Grade School Math 8K
+
+> **Note**: Available benchmarks may vary based on your organization's access. Check the LayerLens Atlas dashboard for the complete list of available benchmarks.
+
+## Error Handling
+
+### Common Errors
+
+```python
+import atlas
+from atlas import Atlas
+
+client = Atlas()
+
+try:
+    evaluation = client.evaluations.create(
+        model="nonexistent-model",
+        benchmark="mmlu"
+    )
+except atlas.NotFoundError:
+    print("Model 'nonexistent-model' not found")
+except atlas.BadRequestError:
+    print("Invalid request parameters")
+except atlas.UnprocessableEntityError:
+    print("Request parameters are valid but cannot be processed")
+```
+
+### Timeout Handling
+
+```python
+import atlas
+from atlas import Atlas
+
+client = Atlas()
+
+try:
+    evaluation = client.evaluations.create(
+        model="gpt-4",
+        benchmark="mmlu",
+        timeout=30.0  # 30 seconds
+    )
+except atlas.APITimeoutError:
+    print("Request timed out - try increasing timeout or check network")
+```
+
+## Best Practices
+
+### 1. Check Return Values
+```python
+# ✅ Good - always check if evaluation was created
+evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+if evaluation:
+    print(f"Success: {evaluation.id}")
+else:
+    print("Failed to create evaluation")
+
+# ❌ Bad - assuming success
+evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+print(f"Success: {evaluation.id}")  # Could raise AttributeError
+```
+
+### 2. Handle Long-Running Operations
+```python
+# ✅ Good - appropriate timeout for evaluation creation
+evaluation = client.evaluations.create(
+    model="gpt-4",
+    benchmark="mmlu",
+    timeout=120.0  # 2 minutes
+)
+
+# ❌ Bad - timeout too short
+evaluation = client.evaluations.create(
+    model="gpt-4",
+    benchmark="mmlu", 
+    timeout=5.0  # Likely to timeout
+)
+```
+
+### 3. Store Evaluation IDs
+```python
+# ✅ Good - store evaluation ID for later retrieval
+evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+if evaluation:
+    # Store this ID in your database/system
+    evaluation_id = evaluation.id
+    print(f"Store this ID: {evaluation_id}")
+```
+
+## Next Steps
+
+- Learn how to [retrieve results](results.md) for your evaluations
+- Explore [code examples](../examples/creating-evaluations.md) for common patterns
+- Understand [error handling](errors.md) for robust applications
\ No newline at end of file
diff --git a/docs/api-reference/models-benchmarks.md b/docs/api-reference/models-benchmarks.md
new file mode 100644
index 0000000..ccefdb1
--- /dev/null
+++ b/docs/api-reference/models-benchmarks.md
@@ -0,0 +1,323 @@
+# Models & Benchmarks
+
+This page provides reference information about available models and benchmarks in the Atlas platform, along with guidance on selecting appropriate combinations for your evaluations.
+
+## Overview
+
+Atlas evaluations require two key components:
+- **Model**: The AI model you want to evaluate
+- **Benchmark**: The dataset/test suite to evaluate the model against
+
+The availability of models and benchmarks depends on your organization's access level and the specific Atlas deployment you're using.
+
+## Models
+
+### Model Identification
+
+Models are identified by string IDs that you pass to the `evaluations.create()` method:
+
+```python
+from atlas import Atlas
+
+client = Atlas()
+
+# Using model ID
+evaluation = client.evaluations.create(
+    model="gpt-4",  # Model ID
+    benchmark="mmlu"
+)
+```
+
+### Model Information
+
+When you create an evaluation, the response includes detailed model information:
+
+```python
+evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+
+if evaluation:
+    print(f"Model ID: {evaluation.model_id}")           # "gpt-4"
+    print(f"Model Name: {evaluation.model_name}")       # "GPT-4"
+    print(f"Model Key: {evaluation.model_key}")         # Internal key
+    print(f"Model Company: {evaluation.model_company}") # "OpenAI"
+```
+
+## Benchmarks
+
+### Benchmark Identification
+
+Benchmarks are identified by string IDs representing different evaluation datasets:
+
+```python
+from atlas import Atlas
+
+client = Atlas()
+
+evaluation = client.evaluations.create(
+    model="gpt-4",
+    benchmark="mmlu"  # Benchmark ID
+)
+```
+
+### Benchmark Information
+
+Evaluation responses include benchmark details:
+
+```python
+evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+
+if evaluation:
+    print(f"Dataset ID: {evaluation.dataset_id}")       # "mmlu"
+    print(f"Dataset Name: {evaluation.dataset_name}")   # "MMLU"
+```
+
+### Performance Expectations
+
+Different model-benchmark combinations yield different types of insights:
+
+#### General Intelligence Assessment
+```python
+# Broad capability assessment
+models = ["gpt-4", "claude-3-opus", "llama-2-70b"]
+benchmark = "mmlu"
+
+for model in models:
+    evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+    # Compare general intelligence across models
+```
+
+#### Specialized Task Performance
+```python
+# Code generation comparison
+models = ["gpt-4", "code-llama-34b", "claude-3-sonnet"]
+benchmark = "humaneval"
+
+for model in models:
+    evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+    # Compare coding abilities
+```
+
+## Discovery and Validation
+
+### Finding Available Models and Benchmarks
+
+#### Check the Atlas Dashboard
+The most reliable way to find available models and benchmarks:
+
+1. Log into your Atlas dashboard
+2. Navigate to the evaluation creation page
+3. View dropdown lists of available models and benchmarks
+4. Note the exact IDs for use in your code
+
+#### Programmatic Discovery
+
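+While the SDK doesn't currently provide discovery endpoints, you can check whether a model/benchmark pair exists by attempting to create an evaluation. Note that a successful call creates a real evaluation, so use this check sparingly: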
+
+```python
+import atlas
+from atlas import Atlas
+
+def validate_model_benchmark(model_id: str, benchmark_id: str) -> bool:
+    """Test if a model/benchmark combination is available"""
+    client = Atlas()
+    
+    try:
+        evaluation = client.evaluations.create(
+            model=model_id,
+            benchmark=benchmark_id
+        )
+        
+        if evaluation:
+            print(f"✅ Valid: {model_id} + {benchmark_id}")
+            return True
+        else:
+            print(f"❌ Invalid: {model_id} + {benchmark_id}")
+            return False
+            
+    except atlas.NotFoundError:
+        print(f"❌ Not found: {model_id} or {benchmark_id}")
+        return False
+    except atlas.PermissionDeniedError:
+        print(f"❌ No access: {model_id} or {benchmark_id}")
+        return False
+    except atlas.APIError as e:
+        print(f"❌ Error: {e}")
+        return False
+
+# Test combinations
+combinations = [
+    ("gpt-4", "mmlu"),
+    ("claude-3-opus", "hellaswag"),
+    ("llama-2-70b", "arc-challenge"),
+    ("nonexistent-model", "mmlu"),  # Should fail
+]
+
+for model, benchmark in combinations:
+    validate_model_benchmark(model, benchmark)
+```
+
+### Batch Validation
+
+```python
+def batch_validate_combinations(model_benchmark_pairs):
+    """Validate multiple model/benchmark combinations"""
+    client = Atlas()
+    results = {}
+    
+    for model, benchmark in model_benchmark_pairs:
+        try:
+            evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+            results[(model, benchmark)] = {
+                "valid": evaluation is not None,
+                "evaluation_id": evaluation.id if evaluation else None,
+                "model_name": evaluation.model_name if evaluation else None,
+                "dataset_name": evaluation.dataset_name if evaluation else None,
+            }
+        except atlas.APIError as e:
+            results[(model, benchmark)] = {
+                "valid": False,
+                "error": str(e),
+                "error_type": type(e).__name__
+            }
+    
+    return results
+
+# Example usage
+combinations = [
+    ("gpt-4", "mmlu"),
+    ("claude-3-sonnet", "hellaswag"),
+    ("llama-2-70b", "gsm8k"),
+]
+
+results = batch_validate_combinations(combinations)
+for (model, benchmark), result in results.items():
+    status = "✅" if result["valid"] else "❌"
+    print(f"{status} {model} + {benchmark}: {result}")
+```
+
+### Validate Before Production Use
+
+```python
+def safe_create_evaluation(model: str, benchmark: str):
+    """Create evaluation with validation and error handling"""
+    client = Atlas()
+    
+    # create() itself validates the pair: an unknown or inaccessible ID
+    # raises an exception that is handled below. Calling
+    # validate_model_benchmark() first would create a second evaluation.
+    
+    try:
+        evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+        
+        if evaluation:
+            print(f"✅ Evaluation created successfully:")
+            print(f"   ID: {evaluation.id}")
+            print(f"   Model: {evaluation.model_name} ({evaluation.model_company})")
+            print(f"   Benchmark: {evaluation.dataset_name}")
+            return evaluation
+        else:
+            print(f"❌ Failed to create evaluation")
+            return None
+            
+    except atlas.APIError as e:
+        print(f"❌ API error: {e}")
+        return None
+
+# Usage
+evaluation = safe_create_evaluation("gpt-4", "mmlu")
+```
+
+### Document Model and Benchmark Choices
+
+```python
+# Document your evaluation strategy
+EVALUATION_CONFIGS = {
+    "general_intelligence": {
+        "models": ["gpt-4", "claude-3-opus", "gemini-pro"],
+        "benchmarks": ["mmlu", "arc-challenge", "hellaswag"],
+        "description": "Broad cognitive ability assessment"
+    },
+    "code_generation": {
+        "models": ["gpt-4", "code-llama-34b", "claude-3-sonnet"],
+        "benchmarks": ["humaneval", "mbpp", "apps"],
+        "description": "Programming and code generation capabilities"
+    },
+    "mathematical_reasoning": {
+        "models": ["gpt-4", "claude-3-opus", "minerva-62b"],
+        "benchmarks": ["gsm8k", "math", "minerva-math"],
+        "description": "Mathematical problem-solving abilities"
+    }
+}
+
+def run_evaluation_suite(suite_name: str):
+    """Run a predefined evaluation suite"""
+    if suite_name not in EVALUATION_CONFIGS:
+        print(f"Unknown suite: {suite_name}")
+        return []
+    
+    config = EVALUATION_CONFIGS[suite_name]
+    print(f"Running {suite_name}: {config['description']}")
+    
+    client = Atlas()
+    evaluations = []
+    
+    for model in config["models"]:
+        for benchmark in config["benchmarks"]:
+            evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+            if evaluation:
+                evaluations.append(evaluation)
+                print(f"✅ {model} + {benchmark}: {evaluation.id}")
+    
+    return evaluations
+
+# Run comprehensive evaluation
+evaluations = run_evaluation_suite("general_intelligence")
+```
+
+## Troubleshooting
+
+### Model or Benchmark Not Found
+
+```python
+try:
+    evaluation = client.evaluations.create(
+        model="nonexistent-model",
+        benchmark="mmlu"
+    )
+except atlas.NotFoundError:
+    print("Model or benchmark not found. Check:")
+    print("1. Spelling of model/benchmark ID")
+    print("2. Available options in Atlas dashboard")
+    print("3. Your organization's access permissions")
+```
+
+### Permission Issues
+
+```python
+try:
+    evaluation = client.evaluations.create(
+        model="restricted-model",
+        benchmark="private-benchmark"
+    )
+except atlas.PermissionDeniedError:
+    print("Access denied. Possible causes:")
+    print("1. Model requires higher permission level")
+    print("2. Benchmark is not available to your organization")
+    print("3. Project doesn't have access to these resources")
+```
+
+### Validation Errors
+
+```python
+try:
+    evaluation = client.evaluations.create(
+        model="",  # Empty string
+        benchmark="mmlu"
+    )
+except atlas.BadRequestError:
+    print("Invalid request parameters:")
+    print("- Model and benchmark IDs cannot be empty")
+    print("- IDs must be valid strings")
+```
+
+For more information about available models and benchmarks, consult your Atlas dashboard or contact your LayerLens administrator.
\ No newline at end of file
diff --git a/docs/api-reference/results.md b/docs/api-reference/results.md
new file mode 100644
index 0000000..54d6196
--- /dev/null
+++ b/docs/api-reference/results.md
@@ -0,0 +1,384 @@
+# Results
+
+The `results` resource allows you to retrieve detailed results from completed evaluations. This provides granular insight into how your model performed on individual test cases.
+
+## Overview
+
+Results contain detailed information about each test case in an evaluation, including the prompt, model response, expected answer, scoring metrics, and performance data.
+
+## Methods
+
+### `get(evaluation_id, timeout=None)`
+
+Retrieves detailed results for a specific evaluation.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `evaluation_id` | `str` | Yes | The evaluation identifier to get results for |
+| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
+
+#### Returns
+
+Returns a list of `Result` objects if successful, `None` if no results are found or the evaluation doesn't exist.
+
+#### Example
+
+```python
+from atlas import Atlas
+
+client = Atlas()
+
+# Get results for a specific evaluation
+results = client.results.get(evaluation_id="eval_12345")
+
+if results:
+    print(f"Retrieved {len(results)} results")
+    for i, result in enumerate(results[:3]):  # Show first 3
+        print(f"\nResult {i+1}:")
+        print(f"  Subset: {result.subset}")
+        print(f"  Score: {result.score}")
+        print(f"  Duration: {result.duration}")
+else:
+    print("No results found or evaluation doesn't exist")
+```
+
+#### With Custom Timeout
+
+```python
+# Get results with custom timeout (2 minutes)
+results = client.results.get(
+    evaluation_id="eval_12345",
+    timeout=120.0
+)
+```
+
+## Result Object
+
+Each `Result` object contains the following properties:
+
+### Core Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `subset` | `str` | The benchmark subset or category this test case belongs to |
+| `prompt` | `str` | The input prompt given to the model |
+| `result` | `str` | The model's response/output |
+| `truth` | `str` | The expected or correct answer |
+| `score` | `float` | Individual score for this test case (typically 0.0 to 1.0) |
+| `duration` | `timedelta` | Time taken for the model to respond |
+| `metrics` | `Dict[str, float]` | Additional metrics specific to this test case |
+
+### Understanding Properties
+
+- **`subset`**: Groups related test cases (e.g., "elementary_mathematics", "world_history")
+- **`prompt`**: The exact input sent to the model
+- **`result`**: The model's actual response 
+- **`truth`**: The ground truth or expected answer for comparison
+- **`score`**: Individual test case score, usually binary (0.0 or 1.0) for correctness
+- **`duration`**: Response latency as a Python `timedelta` object
+- **`metrics`**: Additional scoring metrics that may be benchmark-specific
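+
+Because `metrics` is a plain dictionary on each result, per-metric averages can be computed without knowing the keys in advance. This is only a sketch: the stand-in objects and the metric names `f1` and `bleu` are invented for illustration, and it assumes `metrics` may be empty or `None` for some results.
+
+```python
+from collections import defaultdict
+from types import SimpleNamespace
+from typing import Dict, Iterable
+
+def average_metrics(results: Iterable) -> Dict[str, float]:
+    """Average each key found in the per-result `metrics` dicts."""
+    totals = defaultdict(float)
+    counts = defaultdict(int)
+    for result in results:
+        # Treat a missing or empty metrics dict as contributing nothing
+        for name, value in (result.metrics or {}).items():
+            totals[name] += value
+            counts[name] += 1
+    return {name: totals[name] / counts[name] for name in totals}
+
+# Stand-in objects with made-up values; real code would pass the list
+# returned by client.results.get()
+sample = [
+    SimpleNamespace(metrics={"f1": 1.0, "bleu": 0.5}),
+    SimpleNamespace(metrics={"f1": 0.0}),
+    SimpleNamespace(metrics=None),
+]
+print(average_metrics(sample))
+```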
+
+## Complete Example
+
+```python
+import atlas
+from atlas import Atlas
+from datetime import timedelta
+
+def analyze_evaluation_results(evaluation_id: str):
+    client = Atlas()
+    
+    try:
+        # Get results
+        results = client.results.get(evaluation_id=evaluation_id)
+        
+        if not results:
+            print(f"❌ No results found for evaluation {evaluation_id}")
+            return
+            
+        print(f"📊 Analysis for evaluation {evaluation_id}")
+        print(f"📈 Total test cases: {len(results)}")
+        
+        # Calculate overall statistics
+        total_score = sum(result.score for result in results)
+        avg_score = total_score / len(results)
+        correct_answers = sum(1 for result in results if result.score > 0.5)
+        accuracy = correct_answers / len(results)
+        
+        # Calculate timing statistics  
+        durations = [result.duration for result in results]
+        avg_duration = sum(durations, timedelta()) / len(durations)
+        min_duration = min(durations)
+        max_duration = max(durations)
+        
+        print(f"\n🎯 Performance Metrics:")
+        print(f"   Average Score: {avg_score:.3f}")
+        print(f"   Accuracy: {accuracy:.1%} ({correct_answers}/{len(results)})")
+        print(f"   Average Duration: {avg_duration}")
+        print(f"   Min Duration: {min_duration}")
+        print(f"   Max Duration: {max_duration}")
+        
+        # Group by subset
+        subset_stats = {}
+        for result in results:
+            if result.subset not in subset_stats:
+                subset_stats[result.subset] = {"scores": [], "count": 0}
+            subset_stats[result.subset]["scores"].append(result.score)
+            subset_stats[result.subset]["count"] += 1
+        
+        print(f"\n📋 Performance by Subset:")
+        for subset, stats in subset_stats.items():
+            subset_avg = sum(stats["scores"]) / len(stats["scores"])
+            subset_acc = sum(1 for s in stats["scores"] if s > 0.5) / len(stats["scores"])
+            print(f"   {subset}: {subset_acc:.1%} accuracy ({subset_avg:.3f} avg score, {stats['count']} cases)")
+        
+        # Show some example results
+        print(f"\n🔍 Sample Results:")
+        for i, result in enumerate(results[:3]):
+            status = "✅ Correct" if result.score > 0.5 else "❌ Incorrect"
+            print(f"\n   Example {i+1} [{result.subset}] - {status}")
+            print(f"   Prompt: {result.prompt[:100]}...")
+            print(f"   Model Answer: {result.result[:100]}...")
+            print(f"   Expected: {result.truth[:100]}...")
+            print(f"   Score: {result.score}, Duration: {result.duration}")
+            
+            if result.metrics:
+                print(f"   Additional Metrics: {result.metrics}")
+        
+        return results
+        
+    except atlas.NotFoundError:
+        print(f"❌ Evaluation {evaluation_id} not found")
+    except atlas.AuthenticationError:
+        print("❌ Authentication failed - check your API key")
+    except atlas.APIConnectionError as e:
+        print(f"❌ Connection error: {e}")
+    except atlas.APIError as e:
+        print(f"❌ API error: {e}")
+    
+    return None
+
+if __name__ == "__main__":
+    # Example usage
+    evaluation_id = "eval_12345"  # Replace with actual evaluation ID
+    results = analyze_evaluation_results(evaluation_id)
+```
+
+## Working with Large Result Sets
+
+For evaluations with many test cases, consider processing results in batches:
+
+```python
+from atlas import Atlas
+
+def process_results_efficiently(evaluation_id: str):
+    client = Atlas()
+    
+    results = client.results.get(evaluation_id=evaluation_id)
+    if not results:
+        return
+    
+    print(f"Processing {len(results)} results...")
+    
+    # Process in fixed-size chunks. The full list is already in memory, so this
+    # batches downstream work and reports progress rather than saving memory
+    chunk_size = 100
+    for i in range(0, len(results), chunk_size):
+        chunk = results[i:i+chunk_size]
+        
+        print(f"Processing results {i+1}-{min(i+chunk_size, len(results))}...")
+        
+        # Process this chunk
+        for result in chunk:
+            # Your processing logic here
+            pass
+```
+
+## Filtering and Analysis
+
+### Filter by Subset
+
+```python
+from datetime import timedelta
+
+def analyze_subset_performance(results, target_subset):
+    subset_results = [r for r in results if r.subset == target_subset]
+    
+    if not subset_results:
+        print(f"No results found for subset '{target_subset}'")
+        return
+        
+    accuracy = sum(1 for r in subset_results if r.score > 0.5) / len(subset_results)
+    # sum() needs a timedelta start value; the default start of 0 raises TypeError
+    avg_duration = sum((r.duration for r in subset_results), timedelta()) / len(subset_results)
+    
+    print(f"Subset '{target_subset}' Performance:")
+    print(f"  Test cases: {len(subset_results)}")
+    print(f"  Accuracy: {accuracy:.1%}")
+    print(f"  Average duration: {avg_duration}")
+
+# Usage
+results = client.results.get(evaluation_id="eval_12345")
+if results:
+    analyze_subset_performance(results, "elementary_mathematics")
+```
+
+### Find Difficult Cases
+
+```python
+def find_difficult_cases(results, score_threshold=0.3):
+    """Find test cases where the model struggled"""
+    difficult_cases = [r for r in results if r.score < score_threshold]
+    
+    print(f"Found {len(difficult_cases)} difficult cases (score < {score_threshold})")
+    
+    for case in difficult_cases[:5]:  # Show first 5
+        print(f"\nDifficult Case [{case.subset}]:")
+        print(f"  Prompt: {case.prompt[:100]}...")
+        print(f"  Model: {case.result[:50]}...")
+        print(f"  Expected: {case.truth[:50]}...")
+        print(f"  Score: {case.score}")
+
+# Usage
+results = client.results.get(evaluation_id="eval_12345")
+if results:
+    find_difficult_cases(results)
+```
+
+## Error Handling
+
+### Common Errors
+
+```python
+import atlas
+from atlas import Atlas
+
+client = Atlas()
+
+try:
+    results = client.results.get(evaluation_id="nonexistent_eval")
+except atlas.NotFoundError:
+    print("Evaluation not found or no results available")
+except atlas.AuthenticationError:
+    print("Authentication failed")
+except atlas.PermissionDeniedError:
+    print("No permission to access this evaluation")
+```
+
+### Handling Empty Results
+
+```python
+def safe_get_results(client, evaluation_id):
+    """Safely get results with proper error handling"""
+    try:
+        results = client.results.get(evaluation_id=evaluation_id)
+        
+        if results is None:
+            print(f"No results found for evaluation {evaluation_id}")
+            print("This could mean:")
+            print("- Evaluation doesn't exist")
+            print("- Evaluation not yet completed")
+            print("- No permission to access results")
+            return []
+            
+        if len(results) == 0:
+            print(f"Evaluation {evaluation_id} has no test cases")
+            return []
+            
+        return results
+        
+    except atlas.APIError as e:
+        print(f"Error retrieving results: {e}")
+        return []
+```
+
+## Performance Considerations
+
+### Large Result Sets
+Results can contain thousands of individual test cases. Consider:
+
+```python
+# ✅ Good - check result size first
+results = client.results.get(evaluation_id="eval_12345")
+if results:
+    print(f"Retrieved {len(results)} results")
+    if len(results) > 1000:
+        print("Large result set - consider processing in chunks")
+
+# ❌ Bad - not considering memory usage
+results = client.results.get(evaluation_id="eval_12345")
+# Process all results in memory without considering size
+```
+
+### Caching Results
+For repeated analysis, consider caching results:
+
+```python
+import pickle
+from pathlib import Path
+
+def get_cached_results(client, evaluation_id, cache_dir="cache"):
+    cache_path = Path(cache_dir) / f"{evaluation_id}_results.pkl"
+    
+    if cache_path.exists():
+        print("Loading cached results...")
+        with open(cache_path, 'rb') as f:
+            return pickle.load(f)
+    
+    print("Fetching fresh results...")
+    results = client.results.get(evaluation_id=evaluation_id)
+    
+    if results:
+        # parents=True also creates intermediate directories for a nested cache_dir
+        cache_path.parent.mkdir(parents=True, exist_ok=True)
+        with open(cache_path, 'wb') as f:
+            pickle.dump(results, f)
+    
+    return results
+```
+
+## Best Practices
+
+### 1. Always Check for Results
+```python
+# ✅ Good - check if results exist
+results = client.results.get(evaluation_id="eval_12345")
+if results:
+    print(f"Found {len(results)} results")
+else:
+    print("No results available")
+
+# ❌ Bad - assume results exist
+results = client.results.get(evaluation_id="eval_12345") 
+print(f"Found {len(results)} results")  # Raises TypeError if results is None
+```
+
+### 2. Handle Large Result Sets Appropriately  
+```python
+# ✅ Good - process in chunks for large sets
+if len(results) > 1000:
+    for i in range(0, len(results), 100):
+        chunk = results[i:i+100]
+        process_chunk(chunk)
+
+# ❌ Bad - process everything in memory
+for result in results:  # Could be thousands of results
+    expensive_processing(result)
+```
+
+### 3. Use Meaningful Analysis
+```python
+# ✅ Good - extract meaningful insights
+subset_performance = {}
+for result in results:
+    if result.subset not in subset_performance:
+        subset_performance[result.subset] = []
+    subset_performance[result.subset].append(result.score)
+
+# ❌ Bad - just print raw data
+for result in results:
+    print(result.score)  # Not very useful
+```
+
+## Next Steps
+
+- Learn about [error handling](errors.md) for robust applications
+- Explore [code examples](../examples/retrieving-results.md) for common analysis patterns
+- Check out [troubleshooting](../troubleshooting/) for common issues
\ No newline at end of file
diff --git a/docs/examples/advanced-usage.md b/docs/examples/advanced-usage.md
new file mode 100644
index 0000000..6db48c5
--- /dev/null
+++ b/docs/examples/advanced-usage.md
@@ -0,0 +1,461 @@
+# Advanced Usage Patterns
+
+This guide covers practical advanced techniques for using the Atlas Python SDK in production environments.
+
+## Environment Variables Setup
+
+The Atlas SDK reads your credentials from environment variables. You can set them up however you prefer:
+
+```python
+import os
+from atlas import Atlas
+
+# Option 1: Load from system environment variables
+client = Atlas()  # Automatically uses LAYERLENS_ATLAS_API_KEY, etc.
+
+# Option 2: Using python-dotenv (if you prefer .env files)
+from dotenv import load_dotenv
+load_dotenv()  # Loads from .env file
+client = Atlas()
+```
+
+Required environment variables:
+- `LAYERLENS_ATLAS_API_KEY` - Your Atlas API key
+- `LAYERLENS_ATLAS_ORG_ID` - Your organization ID  
+- `LAYERLENS_ATLAS_PROJECT_ID` - Your project ID
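+
+A quick pre-flight check can report missing variables before the client is constructed. The sketch below uses only the standard library and the three variable names listed above; `REQUIRED_VARS` and `missing_atlas_env` are hypothetical names, and treating an empty string as missing is an assumption.
+
+```python
+import os
+from typing import List, Mapping
+
+REQUIRED_VARS = (
+    "LAYERLENS_ATLAS_API_KEY",
+    "LAYERLENS_ATLAS_ORG_ID",
+    "LAYERLENS_ATLAS_PROJECT_ID",
+)
+
+def missing_atlas_env(environ: Mapping[str, str] = os.environ) -> List[str]:
+    """Return the required variable names that are unset or empty."""
+    return [name for name in REQUIRED_VARS if not environ.get(name)]
+
+missing = missing_atlas_env()
+if missing:
+    print("Missing environment variables: " + ", ".join(missing))
+else:
+    print("All Atlas environment variables are set")
+```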
+
+## Batch Processing
+
+### Running Multiple Evaluations
+
+```python
+import time
+from atlas import Atlas
+import atlas
+
+def run_evaluation_batch(models, benchmarks):
+    """Run evaluations for multiple model-benchmark combinations"""
+    client = Atlas()
+    
+    results = {'successful': [], 'failed': []}
+    
+    for model in models:
+        for benchmark in benchmarks:
+            print(f"Creating evaluation: {model} on {benchmark}")
+            
+            try:
+                evaluation = client.evaluations.create(
+                    model=model,
+                    benchmark=benchmark
+                )
+                
+                if evaluation:
+                    results['successful'].append({
+                        'model': model,
+                        'benchmark': benchmark, 
+                        'evaluation_id': evaluation.id
+                    })
+                    print(f"✅ Created: {evaluation.id}")
+                else:
+                    results['failed'].append({
+                        'model': model,
+                        'benchmark': benchmark,
+                        'error': 'No evaluation returned'
+                    })
+                    
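+            except atlas.RateLimitError as e:
+                # Record the skipped pair so it is not silently dropped, then back off
+                print("Rate limited, waiting 60 seconds...")
+                results['failed'].append({
+                    'model': model,
+                    'benchmark': benchmark,
+                    'error': str(e)
+                })
+                time.sleep(60)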
+                
+            except atlas.APIError as e:
+                print(f"❌ Failed: {e}")
+                results['failed'].append({
+                    'model': model,
+                    'benchmark': benchmark, 
+                    'error': str(e)
+                })
+            
+            time.sleep(2)
+    
+    return results
+
+# Usage
+models = ["gpt-4", "claude-3-opus"]
+benchmarks = ["mmlu", "hellaswag"]
+
+batch_results = run_evaluation_batch(models, benchmarks)
+print(f"✅ Successful: {len(batch_results['successful'])}")
+print(f"❌ Failed: {len(batch_results['failed'])}")
+```
+
+## Error Handling Patterns
+
+### Robust Error Handling
+
+```python
+import time
+from atlas import Atlas
+import atlas
+
+def create_evaluation_with_retries(model, benchmark, max_retries=3):
+    """Create evaluation with automatic retries"""
+    client = Atlas()
+    
+    for attempt in range(max_retries):
+        try:
+            evaluation = client.evaluations.create(
+                model=model,
+                benchmark=benchmark
+            )
+            
+            if evaluation:
+                print(f"✅ Success on attempt {attempt + 1}")
+                return evaluation
+            
+        except atlas.RateLimitError as e:
+            print(f"Rate limited on attempt {attempt + 1}")
+            if attempt < max_retries - 1:
+                # Check if server provided retry-after header
+                response = getattr(e, 'response', None)
+                wait_time = int(float(getattr(response, 'headers', {}).get('retry-after', 60)))
+                print(f"Waiting {wait_time} seconds...")
+                time.sleep(wait_time)
+            else:
+                raise
+                
+        except atlas.NotFoundError:
+            print(f"❌ Model '{model}' or benchmark '{benchmark}' not found")
+            return None
+            
+        except atlas.AuthenticationError:
+            print("❌ Authentication failed - check your API key")
+            raise
+            
+        except atlas.APIError as e:
+            print(f"❌ API error on attempt {attempt + 1}: {e}")
+            if attempt < max_retries - 1:
+                time.sleep(2 ** attempt)  # Exponential backoff
+            else:
+                raise
+    
+    return None
+
+# Usage
+evaluation = create_evaluation_with_retries("gpt-4", "mmlu")
+```
+
+## Result Processing
+
+### Processing Large Result Sets
+
+```python
+from atlas import Atlas
+import atlas
+from typing import Dict
+
+def analyze_evaluation_results(evaluation_id: str) -> Dict:
+    """Analyze results from an evaluation"""
+    client = Atlas()
+    
+    try:
+        results = client.results.get(evaluation_id=evaluation_id)
+        
+        if not results:
+            return {"error": "No results found"}
+        
+        # Basic analysis
+        analysis = {
+            "total_results": len(results),
+            "subsets": {},
+            "overall_accuracy": 0,
+            "avg_duration": 0
+        }
+        
+        total_score = 0
+        total_duration = 0
+        
+        for result in results:
+            # Track by subset
+            if result.subset not in analysis["subsets"]:
+                analysis["subsets"][result.subset] = {
+                    "count": 0,
+                    "total_score": 0,
+                    "accuracy": 0
+                }
+            
+            analysis["subsets"][result.subset]["count"] += 1
+            analysis["subsets"][result.subset]["total_score"] += result.score
+            
+            total_score += result.score
+            total_duration += result.duration.total_seconds()
+        
+        # Calculate averages
+        analysis["overall_accuracy"] = total_score / len(results)
+        analysis["avg_duration"] = total_duration / len(results)
+        
+        # Calculate subset accuracies
+        for subset_data in analysis["subsets"].values():
+            subset_data["accuracy"] = subset_data["total_score"] / subset_data["count"]
+        
+        return analysis
+        
+    except atlas.APIError as e:
+        return {"error": str(e)}
+
+# Usage
+analysis = analyze_evaluation_results("eval_123")
+if "error" not in analysis:
+    print("📊 Analysis Results:")
+    print(f"   Total results: {analysis['total_results']}")
+    print(f"   Overall accuracy: {analysis['overall_accuracy']:.2%}")
+    print(f"   Average duration: {analysis['avg_duration']:.2f}s")
+    
+    print("   By subset:")
+    for subset, data in analysis['subsets'].items():
+        print(f"     {subset}: {data['accuracy']:.2%} ({data['count']} results)")
+```
+
+## Production Timeouts
+
+### Different Timeout Strategies
+
+```python
+from atlas import Atlas
+
+# Different timeout configurations for different use cases
+
+# Development: Fail fast
+dev_client = Atlas(timeout=30.0)  # 30 seconds
+
+# Production: More patient
+prod_client = Atlas(timeout=600.0)  # 10 minutes
+
+# Long-running batch jobs: Very patient  
+batch_client = Atlas(timeout=1800.0)  # 30 minutes
+
+def adaptive_timeout_client(operation_type="default"):
+    """Get client with timeout appropriate for operation"""
+    timeouts = {
+        "quick": 30.0,      # For testing connectivity
+        "default": 300.0,   # For normal operations
+        "batch": 1800.0,    # For batch processing
+        "patient": 3600.0   # For very long evaluations
+    }
+    
+    timeout = timeouts.get(operation_type, timeouts["default"])
+    return Atlas(timeout=timeout)
+
+# Usage
+quick_client = adaptive_timeout_client("quick")
+batch_client = adaptive_timeout_client("batch")
+```
+
+## Logging and Monitoring
+
+### Simple Logging Setup
+
+```python
+import logging
+import time
+from atlas import Atlas
+import atlas
+
+# Set up logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger('atlas-client')
+
+def create_evaluation_with_logging(model, benchmark):
+    """Create evaluation with comprehensive logging"""
+    client = Atlas()
+    
+    logger.info(f"Creating evaluation: {model} on {benchmark}")
+    start_time = time.time()
+    
+    try:
+        evaluation = client.evaluations.create(
+            model=model,
+            benchmark=benchmark
+        )
+        
+        duration = time.time() - start_time
+        
+        if evaluation:
+            logger.info(
+                f"Evaluation created successfully: {evaluation.id} "
+                f"(duration: {duration:.2f}s)"
+            )
+            return evaluation
+        else:
+            logger.warning(
+                f"No evaluation returned for {model}+{benchmark} "
+                f"(duration: {duration:.2f}s)"
+            )
+            return None
+            
+    except atlas.APIError as e:
+        duration = time.time() - start_time
+        logger.error(
+            f"Failed to create evaluation {model}+{benchmark}: {e} "
+            f"(duration: {duration:.2f}s)"
+        )
+        raise
+
+# Usage
+evaluation = create_evaluation_with_logging("gpt-4", "mmlu")
+```
+
+## Health Checks
+
+### Simple Health Check
+
+```python
+from atlas import Atlas
+import atlas
+
+def check_atlas_health():
+    """Simple health check for Atlas service"""
+    try:
+        client = Atlas(timeout=10.0)  # Short timeout for health check
+        
+        # Probe with placeholder IDs; a 404/400 reply still proves the API is reachable
+        try:
+            client.evaluations.create(
+                model="__health_check__",
+                benchmark="__health_check__"
+            )
+        except (atlas.NotFoundError, atlas.BadRequestError):
+            # Expected: the placeholder resources do not exist or are rejected
+            pass
+
+        # Reached after an expected rejection or an unexpected success,
+        # so the function never falls through and returns None
+        return {"status": "healthy", "message": "API is reachable"}
+    except atlas.AuthenticationError:
+        return {
+            "status": "unhealthy", 
+            "error": "Authentication failed - check API key"
+        }
+    except atlas.APITimeoutError:  # Checked first in case timeouts subclass APIConnectionError
+        return {
+            "status": "unhealthy",
+            "error": "Health check timed out"
+        }
+    except atlas.APIConnectionError:
+        return {
+            "status": "unhealthy",
+            "error": "Cannot connect to Atlas API"
+        }
+    except Exception as e:
+        return {
+            "status": "unhealthy",
+            "error": f"Unexpected error: {e}"
+        }
+
+# Usage
+health = check_atlas_health()
+if health["status"] == "healthy":
+    print("✅ Atlas service is healthy")
+else:
+    print(f"❌ Atlas service is unhealthy: {health['error']}")
+```
+
+## Integration Patterns
+
+### Using with Flask
+
+```python
+from flask import Flask, jsonify, request
+from atlas import Atlas
+import atlas
+
+app = Flask(__name__)
+
+# Initialize Atlas client once
+atlas_client = Atlas()
+
+@app.route('/health')
+def health_check():
+    """Health check endpoint"""
+    health = check_atlas_health()  # From example above
+    status_code = 200 if health["status"] == "healthy" else 503
+    return jsonify(health), status_code
+
+@app.route('/evaluations', methods=['POST'])
+def create_evaluation():
+    """Create evaluation endpoint"""
+    try:
+        data = request.get_json(silent=True) or {}  # Missing or invalid JSON becomes {}
+        model = data.get('model')
+        benchmark = data.get('benchmark')
+        
+        if not model or not benchmark:
+            return jsonify({
+                "error": "Missing required fields: model, benchmark"
+            }), 400
+        
+        evaluation = atlas_client.evaluations.create(
+            model=model,
+            benchmark=benchmark
+        )
+        
+        if evaluation:
+            return jsonify({
+                "success": True,
+                "evaluation_id": evaluation.id,
+                "status": evaluation.status
+            })
+        else:
+            return jsonify({
+                "success": False,
+                "error": "Failed to create evaluation"
+            }), 500
+            
+    except atlas.NotFoundError:
+        return jsonify({
+            "success": False,
+            "error": "Model or benchmark not found"
+        }), 404
+        
+    except atlas.APIError as e:
+        return jsonify({
+            "success": False, 
+            "error": str(e)
+        }), 500
+
+@app.route('/evaluations/<evaluation_id>/results')
+def get_results(evaluation_id):
+    """Get evaluation results endpoint"""
+    try:
+        results = atlas_client.results.get(evaluation_id=evaluation_id)
+        
+        if results:
+            return jsonify({
+                "success": True,
+                "result_count": len(results),
+                "results": [
+                    {
+                        "subset": r.subset,
+                        "score": r.score,
+                        "duration_seconds": r.duration.total_seconds()
+                    }
+                    for r in results
+                ]
+            })
+        else:
+            return jsonify({
+                "success": False,
+                "error": "No results found"
+            }), 404
+            
+    except atlas.APIError as e:
+        return jsonify({
+            "success": False,
+            "error": str(e)
+        }), 500
+
+if __name__ == '__main__':
+    app.run(debug=True)  # Development only; never enable debug mode in production
+```
diff --git a/docs/examples/creating-evaluations.md b/docs/examples/creating-evaluations.md
new file mode 100644
index 0000000..37da6bf
--- /dev/null
+++ b/docs/examples/creating-evaluations.md
@@ -0,0 +1,799 @@
+# Creating Evaluations
+
+This guide provides practical examples for creating evaluations with the Atlas Python SDK.
+
+## Basic Evaluation Creation
+
+### Simple Evaluation
+
+The most straightforward way to create an evaluation:
+
+```python
+from atlas import Atlas
+
+# Initialize client
+client = Atlas()
+
+# Create evaluation
+evaluation = client.evaluations.create(
+    model="gpt-4",
+    benchmark="mmlu"
+)
+
+if evaluation:
+    print(f"✅ Evaluation created: {evaluation.id}")
+    print(f"   Model: {evaluation.model_name}")
+    print(f"   Benchmark: {evaluation.dataset_name}")
+    print(f"   Status: {evaluation.status}")
+else:
+    print("❌ Failed to create evaluation")
+```
+
+### With Explicit Configuration
+
+Using explicit client configuration instead of environment variables:
+
+```python
+from atlas import Atlas
+
+# Explicit configuration
+client = Atlas(
+    api_key="your_api_key_here",
+    organization_id="your_org_id", 
+    project_id="your_project_id"
+)
+
+evaluation = client.evaluations.create(
+    model="claude-3-opus",
+    benchmark="hellaswag"
+)
+
+if evaluation:
+    print(f"Evaluation ID: {evaluation.id}")
+    print(f"Submitted at: {evaluation.submitted_at}")
+```
+
+## Batch Evaluation Creation
+
+### Multiple Models on Same Benchmark
+
+Compare multiple models against the same benchmark:
+
+```python
+from atlas import Atlas
+import time
+
+def compare_models_on_benchmark(models: list, benchmark: str):
+    """Create evaluations for multiple models on the same benchmark"""
+    client = Atlas()
+    evaluations = []
+    
+    print(f"🔄 Creating evaluations for {len(models)} models on {benchmark}")
+    
+    for model in models:
+        try:
+            evaluation = client.evaluations.create(
+                model=model,
+                benchmark=benchmark
+            )
+            
+            if evaluation:
+                evaluations.append({
+                    "model": model,
+                    "evaluation_id": evaluation.id,
+                    "model_name": evaluation.model_name,
+                    "status": evaluation.status
+                })
+                print(f"✅ {model}: {evaluation.id}")
+            else:
+                print(f"❌ Failed to create evaluation for {model}")
+                
+        except Exception as e:
+            print(f"❌ Error creating evaluation for {model}: {e}")
+            
+        # Brief pause between requests to avoid rate limits
+        time.sleep(0.5)
+    
+    return evaluations
+
+# Usage
+models_to_compare = [
+    "gpt-4",
+    "gpt-3.5-turbo", 
+    "claude-3-opus",
+    "claude-3-sonnet",
+    "llama-2-70b"
+]
+
+evaluations = compare_models_on_benchmark(models_to_compare, "mmlu")
+
+# Print summary
+print(f"\n📊 Created {len(evaluations)} evaluations:")
+for eval_info in evaluations:
+    print(f"   {eval_info['model_name']}: {eval_info['evaluation_id']}")
+```
+
+### Single Model on Multiple Benchmarks
+
+Evaluate one model across multiple benchmarks:
+
+```python
+from atlas import Atlas
+import time
+
+def evaluate_model_on_benchmarks(model: str, benchmarks: list):
+    """Evaluate a single model across multiple benchmarks"""
+    client = Atlas()
+    evaluations = []
+    
+    print(f"🔄 Evaluating {model} on {len(benchmarks)} benchmarks")
+    
+    for benchmark in benchmarks:
+        try:
+            evaluation = client.evaluations.create(
+                model=model,
+                benchmark=benchmark
+            )
+            
+            if evaluation:
+                evaluations.append({
+                    "benchmark": benchmark,
+                    "evaluation_id": evaluation.id,
+                    "dataset_name": evaluation.dataset_name,
+                    "status": evaluation.status
+                })
+                print(f"✅ {benchmark}: {evaluation.id}")
+            else:
+                print(f"❌ Failed to create evaluation for {benchmark}")
+                
+        except Exception as e:
+            print(f"❌ Error evaluating on {benchmark}: {e}")
+            
+        time.sleep(0.5)
+    
+    return evaluations
+
+# Usage
+benchmarks_to_test = [
+    "mmlu",
+    "hellaswag", 
+    "arc-challenge",
+    "truthfulqa",
+    "gsm8k"
+]
+
+evaluations = evaluate_model_on_benchmarks("gpt-4", benchmarks_to_test)
+
+print(f"\n📊 Created {len(evaluations)} evaluations for GPT-4:")
+for eval_info in evaluations:
+    print(f"   {eval_info['dataset_name']}: {eval_info['evaluation_id']}")
+```
+
+### Full Matrix Evaluation
+
+Create evaluations for all model-benchmark combinations:
+
+```python
+from atlas import Atlas
+import time
+import itertools
+
+def create_evaluation_matrix(models: list, benchmarks: list, delay: float = 1.0):
+    """Create evaluations for all model-benchmark combinations"""
+    client = Atlas()
+    results = {}
+    total_combinations = len(models) * len(benchmarks)
+    
+    print(f"🔄 Creating {total_combinations} evaluations...")
+    
+    for i, (model, benchmark) in enumerate(itertools.product(models, benchmarks), 1):
+        print(f"\n[{i}/{total_combinations}] {model} + {benchmark}")
+        
+        try:
+            evaluation = client.evaluations.create(
+                model=model,
+                benchmark=benchmark
+            )
+            
+            if evaluation:
+                if model not in results:
+                    results[model] = {}
+                
+                results[model][benchmark] = {
+                    "evaluation_id": evaluation.id,
+                    "model_name": evaluation.model_name,
+                    "dataset_name": evaluation.dataset_name,
+                    "status": evaluation.status,
+                    "success": True
+                }
+                print(f"✅ Success: {evaluation.id}")
+            else:
+                print(f"❌ Failed: No evaluation created")
+                
+        except Exception as e:
+            print(f"❌ Error: {e}")
+            if model not in results:
+                results[model] = {}
+            results[model][benchmark] = {
+                "error": str(e),
+                "success": False
+            }
+        
+        # Rate limiting
+        if i < total_combinations:
+            time.sleep(delay)
+    
+    return results
+
+# Usage
+test_models = ["gpt-4", "claude-3-opus", "llama-2-70b"]
+test_benchmarks = ["mmlu", "hellaswag", "arc-challenge"]
+
+matrix_results = create_evaluation_matrix(test_models, test_benchmarks, delay=2.0)
+
+# Print summary table
+print(f"\n📊 Evaluation Matrix Results:")
+print("Model".ljust(15), end="")
+for benchmark in test_benchmarks:
+    print(benchmark.ljust(15), end="")
+print()
+
+for model in test_models:
+    print(model.ljust(15), end="")
+    for benchmark in test_benchmarks:
+        if model in matrix_results and benchmark in matrix_results[model]:
+            result = matrix_results[model][benchmark]
+            status = "✅" if result["success"] else "❌"
+            print(status.ljust(15), end="")
+        else:
+            print("❓".ljust(15), end="")
+    print()
+```
+
+## Error Handling and Resilience
+
+### Robust Evaluation Creation with Retries
+
+```python
+import atlas
+from atlas import Atlas
+import time
+import random
+
+def create_evaluation_with_retry(
+    model: str, 
+    benchmark: str, 
+    max_retries: int = 3,
+    base_delay: float = 1.0
+):
+    """Create evaluation with exponential backoff retry logic"""
+    client = Atlas()
+    
+    for attempt in range(max_retries):
+        try:
+            print(f"🔄 Attempt {attempt + 1}/{max_retries}: Creating evaluation...")
+            
+            evaluation = client.evaluations.create(
+                model=model,
+                benchmark=benchmark,
+                timeout=120.0  # 2-minute timeout
+            )
+            
+            if evaluation:
+                print(f"✅ Success on attempt {attempt + 1}: {evaluation.id}")
+                return evaluation
+            else:
+                print(f"❌ Evaluation creation returned None on attempt {attempt + 1}")
+                
+        except atlas.RateLimitError as e:
+            retry_after = e.response.headers.get('retry-after', base_delay * (2 ** attempt))
+            print(f"⏳ Rate limited, waiting {retry_after}s...")
+            time.sleep(float(retry_after))
+            continue
+            
+        except atlas.InternalServerError:
+            if attempt < max_retries - 1:
+                delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
+                print(f"🔄 Server error, retrying in {delay:.1f}s...")
+                time.sleep(delay)
+                continue
+            else:
+                print("❌ Server error - max retries exceeded")
+                break
+                
+        except atlas.APIConnectionError:
+            if attempt < max_retries - 1:
+                delay = base_delay * (2 ** attempt)
+                print(f"🔄 Connection error, retrying in {delay:.1f}s...")
+                time.sleep(delay)
+                continue
+            else:
+                print("❌ Connection failed - max retries exceeded")
+                break
+                
+        except atlas.AuthenticationError:
+            print("❌ Authentication failed - check your API key")
+            break
+            
+        except atlas.NotFoundError:
+            print(f"❌ Model '{model}' or benchmark '{benchmark}' not found")
+            break
+            
+        except atlas.PermissionDeniedError:
+            print("❌ Permission denied - check your access rights")
+            break
+            
+        except atlas.APIError as e:
+            print(f"❌ API error: {e}")
+            break
+    
+    return None
+
+# Usage
+evaluation = create_evaluation_with_retry(
+    model="gpt-4",
+    benchmark="mmlu",
+    max_retries=3
+)
+
+if evaluation:
+    print(f"Final result: {evaluation.id}")
+else:
+    print("Failed to create evaluation after all attempts")
+```
+
+### Validation Before Creation
+
+```python
+import atlas
+from atlas import Atlas
+
+def validate_and_create_evaluation(model: str, benchmark: str):
+    """Validate model and benchmark before creating evaluation"""
+    client = Atlas()
+    
+    # Pre-validation checks
+    if not model or not model.strip():
+        print("❌ Model cannot be empty")
+        return None
+        
+    if not benchmark or not benchmark.strip():
+        print("❌ Benchmark cannot be empty")
+        return None
+    
+    print(f"🔍 Validating {model} + {benchmark}...")
+    
+    try:
+        # Attempt to create the evaluation
+        evaluation = client.evaluations.create(
+            model=model.strip(),
+            benchmark=benchmark.strip()
+        )
+        
+        if evaluation:
+            print(f"✅ Validation successful!")
+            print(f"   Evaluation ID: {evaluation.id}")
+            print(f"   Model: {evaluation.model_name} ({evaluation.model_company})")
+            print(f"   Benchmark: {evaluation.dataset_name}")
+            print(f"   Status: {evaluation.status}")
+            return evaluation
+        else:
+            print("❌ Validation failed: No evaluation returned")
+            return None
+            
+    except atlas.NotFoundError:
+        print(f"❌ Validation failed: Model '{model}' or benchmark '{benchmark}' not found")
+        print("💡 Suggestions:")
+        print("   • Check spelling of model and benchmark IDs")
+        print("   • Verify available options in Atlas dashboard")
+        print("   • Ensure your organization has access to these resources")
+        return None
+        
+    except atlas.AuthenticationError:
+        print("❌ Authentication failed")
+        print("💡 Check your API key configuration")
+        return None
+        
+    except atlas.PermissionDeniedError:
+        print("❌ Permission denied")
+        print("💡 Contact your administrator for access")
+        return None
+        
+    except atlas.APIError as e:
+        print(f"❌ Validation failed: {e}")
+        return None
+
+# Usage with validation
+test_combinations = [
+    ("gpt-4", "mmlu"),
+    ("claude-3-opus", "hellaswag"),
+    ("nonexistent-model", "mmlu"),  # This should fail
+    ("gpt-4", "nonexistent-benchmark"),  # This should fail
+]
+
+for model, benchmark in test_combinations:
+    print(f"\n{'='*50}")
+    evaluation = validate_and_create_evaluation(model, benchmark)
+    
+    if evaluation:
+        print(f"Ready to monitor evaluation: {evaluation.id}")
+```
+
+## Custom Timeout Configurations
+
+### Different Timeouts for Different Operations
+
+```python
+import atlas
+from atlas import Atlas
+import httpx
+
+def create_evaluations_with_custom_timeouts():
+    """Demonstrate different timeout configurations"""
+    # Quick timeout for testing connectivity
+    quick_client = Atlas(timeout=30.0)  # 30 seconds
+    
+    # Standard timeout for regular evaluations  
+    standard_client = Atlas(timeout=300.0)  # 5 minutes
+    
+    # Long timeout for complex evaluations
+    patient_client = Atlas(
+        timeout=httpx.Timeout(
+            connect=10.0,   # 10s to connect
+            read=1800.0,    # 30min to read response
+            write=60.0,     # 1min to send request
+            pool=30.0       # 30s for connection pool
+        )
+    )
+    
+    # Test connectivity with quick client
+    print("🔍 Testing connectivity...")
+    try:
+        test_eval = quick_client.evaluations.create(
+            model="gpt-3.5-turbo",  # Faster model for testing
+            benchmark="arc-easy"     # Smaller benchmark for testing
+        )
+        print("✅ Connectivity test passed")
+    except atlas.APITimeoutError:
+        print("❌ Quick connectivity test failed - network issues?")
+        return
+    except atlas.APIError as e:
+        print(f"❌ API error during connectivity test: {e}")
+        return
+    
+    # Create standard evaluation
+    print("\n🔄 Creating standard evaluation...")
+    try:
+        standard_eval = standard_client.evaluations.create(
+            model="gpt-4",
+            benchmark="mmlu"
+        )
+        if standard_eval:
+            print(f"✅ Standard evaluation created: {standard_eval.id}")
+    except atlas.APITimeoutError:
+        print("❌ Standard evaluation timed out")
+    
+    # Create complex evaluation with patient timeout
+    print("\n🔄 Creating complex evaluation...")
+    try:
+        complex_eval = patient_client.evaluations.create(
+            model="gpt-4",
+            benchmark="math"  # Complex benchmark
+        )
+        if complex_eval:
+            print(f"✅ Complex evaluation created: {complex_eval.id}")
+    except atlas.APITimeoutError:
+        print("❌ Complex evaluation timed out even with extended timeout")
+
+# Run the example
+create_evaluations_with_custom_timeouts()
+```
+
+### Per-Request Timeout Override
+
+```python
+import atlas
+from atlas import Atlas
+
+def create_evaluation_with_override_timeout():
+    """Override timeout for specific requests"""
+    client = Atlas(timeout=60.0)  # Default 1-minute timeout
+    evaluations = []
+    
+    # Quick evaluation with short timeout
+    print("🔄 Quick evaluation (30s timeout)...")
+    try:
+        quick_eval = client.with_options(timeout=30.0).evaluations.create(
+            model="gpt-3.5-turbo",
+            benchmark="arc-easy"
+        )
+        if quick_eval:
+            evaluations.append(("Quick", quick_eval))
+            print(f"✅ Quick: {quick_eval.id}")
+    except atlas.APITimeoutError:
+        print("❌ Quick evaluation timed out")
+    
+    # Standard evaluation (uses default timeout)
+    print("\n🔄 Standard evaluation (default 60s timeout)...")
+    try:
+        standard_eval = client.evaluations.create(
+            model="gpt-4",
+            benchmark="mmlu"
+        )
+        if standard_eval:
+            evaluations.append(("Standard", standard_eval))
+            print(f"✅ Standard: {standard_eval.id}")
+    except atlas.APITimeoutError:
+        print("❌ Standard evaluation timed out")
+    
+    # Long evaluation with extended timeout
+    print("\n🔄 Long evaluation (5min timeout)...")
+    try:
+        long_eval = client.with_options(timeout=300.0).evaluations.create(
+            model="gpt-4",
+            benchmark="math"
+        )
+        if long_eval:
+            evaluations.append(("Long", long_eval))
+            print(f"✅ Long: {long_eval.id}")
+    except atlas.APITimeoutError:
+        print("❌ Long evaluation timed out")
+    
+    return evaluations
+
+evaluations = create_evaluation_with_override_timeout()
+print(f"\n📊 Created {len(evaluations)} evaluations total")
+```
+
+## Monitoring and Logging
+
+### Evaluation Creation with Logging
+
+```python
+import logging
+from datetime import datetime
+from atlas import Atlas
+import atlas
+
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(levelname)s - %(message)s',
+    handlers=[
+        logging.FileHandler('atlas_evaluations.log'),
+        logging.StreamHandler()
+    ]
+)
+logger = logging.getLogger(__name__)
+
+def create_evaluation_with_logging(model: str, benchmark: str, context: dict = None):
+    """Create evaluation with comprehensive logging"""
+    client = Atlas()
+    context = context or {}
+    
+    logger.info(f"Starting evaluation creation: {model} + {benchmark}")
+    logger.info(f"Context: {context}")
+    
+    start_time = datetime.now()
+    
+    try:
+        evaluation = client.evaluations.create(
+            model=model,
+            benchmark=benchmark
+        )
+        
+        end_time = datetime.now()
+        duration = (end_time - start_time).total_seconds()
+        
+        if evaluation:
+            logger.info(f"✅ Evaluation created successfully in {duration:.2f}s")
+            logger.info(f"   ID: {evaluation.id}")
+            logger.info(f"   Model: {evaluation.model_name} ({evaluation.model_company})")
+            logger.info(f"   Benchmark: {evaluation.dataset_name}")
+            logger.info(f"   Status: {evaluation.status}")
+            logger.info(f"   Submitted at: {evaluation.submitted_at}")
+            
+            return {
+                "success": True,
+                "evaluation": evaluation,
+                "duration": duration,
+                "timestamp": start_time.isoformat()
+            }
+        else:
+            logger.error(f"❌ Evaluation creation failed - returned None")
+            return {
+                "success": False,
+                "error": "No evaluation returned",
+                "duration": duration,
+                "timestamp": start_time.isoformat()
+            }
+            
+    except atlas.RateLimitError as e:
+        logger.warning(f"⏳ Rate limited - request ID: {getattr(e, 'request_id', 'N/A')}")
+        return {"success": False, "error": "rate_limited", "retry_after": e.response.headers.get('retry-after')}
+        
+    except atlas.AuthenticationError:
+        logger.error("❌ Authentication failed - check API key")
+        return {"success": False, "error": "authentication_failed"}
+        
+    except atlas.NotFoundError:
+        logger.error(f"❌ Model '{model}' or benchmark '{benchmark}' not found")
+        return {"success": False, "error": "not_found", "model": model, "benchmark": benchmark}
+        
+    except atlas.APIError as e:
+        logger.error(f"❌ API error: {e}")
+        return {"success": False, "error": str(e), "error_type": type(e).__name__}
+    
+    except Exception as e:
+        logger.error(f"❌ Unexpected error: {e}")
+        return {"success": False, "error": f"unexpected: {e}"}
+
+# Usage
+evaluation_configs = [
+    {"model": "gpt-4", "benchmark": "mmlu", "context": {"purpose": "baseline_test"}},
+    {"model": "claude-3-opus", "benchmark": "hellaswag", "context": {"purpose": "reasoning_comparison"}},
+    {"model": "llama-2-70b", "benchmark": "gsm8k", "context": {"purpose": "math_evaluation"}},
+]
+
+results = []
+for config in evaluation_configs:
+    result = create_evaluation_with_logging(**config)
+    results.append(result)
+    
+    if not result["success"]:
+        logger.error(f"Failed to create evaluation: {config}")
+
+# Summary
+successful = [r for r in results if r["success"]]
+failed = [r for r in results if not r["success"]]
+
+logger.info(f"📊 Summary: {len(successful)} successful, {len(failed)} failed")
+for result in successful:
+    logger.info(f"   ✅ {result['evaluation'].id} ({result['duration']:.2f}s)")
+for result in failed:
+    logger.info(f"   ❌ {result.get('error', 'unknown_error')}")
+```
+
+## Advanced Patterns
+
+### Evaluation Factory Pattern
+
+```python
+from atlas import Atlas
+from abc import ABC, abstractmethod
+from typing import List, Dict, Any
+import atlas
+
+class EvaluationStrategy(ABC):
+    """Abstract base class for evaluation strategies"""
+    
+    @abstractmethod
+    def get_model_benchmark_pairs(self) -> List[tuple]:
+        pass
+    
+    @abstractmethod
+    def get_description(self) -> str:
+        pass
+
+class GeneralIntelligenceStrategy(EvaluationStrategy):
+    """Strategy for general intelligence assessment"""
+    
+    def get_model_benchmark_pairs(self) -> List[tuple]:
+        models = ["gpt-4", "claude-3-opus", "llama-2-70b"]
+        benchmarks = ["mmlu", "arc-challenge", "hellaswag"]
+        return [(m, b) for m in models for b in benchmarks]
+    
+    def get_description(self) -> str:
+        return "General intelligence assessment across major benchmarks"
+
+class CodeGenerationStrategy(EvaluationStrategy):
+    """Strategy for code generation assessment"""
+    
+    def get_model_benchmark_pairs(self) -> List[tuple]:
+        models = ["gpt-4", "code-llama-34b", "claude-3-sonnet"]
+        benchmarks = ["humaneval", "mbpp"]
+        return [(m, b) for m in models for b in benchmarks]
+    
+    def get_description(self) -> str:
+        return "Code generation capability assessment"
+
+class MathReasoningStrategy(EvaluationStrategy):
+    """Strategy for mathematical reasoning assessment"""
+    
+    def get_model_benchmark_pairs(self) -> List[tuple]:
+        models = ["gpt-4", "claude-3-opus", "minerva-62b"]
+        benchmarks = ["gsm8k", "math"]
+        return [(m, b) for m in models for b in benchmarks]
+    
+    def get_description(self) -> str:
+        return "Mathematical reasoning and problem-solving assessment"
+
+class EvaluationFactory:
+    """Factory for creating evaluations based on strategies"""
+    
+    def __init__(self):
+        self.client = Atlas()
+    
+    def execute_strategy(self, strategy: EvaluationStrategy) -> Dict[str, Any]:
+        """Execute an evaluation strategy"""
+        pairs = strategy.get_model_benchmark_pairs()
+        description = strategy.get_description()
+        
+        print(f"🔄 Executing strategy: {description}")
+        print(f"📊 Creating {len(pairs)} evaluations...")
+        
+        results = {
+            "strategy": description,
+            "evaluations": [],
+            "errors": [],
+            "summary": {"total": len(pairs), "successful": 0, "failed": 0}
+        }
+        
+        for model, benchmark in pairs:
+            try:
+                evaluation = self.client.evaluations.create(
+                    model=model,
+                    benchmark=benchmark
+                )
+                
+                if evaluation:
+                    results["evaluations"].append({
+                        "model": model,
+                        "benchmark": benchmark,
+                        "evaluation_id": evaluation.id,
+                        "model_name": evaluation.model_name,
+                        "dataset_name": evaluation.dataset_name,
+                        "status": evaluation.status
+                    })
+                    results["summary"]["successful"] += 1
+                    print(f"✅ {model} + {benchmark}: {evaluation.id}")
+                else:
+                    results["errors"].append({
+                        "model": model,
+                        "benchmark": benchmark,
+                        "error": "No evaluation returned"
+                    })
+                    results["summary"]["failed"] += 1
+                    print(f"❌ {model} + {benchmark}: Failed")
+                    
+            except atlas.APIError as e:
+                results["errors"].append({
+                    "model": model,
+                    "benchmark": benchmark,
+                    "error": str(e),
+                    "error_type": type(e).__name__
+                })
+                results["summary"]["failed"] += 1
+                print(f"❌ {model} + {benchmark}: {e}")
+        
+        return results
+
+# Usage
+factory = EvaluationFactory()
+
+# Run different strategies
+strategies = [
+    GeneralIntelligenceStrategy(),
+    CodeGenerationStrategy(),
+    MathReasoningStrategy()
+]
+
+all_results = []
+for strategy in strategies:
+    result = factory.execute_strategy(strategy)
+    all_results.append(result)
+    
+    print(f"\n📈 Strategy Results: {result['strategy']}")
+    print(f"   Successful: {result['summary']['successful']}")
+    print(f"   Failed: {result['summary']['failed']}")
+    print()
+
+# Overall summary
+total_evaluations = sum(r["summary"]["successful"] for r in all_results)
+total_errors = sum(r["summary"]["failed"] for r in all_results)
+
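+# Guard against division by zero when no evaluations were attempted
+total_attempts = total_evaluations + total_errors
+success_rate = total_evaluations / total_attempts * 100 if total_attempts else 0.0
+
+print(f"🎯 Overall Summary:")
+print(f"   Total evaluations created: {total_evaluations}")
+print(f"   Total errors: {total_errors}")
+print(f"   Success rate: {success_rate:.1f}%")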
+```
diff --git a/docs/examples/retrieving-results.md b/docs/examples/retrieving-results.md
new file mode 100644
index 0000000..9b4261a
--- /dev/null
+++ b/docs/examples/retrieving-results.md
@@ -0,0 +1,828 @@
+# Retrieving Results
+
+This guide provides practical examples for retrieving and analyzing evaluation results with the Atlas Python SDK.
+
+## Basic Result Retrieval
+
+### Simple Result Fetching
+
+```python
+from atlas import Atlas
+
+# Initialize client
+client = Atlas()
+
+# Get results for a specific evaluation
+evaluation_id = "eval_12345"  # Replace with your evaluation ID
+results = client.results.get(evaluation_id=evaluation_id)
+
+if results:
+    print(f"📊 Retrieved {len(results)} results")
+    
+    # Show first few results
+    for i, result in enumerate(results[:3]):
+        print(f"\nResult {i+1}:")
+        print(f"  Subset: {result.subset}")
+        print(f"  Prompt: {result.prompt[:100]}...")
+        print(f"  Model Response: {result.result[:100]}...")
+        print(f"  Expected: {result.truth}")
+        print(f"  Score: {result.score}")
+        print(f"  Duration: {result.duration}")
+else:
+    print("❌ No results found")
+```
+
+### Complete Evaluation Workflow
+
+```python
+from atlas import Atlas
+import time
+
+def complete_evaluation_workflow(model: str, benchmark: str):
+    """Complete workflow: create evaluation and retrieve results"""
+    client = Atlas()
+    
+    # Step 1: Create evaluation
+    print(f"🔄 Creating evaluation: {model} + {benchmark}")
+    evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+    
+    if not evaluation:
+        print("❌ Failed to create evaluation")
+        return None
+    
+    print(f"✅ Evaluation created: {evaluation.id}")
+    print(f"   Status: {evaluation.status}")
+    
+    # Step 2: Check the status once. This simplified example does not poll;
+    # it only inspects the status returned by create().
+    print("⏳ Checking evaluation status...")
+    
+    # In practice, you'd:
+    # 1. Use webhooks for real-time updates
+    # 2. Store the evaluation ID and check periodically
+    # 3. Handle the various status states properly
+    
+    if evaluation.status == "completed":
+        print("🎉 Evaluation completed!")
+        
+        # Step 3: Retrieve results
+        results = client.results.get(evaluation_id=evaluation.id)
+        
+        if results:
+            print(f"📊 Retrieved {len(results)} detailed results")
+            
+            # Basic analysis (scores above 0.5 count as correct)
+            correct_answers = sum(1 for r in results if r.score > 0.5)
+            accuracy = correct_answers / len(results)
+            # Durations are timedelta objects, so average them in seconds
+            avg_duration = sum(r.duration.total_seconds() for r in results) / len(results)
+            
+            print(f"📈 Quick Analysis:")
+            print(f"   Accuracy: {accuracy:.1%} ({correct_answers}/{len(results)})")
+            print(f"   Average Duration: {avg_duration:.2f}s")
+            
+            return results
+        else:
+            print("❌ No results available")
+    else:
+        print(f"⏰ Evaluation status: {evaluation.status}")
+        print("   Check back later for results")
+    
+    return None
+
+# Usage
+results = complete_evaluation_workflow("gpt-4", "mmlu")
+```
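+
+The workflow above checks the status only once. A minimal polling sketch follows, assuming you supply your own `fetch_status` callable that returns the current status string for an evaluation. The SDK call behind that callable is not shown in this guide, so `fetch_status`, the `wait_for_status` name, and every status value other than `"completed"` are hypothetical.
+
+```python
+import time
+from typing import Callable, Iterable, Optional
+
+
+def wait_for_status(
+    fetch_status: Callable[[], str],
+    done_states: Iterable[str] = ("completed",),
+    failed_states: Iterable[str] = ("failed",),  # assumed status name
+    interval: float = 5.0,
+    timeout: float = 600.0,
+    sleep: Callable[[float], None] = time.sleep,
+) -> Optional[str]:
+    """Poll fetch_status() until a terminal state or the timeout is reached.
+
+    Returns the terminal status, or None if the timeout expires first.
+    """
+    done = set(done_states)
+    failed = set(failed_states)
+    deadline = time.monotonic() + timeout
+
+    while True:
+        status = fetch_status()
+        if status in done or status in failed:
+            return status
+        if time.monotonic() >= deadline:
+            return None
+        sleep(interval)
+
+
+# Usage with a stand-in status source (replace with a real lookup)
+statuses = iter(["queued", "running", "completed"])
+final_status = wait_for_status(lambda: next(statuses), interval=0.0, timeout=5.0)
+print(f"Final status: {final_status}")
+```
+
+Passing `sleep` as a parameter keeps the helper easy to test without real delays. Webhooks remain the better choice in production where they are available.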
+
+## Result Analysis Patterns
+
+### Performance Analysis
+
+```python
+from atlas import Atlas
+from collections import defaultdict
+import statistics
+from datetime import timedelta
+
+def analyze_evaluation_performance(evaluation_id: str):
+    """Comprehensive performance analysis of evaluation results"""
+    client = Atlas()
+    
+    results = client.results.get(evaluation_id=evaluation_id)
+    if not results:
+        print(f"❌ No results found for evaluation {evaluation_id}")
+        return None
+    
+    print(f"📊 Performance Analysis for {evaluation_id}")
+    print(f"{'='*60}")
+    
+    # Overall statistics
+    total_cases = len(results)
+    correct_answers = sum(1 for r in results if r.score > 0.5)
+    total_score = sum(r.score for r in results)
+    
+    accuracy = correct_answers / total_cases
+    avg_score = total_score / total_cases
+    
+    print(f"\n🎯 Overall Performance:")
+    print(f"   Total test cases: {total_cases:,}")
+    print(f"   Correct answers: {correct_answers:,}")
+    print(f"   Accuracy: {accuracy:.1%}")
+    print(f"   Average score: {avg_score:.3f}")
+    
+    # Timing analysis
+    durations = [r.duration for r in results]
+    avg_duration = sum(durations, timedelta()) / len(durations)
+    min_duration = min(durations)
+    max_duration = max(durations)
+    median_duration = statistics.median(durations)
+    
+    print(f"\n⏱️  Timing Analysis:")
+    print(f"   Average duration: {avg_duration}")
+    print(f"   Median duration: {median_duration}")
+    print(f"   Min duration: {min_duration}")
+    print(f"   Max duration: {max_duration}")
+    
+    # Score distribution (the last bucket catches everything below 0.1)
+    score_ranges = {
+        "Perfect (1.0)": 0,
+        "High (0.8-0.99)": 0,
+        "Medium (0.5-0.79)": 0,
+        "Low (0.1-0.49)": 0,
+        "Very low (< 0.1)": 0
+    }
+    
+    for result in results:
+        score = result.score
+        if score >= 1.0:
+            score_ranges["Perfect (1.0)"] += 1
+        elif score >= 0.8:
+            score_ranges["High (0.8-0.99)"] += 1
+        elif score >= 0.5:
+            score_ranges["Medium (0.5-0.79)"] += 1
+        elif score >= 0.1:
+            score_ranges["Low (0.1-0.49)"] += 1
+        else:
+            score_ranges["Very low (< 0.1)"] += 1
+    
+    print(f"\n📈 Score Distribution:")
+    for range_name, count in score_ranges.items():
+        percentage = count / total_cases * 100
+        print(f"   {range_name}: {count:,} ({percentage:.1f}%)")
+    
+    # Subset analysis
+    subset_stats = defaultdict(lambda: {"scores": [], "durations": []})
+    
+    for result in results:
+        subset_stats[result.subset]["scores"].append(result.score)
+        subset_stats[result.subset]["durations"].append(result.duration)
+    
+    print(f"\n📋 Performance by Subset:")
+    print(f"{'Subset':<25} {'Cases':<8} {'Accuracy':<10} {'Avg Score':<10} {'Avg Duration':<12}")
+    print("-" * 75)
+    
+    for subset, data in sorted(subset_stats.items()):
+        case_count = len(data["scores"])
+        subset_accuracy = sum(1 for s in data["scores"] if s > 0.5) / case_count
+        subset_avg_score = sum(data["scores"]) / case_count
+        subset_avg_duration = sum(data["durations"], timedelta()) / case_count
+        
+        print(f"{subset:<25} {case_count:<8} {subset_accuracy:<10.1%} {subset_avg_score:<10.3f} {str(subset_avg_duration):<12}")
+    
+    return {
+        "total_cases": total_cases,
+        "accuracy": accuracy,
+        "avg_score": avg_score,
+        "avg_duration": avg_duration,
+        "score_distribution": score_ranges,
+        "subset_stats": dict(subset_stats)
+    }
+
+# Usage
+analysis = analyze_evaluation_performance("eval_12345")
+```
+
+### Comparative Analysis
+
+```python
+from atlas import Atlas
+from typing import List, Dict
+
+def compare_evaluation_results(evaluation_ids: List[str], labels: List[str] = None):
+    """Compare results across multiple evaluations"""
+    client = Atlas()
+    
+    # Fall back to generated labels when none are given or the counts differ
+    if not labels or len(labels) != len(evaluation_ids):
+        labels = [f"Eval {i+1}" for i in range(len(evaluation_ids))]
+    
+    print(f"📊 Comparing {len(evaluation_ids)} evaluations")
+    print(f"{'='*80}")
+    
+    # Collect results for all evaluations
+    all_results = {}
+    for eval_id, label in zip(evaluation_ids, labels):
+        results = client.results.get(evaluation_id=eval_id)
+        if results:
+            all_results[label] = results
+            print(f"✅ Loaded {len(results)} results for {label}")
+        else:
+            print(f"❌ No results found for {label} ({eval_id})")
+    
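+    if not all_results:
+        print("❌ No results to compare")
+        return None
+    
+    # Only report on evaluations that returned results; otherwise the
+    # metric lookups below would raise KeyError for the missing labels
+    labels = [label for label in labels if label in all_results]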
+    
+    print(f"\n📈 Comparative Analysis:")
+    print(f"{'Metric':<20} " + " ".join(f"{label:<15}" for label in labels))
+    print("-" * (20 + 15 * len(labels)))
+    
+    # Compare key metrics
+    metrics = {}
+    for label, results in all_results.items():
+        total_cases = len(results)
+        correct_answers = sum(1 for r in results if r.score > 0.5)
+        accuracy = correct_answers / total_cases
+        avg_score = sum(r.score for r in results) / total_cases
+        # Durations are timedelta objects, so average them in seconds
+        avg_duration = round(sum(r.duration.total_seconds() for r in results) / total_cases, 3)
+        
+        metrics[label] = {
+            "total_cases": total_cases,
+            "accuracy": accuracy,
+            "avg_score": avg_score,
+            "avg_duration": avg_duration
+        }
+    
+    # Print comparison table
+    print(f"{'Total Cases':<20} " + " ".join(f"{metrics[label]['total_cases']:<15,}" for label in labels))
+    print(f"{'Accuracy':<20} " + " ".join(f"{metrics[label]['accuracy']:<15.1%}" for label in labels))
+    print(f"{'Average Score':<20} " + " ".join(f"{metrics[label]['avg_score']:<15.3f}" for label in labels))
+    print(f"{'Average Duration':<20} " + " ".join(f"{str(metrics[label]['avg_duration']):<15}" for label in labels))
+    
+    # Find best performing evaluation (on ties, the first label wins)
+    best_accuracy_label = max(metrics, key=lambda label: metrics[label]["accuracy"])
+    best_speed_label = min(metrics, key=lambda label: metrics[label]["avg_duration"])
+    
+    print(f"\n🏆 Winners:")
+    print(f"   Best Accuracy: {best_accuracy_label} ({metrics[best_accuracy_label]['accuracy']:.1%})")
+    print(f"   Fastest: {best_speed_label} ({metrics[best_speed_label]['avg_duration']})")
+    
+    # Subset-level comparison (if results have same subsets)
+    if len(all_results) >= 2:
+        first_subsets = set(r.subset for r in list(all_results.values())[0])
+        common_subsets = first_subsets
+        
+        for results in list(all_results.values())[1:]:
+            result_subsets = set(r.subset for r in results)
+            common_subsets = common_subsets.intersection(result_subsets)
+        
+        if common_subsets:
+            print(f"\n📋 Subset Comparison ({len(common_subsets)} common subsets):")
+            print(f"{'Subset':<25} " + " ".join(f"{label + ' Acc':<12}" for label in labels))
+            print("-" * (25 + 12 * len(labels)))
+            
+            for subset in sorted(common_subsets):
+                subset_accuracies = []
+                for label, results in all_results.items():
+                    subset_results = [r for r in results if r.subset == subset]
+                    if subset_results:
+                        subset_accuracy = sum(1 for r in subset_results if r.score > 0.5) / len(subset_results)
+                        subset_accuracies.append(f"{subset_accuracy:.1%}")
+                    else:
+                        subset_accuracies.append("N/A")
+                
+                print(f"{subset:<25} " + " ".join(f"{acc:<12}" for acc in subset_accuracies))
+    
+    return metrics
+
+# Usage - compare GPT-4, Claude-3, and Llama-2 on MMLU
+evaluation_ids = ["eval_gpt4_mmlu", "eval_claude3_mmlu", "eval_llama2_mmlu"]
+labels = ["GPT-4", "Claude-3", "Llama-2"]
+
+comparison = compare_evaluation_results(evaluation_ids, labels)
+```
+
+### Error Analysis
+
+```python
+from atlas import Atlas
+
+def analyze_failures(evaluation_id: str, error_threshold: float = 0.3):
+    """Analyze cases where the model performed poorly"""
+    client = Atlas()
+    
+    results = client.results.get(evaluation_id=evaluation_id)
+    if not results:
+        print(f"❌ No results found for evaluation {evaluation_id}")
+        return None
+    
+    # Find poor-performing cases
+    poor_results = [r for r in results if r.score < error_threshold]
+    good_results = [r for r in results if r.score >= error_threshold]
+    
+    print(f"🔍 Error Analysis for {evaluation_id}")
+    print(f"{'='*60}")
+    print(f"Total cases: {len(results)}")
+    print(f"Poor performance (< {error_threshold}): {len(poor_results)} ({len(poor_results)/len(results):.1%})")
+    print(f"Good performance (>= {error_threshold}): {len(good_results)} ({len(good_results)/len(results):.1%})")
+    
+    if not poor_results:
+        print("🎉 No poor-performing cases found!")
+        return {"poor_results": [], "analysis": "No errors to analyze"}
+    
+    # Analyze failure patterns by subset
+    failure_by_subset = {}
+    for result in poor_results:
+        if result.subset not in failure_by_subset:
+            failure_by_subset[result.subset] = []
+        failure_by_subset[result.subset].append(result)
+    
+    print(f"\n❌ Failure Distribution by Subset:")
+    for subset, failures in sorted(failure_by_subset.items(), key=lambda x: len(x[1]), reverse=True):
+        total_in_subset = len([r for r in results if r.subset == subset])
+        failure_rate = len(failures) / total_in_subset
+        print(f"   {subset}: {len(failures)}/{total_in_subset} failures ({failure_rate:.1%})")
+    
+    # Show worst-performing examples
+    worst_results = sorted(poor_results, key=lambda x: x.score)[:5]
+    
+    print(f"\n🔍 Worst Performing Examples:")
+    for i, result in enumerate(worst_results, 1):
+        print(f"\n   Example {i} [Score: {result.score:.3f}]")
+        print(f"   Subset: {result.subset}")
+        print(f"   Prompt: {result.prompt[:200]}...")
+        print(f"   Model Answer: {result.result[:100]}...")
+        print(f"   Expected: {result.truth[:100]}...")
+        print(f"   Duration: {result.duration}")
+        
+        if result.metrics:
+            print(f"   Additional Metrics: {result.metrics}")
+    
+    # Common failure patterns
+    print(f"\n🔍 Common Patterns in Failures:")
+    
+    def mean(values):
+        """Average of a list, or 0.0 when the list is empty (e.g. no successes)."""
+        return sum(values) / len(values) if values else 0.0
+    
+    # Analyze prompt lengths
+    avg_poor_prompt_len = mean([len(r.prompt) for r in poor_results])
+    avg_good_prompt_len = mean([len(r.prompt) for r in good_results])
+    
+    print(f"   Average prompt length in failures: {avg_poor_prompt_len:.0f} chars")
+    print(f"   Average prompt length in successes: {avg_good_prompt_len:.0f} chars")
+    
+    # Analyze response lengths
+    avg_poor_response_len = mean([len(r.result) for r in poor_results])
+    avg_good_response_len = mean([len(r.result) for r in good_results])
+    
+    print(f"   Average response length in failures: {avg_poor_response_len:.0f} chars")
+    print(f"   Average response length in successes: {avg_good_response_len:.0f} chars")
+    
+    # Analyze durations (timedelta objects, averaged in seconds)
+    avg_poor_duration = mean([r.duration.total_seconds() for r in poor_results])
+    avg_good_duration = mean([r.duration.total_seconds() for r in good_results])
+    
+    print(f"   Average duration for failures: {avg_poor_duration:.2f}s")
+    print(f"   Average duration for successes: {avg_good_duration:.2f}s")
+    
+    return {
+        "poor_results": poor_results,
+        "failure_by_subset": failure_by_subset,
+        "worst_examples": worst_results,
+        "patterns": {
+            "avg_poor_prompt_len": avg_poor_prompt_len,
+            "avg_good_prompt_len": avg_good_prompt_len,
+            "avg_poor_response_len": avg_poor_response_len,
+            "avg_good_response_len": avg_good_response_len,
+            "avg_poor_duration": avg_poor_duration,
+            "avg_good_duration": avg_good_duration
+        }
+    }
+
+# Usage
+error_analysis = analyze_failures("eval_12345", error_threshold=0.5)
+```
+
+## Advanced Result Processing
+
+### Batch Processing Large Result Sets
+
+```python
+from atlas import Atlas
+from typing import Iterator, List
+import time
+
+def process_results_in_batches(evaluation_id: str, batch_size: int = 100, processor_func=None):
+    """Process large result sets in manageable batches"""
+    client = Atlas()
+    
+    results = client.results.get(evaluation_id=evaluation_id)
+    if not results:
+        print(f"❌ No results found for evaluation {evaluation_id}")
+        return None
+    
+    total_results = len(results)
+    print(f"📊 Processing {total_results:,} results in batches of {batch_size}")
+    
+    if not processor_func:
+        # Default processor: just count scores
+        def processor_func(batch):
+            return {
+                "count": len(batch),
+                "avg_score": sum(r.score for r in batch) / len(batch),
+                "correct": sum(1 for r in batch if r.score > 0.5)
+            }
+    
+    batch_results = []
+    
+    for i in range(0, total_results, batch_size):
+        batch = results[i:i + batch_size]
+        batch_num = i // batch_size + 1
+        total_batches = (total_results + batch_size - 1) // batch_size
+        
+        print(f"🔄 Processing batch {batch_num}/{total_batches} ({len(batch)} items)")
+        
+        start_time = time.time()
+        batch_result = processor_func(batch)
+        end_time = time.time()
+        
+        batch_result.update({
+            "batch_num": batch_num,
+            "processing_time": end_time - start_time,
+            "items_processed": len(batch)
+        })
+        
+        batch_results.append(batch_result)
+        
+        print(f"   ✅ Completed in {batch_result['processing_time']:.2f}s")
+        
+        # Small delay to prevent overwhelming the system
+        if batch_num < total_batches:
+            time.sleep(0.1)
+    
+    # Aggregate results
+    total_processing_time = sum(br["processing_time"] for br in batch_results)
+    total_correct = sum(br.get("correct", 0) for br in batch_results)
+    overall_accuracy = total_correct / total_results
+    
+    print(f"\n📈 Batch Processing Summary:")
+    print(f"   Total batches: {len(batch_results)}")
+    print(f"   Total processing time: {total_processing_time:.2f}s")
+    print(f"   Average time per batch: {total_processing_time/len(batch_results):.2f}s")
+    print(f"   Overall accuracy: {overall_accuracy:.1%}")
+    
+    return {
+        "batch_results": batch_results,
+        "summary": {
+            "total_items": total_results,
+            "total_batches": len(batch_results),
+            "total_processing_time": total_processing_time,
+            "overall_accuracy": overall_accuracy
+        }
+    }
+
+# Custom processor for subset analysis
+def subset_analyzer(batch):
+    """Custom processor that analyzes subsets in a batch"""
+    subset_stats = {}
+    
+    for result in batch:
+        if result.subset not in subset_stats:
+            subset_stats[result.subset] = {"count": 0, "total_score": 0, "correct": 0}
+        
+        subset_stats[result.subset]["count"] += 1
+        subset_stats[result.subset]["total_score"] += result.score
+        if result.score > 0.5:
+            subset_stats[result.subset]["correct"] += 1
+    
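+    return {
+        "subset_stats": subset_stats,
+        "unique_subsets": len(subset_stats),
+        # Batch-level count so the overall accuracy in the summary is meaningful
+        "correct": sum(stats["correct"] for stats in subset_stats.values())
+    }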
+
+# Usage
+batch_results = process_results_in_batches(
+    evaluation_id="eval_12345",
+    batch_size=50,
+    processor_func=subset_analyzer
+)
+```
+
+### Result Caching and Persistence
+
+```python
+import json
+import pickle
+from pathlib import Path
+from datetime import datetime
+from atlas import Atlas
+import atlas
+
+class ResultsCache:
+    """Cache evaluation results to avoid repeated API calls"""
+    
+    def __init__(self, cache_dir: str = "results_cache"):
+        self.cache_dir = Path(cache_dir)
+        self.cache_dir.mkdir(parents=True, exist_ok=True)
+    
+    def _get_cache_path(self, evaluation_id: str, format: str = "json") -> Path:
+        """Get cache file path for an evaluation"""
+        return self.cache_dir / f"{evaluation_id}_results.{format}"
+    
+    def _get_metadata_path(self, evaluation_id: str) -> Path:
+        """Get metadata file path for an evaluation"""
+        return self.cache_dir / f"{evaluation_id}_metadata.json"
+    
+    def is_cached(self, evaluation_id: str) -> bool:
+        """Check if results are already cached"""
+        return self._get_cache_path(evaluation_id).exists()
+    
+    def save_results(self, evaluation_id: str, results: list, metadata: dict = None):
+        """Save results to cache"""
+        try:
+            # Save as JSON (human-readable)
+            json_path = self._get_cache_path(evaluation_id, "json")
+            with open(json_path, 'w') as f:
+                # Convert results to serializable format
+                serializable_results = []
+                for result in results:
+                    result_dict = {
+                        "subset": result.subset,
+                        "prompt": result.prompt,
+                        "result": result.result,
+                        "truth": result.truth,
+                        "score": result.score,
+                        "duration": str(result.duration),  # Convert timedelta to string
+                        "metrics": result.metrics
+                    }
+                    serializable_results.append(result_dict)
+                
+                json.dump(serializable_results, f, indent=2, ensure_ascii=False)
+            
+            # Save as pickle (preserves exact object types)
+            pickle_path = self._get_cache_path(evaluation_id, "pkl")
+            with open(pickle_path, 'wb') as f:
+                pickle.dump(results, f)
+            
+            # Save metadata (copy so the caller's dict is not mutated)
+            metadata = dict(metadata) if metadata else {}
+            
+            metadata.update({
+                "evaluation_id": evaluation_id,
+                "cached_at": datetime.now().isoformat(),
+                "result_count": len(results),
+                "cache_format": "both"
+            })
+            
+            metadata_path = self._get_metadata_path(evaluation_id)
+            with open(metadata_path, 'w') as f:
+                json.dump(metadata, f, indent=2)
+            
+            print(f"💾 Cached {len(results)} results for {evaluation_id}")
+            
+        except Exception as e:
+            print(f"❌ Error caching results: {e}")
+    
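+    def load_results(self, evaluation_id: str, format: str = "pickle"):
+        """Load results from cache.
+
+        "pickle" returns the original result objects; any other value reads the
+        JSON copy, which yields plain dicts with durations stored as strings.
+        Only unpickle cache files you wrote yourself: pickle can run arbitrary code.
+        """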
+        try:
+            if format == "pickle":
+                cache_path = self._get_cache_path(evaluation_id, "pkl")
+                with open(cache_path, 'rb') as f:
+                    results = pickle.load(f)
+            else:
+                cache_path = self._get_cache_path(evaluation_id, "json")
+                with open(cache_path, 'r') as f:
+                    results = json.load(f)
+            
+            print(f"💾 Loaded {len(results)} results from cache for {evaluation_id}")
+            return results
+            
+        except Exception as e:
+            print(f"❌ Error loading cached results: {e}")
+            return None
+    
+    def get_metadata(self, evaluation_id: str):
+        """Get cached metadata"""
+        try:
+            metadata_path = self._get_metadata_path(evaluation_id)
+            with open(metadata_path, 'r') as f:
+                return json.load(f)
+        except Exception as e:
+            print(f"❌ Error loading metadata: {e}")
+            return None
+
+def get_results_with_cache(evaluation_id: str, cache: ResultsCache = None, force_refresh: bool = False):
+    """Get results with automatic caching"""
+    if not cache:
+        cache = ResultsCache()
+    
+    # Check cache first (unless force refresh)
+    if not force_refresh and cache.is_cached(evaluation_id):
+        print(f"📂 Loading results from cache...")
+        cached_results = cache.load_results(evaluation_id)
+        
+        if cached_results:
+            metadata = cache.get_metadata(evaluation_id)
+            if metadata:
+                cached_at = metadata.get("cached_at", "unknown")
+                print(f"📅 Cached at: {cached_at}")
+            return cached_results
+    
+    # Fetch from API
+    print(f"🌐 Fetching fresh results from API...")
+    client = Atlas()
+    
+    try:
+        results = client.results.get(evaluation_id=evaluation_id)
+        
+        if results:
+            # Cache the results
+            cache.save_results(evaluation_id, results)
+            return results
+        else:
+            print(f"❌ No results found for evaluation {evaluation_id}")
+            return None
+            
+    except atlas.APIError as e:
+        print(f"❌ Error fetching results: {e}")
+        
+        # Try to return cached results as fallback
+        if cache.is_cached(evaluation_id):
+            print(f"🔄 Falling back to cached results...")
+            return cache.load_results(evaluation_id)
+        
+        return None
+
+# Usage examples
+cache = ResultsCache("./my_results_cache")
+
+# First call - fetches from API and caches
+results1 = get_results_with_cache("eval_12345", cache)
+
+# Second call - loads from cache
+results2 = get_results_with_cache("eval_12345", cache)
+
+# Force refresh from API
+results3 = get_results_with_cache("eval_12345", cache, force_refresh=True)
+
+# Batch cache multiple evaluations
+evaluation_ids = ["eval_001", "eval_002", "eval_003"]
+
+for eval_id in evaluation_ids:
+    results = get_results_with_cache(eval_id, cache)
+    if results:
+        print(f"✅ {eval_id}: {len(results)} results cached")
+
+print(f"\n📁 Cache contents:")
+for cache_file in cache.cache_dir.glob("*_results.json"):
+    # Strip only the trailing "_results" so IDs containing that text survive
+    evaluation_id = cache_file.stem[:-len("_results")]
+    metadata = cache.get_metadata(evaluation_id)
+    if metadata:
+        count = metadata.get("result_count", "unknown")
+        cached_at = metadata.get("cached_at", "unknown")
+        print(f"   {evaluation_id}: {count} results (cached: {cached_at})")
+```
+
+### Export and Reporting
+
+```python
+import csv
+from pathlib import Path
+from datetime import datetime
+from atlas import Atlas
+
+def export_results_to_csv(evaluation_id: str, output_path: str = None):
+    """Export evaluation results to CSV format"""
+    client = Atlas()
+    
+    results = client.results.get(evaluation_id=evaluation_id)
+    if not results:
+        print(f"❌ No results found for evaluation {evaluation_id}")
+        return None
+    
+    if not output_path:
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        output_path = f"results_{evaluation_id}_{timestamp}.csv"
+    
+    try:
+        with open(output_path, 'w', newline='', encoding='utf-8') as csvfile:
+            fieldnames = [
+                'subset', 'prompt', 'model_response', 'expected_answer', 
+                'score', 'duration_ms', 'prompt_length', 'response_length'
+            ]
+            
+            # Add a column for every metric key that appears in any result,
+            # so rows with extra metrics do not make DictWriter raise ValueError
+            metric_keys = sorted({k for r in results for k in (r.metrics or {})})
+            fieldnames.extend(f"metric_{key}" for key in metric_keys)
+            
+            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+            writer.writeheader()
+            
+            for result in results:
+                row = {
+                    'subset': result.subset,
+                    'prompt': result.prompt,
+                    'model_response': result.result,
+                    'expected_answer': result.truth,
+                    'score': result.score,
+                    'duration_ms': int(result.duration.total_seconds() * 1000),
+                    'prompt_length': len(result.prompt),
+                    'response_length': len(result.result)
+                }
+                
+                # Add metrics if present
+                if result.metrics:
+                    for key, value in result.metrics.items():
+                        row[f"metric_{key}"] = value
+                
+                writer.writerow(row)
+        
+        print(f"📄 Exported {len(results)} results to {output_path}")
+        return output_path
+        
+    except Exception as e:
+        print(f"❌ Error exporting to CSV: {e}")
+        return None
+
+def generate_summary_report(evaluation_ids: list, output_path: str = None):
+    """Generate a summary report comparing multiple evaluations"""
+    client = Atlas()
+    
+    if not output_path:
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        output_path = f"evaluation_summary_{timestamp}.txt"
+    
+    with open(output_path, 'w', encoding='utf-8') as f:
+        f.write("ATLAS EVALUATION SUMMARY REPORT\n")
+        f.write("=" * 50 + "\n")
+        f.write(f"Generated: {datetime.now().isoformat()}\n")
+        f.write(f"Evaluations analyzed: {len(evaluation_ids)}\n\n")
+        
+        for i, eval_id in enumerate(evaluation_ids, 1):
+            f.write(f"EVALUATION {i}: {eval_id}\n")
+            f.write("-" * 30 + "\n")
+            
+            results = client.results.get(evaluation_id=eval_id)
+            
+            if not results:
+                f.write("❌ No results found\n\n")
+                continue
+            
+            # Calculate statistics
+            total_cases = len(results)
+            correct_answers = sum(1 for r in results if r.score > 0.5)
+            accuracy = correct_answers / total_cases
+            avg_score = sum(r.score for r in results) / total_cases
+            avg_duration = sum(r.duration.total_seconds() for r in results) / total_cases  # duration is a timedelta
+            
+            # Write statistics
+            f.write(f"Total test cases: {total_cases:,}\n")
+            f.write(f"Correct answers: {correct_answers:,}\n")
+            f.write(f"Accuracy: {accuracy:.1%}\n")
+            f.write(f"Average score: {avg_score:.3f}\n")
+            f.write(f"Average duration: {avg_duration:.2f}s\n")
+            
+            # Subset breakdown
+            subset_stats = {}
+            for result in results:
+                if result.subset not in subset_stats:
+                    subset_stats[result.subset] = []
+                subset_stats[result.subset].append(result.score)
+            
+            f.write("\nSubset Performance:\n")
+            for subset, scores in sorted(subset_stats.items()):
+                subset_accuracy = sum(1 for s in scores if s > 0.5) / len(scores)
+                subset_avg = sum(scores) / len(scores)
+                f.write(f"  {subset}: {subset_accuracy:.1%} accuracy, {subset_avg:.3f} avg score ({len(scores)} cases)\n")
+            
+            f.write("\n")
+        
+        f.write("END OF REPORT\n")
+    
+    print(f"📊 Summary report generated: {output_path}")
+    return output_path
+
+# Usage examples
+
+# Export single evaluation to CSV
+csv_path = export_results_to_csv("eval_12345")
+
+# Generate summary report for multiple evaluations
+evaluation_list = ["eval_gpt4_mmlu", "eval_claude3_mmlu", "eval_llama2_mmlu"]
+report_path = generate_summary_report(evaluation_list)
+
+print("Files generated:")
+print(f"  CSV Export: {csv_path}")
+print(f"  Summary Report: {report_path}")
+```
diff --git a/docs/examples/timeouts.md b/docs/examples/timeouts.md
new file mode 100644
index 0000000..c635dc0
--- /dev/null
+++ b/docs/examples/timeouts.md
@@ -0,0 +1,715 @@
+# Working with Timeouts
+
+This guide provides practical examples for configuring and handling timeouts effectively with the Atlas Python SDK.
+
+## Understanding Timeouts
+
+Timeouts in the Atlas SDK control how long the client waits for an API response before raising `atlas.APITimeoutError`. Different operations may need different timeout values, depending on how long they are expected to run and how critical they are.
+
+## Basic Timeout Configuration
+
+### Simple Timeout
+
+```python
+from atlas import Atlas
+
+# Set a 2-minute timeout for all requests
+client = Atlas(timeout=120.0)
+
+# Create evaluation with 2-minute timeout
+evaluation = client.evaluations.create(
+    model="gpt-4",
+    benchmark="mmlu"
+)
+```
+
+### Default Timeout Behavior
+
+```python
+from atlas import Atlas
+
+# Uses default timeout (10 minutes)
+client = Atlas()
+
+print(f"Default timeout: {client.timeout}")  # Should show 10 minutes in seconds
+```
+
+## Advanced Timeout Configuration
+
+### Granular Timeout Control
+
+```python
+import httpx
+from atlas import Atlas
+
+# Configure separate timeouts for each phase of a request
+client = Atlas(
+    timeout=httpx.Timeout(
+        connect=10.0,   # 10 seconds to establish connection
+        read=300.0,     # 5 minutes to read response
+        write=30.0,     # 30 seconds to send request
+        pool=60.0       # 1 minute to acquire a connection from the pool
+    )
+)
+
+evaluation = client.evaluations.create(
+    model="gpt-4",
+    benchmark="mmlu"
+)
+```
+
+### Per-Request Timeout Override
+
+```python
+from atlas import Atlas
+
+# Client with default 1-minute timeout
+client = Atlas(timeout=60.0)
+
+# Override timeout for specific operations
+try:
+    # Quick operation with short timeout
+    quick_eval = client.with_options(timeout=30.0).evaluations.create(
+        model="gpt-3.5-turbo",
+        benchmark="arc-easy"
+    )
+    
+    # Long operation with extended timeout
+    complex_eval = client.with_options(timeout=600.0).evaluations.create(
+        model="gpt-4",
+        benchmark="math"  # Complex benchmark
+    )
+    
+    # Results retrieval with medium timeout
+    results = client.with_options(timeout=120.0).results.get(
+        evaluation_id=quick_eval.id
+    )
+    
+except Exception as e:
+    print(f"Operation failed: {e}")
+```
+
+## Timeout Strategies by Use Case
+
+### Development and Testing
+
+```python
+from atlas import Atlas
+import atlas
+
+def development_client():
+    """Client optimized for development with shorter timeouts"""
+    return Atlas(
+        timeout=30.0  # 30 seconds - fail fast during development
+    )
+
+def test_api_connectivity():
+    """Quick connectivity test with very short timeout"""
+    client = development_client()
+    
+    try:
+        # Use simple, fast operation to test connectivity
+        evaluation = client.with_options(timeout=10.0).evaluations.create(
+            model="gpt-3.5-turbo",  # Usually faster
+            benchmark="arc-easy"    # Smaller benchmark
+        )
+        
+        if evaluation:
+            print("✅ API connectivity confirmed")
+            return True
+        else:
+            print("❌ API returned no evaluation")
+            return False
+            
+    except atlas.APITimeoutError:
+        print("❌ API timeout - connectivity issues or server overload")
+        return False
+    except atlas.APIConnectionError:
+        print("❌ Connection failed - check network")
+        return False
+    except atlas.APIError as e:
+        print(f"❌ API error: {e}")
+        return False
+
+# Usage
+if test_api_connectivity():
+    print("Proceeding with full evaluation...")
+else:
+    print("Fix connectivity issues before continuing")
+```
+
+### Production Workloads
+
+```python
+import httpx
+from atlas import Atlas
+import atlas
+
+def production_client():
+    """Client optimized for production workloads"""
+    return Atlas(
+        timeout=httpx.Timeout(
+            connect=30.0,    # 30s to connect (allows for network delays)
+            read=1800.0,     # 30 minutes for complex evaluations
+            write=60.0,      # 1 minute to send large requests
+            pool=120.0       # 2 minutes to acquire a connection from the pool
+        )
+    )
+
+def robust_evaluation_creation(model: str, benchmark: str, max_retries: int = 3):
+    """Production-ready evaluation creation with timeout handling"""
+    client = production_client()
+    
+    for attempt in range(max_retries):
+        try:
+            print(f"🔄 Attempt {attempt + 1}/{max_retries}: Creating evaluation...")
+            
+            evaluation = client.evaluations.create(
+                model=model,
+                benchmark=benchmark
+            )
+            
+            if evaluation:
+                print(f"✅ Success: {evaluation.id}")
+                return evaluation
+            else:
+                print("❌ No evaluation returned")
+                
+        except atlas.APITimeoutError:
+            print(f"⏰ Timeout on attempt {attempt + 1}")
+            if attempt < max_retries - 1:
+                # Increase timeout for retry
+                retry_timeout = 1800.0 + ((attempt + 1) * 600.0)  # Add 10 minutes per retry
+                print(f"🔄 Retrying with extended timeout: {retry_timeout/60:.0f} minutes")
+                
+                try:
+                    evaluation = client.with_options(timeout=retry_timeout).evaluations.create(
+                        model=model,
+                        benchmark=benchmark
+                    )
+                    if evaluation:
+                        print(f"✅ Success on retry: {evaluation.id}")
+                        return evaluation
+                except atlas.APITimeoutError:
+                    print("⏰ Extended timeout also failed")
+                    continue
+            else:
+                print("❌ All timeout retry attempts failed")
+                
+        except atlas.APIError as e:
+            print(f"❌ API error: {e}")
+            break  # Don't retry API errors
+    
+    return None
+
+# Usage
+evaluation = robust_evaluation_creation("gpt-4", "mmlu")
+```
+
+### Batch Operations
+
+```python
+from atlas import Atlas
+import atlas
+import time
+
+def batch_evaluations_with_adaptive_timeout(model_benchmark_pairs: list):
+    """Create multiple evaluations with adaptive timeout strategy"""
+    client = Atlas(timeout=120.0)  # Start with 2-minute timeout
+    
+    results = []
+    consecutive_timeouts = 0
+    current_timeout = 120.0
+    
+    for i, (model, benchmark) in enumerate(model_benchmark_pairs, 1):
+        print(f"\n[{i}/{len(model_benchmark_pairs)}] {model} + {benchmark}")
+        print(f"Current timeout: {current_timeout/60:.1f} minutes")
+        
+        try:
+            evaluation = client.with_options(timeout=current_timeout).evaluations.create(
+                model=model,
+                benchmark=benchmark
+            )
+            
+            if evaluation:
+                results.append({
+                    "model": model,
+                    "benchmark": benchmark,
+                    "evaluation_id": evaluation.id,
+                    "success": True,
+                    "timeout_used": current_timeout
+                })
+                print(f"✅ Success: {evaluation.id}")
+                
+                # Reset the timeout counter and ease the timeout back toward the 2-minute floor
+                consecutive_timeouts = 0
+                current_timeout = max(120.0, current_timeout * 0.9)  # Slightly reduce timeout
+            else:
+                results.append({
+                    "model": model,
+                    "benchmark": benchmark,
+                    "success": False,
+                    "error": "no_evaluation_returned"
+                })
+                
+        except atlas.APITimeoutError:
+            print(f"⏰ Timeout after {current_timeout/60:.1f} minutes")
+            consecutive_timeouts += 1
+            
+            results.append({
+                "model": model,
+                "benchmark": benchmark,
+                "success": False,
+                "error": "timeout",
+                "timeout_used": current_timeout
+            })
+            
+            # Increase timeout after consecutive timeouts
+            if consecutive_timeouts >= 2:
+                current_timeout = min(3600.0, current_timeout * 1.5)  # Max 1 hour
+                print(f"🔄 Increased timeout to {current_timeout/60:.1f} minutes")
+                consecutive_timeouts = 0  # Reset counter after adjustment
+        
+        except atlas.APIError as e:
+            print(f"❌ API error: {e}")
+            results.append({
+                "model": model,
+                "benchmark": benchmark,
+                "success": False,
+                "error": str(e)
+            })
+        
+        # Brief pause between requests
+        time.sleep(1.0)
+    
+    # Summary
+    successful = [r for r in results if r["success"]]
+    timeouts = [r for r in results if r.get("error") == "timeout"]
+    
+    print("\n📊 Batch Summary:")
+    print(f"   Total requests: {len(results)}")
+    print(f"   Successful: {len(successful)}")
+    print(f"   Timeouts: {len(timeouts)}")
+    print(f"   Other errors: {len(results) - len(successful) - len(timeouts)}")
+    
+    return results
+
+# Usage
+pairs = [
+    ("gpt-4", "mmlu"),
+    ("claude-3-opus", "hellaswag"),
+    ("llama-2-70b", "arc-challenge"),
+    ("gpt-3.5-turbo", "gsm8k"),
+]
+
+batch_results = batch_evaluations_with_adaptive_timeout(pairs)
+```
+
+## Error Handling and Recovery
+
+### Timeout-Specific Error Handling
+
+```python
+import atlas
+from atlas import Atlas
+import time
+
+def handle_timeout_gracefully(operation_func, *args, **kwargs):
+    """Generic timeout handler for any Atlas operation"""
+    max_retries = 3
+    base_timeout = 60.0
+    
+    for attempt in range(max_retries):
+        # Calculate timeout for this attempt
+        attempt_timeout = base_timeout * (2 ** attempt)  # Exponential increase
+        
+        print(f"🔄 Attempt {attempt + 1}/{max_retries} (timeout: {attempt_timeout/60:.1f}min)")
+        
+        try:
+            result = operation_func(*args, timeout=attempt_timeout, **kwargs)
+            print(f"✅ Operation succeeded on attempt {attempt + 1}")
+            return result
+            
+        except atlas.APITimeoutError:
+            print(f"⏰ Timeout on attempt {attempt + 1}")
+            
+            if attempt == max_retries - 1:
+                print("❌ All retry attempts exhausted")
+                raise
+            else:
+                wait_time = 5 * (attempt + 1)  # Progressive wait
+                print(f"⏳ Waiting {wait_time}s before retry...")
+                time.sleep(wait_time)
+        
+        except atlas.APIError as e:
+            print(f"❌ Non-timeout error: {e}")
+            raise  # Don't retry non-timeout errors
+
+def create_evaluation_with_timeout_handling(model: str, benchmark: str):
+    """Wrapper function for evaluation creation"""
+    def operation_func(timeout, *args, **kwargs):
+        client = Atlas(timeout=timeout)
+        return client.evaluations.create(model=model, benchmark=benchmark)
+    
+    return handle_timeout_gracefully(operation_func)
+
+def get_results_with_timeout_handling(evaluation_id: str):
+    """Wrapper function for results retrieval"""
+    def operation_func(timeout, *args, **kwargs):
+        client = Atlas(timeout=timeout)
+        return client.results.get(evaluation_id=evaluation_id)
+    
+    return handle_timeout_gracefully(operation_func)
+
+# Usage
+try:
+    evaluation = create_evaluation_with_timeout_handling("gpt-4", "mmlu")
+    if evaluation:
+        results = get_results_with_timeout_handling(evaluation.id)
+        print(f"📊 Retrieved {len(results) if results else 0} results")
+        
+except atlas.APITimeoutError:
+    print("❌ Operation failed due to persistent timeouts")
+except atlas.APIError as e:
+    print(f"❌ Operation failed: {e}")
+```
+
+### Circuit Breaker Pattern
+
+```python
+import time
+from enum import Enum
+import atlas
+from atlas import Atlas
+
+class CircuitState(Enum):
+    CLOSED = "closed"      # Normal operation
+    OPEN = "open"          # Failing, don't try
+    HALF_OPEN = "half_open" # Testing if recovered
+
+class TimeoutCircuitBreaker:
+    """Circuit breaker specifically for timeout management"""
+    
+    def __init__(self, 
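+                 failure_threshold: int = 5,        # Timeouts before the circuit opens
+                 timeout_threshold: float = 300.0,  # Upper bound for the adaptive timeout (5 minutes)
+                 recovery_timeout: int = 60):       # Seconds to stay OPEN before retrying (1 minute)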
+        self.failure_threshold = failure_threshold
+        self.timeout_threshold = timeout_threshold
+        self.recovery_timeout = recovery_timeout
+        
+        self.failure_count = 0
+        self.last_failure_time = None
+        self.state = CircuitState.CLOSED
+        self.current_timeout = 120.0  # Start with 2 minutes
+    
+    def call(self, func, *args, **kwargs):
+        """Execute function with circuit breaker protection"""
+        if self.state == CircuitState.OPEN:
+            if (time.time() - self.last_failure_time) < self.recovery_timeout:
+                raise atlas.APIConnectionError(
+                    message="Circuit breaker is OPEN - too many recent timeouts"
+                )
+            else:
+                self.state = CircuitState.HALF_OPEN
+                print("🔄 Circuit breaker transitioning to HALF_OPEN")
+        
+        try:
+            # Use adaptive timeout
+            if 'timeout' not in kwargs:
+                kwargs['timeout'] = self.current_timeout
+            
+            print(f"🔄 Calling function with {kwargs['timeout']/60:.1f}min timeout")
+            result = func(*args, **kwargs)
+            
+            # Success - reset circuit breaker
+            self.on_success()
+            return result
+            
+        except atlas.APITimeoutError:
+            self.on_timeout_failure()
+            raise
+        except atlas.APIError:
+            # Non-timeout API errors don't affect circuit state
+            raise
+    
+    def on_success(self):
+        """Handle successful operation"""
+        print("✅ Circuit breaker: Operation succeeded")
+        self.failure_count = 0
+        self.state = CircuitState.CLOSED
+        
+        # Gradually reduce timeout on success
+        self.current_timeout = max(60.0, self.current_timeout * 0.95)
+    
+    def on_timeout_failure(self):
+        """Handle timeout failure"""
+        self.failure_count += 1
+        self.last_failure_time = time.time()
+        
+        print(f"⏰ Circuit breaker: Timeout failure {self.failure_count}/{self.failure_threshold}")
+        
+        # Increase timeout for next attempt
+        self.current_timeout = min(self.timeout_threshold, self.current_timeout * 1.5)
+        
+        if self.failure_count >= self.failure_threshold:
+            self.state = CircuitState.OPEN
+            print("🔴 Circuit breaker: OPEN - too many consecutive timeouts")
+
+# Usage with circuit breaker
+def protected_atlas_operations():
+    """Example of using circuit breaker with Atlas operations"""
+    breaker = TimeoutCircuitBreaker(
+        failure_threshold=3,
+        timeout_threshold=600.0,  # Max 10 minutes
+        recovery_timeout=120      # 2 minute recovery time
+    )
+    
+    def create_evaluation_protected(model: str, benchmark: str):
+        def operation(timeout):
+            client = Atlas(timeout=timeout)
+            return client.evaluations.create(model=model, benchmark=benchmark)
+        return breaker.call(operation)
+    
+    def get_results_protected(evaluation_id: str):
+        def operation(timeout):
+            client = Atlas(timeout=timeout)
+            return client.results.get(evaluation_id=evaluation_id)
+        return breaker.call(operation)
+    
+    # Test with multiple operations
+    operations = [
+        ("gpt-4", "mmlu"),
+        ("claude-3-opus", "hellaswag"),
+        ("llama-2-70b", "gsm8k"),
+    ]
+    
+    successful_evaluations = []
+    
+    for model, benchmark in operations:
+        try:
+            print(f"\n🔄 Creating evaluation: {model} + {benchmark}")
+            evaluation = create_evaluation_protected(model, benchmark)
+            
+            if evaluation:
+                successful_evaluations.append(evaluation)
+                print(f"✅ Success: {evaluation.id}")
+                
+                # Try to get results
+                print(f"🔄 Getting results for {evaluation.id}")
+                results = get_results_protected(evaluation.id)
+                
+                if results:
+                    print(f"📊 Retrieved {len(results)} results")
+                
+        except atlas.APIConnectionError as e:
+            if "Circuit breaker is OPEN" in str(e):
+                print("🔴 Circuit breaker prevented operation")
+                print(f"⏳ Waiting {breaker.recovery_timeout}s for recovery...")
+                time.sleep(breaker.recovery_timeout)
+            else:
+                print(f"❌ Connection error: {e}")
+                
+        except atlas.APITimeoutError:
+            print("⏰ Timeout occurred - circuit breaker updated")
+            
+        except atlas.APIError as e:
+            print(f"❌ API error: {e}")
+    
+    print("\n📈 Final Results:")
+    print(f"   Circuit state: {breaker.state.value}")
+    print(f"   Current timeout: {breaker.current_timeout/60:.1f} minutes")
+    print(f"   Successful evaluations: {len(successful_evaluations)}")
+    
+    return successful_evaluations
+
+# Run protected operations
+results = protected_atlas_operations()
+```
+
+## Monitoring and Metrics
+
+### Timeout Performance Tracking
+
+```python
+import time
+from dataclasses import dataclass
+from typing import List, Optional
+from atlas import Atlas
+import atlas
+
+@dataclass
+class TimeoutMetrics:
+    operation: str
+    model: str
+    benchmark: str
+    timeout_set: float
+    actual_duration: float
+    success: bool
+    error_type: Optional[str] = None
+    timestamp: Optional[float] = None
+    
+    def __post_init__(self):
+        if self.timestamp is None:
+            self.timestamp = time.time()
+
+class TimeoutMonitor:
+    """Monitor and analyze timeout patterns"""
+    
+    def __init__(self):
+        self.metrics: List[TimeoutMetrics] = []
+    
+    def record_operation(self, operation: str, model: str, benchmark: str, 
+                        timeout_set: float, start_time: float, success: bool, 
+                        error_type: Optional[str] = None):
+        """Record an operation's timeout metrics"""
+        actual_duration = time.time() - start_time
+        
+        metric = TimeoutMetrics(
+            operation=operation,
+            model=model,
+            benchmark=benchmark,
+            timeout_set=timeout_set,
+            actual_duration=actual_duration,
+            success=success,
+            error_type=error_type
+        )
+        
+        self.metrics.append(metric)
+        
+        print(f"📊 Recorded: {operation} took {actual_duration:.1f}s (timeout: {timeout_set:.1f}s)")
+    
+    def get_timeout_efficiency(self) -> dict:
+        """Analyze timeout efficiency"""
+        if not self.metrics:
+            return {}
+        
+        successful_ops = [m for m in self.metrics if m.success]
+        timeout_ops = [m for m in self.metrics if m.error_type == "timeout"]
+        
+        analysis = {
+            "total_operations": len(self.metrics),
+            "successful_operations": len(successful_ops),
+            "timeout_operations": len(timeout_ops),
+            "success_rate": len(successful_ops) / len(self.metrics),
+            "timeout_rate": len(timeout_ops) / len(self.metrics),
+        }
+        
+        if successful_ops:
+            avg_success_duration = sum(m.actual_duration for m in successful_ops) / len(successful_ops)
+            avg_success_timeout = sum(m.timeout_set for m in successful_ops) / len(successful_ops)
+            
+            analysis.update({
+                "avg_success_duration": avg_success_duration,
+                "avg_success_timeout_set": avg_success_timeout,
+                "timeout_efficiency": avg_success_duration / avg_success_timeout if avg_success_timeout > 0 else 0
+            })
+        
+        return analysis
+    
+    def suggest_optimal_timeouts(self) -> dict:
+        """Suggest optimal timeouts based on historical data"""
+        if not self.metrics:
+            return {"message": "No data available"}
+        
+        # Group by operation type
+        by_operation = {}
+        for metric in self.metrics:
+            if metric.success:  # Only use successful operations
+                key = (metric.operation, metric.model, metric.benchmark)
+                if key not in by_operation:
+                    by_operation[key] = []
+                by_operation[key].append(metric.actual_duration)
+        
+        suggestions = {}
+        for (operation, model, benchmark), durations in by_operation.items():
+            # Suggest timeout as 95th percentile + 50% buffer
+            durations.sort()
+            p95_index = int(len(durations) * 0.95)
+            p95_duration = durations[p95_index] if p95_index < len(durations) else durations[-1]
+            suggested_timeout = p95_duration * 1.5  # 50% buffer
+            
+            suggestions[f"{operation}_{model}_{benchmark}"] = {
+                "suggested_timeout": suggested_timeout,
+                "based_on_operations": len(durations),
+                "p95_actual_duration": p95_duration
+            }
+        
+        return suggestions
+
+def monitored_atlas_operations():
+    """Example of Atlas operations with timeout monitoring"""
+    monitor = TimeoutMonitor()
+    client = Atlas()
+    
+    test_operations = [
+        ("gpt-3.5-turbo", "arc-easy", 60.0),    # Should be fast
+        ("gpt-4", "mmlu", 180.0),               # Medium complexity
+        ("claude-3-opus", "math", 600.0),       # Complex, longer timeout
+    ]
+    
+    for model, benchmark, timeout in test_operations:
+        print(f"\n🔄 Testing {model} + {benchmark} (timeout: {timeout/60:.1f}min)")
+        
+        # Evaluation creation
+        start_time = time.time()
+        try:
+            evaluation = client.with_options(timeout=timeout).evaluations.create(
+                model=model,
+                benchmark=benchmark
+            )
+            
+            if evaluation:
+                monitor.record_operation("create_evaluation", model, benchmark, 
+                                       timeout, start_time, True)
+                
+                # Results retrieval
+                start_time = time.time()
+                try:
+                    results = client.with_options(timeout=timeout).results.get(
+                        evaluation_id=evaluation.id
+                    )
+                    
+                    success = results is not None
+                    monitor.record_operation("get_results", model, benchmark,
+                                           timeout, start_time, success,
+                                           None if success else "no_results")
+                    
+                except atlas.APITimeoutError:
+                    monitor.record_operation("get_results", model, benchmark,
+                                           timeout, start_time, False, "timeout")
+                except atlas.APIError as e:
+                    monitor.record_operation("get_results", model, benchmark,
+                                           timeout, start_time, False, str(e))
+            else:
+                monitor.record_operation("create_evaluation", model, benchmark,
+                                       timeout, start_time, False, "no_evaluation")
+                
+        except atlas.APITimeoutError:
+            monitor.record_operation("create_evaluation", model, benchmark,
+                                   timeout, start_time, False, "timeout")
+        except atlas.APIError as e:
+            monitor.record_operation("create_evaluation", model, benchmark,
+                                   timeout, start_time, False, str(e))
+    
+    # Analyze results
+    print("\n📊 Timeout Analysis:")
+    efficiency = monitor.get_timeout_efficiency()
+    
+    for key, value in efficiency.items():
+        if isinstance(value, float):
+            print(f"   {key}: {value:.2f}")
+        else:
+            print(f"   {key}: {value}")
+    
+    print("\n💡 Timeout Suggestions:")
+    suggestions = monitor.suggest_optimal_timeouts()
+    for operation, suggestion in suggestions.items():
+        print(f"   {operation}:")
+        print(f"     Suggested timeout: {suggestion['suggested_timeout']:.0f}s")
+        print(f"     Based on {suggestion['based_on_operations']} successful operations")
+        print(f"     95th percentile duration: {suggestion['p95_actual_duration']:.1f}s")
+
+# Run monitoring example
+monitored_atlas_operations()
+```
diff --git a/docs/getting-started/authentication.md b/docs/getting-started/authentication.md
new file mode 100644
index 0000000..5658bb4
--- /dev/null
+++ b/docs/getting-started/authentication.md
@@ -0,0 +1,169 @@
+# Authentication & Configuration
+
+The Atlas Python SDK uses API key authentication to securely access the LayerLens Atlas API. This guide covers how to set up authentication and configure your client.
+
+## Required Credentials
+
+You need three pieces of information to use the Atlas SDK:
+
+1. **API Key** - Your secret API key for authentication
+2. **Organization ID** - Your organization identifier
+3. **Project ID** - The project you want to work with
+
+## Getting Your Credentials
+
+1. **Log in to LayerLens Atlas**: Visit the LayerLens Atlas dashboard
+2. **Navigate to Settings**: Go to your account or organization settings
+3. **Generate API Key**: Create a new API key if you don't have one
+4. **Copy IDs**: Note your Organization ID and Project ID from the dashboard
+
+## Environment Variables (Recommended)
+
+Environment variables are the recommended way to configure authentication because they keep credentials out of your source code:
+
+### Setting Environment Variables
+
+**Linux/macOS:**
+```bash
+export LAYERLENS_ATLAS_API_KEY="your_api_key_here"
+export LAYERLENS_ATLAS_ORG_ID="your_org_id_here"
+export LAYERLENS_ATLAS_PROJECT_ID="your_project_id_here"
+```
+
+**Windows (Command Prompt):**
+```cmd
+set LAYERLENS_ATLAS_API_KEY=your_api_key_here
+set LAYERLENS_ATLAS_ORG_ID=your_org_id_here
+set LAYERLENS_ATLAS_PROJECT_ID=your_project_id_here
+```
+
+**Windows (PowerShell):**
+```powershell
+$env:LAYERLENS_ATLAS_API_KEY="your_api_key_here"
+$env:LAYERLENS_ATLAS_ORG_ID="your_org_id_here"
+$env:LAYERLENS_ATLAS_PROJECT_ID="your_project_id_here"
+```
+
+### Using a `.env` File
+
+Create a `.env` file in your project root:
+
+```bash
+LAYERLENS_ATLAS_API_KEY=your_api_key_here
+LAYERLENS_ATLAS_ORG_ID=your_org_id_here
+LAYERLENS_ATLAS_PROJECT_ID=your_project_id_here
+```
+
+Then load it in your Python code (this requires the `python-dotenv` package, installed with `pip install python-dotenv`):
+
+```python
+from dotenv import load_dotenv
+import os
+from atlas import Atlas
+
+# Load environment variables from .env file
+load_dotenv()
+
+# Client will automatically use environment variables
+client = Atlas()
+```
+
+> **⚠️ Security Note**: Never commit `.env` files to version control. Add `.env` to your `.gitignore` file.
+
+## Client Configuration
+
+### Automatic Configuration
+
+When environment variables are set, the client configures itself automatically:
+
+```python
+from atlas import Atlas
+
+# Uses environment variables automatically
+client = Atlas()
+```
+
+### Explicit Configuration
+
+You can also pass credentials directly to the client:
+
+```python
+from atlas import Atlas
+
+client = Atlas(
+    api_key="your_api_key_here",
+    organization_id="your_org_id_here",
+    project_id="your_project_id_here"
+)
+```
+
+### Mixed Configuration
+
+You can mix environment variables with explicit parameters:
+
+```python
+import os
+from atlas import Atlas
+
+client = Atlas(
+    api_key=os.environ.get("LAYERLENS_ATLAS_API_KEY"),
+    organization_id="override_org_id",  # Override from environment
+    project_id=os.environ.get("LAYERLENS_ATLAS_PROJECT_ID")
+)
+```
+
+## Advanced Configuration
+
+### Timeout Configuration
+
+Configure request timeouts:
+
+```python
+from atlas import Atlas
+import httpx
+
+# Simple timeout (10 seconds)
+client = Atlas(timeout=10.0)
+
+# Advanced timeout configuration
+client = Atlas(
+    timeout=httpx.Timeout(
+        connect=5.0,    # Connection timeout
+        read=30.0,      # Read timeout
+        write=10.0,     # Write timeout
+        pool=2.0        # Pool timeout
+    )
+)
+```
+
+## Validation
+
+The SDK will validate your configuration on first use:
+
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas()
+    # Test the connection
+    evaluation = client.evaluations.create(model="test", benchmark="test")
+except atlas.AuthenticationError:
+    print("Invalid API key or authentication failed")
+except atlas.PermissionDeniedError:
+    print("Valid API key but insufficient permissions")
+except atlas.AtlasError as e:
+    print(f"Configuration error: {e}")
+```
+
+## Security Best Practices
+
+1. **Never hardcode credentials** in your source code
+2. **Use environment variables** or secure credential management systems
+3. **Rotate API keys regularly** for enhanced security
+4. **Use different API keys** for different environments (dev, staging, prod)
+5. **Monitor API key usage** in the LayerLens dashboard
+6. **Revoke unused keys** immediately
+
+## Next Steps
+
+Once authentication is configured, proceed to the [Quick Start Guide](quickstart.md) to make your first API call.
\ No newline at end of file
diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md
new file mode 100644
index 0000000..37e75e5
--- /dev/null
+++ b/docs/getting-started/installation.md
@@ -0,0 +1,55 @@
+# Installation
+
+The Atlas Python SDK supports Python 3.8 and above. You can install it using pip or your preferred Python package manager.
+
+## Install from PyPI
+
+```bash
+pip install atlas
+```
+
+## Verify Installation
+
+After installation, verify that the SDK is working correctly:
+
+```python
+import atlas
+print(atlas.__version__)
+```
+
+This should print the version number of the installed SDK.
+
+## System Requirements
+
+- **Python**: 3.8 or higher
+- **Operating Systems**: Windows, macOS, Linux
+- **Dependencies**: The SDK automatically installs required dependencies:
+  - `httpx` - HTTP client library
+  - `pydantic` - Data validation and serialization
+  - `typing-extensions` - Enhanced type hints for older Python versions
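+
+The requirements above can be checked from a script before the SDK is used. This is a minimal sketch, not part of the SDK: `check_requirements` is a hypothetical helper, and the module names are the import names assumed for the dependencies listed above.
+
+```python
+import importlib.util
+import sys
+
+
+def check_requirements(modules=("httpx", "pydantic", "typing_extensions")):
+    """Return a list of human-readable problems; empty when requirements are met."""
+    problems = []
+    if sys.version_info < (3, 8):
+        problems.append(f"Python 3.8+ required, found {sys.version.split()[0]}")
+    for name in modules:
+        # find_spec returns None when a top-level module cannot be located
+        if importlib.util.find_spec(name) is None:
+            problems.append(f"Missing dependency: {name}")
+    return problems
+
+
+for problem in check_requirements():
+    print(problem)
+```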
+
+## Virtual Environment (Recommended)
+
+We strongly recommend using a virtual environment to avoid dependency conflicts:
+
+```bash
+# Create virtual environment
+python -m venv atlas-env
+
+# Activate it (Linux/macOS)
+source atlas-env/bin/activate
+
+# Activate it (Windows)
+atlas-env\Scripts\activate
+
+# Install the SDK
+pip install atlas
+```
+
+## Upgrading
+
+To upgrade to the latest version:
+
+```bash
+pip install --upgrade atlas
+```
diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md
new file mode 100644
index 0000000..e47aaf6
--- /dev/null
+++ b/docs/getting-started/quickstart.md
@@ -0,0 +1,197 @@
+# Quick Start Guide
+
+This guide will help you make your first API call with the Atlas Python SDK. We'll walk through creating an evaluation and retrieving results.
+
+## Prerequisites
+
+Before you begin, ensure you have:
+
+1. ✅ [Installed the Atlas SDK](installation.md)
+2. ✅ [Configured authentication](authentication.md) with your API key, organization ID, and project ID
+3. ✅ Access to the LayerLens Atlas platform
+
+## Your First Evaluation
+
+Let's create a simple evaluation to test your setup:
+
+```python
+import os
+from atlas import Atlas
+
+# Initialize the client (uses environment variables)
+client = Atlas(
+    api_key=os.environ.get("LAYERLENS_ATLAS_API_KEY"),
+    organization_id=os.environ.get("LAYERLENS_ATLAS_ORG_ID"),
+    project_id=os.environ.get("LAYERLENS_ATLAS_PROJECT_ID")
+)
+
+# Create an evaluation
+evaluation = client.evaluations.create(
+    model="gpt-3.5-turbo",  # Replace with your model ID
+    benchmark="mmlu"        # Replace with your benchmark ID
+)
+
+if evaluation:
+    print(f"✅ Evaluation created successfully!")
+    print(f"   ID: {evaluation.id}")
+    print(f"   Status: {evaluation.status}")
+    print(f"   Model: {evaluation.model_name}")
+    print(f"   Benchmark: {evaluation.dataset_name}")
+else:
+    print("❌ Failed to create evaluation")
+```
+
+## Understanding the Response
+
+A successful evaluation creation returns an `Evaluation` object with the following key properties:
+
+```python
+evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+
+print(f"Evaluation ID: {evaluation.id}")
+print(f"Status: {evaluation.status}")
+print(f"Status Description: {evaluation.status_description}")
+print(f"Submitted At: {evaluation.submitted_at}")
+print(f"Model: {evaluation.model_name} ({evaluation.model_company})")
+print(f"Dataset: {evaluation.dataset_name}")
+
+# Available when evaluation is completed
+if evaluation.status == "completed":
+    print(f"Accuracy: {evaluation.accuracy}")
+    print(f"Readability Score: {evaluation.readability_score}")
+    print(f"Toxicity Score: {evaluation.toxicity_score}")
+    print(f"Ethics Score: {evaluation.ethics_score}")
+```
+
+## Retrieving Results
+
+Once your evaluation is complete, you can retrieve detailed results:
+
+```python
+# Wait for evaluation to complete, then get results
+if evaluation and evaluation.status == "completed":
+    results = client.results.get(evaluation_id=evaluation.id)
+    
+    if results:
+        print(f"📊 Retrieved {len(results)} results")
+        
+        # Examine the first result
+        first_result = results[0]
+        print(f"\nFirst Result:")
+        print(f"  Subset: {first_result.subset}")
+        print(f"  Prompt: {first_result.prompt[:100]}...")  # First 100 chars
+        print(f"  Model Output: {first_result.result[:100]}...")
+        print(f"  Expected Answer: {first_result.truth}")
+        print(f"  Score: {first_result.score}")
+        print(f"  Duration: {first_result.duration}")
+        print(f"  Metrics: {first_result.metrics}")
+```
+
+## Complete Example
+
+Here's a complete example that creates an evaluation and waits for results:
+
+```python
+import os
+import time
+from atlas import Atlas
+
+def main():
+    # Initialize client
+    client = Atlas()
+    
+    print("🚀 Creating evaluation...")
+    
+    try:
+        # Create evaluation
+        evaluation = client.evaluations.create(
+            model="gpt-3.5-turbo",
+            benchmark="mmlu"
+        )
+        
+        if not evaluation:
+            print("❌ Failed to create evaluation")
+            return
+            
+        print(f"✅ Evaluation created: {evaluation.id}")
+        print(f"   Status: {evaluation.status}")
+        
+        # Poll for completion (in a real app, use webhooks instead)
+        print("\n⏳ Waiting for evaluation to complete...")
+        
+        while evaluation.status not in ["completed", "failed", "cancelled"]:
+            time.sleep(30)  # Wait 30 seconds
+            
+            # NOTE: `evaluation` is never refreshed in this simplified example, so
+            # the loop cannot exit on its own. Re-fetch the evaluation here.
+            print(f"   Status: {evaluation.status}")
+        
+        if evaluation.status == "completed":
+            print(f"🎉 Evaluation completed!")
+            print(f"   Accuracy: {evaluation.accuracy:.2%}")
+            
+            # Get detailed results
+            results = client.results.get(evaluation_id=evaluation.id)
+            print(f"📊 Retrieved {len(results) if results else 0} detailed results")
+            
+        else:
+            print(f"❌ Evaluation failed with status: {evaluation.status}")
+            
+    except Exception as e:
+        print(f"❌ Error: {e}")
+
+if __name__ == "__main__":
+    main()
+```
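+
+A polling loop is easier to reason about when the status lookup and the time limit are explicit. The sketch below is one possible shape, not an SDK feature: `wait_for_terminal_status` is a hypothetical helper, and `fetch_status` stands in for whatever call your SDK version provides to re-fetch an evaluation and return its status. The terminal statuses are the ones used in the example above.
+
+```python
+import time
+from typing import Callable
+
+TERMINAL_STATUSES = {"completed", "failed", "cancelled"}
+
+
+def wait_for_terminal_status(
+    fetch_status: Callable[[], str],
+    interval: float = 30.0,
+    timeout: float = 3600.0,
+) -> str:
+    """Poll `fetch_status` until it returns a terminal status or `timeout` elapses.
+
+    `fetch_status` is a caller-supplied function (hypothetical here) that
+    re-fetches the evaluation and returns its current status string.
+    """
+    deadline = time.monotonic() + timeout
+    while True:
+        status = fetch_status()
+        if status in TERMINAL_STATUSES:
+            return status
+        # Give up if the next sleep would run past the deadline
+        if time.monotonic() + interval > deadline:
+            raise TimeoutError(f"Evaluation still '{status}' after {timeout:.0f}s")
+        time.sleep(interval)
+```
+
+Passing the lookup in as a function keeps the helper independent of any particular client method, and the deadline guarantees the loop ends even if the evaluation never reaches a terminal status.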
+
+## Error Handling
+
+Always wrap your API calls in `try`/`except` blocks:
+
+```python
+import atlas
+from atlas import Atlas
+
+client = Atlas()
+
+try:
+    evaluation = client.evaluations.create(
+        model="gpt-4",
+        benchmark="mmlu"
+    )
+except atlas.AuthenticationError:
+    print("❌ Authentication failed. Check your API key.")
+except atlas.PermissionDeniedError:
+    print("❌ Permission denied. Check your organization/project access.")
+except atlas.RateLimitError:
+    print("❌ Rate limit exceeded. Please wait and try again.")
+except atlas.APIConnectionError as e:
+    print(f"❌ Connection error: {e}")
+except atlas.APIStatusError as e:
+    print(f"❌ API error: {e.status_code} - {e}")
+except Exception as e:
+    print(f"❌ Unexpected error: {e}")
+```
+
+## Available Models and Benchmarks
+
+To see available models and benchmarks, you can:
+
+1. **Check the LayerLens Atlas dashboard** for the most up-to-date list
+2. **Contact support** for specific model or benchmark IDs
+
+## What's Next?
+
+Now that you've successfully made your first API call:
+
+1. **[Explore the API Reference](../api-reference/)** - Learn about all available methods
+2. **[Check out Code Examples](../examples/)** - See practical usage patterns
+3. **[Review Error Handling](../api-reference/errors.md)** - Handle edge cases gracefully
+4. **[Security Best Practices](../security/)** - Secure your API usage
+
+## Need Help?
+
+- **Documentation**: Browse the complete [API Reference](../api-reference/)
+- **Examples**: Check out more [Code Examples](../examples/)
+- **Support**: Contact LayerLens support through your dashboard for technical assistance
+- **Status**: Check [status.layerlens.com](https://status.layerlens.com) for service updates
\ No newline at end of file
diff --git a/docs/security/api-key-management.md b/docs/security/api-key-management.md
new file mode 100644
index 0000000..ca89c19
--- /dev/null
+++ b/docs/security/api-key-management.md
@@ -0,0 +1,275 @@
+# API Key Management
+
+This guide covers best practices for securely managing your Atlas API keys throughout the development lifecycle.
+
+## API Key Security Fundamentals
+
+### What Makes API Keys Sensitive
+
+API keys are sensitive credentials that provide access to your Atlas organization and projects. They should be treated with the same level of security as passwords or other authentication tokens.
+
+**Risks of compromised API keys**:
+- Unauthorized access to your evaluations and data
+- Unintended usage charges on your account
+- Potential data breaches or intellectual property theft
+- Abuse of your API quotas and rate limits
+
+### API Key Best Practices
+
+1. **Never hardcode API keys in source code**
+2. **Use environment variables or secure credential stores**
+3. **Rotate keys regularly**
+4. **Use different keys for different environments**
+5. **Monitor key usage and access patterns**
+6. **Revoke unused or compromised keys immediately**
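+
+As a minimal sketch of practices 1 and 2, the hypothetical helper below (`require_env` is an illustrative name, not part of the Atlas SDK) reads a credential from the environment and fails fast with a clear message when it is missing, so a misconfigured deployment stops before any request is made:
+
+```python
+import os
+
+
+def require_env(name: str) -> str:
+    """Return the value of an environment variable, or raise if it is unset or empty.
+
+    Hypothetical helper for illustration; not part of the Atlas SDK.
+    """
+    value = os.environ.get(name, "").strip()
+    if not value:
+        # Report only the variable name, never a credential value
+        raise RuntimeError(f"Missing required environment variable: {name}")
+    return value
+```
+
+The returned values could then be passed to the `Atlas` constructor through its `api_key`, `organization_id`, and `project_id` parameters.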
+
+## Secure API Key Storage
+
+### Environment Variables (Recommended)
+
+**✅ Good - Using environment variables**:
+```python
+import os
+from atlas import Atlas
+
+# Secure: Load from environment variables
+client = Atlas(
+    api_key=os.getenv('LAYERLENS_ATLAS_API_KEY'),
+    organization_id=os.getenv('LAYERLENS_ATLAS_ORG_ID'),
+    project_id=os.getenv('LAYERLENS_ATLAS_PROJECT_ID')
+)
+```
+
+### Setting Environment Variables Securely
+
+**Linux/macOS**:
+```bash
+# Add to your shell profile (.bashrc, .zshrc, etc.)
+export LAYERLENS_ATLAS_API_KEY="sk-your-key-here"
+export LAYERLENS_ATLAS_ORG_ID="org-your-org-here" 
+export LAYERLENS_ATLAS_PROJECT_ID="proj-your-project-here"
+
+# Reload your shell configuration
+source ~/.bashrc  # or ~/.zshrc
+```
+
+**Windows**:
+```cmd
+REM Command Prompt (persistent; takes effect in new sessions)
+setx LAYERLENS_ATLAS_API_KEY "sk-your-key-here"
+setx LAYERLENS_ATLAS_ORG_ID "org-your-org-here"
+setx LAYERLENS_ATLAS_PROJECT_ID "proj-your-project-here"
+
+# PowerShell (session-only)
+$env:LAYERLENS_ATLAS_API_KEY="sk-your-key-here"
+$env:LAYERLENS_ATLAS_ORG_ID="org-your-org-here"
+$env:LAYERLENS_ATLAS_PROJECT_ID="proj-your-project-here"
+```
+
+### Using .env Files
+
+**Create a .env file** (never commit this to version control):
+```bash
+# .env
+LAYERLENS_ATLAS_API_KEY=sk-your-key-here
+LAYERLENS_ATLAS_ORG_ID=org-your-org-here
+LAYERLENS_ATLAS_PROJECT_ID=proj-your-project-here
+```
+
+**Load .env file in Python**:
+```python
+from dotenv import load_dotenv
+import os
+
+# Load environment variables from .env file
+load_dotenv()
+
+from atlas import Atlas
+
+# Now environment variables are available
+client = Atlas()
+```
+
+**Important**: Add `.env` to your `.gitignore` file:
+```bash
+# .gitignore
+.env
+.env.local
+.env.*.local
+*.env
+```
+
+### Advanced Credential Management
+
+#### Using External Secret Managers
+
+**AWS Secrets Manager**:
+```python
+import boto3
+import json
+from atlas import Atlas
+
+def get_atlas_credentials_from_aws():
+    """Retrieve Atlas credentials from AWS Secrets Manager"""
+    session = boto3.session.Session()
+    client = session.client('secretsmanager', region_name='us-east-1')
+    
+    try:
+        response = client.get_secret_value(SecretId='layerlens/atlas/credentials')
+        secrets = json.loads(response['SecretString'])
+        
+        return {
+            'api_key': secrets['api_key'],
+            'organization_id': secrets['organization_id'],
+            'project_id': secrets['project_id']
+        }
+    except Exception as e:
+        print(f"Error retrieving secrets: {e}")
+        return None
+
+# Usage
+credentials = get_atlas_credentials_from_aws()
+if credentials:
+    client = Atlas(**credentials)
+```
+
+## Environment-Specific Key Management
+
+### Separating Development and Production Keys
+
+**Use different API keys for different environments**:
+
+```python
+import os
+from atlas import Atlas
+
+def get_atlas_client():
+    """Get Atlas client based on environment"""
+    environment = os.getenv('ATLAS_ENV', 'development')
+    
+    if environment == 'development':
+        return Atlas(
+            api_key=os.getenv('DEV_ATLAS_API_KEY'),
+            organization_id=os.getenv('DEV_ATLAS_ORG_ID'),
+            project_id=os.getenv('DEV_ATLAS_PROJECT_ID'),
+            base_url=os.getenv('DEV_ATLAS_BASE_URL')  # Dev server if applicable
+        )
+    elif environment == 'staging':
+        return Atlas(
+            api_key=os.getenv('STAGING_ATLAS_API_KEY'),
+            organization_id=os.getenv('STAGING_ATLAS_ORG_ID'),
+            project_id=os.getenv('STAGING_ATLAS_PROJECT_ID')
+        )
+    elif environment == 'production':
+        return Atlas(
+            api_key=os.getenv('PROD_ATLAS_API_KEY'),
+            organization_id=os.getenv('PROD_ATLAS_ORG_ID'),
+            project_id=os.getenv('PROD_ATLAS_PROJECT_ID')
+        )
+    else:
+        raise ValueError(f"Unknown environment: {environment}")
+
+# Usage
+client = get_atlas_client()
+```
+
+**Environment-specific .env files**:
+```bash
+# .env.development
+DEV_ATLAS_API_KEY=sk-dev-key-here
+DEV_ATLAS_ORG_ID=dev-org-id
+DEV_ATLAS_PROJECT_ID=dev-project-id
+DEV_ATLAS_BASE_URL=https://dev-api.layerlens.com
+
+# .env.production
+PROD_ATLAS_API_KEY=sk-prod-key-here
+PROD_ATLAS_ORG_ID=prod-org-id
+PROD_ATLAS_PROJECT_ID=prod-project-id
+```
+
+### Container and Deployment Security
+
+**Docker Secrets**:
+```yaml
+# docker-compose.yml
+version: '3.8'
+
+services:
+  atlas-app:
+    image: your-app:latest
+    secrets:
+      - atlas_api_key
+      - atlas_org_id
+      - atlas_project_id
+    environment:
+      - LAYERLENS_ATLAS_API_KEY_FILE=/run/secrets/atlas_api_key
+      - LAYERLENS_ATLAS_ORG_ID_FILE=/run/secrets/atlas_org_id
+      - LAYERLENS_ATLAS_PROJECT_ID_FILE=/run/secrets/atlas_project_id
+
+secrets:
+  atlas_api_key:
+    file: ./secrets/atlas_api_key.txt
+  atlas_org_id:
+    file: ./secrets/atlas_org_id.txt
+  atlas_project_id:
+    file: ./secrets/atlas_project_id.txt
+```
+
+**Reading Docker secrets in Python**:
+```python
+import os
+from atlas import Atlas
+
+def read_docker_secret(secret_name):
+    """Read secret from Docker secrets file"""
+    secret_file = f"/run/secrets/{secret_name}"
+    try:
+        with open(secret_file, 'r') as f:
+            return f.read().strip()
+    except FileNotFoundError:
+        return None
+
+def get_atlas_client_from_docker_secrets():
+    """Initialize Atlas client using Docker secrets"""
+    # Try Docker secrets first, fall back to environment variables
+    api_key = (read_docker_secret('atlas_api_key') or 
+               os.getenv('LAYERLENS_ATLAS_API_KEY'))
+    
+    org_id = (read_docker_secret('atlas_org_id') or 
+              os.getenv('LAYERLENS_ATLAS_ORG_ID'))
+    
+    project_id = (read_docker_secret('atlas_project_id') or 
+                  os.getenv('LAYERLENS_ATLAS_PROJECT_ID'))
+    
+    if not all([api_key, org_id, project_id]):
+        raise ValueError("Missing required Atlas credentials")
+    
+    return Atlas(
+        api_key=api_key,
+        organization_id=org_id,
+        project_id=project_id
+    )
+
+# Usage
+client = get_atlas_client_from_docker_secrets()
+```
+
+
+## Security Checklist
+
+### Development Security Checklist
+
+- [ ] ✅ API keys stored in environment variables, not hardcoded
+- [ ] ✅ `.env` files added to `.gitignore`
+- [ ] ✅ Different API keys for development, staging, and production
+- [ ] ✅ API key validation implemented before deployment
+- [ ] ✅ Error handling doesn't expose API keys in logs
+- [ ] ✅ Code review process includes credential security checks
+
+### Production Security Checklist
+
+- [ ] ✅ API keys stored in secure credential management system
+- [ ] ✅ Key rotation schedule established and automated
+- [ ] ✅ API usage monitoring and alerting configured
+- [ ] ✅ Audit logging enabled for all API operations
+- [ ] ✅ Network security controls (firewalls, VPNs) in place
+- [ ] ✅ Least privilege access principles applied
+- [ ] ✅ Incident response plan includes credential compromise scenarios
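+
+For the checklist item about keeping API keys out of logs, one option is a `logging.Filter` that redacts key-like strings before a record is emitted. This is a sketch under stated assumptions: the `sk-` prefix is taken from the placeholder keys in this guide and may not match the real key format, and the logger name `atlas_app` is hypothetical.
+
+```python
+import logging
+import re
+
+# Assumed key shape: the "sk-" prefix follows the placeholder keys in this guide;
+# adjust the pattern to match the real key format.
+_KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_\-]+")
+
+
+class RedactKeysFilter(logging.Filter):
+    """Replace anything that looks like an API key before a record is emitted."""
+
+    def filter(self, record: logging.LogRecord) -> bool:
+        # Render the final message first so keys passed as arguments are caught too
+        message = record.getMessage()
+        record.msg = _KEY_PATTERN.sub("sk-***REDACTED***", message)
+        record.args = ()
+        return True  # keep the record; only its text is changed
+
+
+logger = logging.getLogger("atlas_app")  # hypothetical logger name
+logger.addFilter(RedactKeysFilter())
+```
+
+A filter attached to a logger only sees records created through that logger, so attaching it to the handlers instead may be preferable when several loggers share the same output.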
diff --git a/docs/security/data-privacy.md b/docs/security/data-privacy.md
new file mode 100644
index 0000000..cf2953b
--- /dev/null
+++ b/docs/security/data-privacy.md
@@ -0,0 +1,81 @@
+# Data Privacy
+
+This guide covers data privacy considerations and best practices when using the Atlas Python SDK to ensure compliance with privacy regulations and protect sensitive information.
+
+## Overview
+
+When using the Atlas Python SDK, you may be handling sensitive data including:
+
+- **AI model outputs** and evaluation results
+- **Prompt data** used in evaluations
+- **API credentials** and authentication tokens
+- **Organizational information** and project data
+- **Usage patterns** and performance metrics
+
+Proper data privacy practices are essential for regulatory compliance and maintaining user trust.
+
+## Data Classification
+
+### Understanding Your Data Types
+
+**Public Data** ✅ (No privacy concerns):
+- Model names and identifiers
+- Benchmark names and types
+- General evaluation statistics
+- Documentation and configuration
+
+**Internal Data** ⚠️ (Moderate privacy):
+- Evaluation results and scores
+- Performance metrics
+- Usage analytics
+- System logs (without sensitive content)
+
+**Confidential Data** 🔒 (High privacy):
+- API keys and credentials
+- Custom prompts and datasets
+- Proprietary model outputs
+- Personally identifiable information (PII)
+
+**Restricted Data** 🚫 (Maximum privacy):
+- Personal data under GDPR/CCPA
+- Financial or healthcare information
+- Trade secrets and intellectual property
+- Customer data requiring special handling
+
+### Data Classification Example
+
+```python
+from enum import Enum
+from dataclasses import dataclass
+from typing import Optional, List
+
+class DataClassification(Enum):
+    PUBLIC = "public"
+    INTERNAL = "internal"
+    CONFIDENTIAL = "confidential"
+    RESTRICTED = "restricted"
+
+@dataclass
+class EvaluationDataMap:
+    """Map Atlas data types to privacy classifications"""
+    
+    model_name: DataClassification = DataClassification.PUBLIC
+    benchmark_name: DataClassification = DataClassification.PUBLIC
+    evaluation_scores: DataClassification = DataClassification.INTERNAL
+    model_outputs: DataClassification = DataClassification.CONFIDENTIAL
+    api_credentials: DataClassification = DataClassification.CONFIDENTIAL
+    custom_prompts: DataClassification = DataClassification.CONFIDENTIAL
+
+def classify_atlas_data():
+    """Example data classification for Atlas SDK usage"""
+    data_map = EvaluationDataMap()
+    
+    print("🔍 Atlas Data Classification:")
+    for field_name, field_value in data_map.__dict__.items():
+        privacy_level = field_value.value
+        print(f"   {field_name}: {privacy_level.upper()}")
+    
+    return data_map
+
+classify_atlas_data()
+```
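+
+One way such a classification might be applied is to gate what gets written to logs. The sketch below repeats the enum so that it runs on its own; `may_log` and the default threshold of `INTERNAL` are illustrative assumptions, not an Atlas SDK feature or a compliance rule.
+
+```python
+from enum import Enum
+
+
+class DataClassification(Enum):
+    PUBLIC = "public"
+    INTERNAL = "internal"
+    CONFIDENTIAL = "confidential"
+    RESTRICTED = "restricted"
+
+
+# Levels ordered from least to most sensitive
+_ORDER = [
+    DataClassification.PUBLIC,
+    DataClassification.INTERNAL,
+    DataClassification.CONFIDENTIAL,
+    DataClassification.RESTRICTED,
+]
+
+
+def may_log(level: DataClassification,
+            max_level: DataClassification = DataClassification.INTERNAL) -> bool:
+    """Return True when data at `level` is no more sensitive than `max_level`."""
+    return _ORDER.index(level) <= _ORDER.index(max_level)
+```
+
+With the default threshold, model names and evaluation scores would pass the check, while model outputs, custom prompts, and credentials would not.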
diff --git a/docs/security/environment-variables.md b/docs/security/environment-variables.md
new file mode 100644
index 0000000..437864e
--- /dev/null
+++ b/docs/security/environment-variables.md
@@ -0,0 +1,223 @@
+# Environment Variables
+
+This guide covers secure practices for managing environment variables when using the Atlas Python SDK.
+
+## Overview
+
+Environment variables provide a secure way to configure your Atlas SDK without hardcoding sensitive credentials in your source code. This approach separates configuration from code and enables different configurations for different environments.
+
+## Required Environment Variables
+
+The Atlas SDK uses these primary environment variables:
+
+| Variable | Description | Required | Example |
+|----------|-------------|----------|---------|
+| `LAYERLENS_ATLAS_API_KEY` | Your Atlas API key | Yes | `sk-abc123...` |
+| `LAYERLENS_ATLAS_ORG_ID` | Organization identifier | Yes | `org-abc123` |
+| `LAYERLENS_ATLAS_PROJECT_ID` | Project identifier | Yes | `proj-xyz789` |
+
+## Setting Environment Variables
+
+### Development Environment
+
+**Linux/macOS (Bash/Zsh)**:
+```bash
+# Set for current session
+export LAYERLENS_ATLAS_API_KEY="sk-your-key-here"
+export LAYERLENS_ATLAS_ORG_ID="org-your-org-here"
+export LAYERLENS_ATLAS_PROJECT_ID="proj-your-project-here"
+
+# Add to shell profile for persistence (.bashrc, .zshrc, etc.)
+echo 'export LAYERLENS_ATLAS_API_KEY="sk-your-key-here"' >> ~/.bashrc
+echo 'export LAYERLENS_ATLAS_ORG_ID="org-your-org-here"' >> ~/.bashrc
+echo 'export LAYERLENS_ATLAS_PROJECT_ID="proj-your-project-here"' >> ~/.bashrc
+
+# Reload shell configuration
+source ~/.bashrc
+```
+
+**Windows Command Prompt**:
+```cmd
+REM Set for current session
+set LAYERLENS_ATLAS_API_KEY=sk-your-key-here
+set LAYERLENS_ATLAS_ORG_ID=org-your-org-here
+set LAYERLENS_ATLAS_PROJECT_ID=proj-your-project-here
+
+REM Set persistently for the current user (takes effect in new sessions)
+setx LAYERLENS_ATLAS_API_KEY "sk-your-key-here"
+setx LAYERLENS_ATLAS_ORG_ID "org-your-org-here"
+setx LAYERLENS_ATLAS_PROJECT_ID "proj-your-project-here"
+```
+
+**Windows PowerShell**:
+```powershell
+# Set for current session
+$env:LAYERLENS_ATLAS_API_KEY="sk-your-key-here"
+$env:LAYERLENS_ATLAS_ORG_ID="org-your-org-here"
+$env:LAYERLENS_ATLAS_PROJECT_ID="proj-your-project-here"
+
+# Set permanently for current user
+[Environment]::SetEnvironmentVariable("LAYERLENS_ATLAS_API_KEY", "sk-your-key-here", "User")
+[Environment]::SetEnvironmentVariable("LAYERLENS_ATLAS_ORG_ID", "org-your-org-here", "User")
+[Environment]::SetEnvironmentVariable("LAYERLENS_ATLAS_PROJECT_ID", "proj-your-project-here", "User")
+```
+
+### Verification
+
+**Check if variables are set correctly**:
+```python
+import os
+
+def verify_atlas_environment():
+    """Verify Atlas environment variables are configured"""
+    required_vars = {
+        'LAYERLENS_ATLAS_API_KEY': 'API Key',
+        'LAYERLENS_ATLAS_ORG_ID': 'Organization ID',
+        'LAYERLENS_ATLAS_PROJECT_ID': 'Project ID'
+    }
+    
+    print("🔍 Atlas Environment Variable Check")
+    print("=" * 40)
+    
+    all_set = True
+    for var_name, description in required_vars.items():
+        value = os.getenv(var_name)
+        
+        if value:
+            # Don't print the full value for security
+            masked_value = f"{value[:8]}..." if len(value) > 8 else "***"
+            print(f"✅ {description}: {masked_value}")
+        else:
+            print(f"❌ {description}: Not set")
+            all_set = False
+    
+    
+    if all_set:
+        print(f"\n🎉 All required variables are set!")
+    else:
+        print(f"\n⚠️ Some required variables are missing")
+    
+    return all_set
+
+# Run verification
+verify_atlas_environment()
+```
+
+## Using .env Files
+
+### Creating .env Files
+
+**.env file for development**:
+```bash
+# .env
+LAYERLENS_ATLAS_API_KEY=sk-development-key-here
+LAYERLENS_ATLAS_ORG_ID=org-dev-12345
+LAYERLENS_ATLAS_PROJECT_ID=proj-dev-67890
+
+# Optional: Set environment name
+ATLAS_ENV=development
+```
+
+**Loading .env files in Python**:
+```python
+from dotenv import load_dotenv
+import os
+
+# Load .env file from current directory
+load_dotenv()
+
+# Or load a specific .env file instead
+# load_dotenv('.env.development')
+
+# Or load from a specific path instead
+# load_dotenv('/path/to/your/.env')
+
+# Verify variables are loaded
+from atlas import Atlas
+
+try:
+    client = Atlas()  # Will use environment variables
+    print("✅ Atlas client initialized successfully")
+except Exception as e:
+    print(f"❌ Failed to initialize client: {e}")
+```
+
+### Environment-Specific .env Files
+
+**Create separate files for each environment**:
+
+**.env.development**:
+```bash
+LAYERLENS_ATLAS_API_KEY=sk-dev-key-here
+LAYERLENS_ATLAS_ORG_ID=org-dev-12345
+LAYERLENS_ATLAS_PROJECT_ID=proj-dev-67890
+```
+
+**.env.staging**:
+```bash
+LAYERLENS_ATLAS_API_KEY=sk-staging-key-here
+LAYERLENS_ATLAS_ORG_ID=org-staging-12345
+LAYERLENS_ATLAS_PROJECT_ID=proj-staging-67890
+```
+
+**.env.production**:
+```bash
+LAYERLENS_ATLAS_API_KEY=sk-prod-key-here
+LAYERLENS_ATLAS_ORG_ID=org-prod-12345
+LAYERLENS_ATLAS_PROJECT_ID=proj-prod-67890
+```
+
+**Load environment-specific configuration**:
+```python
+import os
+from dotenv import load_dotenv
+from atlas import Atlas
+
+def load_environment_config():
+    """Load environment-specific configuration"""
+    # Load base .env file first so that ATLAS_ENV may be defined there
+    load_dotenv('.env')
+    
+    # Determine environment
+    env = os.getenv('ATLAS_ENV', 'development')
+    
+    # Override with environment-specific file
+    env_file = f'.env.{env}'
+    if os.path.exists(env_file):
+        load_dotenv(env_file, override=True)
+        print(f"📄 Loaded configuration from {env_file}")
+    else:
+        print(f"⚠️ Environment file {env_file} not found, using base configuration")
+    
+    return env
+
+def get_atlas_client():
+    """Get Atlas client with environment-specific configuration"""
+    env = load_environment_config()
+    
+    # Create client with loaded environment variables
+    client = Atlas()
+    
+    # Log configuration (without sensitive data)
+    print(f"🌍 Environment: {env}")
+    print(f"🔗 Base URL: {client.base_url}")
+    print(f"⏱️ Timeout: {client.timeout}s")
+    
+    return client
+
+# Usage
+client = get_atlas_client()
+```
+
+## Security Best Practices
+
+### Environment Variable Security Checklist
+
+- [ ] ✅ No sensitive values hardcoded in source code
+- [ ] ✅ .env files added to .gitignore
+- [ ] ✅ Different credentials for each environment (dev/staging/prod)
+- [ ] ✅ Environment variables validated before use
+- [ ] ✅ Production secrets managed through secure systems (not .env files)
+- [ ] ✅ Regular rotation of API keys
+- [ ] ✅ Monitoring for credential exposure in logs
+- [ ] ✅ Team members trained on secure credential handling
diff --git a/docs/security/rate-limiting.md b/docs/security/rate-limiting.md
new file mode 100644
index 0000000..1cfac05
--- /dev/null
+++ b/docs/security/rate-limiting.md
@@ -0,0 +1,570 @@
+# Rate Limiting
+
+This guide covers how to handle rate limiting when using the Atlas Python SDK, including best practices for avoiding rate limits and properly handling rate limit errors.
+
+## Identifying Rate Limit Errors
+
+### Rate Limit HTTP Response
+
+When you exceed rate limits, the API returns a `429 Too Many Requests` status:
+
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas()
+    
+    # Making too many requests quickly
+    for i in range(100):
+        evaluation = client.evaluations.create(
+            model="gpt-4", 
+            benchmark="mmlu"
+        )
+        
+except atlas.RateLimitError as e:
+    print(f"Rate limited: {e}")
+    print(f"Status code: {e.status_code}")  # 429
+    print(f"Response headers: {dict(e.response.headers)}")
+```
+
+### Rate Limit Headers
+
+The API response includes helpful headers:
+
+```python
+import atlas
+from atlas import Atlas
+
+def inspect_rate_limit_headers(error):
+    """Inspect rate limit headers from error response"""
+    headers = error.response.headers
+    
+    # Common rate limit headers
+    rate_limit_info = {
+        'retry_after': headers.get('retry-after'),
+        'x_ratelimit_limit': headers.get('x-ratelimit-limit'),
+        'x_ratelimit_remaining': headers.get('x-ratelimit-remaining'),
+        'x_ratelimit_reset': headers.get('x-ratelimit-reset'),
+    }
+    
+    print("Rate limit information:")
+    for key, value in rate_limit_info.items():
+        if value:
+            print(f"  {key}: {value}")
+
+try:
+    client = Atlas()
+    # ... make request that triggers rate limit
+    
+except atlas.RateLimitError as e:
+    inspect_rate_limit_headers(e)
+```
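+
+Header values arrive as strings, and the HTTP specification allows `Retry-After` to be either a delay in seconds or an HTTP-date. The sketch below handles both forms using only the standard library. `parse_retry_after` is a hypothetical helper, not part of the Atlas SDK, and whether the Atlas API ever sends the date form is not confirmed here.
+
+```python
+from datetime import datetime, timezone
+from email.utils import parsedate_to_datetime
+from typing import Optional
+
+
+def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
+    """Convert a Retry-After header value to seconds, or None if unusable."""
+    if not value:
+        return None
+    try:
+        # Form 1: delay in seconds, e.g. "120"
+        return max(0.0, float(value))
+    except ValueError:
+        pass
+    try:
+        # Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
+        retry_at = parsedate_to_datetime(value)
+    except (TypeError, ValueError):
+        return None
+    if retry_at.tzinfo is None:
+        # Treat a date without zone information as UTC
+        retry_at = retry_at.replace(tzinfo=timezone.utc)
+    now = now or datetime.now(timezone.utc)
+    return max(0.0, (retry_at - now).total_seconds())
+```
+
+Returning `None` for a missing or unparseable header lets the caller fall back to its own backoff schedule.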
+
+## Handling Rate Limits
+
+### Basic Retry with Backoff
+
+```python
+import time
+import random
+import atlas
+from atlas import Atlas
+
+def create_evaluation_with_retry(model: str, benchmark: str, max_retries: int = 3):
+    """Create evaluation with rate limit retry logic"""
+    client = Atlas()
+    
+    for attempt in range(max_retries):
+        try:
+            evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+            
+            if evaluation:
+                print(f"✅ Success on attempt {attempt + 1}")
+                return evaluation
+                
+        except atlas.RateLimitError as e:
+            print(f"⏳ Rate limited on attempt {attempt + 1}")
+            
+            # Check if server provided retry-after header
+            retry_after = e.response.headers.get('retry-after')
+            
+            try:
+                # Retry-After may be absent or an HTTP-date rather than seconds
+                wait_time = float(retry_after)
+                print(f"   Server requests waiting {wait_time:.0f} seconds")
+            except (TypeError, ValueError):
+                # Exponential backoff with jitter
+                base_wait = 2 ** attempt
+                jitter = random.uniform(0, 1)
+                wait_time = base_wait + jitter
+                print(f"   Using exponential backoff: {wait_time:.1f} seconds")
+            
+            if attempt < max_retries - 1:
+                time.sleep(wait_time)
+            else:
+                print(f"❌ Exhausted all {max_retries} retry attempts")
+                raise
+                
+        except atlas.APIError as e:
+            print(f"❌ Non-rate-limit error: {e}")
+            raise
+    
+    return None
+
+# Usage
+evaluation = create_evaluation_with_retry("gpt-4", "mmlu")
+```
+
+### Advanced Retry Strategies
+
+#### Exponential Backoff with Jitter
+
+```python
+import time
+import random
+import atlas
+from atlas import Atlas
+
+class ExponentialBackoffRetry:
+    """Implement exponential backoff with jitter for rate limit handling"""
+    
+    def __init__(self, max_retries=5, base_delay=1.0, max_delay=60.0):
+        self.max_retries = max_retries
+        self.base_delay = base_delay
+        self.max_delay = max_delay
+    
+    def calculate_delay(self, attempt: int, retry_after: str = None) -> float:
+        """Calculate delay before next retry"""
+        
+        # If server provided retry-after, use that
+        if retry_after:
+            try:
+                return float(retry_after)
+            except (ValueError, TypeError):
+                pass
+        
+        # Exponential backoff: 2^attempt * base_delay
+        delay = self.base_delay * (2 ** attempt)
+        
+        # Add jitter to prevent thundering herd
+        jitter = delay * 0.1 * random.uniform(-1, 1)
+        delay += jitter
+        
+        # Cap at maximum delay
+        return min(delay, self.max_delay)
+    
+    def retry_operation(self, operation_func, *args, **kwargs):
+        """Retry operation with exponential backoff"""
+        
+        for attempt in range(self.max_retries):
+            try:
+                return operation_func(*args, **kwargs)
+                
+            except atlas.RateLimitError as e:
+                if attempt == self.max_retries - 1:
+                    # Last attempt - re-raise the error
+                    raise
+                
+                retry_after = e.response.headers.get('retry-after')
+                delay = self.calculate_delay(attempt, retry_after)
+                
+                print(f"⏳ Rate limited (attempt {attempt + 1}/{self.max_retries})")
+                print(f"   Waiting {delay:.1f} seconds before retry...")
+                
+                time.sleep(delay)
+                continue
+                
+            except atlas.APIError as e:
+                # Don't retry other API errors
+                print(f"❌ Non-retryable error: {e}")
+                raise
+
+# Usage
+backoff = ExponentialBackoffRetry(max_retries=5, base_delay=2.0, max_delay=120.0)
+
+def create_evaluation():
+    client = Atlas()
+    return client.evaluations.create(model="gpt-4", benchmark="mmlu")
+
+evaluation = backoff.retry_operation(create_evaluation)
+```
+
+
+## Proactive Rate Limit Management
+
+### Request Throttling
+
+```python
+import time
+from threading import Lock
+from datetime import datetime, timedelta
+import atlas
+from atlas import Atlas
+
+class ThrottledAtlasClient:
+    """Atlas client with built-in request throttling"""
+    
+    def __init__(self, requests_per_minute=30, **client_kwargs):
+        self.client = Atlas(**client_kwargs)
+        self.requests_per_minute = requests_per_minute
+        self.min_interval = 60.0 / requests_per_minute  # seconds between requests
+        self.last_request_time = None
+        self.lock = Lock()
+    
+    def _wait_for_next_request(self):
+        """Wait if necessary to maintain rate limit"""
+        with self.lock:
+            if self.last_request_time is not None:
+                elapsed = time.monotonic() - self.last_request_time
+                if elapsed < self.min_interval:
+                    wait_time = self.min_interval - elapsed
+                    print(f"⏳ Throttling: waiting {wait_time:.1f}s")
+                    time.sleep(wait_time)
+            
+            self.last_request_time = time.monotonic()
+    
+    def create_evaluation(self, *args, **kwargs):
+        """Create evaluation with throttling"""
+        self._wait_for_next_request()
+        return self.client.evaluations.create(*args, **kwargs)
+    
+    def get_results(self, *args, **kwargs):
+        """Get results with throttling"""
+        self._wait_for_next_request()
+        return self.client.results.get(*args, **kwargs)
+
+# Usage
+throttled_client = ThrottledAtlasClient(requests_per_minute=20)
+
+# These requests will be automatically throttled
+evaluations = []
+for i in range(10):
+    evaluation = throttled_client.create_evaluation(
+        model="gpt-4",
+        benchmark="mmlu"
+    )
+    evaluations.append(evaluation)
+```
+
+### Batch Request Management
+
+```python
+import time
+from typing import List, Tuple, Callable, Any
+from concurrent.futures import ThreadPoolExecutor, as_completed
+import atlas
+from atlas import Atlas
+
+class BatchRequestManager:
+    """Manage batch requests with rate limiting"""
+    
+    def __init__(self, requests_per_minute=30, max_concurrent=5):
+        self.requests_per_minute = requests_per_minute
+        self.max_concurrent = max_concurrent
+        self.request_interval = 60.0 / requests_per_minute
+        
+    def execute_batch(self, operations: List[Tuple[Callable, tuple, dict]], 
+                     handle_rate_limits=True) -> List[Any]:
+        """Execute a batch of operations with rate limiting"""
+        
+        results = []
+        
+        if self.max_concurrent == 1 or not handle_rate_limits:
+            # Sequential execution
+            for i, (func, args, kwargs) in enumerate(operations):
+                if i > 0 and handle_rate_limits:
+                    time.sleep(self.request_interval)
+                
+                try:
+                    result = func(*args, **kwargs)
+                    results.append({"success": True, "result": result, "index": i})
+                except Exception as e:
+                    results.append({"success": False, "error": e, "index": i})
+        else:
+            # Concurrent execution with rate limiting
+            with ThreadPoolExecutor(max_workers=self.max_concurrent) as executor:
+                future_to_index = {}
+                
+                for i, (func, args, kwargs) in enumerate(operations):
+                    if i > 0 and handle_rate_limits:
+                        # Space submissions by the full interval to stay within requests_per_minute
+                        time.sleep(self.request_interval)
+                    
+                    future = executor.submit(self._execute_with_retry, func, args, kwargs)
+                    future_to_index[future] = i
+                
+                # Collect results
+                for future in as_completed(future_to_index):
+                    index = future_to_index[future]
+                    try:
+                        result = future.result()
+                        results.append({"success": True, "result": result, "index": index})
+                    except Exception as e:
+                        results.append({"success": False, "error": e, "index": index})
+        
+        # Sort results by original order
+        results.sort(key=lambda x: x["index"])
+        return results
+    
+    def _execute_with_retry(self, func, args, kwargs, max_retries=3):
+        """Execute operation with retry on rate limit"""
+        for attempt in range(max_retries):
+            try:
+                return func(*args, **kwargs)
+            except atlas.RateLimitError as e:
+                if attempt == max_retries - 1:
+                    raise
+                
+                # Retry-After may be whole seconds or an HTTP-date; fall back to 60s otherwise
+                retry_after = e.response.headers.get('retry-after', '')
+                time.sleep(int(retry_after) if retry_after.strip().isdigit() else 60)
+
+# Usage
+client = Atlas()
+batch_manager = BatchRequestManager(requests_per_minute=20, max_concurrent=3)
+
+# Prepare batch operations
+operations = []
+models = ["gpt-4", "claude-3-opus", "gpt-3.5-turbo"] * 5
+
+for model in models:
+    operation = (
+        client.evaluations.create,  # function
+        (),                         # args
+        {"model": model, "benchmark": "mmlu"}  # kwargs
+    )
+    operations.append(operation)
+
+# Execute batch
+print(f"📦 Executing batch of {len(operations)} operations...")
+results = batch_manager.execute_batch(operations)
+
+# Process results
+successful = [r for r in results if r["success"]]
+failed = [r for r in results if not r["success"]]
+
+print(f"✅ Successful: {len(successful)}")
+print(f"❌ Failed: {len(failed)}")
+
+for result in failed:
+    print(f"   Failed operation {result['index']}: {result['error']}")
+```
+
+## Monitoring Rate Limits
+
+### Rate Limit Usage Tracking
+
+```python
+import time
+from collections import defaultdict, deque
+from datetime import datetime, timedelta
+from typing import Dict, List
+import atlas
+from atlas import Atlas
+
+class RateLimitMonitor:
+    """Monitor and track rate limit usage"""
+    
+    def __init__(self, window_minutes=60):
+        self.window_minutes = window_minutes
+        self.request_times = deque()
+        self.rate_limit_events = []
+        self.operation_counts = defaultdict(int)
+        self.error_counts = defaultdict(int)
+        
+    def record_request(self, operation: str):
+        """Record a successful request"""
+        now = datetime.now()
+        self.request_times.append(now)
+        self.operation_counts[operation] += 1
+        self._cleanup_old_data(now)
+    
+    def record_rate_limit(self, operation: str, retry_after=None):
+        """Record a rate limit event (retry_after is the raw Retry-After header value, if any)"""
+        event = {
+            'timestamp': datetime.now(),
+            'operation': operation,
+            'retry_after': retry_after
+        }
+        self.rate_limit_events.append(event)
+        self.error_counts['rate_limit'] += 1
+    
+    def _cleanup_old_data(self, current_time: datetime):
+        """Remove data outside monitoring window"""
+        cutoff = current_time - timedelta(minutes=self.window_minutes)
+        
+        # Clean request times
+        while self.request_times and self.request_times[0] < cutoff:
+            self.request_times.popleft()
+        
+        # Clean rate limit events
+        self.rate_limit_events = [
+            event for event in self.rate_limit_events
+            if event['timestamp'] > cutoff
+        ]
+    
+    def get_current_rate(self) -> float:
+        """Get current requests per minute"""
+        self._cleanup_old_data(datetime.now())
+        
+        if not self.request_times:
+            return 0.0
+        
+        # Calculate rate over actual time window
+        time_span = (datetime.now() - self.request_times[0]).total_seconds() / 60
+        return len(self.request_times) / max(time_span, 1)
+    
+    def get_statistics(self) -> Dict:
+        """Get comprehensive rate limit statistics"""
+        self._cleanup_old_data(datetime.now())
+        
+        recent_rate_limits = len(self.rate_limit_events)
+        total_requests = len(self.request_times)
+        
+        return {
+            'current_rate_per_minute': self.get_current_rate(),
+            'total_requests_in_window': total_requests,
+            'rate_limit_events': recent_rate_limits,
+            'rate_limit_percentage': (recent_rate_limits / max(total_requests, 1)) * 100,
+            'operation_breakdown': dict(self.operation_counts),
+            'last_rate_limit': max([e['timestamp'] for e in self.rate_limit_events], 
+                                 default=None)
+        }
+    
+    def should_slow_down(self, threshold_percentage=5) -> bool:
+        """Check if we should slow down requests based on rate limits"""
+        stats = self.get_statistics()
+        return stats['rate_limit_percentage'] > threshold_percentage
+
+class MonitoredAtlasClient:
+    """Atlas client with rate limit monitoring"""
+    
+    def __init__(self, **client_kwargs):
+        self.client = Atlas(**client_kwargs)
+        self.monitor = RateLimitMonitor()
+    
+    def create_evaluation(self, *args, **kwargs):
+        """Create evaluation with monitoring"""
+        try:
+            result = self.client.evaluations.create(*args, **kwargs)
+            self.monitor.record_request('create_evaluation')
+            
+            # Adaptive slowdown
+            if self.monitor.should_slow_down():
+                print("⚠️ High rate limit percentage detected, slowing down...")
+                time.sleep(2)
+            
+            return result
+            
+        except atlas.RateLimitError as e:
+            retry_after = e.response.headers.get('retry-after')
+            self.monitor.record_rate_limit('create_evaluation', retry_after)
+            raise
+    
+    def get_results(self, *args, **kwargs):
+        """Get results with monitoring"""
+        try:
+            result = self.client.results.get(*args, **kwargs)
+            self.monitor.record_request('get_results')
+            return result
+            
+        except atlas.RateLimitError as e:
+            retry_after = e.response.headers.get('retry-after')
+            self.monitor.record_rate_limit('get_results', retry_after)
+            raise
+    
+    def print_statistics(self):
+        """Print current rate limit statistics"""
+        stats = self.monitor.get_statistics()
+        
+        print("📊 Rate Limit Statistics (last hour):")
+        print(f"   Current rate: {stats['current_rate_per_minute']:.1f} requests/min")
+        print(f"   Total requests: {stats['total_requests_in_window']}")
+        print(f"   Rate limit events: {stats['rate_limit_events']}")
+        print(f"   Rate limit percentage: {stats['rate_limit_percentage']:.1f}%")
+        
+        if stats['operation_breakdown']:
+            print("   Operations:")
+            for op, count in stats['operation_breakdown'].items():
+                print(f"     {op}: {count}")
+        
+        if stats['last_rate_limit']:
+            print(f"   Last rate limit: {stats['last_rate_limit']}")
+
+# Usage
+monitored_client = MonitoredAtlasClient()
+
+# Make requests and monitor
+for i in range(20):
+    try:
+        evaluation = monitored_client.create_evaluation(
+            model="gpt-4",
+            benchmark="mmlu"
+        )
+        print(f"✅ Evaluation {i+1} created")
+        
+        if (i + 1) % 5 == 0:  # Print stats every 5 requests
+            monitored_client.print_statistics()
+            
+    except atlas.RateLimitError:
+        print(f"⏳ Rate limited on request {i+1}")
+        time.sleep(30)  # Wait before continuing
+
+# Final statistics
+monitored_client.print_statistics()
+```
+
+## Best Practices Summary
+
+### 1. Implement Proper Retry Logic
+```python
+# ✅ Good: Exponential backoff with jitter
+def robust_request(operation_func, max_retries=3):
+    for attempt in range(max_retries):
+        try:
+            return operation_func()
+        except atlas.RateLimitError as e:
+            if attempt == max_retries - 1:
+                raise
+            
+            # Use the server-suggested wait (whole seconds) if available, else back off
+            retry_after = e.response.headers.get('retry-after', '')
+            base_wait = int(retry_after) if retry_after.strip().isdigit() else 2 ** attempt
+            time.sleep(base_wait + random.uniform(0, 1))
+```
+
+### 2. Respect Server Headers
+```python
+# ✅ Good: Check retry-after header
+except atlas.RateLimitError as e:
+    retry_after = e.response.headers.get('retry-after')
+    if retry_after and retry_after.strip().isdigit():  # value may also be an HTTP-date
+        time.sleep(int(retry_after))
+```
+
+### 3. Monitor Your Usage
+```python
+# ✅ Good: Track your rate limit usage
+monitor = RateLimitMonitor()
+# ... use monitor to adjust request patterns
+```
+
+### 4. Use Appropriate Request Rates
+```python
+# ✅ Good: Conservative request rate
+throttled_client = ThrottledAtlasClient(requests_per_minute=20)
+
+# ❌ Bad: Aggressive request rate
+# aggressive_client = ThrottledAtlasClient(requests_per_minute=1000)
+```
+
+### 5. Handle Rate Limits Gracefully
+```python
+# ✅ Good: Graceful handling
+try:
+    result = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.RateLimitError:
+    # Log the event, wait, and potentially retry
+    logger.warning("Rate limit hit, backing off")
+    time.sleep(60)
+```
diff --git a/docs/troubleshooting/authentication.md b/docs/troubleshooting/authentication.md
new file mode 100644
index 0000000..fbf3ee6
--- /dev/null
+++ b/docs/troubleshooting/authentication.md
@@ -0,0 +1,186 @@
+# Authentication Problems
+
+This guide covers authentication-related issues and their solutions when using the Atlas Python SDK.
+
+## Understanding Atlas Authentication
+
+The Atlas SDK uses API key-based authentication with three required components:
+
+1. **API Key**: Your secret authentication token
+2. **Organization ID**: Your organization identifier
+3. **Project ID**: The specific project you're working with
+
+## Common Authentication Errors
+
+### Invalid or Missing API Key
+
+**Error**: `AuthenticationError: Invalid API key`
+
+**Symptoms**:
+- 401 Unauthorized responses
+- "Invalid API key" error messages
+- Authentication fails immediately
+
+
+### Missing Required Configuration
+
+**Error**: `AtlasError: The api_key client option must be set either by passing api_key to the client or by setting the LAYERLENS_ATLAS_API_KEY environment variable`
+
+**Solutions**:
+
+1. **Check all required environment variables**:
+   ```bash
+   # Linux/macOS
+   echo $LAYERLENS_ATLAS_API_KEY
+   echo $LAYERLENS_ATLAS_ORG_ID
+   echo $LAYERLENS_ATLAS_PROJECT_ID
+   
+   # Windows
+   echo %LAYERLENS_ATLAS_API_KEY%
+   echo %LAYERLENS_ATLAS_ORG_ID%
+   echo %LAYERLENS_ATLAS_PROJECT_ID%
+   ```
+
+2. **Set environment variables properly**:
+   ```bash
+   # Linux/macOS - in your shell profile (.bashrc, .zshrc, etc.)
+   export LAYERLENS_ATLAS_API_KEY="sk-..."
+   export LAYERLENS_ATLAS_ORG_ID="org-..."
+   export LAYERLENS_ATLAS_PROJECT_ID="proj-..."
+   
+   # Windows - persistently (applies to newly opened terminals)
+   setx LAYERLENS_ATLAS_API_KEY "sk-..."
+   setx LAYERLENS_ATLAS_ORG_ID "org-..."
+   setx LAYERLENS_ATLAS_PROJECT_ID "proj-..."
+   ```
+
+3. **Use .env file**:
+   ```bash
+   # Create .env file in your project root
+   LAYERLENS_ATLAS_API_KEY=sk-your-key-here
+   LAYERLENS_ATLAS_ORG_ID=org-your-org-here
+   LAYERLENS_ATLAS_PROJECT_ID=proj-your-project-here
+   ```
+   
+   ```python
+   # Load .env file in your Python code (requires the python-dotenv package)
+   from dotenv import load_dotenv
+   import os
+   
+   load_dotenv()
+   
+   from atlas import Atlas
+   client = Atlas()
+   ```
+
+### Permission Denied Errors
+
+**Error**: `PermissionDeniedError: 403 Forbidden`
+
+**Symptoms**:
+- Valid API key but still get 403 errors
+- Can authenticate but cannot create evaluations
+- Access denied to specific models or benchmarks
+
+**Diagnosis**:
+```python
+import atlas
+from atlas import Atlas
+
+def diagnose_permissions():
+    client = Atlas()
+    
+    print("🔍 Permission Diagnosis:")
+    
+    # Test basic access
+    try:
+        # This should fail with specific error types
+        evaluation = client.evaluations.create(
+            model="test-model",
+            benchmark="test-benchmark"
+        )
+    except atlas.AuthenticationError:
+        print("   ❌ Authentication failed - invalid API key")
+        return
+    except atlas.PermissionDeniedError:
+        print("   ❌ Permission denied - valid key, insufficient permissions")
+    except atlas.NotFoundError:
+        print("   ✅ Authentication works (model/benchmark not found is normal)")
+    except Exception as e:
+        print(f"   ❓ Unexpected error: {e}")
+    
+    # Test with common models/benchmarks (each successful call creates a real evaluation)
+    test_combinations = [
+        ("gpt-3.5-turbo", "mmlu"),
+        ("gpt-4", "hellaswag"),
+        ("claude-3-sonnet", "arc-challenge")
+    ]
+    
+    print("\n   Testing access to specific resources:")
+    
+    for model, benchmark in test_combinations:
+        try:
+            evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+            if evaluation:
+                print(f"   ✅ {model} + {benchmark}: Access granted")
+        except atlas.PermissionDeniedError:
+            print(f"   ❌ {model} + {benchmark}: Permission denied")
+        except atlas.NotFoundError:
+            print(f"   ⚠️ {model} + {benchmark}: Resource not found")
+        except Exception as e:
+            print(f"   ❓ {model} + {benchmark}: {e}")
+
+diagnose_permissions()
+```
+
+### Organization/Project Access Issues
+
+**Problem**: Valid API key but wrong organization or project
+
+**Symptoms**:
+- Authentication succeeds
+- Cannot access expected models or benchmarks
+- Permission errors for resources you should have access to
+
+**Diagnosis**:
+```python
+import os
+from atlas import Atlas
+import atlas
+
+def verify_org_project_access():
+    # Test with different org/project combinations
+    api_key = os.getenv('LAYERLENS_ATLAS_API_KEY')
+    
+    if not api_key:
+        print("❌ No API key found")
+        return
+    
+    # Test current configuration
+    current_org = os.getenv('LAYERLENS_ATLAS_ORG_ID')
+    current_project = os.getenv('LAYERLENS_ATLAS_PROJECT_ID')
+    
+    print("Testing current configuration:")
+    print(f"  Organization: {current_org}")
+    print(f"  Project: {current_project}")
+    
+    try:
+        client = Atlas(
+            api_key=api_key,
+            organization_id=current_org,
+            project_id=current_project
+        )
+        
+        evaluation = client.evaluations.create(model="test", benchmark="test")
+        
+    except atlas.AuthenticationError:
+        print("  ❌ Authentication failed")
+    except atlas.PermissionDeniedError:
+        print("  ❌ Permission denied - check org/project IDs")
+    except atlas.NotFoundError:
+        print("  ✅ Access granted (test model not found is expected)")
+    except Exception as e:
+        print(f"  ❓ Error: {e}")
+
+verify_org_project_access()
+```
\ No newline at end of file
diff --git a/docs/troubleshooting/common-issues.md b/docs/troubleshooting/common-issues.md
new file mode 100644
index 0000000..1df848e
--- /dev/null
+++ b/docs/troubleshooting/common-issues.md
@@ -0,0 +1,112 @@
+# Common Issues
+
+This guide covers the most frequently encountered issues when using the Atlas Python SDK and provides step-by-step solutions.
+
+## Installation Issues
+
+### Package Not Found
+
+**Problem**: `pip install atlas` fails with "No matching distribution found"
+
+**Solutions**:
+
+1. **Check Python version compatibility**:
+   ```bash
+   python --version
+   # Atlas requires Python 3.8+
+   ```
+
+2. **Update pip and try again**:
+   ```bash
+   python -m pip install --upgrade pip
+   pip install atlas
+   ```
+
+3. **Use Python 3 explicitly**:
+   ```bash
+   python3 -m pip install atlas
+   ```
+
+## Configuration Issues
+
+### Missing Environment Variables
+
+**Problem**: `AtlasError: The api_key client option must be set`
+
+**Diagnosis**:
+```python
+import os
+print(f"API Key: {'set' if os.getenv('LAYERLENS_ATLAS_API_KEY') else 'NOT SET'}")
+print(f"Org ID: {os.getenv('LAYERLENS_ATLAS_ORG_ID', 'NOT SET')}")
+print(f"Project ID: {os.getenv('LAYERLENS_ATLAS_PROJECT_ID', 'NOT SET')}")
+```
+
+**Solutions**:
+
+1. **Set environment variables**:
+   ```bash
+   # Linux/macOS
+   export LAYERLENS_ATLAS_API_KEY="your_api_key_here"
+   export LAYERLENS_ATLAS_ORG_ID="your_org_id_here"
+   export LAYERLENS_ATLAS_PROJECT_ID="your_project_id_here"
+   
+   # Windows (Command Prompt, current session only)
+   set LAYERLENS_ATLAS_API_KEY=your_api_key_here
+   set LAYERLENS_ATLAS_ORG_ID=your_org_id_here
+   set LAYERLENS_ATLAS_PROJECT_ID=your_project_id_here
+   ```
+
+2. **Use .env file**:
+   ```bash
+   # Create .env file
+   LAYERLENS_ATLAS_API_KEY=your_api_key_here
+   LAYERLENS_ATLAS_ORG_ID=your_org_id_here
+   LAYERLENS_ATLAS_PROJECT_ID=your_project_id_here
+   ```
+   
+   ```python
+   from dotenv import load_dotenv
+   load_dotenv()
+   
+   from atlas import Atlas
+   client = Atlas()
+   ```
+
+3. **Pass explicitly to client**:
+   ```python
+   from atlas import Atlas
+   
+   client = Atlas(
+       api_key="your_api_key_here",
+       organization_id="your_org_id_here",
+       project_id="your_project_id_here"
+   )
+   ```
+
+### Where to Get Help
+
+1. **LayerLens Support**: Contact support through your LayerLens dashboard for technical issues
+2. **Documentation**: Check the [complete documentation](../README.md)
+3. **Community**: Join LayerLens community channels for discussions
+
+### Creating a Good Bug Report
+
+Include this information when reporting issues:
+
+1. **Environment details** (Python version, SDK version, operating system)
+2. **Complete error message** with stack trace
+3. **Minimal reproducible example**:
+   ```python
+   from atlas import Atlas
+   
+   client = Atlas()
+   
+   # Minimal code that demonstrates the problem
+   evaluation = client.evaluations.create(
+       model="gpt-4",
+       benchmark="mmlu"
+   )
+   ```
+4. **Expected vs actual behavior**
+5. **Steps to reproduce**
+6. **Workarounds attempted**
diff --git a/docs/troubleshooting/error-codes.md b/docs/troubleshooting/error-codes.md
new file mode 100644
index 0000000..227b16f
--- /dev/null
+++ b/docs/troubleshooting/error-codes.md
@@ -0,0 +1,689 @@
+# Error Codes Reference
+
+This reference guide provides detailed information about all error codes and exceptions in the Atlas Python SDK.
+
+## Exception Hierarchy
+
+```
+AtlasError (Base exception)
+├── APIError (Base for API-related errors)
+│   ├── APIConnectionError (Network/connection issues)
+│   │   └── APITimeoutError (Request timeouts)
+│   ├── APIResponseValidationError (Invalid response format)
+│   └── APIStatusError (HTTP status errors)
+│       ├── BadRequestError (400)
+│       ├── AuthenticationError (401)
+│       ├── PermissionDeniedError (403)
+│       ├── NotFoundError (404)
+│       ├── ConflictError (409)
+│       ├── UnprocessableEntityError (422)
+│       ├── RateLimitError (429)
+│       └── InternalServerError (500+)
+```
+
+## HTTP Status Code Errors
+
+### 400 - Bad Request (`BadRequestError`)
+
+**When it occurs**:
+- Invalid request parameters
+- Missing required fields
+- Malformed request data
+
+**Common causes**:
+```python
+# Empty or invalid parameters
+client.evaluations.create(model="", benchmark="")  # Empty strings
+client.evaluations.create(model=None, benchmark="mmlu")  # None values
+
+# Invalid parameter types
+client.evaluations.create(model=123, benchmark="mmlu")  # Wrong type
+```
+
+**Example error**:
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas()
+    evaluation = client.evaluations.create(model="", benchmark="mmlu")
+except atlas.BadRequestError as e:
+    print(f"Bad request: {e}")
+    print(f"Status code: {e.status_code}")  # 400
+    print(f"Response body: {e.body}")
+```
+
+**Solutions**:
+1. **Validate parameters before making requests**:
+   ```python
+   def validate_evaluation_params(model, benchmark):
+       if not model or not isinstance(model, str):
+           raise ValueError("Model must be a non-empty string")
+       if not benchmark or not isinstance(benchmark, str):
+           raise ValueError("Benchmark must be a non-empty string")
+       return True
+   
+   if validate_evaluation_params(model, benchmark):
+       evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+   ```
+
+2. **Check parameter format requirements**:
+   ```python
+   # Ensure parameters meet expected format
+   model = model.strip() if model else ""
+   benchmark = benchmark.strip() if benchmark else ""
+   
+   if len(model) < 2 or len(benchmark) < 2:
+       raise ValueError("Model and benchmark names must be at least 2 characters")
+   ```
+
+### 401 - Unauthorized (`AuthenticationError`)
+
+**When it occurs**:
+- Missing API key
+- Invalid or expired API key
+- API key format issues
+
+**Common causes**:
+```python
+# Missing API key
+client = Atlas(api_key=None)
+
+# Invalid API key format
+client = Atlas(api_key="invalid-key")
+
+# Expired API key (need to regenerate)
+client = Atlas(api_key="sk-old-expired-key")
+```
+
+**Example error**:
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas(api_key="invalid-key")
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.AuthenticationError as e:
+    print(f"Authentication failed: {e}")
+    print(f"Status code: {e.status_code}")  # 401
+    print(f"Request ID: {e.request_id}")
+```
+
+**Solutions**:
+1. **Verify API key configuration**:
+   ```python
+   import os
+   
+   api_key = os.getenv('LAYERLENS_ATLAS_API_KEY')
+   if not api_key:
+       print("❌ API key not found in environment variables")
+   elif len(api_key) < 10:
+       print("⚠️ API key seems too short")
+   else:
+       print("✅ API key found and looks valid")
+   ```
+
+2. **Regenerate API key**:
+   - Log into Atlas dashboard
+   - Go to Settings > API Keys
+   - Generate new API key
+   - Update environment variables
+
+3. **Test authentication separately**:
+   ```python
+   def test_authentication(api_key):
+       try:
+           client = Atlas(api_key=api_key)
+           client.evaluations.create(model="test", benchmark="test")  # minimal call to test auth
+           return True, "Authentication successful"
+       except atlas.AuthenticationError:
+           return False, "Invalid API key"
+       except atlas.NotFoundError:
+           return True, "Authentication successful (test resources not found is expected)"
+       except Exception as e:
+           return False, f"Unexpected error: {e}"
+   
+   is_valid, message = test_authentication(your_api_key)
+   print(f"Authentication test: {message}")
+   ```
+
+### 403 - Forbidden (`PermissionDeniedError`)
+
+**When it occurs**:
+- Valid API key but insufficient permissions
+- No access to specific models or benchmarks
+- Organization/project access issues
+
+**Example error**:
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas()
+    evaluation = client.evaluations.create(model="restricted-model", benchmark="mmlu")
+except atlas.PermissionDeniedError as e:
+    print(f"Permission denied: {e}")
+    print(f"Status code: {e.status_code}")  # 403
+    print(f"Response body: {e.body}")
+```
+
+**Solutions**:
+1. **Check organization and project IDs**:
+   ```python
+   import os
+   
+   print(f"Organization ID: {os.getenv('LAYERLENS_ATLAS_ORG_ID')}")
+   print(f"Project ID: {os.getenv('LAYERLENS_ATLAS_PROJECT_ID')}")
+   
+   # Verify these match your Atlas dashboard settings
+   ```
+
+2. **Test access to different resources**:
+   ```python
+   def test_resource_access(models, benchmarks):
+       client = Atlas()
+       access_matrix = {}
+       
+       for model in models:
+           access_matrix[model] = {}
+           for benchmark in benchmarks:
+               try:
+                   evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+                   access_matrix[model][benchmark] = "✅ Access granted"
+               except atlas.PermissionDeniedError:
+                   access_matrix[model][benchmark] = "❌ Permission denied"
+               except atlas.NotFoundError:
+                   access_matrix[model][benchmark] = "❓ Resource not found"
+               except Exception as e:
+                   access_matrix[model][benchmark] = f"❓ {type(e).__name__}"
+       
+       return access_matrix
+   
+   # Test common resources
+   models = ["gpt-3.5-turbo", "gpt-4", "claude-3-sonnet"]
+   benchmarks = ["mmlu", "hellaswag", "arc-easy"]
+   
+   access = test_resource_access(models, benchmarks)
+   ```
+
+3. **Contact administrator for access**:
+   - Request access to specific models or benchmarks
+   - Verify project membership
+   - Check organization-level permissions
+
+### 404 - Not Found (`NotFoundError`)
+
+**When it occurs**:
+- Model ID doesn't exist
+- Benchmark ID doesn't exist
+- Evaluation ID not found (for results)
+- Resource doesn't exist in your organization
+
+**Example error**:
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas()
+    evaluation = client.evaluations.create(model="nonexistent-model", benchmark="mmlu")
+except atlas.NotFoundError as e:
+    print(f"Resource not found: {e}")
+    print(f"Status code: {e.status_code}")  # 404
+```
+
+**Solutions**:
+1. **Verify resource names**:
+   ```python
+   import atlas
+   from atlas import Atlas
+   
+   def find_available_models():
+       """Try common model names to find available ones"""
+       client = Atlas()
+       
+       common_models = [
+           "gpt-4", "gpt-3.5-turbo", "gpt-4-turbo",
+           "claude-3-opus", "claude-3-sonnet", "claude-3-haiku",
+           "llama-2-70b", "llama-2-13b", "mistral-7b"
+       ]
+       
+       available_models = []
+       
+       for model in common_models:
+           try:
+               # Test with common benchmark
+               evaluation = client.evaluations.create(model=model, benchmark="mmlu")
+               if evaluation:
+                   available_models.append(model)
+           except atlas.NotFoundError:
+               # Model or benchmark not found
+               continue
+           except atlas.PermissionDeniedError:
+               # Model exists but no permission
+               available_models.append(f"{model} (no permission)")
+           except Exception:
+               # Other errors - model might exist
+               available_models.append(f"{model} (unknown status)")
+       
+       return available_models
+   
+   available = find_available_models()
+   print(f"Available models: {available}")
+   ```
+
+2. **Check spelling and case sensitivity**:
+   ```python
+   # Common mistakes
+   correct_names = {
+       "GPT-4": "gpt-4",
+       "GPT4": "gpt-4", 
+       "MMLU": "mmlu",
+       "HellaSwag": "hellaswag",
+       "arc_challenge": "arc-challenge"  # Underscore vs hyphen
+   }
+   ```
+
+3. **Use exact names from Atlas dashboard**:
+   - Log into Atlas dashboard
+   - Check available models and benchmarks
+   - Copy exact names (case-sensitive)
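+
+The spelling fixes above follow a pattern that can be applied before a request is sent. The sketch below is a minimal example, assuming Atlas identifiers are lowercase and hyphen-separated as in this guide's examples. `normalize_name` and `KNOWN_ALIASES` are hypothetical helpers, not part of the SDK, and the dashboard remains the authority on exact names.
+
+```python
+# Hypothetical helper, not part of the Atlas SDK.
+# Assumes identifiers are lowercase and hyphen-separated, as in this guide.
+KNOWN_ALIASES = {
+    "gpt4": "gpt-4",  # missing hyphen
+}
+
+def normalize_name(name: str) -> str:
+    """Trim, lowercase, and swap underscores or spaces for hyphens."""
+    cleaned = name.strip().lower().replace("_", "-").replace(" ", "-")
+    return KNOWN_ALIASES.get(cleaned, cleaned)
+
+for raw in ["GPT-4", "GPT4", "MMLU", "HellaSwag", "arc_challenge"]:
+    print(f"{raw!r} -> {normalize_name(raw)!r}")
+```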
+
+### 409 - Conflict (`ConflictError`)
+
+**When it occurs**:
+- Resource already exists
+- Conflicting operation in progress
+- State conflict (e.g., trying to modify completed evaluation)
+
+**Example error**:
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas()
+    # Some operation that conflicts with current state
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.ConflictError as e:
+    print(f"Conflict error: {e}")
+    print(f"Status code: {e.status_code}")  # 409
+```
+
+**Solutions**:
+1. **Check current resource state**
+2. **Wait for ongoing operations to complete**
+3. **Use different resource identifiers**
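+
+Waiting for an ongoing operation usually means polling until some condition holds. The sketch below is a generic polling helper, not an SDK feature. `wait_until` and the `is_done` callable are hypothetical, and how you check resource state is left to the caller because this guide does not show an API for it.
+
+```python
+import time
+from typing import Callable
+
+def wait_until(is_done: Callable[[], bool],
+               timeout: float = 300.0,
+               interval: float = 5.0,
+               sleep: Callable[[float], None] = time.sleep,
+               clock: Callable[[], float] = time.monotonic) -> bool:
+    """Poll is_done() until it returns True or `timeout` seconds elapse."""
+    deadline = clock() + timeout
+    while True:
+        if is_done():
+            return True
+        if clock() >= deadline:
+            return False  # Gave up: the condition never became true
+        sleep(interval)
+```
+
+If `wait_until` returns `True`, retry the operation that raised `ConflictError`. If it returns `False`, report the conflict instead of retrying indefinitely.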
+
+### 422 - Unprocessable Entity (`UnprocessableEntityError`)
+
+**When it occurs**:
+- Valid request format but business logic prevents processing
+- Parameter combinations that don't make sense
+- Resource constraints exceeded
+
+**Example error**:
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas()
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="invalid-benchmark")
+except atlas.UnprocessableEntityError as e:
+    print(f"Unprocessable entity: {e}")
+    print(f"Status code: {e.status_code}")  # 422
+    print(f"Response details: {e.body}")
+```
+
+**Solutions**:
+1. **Check business logic constraints**
+2. **Verify parameter combinations are valid**
+3. **Review API documentation for limitations**
+
+### 429 - Rate Limited (`RateLimitError`)
+
+**When it occurs**:
+- Too many requests in a short time period
+- API rate limits exceeded
+- Organization-level quotas reached
+
+**Example error**:
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas()
+    
+    # Making too many requests quickly
+    for i in range(100):
+        evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+        
+except atlas.RateLimitError as e:
+    print(f"Rate limited: {e}")
+    print(f"Status code: {e.status_code}")  # 429
+    print(f"Retry after: {e.response.headers.get('retry-after', 'not specified')}")
+```
+
+**Solutions**:
+1. **Implement retry with backoff**:
+   ```python
+   import time
+   import atlas
+   from atlas import Atlas
+   
+   def create_evaluation_with_rate_limit_handling(model, benchmark, max_retries=3):
+       client = Atlas()
+       
+       for attempt in range(max_retries):
+           try:
+               return client.evaluations.create(model=model, benchmark=benchmark)
+               
+           except atlas.RateLimitError as e:
+               if attempt == max_retries - 1:
+                   raise  # Re-raise on final attempt
+               
+               retry_after = e.response.headers.get('retry-after')
+               
+               # Retry-After may also be an HTTP date; use backoff in that case
+               if retry_after and retry_after.isdigit():
+                   wait_time = int(retry_after)
+                   print(f"Rate limited. Waiting {wait_time}s as requested...")
+               else:
+                   wait_time = (2 ** attempt) * 60  # Exponential backoff
+                   print(f"Rate limited. Waiting {wait_time}s...")
+               
+               time.sleep(wait_time)
+       
+       return None
+   
+   evaluation = create_evaluation_with_rate_limit_handling("gpt-4", "mmlu")
+   ```
+
+2. **Add delays between requests**:
+   ```python
+   import time
+   from atlas import Atlas
+   
+   client = Atlas()
+   evaluations = []
+   models = ["gpt-4", "claude-3-opus", "llama-2-70b"]
+   
+   for model in models:
+       evaluation = client.evaluations.create(model=model, benchmark="mmlu")
+       evaluations.append(evaluation)
+       
+       # Wait between requests to avoid rate limits
+       time.sleep(2)  # 2-second delay
+   ```
+
+3. **Monitor rate limit headers**:
+   ```python
+   def monitor_rate_limits(client):
+       """Monitor rate limit status"""
+       # This guide only reads headers from error responses (e.response.headers);
+       # exposing them on successful responses would require SDK modification.
+       # Check the LayerLens documentation for rate limit details.
+       pass
+   ```
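+
+The `retry-after` header read in the examples above can carry either a number of seconds or an HTTP date, so calling `int()` on it directly can fail. The sketch below handles both forms using only the standard library. `parse_retry_after` is a hypothetical helper, not part of the SDK, and the 60-second default is an arbitrary assumption.
+
+```python
+from datetime import datetime, timezone
+from email.utils import parsedate_to_datetime
+from typing import Optional
+
+def parse_retry_after(value: Optional[str], default: float = 60.0,
+                      now: Optional[datetime] = None) -> float:
+    """Return the number of seconds to wait for a Retry-After value."""
+    if not value:
+        return default
+    value = value.strip()
+    if value.isdigit():  # Delay given in seconds, e.g. "120"
+        return float(value)
+    try:
+        retry_at = parsedate_to_datetime(value)  # HTTP-date form
+    except (TypeError, ValueError):
+        return default  # Unparseable value: fall back to the default
+    if retry_at.tzinfo is None:
+        retry_at = retry_at.replace(tzinfo=timezone.utc)
+    now = now or datetime.now(timezone.utc)
+    return max(0.0, (retry_at - now).total_seconds())
+```
+
+Inside an `except atlas.RateLimitError as e:` block, this could be called as `parse_retry_after(e.response.headers.get('retry-after'))`.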
+
+### 500+ - Server Errors (`InternalServerError`)
+
+**When it occurs**:
+- Atlas API server errors
+- Temporary service unavailability
+- Infrastructure issues
+
+**Example error**:
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas()
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.InternalServerError as e:
+    print(f"Server error: {e}")
+    print(f"Status code: {e.status_code}")  # 500, 502, 503, etc.
+    print(f"Request ID: {e.request_id}")  # Include in support requests
+```
+
+**Solutions**:
+1. **Implement retry logic**:
+   ```python
+   import random
+   import time
+   import atlas
+   from atlas import Atlas
+   
+   def create_evaluation_with_server_error_handling(model, benchmark):
+       client = Atlas()
+       max_retries = 3
+       base_delay = 5  # seconds
+       
+       for attempt in range(max_retries):
+           try:
+               return client.evaluations.create(model=model, benchmark=benchmark)
+               
+           except atlas.InternalServerError as e:
+               print(f"Server error on attempt {attempt + 1}: {e}")
+               
+               if attempt < max_retries - 1:
+                   # Exponential backoff with jitter
+                   delay = base_delay * (2 ** attempt) + random.uniform(0, 2)
+                   print(f"Retrying in {delay:.1f}s...")
+                   time.sleep(delay)
+               else:
+                   print(f"All {max_retries} attempts failed. Request ID: {e.request_id}")
+                   raise
+       
+       return None
+   ```
+
+2. **Check service status**:
+   - Visit LayerLens status page
+   - Check for ongoing incidents
+   - Monitor Atlas service announcements
+
+3. **Report persistent issues**:
+   - Include request ID from error
+   - Provide timestamp and error details
+   - Contact LayerLens support
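+
+The details listed above can be gathered in one place when an error is caught. The sketch below is a minimal example. `format_support_report` is a hypothetical helper, not part of the SDK, and it reads `status_code` and `request_id` with `getattr` so that it also works for errors that lack those attributes.
+
+```python
+from datetime import datetime, timezone
+
+def format_support_report(error: Exception) -> str:
+    """Collect the details worth including in a support request."""
+    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
+    lines = [
+        f"Timestamp (UTC): {timestamp}",
+        f"Error type: {type(error).__name__}",
+        f"Message: {error}",
+        # getattr keeps this safe for errors without these attributes
+        f"Status code: {getattr(error, 'status_code', 'n/a')}",
+        f"Request ID: {getattr(error, 'request_id', 'n/a')}",
+    ]
+    return "\n".join(lines)
+```
+
+Call it from an `except atlas.InternalServerError as e:` block and paste the result into the support ticket.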
+
+## Connection Errors
+
+### `APIConnectionError`
+
+**When it occurs**:
+- Network connectivity issues
+- DNS resolution failures
+- Firewall blocking requests
+- Proxy configuration problems
+
+**Example**:
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas(timeout=10.0)
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.APIConnectionError as e:
+    print(f"Connection error: {e}")
+    print(f"Request URL: {e.request.url}")
+```
+
+**Solutions**:
+1. **Test basic connectivity**:
+   ```bash
+   ping api.layerlens.com
+   curl -I https://api.layerlens.com
+   ```
+
+2. **Check proxy/firewall settings**
+3. **Verify DNS resolution**
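+
+The DNS and proxy checks can also be run from Python, which helps when the SDK runs in an environment without `ping` or `curl`. The sketch below uses only the standard library. `resolve_host` and `diagnose` are hypothetical helpers, not part of the SDK.
+
+```python
+import socket
+from typing import Dict, List
+from urllib.request import getproxies
+
+def resolve_host(host: str, port: int = 443) -> List[str]:
+    """Return the IP addresses `host` resolves to (empty list on failure)."""
+    try:
+        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
+    except socket.gaierror:
+        return []  # DNS resolution failed
+    return sorted({info[4][0] for info in infos})
+
+def diagnose(host: str) -> Dict[str, object]:
+    """Gather DNS results and the proxy settings Python detects."""
+    return {
+        "host": host,
+        "addresses": resolve_host(host),
+        "proxies": getproxies(),  # e.g. from HTTP_PROXY / HTTPS_PROXY
+    }
+```
+
+For example, `print(diagnose("api.layerlens.com"))` reports on the hostname from the commands above. An empty `addresses` list points to a DNS problem, and an unexpected `proxies` entry points to proxy configuration.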
+
+### `APITimeoutError`
+
+**When it occurs**:
+- Request takes longer than configured timeout
+- Network latency issues
+- Server processing delays
+
+**Example**:
+```python
+import atlas
+from atlas import Atlas
+
+try:
+    client = Atlas(timeout=30.0)  # 30-second timeout
+    evaluation = client.evaluations.create(model="gpt-4", benchmark="mmlu")
+except atlas.APITimeoutError as e:
+    print(f"Request timed out: {e}")
+```
+
+**Solutions**:
+1. **Increase timeout**:
+   ```python
+   client = Atlas(timeout=600.0)  # 10 minutes
+   ```
+
+2. **Use appropriate timeouts for operation type**:
+   ```python
+   # Quick operations
+   quick_client = Atlas(timeout=60.0)
+   
+   # Long-running evaluations
+   patient_client = Atlas(timeout=1800.0)  # 30 minutes
+   ```
+
+## Error Handling Best Practices
+
+### Comprehensive Error Handling
+
+```python
+import atlas
+from atlas import Atlas
+import time
+import logging
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+def robust_create_evaluation(model: str, benchmark: str):
+    """Create evaluation with comprehensive error handling"""
+    client = Atlas()
+    
+    try:
+        evaluation = client.evaluations.create(model=model, benchmark=benchmark)
+        
+        if evaluation:
+            logger.info(f"✅ Evaluation created: {evaluation.id}")
+            return evaluation
+        else:
+            logger.warning("⚠️ Evaluation creation returned None")
+            return None
+            
+    except atlas.BadRequestError as e:
+        logger.error(f"❌ Bad request - check parameters: {e}")
+        logger.error(f"   Model: '{model}', Benchmark: '{benchmark}'")
+        return None
+        
+    except atlas.AuthenticationError as e:
+        logger.error(f"❌ Authentication failed: {e}")
+        logger.error("   Check API key configuration")
+        return None
+        
+    except atlas.PermissionDeniedError as e:
+        logger.error(f"❌ Permission denied: {e}")
+        logger.error(f"   No access to model '{model}' or benchmark '{benchmark}'")
+        return None
+        
+    except atlas.NotFoundError as e:
+        logger.error(f"❌ Resource not found: {e}")
+        logger.error(f"   Model '{model}' or benchmark '{benchmark}' doesn't exist")
+        return None
+        
+    except atlas.RateLimitError as e:
+        retry_after = e.response.headers.get('retry-after', 60)
+        logger.warning(f"⏳ Rate limited - retry after {retry_after}s")
+        return None  # Could implement retry logic here
+        
+    except atlas.InternalServerError as e:
+        logger.error(f"❌ Server error: {e}")
+        logger.error(f"   Request ID: {e.request_id} (include in support requests)")
+        return None
+        
+    except atlas.APITimeoutError as e:
+        logger.error(f"⏰ Request timed out: {e}")
+        logger.error("   Consider increasing timeout or checking network")
+        return None
+        
+    except atlas.APIConnectionError as e:
+        logger.error(f"🔌 Connection error: {e}")
+        logger.error("   Check network connectivity and proxy settings")
+        return None
+        
+    except atlas.APIError as e:
+        logger.error(f"❌ Unexpected API error: {e}")
+        logger.error(f"   Type: {type(e).__name__}")
+        return None
+        
+    except Exception as e:
+        logger.error(f"❌ Unexpected error: {e}")
+        logger.error(f"   Type: {type(e).__name__}")
+        return None
+
+# Usage
+evaluation = robust_create_evaluation("gpt-4", "mmlu")
+```
+
+### Error Recovery Patterns
+
+```python
+import atlas
+from atlas import Atlas
+import time
+import random
+
+class AtlasErrorRecovery:
+    """Implement various error recovery patterns"""
+    
+    def __init__(self, client: Atlas):
+        self.client = client
+    
+    def exponential_backoff_retry(self, operation, max_retries=3, base_delay=1):
+        """Retry with exponential backoff"""
+        for attempt in range(max_retries):
+            try:
+                return operation()
+            except (atlas.InternalServerError, atlas.APIConnectionError, atlas.APITimeoutError) as e:
+                if attempt == max_retries - 1:
+                    raise  # Last attempt - re-raise the error
+                
+                delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
+                print(f"Attempt {attempt + 1} failed: {e}")
+                print(f"Retrying in {delay:.1f}s...")
+                time.sleep(delay)
+    
+    def circuit_breaker(self, operation, failure_threshold=5, recovery_time=60):
+        """Implement circuit breaker pattern"""
+        # This would be a more complex implementation
+        # See advanced-usage.md for full implementation
+        pass
+    
+    def fallback_strategy(self, primary_operation, fallback_operation):
+        """Try primary operation, fall back to alternative"""
+        try:
+            return primary_operation()
+        except atlas.APIError as e:
+            print(f"Primary operation failed: {e}")
+            print("Trying fallback...")
+            return fallback_operation()
+
+# Usage
+client = Atlas()
+recovery = AtlasErrorRecovery(client)
+
+def create_evaluation():
+    return client.evaluations.create(model="gpt-4", benchmark="mmlu")
+
+# Retry with exponential backoff
+evaluation = recovery.exponential_backoff_retry(create_evaluation)
+```
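+
+The `circuit_breaker` method above is left as a stub. The sketch below shows one minimal way the pattern could work: after `failure_threshold` consecutive failures, calls are rejected until `recovery_time` seconds have passed, and then a single trial call is allowed. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, not part of the SDK. The sketch counts every exception as a failure, whereas real code would likely count only retryable errors such as `atlas.InternalServerError`.
+
+```python
+import time
+from typing import Callable, Optional
+
+class CircuitOpenError(RuntimeError):
+    """Raised when the breaker is open and calls are being rejected."""
+
+class CircuitBreaker:
+    def __init__(self, failure_threshold: int = 5, recovery_time: float = 60.0,
+                 clock: Callable[[], float] = time.monotonic):
+        self.failure_threshold = failure_threshold
+        self.recovery_time = recovery_time
+        self._clock = clock
+        self._failures = 0
+        self._opened_at: Optional[float] = None
+
+    def call(self, operation: Callable[[], object]):
+        if self._opened_at is not None:
+            if self._clock() - self._opened_at < self.recovery_time:
+                raise CircuitOpenError("circuit open; skipping call")
+            # Recovery window elapsed: allow one trial call (half-open)
+        try:
+            result = operation()
+        except Exception:
+            self._failures += 1
+            half_open = self._opened_at is not None
+            if half_open or self._failures >= self.failure_threshold:
+                self._opened_at = self._clock()  # Open (or re-open) the circuit
+            raise
+        # Success closes the circuit and resets the failure count
+        self._failures = 0
+        self._opened_at = None
+        return result
+```
+
+With the `create_evaluation` function above, usage would look like `breaker = CircuitBreaker()` followed by `breaker.call(create_evaluation)`.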