Commit 4732dc5

prepping for release
1 parent 3910831 commit 4732dc5

56 files changed, with 1413 additions and 1620 deletions.

README.md

Lines changed: 4 additions & 0 deletions
@@ -58,18 +58,22 @@ See [Quick Start Guide](docs/getting-started/getting_started.md) for detailed se
 ### 🏁 1. Getting Started
 - **[Installation & Setup](docs/getting-started/getting_started.md)** - Get running in 5 minutes.
 - **[Auth Modes](docs/authentication.md)** - Understanding Auth vs No-Auth and OAuth.
+- **[Service Users](docs/features/service_users.md)** - API keys for programmatic access.
+- **[Multi-Tenancy](docs/features/multi_tenancy.md)** - Understanding isolation.
 - **[User Scopes](docs/getting-started/getting_started.md#user-scopes)** - Roles: Root, Tenant Admin, and Tenant User.
 - **[Configuration](docs/getting-started/configuration.md)** - Server configuration options.
 - **[Environment Variables](docs/getting-started/env_vars.md)** - Complete metadata and storage reference.

 ### 🏗️ 2. Core Infrastructure
 - **[Warehouses](docs/warehouse/README.md)** - Managing S3, Azure, and GCS storage.
+- **[Credential Vending](docs/features/security_vending.md)** - Secure direct-to-storage access.
 - **[Catalogs](docs/features/asset_management.md)** - Creating Local and Federated catalogs.
 - **[Backend Storage](docs/backend_storage/README.md)** - Metadata persistence with Postgres, Mongo, or SQLite.

 ### 🧪 3. Data Management (API, CLI, UI)
 - **[Branching & Versioning](docs/features/branch_management.md)** - Git-style workflows and auto-add nuances.
 - **[Permissions & RBAC](docs/permissions.md)** - Asset-level access and cascading grants.
+- **[IAM Roles](docs/features/iam_roles.md)** - Cloud provider integration.
 - **[Business Metadata](docs/features/business_catalog.md)** - Tags, search, and data discovery.
 - **[Audit Logging](docs/features/audit_logs.md)** - Security tracking across all tools.
 - **[Maintenance](docs/features/maintenance.md)** - Snapshots, orphan files, and storage optimization.

admin_token.json

Lines changed: 0 additions & 1 deletion
This file was deleted.

docker-compose.release.yml

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
version: '3.8'

services:
  minio:
    image: minio/minio
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    command: server /data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
    volumes:
      - minio_data_release:/data

  createbuckets:
    image: minio/mc
    depends_on:
      minio:
        condition: service_healthy
    entrypoint: >
      /bin/sh -c "
      /usr/bin/mc alias set myminio http://minio:9000 minioadmin minioadmin;
      /usr/bin/mc mb myminio/warehouse;
      /usr/bin/mc anonymous set public myminio/warehouse;
      /usr/bin/mc mb myminio/warehouse-acme;
      /usr/bin/mc anonymous set public myminio/warehouse-acme;
      exit 0;
      "

  pangolin-api:
    image: alexmerced/pangolin-api:0.2.0
    ports:
      - "8080:8080"
    environment:
      - RUST_LOG=info
      - PANGOLIN_STORE_TYPE=memory
      - AWS_ACCESS_KEY_ID=minioadmin
      - AWS_SECRET_ACCESS_KEY=minioadmin
      - AWS_REGION=us-east-1
      - AWS_ENDPOINT_URL=http://minio:9000
      - AWS_ALLOW_HTTP=true
      - PANGOLIN_NO_AUTH=${PANGOLIN_NO_AUTH:-false}
    depends_on:
      minio:
        condition: service_healthy

  tests:
    image: python:3.11-slim
    volumes:
      - ./scripts:/app/scripts
    working_dir: /app
    environment:
      - PANGOLIN_API_URL=http://pangolin-api:8080
      - MINIO_URL=http://minio:9000
      - AWS_ENDPOINT_URL=http://minio:9000
      - AWS_ACCESS_KEY_ID=minioadmin
      - AWS_SECRET_ACCESS_KEY=minioadmin
      - AWS_REGION=us-east-1
      - TEST_MODE=${TEST_MODE:-no-auth}
    depends_on:
      - pangolin-api
    command: sh -c "pip install requests pyiceberg pyarrow && python scripts/test_release_v0.2.0.py"

volumes:
  minio_data_release:
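
In the compose file above, the `tests` service lists `pangolin-api` under a plain `depends_on`, which orders container startup but does not wait for the API to accept requests. A test script could poll the API before running its checks. The sketch below is a hypothetical helper: the names `http_probe` and `wait_until_ready` and the retry values are assumptions, not taken from `scripts/test_release_v0.2.0.py`.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example with 401 or 404), so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: not ready yet.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Inside the `tests` container this could be called as `wait_until_ready(lambda: http_probe(os.environ["PANGOLIN_API_URL"]))`, using the `PANGOLIN_API_URL` variable the compose file already sets.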

docs/features/security_vending.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ Instead of sharing long-term cloud credentials with clients (e.g., Spark jobs, D
 - ✅ Centralized audit trail of data access
 - ✅ Support for cross-account/cross-cloud access
 - **Multi-cloud support:** S3, Azure ADLS Gen2, Google Cloud Storage
+- ⚠️ **Local Filesystem:** Supported for dev/test (no credential vending involved)

 ---

docs/getting-started/auth-mode.md

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@

# Pangolin Authentication Guide

This guide details how authentication works in Pangolin when the system is running in standard production mode (Auth Mode).

## 1. The Root User

The **Root User** is the super-administrator of the entire Pangolin system. This user exists outside of any specific tenant.

- **Scope**: System-wide. No specific `tenant_id`.
- **Capabilities**:
  - Create and delete Tenants.
  - Create the initial Tenant Admin for a Tenant.
  - View global system statistics (e.g., total tenants, total tables).
- **Limitations**:
  - Cannot query data within a Tenant's catalogs unless explicitly granted permission; the Root role is conceptually separate from tenant data.
  - Cannot be assigned to a specific Tenant.

**Configuration**:
You can define the initial credentials for the Root User using environment variables. Override them in any production deployment, since the defaults are `admin` / `password`.

```bash
# Default values if not specified: admin / password
export PANGOLIN_ROOT_USER="super_admin"
export PANGOLIN_ROOT_PASSWORD="complex_secure_password"
```

**Login Behavior**:
- **API**: Login with `username` and `password`, leaving `tenant_id` null.
- **UI**: Use the dedicated Root Login toggle or route (e.g., `/login` with "Root" checked).

## 2. Authentication Flows

### Generating Tokens (API/CLI)

Authentication tokens (JWTs) act as your digital keys: after logging in, send the token in the `Authorization: Bearer` header of each request.

**For Tenant Admins & Users**:
You must provide the `tenant_id` along with credentials.

```bash
# Example Request
POST /api/v1/users/login
{
  "username": "data_analyst",
  "password": "secure_password",
  "tenant_id": "123e4567-e89b-12d3..."
}

# Response
{
  "token": "eyJhbGciOiJIUz...",
  "expires_in": 86400
}
```

**For Root**:
Omit the `tenant_id`.

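The two login variants above differ only in whether `tenant_id` is sent. As a rough sketch, the request could be built like this in Python. The helper names `build_login_request` and `read_token` and the default base URL `http://localhost:8080` are illustrative assumptions; the `/api/v1/users/login` path and the field names come from the example request.

```python
import json
import urllib.request
from typing import Optional

LOGIN_PATH = "/api/v1/users/login"  # path taken from the example request


def build_login_request(
    username: str,
    password: str,
    tenant_id: Optional[str] = None,
    base_url: str = "http://localhost:8080",  # assumed local address
) -> urllib.request.Request:
    """Build (but do not send) a login request.

    Tenant admins and users pass `tenant_id`; Root omits it.
    """
    payload = {"username": username, "password": password}
    if tenant_id is not None:
        payload["tenant_id"] = tenant_id
    return urllib.request.Request(
        base_url.rstrip("/") + LOGIN_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_token(response_body: bytes) -> str:
    """Extract the JWT from a login response shaped like the example."""
    return json.loads(response_body)["token"]
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and handing the response body to `read_token`.
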
### Logging into the UI

The UI login page adapts based on the user type:

1. **Tenant Login**:
    - **Username**: Your username.
    - **Password**: Your password.
    - **Tenant ID**: The UUID of your organization/tenant.
2. **Root Login**:
    - Click "Login as System Root".
    - Enter only Username and Password.

## 3. Testing with PyIceberg

When running in Auth Mode, PyIceberg requires a valid token and the tenant context.

```python
from pyiceberg.catalog import load_catalog

# 1. Obtain a token (e.g., via script or manually from UI)
token = "YOUR_GENERATED_JWT_TOKEN"
tenant_id = "YOUR_TENANT_UUID"

# 2. Configure PyIceberg
catalog = load_catalog(
    "production",
    **{
        "type": "rest",
        "uri": "http://localhost:8080/api/v1/catalogs/sales/iceberg",
        "token": token,
        # Required for Multi-tenancy routing
        "header.X-Pangolin-Tenant": tenant_id
    }
)
```

## 4. Permissions Matrix

Pangolin uses a hierarchical RBAC system.

### Roles

| Role | Scope | Description |
| :--- | :--- | :--- |
| **Root** | System | Manages Tenants. Cannot manage data inside a tenant directly without assuming a tenant context (if allowed). |
| **Tenant Admin** | Tenant | Full control over a specific Tenant's resources (Users, Catalogs, Roles). |
| **Tenant User** | Tenant | Zero access by default. Must be granted specific permissions. |

### Permissions Granting

Permissions are additive: a user can be granted specific Actions on specific Scopes, and a grant on a scope also applies to its implied children.

| Scope | Description | Implied Children |
| :--- | :--- | :--- |
| **Tenant** | Entire Tenant | All Catalogs, Namespaces, Tables |
| **Catalog** | Specific Catalog | All Namespaces, Tables within |
| **Namespace** | Specific Namespace | All Tables within |
| **Table/Asset** | Specific Table | None |

| Action | Description |
| :--- | :--- |
| **Read** | Read metadata and data. |
| **Write** | Insert, update, delete data. |
| **Create** | Create new resources (Tables, Namespaces). |
| **Delete** | Drop resources. |
| **List** | See that the resource exists (Discovery). |
| **ManageDiscovery** | Mark assets as strictly discoverable (metadata visible, data locked). |

### Common Scenarios

- **Data Analyst**: Grant `Read` on specific `Namespace`.
- **Data Engineer**: Grant `Read`, `Write`, `Create` on specific `Catalog`.
- **Auditor**: Grant `List` (Discovery) on `Tenant` + `Read` on `audit_logs` (conceptually).
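
To illustrate how additive grants and implied children could combine, here is a small Python model. It is not Pangolin's implementation: the `Grant` type, the tuple-based resource paths, and the `covers` and `is_allowed` functions are hypothetical, and only the scope hierarchy and action names are taken from the tables above.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# A resource path is (tenant, catalog, namespace, table), truncated at the
# level the grant or request refers to, e.g. ("acme", "sales") is a catalog.
Path = Tuple[str, ...]


@dataclass(frozen=True)
class Grant:
    action: str  # e.g. "Read", "Write", "Create", "Delete", "List"
    scope: Path  # the level at which the grant was made


def covers(scope: Path, resource: Path) -> bool:
    """A scope covers itself and every resource nested beneath it."""
    return len(scope) <= len(resource) and resource[: len(scope)] == scope


def is_allowed(grants: Iterable[Grant], action: str, resource: Path) -> bool:
    """Grants are additive: any single matching grant is enough."""
    return any(g.action == action and covers(g.scope, resource) for g in grants)


# "Data Analyst" scenario: Read on one namespace (names are made up).
analyst = [Grant("Read", ("acme", "sales", "emea"))]
print(is_allowed(analyst, "Read", ("acme", "sales", "emea", "orders")))
print(is_allowed(analyst, "Write", ("acme", "sales", "emea", "orders")))
```

An empty grant list denies everything in this model, which matches the "Zero access by default" rule for Tenant Users.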
Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@

# No Auth Mode

Pangolin supports a specialized "No Auth" mode designed for local development, testing, and evaluation where setting up a full authentication provider (like OAuth/OIDC) is not feasible or desired.

## Triggering No Auth Mode

To enable No Auth mode, set the following environment variable before starting the `pangolin_api` server:

```bash
export PANGOLIN_NO_AUTH=true
```

## Behavior & Features

### 1. Automatic Tenant Admin Access
When accessing the API without any credentials, the system automatically treats the request as coming from the default **Tenant Admin** of the default tenant.

- **Username**: `tenant_admin`
- **Role**: `TenantAdmin`
- **Tenant ID**: `00000000-0000-0000-0000-000000000000`

This allows you to perform administrative tasks (create users, manage catalogs, etc.) immediately without logging in.

### 2. "Easy Auth" (Selective Authentication)
While No Auth mode defaults to Admin for unauthenticated requests, it **respects the `Authorization` header** if provided. This lets you verify permissions as a restricted user:

- **If you provide a token**: The system processes the request as that specific user.
- **If you provide NO token**: The system processes the request as the default Tenant Admin.

### 3. Password Bypass
To facilitate logging in as specific users (e.g., `TenantUser` for testing access restrictions) without setting up real password hashes:

- The `/login` endpoint bypasses password verification in this mode.
- You can log in as *any* existing user with *any* password.

## Usage Guide

### Using the CLI
Since the CLI handles authentication tokens automatically, you can mix modes:

1. **Admin Mode (Default)**: Just run commands.
    ```bash
    pangolin-admin catalog list # Works immediately
    ```

2. **User Mode**: Login as a specific user to switch context.
    ```bash
    pangolin-user login --username data_analyst --password any
    # Now subsequent commands run as data_analyst
    ```

### Using the UI
The UI will automatically detect No Auth mode and may auto-login as `tenant_admin`.

To test different roles:
1. **Logout** via the profile menu.
2. **Login** with the username you created (e.g., `data_analyst`) and any password.
3. You are now exploring the UI restricted by that user's permissions.

### Using the API (cURL / Python)
- **Admin Request**:
    ```bash
    curl http://localhost:8080/api/v1/catalogs
    ```
- **User Request** (after getting token via `/login`):
    ```bash
    curl -H "Authorization: Bearer <token>" http://localhost:8080/api/v1/catalogs
    ```

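The heading mentions Python, but only cURL is shown. A rough Python equivalent using only the standard library might look like the following; `catalogs_request` is a hypothetical helper name, and the URL is the one from the cURL examples.

```python
import urllib.request
from typing import Optional

CATALOGS_URL = "http://localhost:8080/api/v1/catalogs"  # same URL as the cURL examples


def catalogs_request(token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the catalog list.

    With no token, No Auth mode treats the caller as the default Tenant Admin;
    with a token, the request runs as that token's user.
    """
    headers = {}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(CATALOGS_URL, headers=headers, method="GET")
```

With a server running locally in No Auth mode, `urllib.request.urlopen(catalogs_request())` sends the admin request, and `urllib.request.urlopen(catalogs_request(token))` sends the same request as a specific user.
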
## Testing with PyIceberg

The server prints a convenient configuration snippet on startup when in No Auth mode. This snippet works immediately because the token vended is for the default `tenant_admin`.

### Configuration
Use this snippet in your Python scripts to connect PyIceberg to your local Pangolin instance:

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "local",
    **{
        "type": "rest",
        "uri": "http://127.0.0.1:8080/api/v1/catalogs/sales/iceberg",
        # Default Admin Token (valid for 24h/until restart)
        "token": "YOUR_TOKEN_FROM_SERVER_LOGS",
        # Critical for No Auth mode routing
        "header.X-Pangolin-Tenant": "00000000-0000-0000-0000-000000000000"
    }
)

# Verify connection
print(catalog.list_namespaces())
```

### Key Notes for PyIceberg
1. **Tenant Header**: You **MUST** include `"header.X-Pangolin-Tenant"` in the config properties. Without it, the server won't know which tenant context to use for the Iceberg REST endpoints, even with a valid token.
2. **Token**: The token printed in the logs is a valid JWT for the `tenant_admin` role. You can use it as-is.
3. **URI**: Point to the specific catalog's Iceberg endpoint (e.g., `.../catalogs/sales/iceberg`). Ensure the catalog exists first (e.g., via the seeding script).

docs/pyiceberg/README.md

Lines changed: 1 addition & 1 deletion
@@ -30,4 +30,4 @@ Detailed configuration for different storage backends.
 To enable **Credential Vending**, PyIceberg must send the `X-Iceberg-Access-Delegation` header with the value `vended-credentials`. This tells Pangolin that the client expects temporary storage keys for its operations.

 ### Tenant Context
-Pangolin uses either a standard `token` or the custom `X-Pangolin-Tenant` header to determine the tenant context. For PyIceberg, use the `token` property for the best experience.
+While Pangolin extracts tenant context from the authentication token, we **strongly recommend** explicitly setting the `header.X-Pangolin-Tenant` property in your PyIceberg configuration. This ensures reliable routing for all operations, especially during initial connection.
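
As a sketch of how the two headers discussed in this hunk might be combined, the helper below assembles a PyIceberg property dictionary. The function name, its parameters, and the URI layout (borrowed from the auth guides added in this commit) are assumptions; the `header.` prefix is how PyIceberg's REST catalog accepts custom HTTP headers.

```python
from typing import Dict


def pangolin_rest_properties(
    base_url: str,
    catalog_name: str,
    token: str,
    tenant_id: str,
    vended_credentials: bool = True,
) -> Dict[str, str]:
    """Assemble REST catalog properties for PyIceberg's load_catalog()."""
    properties = {
        "type": "rest",
        "uri": f"{base_url.rstrip('/')}/api/v1/catalogs/{catalog_name}/iceberg",
        "token": token,
        # Explicit tenant routing, as the changed line recommends.
        "header.X-Pangolin-Tenant": tenant_id,
    }
    if vended_credentials:
        # Ask the server for temporary storage credentials.
        properties["header.X-Iceberg-Access-Delegation"] = "vended-credentials"
    return properties
```

The result can be unpacked into PyIceberg with `load_catalog("local", **properties)`.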

docs/warehouse/README.md

Lines changed: 1 addition & 0 deletions
@@ -228,6 +228,7 @@ The `use_sts` boolean field is deprecated. Use `vending_strategy` instead.
 | [AWS S3](s3.md) | ✅ Production | Most common, excellent performance |
 | [Azure Blob](azure.md) | ✅ Production | Azure-native deployments |
 | [Google Cloud Storage](gcs.md) | ✅ Production | GCP-native deployments |
+| [Local Filesystem](local.md) | ⚠️ Dev/Test | Local development & testing |

 ## Quick Start
