exospherehost/claude-code-proxy

Claude Code Proxy

A self-hosted LiteLLM proxy that gives every developer on your team Claude Code access — using your existing cloud credits, without handing out API keys.

Two env vars per developer. That's the entire setup.

What This Does

  • Routes Claude Code traffic through a single proxy with weighted load balancing across Vertex AI, Bedrock, and Anthropic Direct
  • Tracks per-developer cost, token usage, and model selection in PostgreSQL
  • Enforces budget limits via virtual keys
  • Enables prompt caching automatically through session affinity
  • Provides automatic failover across providers

Architecture

Developer machines (Claude Code CLI)
        │
        ▼
  Nginx (TLS + session extraction via Lua)
        │
        ▼
  LiteLLM Proxy (routing, auth, cost tracking)
        │
        ├── Vertex AI    (weight: 10)
        ├── AWS Bedrock   (weight: 1)
        └── Anthropic     (weight: 1)

Everything runs on a single VM.

Prerequisites

  • A VM with Ubuntu (any cloud provider — AWS, GCP, DigitalOcean, etc.)
  • A domain name pointed at your VM (A record)
  • API credentials for at least one Claude provider

Quick Start

  1. Clone and configure:
git clone https://github.com/your-org/claude-code-proxy.git
cd claude-code-proxy
cp env.example .env
# Edit .env with your credentials
  2. Add your GCP credentials (if using Vertex AI):
# Place your Application Default Credentials file
cp /path/to/your/adc.json ./gcp-adc.json
  3. Deploy:
chmod +x deploy.sh
./deploy.sh your-domain.com you@example.com

This installs Docker and Nginx, provisions an SSL certificate via Let's Encrypt, and starts the stack.

  4. Create virtual keys for your developers via the LiteLLM admin dashboard at https://your-domain.com/ui.
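Budget-capped keys can also be scripted against LiteLLM's key management API instead of the dashboard. A minimal sketch of the request body a /key/generate call would carry (field names follow LiteLLM's key-management API; the alias, budget, and duration values here are hypothetical examples):

```shell
# Build a budget-capped key request for LiteLLM's /key/generate endpoint.
# "alice", 100.0, and "30d" are placeholder values, not repo defaults.
payload='{"key_alias": "alice", "max_budget": 100.0, "duration": "30d"}'
echo "$payload"
# Sent as, e.g.:
#   curl -X POST https://your-domain.com/key/generate \
#        -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
#        -H "Content-Type: application/json" -d "$payload"
```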

  5. Developer setup (2 minutes per person):

echo 'export ANTHROPIC_BASE_URL=https://your-domain.com/v1' >> ~/.bashrc
echo 'export ANTHROPIC_AUTH_TOKEN=sk-...' >> ~/.bashrc
source ~/.bashrc

Done. The claude command works as normal.
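As a quick sanity check before the first run, this small sketch reports whether the two variables from the step above are visible in the current shell:

```shell
# Report whether the two variables Claude Code reads are set in this shell
missing=0
for v in ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN; do
  if printenv "$v" >/dev/null; then
    echo "$v is set"
  else
    echo "$v is not set"
    missing=$((missing + 1))
  fi
done
echo "$missing variable(s) missing"
```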

Provider Setup

You need credentials for at least one provider. Configure all three for maximum reliability and credit utilization.

Anthropic Direct

The simplest option. Create an API key at console.anthropic.com:

  1. Sign up or log in at console.anthropic.com
  2. Go to API Keys and create a new key
  3. Add to your .env:
ANTHROPIC_API_KEY=sk-ant-your-key-here

Google Cloud (Vertex AI)

Use this to route traffic through your GCP cloud credits.

  1. Enable the Vertex AI API in your GCP project:
gcloud services enable aiplatform.googleapis.com
  2. Enable the Claude models you need. Go to Vertex AI Model Garden and enable Claude Opus, Sonnet, and/or Haiku.

  3. Create Application Default Credentials:

# Option A: User credentials (development)
gcloud auth application-default login
cp ~/.config/gcloud/application_default_credentials.json ./gcp-adc.json

# Option B: Service account (production, recommended)
gcloud iam service-accounts create litellm-proxy
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:litellm-proxy@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"
gcloud iam service-accounts keys create ./gcp-adc.json \
    --iam-account=litellm-proxy@YOUR_PROJECT_ID.iam.gserviceaccount.com
  4. Add to your .env:
VERTEX_PROJECT=your-gcp-project-id
VERTEX_LOCATION=us-east5          # Region where Claude is available

The gcp-adc.json file is mounted into the container automatically by docker-compose.yml.
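For reference, that mount typically looks like the following compose fragment. This is a hypothetical sketch; the service name, container path, and exact layout in this repo's docker-compose.yml may differ:

```yaml
# Hypothetical sketch only; check the repo's docker-compose.yml for the
# actual service name and mount path.
services:
  litellm:
    environment:
      GOOGLE_APPLICATION_CREDENTIALS: /app/gcp-adc.json
    volumes:
      - ./gcp-adc.json:/app/gcp-adc.json:ro
```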

AWS Bedrock

Use this to route traffic through your AWS cloud credits.

  1. Enable Claude model access in the AWS console:

    • Go to Amazon Bedrock in your preferred region
    • Navigate to Model access in the left sidebar
    • Request access to the Anthropic Claude models you need
    • Wait for access to be granted (usually instant for on-demand)
  2. Create an IAM user with Bedrock permissions:

aws iam create-user --user-name litellm-proxy

# Attach the Bedrock policy
aws iam attach-user-policy \
    --user-name litellm-proxy \
    --policy-arn arn:aws:iam::aws:policy/AmazonBedrockFullAccess

# Create access keys
aws iam create-access-key --user-name litellm-proxy
  3. Add to your .env:
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1              # Region where you enabled Claude

Configuring Routing Weights

The weight parameter in litellm-config.yaml controls what percentage of traffic goes to each provider. Set weights to match your available cloud credit ratio.

How Weights Work

Each model is defined three times — once per provider — under the same model_name. The router picks a provider using weighted random selection:

# Example: 10x more GCP credits than AWS or Anthropic
- model_name: claude-sonnet-4-6
  litellm_params:
    model: vertex_ai/claude-sonnet-4-6
    weight: 10                    # ~83% of traffic

- model_name: claude-sonnet-4-6
  litellm_params:
    model: bedrock/us.anthropic.claude-sonnet-4-6
    weight: 1                     # ~8% of traffic

- model_name: claude-sonnet-4-6
  litellm_params:
    model: anthropic/claude-sonnet-4-6
    weight: 1                     # ~8% of traffic
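The percentages in the comments follow from normalizing each weight by the total (10 + 1 + 1 = 12). A quick check:

```shell
# Each provider's expected share is its weight divided by the total weight (12)
for w in 10 1 1; do
  awk -v w="$w" 'BEGIN { printf "weight %2d -> %4.1f%% of traffic\n", w, 100 * w / 12 }'
done
```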

Common Ratios

| Scenario | Vertex | Bedrock | Anthropic | Result |
| --- | --- | --- | --- | --- |
| Heavy GCP credits | 10 | 1 | 1 | ~83% GCP, ~8% each AWS/Anthropic |
| Equal credits | 1 | 1 | 1 | ~33% each |
| GCP only | 1 | 0 | 0 | 100% GCP (remove other entries) |
| GCP + AWS, no direct | 5 | 5 | 0 | 50/50 (remove Anthropic entries) |
| Anthropic only | 0 | 0 | 1 | 100% direct (remove other entries) |

To change the ratio, edit litellm-config.yaml and restart:

sudo docker compose restart litellm

Removing a Provider

If you only have credentials for one or two providers, simply delete the model entries you don't need from litellm-config.yaml. For example, to use only Anthropic Direct, keep only the anthropic/ entries and remove all vertex_ai/ and bedrock/ entries.

Session Affinity

The Lua script (extract_session.lua) automatically extracts Claude Code's session_id from request bodies and pins sessions to the same provider for 4 hours. This enables prompt caching with zero developer configuration. Prompt caching can reduce costs by up to 90% on cached prefixes.
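For illustration, the extraction amounts to pulling a session_id field out of the JSON request body. The body shape below is hypothetical, and the real Lua logic in extract_session.lua may look for the field elsewhere:

```shell
# Hypothetical request body; the exact fields Claude Code sends may differ.
body='{"model":"claude-sonnet-4-6","metadata":{"session_id":"abc-123"},"messages":[]}'
# Same idea as the Lua extraction: pull session_id out of the JSON body
session_id=$(printf '%s' "$body" | grep -o '"session_id":"[^"]*"' | cut -d'"' -f4)
echo "pinned session: $session_id"
```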

Files

| File | Purpose |
| --- | --- |
| deploy.sh | One-command deployment (Docker, Nginx, SSL, containers) |
| docker-compose.yml | LiteLLM + PostgreSQL service definitions |
| litellm-config.yaml | Model routing, weights, and general settings |
| nginx.conf | Reverse proxy with TLS and Lua session extraction |
| extract_session.lua | Extracts session ID from request body for routing affinity |
| setup-claude-session.sh | Optional shell wrapper for session ID injection |
| env.example | Template for required environment variables |

License

MIT
