A self-hosted LiteLLM proxy that gives every developer on your team Claude Code access — using your existing cloud credits, without handing out API keys.
Two env vars per developer. That's the entire setup.
- Routes Claude Code traffic through a single proxy with weighted load balancing across Vertex AI, Bedrock, and Anthropic Direct
- Tracks per-developer cost, token usage, and model selection in PostgreSQL
- Enforces budget limits via virtual keys
- Enables prompt caching automatically through session affinity
- Provides automatic failover across providers
```
Developer machines (Claude Code CLI)
        │
        ▼
Nginx (TLS + session extraction via Lua)
        │
        ▼
LiteLLM Proxy (routing, auth, cost tracking)
        │
        ├── Vertex AI   (weight: 10)
        ├── AWS Bedrock (weight: 1)
        └── Anthropic   (weight: 1)
```
Everything runs on a single VM.
- A VM with Ubuntu (any cloud provider — AWS, GCP, DigitalOcean, etc.)
- A domain name pointed at your VM (A record)
- API credentials for at least one Claude provider
- Clone and configure:

  ```bash
  git clone https://github.com/your-org/claude-code-proxy.git
  cd claude-code-proxy
  cp env.example .env
  # Edit .env with your credentials
  ```

- Add your GCP credentials (if using Vertex AI):

  ```bash
  # Place your Application Default Credentials file
  cp /path/to/your/adc.json ./gcp-adc.json
  ```

- Deploy:

  ```bash
  chmod +x deploy.sh
  ./deploy.sh your-domain.com you@example.com
  ```

  This installs Docker and Nginx, provisions an SSL certificate via Let's Encrypt, and starts the stack.
- Create virtual keys for your developers via the LiteLLM admin dashboard at https://your-domain.com/ui.
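Keys can also be provisioned programmatically through LiteLLM's key-management API. The sketch below builds the JSON body for a `POST /key/generate` call; the field names follow LiteLLM's documented key-generation schema, but verify them against your LiteLLM version, and the alias, budget, and duration values are illustrative:

```python
import json

def build_key_request(developer: str, max_budget_usd: float, duration: str = "30d") -> dict:
    """Build the JSON body for LiteLLM's POST /key/generate endpoint.

    The proxy enforces max_budget per key, which is how per-developer
    budget limits work here.
    """
    return {
        "key_alias": f"claude-code-{developer}",   # human-readable label per developer
        "max_budget": max_budget_usd,              # hard spend cap in USD
        "duration": duration,                      # key lifetime, e.g. "30d"
        "metadata": {"tool": "claude-code"},
    }

body = build_key_request("alice", 50.0)
print(json.dumps(body))
# Send with: curl -X POST https://your-domain.com/key/generate \
#   -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
#   -H "Content-Type: application/json" -d "$BODY"
```

The returned `sk-...` key is what each developer sets as `ANTHROPIC_AUTH_TOKEN`.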
- Developer setup (2 minutes per person):

  ```bash
  echo 'export ANTHROPIC_BASE_URL=https://your-domain.com/v1' >> ~/.bashrc
  echo 'export ANTHROPIC_AUTH_TOKEN=sk-...' >> ~/.bashrc
  source ~/.bashrc
  ```

  Done. `claude` works as normal.
You need credentials for at least one provider. Configure all three for maximum reliability and credit utilization.
The simplest option. Create an API key at console.anthropic.com:
- Sign up or log in at console.anthropic.com
- Go to API Keys and create a new key
- Add to your `.env`:

  ```bash
  ANTHROPIC_API_KEY=sk-ant-your-key-here
  ```

Use this to route traffic through your GCP cloud credits.
- Enable the Vertex AI API in your GCP project:

  ```bash
  gcloud services enable aiplatform.googleapis.com
  ```

- Enable the Claude models you need. Go to Vertex AI Model Garden and enable Claude Opus, Sonnet, and/or Haiku.

- Create Application Default Credentials:

  ```bash
  # Option A: User credentials (development)
  gcloud auth application-default login
  cp ~/.config/gcloud/application_default_credentials.json ./gcp-adc.json

  # Option B: Service account (production, recommended)
  gcloud iam service-accounts create litellm-proxy
  gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:litellm-proxy@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"
  gcloud iam service-accounts keys create ./gcp-adc.json \
    --iam-account=litellm-proxy@YOUR_PROJECT_ID.iam.gserviceaccount.com
  ```

- Add to your `.env`:

  ```bash
  VERTEX_PROJECT=your-gcp-project-id
  VERTEX_LOCATION=us-east5  # Region where Claude is available
  ```

The `gcp-adc.json` file is mounted into the container automatically by `docker-compose.yml`.
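For reference, that mount typically looks something like the fragment below. The paths and service name here are illustrative, not copied from the repo; check the actual `docker-compose.yml` for the real values:

```yaml
services:
  litellm:
    # ...
    volumes:
      # Mount the ADC file created above into the container (read-only)
      - ./gcp-adc.json:/app/gcp-adc.json:ro
    environment:
      # Point Google client libraries at the mounted credentials
      - GOOGLE_APPLICATION_CREDENTIALS=/app/gcp-adc.json
```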
Use this to route traffic through your AWS cloud credits.
- Enable Claude model access in the AWS console:
  - Go to Amazon Bedrock in your preferred region
  - Navigate to Model access in the left sidebar
  - Request access to the Anthropic Claude models you need
  - Wait for access to be granted (usually instant for on-demand)

- Create an IAM user with Bedrock permissions:

  ```bash
  aws iam create-user --user-name litellm-proxy

  # Attach the Bedrock policy
  aws iam attach-user-policy \
    --user-name litellm-proxy \
    --policy-arn arn:aws:iam::aws:policy/AmazonBedrockFullAccess

  # Create access keys
  aws iam create-access-key --user-name litellm-proxy
  ```

- Add to your `.env`:

  ```bash
  AWS_ACCESS_KEY_ID=AKIA...
  AWS_SECRET_ACCESS_KEY=...
  AWS_REGION=us-east-1  # Region where you enabled Claude
  ```

The `weight` parameter in `litellm-config.yaml` controls what percentage of traffic goes to each provider. Set weights to match your available cloud-credit ratio.
Each model is defined three times (once per provider) under the same `model_name`. The router picks a provider using weighted random selection:
```yaml
# Example: 10x more GCP credits than AWS or Anthropic
- model_name: claude-sonnet-4-6
  litellm_params:
    model: vertex_ai/claude-sonnet-4-6
    weight: 10   # ~83% of traffic

- model_name: claude-sonnet-4-6
  litellm_params:
    model: bedrock/us.anthropic.claude-sonnet-4-6
    weight: 1    # ~8% of traffic

- model_name: claude-sonnet-4-6
  litellm_params:
    model: anthropic/claude-sonnet-4-6
    weight: 1    # ~8% of traffic
```

| Scenario | Vertex | Bedrock | Anthropic | Result |
|---|---|---|---|---|
| Heavy GCP credits | 10 | 1 | 1 | ~83% GCP, ~8% each AWS/Anthropic |
| Equal credits | 1 | 1 | 1 | ~33% each |
| GCP only | 1 | 0 | 0 | 100% GCP (remove other entries) |
| GCP + AWS, no direct | 5 | 5 | 0 | 50/50 (remove Anthropic entries) |
| Anthropic only | 0 | 0 | 1 | 100% direct (remove other entries) |
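Weighted routing amounts to random selection with probability proportional to each deployment's weight. A minimal sketch of the idea (not LiteLLM's actual implementation; the provider names and weights mirror the example above):

```python
import random

def pick_provider(deployments: list[tuple[str, float]], rng: random.Random) -> str:
    """Pick one deployment, with probability proportional to its weight."""
    providers = [name for name, _ in deployments]
    weights = [w for _, w in deployments]
    return rng.choices(providers, weights=weights, k=1)[0]

deployments = [("vertex_ai", 10), ("bedrock", 1), ("anthropic", 1)]
rng = random.Random(0)  # seeded so the demo is reproducible
counts = {name: 0 for name, _ in deployments}
for _ in range(12_000):
    counts[pick_provider(deployments, rng)] += 1

# With weights 10:1:1, vertex_ai receives ~10/12 of traffic (about 83%)
print({k: round(v / 12_000, 2) for k, v in counts.items()})
```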
To change the ratio, edit `litellm-config.yaml` and restart:

```bash
sudo docker compose restart litellm
```

If you only have credentials for one or two providers, simply delete the model entries you don't need from `litellm-config.yaml`. For example, to use only Anthropic Direct, keep only the `anthropic/` entries and remove all `vertex_ai/` and `bedrock/` entries.
The Lua script (`extract_session.lua`) automatically extracts Claude Code's `session_id` from request bodies and pins each session to the same provider for 4 hours. This enables prompt caching with zero developer configuration; prompt caching can reduce costs by up to 90% on cached prefixes.
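Conceptually, the affinity logic works like this. The sketch below is a simplified Python version of what the Lua script does inside Nginx; the exact location of `session_id` in the request body and the hash-based provider choice are assumptions for illustration:

```python
import json

AFFINITY_TTL = 4 * 3600  # pin a session to one provider for 4 hours
_pins: dict[str, tuple[str, float]] = {}  # session_id -> (provider, expiry)

def route(request_body: bytes, providers: list[str], now: float) -> str:
    """Return the provider for this request, reusing the session's pin if valid.

    Pinning every request in a session to one provider keeps the prompt
    cache warm, since cached prefixes live per provider.
    """
    # Assumed body shape for illustration; the real script parses Claude
    # Code's actual request format.
    session_id = json.loads(request_body).get("metadata", {}).get("session_id")
    if session_id is None:
        return providers[0]  # no session: fall back to default routing
    pin = _pins.get(session_id)
    if pin is not None and pin[1] > now:
        return pin[0]  # pin still valid: same provider as before
    provider = providers[hash(session_id) % len(providers)]
    _pins[session_id] = (provider, now + AFFINITY_TTL)
    return provider

providers = ["vertex_ai", "bedrock", "anthropic"]
body = json.dumps({"metadata": {"session_id": "abc-123"}}).encode()
# Requests within the 4-hour window land on the same provider
assert route(body, providers, now=0.0) == route(body, providers, now=3600.0)
```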
| File | Purpose |
|---|---|
| `deploy.sh` | One-command deployment (Docker, Nginx, SSL, containers) |
| `docker-compose.yml` | LiteLLM + PostgreSQL service definitions |
| `litellm-config.yaml` | Model routing, weights, and general settings |
| `nginx.conf` | Reverse proxy with TLS and Lua session extraction |
| `extract_session.lua` | Extracts session ID from request body for routing affinity |
| `setup-claude-session.sh` | Optional shell wrapper for session ID injection |
| `env.example` | Template for required environment variables |
MIT