One API

Synopsis

One‑API is a single‑endpoint gateway that lets you manage and call dozens of AI SaaS models without the headache of custom adapters. 🌐 Simply change the model_name and you can reach OpenAI, Anthropic, Gemini, Groq, DeepSeek, and many others, all through the same request format. It also supports multi-user management, allowing each user to create as many independent tokens for accounting purposes as they need.
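
For instance, the same ChatCompletion request works against any configured provider; only the model field changes. A minimal Go sketch (the base URL is the public gateway mentioned below; substitute your own deployment, and sk-xxxxxxx is a placeholder token):

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Any configured model works here: "gpt-4o-mini", "claude-haiku-4-5",
	// "gemini-2.5-flash", "deepseek-chat", ... the request shape never changes.
	payload := []byte(`{
		"model": "gpt-4o-mini",
		"messages": [{"role": "user", "content": "Hello"}]
	}`)

	req, err := http.NewRequest("POST",
		"https://oneapi.laisky.com/v1/chat/completions", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer sk-xxxxxxx")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}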

=== One-API Regression Matrix ===

Variant                             gpt-4o-mini  gpt-5-mini                              claude-haiku-4-5  gemini-2.5-flash  openai/gpt-oss-20b  deepseek-chat                           grok-4-fast-non-reasoning  azure-gpt-5-nano
Chat (stream=false)                 PASS 6.03s   PASS 12.39s                             PASS 7.08s        PASS 3.08s        PASS 2.96s          PASS 6.11s                              PASS 2.99s                 PASS 8.99s
Chat (stream=true)                  PASS 5.70s   PASS 10.19s                             PASS 3.49s        PASS 4.98s        PASS 7.22s          PASS 3.73s                              PASS 2.71s                 PASS 15.24s
Chat Tools (stream=false)           PASS 6.02s   PASS 9.29s                              PASS 4.17s        PASS 3.19s        PASS 1.78s          PASS 5.16s                              PASS 3.03s                 PASS 15.86s
Chat Tools (stream=true)            PASS 4.82s   PASS 5.52s                              PASS 4.71s        PASS 6.72s        PASS 0.61s          PASS 6.22s                              PASS 4.16s                 PASS 14.32s
Chat Structured (stream=false)      PASS 6.02s   PASS 10.66s                             PASS 6.47s        PASS 6.71s        PASS 7.63s          PASS 4.65s                              PASS 1.51s                 PASS 15.48s
Chat Structured (stream=true)       PASS 2.21s   PASS 10.34s                             PASS 4.18s        PASS 4.98s        PASS 1.20s          PASS 6.46s                              PASS 2.55s                 PASS 15.21s
Response (stream=false)             PASS 5.78s   PASS 17.29s                             PASS 10.73s       PASS 5.62s        PASS 5.60s          PASS 5.97s                              PASS 5.07s                 PASS 18.06s
Response (stream=true)              PASS 4.26s   PASS 15.69s                             PASS 5.50s        PASS 5.12s        PASS 3.54s          PASS 5.48s                              PASS 3.76s                 PASS 17.50s
Response Vision (stream=false)      PASS 5.17s   PASS 11.08s                             PASS 9.90s        PASS 5.10s        SKIP                SKIP                                    PASS 5.52s                 PASS 30.63s
Response Vision (stream=true)       PASS 7.62s   PASS 8.67s                              PASS 4.27s        PASS 4.98s        SKIP                SKIP                                    PASS 3.07s                 PASS 42.85s
Response Tools (stream=false)       PASS 6.03s   PASS 8.21s                              PASS 4.16s        PASS 7.69s        PASS 2.61s          PASS 4.28s                              PASS 3.52s                 PASS 8.67s
Response Tools (stream=true)        PASS 8.13s   PASS 6.76s                              PASS 4.70s        PASS 2.53s        PASS 3.73s          PASS 5.20s                              PASS 2.70s                 PASS 11.90s
Response Structured (stream=false)  PASS 6.50s   PASS 11.52s                             PASS 4.32s        PASS 8.05s        PASS 6.02s          FAIL status 400 Bad Request: {"error"…  PASS 2.76s                 PASS 23.83s
Response Structured (stream=true)   PASS 4.25s   PASS 6.12s                              PASS 5.14s        PASS 4.97s        PASS 3.60s          FAIL status 400 Bad Request: {"error"…  PASS 1.74s                 PASS 26.90s
Claude (stream=false)               PASS 3.70s   PASS 7.73s                              PASS 4.16s        PASS 8.12s        PASS 4.20s          PASS 3.56s                              PASS 6.54s                 PASS 6.79s
Claude (stream=true)                PASS 3.67s   PASS 9.14s                              PASS 7.58s        PASS 4.26s        PASS 4.18s          PASS 4.68s                              PASS 3.71s                 PASS 9.45s
Claude Tools (stream=false)         PASS 5.20s   PASS 12.11s                             PASS 7.30s        PASS 1.85s        PASS 5.79s          PASS 6.70s                              PASS 1.76s                 PASS 9.87s
Claude Tools (stream=true)          PASS 2.50s   PASS 9.40s                              PASS 9.78s        PASS 7.23s        PASS 2.23s          PASS 4.74s                              PASS 2.65s                 PASS 8.85s
Claude Structured (stream=false)    PASS 5.19s   PASS 17.93s                             PASS 3.50s        PASS 6.09s        PASS 4.75s          PASS 5.64s                              PASS 5.13s                 FAIL structured output fields missing
Claude Structured (stream=true)     PASS 1.53s   FAIL stream missing structured output…  PASS 3.50s        PASS 5.22s        PASS 1.85s          PASS 3.93s                              PASS 1.49s                 FAIL stream missing structured output…

Totals  | Requests: 160 | Passed: 151 | Failed: 5 | Skipped: 4

Failures:
- azure-gpt-5-nano · Claude Structured (stream=false) → structured output fields missing
- azure-gpt-5-nano · Claude Structured (stream=true) → stream missing structured output fields
- deepseek-chat · Response Structured (stream=false) → status 400 Bad Request: {"error":{"message":"This response_format type is unavailable now (request id: 202510240239395607035417250974)","type":"invalid_request_error","param":"","code":"invalid_reques…
- deepseek-chat · Response Structured (stream=true) → status 400 Bad Request: {"error":{"message":"This response_format type is unavailable now (request id: 202510240239394051289902547661)","type":"invalid_request_error","param":"","code":"invalid_reques…
- gpt-5-mini · Claude Structured (stream=true) → stream missing structured output fields

Skipped (unsupported combinations):
- deepseek-chat · Response Vision (stream=false) → vision input unsupported by model deepseek-chat
- deepseek-chat · Response Vision (stream=true) → vision input unsupported by model deepseek-chat
- openai/gpt-oss-20b · Response Vision (stream=false) → vision input unsupported by model openai/gpt-oss-20b
- openai/gpt-oss-20b · Response Vision (stream=true) → vision input unsupported by model openai/gpt-oss-20b

Why this fork exists

The original author stopped maintaining the project, leaving critical PRs and new features unaddressed. As a long‑time contributor, I’ve forked the repository and rebuilt the core to keep the ecosystem alive and evolving.

What’s new

  • 🔧 Complete billing overhaul – per‑channel pricing for the same model, discounted rates for cached inputs, and transparent usage reports.
  • 🖥️ Refreshed UI/UX – a fully rewritten front‑end that makes tenant, quota, and cost management a breeze.
  • 🔀 Transparent API conversion – send a request in ChatCompletion, Response, or Claude Messages format and One‑API will automatically translate it to the target provider’s native schema (see the sketch after this list).
  • 🔄 Drop‑in database compatibility – the original One‑API schema is fully supported, so you can migrate without data loss or schema changes.
  • 🐳 Multi‑architecture support – runs on Linux x86_64, ARM64, and Windows out of the box.
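
For example, a Claude Messages payload can target a non-Anthropic model, and One‑API translates it to the provider's native schema. A hedged sketch (model name taken from this README's examples):

POST /v1/messages
{
  "model": "gemini-2.5-flash",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "Hello"}]
}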

Docker images available on Docker Hub:

  • ppcelery/one-api:latest
  • ppcelery/one-api:arm64-latest

One‑API empowers developers to:

  • Centralize tenant management and quota control.
  • Route AI calls to the right model with a single endpoint.
  • Track usage and costs in real time.

You are also welcome to register for and use my deployed one-api gateway, which supports various mainstream models. For usage instructions, please refer to https://wiki.laisky.com/projects/gpt/pay/.

Tutorial

Docker Compose Deployment

Run one-api with Docker Compose. The snippet below defines the oneapi service; place it under the services: key of your docker-compose.yml:

oneapi:
  image: ppcelery/one-api:latest
  restart: unless-stopped
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
  environment:
    # --- Session & Security ---
    # (optional) SESSION_SECRET set a fixed session secret so that user sessions won't be invalidated after server restart
    SESSION_SECRET: xxxxxxx
    # (optional) ENABLE_COOKIE_SECURE enable secure cookies, must be used with HTTPS
    ENABLE_COOKIE_SECURE: "true"
    # (optional) COOKIE_MAXAGE_HOURS sets the session cookie's max age in hours. Default is `168` (7 days); adjust to control session lifetime.
    COOKIE_MAXAGE_HOURS: 168

    # --- Core Runtime ---
    # (optional) PORT override the listening port used by the HTTP server, default is `3000`
    PORT: 3000
    # (optional) GIN_MODE set Gin runtime mode; defaults to release when unset
    GIN_MODE: release
    # (optional) SHUTDOWN_TIMEOUT_SEC how long (seconds) to wait for graceful shutdown and connection draining
    SHUTDOWN_TIMEOUT_SEC: 360
    # (optional) DEBUG enable debug mode
    DEBUG: "true"
    # (optional) DEBUG_SQL display SQL logs
    DEBUG_SQL: "true"
    # (optional) ENABLE_PROMETHEUS_METRICS expose /metrics for Prometheus scraping when true
    ENABLE_PROMETHEUS_METRICS: "true"
    # (optional) LOG_RETENTION_DAYS set log retention days; default is not to delete any logs
    LOG_RETENTION_DAYS: 7
    # (optional) TRACE_RENTATION_DAYS retain trace records for the specified number of days; default is 30 and 0 disables cleanup
    TRACE_RENTATION_DAYS: 30

    # --- Storage & Cache ---
    # (optional) SQL_DSN set SQL database connection; leave empty to use SQLite (supports mysql, postgresql, sqlite3)
    SQL_DSN: "postgres://laisky:xxxxxxx@1.2.3.4/oneapi"
    # (optional) SQLITE_PATH override SQLite file path when SQL_DSN is empty
    SQLITE_PATH: "/data/one-api.db"
    # (optional) SQL_MAX_IDLE_CONNS tune database idle connection pool size
    SQL_MAX_IDLE_CONNS: 200
    # (optional) SQL_MAX_OPEN_CONNS tune database max open connections
    SQL_MAX_OPEN_CONNS: 2000
    # (optional) SQL_MAX_LIFETIME tune database connection lifetime in seconds
    SQL_MAX_LIFETIME: 300
    # (optional) REDIS_CONN_STRING set Redis cache connection
    REDIS_CONN_STRING: redis://100.122.41.16:6379/1
    # (optional) REDIS_PASSWORD set Redis password when authentication is required
    REDIS_PASSWORD: ""
    # (optional) SYNC_FREQUENCY refresh in-memory caches every N seconds when enabled
    SYNC_FREQUENCY: 600
    # (optional) MEMORY_CACHE_ENABLED force memory cache usage even without Redis
    MEMORY_CACHE_ENABLED: "true"

    # --- Usage & Billing ---
    # (optional) ENFORCE_INCLUDE_USAGE require upstream API responses to include usage field
    ENFORCE_INCLUDE_USAGE: "true"
    # (optional) PRECONSUME_TOKEN_FOR_BACKGROUND_REQUEST reserve quota for background requests that report usage later
    PRECONSUME_TOKEN_FOR_BACKGROUND_REQUEST: 15000
    # (optional) DEFAULT_MAX_TOKEN set the default maximum number of tokens for requests, default is 2048
    DEFAULT_MAX_TOKEN: 2048
    # (optional) DEFAULT_USE_MIN_MAX_TOKENS_MODEL opt-in to the min/max token contract for supported channels
    DEFAULT_USE_MIN_MAX_TOKENS_MODEL: "false"

    # --- Rate Limiting ---
    # (optional) GLOBAL_API_RATE_LIMIT maximum API requests per IP within three minutes, default is 1000
    GLOBAL_API_RATE_LIMIT: 1000
    # (optional) GLOBAL_WEB_RATE_LIMIT maximum web page requests per IP within three minutes, default is 1000
    GLOBAL_WEB_RATE_LIMIT: 1000
    # (optional) GLOBAL_RELAY_RATE_LIMIT per-token rate limit for the /v1 relay APIs
    GLOBAL_RELAY_RATE_LIMIT: 1000
    # (optional) GLOBAL_CHANNEL_RATE_LIMIT whether to rate limit per channel; 0 is unlimited, 1 enables rate limiting
    GLOBAL_CHANNEL_RATE_LIMIT: 1
    # (optional) CRITICAL_RATE_LIMIT tighten rate limits for admin-only APIs (seconds window matches defaults)
    CRITICAL_RATE_LIMIT: 20

    # --- Channel Automation ---
    # (optional) CHANNEL_SUSPEND_SECONDS_FOR_429 set the suspension duration (seconds) after receiving a 429 error, default is 60 seconds
    CHANNEL_SUSPEND_SECONDS_FOR_429: 60
    # (optional) CHANNEL_TEST_FREQUENCY run automatic channel health checks every N seconds (0 disables)
    CHANNEL_TEST_FREQUENCY: 0
    # (optional) BATCH_UPDATE_ENABLED enable background batch quota updater
    BATCH_UPDATE_ENABLED: "false"
    # (optional) BATCH_UPDATE_INTERVAL batch quota flush interval in seconds
    BATCH_UPDATE_INTERVAL: 5

    # --- Frontend & Proxies ---
    # (optional) FRONTEND_BASE_URL redirect page requests to specified address, server-side setting only
    FRONTEND_BASE_URL: https://oneapi.laisky.com
    # (optional) RELAY_PROXY forward upstream model calls through an HTTP proxy
    RELAY_PROXY: ""
    # (optional) USER_CONTENT_REQUEST_PROXY proxy for fetching user-provided assets
    USER_CONTENT_REQUEST_PROXY: ""
    # (optional) USER_CONTENT_REQUEST_TIMEOUT timeout (seconds) for fetching user assets
    USER_CONTENT_REQUEST_TIMEOUT: 30

    # --- Media & Pagination ---
    # (optional) MAX_ITEMS_PER_PAGE maximum items per page, default is 100
    MAX_ITEMS_PER_PAGE: 100
    # (optional) MAX_INLINE_IMAGE_SIZE_MB set the maximum allowed image size (in MB) for inlining images as base64, default is 30
    MAX_INLINE_IMAGE_SIZE_MB: 30

    # --- Integrations ---
    # (optional) OPENROUTER_PROVIDER_SORT set sorting method for OpenRouter Providers, default is throughput
    OPENROUTER_PROVIDER_SORT: throughput
    # (optional) LOG_PUSH_API set the API address for pushing error logs to external services
    LOG_PUSH_API: "https://gq.laisky.com/query/"
    LOG_PUSH_TYPE: "oneapi"
    LOG_PUSH_TOKEN: "xxxxxxx"

  volumes:
    - /var/lib/oneapi:/data
  ports:
    - 3000:3000
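
Then start the service with a standard Compose invocation:

docker compose up -d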

The initial default account and password are root / 123456. The listening port can be configured via the PORT environment variable; the default is 3000.

Tip

For production environments, consider using proper secret management solutions instead of hardcoding sensitive values in environment variables.

Kubernetes Deployment

The Kubernetes deployment guide has been moved into a dedicated document.

New Features

Universal Features

Support update user's remained quota

You can update the used quota using the API key of any token, allowing consumption from other services to be aggregated into one-api for centralized management.

Get request's cost

Each chat completion request will include an X-Oneapi-Request-Id in the returned headers. You can use this request id to call GET /api/cost/request/:request_id and retrieve the cost of that request.

The returned structure is:

type UserRequestCost struct {
  Id          int     `json:"id"`
  CreatedTime int64   `json:"created_time" gorm:"bigint"`
  UserID      int     `json:"user_id"`
  RequestID   string  `json:"request_id"`
  Quota       int64   `json:"quota"`
  CostUSD     float64 `json:"cost_usd" gorm:"-"`
}
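
A sketch of the lookup flow, assuming the endpoint returns the struct above as plain JSON; it requires the "encoding/json" and "net/http" imports, and baseURL/token are placeholders for your deployment:

// fetchRequestCost looks up the cost of an earlier request using the
// X-Oneapi-Request-Id header captured from that request's response.
func fetchRequestCost(baseURL, token, requestID string) (*UserRequestCost, error) {
	req, err := http.NewRequest("GET", baseURL+"/api/cost/request/"+requestID, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var cost UserRequestCost
	if err := json.NewDecoder(resp.Body).Decode(&cost); err != nil {
		return nil, err
	}
	return &cost, nil
}

For example, after a chat completion returns, call fetchRequestCost(baseURL, token, resp.Header.Get("X-Oneapi-Request-Id")).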

Support Tracing info in logs

Support Cached Input

Now supports cached input, which can significantly reduce costs.

Support Anthropic Prompt caching

Automatically Enable Thinking and Customize Reasoning Format via URL Parameters

Supports two URL parameters: thinking and reasoning_format; an example follows the list.

  • thinking: Whether to enable thinking mode, disabled by default.
  • reasoning_format: Specifies the format of the returned reasoning.
    • reasoning_content: DeepSeek official API format, returned in the reasoning_content field.
    • reasoning: OpenRouter format, returned in the reasoning field.
    • thinking: Claude format, returned in the thinking field.
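
For example, to enable thinking and receive DeepSeek-style reasoning, append both parameters to the chat endpoint (a hedged sketch; the path is the standard ChatCompletion route):

POST /v1/chat/completions?thinking=true&reasoning_format=reasoning_content
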
Reasoning Format - reasoning-content

Reasoning Format - reasoning

Reasoning Format - thinking

OpenAI Features

(Merged) Support gpt-vision

Support openai images edits

Support OpenAI o1/o1-mini/o1-preview

Support gpt-4o-audio

Support OpenAI web search models

Supports gpt-4o-search-preview & gpt-4o-mini-search-preview

Support gpt-image-1's image generation & edits

Support o3-mini & o3 & o4-mini & gpt-4.1 & o3-pro & reasoning content

Support OpenAI Response API

Partially supported, still in development.
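
A minimal request sketch, assuming the upstream OpenAI Responses schema is passed through unchanged (model is a placeholder):

POST /v1/responses
{"model": "gpt-4o-mini", "input": "Write a haiku about gateways."}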

Support gpt-5 family

gpt-5-chat-latest / gpt-5 / gpt-5-mini / gpt-5-nano / gpt-5-codex / gpt-5-pro

Support o3-deep-research & o4-mini-deep-research

Support Codex Cli

# vi $HOME/.codex/config.toml

model = "gemini-2.5-flash"
model_provider = "laisky"

[model_providers.laisky]
# Name of the provider that will be displayed in the Codex UI.
name = "Laisky"
# The path `/chat/completions` will be appended to this URL to make the POST
# request for the chat completions.
base_url = "https://oneapi.laisky.com/v1"
# If `env_key` is set, identifies an environment variable that must be set when
# using Codex with this provider. The value of the environment variable must be
# non-empty and will be used in the `Bearer TOKEN` HTTP header for the POST request.
# e.g. run `export ONEAPI_API_KEY="sk-xxxxxxx"` before launching Codex; the
# variable name is a placeholder of your choosing.
env_key = "ONEAPI_API_KEY"
# Valid values for wire_api are "chat" and "responses". Defaults to "chat" if omitted.
wire_api = "responses"
# If necessary, extra query params that need to be added to the URL.
# See the Azure example below.
query_params = {}

Anthropic (Claude) Features

(Merged) Support aws claude

Support claude-3-7-sonnet & thinking

By default, thinking mode is not enabled. You need to manually pass the thinking field in the request body to enable it, as in the sketch below.
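
A request-body sketch, assuming the field follows Anthropic's extended-thinking schema (budget_tokens must be at least 1024 and below max_tokens):

{
  "model": "claude-3-7-sonnet",
  "max_tokens": 2048,
  "thinking": {"type": "enabled", "budget_tokens": 1024},
  "messages": [{"role": "user", "content": "Why is the sky blue?"}]
}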

Stream

Non-Stream

Support /v1/messages Claude Messages API

Support Claude Code
export ANTHROPIC_MODEL="openai/gpt-oss-120b"
export ANTHROPIC_BASE_URL="https://oneapi.laisky.com/"
export ANTHROPIC_AUTH_TOKEN="sk-xxxxxxx"

You can use any model you like for Claude Code, even if the model doesn’t natively support the Claude Messages API.

Support Claude 4.x Models

claude-opus-4-0 / claude-opus-4-1 / claude-sonnet-4-0 / claude-sonnet-4-5 / claude-haiku-4-5

Google (Gemini & Vertex) Features

Support gemini-2.0-flash-exp

Support gemini-2.0-flash

Support gemini-2.0-flash-thinking-exp-01-21

Support Vertex Imagen3

Support gemini multimodal output #2197

Support gemini-2.5-pro

Support GCP Vertex global region and gemini-2.5-pro-preview-06-05

Support gemini-2.5-flash-image-preview & imagen-4 series

OpenCode Support

opencode.ai is an AI coding agent built for the terminal. OpenCode is fully open source, giving you control and freedom to use any provider, any model, and any editor. It's available as both a CLI and TUI.

One‑API integrates seamlessly with OpenCode: you can connect any One‑API endpoint and use all your unified models through OpenCode's interface (both CLI and TUI).

To get started, create or edit ~/.config/opencode/opencode.json like this:

Using OpenAI SDK:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "one-api": {
      "npm": "@ai-sdk/openai",
      "name": "One API",
      "options": {
        "baseURL": "https://oneapi.laisky.com/v1",
        "apiKey": "<ONEAPI_TOKEN_KEY>"
      },
      "models": {
        "gpt-4.1-2025-04-14": {
          "name": "GPT 4.1"
        }
      }
    }
  }
}

Using Anthropic SDK:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "one-api-anthropic": {
      "npm": "@ai-sdk/anthropic",
      "name": "One API (Anthropic)",
      "options": {
        "baseURL": "https://oneapi.laisky.com/v1",
        "apiKey": "<ONEAPI_TOKEN_KEY>"
      },
      "models": {
        "claude-sonnet-4-5": {
          "name": "Claude Sonnet 4.5"
        }
      }
    }
  }
}

AWS Features

Support AWS cross-region inferences

Support AWS BedRock Inference Profile

Replicate Features

Support replicate flux & remix

Support replicate chat models

DeepSeek Features

Support deepseek-reasoner

OpenRouter Features

Support OpenRouter's reasoning content

By default, thinking mode is automatically enabled for the deepseek-r1 model, and the response is returned in the OpenRouter format.

Coze Features

Support coze oauth authentication

XAI / Grok Features

Support XAI/Grok Text & Image Models

Black Forest Labs Features

Support black-forest-labs/flux-kontext-pro

Bug Fixes & Enterprise-Grade Improvements (Including Security Enhancements)

Note

For additional enterprise-grade improvements, including security enhancements (e.g., vulnerability fixes), see the project's merged pull requests.
