API Documentation - Perplexity OSS

Overview

Perplexity OSS exposes OpenAI-compatible REST endpoints under /v1/*, so external applications can use it as a drop-in replacement for chat completion and search APIs.

Base URL: http://localhost:8003/v1 (adjust port/domain for your deployment)

Available Endpoints:

POST /v1/chat/completions - Chat with AI-powered search and answers
POST /v1/search - Search only (no AI answer generation)
GET /v1/models - List available models

Authentication

All API endpoints require Bearer token authentication.

Setup

Add API key(s) to your .env file:
```
API_KEYS=sk-your-secret-key-here
```
For multiple keys (comma-separated):
```
API_KEYS=sk-key-1,sk-key-2,sk-key-3
```
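As a rough illustration of the comma-separated format, the sketch below splits an API_KEYS value into individual keys. The helper name `parse_api_keys` is hypothetical and the whitespace handling is an assumption; the server's actual parsing may differ.

```python
import os
from typing import Set


def parse_api_keys(raw: str) -> Set[str]:
    """Split a comma-separated key string, ignoring blanks and stray whitespace."""
    return {key.strip() for key in raw.split(",") if key.strip()}


# Read the same variable the .env file sets; an unset variable yields an empty set.
configured_keys = parse_api_keys(os.environ.get("API_KEYS", ""))
```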

Usage

Include the API key in the Authorization header:

Authorization: Bearer sk-your-secret-key-here
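To make the header format concrete, here is a minimal sketch of how a Bearer token could be pulled out of the Authorization header and checked against a set of configured keys. The function names are hypothetical, and this is not necessarily how the server validates requests.

```python
import hmac
from typing import Iterable, Optional


def extract_bearer_token(header_value: Optional[str]) -> Optional[str]:
    """Return the token from 'Bearer <token>', or None if the header is malformed."""
    if not header_value:
        return None
    scheme, _, token = header_value.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorized(header_value: Optional[str], valid_keys: Iterable[str]) -> bool:
    """Check the presented token against each configured key."""
    token = extract_bearer_token(header_value)
    if token is None:
        return False
    # compare_digest keeps each comparison constant-time for equal-length inputs.
    return any(hmac.compare_digest(token.encode(), key.encode()) for key in valid_keys)
```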

Endpoints

POST /v1/chat/completions

Main chat completion endpoint. Supports both streaming and non-streaming responses.

Request

POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer sk-your-secret-key-here

Body:

{
  "model": "default",
  "messages": [
    {"role": "user", "content": "What is quantum computing?"}
  ],
  "stream": false,
  "return_images": false,
  "return_related_questions": false,
  "search_domain_filter": ["arxiv.org"],
  "search_recency_filter": "week",
  "pro_search": false,
  "max_results": 6
}

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | `"default"` | Model identifier (currently ignored, uses backend default) |
| `messages` | array | required | Array of message objects with `role` and `content` |
| `stream` | boolean | `false` | Enable streaming responses (SSE) |
| `return_images` | boolean | `false` | Include image URLs in response |
| `return_related_questions` | boolean | `false` | Include related follow-up questions |
| `search_domain_filter` | array | `null` | Limit search to specific domains (e.g. `["reddit.com"]`) |
| `search_recency_filter` | string | `null` | Time range filter: `"day"`, `"week"`, `"month"`, or `"year"` |
| `pro_search` | boolean | `false` | Enable multi-step reasoning (requires pro mode enabled) |
| `max_results` | integer | `6` | Number of search results to use (1-20) |
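The sketch below shows one way a client might assemble a request body from these parameters and catch out-of-range values before sending. `build_chat_request` is a hypothetical helper, and the client-side checks simply mirror the documented ranges; the server may validate differently.

```python
from typing import List, Optional

ALLOWED_RECENCY = {"day", "week", "month", "year"}


def build_chat_request(
    question: str,
    *,
    stream: bool = False,
    max_results: int = 6,
    search_recency_filter: Optional[str] = None,
    search_domain_filter: Optional[List[str]] = None,
) -> dict:
    """Build a /v1/chat/completions body, omitting optional filters that are unset."""
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")
    if search_recency_filter is not None and search_recency_filter not in ALLOWED_RECENCY:
        raise ValueError("search_recency_filter must be day, week, month, or year")

    body = {
        "model": "default",
        "messages": [{"role": "user", "content": question}],
        "stream": stream,
        "max_results": max_results,
    }
    if search_recency_filter is not None:
        body["search_recency_filter"] = search_recency_filter
    if search_domain_filter:
        body["search_domain_filter"] = list(search_domain_filter)
    return body
```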

Message Roles:

system - System instructions (treated as assistant context)
user - User messages
assistant - Assistant responses

Response (Non-Streaming)

{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "default",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 150,
    "total_tokens": 250
  },
  "search_results": [
    {
      "title": "Quantum Computing Basics",
      "url": "https://example.com/quantum",
      "content": "Summary of the article..."
    }
  ],
  "related_questions": [
    "How does quantum entanglement work?",
    "What are quantum algorithms?",
    "What is quantum supremacy?"
  ],
  "images": [
    "https://example.com/image1.jpg"
  ]
}

Additional Fields (Perplexity OSS Extensions):

search_results - Array of search result objects (present only if results were found)
related_questions - Array of follow-up questions (present only if return_related_questions: true)
images - Array of image URLs (present only if return_images: true)
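Because these extension fields may be absent, client code should not index them directly. A minimal sketch, assuming the response has already been decoded into a plain dict (the helper name `read_extensions` is hypothetical):

```python
def read_extensions(response: dict) -> dict:
    """Return the extension fields, substituting empty lists for any that are missing."""
    return {
        "search_results": response.get("search_results", []),
        "related_questions": response.get("related_questions", []),
        "images": response.get("images", []),
    }
```

With this, code such as `for result in read_extensions(data)["search_results"]` runs unchanged whether or not the server included the field.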

Response (Streaming)

Streaming responses use the Server-Sent Events (SSE) format with the text/event-stream content type.

Event Format:

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":123,"model":"default","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":123,"model":"default","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":123,"model":"default","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":123,"model":"default","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Delta Object:

First chunk includes role
Subsequent chunks include content with text fragments
Final chunk includes finish_reason: "stop"
Stream ends with data: [DONE]
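For clients that do not use the OpenAI SDK, the chunk sequence described above can be reassembled by hand. The sketch below is illustrative only: `collect_stream_text` is a hypothetical helper that takes already-decoded text lines (for example, from an HTTP client's line iterator) and joins the content fragments.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content fragments from SSE 'data:' lines until [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # the role-only first chunk and the final chunk carry no content
                parts.append(text)
    return "".join(parts)
```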

POST /v1/search

Perplexity-compatible search endpoint. Returns ranked search results without AI-generated answers.

Request

POST /v1/search
Content-Type: application/json
Authorization: Bearer sk-your-secret-key-here

Body:

{
  "query": "latest AI developments 2024",
  "max_results": 10,
  "search_domain_filter": ["science.org", "arxiv.org"],
  "max_tokens_per_page": 1024,
  "country": "US"
}

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string or array | required | Search query or list of queries (max 5) |
| `max_results` | integer | `10` | Maximum number of results to return (1-20) |
| `search_domain_filter` | array | `null` | Limit search to specific domains (max 20) |
| `max_tokens_per_page` | integer | `1024` | Accepted but not used (SearXNG limitation) |
| `country` | string | `null` | Accepted but not used (SearXNG limitation) |

Note on Multi-Query Search:

If query is an array, only the first query is executed because of SearXNG limitations.
Example: ["query 1", "query 2"] → only "query 1" is searched.
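The behavior described in this note can be pictured as follows. This is a sketch of the documented rule, not the server's code; `effective_query` is a hypothetical name, and rejecting empty or oversized lists is an assumption based on the "max 5" limit in the parameter table.

```python
from typing import Sequence, Union


def effective_query(query: Union[str, Sequence[str]]) -> str:
    """Return the single query that would actually be searched."""
    if isinstance(query, str):
        return query
    if len(query) == 0:
        raise ValueError("query list must not be empty")
    if len(query) > 5:
        raise ValueError("at most 5 queries are accepted")
    # Only the first entry is executed; the rest are ignored.
    return query[0]
```

Clients that need results for several queries can send one request per query instead of relying on the array form.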

Response

{
  "results": [
    {
      "title": "Understanding Artificial Intelligence",
      "url": "https://science.org/article/ai-developments",
      "snippet": "Recent advances in AI technology...",
      "date": null,
      "last_updated": null
    },
    {
      "title": "Machine Learning Breakthroughs",
      "url": "https://arxiv.org/abs/2024.12345",
      "snippet": "A comprehensive survey of ML techniques...",
      "date": null,
      "last_updated": null
    }
  ]
}

Note: date and last_updated fields are always null because SearXNG doesn't provide this information.

GET /v1/models

List available models.

Request

GET /v1/models
Authorization: Bearer sk-your-secret-key-here

Response

{
  "object": "list",
  "data": [
    {
      "id": "default",
      "object": "model",
      "created": 1234567890,
      "owned_by": "perplexity-oss"
    }
  ]
}

Examples

cURL - Basic Request

curl http://localhost:8003/v1/chat/completions \
  -H "Authorization: Bearer sk-your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [
      {"role": "user", "content": "Explain AI in simple terms"}
    ]
  }'

cURL - Streaming

curl http://localhost:8003/v1/chat/completions \
  -H "Authorization: Bearer sk-your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": true
  }'

cURL - With Filters

curl http://localhost:8003/v1/chat/completions \
  -H "Authorization: Bearer sk-your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [
      {"role": "user", "content": "Latest AI research"}
    ],
    "search_recency_filter": "week",
    "search_domain_filter": ["arxiv.org", "paperswithcode.com"],
    "return_related_questions": true
  }'

cURL - Search Only

curl http://localhost:8003/v1/search \
  -H "Authorization: Bearer sk-your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "quantum computing breakthrough",
    "max_results": 5,
    "search_domain_filter": ["science.org", "nature.com"]
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-secret-key-here",
    base_url="http://localhost:8003/v1"
)

response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "user", "content": "What is quantum computing?"}
    ],
    extra_body={
        "search_recency_filter": "month",
        "return_related_questions": True
    }
)

print(response.choices[0].message.content)

Python (Streaming)

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-secret-key-here",
    base_url="http://localhost:8003/v1"
)

stream = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "user", "content": "Explain neural networks"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

JavaScript/TypeScript

const response = await fetch('http://localhost:8003/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-your-secret-key-here',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'default',
    messages: [
      { role: 'user', content: 'What is deep learning?' }
    ],
    return_related_questions: true
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);

Search Features

Domain Filtering

Limit search results to specific domains using search_domain_filter:

{
  "search_domain_filter": ["reddit.com", "stackoverflow.com"]
}

This adds site:domain.com operators to the search query.
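A minimal sketch of what such a rewrite could look like. How the backend actually combines the operators is not stated here, so joining them with OR inside parentheses is an assumption, as is the helper name `apply_domain_filter`.

```python
from typing import List, Optional


def apply_domain_filter(query: str, domains: Optional[List[str]]) -> str:
    """Append site: operators for each domain; return the query unchanged if none are given."""
    if not domains:
        return query
    # Assumption: alternatives are OR-ed so a result from any listed domain matches.
    site_clause = " OR ".join(f"site:{domain}" for domain in domains)
    return f"{query} ({site_clause})"
```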

Time Range Filtering

Filter results by recency using search_recency_filter:

"day" - Last 24 hours
"week" - Last 7 days
"month" - Last 30 days
"year" - Last 365 days

{
  "search_recency_filter": "week"
}
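The four accepted values map onto fixed windows, which can be expressed as a small lookup. The names `RECENCY_WINDOWS` and `recency_window` are hypothetical and only illustrate the mapping listed above.

```python
from datetime import timedelta
from typing import Optional

# Window lengths as documented for each search_recency_filter value.
RECENCY_WINDOWS = {
    "day": timedelta(days=1),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def recency_window(value: Optional[str]) -> Optional[timedelta]:
    """Translate a filter value into a time window; None means no time restriction."""
    if value is None:
        return None
    if value not in RECENCY_WINDOWS:
        raise ValueError(f"unsupported search_recency_filter: {value!r}")
    return RECENCY_WINDOWS[value]
```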

Pro Search Mode

Enable multi-step reasoning for complex queries:

{
  "pro_search": true
}

Note: Requires NEXT_PUBLIC_PRO_MODE_ENABLED=true in backend configuration.

Pro search automatically:

Breaks down complex queries into steps
Generates targeted search queries for each step
Synthesizes information from multiple searches
Provides comprehensive answers

Error Handling

Error Response Format

{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": 401
  }
}

Common Error Codes

| Code | Type | Description |
|---|---|---|
| 401 | `invalid_request_error` | Missing or invalid API key |
| 400 | `invalid_request_error` | Invalid request format or parameters |
| 500 | `internal_error` | Server error during processing |
| 503 | `service_unavailable` | API keys not configured on server |
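On the client side, the error body can be turned into an exception that carries the documented fields. This is a sketch under the assumption that the body has already been decoded from JSON; `ApiError` and `parse_error` are hypothetical names, and the fallback values are illustrative.

```python
class ApiError(Exception):
    """Client-side representation of the documented error object."""

    def __init__(self, status: int, message: str, error_type: str):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.message = message
        self.error_type = error_type


def parse_error(http_status: int, body: dict) -> ApiError:
    """Build an ApiError from a decoded error response, tolerating missing fields."""
    error = body.get("error") or {}
    return ApiError(
        status=error.get("code", http_status),
        message=error.get("message", "Unknown error"),
        error_type=error.get("type", "unknown_error"),
    )
```

A caller would typically check the HTTP status first and call `raise parse_error(status, body)` for any non-2xx response.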

Rate Limiting

Rate limiting is not currently implemented. Add your own before deploying to production.
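One common way to add a limit is a token bucket per API key. The sketch below is a generic, in-process illustration and is not part of Perplexity OSS; the `TokenBucket` class and its parameters are hypothetical, and a multi-process deployment would need shared state or a limit enforced at a proxy instead.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Keeping one bucket per key (for example, in a dict keyed by the Bearer token) lets each client have its own budget.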

CORS

CORS is enabled for all origins by default. Modify main.py to restrict origins in production:

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-domain.com"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

Differences from OpenAI API

Supported

Standard message format with roles
Streaming and non-streaming modes
Bearer token authentication
Compatible with OpenAI Python SDK

Not Supported

temperature, top_p, max_tokens (backend defaults are used)
Function calling / tools
Image inputs (vision)
Audio/TTS
Embeddings

Additional Features (Extensions)

Search domain filtering (search_domain_filter)
Time range filtering (search_recency_filter)
Pro search mode (pro_search)
Search results in response (search_results)
Related questions (related_questions)
Image URLs (images)
