Perplexity OSS provides OpenAI-compatible REST endpoints under `/v1/*`, so external applications can use it as a drop-in replacement for chat completion and search APIs.

Base URL: `http://localhost:8003/v1` (adjust the port or domain for your deployment)

Available Endpoints:

- `POST /v1/chat/completions`: chat with AI-powered search and answers
- `POST /v1/search`: search only (no AI answer generation)
- `GET /v1/models`: list available models
All API endpoints require Bearer token authentication.

- Add one or more API keys to your `.env` file:

  ```
  API_KEYS=sk-your-secret-key-here
  ```

- For multiple keys, separate them with commas:

  ```
  API_KEYS=sk-key-1,sk-key-2,sk-key-3
  ```
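How the server parses `API_KEYS` is not shown in this document. As an illustration only, here is a minimal sketch of splitting a comma-separated value and checking a Bearer header against it. `load_api_keys` and `is_authorized` are hypothetical names, and trimming whitespace around each key is an assumption.

```python
import hmac
import os
from typing import Optional


def load_api_keys(env_value: Optional[str] = None) -> set[str]:
    """Hypothetical: parse a comma-separated API_KEYS value into a set of keys.

    Falls back to the API_KEYS environment variable; empty entries are dropped
    and surrounding whitespace is trimmed (an assumption, not documented).
    """
    raw = os.environ.get("API_KEYS", "") if env_value is None else env_value
    return {key.strip() for key in raw.split(",") if key.strip()}


def is_authorized(auth_header: Optional[str], keys: set[str]) -> bool:
    """Hypothetical: check an 'Authorization: Bearer <key>' header value."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    token = auth_header[len("Bearer "):].strip()
    # compare_digest avoids leaking key contents through timing differences;
    # encoding to bytes keeps it safe for non-ASCII input.
    return any(hmac.compare_digest(token.encode(), key.encode()) for key in keys)
```

A real server would presumably treat an empty key set as "not configured" (the 503 case in the error table) rather than as an ordinary authentication failure.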
Include the API key in the `Authorization` header of every request:

```http
Authorization: Bearer sk-your-secret-key-here
```

`POST /v1/chat/completions` is the main chat completion endpoint. It supports both streaming and non-streaming responses.

```http
POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer sk-your-secret-key-here
```

Body:

```json
{
  "model": "default",
  "messages": [
    {"role": "user", "content": "What is quantum computing?"}
  ],
  "stream": false,
  "return_images": false,
  "return_related_questions": false,
  "search_domain_filter": ["arxiv.org"],
  "search_recency_filter": "week",
  "pro_search": false,
  "max_results": 6
}
```

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | `"default"` | Model identifier (currently ignored; the backend default is used) |
| `messages` | array | required | Array of message objects, each with `role` and `content` |
| `stream` | boolean | `false` | Enable streaming responses (SSE) |
| `return_images` | boolean | `false` | Include image URLs in the response |
| `return_related_questions` | boolean | `false` | Include related follow-up questions |
| `search_domain_filter` | array | `null` | Limit search to specific domains (e.g. `["reddit.com"]`) |
| `search_recency_filter` | string | `null` | Time range filter: `"day"`, `"week"`, `"month"`, or `"year"` |
| `pro_search` | boolean | `false` | Enable multi-step reasoning (requires pro mode to be enabled) |
| `max_results` | integer | `6` | Number of search results to use (1-20) |
Message Roles:

- `system`: system instructions (treated as assistant context)
- `user`: user messages
- `assistant`: assistant responses
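To make the documented constraints concrete, here is a minimal client-side sketch that validates a subset of the parameters above (roles, `max_results` range, recency values) and builds the request with only the Python standard library. `build_chat_request` is a hypothetical helper, not part of Perplexity OSS; any other documented field (e.g. `return_related_questions`) can be passed through as a keyword argument.

```python
import json
import urllib.request
from typing import Any, Optional

ROLES = {"system", "user", "assistant"}
RECENCY_VALUES = {"day", "week", "month", "year"}


def build_chat_request(
    base_url: str,
    api_key: str,
    messages: list[dict[str, str]],
    *,
    max_results: int = 6,
    search_recency_filter: Optional[str] = None,
    search_domain_filter: Optional[list[str]] = None,
    stream: bool = False,
    **extra: Any,  # other documented fields, e.g. return_related_questions=True
) -> urllib.request.Request:
    """Hypothetical helper: validate against the documented limits, build a POST."""
    if not messages:
        raise ValueError("messages is required")
    for message in messages:
        if message.get("role") not in ROLES or "content" not in message:
            raise ValueError(f"invalid message: {message!r}")
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")
    if search_recency_filter is not None and search_recency_filter not in RECENCY_VALUES:
        raise ValueError(f"search_recency_filter must be one of {sorted(RECENCY_VALUES)}")

    body: dict[str, Any] = {
        "model": "default",
        "messages": messages,
        "stream": stream,
        "max_results": max_results,
        **extra,
    }
    # Only send optional filters when they are actually set.
    if search_recency_filter is not None:
        body["search_recency_filter"] = search_recency_filter
    if search_domain_filter:
        body["search_domain_filter"] = search_domain_filter

    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then `with urllib.request.urlopen(req) as resp: data = json.load(resp)`. Validating locally surfaces mistakes before a round trip, although the server remains the authority and returns 400 for invalid parameters.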
Non-streaming response:

```json
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "default",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 150,
    "total_tokens": 250
  },
  "search_results": [
    {
      "title": "Quantum Computing Basics",
      "url": "https://example.com/quantum",
      "content": "Summary of the article..."
    }
  ],
  "related_questions": [
    "How does quantum entanglement work?",
    "What are quantum algorithms?",
    "What is quantum supremacy?"
  ],
  "images": [
    "https://example.com/image1.jpg"
  ]
}
```

Additional Fields (Perplexity OSS Extensions):
- `search_results`: array of search result objects (present only if results were found)
- `related_questions`: array of follow-up questions (present only if `return_related_questions: true`)
- `images`: array of image URLs (present only if `return_images: true`)
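Because these extension fields are conditional, client code should not assume they exist. A minimal sketch of reading a parsed non-streaming response defensively; `read_chat_response` is a hypothetical name, and treating an explicit `null` like a missing field is a precaution rather than documented behavior.

```python
from typing import Any


def read_chat_response(response: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical: extract the answer plus optional Perplexity OSS extensions."""
    return {
        "answer": response["choices"][0]["message"]["content"],
        # search_results is omitted when no results were found; `or []` also
        # covers an explicit null.
        "sources": [item["url"] for item in response.get("search_results") or []],
        # These appear only when the matching return_* flag was requested.
        "related_questions": response.get("related_questions") or [],
        "images": response.get("images") or [],
    }
```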
Streaming response (`"stream": true`): Server-Sent Events (SSE) with the `text/event-stream` content type.

Event Format:

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":123,"model":"default","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":123,"model":"default","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":123,"model":"default","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":123,"model":"default","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
Delta Object:

- The first chunk includes `role`.
- Subsequent chunks include `content` with text fragments.
- The final chunk includes `finish_reason: "stop"`.
- The stream ends with `data: [DONE]`.
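The delta rules above translate into a small parser. A minimal sketch that reassembles the answer text from SSE lines, assuming (as in the example) that each event is a single `data:` line; `iter_sse_content` is a hypothetical name and not part of any SDK.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style chat completion SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators and other SSE fields carry no text
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel: stop reading
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk has only "role" and the last has an empty delta,
            # so only yield when there is actual content.
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

With `requests`, for example, you could feed it `response.iter_lines(decode_unicode=True)` from a request made with `stream=True`.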
`POST /v1/search` is a Perplexity-compatible search endpoint. It returns ranked search results without an AI-generated answer.

```http
POST /v1/search
Content-Type: application/json
Authorization: Bearer sk-your-secret-key-here
```

Body:

```json
{
  "query": "latest AI developments 2024",
  "max_results": 10,
  "search_domain_filter": ["science.org", "arxiv.org"],
  "max_tokens_per_page": 1024,
  "country": "US"
}
```

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string or array | required | Search query, or a list of queries (max 5) |
| `max_results` | integer | `10` | Maximum number of results to return (1-20) |
| `search_domain_filter` | array | `null` | Limit search to specific domains (max 20) |
| `max_tokens_per_page` | integer | `1024` | Accepted but not used (SearXNG limitation) |
| `country` | string | `null` | Accepted but not used (SearXNG limitation) |
Note on Multi-Query Search:

- If `query` is an array, only the first query is executed, due to SearXNG limitations.
- Example: `["query 1", "query 2"]` searches only `"query 1"`.
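One way to keep the multi-query limitation from surprising callers is to enforce it client-side. A minimal sketch under that assumption: `build_search_body` is a hypothetical helper that checks the documented limits and sends only the first query explicitly, emitting a warning when others are dropped (a design choice, not server behavior).

```python
import warnings
from typing import Any, Optional, Union


def build_search_body(
    query: Union[str, list[str]],
    max_results: int = 10,
    search_domain_filter: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical: build a /v1/search body that respects the documented limits."""
    queries = [query] if isinstance(query, str) else list(query)
    if not 1 <= len(queries) <= 5:
        raise ValueError("query must be a string or a list of 1-5 strings")
    if len(queries) > 1:
        # The server executes only the first query, so make that explicit.
        warnings.warn(f"only the first query ({queries[0]!r}) will be searched", stacklevel=2)
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")
    if search_domain_filter is not None and len(search_domain_filter) > 20:
        raise ValueError("search_domain_filter accepts at most 20 domains")

    body: dict[str, Any] = {"query": queries[0], "max_results": max_results}
    if search_domain_filter:
        body["search_domain_filter"] = list(search_domain_filter)
    return body
```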
Response:

```json
{
  "results": [
    {
      "title": "Understanding Artificial Intelligence",
      "url": "https://science.org/article/ai-developments",
      "snippet": "Recent advances in AI technology...",
      "date": null,
      "last_updated": null
    },
    {
      "title": "Machine Learning Breakthroughs",
      "url": "https://arxiv.org/abs/2024.12345",
      "snippet": "A comprehensive survey of ML techniques...",
      "date": null,
      "last_updated": null
    }
  ]
}
```

Note: the `date` and `last_updated` fields are always `null` because SearXNG does not provide this information.
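Since `date` and `last_updated` are always `null` here, typed client code should model them as optional. A minimal sketch using a dataclass; `SearchResult` and `parse_search_response` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SearchResult:
    title: str
    url: str
    snippet: str
    date: Optional[str] = None          # always None with the SearXNG backend
    last_updated: Optional[str] = None  # always None with the SearXNG backend


def parse_search_response(payload: dict[str, Any]) -> list[SearchResult]:
    """Hypothetical: convert a parsed /v1/search response into typed results."""
    return [
        SearchResult(
            title=item.get("title", ""),
            url=item["url"],
            snippet=item.get("snippet", ""),
            date=item.get("date"),
            last_updated=item.get("last_updated"),
        )
        for item in payload.get("results", [])
    ]
```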
`GET /v1/models` lists available models.

```http
GET /v1/models
Authorization: Bearer sk-your-secret-key-here
```

Response:

```json
{
  "object": "list",
  "data": [
    {
      "id": "default",
      "object": "model",
      "created": 1234567890,
      "owned_by": "perplexity-oss"
    }
  ]
}
```

Basic chat completion (cURL):

```bash
curl http://localhost:8003/v1/chat/completions \
  -H "Authorization: Bearer sk-your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [
      {"role": "user", "content": "Explain AI in simple terms"}
    ]
  }'
```

Streaming chat completion (cURL; `-N` disables output buffering so chunks print as they arrive):

```bash
curl -N http://localhost:8003/v1/chat/completions \
  -H "Authorization: Bearer sk-your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": true
  }'
```

Chat completion with search filters (cURL):

```bash
curl http://localhost:8003/v1/chat/completions \
  -H "Authorization: Bearer sk-your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [
      {"role": "user", "content": "Latest AI research"}
    ],
    "search_recency_filter": "week",
    "search_domain_filter": ["arxiv.org", "paperswithcode.com"],
    "return_related_questions": true
  }'
```

Search only (cURL):

```bash
curl http://localhost:8003/v1/search \
  -H "Authorization: Bearer sk-your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "quantum computing breakthrough",
    "max_results": 5,
    "search_domain_filter": ["science.org", "nature.com"]
  }'
```

Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-secret-key-here",
    base_url="http://localhost:8003/v1",
)

# Perplexity OSS-specific parameters are not part of the SDK's method
# signature, so they are passed through extra_body.
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "user", "content": "What is quantum computing?"}
    ],
    extra_body={
        "search_recency_filter": "month",
        "return_related_questions": True,
    },
)

print(response.choices[0].message.content)
```

Python streaming (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-secret-key-here",
    base_url="http://localhost:8003/v1",
)

stream = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "user", "content": "Explain neural networks"}
    ],
    stream=True,
)

for chunk in stream:
    # The first chunk carries only the role and the last only finish_reason,
    # so skip chunks without text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```

JavaScript (fetch):

```javascript
// Top-level await requires an ES module (or wrap this in an async function).
const response = await fetch('http://localhost:8003/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-your-secret-key-here',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'default',
    messages: [
      { role: 'user', content: 'What is deep learning?' }
    ],
    return_related_questions: true
  })
});

if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data.choices[0].message.content);
```

Limit search results to specific domains using `search_domain_filter`:
```json
{
  "search_domain_filter": ["reddit.com", "stackoverflow.com"]
}
```

This adds `site:domain.com` operators to the search query.
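This document does not say how multiple domains are combined into one query. As a rough illustration only, here is a sketch that assumes the `site:` operators are joined into an `OR` group; `add_site_operators` is a hypothetical function, and the real backend may format the query differently.

```python
def add_site_operators(query: str, domains: list[str]) -> str:
    """Hypothetical: append site: operators to a query, OR-joined (assumption)."""
    if not domains:
        return query
    sites = [f"site:{domain}" for domain in domains]
    if len(sites) == 1:
        return f"{query} {sites[0]}"
    # Assumption: group alternatives so any listed domain can match.
    return f"{query} ({' OR '.join(sites)})"
```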
Filter results by recency using `search_recency_filter`:

- `"day"`: last 24 hours
- `"week"`: last 7 days
- `"month"`: last 30 days
- `"year"`: last 365 days

```json
{
  "search_recency_filter": "week"
}
```

Enable multi-step reasoning for complex queries with `pro_search`:

```json
{
  "pro_search": true
}
```

Note: requires `NEXT_PUBLIC_PRO_MODE_ENABLED=true` in the backend configuration.
Pro search automatically:
- Breaks down complex queries into steps
- Generates targeted search queries for each step
- Synthesizes information from multiple searches
- Provides comprehensive answers
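Perplexity OSS's actual pro-mode code is not shown in this document. The following is only a conceptual sketch of the steps listed above, with the planner, query generator, search, and synthesis stages passed in as hypothetical callables (in a real system these would presumably be LLM and search-backend calls).

```python
from typing import Callable


def pro_search(
    question: str,
    plan: Callable[[str], list[str]],
    make_query: Callable[[str], str],
    search: Callable[[str], list[str]],
    synthesize: Callable[[str, dict[str, list[str]]], str],
) -> str:
    """Conceptual multi-step search loop; every callable is a hypothetical stand-in."""
    # 1. Break the complex question into smaller steps.
    steps = plan(question)
    evidence: dict[str, list[str]] = {}
    for step in steps:
        # 2. Generate a targeted search query for each step and run it.
        evidence[step] = search(make_query(step))
    # 3-4. Synthesize the per-step findings into one answer.
    return synthesize(question, evidence)
```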
Errors use this format:

```json
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": 401
  }
}
```

| Code | Type | Description |
|---|---|---|
| 401 | `invalid_request_error` | Missing or invalid API key |
| 400 | `invalid_request_error` | Invalid request format or parameters |
| 500 | `internal_error` | Server error during processing |
| 503 | `service_unavailable` | API keys not configured on the server |
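Clients can turn this error body into an exception that keeps the status and error type. A minimal sketch; `PerplexityOSSError` and `raise_for_error` are hypothetical names, not part of the project.

```python
from typing import Any


class PerplexityOSSError(Exception):
    """Hypothetical client-side exception wrapping the documented error body."""

    def __init__(self, status: int, error_type: str, message: str) -> None:
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message


def raise_for_error(status: int, body: dict[str, Any]) -> None:
    """Raise PerplexityOSSError for 4xx/5xx responses; do nothing otherwise."""
    if status < 400:
        return
    error = body.get("error") or {}
    raise PerplexityOSSError(status, error.get("type", "unknown"), error.get("message", ""))
```

With `urllib`, an `HTTPError` exposes the status as `.code` and the body via `.read()`, which you can pass through `json.loads` before calling `raise_for_error`.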
Currently no rate limiting is implemented. Configure your own rate limiting if deploying to production.
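A reverse proxy or API gateway is a common place to add rate limiting. If you prefer to do it in application code, one option is a token bucket per API key. The class below is a generic sketch, not part of Perplexity OSS; how you wire it into the request path is up to you.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: refills `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should be throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Keeping one bucket per API key (for example in a `dict[str, TokenBucket]`) limits each client independently; a rejected request would typically get an HTTP 429 response.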
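CORS is enabled for all origins by default. To restrict origins in production, modify `main.py`:

```python
# main.py already defines `app`; this import path assumes a FastAPI app.
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-domain.com"],  # only this origin may call the API
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```

OpenAI API compatibility:

Supported:

- Standard message format with roles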
- Streaming and non-streaming modes
- Bearer token authentication
- Compatible with OpenAI Python SDK

Ignored (backend defaults are used):

- `temperature`, `top_p`, `max_tokens`

Not supported:

- Function calling / tools
- Image inputs (vision)
- Audio/TTS
- Embeddings

Perplexity OSS extensions:

- Search domain filtering (`search_domain_filter`)
- Time range filtering (`search_recency_filter`)
- Pro search mode (`pro_search`)
- Search results in the response (`search_results`)
- Related questions (`related_questions`)
- Image URLs (`images`)