The official Python SDK for Kafeido - An OpenAI-compatible AI inference API providing access to LLM, ASR, OCR, TTS, and Vision models.
- OpenAI Compatible: Drop-in replacement for OpenAI Python SDK
- Multiple AI Models:
  - LLM: `gpt-oss-20b`, `gpt-oss-120b`
  - ASR: `whisper-large-v3`, `whisper-turbo`
  - OCR: `deepseek-ocr`, `paddle-ocr`
  - TTS: `qwen3-tts`, `xtts-v2`
  - Vision: `llama-3.2-vision-11b`, `llama-3.2-vision-90b`
- Streaming Support: Real-time streaming for chat completions and vision chat
- Async Job Support: Submit long-running jobs and poll for results
- Async Support: Full async/await support for all endpoints
- Type Safety: Comprehensive type hints and Pydantic models
- Model Management: Status checking, warmup/prefetch, and health monitoring
- Robust Error Handling: Detailed exception hierarchy
```bash
pip install kafeido

# For async support with HTTP/2
pip install kafeido[async]

# For development
pip install kafeido[dev]
```

```python
from kafeido import OpenAI

# Initialize client (API key from environment or parameter)
client = OpenAI(api_key="sk-...")

# Chat completion
response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning?"}
    ]
)
print(response.choices[0].message.content)

# Audio transcription
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3"
    )
print(transcript.text)

# List available models
models = client.models.list()
for model in models.data:
    print(f"- {model.id}")
```

The SDK supports multiple ways to provide your API key:
```bash
export KAFEIDO_API_KEY="sk-..."
# or
export OPENAI_API_KEY="sk-..."  # For OpenAI compatibility
```

```python
from kafeido import OpenAI

client = OpenAI()  # Automatically uses the environment variable
```

```python
from kafeido import OpenAI

client = OpenAI(api_key="sk-...")
```

```python
response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=100
)
print(response.choices[0].message.content)
```

```python
stream = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

```python
response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a Python expert."},
        {"role": "user", "content": "How do I read a file in Python?"}
    ]
)
```

```python
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-large-v3",
        language="en",  # Optional: specify language
        response_format="verbose_json"  # Get detailed output
    )

print(f"Transcript: {transcript.text}")
print(f"Language: {transcript.language}")
print(f"Duration: {transcript.duration}s")

# Access segments with timestamps
if transcript.segments:
    for segment in transcript.segments:
        print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")
```

```python
# Translate any language to English
with open("audio_spanish.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        file=audio_file,
        model="whisper-large-v3"
    )
print(translation.text)  # English translation
```

```python
import time

# Create a TTS job
job = client.audio.speech.create(
    model="qwen3-tts",
    input="Hello! Welcome to Kafeido.",
    voice="alloy",
    response_format="mp3",
)

# Poll for result
while True:
    result = client.audio.speech.get_result(job_id=job.job_id)
    if result.status == "completed":
        print(f"Download: {result.result.download_url}")
        break
    time.sleep(2)
```
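The polling loop above never gives up. In practice you will usually want a bounded wait with backoff. Below is a generic sketch of such a helper; `poll_until_complete` is not part of the SDK, and `fetch` stands in for a callable like `lambda: client.audio.speech.get_result(job_id=job.job_id)`:

```python
import time

def poll_until_complete(fetch, interval=2.0, timeout=120.0, backoff=1.5):
    """Call `fetch` until it returns a result with a terminal status.

    `fetch` is any zero-argument callable returning an object with a
    `status` attribute; "completed" and "failed" are treated as terminal.
    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    delay = interval
    while time.monotonic() < deadline:
        result = fetch()
        if result.status in ("completed", "failed"):
            return result
        time.sleep(delay)
        delay = min(delay * backoff, 30.0)  # cap the gap between polls
    raise TimeoutError("job did not reach a terminal status in time")
```

With the TTS job above, that would look like `result = poll_until_complete(lambda: client.audio.speech.get_result(job_id=job.job_id))`.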
```python
# Sync OCR extraction
result = client.ocr.extractions.create(
    model_id="deepseek-ocr",
    file_id="file-123",
    mode="markdown",
)
print(result.text)

# With bounding boxes (grounding mode)
result = client.ocr.extractions.create(
    model_id="deepseek-ocr",
    file_id="file-123",
    mode="grounding",
)
for region in result.regions:
    print(f"'{region.text}' at ({region.x1},{region.y1})-({region.x2},{region.y2})")
```
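Grounding-mode regions arrive with pixel coordinates; if you need reading order you can sort them yourself, e.g. top-to-bottom, then left-to-right within a line. This is a generic sketch (the `SimpleNamespace` regions are hypothetical stand-ins for the `result.regions` objects above, and the line-bucketing heuristic is only approximate):

```python
from types import SimpleNamespace

def reading_order(regions, line_tolerance=10):
    """Sort OCR regions top-to-bottom, then left-to-right within a line.

    Regions whose top edges fall in the same `line_tolerance`-pixel band
    are treated as belonging to the same line.
    """
    return sorted(regions, key=lambda r: (r.y1 // line_tolerance, r.x1))

# Hypothetical regions shaped like the grounding-mode result above
regions = [
    SimpleNamespace(text="world", x1=120, y1=12, x2=180, y2=30),
    SimpleNamespace(text="Hello", x1=10, y1=10, x2=80, y2=30),
    SimpleNamespace(text="Next line", x1=10, y1=50, x2=120, y2=70),
]
ordered = [r.text for r in reading_order(regions)]
# -> ["Hello", "world", "Next line"]
```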
```python
# Analyze an image
result = client.vision.analyze.create(
    model_id="llama-3.2-vision-11b",
    image_url="https://example.com/photo.jpg",
    prompt="Describe this image",
)
print(result.text)

# Vision chat with streaming
for chunk in client.vision.chat.create(
    model_id="llama-3.2-vision-11b",
    messages=[{
        "role": "user",
        "content": "What is shown in this chart?",
        "images": [{"url": "https://example.com/chart.png"}],
    }],
    stream=True,
):
    if chunk.text:
        print(chunk.text, end="", flush=True)
```
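The streaming loops above print deltas as they arrive; if you also want the assembled text afterwards, collect the pieces and join once rather than concatenating strings inside the loop. Nothing in this sketch is SDK-specific: `chunks` stands in for the per-chunk text values shown above:

```python
def drain_stream(chunks, on_text=None):
    """Optionally echo streamed text pieces and return the full text.

    `chunks` is any iterable of text fragments; None/empty fragments are
    skipped, since streaming APIs often interleave keep-alive chunks.
    """
    parts = []
    for text in chunks:
        if text:
            if on_text is not None:
                on_text(text)  # e.g. lambda t: print(t, end="", flush=True)
            parts.append(text)
    return "".join(parts)

full = drain_stream(["The chart ", None, "shows monthly revenue."])
# -> "The chart shows monthly revenue."
```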
```python
# Check model status
status = client.models.status("whisper-large-v3")
print(f"Status: {status.status.status}")

# Warm up a cold model
warmup = client.models.warmup(model="whisper-large-v3")
print(f"Already warm: {warmup.already_warm}, ETA: {warmup.estimated_seconds}s")

# Health check
health = client.health()
print(f"API status: {health.status}, version: {health.version}")
```

```python
models = client.models.list()
for model in models.data:
    print(f"{model.id} (owned by: {model.owned_by})")
```

```python
model = client.models.retrieve("gpt-oss-20b")
print(f"Model: {model.id}")
print(f"Created: {model.created}")
```

```python
with open("large_audio.mp3", "rb") as f:
    file_obj = client.files.create(
        file=f,
        purpose="assistants"
    )
print(f"Uploaded: {file_obj.id}")
print(f"Size: {file_obj.bytes} bytes")
```

```python
files = client.files.list()
for file in files.data:
    print(f"{file.filename} - {file.created_at}")
```

```python
result = client.files.delete("file-123")
print(f"Deleted: {result.deleted}")
```

All methods have async equivalents:
```python
import asyncio
from kafeido import AsyncOpenAI

async def main():
    async with AsyncOpenAI(api_key="sk-...") as client:
        # Chat completion
        response = await client.chat.completions.create(
            model="gpt-oss-20b",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response.choices[0].message.content)

        # Streaming
        stream = await client.chat.completions.create(
            model="gpt-oss-20b",
            messages=[{"role": "user", "content": "Count to 5"}],
            stream=True
        )
        async for chunk in stream:
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="")

asyncio.run(main())
```

The SDK provides a comprehensive exception hierarchy:
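Because every endpoint has an async variant, independent requests can be fanned out concurrently with `asyncio.gather`. The sketch below is generic: `ask` is a hypothetical stand-in for an awaited SDK call such as `await client.chat.completions.create(...)`:

```python
import asyncio

async def ask(prompt: str) -> str:
    # Stand-in for: await client.chat.completions.create(
    #     model="gpt-oss-20b", messages=[{"role": "user", "content": prompt}])
    await asyncio.sleep(0.01)  # simulated network latency
    return f"answer to: {prompt}"

async def ask_many(prompts):
    # Issue all requests concurrently; results come back in input order.
    return await asyncio.gather(*(ask(p) for p in prompts))

answers = asyncio.run(ask_many(["What is ML?", "What is ASR?"]))
```

The requests overlap in flight, so total latency is roughly that of the slowest request rather than the sum.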
```python
from kafeido import OpenAI, AuthenticationError, RateLimitError, APIError

client = OpenAI(api_key="sk-...")

try:
    response = client.chat.completions.create(
        model="gpt-oss-20b",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except APIError as e:
    print(f"API error: {e}")
```

- `OpenAIError` - Base exception for all errors
- `APIError` - Base for API-related errors
- `APIConnectionError` - Network connectivity issues
- `APITimeoutError` - Request timeout
- `APIStatusError` - HTTP 4xx/5xx responses
- `AuthenticationError` - Invalid API key (401)
- `PermissionDeniedError` - Insufficient permissions (403)
- `NotFoundError` - Resource not found (404)
- `RateLimitError` - Rate limit exceeded (429)
- `InternalServerError` - Server errors (5xx)
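Only some of these exceptions are worth retrying (`RateLimitError`, `APITimeoutError`, `APIConnectionError`); the rest indicate a caller error and should fail fast. A minimal backoff sketch follows; the `RateLimitError` class here is a local placeholder so the snippet runs standalone — in real code, import it from `kafeido` instead:

```python
import time

class RateLimitError(Exception):
    """Placeholder standing in for kafeido.RateLimitError in this sketch."""

def with_retries(call, attempts=3, base_delay=1.0):
    """Run `call`, retrying on RateLimitError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage would look like `with_retries(lambda: client.chat.completions.create(...))`.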
The Kafeido SDK is designed as a drop-in replacement for the OpenAI Python SDK:
```python
# Before (OpenAI)
from openai import OpenAI
client = OpenAI(api_key="...")

# After (Kafeido)
from kafeido import OpenAI
client = OpenAI(api_key="sk-...", base_url="https://api.kafeido.app")
```

All method signatures and response types are compatible.
```python
# Production
client = OpenAI(
    api_key="sk-...",
    base_url="https://api.kafeido.app"
)

# Development / self-hosted
client = OpenAI(
    api_key="sk-...",
    base_url="http://localhost:8080"
)
```

```python
client = OpenAI(
    api_key="sk-...",
    timeout=60.0,   # 60 seconds (default: 120)
    max_retries=3   # Max retry attempts (default: 2)
)
```

- LLM:
  - `gpt-oss-20b` - 20B parameter model (recommended)
  - `gpt-oss-120b` - 120B parameter model (high performance)
- ASR:
  - `whisper-large-v3` - Latest Whisper model (recommended)
  - `whisper-turbo` - Faster inference
- OCR:
  - `deepseek-ocr` - DeepSeek OCR model (recommended)
  - `paddle-ocr` - PaddleOCR model
- TTS:
  - `qwen3-tts` - Qwen3 TTS model (recommended)
  - `xtts-v2` - XTTS-v2 model with voice cloning
- Vision:
  - `llama-3.2-vision-11b` - Llama 3.2 Vision 11B (recommended)
  - `llama-3.2-vision-90b` - Llama 3.2 Vision 90B (high performance)
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Homepage: https://kafeido.app
- Documentation: https://docs.kafeido.app
- API Reference: https://docs.kafeido.app/api
- GitHub: https://github.com/footprintai/kafeido-sdk
- PyPI: https://pypi.org/project/kafeido
For issues and questions:
- GitHub Issues: https://github.com/footprintai/kafeido-sdk/issues
- Email: kafeido@footprint-ai.com