The official Python SDK for Kafeido - An OpenAI-compatible AI inference API providing access to LLM, ASR, OCR, TTS, and Vision models.
- OpenAI Compatible: Drop-in replacement for OpenAI Python SDK
- Multiple AI Models:
  - LLM: `gpt-oss-20b`, `gpt-oss-120b`
  - ASR: `whisper-large-v3`, `whisper-turbo`
  - OCR: `deepseek-ocr`, `paddle-ocr`
  - TTS: `qwen3-tts`, `xtts-v2`
  - Vision: `llama-3.2-vision-11b`, `llama-3.2-vision-90b`
- Streaming Support: Real-time streaming for chat completions and vision chat
- Async Job Support: Submit long-running jobs and poll for results
- Async Support: Full async/await support for all endpoints
- Type Safety: Comprehensive type hints and Pydantic models
- Model Management: Status checking, warmup/prefetch, and health monitoring
- Robust Error Handling: Detailed exception hierarchy
```bash
pip install kafeido

# For async support with HTTP/2
pip install kafeido[async]

# For development
pip install kafeido[dev]
```

```python
from kafeido import OpenAI

# Initialize client (API key from environment or parameter)
client = OpenAI(api_key="sk-...")

# Chat completion
response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning?"}
    ]
)
print(response.choices[0].message.content)

# Audio transcription
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3"
    )
print(transcript.text)

# List available models
models = client.models.list()
for model in models.data:
    print(f"- {model.id}")
```

The SDK supports multiple ways to provide your API key:
```bash
export KAFEIDO_API_KEY="sk-..."
# or
export OPENAI_API_KEY="sk-..."  # For OpenAI compatibility
```

```python
from kafeido import OpenAI

client = OpenAI()  # Automatically uses the environment variable
```

```python
from kafeido import OpenAI

client = OpenAI(api_key="sk-...")
```

```python
response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=100
)
print(response.choices[0].message.content)
```

```python
stream = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

```python
response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a Python expert."},
        {"role": "user", "content": "How do I read a file in Python?"}
    ]
)
```

```python
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-large-v3",
        language="en",  # Optional: specify language
        response_format="verbose_json"  # Get detailed output
    )

print(f"Transcript: {transcript.text}")
print(f"Language: {transcript.language}")
print(f"Duration: {transcript.duration}s")

# Access segments with timestamps
if transcript.segments:
    for segment in transcript.segments:
        print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")
```

```python
# Translate any language to English
with open("audio_spanish.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        file=audio_file,
        model="whisper-large-v3"
    )
print(translation.text)  # English translation
```

```python
import time

# Create a TTS job
job = client.audio.speech.create(
    model="qwen3-tts",
    input="Hello! Welcome to Kafeido.",
    voice="alloy",
    response_format="mp3",
)

# Poll for result
while True:
    result = client.audio.speech.get_result(job_id=job.job_id)
    if result.status == "completed":
        print(f"Download: {result.result.download_url}")
        break
    time.sleep(2)
```
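The polling loop above never gives up. In practice you will usually want a bounded wait with backoff. Below is a generic sketch of such a helper; `poll_until_complete` is not part of the SDK, and `fetch` stands in for a callable like `lambda: client.audio.speech.get_result(job_id=job.job_id)`:

```python
import time

def poll_until_complete(fetch, interval=2.0, timeout=120.0, backoff=1.5):
    """Call `fetch` until it returns a result with a terminal status.

    `fetch` is any zero-argument callable returning an object with a
    `status` attribute; "completed" and "failed" are treated as terminal.
    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    delay = interval
    while time.monotonic() < deadline:
        result = fetch()
        if result.status in ("completed", "failed"):
            return result
        time.sleep(delay)
        delay = min(delay * backoff, 30.0)  # cap the gap between polls
    raise TimeoutError("job did not reach a terminal status in time")
```

With the TTS job above, that would look like `result = poll_until_complete(lambda: client.audio.speech.get_result(job_id=job.job_id))`.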
```python
# Sync OCR extraction
result = client.ocr.extractions.create(
    model_id="deepseek-ocr",
    file_id="file-123",
    mode="markdown",
)
print(result.text)

# With bounding boxes (grounding mode)
result = client.ocr.extractions.create(
    model_id="deepseek-ocr",
    file_id="file-123",
    mode="grounding",
)
for region in result.regions:
    print(f"'{region.text}' at ({region.x1},{region.y1})-({region.x2},{region.y2})")
```
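Grounding-mode regions arrive with pixel coordinates; if you need reading order you can sort them yourself, e.g. top-to-bottom, then left-to-right within a line. This is a generic sketch (the `SimpleNamespace` regions are hypothetical stand-ins for the `result.regions` objects above, and the line-bucketing heuristic is only approximate):

```python
from types import SimpleNamespace

def reading_order(regions, line_tolerance=10):
    """Sort OCR regions top-to-bottom, then left-to-right within a line.

    Regions whose top edges fall in the same `line_tolerance`-pixel band
    are treated as belonging to the same line.
    """
    return sorted(regions, key=lambda r: (r.y1 // line_tolerance, r.x1))

# Hypothetical regions shaped like the grounding-mode result above
regions = [
    SimpleNamespace(text="world", x1=120, y1=12, x2=180, y2=30),
    SimpleNamespace(text="Hello", x1=10, y1=10, x2=80, y2=30),
    SimpleNamespace(text="Next line", x1=10, y1=50, x2=120, y2=70),
]
ordered = [r.text for r in reading_order(regions)]
# -> ["Hello", "world", "Next line"]
```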
```python
# Analyze an image
result = client.vision.analyze.create(
    model_id="llama-3.2-vision-11b",
    image_url="https://example.com/photo.jpg",
    prompt="Describe this image",
)
print(result.text)

# Vision chat with streaming
for chunk in client.vision.chat.create(
    model_id="llama-3.2-vision-11b",
    messages=[{
        "role": "user",
        "content": "What is shown in this chart?",
        "images": [{"url": "https://example.com/chart.png"}],
    }],
    stream=True,
):
    if chunk.text:
        print(chunk.text, end="", flush=True)
```
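The streaming loops above print deltas as they arrive; if you also want the assembled text afterwards, collect the pieces and join once rather than concatenating strings inside the loop. Nothing in this sketch is SDK-specific: `chunks` stands in for the per-chunk text values shown above:

```python
def drain_stream(chunks, on_text=None):
    """Optionally echo streamed text pieces and return the full text.

    `chunks` is any iterable of text fragments; None/empty fragments are
    skipped, since streaming APIs often interleave keep-alive chunks.
    """
    parts = []
    for text in chunks:
        if text:
            if on_text is not None:
                on_text(text)  # e.g. lambda t: print(t, end="", flush=True)
            parts.append(text)
    return "".join(parts)

full = drain_stream(["The chart ", None, "shows monthly revenue."])
# -> "The chart shows monthly revenue."
```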
```python
# Check model status
status = client.models.status("whisper-large-v3")
print(f"Status: {status.status.status}")

# Warm up a cold model
warmup = client.models.warmup(model="whisper-large-v3")
print(f"Already warm: {warmup.already_warm}, ETA: {warmup.estimated_seconds}s")

# Health check
health = client.health()
print(f"API status: {health.status}, version: {health.version}")
```

```python
models = client.models.list()
for model in models.data:
    print(f"{model.id} (owned by: {model.owned_by})")
```

```python
model = client.models.retrieve("gpt-oss-20b")
print(f"Model: {model.id}")
print(f"Created: {model.created}")
```

```python
with open("large_audio.mp3", "rb") as f:
    file_obj = client.files.create(
        file=f,
        purpose="assistants"
    )
print(f"Uploaded: {file_obj.id}")
print(f"Size: {file_obj.bytes} bytes")
```

```python
files = client.files.list()
for file in files.data:
    print(f"{file.filename} - {file.created_at}")
```

```python
result = client.files.delete("file-123")
print(f"Deleted: {result.deleted}")
```

All methods have async equivalents:
```python
import asyncio
from kafeido import AsyncOpenAI

async def main():
    async with AsyncOpenAI(api_key="sk-...") as client:
        # Chat completion
        response = await client.chat.completions.create(
            model="gpt-oss-20b",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response.choices[0].message.content)

        # Streaming
        stream = await client.chat.completions.create(
            model="gpt-oss-20b",
            messages=[{"role": "user", "content": "Count to 5"}],
            stream=True
        )
        async for chunk in stream:
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="")

asyncio.run(main())
```

The SDK provides a comprehensive exception hierarchy:
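Because every endpoint has an async variant, independent requests can be fanned out concurrently with `asyncio.gather`. The sketch below is generic: `ask` is a hypothetical stand-in for an awaited SDK call such as `await client.chat.completions.create(...)`:

```python
import asyncio

async def ask(prompt: str) -> str:
    # Stand-in for: await client.chat.completions.create(
    #     model="gpt-oss-20b", messages=[{"role": "user", "content": prompt}])
    await asyncio.sleep(0.01)  # simulated network latency
    return f"answer to: {prompt}"

async def ask_many(prompts):
    # Issue all requests concurrently; results come back in input order.
    return await asyncio.gather(*(ask(p) for p in prompts))

answers = asyncio.run(ask_many(["What is ML?", "What is ASR?"]))
```

The requests overlap in flight, so total latency is roughly that of the slowest request rather than the sum.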
```python
from kafeido import OpenAI, AuthenticationError, RateLimitError, APIError

client = OpenAI(api_key="sk-...")

try:
    response = client.chat.completions.create(
        model="gpt-oss-20b",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except APIError as e:
    print(f"API error: {e}")
```

- `OpenAIError` - Base exception for all errors
- `APIError` - Base for API-related errors
- `APIConnectionError` - Network connectivity issues
- `APITimeoutError` - Request timeout
- `APIStatusError` - HTTP 4xx/5xx responses
- `AuthenticationError` - Invalid API key (401)
- `PermissionDeniedError` - Insufficient permissions (403)
- `NotFoundError` - Resource not found (404)
- `RateLimitError` - Rate limit exceeded (429)
- `InternalServerError` - Server errors (5xx)
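Only some of these exceptions are worth retrying (`RateLimitError`, `APITimeoutError`, `APIConnectionError`); the rest indicate a caller error and should fail fast. A minimal backoff sketch follows; the `RateLimitError` class here is a local placeholder so the snippet runs standalone — in real code, import it from `kafeido` instead:

```python
import time

class RateLimitError(Exception):
    """Placeholder standing in for kafeido.RateLimitError in this sketch."""

def with_retries(call, attempts=3, base_delay=1.0):
    """Run `call`, retrying on RateLimitError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage would look like `with_retries(lambda: client.chat.completions.create(...))`.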
The Kafeido SDK is designed as a drop-in replacement for the OpenAI Python SDK:
```python
# Before (OpenAI)
from openai import OpenAI
client = OpenAI(api_key="...")

# After (Kafeido)
from kafeido import OpenAI
client = OpenAI(api_key="sk-...", base_url="https://api.kafeido.app")
```

All method signatures and response types are compatible.
```python
# Production
client = OpenAI(
    api_key="sk-...",
    base_url="https://api.kafeido.app"
)

# Development / self-hosted
client = OpenAI(
    api_key="sk-...",
    base_url="http://localhost:8080"
)
```

```python
client = OpenAI(
    api_key="sk-...",
    timeout=60.0,   # 60 seconds (default: 120)
    max_retries=3   # Max retry attempts (default: 2)
)
```

- LLM:
  - `gpt-oss-20b` - 20B parameter model (recommended)
  - `gpt-oss-120b` - 120B parameter model (high performance)
- ASR:
  - `whisper-large-v3` - Latest Whisper model (recommended)
  - `whisper-turbo` - Faster inference
- OCR:
  - `deepseek-ocr` - DeepSeek OCR model (recommended)
  - `paddle-ocr` - PaddleOCR model
- TTS:
  - `qwen3-tts` - Qwen3 TTS model (recommended)
  - `xtts-v2` - XTTS-v2 model with voice cloning
- Vision:
  - `llama-3.2-vision-11b` - Llama 3.2 Vision 11B (recommended)
  - `llama-3.2-vision-90b` - Llama 3.2 Vision 90B (high performance)
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Homepage: https://kafeido.app
- Documentation: https://docs.kafeido.app
- API Reference: https://docs.kafeido.app/api
- GitHub: https://github.com/footprintai/kafeido-sdk
- PyPI: https://pypi.org/project/kafeido
For issues and questions:
- GitHub Issues: https://github.com/footprintai/kafeido-sdk/issues
- Email: kafeido@footprint-ai.com