Kafeido Python SDK


The official Python SDK for Kafeido, an OpenAI-compatible AI inference API providing access to LLM, ASR, OCR, TTS, and Vision models.

Features

  • OpenAI Compatible: Drop-in replacement for the OpenAI Python SDK
  • Multiple AI Models:
    • LLM: gpt-oss-20b, gpt-oss-120b
    • ASR: whisper-large-v3, whisper-turbo
    • OCR: deepseek-ocr, paddle-ocr
    • TTS: qwen3-tts, xtts-v2
    • Vision: llama-3.2-vision-11b, llama-3.2-vision-90b
  • Streaming Support: Real-time streaming for chat completions and vision chat
  • Async Jobs: Submit long-running jobs and poll for results
  • Async/Await: Full async/await support for all endpoints
  • Type Safety: Comprehensive type hints and Pydantic models
  • Model Management: Status checking, warmup/prefetch, and health monitoring
  • Robust Error Handling: Detailed exception hierarchy

Installation

pip install kafeido

Optional Dependencies

# For async support with HTTP/2
pip install kafeido[async]

# For development
pip install kafeido[dev]

Quick Start

from kafeido import OpenAI

# Initialize client (API key from environment or parameter)
client = OpenAI(api_key="sk-...")

# Chat completion
response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning?"}
    ]
)
print(response.choices[0].message.content)

# Audio transcription
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3"
    )
    print(transcript.text)

# List available models
models = client.models.list()
for model in models.data:
    print(f"- {model.id}")

Authentication

The SDK supports multiple ways to provide your API key:

Environment Variables

export KAFEIDO_API_KEY="sk-..."
# or
export OPENAI_API_KEY="sk-..."  # For OpenAI compatibility

from kafeido import OpenAI

client = OpenAI()  # Automatically uses environment variable

Direct Parameter

from kafeido import OpenAI

client = OpenAI(api_key="sk-...")

Usage Examples

Chat Completions

Basic Completion

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=100
)
print(response.choices[0].message.content)

Streaming Completion

stream = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

With System Message

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a Python expert."},
        {"role": "user", "content": "How do I read a file in Python?"}
    ]
)

Audio Transcription

Transcribe Audio File

with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-large-v3",
        language="en",  # Optional: specify language
        response_format="verbose_json"  # Get detailed output
    )

print(f"Transcript: {transcript.text}")
print(f"Language: {transcript.language}")
print(f"Duration: {transcript.duration}s")

# Access segments with timestamps
if transcript.segments:
    for segment in transcript.segments:
        print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")

Audio Translation

# Translate any language to English
with open("audio_spanish.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        file=audio_file,
        model="whisper-large-v3"
    )

print(translation.text)  # English translation

Text-to-Speech (TTS)

import time

# Create a TTS job
job = client.audio.speech.create(
    model="qwen3-tts",
    input="Hello! Welcome to Kafeido.",
    voice="alloy",
    response_format="mp3",
)

# Poll for the result
while True:
    result = client.audio.speech.get_result(job_id=job.job_id)
    if result.status == "completed":
        print(f"Download: {result.result.download_url}")
        break
    if result.status == "failed":  # assumes a terminal "failed" status
        raise RuntimeError(f"TTS job {job.job_id} failed")
    time.sleep(2)
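The polling loop above has no overall timeout, so a job that never reaches a terminal state would block forever. A small generic helper, offered as a sketch rather than part of the SDK, can wrap any of the SDK's job-polling calls:

```python
import time

def poll_until(fetch, is_done, interval=2.0, timeout=300.0):
    """Call fetch() repeatedly until is_done(result) is truthy.

    Raises TimeoutError once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch()
        if is_done(result):
            return result
        time.sleep(interval)
    raise TimeoutError("job did not complete in time")
```

With the TTS job above, usage would look like `result = poll_until(lambda: client.audio.speech.get_result(job_id=job.job_id), lambda r: r.status == "completed")`.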

OCR

# Sync OCR extraction
result = client.ocr.extractions.create(
    model_id="deepseek-ocr",
    file_id="file-123",
    mode="markdown",
)
print(result.text)

# With bounding boxes (grounding mode)
result = client.ocr.extractions.create(
    model_id="deepseek-ocr",
    file_id="file-123",
    mode="grounding",
)
for region in result.regions:
    print(f"'{region.text}' at ({region.x1},{region.y1})-({region.x2},{region.y2})")

Vision

# Analyze an image
result = client.vision.analyze.create(
    model_id="llama-3.2-vision-11b",
    image_url="https://example.com/photo.jpg",
    prompt="Describe this image",
)
print(result.text)

# Vision chat with streaming
for chunk in client.vision.chat.create(
    model_id="llama-3.2-vision-11b",
    messages=[{
        "role": "user",
        "content": "What is shown in this chart?",
        "images": [{"url": "https://example.com/chart.png"}],
    }],
    stream=True,
):
    if chunk.text:
        print(chunk.text, end="", flush=True)

Model Management

Model Status and Warmup

# Check model status
status = client.models.status("whisper-large-v3")
print(f"Status: {status.status.status}")

# Warmup a cold model
warmup = client.models.warmup(model="whisper-large-v3")
print(f"Already warm: {warmup.already_warm}, ETA: {warmup.estimated_seconds}s")

# Health check
health = client.health()
print(f"API status: {health.status}, version: {health.version}")

List All Models

models = client.models.list()
for model in models.data:
    print(f"{model.id} (owned by: {model.owned_by})")

Get Model Details

model = client.models.retrieve("gpt-oss-20b")
print(f"Model: {model.id}")
print(f"Created: {model.created}")

File Management

Upload Audio File

with open("large_audio.mp3", "rb") as f:
    file_obj = client.files.create(
        file=f,
        purpose="assistants"
    )

print(f"Uploaded: {file_obj.id}")
print(f"Size: {file_obj.bytes} bytes")

List Uploaded Files

files = client.files.list()
for file in files.data:
    print(f"{file.filename} - {file.created_at}")

Delete File

result = client.files.delete("file-123")
print(f"Deleted: {result.deleted}")

Async Usage

All methods have async equivalents:

import asyncio
from kafeido import AsyncOpenAI

async def main():
    async with AsyncOpenAI(api_key="sk-...") as client:
        # Chat completion
        response = await client.chat.completions.create(
            model="gpt-oss-20b",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response.choices[0].message.content)

        # Streaming
        stream = await client.chat.completions.create(
            model="gpt-oss-20b",
            messages=[{"role": "user", "content": "Count to 5"}],
            stream=True
        )

        async for chunk in stream:
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="")

asyncio.run(main())
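Because every method has an async variant, independent requests can also run concurrently with `asyncio.gather`. A self-contained sketch of the fan-out pattern (a stand-in coroutine takes the place of the awaited `client.chat.completions.create` call, so this runs without credentials):

```python
import asyncio

async def fetch_answer(question: str) -> str:
    # Stand-in for an awaited SDK call such as
    # client.chat.completions.create(model="gpt-oss-20b", ...)
    await asyncio.sleep(0)
    return f"answer to: {question}"

async def fan_out(questions):
    # gather() runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*(fetch_answer(q) for q in questions))

answers = asyncio.run(fan_out(["What is ASR?", "What is OCR?"]))
print(answers)
```

In real code, replace `fetch_answer` with a coroutine that awaits the client call; the `gather` structure stays the same.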

Error Handling

The SDK provides a comprehensive exception hierarchy:

from kafeido import OpenAI, AuthenticationError, RateLimitError, APIError

client = OpenAI(api_key="sk-...")

try:
    response = client.chat.completions.create(
        model="gpt-oss-20b",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except APIError as e:
    print(f"API error: {e}")

Exception Types

  • OpenAIError - Base exception for all errors
  • APIError - Base for API-related errors
  • APIConnectionError - Network connectivity issues
  • APITimeoutError - Request timeout
  • APIStatusError - HTTP 4xx/5xx responses
  • AuthenticationError - Invalid API key (401)
  • PermissionDeniedError - Insufficient permissions (403)
  • NotFoundError - Resource not found (404)
  • RateLimitError - Rate limit exceeded (429)
  • InternalServerError - Server errors (5xx)
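Transient failures such as rate limits and connection drops are natural candidates for retries. A minimal retry helper with exponential backoff, shown as a generic sketch (it is not part of the SDK; the commented usage assumes the exception classes listed above):

```python
import time

def with_retries(call, retryable=(Exception,), max_attempts=3, base_delay=1.0):
    """Invoke call(), retrying with exponential backoff on retryable errors."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * 2 ** attempt)

# Hedged usage with the SDK:
# from kafeido import OpenAI, RateLimitError, APIConnectionError
# client = OpenAI(api_key="sk-...")
# response = with_retries(
#     lambda: client.chat.completions.create(
#         model="gpt-oss-20b",
#         messages=[{"role": "user", "content": "Hello"}],
#     ),
#     retryable=(RateLimitError, APIConnectionError),
# )
```

Note that the client already retries internally (see `max_retries` under Configuration), so a wrapper like this is mainly useful for adding application-level policies such as longer backoff on `RateLimitError`.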

Migration from OpenAI SDK

The Kafeido SDK is designed as a drop-in replacement for the OpenAI Python SDK:

# Before (OpenAI)
from openai import OpenAI
client = OpenAI(api_key="...")

# After (Kafeido)
from kafeido import OpenAI
client = OpenAI(api_key="sk-...", base_url="https://api.kafeido.app")

All method signatures and response types are compatible.

Configuration

Base URL

# Production
client = OpenAI(
    api_key="sk-...",
    base_url="https://api.kafeido.app"
)

# Development/Self-hosted
client = OpenAI(
    api_key="sk-...",
    base_url="http://localhost:8080"
)

Timeouts and Retries

client = OpenAI(
    api_key="sk-...",
    timeout=60.0,  # 60 seconds (default: 120)
    max_retries=3   # Max retry attempts (default: 2)
)

Supported Models

Large Language Models (LLM)

  • gpt-oss-20b - 20B parameter model (recommended)
  • gpt-oss-120b - 120B parameter model (high performance)

Automatic Speech Recognition (ASR)

  • whisper-large-v3 - Latest Whisper model (recommended)
  • whisper-turbo - Faster inference

Optical Character Recognition (OCR)

  • deepseek-ocr - DeepSeek OCR model (recommended)
  • paddle-ocr - PaddleOCR model

Text-to-Speech (TTS)

  • qwen3-tts - Qwen3 TTS model (recommended)
  • xtts-v2 - XTTS-v2 model with voice cloning

Vision

  • llama-3.2-vision-11b - Llama 3.2 Vision 11B (recommended)
  • llama-3.2-vision-90b - Llama 3.2 Vision 90B (high performance)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Links

Support

For issues and questions:
