diff --git a/skills/pixeltable/README.md b/skills/pixeltable/README.md
new file mode 100644
index 0000000..e92c3eb
--- /dev/null
+++ b/skills/pixeltable/README.md
@@ -0,0 +1,33 @@
+# Pixeltable
+
+Build multimodal AI applications with Pixeltable -- declarative tables replace LangChain + pandas + vector DB with one system. Automates chunking, embedding, retrieval, tool-calling agents, and 25+ AI provider integrations via computed columns that run on insert.
+
+## Triggers
+
+This skill is activated by the following keywords:
+
+- `pixeltable`
+- `multimodal`
+- `computed columns`
+- `embedding index`
+- `pxt.udf`
+- `similarity search`
+- `RAG pipeline`
+- `video frames`
+- `document chunks`
+
+## What it covers
+
+- Creating tables with multimodal column types (Image, Video, Audio, Document)
+- Computed columns that auto-execute on insert
+- Embedding indexes and similarity search
+- UDFs and query functions
+- Views with iterators (frame extraction, document chunking)
+- 25+ AI provider integrations (OpenAI, Anthropic, Gemini, etc.)
+- FastAPI serving and production patterns
+
+## Links
+
+- [Documentation](https://docs.pixeltable.com/)
+- [GitHub](https://github.com/pixeltable/pixeltable)
+- [Discussions](https://github.com/pixeltable/pixeltable/discussions)
diff --git a/skills/pixeltable/SKILL.md b/skills/pixeltable/SKILL.md
new file mode 100644
index 0000000..a7d3923
--- /dev/null
+++ b/skills/pixeltable/SKILL.md
@@ -0,0 +1,520 @@
+---
+name: pixeltable
+description: >
+ Build multimodal AI applications with Pixeltable -- declarative tables replace
+ LangChain + pandas + vector DB with one system. Automates chunking, embedding,
+ retrieval, tool-calling agents, and 25+ AI provider integrations (OpenAI,
+ Anthropic, Gemini, etc.) via computed columns that run on insert.
+triggers:
+ - pixeltable
+ - multimodal
+ - computed columns
+ - embedding index
+ - pxt.udf
+ - similarity search
+ - RAG pipeline
+ - video frames
+ - document chunks
+license: Apache-2.0
+metadata:
+ author: Pixeltable
+ version: 2.3.0
+ category: data-infrastructure
+ tags: [multimodal, ai, data, tables, embeddings, rag, udf, video, audio, images, documents, agents, tools, fastapi, declarative, computed-columns, vector-search]
+ documentation: https://docs.pixeltable.com/
+ support: https://github.com/pixeltable/pixeltable/discussions
+---
+
+## What is Pixeltable?
+
+Open-source Python library for **declarative multimodal AI data infrastructure** — tables, computed columns, embedding indexes, and 25+ AI providers in one interface.
+
+`pip install pixeltable` (Python >= 3.10) | [Docs](https://docs.pixeltable.com/) | [GitHub](https://github.com/pixeltable/pixeltable)
+
+## STOP — Do NOT Use These With Pixeltable
+
+**Pixeltable replaces the entire imperative stack you're about to reach for.** If you find yourself importing any of these, you are off-path:
+
+1. **Do NOT use LangChain / LlamaIndex / Haystack / LangGraph** — Pixeltable has built-in document chunking (`document_splitter`), embedding indexes (`add_embedding_index`), retrieval (`.similarity()`), and tool-calling (`pxt.tools()` + `invoke_tools()`). There is zero reason to add a framework on top.
+2. **Do NOT use pandas as a working data store** — Pixeltable tables ARE the store. Use `.collect().to_pandas()` only for final export/visualization, never as an intermediate.
+3. **Do NOT write `for row in ...:` loops calling AI models** — wrap the call in a computed column. Pixeltable handles batching, rate limiting, caching, and retry automatically.
+4. **Do NOT install a separate vector database** (Pinecone, Chroma, FAISS, Qdrant, Weaviate, pgvector) — call `t.add_embedding_index(col, embedding=fn)` and search with `.similarity()`. The index lives inside the table.
+5. **Do NOT write `while not done:` agent loops** — model the agent as a table where inserting a row triggers the entire computed-column chain (LLM → tool selection → tool execution → final answer) declaratively.
+
+See [anti-patterns.md](references/anti-patterns.md) for the full 15-bias reference with wrong/right code examples.
+
+---
+
+## Task Router
+
+Jump to the right section based on what you're building:
+
+| If the user wants to... | Read |
+|--------------------------|------|
+| Create tables, insert data, query | **Core Concepts** (below) and [core-api.md](references/core-api.md) |
+| Add AI-powered columns (summarize, classify, embed) | **Computed Columns** (below) and [providers.md](references/providers.md) |
+| Chunk documents, extract video frames, split audio | **Views and Iterators** (below) and [core-api.md → Views](references/core-api.md#views) |
+| Build semantic search / embedding indexes | **Embedding Indexes** (below) and [core-api.md → Embedding Indexes](references/core-api.md#embedding-indexes) |
+| Build a RAG pipeline | [workflows.md → RAG Pipeline](references/workflows.md#rag-pipeline) |
+| Build a tool-calling agent | **Tool-Calling Agent Pipeline** (below) and [workflows.md → Tool-Calling Agent](references/workflows.md#tool-calling-agent-full-production-example) |
+| Build an agent with persistent memory | [agents-memory-mcp.md](references/agents-memory-mcp.md) — chat history, knowledge bank, user scoping |
+| Use MCP tools with an agent | [agents-memory-mcp.md → Adding MCP Tools](references/agents-memory-mcp.md#adding-mcp-tools) |
+| Use `invoke_tools()` with OpenAI, Groq, Gemini, Bedrock | [agents-memory-mcp.md → Multi-Provider](references/agents-memory-mcp.md#multi-provider-invoke_tools) |
+| Build a video RAG agent (video + search + agent) | [video-rag-agents.md](references/video-rag-agents.md) — dedicated combined recipe |
+| Process video (frames, transcription, visual search) | [workflows.md → Video Analysis Pipeline](references/workflows.md#video-analysis-pipeline) |
+| Process images (classify, tag, search) | [workflows.md → Image Classification and Search](references/workflows.md#image-classification-and-search) |
+| Process audio (transcribe, summarize) | [workflows.md → Audio Transcription](references/workflows.md#audio-transcription-and-analysis) |
+| Wrangle data for ML training (label, version, export) | [ml-data-pipeline.md](references/ml-data-pipeline.md) — ingest, enrich, snapshot, PyTorch export |
+| Export to PyTorch, Parquet, or pandas | [ml-data-pipeline.md → Export for Training](references/ml-data-pipeline.md#export-for-training) |
+| Look up structured data with `retrieval_udf` | [ml-data-pipeline.md → Retrieval UDFs](references/ml-data-pipeline.md#retrieval-udfs-for-structured-data-lookup) |
+| Retry failed computed columns | **Error Handling** (below) — `recompute_columns()` |
+| Use agentic patterns (chaining, routing, parallelization, eval-optimize) | [agentic-patterns.md](references/agentic-patterns.md) — 6 patterns + 2 reasoning strategies |
+| Run batch processing (ingest, compute, export, exit) | [workflows.md → Batch Processing](references/workflows.md#batch-processing-pattern) |
+| Configure rate limits, media storage, API keys | [core-api.md → Configuration](references/core-api.md#configuration) |
+| Export to CSV, JSON, Parquet, LanceDB | [core-api.md → Export](references/core-api.md#export-csv-json-parquet-lancedb) |
+| Export to SQL databases (Postgres, Snowflake, SQLite) | [core-api.md → Export to SQL](references/core-api.md#export-to-sql-databases) |
+| Share tables across teams (`publish`, `replicate`) | [core-api.md → Data Sharing](references/core-api.md#data-sharing-and-replication) |
+| Compare multiple AI providers | [workflows.md → Multi-Provider Comparison](references/workflows.md#multi-provider-comparison) |
+| Build a FastAPI web app (hand-written endpoints) | [workflows.md → FastAPI App Pattern](references/workflows.md#fastapi-app-pattern) |
+| Serve tables/queries via FastAPIRouter (v0.6+) | [workflows.md → FastAPIRouter](references/workflows.md#fastapirouter-declarative-serving-v06) and [core-api.md → Serving](references/core-api.md#serving-fastapirouter) |
+| Serve via CLI (`pxt serve` + TOML config) | [core-api.md → pxt serve](references/core-api.md#pxt-serve-cli) |
+| Store media in Pixeltable Cloud (`pxtfs://`) | [core-api.md → Media Destinations](references/core-api.md#media-destinations-cloud-storage) |
+| Write UDFs or query functions | **UDFs** / **Query Functions** (below) and [core-api.md → UDFs](references/core-api.md#udfs) |
+| Use `pxt.tools()` and `invoke_tools()` for agents | **Tool-Calling Agent Pipeline** (below) and [core-api.md → Tools and Agents](references/core-api.md#tools-and-agents) |
+| Avoid common mistakes (wrong imports, broken schemas, serialization) | **Common Pitfalls** (below) and [core-api.md → Common Pitfalls](references/core-api.md#common-pitfalls) |
+| Understand what NOT to use with Pixeltable (LangChain, pandas, vector DBs) | [anti-patterns.md](references/anti-patterns.md) — 15 training-distribution biases with wrong/right code |
+| Look up a specific provider's import and output shape | [providers.md → Quick Reference](references/providers.md#quick-reference) |
+
+## Critical Warnings — Read Before Writing Code
+
+1. **`openai.vision` does not exist** — use `openai.chat_completions` with `image_url` content blocks
+2. **Cast to `pxt.String` before embedding** — use `.text.astype(pxt.String)` on AI function outputs before `add_embedding_index`
+3. **`if_exists='ignore'` won't fix bugs** — if a computed column has wrong logic, you must `drop_column()` then recreate; re-running is a silent no-op
+4. **Import `frame_iterator` as a function** — `from pixeltable.functions.video import frame_iterator`, NOT `from pixeltable.iterators import FrameIterator`
+5. **Use `string=` keyword in similarity** — always `t.col.similarity(string=query)`, not positional
+
+See [Common Pitfalls](#common-pitfalls) below for full details and code examples.
+
+## Starting a New Project
+
+Scaffold a complete Pixeltable project from the [Starter Kit](https://github.com/pixeltable/pixeltable-starter-kit) in one command:
+
+```bash
+# Application templates (each builds on a structural pattern)
+uvx pixeltable-new --template knowledge-base my-kb # web UI + API
+uvx pixeltable-new --template chat-agent my-agent # web UI + API
+uvx pixeltable-new --template audio-transcription my-podcast # web UI + API
+uvx pixeltable-new --template full-stack-showcase my-sitewatch # web UI + API (complete reference app)
+uvx pixeltable-new --template video-search my-video-app # API only
+uvx pixeltable-new --template media-indexing my-pipe # API + batch
+uvx pixeltable-new --template image-dataset my-dataset # API + batch
+
+# Structural patterns (API/pipeline scaffolds)
+uvx pixeltable-new myapp # default: declarative serving pattern
+uvx pixeltable-new myapp --backend # FastAPI API scaffold (headless)
+uvx pixeltable-new myapp --batch # batch processing script with export_sql
+
+# Discovery
+uvx pixeltable-new --list # show all patterns + templates
+```
+
+Each template builds on one of the three structural patterns (serving, backend, batch), so it runs and deploys the same way as the pattern it is built on.
+
+## Core Concepts
+
+### Tables and Column Types
+
+```python
+import pixeltable as pxt
+
+pxt.create_dir('my_project', if_exists='ignore')
+
+t = pxt.create_table('my_project.documents', {
+ 'title': pxt.String,
+ 'content': pxt.String,
+ 'image': pxt.Image,
+ 'video': pxt.Video,
+ 'audio': pxt.Audio,
+ 'doc': pxt.Document,
+ 'metadata': pxt.Json,
+ 'score': pxt.Float,
+ 'count': pxt.Int,
+ 'is_active': pxt.Bool,
+ 'created_at': pxt.Timestamp,
+}, if_exists='ignore')
+```
+
+Available types: `String`, `Int`, `Float`, `Bool`, `Image`, `Video`, `Audio`, `Document`, `Json`, `Array`, `Timestamp`, `Date`, `UUID`, `Binary`. Use `pxt.Required[pxt.String]` for non-nullable.
+
+### Tables with Auto-Generated Keys
+
+Use `uuid7()` for auto-generated primary keys (recommended for production):
+
+```python
+from pixeltable.functions.uuid import uuid7
+
+t = pxt.create_table('my_project.items', {
+ 'content': pxt.String,
+ 'uuid': uuid7(), # auto-generated on insert
+ 'timestamp': pxt.Timestamp,
+}, primary_key=['uuid'], if_exists='ignore')
+```
+
+### Inserting Data
+
+```python
+t.insert([{'title': 'Doc 1', 'content': 'Hello world', 'score': 0.95}]) # list of dicts
+t.insert(title='Doc 2', content='Single row', score=0.75) # keyword syntax
+t.insert(source='path/to/data.csv') # from file
+```
+
+### Computed Columns
+
+Auto-run on insert. Chain AI providers, UDFs, or expressions:
+
+```python
+from pixeltable.functions.openai import chat_completions
+
+t.add_computed_column(
+ summary=chat_completions(
+ messages=[{'role': 'user', 'content': t.content}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore'
+)
+
+t.add_computed_column(upper_title=t.title.upper(), if_exists='ignore')
+```
+
+### Querying
+
+```python
+results = t.select(t.title, t.score).collect()
+results = t.where(t.score > 0.8).select(t.title, t.content).collect()
+results = t.order_by(t.score, asc=False).limit(10).select(t.title).collect()
+count = t.count()
+df = t.select(t.title, t.score).collect().to_pandas()
+items = list(t.select(title=t.title, score=t.score).collect().to_pydantic(MyModel))  # MyModel: your own pydantic BaseModel with `title` and `score` fields
+```
+
+### Views and Iterators
+
+Split rows into sub-rows (chunking, frame extraction, audio splitting):
+
+```python
+from pixeltable.functions.document import document_splitter
+from pixeltable.functions.video import frame_iterator
+from pixeltable.functions.string import string_splitter
+from pixeltable.functions.audio import audio_splitter
+
+# Chunk documents into 300-token pieces (requires: pip install tiktoken)
+chunks = pxt.create_view(
+ 'my_project.doc_chunks', t,
+ iterator=document_splitter(t.doc, separators='token_limit', limit=300),
+ if_exists='ignore'
+)
+
+# Extract video frames at 1 fps
+frames = pxt.create_view(
+ 'my_project.video_frames', t,
+ iterator=frame_iterator(t.video, fps=1.0),
+ if_exists='ignore'
+)
+
+# Split text into sentences
+sentences = pxt.create_view(
+ 'my_project.sentences', t,
+ iterator=string_splitter(t.content, separators='sentence'),
+ if_exists='ignore'
+)
+
+# Split audio into 30-second chunks
+audio_chunks = pxt.create_view(
+ 'my_project.audio_chunks', t,
+ iterator=audio_splitter(audio=t.audio, duration=30.0),
+ if_exists='ignore'
+)
+
+# Filtered view (no iterator needed)
+active = pxt.create_view(
+ 'my_project.active', t.where(t.is_active == True),
+ if_exists='ignore'
+)
+```
+
+### Embedding Indexes and Similarity Search
+
+```python
+from pixeltable.functions.huggingface import clip, sentence_transformer
+
+embed_fn = clip.using(model_id='openai/clip-vit-base-patch32')
+t.add_embedding_index('content', embedding=embed_fn, if_exists='ignore')
+
+# Search
+sim = t.content.similarity(string='search query')
+results = t.order_by(sim, asc=False).limit(5).select(t.title, t.content, sim).collect()
+
+# Image search with text (multimodal CLIP); assumes an embedding index has also been added on `image`
+sim = t.image.similarity(string='a photo of a cat')
+results = t.order_by(sim, asc=False).limit(5).select(t.image, sim).collect()
+```
+
+### Built-in Image and Video Functions
+
+```python
+from pixeltable.functions import image as pxt_image
+from pixeltable.functions.video import extract_audio
+
+# Image thumbnails and encoding
+t.add_computed_column(
+ thumbnail=pxt_image.b64_encode(
+ pxt_image.thumbnail(t.image, size=(320, 320))
+ ),
+ if_exists='ignore'
+)
+
+# Extract audio from video
+t.add_computed_column(
+ audio=extract_audio(t.video, format='mp3'),
+ if_exists='ignore'
+)
+```
+
+### User-Defined Functions (UDFs)
+
+```python
+@pxt.udf
+def clean_text(text: str) -> str:
+ return text.strip().lower()
+
+@pxt.udf
+def safe_length(text: str | None) -> int:
+ return 0 if text is None else len(text)
+
+t.add_computed_column(cleaned=clean_text(t.content), if_exists='ignore')
+```
+
+### Query Functions (also usable as agent tools)
+
+```python
+@pxt.query
+def search_documents(query_text: str, limit: int = 10):
+ sim = t.content.similarity(string=query_text)
+ return t.order_by(sim, asc=False).limit(limit).select(t.title, t.content, sim)
+
+results = search_documents('machine learning').collect()
+```
+
+## Tool-Calling Agent Pipeline
+
+Inserting a row triggers the entire computed column chain automatically.
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.anthropic import messages, invoke_tools
+from datetime import datetime
+
+tools = pxt.tools(web_search, search_documents)  # web_search: a @pxt.udf assumed to be defined elsewhere; search_documents: the @pxt.query above
+
+@pxt.udf
+def assemble_context(question: str, tool_outputs: list | None, doc_context: list | None) -> str:
+ tool_str = str(tool_outputs) if tool_outputs else 'N/A'
+ doc_str = '\n'.join(
+ f"- {item.get('text', '')}" for item in (doc_context or []) if isinstance(item, dict)
+ ) or 'N/A'
+ return (f"QUESTION: {question}\n\n"
+ f"\n{tool_str}\n\n\n"
+ f"\n{doc_str}\n")
+
+agent = pxt.create_table('my_project.agent', {
+ 'prompt': pxt.String, 'timestamp': pxt.Timestamp,
+ 'system_prompt': pxt.String, 'max_tokens': pxt.Int, 'temperature': pxt.Float,
+}, if_exists='ignore')
+
+# LLM selects tools → execute tools → RAG retrieval → assemble → final answer
+agent.add_computed_column(initial_response=messages(
+ model='claude-sonnet-4-20250514',
+ messages=[{'role': 'user', 'content': [{'type': 'text', 'text': agent.prompt}]}],
+ tools=tools, tool_choice=tools.choice(required=True),
+ max_tokens=agent.max_tokens,
+ model_kwargs={'system': agent.system_prompt, 'temperature': agent.temperature},
+), if_exists='ignore')
+
+agent.add_computed_column(tool_output=invoke_tools(tools, agent.initial_response), if_exists='ignore')
+agent.add_computed_column(doc_context=search_documents(agent.prompt), if_exists='ignore')
+agent.add_computed_column(context=assemble_context(agent.prompt, agent.tool_output, agent.doc_context), if_exists='ignore')
+
+agent.add_computed_column(final_response=messages(
+ model='claude-sonnet-4-20250514',
+ messages=[{'role': 'user', 'content': [{'type': 'text', 'text': agent.context}]}],
+ max_tokens=agent.max_tokens,
+ model_kwargs={'system': agent.system_prompt, 'temperature': agent.temperature},
+), if_exists='ignore')
+
+agent.add_computed_column(answer=agent.final_response.content[0].text, if_exists='ignore')
+
+# Usage
+agent.insert([{'prompt': 'What is quantum computing?', 'timestamp': datetime.now(),
+ 'system_prompt': 'You are a helpful assistant.', 'max_tokens': 1024, 'temperature': 0.7}])
+result = agent.where(agent.prompt == 'What is quantum computing?').select(agent.answer).collect()
+```
+
+## AI Provider Integrations
+
+Built-in functions for 25+ providers in `pixeltable.functions.*`:
+
+| Provider | Module | Key Functions |
+|----------|--------|---------------|
+| OpenAI | `openai` | `chat_completions` (supports multimodal/vision via messages), `embeddings`, `image_generations`, `speech`, `transcriptions` |
+| Anthropic | `anthropic` | `messages`, `invoke_tools` |
+| Gemini | `gemini` | `generate_content`, `invoke_tools` |
+| Hugging Face | `huggingface` | `clip`, `sentence_transformer`, `detr_for_object_detection` |
+| Together | `together` | `chat_completions`, `embeddings`, `image_generations` |
+| Fireworks | `fireworks` | `chat_completions`, `embeddings` |
+| Ollama | `ollama` | `chat_completions`, `embeddings` |
+| Mistral | `mistralai` | `chat_completions`, `embeddings` |
+| Groq | `groq` | `chat_completions`, `invoke_tools` |
+| DeepSeek | `deepseek` | `chat_completions` |
+| Replicate | `replicate` | `run` |
+| Voyage AI | `voyageai` | `embed` |
+| Bedrock | `bedrock` | `converse`, `invoke_tools` |
+| OpenRouter | `openrouter` | `chat_completions` |
+| Whisper | `whisper` | `transcribe` (local transcription) |
+| WhisperX | `whisperx` | `transcribe` (local, with speaker diarization) |
+| Twelve Labs | `twelvelabs` | `embed` (video understanding) |
+| Jina AI | `jina` | `embeddings`, `rerank` |
+| BFL FLUX | `bfl` | `generate`, `edit`, `expand`, `fill` (image generation/editing) |
+| RunwayML | `runwayml` | `text_to_video`, `image_to_video`, `text_to_image`, `video_to_video` |
+| fal.ai | `fal` | `run` (execute any fal.ai model) |
+| Reve | `reve` | `create`, `edit`, `remix` (image generation) |
+| Microsoft Fabric | `fabric` | `chat_completions`, `embeddings` (Azure OpenAI via Fabric) |
+| llama.cpp | `llama_cpp` | `create_chat_completion` (local GGUF models) |
+| YOLOX | `yolox` | `yolox` (object detection) |
+
+## Import/Export
+
+```python
+# From CSV / Parquet
+t = pxt.create_table('dir.from_csv', source='data.csv')
+t = pxt.create_table('dir.from_parquet', source='data.parquet')
+
+# With schema overrides (remap columns to media types)
+t = pxt.create_table('dir.data', source='data.csv',
+ schema_overrides={'image_col': pxt.Image, 'doc_col': pxt.Document})
+
+# From Hugging Face
+from pixeltable.io import import_huggingface_dataset
+import datasets
+ds = datasets.load_dataset('squad', split='train[:1000]')
+t = import_huggingface_dataset('dir.squad', ds)
+
+# From pandas
+from pixeltable.io import import_pandas
+t = import_pandas('dir.from_df', df)
+
+# Export
+from pixeltable.io import export_parquet
+export_parquet(t, 'output/')
+```
+
+## Idempotent Operations and Error Handling
+
+CRITICAL: Always use `if_exists='ignore'` on every `create_*` and `add_*` call.
+
+```python
+# Fault-tolerant inserts
+status = t.insert(rows, on_error='ignore')
+# Inspect errors
+t.where(t.summary.errortype != None).select(t.title, t.summary.errormsg).collect()
+# Retry failed columns
+t.recompute_columns(columns=['summary'], where=t.summary.errortype != None)
+```
+
+## Common Pitfalls
+
+| # | Wrong | Correct |
+|---|-------|---------|
+| 1 | `openai.vision(prompt=..., image=t.image)` | `openai.chat_completions(messages=[{'role':'user','content':[{'type':'text','text':'...'}, {'type':'image_url','image_url':{'url':t.image}}]}], model='gpt-4o-mini').choices[0].message.content` |
+| 2 | `from pixeltable.iterators import FrameIterator` | `from pixeltable.functions.video import frame_iterator` |
+| 3 | `t.add_embedding_index('transcript', ...)` on Json col | Extract `.text.astype(pxt.String)` first, then index |
+| 4 | Fix code + re-run with `if_exists='ignore'` | Must `t.drop_column('col')` then recreate — re-run is a no-op |
+| 5 | `{'type':'image', 'data': t.image}` in messages | Use `{'type':'image_url', 'image_url':{'url': t.image}}` |
+| 6 | `t.content.similarity(query)` (positional) | `t.content.similarity(string=query)` (keyword) |
+| 7 | Schema corruption (`IntegrityError`) | `pip install -U pixeltable && rm -rf ~/.pixeltable` (warning: this deletes all local Pixeltable data, so export anything you need first) |
+| 8 | `.collect()` or `pxt.get_table()` inside `@pxt.query` | `@pxt.query` compiles the body at decoration time with expression placeholders — don't call `.collect()`, `insert()`, or reference tables that may not exist. Use a plain `def` for imperative logic |
+| 9 | `'id': pxt.String` as primary key | PK columns must be non-nullable. Use `pxt.Required[pxt.String]` or `uuid7()` as a computed default |
+| 10 | Module-level `Table` object used in FastAPI endpoint | `Table` objects are thread-bound. Call `pxt.get_table()` inside each endpoint function, not at module level |
+
+Full examples in [core-api.md → Common Pitfalls](references/core-api.md#common-pitfalls).
+
+## Table Management
+
+```python
+pxt.list_tables()
+t = pxt.get_table('my_project.my_table')
+pxt.drop_table('my_project.my_table')
+pxt.drop_dir('my_project', force=True)
+t.describe()
+t.columns()
+
+# Snapshots (point-in-time copy)
+snap = pxt.create_snapshot('my_project.snapshot_v1', t, if_exists='ignore')
+
+# Update and delete
+t.update({'score': 1.0}, where=t.category == 'important')
+t.delete(where=t.is_active == False)
+```
+
+## Building Apps with Pixeltable
+
+- Pixeltable IS the data layer — no ORM, no SQLAlchemy
+- **Prefer `FastAPIRouter`** (v0.6+) over hand-written endpoints — `add_insert_route`, `add_query_route`, `add_delete_route` generate endpoints from tables and `@pxt.query` functions
+- Use `background=True` on `add_insert_route` for long-running inserts (returns a job handle, client polls for completion)
+- FastAPI endpoints: use `def` not `async def` (Pixeltable is synchronous)
+- Business logic in `@pxt.udf` / `@pxt.query`, not in endpoint handlers
+- Schema in one file, queries co-located with routes in each router file
+- Insert a row → entire computed column chain runs automatically
+
+```python
+from pixeltable.serving import FastAPIRouter
+import pixeltable as pxt
+
+router = FastAPIRouter(prefix="/api/data", tags=["data"])
+docs = pxt.get_table("app.documents")
+
+router.add_insert_route(docs, path="/upload", uploadfile_inputs=["document"],
+ inputs=["timestamp"], outputs=["uuid"], background=True)
+router.add_delete_route(docs, path="/delete")
+
+@pxt.query
+def list_docs():
+ return docs.select(uuid=docs.uuid, name=docs.document).order_by(docs.timestamp, asc=False)
+
+router.add_query_route(path="/list", query=list_docs, method="get")
+```
+
+Reference: [Pixeltable Starter Kit](https://github.com/pixeltable/pixeltable-starter-kit) | [workflows.md → FastAPIRouter](references/workflows.md#fastapirouter-declarative-serving-v06) | [core-api.md → Serving](references/core-api.md#serving-fastapirouter)
+
+## Resources
+
+- [Starter Kit](https://github.com/pixeltable/pixeltable-starter-kit) — 3 structural patterns + 7 application templates:
+ - **Patterns**: `backend/` (FastAPI + React), `batch/` (no HTTP server), `serving/` (`pxt serve` + TOML)
+ - **app.py templates** (have UI, run `python app.py`): `knowledge-base`, `chat-agent`, `audio-transcription`, `full-stack-showcase`
+ - **pxt-serve templates** (API only, run `python schema.py` then `pxt serve `): `video-search`, `media-indexing`, `image-dataset`
+ - All `app.py` templates include port auto-detection (probes upward from 8000; override with `PORT` env var)
+ - Scaffold with [`pixeltable-new`](https://github.com/pixeltable/pixeltable-new), e.g. `uvx pixeltable-new --template knowledge-base my-app`
+- [MCP Server](https://github.com/pixeltable/mcp-server-pixeltable-developer) — Explore Pixeltable tables via MCP
+- [LLM Docs](https://docs.pixeltable.com/llms-full.txt) — Complete documentation as plain text | [llms.txt](https://www.pixeltable.com/llms.txt)
+
+## Reference Files
+
+| File | Coverage |
+|------|----------|
+| [core-api.md](references/core-api.md) | Tables, querying, views, embeddings, UDFs, tools, **serving (FastAPIRouter)**, B-tree indexes, recompute, config, data sharing, SQL export |
+| [providers.md](references/providers.md) | Quick-reference table + full examples for all 25+ AI providers |
+| [workflows.md](references/workflows.md) | RAG, video analysis, image classification, audio, multi-provider, agent, **batch processing**, FastAPI, **FastAPIRouter**, export |
+| [video-rag-agents.md](references/video-rag-agents.md) | Video + transcript/frame retrieval + tool-calling agent |
+| [agents-memory-mcp.md](references/agents-memory-mcp.md) | Agent with persistent memory, MCP integration, multi-provider invoke_tools |
+| [ml-data-pipeline.md](references/ml-data-pipeline.md) | Ingest, enrich, version, export to PyTorch/Parquet/pandas |
+| [agentic-patterns.md](references/agentic-patterns.md) | 6 architectural patterns + 2 reasoning strategies |
+| [anti-patterns.md](references/anti-patterns.md) | 15 training-distribution biases LLMs bring; wrong/right code for each |
diff --git a/skills/pixeltable/references/agentic-patterns.md b/skills/pixeltable/references/agentic-patterns.md
new file mode 100644
index 0000000..6ba84d6
--- /dev/null
+++ b/skills/pixeltable/references/agentic-patterns.md
@@ -0,0 +1,368 @@
+# Agentic Patterns
+
+Six architectural patterns and two reasoning strategies for building AI agents with Pixeltable. Every pattern uses declarative computed columns — no async code, no orchestration framework, no loop management.
+
+**Core principle**: Your agent _is_ a table. Each step is a computed column. The engine resolves dependencies, parallelizes independent columns, caches results, and persists every intermediate step automatically.
+
+## Contents
+
+- [Prompt Chaining](#prompt-chaining) — sequential multi-step generation
+- [Routing](#routing) — classify intent, dispatch to specialized handlers
+- [Parallelization](#parallelization) — independent analyses on same input
+- [Tool Use](#tool-use) — LLM selects and calls external functions
+- [Evaluator-Optimizer](#evaluator-optimizer) — generate, judge, refine
+- [Orchestrator-Worker](#orchestrator-worker) — decompose, delegate, synthesize
+- [ReAct](#react-reasoning--acting) — reason-act-observe loop
+- [Planning](#planning) — plan upfront, then execute
+
+---
+
+## Prompt Chaining
+
+Sequential steps where each output feeds into the next.
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.openai import chat_completions
+
+chain = pxt.create_table('demo.chain', {'topic': pxt.String}, if_exists='ignore')
+
+# Step 1: Generate outline
+chain.add_computed_column(
+ outline=chat_completions(
+ messages=[{'role': 'user', 'content': 'Create a 3-point outline about: ' + chain.topic}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore')
+
+# Step 2: Write draft from outline (depends on step 1)
+chain.add_computed_column(
+ draft=chat_completions(
+ messages=[{'role': 'user', 'content': 'Write article based on outline:\n\n' + chain.outline}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore')
+
+# Step 3: Polish draft (depends on step 2)
+chain.add_computed_column(
+ final=chat_completions(
+ messages=[{'role': 'user', 'content': 'Edit for clarity and conciseness:\n\n' + chain.draft}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore')
+
+chain.insert([{'topic': 'benefits of declarative AI pipelines'}])
+```
+
+**When to use**: Content generation, data transformation pipelines, multi-step extraction.
+
+## Routing
+
+Classify input and dispatch to specialized handlers.
+
+```python
+router = pxt.create_table('demo.router', {'query': pxt.String}, if_exists='ignore')
+
+# Classify intent
+router.add_computed_column(
+ intent=chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': 'Classify into exactly one word — technical, billing, or general:\n\n' + router.query
+ }],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore')
+
+# Route to specialized prompt
+@pxt.udf
+def route_prompt(intent: str, query: str) -> list[dict]:
+ prompts = {
+ 'technical': 'You are a senior technical support engineer.',
+ 'billing': 'You are a billing specialist. Be empathetic.',
+ 'general': 'You are a friendly customer service representative.',
+ }
+ system = prompts.get(intent.strip().lower(), prompts['general'])
+ return [{'role': 'system', 'content': system}, {'role': 'user', 'content': query}]
+
+router.add_computed_column(
+ routed_messages=route_prompt(router.intent, router.query),
+ if_exists='ignore')
+
+router.add_computed_column(
+ response=chat_completions(
+ messages=router.routed_messages, model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore')
+```
+
+**When to use**: Customer support, multi-domain Q&A, content moderation.
+
+## Parallelization
+
+Multiple independent analyses on the same input — auto-parallelized by the engine.
+
+```python
+parallel = pxt.create_table('demo.parallel', {'text': pxt.String}, if_exists='ignore')
+
+# Three independent columns (no dependencies → run concurrently)
+parallel.add_computed_column(
+ sentiment=chat_completions(
+ messages=[{'role': 'user', 'content': 'Sentiment (positive/negative/neutral):\n\n' + parallel.text}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+
+parallel.add_computed_column(
+ entities=chat_completions(
+ messages=[{'role': 'user', 'content': 'Extract named entities as JSON:\n\n' + parallel.text}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+
+parallel.add_computed_column(
+ summary=chat_completions(
+ messages=[{'role': 'user', 'content': 'Summarize in one sentence:\n\n' + parallel.text}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+
+# Merge results (depends on all three → runs after they complete)
+@pxt.udf
+def merge(sentiment: str, entities: str, summary: str) -> dict:
+ return {'sentiment': sentiment.strip(), 'entities': entities.strip(), 'summary': summary.strip()}
+
+parallel.add_computed_column(
+ report=merge(parallel.sentiment, parallel.entities, parallel.summary),
+ if_exists='ignore')
+```
+
+**When to use**: Document analysis, multi-aspect evaluation, feature extraction.
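
To make the dependency idea concrete, here is a small stdlib-only sketch that groups the columns above into "waves" that could run concurrently. It only illustrates the dependency reasoning and is not a description of how Pixeltable's scheduler is implemented. The `deps` mapping and the helper name `execution_waves` are assumptions made for this example.

```python
from graphlib import TopologicalSorter

# Each column maps to the set of columns it reads from (as in the example above).
deps = {
    'sentiment': {'text'},
    'entities': {'text'},
    'summary': {'text'},
    'report': {'sentiment', 'entities', 'summary'},
}


def execution_waves(graph: dict) -> list:
    """Group nodes into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole batch as finished
    return waves
```

For the mapping above, `execution_waves(deps)` returns `[['text'], ['entities', 'sentiment', 'summary'], ['report']]`: the three analyses share a wave, and `report` waits for all of them.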
+
+## Tool Use
+
+LLM chooses which tools to call; Pixeltable executes them automatically.
+
+```python
+from pixeltable.functions.openai import chat_completions, invoke_tools
+
+@pxt.udf
+def get_weather(city: str) -> str:
+ """Get current weather for a city."""
+ data = {'tokyo': 'Rainy, 65F', 'london': 'Cloudy, 58F', 'paris': 'Sunny, 72F'}
+ return data.get(city.lower(), f'No data for {city}')
+
+@pxt.udf
+def get_stock_price(symbol: str) -> str:
+ """Get current stock price."""
+ prices = {'AAPL': '$178.50', 'GOOGL': '$141.25', 'MSFT': '$378.90'}
+ return prices.get(symbol.upper(), f'No data for {symbol}')
+
+tools = pxt.tools(get_weather, get_stock_price)
+
+agent = pxt.create_table('demo.tool_agent', {'query': pxt.String}, if_exists='ignore')
+
+agent.add_computed_column(
+ response=chat_completions(
+ messages=[{'role': 'user', 'content': agent.query}],
+ model='gpt-4o-mini', tools=tools,
+ ), if_exists='ignore')
+
+agent.add_computed_column(
+ tool_output=invoke_tools(tools, agent.response),
+ if_exists='ignore')
+
+agent.insert([
+ {'query': "What's the weather in Tokyo?"},
+ {'query': "What's Apple's stock price?"},
+])
+```
+
+**When to use**: Any agent that needs external data or actions. See also [agents-memory-mcp.md](agents-memory-mcp.md) for memory and MCP integration.
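
As a mental model of the tool-execution step, the following plain-Python sketch dispatches a list of tool calls against a registry of functions. It is not Pixeltable's `invoke_tools` implementation. The simplified call format (`name` plus a JSON `arguments` string), the `REGISTRY` dict, and the result shape (one list of results per tool, `None` when a tool was not called) are assumptions chosen for illustration.

```python
import json


def get_weather(city: str) -> str:
    data = {'tokyo': 'Rainy, 65F', 'london': 'Cloudy, 58F', 'paris': 'Sunny, 72F'}
    return data.get(city.lower(), f'No data for {city}')


def get_stock_price(symbol: str) -> str:
    prices = {'AAPL': '$178.50', 'GOOGL': '$141.25', 'MSFT': '$378.90'}
    return prices.get(symbol.upper(), f'No data for {symbol}')


# Hypothetical registry: tool name -> plain Python callable.
REGISTRY = {'get_weather': get_weather, 'get_stock_price': get_stock_price}


def dispatch_tool_calls(tool_calls: list, registry: dict) -> dict:
    """Run each requested tool and group the results by tool name."""
    results = {name: None for name in registry}
    for call in tool_calls:
        fn = registry.get(call['name'])
        if fn is None:
            continue  # the model asked for a tool that was never registered
        kwargs = json.loads(call['arguments'])  # arguments arrive as a JSON string
        if results[call['name']] is None:
            results[call['name']] = []
        results[call['name']].append(fn(**kwargs))
    return results
```

Keeping the tools as small pure functions, as the section does, makes them easy to unit-test on their own before they are wrapped with `@pxt.udf`.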
+
+## Evaluator-Optimizer
+
+Generate → judge → refine loop as three chained columns.
+
+```python
+evaluator = pxt.create_table('demo.evaluator', {'brief': pxt.String}, if_exists='ignore')
+
+# Generate first draft
+evaluator.add_computed_column(
+ draft=chat_completions(
+ messages=[{'role': 'user', 'content': 'Write a marketing tagline for:\n\n' + evaluator.brief}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+
+# LLM-as-judge evaluates the draft
+evaluator.add_computed_column(
+ evaluation=chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': 'Rate clarity and creativity (1-10) with feedback:\n\nTagline: ' + evaluator.draft
+ }],
+ model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+
+# Refine based on feedback
+evaluator.add_computed_column(
+ refined=chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': 'Improve based on feedback:\n\nOriginal: ' + evaluator.draft + '\n\nFeedback: ' + evaluator.evaluation
+ }],
+ model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+```
+
+**When to use**: Content quality control, code review pipelines, iterative refinement.
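
In the chain above the refine step runs for every row, whatever the judge said. If refinement should only happen for weak drafts, the score has to be pulled out of the free-text evaluation first. Here is a minimal sketch, assuming the judge writes ratings in an `N/10` form, which the prompt above does not guarantee. The helper names `extract_scores` and `needs_refinement` and the default threshold are hypothetical.

```python
import re


def extract_scores(evaluation: str) -> list:
    """Return every 'N/10' rating (N from 1 to 10) found in the text, in order."""
    return [int(m) for m in re.findall(r'\b(10|[1-9])\s*/\s*10\b', evaluation)]


def needs_refinement(evaluation: str, threshold: int = 8) -> bool:
    """Decide whether a draft should be refined, based on its lowest rating."""
    scores = extract_scores(evaluation)
    # With no parseable score, err on the side of refining.
    return not scores or min(scores) < threshold
```

A function like this could be wrapped as a UDF and stored as its own column, so the decision is inspectable alongside the draft and the evaluation.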
+
+## Orchestrator-Worker
+
+Central agent decomposes tasks, specialized worker tables handle sub-tasks.
+
+```python
+# Worker A: Summarizer (reusable table-as-UDF)
+summarizer = pxt.create_table('demo.summarizer', {'text': pxt.String}, if_exists='ignore')
+summarizer.add_computed_column(
+ summary=chat_completions(
+ messages=[{'role': 'user', 'content': 'Summarize:\n\n' + summarizer.text}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+
+# Worker B: Fact-checker
+checker = pxt.create_table('demo.checker', {'claim': pxt.String}, if_exists='ignore')
+checker.add_computed_column(
+ assessment=chat_completions(
+ messages=[{'role': 'user', 'content': 'Is this plausible? Reply PLAUSIBLE or DUBIOUS:\n\n' + checker.claim}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+
+# Wrap worker tables as callable UDFs
+summarize_fn = pxt.udf(summarizer, return_value=summarizer.summary)
+fact_check_fn = pxt.udf(checker, return_value=checker.assessment)
+
+# Orchestrator: calls workers in parallel, then synthesizes
+orchestrator = pxt.create_table('demo.orchestrator', {'article': pxt.String}, if_exists='ignore')
+orchestrator.add_computed_column(summary=summarize_fn(text=orchestrator.article), if_exists='ignore')
+orchestrator.add_computed_column(fact_check=fact_check_fn(claim=orchestrator.article), if_exists='ignore')
+
+orchestrator.add_computed_column(
+ briefing=chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': 'Write editorial note:\n\nSummary: ' + orchestrator.summary + '\n\nFact-check: ' + orchestrator.fact_check
+ }],
+ model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+```
+
+**Key technique**: `pxt.udf(table, return_value=table.col)` wraps an entire table pipeline as a callable function. Workers are reusable across multiple orchestrators.
+
+**When to use**: Research assistants, report generation, multi-agent systems.
+
+## ReAct (Reasoning + Acting)
+
+Agent alternates between reasoning and acting in a loop. Each step is a row.
+
+```python
+@pxt.udf
+def lookup_population(country: str) -> str:
+ """Look up country population."""
+ populations = {'united states': '331 million', 'brazil': '214 million', 'germany': '84 million'}
+ return populations.get(country.lower(), 'Not available')
+
+react_tools = pxt.tools(lookup_population)
+
+react = pxt.create_table('demo.react', {
+ 'step': pxt.Int, 'prompt': pxt.String, 'system_prompt': pxt.String,
+}, if_exists='ignore')
+
+react.add_computed_column(
+ response=chat_completions(
+ messages=[
+ {'role': 'system', 'content': react.system_prompt},
+ {'role': 'user', 'content': react.prompt}
+ ],
+ model='gpt-4o-mini', tools=react_tools,
+ ), if_exists='ignore')
+
+react.add_computed_column(
+ answer=react.response.choices[0].message.content,
+ if_exists='ignore')
+
+react.add_computed_column(
+ tool_output=invoke_tools(react_tools, react.response),
+ if_exists='ignore')
+
+# Reasoning loop — each iteration is a new row
+SYSTEM = "Answer step by step. Use tools when needed. Say FINAL ANSWER when done."
+question = "Which has a larger population, Brazil or Germany?"
+history = []
+
+for step in range(1, 5):
+    prompt = question + ('\n\nObservations so far:\n' + '\n'.join(history) if history else '')
+    react.insert([{'step': step, 'prompt': prompt, 'system_prompt': SYSTEM}])
+
+    row = react.where(react.step == step).select(react.answer, react.tool_output).collect()[0]
+    if row['tool_output']:
+        history.append(f'Step {step}: {row["tool_output"]}')
+    if row['answer'] and 'FINAL' in row['answer'].upper():
+        break
+```
+
+**When to use**: Multi-step research, complex reasoning requiring external data.
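
The control flow of the loop (accumulate observations, stop on a final answer or after a fixed number of steps) can be checked without any API calls by passing the model step in as a function. A minimal sketch follows. `run_react_loop` and `step_fn` are hypothetical names, and `step_fn` stands in for the insert-then-collect pair in the example above.

```python
from typing import Callable, List, Optional, Tuple

# A step takes the prompt and returns (answer, tool_output); either may be None.
StepFn = Callable[[str], Tuple[Optional[str], Optional[str]]]


def run_react_loop(question: str, step_fn: StepFn, max_steps: int = 4):
    """Run up to max_steps reasoning steps, feeding tool observations back in."""
    history: List[str] = []
    for step in range(1, max_steps + 1):
        prompt = question
        if history:
            prompt += '\n\nObservations so far:\n' + '\n'.join(history)
        answer, tool_output = step_fn(prompt)
        if tool_output:
            history.append(f'Step {step}: {tool_output}')
        if answer and 'FINAL' in answer.upper():
            return answer, history
    return None, history  # step budget exhausted without a final answer
```

With a scripted `step_fn` that returns two observations and then a final answer, the loop stops at step 3 with both observations in `history`.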
+
+## Planning
+
+Generate a complete plan upfront, then execute all steps.
+
+```python
+import json
+
+planner = pxt.create_table('demo.planner', {'question': pxt.String}, if_exists='ignore')
+
+# Generate plan as JSON
+planner.add_computed_column(
+ plan_text=chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': 'Break into 2-3 research steps. Return JSON: {"steps": ["step1", "step2"]}\n\n' + planner.question
+ }],
+ model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+
+# Format plan into execution prompt
+@pxt.udf
+def format_plan(plan_json: str, question: str) -> str:
+    try:
+        data = json.loads(plan_json)
+        steps = data if isinstance(data, list) else data.get('steps', [])
+        step_list = '\n'.join(f'{i+1}. {s}' for i, s in enumerate(steps))
+    except Exception:
+        step_list = '1. ' + question
+    return f'Answer each sub-question, then synthesize:\n\nOriginal: {question}\n\n{step_list}'
+
+planner.add_computed_column(
+ exec_prompt=format_plan(planner.plan_text, planner.question),
+ if_exists='ignore')
+
+planner.add_computed_column(
+ answer=chat_completions(
+ messages=[{'role': 'user', 'content': planner.exec_prompt}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+```
+
+**When to use**: Complex questions, multi-step research, structured problem solving.
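
Models often wrap JSON in a Markdown fence or add a sentence around it, in which case `json.loads` on the raw text fails and `format_plan` falls back to a single step. Here is a minimal sketch of a more tolerant parser, assuming the plan is the only JSON object or array in the reply. The helper name `extract_steps` is hypothetical.

```python
import json


def extract_steps(plan_text: str) -> list:
    """Pull the list of steps out of a reply that may contain extra text."""
    # Find the outermost JSON object or array, ignoring surrounding prose or fences.
    starts = [i for i in (plan_text.find('{'), plan_text.find('[')) if i != -1]
    if not starts:
        return []
    start = min(starts)
    end = max(plan_text.rfind('}'), plan_text.rfind(']'))
    if end < start:
        return []
    try:
        data = json.loads(plan_text[start:end + 1])
    except json.JSONDecodeError:
        return []
    # Accept either {"steps": [...]} or a bare [...] list.
    if isinstance(data, dict):
        data = data.get('steps', [])
    if not isinstance(data, list):
        return []
    return [str(s) for s in data]
```

An empty result signals that the caller should use the same fallback as `format_plan`, that is, treat the original question as the only step.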
+
+## Comparison with Traditional Frameworks
+
+| Concept | Pixeltable | LangChain / CrewAI / LangGraph |
+|---------|-----------|-------------------------------|
+| Pipeline step | Computed column | Function in a chain/loop |
+| Parallel execution | Independent columns (automatic) | `asyncio.gather` / explicit |
+| Persistence | Built-in — every intermediate stored | Separate logging/DB layer |
+| Caching | Automatic — same input never recomputed | Manual memoization |
+| Reusable sub-agent | `pxt.udf(table, return_value=...)` | Agent class with `.run()` |
+| Error recovery | `recompute_columns(where=errortype != None)` | Re-run entire pipeline |
+| Observability | Query any column on any row | Attach tracing callbacks |
+
+Patterns compose naturally — an orchestrator can use routing in its dispatch, tool use within workers, and ReAct reasoning inside tool loops, all without special glue code.
diff --git a/skills/pixeltable/references/agents-memory-mcp.md b/skills/pixeltable/references/agents-memory-mcp.md
new file mode 100644
index 0000000..5a6bfff
--- /dev/null
+++ b/skills/pixeltable/references/agents-memory-mcp.md
@@ -0,0 +1,289 @@
+# Agent with Memory and MCP Tools
+
+A production recipe combining a tool-calling agent with persistent memory (chat history + knowledge bank) and external MCP server integration. The agent remembers past conversations, retrieves stored facts, and can call both local tools and remote MCP tools.
+
+## Workflow
+
+1. Create a chat history table with embedding index for semantic recall
+2. Create a memory bank table for long-lived facts and preferences
+3. Write `@pxt.query` retrieval functions for both (filtered by `user_id`)
+4. Write local `@pxt.udf` tools (including a `save_memory` tool for the LLM to store facts)
+5. (Optional) Load MCP tools with `pxt.mcp_udfs()` and combine with local tools
+6. Bundle all tools with `pxt.tools()`
+7. Create agent table with computed column chain: LLM -> invoke_tools -> context assembly -> final answer
+8. After each agent response, save the conversation to chat history for future recall
+
+## Full Pipeline
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.openai import chat_completions
+from pixeltable.functions.openai import invoke_tools as openai_invoke_tools
+from pixeltable.functions.huggingface import sentence_transformer
+from datetime import datetime
+
+pxt.create_dir('agent_app', if_exists='ignore')
+
+# ── 1. Memory: Chat History ─────────────────────────────────────────
+# Stores every user and assistant message with embeddings for recall.
+
+chat_history = pxt.create_table('agent_app.chat_history', {
+ 'role': pxt.String, # 'user' or 'assistant'
+ 'content': pxt.String,
+ 'timestamp': pxt.Timestamp,
+ 'user_id': pxt.String,
+}, if_exists='ignore')
+
+embed_fn = sentence_transformer.using(model_id='all-MiniLM-L6-v2')
+chat_history.add_embedding_index('content', string_embed=embed_fn, if_exists='ignore')
+
+@pxt.query
+def recall_chat_history(query_text: str, user_id: str, top_k: int = 5):
+ """Retrieve past conversation turns relevant to the current query."""
+ sim = chat_history.content.similarity(string=query_text)
+ return (
+ chat_history
+ .where((chat_history.user_id == user_id) & (sim > 0.5))
+ .order_by(sim, asc=False)
+ .limit(top_k)
+ .select(chat_history.role, chat_history.content, sim=sim)
+ )
+
+# ── 2. Memory: Knowledge Bank ───────────────────────────────────────
+# Stores user preferences, facts, and persistent notes.
+
+memory_bank = pxt.create_table('agent_app.memory_bank', {
+ 'content': pxt.String,
+ 'category': pxt.String, # 'preference', 'fact', 'note'
+ 'user_id': pxt.String,
+ 'timestamp': pxt.Timestamp,
+}, if_exists='ignore')
+
+memory_bank.add_embedding_index('content', string_embed=embed_fn, if_exists='ignore')
+
+@pxt.query
+def recall_memories(query_text: str, user_id: str, top_k: int = 3):
+ """Retrieve relevant stored memories for a user."""
+ sim = memory_bank.content.similarity(string=query_text)
+ return (
+ memory_bank
+ .where((memory_bank.user_id == user_id) & (sim > 0.5))
+ .order_by(sim, asc=False)
+ .limit(top_k)
+ .select(memory_bank.content, memory_bank.category, sim=sim)
+ )
+
+# Seed memories
+memory_bank.insert([
+ {'content': 'User prefers concise answers with code examples.',
+ 'category': 'preference', 'user_id': 'user_1', 'timestamp': datetime.now()},
+ {'content': 'Project uses FastAPI with Python 3.12.',
+ 'category': 'fact', 'user_id': 'user_1', 'timestamp': datetime.now()},
+])
+
+# ── 3. Local tools ──────────────────────────────────────────────────
+
+@pxt.udf
+def get_weather(city: str) -> str:
+ """Get current weather for a city."""
+ weather_data = {
+ 'new york': 'Sunny, 72F', 'london': 'Cloudy, 58F',
+ 'tokyo': 'Rainy, 65F', 'paris': 'Partly cloudy, 68F',
+ }
+ return weather_data.get(city.lower(), f'Weather data not available for {city}')
+
+@pxt.udf
+def save_memory(content: str, category: str, user_id: str) -> str:
+ """Save a new fact or preference to the user's memory bank."""
+ memory_bank.insert([{
+ 'content': content, 'category': category,
+ 'user_id': user_id, 'timestamp': datetime.now(),
+ }])
+ return f'Saved to memory: {content}'
+
+# ── 4. MCP tools (optional) ─────────────────────────────────────────
+# Load tools from any MCP-compliant server and combine with local tools.
+
+# mcp_tools = pxt.mcp_udfs('http://localhost:8000/mcp')
+# tools = pxt.tools(get_weather, save_memory, recall_memories, *mcp_tools)
+
+# Without MCP:
+tools = pxt.tools(get_weather, save_memory, recall_memories)
+
+# ── 5. Context assembly ─────────────────────────────────────────────
+
+@pxt.udf
+def build_prompt(
+ question: str,
+ tool_outputs: list | None,
+ chat_context: list | None,
+ memory_context: list | None,
+) -> str:
+    parts = [f"USER QUESTION: {question}"]
+
+    if memory_context:
+        mem_str = '\n'.join(
+            f"- [{item.get('category', '?')}] {item.get('content', '')}"
+            for item in memory_context if isinstance(item, dict)
+        )
+        parts.append(f"\n[USER MEMORIES]\n{mem_str}")
+
+    if chat_context:
+        chat_str = '\n'.join(
+            f"- {item.get('role', '?')}: {item.get('content', '')}"
+            for item in chat_context if isinstance(item, dict)
+        )
+        parts.append(f"\n[RECENT CONVERSATION]\n{chat_str}")
+
+    if tool_outputs:
+        parts.append(f"\n[TOOL RESULTS]\n{tool_outputs}")
+
+    return '\n'.join(parts)
+
+# ── 6. Agent pipeline ───────────────────────────────────────────────
+
+agent = pxt.create_table('agent_app.agent', {
+ 'prompt': pxt.String,
+ 'user_id': pxt.String,
+ 'timestamp': pxt.Timestamp,
+ 'system_prompt': pxt.String,
+ 'max_tokens': pxt.Int,
+ 'temperature': pxt.Float,
+}, if_exists='ignore')
+
+# Step 1: Tool selection
+agent.add_computed_column(
+ initial_response=chat_completions(
+ messages=[{'role': 'user', 'content': agent.prompt}],
+ model='gpt-4o-mini',
+ tools=tools,
+ ), if_exists='ignore')
+
+# Step 2: Execute tools
+agent.add_computed_column(
+ tool_output=openai_invoke_tools(tools, agent.initial_response),
+ if_exists='ignore')
+
+# Step 3: Retrieve memory context (runs in parallel as separate computed columns)
+agent.add_computed_column(
+ chat_context=recall_chat_history(agent.prompt, agent.user_id),
+ if_exists='ignore')
+
+agent.add_computed_column(
+ memory_context=recall_memories(agent.prompt, agent.user_id),
+ if_exists='ignore')
+
+# Step 4: Assemble prompt
+agent.add_computed_column(
+ context=build_prompt(
+ agent.prompt, agent.tool_output,
+ agent.chat_context, agent.memory_context),
+ if_exists='ignore')
+
+# Step 5: Final response
+agent.add_computed_column(
+ final_response=chat_completions(
+ messages=[
+ {'role': 'system', 'content': agent.system_prompt},
+ {'role': 'user', 'content': agent.context},
+ ],
+ model='gpt-4o-mini',
+ max_tokens=agent.max_tokens,
+ temperature=agent.temperature,
+ ), if_exists='ignore')
+
+agent.add_computed_column(
+ answer=agent.final_response.choices[0].message.content,
+ if_exists='ignore')
+```
+
+## Usage
+
+```python
+# Ask a question — memory and tools are used automatically
+agent.insert([{
+ 'prompt': 'What is the weather in Tokyo? Remember that I like brief answers.',
+ 'user_id': 'user_1',
+ 'timestamp': datetime.now(),
+ 'system_prompt': 'You are a helpful assistant. Use tools and memories to personalize your response.',
+ 'max_tokens': 512,
+ 'temperature': 0.7,
+}])
+
+result = agent.order_by(agent.timestamp, asc=False).limit(1).select(agent.answer).collect()
+
+# Save the conversation to chat history for future recall
+agent_row = agent.order_by(agent.timestamp, asc=False).limit(1).select(
+ agent.prompt, agent.answer, agent.user_id, agent.timestamp).collect()
+row = agent_row[0]
+
+chat_history.insert([
+ {'role': 'user', 'content': row['prompt'],
+ 'user_id': row['user_id'], 'timestamp': row['timestamp']},
+ {'role': 'assistant', 'content': row['answer'],
+ 'user_id': row['user_id'], 'timestamp': datetime.now()},
+])
+```
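
The save step above can be factored into a small pure function, which also makes it easy to skip the assistant turn when the answer column is empty, for example after a failed LLM call. This is a minimal sketch. The helper name `to_history_rows` is hypothetical, and the row keys mirror the `select` in the usage example.

```python
from datetime import datetime


def to_history_rows(row: dict, answered_at: datetime) -> list:
    """Turn one agent result row into chat-history rows (user, then assistant)."""
    rows = [{
        'role': 'user', 'content': row['prompt'],
        'user_id': row['user_id'], 'timestamp': row['timestamp'],
    }]
    answer = row.get('answer')
    if answer:  # skip the assistant turn when no answer was produced
        rows.append({
            'role': 'assistant', 'content': answer,
            'user_id': row['user_id'], 'timestamp': answered_at,
        })
    return rows
```

The result can be passed directly to `chat_history.insert(...)` in place of the literal list shown above.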
+
+## Adding MCP Tools
+
+Connect to any MCP-compliant server to extend the agent with external tools:
+
+```python
+# Load tools from an MCP server
+mcp_tools = pxt.mcp_udfs('http://localhost:8000/mcp')
+
+# Inspect available tools
+for tool in mcp_tools:
+ print(f'- {tool.name}: {tool.comment()}')
+
+# Combine with local tools
+tools = pxt.tools(get_weather, save_memory, recall_memories, *mcp_tools)
+```
+
+MCP tools are called via `invoke_tools()` exactly like local UDFs — no special handling needed.
+
+## Multi-Provider invoke_tools
+
+The agent pipeline works with any provider that supports tool calling:
+
+| Provider | Import | invoke_tools |
+|----------|--------|-------------|
+| OpenAI | `from pixeltable.functions import openai` | `openai.invoke_tools(tools, response)` |
+| Anthropic | `from pixeltable.functions import anthropic` | `anthropic.invoke_tools(tools, response)` |
+| Groq | `from pixeltable.functions import groq` | `groq.invoke_tools(tools, response)` |
+| Gemini | `from pixeltable.functions import gemini` | `gemini.invoke_tools(tools, response)` |
+| Bedrock | `from pixeltable.functions import bedrock` | `bedrock.invoke_tools(tools, response)` |
+
+To switch providers, change the import and the LLM call function. The `tools` object and `invoke_tools()` pattern stay the same.
+
+## How It Works
+
+1. **Chat history** — Every conversation is stored in a table with an embedding index. The `recall_chat_history` query retrieves semantically relevant past turns for the current user.
+
+2. **Memory bank** — Long-lived facts and preferences are stored separately. The `recall_memories` query retrieves relevant memories. The `save_memory` tool lets the LLM itself save new facts during conversation.
+
+3. **User scoping** — All queries filter by `user_id`, so multiple users can share the same tables without seeing each other's data.
+
+4. **MCP integration** — `pxt.mcp_udfs()` loads tools from any MCP server as regular Pixeltable UDFs. They're bundled with `pxt.tools()` and executed with `invoke_tools()` just like local functions.
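
What the two recall queries compute (filter by user, drop weak matches, keep the best `top_k`) can be mirrored in plain Python on toy vectors. The sketch below assumes cosine similarity and an in-memory list of rows with a `vec` field. Both are illustration-only assumptions, and `recall` and `cosine` are hypothetical helper names rather than Pixeltable APIs.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(rows: list, query_vec: list, user_id: str,
           top_k: int = 3, min_sim: float = 0.5) -> list:
    """Return the content of the user's best-matching rows above the threshold."""
    # User scoping: other users' rows are never scored.
    scored = [(cosine(r['vec'], query_vec), r) for r in rows if r['user_id'] == user_id]
    kept = [(s, r) for s, r in scored if s > min_sim]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r['content'] for _, r in kept[:top_k]]
```

Testing with at least two user IDs, as the checklist below suggests, checks the same property this sketch makes explicit: a matching row belonging to another user never appears in the result.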
+
+## Adapting This Recipe
+
+- **Add document RAG**: Create a document chunking view and add a `search_documents` query to the tools list
+- **Add image memory**: Use CLIP embeddings on an image column for visual memory recall
+- **Serve via API**: Wrap in a FastAPI endpoint — see [workflows.md → FastAPI App Pattern](workflows.md#fastapi-app-pattern)
+- **Use Anthropic instead**: Swap `chat_completions` → `messages` and `openai.invoke_tools` → `anthropic.invoke_tools` — see [providers.md → Quick Reference](providers.md#quick-reference)
+
+## Agent with Memory Checklist
+
+- [ ] Chat history table created with `user_id`, `role`, `content`, `timestamp` columns
+- [ ] Embedding index added on chat history `content` column
+- [ ] Memory bank table created with `user_id`, `content`, `category` columns
+- [ ] Embedding index added on memory bank `content` column
+- [ ] Recall queries filter by `user_id` (multi-tenant safety)
+- [ ] Recall queries use `.similarity(string=...)` with keyword argument and a minimum threshold
+- [ ] `save_memory` tool has a clear docstring so the LLM knows when to store facts
+- [ ] Tools bundled with `pxt.tools()` — includes both local UDFs and MCP tools if any
+- [ ] `invoke_tools()` import matches the LLM provider used
+- [ ] Agent response saved to chat history after each interaction (both user and assistant turns)
+- [ ] Tested with multiple user IDs to verify scoping works
diff --git a/skills/pixeltable/references/anti-patterns.md b/skills/pixeltable/references/anti-patterns.md
new file mode 100644
index 0000000..9288fa3
--- /dev/null
+++ b/skills/pixeltable/references/anti-patterns.md
@@ -0,0 +1,388 @@
+# Anti-Patterns: Training-Distribution Biases LLMs Bring to Pixeltable
+
+LLMs are trained on millions of imperative Python examples using pandas, LangChain, standalone vector DBs, and raw loops. These priors are **wrong for Pixeltable**. This page lists every common bias and the correct idiomatic shape.
+
+## The 5 Macro Biases (High Priority)
+
+These are structural — getting any one wrong means the entire solution is non-idiomatic.
+
+### 1. Framework addiction (LangChain / LlamaIndex / Haystack / LangGraph)
+
+**Wrong:**
+```python
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+from langchain_community.vectorstores import Chroma
+from langchain_openai import OpenAIEmbeddings, ChatOpenAI
+from langchain.chains import RetrievalQA
+
+splitter = RecursiveCharacterTextSplitter(chunk_size=512)
+chunks = splitter.split_documents(docs)
+vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
+chain = RetrievalQA.from_chain_type(ChatOpenAI(), retriever=vectorstore.as_retriever())
+```
+
+**Right:**
+```python
+import pixeltable as pxt
+from pixeltable.functions.document import document_splitter
+from pixeltable.functions.openai import chat_completions, embeddings
+
+docs = pxt.create_table('app.docs', {'doc': pxt.Document}, if_exists='ignore')
+chunks = pxt.create_view('app.chunks', docs,
+ iterator=document_splitter(docs.doc, separators='token_limit', limit=512),
+ if_exists='ignore')
+chunks.add_embedding_index('text', embedding=embeddings(model='text-embedding-3-small'), if_exists='ignore')
+```
+
+**Why:** Pixeltable handles chunking, embedding, indexing, and retrieval natively. Adding a framework on top creates redundant abstraction, breaks incremental updates, and loses version control.
+
+---
+
+### 2. pandas as working store
+
+**Wrong:**
+```python
+import pandas as pd
+
+df = pd.read_csv('data.csv')
+df['summary'] = df['text'].apply(lambda x: call_openai(x))
+df['embedding'] = df['text'].apply(lambda x: get_embedding(x))
+df.to_parquet('output.parquet')
+```
+
+**Right:**
+```python
+import pixeltable as pxt
+from pixeltable.functions.openai import chat_completions, embeddings
+
+t = pxt.create_table('app.data', source='data.csv', if_exists='ignore')
+t.add_computed_column(summary=chat_completions(
+ messages=[{'role': 'user', 'content': 'Summarize: ' + t.text}],
+ model='gpt-4o-mini'
+).choices[0].message.content, if_exists='ignore')
+t.add_embedding_index('text', embedding=embeddings(model='text-embedding-3-small'), if_exists='ignore')
+
+# Export ONLY at the end if needed
+df = t.select(t.text, t.summary).collect().to_pandas()
+```
+
+**Why:** pandas has no persistence, no incremental computation, no automatic retry on API failures, and no version control. Pixeltable tables persist, recompute only new/failed rows, and maintain full history.
+
+---
+
+### 3. For-loops calling AI models
+
+**Wrong:**
+```python
+results = []
+for _, row in df.iterrows():
+ response = openai.chat.completions.create(
+ model='gpt-4o-mini',
+ messages=[{'role': 'user', 'content': row['text']}]
+ )
+ results.append(response.choices[0].message.content)
+df['summary'] = results
+```
+
+**Right:**
+```python
+from pixeltable.functions.openai import chat_completions
+
+t.add_computed_column(
+ summary=chat_completions(
+ messages=[{'role': 'user', 'content': t.text}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore'
+)
+```
+
+**Why:** Computed columns handle batching, rate limiting (configured in `~/.pixeltable/config.toml`), automatic caching (never re-calls for unchanged rows), error isolation per row, and retry via `recompute_columns()`. A for-loop has none of this.
+
+---
+
+### 4. Separate vector database
+
+**Wrong:**
+```python
+import os
+import chromadb
+from chromadb.utils import embedding_functions
+
+client = chromadb.Client()
+ef = embedding_functions.OpenAIEmbeddingFunction(api_key=os.environ['OPENAI_API_KEY'])
+collection = client.create_collection("docs", embedding_function=ef)
+collection.add(documents=texts, ids=ids)
+results = collection.query(query_texts=["search query"], n_results=5)
+```
+
+**Right:**
+```python
+from pixeltable.functions.openai import embeddings
+
+t.add_embedding_index('text',
+ embedding=embeddings(model='text-embedding-3-small'),
+ if_exists='ignore')
+
+sim = t.text.similarity(string='search query')
+results = t.order_by(sim, asc=False).limit(5).select(t.text, sim).collect()
+```
+
+**Why:** The embedding index lives inside the table — it updates automatically when rows are inserted, shares the same version history, and requires no separate service. Querying uses the same expression language as everything else.
+
+---
+
+### 5. While-loop agent patterns
+
+**Wrong:**
+```python
+messages = [{"role": "user", "content": user_query}]
+while True:
+    response = openai.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
+    if response.choices[0].finish_reason == "stop":
+        break
+    tool_calls = response.choices[0].message.tool_calls
+    for tc in tool_calls:
+        result = execute_tool(tc)
+        messages.append({"role": "tool", "content": result, "tool_call_id": tc.id})
+```
+
+**Right:**
+```python
+from pixeltable.functions.openai import chat_completions, invoke_tools
+
+tools = pxt.tools(search_docs, get_weather)
+
+agent = pxt.create_table('app.agent', {'prompt': pxt.String}, if_exists='ignore')
+agent.add_computed_column(response=chat_completions(
+ messages=[{'role': 'user', 'content': agent.prompt}],
+ model='gpt-4o', tools=tools, tool_choice=tools.choice(required=True)
+), if_exists='ignore')
+agent.add_computed_column(tool_output=invoke_tools(tools, agent.response), if_exists='ignore')
+agent.add_computed_column(final=chat_completions(
+ messages=[{'role': 'user', 'content': agent.prompt + '\n\nContext: ' + agent.tool_output.astype(pxt.String)}],
+ model='gpt-4o'
+).choices[0].message.content, if_exists='ignore')
+
+agent.insert([{'prompt': 'What is the weather in NYC?'}])
+```
+
+**Why:** The declarative chain persists every intermediate result, enables debugging by inspecting any column, retries individual steps without re-running the whole chain, and maintains a complete audit trail. The while-loop loses all intermediate state on failure.
+
+---
+
+## The Full 15-Bias Reference
+
+| # | LLM's prior reaches for | Correct Pixeltable shape | Why the prior is wrong |
+|---|--------------------------|--------------------------|------------------------|
+| 1 | LangChain / LlamaIndex / Haystack / LangGraph | `create_view` + iterator + `add_embedding_index` + `pxt.tools()` | Redundant abstraction; breaks incremental updates |
+| 2 | `pandas.DataFrame` as working store | Pixeltable table is the store; `.to_pandas()` for export only | No persistence, no incremental, no versioning |
+| 3 | `for row in ...:` calling AI per row | Computed column | No batching, no rate limits, no caching, no retry |
+| 4 | Pinecone / Chroma / FAISS / Qdrant / pgvector | `t.add_embedding_index(col, embedding=fn)` | Separate service; no auto-update; no version control |
+| 5 | Embeddings as `list[list[float]]` in memory | Stored as computed column with type `pxt.Array` | Volatile; lost on restart; can't query |
+| 6 | `while not done:` agent loop | Table where insert triggers computed-column chain | Loses intermediate state; no audit trail |
+| 7 | `cv2.VideoCapture` / Pillow loops for media | `frame_iterator` + `pixeltable.functions.image.*` | No persistence; manual frame management |
+| 8 | `psycopg2` / `sqlalchemy` against `~/.pixeltable/pgdata` | SDK only (never touch embedded Postgres) | Corrupts internal schema; breaks versioning |
+| 9 | `async def` FastAPI endpoints calling Pixeltable | `def` endpoints (Pixeltable is synchronous) | Deadlocks or silent failures under async |
+| 10 | Drop + recreate tables as "initialization" | `if_exists='ignore'` on `create_table` / `create_view` | Data loss; breaks incremental computation |
+| 11 | `if_exists='ignore'` to "update" column logic | `t.drop_column('col')` then recreate | `if_exists='ignore'` is a no-op if column exists |
+| 12 | Threading `api_key=` into every provider call | Environment variables or `~/.pixeltable/config.toml` | Leaks keys; breaks portability |
+| 13 | `openai-whisper` / `faster-whisper` imperative | `whisper.transcribe` or `openai.transcriptions` as computed column | No caching; manual error handling |
+| 14 | Pydantic / dataclass schemas for table definition | `{'col': pxt.Type}` dict | Pixeltable has its own type system; Pydantic adds nothing |
+| 15 | Chat history in Python `list` or Redis | Table with embedding index for semantic memory retrieval | Volatile or disconnected from the data layer |
+
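+## Per-Bias Code Examples (5, 7–15)
+
+Numbering follows the table above. Bias 6 (agent loops) is illustrated earlier as macro bias 5.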
+
+### 5. Embeddings as raw lists
+
+**Wrong:**
+```python
+embeddings_cache = []
+for text in texts:
+ emb = openai.embeddings.create(input=text, model="text-embedding-3-small")
+ embeddings_cache.append(emb.data[0].embedding)
+# Now what? Save to pickle? Rebuild on every restart?
+```
+
+**Right:**
+```python
+from pixeltable.functions.openai import embeddings
+t.add_embedding_index('text', embedding=embeddings(model='text-embedding-3-small'), if_exists='ignore')
+```
+
+### 7. cv2 / Pillow loops for video/image processing
+
+**Wrong:**
+```python
+import cv2
+cap = cv2.VideoCapture('video.mp4')
+frames = []
+frame_count = 0
+while cap.isOpened():
+    ret, frame = cap.read()
+    if not ret:
+        break
+    if frame_count % 30 == 0:  # keep every 30th frame
+        frames.append(frame)
+    frame_count += 1
+```
+
+**Right:**
+```python
+from pixeltable.functions.video import frame_iterator
+
+frames = pxt.create_view('app.frames', videos,
+ iterator=frame_iterator(videos.video, fps=1.0),
+ if_exists='ignore')
+```
+
+### 8. Direct Postgres access
+
+**Wrong:**
+```python
+import psycopg2
+conn = psycopg2.connect(dbname='pixeltable', host='/tmp/.s.PGSQL.5432')
+cur = conn.cursor()
+cur.execute("SELECT * FROM ...") # NEVER DO THIS
+```
+
+**Right:** Always use the Pixeltable SDK. The embedded Postgres is an implementation detail.
+
+### 9. async def with Pixeltable
+
+**Wrong:**
+```python
+@app.post("/query")
+async def query_endpoint(q: str):
+ results = t.where(t.text.contains(q)).collect() # May deadlock
+ return results
+```
+
+**Right:**
+```python
+@app.post("/query")
+def query_endpoint(q: str):
+ results = t.where(t.text.contains(q)).select(t.text, t.score).collect()
+ return results.to_pandas().to_dict(orient='records')
+```
+
+### 10. Drop + recreate as init
+
+**Wrong:**
+```python
+pxt.drop_table('app.data', force=True)
+t = pxt.create_table('app.data', {'text': pxt.String})
+```
+
+**Right:**
+```python
+t = pxt.create_table('app.data', {'text': pxt.String}, if_exists='ignore')
+```
+
+### 11. if_exists='ignore' to update logic
+
+**Wrong:**
+```python
+# Bug in summary prompt — "fix" by re-running:
+t.add_computed_column(summary=fixed_expression, if_exists='ignore')
+# ↑ SILENT NO-OP — column already exists with old logic
+```
+
+**Right:**
+```python
+t.drop_column('summary')
+t.add_computed_column(summary=fixed_expression)
+```
+
+### 12. Hardcoding API keys
+
+**Wrong:**
+```python
+from pixeltable.functions.openai import chat_completions
+t.add_computed_column(resp=chat_completions(..., api_key='sk-abc123'))
+```
+
+**Right:** Set `OPENAI_API_KEY` env var or add to `~/.pixeltable/config.toml`:
+```toml
+[openai]
+api_key = 'sk-...'
+```
+
+### 13. Imperative whisper
+
+**Wrong:**
+```python
+import whisper
+model = whisper.load_model("base")
+for audio_file in audio_files:
+    result = model.transcribe(audio_file)
+    transcripts.append(result["text"])
+```
+
+**Right:**
+```python
+from pixeltable.functions.whisper import transcribe
+
+t.add_computed_column(
+ transcript=transcribe(t.audio, model='base').text,
+ if_exists='ignore'
+)
+```
+
+### 14. Pydantic schemas
+
+**Wrong:**
+```python
+from pydantic import BaseModel
+
+class Document(BaseModel):
+    title: str
+    content: str
+    embedding: list[float]
+
+# Then trying to map this to Pixeltable somehow...
+```
+
+**Right:**
+```python
+t = pxt.create_table('app.docs', {
+ 'title': pxt.String,
+ 'content': pxt.String,
+}, if_exists='ignore')
+# Embeddings are computed, not schema-declared
+t.add_embedding_index('content', embedding=embed_fn, if_exists='ignore')
+```
+
+### 15. Chat history in lists or Redis
+
+**Wrong:**
+```python
+chat_history = [] # Lost on restart
+# or
+import redis
+r = redis.Redis()
+r.lpush(f"chat:{user_id}", json.dumps(message))
+```
+
+**Right:**
+```python
+from pixeltable.functions.openai import embeddings
+
+memory = pxt.create_table('app.memory', {
+    'role': pxt.String,
+    'content': pxt.String,
+    'session_id': pxt.String,
+    'timestamp': pxt.Timestamp,
+}, if_exists='ignore')
+memory.add_embedding_index('content',
+    embedding=embeddings.using(model='text-embedding-3-small'),
+    if_exists='ignore')
+
+# Retrieve relevant past context (current_query and sid come from the incoming request)
+sim = memory.content.similarity(string=current_query)
+context = memory.where(memory.session_id == sid).order_by(sim, asc=False).limit(5).collect()
+```
+
+---
+
+## Cross-References
+
+- [SKILL.md → Critical Warnings](../SKILL.md#critical-warnings--read-before-writing-code) — hallucinated API fixes
+- [SKILL.md → Common Pitfalls](../SKILL.md#common-pitfalls) — wrong/right table for specific APIs
+- [core-api.md → Common Pitfalls](core-api.md#common-pitfalls) — extended examples
+- [Migration guides](https://docs.pixeltable.com/migrate/from-agent-frameworks) — porting from LangChain/LlamaIndex
diff --git a/skills/pixeltable/references/core-api.md b/skills/pixeltable/references/core-api.md
new file mode 100644
index 0000000..2bb38f1
--- /dev/null
+++ b/skills/pixeltable/references/core-api.md
@@ -0,0 +1,1146 @@
+# Pixeltable Core API Reference
+
+Complete reference for table operations, querying, computed columns, views, embedding indexes, UDFs, tools, and configuration.
+
+## Contents
+
+- [Table Creation](#table-creation) (basic, primary key, UUID, from source)
+- [Querying](#querying) (select, where, order by, pandas, Pydantic)
+- [Computed Columns](#computed-columns)
+- [Views](#views) (filtered, document chunking, video frames, string splitting, audio splitting)
+- [Built-in Functions](#built-in-image-functions) (image, video, string)
+- [Embedding Indexes](#embedding-indexes) (add index, similarity search, distance metrics)
+- [UDFs](#udfs) (basic, optional args, batch, aggregate, retrieval)
+- [Update and Delete](#update-and-delete)
+- [Table Operations](#table-operations)
+- [Snapshots](#snapshots-and-version-history)
+- [Tools and Agents](#tools-and-agents) (create tools, agent pipeline, MCP)
+- [Serving (FastAPIRouter)](#serving-fastapirouter) (add_insert_route, add_query_route, add_delete_route, background jobs, pxt serve)
+- [Export](#export-csv-json-parquet-lancedb) (CSV, JSON, Parquet, LanceDB, SQL)
+- [Configuration](#configuration) (API keys, config.toml, rate limiting, media destinations, pxtfs://)
+- [Performance Tips](#performance-tips)
+
+---
+
+## Table Creation
+
+### Basic Table
+
+```python
+import pixeltable as pxt
+
+t = pxt.create_table('dir.table_name', {
+ 'col1': pxt.String,
+ 'col2': pxt.Int,
+ 'col3': pxt.Float,
+ 'col4': pxt.Bool,
+ 'col5': pxt.Image,
+ 'col6': pxt.Video,
+ 'col7': pxt.Audio,
+ 'col8': pxt.Document,
+ 'col9': pxt.Json,
+ 'col10': pxt.Array[(3, 4), pxt.Float], # 3x4 float array
+ 'col11': pxt.Timestamp,
+ 'col12': pxt.Date,
+ 'col13': pxt.UUID,
+ 'col14': pxt.Binary,
+}, if_exists='ignore')
+```
+
+### Table with Primary Key
+
+```python
+t = pxt.create_table('dir.table', {
+ 'id': pxt.Required[pxt.String],
+ 'data': pxt.String,
+}, primary_key=['id'], if_exists='ignore')
+```
+
+### Table with Auto-Generated UUID Primary Key
+
+Production-ready pattern using uuid7() for automatic unique IDs:
+
+```python
+from pixeltable.functions.uuid import uuid7
+
+t = pxt.create_table('dir.items', {
+ 'content': pxt.String,
+ 'uuid': uuid7(), # auto-generated on insert
+ 'timestamp': pxt.Timestamp,
+}, primary_key=['uuid'], if_exists='ignore')
+
+# No need to provide uuid when inserting
+from datetime import datetime
+t.insert([{'content': 'Hello', 'timestamp': datetime.now()}])
+```
+
+### Table from Data Source
+
+```python
+t = pxt.create_table('dir.from_csv', source='data.csv')
+t = pxt.create_table('dir.from_parquet', source='data.parquet')
+t = pxt.create_table('dir.data', source='data.csv',
+ schema_overrides={'image_col': pxt.Image, 'doc_col': pxt.Document})
+```
+
+## Querying
+
+### Select
+
+```python
+results = t.collect() # all columns
+results = t.select(t.col1, t.col2).collect() # specific columns
+results = t.select(t.col1, doubled=t.col2 * 2).collect() # with expressions
+```
+
+### Where (Filter)
+
+```python
+results = t.where(t.col2 > 10).select(t.col1).collect()
+results = t.where((t.col2 > 10) & (t.col1 != 'exclude')).collect()
+results = t.where(t.col1.like('%pattern%')).collect()
+```
+
+### Order By / Limit / Count / Sample
+
+```python
+results = t.order_by(t.col2, asc=False).limit(10).collect()
+total = t.count()
+filtered = t.where(t.score > 0.5).count()
+
+# Pagination with offset
+page2 = t.order_by(t.col2).limit(10, offset=10).collect()
+
+# Random sample (reproducible with seed)
+sample = t.sample(n=100, seed=42).select(t.col1, t.col2).collect()
+```
+
+### Conversions
+
+```python
+df = t.select(t.col1, t.col2).collect().to_pandas() # to pandas
+items = list(t.select(title=t.title, score=t.score).collect().to_pydantic(M)) # to Pydantic (names must match)
+t.insert([pydantic_model_instance]) # insert Pydantic models
+first_5 = t.head(5)
+
+# return_rows=True: get computed columns back from insert without a follow-up query
+status = t.insert([row], return_rows=True)
+data = status.rows[0] # dict with ALL columns including computed
+```
+
+## Computed Columns
+
+```python
+# Simple expression
+t.add_computed_column(upper_name=t.name.upper(), if_exists='ignore')
+
+# Using a UDF
+t.add_computed_column(result=my_udf(t.input_col), if_exists='ignore')
+
+# Using an AI provider
+from pixeltable.functions.openai import chat_completions
+t.add_computed_column(
+ summary=chat_completions(
+ messages=[{'role': 'user', 'content': t.text}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore'
+)
+
+# Drop column
+t.drop_column('column_name')
+
+# Recompute failed or outdated columns (critical for error recovery)
+t.recompute_columns(columns=['summary'])
+t.recompute_columns(columns=['summary'], where=t.summary.errortype != None)
+```
+
+## Views
+
+### Filtered View
+
+```python
+v = pxt.create_view('dir.active', t.where(t.is_active == True), if_exists='ignore')
+```
+
+### Document Chunking
+
+```python
+from pixeltable.functions.document import document_splitter
+
+# Separators: 'token_limit', 'sentence', 'heading', 'page', or combine: 'page, sentence'
+chunks = pxt.create_view('dir.chunks', t,
+ iterator=document_splitter(t.doc, separators='token_limit', limit=300),
+ if_exists='ignore')
+
+# With metadata extraction and image extraction (PDF)
+chunks = pxt.create_view('dir.chunks', t,
+ iterator=document_splitter(t.doc, separators='page, sentence',
+ metadata='title,heading,page', elements=['text', 'image']),
+ if_exists='ignore')
+```
+
+### Video Frame Extraction
+
+```python
+from pixeltable.functions.video import frame_iterator
+
+frames = pxt.create_view('dir.frames', t, iterator=frame_iterator(t.video, fps=1.0), if_exists='ignore')
+# Options: fps=N, num_frames=N, keyframes_only=True
+# Output columns: frame (Image), frame_idx, pos_msec, pos_frame
+```
+
+### String / Audio Splitting
+
+```python
+from pixeltable.functions.string import string_splitter
+from pixeltable.functions.audio import audio_splitter
+
+sentences = pxt.create_view('dir.sentences', t,
+ iterator=string_splitter(text=t.content, separators='sentence'), if_exists='ignore')
+audio_chunks = pxt.create_view('dir.audio_chunks', t,
+ iterator=audio_splitter(audio=t.audio, duration=30.0), if_exists='ignore')
+```
+
+## Built-in Image Functions
+
+```python
+from pixeltable.functions import image as pxt_image
+
+# Thumbnail generation
+t.add_computed_column(
+ thumb=pxt_image.thumbnail(t.image, size=(320, 320)),
+ if_exists='ignore')
+
+# Base64 encoding (useful for API responses and Anthropic vision)
+t.add_computed_column(
+ b64=pxt_image.b64_encode(t.image),
+ if_exists='ignore')
+
+# Combined: thumbnail + base64 (common pattern for APIs)
+t.add_computed_column(
+ thumbnail=pxt_image.b64_encode(
+ pxt_image.thumbnail(t.image, size=(320, 320))
+ ),
+ if_exists='ignore')
+
+# Base64 with explicit format
+t.add_computed_column(
+ png_b64=pxt_image.b64_encode(t.image, 'png'),
+ if_exists='ignore')
+```
+
+## Built-in Image Functions (Additional)
+
+```python
+from pixeltable.functions.image import draw_bounding_boxes
+
+# Draw detection results on images (pairs with DETR/YOLOX output)
+t.add_computed_column(
+ annotated=draw_bounding_boxes(t.image, t.detections),
+ if_exists='ignore')
+```
+
+## Built-in Video Functions
+
+```python
+from pixeltable.functions.video import (
+ extract_audio, resize, crop, concat_videos,
+ with_audio, pan, mix_audio, overlay_image,
+)
+
+# Extract audio track from video
+t.add_computed_column(
+ audio=extract_audio(t.video, format='mp3'),
+ if_exists='ignore')
+
+# Resize video
+t.add_computed_column(
+ resized=resize(t.video, width=640, height=480),
+ if_exists='ignore')
+
+# Crop video region
+t.add_computed_column(
+ cropped=crop(t.video, x=100, y=100, w=400, h=300),
+ if_exists='ignore')
+
+# Concatenate two videos
+t.add_computed_column(
+ combined=concat_videos(t.intro_video, t.main_video),
+ if_exists='ignore')
+
+# Replace audio track on a video
+t.add_computed_column(
+ with_new_audio=with_audio(t.video, t.narration),
+ if_exists='ignore')
+
+# Ken Burns pan effect on an image (creates video from still image)
+t.add_computed_column(
+ clip=pan(t.image, duration=5.0, zoom_start=1.0, zoom_end=1.3),
+ if_exists='ignore')
+
+# Mix (overlay) two audio tracks
+t.add_computed_column(
+ mixed=mix_audio(t.narration, t.background_music),
+ if_exists='ignore')
+
+# Overlay image (watermark) on video
+t.add_computed_column(
+ watermarked=overlay_image(t.video, t.logo, x=10, y=10),
+ if_exists='ignore')
+```
+
+## Built-in String Functions
+
+```python
+from pixeltable.functions import string as pxt_str
+
+# String length
+t.add_computed_column(text_len=pxt_str.len(t.content), if_exists='ignore')
+```
+
+## Embedding Indexes
+
+### Add Index
+
+```python
+from pixeltable.functions.huggingface import clip, sentence_transformer
+
+# CLIP (multimodal: text + image)
+embed_fn = clip.using(model_id='openai/clip-vit-base-patch32')
+t.add_embedding_index('image_col', embedding=embed_fn, if_exists='ignore')
+
+# Sentence Transformers (text)
+embed_fn = sentence_transformer.using(model_id='all-MiniLM-L6-v2')
+t.add_embedding_index('text_col', embedding=embed_fn, if_exists='ignore')
+
+# Sentence Transformers (multilingual, high quality, recommended for production)
+embed_fn = sentence_transformer.using(model_id='intfloat/multilingual-e5-large-instruct')
+t.add_embedding_index('text_col', embedding=embed_fn, if_exists='ignore')
+
+# OpenAI embeddings
+from pixeltable.functions.openai import embeddings
+t.add_embedding_index('text_col', embedding=embeddings.using(model='text-embedding-3-small'), if_exists='ignore')
+```
+
+### Similarity Search
+
+```python
+# Text
+sim = t.text_col.similarity(string='search query')
+results = t.order_by(sim, asc=False).limit(10).select(t.text_col, sim).collect()
+
+# Text with threshold filter
+sim = t.text_col.similarity(string='search query')
+results = t.where(sim > 0.5).order_by(sim, asc=False).limit(10).select(t.text_col, sim).collect()
+
+# Image with text (multimodal)
+sim = t.image_col.similarity(string='a red car')
+results = t.order_by(sim, asc=False).limit(5).select(t.image_col, sim).collect()
+
+# Image with image
+sim = t.image_col.similarity(image='path/to/query.jpg')
+results = t.order_by(sim, asc=False).limit(5).select(t.image_col, sim).collect()
+```
+
+### Distance Metrics
+
+```python
+t.add_embedding_index('col', embedding=fn, metric='cosine') # default
+t.add_embedding_index('col', embedding=fn, metric='ip') # inner product
+t.add_embedding_index('col', embedding=fn, metric='l2') # euclidean
+```
+
+## B-Tree Indexes
+
+For efficient range queries and equality lookups on non-embedding columns:
+
+```python
+# Add B-tree index for fast filtering
+t.add_btree_index('category', if_exists='ignore')
+t.add_btree_index('timestamp', if_exists='ignore')
+
+# Drop an index
+t.drop_index('index_name')
+```
+
+## UDFs
+
+### Basic
+
+```python
+@pxt.udf
+def my_function(x: str) -> str:
+    return x.upper()
+```
+
+### With Optional Args
+
+```python
+from typing import Optional
+
+@pxt.udf
+def safe_process(value: Optional[str], default: str = '') -> str:
+    return value if value is not None else default
+```
+
+### Batch UDF
+
+```python
+from pixeltable.func import Batch
+
+@pxt.udf(batch_size=32)
+def batch_process(texts: Batch[str]) -> Batch[list[float]]:
+    return model.encode(texts).tolist()
+```
+
+### Aggregate UDF
+
+```python
+@pxt.uda
+class MyAggregator(pxt.Aggregator):
+    def __init__(self):
+        self.sum = 0
+        self.count = 0
+
+    def update(self, val: int) -> None:
+        self.sum += val
+        self.count += 1
+
+    def value(self) -> float:
+        return self.sum / self.count if self.count > 0 else 0.0
+```
+
+### Retrieval UDF (for AI Tool Use)
+
+```python
+lookup_fn = pxt.retrieval_udf(t, name='lookup_items', description='Look up items by name',
+ parameters=['name'], limit=5)
+```
+
+### Custom Iterator
+
+Define custom iterators that produce multiple output rows from a single input:
+
+```python
+@pxt.iterator
+class SlidingWindowIterator:
+    """Produce overlapping windows from a text."""
+    def __init__(self, text: str, window_size: int = 100, stride: int = 50):
+        self.text = text
+        self.window_size = window_size
+        self.stride = stride
+
+    def __next__(self) -> dict: # yields {'window': str}
+        ...
+```
+
+### List Iterator
+
+Split a list/array column into one row per element:
+
+```python
+from pixeltable.functions import list_iterator
+
+# Explode a JSON array column into individual rows
+items = pxt.create_view('dir.items', t,
+ iterator=list_iterator(t.tags),
+ if_exists='ignore')
+```
+
+## Update and Delete
+
+```python
+t.update({'score': 1.0}, where=t.category == 'important')
+t.delete(where=t.is_active == False)
+```
+
+### return_rows=True (insert-then-read)
+
+Get all column values (including computed columns) back from `insert()`, `update()`, or `batch_update()` without a follow-up query:
+
+```python
+# Anti-pattern: insert then query
+t.insert([row])
+result = t.where(t.id == value).select(...).collect()
+data = result[0]
+
+# Correct: return_rows=True
+status = t.insert([row], return_rows=True)
+data = status.rows[0] # dict with ALL columns including computed
+```
+
+For typed access, use Pydantic `model_validate()` with `extra="ignore"` (row dicts contain every column):
+
+```python
+from typing import Any
+
+from pydantic import BaseModel
+
+class AgentResult(BaseModel):
+    model_config = {"extra": "ignore"}
+    answer: str | None = None
+    tool_output: Any = None
+
+status = agent.insert([{"prompt": user_input}], return_rows=True)
+result = AgentResult.model_validate(status.rows[0])
+```
+
+**When to use which:**
+- `return_rows=True` -- insert/update and read computed columns back in one call
+- `to_pydantic()` -- reading from a `ResultSet` (after `.collect()`)
+- `model_validate()` -- reading from `status.rows` (plain dicts from `return_rows=True`)
+
+## Table Operations
+
+```python
+t.rename_column('old_name', 'new_name')
+t.add_column(new_col=pxt.String)
+t.drop_column('col_name')
+t.describe()
+t.columns()
+
+# Directory management
+pxt.list_dirs()
+pxt.list_tables()
+contents = pxt.get_dir_contents('my_dir')
+```
+
+## Recompute Columns
+
+Re-run computed columns on existing rows. Critical for retrying after API errors or rate limits:
+
+```python
+# Recompute all rows for a column
+t.recompute_columns(columns=['summary'])
+
+# Recompute only failed rows (most common pattern)
+t.recompute_columns(columns=['summary'], where=t.summary.errortype != None)
+
+# Recompute specific rows matching a condition
+t.recompute_columns(columns=['label'], where=t.category == 'pending')
+```
+
+## Snapshots and Version History
+
+Point-in-time copies of tables:
+
+```python
+snap = pxt.create_snapshot('dir.snap_v1', t, if_exists='ignore')
+# Query the snapshot like any table
+snap.select(snap.col1).collect()
+
+# View table version history
+versions = t.get_versions()
+```
+
+## Tools and Agents
+
+### Create Tools from UDFs and Query Functions
+
+```python
+@pxt.udf
+def web_search(keywords: str) -> str:
+    """Search the web for information."""
+    from duckduckgo_search import DDGS
+    with DDGS() as ddgs:
+        results = list(ddgs.news(keywords=keywords, max_results=5))
+        return '\n'.join(f"{r['title']}: {r['body']}" for r in results) if results else 'No results.'
+
+@pxt.query
+def search_docs(query_text: str):
+    """Search documents by semantic similarity."""
+    sim = chunks.text.similarity(string=query_text)
+    return chunks.order_by(sim, asc=False).limit(10).select(chunks.text, sim)
+
+tools = pxt.tools(web_search, search_docs)
+```
+
+### Full Tool-Calling Agent Pipeline
+
+The agent pipeline uses chained computed columns. Inserting a row triggers the entire pipeline:
+
+```python
+from pixeltable.functions.anthropic import messages, invoke_tools
+
+agent = pxt.create_table('project.agent', {
+ 'prompt': pxt.String,
+ 'timestamp': pxt.Timestamp,
+ 'initial_system_prompt': pxt.String,
+ 'final_system_prompt': pxt.String,
+ 'max_tokens': pxt.Int,
+ 'temperature': pxt.Float,
+}, if_exists='ignore')
+
+# Step 1: Initial LLM call with tool selection
+agent.add_computed_column(
+ initial_response=messages(
+ model='claude-sonnet-4-20250514',
+ messages=[{'role': 'user', 'content': [{'type': 'text', 'text': agent.prompt}]}],
+ tools=tools,
+ tool_choice=tools.choice(required=True),
+ max_tokens=agent.max_tokens,
+ model_kwargs={
+ 'system': agent.initial_system_prompt,
+ 'temperature': agent.temperature,
+ },
+ ),
+ if_exists='ignore',
+)
+
+# Step 2: Execute the tools the LLM selected
+agent.add_computed_column(
+ tool_output=invoke_tools(tools, agent.initial_response),
+ if_exists='ignore',
+)
+
+# Step 3: RAG context retrieval
+agent.add_computed_column(
+ doc_context=search_docs(agent.prompt),
+ if_exists='ignore',
+)
+
+# Step 4: Assemble context with a UDF (assemble_context is a user-defined @pxt.udf, not shown here)
+agent.add_computed_column(
+ context=assemble_context(agent.prompt, agent.tool_output, agent.doc_context),
+ if_exists='ignore',
+)
+
+# Step 5: Final LLM call with full context
+agent.add_computed_column(
+ final_response=messages(
+ model='claude-sonnet-4-20250514',
+ messages=[{'role': 'user', 'content': [{'type': 'text', 'text': agent.context}]}],
+ max_tokens=agent.max_tokens,
+ model_kwargs={
+ 'system': agent.final_system_prompt,
+ 'temperature': agent.temperature,
+ },
+ ),
+ if_exists='ignore',
+)
+
+# Step 6: Extract answer text
+agent.add_computed_column(
+ answer=agent.final_response.content[0].text,
+ if_exists='ignore',
+)
+```
+
+### Using the Agent Pipeline
+
+```python
+from datetime import datetime
+
+agent.insert([{
+ 'prompt': 'What are the latest developments in quantum computing?',
+ 'timestamp': datetime.now(),
+ 'initial_system_prompt': 'Identify the best tool(s) to answer the query.',
+ 'final_system_prompt': 'Provide a clear answer. Cite sources when possible.',
+ 'max_tokens': 1024,
+ 'temperature': 0.7,
+}])
+
+result = agent.order_by(agent.timestamp, asc=False).limit(1).select(agent.answer).collect()
+```
+
+### MCP Integration
+
+```python
+udfs = pxt.mcp_udfs('http://localhost:8080/sse')
+```
+
+---
+
+## Serving (FastAPIRouter)
+
+`pixeltable.serving.FastAPIRouter` (v0.6+) is a subclass of FastAPI's `APIRouter` that generates endpoints from tables and `@pxt.query` functions. No Pydantic models or hand-written handlers needed.
+
+### add_insert_route
+
+```python
+from pixeltable.serving import FastAPIRouter
+import pixeltable as pxt
+
+router = FastAPIRouter(prefix="/api/data", tags=["data"])
+docs = pxt.get_table("app.documents")
+
+# Synchronous insert — returns inserted row fields
+router.add_insert_route(docs, path="/upload/image",
+ uploadfile_inputs=["image"], inputs=["timestamp"], outputs=["uuid", "thumbnail"])
+
+# Background insert — returns job handle for polling
+router.add_insert_route(docs, path="/upload/document",
+ uploadfile_inputs=["document"], inputs=["timestamp"], outputs=["uuid"],
+ background=True)
+# Client receives { "job_url": "http://host/jobs/{id}" }
+# Poll GET /jobs/{id} → { "status": "pending" | "done" | "error", "result": {...} }
+```
+
+Parameters:
+- `uploadfile_inputs` — column names sent as `UploadFile` (multipart form)
+- `inputs` — column names sent as form fields
+- `outputs` — column names to return after insert
+- `background=True` — return immediately with a job URL; client polls for result
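+
+The polling loop implied by `background=True` could look roughly like the sketch below. It assumes only the response shape shown in the comments above (`status` of `pending`, `done`, or `error`, plus `result`). The names `wait_for_job`, `fetch_status`, `interval`, and `timeout` are hypothetical and not part of Pixeltable.
+
+```python
+import time
+from typing import Any, Callable
+
+
+def wait_for_job(
+    fetch_status: Callable[[], dict[str, Any]],
+    interval: float = 0.5,
+    timeout: float = 30.0,
+) -> Any:
+    """Poll a job until it is done, fails, or the timeout elapses.
+
+    fetch_status is a caller-supplied function (an assumption for this
+    sketch) that performs GET on the job URL and returns the parsed JSON.
+    """
+    deadline = time.monotonic() + timeout
+    while True:
+        job = fetch_status()
+        if job["status"] == "done":
+            return job.get("result")
+        if job["status"] == "error":
+            raise RuntimeError(f"background insert failed: {job}")
+        if time.monotonic() >= deadline:
+            raise TimeoutError("job did not finish before the timeout")
+        time.sleep(interval)
+```
+
+In a real client, `fetch_status` might be something like `lambda: requests.get(job_url).json()`. Passing it in keeps the loop independent of any particular HTTP library.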
+
+### add_query_route
+
+```python
+@pxt.query
+def search_docs(query_text: str):
+    sim = chunks.text.similarity(string=query_text)
+    return chunks.where(sim > 0.3).order_by(sim, asc=False).select(
+        text=chunks.text, sim=sim).limit(20)
+
+router.add_query_route(path="/search", query=search_docs, method="post")
+# POST /api/data/search {"query_text": "..."} → { "rows": [...] }
+
+@pxt.query
+def list_docs():
+    return docs.select(uuid=docs.uuid, name=docs.document).order_by(docs.timestamp, asc=False)
+
+router.add_query_route(path="/list", query=list_docs, method="get")
+# GET /api/data/list → { "rows": [...] }
+```
+
+### add_delete_route
+
+```python
+# Delete by primary key
+router.add_delete_route(docs, path="/delete")
+# POST /api/data/delete {"uuid": "..."} → { "num_rows": 1 }
+
+# Delete by non-PK column
+router.add_delete_route(chat, path="/delete-conversation", match_columns=["conversation_id"])
+```
+
+### Architecture pattern
+
+```
+setup_pixeltable.py — flat module: creates tables, views, indexes on import
+routers/data.py — pxt.get_table() + @pxt.query + add_*_route
+routers/search.py — pxt.get_table() + @pxt.query + add_*_route
+main.py — import setup_pixeltable; from routers import data, search
+```
+
+See [workflows.md → FastAPIRouter](workflows.md#fastapirouter-declarative-serving-v06) for a complete example.
+
+### pxt serve (CLI)
+
+Define routes in `pyproject.toml` (standard Python convention) or a standalone `pixeltable.toml`, then run `pxt serve`:
+
+```toml
+# In pyproject.toml (alongside [project] and dependencies)
+# Requires [build-system] + [tool.setuptools] py-modules = ["schema"]
+# so pxt serve can import schema.py without PYTHONPATH hacks.
+
+[[tool.pixeltable.service]]
+name = "pipeline"
+prefix = "/api"
+port = 8000
+
+[[tool.pixeltable.service.routes]]
+type = "query"
+path = "/search"
+query = "schema:search_documents" # colon-separated: module:attribute
+method = "post"
+
+[[tool.pixeltable.service.routes]]
+type = "insert"
+path = "/ingest/document"
+table = "pipeline.documents"
+inputs = ["title", "body", "source_id"]
+outputs = ["uuid"]
+
+[[tool.pixeltable.service.routes]]
+type = "delete"
+path = "/delete/document"
+table = "pipeline.documents"
+```
+
+```bash
+pxt serve pipeline # serves at http://localhost:8000
+pxt serve pipeline --port 9000
+```
+
+Insert routes can auto-export to a serving DB on every request:
+
+```toml
+[[tool.pixeltable.service.routes]]
+type = "insert"
+path = "/ingest/document"
+table = "pipeline.documents"
+inputs = ["title", "body", "source_id"]
+outputs = ["uuid"]
+
+[tool.pixeltable.service.routes.export_sql]
+db_connect = "postgresql+psycopg://user:pass@host/db"
+table = "processed_documents"
+method = "insert"
+```
+
+`pxt serve` generates a complete FastAPI app with OpenAPI docs at `/docs`. Same capabilities as `FastAPIRouter` (insert, query, delete, background jobs). See the [Starter Kit `serving/` directory](https://github.com/pixeltable/pixeltable-starter-kit/tree/main/serving) for a working example.
+
+---
+
+## Data Sharing and Replication
+
+Share tables across teams or environments:
+
+```python
+# Publish a table version (makes it shareable)
+t.publish()
+
+# Replicate a published table (creates a local synchronized copy)
+replica = pxt.replicate('dir.local_copy', source_table_uri)
+
+# Sync changes
+replica.pull() # fetch latest from source
+replica.push() # push local changes to source
+```
+
+## Export (CSV, JSON, Parquet, LanceDB)
+
+```python
+import pixeltable as pxt
+
+t = pxt.get_table('myapp.documents')
+
+# Export to CSV
+pxt.io.export_csv(t, '/data/documents.csv')
+
+# Export to JSON
+pxt.io.export_json(t, '/data/documents.json')
+
+# Export to Parquet
+pxt.io.export_parquet(t, '/data/documents.parquet')
+
+# Export to LanceDB (vector DB)
+pxt.io.export_lancedb(t, db_uri='/data/lance', table_name='docs')
+
+# Export filtered query results
+results = t.where(t.score > 0.8).select(t.title, t.score)
+pxt.io.export_csv(results, '/data/filtered.csv')
+
+# Other formats
+df = t.collect().to_pandas() # Pandas DataFrame
+ds = t.select(t.image).to_pytorch_dataset() # PyTorch IterableDataset
+coco = t.to_coco_dataset() # COCO format
+```
+
+---
+
+## Export to SQL Databases
+
+```python
+from pixeltable.io.sql import export_sql
+
+# Export full table to SQLite
+export_sql(t, 'my_table', db_connect_str='sqlite:///data.db')
+
+# Export filtered query with column rename
+export_sql(
+ t.where(t.score > 0.8).select(product_name=t.name, price=t.price),
+ 'filtered_products',
+ db_connect_str='sqlite:///data.db',
+)
+
+# Append to existing SQL table
+export_sql(t, 'products', db_connect_str=connection_string, if_exists='insert')
+
+# Replace existing SQL table
+export_sql(t, 'products', db_connect_str=connection_string, if_exists='replace')
+
+# Cloud databases (PostgreSQL, Snowflake, etc.)
+export_sql(t, 'products', db_connect_str='postgresql+psycopg://user:pass@host:5432/db')
+```
+
+---
+
+## Configuration
+
+### API Keys
+
+```python
+# Via init
+pxt.init({'openai.api_key': 'sk-...', 'anthropic.api_key': 'sk-ant-...'})
+
+# Via environment variables (recommended)
+# OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY / GEMINI_API_KEY,
+# TOGETHER_API_KEY, FIREWORKS_API_KEY, MISTRAL_API_KEY, GROQ_API_KEY,
+# DEEPSEEK_API_KEY, VOYAGE_API_KEY, REPLICATE_API_TOKEN, HF_AUTH_TOKEN,
+# OPENROUTER_API_KEY, FAL_API_KEY, REVE_API_KEY, TWELVELABS_API_KEY,
+# BEDROCK_API_KEY
+```
+
+### config.toml
+
+Located at `~/.pixeltable/config.toml`:
+
+```toml
+[pixeltable]
+file_cache_size_g = 250
+time_zone = "America/Los_Angeles"
+hide_warnings = true
+verbosity = 2
+
+[openai]
+api_key = 'sk-...'
+# For Azure OpenAI, add these to the same [openai] section:
+# base_url = 'https://my-deployment.openai.azure.com/'
+# api_version = '2024-02-01'
+
+# Per-model rate limits (requests per minute)
+[openai.rate_limits]
+gpt-4o = 500
+gpt-4o-mini = 1000
+tts-1 = 50
+dall-e-3 = 10
+
+[anthropic]
+api_key = 'sk-ant-...'
+
+[mistral]
+api_key = 'my-mistral-key'
+rate_limit = 600
+```
+
+### Rate Limiting
+
+Default: 600 requests per minute per provider. Configure in `config.toml`:
+
+```toml
+# Single rate limit for all models of a provider
+[fireworks]
+rate_limit = 300
+
+# Per-model rate limits
+[openai.rate_limits]
+gpt-4o = 500
+gpt-4o-mini = 1000
+```
+
+Custom resource pools for non-built-in APIs:
+
+```python
+import requests
+
+@pxt.udf(resource_pool='request-rate:my_service')
+def call_custom_api(prompt: str) -> str:
+    return requests.post('https://my-api.com/generate', json={'prompt': prompt}).json()['text']
+```
+
+### Media Destinations (Cloud Storage)
+
+Store media files in S3, GCS, Azure, or other cloud storage instead of locally:
+
+```toml
+# config.toml — global default
+[pixeltable]
+input_media_dest = "s3://my-bucket/input/"
+output_media_dest = "s3://my-bucket/output/"
+```
+
+```bash
+# Or via environment variables
+export PIXELTABLE_INPUT_MEDIA_DEST="s3://my-bucket/input/"
+export PIXELTABLE_OUTPUT_MEDIA_DEST="s3://my-bucket/output/"
+```
+
+```python
+# Per-column destination (overrides global default)
+t.add_computed_column(
+ thumbnail=pxt_image.thumbnail(t.image, size=(256, 256)),
+ destination='s3://my-bucket/thumbnails/',
+ if_exists='ignore',
+)
+```
+
+Supported providers: Amazon S3, Google Cloud Storage (`gs://`), Azure Blob Storage (`wasbs://`), Cloudflare R2, Backblaze B2, Tigris.
+
+**Pixeltable Cloud (home bucket):** Free R2-backed storage. No AWS credentials needed:
+
+```python
+# Use pxtfs:// URI as a destination
+t.add_computed_column(
+ thumbnail=pxt_image.thumbnail(t.image, size=(256, 256)),
+ destination='pxtfs://org:db/home/thumbnails/',
+)
+```
+
+```bash
+# Or set globally
+export PIXELTABLE_API_KEY="pxt_..."
+export PIXELTABLE_OUTPUT_MEDIA_DEST="pxtfs://org:db/home/"
+```
+
+See [Cloud Storage docs](https://docs.pixeltable.com/integrations/cloud-storage).
+
+## Common Pitfalls
+
+### Deprecated/Wrong Imports
+
+```python
+# WRONG — openai.vision does not exist
+from pixeltable.functions.openai import vision
+description = vision(prompt='Describe', image=t.image)
+
+# CORRECT — use chat_completions with multimodal messages
+from pixeltable.functions.openai import chat_completions
+description = chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': [
+ {'type': 'text', 'text': 'Describe this image.'},
+ {'type': 'image_url', 'image_url': {'url': t.image}}
+ ]
+ }],
+ model='gpt-4o-mini'
+).choices[0].message.content
+
+# WRONG — FrameIterator class import
+from pixeltable.iterators import FrameIterator
+pxt.create_view('v', t, iterator=FrameIterator.create(video=t.video, fps=1))
+
+# CORRECT — function import
+from pixeltable.functions.video import frame_iterator
+pxt.create_view('v', t, iterator=frame_iterator(t.video, fps=1), if_exists='ignore')
+```
+
+### Cast to String Before Embedding
+
+AI functions often return `Json` or other complex objects, but a text embedding index needs a `String` column:
+
+```python
+from pixeltable.functions import openai
+
+# WRONG: transcriptions returns a Json object, not a String
+t.add_computed_column(transcript=openai.transcriptions(audio=t.audio, model='whisper-1'), if_exists='ignore')
+t.add_embedding_index('transcript', embedding=embed_fn) # silently fails
+
+# CORRECT — extract .text and cast
+t.add_computed_column(
+ transcript=openai.transcriptions(audio=t.audio, model='whisper-1').text.astype(pxt.String),
+ if_exists='ignore')
+t.add_embedding_index('transcript', embedding=embed_fn, if_exists='ignore')
+```
+
+This applies to any computed column used as an embedding source — always ensure it evaluates to `pxt.String`.
+
+### The `if_exists='ignore'` Trap
+
+If you create a column with buggy logic, fixing the code and re-running does **NOT** update the column. `if_exists='ignore'` silently skips the already-existing (broken) column:
+
+```python
+# Bug: wrong model name
+t.add_computed_column(summary=openai.chat_completions(..., model='nonexistent'), if_exists='ignore')
+
+# Fixing the code and re-running does NOTHING — old column persists
+t.add_computed_column(summary=openai.chat_completions(..., model='gpt-4o-mini'), if_exists='ignore')
+
+# FIX: drop the column first, then recreate
+t.drop_column('summary')
+t.add_computed_column(summary=openai.chat_completions(..., model='gpt-4o-mini'), if_exists='ignore')
+
+# OR: wipe the entire namespace during development
+pxt.drop_dir('my_project', force=True)
+```
+
+### Other Pitfalls
+
+```python
+# Image in messages: use image_url, never raw pxt.Image
+messages=[{'role': 'user', 'content': [
+ {'type': 'text', 'text': 'Describe.'},
+ {'type': 'image_url', 'image_url': {'url': t.image}} # NOT {'type': 'image', 'data': t.image}
+]}]
+
+# Similarity: always use string= keyword
+sim = t.content.similarity(string=query_text) # NOT .similarity(query_text)
+```
+
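+Schema corruption (`IntegrityError`): first upgrade with `pip install -U pixeltable`. If the error persists, `rm -rf ~/.pixeltable` resets the local store, but it permanently deletes every local table and media file, so treat it as a last resort.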
+
+### `@pxt.query` Eager Compilation
+
+`@pxt.query` compiles the function body at **decoration time** by calling it with expression placeholders. This means:
+
+```python
+# WRONG — .collect() executes during decoration, not at call time
+@pxt.query
+def find_similar(ref_id: str):
+ ref = t.where(t.uuid == ref_id).select(t.embedding).collect() # FAILS at decoration
+ return t.order_by(t.embedding.similarity(ref[0]['embedding']), asc=False).limit(5)
+
+# CORRECT — use a plain def for imperative logic that needs .collect()
+def find_similar(ref_id: str) -> list[dict]:
+ ref = t.where(t.uuid == ref_id).select(t.embedding).collect()
+ return list(t.order_by(t.embedding.similarity(ref[0]['embedding']), asc=False).limit(5).collect())
+
+# WRONG — references a table that may not exist yet
+@pxt.query
+def search():
+ t = pxt.get_table('maybe.missing') # FAILS if table doesn't exist at decoration time
+ return t.select(t.col)
+```
+
+### Nullable Primary Keys
+
+Primary key columns must be non-nullable. Bare `pxt.String` is nullable by default:
+
+```python
+# WRONG — nullable PK rejected at table creation
+t = pxt.create_table('dir.items', {
+ 'id': pxt.String, # nullable!
+}, primary_key=['id'])
+
+# CORRECT — explicit non-nullable
+t = pxt.create_table('dir.items', {
+ 'id': pxt.Required[pxt.String],
+}, primary_key=['id'])
+
+# CORRECT — uuid7() computed default (recommended)
+from pixeltable.functions.uuid import uuid7
+t = pxt.create_table('dir.items', {
+ 'content': pxt.String,
+ 'uuid': uuid7(),
+}, primary_key=['uuid'])
+```
+
+### Thread-Safety in FastAPI
+
+`Table` objects are bound to the thread that created them. In FastAPI (which dispatches sync endpoints to a thread pool), call `pxt.get_table()` inside each endpoint:
+
+```python
+# WRONG — module-level Table used across threads
+docs = pxt.get_table('app.documents')
+
+@app.get('/count')
+def count():
+ return {'count': docs.count()} # fails: wrong thread
+
+# CORRECT — get a fresh handle per request
+@app.get('/count')
+def count():
+ docs = pxt.get_table('app.documents')
+ return {'count': docs.count()}
+```
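+
+If calling `pxt.get_table()` on every request becomes a measurable cost, one option is to cache one handle per thread. The sketch below uses only the standard library; `per_thread` and `get_docs` are hypothetical helper names rather than Pixeltable API, and the approach assumes a handle stays valid for the lifetime of the thread that created it.
+
+```python
+import threading
+from typing import Callable, TypeVar
+
+T = TypeVar('T')
+
+
+def per_thread(factory: Callable[[], T]) -> Callable[[], T]:
+    """Wrap `factory` so each thread builds its own object once, then reuses it."""
+    local = threading.local()
+
+    def get() -> T:
+        if not hasattr(local, 'value'):
+            local.value = factory()  # first call on this thread
+        return local.value
+
+    return get
+
+
+# Stand-in factory so the sketch runs without Pixeltable installed.
+# In an app this would be: per_thread(lambda: pxt.get_table('app.documents'))
+get_docs = per_thread(lambda: object())
+```
+
+Inside an endpoint, `docs = get_docs()` would then replace the direct `pxt.get_table()` call.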
+
+### `document_splitter` with `token_limit`
+
+The `token_limit` separator requires the `tiktoken` package:
+
+```bash
+pip install tiktoken
+```
+
+Without it, `document_splitter(t.doc, separators='token_limit', ...)` raises `RequestError: This feature requires the tiktoken package`.
+
+## Performance Tips
+
+- Insert rows in batches (a single `insert()` call with a list of rows) rather than one row at a time
+- Use `on_error='ignore'` to continue past row failures
+- Use `batch_size` in `@pxt.udf(batch_size=32)` for GPU models
+- Embedding indexes use HNSW for fast approximate nearest neighbor search
+- Use `t.insert(source='file.csv')` instead of loading into memory for large datasets
+- Use `keyframes_only=True` in `frame_iterator` for efficient video processing
+- Use `thumbnail()` + `b64_encode()` for API-friendly image responses
+- Configure rate limits in `config.toml` to avoid 429 errors on provider APIs
+- Use `recompute_columns(where=t.col.errortype != None)` to retry only failed rows
+- Use `add_btree_index()` on columns used frequently in `where()` filters
+- Cast AI function outputs to `pxt.String` with `.astype(pxt.String)` before embedding indexing
+- During development, use `pxt.drop_dir('dir', force=True)` to reset schema cleanly
diff --git a/skills/pixeltable/references/ml-data-pipeline.md b/skills/pixeltable/references/ml-data-pipeline.md
new file mode 100644
index 0000000..56ede38
--- /dev/null
+++ b/skills/pixeltable/references/ml-data-pipeline.md
@@ -0,0 +1,282 @@
+# ML Data Wrangling Pipeline
+
+A complete recipe for processing multimodal data (video, audio, images, documents) into training-ready datasets. Covers ingestion, enrichment with AI models, dataset versioning, and export to PyTorch, Parquet, and pandas.
+
+## Ingest Raw Data
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.video import frame_iterator
+from pixeltable.functions.openai import chat_completions
+from pixeltable.functions.huggingface import clip, detr_for_object_detection
+from pixeltable.functions import image as pxt_image
+
+pxt.create_dir('ml_data', if_exists='ignore')
+
+# From local files, URLs, or cloud storage (S3, GCS, Azure)
+images = pxt.create_table('ml_data.images', {
+ 'image': pxt.Image,
+ 'filename': pxt.String,
+ 'split': pxt.String, # 'train', 'val', 'test'
+}, if_exists='ignore')
+
+images.insert([
+ {'image': 'path/to/cat_01.jpg', 'filename': 'cat_01.jpg', 'split': 'train'},
+ {'image': 'path/to/dog_01.jpg', 'filename': 'dog_01.jpg', 'split': 'train'},
+ {'image': 's3://bucket/images/bird.jpg', 'filename': 'bird.jpg', 'split': 'val'},
+])
+
+# From CSV with schema overrides for media columns
+labeled_data = pxt.create_table('ml_data.labeled',
+ source='annotations.csv',
+ schema_overrides={'image_path': pxt.Image},
+ if_exists='ignore')
+
+# From Hugging Face datasets
+from pixeltable.io import import_huggingface_dataset
+import datasets
+ds = datasets.load_dataset('cifar10', split='train[:500]')
+cifar = import_huggingface_dataset('ml_data.cifar', ds, if_exists='ignore')
+```
+
+## Explore and Sample
+
+```python
+# Quick look at the data
+first_5 = images.head(5)
+total = images.count()
+train_count = images.where(images.split == 'train').count()
+
+# Random sample for exploration
+sample = images.sample(n=10, seed=42).select(images.image, images.filename).collect()
+```
+
+## Enrich with AI Models
+
+```python
+# Resize to a fixed 224x224 training input. resize() ignores aspect ratio;
+# thumbnail() preserves it but yields varying sizes that cannot be batched directly.
+images.add_computed_column(
+ resized=pxt_image.resize(images.image, size=(224, 224)),
+ if_exists='ignore')
+
+# Auto-classify with a vision LLM
+images.add_computed_column(
+ label=chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': [
+ {'type': 'text', 'text': 'Classify this image into exactly one word: cat, dog, bird, or other.'},
+ {'type': 'image_url', 'image_url': {'url': images.image}}
+ ]
+ }],
+ model='gpt-4o-mini',
+ ).choices[0].message.content,
+ if_exists='ignore')
+
+# Object detection for bounding boxes
+detect = detr_for_object_detection.using(model_id='facebook/detr-resnet-50')
+images.add_computed_column(
+ detections=detect(images.image, threshold=0.8),
+ if_exists='ignore')
+
+# Visualize detections (draw bounding boxes on images)
+# draw_bounding_boxes is in pixeltable.functions.vision and takes the list of boxes
+from pixeltable.functions.vision import draw_bounding_boxes
+images.add_computed_column(
+ annotated=draw_bounding_boxes(images.image, boxes=images.detections.boxes),
+ if_exists='ignore')
+
+# Generate captions
+images.add_computed_column(
+ caption=chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': [
+ {'type': 'text', 'text': 'Describe this image in one sentence.'},
+ {'type': 'image_url', 'image_url': {'url': images.image}}
+ ]
+ }],
+ model='gpt-4o-mini',
+ ).choices[0].message.content,
+ if_exists='ignore')
+
+# Add CLIP embeddings for similarity search and deduplication
+embed_fn = clip.using(model_id='openai/clip-vit-base-patch32')
+images.add_embedding_index('image', embedding=embed_fn, if_exists='ignore')
+```
+
+## Curate: Filter, Deduplicate, Quality Check
+
+```python
+# Test on a small sample first (recommended workflow)
+sample = images.limit(5).select(images.image, images.label, images.caption).collect()
+
+# Filter by label
+cats = images.where(images.label == 'cat').select(images.image, images.caption).collect()
+
+# Find near-duplicates via similarity
+sim = images.image.similarity(image='path/to/reference.jpg')
+near_dupes = images.where(sim > 0.95).select(images.filename, sim).collect()
+
+# Review errors from computed columns
+errors = images.where(images.label.errortype != None).select(
+ images.filename, images.label.errormsg).collect()
+
+# Recompute failed columns (critical for retrying after API errors)
+images.recompute_columns(columns=['label'], where=images.label.errortype != None)
+```
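+
+The `sim > 0.95` filter above thresholds a similarity score between embedding vectors. Assuming the index uses cosine similarity (an assumption; check the metric configured on your index), the computation can be sketched on toy vectors as follows. `cosine_similarity` and `near_duplicates` are illustrative names, not Pixeltable functions.
+
+```python
+import math
+
+
+def cosine_similarity(a, b):
+    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
+    dot = sum(x * y for x, y in zip(a, b))
+    norm_a = math.sqrt(sum(x * x for x in a))
+    norm_b = math.sqrt(sum(y * y for y in b))
+    if norm_a == 0 or norm_b == 0:
+        return 0.0
+    return dot / (norm_a * norm_b)
+
+
+def near_duplicates(reference, candidates, threshold=0.95):
+    """Return the names whose vector scores above `threshold` against `reference`."""
+    return [
+        name
+        for name, vector in candidates.items()
+        if cosine_similarity(reference, vector) > threshold
+    ]
+```
+
+A threshold close to 1.0 keeps only vectors pointing in almost the same direction, so the right cutoff depends on the embedding model and is worth tuning on a labeled sample.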
+
+## Video Frame Extraction
+
+```python
+videos = pxt.create_table('ml_data.videos', {
+ 'video': pxt.Video,
+ 'category': pxt.String,
+}, if_exists='ignore')
+
+frames = pxt.create_view('ml_data.frames', videos,
+ iterator=frame_iterator(videos.video, fps=1.0),
+ if_exists='ignore')
+
+frames.add_computed_column(
+ resized=pxt_image.thumbnail(frames.frame, size=(224, 224)),
+ if_exists='ignore')
+```
+
+## Retrieval UDFs for Structured Data Lookup
+
+```python
+# Create a lookup function for enrichment across tables
+products = pxt.create_table('ml_data.products', {
+ 'sku': pxt.String,
+ 'name': pxt.String,
+ 'category': pxt.String,
+}, if_exists='ignore')
+
+get_product = pxt.retrieval_udf(
+ products,
+ name='get_product',
+ description='Look up a product by SKU',
+ parameters=['sku'],
+ limit=1,
+)
+
+# Use as a computed column for cross-table enrichment
+# orders.add_computed_column(product_info=get_product(sku=orders.product_sku), if_exists='ignore')
+```
+
+## Version with Snapshots
+
+```python
+# Take a point-in-time snapshot before exporting
+snap_v1 = pxt.create_snapshot('ml_data.images_v1', images, if_exists='ignore')
+
+# Later, take another snapshot after adding more data
+# snap_v2 = pxt.create_snapshot('ml_data.images_v2', images, if_exists='ignore')
+
+# Query any snapshot like a regular table
+snap_v1.select(snap_v1.filename, snap_v1.label).limit(5).collect()
+```
+
+## Export for Training
+
+```python
+# To PyTorch Dataset (recommended for training loops)
+train_query = images.where(images.split == 'train').select(
+ images.resized, images.label)
+
+pytorch_ds = train_query.to_pytorch_dataset(image_format='pt')
+
+from torch.utils.data import DataLoader
+dataloader = DataLoader(pytorch_ds, batch_size=32, num_workers=4)
+
+# Iterate in a training loop
+for batch in dataloader:
+ imgs, labels = batch['resized'], batch['label'] # batches are dicts keyed by column name; imgs: (32, 3, 224, 224)
+ # ... training step ...
+ break
+
+# To Parquet (for Spark, DuckDB, or cross-platform sharing)
+from pixeltable.io import export_parquet
+
+export_parquet(
+ images.where(images.split == 'train').select(
+ images.filename, images.label, images.caption),
+ 'output/train/')
+
+export_parquet(
+ images.where(images.split == 'val').select(
+ images.filename, images.label, images.caption),
+ 'output/val/')
+
+# To pandas (for quick analysis or CSV export)
+df = images.select(
+ images.filename, images.label, images.caption
+).collect().to_pandas()
+df.to_csv('output/annotations.csv', index=False)
+```
+
+## Key Patterns
+
+### Test Before Deploying
+
+Always test transformations on a small sample before committing:
+
+```python
+# 1. Test the expression inline
+result = images.limit(5).select(
+ images.image, label=chat_completions(...).choices[0].message.content
+).collect()
+
+# 2. Review results, then deploy as a computed column
+images.add_computed_column(label=chat_completions(...).choices[0].message.content, if_exists='ignore')
+```
+
+### Error Handling and Recomputation
+
+```python
+# Insert with error tolerance
+status = images.insert(rows, on_error='ignore')
+print(f'Inserted: {status.num_rows}, Errors: {status.num_excs}')
+
+# Find and inspect failed rows
+errors = images.where(images.label.errortype != None).select(
+ images.filename, images.label.errormsg).collect()
+
+# Retry failed computations (e.g., after fixing rate limits)
+images.recompute_columns(columns=['label'], where=images.label.errortype != None)
+```
+
+### PyTorch Dataset Options
+
+| Parameter | Values | Description |
+|-----------|--------|-------------|
+| `image_format` | `'pt'` | CxHxW float tensors in [0, 1] |
+| `image_format` | `'np'` | HxWxC uint8 arrays in [0, 255] |
+
+Data is cached to disk for efficient repeated loading. Use `num_workers > 0` in DataLoader for parallel loading.
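+
+To make the two layouts in the table concrete, the sketch below converts a tiny image from the `'np'` convention (HxWxC, integers in [0, 255]) to the `'pt'` convention (CxHxW, floats in [0, 1]). It uses plain nested lists so that it runs without NumPy or PyTorch; `hwc_uint8_to_chw_float` is a hypothetical helper written for illustration, not part of Pixeltable.
+
+```python
+def hwc_uint8_to_chw_float(pixels):
+    """Convert an HxWxC nested list of 0-255 ints to a CxHxW nested list of floats in [0, 1]."""
+    height = len(pixels)
+    width = len(pixels[0])
+    channels = len(pixels[0][0])
+    return [
+        [[pixels[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
+        for c in range(channels)
+    ]
+
+
+# A 1x2 RGB image: one red pixel, one green pixel.
+image_np_style = [[[255, 0, 0], [0, 255, 0]]]
+image_pt_style = hwc_uint8_to_chw_float(image_np_style)
+```
+
+Here `image_pt_style` has three channel planes of shape 1x2, which is the layout most PyTorch vision models expect.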
+
+## Building Blocks
+
+| Step | Function | Purpose |
+|------|----------|---------|
+| Ingest | `create_table(source='file.csv')` | Load from CSV, Parquet, URLs, S3 |
+| Ingest | `import_huggingface_dataset()` | Load from Hugging Face Hub |
+| Explore | `t.head(5)`, `t.count()`, `t.sample(n)` | Quick data inspection |
+| Enrich | `add_computed_column(label=...)` | Auto-label with AI models |
+| Enrich | `detr_for_object_detection()` | Bounding box detection |
+| Visualize | `draw_bounding_boxes(image, boxes=detections.boxes)` | Overlay detections on images |
+| Search | `add_embedding_index()` + `.similarity()` | Find similar / deduplicate |
+| Curate | `.where(col.errortype != None)` | Review failed transformations |
+| Retry | `recompute_columns(columns=[...], where=...)` | Re-run failed computations |
+| Version | `create_snapshot('name', table)` | Point-in-time dataset copy |
+| Export | `to_pytorch_dataset(image_format='pt')` | PyTorch DataLoader-ready |
+| Export | `export_parquet(query, 'path/')` | Parquet files for sharing |
+| Export | `.collect().to_pandas()` | pandas DataFrame |
+| Lookup | `pxt.retrieval_udf(table, ...)` | Structured data enrichment |
+
+## Adapting This Recipe
+
+- **Audio data**: Use `audio_splitter` and `transcriptions` to create labeled audio datasets — see [workflows.md → Audio Transcription](workflows.md#audio-transcription-and-analysis)
+- **Document data**: Use `document_splitter` to chunk PDFs into training examples — see [workflows.md → RAG Pipeline](workflows.md#rag-pipeline)
+- **Add human labels**: Export to Label Studio, annotate, then re-import
+- **Multi-GPU training**: The PyTorch dataset supports `DistributedSampler` with standard PyTorch patterns
diff --git a/skills/pixeltable/references/providers.md b/skills/pixeltable/references/providers.md
new file mode 100644
index 0000000..f889b5c
--- /dev/null
+++ b/skills/pixeltable/references/providers.md
@@ -0,0 +1,591 @@
+# Pixeltable AI Provider Reference
+
+Complete examples for all 25+ built-in AI provider integrations. All functions live in `pixeltable.functions.*`.
+
+## Quick Reference
+
+Use this table to find the correct import, function, and output accessor for each provider:
+
+| Provider | Import | Function | Extract answer |
+|----------|--------|----------|----------------|
+| OpenAI | `from pixeltable.functions.openai import chat_completions` | `chat_completions(messages=..., model='gpt-4o-mini')` | `.choices[0].message.content` |
+| OpenAI Embeddings | `from pixeltable.functions.openai import embeddings` | `embeddings(input=..., model='text-embedding-3-small')` | *(returns embedding directly)* |
+| OpenAI TTS | `from pixeltable.functions.openai import speech` | `speech(input=..., model='tts-1', voice='alloy')` | *(returns Audio directly)* |
+| OpenAI Transcription | `from pixeltable.functions.openai import transcriptions` | `transcriptions(audio=..., model='whisper-1')` | `.text` |
+| OpenAI DALL-E | `from pixeltable.functions.openai import image_generations` | `image_generations(prompt=..., model='dall-e-3')` | `.data[0].url` |
+| Anthropic | `from pixeltable.functions.anthropic import messages` | `messages(messages=..., model='claude-sonnet-4-20250514', max_tokens=1024)` | `.content[0].text` |
+| Gemini | `from pixeltable.functions.gemini import generate_content, embed_content` | `generate_content(contents=..., model='gemini-2.0-flash')` | *(returns text directly)* |
+| Together | `from pixeltable.functions.together import chat_completions` | `chat_completions(messages=..., model='meta-llama/...')` | `.choices[0].message.content` |
+| Fireworks | `from pixeltable.functions.fireworks import chat_completions` | `chat_completions(messages=..., model='accounts/fireworks/...')` | `.choices[0].message.content` |
+| Ollama | `from pixeltable.functions.ollama import chat_completions` | `chat_completions(messages=..., model='llama3.1')` | `.choices[0].message.content` |
+| Mistral | `from pixeltable.functions.mistralai import chat_completions` | `chat_completions(messages=..., model='mistral-large-latest')` | `.choices[0].message.content` |
+| Groq | `from pixeltable.functions.groq import chat_completions` | `chat_completions(messages=..., model='llama-3.1-70b-versatile')` | `.choices[0].message.content` |
+| DeepSeek | `from pixeltable.functions.deepseek import chat_completions` | `chat_completions(messages=..., model='deepseek-chat')` | `.choices[0].message.content` |
+| OpenRouter | `from pixeltable.functions.openrouter import chat_completions` | `chat_completions(messages=..., model='anthropic/claude-sonnet-4-20250514')` | `.choices[0].message.content` |
+| Hugging Face CLIP | `from pixeltable.functions.huggingface import clip` | `clip.using(model_id='openai/clip-vit-base-patch32')` | *(use as embedding index)* |
+| Hugging Face ST | `from pixeltable.functions.huggingface import sentence_transformer` | `sentence_transformer.using(model_id='all-MiniLM-L6-v2')` | *(use as embedding index)* |
+| Whisper (Local) | `from pixeltable.functions.whisper import transcribe` | `transcribe(audio=..., model='base')` | *(returns text directly)* |
+| WhisperX (Local) | `from pixeltable.functions.whisperx import transcribe` | `transcribe(audio=..., model='large-v2', diarize=True)` | *(returns JSON with segments)* |
+| Voyage AI | `from pixeltable.functions.voyageai import embed` | `embed(input=..., model='voyage-2')` | *(returns embedding directly)* |
+| Jina AI | `from pixeltable.functions.jina import embeddings` | `embeddings(text=..., model='jina-embeddings-v3')` | *(use as embedding index)* |
+| Twelve Labs | `from pixeltable.functions.twelvelabs import embed` | `embed(video_segment=..., model_name='marengo3.0')` | *(use as video embedding index)* |
+| BFL FLUX | `from pixeltable.functions.bfl import generate` | `generate(prompt=..., width=1024, height=1024)` | *(returns Image directly)* |
+| RunwayML | `from pixeltable.functions.runwayml import text_to_video` | `text_to_video(prompt=..., model='gen4.5')` | `['output'][0]` cast to `pxt.Video` |
+| fal.ai | `from pixeltable.functions.fal import run` | `run(input=json, app='fal-ai/flux/schnell')` | *(returns JSON)* |
+| Reve | `from pixeltable.functions.reve import create` | `create(prompt=...)` | *(returns Image directly)* |
+| Fabric | `from pixeltable.functions.fabric import chat_completions` | `chat_completions(messages=..., model='gpt-4.1')` | `.choices[0].message.content` |
+| llama.cpp | `from pixeltable.functions.llama_cpp import create_chat_completion` | `create_chat_completion(messages=..., repo_id='...', repo_filename='*q5_k_m.gguf')` | `.choices[0].message.content` |
+| YOLOX | `from pixeltable.functions.yolox import yolox` | `yolox(image=...)` | *(returns detection JSON)* |
+| Replicate | `from pixeltable.functions.replicate import run` | `run(input=json, model='owner/model')` | *(returns JSON)* |
+| Bedrock | `from pixeltable.functions.bedrock import converse` | `converse(messages=..., model='...')` | `.output.message.content[0].text` |
+
+**Key patterns**: OpenAI-compatible providers (Together, Fireworks, Ollama, Mistral, Groq, DeepSeek, OpenRouter, Fabric) all return `.choices[0].message.content`. Anthropic returns `.content[0].text`. Embedding functions are used with `add_embedding_index()`, not accessed directly. Image generation functions (BFL, Reve) return `pxt.Image` directly.
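+
+The accessor paths above map onto nested response objects. As a rough sketch, the same lookups can be written against plain dicts; the `extract_answer` helper and its `family` labels are hypothetical, and real responses carry many more fields than the paths shown here.
+
+```python
+def extract_answer(family: str, response: dict) -> str:
+    """Pull the answer text out of a chat response for a given response family."""
+    if family == 'openai':  # OpenAI-compatible: .choices[0].message.content
+        return response['choices'][0]['message']['content']
+    if family == 'anthropic':  # Anthropic: .content[0].text
+        return response['content'][0]['text']
+    if family == 'bedrock':  # Bedrock converse: .output.message.content[0].text
+        return response['output']['message']['content'][0]['text']
+    raise ValueError(f'unknown response family: {family}')
+
+
+# Trimmed, hand-written sample shaped like an OpenAI-compatible response.
+sample = {'choices': [{'message': {'role': 'assistant', 'content': 'Hello'}}]}
+answer = extract_answer('openai', sample)
+```
+
+In a computed column the same path is written with attribute access on the expression, as in the Extract answer column of the table.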
+
+---
+
+## Full Examples
+
+## OpenAI
+
+### Chat Completions
+
+```python
+from pixeltable.functions.openai import chat_completions
+
+# Basic
+t.add_computed_column(
+ response=chat_completions(
+ messages=[{'role': 'user', 'content': t.prompt}],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore',
+)
+
+# With system message
+t.add_computed_column(
+ response=chat_completions(
+ messages=[
+ {'role': 'system', 'content': 'You are a helpful assistant.'},
+ {'role': 'user', 'content': t.prompt}
+ ],
+ model='gpt-4o',
+ max_tokens=1000,
+ temperature=0.7
+ ).choices[0].message.content,
+ if_exists='ignore',
+)
+
+# Vision (image analysis)
+t.add_computed_column(
+ description=chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': [
+ {'type': 'text', 'text': 'Describe this image.'},
+ {'type': 'image_url', 'image_url': {'url': t.image}}
+ ]
+ }],
+ model='gpt-4o'
+ ).choices[0].message.content,
+ if_exists='ignore',
+)
+
+# JSON mode
+t.add_computed_column(
+ structured=chat_completions(
+ messages=[{'role': 'user', 'content': t.text}],
+ model='gpt-4o-mini',
+ response_format={'type': 'json_object'}
+ ).choices[0].message.content,
+ if_exists='ignore',
+)
+```
+
+### Embeddings
+
+```python
+from pixeltable.functions.openai import embeddings
+
+# embeddings() returns the vector itself, so no accessor is needed
+t.add_computed_column(
+ embed=embeddings(input=t.text, model='text-embedding-3-small'),
+ if_exists='ignore',
+)
+
+# As index
+t.add_embedding_index('text', embedding=embeddings.using(model='text-embedding-3-small'), if_exists='ignore')
+```
+
+### Image Generation (DALL-E)
+
+```python
+from pixeltable.functions.openai import image_generations
+
+t.add_computed_column(
+ generated=image_generations(prompt=t.description, model='dall-e-3', size='1024x1024').data[0].url,
+ if_exists='ignore',
+)
+```
+
+### Speech (TTS)
+
+```python
+from pixeltable.functions.openai import speech
+
+t.add_computed_column(audio=speech(input=t.text, model='tts-1', voice='alloy'), if_exists='ignore')
+```
+
+### Transcription
+
+```python
+from pixeltable.functions.openai import transcriptions
+
+t.add_computed_column(transcript=transcriptions(audio=t.audio_file, model='whisper-1').text, if_exists='ignore')
+```
+
+## Anthropic
+
+```python
+from pixeltable.functions.anthropic import messages
+
+# Basic
+t.add_computed_column(
+ response=messages(
+ messages=[{'role': 'user', 'content': [{'type': 'text', 'text': t.prompt}]}],
+ model='claude-sonnet-4-20250514',
+ max_tokens=1024
+ ).content[0].text,
+ if_exists='ignore',
+)
+
+# With system prompt
+t.add_computed_column(
+ response=messages(
+ messages=[{'role': 'user', 'content': [{'type': 'text', 'text': t.prompt}]}],
+ model='claude-sonnet-4-20250514',
+ system='You are an expert analyst.',
+ max_tokens=2048
+ ).content[0].text,
+ if_exists='ignore',
+)
+
+# With tool calling
+from pixeltable.functions.anthropic import messages, invoke_tools
+
+tools = pxt.tools(search_fn, lookup_fn)
+t.add_computed_column(
+ response=messages(
+ messages=[{'role': 'user', 'content': [{'type': 'text', 'text': t.prompt}]}],
+ model='claude-sonnet-4-20250514',
+ tools=tools,
+ tool_choice=tools.choice(required=True),
+ max_tokens=1024,
+ ),
+ if_exists='ignore',
+)
+t.add_computed_column(
+ tool_results=invoke_tools(tools, t.response),
+ if_exists='ignore',
+)
+```
+
+## Google Gemini
+
+```python
+from pixeltable.functions.gemini import generate_content, embed_content
+
+# Text generation
+t.add_computed_column(response=generate_content(contents=t.prompt, model='gemini-2.0-flash'), if_exists='ignore')
+
+# Embeddings (for add_embedding_index)
+t.add_embedding_index(
+ 'text',
+ embedding=embed_content.using(model='gemini-embedding-2-preview'),
+ if_exists='ignore',
+)
+
+# Multimodal: pass images alongside text
+t.add_computed_column(
+ vision=generate_content(contents=[t.image, t.prompt], model='gemini-2.0-flash'),
+ if_exists='ignore',
+)
+```
+
+## Together AI
+
+```python
+from pixeltable.functions.together import chat_completions
+
+t.add_computed_column(
+ response=chat_completions(
+ messages=[{'role': 'user', 'content': t.prompt}],
+ model='meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo'
+ ).choices[0].message.content,
+ if_exists='ignore',
+)
+```
+
+## Fireworks
+
+```python
+from pixeltable.functions.fireworks import chat_completions
+
+t.add_computed_column(
+ response=chat_completions(
+ messages=[{'role': 'user', 'content': t.prompt}],
+ model='accounts/fireworks/models/llama-v3p1-70b-instruct'
+ ).choices[0].message.content,
+ if_exists='ignore',
+)
+```
+
+## Ollama (Local)
+
+```python
+from pixeltable.functions.ollama import chat_completions, embeddings
+
+# Chat
+t.add_computed_column(
+ response=chat_completions(
+ messages=[{'role': 'user', 'content': t.prompt}],
+ model='llama3.1'
+ ).choices[0].message.content,
+ if_exists='ignore',
+)
+
+# Embeddings
+t.add_computed_column(embed=embeddings(input=t.text, model='nomic-embed-text'), if_exists='ignore')
+```
+
+## Mistral AI
+
+```python
+from pixeltable.functions.mistralai import chat_completions
+
+t.add_computed_column(
+ response=chat_completions(
+ messages=[{'role': 'user', 'content': t.prompt}],
+ model='mistral-large-latest'
+ ).choices[0].message.content,
+ if_exists='ignore',
+)
+```
+
+## Groq
+
+```python
+from pixeltable.functions.groq import chat_completions
+
+t.add_computed_column(
+ response=chat_completions(
+ messages=[{'role': 'user', 'content': t.prompt}],
+ model='llama-3.1-70b-versatile'
+ ).choices[0].message.content,
+ if_exists='ignore',
+)
+```
+
+## DeepSeek
+
+```python
+from pixeltable.functions.deepseek import chat_completions
+
+t.add_computed_column(
+ response=chat_completions(
+ messages=[{'role': 'user', 'content': t.prompt}],
+ model='deepseek-chat'
+ ).choices[0].message.content,
+ if_exists='ignore',
+)
+```
+
+## OpenRouter
+
+```python
+from pixeltable.functions.openrouter import chat_completions
+
+t.add_computed_column(
+ response=chat_completions(
+ messages=[{'role': 'user', 'content': t.prompt}],
+ model='anthropic/claude-sonnet-4-20250514'
+ ).choices[0].message.content,
+ if_exists='ignore',
+)
+```
+
+## Hugging Face
+
+### CLIP (Multimodal Embeddings)
+
+```python
+from pixeltable.functions.huggingface import clip
+
+embed_fn = clip.using(model_id='openai/clip-vit-base-patch32')
+t.add_embedding_index('image', embedding=embed_fn, if_exists='ignore')
+
+sim = t.image.similarity(string='a photo of a dog')
+results = t.order_by(sim, asc=False).limit(5).select(t.image, sim).collect()
+```
+
+### Sentence Transformers
+
+```python
+from pixeltable.functions.huggingface import sentence_transformer
+
+embed_fn = sentence_transformer.using(model_id='all-MiniLM-L6-v2')
+t.add_embedding_index('text', embedding=embed_fn, if_exists='ignore')
+
+# For multilingual / high-quality (recommended for production)
+embed_fn = sentence_transformer.using(model_id='intfloat/multilingual-e5-large-instruct')
+t.add_embedding_index('text', string_embed=embed_fn, if_exists='ignore')
+```
+
+### Object Detection (DETR)
+
+```python
+from pixeltable.functions.huggingface import detr_for_object_detection
+
+detect = detr_for_object_detection.using(model_id='facebook/detr-resnet-50')
+t.add_computed_column(detections=detect(t.image, threshold=0.8), if_exists='ignore')
+```
+
+## Whisper (Local)
+
+```python
+from pixeltable.functions.whisper import transcribe
+
+t.add_computed_column(transcript=transcribe(audio=t.audio, model='base'), if_exists='ignore')
+```
+
+## Voyage AI
+
+```python
+from pixeltable.functions.voyageai import embed
+
+t.add_computed_column(embed=embed(input=t.text, model='voyage-2'), if_exists='ignore')
+```
+
+## WhisperX (Local)
+
+Enhanced local transcription with word-level timestamps and speaker diarization.
+
+```python
+from pixeltable.functions.whisperx import transcribe
+
+# Basic transcription
+t.add_computed_column(
+ transcript=transcribe(audio=t.audio, model='large-v2'),
+ if_exists='ignore')
+
+# With speaker diarization (requires HF_TOKEN for pyannote)
+t.add_computed_column(
+ transcript=transcribe(audio=t.audio, model='large-v2', diarize=True),
+ if_exists='ignore')
+```
+
+## Jina AI
+
+Embeddings and reranking for search pipelines.
+
+```python
+from pixeltable.functions.jina import embeddings, rerank
+
+# Embeddings (multilingual, 89+ languages)
+t.add_embedding_index('text',
+ embedding=embeddings.using(model='jina-embeddings-v3', task='retrieval.passage'),
+ if_exists='ignore')
+
+# Reranking search results
+t.add_computed_column(
+ ranked=rerank(
+ query=t.query,
+ documents=t.candidates,
+ model='jina-reranker-v2-base-multilingual',
+ top_n=3,
+ return_documents=True,
+ ), if_exists='ignore')
+```
+
+## Twelve Labs
+
+Video understanding via multimodal embeddings.
+
+```python
+from pixeltable.functions.twelvelabs import embed
+
+# Add video embedding index for semantic video search
+t.add_embedding_index('video',
+ embedding=embed.using(model_name='marengo3.0'),
+ if_exists='ignore')
+
+# Search videos by text query
+sim = t.video.similarity(string='person giving a presentation')
+results = t.order_by(sim, asc=False).limit(5).select(t.video, sim).collect()
+```
+
+## BFL FLUX
+
+Image generation and editing with Black Forest Labs FLUX models.
+
+```python
+from pixeltable.functions.bfl import generate, edit, expand, fill
+
+# Text-to-image generation
+t.add_computed_column(
+ image=generate(prompt=t.description, width=1024, height=1024),
+ if_exists='ignore')
+
+# Edit an existing image
+t.add_computed_column(
+ edited=edit(image=t.image, prompt='Make the sky more dramatic'),
+ if_exists='ignore')
+
+# Expand image canvas (outpainting)
+t.add_computed_column(
+ expanded=expand(image=t.image, prompt='Extend the landscape', top=200, right=200),
+ if_exists='ignore')
+
+# Inpaint masked region
+t.add_computed_column(
+ filled=fill(image=t.image, mask=t.mask, prompt='A wooden bench'),
+ if_exists='ignore')
+```
+
+## RunwayML
+
+AI video generation and transformation.
+
+```python
+from pixeltable.functions.runwayml import text_to_video, image_to_video
+
+# Generate video from text
+t.add_computed_column(
+ video=text_to_video(
+ prompt=t.description, model='gen4.5', ratio='1280:720', duration=5,
+ )['output'][0].astype(pxt.Video),
+ if_exists='ignore')
+
+# Animate an image into a video (separate column name, so if_exists='ignore' does not skip it)
+t.add_computed_column(
+ animated=image_to_video(
+ prompt=t.description, image=t.image, model='gen4.5', ratio='1280:720',
+ )['output'][0].astype(pxt.Video),
+ if_exists='ignore')
+```
+
+## fal.ai
+
+Run any model on fal.ai's inference platform.
+
+```python
+from pixeltable.functions.fal import run
+
+# Image generation with FLUX Schnell
+t.add_computed_column(
+ result=run(
+ input={'prompt': t.description, 'image_size': 'landscape_16_9'},
+ app='fal-ai/flux/schnell',
+ ), if_exists='ignore')
+```
+
+## Reve
+
+Image generation, editing, and remixing.
+
+```python
+from pixeltable.functions.reve import create, edit, remix
+
+# Text-to-image
+t.add_computed_column(
+ image=create(prompt=t.description),
+ if_exists='ignore')
+
+# Edit an existing image
+t.add_computed_column(
+ edited=edit(image=t.image, edit_instruction='Make it look like a watercolor painting'),
+ if_exists='ignore')
+```
+
+## Microsoft Fabric
+
+Azure OpenAI models via Microsoft Fabric notebooks (no API key needed in Fabric environment).
+
+```python
+from pixeltable.functions.fabric import chat_completions, embeddings
+
+# Chat
+t.add_computed_column(
+ response=chat_completions(
+ messages=[{'role': 'user', 'content': t.prompt}],
+ model='gpt-4.1',
+ ).choices[0].message.content,
+ if_exists='ignore')
+
+# Embeddings
+t.add_embedding_index('text',
+ embedding=embeddings.using(model='text-embedding-3-small'),
+ if_exists='ignore')
+```
+
+## llama.cpp
+
+Run local GGUF models via llama.cpp (auto-downloaded from Hugging Face).
+
+```python
+from pixeltable.functions.llama_cpp import create_chat_completion
+
+t.add_computed_column(
+ response=create_chat_completion(
+ messages=[{'role': 'user', 'content': t.prompt}],
+ repo_id='Qwen/Qwen2.5-0.5B-Instruct-GGUF',
+ repo_filename='*q5_k_m.gguf',
+ ), if_exists='ignore')
+```
+
+## Replicate
+
+Run any model on Replicate's cloud platform.
+
+```python
+from pixeltable.functions.replicate import run
+
+t.add_computed_column(
+ result=run(input={'prompt': t.description}, model='stability-ai/sdxl'),
+ if_exists='ignore')
+```
+
+## Bedrock
+
+AWS Bedrock models.
+
+```python
+from pixeltable.functions.bedrock import converse, invoke_tools
+
+# Chat
+t.add_computed_column(
+ response=converse(
+ messages=[{'role': 'user', 'content': [{'text': t.prompt}]}],
+ model='anthropic.claude-sonnet-4-20250514-v1:0',
+ ).output.message.content[0].text,
+ if_exists='ignore')
+
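+# Tool calling (requires `import pixeltable as pxt`; search_fn and lookup_fn are
+# placeholders for your own @pxt.udf / @pxt.query functions)
+tools = pxt.tools(search_fn, lookup_fn)
+# Use a separate column name: `response` above already holds the extracted text,
+# and if_exists='ignore' would silently skip a second column with the same name
+t.add_computed_column(
+    tool_response=converse(
+        messages=[{'role': 'user', 'content': [{'text': t.prompt}]}],
+        model='anthropic.claude-sonnet-4-20250514-v1:0',
+        tools=tools,
+    ), if_exists='ignore')
+t.add_computed_column(
+    tool_results=invoke_tools(tools, t.tool_response),
+    if_exists='ignore')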
+```
+
+## YOLOX
+
+Local object detection.
+
+```python
+from pixeltable.functions.yolox import yolox
+
+t.add_computed_column(detections=yolox(t.image), if_exists='ignore')
+```
diff --git a/skills/pixeltable/references/video-rag-agents.md b/skills/pixeltable/references/video-rag-agents.md
new file mode 100644
index 0000000..d751730
--- /dev/null
+++ b/skills/pixeltable/references/video-rag-agents.md
@@ -0,0 +1,251 @@
+# Video RAG Agent
+
+A complete recipe that combines video processing, document/transcript retrieval, and a tool-calling agent into one pipeline. Insert a video and a question — the agent automatically searches frames, transcripts, and documents to answer it.
+
+## Full Pipeline
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.video import frame_iterator, extract_audio
+from pixeltable.functions.audio import audio_splitter
+from pixeltable.functions.string import string_splitter
+from pixeltable.functions.openai import chat_completions, transcriptions
+from pixeltable.functions.huggingface import clip, sentence_transformer
+from pixeltable.functions.anthropic import messages, invoke_tools
+from pixeltable.functions import image as pxt_image
+from datetime import datetime
+
+pxt.create_dir('vrag', if_exists='ignore')
+
+# ── 1. Video ingestion ──────────────────────────────────────────────
+
+videos = pxt.create_table('vrag.videos', {
+ 'video': pxt.Video,
+ 'title': pxt.String,
+}, if_exists='ignore')
+
+# ── 2. Keyframe extraction + CLIP visual search ─────────────────────
+
+frames = pxt.create_view('vrag.frames', videos,
+ iterator=frame_iterator(videos.video, keyframes_only=True),
+ if_exists='ignore')
+
+frames.add_computed_column(
+ thumbnail=pxt_image.b64_encode(
+ pxt_image.thumbnail(frames.frame, size=(320, 320))),
+ if_exists='ignore')
+
+frames.add_embedding_index('frame',
+ embedding=clip.using(model_id='openai/clip-vit-base-patch32'),
+ if_exists='ignore')
+
+# Describe each frame with a vision LLM
+frames.add_computed_column(
+ description=chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': [
+ {'type': 'text', 'text': 'Describe this video frame in one sentence.'},
+ {'type': 'image_url', 'image_url': {'url': frames.frame}}
+ ]
+ }],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore')
+
+# ── 3. Audio extraction → transcription → sentence embedding ────────
+
+videos.add_computed_column(
+ audio=extract_audio(videos.video, format='mp3'),
+ if_exists='ignore')
+
+audio_chunks = pxt.create_view('vrag.audio_chunks', videos,
+ iterator=audio_splitter(audio=videos.audio, duration=30.0),
+ if_exists='ignore')
+
+audio_chunks.add_computed_column(
+ transcription=transcriptions(
+ audio=audio_chunks.audio_chunk, model='whisper-1'),
+ if_exists='ignore')
+
+sentences = pxt.create_view('vrag.sentences',
+ audio_chunks.where(audio_chunks.transcription != None),
+ iterator=string_splitter(
+ text=audio_chunks.transcription.text, separators='sentence'),
+ if_exists='ignore')
+
+embed_fn = sentence_transformer.using(model_id='all-MiniLM-L6-v2')
+sentences.add_embedding_index('text', string_embed=embed_fn, if_exists='ignore')
+
+# ── 4. Query functions (become agent tools) ──────────────────────────
+
+@pxt.query
+def search_video_frames(query_text: str):
+ """Search video frames by visual similarity using CLIP."""
+ sim = frames.frame.similarity(string=query_text)
+ return frames.order_by(sim, asc=False).limit(10).select(
+ frames.description, frames.thumbnail, sim=sim)
+
+@pxt.query
+def search_transcripts(query_text: str):
+ """Search video transcripts by semantic similarity."""
+ sim = sentences.text.similarity(string=query_text)
+ return sentences.where(sim > 0.5).order_by(sim, asc=False).select(
+ sentences.text, sim=sim).limit(20)
+
+@pxt.udf
+def web_search(keywords: str) -> str:
+    """Search the web for additional context."""
+    from duckduckgo_search import DDGS
+    with DDGS() as ddgs:
+        results = list(ddgs.news(keywords=keywords, max_results=5))
+    return '\n'.join(
+        f"{r['title']}: {r['body']}" for r in results
+    ) if results else 'No results.'
+
+# ── 5. Context assembly ─────────────────────────────────────────────
+
+@pxt.udf
+def assemble_context(
+    question: str,
+    tool_outputs: list | None,
+    transcript_context: list | None,
+    frame_context: list | None,
+) -> str:
+    # Label each section so the model can tell the sources apart
+    parts = [f"QUESTION: {question}"]
+
+    tool_str = str(tool_outputs) if tool_outputs else 'N/A'
+    parts.append(f"\nTOOL RESULTS:\n{tool_str}\n")
+
+    if transcript_context:
+        transcript_str = '\n'.join(
+            f"- {item.get('text', '')}"
+            for item in transcript_context if isinstance(item, dict)
+        ) or 'N/A'
+    else:
+        transcript_str = 'N/A'
+    parts.append(f"\nTRANSCRIPT EXCERPTS:\n{transcript_str}\n")
+
+    if frame_context:
+        frame_str = '\n'.join(
+            f"- {item.get('description', '')}"
+            for item in frame_context if isinstance(item, dict)
+        ) or 'N/A'
+    else:
+        frame_str = 'N/A'
+    parts.append(f"\nFRAME DESCRIPTIONS:\n{frame_str}\n")
+
+    return '\n'.join(parts)
+
+# ── 6. Agent pipeline ───────────────────────────────────────────────
+
+tools = pxt.tools(web_search, search_transcripts, search_video_frames)
+
+agent = pxt.create_table('vrag.agent', {
+ 'prompt': pxt.String,
+ 'timestamp': pxt.Timestamp,
+ 'system_prompt': pxt.String,
+ 'max_tokens': pxt.Int,
+ 'temperature': pxt.Float,
+}, if_exists='ignore')
+
+# Step 1: Initial LLM call — tool selection
+agent.add_computed_column(
+ initial_response=messages(
+ model='claude-sonnet-4-20250514',
+ messages=[{'role': 'user', 'content': [{'type': 'text', 'text': agent.prompt}]}],
+ tools=tools,
+ tool_choice=tools.choice(required=True),
+ max_tokens=agent.max_tokens,
+ model_kwargs={'system': agent.system_prompt, 'temperature': agent.temperature},
+ ), if_exists='ignore')
+
+# Step 2: Execute the tools the LLM selected
+agent.add_computed_column(
+ tool_output=invoke_tools(tools, agent.initial_response),
+ if_exists='ignore')
+
+# Step 3: RAG context from transcripts and frames
+agent.add_computed_column(
+ transcript_context=search_transcripts(agent.prompt),
+ if_exists='ignore')
+
+agent.add_computed_column(
+ frame_context=search_video_frames(agent.prompt),
+ if_exists='ignore')
+
+# Step 4: Assemble all context
+agent.add_computed_column(
+ context=assemble_context(
+ agent.prompt, agent.tool_output,
+ agent.transcript_context, agent.frame_context),
+ if_exists='ignore')
+
+# Step 5: Final LLM call with full context
+agent.add_computed_column(
+ final_response=messages(
+ model='claude-sonnet-4-20250514',
+ messages=[{'role': 'user', 'content': [{'type': 'text', 'text': agent.context}]}],
+ max_tokens=agent.max_tokens,
+ model_kwargs={
+ 'system': 'Answer based on the video transcripts, visual descriptions, and tool results. Cite timestamps when possible.',
+ 'temperature': agent.temperature,
+ },
+ ), if_exists='ignore')
+
+# Step 6: Extract answer
+agent.add_computed_column(
+ answer=agent.final_response.content[0].text,
+ if_exists='ignore')
+```
+
+## Usage
+
+```python
+# Insert videos
+videos.insert([
+ {'video': 'lecture.mp4', 'title': 'ML Lecture'},
+ {'video': 'https://example.com/demo.mp4', 'title': 'Product Demo'},
+])
+
+# Ask a question — the full pipeline runs automatically
+agent.insert([{
+ 'prompt': 'What visual examples does the lecturer use to explain gradient descent?',
+ 'timestamp': datetime.now(),
+ 'system_prompt': 'Use search_video_frames for visual content and search_transcripts for spoken content.',
+ 'max_tokens': 1024,
+ 'temperature': 0.7,
+}])
+
+result = agent.order_by(agent.timestamp, asc=False).limit(1).select(agent.answer).collect()
+```
+
+## How It Works
+
+The pipeline is a chain of computed columns. Inserting a row into `agent` triggers these steps automatically:
+
+1. **Initial LLM call** — Claude selects which tools to call (transcript search, frame search, web search)
+2. **Tool execution** — `invoke_tools()` runs the selected `@pxt.query` / `@pxt.udf` functions
+3. **RAG retrieval** — Transcript and frame similarity searches run in parallel as computed columns
+4. **Context assembly** — A UDF merges tool outputs, transcript excerpts, and visual descriptions
+5. **Final LLM call** — Claude synthesizes everything into a grounded answer
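+
+As a mental model only, the chain above can be imitated in plain Python: each computed column is a function of the columns before it, and an insert evaluates them in order. The sketch below is a toy, not how Pixeltable is implemented; the `COMPUTED` list, the `insert` helper, and all stub return values are invented for illustration.
+
+```python
+# Toy model of a computed-column chain. This is NOT Pixeltable's implementation;
+# every function below is a stub standing in for an LLM call or a search.
+from typing import Any, Callable
+
+Row = dict[str, Any]
+
+# (column name, function of the row so far), listed in dependency order
+COMPUTED: list[tuple[str, Callable[[Row], Any]]] = [
+    ("initial_response", lambda r: {"tool": "search_transcripts", "input": r["prompt"]}),
+    ("tool_output", lambda r: [f"ran {r['initial_response']['tool']}"]),
+    ("transcript_context", lambda r: [{"text": "stub transcript sentence"}]),
+    ("context", lambda r: f"QUESTION: {r['prompt']} | {r['tool_output']} | {r['transcript_context']}"),
+    ("answer", lambda r: f"answer grounded in {len(r['context'])} characters of context"),
+]
+
+
+def insert(table: list[Row], row: Row) -> Row:
+    """Append the row, filling each computed column from the columns before it."""
+    stored = dict(row)
+    for name, fn in COMPUTED:
+        stored[name] = fn(stored)
+    table.append(stored)
+    return stored
+
+
+agent_rows: list[Row] = []
+row = insert(agent_rows, {"prompt": "What is gradient descent?"})
+print(list(row))  # column names, in the order they were filled
+```
+
+The caller supplies only `prompt`; every other key is derived during the insert, which is the behavior the `agent` table provides declaratively.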
+
+### Key building blocks
+
+| Building block | Usage | Purpose |
+|---------|----------|---------|
+| `frame_iterator` | `pxt.create_view(..., iterator=frame_iterator(...))` | Extract video keyframes |
+| `audio_splitter` | `pxt.create_view(..., iterator=audio_splitter(...))` | Split audio into chunks |
+| `transcriptions` | `t.add_computed_column(transcription=transcriptions(...))` | Transcribe audio chunks |
+| `string_splitter` | `pxt.create_view(..., iterator=string_splitter(...))` | Split transcript into sentences |
+| `add_embedding_index` | `t.add_embedding_index('col', embedding=fn)` | Enable similarity search |
+| `@pxt.query` | `def search_transcripts(query_text: str): ...` | Reusable retrieval + agent tool |
+| `pxt.tools()` | `tools = pxt.tools(fn1, fn2)` | Bundle functions as LLM tools |
+| `invoke_tools()` | `invoke_tools(tools, response)` | Execute the tools the LLM chose |
+
+## Adapting This Recipe
+
+- **Swap providers**: Replace `messages` (Anthropic) with `chat_completions` (OpenAI/Together/etc.) — see [providers.md](providers.md#quick-reference) for import and output shapes
+- **Add document RAG**: Add a `document_splitter` view and a `search_documents` query function to the tools list
+- **Use local models**: Replace OpenAI transcription with `whisper.transcribe()` and use `ollama.chat_completions` for the LLM — see [workflows.md → Local LLM Pipeline](workflows.md#local-llm-pipeline-ollama)
+- **Serve via API**: Wrap the pipeline in a FastAPI endpoint — see [workflows.md → FastAPI App Pattern](workflows.md#fastapi-app-pattern)
diff --git a/skills/pixeltable/references/workflows.md b/skills/pixeltable/references/workflows.md
new file mode 100644
index 0000000..56ca79f
--- /dev/null
+++ b/skills/pixeltable/references/workflows.md
@@ -0,0 +1,642 @@
+# Pixeltable End-to-End Workflow Templates
+
+Complete, production-ready workflow templates combining multiple Pixeltable features.
+
+## Contents
+
+- [RAG Pipeline](#rag-pipeline)
+- [Video Analysis Pipeline](#video-analysis-pipeline)
+- [Image Classification and Search](#image-classification-and-search)
+- [Audio Transcription and Analysis](#audio-transcription-and-analysis)
+- [Multi-Provider Comparison](#multi-provider-comparison)
+- [Tool-Calling Agent (Full Production Example)](#tool-calling-agent-full-production-example)
+- [Local LLM Pipeline (Ollama)](#local-llm-pipeline-ollama)
+- [FastAPI App Pattern](#fastapi-app-pattern) (hand-written endpoints)
+- [FastAPIRouter — Declarative Serving (v0.6+)](#fastapirouter-declarative-serving-v06) (preferred)
+- [Batch Processing Pattern](#batch-processing-pattern)
+- [Export Workflow](#export-workflow)
+
+---
+
+### RAG Pipeline
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.document import document_splitter
+from pixeltable.functions.openai import chat_completions, embeddings
+
+pxt.create_dir('rag', if_exists='ignore')
+
+docs = pxt.create_table('rag.documents', {
+ 'doc': pxt.Document,
+ 'title': pxt.String,
+}, if_exists='ignore')
+
+chunks = pxt.create_view('rag.chunks', docs,
+ iterator=document_splitter(docs.doc, separators='token_limit', limit=300, metadata='title,heading'),
+ if_exists='ignore')
+
+chunks.add_embedding_index('text',
+ embedding=embeddings.using(model='text-embedding-3-small'),
+ if_exists='ignore')
+
+docs.insert([
+ {'doc': 'path/to/document.pdf', 'title': 'My Document'},
+ {'doc': 'https://example.com/page.html', 'title': 'Web Page'},
+])
+
+@pxt.query
+def retrieve(question: str, top_k: int = 5):
+ sim = chunks.text.similarity(string=question)
+ return chunks.order_by(sim, asc=False).limit(top_k).select(chunks.text, chunks.title, sim)
+
+context = retrieve('What is machine learning?').collect()
+```
+
+### Video Analysis Pipeline
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.video import frame_iterator, extract_audio
+from pixeltable.functions.audio import audio_splitter
+from pixeltable.functions.string import string_splitter
+from pixeltable.functions.openai import chat_completions, transcriptions
+from pixeltable.functions.huggingface import clip, sentence_transformer
+from pixeltable.functions import image as pxt_image
+
+pxt.create_dir('video', if_exists='ignore')
+
+videos = pxt.create_table('video.library', {
+ 'video': pxt.Video, 'title': pxt.String
+}, if_exists='ignore')
+
+# 1. Keyframe extraction + CLIP visual search
+frames = pxt.create_view('video.frames', videos,
+ iterator=frame_iterator(videos.video, keyframes_only=True),
+ if_exists='ignore')
+
+frames.add_computed_column(
+ thumbnail=pxt_image.b64_encode(
+ pxt_image.thumbnail(frames.frame, size=(320, 320))),
+ if_exists='ignore')
+
+frames.add_embedding_index('frame',
+ embedding=clip.using(model_id='openai/clip-vit-base-patch32'),
+ if_exists='ignore')
+
+# 2. Audio extraction -> transcription -> sentence embedding
+videos.add_computed_column(
+ audio=extract_audio(videos.video, format='mp3'),
+ if_exists='ignore')
+
+audio_chunks = pxt.create_view('video.audio_chunks', videos,
+ iterator=audio_splitter(audio=videos.audio, duration=30.0),
+ if_exists='ignore')
+
+audio_chunks.add_computed_column(
+ transcription=transcriptions(
+ audio=audio_chunks.audio_chunk, model='whisper-1'),
+ if_exists='ignore')
+
+sentences = pxt.create_view('video.sentences',
+ audio_chunks.where(audio_chunks.transcription != None),
+ iterator=string_splitter(
+ text=audio_chunks.transcription.text, separators='sentence'),
+ if_exists='ignore')
+
+embed_fn = sentence_transformer.using(model_id='all-MiniLM-L6-v2')
+sentences.add_embedding_index('text', string_embed=embed_fn, if_exists='ignore')
+
+# 3. Describe frames with vision LLM
+frames.add_computed_column(
+ description=chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': [
+ {'type': 'text', 'text': 'Describe this video frame in one sentence.'},
+ {'type': 'image_url', 'image_url': {'url': frames.frame}}
+ ]
+ }],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore')
+
+# Visual search
+sim = frames.frame.similarity(string='person riding a bicycle')
+results = frames.order_by(sim, asc=False).limit(10).select(
+ frames.frame, frames.description, sim).collect()
+
+# Transcript search
+@pxt.query
+def search_transcripts(query_text: str):
+ sim = sentences.text.similarity(string=query_text)
+ return sentences.where(sim > 0.7).order_by(sim, asc=False).select(
+ sentences.text, sim=sim
+ ).limit(20)
+```
+
+### Image Classification and Search
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.openai import chat_completions
+from pixeltable.functions.huggingface import clip
+from pixeltable.functions import image as pxt_image
+
+pxt.create_dir('images', if_exists='ignore')
+
+catalog = pxt.create_table('images.catalog', {
+ 'image': pxt.Image, 'filename': pxt.String,
+}, if_exists='ignore')
+
+catalog.add_computed_column(
+ thumbnail=pxt_image.b64_encode(
+ pxt_image.thumbnail(catalog.image, size=(320, 320))),
+ if_exists='ignore')
+
+catalog.add_computed_column(
+ tags=chat_completions(
+ messages=[{
+ 'role': 'user',
+ 'content': [
+ {'type': 'text', 'text': 'List 5 descriptive tags as a comma-separated list.'},
+ {'type': 'image_url', 'image_url': {'url': catalog.image}}
+ ]
+ }],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore')
+
+embed_fn = clip.using(model_id='openai/clip-vit-base-patch32')
+catalog.add_embedding_index('image', embedding=embed_fn, if_exists='ignore')
+
+sim = catalog.image.similarity(string='sunset over the ocean')
+results = catalog.order_by(sim, asc=False).limit(5).select(
+ catalog.image, catalog.tags, sim).collect()
+```
+
+### Audio Transcription and Analysis
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.openai import transcriptions, chat_completions
+
+pxt.create_dir('audio', if_exists='ignore')
+
+recordings = pxt.create_table('audio.recordings', {
+ 'audio': pxt.Audio, 'speaker': pxt.String,
+}, if_exists='ignore')
+
+recordings.add_computed_column(
+ transcript=transcriptions(audio=recordings.audio, model='whisper-1').text,
+ if_exists='ignore')
+
+recordings.add_computed_column(
+ summary=chat_completions(
+ messages=[
+ {'role': 'system', 'content': 'Summarize in 2-3 sentences.'},
+ {'role': 'user', 'content': recordings.transcript}
+ ],
+ model='gpt-4o-mini'
+ ).choices[0].message.content,
+ if_exists='ignore')
+```
+
+### Multi-Provider Comparison
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.openai import chat_completions as openai_chat
+from pixeltable.functions.anthropic import messages as anthropic_msg
+from pixeltable.functions.together import chat_completions as together_chat
+
+pxt.create_dir('compare', if_exists='ignore')
+prompts = pxt.create_table('compare.prompts', {'prompt': pxt.String}, if_exists='ignore')
+
+prompts.add_computed_column(
+ openai=openai_chat(
+ messages=[{'role': 'user', 'content': prompts.prompt}], model='gpt-4o-mini'
+ ).choices[0].message.content, if_exists='ignore')
+
+prompts.add_computed_column(
+ anthropic=anthropic_msg(
+ messages=[{'role': 'user', 'content': [{'type': 'text', 'text': prompts.prompt}]}],
+ model='claude-sonnet-4-20250514', max_tokens=1024
+ ).content[0].text, if_exists='ignore')
+
+prompts.add_computed_column(
+ llama=together_chat(
+ messages=[{'role': 'user', 'content': prompts.prompt}],
+ model='meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo'
+ ).choices[0].message.content, if_exists='ignore')
+
+prompts.insert([{'prompt': 'Explain quantum computing simply.'}])
+results = prompts.select(
+ prompts.prompt, prompts.openai, prompts.anthropic, prompts.llama).collect()
+```
+
+### Tool-Calling Agent (Full Production Example)
+
+Complete agent pipeline as used in the [Pixeltable Starter Kit](https://github.com/pixeltable/pixeltable-starter-kit):
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.anthropic import messages, invoke_tools
+from pixeltable.functions.huggingface import sentence_transformer, clip
+from pixeltable.functions.document import document_splitter
+from pixeltable.functions import image as pxt_image
+from datetime import datetime
+
+pxt.create_dir('app', if_exists='ignore')
+
+# --- Data pipelines ---
+documents = pxt.create_table('app.documents', {'document': pxt.Document}, if_exists='ignore')
+chunks = pxt.create_view('app.chunks', documents,
+ iterator=document_splitter(documents.document,
+ separators='page, sentence', metadata='title,heading,page'),
+ if_exists='ignore')
+
+embed_fn = sentence_transformer.using(model_id='intfloat/multilingual-e5-large-instruct')
+chunks.add_embedding_index('text', string_embed=embed_fn, if_exists='ignore')
+
+images = pxt.create_table('app.images', {'image': pxt.Image}, if_exists='ignore')
+images.add_computed_column(
+ thumbnail=pxt_image.b64_encode(pxt_image.thumbnail(images.image, size=(320, 320))),
+ if_exists='ignore')
+images.add_embedding_index('image',
+ embedding=clip.using(model_id='openai/clip-vit-base-patch32'), if_exists='ignore')
+
+# --- Query functions (become tools + RAG context) ---
+@pxt.query
+def search_documents(query_text: str):
+ sim = chunks.text.similarity(string=query_text)
+ return chunks.where(sim > 0.5).order_by(sim, asc=False).select(
+ chunks.text, sim=sim).limit(20)
+
+@pxt.query
+def search_images(query_text: str):
+ sim = images.image.similarity(string=query_text)
+ return images.where(sim > 0.25).order_by(sim, asc=False).select(
+ encoded_image=pxt_image.b64_encode(
+ pxt_image.thumbnail(images.image, size=(224, 224)), 'png'),
+ sim=sim).limit(5)
+
+@pxt.udf
+def web_search(keywords: str) -> str:
+    """Search the web using DuckDuckGo."""
+    from duckduckgo_search import DDGS
+    with DDGS() as ddgs:
+        results = list(ddgs.news(keywords=keywords, max_results=5))
+    return '\n'.join(
+        f"{r['title']}: {r['body']}" for r in results
+    ) if results else 'No results.'
+
+@pxt.udf
+def assemble_context(question: str, tool_outputs: list | None, doc_context: list | None) -> str:
+    tool_str = str(tool_outputs) if tool_outputs else 'N/A'
+    doc_str = '\n'.join(
+        f"- {item.get('text', '')}" for item in (doc_context or []) if isinstance(item, dict)
+    ) or 'N/A'
+    # Label each section so the model can tell tool output from document context
+    return (f"QUESTION: {question}\n\n"
+            f"TOOL RESULTS:\n{tool_str}\n\n"
+            f"DOCUMENT CONTEXT:\n{doc_str}\n")
+
+# --- Agent pipeline ---
+tools = pxt.tools(web_search, search_documents)
+
+agent = pxt.create_table('app.agent', {
+ 'prompt': pxt.String,
+ 'timestamp': pxt.Timestamp,
+ 'system_prompt': pxt.String,
+ 'max_tokens': pxt.Int,
+ 'temperature': pxt.Float,
+}, if_exists='ignore')
+
+agent.add_computed_column(
+ initial_response=messages(
+ model='claude-sonnet-4-20250514',
+ messages=[{'role': 'user', 'content': agent.prompt}],
+ tools=tools,
+ tool_choice=tools.choice(required=True),
+ max_tokens=agent.max_tokens,
+ model_kwargs={'system': agent.system_prompt, 'temperature': agent.temperature},
+ ), if_exists='ignore')
+
+agent.add_computed_column(tool_output=invoke_tools(tools, agent.initial_response), if_exists='ignore')
+agent.add_computed_column(doc_context=search_documents(agent.prompt), if_exists='ignore')
+agent.add_computed_column(
+ context=assemble_context(agent.prompt, agent.tool_output, agent.doc_context),
+ if_exists='ignore')
+
+agent.add_computed_column(
+ final_response=messages(
+ model='claude-sonnet-4-20250514',
+ messages=[{'role': 'user', 'content': agent.context}],
+ max_tokens=agent.max_tokens,
+ model_kwargs={'system': 'Answer based on context. Cite sources.', 'temperature': agent.temperature},
+ ), if_exists='ignore')
+
+agent.add_computed_column(answer=agent.final_response.content[0].text, if_exists='ignore')
+
+# --- Usage ---
+agent.insert([{
+ 'prompt': 'What are the latest AI breakthroughs?',
+ 'timestamp': datetime.now(),
+ 'system_prompt': 'Use tools to gather information, then answer.',
+ 'max_tokens': 1024,
+ 'temperature': 0.7,
+}])
+result = agent.order_by(agent.timestamp, asc=False).limit(1).select(agent.answer).collect()
+```
+
+### Local LLM Pipeline (Ollama)
+
+```python
+import pixeltable as pxt
+from pixeltable.functions.ollama import chat_completions, embeddings
+
+pxt.create_dir('local', if_exists='ignore')
+t = pxt.create_table('local.data', {'text': pxt.String}, if_exists='ignore')
+
+t.add_computed_column(
+ analysis=chat_completions(
+ messages=[{'role': 'user', 'content': 'Analyze: ' + t.text}],
+ model='llama3.1'
+ ).choices[0].message.content, if_exists='ignore')
+
+t.add_embedding_index('text',
+ embedding=embeddings.using(model='nomic-embed-text'),
+ if_exists='ignore')
+
+t.insert([{'text': 'Machine learning fundamentals'}])
+sim = t.text.similarity(string='neural networks')
+results = t.order_by(sim, asc=False).limit(5).select(t.text, sim).collect()
+```
+
+### FastAPI App Pattern
+
+Production-ready pattern for web apps with Pixeltable:
+
+```python
+# setup_pixeltable.py -- Run once to initialize schema
+import pixeltable as pxt
+from pixeltable.functions.uuid import uuid7
+from pixeltable.functions.document import document_splitter
+from pixeltable.functions.huggingface import sentence_transformer
+
+pxt.drop_dir('app', force=True)  # Destructive: deletes any existing 'app' data; remove this line to keep setup idempotent
+pxt.create_dir('app', if_exists='ignore')
+
+documents = pxt.create_table('app.documents', {
+ 'document': pxt.Document,
+ 'uuid': uuid7(),
+ 'timestamp': pxt.Timestamp,
+}, primary_key=['uuid'], if_exists='ignore')
+
+chunks = pxt.create_view('app.chunks', documents,
+ iterator=document_splitter(
+ documents.document, separators='page, sentence',
+ metadata='title,heading,page'),
+ if_exists='ignore')
+
+embed_fn = sentence_transformer.using(
+ model_id='intfloat/multilingual-e5-large-instruct')
+chunks.add_embedding_index('text', string_embed=embed_fn, if_exists='ignore')
+
+@pxt.query
+def search_documents(query_text: str):
+ sim = chunks.text.similarity(string=query_text)
+ return chunks.where(sim > 0.5).order_by(sim, asc=False).select(
+ chunks.text, sim=sim, title=chunks.title
+ ).limit(20)
+```
+
+```python
+# main.py -- FastAPI app (use def, not async def)
+from fastapi import FastAPI
+from pydantic import BaseModel
+import pixeltable as pxt
+
+app = FastAPI()
+
+class SearchRequest(BaseModel):
+ query: str
+
+class SearchResult(BaseModel):
+ text: str
+ sim: float
+ title: str | None = None
+
+class SearchResponse(BaseModel):
+ query: str
+ results: list[SearchResult]
+
+@app.post("/api/search", response_model=SearchResponse)
+def search(body: SearchRequest):  # sync, not async
+    table = pxt.get_table('app.chunks')
+    sim = table.text.similarity(string=body.query)  # keyword form, as in the other examples
+    result = (
+        table.where(sim > 0.3)
+        .order_by(sim, asc=False)
+        .select(text=table.text, sim=sim, title=table.title)
+        .limit(20)
+        .collect()
+    )
+    items = list(result.to_pydantic(SearchResult))  # direct conversion
+    return SearchResponse(query=body.query, results=items)
+```
+
+### FastAPIRouter — Declarative Serving (v0.6+)
+
+`pixeltable.serving.FastAPIRouter` generates endpoints from tables and `@pxt.query` functions — no Pydantic models, no hand-written handlers. It's a subclass of FastAPI's `APIRouter`.
+
+```python
+# setup_pixeltable.py — flat module, runs on import
+import pixeltable as pxt
+from pixeltable.functions.uuid import uuid7
+from pixeltable.functions.document import document_splitter
+from pixeltable.functions.huggingface import sentence_transformer
+
+pxt.create_dir('app', if_exists='ignore')
+
+docs = pxt.create_table('app.documents', {
+ 'document': pxt.Document, 'uuid': uuid7(), 'timestamp': pxt.Timestamp,
+}, primary_key=['uuid'], if_exists='ignore')
+
+chunks = pxt.create_view('app.chunks', docs,
+ iterator=document_splitter(docs.document, separators='page, sentence', metadata='title,heading,page'),
+ if_exists='ignore')
+
+embed_fn = sentence_transformer.using(model_id='intfloat/multilingual-e5-large-instruct')
+chunks.add_embedding_index('text', idx_name='chunks_embed', string_embed=embed_fn, if_exists='ignore')
+```
+
+```python
+# routers/data.py — queries co-located with routes
+import pixeltable as pxt
+from pixeltable.serving import FastAPIRouter
+
+router = FastAPIRouter(prefix="/api/data", tags=["data"])
+docs = pxt.get_table("app.documents")
+chunks = pxt.get_table("app.chunks")
+
+# Upload with background processing (returns job handle, client polls /jobs/{id})
+router.add_insert_route(docs, path="/upload",
+ uploadfile_inputs=["document"], inputs=["timestamp"], outputs=["uuid"],
+ background=True)
+
+router.add_delete_route(docs, path="/delete")
+
+@pxt.query
+def list_docs():
+ return docs.select(uuid=docs.uuid, name=docs.document, timestamp=docs.timestamp).order_by(docs.timestamp, asc=False)
+
+@pxt.query
+def search_docs(query_text: str):
+ sim = chunks.text.similarity(string=query_text)
+ return chunks.where(sim > 0.3).order_by(sim, asc=False).select(
+ text=chunks.text, sim=sim, title=chunks.title).limit(20)
+
+router.add_query_route(path="/list", query=list_docs, method="get")
+router.add_query_route(path="/search", query=search_docs, method="post")
+```
+
+```python
+# main.py
+from fastapi import FastAPI
+import setup_pixeltable # noqa: F401 — triggers schema init
+from routers import data
+
+app = FastAPI()
+app.include_router(data.router)
+```
+
+Key points:
+- **`add_insert_route`** — generates POST endpoint from table columns. Use `uploadfile_inputs` for file uploads, `background=True` for long-running inserts
+- **`add_query_route`** — wraps a `@pxt.query` function as GET or POST. Returns `{ "rows": [...] }` automatically
+- **`add_delete_route`** — generates POST endpoint for row deletion by primary key or `match_columns`
+- **Schema in one file, queries in routers** — `setup_pixeltable.py` creates tables/views/indexes on import. Routers get table refs via `pxt.get_table()` and define `@pxt.query` locally
+- **Only write custom endpoints** for multi-table side effects (e.g., agent insert + chat history saves)
+
+#### return_rows=True for hand-written endpoints
+
+When you do need a hand-written endpoint (multi-table side effects, conditional logic), use `return_rows=True` to read computed columns back without a follow-up query:
+
+```python
+from typing import Any
+
+from pydantic import BaseModel
+
+# Assumes `router`, `QueryRequest`, `agent_table`, and `chat_table` are defined elsewhere in the module
+class AgentResult(BaseModel):
+    model_config = {"extra": "ignore"}
+    answer: str | None = None
+    tool_output: Any = None
+
+@router.post("/query")
+def agent_query(request: QueryRequest):
+    status = agent_table.insert(
+        [{"prompt": request.prompt}], return_rows=True
+    )
+    result = AgentResult.model_validate(status.rows[0])
+    # Conditional: save to chat history based on computed result
+    if result.answer:
+        chat_table.insert([{"role": "assistant", "content": result.answer}])
+    return result
+```
+
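+`extra="ignore"` states the intent explicitly: `status.rows` dicts contain every column, and the model keeps only the fields it declares. This is also Pydantic's default behavior, so the setting only changes anything if a shared base config sets `extra="forbid"`.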
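+
+A minimal sketch of that validation step, using a hand-written dict in place of `status.rows[0]`. The column names and values in `fake_row` are made up for illustration; only the `AgentResult` model comes from the example above, and Pydantic v2 is assumed.
+
+```python
+from typing import Any
+
+from pydantic import BaseModel
+
+
+class AgentResult(BaseModel):
+    model_config = {"extra": "ignore"}
+    answer: str | None = None
+    tool_output: Any = None
+
+
+# Hypothetical stand-in for status.rows[0]: a dict with every column of the row
+fake_row = {
+    "prompt": "What changed in the last release?",
+    "context": "QUESTION: ...",
+    "tool_output": ["stub tool result"],
+    "answer": "hello",
+}
+
+result = AgentResult.model_validate(fake_row)
+dumped = result.model_dump()  # contains only the declared fields
+print(dumped)
+```
+
+Only `answer` and `tool_output` survive validation; `prompt` and `context` are dropped without raising an error.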
+
+Reference: [Pixeltable Starter Kit](https://github.com/pixeltable/pixeltable-starter-kit) | [core-api.md → Serving](core-api.md#serving-fastapirouter)
+
+### Batch Processing Pattern
+
+Use Pixeltable as a batch processing engine: no HTTP server, no FastAPI. A Python script that creates the schema, inserts data, lets computed columns process it, exports results to a serving database, and exits. Run it as a Cloud Run Job, ECS Task, K8s Job, Lambda, or a cron container.
+
+```python
+# schema.py: declarative schema (idempotent)
+import pixeltable as pxt
+from pixeltable.functions.huggingface import sentence_transformer
+from pixeltable.functions.string import string_splitter
+from pixeltable.functions.uuid import uuid7
+
+pxt.create_dir('pipeline', if_exists='ignore')
+embed_fn = sentence_transformer.using(model_id='all-MiniLM-L6-v2')
+
+documents = pxt.create_table('pipeline.documents', {
+ 'title': pxt.String,
+ 'body': pxt.String,
+ 'source_id': pxt.String,
+ 'uuid': uuid7(),
+ 'timestamp': pxt.Timestamp,
+}, primary_key=['uuid'], if_exists='ignore')
+
+sentences = pxt.create_view(
+ 'pipeline.sentences', documents,
+ iterator=string_splitter(text=documents.body, separators='sentence'),
+ if_exists='ignore',
+)
+sentences.add_embedding_index(
+ 'text', idx_name='sentences_embed', string_embed=embed_fn, if_exists='ignore'
+)
+```
+
+```python
+# pipeline.py: ingest, compute, export, exit
+import json
+from datetime import datetime
+from pixeltable.io.sql import export_sql
+import schema
+
+SERVING_DB_URL = 'postgresql+psycopg://user:pass@host/db'
+
+with open('batch.json') as f:
+ batch = json.load(f)
+
+now = datetime.now()
+for row in batch['documents']:
+ row.setdefault('timestamp', now)
+
+# Insert triggers computed columns: chunking, embeddings, etc.
+schema.documents.insert(batch['documents'])
+
+# Export structured results to serving DB
+export_sql(
+ schema.documents.select(
+ schema.documents.source_id,
+ schema.documents.title,
+ schema.documents.body,
+ ),
+ 'processed_documents',
+ db_connect_str=SERVING_DB_URL,
+ if_exists='replace',
+)
+
+# Verify semantic search works
+sim = schema.sentences.text.similarity(string='test query')
+results = (schema.sentences.order_by(sim, asc=False)
+ .limit(3).select(schema.sentences.text, sim=sim).collect())
+```
+
+Key points:
+- `schema.py` is a flat module that creates everything on import (idempotent)
+- `pipeline.py` is the driver: load data, insert, export, exit
+- Computed columns fire automatically on insert (chunking, embeddings, LLM calls)
+- `export_sql` pushes processed data to any SQL database (Postgres, MySQL, Snowflake, SQLite)
+- Set `PIXELTABLE_HOME=/tmp/pixeltable` for ephemeral containers
+- Use the `destination` parameter on `add_computed_column` to route generated media to cloud buckets (S3, GCS, Azure Blob)
+
+Reference: [Starter Kit `batch/` directory](https://github.com/pixeltable/pixeltable-starter-kit/tree/main/batch)
+
+### Export Workflow
+
+```python
+from pixeltable.io import export_parquet
+
+# To Parquet
+export_parquet(t, 'output/my_data/')
+
+# Query result to Parquet
+query = t.where(t.score > 0.8).select(t.title, t.content, t.score)
+export_parquet(query, 'output/filtered/')
+
+# To pandas
+df = t.select(t.title, t.content).collect().to_pandas()
+df.to_csv('output/data.csv', index=False)
+```