71 commits
d173f9a
feat: add agent-framework-azure-contentunderstanding package
yungshinlintw Mar 21, 2026
0c2ea7b
fix: update CU fixtures with real API data, fix test assertions
yungshinlintw Mar 21, 2026
8e6e73b
chore: add connector .gitignore, update uv.lock
yungshinlintw Mar 22, 2026
3fbb8f7
refactor: rename to azure-ai-contentunderstanding, fix CI issues
yungshinlintw Mar 23, 2026
36ee6a4
feat: add samples (document_qa, invoice_processing, multimodal_chat)
yungshinlintw Mar 23, 2026
dec37e8
feat: add remaining samples (devui_multimodal_agent, large_doc_file_s…
yungshinlintw Mar 23, 2026
dd918bb
feat: add file_search integration for large document RAG
yungshinlintw Mar 23, 2026
c4fe308
fix: add key-based auth support to all samples
yungshinlintw Mar 23, 2026
f995d7a
FEATURE(python): add analyzer auto-detection, file_search RAG, and la…
yungshinlin Mar 24, 2026
85c8999
feat(cu): MIME sniffing, media-aware formatting, unified timeout, vec…
yungshinlin Mar 24, 2026
fcd04f1
fix: merge all CU content segments for video/audio analysis
yungshinlin Mar 25, 2026
03073a5
refactor: improve CU context provider docs and remove ContentLimits
yungshinlintw Mar 25, 2026
4e8a8cc
feat: support user-provided vector store in FileSearchConfig
yungshinlintw Mar 25, 2026
14234d2
Merge upstream/main into yslin/contentunderstanding-context-provider
yungshinlintw Mar 25, 2026
04e8dce
fix: remove ContentLimits from README code block
yungshinlintw Mar 25, 2026
637a3a4
refactor: create CU client in __init__ instead of __aenter__
yungshinlintw Mar 25, 2026
1f451b6
docs: add file_search param to class docstring
yungshinlintw Mar 25, 2026
d914fbc
feat: introduce FileSearchBackend abstraction for cross-client support
yungshinlintw Mar 25, 2026
cb9b5b6
refactor: FileSearchBackend abstraction + caller-owned vector store
yungshinlintw Mar 26, 2026
478731e
fix: file_search reliability and sample improvements
yungshinlintw Mar 26, 2026
90284e6
perf: set max_num_results=10 for file_search to reduce token usage
yungshinlintw Mar 26, 2026
67975c6
fix: move import to top of file (E402 lint)
yungshinlintw Mar 26, 2026
4345cbc
chore: remove unused imports
yungshinlintw Mar 26, 2026
0403365
fix: align azure-ai-contentunderstanding with MAF coding conventions
yungshinlin Mar 26, 2026
a3c50a2
refactor: improve CU context provider API surface and fix CI
yungshinlin Mar 26, 2026
c6b1cc7
Merge remote-tracking branch 'origin/main' into yslin/contentundersta…
yungshinlin Mar 26, 2026
123bfdf
fix: improve file_search samples and move tool guidelines to context …
yungshinlin Mar 26, 2026
b1ce674
feat: improve source_id, integration tests, and content assertions
yungshinlin Mar 26, 2026
29975c4
feat: reject duplicate filenames, add integration tests and sample co…
yungshinlin Mar 26, 2026
cd72233
chore: improve doc key derivation, comments, and README
yungshinlin Mar 26, 2026
6285d36
Merge branch 'main' into yslin/contentunderstanding-context-provider
yungshinlintw Mar 26, 2026
c3fb1c7
test: strengthen _format_result assertions with exact expected strings
yungshinlin Mar 26, 2026
df382a9
refactor: move invoice.pdf to shared sample_assets directory
yungshinlin Mar 26, 2026
b06a34e
refactor: reorganize samples into numbered dirs and simplify auth
yungshinlin Mar 26, 2026
b78bf9c
fix: resolve CI lint errors (D205, RUF001, E501)
yungshinlin Mar 26, 2026
4eef541
refactor: overhaul samples — FoundryChatClient, sessions, remove get_…
yungshinlin Mar 27, 2026
f8fe7c8
feat: add 05_background_analysis sample and fix 04 session/max_wait
yungshinlin Mar 27, 2026
3d10a7c
docs: update README and fix sample 06
yungshinlin Mar 27, 2026
b635de9
docs: rewrite README — concise format, prerequisites, CU link
yungshinlin Mar 27, 2026
443b4c4
fix: resolve pyright errors in _format_result segment cast
yungshinlin Mar 27, 2026
91a7410
docs: add numbered section comments and fresh sample output to all sa…
yungshinlin Mar 27, 2026
ef7e378
feat(devui): add video file upload support
yungshinlin Mar 27, 2026
6856a27
feat: add load_settings support for env var configuration
yungshinlin Mar 27, 2026
c620a93
docs: polish README — fix duplicate env var, add Next steps, service …
yungshinlin Mar 27, 2026
b9edeaf
chore: trim invoice fixture from 199K to 33 lines
yungshinlin Mar 27, 2026
aa3f71c
revert: remove devui video upload changes (will be in separate PR)
yungshinlin Mar 27, 2026
ee341e2
feat: per-file analyzer_id override via additional_properties
yungshinlin Mar 27, 2026
d3c4047
Trim PDF test fixture and clarify unique filename requirement
yungshinlin Mar 27, 2026
6ee5d98
Merge branch 'main' into yslin/contentunderstanding-context-provider
yungshinlintw Mar 27, 2026
5ee0514
Update python/packages/azure-ai-contentunderstanding/agent_framework_…
yungshinlintw Mar 27, 2026
d0e98b3
Update python/packages/azure-ai-contentunderstanding/agent_framework_…
yungshinlintw Mar 27, 2026
dd1fffb
Update python/packages/azure-ai-contentunderstanding/samples/02-devui…
yungshinlintw Mar 27, 2026
0714d17
Update python/packages/azure-ai-contentunderstanding/samples/02-devui…
yungshinlintw Mar 27, 2026
c456327
Update python/packages/azure-ai-contentunderstanding/samples/01-get-s…
yungshinlintw Mar 27, 2026
ebca922
Fix AGENTS.md to match implementation; remove unused variable in test…
yungshinlin Mar 27, 2026
48c31d9
Fix premature file_search instruction for background-completed docs
yungshinlin Mar 27, 2026
d288fc6
fix: wrap long line in devui agent instructions (E501)
yungshinlin Mar 27, 2026
053bca5
Fix Copilot review: unused logger, stray code in README, await cancel…
yungshinlin Mar 27, 2026
e52d28d
Sanitize doc keys and fix duplicate filename re-injection
yungshinlin Mar 27, 2026
0afc812
fix: add type annotation to tasks_to_cancel for pyright
yungshinlin Mar 27, 2026
860ba4e
Move per-session mutable state to state dict for session isolation
yungshinlin Mar 27, 2026
898478f
Remove unused AnalysisSection enum values
yungshinlin Mar 27, 2026
b376ad8
Merge branch 'main' into yslin/contentunderstanding-context-provider
yungshinlintw Mar 27, 2026
7f5ff2e
Recursively flatten object/array field values for cleaner LLM output
yungshinlin Mar 27, 2026
a5cb199
Preserve sub-field confidence; compare full expected JSON in tests
yungshinlin Mar 27, 2026
dd707a0
Remove incorrect MIME aliases (audio/mp4, video/x-matroska)
yungshinlin Mar 27, 2026
9f31124
feat: add AnalysisInput, content_range, warnings, and category support
yungshinlin Mar 27, 2026
42b5ed1
fix: falsy-0 bug in duration calc; improve test coverage
yungshinlin Mar 27, 2026
b930827
refactor: split _context_provider.py into focused modules
yungshinlin Mar 27, 2026
b73e2b8
Merge branch 'main' into yslin/contentunderstanding-context-provider
yungshinlintw Mar 27, 2026
2e9f952
docs: update AGENTS.md with DocumentStatus, FileSearchBackend, and _f…
yungshinlin Mar 27, 2026
1 change: 1 addition & 0 deletions python/AGENTS.md
@@ -68,6 +68,7 @@ python/

### Azure Integrations
- [azure-ai](packages/azure-ai/AGENTS.md) - Azure AI Foundry agents
- [azure-ai-contentunderstanding](packages/azure-ai-contentunderstanding/AGENTS.md) - Azure Content Understanding context provider
- [azure-ai-search](packages/azure-ai-search/AGENTS.md) - Azure AI Search RAG
- [azurefunctions](packages/azurefunctions/AGENTS.md) - Azure Functions hosting

3 changes: 3 additions & 0 deletions python/packages/azure-ai-contentunderstanding/.gitignore
@@ -0,0 +1,3 @@
# Local-only files (not committed)
_local_only/
*_local_only*
72 changes: 72 additions & 0 deletions python/packages/azure-ai-contentunderstanding/AGENTS.md
@@ -0,0 +1,72 @@
# AGENTS.md — azure-ai-contentunderstanding

## Package Overview

`agent-framework-azure-ai-contentunderstanding` integrates Azure Content Understanding (CU)
into the Agent Framework as a context provider. It automatically analyzes file attachments
(documents, images, audio, video) and injects structured results into the LLM context.

## Public API

| Symbol | Type | Description |
|--------|------|-------------|
| `ContentUnderstandingContextProvider` | class | Main context provider — extends `BaseContextProvider` |
| `AnalysisSection` | enum | Output section selector (MARKDOWN, FIELDS, etc.) |
| `DocumentStatus` | enum | Document lifecycle state (ANALYZING, UPLOADING, READY, FAILED) |
| `FileSearchBackend` | ABC | Abstract vector store file operations interface |
| `FileSearchConfig` | dataclass | Configuration for CU + vector store RAG mode |

## Architecture

- **`_context_provider.py`** — Main provider implementation. Overrides `before_run()` to detect
file attachments, call the CU API, manage session state with multi-document tracking,
and auto-register retrieval tools for follow-up turns.
- **Analyzer auto-detection** — When `analyzer_id=None` (default), `_resolve_analyzer_id()`
selects the CU analyzer based on media type prefix: `audio/` → `prebuilt-audioSearch`,
`video/` → `prebuilt-videoSearch`, everything else → `prebuilt-documentSearch`.
- **Multi-segment output** — CU splits long video/audio into multiple scene segments
(each a separate `contents[]` entry with its own `startTimeMs`, `endTimeMs`, `markdown`,
and `fields`). `_extract_sections()` produces:
- `segments`: list of per-segment dicts, each with `markdown`, `fields`, `start_time_s`, `end_time_s`
- `markdown`: concatenated at top level with `---` separators (for file_search uploads)
- `duration_seconds`: computed from global `min(startTimeMs)` → `max(endTimeMs)`
- Metadata (`kind`, `resolution`): taken from the first segment
- **Speaker diarization (not identification)** — CU transcripts label speakers as
`<Speaker 1>`, `<Speaker 2>`, etc. CU does **not** identify speakers by name.
- **file_search RAG** — When `FileSearchConfig` is provided, CU-extracted markdown is
uploaded to an OpenAI vector store and a `file_search` tool is registered on the context
instead of injecting the full document content. This enables token-efficient retrieval
for large documents.
- **`_models.py`** — `AnalysisSection` enum, `DocumentStatus` enum, `DocumentEntry` TypedDict,
`FileSearchConfig` dataclass.
- **`_file_search.py`** — `FileSearchBackend` ABC, `OpenAIFileSearchBackend`,
`FoundryFileSearchBackend`.
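
The multi-segment merge described above can be sketched roughly as follows. This is a simplified, hypothetical reimplementation for illustration only; the package's actual `_extract_sections` may differ in names and edge-case handling:

```python
# Hypothetical sketch of the multi-segment merge; the real _extract_sections
# in the package may differ in names and edge-case handling.
from typing import Any


def extract_sections(contents: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge CU scene segments (contents[] entries) into one result dict."""
    segments = [
        {
            "markdown": c.get("markdown", ""),
            "fields": c.get("fields", {}),
            "start_time_s": c.get("startTimeMs", 0) / 1000,
            "end_time_s": c.get("endTimeMs", 0) / 1000,
        }
        for c in contents
    ]
    # Collect by key presence, not truthiness, so a startTimeMs of 0 counts.
    starts = [c["startTimeMs"] for c in contents if "startTimeMs" in c]
    ends = [c["endTimeMs"] for c in contents if "endTimeMs" in c]
    return {
        "segments": segments,
        # Concatenated markdown with --- separators (for file_search uploads).
        "markdown": "\n\n---\n\n".join(s["markdown"] for s in segments),
        # Duration spans global min(startTimeMs) -> max(endTimeMs).
        "duration_seconds": (max(ends) - min(starts)) / 1000 if starts and ends else None,
        # Metadata is taken from the first segment.
        "kind": contents[0].get("kind") if contents else None,
    }
```

Note the explicit `"startTimeMs" in c` membership checks: testing truthiness instead would drop a legitimate start time of 0, the falsy-0 duration bug fixed in commit 42b5ed1.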

## Key Patterns

- Follows the Azure AI Search context provider pattern (same lifecycle, config style).
- Uses provider-scoped `state` dict for multi-document tracking across turns.
- Auto-registers `list_documents()` tool via `context.extend_tools()`.
- Configurable timeout (`max_wait`) with `asyncio.create_task()` background fallback.
- Strips supported binary attachments from `input_messages` to prevent LLM API errors.
- Explicit `analyzer_id` always overrides auto-detection (user preference wins).
- Vector store resources are cleaned up in `close()` / `__aexit__`.

## Samples

| Sample | Description |
|--------|-------------|
| `01_document_qa.py` | Upload a PDF via URL, ask questions about it |
| `02_multi_turn_session.py` | AgentSession persistence across turns |
| `03_multimodal_chat.py` | PDF + audio + video parallel analysis |
| `04_invoice_processing.py` | Structured field extraction with `prebuilt-invoice` analyzer |
| `05_background_analysis.py` | Non-blocking analysis with `max_wait` + status tracking |
| `06_large_doc_file_search.py` | CU extraction + OpenAI vector store RAG |
| `02-devui/01-multimodal_agent/` | DevUI web UI for CU-powered chat |
| `02-devui/02-file_search_agent/` | DevUI web UI combining CU + file_search RAG |

## Running Tests

```bash
uv run poe test -P azure-ai-contentunderstanding
```
21 changes: 21 additions & 0 deletions python/packages/azure-ai-contentunderstanding/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
128 changes: 128 additions & 0 deletions python/packages/azure-ai-contentunderstanding/README.md
@@ -0,0 +1,128 @@
# Get Started with Azure Content Understanding in Microsoft Agent Framework

Please install this package via pip:

```bash
pip install agent-framework-azure-ai-contentunderstanding --pre
```

## Azure Content Understanding Integration

### Prerequisites

Before using this package, you need an Azure Content Understanding resource:

1. An active **Azure subscription** ([create one for free](https://azure.microsoft.com/pricing/purchase-options/azure-account))
2. A **Microsoft Foundry resource** created in a [supported region](https://learn.microsoft.com/azure/ai-services/content-understanding/language-region-support)
3. **Default model deployments** configured for your resource (GPT-4.1, GPT-4.1-mini, text-embedding-3-large)

Follow the [prerequisites section](https://learn.microsoft.com/azure/ai-services/content-understanding/quickstart/use-rest-api?tabs=portal%2Cdocument&pivots=programming-language-rest#prerequisites) in the Azure Content Understanding quickstart for setup instructions.

### Introduction

The Azure Content Understanding integration provides a context provider that automatically analyzes file attachments (documents, images, audio, video) using [Azure Content Understanding](https://learn.microsoft.com/azure/ai-services/content-understanding/) and injects structured results into the LLM context.

- **Document & image analysis**: State-of-the-art OCR with markdown extraction, table preservation, and structured field extraction — handles scanned PDFs, handwritten content, and complex layouts
- **Audio & video analysis**: Transcription, speaker diarization, and per-segment summaries
- **Background processing**: Configurable timeout with async background fallback for large files
- **file_search integration**: Optional vector store upload for token-efficient RAG on large documents

> Learn more about Azure Content Understanding capabilities at [https://learn.microsoft.com/azure/ai-services/content-understanding/](https://learn.microsoft.com/azure/ai-services/content-understanding/)

### Basic Usage Example

See the [samples directory](samples/), which demonstrates:

- Single PDF upload and Q&A ([01_document_qa](samples/01-get-started/01_document_qa.py))
- Multi-turn sessions with cached results ([02_multi_turn_session](samples/01-get-started/02_multi_turn_session.py))
- PDF + audio + video parallel analysis ([03_multimodal_chat](samples/01-get-started/03_multimodal_chat.py))
- Structured field extraction with prebuilt-invoice ([04_invoice_processing](samples/01-get-started/04_invoice_processing.py))
- Non-blocking background analysis with status tracking ([05_background_analysis](samples/01-get-started/05_background_analysis.py))
- CU extraction + OpenAI vector store RAG ([06_large_doc_file_search](samples/01-get-started/06_large_doc_file_search.py))
- Interactive web UI with DevUI ([02-devui](samples/02-devui/))

```python
import asyncio

from agent_framework import Agent, AgentSession, Content, Message
from agent_framework.foundry import FoundryChatClient
from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider
from azure.identity import AzureCliCredential

credential = AzureCliCredential()

cu = ContentUnderstandingContextProvider(
    endpoint="https://my-resource.cognitiveservices.azure.com/",
    credential=credential,
    max_wait=None,  # block until CU extraction completes before sending to LLM
)

client = FoundryChatClient(
    project_endpoint="https://your-project.services.ai.azure.com",
    model="gpt-4.1",
    credential=credential,
)


async def main():
    async with cu:
        agent = Agent(
            client=client,
            name="DocumentQA",
            instructions="You are a helpful document analyst.",
            context_providers=[cu],
        )
        session = AgentSession()

        response = await agent.run(
            Message(role="user", contents=[
                Content.from_text("What's on this invoice?"),
                Content.from_uri(
                    "https://raw.githubusercontent.com/Azure-Samples/"
                    "azure-ai-content-understanding-assets/main/document/invoice.pdf",
                    media_type="application/pdf",
                    additional_properties={"filename": "invoice.pdf"},
                ),
            ]),
            session=session,
        )
        print(response.text)


asyncio.run(main())
```

### Supported File Types

| Category | Types |
|----------|-------|
| Documents | PDF, DOCX, XLSX, PPTX, HTML, TXT, Markdown |
| Images | JPEG, PNG, TIFF, BMP |
| Audio | WAV, MP3, M4A, FLAC, OGG |
| Video | MP4, MOV, AVI, WebM |

For the complete list of supported file types and size limits, see [Azure Content Understanding service limits](https://learn.microsoft.com/azure/ai-services/content-understanding/service-limits#input-file-limits).

### Environment Variables

The provider supports automatic endpoint resolution from environment variables.
When `endpoint` is not passed to the constructor, it is loaded from
`AZURE_CONTENTUNDERSTANDING_ENDPOINT`:

```python
# Endpoint auto-loaded from AZURE_CONTENTUNDERSTANDING_ENDPOINT env var
cu = ContentUnderstandingContextProvider(credential=credential)
```

Set these in your shell or in a `.env` file:

```bash
AZURE_CONTENTUNDERSTANDING_ENDPOINT=https://your-cu-resource.cognitiveservices.azure.com/
AZURE_AI_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4.1
```

You also need to be logged in with `az login` (for `AzureCliCredential`).

### Next steps

- Explore the [samples directory](samples/) for complete code examples
- Read the [Azure Content Understanding documentation](https://learn.microsoft.com/azure/ai-services/content-understanding/) for detailed service information
- Learn more about the [Microsoft Agent Framework](https://aka.ms/agent-framework)
@@ -0,0 +1,28 @@
# Copyright (c) Microsoft. All rights reserved.

"""Azure Content Understanding integration for Microsoft Agent Framework.

Provides a context provider that analyzes file attachments (documents, images,
audio, video) using Azure Content Understanding and injects structured results
into the LLM context.
"""

import importlib.metadata

from ._context_provider import ContentUnderstandingContextProvider
from ._file_search import FileSearchBackend
from ._models import AnalysisSection, DocumentStatus, FileSearchConfig

try:
__version__ = importlib.metadata.version(__name__)
except importlib.metadata.PackageNotFoundError:
__version__ = "0.0.0"

__all__ = [
"AnalysisSection",
"ContentUnderstandingContextProvider",
"DocumentStatus",
"FileSearchBackend",
"FileSearchConfig",
"__version__",
]
@@ -0,0 +1,78 @@
# Copyright (c) Microsoft. All rights reserved.

"""Constants for Azure Content Understanding context provider.

Supported media types, MIME aliases, and analyzer mappings used by
the file detection and analysis pipeline.
"""

from __future__ import annotations

# MIME types used to match against the resolved media type for routing files to CU analysis.
# The media type may be provided via Content.media_type or inferred (e.g., via sniffing or filename)
# when missing or generic (such as application/octet-stream). Only files whose resolved media type is
# in this set will be processed; others are skipped.
#
# Supported input file types:
# https://learn.microsoft.com/azure/ai-services/content-understanding/service-limits#input-file-limits
SUPPORTED_MEDIA_TYPES: frozenset[str] = frozenset({
# Documents and images
"application/pdf",
"image/jpeg",
"image/png",
"image/tiff",
"image/bmp",
"image/heif",
"image/heic",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"application/vnd.openxmlformats-officedocument.presentationml.presentation",
# Text
"text/plain",
"text/html",
"text/markdown",
"text/rtf",
"text/xml",
"application/xml",
"message/rfc822",
"application/vnd.ms-outlook",
# Audio
"audio/wav",
"audio/mpeg",
"audio/mp3",
"audio/mp4",
"audio/m4a",
"audio/flac",
"audio/ogg",
"audio/opus",
"audio/webm",
"audio/x-ms-wma",
"audio/aac",
"audio/amr",
"audio/3gpp",
# Video
"video/mp4",
"video/quicktime",
"video/x-msvideo",
"video/webm",
"video/x-flv",
"video/x-ms-wmv",
"video/x-ms-asf",
"video/x-matroska",
})

# Mapping from filetype's MIME output to our canonical SUPPORTED_MEDIA_TYPES values.
# filetype uses some x-prefixed variants that differ from our set.
MIME_ALIASES: dict[str, str] = {
"audio/x-wav": "audio/wav",
"audio/x-flac": "audio/flac",
"video/x-m4v": "video/mp4",
}

# Mapping from media type prefix to the appropriate prebuilt CU analyzer.
# Used when analyzer_id is None (auto-detect mode).
MEDIA_TYPE_ANALYZER_MAP: dict[str, str] = {
"audio/": "prebuilt-audioSearch",
"video/": "prebuilt-videoSearch",
}
DEFAULT_ANALYZER: str = "prebuilt-documentSearch"