71 commits
d173f9a
feat: add agent-framework-azure-contentunderstanding package
yungshinlintw Mar 21, 2026
0c2ea7b
fix: update CU fixtures with real API data, fix test assertions
yungshinlintw Mar 21, 2026
8e6e73b
chore: add connector .gitignore, update uv.lock
yungshinlintw Mar 22, 2026
3fbb8f7
refactor: rename to azure-ai-contentunderstanding, fix CI issues
yungshinlintw Mar 23, 2026
36ee6a4
feat: add samples (document_qa, invoice_processing, multimodal_chat)
yungshinlintw Mar 23, 2026
dec37e8
feat: add remaining samples (devui_multimodal_agent, large_doc_file_s…
yungshinlintw Mar 23, 2026
dd918bb
feat: add file_search integration for large document RAG
yungshinlintw Mar 23, 2026
c4fe308
fix: add key-based auth support to all samples
yungshinlintw Mar 23, 2026
f995d7a
FEATURE(python): add analyzer auto-detection, file_search RAG, and la…
yungshinlin Mar 24, 2026
85c8999
feat(cu): MIME sniffing, media-aware formatting, unified timeout, vec…
yungshinlin Mar 24, 2026
fcd04f1
fix: merge all CU content segments for video/audio analysis
yungshinlin Mar 25, 2026
03073a5
refactor: improve CU context provider docs and remove ContentLimits
yungshinlintw Mar 25, 2026
4e8a8cc
feat: support user-provided vector store in FileSearchConfig
yungshinlintw Mar 25, 2026
14234d2
Merge upstream/main into yslin/contentunderstanding-context-provider
yungshinlintw Mar 25, 2026
04e8dce
fix: remove ContentLimits from README code block
yungshinlintw Mar 25, 2026
637a3a4
refactor: create CU client in __init__ instead of __aenter__
yungshinlintw Mar 25, 2026
1f451b6
docs: add file_search param to class docstring
yungshinlintw Mar 25, 2026
d914fbc
feat: introduce FileSearchBackend abstraction for cross-client support
yungshinlintw Mar 25, 2026
cb9b5b6
refactor: FileSearchBackend abstraction + caller-owned vector store
yungshinlintw Mar 26, 2026
478731e
fix: file_search reliability and sample improvements
yungshinlintw Mar 26, 2026
90284e6
perf: set max_num_results=10 for file_search to reduce token usage
yungshinlintw Mar 26, 2026
67975c6
fix: move import to top of file (E402 lint)
yungshinlintw Mar 26, 2026
4345cbc
chore: remove unused imports
yungshinlintw Mar 26, 2026
0403365
fix: align azure-ai-contentunderstanding with MAF coding conventions
yungshinlin Mar 26, 2026
a3c50a2
refactor: improve CU context provider API surface and fix CI
yungshinlin Mar 26, 2026
c6b1cc7
Merge remote-tracking branch 'origin/main' into yslin/contentundersta…
yungshinlin Mar 26, 2026
123bfdf
fix: improve file_search samples and move tool guidelines to context …
yungshinlin Mar 26, 2026
b1ce674
feat: improve source_id, integration tests, and content assertions
yungshinlin Mar 26, 2026
29975c4
feat: reject duplicate filenames, add integration tests and sample co…
yungshinlin Mar 26, 2026
cd72233
chore: improve doc key derivation, comments, and README
yungshinlin Mar 26, 2026
6285d36
Merge branch 'main' into yslin/contentunderstanding-context-provider
yungshinlintw Mar 26, 2026
c3fb1c7
test: strengthen _format_result assertions with exact expected strings
yungshinlin Mar 26, 2026
df382a9
refactor: move invoice.pdf to shared sample_assets directory
yungshinlin Mar 26, 2026
b06a34e
refactor: reorganize samples into numbered dirs and simplify auth
yungshinlin Mar 26, 2026
b78bf9c
fix: resolve CI lint errors (D205, RUF001, E501)
yungshinlin Mar 26, 2026
4eef541
refactor: overhaul samples — FoundryChatClient, sessions, remove get_…
yungshinlin Mar 27, 2026
f8fe7c8
feat: add 05_background_analysis sample and fix 04 session/max_wait
yungshinlin Mar 27, 2026
3d10a7c
docs: update README and fix sample 06
yungshinlin Mar 27, 2026
b635de9
docs: rewrite README — concise format, prerequisites, CU link
yungshinlin Mar 27, 2026
443b4c4
fix: resolve pyright errors in _format_result segment cast
yungshinlin Mar 27, 2026
91a7410
docs: add numbered section comments and fresh sample output to all sa…
yungshinlin Mar 27, 2026
ef7e378
feat(devui): add video file upload support
yungshinlin Mar 27, 2026
6856a27
feat: add load_settings support for env var configuration
yungshinlin Mar 27, 2026
c620a93
docs: polish README — fix duplicate env var, add Next steps, service …
yungshinlin Mar 27, 2026
b9edeaf
chore: trim invoice fixture from 199K to 33 lines
yungshinlin Mar 27, 2026
aa3f71c
revert: remove devui video upload changes (will be in separate PR)
yungshinlin Mar 27, 2026
ee341e2
feat: per-file analyzer_id override via additional_properties
yungshinlin Mar 27, 2026
d3c4047
Trim PDF test fixture and clarify unique filename requirement
yungshinlin Mar 27, 2026
6ee5d98
Merge branch 'main' into yslin/contentunderstanding-context-provider
yungshinlintw Mar 27, 2026
5ee0514
Update python/packages/azure-ai-contentunderstanding/agent_framework_…
yungshinlintw Mar 27, 2026
d0e98b3
Update python/packages/azure-ai-contentunderstanding/agent_framework_…
yungshinlintw Mar 27, 2026
dd1fffb
Update python/packages/azure-ai-contentunderstanding/samples/02-devui…
yungshinlintw Mar 27, 2026
0714d17
Update python/packages/azure-ai-contentunderstanding/samples/02-devui…
yungshinlintw Mar 27, 2026
c456327
Update python/packages/azure-ai-contentunderstanding/samples/01-get-s…
yungshinlintw Mar 27, 2026
ebca922
Fix AGENTS.md to match implementation; remove unused variable in test…
yungshinlin Mar 27, 2026
48c31d9
Fix premature file_search instruction for background-completed docs
yungshinlin Mar 27, 2026
d288fc6
fix: wrap long line in devui agent instructions (E501)
yungshinlin Mar 27, 2026
053bca5
Fix Copilot review: unused logger, stray code in README, await cancel…
yungshinlin Mar 27, 2026
e52d28d
Sanitize doc keys and fix duplicate filename re-injection
yungshinlin Mar 27, 2026
0afc812
fix: add type annotation to tasks_to_cancel for pyright
yungshinlin Mar 27, 2026
860ba4e
Move per-session mutable state to state dict for session isolation
yungshinlin Mar 27, 2026
898478f
Remove unused AnalysisSection enum values
yungshinlin Mar 27, 2026
b376ad8
Merge branch 'main' into yslin/contentunderstanding-context-provider
yungshinlintw Mar 27, 2026
7f5ff2e
Recursively flatten object/array field values for cleaner LLM output
yungshinlin Mar 27, 2026
a5cb199
Preserve sub-field confidence; compare full expected JSON in tests
yungshinlin Mar 27, 2026
dd707a0
Remove incorrect MIME aliases (audio/mp4, video/x-matroska)
yungshinlin Mar 27, 2026
9f31124
feat: add AnalysisInput, content_range, warnings, and category support
yungshinlin Mar 27, 2026
42b5ed1
fix: falsy-0 bug in duration calc; improve test coverage
yungshinlin Mar 27, 2026
b930827
refactor: split _context_provider.py into focused modules
yungshinlin Mar 27, 2026
b73e2b8
Merge branch 'main' into yslin/contentunderstanding-context-provider
yungshinlintw Mar 27, 2026
2e9f952
docs: update AGENTS.md with DocumentStatus, FileSearchBackend, and _f…
yungshinlin Mar 27, 2026
1 change: 1 addition & 0 deletions python/AGENTS.md
@@ -68,6 +68,7 @@ python/

### Azure Integrations
- [azure-ai](packages/azure-ai/AGENTS.md) - Azure AI Foundry agents
- [azure-ai-contentunderstanding](packages/azure-ai-contentunderstanding/AGENTS.md) - Azure Content Understanding context provider
- [azure-ai-search](packages/azure-ai-search/AGENTS.md) - Azure AI Search RAG
- [azurefunctions](packages/azurefunctions/AGENTS.md) - Azure Functions hosting

3 changes: 3 additions & 0 deletions python/packages/azure-ai-contentunderstanding/.gitignore
@@ -0,0 +1,3 @@
# Local-only files (not committed)
_local_only/
*_local_only*
72 changes: 72 additions & 0 deletions python/packages/azure-ai-contentunderstanding/AGENTS.md
@@ -0,0 +1,72 @@
# AGENTS.md — azure-ai-contentunderstanding

## Package Overview

`agent-framework-azure-ai-contentunderstanding` integrates Azure Content Understanding (CU)
into the Agent Framework as a context provider. It automatically analyzes file attachments
(documents, images, audio, video) and injects structured results into the LLM context.

## Public API

| Symbol | Type | Description |
|--------|------|-------------|
| `ContentUnderstandingContextProvider` | class | Main context provider — extends `BaseContextProvider` |
| `AnalysisSection` | enum | Output section selector (MARKDOWN, FIELDS, etc.) |
| `DocumentStatus` | enum | Document lifecycle state (ANALYZING, UPLOADING, READY, FAILED) |
| `FileSearchBackend` | ABC | Abstract vector store file operations interface |
| `FileSearchConfig` | dataclass | Configuration for CU + vector store RAG mode |

## Architecture

- **`_context_provider.py`** — Main provider implementation. Overrides `before_run()` to detect
file attachments, call the CU API, manage session state with multi-document tracking,
and auto-register retrieval tools for follow-up turns.
- **Analyzer auto-detection** — When `analyzer_id=None` (default), `_resolve_analyzer_id()`
selects the CU analyzer based on media type prefix: `audio/` → `prebuilt-audioSearch`,
`video/` → `prebuilt-videoSearch`, everything else → `prebuilt-documentSearch`.
- **Multi-segment output** — CU splits long video/audio into multiple scene segments
(each a separate `contents[]` entry with its own `startTimeMs`, `endTimeMs`, `markdown`,
and `fields`). `_extract_sections()` produces:
- `segments`: list of per-segment dicts, each with `markdown`, `fields`, `start_time_s`, `end_time_s`
- `markdown`: concatenated at top level with `---` separators (for file_search uploads)
- `duration_seconds`: computed from global `min(startTimeMs)` → `max(endTimeMs)`
- Metadata (`kind`, `resolution`): taken from the first segment
- **Speaker diarization (not identification)** — CU transcripts label speakers as
`<Speaker 1>`, `<Speaker 2>`, etc. CU does **not** identify speakers by name.
- **file_search RAG** — When `FileSearchConfig` is provided, CU-extracted markdown is
uploaded to an OpenAI vector store and a `file_search` tool is registered on the context
instead of injecting the full document content. This enables token-efficient retrieval
for large documents.
- **`_models.py`** — `AnalysisSection` enum, `DocumentStatus` enum, `DocumentEntry` TypedDict,
`FileSearchConfig` dataclass.
- **`_file_search.py`** — `FileSearchBackend` ABC, `OpenAIFileSearchBackend`,
`FoundryFileSearchBackend`.
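
The multi-segment merge described above can be sketched roughly as follows. This is a simplified, hypothetical reimplementation for illustration only; the package's actual `_extract_sections` may differ in names and edge-case handling:

```python
# Hypothetical sketch of the multi-segment merge; the real _extract_sections
# in the package may differ in names and edge-case handling.
from typing import Any


def extract_sections(contents: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge CU scene segments (contents[] entries) into one result dict."""
    segments = [
        {
            "markdown": c.get("markdown", ""),
            "fields": c.get("fields", {}),
            "start_time_s": c.get("startTimeMs", 0) / 1000,
            "end_time_s": c.get("endTimeMs", 0) / 1000,
        }
        for c in contents
    ]
    # Collect by key presence, not truthiness, so a startTimeMs of 0 counts.
    starts = [c["startTimeMs"] for c in contents if "startTimeMs" in c]
    ends = [c["endTimeMs"] for c in contents if "endTimeMs" in c]
    return {
        "segments": segments,
        # Concatenated markdown with --- separators (for file_search uploads).
        "markdown": "\n\n---\n\n".join(s["markdown"] for s in segments),
        # Duration spans global min(startTimeMs) -> max(endTimeMs).
        "duration_seconds": (max(ends) - min(starts)) / 1000 if starts and ends else None,
        # Metadata is taken from the first segment.
        "kind": contents[0].get("kind") if contents else None,
    }
```

Note the explicit `"startTimeMs" in c` membership checks: testing truthiness instead would drop a legitimate start time of 0, the falsy-0 duration bug fixed in commit 42b5ed1.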

## Key Patterns

- Follows the Azure AI Search context provider pattern (same lifecycle, config style).
- Uses provider-scoped `state` dict for multi-document tracking across turns.
- Auto-registers `list_documents()` tool via `context.extend_tools()`.
- Configurable timeout (`max_wait`) with `asyncio.create_task()` background fallback.
- Strips supported binary attachments from `input_messages` to prevent LLM API errors.
- Explicit `analyzer_id` always overrides auto-detection (user preference wins).
- Vector store resources are cleaned up in `close()` / `__aexit__`.

## Samples

| Sample | Description |
|--------|-------------|
| `01_document_qa.py` | Upload a PDF via URL, ask questions about it |
| `02_multi_turn_session.py` | AgentSession persistence across turns |
| `03_multimodal_chat.py` | PDF + audio + video parallel analysis |
| `04_invoice_processing.py` | Structured field extraction with `prebuilt-invoice` analyzer |
| `05_background_analysis.py` | Non-blocking analysis with `max_wait` + status tracking |
| `06_large_doc_file_search.py` | CU extraction + OpenAI vector store RAG |
| `02-devui/01-multimodal_agent/` | DevUI web UI for CU-powered chat |
| `02-devui/02-file_search_agent/` | DevUI web UI combining CU + file_search RAG |

## Running Tests

```bash
uv run poe test -P azure-ai-contentunderstanding
```
21 changes: 21 additions & 0 deletions python/packages/azure-ai-contentunderstanding/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
128 changes: 128 additions & 0 deletions python/packages/azure-ai-contentunderstanding/README.md
@@ -0,0 +1,128 @@
# Get Started with Azure Content Understanding in Microsoft Agent Framework

Please install this package via pip:

```bash
pip install agent-framework-azure-ai-contentunderstanding --pre
```

## Azure Content Understanding Integration

### Prerequisites

Before using this package, you need an Azure Content Understanding resource:

1. An active **Azure subscription** ([create one for free](https://azure.microsoft.com/pricing/purchase-options/azure-account))
2. A **Microsoft Foundry resource** created in a [supported region](https://learn.microsoft.com/azure/ai-services/content-understanding/language-region-support)
3. **Default model deployments** configured for your resource (GPT-4.1, GPT-4.1-mini, text-embedding-3-large)

Follow the [prerequisites section](https://learn.microsoft.com/azure/ai-services/content-understanding/quickstart/use-rest-api?tabs=portal%2Cdocument&pivots=programming-language-rest#prerequisites) in the Azure Content Understanding quickstart for setup instructions.

### Introduction

The Azure Content Understanding integration provides a context provider that automatically analyzes file attachments (documents, images, audio, video) using [Azure Content Understanding](https://learn.microsoft.com/azure/ai-services/content-understanding/) and injects structured results into the LLM context.

- **Document & image analysis**: State-of-the-art OCR with markdown extraction, table preservation, and structured field extraction — handles scanned PDFs, handwritten content, and complex layouts
- **Audio & video analysis**: Transcription, speaker diarization, and per-segment summaries
- **Background processing**: Configurable timeout with async background fallback for large files
- **file_search integration**: Optional vector store upload for token-efficient RAG on large documents

> Learn more about Azure Content Understanding capabilities at [https://learn.microsoft.com/azure/ai-services/content-understanding/](https://learn.microsoft.com/azure/ai-services/content-understanding/)

### Basic Usage Example

See the [samples directory](samples/), which demonstrates:

- Single PDF upload and Q&A ([01_document_qa](samples/01-get-started/01_document_qa.py))
- Multi-turn sessions with cached results ([02_multi_turn_session](samples/01-get-started/02_multi_turn_session.py))
- PDF + audio + video parallel analysis ([03_multimodal_chat](samples/01-get-started/03_multimodal_chat.py))
- Structured field extraction with prebuilt-invoice ([04_invoice_processing](samples/01-get-started/04_invoice_processing.py))
- Non-blocking background analysis with status tracking ([05_background_analysis](samples/01-get-started/05_background_analysis.py))
- CU extraction + OpenAI vector store RAG ([06_large_doc_file_search](samples/01-get-started/06_large_doc_file_search.py))
- Interactive web UI with DevUI ([02-devui](samples/02-devui/))

```python
import asyncio

from agent_framework import Agent, AgentSession, Content, Message
from agent_framework.foundry import FoundryChatClient
from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider
from azure.identity import AzureCliCredential

credential = AzureCliCredential()

cu = ContentUnderstandingContextProvider(
    endpoint="https://my-resource.cognitiveservices.azure.com/",
    credential=credential,
    max_wait=None,  # block until CU extraction completes before sending to LLM
)

client = FoundryChatClient(
    project_endpoint="https://your-project.services.ai.azure.com",
    model="gpt-4.1",
    credential=credential,
)


async def main():
    async with cu:
        agent = Agent(
            client=client,
            name="DocumentQA",
            instructions="You are a helpful document analyst.",
            context_providers=[cu],
        )
        session = AgentSession()

        response = await agent.run(
            Message(role="user", contents=[
                Content.from_text("What's on this invoice?"),
                Content.from_uri(
                    "https://raw.githubusercontent.com/Azure-Samples/"
                    "azure-ai-content-understanding-assets/main/document/invoice.pdf",
                    media_type="application/pdf",
                    additional_properties={"filename": "invoice.pdf"},
                ),
            ]),
            session=session,
        )
        print(response.text)


asyncio.run(main())
```

### Supported File Types

| Category | Types |
|----------|-------|
| Documents | PDF, DOCX, XLSX, PPTX, HTML, TXT, Markdown |
| Images | JPEG, PNG, TIFF, BMP |
| Audio | WAV, MP3, M4A, FLAC, OGG |
| Video | MP4, MOV, AVI, WebM |

For the complete list of supported file types and size limits, see [Azure Content Understanding service limits](https://learn.microsoft.com/azure/ai-services/content-understanding/service-limits#input-file-limits).

### Environment Variables

The provider supports automatic endpoint resolution from environment variables.
When `endpoint` is not passed to the constructor, it is loaded from
`AZURE_CONTENTUNDERSTANDING_ENDPOINT`:

```python
# Endpoint auto-loaded from AZURE_CONTENTUNDERSTANDING_ENDPOINT env var
cu = ContentUnderstandingContextProvider(credential=credential)
```

Set these in your shell or in a `.env` file:

```bash
AZURE_CONTENTUNDERSTANDING_ENDPOINT=https://your-cu-resource.cognitiveservices.azure.com/
AZURE_AI_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4.1
```

You also need to be logged in with `az login` (for `AzureCliCredential`).

### Next steps

- Explore the [samples directory](samples/) for complete code examples
- Read the [Azure Content Understanding documentation](https://learn.microsoft.com/azure/ai-services/content-understanding/) for detailed service information
- Learn more about the [Microsoft Agent Framework](https://aka.ms/agent-framework)
@@ -0,0 +1,28 @@
# Copyright (c) Microsoft. All rights reserved.

"""Azure Content Understanding integration for Microsoft Agent Framework.

Provides a context provider that analyzes file attachments (documents, images,
audio, video) using Azure Content Understanding and injects structured results
into the LLM context.
"""

import importlib.metadata

from ._context_provider import ContentUnderstandingContextProvider
from ._file_search import FileSearchBackend
from ._models import AnalysisSection, DocumentStatus, FileSearchConfig

try:
__version__ = importlib.metadata.version(__name__)
except importlib.metadata.PackageNotFoundError:
__version__ = "0.0.0"

__all__ = [
"AnalysisSection",
"ContentUnderstandingContextProvider",
"DocumentStatus",
"FileSearchBackend",
"FileSearchConfig",
"__version__",
]
@@ -0,0 +1,78 @@
# Copyright (c) Microsoft. All rights reserved.

"""Constants for Azure Content Understanding context provider.

Supported media types, MIME aliases, and analyzer mappings used by
the file detection and analysis pipeline.
"""

from __future__ import annotations

# MIME types used to match against the resolved media type for routing files to CU analysis.
# The media type may be provided via Content.media_type or inferred (e.g., via sniffing or filename)
# when missing or generic (such as application/octet-stream). Only files whose resolved media type is
# in this set will be processed; others are skipped.
#
# Supported input file types:
# https://learn.microsoft.com/azure/ai-services/content-understanding/service-limits#input-file-limits
SUPPORTED_MEDIA_TYPES: frozenset[str] = frozenset({
# Documents and images
"application/pdf",
"image/jpeg",
"image/png",
"image/tiff",
"image/bmp",
"image/heif",
"image/heic",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"application/vnd.openxmlformats-officedocument.presentationml.presentation",
# Text
"text/plain",
"text/html",
"text/markdown",
"text/rtf",
"text/xml",
"application/xml",
"message/rfc822",
"application/vnd.ms-outlook",
# Audio
"audio/wav",
"audio/mpeg",
"audio/mp3",
"audio/mp4",
"audio/m4a",
"audio/flac",
"audio/ogg",
"audio/opus",
"audio/webm",
"audio/x-ms-wma",
"audio/aac",
"audio/amr",
"audio/3gpp",
# Video
"video/mp4",
"video/quicktime",
"video/x-msvideo",
"video/webm",
"video/x-flv",
"video/x-ms-wmv",
"video/x-ms-asf",
"video/x-matroska",
})

# Mapping from filetype's MIME output to our canonical SUPPORTED_MEDIA_TYPES values.
# filetype uses some x-prefixed variants that differ from our set.
MIME_ALIASES: dict[str, str] = {
"audio/x-wav": "audio/wav",
"audio/x-flac": "audio/flac",
"video/x-m4v": "video/mp4",
}

# Mapping from media type prefix to the appropriate prebuilt CU analyzer.
# Used when analyzer_id is None (auto-detect mode).
MEDIA_TYPE_ANALYZER_MAP: dict[str, str] = {
"audio/": "prebuilt-audioSearch",
"video/": "prebuilt-videoSearch",
}
DEFAULT_ANALYZER: str = "prebuilt-documentSearch"