diff --git a/python/AGENTS.md b/python/AGENTS.md
index 7ec268dcd1..1095d318e9 100644
--- a/python/AGENTS.md
+++ b/python/AGENTS.md
@@ -68,6 +68,7 @@ python/
 
 ### Azure Integrations
 - [azure-ai](packages/azure-ai/AGENTS.md) - Azure AI Foundry agents
+- [azure-ai-contentunderstanding](packages/azure-ai-contentunderstanding/AGENTS.md) - Azure Content Understanding context provider
 - [azure-ai-search](packages/azure-ai-search/AGENTS.md) - Azure AI Search RAG
 - [azurefunctions](packages/azurefunctions/AGENTS.md) - Azure Functions hosting
diff --git a/python/packages/azure-ai-contentunderstanding/.gitignore b/python/packages/azure-ai-contentunderstanding/.gitignore
new file mode 100644
index 0000000000..051cb93f3d
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/.gitignore
@@ -0,0 +1,3 @@
+# Local-only files (not committed)
+_local_only/
+*_local_only*
diff --git a/python/packages/azure-ai-contentunderstanding/AGENTS.md b/python/packages/azure-ai-contentunderstanding/AGENTS.md
new file mode 100644
index 0000000000..d8e259015c
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/AGENTS.md
@@ -0,0 +1,72 @@
+# AGENTS.md — azure-ai-contentunderstanding
+
+## Package Overview
+
+`agent-framework-azure-ai-contentunderstanding` integrates Azure Content Understanding (CU)
+into the Agent Framework as a context provider. It automatically analyzes file attachments
+(documents, images, audio, video) and injects structured results into the LLM context.
+
+## Public API
+
+| Symbol | Type | Description |
+|--------|------|-------------|
+| `ContentUnderstandingContextProvider` | class | Main context provider — extends `BaseContextProvider` |
+| `AnalysisSection` | enum | Output section selector (MARKDOWN, FIELDS, etc.) |
+| `DocumentStatus` | enum | Document lifecycle state (ANALYZING, UPLOADING, READY, FAILED) |
+| `FileSearchBackend` | ABC | Abstract vector store file operations interface |
+| `FileSearchConfig` | dataclass | Configuration for CU + vector store RAG mode |
+
+## Architecture
+
+- **`_context_provider.py`** — Main provider implementation. Overrides `before_run()` to detect
+  file attachments, call the CU API, manage session state with multi-document tracking,
+  and auto-register retrieval tools for follow-up turns.
+  - **Analyzer auto-detection** — When `analyzer_id=None` (default), `_resolve_analyzer_id()`
+    selects the CU analyzer based on media type prefix: `audio/` → `prebuilt-audioSearch`,
+    `video/` → `prebuilt-videoSearch`, everything else → `prebuilt-documentSearch`.
+  - **Multi-segment output** — CU splits long video/audio into multiple scene segments
+    (each a separate `contents[]` entry with its own `startTimeMs`, `endTimeMs`, `markdown`,
+    and `fields`). `_extract_sections()` produces:
+    - `segments`: list of per-segment dicts, each with `markdown`, `fields`, `start_time_s`, `end_time_s`
+    - `markdown`: concatenated at top level with `---` separators (for file_search uploads)
+    - `duration_seconds`: computed from global `min(startTimeMs)` → `max(endTimeMs)`
+    - Metadata (`kind`, `resolution`): taken from the first segment
+  - **Speaker diarization (not identification)** — CU transcripts label speakers as
+    `Speaker 1`, `Speaker 2`, etc. CU does **not** identify speakers by name.
+  - **file_search RAG** — When `FileSearchConfig` is provided, CU-extracted markdown is
+    uploaded to an OpenAI vector store and a `file_search` tool is registered on the context
+    instead of injecting the full document content. This enables token-efficient retrieval
+    for large documents.
+- **`_models.py`** — `AnalysisSection` enum, `DocumentStatus` enum, `DocumentEntry` TypedDict,
+  `FileSearchConfig` dataclass.
+- **`_file_search.py`** — `FileSearchBackend` ABC, `OpenAIFileSearchBackend`,
+  `FoundryFileSearchBackend`.
+
+## Key Patterns
+
+- Follows the Azure AI Search context provider pattern (same lifecycle, config style).
+- Uses provider-scoped `state` dict for multi-document tracking across turns.
+- Auto-registers `list_documents()` tool via `context.extend_tools()`.
+- Configurable timeout (`max_wait`) with `asyncio.create_task()` background fallback.
+- Strips supported binary attachments from `input_messages` to prevent LLM API errors.
+- Explicit `analyzer_id` always overrides auto-detection (user preference wins).
+- Vector store resources are cleaned up in `close()` / `__aexit__`.
+
+## Samples
+
+| Sample | Description |
+|--------|-------------|
+| `01_document_qa.py` | Upload a PDF via URL, ask questions about it |
+| `02_multi_turn_session.py` | AgentSession persistence across turns |
+| `03_multimodal_chat.py` | PDF + audio + video parallel analysis |
+| `04_invoice_processing.py` | Structured field extraction with `prebuilt-invoice` analyzer |
+| `05_background_analysis.py` | Non-blocking analysis with `max_wait` + status tracking |
+| `06_large_doc_file_search.py` | CU extraction + OpenAI vector store RAG |
+| `02-devui/01-multimodal_agent/` | DevUI web UI for CU-powered chat |
+| `02-devui/02-file_search_agent/` | DevUI web UI combining CU + file_search RAG |
+
+## Running Tests
+
+```bash
+uv run poe test -P azure-ai-contentunderstanding
+```
diff --git a/python/packages/azure-ai-contentunderstanding/LICENSE b/python/packages/azure-ai-contentunderstanding/LICENSE
new file mode 100644
index 0000000000..9e841e7a26
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/LICENSE
@@ -0,0 +1,21 @@
+    MIT License
+
+    Copyright (c) Microsoft Corporation.
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to deal
+    in the Software without restriction, including without limitation the rights
+    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+    copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+    The above copyright notice and this permission notice shall be included in all
+    copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+    SOFTWARE
diff --git a/python/packages/azure-ai-contentunderstanding/README.md b/python/packages/azure-ai-contentunderstanding/README.md
new file mode 100644
index 0000000000..22ef23d46c
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/README.md
@@ -0,0 +1,128 @@
+# Get Started with Azure Content Understanding in Microsoft Agent Framework
+
+Please install this package via pip:
+
+```bash
+pip install agent-framework-azure-ai-contentunderstanding --pre
+```
+
+## Azure Content Understanding Integration
+
+### Prerequisites
+
+Before using this package, you need an Azure Content Understanding resource:
+
+1. An active **Azure subscription** ([create one for free](https://azure.microsoft.com/pricing/purchase-options/azure-account))
+2. A **Microsoft Foundry resource** created in a [supported region](https://learn.microsoft.com/azure/ai-services/content-understanding/language-region-support)
+3. **Default model deployments** configured for your resource (GPT-4.1, GPT-4.1-mini, text-embedding-3-large)
+
+Follow the [prerequisites section](https://learn.microsoft.com/azure/ai-services/content-understanding/quickstart/use-rest-api?tabs=portal%2Cdocument&pivots=programming-language-rest#prerequisites) in the Azure Content Understanding quickstart for setup instructions.
+
+### Introduction
+
+The Azure Content Understanding integration provides a context provider that automatically analyzes file attachments (documents, images, audio, video) using [Azure Content Understanding](https://learn.microsoft.com/azure/ai-services/content-understanding/) and injects structured results into the LLM context.
+
+- **Document & image analysis**: State-of-the-art OCR with markdown extraction, table preservation, and structured field extraction — handles scanned PDFs, handwritten content, and complex layouts
+- **Audio & video analysis**: Transcription, speaker diarization, and per-segment summaries
+- **Background processing**: Configurable timeout with async background fallback for large files
+- **file_search integration**: Optional vector store upload for token-efficient RAG on large documents
+
+> Learn more about Azure Content Understanding capabilities at [https://learn.microsoft.com/azure/ai-services/content-understanding/](https://learn.microsoft.com/azure/ai-services/content-understanding/)
+
+### Basic Usage Example
+
+See the [samples directory](samples/) which demonstrates:
+
+- Single PDF upload and Q&A ([01_document_qa](samples/01-get-started/01_document_qa.py))
+- Multi-turn sessions with cached results ([02_multi_turn_session](samples/01-get-started/02_multi_turn_session.py))
+- PDF + audio + video parallel analysis ([03_multimodal_chat](samples/01-get-started/03_multimodal_chat.py))
+- Structured field extraction with prebuilt-invoice ([04_invoice_processing](samples/01-get-started/04_invoice_processing.py))
+- Non-blocking background analysis with status tracking ([05_background_analysis](samples/01-get-started/05_background_analysis.py))
+- CU extraction + OpenAI vector store RAG ([06_large_doc_file_search](samples/01-get-started/06_large_doc_file_search.py))
+- Interactive web UI with DevUI ([02-devui](samples/02-devui/))
+
+```python
+import asyncio
+from agent_framework import Agent, AgentSession, Message, Content
+from agent_framework.foundry import FoundryChatClient
+from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider
+from azure.identity import AzureCliCredential
+
+credential = AzureCliCredential()
+
+cu = ContentUnderstandingContextProvider(
+    endpoint="https://my-resource.cognitiveservices.azure.com/",
+    credential=credential,
+    max_wait=None,  # block until CU extraction completes before sending to LLM
+)
+
+client = FoundryChatClient(
+    project_endpoint="https://your-project.services.ai.azure.com",
+    model="gpt-4.1",
+    credential=credential,
+)
+
+async def main():
+    async with cu:
+        agent = Agent(
+            client=client,
+            name="DocumentQA",
+            instructions="You are a helpful document analyst.",
+            context_providers=[cu],
+        )
+        session = AgentSession()
+
+        response = await agent.run(
+            Message(role="user", contents=[
+                Content.from_text("What's on this invoice?"),
+                Content.from_uri(
+                    "https://raw.githubusercontent.com/Azure-Samples/"
+                    "azure-ai-content-understanding-assets/main/document/invoice.pdf",
+                    media_type="application/pdf",
+                    additional_properties={"filename": "invoice.pdf"},
+                ),
+            ]),
+            session=session,
+        )
+        print(response.text)
+
+asyncio.run(main())
+```
+
+### Supported File Types
+
+| Category | Types |
+|----------|-------|
+| Documents | PDF, DOCX, XLSX, PPTX, HTML, TXT, Markdown |
+| Images | JPEG, PNG, TIFF, BMP |
+| Audio | WAV, MP3, M4A, FLAC, OGG |
+| Video | MP4, MOV, AVI, WebM |
+
+For the complete list of supported file types and size limits, see [Azure Content Understanding service limits](https://learn.microsoft.com/azure/ai-services/content-understanding/service-limits#input-file-limits).
+
+### Environment Variables
+
+The provider supports automatic endpoint resolution from environment variables.
+When ``endpoint`` is not passed to the constructor, it is loaded from
+``AZURE_CONTENTUNDERSTANDING_ENDPOINT``:
+
+```python
+# Endpoint auto-loaded from AZURE_CONTENTUNDERSTANDING_ENDPOINT env var
+cu = ContentUnderstandingContextProvider(credential=credential)
+```
+
+Set these in your shell or in a `.env` file:
+
+```bash
+AZURE_CONTENTUNDERSTANDING_ENDPOINT=https://your-cu-resource.cognitiveservices.azure.com/
+AZURE_AI_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com
+AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4.1
+```
+
+You also need to be logged in with `az login` (for `AzureCliCredential`).
+
+### Next steps
+
+- Explore the [samples directory](samples/) for complete code examples
+- Read the [Azure Content Understanding documentation](https://learn.microsoft.com/azure/ai-services/content-understanding/) for detailed service information
+- Learn more about the [Microsoft Agent Framework](https://aka.ms/agent-framework)
diff --git a/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/__init__.py b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/__init__.py
new file mode 100644
index 0000000000..9b05519560
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/__init__.py
@@ -0,0 +1,28 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Azure Content Understanding integration for Microsoft Agent Framework.
+
+Provides a context provider that analyzes file attachments (documents, images,
+audio, video) using Azure Content Understanding and injects structured results
+into the LLM context.
+""" + +import importlib.metadata + +from ._context_provider import ContentUnderstandingContextProvider +from ._file_search import FileSearchBackend +from ._models import AnalysisSection, DocumentStatus, FileSearchConfig + +try: + __version__ = importlib.metadata.version(__name__) +except importlib.metadata.PackageNotFoundError: + __version__ = "0.0.0" + +__all__ = [ + "AnalysisSection", + "ContentUnderstandingContextProvider", + "DocumentStatus", + "FileSearchBackend", + "FileSearchConfig", + "__version__", +] diff --git a/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_constants.py b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_constants.py new file mode 100644 index 0000000000..b432f0895b --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_constants.py @@ -0,0 +1,78 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Constants for Azure Content Understanding context provider. + +Supported media types, MIME aliases, and analyzer mappings used by +the file detection and analysis pipeline. +""" + +from __future__ import annotations + +# MIME types used to match against the resolved media type for routing files to CU analysis. +# The media type may be provided via Content.media_type or inferred (e.g., via sniffing or filename) +# when missing or generic (such as application/octet-stream). Only files whose resolved media type is +# in this set will be processed; others are skipped. 
+# +# Supported input file types: +# https://learn.microsoft.com/azure/ai-services/content-understanding/service-limits#input-file-limits +SUPPORTED_MEDIA_TYPES: frozenset[str] = frozenset({ + # Documents and images + "application/pdf", + "image/jpeg", + "image/png", + "image/tiff", + "image/bmp", + "image/heif", + "image/heic", + "application/vnd.openxmlformats-officedocument.wordprocessingml.document", + "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", + "application/vnd.openxmlformats-officedocument.presentationml.presentation", + # Text + "text/plain", + "text/html", + "text/markdown", + "text/rtf", + "text/xml", + "application/xml", + "message/rfc822", + "application/vnd.ms-outlook", + # Audio + "audio/wav", + "audio/mpeg", + "audio/mp3", + "audio/mp4", + "audio/m4a", + "audio/flac", + "audio/ogg", + "audio/opus", + "audio/webm", + "audio/x-ms-wma", + "audio/aac", + "audio/amr", + "audio/3gpp", + # Video + "video/mp4", + "video/quicktime", + "video/x-msvideo", + "video/webm", + "video/x-flv", + "video/x-ms-wmv", + "video/x-ms-asf", + "video/x-matroska", +}) + +# Mapping from filetype's MIME output to our canonical SUPPORTED_MEDIA_TYPES values. +# filetype uses some x-prefixed variants that differ from our set. +MIME_ALIASES: dict[str, str] = { + "audio/x-wav": "audio/wav", + "audio/x-flac": "audio/flac", + "video/x-m4v": "video/mp4", +} + +# Mapping from media type prefix to the appropriate prebuilt CU analyzer. +# Used when analyzer_id is None (auto-detect mode). 
+MEDIA_TYPE_ANALYZER_MAP: dict[str, str] = { + "audio/": "prebuilt-audioSearch", + "video/": "prebuilt-videoSearch", +} +DEFAULT_ANALYZER: str = "prebuilt-documentSearch" diff --git a/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_context_provider.py b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_context_provider.py new file mode 100644 index 0000000000..35162e6f7e --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_context_provider.py @@ -0,0 +1,793 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Azure Content Understanding context provider using BaseContextProvider. + +This module provides ``ContentUnderstandingContextProvider``, built on the +:class:`BaseContextProvider` hooks pattern. It automatically detects file +attachments, analyzes them via the Azure Content Understanding API, and +injects structured results into the LLM context. 
+""" + +from __future__ import annotations + +import asyncio +import json +import logging +import sys +import time +from datetime import datetime, timezone +from typing import TYPE_CHECKING, Any, ClassVar, TypedDict + +from agent_framework import ( + AGENT_FRAMEWORK_USER_AGENT, + BaseContextProvider, + Content, + FunctionTool, + Message, + SessionContext, +) +from agent_framework._sessions import AgentSession +from agent_framework._settings import load_settings +from azure.ai.contentunderstanding.aio import ContentUnderstandingClient +from azure.ai.contentunderstanding.models import AnalysisInput, AnalysisResult +from azure.core.credentials import AzureKeyCredential +from azure.core.credentials_async import AsyncTokenCredential + +if sys.version_info >= (3, 11): + from typing import Self # pragma: no cover +else: + from typing_extensions import Self # pragma: no cover + +if TYPE_CHECKING: + from agent_framework._agents import SupportsAgentRun + +from ._constants import DEFAULT_ANALYZER, MEDIA_TYPE_ANALYZER_MAP +from ._detection import ( + derive_doc_key, + detect_and_strip_files, + extract_binary, + is_supported_content, + sanitize_doc_key, + sniff_media_type, +) +from ._extraction import extract_field_value, extract_sections, flatten_field, format_result +from ._models import AnalysisSection, DocumentEntry, DocumentStatus, FileSearchConfig + +logger = logging.getLogger("agent_framework.azure_ai_contentunderstanding") + +AzureCredentialTypes = AzureKeyCredential | AsyncTokenCredential + + +class ContentUnderstandingSettings(TypedDict, total=False): + """Settings for ContentUnderstandingContextProvider with auto-loading from environment. + + Settings are resolved in this order: explicit keyword arguments, values from an + explicitly provided .env file, then environment variables with the prefix + ``AZURE_CONTENTUNDERSTANDING_``. + + Keys: + endpoint: Azure AI Foundry endpoint URL. + Can be set via environment variable ``AZURE_CONTENTUNDERSTANDING_ENDPOINT``. 
+ """ + + endpoint: str | None + + +class ContentUnderstandingContextProvider(BaseContextProvider): + """Context provider that analyzes file attachments using Azure Content Understanding. + + Automatically detects supported file attachments in the agent's input, + analyzes them via CU, and injects the structured results (markdown, fields) + into the LLM context. Supports multiple documents per session with background + processing for long-running analyses. Optionally integrates with a vector + store backend for ``file_search``-based RAG retrieval on LLM clients that + support it. + + Args: + endpoint: Azure AI Foundry endpoint URL + (e.g., ``"https://.services.ai.azure.com/"``). + Can also be set via environment variable + ``AZURE_CONTENTUNDERSTANDING_ENDPOINT``. + credential: An ``AzureKeyCredential`` for API key auth or an + ``AsyncTokenCredential`` (e.g., ``DefaultAzureCredential``) for + Microsoft Entra ID auth. + analyzer_id: A prebuilt or custom CU analyzer ID. When ``None`` + (default), a prebuilt analyzer is chosen automatically based on + the file's media type: ``prebuilt-documentSearch`` for documents + and images, ``prebuilt-audioSearch`` for audio, and + ``prebuilt-videoSearch`` for video. + Analyzer reference: https://learn.microsoft.com/azure/ai-services/content-understanding/concepts/analyzer-reference + Prebuilt analyzers: https://learn.microsoft.com/azure/ai-services/content-understanding/concepts/prebuilt-analyzers + max_wait: Max seconds to wait for analysis before deferring to background. + ``None`` waits until complete. + output_sections: Which CU output sections to pass to LLM. + Defaults to ``[AnalysisSection.MARKDOWN, AnalysisSection.FIELDS]``. + file_search: Optional configuration for uploading CU-extracted markdown to + a vector store for token-efficient RAG retrieval. When provided, full + content injection is replaced by ``file_search`` tool registration. 
+            The ``FileSearchConfig`` abstraction is backend-agnostic — use
+            ``FileSearchConfig.from_openai()`` or ``FileSearchConfig.from_foundry()``
+            for supported providers, or supply a custom ``FileSearchBackend``
+            implementation for other vector store services.
+        source_id: Unique identifier for this provider instance, used for message
+            attribution and tool registration. Defaults to ``"azure_ai_contentunderstanding"``.
+        env_file_path: Path to a ``.env`` file for loading settings.
+        env_file_encoding: Encoding of the ``.env`` file.
+
+    Per-file ``additional_properties`` on ``Content`` objects:
+        The provider reads the following keys from
+        ``Content.additional_properties`` (passed via ``Content.from_data()``
+        or ``Content.from_uri()``):
+
+        ``filename`` (str):
+            The document key used for tracking, status, and LLM references.
+            Without a filename, a UUID-based key is generated.
+            Must be unique within a session — uploading a file with a
+            duplicate filename will be rejected and the file will not be
+            analyzed.
+
+        ``analyzer_id`` (str):
+            Per-file analyzer override. Takes priority over the provider-level
+            ``analyzer_id``. Useful for mixing analyzers in the same turn
+            (e.g., ``prebuilt-invoice`` for invoices alongside
+            ``prebuilt-documentSearch`` for general documents).
+
+        ``content_range`` (str):
+            Subset of the input to analyze. For documents, use 1-based page
+            numbers (e.g., ``"1-3"`` for pages 1-3, ``"1,3,5-"`` for pages
+            1, 3, and 5 onward). For audio/video, use milliseconds
+            (e.g., ``"0-60000"`` for the first 60 seconds).
+
+    Example::
+
+        Content.from_data(
+            pdf_bytes, "application/pdf",
+            additional_properties={
+                "filename": "invoice.pdf",
+                "analyzer_id": "prebuilt-invoice",
+                "content_range": "1-3",
+            },
+        )
+    """
+
+    DEFAULT_SOURCE_ID: ClassVar[str] = "azure_ai_contentunderstanding"
+    DEFAULT_MAX_WAIT_SECONDS: ClassVar[float] = 5.0
+
+    def __init__(
+        self,
+        endpoint: str | None = None,
+        credential: AzureCredentialTypes | None = None,
+        *,
+        analyzer_id: str | None = None,
+        max_wait: float | None = DEFAULT_MAX_WAIT_SECONDS,
+        output_sections: list[AnalysisSection] | None = None,
+        file_search: FileSearchConfig | None = None,
+        source_id: str = DEFAULT_SOURCE_ID,
+        env_file_path: str | None = None,
+        env_file_encoding: str | None = None,
+    ) -> None:
+        super().__init__(source_id)
+
+        # Load settings — explicit args take priority over env vars.
+        # Env vars use the prefix AZURE_CONTENTUNDERSTANDING_ (e.g.,
+        # AZURE_CONTENTUNDERSTANDING_ENDPOINT).
+        settings = load_settings(
+            ContentUnderstandingSettings,
+            env_prefix="AZURE_CONTENTUNDERSTANDING_",
+            required_fields=["endpoint"],
+            endpoint=endpoint,
+            env_file_path=env_file_path,
+            env_file_encoding=env_file_encoding,
+        )
+
+        resolved_endpoint: str = settings["endpoint"]  # type: ignore[assignment]  # validated by load_settings
+
+        if credential is None:
+            raise ValueError(
+                "Azure credential is required. Provide a 'credential' parameter "
+                "(e.g., AzureKeyCredential or AzureCliCredential)."
+            )
+
+        self._endpoint = resolved_endpoint
+        self._credential = credential
+        self.analyzer_id = analyzer_id
+        self.max_wait = max_wait
+        self.output_sections = output_sections or [AnalysisSection.MARKDOWN, AnalysisSection.FIELDS]
+        self.file_search = file_search
+        self._client = ContentUnderstandingClient(
+            self._endpoint, self._credential, user_agent=AGENT_FRAMEWORK_USER_AGENT
+        )
+        # Global copies of background tasks and uploaded file IDs — used only
+        # by close() for best-effort cleanup. The authoritative per-session
+        # copies live in state["_pending_tasks"] / state["_uploaded_file_ids"]
+        # (populated in before_run). These global lists may contain entries
+        # from multiple sessions; that is intentional for cleanup.
+        self._all_pending_tasks: list[asyncio.Task[AnalysisResult]] = []
+        self._all_uploaded_file_ids: list[str] = []
+
+    async def __aenter__(self) -> Self:
+        """Async context manager entry."""
+        return self
+
+    async def __aexit__(
+        self,
+        exc_type: type[BaseException] | None,
+        exc_val: BaseException | None,
+        exc_tb: Any,
+    ) -> None:
+        """Async context manager exit — cleanup clients."""
+        await self.close()
+
+    async def close(self) -> None:
+        """Close the underlying CU client and cancel pending tasks.
+
+        Uses global tracking lists for best-effort cleanup across all
+        sessions that used this provider instance.
+        """
+        tasks_to_cancel: list[asyncio.Task[AnalysisResult]] = []
+        for task in self._all_pending_tasks:
+            if not task.done():
+                task.cancel()
+                tasks_to_cancel.append(task)
+        self._all_pending_tasks.clear()
+        # Await cancelled tasks so they don't outlive the client
+        if tasks_to_cancel:
+            await asyncio.gather(*tasks_to_cancel, return_exceptions=True)
+        # Clean up uploaded files; the vector store itself is caller-managed.
+        if self.file_search and self._all_uploaded_file_ids:
+            await self._cleanup_uploaded_files()
+        await self._client.close()
+
+    async def before_run(
+        self,
+        *,
+        agent: SupportsAgentRun,
+        session: AgentSession,
+        context: SessionContext,
+        state: dict[str, Any],
+    ) -> None:
+        """Analyze file attachments and inject results into the LLM context.
+
+        This method is called automatically by the framework before each LLM invocation.
+        """
+        documents: dict[str, DocumentEntry] = state.setdefault("documents", {})
+
+        # Per-session mutable state — isolated per session to prevent cross-session leakage.
+        pending_tasks: dict[str, asyncio.Task[AnalysisResult]] = state.setdefault("_pending_tasks", {})
+        pending_uploads: list[tuple[str, DocumentEntry]] = state.setdefault("_pending_uploads", [])
+
+        # 1. Resolve pending background tasks
+        self._resolve_pending_tasks(pending_tasks, pending_uploads, documents, context)
+
+        # 1b. Upload any documents that completed in the background (file_search mode)
+        if pending_uploads:
+            # Use a bounded timeout so before_run() stays responsive and does not block
+            # indefinitely on slow vector store indexing.
+            upload_timeout = getattr(self, "max_wait", None)
+            remaining_uploads: list[tuple[str, DocumentEntry]] = []
+            for upload_key, upload_entry in pending_uploads:
+                try:
+                    if upload_timeout is not None:
+                        await asyncio.wait_for(
+                            self._upload_to_vector_store(upload_key, upload_entry, state=state),
+                            timeout=upload_timeout,
+                        )
+                    else:
+                        await self._upload_to_vector_store(upload_key, upload_entry, state=state)
+                except asyncio.TimeoutError:
+                    # Leave timed-out uploads pending so they can be retried on a later turn.
+                    logger.warning(
+                        "Timed out while uploading document '%s' to vector store; will retry later.",
+                        upload_key,
+                    )
+                    remaining_uploads.append((upload_key, upload_entry))
+                except Exception:
+                    # Log unexpected failures and drop the upload entry; this matches prior
+                    # behavior where all pending uploads were cleared regardless of outcome.
+                    logger.exception(
+                        "Error while uploading document '%s' to vector store; dropping from pending list.",
+                        upload_key,
+                    )
+                    context.extend_instructions(
+                        self.source_id,
+                        f"Document '{upload_key}' was analyzed but failed to upload "
+                        "to the vector store. The document content is not available for search.",
+                    )
+            state["_pending_uploads"] = remaining_uploads
+            pending_uploads = remaining_uploads
+
+        # 2. Detect CU-supported file attachments, strip them from input, and return for analysis
+        new_files = self._detect_and_strip_files(context)
+
+        # 3. Analyze new files using CU (track elapsed time for combined timeout)
+        file_start_times: dict[str, float] = {}
+        accepted_keys: set[str] = set()  # doc_keys successfully accepted for analysis this turn
+        for doc_key, content_item, binary_data in new_files:
+            # Reject duplicate filenames — re-analyzing would orphan vector store entries
+            if doc_key in documents:
+                logger.warning("Duplicate document key '%s' — skipping (already exists in session).", doc_key)
+                context.extend_instructions(
+                    self.source_id,
+                    f"The user tried to upload '{doc_key}', but a file with that name "
+                    "was already uploaded earlier in this session. The new upload was rejected "
+                    "and was not analyzed. Tell the user that a file with the same name "
+                    "already exists and they need to rename the file before uploading again.",
+                )
+                continue
+            file_start_times[doc_key] = time.monotonic()
+            doc_entry = await self._analyze_file(doc_key, content_item, binary_data, context, pending_tasks)
+            if doc_entry:
+                documents[doc_key] = doc_entry
+                accepted_keys.add(doc_key)
+
+        # 4. Inject content for ready documents and register tools
+        if documents:
+            self._register_tools(documents, context)
+
+        # 5. On upload turns, inject content for docs accepted this turn
+        for doc_key in accepted_keys:
+            entry = documents.get(doc_key)
+            if entry and entry["status"] == DocumentStatus.READY and entry["result"]:
+                # Upload to vector store if file_search is configured
+                if self.file_search:
+                    # Combined timeout: subtract CU analysis time from max_wait
+                    remaining: float | None = None
+                    if self.max_wait is not None:
+                        elapsed = time.monotonic() - file_start_times.get(doc_key, time.monotonic())
+                        remaining = max(0.0, self.max_wait - elapsed)
+                    uploaded = await self._upload_to_vector_store(doc_key, entry, timeout=remaining, state=state)
+                    if uploaded:
+                        context.extend_instructions(
+                            self.source_id,
+                            f"The user just uploaded '{entry['filename']}'. It has been analyzed "
+                            "using Azure Content Understanding and indexed in a vector store. "
+                            f"When using file_search, include '{entry['filename']}' in your query "
+                            "to retrieve content from this specific document.",
+                        )
+                    elif entry.get("error"):
+                        # Upload failed (not timeout — actual error)
+                        context.extend_instructions(
+                            self.source_id,
+                            f"Document '{entry['filename']}' was analyzed but failed to upload "
+                            "to the vector store. The document content is not available for search.",
+                        )
+                    else:
+                        # Upload deferred to background (timeout)
+                        context.extend_instructions(
+                            self.source_id,
+                            f"Document '{entry['filename']}' has been analyzed and is being indexed. "
+                            "Ask about it again in a moment.",
+                        )
+                else:
+                    # Without file_search, inject full content into context
+                    context.extend_messages(
+                        self,
+                        [
+                            Message(role="user", text=self._format_result(entry["filename"], entry["result"])),
+                        ],
+                    )
+                    context.extend_instructions(
+                        self.source_id,
+                        f"The user just uploaded '{entry['filename']}'. It has been analyzed "
+                        "using Azure Content Understanding. "
+                        "The document content (markdown) and extracted fields (JSON) are provided above. "
+                        "If the user's question is ambiguous, prioritize this most recently uploaded document. "
+                        "Use specific field values and cite page numbers when answering.",
+                    )
+
+        # 6. Register file_search tool (for LLM clients that support it)
+        if self.file_search:
+            context.extend_tools(
+                self.source_id,
+                [self.file_search.file_search_tool],
+            )
+            context.extend_instructions(
+                self.source_id,
+                "Tool usage guidelines:\n"
+                "- Use file_search ONLY when answering questions about document content.\n"
+                "- Use list_documents() for status queries (e.g. 'list docs', 'what's uploaded?').\n"
+                "- Do NOT call file_search for status queries — it wastes tokens.",
+            )
+
+    # ------------------------------------------------------------------
+    # File Detection (delegates to _detection module)
+    # ------------------------------------------------------------------
+
+    @staticmethod
+    def _detect_and_strip_files(context: SessionContext) -> list[tuple[str, Any, bytes | None]]:
+        return detect_and_strip_files(context)
+
+    @staticmethod
+    def _sniff_media_type(binary_data: bytes | None, content: Any) -> str | None:
+        return sniff_media_type(binary_data, content)
+
+    @staticmethod
+    def _is_supported_content(content: Any) -> bool:
+        return is_supported_content(content)
+
+    @staticmethod
+    def _sanitize_doc_key(raw: str) -> str:
+        return sanitize_doc_key(raw)
+
+    @staticmethod
+    def _derive_doc_key(content: Any) -> str:
+        return derive_doc_key(content)
+
+    @staticmethod
+    def _extract_binary(content: Any) -> bytes | None:
+        return extract_binary(content)
+
+    # ------------------------------------------------------------------
+    # Analyzer Resolution
+    # ------------------------------------------------------------------
+
+    def _resolve_analyzer_id(self, media_type: str) -> str:
+        """Return the analyzer ID to use for the given media type.
+
+        When ``self.analyzer_id`` is set, it is always returned (explicit
+        override). Otherwise the media type prefix is matched against the
+        known mapping, falling back to ``prebuilt-documentSearch``.
+ """
+ if self.analyzer_id is not None:
+ return self.analyzer_id
+ for prefix, analyzer in MEDIA_TYPE_ANALYZER_MAP.items():
+ if media_type.startswith(prefix):
+ return analyzer
+ return DEFAULT_ANALYZER
+
+ # ------------------------------------------------------------------
+ # Analysis
+ # ------------------------------------------------------------------
+
+ async def _analyze_file(
+ self,
+ doc_key: str,
+ content: Content,
+ binary_data: bytes | None,
+ context: SessionContext,
+ pending_tasks: dict[str, asyncio.Task[AnalysisResult]] | None = None,
+ ) -> DocumentEntry | None:
+ """Analyze a single file via CU with timeout handling.
+
+ The analyzer is resolved in priority order:
+ 1. Per-file override via ``content.additional_properties["analyzer_id"]``
+ 2. Provider-level default via ``self.analyzer_id``
+ 3. Auto-detect by media type (document/audio/video)
+
+ Returns:
+ A ``DocumentEntry`` (ready, analyzing, or failed), or ``None`` if
+ file data could not be extracted.
+ """
+ media_type = content.media_type or "application/octet-stream"
+ filename = doc_key
+
+ # Per-file analyzer override from additional_properties
+ props = content.additional_properties or {}
+ per_file_analyzer = props.get("analyzer_id")
+ content_range = props.get("content_range")
+ resolved_analyzer = per_file_analyzer or self._resolve_analyzer_id(media_type)
+ t0 = time.monotonic()
+
+ try:
+ # Start CU analysis
+ if content.type == "uri" and content.uri and not content.uri.startswith("data:"):
+ poller = await self._client.begin_analyze(
+ resolved_analyzer,
+ inputs=[AnalysisInput(url=content.uri, content_range=content_range)],
+ )
+ elif binary_data:
+ poller = await self._client.begin_analyze_binary(
+ resolved_analyzer,
+ binary_input=binary_data,
+ content_type=media_type,
+ )
+ else:
+ context.extend_instructions(
+ self.source_id,
+ f"Could not extract file data from '{filename}'.",
+ )
+ return None
+
+ # Wait with timeout; defer to background polling on timeout.
+ try:
+ result = await asyncio.wait_for(poller.result(), timeout=self.max_wait)
+ except asyncio.TimeoutError:
+ task = asyncio.create_task(self._background_poll(poller))
+ if pending_tasks is not None:
+ pending_tasks[doc_key] = task
+ self._all_pending_tasks.append(task)
+ context.extend_instructions(
+ self.source_id,
+ f"Document '{filename}' is being analyzed. Ask about it again in a moment.",
+ )
+ return DocumentEntry(
+ status=DocumentStatus.ANALYZING,
+ filename=filename,
+ media_type=media_type,
+ analyzer_id=resolved_analyzer,
+ analyzed_at=None,
+ analysis_duration_s=None,
+ upload_duration_s=None,
+ result=None,
+ error=None,
+ )
+
+ # Analysis completed within timeout
+ analysis_duration = round(time.monotonic() - t0, 2)
+ extracted = self._extract_sections(result)
+ logger.info("Analyzed '%s' with analyzer '%s' in %.1fs.", filename, resolved_analyzer, analysis_duration)
+ return DocumentEntry(
+ status=DocumentStatus.READY,
+ filename=filename,
+ media_type=media_type,
+ analyzer_id=resolved_analyzer,
+ analyzed_at=datetime.now(tz=timezone.utc).isoformat(),
+ analysis_duration_s=analysis_duration,
+ upload_duration_s=None,
+ result=extracted,
+ error=None,
+ )
+
+ except asyncio.TimeoutError:
+ raise
+ except Exception as e:
+ logger.warning("CU analysis error for '%s': %s", filename, e)
+ context.extend_instructions(
+ self.source_id,
+ f"Could not analyze '{filename}': {e}",
+ )
+ return DocumentEntry(
+ status=DocumentStatus.FAILED,
+ filename=filename,
+ media_type=media_type,
+ analyzer_id=resolved_analyzer,
+ analyzed_at=datetime.now(tz=timezone.utc).isoformat(),
+ analysis_duration_s=round(time.monotonic() - t0, 2),
+ upload_duration_s=None,
+ result=None,
+ error=str(e),
+ )
+
+ async def _background_poll(self, poller: Any) -> AnalysisResult:
+ """Poll a CU operation in the background until completion."""
+ return await poller.result() # type: ignore[no-any-return]
+
+ # ------------------------------------------------------------------
+ # 
Pending Task Resolution + # ------------------------------------------------------------------ + + def _resolve_pending_tasks( + self, + pending_tasks: dict[str, asyncio.Task[AnalysisResult]], + pending_uploads: list[tuple[str, DocumentEntry]], + documents: dict[str, DocumentEntry], + context: SessionContext, + ) -> None: + """Check for completed background CU analysis tasks and update document state. + + When a file's CU analysis exceeds ``max_wait``, it is deferred to a background + ``asyncio.Task``. This method checks all pending tasks on the next ``before_run()`` + call: completed tasks have their results extracted and status set to ``READY``; + failed tasks are marked ``FAILED`` with an error message. + + In file_search mode, completed documents are queued in ``_pending_uploads`` + for vector store upload (handled in step 1b of ``before_run``). + """ + completed_keys: list[str] = [] + + for doc_key, task in pending_tasks.items(): + if not task.done(): + continue + + completed_keys.append(doc_key) + entry = documents.get(doc_key) + if not entry: + continue + + try: + result = task.result() + extracted = self._extract_sections(result) + entry["status"] = DocumentStatus.READY + entry["analyzed_at"] = datetime.now(tz=timezone.utc).isoformat() + entry["result"] = extracted + entry["error"] = None + # analysis_duration_s stays None for background tasks (indeterminate) + logger.info("Background analysis of '%s' completed.", entry["filename"]) + + # Inject newly ready content + if self.file_search: + # Upload to vector store — do NOT inject markdown into messages + # (this is a sync context; schedule the upload as a task) + pending_uploads.append((doc_key, entry)) + else: + context.extend_messages( + self, + [ + Message(role="user", text=self._format_result(entry["filename"], extracted)), + ], + ) + context.extend_instructions( + self.source_id, + f"Document '{entry['filename']}' analysis is now complete." 
+ + ( + " The document is being indexed in the vector store and will become" + " searchable via file_search shortly." + if self.file_search + else " The content is provided above." + ), + ) + + except Exception as e: + logger.warning("Background analysis of '%s' failed: %s", entry.get("filename", doc_key), e) + entry["status"] = DocumentStatus.FAILED + entry["analyzed_at"] = datetime.now(tz=timezone.utc).isoformat() + entry["error"] = str(e) + context.extend_instructions( + self.source_id, + f"Document '{entry['filename']}' analysis failed: {e}", + ) + + for key in completed_keys: + del pending_tasks[key] + + # ------------------------------------------------------------------ + # Output Extraction & Formatting (delegates to _extraction module) + # ------------------------------------------------------------------ + + def _extract_sections(self, result: AnalysisResult) -> dict[str, object]: + return extract_sections(result, self.output_sections) + + @staticmethod + def _extract_field_value(field: Any) -> object: + return extract_field_value(field) + + @staticmethod + def _flatten_field(field: Any) -> object: + return flatten_field(field) + + @staticmethod + def _format_result(filename: str, result: dict[str, object]) -> str: + return format_result(filename, result) + + # ------------------------------------------------------------------ + # Tool Registration + # ------------------------------------------------------------------ + + def _register_tools( + self, + documents: dict[str, DocumentEntry], + context: SessionContext, + ) -> None: + """Register document tools on the context. + + Only ``list_documents`` is registered — the full document content is + already injected into conversation history on the upload turn, so a + separate retrieval tool is not needed. 
+ """ + context.extend_tools( + self.source_id, + [self._make_list_documents_tool(documents)], + ) + + @staticmethod + def _make_list_documents_tool(documents: dict[str, DocumentEntry]) -> FunctionTool: + """Create a tool that lists all tracked documents with their status.""" + docs_ref = documents + + def list_documents() -> str: + """List all documents that have been uploaded and their analysis status.""" + entries: list[dict[str, object]] = [] + for name, entry in docs_ref.items(): + entries.append({ + "name": name, + "status": entry["status"], + "media_type": entry["media_type"], + "analyzed_at": entry["analyzed_at"], + "analysis_duration_s": entry["analysis_duration_s"], + "upload_duration_s": entry["upload_duration_s"], + }) + return json.dumps(entries, indent=2, default=str) + + return FunctionTool( + name="list_documents", + description=( + "List all documents that have been uploaded in this session " + "with their analysis status (analyzing, uploading, ready, or failed)." + ), + func=list_documents, + ) + + # ------------------------------------------------------------------ + # file_search Vector Store Integration + # ------------------------------------------------------------------ + + async def _upload_to_vector_store( + self, + doc_key: str, + entry: DocumentEntry, + *, + timeout: float | None = None, + state: dict[str, Any] | None = None, + ) -> bool: + """Upload CU-extracted markdown to the caller's vector store. + + Delegates to the configured ``FileSearchBackend`` (OpenAI, Foundry, + or a custom implementation). The upload includes file upload **and** + vector store indexing (embedding + ingestion) — ``create_and_poll`` + waits for the index to be fully ready before returning. + + Args: + doc_key: Document identifier. + entry: The document entry with extracted results. + timeout: Max seconds to wait for upload + indexing. ``None`` waits + indefinitely. 
On timeout the upload is deferred to the
+ per-session ``_pending_uploads`` queue for the next
+ ``before_run()`` call.
+ state: Per-session state dict for tracking uploaded file IDs and
+ pending uploads.
+
+ Returns:
+ True if the upload succeeded, False otherwise.
+ """
+ if not self.file_search:
+ return False
+
+ result = entry.get("result")
+ if not result:
+ return False
+
+ # Upload the full formatted content (markdown + fields + segments),
+ # not just raw markdown — consistent with what non-file_search mode injects.
+ formatted = self._format_result(entry["filename"], result)
+ if not formatted:
+ return False
+
+ entry["status"] = DocumentStatus.UPLOADING
+ payload = formatted.encode("utf-8")
+ t0 = time.monotonic()
+
+ try:
+ upload_coro = self.file_search.backend.upload_file(
+ self.file_search.vector_store_id, f"{doc_key}.md", payload
+ )
+ file_id = await asyncio.wait_for(upload_coro, timeout=timeout)
+ upload_duration = round(time.monotonic() - t0, 2)
+ # Track in per-session state and global list (for close() cleanup)
+ if state is not None:
+ state.setdefault("_uploaded_file_ids", []).append(file_id)
+ self._all_uploaded_file_ids.append(file_id)
+ entry["status"] = DocumentStatus.READY
+ entry["upload_duration_s"] = upload_duration
+ logger.info("Uploaded '%s' to vector store in %.1fs (%d bytes).", doc_key, upload_duration, len(payload))
+ return True
+
+ except asyncio.TimeoutError:
+ logger.info("Vector store upload for '%s' timed out; deferring to background.", doc_key)
+ entry["status"] = DocumentStatus.UPLOADING
+ if state is not None:
+ state.setdefault("_pending_uploads", []).append((doc_key, entry))
+ return False
+
+ except Exception as e:
+ logger.warning("Failed to upload '%s' to vector store: %s", doc_key, e)
+ entry["status"] = DocumentStatus.FAILED
+ entry["upload_duration_s"] = round(time.monotonic() - t0, 2)
+ entry["error"] = f"Vector store upload failed: {e}"
+ return False
+
+ async def _cleanup_uploaded_files(self) -> None:
+ """Delete files uploaded by this provider via the configured backend.
+
+ The vector store itself is caller-managed and is not deleted here.
+ """
+ if not self.file_search:
+ return
+
+ backend = self.file_search.backend
+
+ # Best-effort: delete files one at a time so a single failure does
+ # not abort cleanup of the remaining files.
+ for file_id in self._all_uploaded_file_ids:
+ try:
+ await backend.delete_file(file_id)
+ except Exception as e:
+ logger.warning("Failed to clean up uploaded file '%s': %s", file_id, e)
+ self._all_uploaded_file_ids.clear()
diff --git a/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_detection.py b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_detection.py
new file mode 100644
index 0000000000..a456ab1208
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_detection.py
@@ -0,0 +1,175 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""File detection utilities for Azure Content Understanding context provider.
+
+Functions for scanning input messages, sniffing MIME types, deriving
+document keys, and extracting binary data from content items.
+"""
+
+from __future__ import annotations
+
+import base64
+import logging
+import mimetypes
+import re
+import uuid
+
+import filetype
+from agent_framework import Content, SessionContext
+
+from ._constants import MIME_ALIASES, SUPPORTED_MEDIA_TYPES
+
+logger = logging.getLogger("agent_framework.azure_ai_contentunderstanding")
+
+
+def detect_and_strip_files(
+ context: SessionContext,
+) -> list[tuple[str, Content, bytes | None]]:
+ """Scan input messages for supported file content and prepare for CU analysis.
+
+ Scans for type ``data`` or ``uri`` content supported by Azure Content
+ Understanding, strips it from the messages to prevent raw binary data
+ from being sent to the LLM, and returns metadata for CU analysis.
+
+ Detected files are tracked via ``doc_key`` (derived from filename, URL,
+ or UUID) and their analysis status is managed in session state.
+
+ When the upstream MIME type is unreliable (``application/octet-stream``
+ or missing), binary content sniffing via ``filetype`` is used to
+ determine the real media type, with ``mimetypes.guess_type`` as a
+ filename-based fallback.
+
+ Returns:
+ List of (doc_key, content_item, binary_data) tuples for files to analyze.
+ """
+ results: list[tuple[str, Content, bytes | None]] = []
+ strip_ids: set[int] = set()
+
+ for msg in context.input_messages:
+ for c in msg.contents:
+ if c.type not in ("data", "uri"):
+ continue
+
+ media_type = c.media_type
+ # Fast path: already a known supported type
+ if media_type and media_type in SUPPORTED_MEDIA_TYPES:
+ binary_data = extract_binary(c)
+ results.append((derive_doc_key(c), c, binary_data))
+ strip_ids.add(id(c))
+ continue
+
+ # Slow path: unreliable MIME — sniff binary content
+ if (not media_type) or (media_type == "application/octet-stream"):
+ binary_data = extract_binary(c)
+ resolved = sniff_media_type(binary_data, c)
+ if resolved and (resolved in SUPPORTED_MEDIA_TYPES):
+ c.media_type = resolved
+ results.append((derive_doc_key(c), c, binary_data))
+ strip_ids.add(id(c))
+
+ # Strip detected files from input so raw binary isn't sent to LLM
+ msg.contents = [c for c in msg.contents if id(c) not in strip_ids]
+
+ return results
+
+
+def sniff_media_type(binary_data: bytes | None, content: Content) -> str | None:
+ """Sniff the actual MIME type from binary data, with filename fallback.
+
+ Uses ``filetype`` (magic-bytes) first, then ``mimetypes.guess_type``
+ on the filename. Normalizes filetype's variant MIME values (e.g.
+ ``audio/x-wav`` -> ``audio/wav``) via ``MIME_ALIASES``.
+ """
+ # 1. Binary sniffing via filetype (only the first 261 bytes are needed)
+ if binary_data:
+ kind = filetype.guess(binary_data[:261]) # type: ignore[reportUnknownMemberType]
+ if kind:
+ mime: str = kind.mime # type: ignore[reportUnknownMemberType]
+ return MIME_ALIASES.get(mime, mime)
+
+ # 2. 
Filename extension fallback — try additional_properties first, + # then extract basename from external URL path + filename: str | None = None + if content.additional_properties: + filename = content.additional_properties.get("filename") + if not filename and content.uri and not content.uri.startswith("data:"): + # Extract basename from URL path (e.g. "https://example.com/report.pdf?v=1" -> "report.pdf") + filename = content.uri.split("?")[0].split("#")[0].rsplit("/", 1)[-1] + if filename: + guessed, _ = mimetypes.guess_type(filename) # uses file extension to guess MIME type + if guessed: + return MIME_ALIASES.get(guessed, guessed) + + return None + + +def is_supported_content(content: Content) -> bool: + """Check if a content item is a supported file type for CU analysis.""" + if content.type not in ("data", "uri"): + return False + media_type = content.media_type + if not media_type: + return False + return media_type in SUPPORTED_MEDIA_TYPES + + +def sanitize_doc_key(raw: str) -> str: + """Sanitize a document key to prevent prompt injection. + + Removes control characters (newlines, tabs, etc.), collapses + whitespace, strips surrounding whitespace, and caps length at + 255 characters. + """ + # Remove control characters (C0/C1 controls, including \n, \r, \t) + cleaned = re.sub(r"[\x00-\x1f\x7f-\x9f]", "", raw) + # Collapse whitespace + cleaned = " ".join(cleaned.split()) + # Cap length + return cleaned[:255] if cleaned else f"doc_{uuid.uuid4().hex[:8]}" + + +def derive_doc_key(content: Content) -> str: + """Derive a unique document key from content metadata. + + The key is used to track documents in session state. Duplicate keys + within a session are rejected (not re-analyzed) to prevent orphaned + vector store entries. + + The returned key is sanitized to prevent prompt injection via + crafted filenames (control characters removed, length capped). + + Priority: filename > URL basename > generated UUID. + """ + # 1. 
Filename from additional_properties + if content.additional_properties: + filename = content.additional_properties.get("filename") + if filename and isinstance(filename, str): + return sanitize_doc_key(filename) + + # 2. URL path basename for external URIs (e.g. "https://example.com/report.pdf" -> "report.pdf") + if content.type == "uri" and content.uri and not content.uri.startswith("data:"): + path = content.uri.split("?")[0].split("#")[0] # strip query params and fragments + # rstrip("/") handles trailing slashes (e.g. ".../files/" -> ".../files") + # rsplit("/", 1)[-1] splits from the right once to get the last path segment + basename = path.rstrip("/").rsplit("/", 1)[-1] + if basename: + return sanitize_doc_key(basename) + + # 3. Fallback: generate a unique ID for anonymous uploads (no filename, no URL) + return f"doc_{uuid.uuid4().hex[:8]}" + + +def extract_binary(content: Content) -> bytes | None: + """Extract binary data from a data URI content item. + + Only handles ``data:`` URIs (base64-encoded). Returns ``None`` for + external URLs -- those are passed directly to CU via ``begin_analyze``. + """ + if content.uri and content.uri.startswith("data:"): + try: + _, data_part = content.uri.split(",", 1) + return base64.b64decode(data_part) + except Exception: + logger.warning("Failed to decode base64 data URI") + return None + return None diff --git a/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_extraction.py b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_extraction.py new file mode 100644 index 0000000000..0679e53db1 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_extraction.py @@ -0,0 +1,303 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Output extraction and formatting for Azure Content Understanding results. 
+ +Converts CU ``AnalysisResult`` objects into plain Python dicts suitable +for LLM consumption, and formats them as human-readable text. +""" + +from __future__ import annotations + +import json +from typing import Any, cast + +from azure.ai.contentunderstanding.models import AnalysisResult + +from ._models import AnalysisSection + + +def extract_sections( + result: AnalysisResult, + output_sections: list[AnalysisSection], +) -> dict[str, object]: + """Extract configured sections from a CU analysis result. + + For single-segment results (documents, images, short audio), returns a flat + dict with ``markdown`` and ``fields`` at the top level. + + For multi-segment results (e.g. video split into scenes), fields are kept + with their respective segments in a ``segments`` list so the LLM can see + which fields belong to which part of the content: + - ``segments``: list of per-segment dicts with ``markdown``, ``fields``, + ``start_time_s``, and ``end_time_s`` + - ``markdown``: still concatenated at top level for file_search uploads + - ``duration_seconds``: computed from the global time span + - ``kind`` / ``resolution``: taken from the first segment + """ + extracted: dict[str, object] = {} + contents = result.contents + if not contents: + return extracted + + # --- Warnings from the CU service (ODataV4Format with code/message/target) --- + if result.warnings: + warnings_out: list[dict[str, str]] = [] + for w in result.warnings: + entry: dict[str, str] = {} + code = getattr(w, "code", None) + if code: + entry["code"] = code + msg = getattr(w, "message", None) + entry["message"] = msg if msg else str(w) + target = getattr(w, "target", None) + if target: + entry["target"] = target + warnings_out.append(entry) + extracted["warnings"] = warnings_out + + # --- Media metadata (from first segment) --- + first = contents[0] + kind = getattr(first, "kind", None) + if kind: + extracted["kind"] = kind + width = getattr(first, "width", None) + height = getattr(first, "height", 
None) + if width and height: + extracted["resolution"] = f"{width}x{height}" + + # Compute total duration from the global time span of all segments. + global_start: int | None = None + global_end: int | None = None + for content in contents: + s = getattr(content, "start_time_ms", None) + if s is None: + s = getattr(content, "startTimeMs", None) + e = getattr(content, "end_time_ms", None) + if e is None: + e = getattr(content, "endTimeMs", None) + if s is not None: + global_start = s if global_start is None else min(global_start, s) + if e is not None: + global_end = e if global_end is None else max(global_end, e) + if global_start is not None and global_end is not None: + extracted["duration_seconds"] = round((global_end - global_start) / 1000, 1) + + is_multi_segment = len(contents) > 1 + + # --- Single-segment: flat output (documents, images, short audio) --- + if not is_multi_segment: + if AnalysisSection.MARKDOWN in output_sections and contents[0].markdown: + extracted["markdown"] = contents[0].markdown + if AnalysisSection.FIELDS in output_sections and contents[0].fields: + fields: dict[str, object] = {} + for name, field in contents[0].fields.items(): + entry_dict: dict[str, object] = { + "type": getattr(field, "type", None), + "value": extract_field_value(field), + } + confidence = getattr(field, "confidence", None) + if confidence is not None: + entry_dict["confidence"] = confidence + fields[name] = entry_dict + if fields: + extracted["fields"] = fields + # Content-level category (e.g. from classifier analyzers) + category = getattr(contents[0], "category", None) + if category: + extracted["category"] = category + return extracted + + # --- Multi-segment: per-segment output (video scenes, long audio) --- + # Each segment keeps its own markdown + fields together so the LLM can + # see which fields (e.g. Summary) belong to which part of the content. 
+ segments_out: list[dict[str, object]] = [] + md_parts: list[str] = [] # also collect for top-level concatenated markdown + + for content in contents: + seg: dict[str, object] = {} + + # Time range for this segment + s = getattr(content, "start_time_ms", None) + if s is None: + s = getattr(content, "startTimeMs", None) + e = getattr(content, "end_time_ms", None) + if e is None: + e = getattr(content, "endTimeMs", None) + if s is not None: + seg["start_time_s"] = round(s / 1000, 1) + if e is not None: + seg["end_time_s"] = round(e / 1000, 1) + + # Per-segment markdown + if AnalysisSection.MARKDOWN in output_sections and content.markdown: + seg["markdown"] = content.markdown + md_parts.append(content.markdown) + + # Per-segment fields + if AnalysisSection.FIELDS in output_sections and content.fields: + seg_fields: dict[str, object] = {} + for name, field in content.fields.items(): + seg_entry: dict[str, object] = { + "type": getattr(field, "type", None), + "value": extract_field_value(field), + } + confidence = getattr(field, "confidence", None) + if confidence is not None: + seg_entry["confidence"] = confidence + seg_fields[name] = seg_entry + if seg_fields: + seg["fields"] = seg_fields + + # Per-segment category (e.g. from classifier analyzers) + category = getattr(content, "category", None) + if category: + seg["category"] = category + + segments_out.append(seg) + + extracted["segments"] = segments_out + + # Top-level concatenated markdown (used by file_search for vector store upload) + if md_parts: + extracted["markdown"] = "\n\n---\n\n".join(md_parts) + + return extracted + + +def extract_field_value(field: Any) -> object: + """Extract the plain Python value from a CU ``ContentField``. + + Uses the SDK's ``.value`` convenience property, which dynamically + reads the correct ``value_*`` attribute for each field type. 
+ Object and array types are recursively flattened so that the + output contains only plain Python primitives (str, int, float, + date, dict, list) -- no SDK model objects or raw wire format + (``valueNumber``, ``spans``, ``source``, etc.). + """ + field_type = getattr(field, "type", None) + raw = getattr(field, "value", None) + + # Object fields -> recursively resolve nested sub-fields + if field_type == "object" and raw is not None and isinstance(raw, dict): + return { + str(k): flatten_field(v) + for k, v in cast(dict[str, Any], raw).items() + } + + # Array fields -> list of flattened items (each with value + optional confidence) + if field_type == "array" and raw is not None and isinstance(raw, list): + return [ + flatten_field(item) + for item in cast(list[Any], raw) + ] + + # Scalar fields (string, number, date, etc.) -- .value returns native Python type + return raw + + +def flatten_field(field: Any) -> object: + """Flatten a CU ``ContentField`` into a ``{type, value, confidence}`` dict. + + Used for sub-fields inside object and array types to preserve + per-field confidence scores. Confidence is omitted when ``None`` + to reduce token usage. + """ + field_type = getattr(field, "type", None) + value = extract_field_value(field) + confidence = getattr(field, "confidence", None) + + result: dict[str, object] = {"type": field_type, "value": value} + if confidence is not None: + result["confidence"] = confidence + return result + + +def format_result(filename: str, result: dict[str, object]) -> str: + """Format extracted CU result for LLM consumption. + + For multi-segment results (video/audio with ``segments``), each segment's + markdown and fields are grouped together so the LLM can see which fields + belong to which part of the content. 
+ """
+ kind = result.get("kind")
+ is_video = kind == "audioVisual"
+ is_audio = kind == "audio"
+
+ # Header -- media-aware label
+ if is_video:
+ label = "Video analysis"
+ elif is_audio:
+ label = "Audio analysis"
+ else:
+ label = "Document analysis"
+ parts: list[str] = [f'{label} of "{filename}":']
+
+ # Media metadata line (duration, resolution)
+ meta_items: list[str] = []
+ duration = result.get("duration_seconds")
+ if duration is not None:
+ mins, secs = divmod(int(duration), 60) # type: ignore[call-overload]
+ meta_items.append(f"Duration: {mins}:{secs:02d}")
+ resolution = result.get("resolution")
+ if resolution:
+ meta_items.append(f"Resolution: {resolution}")
+ if meta_items:
+ parts.append(" | ".join(meta_items))
+
+ # --- Multi-segment: format each segment with its own content + fields ---
+ raw_segments = result.get("segments")
+ segments: list[dict[str, object]] = (
+ cast(list[dict[str, object]], raw_segments) if isinstance(raw_segments, list) else []
+ )
+ if segments:
+ for i, seg in enumerate(segments):
+ # Segment header with time range
+ start = seg.get("start_time_s")
+ end = seg.get("end_time_s")
+ if start is not None and end is not None:
+ s_min, s_sec = divmod(int(start), 60) # type: ignore[call-overload]
+ e_min, e_sec = divmod(int(end), 60) # type: ignore[call-overload]
+ parts.append(f"\n### Segment {i + 1} ({s_min}:{s_sec:02d} - {e_min}:{e_sec:02d})")
+ else:
+ parts.append(f"\n### Segment {i + 1}")
+
+ # Segment markdown
+ seg_md = seg.get("markdown")
+ if seg_md:
+ parts.append(f"\n```markdown\n{seg_md}\n```")
+
+ # Segment fields
+ seg_fields = seg.get("fields")
+ if isinstance(seg_fields, dict) and seg_fields:
+ fields_json = json.dumps(seg_fields, indent=2, default=str)
+ parts.append(f"\n**Fields:**\n```json\n{fields_json}\n```")
+
+ return "\n".join(parts)
+
+ # --- Single-segment: flat format ---
+ fields_raw = result.get("fields")
+ fields: dict[str, object] = cast(dict[str, object], fields_raw) if isinstance(fields_raw, 
dict) else {} + + # For audio: promote Summary field as prose before markdown + if is_audio and fields: + summary_field = fields.get("Summary") + if isinstance(summary_field, dict): + sf = cast(dict[str, object], summary_field) + if sf.get("value"): + parts.append(f"\n## Summary\n\n{sf['value']}") + + # Markdown content + markdown = result.get("markdown") + if markdown: + parts.append(f"\n## Content\n\n```markdown\n{markdown}\n```") + + # Fields section + if fields: + remaining = dict(fields) + if is_audio: + remaining = {k: v for k, v in remaining.items() if k != "Summary"} + if remaining: + fields_json = json.dumps(remaining, indent=2, default=str) + parts.append(f"\n## Extracted Fields\n\n```json\n{fields_json}\n```") + + return "\n".join(parts) diff --git a/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_file_search.py b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_file_search.py new file mode 100644 index 0000000000..a9526f6ebc --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_file_search.py @@ -0,0 +1,101 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""File search backend abstraction for vector store file operations. + +Provides a unified interface for uploading CU-extracted content to +vector stores across different LLM clients. Two implementations: + +- ``OpenAIFileSearchBackend`` — for ``OpenAIChatClient`` (Responses API) +- ``FoundryFileSearchBackend`` — for ``FoundryChatClient`` (Responses API via Azure) + +Both share the same OpenAI-compatible vector store file API but differ +in the file upload ``purpose`` value. + +Vector store creation, tool construction, and lifecycle management are +the caller's responsibility — the backend only handles file upload/delete. 
+""" + +from __future__ import annotations + +import io +from abc import ABC, abstractmethod +from typing import Any + + +class FileSearchBackend(ABC): + """Abstract interface for vector store file operations. + + Implementations handle the differences between OpenAI and Foundry + file upload APIs (e.g., different ``purpose`` values). + + Vector store creation, deletion, and ``file_search`` tool construction + are **not** part of this interface — those are managed by the caller. + """ + + @abstractmethod + async def upload_file(self, vector_store_id: str, filename: str, content: bytes) -> str: + """Upload a file to a vector store and return the file ID.""" + + @abstractmethod + async def delete_file(self, file_id: str) -> None: + """Delete a previously uploaded file by ID.""" + + +class _OpenAICompatBackend(FileSearchBackend): + """Shared base for OpenAI-compatible file upload backends. + + Both OpenAI and Foundry use the same ``client.files.*`` and + ``client.vector_stores.files.*`` API surface. Subclasses only + override the file upload ``purpose``. + """ + + _FILE_PURPOSE: str # Subclasses must set this + + def __init__(self, client: Any) -> None: + self._client = client + + async def upload_file(self, vector_store_id: str, filename: str, content: bytes) -> str: + uploaded = await self._client.files.create( + file=(filename, io.BytesIO(content)), + purpose=self._FILE_PURPOSE, + ) + # Use create_and_poll to wait for indexing to complete before returning. + # Without this, file_search queries may return no results immediately + # after upload because the vector store index isn't ready yet. + await self._client.vector_stores.files.create_and_poll( + vector_store_id=vector_store_id, + file_id=uploaded.id, + ) + return uploaded.id # type: ignore[no-any-return] + + async def delete_file(self, file_id: str) -> None: + await self._client.files.delete(file_id) + + +class OpenAIFileSearchBackend(_OpenAICompatBackend): + """File search backend for OpenAI Responses API. 
+ + Use with ``OpenAIChatClient`` or ``AzureOpenAIResponsesClient``. + Requires an ``AsyncOpenAI`` or ``AsyncAzureOpenAI`` client. + + Args: + client: An async OpenAI client (``AsyncOpenAI`` or ``AsyncAzureOpenAI``) + that supports ``client.files.*`` and ``client.vector_stores.*`` APIs. + """ + + _FILE_PURPOSE = "user_data" + + +class FoundryFileSearchBackend(_OpenAICompatBackend): + """File search backend for Azure AI Foundry. + + Use with ``FoundryChatClient``. Requires the OpenAI-compatible client + obtained from ``FoundryChatClient.client`` (i.e., + ``project_client.get_openai_client()``). + + Args: + client: The OpenAI-compatible async client from a ``FoundryChatClient`` + (access via ``foundry_client.client``). + """ + + _FILE_PURPOSE = "assistants" diff --git a/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_models.py b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_models.py new file mode 100644 index 0000000000..eb02b60b8c --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_models.py @@ -0,0 +1,117 @@ +# Copyright (c) Microsoft. All rights reserved. 
+ +from __future__ import annotations + +from dataclasses import dataclass +from enum import Enum +from typing import Any, TypedDict + +from ._file_search import FileSearchBackend, FoundryFileSearchBackend, OpenAIFileSearchBackend + + +class DocumentStatus(str, Enum): + """Analysis lifecycle state of a tracked document.""" + + ANALYZING = "analyzing" + """CU analysis is in progress (deferred to background).""" + + UPLOADING = "uploading" + """Analysis complete; vector store upload + indexing is in progress.""" + + READY = "ready" + """Analysis (and upload, if applicable) completed successfully.""" + + FAILED = "failed" + """Analysis or upload failed.""" + + +class AnalysisSection(str, Enum): + """Selects which sections of the CU output to pass to the LLM.""" + + MARKDOWN = "markdown" + """Full document text with tables as HTML, reading order preserved.""" + + FIELDS = "fields" + """Extracted typed fields with confidence scores (when available).""" + + +class DocumentEntry(TypedDict): + """Tracks the analysis state of a single document in session state.""" + + status: DocumentStatus + filename: str + media_type: str + analyzer_id: str + analyzed_at: str | None + analysis_duration_s: float | None + upload_duration_s: float | None + result: dict[str, object] | None + error: str | None + + +@dataclass +class FileSearchConfig: + """Configuration for uploading CU-extracted content to an existing vector store. + + When provided to ``ContentUnderstandingContextProvider``, analyzed document + markdown is automatically uploaded to the specified vector store and the + given ``file_search`` tool is registered on the context. This enables + token-efficient RAG retrieval on follow-up turns for large documents. + + The caller is responsible for creating and managing the vector store and + the ``file_search`` tool. Use :meth:`from_openai` or :meth:`from_foundry` + factory methods for convenience. 
+ + Args: + backend: A ``FileSearchBackend`` that handles file upload/delete + operations for the target vector store. + vector_store_id: The ID of a pre-existing vector store to upload to. + file_search_tool: A ``file_search`` tool object created via the LLM + client's ``get_file_search_tool()`` factory method. This is + registered on the context via ``extend_tools`` so the LLM can + retrieve uploaded content. + """ + + backend: FileSearchBackend + vector_store_id: str + file_search_tool: Any + + @staticmethod + def from_openai( + client: Any, + *, + vector_store_id: str, + file_search_tool: Any, + ) -> FileSearchConfig: + """Create a config for OpenAI Responses API (``OpenAIChatClient``). + + Args: + client: An ``AsyncOpenAI`` or ``AsyncAzureOpenAI`` client. + vector_store_id: The ID of the vector store to upload to. + file_search_tool: Tool from ``OpenAIChatClient.get_file_search_tool()``. + """ + return FileSearchConfig( + backend=OpenAIFileSearchBackend(client), + vector_store_id=vector_store_id, + file_search_tool=file_search_tool, + ) + + @staticmethod + def from_foundry( + client: Any, + *, + vector_store_id: str, + file_search_tool: Any, + ) -> FileSearchConfig: + """Create a config for Azure AI Foundry (``FoundryChatClient``). + + Args: + client: The OpenAI-compatible client from ``FoundryChatClient.client``. + vector_store_id: The ID of the vector store to upload to. + file_search_tool: Tool from ``FoundryChatClient.get_file_search_tool()``. 
+ """ + return FileSearchConfig( + backend=FoundryFileSearchBackend(client), + vector_store_id=vector_store_id, + file_search_tool=file_search_tool, + ) diff --git a/python/packages/azure-ai-contentunderstanding/pyproject.toml b/python/packages/azure-ai-contentunderstanding/pyproject.toml new file mode 100644 index 0000000000..c7a90ec489 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/pyproject.toml @@ -0,0 +1,100 @@ +[project] +name = "agent-framework-azure-ai-contentunderstanding" +description = "Azure Content Understanding integration for Microsoft Agent Framework." +authors = [{ name = "Microsoft", email = "af-support@microsoft.com" }] +readme = "README.md" +requires-python = ">=3.10" +version = "1.0.0b260401" +license-files = ["LICENSE"] +urls.homepage = "https://aka.ms/agent-framework" +urls.source = "https://github.com/microsoft/agent-framework/tree/main/python" +urls.release_notes = "https://github.com/microsoft/agent-framework/releases?q=tag%3Apython-1&expanded=true" +urls.issues = "https://github.com/microsoft/agent-framework/issues" +classifiers = [ + "License :: OSI Approved :: MIT License", + "Development Status :: 4 - Beta", + "Intended Audience :: Developers", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", + "Programming Language :: Python :: 3.14", + "Typing :: Typed", +] +dependencies = [ + "agent-framework-core>=1.0.0rc5", + "azure-ai-contentunderstanding>=1.0.0,<1.1", + "aiohttp>=3.9,<4", + "filetype>=1.2,<2", +] + +[tool.uv] +prerelease = "if-necessary-or-explicit" +environments = [ + "sys_platform == 'darwin'", + "sys_platform == 'linux'", + "sys_platform == 'win32'" +] + +[tool.uv-dynamic-versioning] +fallback-version = "0.0.0" + +[tool.pytest.ini_options] +testpaths = 'tests' +addopts = "-ra -q -r fEX" +asyncio_mode = "auto" 
+asyncio_default_fixture_loop_scope = "function" +timeout = 120 +markers = [ + "integration: marks tests as integration tests that require external services", +] + +[tool.ruff] +extend = "../../pyproject.toml" + +[tool.ruff.lint.per-file-ignores] +"**/tests/**" = ["D", "INP", "TD", "ERA001", "RUF", "S"] +"samples/**" = ["D", "INP", "ERA001", "RUF", "S", "T201", "CPY"] + +[tool.coverage.run] +omit = ["**/__init__.py"] + +[tool.pyright] +extends = "../../pyproject.toml" +include = ["agent_framework_azure_ai_contentunderstanding"] +exclude = ['tests'] + +[tool.mypy] +plugins = ['pydantic.mypy'] +strict = true +python_version = "3.10" +ignore_missing_imports = true +disallow_untyped_defs = true +no_implicit_optional = true +check_untyped_defs = true +warn_return_any = true +show_error_codes = true +warn_unused_ignores = false +disallow_incomplete_defs = true +disallow_untyped_decorators = true + +[tool.bandit] +targets = ["agent_framework_azure_ai_contentunderstanding"] +exclude_dirs = ["tests"] + +[tool.poe] +executor.type = "uv" +include = "../../shared_tasks.toml" + +[tool.poe.tasks.mypy] +help = "Run MyPy for this package." +cmd = "mypy --config-file $POE_ROOT/pyproject.toml agent_framework_azure_ai_contentunderstanding" + +[tool.poe.tasks.test] +help = "Run the default unit test suite for this package." 
+cmd = 'pytest -m "not integration" --cov=agent_framework_azure_ai_contentunderstanding --cov-report=term-missing:skip-covered tests' + +[build-system] +requires = ["flit-core >= 3.11,<4.0"] +build-backend = "flit_core.buildapi" diff --git a/python/packages/azure-ai-contentunderstanding/samples/01-get-started/01_document_qa.py b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/01_document_qa.py new file mode 100644 index 0000000000..15c9b69510 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/01_document_qa.py @@ -0,0 +1,120 @@ +# /// script +# requires-python = ">=3.10" +# dependencies = [ +# "agent-framework-azure-ai-contentunderstanding", +# "agent-framework-foundry", +# "azure-identity", +# ] +# /// +# Run with: uv run packages/azure-ai-contentunderstanding/samples/01-get-started/01_document_qa.py + +# Copyright (c) Microsoft. All rights reserved. + +import asyncio +import os +from pathlib import Path + +from agent_framework import Agent, Content, Message +from agent_framework.foundry import FoundryChatClient +from azure.identity import AzureCliCredential +from dotenv import load_dotenv + +from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider + +load_dotenv() + +""" +Document Q&A — PDF upload with CU-powered extraction + +This sample demonstrates the simplest CU integration: upload a PDF and +ask questions about it. Azure Content Understanding extracts structured +markdown with table preservation — superior to LLM-only vision for +scanned PDFs, handwritten content, and complex layouts. + +Environment variables: + AZURE_AI_PROJECT_ENDPOINT — Azure AI Foundry project endpoint + AZURE_OPENAI_DEPLOYMENT_NAME — Model deployment name (e.g. 
gpt-4.1)
+    AZURE_CONTENTUNDERSTANDING_ENDPOINT — CU endpoint URL
+"""
+
+# Path to the sample PDF shipped with the package's shared sample assets
+SAMPLE_PDF_PATH = Path(__file__).resolve().parents[1] / "shared" / "sample_assets" / "invoice.pdf"
+
+
+async def main() -> None:
+    credential = AzureCliCredential()
+
+    # Set up Azure Content Understanding context provider
+    cu = ContentUnderstandingContextProvider(
+        endpoint=os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"],
+        credential=credential,
+        analyzer_id="prebuilt-documentSearch",  # RAG-optimized document analyzer
+        max_wait=None,  # wait until CU analysis finishes (no background deferral)
+    )
+
+    # Set up the LLM client
+    client = FoundryChatClient(
+        project_endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"],
+        model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
+        credential=credential,
+    )
+
+    # Create agent with CU context provider.
+    # The provider extracts document content via CU and injects it into the
+    # LLM context so the agent can answer questions about the document.
+    async with cu:
+        agent = Agent(
+            client=client,
+            name="DocumentQA",
+            instructions=(
+                "You are a helpful document analyst. Use the analyzed document "
+                "content and extracted fields to answer questions precisely."
+            ),
+            context_providers=[cu],
+        )
+
+        # --- Turn 1: Upload PDF and ask a question ---
+        # The CU provider extracts markdown + fields from the PDF and injects
+        # the full content into context so the agent can answer precisely.
+        print("--- Upload PDF and ask questions ---")
+
+        pdf_bytes = SAMPLE_PDF_PATH.read_bytes()
+
+        response = await agent.run(
+            Message(
+                role="user",
+                contents=[
+                    Content.from_text(
+                        "What is this document about? "
+                        "Who is the vendor, and what is the total amount due?"
+ ), + Content.from_data( + pdf_bytes, + "application/pdf", + # Always provide filename — used as the document key + additional_properties={"filename": SAMPLE_PDF_PATH.name}, + ), + ], + ) + ) + usage = response.usage_details or {} + print(f"Agent: {response}") + print(f" [Input tokens: {usage.get('input_token_count', 'N/A')}]\n") + + +if __name__ == "__main__": + asyncio.run(main()) + +""" +Sample output: + +--- Upload PDF and ask questions --- +Agent: This document is an **invoice** for services and fees billed to + **MICROSOFT CORPORATION** (Invoice **INV-100**), including line items + (e.g., Consulting Services, Document Fee, Printing Fee) and a billing summary. + - **Vendor:** **CONTOSO LTD.** + - **Total amount due:** **$610.00** + [Input tokens: 988] +""" diff --git a/python/packages/azure-ai-contentunderstanding/samples/01-get-started/02_multi_turn_session.py b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/02_multi_turn_session.py new file mode 100644 index 0000000000..3404c7fba0 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/02_multi_turn_session.py @@ -0,0 +1,145 @@ +# /// script +# requires-python = ">=3.10" +# dependencies = [ +# "agent-framework-azure-ai-contentunderstanding", +# "agent-framework-foundry", +# "azure-identity", +# ] +# /// +# Run with: uv run packages/azure-ai-contentunderstanding/samples/01-get-started/02_multi_turn_session.py + +# Copyright (c) Microsoft. All rights reserved. + +import asyncio +import os +from pathlib import Path + +from agent_framework import Agent, AgentSession, Content, Message +from agent_framework.foundry import FoundryChatClient +from azure.identity import AzureCliCredential +from dotenv import load_dotenv + +from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider + +load_dotenv() + +""" +Multi-Turn Session — Cached results across turns + +This sample demonstrates multi-turn document Q&A using an AgentSession. 
+The session persists CU analysis results and conversation history across +turns so the agent can answer follow-up questions about previously +uploaded documents without re-analyzing them. + +Key concepts: + - AgentSession keeps CU state and conversation history across agent.run() calls + - Turn 1: CU analyzes the PDF and injects full content into context + - Turn 2: Unrelated question — agent answers from general knowledge + - Turn 3: Detailed question — agent uses document content from conversation + history (injected in Turn 1) to answer precisely + +Environment variables: + AZURE_AI_PROJECT_ENDPOINT — Azure AI Foundry project endpoint + AZURE_OPENAI_DEPLOYMENT_NAME — Model deployment name (e.g. gpt-4.1) + AZURE_CONTENTUNDERSTANDING_ENDPOINT — CU endpoint URL +""" + +SAMPLE_PDF_PATH = Path(__file__).resolve().parents[1] / "shared" / "sample_assets" / "invoice.pdf" + + +async def main() -> None: + # 1. Set up credentials and CU context provider + credential = AzureCliCredential() + + cu = ContentUnderstandingContextProvider( + endpoint=os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"], + credential=credential, + analyzer_id="prebuilt-documentSearch", + max_wait=None, # wait until CU analysis finishes (no background deferral) + ) + + # 2. Set up the LLM client + client = FoundryChatClient( + project_endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"], + model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"], + credential=credential, + ) + + # 3. Create agent and persistent session + async with cu: + agent = Agent( + client=client, + name="DocumentQA", + instructions=( + "You are a helpful document analyst. Use the analyzed document " + "content and extracted fields to answer questions precisely." + ), + context_providers=[cu], + ) + + # Create a persistent session — this keeps CU state across turns + session = AgentSession() + + # 4. Turn 1: Upload PDF + # CU analyzes the PDF and injects full content into context. 
+ print("--- Turn 1: Upload PDF ---") + pdf_bytes = SAMPLE_PDF_PATH.read_bytes() + response = await agent.run( + Message( + role="user", + contents=[ + Content.from_text("What is this document about?"), + Content.from_data( + pdf_bytes, + "application/pdf", + additional_properties={"filename": SAMPLE_PDF_PATH.name}, + ), + ], + ), + session=session, # <-- persist state across turns + ) + usage = response.usage_details or {} + print(f"Agent: {response}") + print(f" [Input tokens: {usage.get('input_token_count', 'N/A')}]\n") + + # 5. Turn 2: Unrelated question + # No document needed — agent answers from general knowledge. + print("--- Turn 2: Unrelated question ---") + response = await agent.run("What is the capital of France?", session=session) + usage = response.usage_details or {} + print(f"Agent: {response}") + print(f" [Input tokens: {usage.get('input_token_count', 'N/A')}]\n") + + # 6. Turn 3: Detailed follow-up + # The agent answers from the full document content that was injected + # into conversation history in Turn 1. No re-analysis or tool call needed. + print("--- Turn 3: Detailed follow-up ---") + response = await agent.run( + "What is the shipping address on the invoice?", + session=session, + ) + usage = response.usage_details or {} + print(f"Agent: {response}") + print(f" [Input tokens: {usage.get('input_token_count', 'N/A')}]\n") + + +if __name__ == "__main__": + asyncio.run(main()) + +""" +Sample output: + +--- Turn 1: Upload PDF --- +Agent: This document is an **invoice** from **CONTOSO LTD.** to **MICROSOFT + CORPORATION**. Amount Due: $610.00. Invoice INV-100, dated 11/15/2019. + [Input tokens: 975] + +--- Turn 2: Unrelated question --- +Agent: Paris. + [Input tokens: 1134] + +--- Turn 3: Detailed follow-up --- +Agent: Shipping address (SHIP TO): Microsoft Delivery, 123 Ship St, + Redmond WA, 98052. 
+ [Input tokens: 1155] +""" diff --git a/python/packages/azure-ai-contentunderstanding/samples/01-get-started/03_multimodal_chat.py b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/03_multimodal_chat.py new file mode 100644 index 0000000000..d2aa6dca95 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/03_multimodal_chat.py @@ -0,0 +1,188 @@ +# /// script +# requires-python = ">=3.10" +# dependencies = [ +# "agent-framework-azure-ai-contentunderstanding", +# "agent-framework-foundry", +# "azure-identity", +# ] +# /// +# Run with: uv run packages/azure-ai-contentunderstanding/samples/01-get-started/03_multimodal_chat.py + +# Copyright (c) Microsoft. All rights reserved. + +import asyncio +import os +import time +from pathlib import Path + +from agent_framework import Agent, AgentSession, Content, Message +from agent_framework.foundry import FoundryChatClient +from azure.identity import AzureCliCredential +from dotenv import load_dotenv + +from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider + +load_dotenv() + +""" +Multi-Modal Chat — PDF, audio, and video in a single turn + +This sample demonstrates CU's multi-modal capability: upload a PDF invoice, +an audio call recording, and a video file all at once. The provider analyzes +all three in parallel using the right CU analyzer for each media type. + +The provider auto-detects the media type and selects the right CU analyzer: + - PDF/images → prebuilt-documentSearch + - Audio → prebuilt-audioSearch + - Video → prebuilt-videoSearch + +Environment variables: + AZURE_AI_PROJECT_ENDPOINT — Azure AI Foundry project endpoint + AZURE_OPENAI_DEPLOYMENT_NAME — Model deployment name (e.g. 
gpt-4.1) + AZURE_CONTENTUNDERSTANDING_ENDPOINT — CU endpoint URL +""" + +# Local PDF from package assets +SAMPLE_PDF = Path(__file__).resolve().parents[1] / "shared" / "sample_assets" / "invoice.pdf" + +# Public audio/video from Azure CU samples repo (raw GitHub URLs) +_CU_ASSETS = "https://raw.githubusercontent.com/Azure-Samples/azure-ai-content-understanding-assets/main" +AUDIO_URL = f"{_CU_ASSETS}/audio/callCenterRecording.mp3" +VIDEO_URL = f"{_CU_ASSETS}/videos/sdk_samples/FlightSimulator.mp4" + + +async def main() -> None: + # 1. Set up credentials and CU context provider + credential = AzureCliCredential() + + # No analyzer_id specified — the provider auto-detects from media type: + # PDF/images → prebuilt-documentSearch + # Audio → prebuilt-audioSearch + # Video → prebuilt-videoSearch + cu = ContentUnderstandingContextProvider( + endpoint=os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"], + credential=credential, + max_wait=None, # wait until each analysis finishes + ) + + # 2. Set up the LLM client + client = FoundryChatClient( + project_endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"], + model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"], + credential=credential, + ) + + # 3. Create agent and session + async with cu: + agent = Agent( + client=client, + name="MultiModalAgent", + instructions=( + "You are a helpful assistant that can analyze documents, audio, " + "and video files. Answer questions using the extracted content." + ), + context_providers=[cu], + ) + + session = AgentSession() + + # --- Turn 1: Upload all 3 modalities at once --- + # The provider analyzes all files in parallel using the appropriate + # CU analyzer for each media type. All results are injected into + # the same context so the agent can answer about all of them. + turn1_prompt = ( + "I'm uploading three files: an invoice PDF, a call center " + "audio recording, and a flight simulator video. " + "Give a brief summary of each file." 
+ ) + print("--- Turn 1: Upload PDF + audio + video (parallel analysis) ---") + print(" (CU analysis may take a few minutes for these audio/video files...)") + print(f"User: {turn1_prompt}") + t0 = time.perf_counter() + response = await agent.run( + Message( + role="user", + contents=[ + Content.from_text(turn1_prompt), + Content.from_data( + SAMPLE_PDF.read_bytes(), + "application/pdf", + additional_properties={"filename": "invoice.pdf"}, + ), + Content.from_uri( + AUDIO_URL, + media_type="audio/mp3", + additional_properties={"filename": "callCenterRecording.mp3"}, + ), + Content.from_uri( + VIDEO_URL, + media_type="video/mp4", + additional_properties={"filename": "FlightSimulator.mp4"}, + ), + ], + ), + session=session, + ) + elapsed = time.perf_counter() - t0 + usage = response.usage_details or {} + print(f" [Analyzed in {elapsed:.1f}s | Input tokens: {usage.get('input_token_count', 'N/A')}]") + print(f"Agent: {response}\n") + + # --- Turn 2: Detail question about the PDF --- + turn2_prompt = "What are the line items and their amounts on the invoice?" + print("--- Turn 2: PDF detail ---") + print(f"User: {turn2_prompt}") + response = await agent.run(turn2_prompt, session=session) + usage = response.usage_details or {} + print(f" [Input tokens: {usage.get('input_token_count', 'N/A')}]") + print(f"Agent: {response}\n") + + # --- Turn 3: Detail question about the audio --- + turn3_prompt = "What was the customer's issue in the call recording?" + print("--- Turn 3: Audio detail ---") + print(f"User: {turn3_prompt}") + response = await agent.run(turn3_prompt, session=session) + usage = response.usage_details or {} + print(f" [Input tokens: {usage.get('input_token_count', 'N/A')}]") + print(f"Agent: {response}\n") + + # --- Turn 4: Detail question about the video --- + turn4_prompt = "What key scenes or actions are shown in the flight simulator video?" 
+        print("--- Turn 4: Video detail ---")
+        print(f"User: {turn4_prompt}")
+        response = await agent.run(turn4_prompt, session=session)
+        usage = response.usage_details or {}
+        print(f" [Input tokens: {usage.get('input_token_count', 'N/A')}]")
+        print(f"Agent: {response}\n")
+
+        # --- Turn 5: Cross-document question ---
+        turn5_prompt = (
+            "Across all three files, which one contains financial data, "
+            "which one involves a customer interaction, and which one is "
+            "a visual demonstration?"
+        )
+        print("--- Turn 5: Cross-document question ---")
+        print(f"User: {turn5_prompt}")
+        response = await agent.run(turn5_prompt, session=session)
+        usage = response.usage_details or {}
+        print(f" [Input tokens: {usage.get('input_token_count', 'N/A')}]")
+        print(f"Agent: {response}\n")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+
+"""
+Sample output:
+
+--- Turn 1: Upload PDF + audio + video (parallel analysis) ---
+ (CU analysis may take a few minutes for these audio/video files...)
+User: I'm uploading three files...
+ [Analyzed in ~94s | Input tokens: ~2939]
+Agent: ### invoice.pdf: An invoice from CONTOSO LTD. to MICROSOFT CORPORATION...
+    ### callCenterRecording.mp3: A customer service call about point balance...
+    ### FlightSimulator.mp4: A clip discussing neural text-to-speech...
+ +--- Turn 2-5: Detail and cross-document questions --- +(Agent answers from conversation history without re-analysis) +""" diff --git a/python/packages/azure-ai-contentunderstanding/samples/01-get-started/04_invoice_processing.py b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/04_invoice_processing.py new file mode 100644 index 0000000000..38de8b3b95 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/04_invoice_processing.py @@ -0,0 +1,146 @@ +# /// script +# requires-python = ">=3.10" +# dependencies = [ +# "agent-framework-azure-ai-contentunderstanding", +# "agent-framework-foundry", +# "azure-identity", +# ] +# /// +# Run with: uv run packages/azure-ai-contentunderstanding/samples/01-get-started/04_invoice_processing.py + +# Copyright (c) Microsoft. All rights reserved. + +import asyncio +import os +from pathlib import Path + +from agent_framework import Agent, AgentSession, Content, Message +from agent_framework.foundry import FoundryChatClient +from azure.identity import AzureCliCredential +from dotenv import load_dotenv + +from agent_framework_azure_ai_contentunderstanding import ( + AnalysisSection, + ContentUnderstandingContextProvider, +) + +load_dotenv() + +""" +Invoice Processing — Structured field extraction with prebuilt-invoice + +This sample demonstrates CU's structured field extraction using the +prebuilt-invoice analyzer. Unlike plain text extraction, the prebuilt-invoice +model returns typed fields (VendorName, InvoiceTotal, DueDate, LineItems, etc.) +with confidence scores — enabling precise, schema-aware document processing. + +Environment variables: + AZURE_AI_PROJECT_ENDPOINT — Azure AI Foundry project endpoint + AZURE_OPENAI_DEPLOYMENT_NAME — Model deployment name (e.g. gpt-4.1) + AZURE_CONTENTUNDERSTANDING_ENDPOINT — CU endpoint URL +""" + +SAMPLE_PDF_PATH = Path(__file__).resolve().parents[1] / "shared" / "sample_assets" / "invoice.pdf" + + +async def main() -> None: + # 1. 
Set up credentials and CU context provider + credential = AzureCliCredential() + + # Default analyzer is prebuilt-documentSearch (RAG-optimized). + # Per-file override via additional_properties["analyzer_id"] lets us + # use prebuilt-invoice for structured field extraction on specific files. + cu = ContentUnderstandingContextProvider( + endpoint=os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"], + credential=credential, + analyzer_id="prebuilt-documentSearch", # default for all files + max_wait=None, # wait until CU analysis finishes + output_sections=[ + AnalysisSection.MARKDOWN, + AnalysisSection.FIELDS, + ], + ) + + # 2. Set up the LLM client + client = FoundryChatClient( + project_endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"], + model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"], + credential=credential, + ) + + # 3. Create agent and session + async with cu: + agent = Agent( + client=client, + name="InvoiceProcessor", + instructions=( + "You are an invoice processing assistant. Use the extracted fields " + "(JSON with confidence scores) to answer precisely. When fields have " + "low confidence (< 0.8), mention this to the user. Format currency " + "values clearly." + ), + context_providers=[cu], + ) + + session = AgentSession() + + # 4. Upload an invoice PDF + print("--- Upload Invoice ---") + + pdf_bytes = SAMPLE_PDF_PATH.read_bytes() + + response = await agent.run( + Message( + role="user", + contents=[ + Content.from_text( + "Process this invoice. What is the vendor name, total amount, " + "and due date? List all line items if available." + ), + Content.from_data( + pdf_bytes, + "application/pdf", + # Per-file analyzer override: use prebuilt-invoice for + # structured field extraction (VendorName, InvoiceTotal, etc.) + # instead of the provider default (prebuilt-documentSearch). 
+ additional_properties={ + "filename": SAMPLE_PDF_PATH.name, + "analyzer_id": "prebuilt-invoice", + }, + ), + ], + ), + session=session, + ) + print(f"Agent: {response}\n") + + # 5. Follow-up: ask about specific fields + print("--- Follow-up ---") + response = await agent.run( + "What is the payment term? Are there any fields with low confidence?", + session=session, + ) + print(f"Agent: {response}\n") + + +if __name__ == "__main__": + asyncio.run(main()) + +""" +Sample output: + +--- Upload Invoice --- +Agent: ## Key fields (invoice.pdf, page 1) + - Vendor name: CONTOSO LTD. (low confidence: 0.513) + - Total amount: USD $110.00 (low confidence: 0.782) + - Due date: 2019-12-15 (confidence: 0.979) + ## Line items: + 1) Consulting Services -- 2 hours @ $30.00, total $60.00 + 2) Document Fee -- 3 @ $10.00, total $30.00 + 3) Printing Fee -- 10 pages @ $1.00, total $10.00 + +--- Follow-up --- +Agent: Payment term: Not provided (null, confidence 0.872) + Fields with low confidence (< 0.80): VendorName (0.513), CustomerName (0.436), ... + Line item descriptions: Consulting Services (0.585), Document Fee (0.520), ... +""" diff --git a/python/packages/azure-ai-contentunderstanding/samples/01-get-started/05_background_analysis.py b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/05_background_analysis.py new file mode 100644 index 0000000000..b8273259f0 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/05_background_analysis.py @@ -0,0 +1,164 @@ +# /// script +# requires-python = ">=3.10" +# dependencies = [ +# "agent-framework-azure-ai-contentunderstanding", +# "agent-framework-foundry", +# "azure-identity", +# ] +# /// +# Run with: uv run packages/azure-ai-contentunderstanding/samples/01-get-started/05_background_analysis.py + +# Copyright (c) Microsoft. All rights reserved. 
+ +import asyncio +import os +from pathlib import Path + +from agent_framework import Agent, AgentSession, Content, Message +from agent_framework.foundry import FoundryChatClient +from azure.identity import AzureCliCredential +from dotenv import load_dotenv + +from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider + +load_dotenv() + +""" +Background Analysis — Non-blocking file processing with status tracking + +This sample demonstrates the background analysis workflow: when CU analysis +takes longer than max_wait, the provider defers it to a background task and +the agent informs the user. On the next turn, the provider checks if the +background task has completed and surfaces the result. + +This is useful for large files (audio/video) where CU analysis can take +30-60+ seconds. The agent remains responsive while files are being processed. + +Key concepts: + - max_wait=1.0 forces background deferral (analysis takes longer than 1s) + - The provider tracks document status: analyzing → ready + - list_documents() tool shows current status of all tracked documents + - On subsequent turns, completed background tasks are automatically resolved + +TIP: For an interactive version with file upload UI, see the DevUI samples + in 02-devui/01-multimodal_agent/ + +Environment variables: + AZURE_AI_PROJECT_ENDPOINT — Azure AI Foundry project endpoint + AZURE_OPENAI_DEPLOYMENT_NAME — Model deployment name (e.g. gpt-4.1) + AZURE_CONTENTUNDERSTANDING_ENDPOINT — CU endpoint URL +""" + +SAMPLE_PDF = Path(__file__).resolve().parents[1] / "shared" / "sample_assets" / "invoice.pdf" + + +async def main() -> None: + # 1. Set up credentials and CU context provider with short timeout + credential = AzureCliCredential() + + # Set max_wait=1.0 to force background deferral. + # Any CU analysis taking longer than 1 second will be deferred to a + # background task. The agent is told the file is "being analyzed" and + # can respond immediately. 
The result is picked up on the next turn. + cu = ContentUnderstandingContextProvider( + endpoint=os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"], + credential=credential, + max_wait=1.0, # 1 second — forces background deferral for most files + ) + + # 2. Set up the LLM client + client = FoundryChatClient( + project_endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"], + model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"], + credential=credential, + ) + + # 3. Create agent and session + async with cu: + agent = Agent( + client=client, + name="BackgroundAgent", + instructions=( + "You are a helpful assistant. When a document is still being " + "analyzed, tell the user and suggest they ask again shortly. " + "Use list_documents() to check document status." + ), + context_providers=[cu], + ) + + session = AgentSession() + + # 4. Turn 1: Upload PDF (will timeout and defer to background) + # The provider starts CU analysis but it won't finish within 1 second, + # so it defers to a background task. The agent is told the document + # status is "analyzing" and responds accordingly. + print("--- Turn 1: Upload PDF (max_wait=1s, will defer to background) ---") + print("User: Analyze this invoice for me.") + response = await agent.run( + Message( + role="user", + contents=[ + Content.from_text("Analyze this invoice for me."), + Content.from_data( + SAMPLE_PDF.read_bytes(), + "application/pdf", + additional_properties={"filename": "invoice.pdf"}, + ), + ], + ), + session=session, + ) + usage = response.usage_details or {} + print(f" [Input tokens: {usage.get('input_token_count', 'N/A')}]") + print(f"Agent: {response}\n") + + # 5. 
Turn 2: Check status (analysis likely still in progress) + print("--- Turn 2: Check status ---") + print("User: Is the invoice ready yet?") + response = await agent.run("Is the invoice ready yet?", session=session) + usage = response.usage_details or {} + print(f" [Input tokens: {usage.get('input_token_count', 'N/A')}]") + print(f"Agent: {response}\n") + + # 6. Wait for background analysis to complete + print(" (Waiting 30 seconds for CU background analysis to finish...)\n") + await asyncio.sleep(30) + + # 7. Turn 3: Ask again (background task should be resolved now) + # The provider checks the background task, finds it complete, and + # injects the full document content into context. The agent can now + # answer questions about the invoice. + print("--- Turn 3: Ask again (analysis should be complete) ---") + print("User: What is the total amount due on the invoice?") + response = await agent.run( + "What is the total amount due on the invoice?", + session=session, + ) + usage = response.usage_details or {} + print(f" [Input tokens: {usage.get('input_token_count', 'N/A')}]") + print(f"Agent: {response}\n") + + +if __name__ == "__main__": + asyncio.run(main()) + +""" +Sample output: + +--- Turn 1: Upload PDF (max_wait=1s, will defer to background) --- +User: Analyze this invoice for me. + [Input tokens: 319] +Agent: invoice.pdf is still being analyzed. Please ask again in a moment. + +--- Turn 2: Check status --- +User: Is the invoice ready yet? + [Input tokens: 657] +Agent: Not yet -- invoice.pdf is still in analyzing status. + + (Waiting 30 seconds for CU background analysis to finish...) + +--- Turn 3: Ask again (analysis should be complete) --- +User: What is the total amount due on the invoice? + [Input tokens: 1252] +Agent: The amount due on the invoice is $610.00. 
+""" diff --git a/python/packages/azure-ai-contentunderstanding/samples/01-get-started/06_large_doc_file_search.py b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/06_large_doc_file_search.py new file mode 100644 index 0000000000..777c33bdce --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/samples/01-get-started/06_large_doc_file_search.py @@ -0,0 +1,167 @@ +# /// script +# requires-python = ">=3.10" +# dependencies = [ +# "agent-framework-azure-ai-contentunderstanding", +# "agent-framework-foundry", +# "azure-identity", +# "openai", +# ] +# /// +# Run with: uv run packages/azure-ai-contentunderstanding/samples/01-get-started/06_large_doc_file_search.py + +# Copyright (c) Microsoft. All rights reserved. + +import asyncio +import os +from pathlib import Path + +from agent_framework import Agent, AgentSession, Content, Message +from agent_framework.foundry import FoundryChatClient +from azure.identity import AzureCliCredential +from dotenv import load_dotenv +from openai import AsyncAzureOpenAI + +from agent_framework_azure_ai_contentunderstanding import ( + ContentUnderstandingContextProvider, + FileSearchConfig, +) + +load_dotenv() + +""" +Large Document + file_search RAG — CU extraction + OpenAI vector store + +For large documents (100+ pages) or long audio/video, injecting the full +CU-extracted content into the LLM context is impractical. This sample shows +how to use the built-in file_search integration: CU extracts markdown and +automatically uploads it to an OpenAI vector store for token-efficient RAG. + +When ``FileSearchConfig`` is provided, the provider: + 1. Extracts markdown via CU (handles scanned PDFs, audio, video) + 2. Uploads the extracted markdown to a vector store + 3. Registers a ``file_search`` tool on the agent context + 4. 
Cleans up the vector store on close + +Architecture: + Large PDF -> CU extracts markdown -> auto-upload to vector store -> file_search + Follow-up -> file_search retrieves top-k chunks -> LLM answers + +NOTE: Requires an async OpenAI client for vector store operations. + +Environment variables: + AZURE_AI_PROJECT_ENDPOINT — Azure AI Foundry project endpoint + AZURE_OPENAI_DEPLOYMENT_NAME — Model deployment name (e.g. gpt-4.1) + AZURE_CONTENTUNDERSTANDING_ENDPOINT — CU endpoint URL +""" + +SAMPLE_PDF_PATH = Path(__file__).resolve().parents[1] / "shared" / "sample_assets" / "invoice.pdf" + + +async def main() -> None: + # 1. Set up credentials + credential = AzureCliCredential() + + # 2. Create async OpenAI client for vector store operations + token = credential.get_token("https://cognitiveservices.azure.com/.default").token + openai_client = AsyncAzureOpenAI( + azure_endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"], + api_version="2025-03-01-preview", + azure_ad_token=token, + ) + + # 3. Create LLM client (needed for get_file_search_tool) + client = FoundryChatClient( + project_endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"], + model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"], + credential=credential, + ) + + # 4. Create vector store and file_search tool + vector_store = await openai_client.vector_stores.create( + name="cu_large_doc_demo", + expires_after={"anchor": "last_active_at", "days": 1}, + ) + file_search_tool = client.get_file_search_tool(vector_store_ids=[vector_store.id]) + + # 5. Configure CU provider with file_search integration + # When file_search is set, CU-extracted markdown is automatically uploaded + # to the vector store and the file_search tool is registered on the context. 
+    cu = ContentUnderstandingContextProvider(
+        endpoint=os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"],
+        credential=credential,
+        analyzer_id="prebuilt-documentSearch",
+        max_wait=None,  # wait until CU analysis + vector store upload finishes
+        file_search=FileSearchConfig.from_foundry(
+            openai_client,
+            vector_store_id=vector_store.id,
+            file_search_tool=file_search_tool,
+        ),
+    )
+
+    pdf_bytes = SAMPLE_PDF_PATH.read_bytes()
+
+    # The provider handles everything: CU extraction + vector store upload + file_search tool
+    async with cu:
+        agent = Agent(
+            client=client,
+            name="LargeDocAgent",
+            instructions=(
+                "You are a document analyst. Use the file_search tool to find "
+                "relevant sections from the document and answer precisely. "
+                "Cite specific sections when answering."
+            ),
+            context_providers=[cu],
+        )
+
+        session = AgentSession()
+
+        # Turn 1: Upload — CU extracts and uploads to vector store automatically
+        print("--- Turn 1: Upload document ---")
+        response = await agent.run(
+            Message(
+                role="user",
+                contents=[
+                    Content.from_text("What are the key points in this document?"),
+                    Content.from_data(
+                        pdf_bytes,
+                        "application/pdf",
+                        additional_properties={"filename": SAMPLE_PDF_PATH.name},
+                    ),
+                ],
+            ),
+            session=session,
+        )
+        print(f"Agent: {response}\n")
+
+        # Turn 2: Follow-up — file_search retrieves relevant chunks (token efficient)
+        print("--- Turn 2: Follow-up (RAG) ---")
+        response = await agent.run(
+            "What numbers or financial metrics are mentioned?",
+            session=session,
+        )
+        print(f"Agent: {response}\n")
+
+    # Explicitly delete the vector store created for this sample
+    await openai_client.vector_stores.delete(vector_store.id)
+    await openai_client.close()
+    print("Done. Vector store deleted and client closed.")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+
+"""
+Sample output:
+
+--- Turn 1: Upload document ---
+Agent: An invoice from Contoso Ltd. to Microsoft Corporation (INV-100).
+   Line items: Consulting Services $60, Document Fee $30, Printing Fee $10.
+   Subtotal $100, Sales tax $10, Total $110, Previous balance $500, Amount due $610.
+
+--- Turn 2: Follow-up (RAG) ---
+Agent: Subtotal $100.00, Sales tax $10.00, Total $110.00,
+   Previous unpaid balance $500.00, Amount due $610.00.
+   Line items: 2 hours @ $30 = $60, 3 @ $10 = $30, 10 pages @ $1 = $10.
+
+Done. Vector store deleted and client closed.
+"""
diff --git a/python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/README.md b/python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/README.md
new file mode 100644
index 0000000000..245f230fe8
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/README.md
@@ -0,0 +1,33 @@
+# DevUI Multi-Modal Agent
+
+Interactive web UI for uploading and chatting with documents, images, audio, and video using Azure Content Understanding.
+
+## Setup
+
+1. Set environment variables (or create a `.env` file in `python/`):
+   ```bash
+   AZURE_AI_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com/
+   AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME=gpt-4.1
+   AZURE_CONTENTUNDERSTANDING_ENDPOINT=https://your-cu-resource.cognitiveservices.azure.com/
+   ```
+
+2. Log in with Azure CLI:
+   ```bash
+   az login
+   ```
+
+3. Run with DevUI:
+   ```bash
+   uv run poe devui --agent packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent
+   ```
+
+4. Open the DevUI URL in your browser and start uploading files.
+
+## What You Can Do
+
+- **Upload PDFs** — including scanned/image-based PDFs that LLM vision struggles with
+- **Upload images** — handwritten notes, infographics, charts
+- **Upload audio** — meeting recordings, call center calls (transcription with speaker diarization)
+- **Upload video** — product demos, training videos (frame extraction + transcription)
+- **Ask questions** across all uploaded documents
+- **Check status** — "which documents are ready?"
uses the auto-registered `list_documents()` tool diff --git a/python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/__init__.py b/python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/__init__.py new file mode 100644 index 0000000000..3ca9ea7e09 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Microsoft. All rights reserved. +"""DevUI Multi-Modal Agent with Azure Content Understanding.""" + +from .agent import agent + +__all__ = ["agent"] diff --git a/python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/agent.py b/python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/agent.py new file mode 100644 index 0000000000..82b2b500ab --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/agent.py @@ -0,0 +1,71 @@ +# Copyright (c) Microsoft. All rights reserved. +"""DevUI Multi-Modal Agent — file upload + CU-powered analysis. + +This agent uses Azure Content Understanding to analyze uploaded files +(PDFs, scanned documents, handwritten images, audio recordings, video) +and answer questions about them through the DevUI web interface. + +Unlike the standard azure_responses_agent which sends files directly to the LLM, +this agent uses CU for structured extraction — superior for scanned PDFs, +handwritten content, audio transcription, and video analysis. + +Required environment variables: + AZURE_AI_PROJECT_ENDPOINT — Azure AI Foundry project endpoint + AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME — Model deployment name (e.g. 
gpt-4.1)
+    AZURE_CONTENTUNDERSTANDING_ENDPOINT — CU endpoint URL
+
+Run with DevUI:
+    uv run poe devui --agent packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent
+"""
+
+import os
+
+from agent_framework.azure import AzureOpenAIResponsesClient
+from azure.core.credentials import AzureKeyCredential
+from azure.identity import AzureCliCredential
+from dotenv import load_dotenv
+
+from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider
+
+load_dotenv()
+
+# --- Auth ---
+# AzureCliCredential works for both Azure OpenAI and CU.
+# API keys can be set separately if the services are on different resources.
+_credential = AzureCliCredential()
+_openai_api_key = os.environ.get("AZURE_OPENAI_API_KEY")
+_cu_api_key = os.environ.get("AZURE_CONTENTUNDERSTANDING_API_KEY")
+_cu_credential = AzureKeyCredential(_cu_api_key) if _cu_api_key else _credential
+
+cu = ContentUnderstandingContextProvider(
+    endpoint=os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"],
+    credential=_cu_credential,
+    max_wait=5.0,
+)
+
+if _openai_api_key:
+    client = AzureOpenAIResponsesClient(
+        project_endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"],
+        deployment_name=os.environ["AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME"],
+        api_key=_openai_api_key,
+    )
+else:
+    client = AzureOpenAIResponsesClient(
+        project_endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"],
+        deployment_name=os.environ["AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME"],
+        credential=_credential,
+    )
+
+agent = client.as_agent(
+    name="MultiModalDocAgent",
+    instructions=(
+        "You are a helpful document analysis assistant. "
+        "When a user uploads files, they are automatically analyzed using Azure Content Understanding. "
+        "Use list_documents() to check which documents are ready, pending, or failed "
+        "and to see which files are available for answering questions. "
+        "Tell the user if any documents are still being analyzed. "
+        "You can process PDFs, scanned documents, handwritten images, audio recordings, and video files. "
+        "When answering, cite specific content from the documents."
+    ),
+    context_providers=[cu],
+)
diff --git a/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/README.md b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/README.md
new file mode 100644
index 0000000000..3386cb2e83
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/README.md
@@ -0,0 +1,51 @@
+# DevUI File Search Agent
+
+Interactive web UI for uploading and chatting with documents, images, audio, and video using Azure Content Understanding + OpenAI file_search RAG.
+
+## How It Works
+
+1. **Upload** any supported file (PDF, image, audio, video) via the DevUI chat
+2. **CU analyzes** the file — auto-selects the right analyzer per media type
+3. **Markdown extracted** by CU is uploaded to an OpenAI vector store
+4. **file_search** tool is registered — LLM retrieves top-k relevant chunks
+5. **Ask questions** across all uploaded documents with token-efficient RAG
+
+## Setup
+
+1. Set environment variables (or create a `.env` file in `python/`):
+   ```bash
+   AZURE_AI_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com/
+   AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME=gpt-4.1
+   AZURE_CONTENTUNDERSTANDING_ENDPOINT=https://your-cu-resource.services.ai.azure.com/
+   ```
+
+2. Log in with Azure CLI:
+   ```bash
+   az login
+   ```
+
+3. Run with DevUI:
+   ```bash
+   devui packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend
+   ```
+
+4. Open the DevUI URL in your browser and start uploading files.
+
+## Supported File Types
+
+| Type | Formats | CU Analyzer (auto-detected) |
+|------|---------|-----------------------------|
+| Documents | PDF, DOCX, XLSX, PPTX, HTML, TXT, Markdown | `prebuilt-documentSearch` |
+| Images | JPEG, PNG, TIFF, BMP | `prebuilt-documentSearch` |
+| Audio | WAV, MP3, FLAC, OGG, M4A | `prebuilt-audioSearch` |
+| Video | MP4, MOV, AVI, WebM | `prebuilt-videoSearch` |
+
+## vs. 01-multimodal_agent
+
+| Feature | multimodal_agent | file_search_agent |
+|---------|-----------------|-------------------|
+| CU extraction | ✅ Full content injected | ✅ Content indexed in vector store |
+| RAG | ❌ | ✅ file_search retrieves top-k chunks |
+| Large docs (100+ pages) | ⚠️ May exceed context window | ✅ Token-efficient |
+| Multiple large files | ⚠️ Context overflow risk | ✅ All indexed, searchable |
+| Best for | Small docs, quick inspection | Large docs, multi-file Q&A |
diff --git a/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/__init__.py b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/__init__.py
new file mode 100644
index 0000000000..92f4181db4
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/__init__.py
@@ -0,0 +1,6 @@
+# Copyright (c) Microsoft. All rights reserved.
+"""DevUI Multi-Modal Agent with CU + file_search RAG."""
+
+from .agent import agent
+
+__all__ = ["agent"]
diff --git a/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/agent.py b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/agent.py
new file mode 100644
index 0000000000..994dc3d2db
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/agent.py
@@ -0,0 +1,114 @@
+# Copyright (c) Microsoft. All rights reserved.
+"""DevUI Multi-Modal Agent — CU extraction + file_search RAG.
+
+This agent combines Azure Content Understanding with OpenAI file_search
+for token-efficient RAG over large or multi-modal documents.
+
+Upload flow:
+    1. CU extracts high-quality markdown (handles scanned PDFs, audio, video)
+    2. Extracted markdown is auto-uploaded to an OpenAI vector store
+    3. file_search tool is registered so the LLM retrieves top-k chunks
+    4. Vector store is configured to auto-expire after inactivity
+
+This is ideal for large documents (100+ pages), long audio recordings,
+or multiple files in the same conversation where full-context injection
+would exceed the LLM's context window.
+
+Analyzer auto-detection:
+    When no analyzer_id is specified, the provider auto-selects the
+    appropriate CU analyzer based on media type:
+    - Documents/images → prebuilt-documentSearch
+    - Audio → prebuilt-audioSearch
+    - Video → prebuilt-videoSearch
+
+Required environment variables:
+    AZURE_AI_PROJECT_ENDPOINT — Azure AI Foundry project endpoint
+    AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME — Model deployment name (e.g. gpt-4.1)
+    AZURE_CONTENTUNDERSTANDING_ENDPOINT — CU endpoint URL
+
+Run with DevUI:
+    devui packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend
+"""
+
+import os
+
+from agent_framework.azure import AzureOpenAIResponsesClient
+from azure.ai.projects import AIProjectClient
+from azure.core.credentials import AzureKeyCredential
+from azure.identity import AzureCliCredential
+from dotenv import load_dotenv
+
+from agent_framework_azure_ai_contentunderstanding import (
+    ContentUnderstandingContextProvider,
+    FileSearchConfig,
+)
+
+load_dotenv()
+
+# --- Auth ---
+# AzureCliCredential works for both Azure OpenAI and CU.
+# API keys can be set separately if the services are on different resources.
+_credential = AzureCliCredential() +_openai_api_key = os.environ.get("AZURE_OPENAI_API_KEY") +_cu_api_key = os.environ.get("AZURE_CONTENTUNDERSTANDING_API_KEY") +_cu_credential = AzureKeyCredential(_cu_api_key) if _cu_api_key else _credential + +_endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"] + +# --- LLM client + sync vector store setup --- +# DevUI loads agent modules synchronously at startup while an event loop is already +# running, so we cannot use async APIs here. A sync AIProjectClient is used for +# one-time vector store creation; runtime file uploads use client.client (async). +if _openai_api_key: + client = AzureOpenAIResponsesClient( + project_endpoint=_endpoint, + deployment_name=os.environ["AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME"], + api_key=_openai_api_key, + ) +else: + client = AzureOpenAIResponsesClient( + project_endpoint=_endpoint, + deployment_name=os.environ["AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME"], + credential=_credential, + ) + +_sync_project = AIProjectClient(endpoint=_endpoint, credential=_credential) # type: ignore[arg-type] +_sync_openai = _sync_project.get_openai_client() +_vector_store = _sync_openai.vector_stores.create( + name="devui_cu_file_search", + expires_after={"anchor": "last_active_at", "days": 1}, +) +_sync_openai.close() + +_file_search_tool = client.get_file_search_tool( + vector_store_ids=[_vector_store.id], + max_num_results=3, # limit chunks to reduce input token usage +) + +# --- CU context provider with file_search --- +# client.client is the async OpenAI client used for runtime file uploads. 
+# No analyzer_id → auto-selects per media type (documents, audio, video)
+cu = ContentUnderstandingContextProvider(
+    endpoint=os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"],
+    credential=_cu_credential,
+    file_search=FileSearchConfig.from_foundry(
+        client.client,  # reuse the LLM client's internal AsyncAzureOpenAI for file uploads
+        vector_store_id=_vector_store.id,
+        file_search_tool=_file_search_tool,
+    ),
+)
+
+agent = client.as_agent(
+    name="FileSearchDocAgent",
+    instructions=(
+        "You are a helpful document analysis assistant with RAG capabilities. "
+        "When a user uploads files, they are automatically analyzed using Azure Content Understanding "
+        "and indexed in a vector store for efficient retrieval. "
+        "Analysis takes time (seconds for documents, longer for audio/video) — if a document "
+        "is still pending, let the user know and suggest they ask again shortly. "
+        "You can process PDFs, scanned documents, handwritten images, audio recordings, and video files. "
+        "Multiple files can be uploaded and queried in the same conversation. "
+        "When answering, cite specific content from the documents."
+    ),
+    context_providers=[cu],
+)
diff --git a/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/README.md b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/README.md
new file mode 100644
index 0000000000..db597ff97b
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/README.md
@@ -0,0 +1,34 @@
+# DevUI Foundry File Search Agent
+
+Interactive web UI for uploading and chatting with documents, images, audio, and video using Azure Content Understanding + Foundry file_search RAG.
+
+This is the **Foundry** variant. For the Azure OpenAI Responses API variant, see `../azure_openai_backend/`.
+
+## How It Works
+
+1. **Upload** any supported file (PDF, image, audio, video) via the DevUI chat
+2. **CU analyzes** the file — auto-selects the right analyzer per media type
+3. **Markdown extracted** by CU is uploaded to a Foundry vector store
+4. **file_search** tool is registered — LLM retrieves top-k relevant chunks
+5. **Ask questions** across all uploaded documents with token-efficient RAG
+
+## Setup
+
+1. Set environment variables (or create a `.env` file in `python/`):
+   ```bash
+   FOUNDRY_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com/
+   FOUNDRY_MODEL=gpt-4.1
+   AZURE_CONTENTUNDERSTANDING_ENDPOINT=https://your-cu-resource.services.ai.azure.com/
+   ```
+
+2. Log in with Azure CLI:
+   ```bash
+   az login
+   ```
+
+3. Run with DevUI:
+   ```bash
+   devui packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend
+   ```
+
+4. Open the DevUI URL in your browser and start uploading files.
diff --git a/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/__init__.py b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/__init__.py
new file mode 100644
index 0000000000..2a50eae894
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/__init__.py
@@ -0,0 +1 @@
+# Copyright (c) Microsoft. All rights reserved.
diff --git a/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/agent.py b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/agent.py
new file mode 100644
index 0000000000..8e03598164
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/agent.py
@@ -0,0 +1,104 @@
+# Copyright (c) Microsoft. All rights reserved.
+"""DevUI Multi-Modal Agent — CU extraction + file_search RAG via Azure AI Foundry.
+
+This agent combines Azure Content Understanding with Foundry's file_search
+for token-efficient RAG over large or multi-modal documents.
+
+Upload flow:
+    1. CU extracts high-quality markdown (handles scanned PDFs, audio, video)
+    2. Extracted markdown is uploaded to a Foundry vector store
+    3. file_search tool is registered so the LLM retrieves top-k chunks
+    4. Uploaded files are cleaned up on server shutdown
+
+This sample uses ``FoundryChatClient`` and ``FoundryFileSearchBackend``.
+For the OpenAI Responses API variant, see ``../azure_openai_backend/agent.py``.
+
+Analyzer auto-detection:
+    When no analyzer_id is specified, the provider auto-selects the
+    appropriate CU analyzer based on media type:
+    - Documents/images → prebuilt-documentSearch
+    - Audio → prebuilt-audioSearch
+    - Video → prebuilt-videoSearch
+
+Required environment variables:
+    FOUNDRY_PROJECT_ENDPOINT — Azure AI Foundry project endpoint
+    FOUNDRY_MODEL — Model deployment name (e.g. gpt-4.1)
+    AZURE_CONTENTUNDERSTANDING_ENDPOINT — CU endpoint URL
+
+Run with DevUI:
+    devui packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend
+"""
+
+import os
+
+from agent_framework.foundry import FoundryChatClient
+from azure.core.credentials import AzureKeyCredential
+from azure.identity import AzureCliCredential
+from dotenv import load_dotenv
+from openai import AzureOpenAI
+
+from agent_framework_azure_ai_contentunderstanding import (
+    ContentUnderstandingContextProvider,
+    FileSearchConfig,
+)
+
+load_dotenv()
+
+# --- Auth ---
+# AzureCliCredential for Foundry. A CU API key can be used instead if CU runs on a different resource.
+_credential = AzureCliCredential() +_cu_api_key = os.environ.get("AZURE_CONTENTUNDERSTANDING_API_KEY") +_cu_credential = AzureKeyCredential(_cu_api_key) if _cu_api_key else _credential + +# --- Foundry LLM client --- +client = FoundryChatClient( + project_endpoint=os.environ.get("FOUNDRY_PROJECT_ENDPOINT", ""), + model=os.environ.get("FOUNDRY_MODEL", ""), + credential=_credential, +) + +# --- Create vector store (sync client to avoid event loop conflicts in DevUI) --- +_token = _credential.get_token("https://ai.azure.com/.default").token +_sync_openai = AzureOpenAI( + azure_endpoint=os.environ.get("FOUNDRY_PROJECT_ENDPOINT", ""), + azure_ad_token=_token, + api_version="2025-04-01-preview", +) +_vector_store = _sync_openai.vector_stores.create( + name="devui_cu_foundry_file_search", + expires_after={"anchor": "last_active_at", "days": 1}, +) +_sync_openai.close() + +_file_search_tool = client.get_file_search_tool( + vector_store_ids=[_vector_store.id], + max_num_results=3, # limit chunks to reduce input token usage +) + +# --- CU context provider with file_search --- +# No analyzer_id → auto-selects per media type (documents, audio, video) +cu = ContentUnderstandingContextProvider( + endpoint=os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"], + credential=_cu_credential, + max_wait=10.0, + file_search=FileSearchConfig.from_foundry( + client.client, + vector_store_id=_vector_store.id, + file_search_tool=_file_search_tool, + ), +) + +agent = client.as_agent( + name="FoundryFileSearchDocAgent", + instructions=( + "You are a helpful document analysis assistant with RAG capabilities. " + "When a user uploads files, they are automatically analyzed using Azure Content Understanding " + "and indexed in a vector store for efficient retrieval. " + "Analysis takes time (seconds for documents, longer for audio/video) — if a document " + "is still pending, let the user know and suggest they ask again shortly. 
" + "You can process PDFs, scanned documents, handwritten images, audio recordings, and video files. " + "Multiple files can be uploaded and queried in the same conversation. " + "When answering, cite specific content from the documents." + ), + context_providers=[cu], +) diff --git a/python/packages/azure-ai-contentunderstanding/samples/README.md b/python/packages/azure-ai-contentunderstanding/samples/README.md new file mode 100644 index 0000000000..914ebd78da --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/samples/README.md @@ -0,0 +1,40 @@ +# Azure Content Understanding Samples + +These samples demonstrate how to use the `agent-framework-azure-ai-contentunderstanding` package to add document, image, audio, and video understanding to your agents. + +## Prerequisites + +1. Azure CLI logged in: `az login` +2. Environment variables set (or `.env` file in the `python/` directory): + ``` + AZURE_AI_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com + AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4.1 + AZURE_CONTENTUNDERSTANDING_ENDPOINT=https://your-cu-resource.cognitiveservices.azure.com/ + ``` + +## Samples + +### 01-get-started — Script samples (easy → advanced) + +| # | Sample | Description | Run | +|---|--------|-------------|-----| +| 01 | [Document Q&A](01-get-started/01_document_qa.py) | Upload a PDF, ask questions with CU-powered extraction | `uv run samples/01-get-started/01_document_qa.py` | +| 02 | [Multi-Turn Session](01-get-started/02_multi_turn_session.py) | AgentSession persistence across turns | `uv run samples/01-get-started/02_multi_turn_session.py` | +| 03 | [Multi-Modal Chat](01-get-started/03_multimodal_chat.py) | PDF + audio + video parallel analysis | `uv run samples/01-get-started/03_multimodal_chat.py` | +| 04 | [Invoice Processing](01-get-started/04_invoice_processing.py) | Structured field extraction with prebuilt-invoice | `uv run samples/01-get-started/04_invoice_processing.py` | +| 05 | [Background 
Analysis](01-get-started/05_background_analysis.py) | Non-blocking analysis with status tracking | `uv run samples/01-get-started/05_background_analysis.py` | +| 06 | [Large Doc + file_search](01-get-started/06_large_doc_file_search.py) | CU extraction + OpenAI vector store RAG | `uv run samples/01-get-started/06_large_doc_file_search.py` | + +### 02-devui — Interactive web UI samples + +| # | Sample | Description | Run | +|---|--------|-------------|-----| +| 01 | [Multi-Modal Agent](02-devui/01-multimodal_agent/) | Web UI for file upload + CU-powered chat | `devui samples/02-devui/01-multimodal_agent` | +| 02a | [file_search (Azure OpenAI backend)](02-devui/02-file_search_agent/azure_openai_backend/) | DevUI with CU + Azure OpenAI vector store | `devui samples/02-devui/02-file_search_agent/azure_openai_backend` | +| 02b | [file_search (Foundry backend)](02-devui/02-file_search_agent/foundry_backend/) | DevUI with CU + Foundry vector store | `devui samples/02-devui/02-file_search_agent/foundry_backend` | + +## Install (preview) + +```bash +pip install --pre agent-framework-azure-ai-contentunderstanding +``` diff --git a/python/packages/azure-ai-contentunderstanding/samples/shared/sample_assets/invoice.pdf b/python/packages/azure-ai-contentunderstanding/samples/shared/sample_assets/invoice.pdf new file mode 100644 index 0000000000..812bcd9b30 Binary files /dev/null and b/python/packages/azure-ai-contentunderstanding/samples/shared/sample_assets/invoice.pdf differ diff --git a/python/packages/azure-ai-contentunderstanding/tests/cu/conftest.py b/python/packages/azure-ai-contentunderstanding/tests/cu/conftest.py new file mode 100644 index 0000000000..b429be6a05 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/tests/cu/conftest.py @@ -0,0 +1,102 @@ +# Copyright (c) Microsoft. All rights reserved. 
+ +from __future__ import annotations + +import asyncio +import json +from pathlib import Path +from typing import Any +from unittest.mock import AsyncMock + +import pytest +from azure.ai.contentunderstanding.models import AnalysisResult + +FIXTURES_DIR = Path(__file__).parent / "fixtures" + + +def _load_fixture(name: str) -> dict[str, Any]: + return json.loads((FIXTURES_DIR / name).read_text()) # type: ignore[no-any-return] + + +@pytest.fixture +def pdf_fixture_raw() -> dict[str, Any]: + return _load_fixture("analyze_pdf_result.json") + + +@pytest.fixture +def pdf_analysis_result(pdf_fixture_raw: dict[str, Any]) -> AnalysisResult: + return AnalysisResult(pdf_fixture_raw) + + +@pytest.fixture +def audio_fixture_raw() -> dict[str, Any]: + return _load_fixture("analyze_audio_result.json") + + +@pytest.fixture +def audio_analysis_result(audio_fixture_raw: dict[str, Any]) -> AnalysisResult: + return AnalysisResult(audio_fixture_raw) + + +@pytest.fixture +def invoice_fixture_raw() -> dict[str, Any]: + return _load_fixture("analyze_invoice_result.json") + + +@pytest.fixture +def invoice_analysis_result(invoice_fixture_raw: dict[str, Any]) -> AnalysisResult: + return AnalysisResult(invoice_fixture_raw) + + +@pytest.fixture +def video_fixture_raw() -> dict[str, Any]: + return _load_fixture("analyze_video_result.json") + + +@pytest.fixture +def video_analysis_result(video_fixture_raw: dict[str, Any]) -> AnalysisResult: + return AnalysisResult(video_fixture_raw) + + +@pytest.fixture +def image_fixture_raw() -> dict[str, Any]: + return _load_fixture("analyze_image_result.json") + + +@pytest.fixture +def image_analysis_result(image_fixture_raw: dict[str, Any]) -> AnalysisResult: + return AnalysisResult(image_fixture_raw) + + +@pytest.fixture +def mock_cu_client() -> AsyncMock: + """Create a mock ContentUnderstandingClient.""" + client = AsyncMock() + client.close = AsyncMock() + return client + + +def make_mock_poller(result: AnalysisResult) -> AsyncMock: + """Create a mock 
poller that returns the given result immediately.""" + poller = AsyncMock() + poller.result = AsyncMock(return_value=result) + return poller + + +def make_slow_poller(result: AnalysisResult, delay: float = 10.0) -> AsyncMock: + """Create a mock poller that simulates a timeout then eventually returns.""" + poller = AsyncMock() + + async def slow_result() -> AnalysisResult: + await asyncio.sleep(delay) + return result + + poller.result = slow_result + return poller + + +def make_failing_poller(error: Exception) -> AsyncMock: + """Create a mock poller that raises an exception.""" + poller = AsyncMock() + poller.result = AsyncMock(side_effect=error) + return poller diff --git a/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_audio_result.json b/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_audio_result.json new file mode 100644 index 0000000000..86227f3a45 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_audio_result.json @@ -0,0 +1,13 @@ +{ + "id": "synthetic-audio-001", + "status": "Succeeded", + "analyzer_id": "prebuilt-audioSearch", + "api_version": "2025-05-01-preview", + "created_at": "2026-03-21T10:05:00Z", + "contents": [ + { + "markdown": "## Call Center Recording\n\n**Duration:** 2 minutes 15 seconds\n**Speakers:** 2\n\n### Transcript\n\n**Speaker 1 (Agent):** Thank you for calling Contoso support. My name is Sarah. How can I help you today?\n\n**Speaker 2 (Customer):** Hi Sarah, I'm calling about my recent order number ORD-5678. It was supposed to arrive yesterday but I haven't received it.\n\n**Speaker 1 (Agent):** I'm sorry to hear that. Let me look up your order. Can you confirm your name and email address?\n\n**Speaker 2 (Customer):** Sure, it's John Smith, john.smith@example.com.\n\n**Speaker 1 (Agent):** Thank you, John. I can see your order was shipped on March 18th. It looks like there was a delay with the carrier. 
The updated delivery estimate is March 22nd.\n\n**Speaker 2 (Customer):** That's helpful, thank you. Is there anything I can do to track it?\n\n**Speaker 1 (Agent):** Yes, I'll send you a tracking link to your email right away. Is there anything else I can help with?\n\n**Speaker 2 (Customer):** No, that's all. Thanks for your help.\n\n**Speaker 1 (Agent):** You're welcome! Have a great day.", + "fields": {} + } + ] +} diff --git a/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_image_result.json b/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_image_result.json new file mode 100644 index 0000000000..0e86ef4354 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_image_result.json @@ -0,0 +1,857 @@ +{ + "analyzerId": "prebuilt-documentSearch", + "apiVersion": "2025-11-01", + "createdAt": "2026-03-21T22:44:21Z", + "stringEncoding": "codePoint", + "warnings": [], + "contents": [ + { + "path": "input1", + "markdown": "# Contoso Q1 2025 Financial Summary\n\nTotal revenue for Q1 2025 was $42.7 million, an increase of 18% over Q1 2024.\nOperating expenses were $31.2 million. Net profit was $11.5 million. The largest\nrevenue segment was Cloud Services at $19.3 million, followed by Professional\nServices at $14.8 million and Product Licensing at $8.6 million. Headcount at end of\nQ1 was 1,247 employees across 8 offices worldwide.\n", + "fields": { + "Summary": { + "type": "string", + "valueString": "The document provides a financial summary for Contoso in Q1 2025, reporting total revenue of $42.7 million, an 18% increase from Q1 2024. Operating expenses were $31.2 million, resulting in a net profit of $11.5 million. The largest revenue segment was Cloud Services with $19.3 million, followed by Professional Services at $14.8 million and Product Licensing at $8.6 million. 
The company had 1,247 employees across 8 offices worldwide at the end of Q1.", + "spans": [ + { + "offset": 37, + "length": 77 + }, + { + "offset": 115, + "length": 80 + }, + { + "offset": 196, + "length": 77 + }, + { + "offset": 274, + "length": 84 + }, + { + "offset": 359, + "length": 50 + } + ], + "confidence": 0.592, + "source": "D(1,212.0000,334.0000,1394.0000,334.0000,1394.0000,374.0000,212.0000,374.0000);D(1,213.0000,379.0000,1398.0000,379.0000,1398.0000,422.0000,213.0000,422.0000);D(1,212.0000,423.0000,1389.0000,423.0000,1389.0000,464.0000,212.0000,464.0000);D(1,213.0000,468.0000,1453.0000,468.0000,1453.0000,510.0000,213.0000,510.0000);D(1,213.0000,512.0000,1000.0000,512.0000,1000.0000,554.0000,213.0000,554.0000)" + } + }, + "kind": "document", + "startPageNumber": 1, + "endPageNumber": 1, + "unit": "pixel", + "pages": [ + { + "pageNumber": 1, + "angle": -0.0242, + "width": 1700, + "height": 2200, + "spans": [ + { + "offset": 0, + "length": 410 + } + ], + "words": [ + { + "content": "Contoso", + "span": { + "offset": 2, + "length": 7 + }, + "confidence": 0.99, + "source": "D(1,214,222,401,222,401,274,214,273)" + }, + { + "content": "Q1", + "span": { + "offset": 10, + "length": 2 + }, + "confidence": 0.957, + "source": "D(1,414,222,473,222,473,275,414,274)" + }, + { + "content": "2025", + "span": { + "offset": 13, + "length": 4 + }, + "confidence": 0.929, + "source": "D(1,494,222,607,222,607,276,494,275)" + }, + { + "content": "Financial", + "span": { + "offset": 18, + "length": 9 + }, + "confidence": 0.975, + "source": "D(1,624,222,819,223,819,277,624,276)" + }, + { + "content": "Summary", + "span": { + "offset": 28, + "length": 7 + }, + "confidence": 0.991, + "source": "D(1,836,223,1050,225,1050,279,836,277)" + }, + { + "content": "Total", + "span": { + "offset": 37, + "length": 5 + }, + "confidence": 0.996, + "source": "D(1,212,335,287,334,288,374,212,373)" + }, + { + "content": "revenue", + "span": { + "offset": 43, + "length": 7 + }, + "confidence": 
0.994, + "source": "D(1,299,334,417,334,418,374,299,374)" + }, + { + "content": "for", + "span": { + "offset": 51, + "length": 3 + }, + "confidence": 0.994, + "source": "D(1,427,334,467,334,467,374,427,374)" + }, + { + "content": "Q1", + "span": { + "offset": 55, + "length": 2 + }, + "confidence": 0.944, + "source": "D(1,475,334,515,334,515,374,475,374)" + }, + { + "content": "2025", + "span": { + "offset": 58, + "length": 4 + }, + "confidence": 0.876, + "source": "D(1,528,334,604,334,604,374,529,374)" + }, + { + "content": "was", + "span": { + "offset": 63, + "length": 3 + }, + "confidence": 0.991, + "source": "D(1,613,334,672,334,672,374,613,374)" + }, + { + "content": "$", + "span": { + "offset": 67, + "length": 1 + }, + "confidence": 0.999, + "source": "D(1,681,334,698,334,698,374,681,374)" + }, + { + "content": "42.7", + "span": { + "offset": 68, + "length": 4 + }, + "confidence": 0.946, + "source": "D(1,700,334,765,334,765,374,700,374)" + }, + { + "content": "million", + "span": { + "offset": 73, + "length": 7 + }, + "confidence": 0.977, + "source": "D(1,775,334,867,334,867,374,776,374)" + }, + { + "content": ",", + "span": { + "offset": 80, + "length": 1 + }, + "confidence": 0.998, + "source": "D(1,870,334,877,334,877,374,870,374)" + }, + { + "content": "an", + "span": { + "offset": 82, + "length": 2 + }, + "confidence": 0.998, + "source": "D(1,888,334,922,334,922,374,888,374)" + }, + { + "content": "increase", + "span": { + "offset": 85, + "length": 8 + }, + "confidence": 0.991, + "source": "D(1,934,334,1058,335,1059,374,934,374)" + }, + { + "content": "of", + "span": { + "offset": 94, + "length": 2 + }, + "confidence": 0.982, + "source": "D(1,1069,335,1098,335,1098,374,1069,374)" + }, + { + "content": "18", + "span": { + "offset": 97, + "length": 2 + }, + "confidence": 0.963, + "source": "D(1,1108,335,1142,335,1142,374,1108,374)" + }, + { + "content": "%", + "span": { + "offset": 99, + "length": 1 + }, + "confidence": 0.998, + "source": 
"D(1,1143,335,1171,335,1171,374,1143,374)" + }, + { + "content": "over", + "span": { + "offset": 101, + "length": 4 + }, + "confidence": 0.946, + "source": "D(1,1181,335,1248,335,1248,374,1181,374)" + }, + { + "content": "Q1", + "span": { + "offset": 106, + "length": 2 + }, + "confidence": 0.875, + "source": "D(1,1256,335,1295,335,1295,374,1256,374)" + }, + { + "content": "2024", + "span": { + "offset": 109, + "length": 4 + }, + "confidence": 0.683, + "source": "D(1,1310,335,1384,335,1384,374,1310,374)" + }, + { + "content": ".", + "span": { + "offset": 113, + "length": 1 + }, + "confidence": 0.991, + "source": "D(1,1385,335,1394,335,1394,374,1385,374)" + }, + { + "content": "Operating", + "span": { + "offset": 115, + "length": 9 + }, + "confidence": 0.996, + "source": "D(1,213,380,358,380,358,422,213,422)" + }, + { + "content": "expenses", + "span": { + "offset": 125, + "length": 8 + }, + "confidence": 0.997, + "source": "D(1,369,380,513,379,513,421,369,421)" + }, + { + "content": "were", + "span": { + "offset": 134, + "length": 4 + }, + "confidence": 0.998, + "source": "D(1,521,379,595,379,595,421,521,421)" + }, + { + "content": "$", + "span": { + "offset": 139, + "length": 1 + }, + "confidence": 0.999, + "source": "D(1,603,379,620,379,620,421,603,421)" + }, + { + "content": "31.2", + "span": { + "offset": 140, + "length": 4 + }, + "confidence": 0.938, + "source": "D(1,623,379,686,379,686,421,623,421)" + }, + { + "content": "million", + "span": { + "offset": 145, + "length": 7 + }, + "confidence": 0.913, + "source": "D(1,696,379,790,379,790,421,696,421)" + }, + { + "content": ".", + "span": { + "offset": 152, + "length": 1 + }, + "confidence": 0.975, + "source": "D(1,793,379,800,379,800,421,793,421)" + }, + { + "content": "Net", + "span": { + "offset": 154, + "length": 3 + }, + "confidence": 0.976, + "source": "D(1,811,379,862,379,862,420,811,421)" + }, + { + "content": "profit", + "span": { + "offset": 158, + "length": 6 + }, + "confidence": 0.993, + "source": 
"D(1,871,379,947,379,947,420,871,420)" + }, + { + "content": "was", + "span": { + "offset": 165, + "length": 3 + }, + "confidence": 0.997, + "source": "D(1,954,379,1012,379,1012,420,953,420)" + }, + { + "content": "$", + "span": { + "offset": 169, + "length": 1 + }, + "confidence": 0.998, + "source": "D(1,1021,379,1039,379,1039,420,1021,420)" + }, + { + "content": "11.5", + "span": { + "offset": 170, + "length": 4 + }, + "confidence": 0.954, + "source": "D(1,1043,379,1106,379,1106,421,1043,420)" + }, + { + "content": "million", + "span": { + "offset": 175, + "length": 7 + }, + "confidence": 0.837, + "source": "D(1,1118,379,1208,379,1208,421,1118,421)" + }, + { + "content": ".", + "span": { + "offset": 182, + "length": 1 + }, + "confidence": 0.978, + "source": "D(1,1210,379,1217,379,1217,421,1210,421)" + }, + { + "content": "The", + "span": { + "offset": 184, + "length": 3 + }, + "confidence": 0.949, + "source": "D(1,1228,379,1285,379,1285,421,1228,421)" + }, + { + "content": "largest", + "span": { + "offset": 188, + "length": 7 + }, + "confidence": 0.978, + "source": "D(1,1295,379,1398,379,1398,421,1295,421)" + }, + { + "content": "revenue", + "span": { + "offset": 196, + "length": 7 + }, + "confidence": 0.995, + "source": "D(1,212,425,334,425,334,464,212,464)" + }, + { + "content": "segment", + "span": { + "offset": 204, + "length": 7 + }, + "confidence": 0.996, + "source": "D(1,344,425,472,424,472,464,344,464)" + }, + { + "content": "was", + "span": { + "offset": 212, + "length": 3 + }, + "confidence": 0.998, + "source": "D(1,480,424,541,424,541,464,480,464)" + }, + { + "content": "Cloud", + "span": { + "offset": 216, + "length": 5 + }, + "confidence": 0.997, + "source": "D(1,550,424,636,424,637,464,551,464)" + }, + { + "content": "Services", + "span": { + "offset": 222, + "length": 8 + }, + "confidence": 0.995, + "source": "D(1,647,424,774,424,774,464,647,464)" + }, + { + "content": "at", + "span": { + "offset": 231, + "length": 2 + }, + "confidence": 0.996, + 
"source": "D(1,784,424,812,424,812,464,784,464)" + }, + { + "content": "$", + "span": { + "offset": 234, + "length": 1 + }, + "confidence": 0.998, + "source": "D(1,820,424,837,424,837,464,820,464)" + }, + { + "content": "19.3", + "span": { + "offset": 235, + "length": 4 + }, + "confidence": 0.879, + "source": "D(1,840,424,903,423,903,463,840,464)" + }, + { + "content": "million", + "span": { + "offset": 240, + "length": 7 + }, + "confidence": 0.876, + "source": "D(1,915,423,1006,423,1006,463,915,463)" + }, + { + "content": ",", + "span": { + "offset": 247, + "length": 1 + }, + "confidence": 0.999, + "source": "D(1,1008,423,1015,423,1015,463,1008,463)" + }, + { + "content": "followed", + "span": { + "offset": 249, + "length": 8 + }, + "confidence": 0.978, + "source": "D(1,1026,423,1148,424,1148,463,1026,463)" + }, + { + "content": "by", + "span": { + "offset": 258, + "length": 2 + }, + "confidence": 0.986, + "source": "D(1,1160,424,1194,424,1194,463,1160,463)" + }, + { + "content": "Professional", + "span": { + "offset": 261, + "length": 12 + }, + "confidence": 0.965, + "source": "D(1,1204,424,1389,424,1389,463,1204,463)" + }, + { + "content": "Services", + "span": { + "offset": 274, + "length": 8 + }, + "confidence": 0.991, + "source": "D(1,213,469,341,469,341,510,213,510)" + }, + { + "content": "at", + "span": { + "offset": 283, + "length": 2 + }, + "confidence": 0.997, + "source": "D(1,352,469,380,469,380,510,352,510)" + }, + { + "content": "$", + "span": { + "offset": 286, + "length": 1 + }, + "confidence": 0.998, + "source": "D(1,388,469,405,469,405,510,388,510)" + }, + { + "content": "14.8", + "span": { + "offset": 287, + "length": 4 + }, + "confidence": 0.973, + "source": "D(1,410,469,472,469,472,510,410,510)" + }, + { + "content": "million", + "span": { + "offset": 292, + "length": 7 + }, + "confidence": 0.987, + "source": "D(1,483,469,575,469,575,510,483,510)" + }, + { + "content": "and", + "span": { + "offset": 300, + "length": 3 + }, + "confidence": 
0.999, + "source": "D(1,585,469,638,469,638,510,585,510)" + }, + { + "content": "Product", + "span": { + "offset": 304, + "length": 7 + }, + "confidence": 0.995, + "source": "D(1,652,469,765,469,765,510,652,510)" + }, + { + "content": "Licensing", + "span": { + "offset": 312, + "length": 9 + }, + "confidence": 0.993, + "source": "D(1,777,469,914,469,914,510,777,510)" + }, + { + "content": "at", + "span": { + "offset": 322, + "length": 2 + }, + "confidence": 0.998, + "source": "D(1,925,469,953,469,953,510,925,510)" + }, + { + "content": "$", + "span": { + "offset": 325, + "length": 1 + }, + "confidence": 0.998, + "source": "D(1,961,469,978,469,978,510,961,510)" + }, + { + "content": "8.6", + "span": { + "offset": 326, + "length": 3 + }, + "confidence": 0.958, + "source": "D(1,980,469,1025,469,1025,510,980,510)" + }, + { + "content": "million", + "span": { + "offset": 330, + "length": 7 + }, + "confidence": 0.908, + "source": "D(1,1036,469,1128,468,1128,510,1036,510)" + }, + { + "content": ".", + "span": { + "offset": 337, + "length": 1 + }, + "confidence": 0.987, + "source": "D(1,1130,468,1137,468,1137,510,1130,510)" + }, + { + "content": "Headcount", + "span": { + "offset": 339, + "length": 9 + }, + "confidence": 0.934, + "source": "D(1,1150,468,1310,468,1310,510,1150,510)" + }, + { + "content": "at", + "span": { + "offset": 349, + "length": 2 + }, + "confidence": 0.993, + "source": "D(1,1318,468,1348,468,1348,510,1318,510)" + }, + { + "content": "end", + "span": { + "offset": 352, + "length": 3 + }, + "confidence": 0.947, + "source": "D(1,1355,468,1410,468,1410,510,1355,510)" + }, + { + "content": "of", + "span": { + "offset": 356, + "length": 2 + }, + "confidence": 0.974, + "source": "D(1,1419,468,1453,468,1453,509,1419,509)" + }, + { + "content": "Q1", + "span": { + "offset": 359, + "length": 2 + }, + "confidence": 0.931, + "source": "D(1,213,512,252,512,252,554,213,554)" + }, + { + "content": "was", + "span": { + "offset": 362, + "length": 3 + }, + 
"confidence": 0.847, + "source": "D(1,267,512,326,512,326,554,267,554)" + }, + { + "content": "1,247", + "span": { + "offset": 366, + "length": 5 + }, + "confidence": 0.523, + "source": "D(1,338,512,419,512,419,554,338,554)" + }, + { + "content": "employees", + "span": { + "offset": 372, + "length": 9 + }, + "confidence": 0.972, + "source": "D(1,429,513,591,512,591,554,429,554)" + }, + { + "content": "across", + "span": { + "offset": 382, + "length": 6 + }, + "confidence": 0.972, + "source": "D(1,601,512,697,512,697,554,601,554)" + }, + { + "content": "8", + "span": { + "offset": 389, + "length": 1 + }, + "confidence": 0.946, + "source": "D(1,708,512,725,512,725,553,708,554)" + }, + { + "content": "offices", + "span": { + "offset": 391, + "length": 7 + }, + "confidence": 0.95, + "source": "D(1,736,512,831,512,831,553,736,553)" + }, + { + "content": "worldwide", + "span": { + "offset": 399, + "length": 9 + }, + "confidence": 0.988, + "source": "D(1,840,512,989,512,989,552,840,553)" + }, + { + "content": ".", + "span": { + "offset": 408, + "length": 1 + }, + "confidence": 0.996, + "source": "D(1,991,512,1000,512,1000,552,991,552)" + } + ], + "lines": [ + { + "content": "Contoso Q1 2025 Financial Summary", + "source": "D(1,214,221,1050,225,1050,279,213,273)", + "span": { + "offset": 2, + "length": 33 + } + }, + { + "content": "Total revenue for Q1 2025 was $42.7 million, an increase of 18% over Q1 2024.", + "source": "D(1,212,334,1394,335,1394,374,212,374)", + "span": { + "offset": 37, + "length": 77 + } + }, + { + "content": "Operating expenses were $31.2 million. Net profit was $11.5 million. 
The largest", + "source": "D(1,213,379,1398,378,1398,421,213,422)", + "span": { + "offset": 115, + "length": 80 + } + }, + { + "content": "revenue segment was Cloud Services at $19.3 million, followed by Professional", + "source": "D(1,212,424,1389,423,1389,463,212,464)", + "span": { + "offset": 196, + "length": 77 + } + }, + { + "content": "Services at $14.8 million and Product Licensing at $8.6 million. Headcount at end of", + "source": "D(1,213,469,1453,468,1453,510,213,511)", + "span": { + "offset": 274, + "length": 84 + } + }, + { + "content": "Q1 was 1,247 employees across 8 offices worldwide.", + "source": "D(1,213,512,1000,512,1000,554,213,554)", + "span": { + "offset": 359, + "length": 50 + } + } + ] + } + ], + "paragraphs": [ + { + "role": "title", + "content": "Contoso Q1 2025 Financial Summary", + "source": "D(1,214,219,1050,225,1050,279,213,273)", + "span": { + "offset": 0, + "length": 35 + } + }, + { + "content": "Total revenue for Q1 2025 was $42.7 million, an increase of 18% over Q1 2024. Operating expenses were $31.2 million. Net profit was $11.5 million. The largest revenue segment was Cloud Services at $19.3 million, followed by Professional Services at $14.8 million and Product Licensing at $8.6 million. 
Headcount at end of Q1 was 1,247 employees across 8 offices worldwide.", + "source": "D(1,212,334,1453,333,1454,553,212,554)", + "span": { + "offset": 37, + "length": 372 + } + } + ], + "sections": [ + { + "span": { + "offset": 0, + "length": 409 + }, + "elements": [ + "/paragraphs/0", + "/paragraphs/1" + ] + } + ], + "analyzerId": "prebuilt-documentSearch", + "mimeType": "image/png" + } + ] +} \ No newline at end of file diff --git a/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_invoice_result.json b/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_invoice_result.json new file mode 100644 index 0000000000..076649f0dd --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_invoice_result.json @@ -0,0 +1,114 @@ +{ + "analyzerId": "prebuilt-invoice", + "apiVersion": "2025-11-01", + "createdAt": "2026-03-21T22:44:33Z", + "stringEncoding": "codePoint", + "warnings": [], + "contents": [ + { + "markdown": "# Master Services Agreement\n\nClient: Alpine Industries Inc.\n\nContract Reference: MSA-2025-ALP-00847\n\nEffective Date: January 15, 2025\nPrepared for: Robert Chen, Chief Executive Officer, Alpine Industries Inc.\n\nAddress: 742 Evergreen Blvd, Denver, CO 80203\n\nThis Master Services Agreement (the 'Agreement') is entered into by and between Alpine Industries\nInc. (the 'Client') and TechServe Global Partners (the 'Provider'). 
This agreement governs the provision\nof managed technology services as descri", + "fields": { + "VendorName": { + "type": "string", + "valueString": "TechServe Global Partners", + "confidence": 0.71 + }, + "DueDate": { + "type": "date", + "valueDate": "2025-02-15", + "confidence": 0.793 + }, + "InvoiceDate": { + "type": "date", + "valueDate": "2025-01-15", + "confidence": 0.693 + }, + "InvoiceId": { + "type": "string", + "valueString": "INV-100", + "confidence": 0.489 + }, + "AmountDue": { + "type": "object", + "valueObject": { + "Amount": { + "type": "number", + "valueNumber": 610, + "confidence": 0.758 + }, + "CurrencyCode": { + "type": "string", + "valueString": "USD" + } + } + }, + "SubtotalAmount": { + "type": "object", + "valueObject": { + "Amount": { + "type": "number", + "valueNumber": 100, + "confidence": 0.902 + }, + "CurrencyCode": { + "type": "string", + "valueString": "USD" + } + } + }, + "LineItems": { + "type": "array", + "valueArray": [ + { + "type": "object", + "valueObject": { + "Description": { + "type": "string", + "valueString": "Consulting Services", + "confidence": 0.664 + }, + "Quantity": { + "type": "number", + "valueNumber": 2, + "confidence": 0.957 + }, + "UnitPrice": { + "type": "object", + "valueObject": { + "Amount": { + "type": "number", + "valueNumber": 30, + "confidence": 0.956 + }, + "CurrencyCode": { + "type": "string", + "valueString": "USD" + } + } + } + } + }, + { + "type": "object", + "valueObject": { + "Description": { + "type": "string", + "valueString": "Document Fee", + "confidence": 0.712 + }, + "Quantity": { + "type": "number", + "valueNumber": 3, + "confidence": 0.939 + } + } + } + ] + } + }, + "kind": "document", + "startPageNumber": 1, + "endPageNumber": 100 + } + ] +} \ No newline at end of file diff --git a/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_pdf_result.json b/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_pdf_result.json new file mode 100644 index 
0000000000..d5671616f9 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_pdf_result.json @@ -0,0 +1,23 @@ +{ + "analyzerId": "prebuilt-documentSearch", + "apiVersion": "2025-11-01", + "createdAt": "2026-03-21T22:44:09Z", + "contents": [ + { + "path": "input1", + "markdown": "# Contoso Q1 2025 Financial Summary\n\nTotal revenue for Q1 2025 was $42.7 million, an increase of 18% over Q1 2024.\nOperating expenses were $31.2 million. Net profit was $11.5 million. The largest\nrevenue segment was Cloud Services at $19.3 million, followed by Professional\nServices at $14.8 million and Product Licensing at $8.6 million. Headcount at end of\nQ1 was 1,247 employees across 8 offices worldwide.\n\n\n\n\n# Contoso Q2 2025 Financial Summary\n\nTotal revenue for Q2 2025 was $48.1 million, an increase of 22% over Q2 2024.\nOperating expenses were $33.9 million. Net profit was $14.2 million. Cloud Services\ngrew to $22.5 million, Professional Services was $15.7 million, and Product Licensing\nwas $9.9 million. The company opened a new office in Tokyo, bringing the total to 9\noffices. Headcount grew to 1,389 employees.\n\n\n\n\n## Contoso Product Roadmap 2025\n\nThree major product launches are planned for 2025: (1) Contoso CloudVault - an\nenterprise document storage solution, launching August 2025, with an expected price\nof $29.99/user/month. (2) Contoso DataPulse - a real-time analytics dashboard,\nlaunching October 2025. (3) Contoso SecureLink - a zero-trust networking product,\nlaunching December 2025. Total R&D budget for 2025 is $18.4 million.\n\n\n\n\n# Contoso Employee Satisfaction Survey Results\n\nThe annual employee satisfaction survey was completed in March 2025 with an 87%\nresponse rate. Overall satisfaction score was 4.2 out of 5.0. Work-life balance scored\n3.8/5.0. Career growth opportunities scored 3.9/5.0. Compensation satisfaction\nscored 3.6/5.0. 
The top requested improvement was 'more flexible remote work\noptions' cited by 62% of respondents. Employee retention rate for the trailing 12\nmonths was 91%.\n\n\n\n\n## Contoso Partnership Announcements\n\nContoso announced three strategic partnerships in H1 2025: (1) A joint venture with\nMeridian Technologies for AI-powered document processing, valued at $5.2 million\nover 3 years. (2) A distribution agreement with Pacific Rim Solutions covering 12\ncountries in Asia-Pacific. (3) A technology integration partnership with NovaBridge\nSystems for unified identity management. The Chief Partnership Officer, Helena\nNakagawa, stated the partnerships are expected to generate an additional $15 million\nin revenue by 2027.\n", + "fields": { + "Summary": { + "type": "string", + "valueString": "The document provides a comprehensive overview of Contoso's key business metrics and initiatives for 2025, including financial performance for Q1 and Q2 with revenue, expenses, and profit details; a product roadmap with three major launches and R&D budget; employee satisfaction survey results highlighting scores and retention; and strategic partnership announcements expected to boost future revenue.", + "confidence": 0.46 + } + }, + "kind": "document", + "startPageNumber": 1, + "endPageNumber": 5, + "mimeType": "application/pdf", + "analyzerId": "prebuilt-documentSearch" + } + ] +} diff --git a/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_video_result.json b/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_video_result.json new file mode 100644 index 0000000000..e9834fa955 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_video_result.json @@ -0,0 +1,51 @@ +{ + "id": "synthetic-video-001", + "status": "Succeeded", + "analyzer_id": "prebuilt-videoSearch", + "api_version": "2025-05-01-preview", + "created_at": "2026-03-21T10:15:00Z", + "contents": [ + { + "kind": "audioVisual", + 
"startTimeMs": 1000, + "endTimeMs": 14000, + "width": 640, + "height": 480, + "markdown": "# Video: 00:01.000 => 00:14.000\n\nTranscript\n```\nWEBVTT\n\n00:01.000 --> 00:05.000\nWelcome to the Contoso Product Demo.\n\n00:05.000 --> 00:14.000\nToday we'll be showcasing our latest cloud infrastructure management tool.\n```", + "fields": { + "Summary": { + "type": "string", + "valueString": "Introduction to the Contoso Product Demo showcasing the latest cloud infrastructure management tool." + } + } + }, + { + "kind": "audioVisual", + "startTimeMs": 15000, + "endTimeMs": 35000, + "width": 640, + "height": 480, + "markdown": "# Video: 00:15.000 => 00:35.000\n\nTranscript\n```\nWEBVTT\n\n00:15.000 --> 00:25.000\nAs you can see on the dashboard, the system provides real-time monitoring of all deployed resources.\n\n00:25.000 --> 00:35.000\nKey features include automated scaling, cost optimization, and security compliance monitoring.\n```", + "fields": { + "Summary": { + "type": "string", + "valueString": "Dashboard walkthrough covering real-time monitoring, automated scaling, cost optimization, and security compliance." + } + } + }, + { + "kind": "audioVisual", + "startTimeMs": 36000, + "endTimeMs": 42000, + "width": 640, + "height": 480, + "markdown": "# Video: 00:36.000 => 00:42.000\n\nTranscript\n```\nWEBVTT\n\n00:36.000 --> 00:42.000\nVisit contoso.com/cloud-manager to learn more and start your free trial.\n```", + "fields": { + "Summary": { + "type": "string", + "valueString": "Call to action directing viewers to contoso.com/cloud-manager for more information and a free trial." + } + } + } + ] +} diff --git a/python/packages/azure-ai-contentunderstanding/tests/cu/test_context_provider.py b/python/packages/azure-ai-contentunderstanding/tests/cu/test_context_provider.py new file mode 100644 index 0000000000..1aa22ea379 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/tests/cu/test_context_provider.py @@ -0,0 +1,2054 @@ +# Copyright (c) Microsoft. 
All rights reserved. + +from __future__ import annotations + +import asyncio +import base64 +import contextlib +import json +from typing import Any +from unittest.mock import AsyncMock, MagicMock + +from agent_framework import Content, Message, SessionContext +from agent_framework._sessions import AgentSession +from azure.ai.contentunderstanding.models import AnalysisResult + +from agent_framework_azure_ai_contentunderstanding import ( + AnalysisSection, + ContentUnderstandingContextProvider, + DocumentStatus, +) +from agent_framework_azure_ai_contentunderstanding._constants import SUPPORTED_MEDIA_TYPES + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +_SAMPLE_PDF_BYTES = b"%PDF-1.4 fake content for testing" + + +def _make_mock_poller(result: AnalysisResult) -> AsyncMock: + """Create a mock poller that returns the given result immediately.""" + poller = AsyncMock() + poller.result = AsyncMock(return_value=result) + return poller + + +def _make_slow_poller(result: AnalysisResult, delay: float = 10.0) -> AsyncMock: + """Create a mock poller that simulates a timeout then eventually returns.""" + poller = AsyncMock() + + async def slow_result() -> AnalysisResult: + await asyncio.sleep(delay) + return result + + poller.result = slow_result + return poller + + +def _make_failing_poller(error: Exception) -> AsyncMock: + """Create a mock poller that raises an exception.""" + poller = AsyncMock() + poller.result = AsyncMock(side_effect=error) + return poller + + +def _make_data_uri(data: bytes, media_type: str) -> str: + return f"data:{media_type};base64,{base64.b64encode(data).decode('ascii')}" + + +def _make_content_from_data(data: bytes, media_type: str, filename: str | None = None) -> Content: + props = {"filename": filename} if filename else None + return Content.from_data(data, media_type, additional_properties=props) + + +def 
_make_context(messages: list[Message]) -> SessionContext: + return SessionContext(input_messages=messages) + + +def _make_provider( + mock_client: AsyncMock | None = None, + **kwargs: Any, +) -> ContentUnderstandingContextProvider: + provider = ContentUnderstandingContextProvider( + endpoint="https://test.cognitiveservices.azure.com/", + credential=AsyncMock(), + **kwargs, + ) + if mock_client: + provider._client = mock_client # type: ignore[assignment] + return provider + + +def _make_mock_agent() -> MagicMock: + return MagicMock() + + +# =========================================================================== +# Test Classes +# =========================================================================== + + +class TestInit: + def test_default_values(self) -> None: + provider = ContentUnderstandingContextProvider( + endpoint="https://test.cognitiveservices.azure.com/", + credential=AsyncMock(), + ) + assert provider.analyzer_id is None + assert provider.max_wait == 5.0 + assert provider.output_sections == [AnalysisSection.MARKDOWN, AnalysisSection.FIELDS] + assert provider.source_id == "azure_ai_contentunderstanding" + + def test_custom_values(self) -> None: + provider = ContentUnderstandingContextProvider( + endpoint="https://custom.cognitiveservices.azure.com/", + credential=AsyncMock(), + analyzer_id="prebuilt-invoice", + max_wait=10.0, + output_sections=[AnalysisSection.MARKDOWN], + source_id="custom_cu", + ) + assert provider.analyzer_id == "prebuilt-invoice" + assert provider.max_wait == 10.0 + assert provider.output_sections == [AnalysisSection.MARKDOWN] + assert provider.source_id == "custom_cu" + + def test_max_wait_none(self) -> None: + provider = ContentUnderstandingContextProvider( + endpoint="https://test.cognitiveservices.azure.com/", + credential=AsyncMock(), + max_wait=None, + ) + assert provider.max_wait is None + + def test_endpoint_from_env_var(self, monkeypatch: Any) -> None: + """Endpoint can be loaded from 
AZURE_CONTENTUNDERSTANDING_ENDPOINT env var.""" + monkeypatch.setenv( + "AZURE_CONTENTUNDERSTANDING_ENDPOINT", + "https://env-test.cognitiveservices.azure.com/", + ) + provider = ContentUnderstandingContextProvider(credential=AsyncMock()) + assert provider._endpoint == "https://env-test.cognitiveservices.azure.com/" + + def test_explicit_endpoint_overrides_env_var(self, monkeypatch: Any) -> None: + """Explicit endpoint kwarg takes priority over env var.""" + monkeypatch.setenv( + "AZURE_CONTENTUNDERSTANDING_ENDPOINT", + "https://env-test.cognitiveservices.azure.com/", + ) + provider = ContentUnderstandingContextProvider( + endpoint="https://explicit.cognitiveservices.azure.com/", + credential=AsyncMock(), + ) + assert provider._endpoint == "https://explicit.cognitiveservices.azure.com/" + + def test_missing_endpoint_raises(self) -> None: + """Missing endpoint (no kwarg, no env var) raises an error.""" + import pytest as _pytest + from agent_framework.exceptions import SettingNotFoundError + + with _pytest.raises(SettingNotFoundError, match="endpoint"): + ContentUnderstandingContextProvider(credential=AsyncMock()) + + def test_missing_credential_raises(self) -> None: + """Missing credential raises ValueError.""" + import pytest as _pytest + + with _pytest.raises(ValueError, match="credential is required"): + ContentUnderstandingContextProvider( + endpoint="https://test.cognitiveservices.azure.com/", + ) + + +class TestAsyncContextManager: + async def test_aenter_returns_self(self) -> None: + provider = ContentUnderstandingContextProvider( + endpoint="https://test.cognitiveservices.azure.com/", + credential=AsyncMock(), + ) + result = await provider.__aenter__() + assert result is provider + await provider.__aexit__(None, None, None) + + async def test_aexit_closes_client(self) -> None: + provider = ContentUnderstandingContextProvider( + endpoint="https://test.cognitiveservices.azure.com/", + credential=AsyncMock(), + ) + mock_client = AsyncMock() + provider._client 
= mock_client # type: ignore[assignment] + await provider.__aexit__(None, None, None) + mock_client.close.assert_called_once() + + +class TestBeforeRunNewFile: + async def test_single_pdf_analyzed( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("What's on this invoice?"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "invoice.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # Document should be in state + assert "documents" in state + assert "invoice.pdf" in state["documents"] + assert state["documents"]["invoice.pdf"]["status"] == DocumentStatus.READY + + # Binary should be stripped from input + for m in context.input_messages: + for c in m.contents: + assert c.media_type != "application/pdf" + + # Context should have messages injected + assert len(context.context_messages) > 0 + + async def test_url_input_analyzed( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + mock_cu_client.begin_analyze = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze this document"), + Content.from_uri("https://example.com/report.pdf", media_type="application/pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # URL input should use begin_analyze + mock_cu_client.begin_analyze.assert_called_once() + assert 
"report.pdf" in state["documents"] + assert state["documents"]["report.pdf"]["status"] == DocumentStatus.READY + + async def test_text_only_skipped(self, mock_cu_client: AsyncMock) -> None: + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message(role="user", contents=[Content.from_text("What's the weather?")]) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # No CU calls + mock_cu_client.begin_analyze.assert_not_called() + mock_cu_client.begin_analyze_binary.assert_not_called() + # No documents + assert state.get("documents", {}) == {} + + +class TestBeforeRunMultiFile: + async def test_two_files_both_analyzed( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + image_analysis_result: AnalysisResult, + ) -> None: + mock_cu_client.begin_analyze_binary = AsyncMock( + side_effect=[ + _make_mock_poller(pdf_analysis_result), + _make_mock_poller(image_analysis_result), + ] + ) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("Compare these documents"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "doc1.pdf"), + _make_content_from_data(b"\x89PNG fake", "image/png", "chart.png"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + assert len(state["documents"]) == 2 + assert state["documents"]["doc1.pdf"]["status"] == DocumentStatus.READY + assert state["documents"]["chart.png"]["status"] == DocumentStatus.READY + + +class TestBeforeRunTimeout: + async def test_exceeds_max_wait_defers_to_background( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + mock_cu_client.begin_analyze_binary = 
AsyncMock(return_value=_make_slow_poller(pdf_analysis_result, delay=10.0)) + provider = _make_provider(mock_client=mock_cu_client, max_wait=0.1) + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze this"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "big_doc.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + assert state["documents"]["big_doc.pdf"]["status"] == DocumentStatus.ANALYZING + assert "big_doc.pdf" in state.get("_pending_tasks", {}) + + # Instructions should mention analyzing + assert any("being analyzed" in instr for instr in context.instructions) + + # Clean up the background task + state["_pending_tasks"]["big_doc.pdf"].cancel() + with contextlib.suppress(asyncio.CancelledError, Exception): + await state["_pending_tasks"]["big_doc.pdf"] + + +class TestBeforeRunPendingResolution: + async def test_pending_completes_on_next_turn( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + provider = _make_provider(mock_client=mock_cu_client) + + # Simulate a completed background task + async def return_result() -> AnalysisResult: + return pdf_analysis_result + + task: asyncio.Task[AnalysisResult] = asyncio.ensure_future(return_result()) + await asyncio.sleep(0.01) # Let task complete + + state: dict[str, Any] = { + "_pending_tasks": {"report.pdf": task}, + "documents": { + "report.pdf": { + "status": DocumentStatus.ANALYZING, + "filename": "report.pdf", + "media_type": "application/pdf", + "analyzer_id": "prebuilt-documentSearch", + "analyzed_at": None, + "analysis_duration_s": None, + "upload_duration_s": None, + "result": None, + "error": None, + }, + }, + } + + msg = Message(role="user", contents=[Content.from_text("Is the report ready?")]) + context = _make_context([msg]) + session = AgentSession() + + await 
provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + assert state["documents"]["report.pdf"]["status"] == DocumentStatus.READY + assert state["documents"]["report.pdf"]["result"] is not None + assert "report.pdf" not in state.get("_pending_tasks", {}) + + +class TestBeforeRunPendingFailure: + async def test_pending_task_failure_updates_state( + self, + mock_cu_client: AsyncMock, + ) -> None: + provider = _make_provider(mock_client=mock_cu_client) + + async def failing_task() -> AnalysisResult: + raise RuntimeError("CU service unavailable") + + task: asyncio.Task[AnalysisResult] = asyncio.ensure_future(failing_task()) + await asyncio.sleep(0.01) # Let task fail + + state: dict[str, Any] = { + "_pending_tasks": {"bad_doc.pdf": task}, + "documents": { + "bad_doc.pdf": { + "status": DocumentStatus.ANALYZING, + "filename": "bad_doc.pdf", + "media_type": "application/pdf", + "analyzer_id": "prebuilt-documentSearch", + "analyzed_at": None, + "analysis_duration_s": None, + "upload_duration_s": None, + "result": None, + "error": None, + }, + }, + } + + msg = Message(role="user", contents=[Content.from_text("Check status")]) + context = _make_context([msg]) + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + assert state["documents"]["bad_doc.pdf"]["status"] == DocumentStatus.FAILED + assert "CU service unavailable" in (state["documents"]["bad_doc.pdf"]["error"] or "") + + +class TestDocumentKeyDerivation: + def test_filename_from_additional_properties(self) -> None: + content = _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "my_report.pdf") + key = ContentUnderstandingContextProvider._derive_doc_key(content) + assert key == "my_report.pdf" + + def test_url_basename(self) -> None: + content = Content.from_uri("https://example.com/docs/annual_report.pdf", media_type="application/pdf") + key = 
ContentUnderstandingContextProvider._derive_doc_key(content) + assert key == "annual_report.pdf" + + def test_content_hash_fallback(self) -> None: + content = Content.from_data(_SAMPLE_PDF_BYTES, "application/pdf") + key = ContentUnderstandingContextProvider._derive_doc_key(content) + assert key.startswith("doc_") + assert len(key) == 12 # "doc_" + 8 hex chars + + +class TestSessionState: + async def test_documents_persist_across_turns( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + state: dict[str, Any] = {} + session = AgentSession() + + # Turn 1: upload + msg1 = Message( + role="user", + contents=[ + Content.from_text("Analyze this"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "doc.pdf"), + ], + ) + ctx1 = _make_context([msg1]) + await provider.before_run(agent=_make_mock_agent(), session=session, context=ctx1, state=state) + + assert "doc.pdf" in state["documents"] + + # Turn 2: follow-up (no file) + msg2 = Message(role="user", contents=[Content.from_text("What's the total?")]) + ctx2 = _make_context([msg2]) + await provider.before_run(agent=_make_mock_agent(), session=session, context=ctx2, state=state) + + # Document should still be there + assert "doc.pdf" in state["documents"] + assert state["documents"]["doc.pdf"]["status"] == DocumentStatus.READY + + +class TestListDocumentsTool: + async def test_returns_all_docs_with_status( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + state: dict[str, Any] = {} + session = AgentSession() + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze this"), + 
_make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "test.pdf"), + ], + ) + context = _make_context([msg]) + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # Find the list_documents tool + list_tool = None + for tool in context.tools: + if getattr(tool, "name", None) == "list_documents": + list_tool = tool + break + + assert list_tool is not None + result = list_tool.func() # type: ignore[union-attr] + parsed = json.loads(result) + assert len(parsed) == 1 + assert parsed[0]["name"] == "test.pdf" + assert parsed[0]["status"] == DocumentStatus.READY + + +class TestOutputFiltering: + def test_default_markdown_and_fields(self, pdf_analysis_result: AnalysisResult) -> None: + provider = _make_provider() + result = provider._extract_sections(pdf_analysis_result) + + assert "markdown" in result + assert "fields" in result + assert "Contoso" in str(result["markdown"]) + + def test_markdown_only(self, pdf_analysis_result: AnalysisResult) -> None: + provider = _make_provider(output_sections=[AnalysisSection.MARKDOWN]) + result = provider._extract_sections(pdf_analysis_result) + + assert "markdown" in result + assert "fields" not in result + + def test_fields_only(self, invoice_analysis_result: AnalysisResult) -> None: + provider = _make_provider(output_sections=[AnalysisSection.FIELDS]) + result = provider._extract_sections(invoice_analysis_result) + + assert "markdown" not in result + assert "fields" in result + fields = result["fields"] + assert isinstance(fields, dict) + assert "VendorName" in fields + + def test_field_values_extracted(self, invoice_analysis_result: AnalysisResult) -> None: + provider = _make_provider() + result = provider._extract_sections(invoice_analysis_result) + + fields = result.get("fields") + assert isinstance(fields, dict) + assert "VendorName" in fields + assert fields["VendorName"]["value"] is not None + assert fields["VendorName"]["confidence"] is not None + + def 
test_invoice_field_extraction_matches_expected(self, invoice_analysis_result: AnalysisResult) -> None: + """Full invoice field extraction should match expected JSON structure. + + This test defines the complete expected output for all fields in the + invoice fixture, making it easy to review the extraction behavior at + a glance. Confidence is only present when the CU service provides it. + """ + provider = _make_provider() + result = provider._extract_sections(invoice_analysis_result) + fields = result.get("fields") + + expected_fields = { + "VendorName": { + "type": "string", + "value": "TechServe Global Partners", + "confidence": 0.71, + }, + "DueDate": { + "type": "date", + # SDK .value returns datetime.date for date fields + "value": fields["DueDate"]["value"], # dynamic — date object + "confidence": 0.793, + }, + "InvoiceDate": { + "type": "date", + "value": fields["InvoiceDate"]["value"], + "confidence": 0.693, + }, + "InvoiceId": { + "type": "string", + "value": "INV-100", + "confidence": 0.489, + }, + "AmountDue": { + "type": "object", + # No confidence — object types don't have it + "value": { + "Amount": {"type": "number", "value": 610.0, "confidence": 0.758}, + "CurrencyCode": {"type": "string", "value": "USD"}, + }, + }, + "SubtotalAmount": { + "type": "object", + "value": { + "Amount": {"type": "number", "value": 100.0, "confidence": 0.902}, + "CurrencyCode": {"type": "string", "value": "USD"}, + }, + }, + "LineItems": { + "type": "array", + "value": [ + { + "type": "object", + "value": { + "Description": {"type": "string", "value": "Consulting Services", "confidence": 0.664}, + "Quantity": {"type": "number", "value": 2.0, "confidence": 0.957}, + "UnitPrice": { + "type": "object", + "value": { + "Amount": {"type": "number", "value": 30.0, "confidence": 0.956}, + "CurrencyCode": {"type": "string", "value": "USD"}, + }, + }, + }, + }, + { + "type": "object", + "value": { + "Description": {"type": "string", "value": "Document Fee", "confidence": 0.712}, 
+ "Quantity": {"type": "number", "value": 3.0, "confidence": 0.939}, + }, + }, + ], + }, + } + + assert fields == expected_fields + + +class TestDuplicateDocumentKey: + async def test_duplicate_filename_rejected( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """Uploading the same filename twice in the same session should reject the second.""" + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + # Turn 1: upload invoice.pdf + msg1 = Message( + role="user", + contents=[ + Content.from_text("Analyze this"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "invoice.pdf"), + ], + ) + context1 = _make_context([msg1]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context1, state=state) + assert "invoice.pdf" in state["documents"] + assert state["documents"]["invoice.pdf"]["status"] == DocumentStatus.READY + + # Turn 2: upload invoice.pdf again (different content but same filename) + msg2 = Message( + role="user", + contents=[ + Content.from_text("Analyze this too"), + _make_content_from_data(b"different-content", "application/pdf", "invoice.pdf"), + ], + ) + context2 = _make_context([msg2]) + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context2, state=state) + + # Should still have only one document, not re-analyzed + assert mock_cu_client.begin_analyze_binary.call_count == 1 + # Instructions should mention duplicate + assert any("already uploaded" in instr for instr in context2.instructions) + + async def test_duplicate_in_same_turn_rejected( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """Two files with the same filename in the same turn: first wins, second rejected.""" + mock_cu_client.begin_analyze_binary = 
AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze both"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "report.pdf"), + _make_content_from_data(b"other-content", "application/pdf", "report.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # Only analyzed once (first one wins) + assert mock_cu_client.begin_analyze_binary.call_count == 1 + assert "report.pdf" in state["documents"] + assert any("already uploaded" in instr for instr in context.instructions) + + +class TestBinaryStripping: + async def test_supported_files_stripped( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("What's in here?"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "doc.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # PDF should be stripped; text should remain + for m in context.input_messages: + for c in m.contents: + assert c.media_type != "application/pdf" + assert any(c.text and "What's in here?" 
in c.text for c in m.contents) + + async def test_unsupported_files_left_in_place(self, mock_cu_client: AsyncMock) -> None: + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("What's in this zip?"), + Content.from_data( + b"PK\x03\x04fake", + "application/zip", + additional_properties={"filename": "archive.zip"}, + ), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # Zip should NOT be stripped (unsupported) + found_zip = False + for m in context.input_messages: + for c in m.contents: + if c.media_type == "application/zip": + found_zip = True + assert found_zip + + +# Real magic-byte headers for binary sniffing tests +_MP4_MAGIC = b"\x00\x00\x00\x1cftypisom" + b"\x00" * 250 +_WAV_MAGIC = b"RIFF\x00\x00\x00\x00WAVE" + b"\x00" * 250 +_MP3_MAGIC = b"ID3\x04\x00\x00" + b"\x00" * 250 +_FLAC_MAGIC = b"fLaC\x00\x00\x00\x00" + b"\x00" * 250 +_OGG_MAGIC = b"OggS\x00\x02" + b"\x00" * 250 +_AVI_MAGIC = b"RIFF\x00\x00\x00\x00AVI " + b"\x00" * 250 +_MOV_MAGIC = b"\x00\x00\x00\x14ftypqt " + b"\x00" * 250 + + +class TestMimeSniffing: + """Tests for binary MIME sniffing via filetype when upstream MIME is unreliable.""" + + async def test_octet_stream_mp4_detected_and_stripped( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """MP4 uploaded as application/octet-stream should be sniffed, corrected, and stripped.""" + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("What's in this file?"), + _make_content_from_data(_MP4_MAGIC, "application/octet-stream", "video.mp4"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + 
session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # MP4 should be stripped from input + for m in context.input_messages: + for c in m.contents: + assert c.media_type != "application/octet-stream", "octet-stream content should be stripped" + + # CU should have been called + assert mock_cu_client.begin_analyze_binary.called + + async def test_octet_stream_wav_detected_via_sniff( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """WAV uploaded as application/octet-stream should be detected via filetype sniffing.""" + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("Transcribe"), + _make_content_from_data(_WAV_MAGIC, "application/octet-stream", "audio.wav"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # Should be detected and analyzed + assert "audio.wav" in state["documents"] + # The media_type should be corrected to audio/wav (via _MIME_ALIASES) + assert state["documents"]["audio.wav"]["media_type"] == "audio/wav" + + async def test_octet_stream_mp3_detected_via_sniff( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """MP3 uploaded as application/octet-stream should be detected as audio/mpeg.""" + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("Transcribe"), + _make_content_from_data(_MP3_MAGIC, "application/octet-stream", "song.mp3"), + ], + ) + context = _make_context([msg]) + state: dict[str, 
Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + assert "song.mp3" in state["documents"] + assert state["documents"]["song.mp3"]["media_type"] == "audio/mpeg" + + async def test_octet_stream_flac_alias_normalized( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """FLAC sniffed as audio/x-flac should be normalized to audio/flac.""" + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("Transcribe"), + _make_content_from_data(_FLAC_MAGIC, "application/octet-stream", "music.flac"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + assert "music.flac" in state["documents"] + assert state["documents"]["music.flac"]["media_type"] == "audio/flac" + + async def test_octet_stream_unknown_binary_not_stripped( + self, + mock_cu_client: AsyncMock, + ) -> None: + """Unknown binary with application/octet-stream should NOT be stripped.""" + provider = _make_provider(mock_client=mock_cu_client) + + unknown_bytes = b"\x00\x01\x02\x03random garbage" + b"\x00" * 250 + msg = Message( + role="user", + contents=[ + Content.from_text("What is this?"), + _make_content_from_data(unknown_bytes, "application/octet-stream", "mystery.bin"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # Unknown file should NOT be stripped + found_octet = False + for m in context.input_messages: + for c in m.contents: + if c.media_type == "application/octet-stream": + found_octet = True + assert 
found_octet + + async def test_missing_mime_falls_back_to_filename( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """Content with empty MIME but a .mp4 filename should be detected via mimetypes fallback.""" + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + # Use garbage binary (filetype won't detect) but filename has .mp4 + garbage = b"\x00" * 300 + content = Content.from_data(garbage, "", additional_properties={"filename": "recording.mp4"}) + msg = Message( + role="user", + contents=[Content.from_text("Analyze"), content], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # Should be detected via filename and analyzed + assert "recording.mp4" in state["documents"] + + async def test_correct_mime_not_sniffed( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """Files with correct MIME type should go through fast path without sniffing.""" + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "doc.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + assert "doc.pdf" in state["documents"] + assert state["documents"]["doc.pdf"]["media_type"] == "application/pdf" + + async def test_sniffed_video_uses_correct_analyzer( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """MP4 sniffed from 
octet-stream should use prebuilt-videoSearch analyzer."""
+        mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result))
+        provider = _make_provider(mock_client=mock_cu_client)  # analyzer_id=None → auto-detect
+
+        msg = Message(
+            role="user",
+            contents=[
+                Content.from_text("What's in this video?"),
+                _make_content_from_data(_MP4_MAGIC, "application/octet-stream", "demo.mp4"),
+            ],
+        )
+        context = _make_context([msg])
+        state: dict[str, Any] = {}
+        session = AgentSession()
+
+        await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state)
+
+        assert state["documents"]["demo.mp4"]["analyzer_id"] == "prebuilt-videoSearch"
+
+
+class TestErrorHandling:
+    async def test_cu_service_error(self, mock_cu_client: AsyncMock) -> None:
+        mock_cu_client.begin_analyze_binary = AsyncMock(
+            return_value=_make_failing_poller(RuntimeError("Service unavailable"))
+        )
+        provider = _make_provider(mock_client=mock_cu_client)
+
+        msg = Message(
+            role="user",
+            contents=[
+                Content.from_text("Analyze this"),
+                _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "error.pdf"),
+            ],
+        )
+        context = _make_context([msg])
+        state: dict[str, Any] = {}
+        session = AgentSession()
+
+        await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state)
+
+        assert state["documents"]["error.pdf"]["status"] == DocumentStatus.FAILED
+        assert "Service unavailable" in (state["documents"]["error.pdf"]["error"] or "")
+
+    async def test_client_initialized_eagerly(self) -> None:
+        """Client is created eagerly at construction and remains set after an analysis error."""
+        provider = ContentUnderstandingContextProvider(
+            endpoint="https://test.cognitiveservices.azure.com/",
+            credential=AsyncMock(),
+        )
+        assert provider._client is not None
+
+        mock_client = AsyncMock()
+        mock_client.begin_analyze_binary = AsyncMock(
+            side_effect=Exception("mock error"),
+        )
+        provider._client = mock_client  # type: 
ignore[assignment] + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze this"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "doc.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + # Client should still be set + assert provider._client is not None + + +class TestMultiModalFixtures: + def test_pdf_fixture_loads(self, pdf_analysis_result: AnalysisResult) -> None: + provider = _make_provider() + result = provider._extract_sections(pdf_analysis_result) + assert "markdown" in result + assert "Contoso" in str(result["markdown"]) + + def test_audio_fixture_loads(self, audio_analysis_result: AnalysisResult) -> None: + provider = _make_provider() + result = provider._extract_sections(audio_analysis_result) + assert "markdown" in result + assert "Call Center" in str(result["markdown"]) + + def test_video_fixture_loads(self, video_analysis_result: AnalysisResult) -> None: + provider = _make_provider() + result = provider._extract_sections(video_analysis_result) + assert "markdown" in result + # All 3 segments should be concatenated at top level (for file_search) + md = str(result["markdown"]) + assert "Contoso Product Demo" in md + assert "real-time monitoring" in md + assert "contoso.com/cloud-manager" in md + # Duration should span all segments: (42000 - 1000) / 1000 = 41.0 + assert result.get("duration_seconds") == 41.0 + # kind from first segment + assert result.get("kind") == "audioVisual" + # resolution from first segment + assert result.get("resolution") == "640x480" + # Multi-segment: fields should be in per-segment list, not merged at top level + assert "fields" not in result # no top-level fields for multi-segment + segments = result.get("segments") + assert isinstance(segments, list) + assert len(segments) == 3 + # Each segment should have its own fields and time 
range + seg0 = segments[0] + assert "fields" in seg0 + assert "Summary" in seg0["fields"] + assert seg0.get("start_time_s") == 1.0 + assert seg0.get("end_time_s") == 14.0 + seg2 = segments[2] + assert "fields" in seg2 + assert "Summary" in seg2["fields"] + assert seg2.get("start_time_s") == 36.0 + assert seg2.get("end_time_s") == 42.0 + + def test_image_fixture_loads(self, image_analysis_result: AnalysisResult) -> None: + provider = _make_provider() + result = provider._extract_sections(image_analysis_result) + assert "markdown" in result + + def test_invoice_fixture_loads(self, invoice_analysis_result: AnalysisResult) -> None: + provider = _make_provider() + result = provider._extract_sections(invoice_analysis_result) + assert "markdown" in result + assert "fields" in result + fields = result["fields"] + assert isinstance(fields, dict) + assert "VendorName" in fields + # Single-segment: should NOT have segments key + assert "segments" not in result + + +class TestFormatResult: + def test_format_includes_markdown_and_fields(self) -> None: + result: dict[str, object] = { + "markdown": "# Hello World", + "fields": {"Name": {"type": "string", "value": "Test", "confidence": 0.9}}, + } + formatted = ContentUnderstandingContextProvider._format_result("test.pdf", result) + + assert 'Document analysis of "test.pdf"' in formatted + assert "# Hello World" in formatted + assert "Extracted Fields" in formatted + assert '"Name"' in formatted + + def test_format_markdown_only(self) -> None: + result: dict[str, object] = {"markdown": "# Just Text"} + formatted = ContentUnderstandingContextProvider._format_result("doc.pdf", result) + + assert "# Just Text" in formatted + assert "Extracted Fields" not in formatted + + def test_format_multi_segment_video(self) -> None: + """Multi-segment results should format each segment with its own content + fields.""" + result: dict[str, object] = { + "kind": "audioVisual", + "duration_seconds": 41.0, + "resolution": "640x480", + "markdown": 
"scene1\n\n---\n\nscene2", # concatenated for file_search + "segments": [ + { + "start_time_s": 1.0, + "end_time_s": 14.0, + "markdown": "Welcome to the Contoso demo.", + "fields": { + "Summary": {"type": "string", "value": "Product intro"}, + "Speakers": { + "type": "object", + "value": {"count": 1, "names": ["Host"]}, + }, + }, + }, + { + "start_time_s": 15.0, + "end_time_s": 31.0, + "markdown": "Here we show real-time monitoring.", + "fields": { + "Summary": {"type": "string", "value": "Feature walkthrough"}, + "Speakers": { + "type": "object", + "value": {"count": 2, "names": ["Host", "Engineer"]}, + }, + }, + }, + ], + } + formatted = ContentUnderstandingContextProvider._format_result("demo.mp4", result) + + expected = ( + 'Video analysis of "demo.mp4":\n' + "Duration: 0:41 | Resolution: 640x480\n" + "\n### Segment 1 (0:01 - 0:14)\n" + "\n```markdown\nWelcome to the Contoso demo.\n```\n" + "\n**Fields:**\n```json\n" + "{\n" + ' "Summary": {\n' + ' "type": "string",\n' + ' "value": "Product intro"\n' + " },\n" + ' "Speakers": {\n' + ' "type": "object",\n' + ' "value": {\n' + ' "count": 1,\n' + ' "names": [\n' + ' "Host"\n' + " ]\n" + " }\n" + " }\n" + "}\n```\n" + "\n### Segment 2 (0:15 - 0:31)\n" + "\n```markdown\nHere we show real-time monitoring.\n```\n" + "\n**Fields:**\n```json\n" + "{\n" + ' "Summary": {\n' + ' "type": "string",\n' + ' "value": "Feature walkthrough"\n' + " },\n" + ' "Speakers": {\n' + ' "type": "object",\n' + ' "value": {\n' + ' "count": 2,\n' + ' "names": [\n' + ' "Host",\n' + ' "Engineer"\n' + " ]\n" + " }\n" + " }\n" + "}\n```" + ) + assert formatted == expected + + # Verify ordering: segment 1 markdown+fields appear before segment 2 + seg1_pos = formatted.index("Segment 1") + seg2_pos = formatted.index("Segment 2") + contoso_pos = formatted.index("Welcome to the Contoso demo.") + monitoring_pos = formatted.index("Here we show real-time monitoring.") + intro_pos = formatted.index("Product intro") + walkthrough_pos = 
formatted.index("Feature walkthrough") + host_only_pos = formatted.index('"count": 1') + host_engineer_pos = formatted.index('"count": 2') + assert (seg1_pos < contoso_pos < intro_pos < host_only_pos + < seg2_pos < monitoring_pos < walkthrough_pos < host_engineer_pos) + + def test_format_single_segment_no_segments_key(self) -> None: + """Single-segment results should NOT have segments key — flat format.""" + result: dict[str, object] = { + "kind": "document", + "markdown": "# Invoice content", + "fields": { + "VendorName": {"type": "string", "value": "Contoso", "confidence": 0.95}, + "ShippingAddress": { + "type": "object", + "value": {"street": "123 Main St", "city": "Redmond", "state": "WA"}, + "confidence": 0.88, + }, + }, + } + formatted = ContentUnderstandingContextProvider._format_result("invoice.pdf", result) + + expected = ( + 'Document analysis of "invoice.pdf":\n' + "\n## Content\n\n" + "```markdown\n# Invoice content\n```\n" + "\n## Extracted Fields\n\n" + "```json\n" + "{\n" + ' "VendorName": {\n' + ' "type": "string",\n' + ' "value": "Contoso",\n' + ' "confidence": 0.95\n' + " },\n" + ' "ShippingAddress": {\n' + ' "type": "object",\n' + ' "value": {\n' + ' "street": "123 Main St",\n' + ' "city": "Redmond",\n' + ' "state": "WA"\n' + " },\n" + ' "confidence": 0.88\n' + " }\n" + "}\n" + "```" + ) + assert formatted == expected + + # Verify ordering: header → markdown content → fields + header_pos = formatted.index('Document analysis of "invoice.pdf"') + content_header_pos = formatted.index("## Content") + markdown_pos = formatted.index("# Invoice content") + fields_header_pos = formatted.index("## Extracted Fields") + vendor_pos = formatted.index("Contoso") + address_pos = formatted.index("ShippingAddress") + street_pos = formatted.index("123 Main St") + assert (header_pos < content_header_pos < markdown_pos + < fields_header_pos < vendor_pos < address_pos < street_pos) + + +class TestSupportedMediaTypes: + def test_pdf_supported(self) -> None: + assert 
"application/pdf" in SUPPORTED_MEDIA_TYPES + + def test_audio_supported(self) -> None: + assert "audio/mp3" in SUPPORTED_MEDIA_TYPES + assert "audio/wav" in SUPPORTED_MEDIA_TYPES + + def test_video_supported(self) -> None: + assert "video/mp4" in SUPPORTED_MEDIA_TYPES + + def test_zip_not_supported(self) -> None: + assert "application/zip" not in SUPPORTED_MEDIA_TYPES + + +class TestAnalyzerAutoDetection: + """Verify _resolve_analyzer_id auto-selects the right analyzer by media type.""" + + def test_explicit_analyzer_always_wins(self) -> None: + provider = _make_provider(analyzer_id="prebuilt-invoice") + assert provider._resolve_analyzer_id("audio/mp3") == "prebuilt-invoice" + assert provider._resolve_analyzer_id("video/mp4") == "prebuilt-invoice" + assert provider._resolve_analyzer_id("application/pdf") == "prebuilt-invoice" + + def test_auto_detect_pdf(self) -> None: + provider = _make_provider() # analyzer_id=None + assert provider._resolve_analyzer_id("application/pdf") == "prebuilt-documentSearch" + + def test_auto_detect_image(self) -> None: + provider = _make_provider() + assert provider._resolve_analyzer_id("image/jpeg") == "prebuilt-documentSearch" + assert provider._resolve_analyzer_id("image/png") == "prebuilt-documentSearch" + + def test_auto_detect_audio(self) -> None: + provider = _make_provider() + assert provider._resolve_analyzer_id("audio/mp3") == "prebuilt-audioSearch" + assert provider._resolve_analyzer_id("audio/wav") == "prebuilt-audioSearch" + assert provider._resolve_analyzer_id("audio/mpeg") == "prebuilt-audioSearch" + + def test_auto_detect_video(self) -> None: + provider = _make_provider() + assert provider._resolve_analyzer_id("video/mp4") == "prebuilt-videoSearch" + assert provider._resolve_analyzer_id("video/webm") == "prebuilt-videoSearch" + + def test_auto_detect_unknown_falls_back_to_document(self) -> None: + provider = _make_provider() + assert provider._resolve_analyzer_id("application/octet-stream") == "prebuilt-documentSearch" + 
+ +class TestFileSearchIntegration: + _FILE_SEARCH_TOOL = {"type": "file_search", "vector_store_ids": ["vs_test123"]} + + def _make_mock_backend(self) -> AsyncMock: + """Create a mock FileSearchBackend.""" + backend = AsyncMock() + backend.upload_file = AsyncMock(return_value="file_test456") + backend.delete_file = AsyncMock() + return backend + + def _make_file_search_config(self, backend: AsyncMock | None = None) -> Any: + from agent_framework_azure_ai_contentunderstanding import FileSearchConfig + + return FileSearchConfig( + backend=backend or self._make_mock_backend(), + vector_store_id="vs_test123", + file_search_tool=self._FILE_SEARCH_TOOL, + ) + + async def test_file_search_uploads_to_vector_store( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + mock_backend = self._make_mock_backend() + config = self._make_file_search_config(mock_backend) + mock_cu_client.begin_analyze_binary = AsyncMock( + return_value=_make_mock_poller(pdf_analysis_result), + ) + provider = _make_provider( + mock_client=mock_cu_client, + file_search=config, + ) + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze this"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "doc.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run( + agent=_make_mock_agent(), + session=session, + context=context, + state=state, + ) + + # File should be uploaded via backend + mock_backend.upload_file.assert_called_once() + call_args = mock_backend.upload_file.call_args + assert call_args[0][0] == "vs_test123" # vector_store_id + assert call_args[0][1] == "doc.pdf.md" # filename + # file_search tool should be registered on context + assert self._FILE_SEARCH_TOOL in context.tools + + async def test_file_search_no_content_injection( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """When file_search is enabled, 
full content should NOT be injected into context.""" + mock_cu_client.begin_analyze_binary = AsyncMock( + return_value=_make_mock_poller(pdf_analysis_result), + ) + provider = _make_provider( + mock_client=mock_cu_client, + file_search=self._make_file_search_config(), + ) + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze this"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "doc.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run( + agent=_make_mock_agent(), + session=session, + context=context, + state=state, + ) + + # Context messages should NOT contain full document content + # (file_search handles retrieval instead) + for msgs in context.context_messages.values(): + for m in msgs: + assert "Document Content" not in m.text + + async def test_cleanup_deletes_uploaded_files( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + mock_backend = self._make_mock_backend() + config = self._make_file_search_config(mock_backend) + mock_cu_client.begin_analyze_binary = AsyncMock( + return_value=_make_mock_poller(pdf_analysis_result), + ) + provider = _make_provider( + mock_client=mock_cu_client, + file_search=config, + ) + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze this"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "doc.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run( + agent=_make_mock_agent(), + session=session, + context=context, + state=state, + ) + + # Close should clean up uploaded files (not the vector store itself) + await provider.close() + mock_backend.delete_file.assert_called_once_with("file_test456") + + async def test_no_file_search_injects_content( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """Without 
file_search, full content should be injected (default behavior).""" + mock_cu_client.begin_analyze_binary = AsyncMock( + return_value=_make_mock_poller(pdf_analysis_result), + ) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze this"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "doc.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run( + agent=_make_mock_agent(), + session=session, + context=context, + state=state, + ) + + # Without file_search, content SHOULD be injected + found_content = False + for msgs in context.context_messages.values(): + for m in msgs: + if "Document Content" in m.text or "Contoso" in m.text: + found_content = True + assert found_content + + async def test_file_search_multiple_files( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + audio_analysis_result: AnalysisResult, + ) -> None: + """Multiple files should each be uploaded to the vector store.""" + mock_backend = self._make_mock_backend() + # Return different file IDs for each upload + mock_backend.upload_file = AsyncMock(side_effect=["file_001", "file_002"]) + config = self._make_file_search_config(mock_backend) + mock_cu_client.begin_analyze_binary = AsyncMock( + side_effect=[ + _make_mock_poller(pdf_analysis_result), + _make_mock_poller(audio_analysis_result), + ], + ) + provider = _make_provider( + mock_client=mock_cu_client, + file_search=config, + ) + + msg = Message( + role="user", + contents=[ + Content.from_text("Compare these"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "doc.pdf"), + _make_content_from_data(b"\x00audio-fake", "audio/mp3", "call.mp3"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, 
state=state) + + # Two files uploaded via backend + assert mock_backend.upload_file.call_count == 2 + + async def test_file_search_skips_empty_markdown( + self, + mock_cu_client: AsyncMock, + ) -> None: + """Upload should be skipped when CU returns no markdown content.""" + mock_backend = self._make_mock_backend() + config = self._make_file_search_config(mock_backend) + + # Create a result with empty markdown + empty_result = AnalysisResult({"contents": [{"markdown": "", "fields": {}}]}) + mock_cu_client.begin_analyze_binary = AsyncMock( + return_value=_make_mock_poller(empty_result), + ) + provider = _make_provider( + mock_client=mock_cu_client, + file_search=config, + ) + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze this"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "empty.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # No file should be uploaded (empty markdown) + mock_backend.upload_file.assert_not_called() + + async def test_pending_resolution_uploads_to_vector_store( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """When a background task completes in file_search mode, content should be + uploaded to the vector store — NOT injected into context messages.""" + mock_backend = self._make_mock_backend() + config = self._make_file_search_config(mock_backend) + provider = _make_provider( + mock_client=mock_cu_client, + file_search=config, + ) + + # Simulate a completed background task + async def return_result() -> AnalysisResult: + return pdf_analysis_result + + task: asyncio.Task[AnalysisResult] = asyncio.ensure_future(return_result()) + await asyncio.sleep(0.01) + + state: dict[str, Any] = { + "_pending_tasks": {"report.pdf": task}, + "documents": { + "report.pdf": { + "status": DocumentStatus.ANALYZING, + 
"filename": "report.pdf", + "media_type": "application/pdf", + "analyzer_id": "prebuilt-documentSearch", + "analyzed_at": None, + "analysis_duration_s": None, + "upload_duration_s": None, + "result": None, + "error": None, + }, + }, + } + + msg = Message(role="user", contents=[Content.from_text("Is the report ready?")]) + context = _make_context([msg]) + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + # Document should be ready + assert state["documents"]["report.pdf"]["status"] == DocumentStatus.READY + + # Content should NOT be injected into context messages + for msgs in context.context_messages.values(): + for m in msgs: + assert "Document Content" not in m.text + + # Should be uploaded to vector store via backend + mock_backend.upload_file.assert_called_once() + + # Instructions should mention file_search, not "provided above" + assert any("file_search" in instr for instr in context.instructions) + assert not any("provided above" in instr for instr in context.instructions) + + +class TestCloseCancel: + async def test_close_cancels_pending_tasks(self) -> None: + """close() should cancel any pending background analysis tasks.""" + provider = _make_provider(mock_client=AsyncMock()) + + # Simulate a long-running pending task + async def slow() -> None: + await asyncio.sleep(100) + + task = asyncio.create_task(slow()) + provider._all_pending_tasks.append(task) + + await provider.close() + + # Allow the cancellation to propagate + with contextlib.suppress(asyncio.CancelledError): + await task + + assert task.cancelled() + assert len(provider._all_pending_tasks) == 0 + + +class TestSessionIsolation: + """Verify that per-session state (pending tasks, uploads) is isolated between sessions.""" + + async def test_background_task_isolated_per_session( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """A background task from session A must not leak 
into session B.""" + mock_cu_client.begin_analyze_binary = AsyncMock(return_value=_make_slow_poller(pdf_analysis_result, delay=10.0)) + provider = _make_provider(mock_client=mock_cu_client, max_wait=0.1) + + # Session A: upload a file that times out → defers to background + msg_a = Message( + role="user", + contents=[ + Content.from_text("Analyze this"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "report.pdf"), + ], + ) + state_a: dict[str, Any] = {} + context_a = _make_context([msg_a]) + await provider.before_run(agent=_make_mock_agent(), session=AgentSession(), context=context_a, state=state_a) + + # Session A should have a pending task + assert "report.pdf" in state_a.get("_pending_tasks", {}) + + # Session B: separate state, no pending tasks + state_b: dict[str, Any] = {} + msg_b = Message(role="user", contents=[Content.from_text("Hello")]) + context_b = _make_context([msg_b]) + await provider.before_run(agent=_make_mock_agent(), session=AgentSession(), context=context_b, state=state_b) + + # Session B must NOT see session A's pending task + assert "_pending_tasks" not in state_b or "report.pdf" not in state_b.get("_pending_tasks", {}) + # Session B must NOT have session A's documents + assert "report.pdf" not in state_b.get("documents", {}) + + # Clean up + for task in state_a.get("_pending_tasks", {}).values(): + task.cancel() + with contextlib.suppress(asyncio.CancelledError, Exception): + await task + + async def test_completed_task_resolves_in_correct_session( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """A completed background task should only inject content into its own session.""" + provider = _make_provider(mock_client=mock_cu_client) + + # Simulate completed task in session A + async def return_result() -> AnalysisResult: + return pdf_analysis_result + + task_a: asyncio.Task[AnalysisResult] = asyncio.ensure_future(return_result()) + await asyncio.sleep(0.01) + + state_a: dict[str, 
Any] = { + "_pending_tasks": {"report.pdf": task_a}, + "documents": { + "report.pdf": { + "status": DocumentStatus.ANALYZING, + "filename": "report.pdf", + "media_type": "application/pdf", + "analyzer_id": "prebuilt-documentSearch", + "analyzed_at": None, + "analysis_duration_s": None, + "upload_duration_s": None, + "result": None, + "error": None, + }, + }, + } + state_b: dict[str, Any] = {} + + # Run session A — should resolve the task + context_a = _make_context([Message(role="user", contents=[Content.from_text("Is it ready?")])]) + await provider.before_run(agent=_make_mock_agent(), session=AgentSession(), context=context_a, state=state_a) + assert state_a["documents"]["report.pdf"]["status"] == DocumentStatus.READY + + # Run session B — must NOT have any documents or resolved content + context_b = _make_context([Message(role="user", contents=[Content.from_text("Hello")])]) + await provider.before_run(agent=_make_mock_agent(), session=AgentSession(), context=context_b, state=state_b) + assert "report.pdf" not in state_b.get("documents", {}) + # Session B context should have no document-related instructions + assert not any("report.pdf" in instr for instr in context_b.instructions) + + +class TestAnalyzerAutoDetectionE2E: + """End-to-end: verify _analyze_file stores the resolved analyzer in DocumentEntry.""" + + async def test_audio_file_uses_audio_analyzer( + self, + mock_cu_client: AsyncMock, + audio_analysis_result: AnalysisResult, + ) -> None: + mock_cu_client.begin_analyze_binary = AsyncMock( + return_value=_make_mock_poller(audio_analysis_result), + ) + provider = _make_provider(mock_client=mock_cu_client) # analyzer_id=None + + msg = Message( + role="user", + contents=[ + Content.from_text("Transcribe this"), + _make_content_from_data(b"\x00audio", "audio/mp3", "call.mp3"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, 
context=context, state=state) + + assert state["documents"]["call.mp3"]["analyzer_id"] == "prebuilt-audioSearch" + # CU client should have been called with the audio analyzer + mock_cu_client.begin_analyze_binary.assert_called_once() + call_args = mock_cu_client.begin_analyze_binary.call_args + assert call_args[0][0] == "prebuilt-audioSearch" + + async def test_video_file_uses_video_analyzer( + self, + mock_cu_client: AsyncMock, + video_analysis_result: AnalysisResult, + ) -> None: + mock_cu_client.begin_analyze_binary = AsyncMock( + return_value=_make_mock_poller(video_analysis_result), + ) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze this video"), + _make_content_from_data(b"\x00video", "video/mp4", "demo.mp4"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + assert state["documents"]["demo.mp4"]["analyzer_id"] == "prebuilt-videoSearch" + call_args = mock_cu_client.begin_analyze_binary.call_args + assert call_args[0][0] == "prebuilt-videoSearch" + + async def test_pdf_file_uses_document_analyzer( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + mock_cu_client.begin_analyze_binary = AsyncMock( + return_value=_make_mock_poller(pdf_analysis_result), + ) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("Read this"), + _make_content_from_data(_SAMPLE_PDF_BYTES, "application/pdf", "report.pdf"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + assert state["documents"]["report.pdf"]["analyzer_id"] == "prebuilt-documentSearch" + call_args = 
mock_cu_client.begin_analyze_binary.call_args + assert call_args[0][0] == "prebuilt-documentSearch" + + async def test_explicit_override_ignores_media_type( + self, + mock_cu_client: AsyncMock, + audio_analysis_result: AnalysisResult, + ) -> None: + """Explicit analyzer_id should override auto-detection even for audio.""" + mock_cu_client.begin_analyze_binary = AsyncMock( + return_value=_make_mock_poller(audio_analysis_result), + ) + provider = _make_provider(mock_client=mock_cu_client, analyzer_id="prebuilt-invoice") + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze"), + _make_content_from_data(b"\x00audio", "audio/mp3", "call.mp3"), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), session=session, context=context, state=state) + + assert state["documents"]["call.mp3"]["analyzer_id"] == "prebuilt-invoice" + call_args = mock_cu_client.begin_analyze_binary.call_args + assert call_args[0][0] == "prebuilt-invoice" + + async def test_per_file_analyzer_overrides_provider_default( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """Per-file analyzer_id in additional_properties overrides provider-level default.""" + mock_cu_client.begin_analyze_binary = AsyncMock( + return_value=_make_mock_poller(pdf_analysis_result), + ) + # Provider default is prebuilt-documentSearch + provider = _make_provider( + mock_client=mock_cu_client, + analyzer_id="prebuilt-documentSearch", + ) + + msg = Message( + role="user", + contents=[ + Content.from_text("Process this invoice"), + Content.from_data( + _SAMPLE_PDF_BYTES, + "application/pdf", + # Per-file override to prebuilt-invoice + additional_properties={ + "filename": "invoice.pdf", + "analyzer_id": "prebuilt-invoice", + }, + ), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run( + 
agent=_make_mock_agent(), session=session, context=context, state=state + ) + + # Per-file override should win + assert state["documents"]["invoice.pdf"]["analyzer_id"] == "prebuilt-invoice" + call_args = mock_cu_client.begin_analyze_binary.call_args + assert call_args[0][0] == "prebuilt-invoice" + + +class TestWarningsExtraction: + """Verify that CU analysis warnings are included in extracted output.""" + + def test_warnings_included_when_present(self) -> None: + """Non-empty warnings list should appear with code/message/target (RAI warnings).""" + provider = _make_provider() + fixture = { + "contents": [ + { + "path": "input1", + "markdown": "Some content", + "kind": "document", + } + ], + "warnings": [ + { + "code": "ContentFiltered", + "message": "Content was filtered due to Responsible AI policy.", + "target": "contents/0/markdown", + }, + { + "code": "ContentFiltered", + "message": "Violence content detected and filtered.", + }, + ], + } + result_obj = AnalysisResult(fixture) + extracted = provider._extract_sections(result_obj) + assert "warnings" in extracted + warnings = extracted["warnings"] + assert isinstance(warnings, list) + assert len(warnings) == 2 + # First warning has code + message + target + assert warnings[0]["code"] == "ContentFiltered" + assert warnings[0]["message"] == "Content was filtered due to Responsible AI policy." + assert warnings[0]["target"] == "contents/0/markdown" + # Second warning has code + message but no target + assert warnings[1]["code"] == "ContentFiltered" + assert warnings[1]["message"] == "Violence content detected and filtered." 
+ assert "target" not in warnings[1] + + def test_warnings_omitted_when_empty(self, pdf_analysis_result: AnalysisResult) -> None: + """Empty/None warnings should not appear in extracted result.""" + provider = _make_provider() + extracted = provider._extract_sections(pdf_analysis_result) + assert "warnings" not in extracted + + +class TestCategoryExtraction: + """Verify that content-level category is included in extracted output.""" + + def test_category_included_single_segment(self) -> None: + """Category from classifier analyzer should appear in single-segment output.""" + provider = _make_provider() + fixture = { + "contents": [ + { + "path": "input1", + "markdown": "Contract text...", + "kind": "document", + "category": "Legal Contract", + } + ], + } + result_obj = AnalysisResult(fixture) + extracted = provider._extract_sections(result_obj) + assert extracted.get("category") == "Legal Contract" + + def test_category_in_multi_segment_video(self) -> None: + """Each segment should carry its own category in multi-segment output.""" + provider = _make_provider() + fixture = { + "contents": [ + { + "path": "input1", + "kind": "audioVisual", + "startTimeMs": 0, + "endTimeMs": 30000, + "markdown": "Opening scene with product showcase.", + "category": "ProductDemo", + "fields": { + "Summary": { + "type": "string", + "valueString": "Product demo intro", + } + }, + }, + { + "path": "input1", + "kind": "audioVisual", + "startTimeMs": 30000, + "endTimeMs": 60000, + "markdown": "Customer testimonial segment.", + "category": "Testimonial", + "fields": { + "Summary": { + "type": "string", + "valueString": "Customer feedback", + } + }, + }, + ], + } + result_obj = AnalysisResult(fixture) + extracted = provider._extract_sections(result_obj) + + # Top-level metadata + assert extracted["kind"] == "audioVisual" + assert extracted["duration_seconds"] == 60.0 + + # Segments should have per-segment category + segments = extracted["segments"] + assert isinstance(segments, list) + 
assert len(segments) == 2 + + # First segment: ProductDemo + assert segments[0]["category"] == "ProductDemo" + assert segments[0]["start_time_s"] == 0.0 + assert segments[0]["end_time_s"] == 30.0 + assert segments[0]["markdown"] == "Opening scene with product showcase." + assert "Summary" in segments[0]["fields"] + + # Second segment: Testimonial + assert segments[1]["category"] == "Testimonial" + assert segments[1]["start_time_s"] == 30.0 + assert segments[1]["end_time_s"] == 60.0 + assert segments[1]["markdown"] == "Customer testimonial segment." + + # Top-level concatenated markdown for file_search + assert "Opening scene" in extracted["markdown"] + assert "Customer testimonial" in extracted["markdown"] + + def test_category_omitted_when_none(self, pdf_analysis_result: AnalysisResult) -> None: + """No category should be in output when analyzer doesn't classify.""" + provider = _make_provider() + extracted = provider._extract_sections(pdf_analysis_result) + assert "category" not in extracted + + +class TestContentRangeSupport: + """Verify that content_range from additional_properties is passed to CU.""" + + async def test_content_range_passed_to_begin_analyze( + self, + mock_cu_client: AsyncMock, + pdf_analysis_result: AnalysisResult, + ) -> None: + """content_range in additional_properties should be forwarded to AnalysisInput.""" + from azure.ai.contentunderstanding.models import AnalysisInput + + mock_cu_client.begin_analyze = AsyncMock(return_value=_make_mock_poller(pdf_analysis_result)) + provider = _make_provider(mock_client=mock_cu_client) + + msg = Message( + role="user", + contents=[ + Content.from_text("Analyze pages 1-3"), + Content.from_uri( + "https://example.com/report.pdf", + media_type="application/pdf", + additional_properties={"filename": "report.pdf", "content_range": "1-3"}, + ), + ], + ) + context = _make_context([msg]) + state: dict[str, Any] = {} + session = AgentSession() + + await provider.before_run(agent=_make_mock_agent(), 
session=session, context=context, state=state)
+
+        # Verify begin_analyze was called with AnalysisInput containing content_range
+        mock_cu_client.begin_analyze.assert_called_once()
+        call_kwargs = mock_cu_client.begin_analyze.call_args
+        inputs_arg = call_kwargs.kwargs.get("inputs") or call_kwargs[1].get("inputs")
+        assert inputs_arg is not None
+        assert len(inputs_arg) == 1
+        assert isinstance(inputs_arg[0], AnalysisInput)
+        assert inputs_arg[0].content_range == "1-3"
+        assert inputs_arg[0].url == "https://example.com/report.pdf"
diff --git a/python/packages/azure-ai-contentunderstanding/tests/cu/test_integration.py b/python/packages/azure-ai-contentunderstanding/tests/cu/test_integration.py
new file mode 100644
index 0000000000..34c8675726
--- /dev/null
+++ b/python/packages/azure-ai-contentunderstanding/tests/cu/test_integration.py
@@ -0,0 +1,315 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Integration tests for ContentUnderstandingContextProvider.
+
+These tests require a live Azure Content Understanding endpoint.
+Set AZURE_CONTENTUNDERSTANDING_ENDPOINT to enable them.
+
+To generate fixtures for unit tests, set CU_UPDATE_FIXTURES=1 when running these
+tests; the resulting JSON fixture files will be written to tests/cu/fixtures/.
+""" + +from __future__ import annotations + +import json +import os +from pathlib import Path + +import pytest + +skip_if_cu_integration_tests_disabled = pytest.mark.skipif( + not os.environ.get("AZURE_CONTENTUNDERSTANDING_ENDPOINT"), + reason="CU integration tests disabled (AZURE_CONTENTUNDERSTANDING_ENDPOINT not set)", +) + +FIXTURES_DIR = Path(__file__).parent / "fixtures" + +# Shared sample asset — same PDF used by samples and integration tests +INVOICE_PDF_PATH = ( + Path(__file__).resolve().parents[2] / "samples" / "shared" / "sample_assets" / "invoice.pdf" +) + + +@pytest.mark.flaky +@pytest.mark.integration +@skip_if_cu_integration_tests_disabled +async def test_analyze_pdf_binary() -> None: + """Analyze a PDF via binary upload and optionally capture fixture.""" + from azure.ai.contentunderstanding.aio import ContentUnderstandingClient + from azure.identity.aio import DefaultAzureCredential + + endpoint = os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"] + analyzer_id = os.environ.get("AZURE_CONTENTUNDERSTANDING_ANALYZER_ID", "prebuilt-documentSearch") + + pdf_path = INVOICE_PDF_PATH + assert pdf_path.exists(), f"Test fixture not found: {pdf_path}" + pdf_bytes = pdf_path.read_bytes() + + async with DefaultAzureCredential() as credential, ContentUnderstandingClient(endpoint, credential) as client: + poller = await client.begin_analyze_binary( + analyzer_id, + binary_input=pdf_bytes, + content_type="application/pdf", + ) + result = await poller.result() + + assert result.contents + assert result.contents[0].markdown + assert len(result.contents[0].markdown) > 10 + assert "CONTOSO LTD." 
in result.contents[0].markdown + + # Optionally capture fixture + if os.environ.get("CU_UPDATE_FIXTURES"): + FIXTURES_DIR.mkdir(exist_ok=True) + fixture_path = FIXTURES_DIR / "analyze_pdf_result.json" + fixture_path.write_text(json.dumps(result.as_dict(), indent=2, default=str)) + + +@pytest.mark.flaky +@pytest.mark.integration +@skip_if_cu_integration_tests_disabled +async def test_before_run_e2e() -> None: + """End-to-end test: Content.from_data → before_run → state populated.""" + from agent_framework import Content, Message, SessionContext + from agent_framework._sessions import AgentSession + from azure.identity.aio import DefaultAzureCredential + + from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider + + endpoint = os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"] + + pdf_path = INVOICE_PDF_PATH + assert pdf_path.exists(), f"Test fixture not found: {pdf_path}" + pdf_bytes = pdf_path.read_bytes() + + async with DefaultAzureCredential() as credential: + cu = ContentUnderstandingContextProvider( + endpoint=endpoint, + credential=credential, + max_wait=None, # wait until analysis completes (no background deferral) + ) + async with cu: + msg = Message( + role="user", + contents=[ + Content.from_text("What's in this document?"), + Content.from_data( + pdf_bytes, + "application/pdf", + additional_properties={"filename": "invoice.pdf"}, + ), + ], + ) + context = SessionContext(input_messages=[msg]) + state: dict[str, object] = {} + session = AgentSession() + + from unittest.mock import MagicMock + + await cu.before_run(agent=MagicMock(), session=session, context=context, state=state) + + docs = state.get("documents", {}) + assert isinstance(docs, dict) + assert "invoice.pdf" in docs + doc_entry = docs["invoice.pdf"] + assert doc_entry["status"] == "ready" + assert doc_entry["result"] is not None + assert doc_entry["result"].get("markdown") + assert len(doc_entry["result"]["markdown"]) > 10 + assert "CONTOSO LTD." 
in doc_entry["result"]["markdown"] + + +# Raw GitHub URL for a public invoice PDF from the CU samples repo +_INVOICE_PDF_URL = ( + "https://raw.githubusercontent.com/Azure-Samples/" + "azure-ai-content-understanding-assets/main/document/invoice.pdf" +) + + +@pytest.mark.flaky +@pytest.mark.integration +@skip_if_cu_integration_tests_disabled +async def test_before_run_uri_content() -> None: + """End-to-end test: Content.from_uri with an external URL → before_run → state populated. + + Verifies that CU can analyze a file referenced by URL (not base64 data). + Uses a public invoice PDF from the Azure CU samples repository. + """ + from agent_framework import Content, Message, SessionContext + from agent_framework._sessions import AgentSession + from azure.identity.aio import DefaultAzureCredential + + from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider + + endpoint = os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"] + + async with DefaultAzureCredential() as credential: + cu = ContentUnderstandingContextProvider( + endpoint=endpoint, + credential=credential, + max_wait=None, # wait until analysis completes (no background deferral) + ) + async with cu: + msg = Message( + role="user", + contents=[ + Content.from_text("What's on this invoice?"), + Content.from_uri( + uri=_INVOICE_PDF_URL, + media_type="application/pdf", + additional_properties={"filename": "invoice.pdf"}, + ), + ], + ) + context = SessionContext(input_messages=[msg]) + state: dict[str, object] = {} + session = AgentSession() + + from unittest.mock import MagicMock + + await cu.before_run(agent=MagicMock(), session=session, context=context, state=state) + + docs = state.get("documents", {}) + assert isinstance(docs, dict) + assert "invoice.pdf" in docs + + doc_entry = docs["invoice.pdf"] + assert doc_entry["status"] == "ready" + assert doc_entry["result"] is not None + assert doc_entry["result"].get("markdown") + assert len(doc_entry["result"]["markdown"]) > 10 + 
assert "CONTOSO LTD." in doc_entry["result"]["markdown"] + + +@pytest.mark.flaky +@pytest.mark.integration +@skip_if_cu_integration_tests_disabled +async def test_before_run_data_uri_content() -> None: + """End-to-end test: Content.from_uri with a base64 data URI → before_run → state populated. + + Verifies that CU can analyze a file embedded as a data URI (data:application/pdf;base64,...). + This tests the data URI path: from_uri with "data:" prefix → type="data" → begin_analyze_binary. + """ + import base64 + + from agent_framework import Content, Message, SessionContext + from agent_framework._sessions import AgentSession + from azure.identity.aio import DefaultAzureCredential + + from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider + + endpoint = os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"] + + pdf_path = INVOICE_PDF_PATH + assert pdf_path.exists(), f"Test fixture not found: {pdf_path}" + pdf_bytes = pdf_path.read_bytes() + b64 = base64.b64encode(pdf_bytes).decode("ascii") + data_uri = f"data:application/pdf;base64,{b64}" + + async with DefaultAzureCredential() as credential: + cu = ContentUnderstandingContextProvider( + endpoint=endpoint, + credential=credential, + max_wait=None, # wait until analysis completes + ) + async with cu: + msg = Message( + role="user", + contents=[ + Content.from_text("What's on this invoice?"), + Content.from_uri( + uri=data_uri, + media_type="application/pdf", + additional_properties={"filename": "invoice_b64.pdf"}, + ), + ], + ) + context = SessionContext(input_messages=[msg]) + state: dict[str, object] = {} + session = AgentSession() + + from unittest.mock import MagicMock + + await cu.before_run(agent=MagicMock(), session=session, context=context, state=state) + + docs = state.get("documents", {}) + assert isinstance(docs, dict) + assert "invoice_b64.pdf" in docs + + doc_entry = docs["invoice_b64.pdf"] + assert doc_entry["status"] == "ready" + assert doc_entry["result"] is not None 
+ assert doc_entry["result"].get("markdown") + assert len(doc_entry["result"]["markdown"]) > 10 + assert "CONTOSO LTD." in doc_entry["result"]["markdown"] + + +@pytest.mark.flaky +@pytest.mark.integration +@skip_if_cu_integration_tests_disabled +async def test_before_run_background_analysis() -> None: + """End-to-end test: max_wait timeout → background analysis → resolved on next turn. + + Uses a short max_wait (0.5s) so CU analysis is deferred to background. + Then waits for analysis to complete and calls before_run again to verify + the background task resolves and the document becomes ready. + """ + import asyncio + + from agent_framework import Content, Message, SessionContext + from agent_framework._sessions import AgentSession + from azure.identity.aio import DefaultAzureCredential + + from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider + + endpoint = os.environ["AZURE_CONTENTUNDERSTANDING_ENDPOINT"] + + async with DefaultAzureCredential() as credential: + cu = ContentUnderstandingContextProvider( + endpoint=endpoint, + credential=credential, + max_wait=0.5, # short timeout to force background deferral + ) + async with cu: + # Turn 1: upload file — should time out and defer to background + msg = Message( + role="user", + contents=[ + Content.from_text("What's on this invoice?"), + Content.from_uri( + uri=_INVOICE_PDF_URL, + media_type="application/pdf", + additional_properties={"filename": "invoice.pdf"}, + ), + ], + ) + context = SessionContext(input_messages=[msg]) + state: dict[str, object] = {} + session = AgentSession() + + from unittest.mock import MagicMock + + await cu.before_run(agent=MagicMock(), session=session, context=context, state=state) + + docs = state.get("documents", {}) + assert isinstance(docs, dict) + assert "invoice.pdf" in docs + assert docs["invoice.pdf"]["status"] == "analyzing", ( + f"Expected 'analyzing' but got '{docs['invoice.pdf']['status']}' — " + "CU responded too fast for the 0.5s 
timeout" + ) + assert docs["invoice.pdf"]["result"] is None + + # Turn 2: poll with no new files until the background task resolves + # (bounded wait instead of a fixed sleep, to reduce flakiness) + msg2 = Message(role="user", contents=[Content.from_text("Is it ready?")]) + for _ in range(60): + await asyncio.sleep(1) + context2 = SessionContext(input_messages=[msg2]) + await cu.before_run(agent=MagicMock(), session=session, context=context2, state=state) + if docs["invoice.pdf"]["status"] == "ready": + break + + assert docs["invoice.pdf"]["status"] == "ready" + assert docs["invoice.pdf"]["result"] is not None + assert docs["invoice.pdf"]["result"].get("markdown") + assert "CONTOSO LTD." in docs["invoice.pdf"]["result"]["markdown"] diff --git a/python/packages/azure-ai-contentunderstanding/tests/cu/test_models.py b/python/packages/azure-ai-contentunderstanding/tests/cu/test_models.py new file mode 100644 index 0000000000..79c4fffd94 --- /dev/null +++ b/python/packages/azure-ai-contentunderstanding/tests/cu/test_models.py @@ -0,0 +1,81 @@ +# Copyright (c) Microsoft. All rights reserved.
+ +from __future__ import annotations + +from unittest.mock import AsyncMock + +from agent_framework_azure_ai_contentunderstanding._models import ( + AnalysisSection, + DocumentEntry, + DocumentStatus, + FileSearchConfig, +) + + +class TestAnalysisSection: + def test_values(self) -> None: + assert AnalysisSection.MARKDOWN == "markdown" + assert AnalysisSection.FIELDS == "fields" + + def test_is_string(self) -> None: + assert isinstance(AnalysisSection.MARKDOWN, str) + assert isinstance(AnalysisSection.FIELDS, str) + + def test_members(self) -> None: + assert len(AnalysisSection) == 2 + + +class TestDocumentEntry: + def test_construction(self) -> None: + entry: DocumentEntry = { + "status": DocumentStatus.READY, + "filename": "invoice.pdf", + "media_type": "application/pdf", + "analyzer_id": "prebuilt-documentSearch", + "analyzed_at": "2026-01-01T00:00:00+00:00", + "analysis_duration_s": 1.23, + "upload_duration_s": None, + "result": {"markdown": "# Title"}, + "error": None, + } + assert entry["status"] == DocumentStatus.READY + assert entry["filename"] == "invoice.pdf" + assert entry["analyzer_id"] == "prebuilt-documentSearch" + assert entry["analysis_duration_s"] == 1.23 + assert entry["upload_duration_s"] is None + + def test_failed_entry(self) -> None: + entry: DocumentEntry = { + "status": DocumentStatus.FAILED, + "filename": "bad.pdf", + "media_type": "application/pdf", + "analyzer_id": "prebuilt-documentSearch", + "analyzed_at": "2026-01-01T00:00:00+00:00", + "analysis_duration_s": 0.5, + "upload_duration_s": None, + "result": None, + "error": "Service unavailable", + } + assert entry["status"] == DocumentStatus.FAILED + assert entry["error"] == "Service unavailable" + assert entry["result"] is None + + +class TestFileSearchConfig: + def test_required_fields(self) -> None: + backend = AsyncMock() + tool = {"type": "file_search", "vector_store_ids": ["vs_123"]} + config = FileSearchConfig(backend=backend, vector_store_id="vs_123", file_search_tool=tool) + 
assert config.backend is backend + assert config.vector_store_id == "vs_123" + assert config.file_search_tool is tool + + def test_from_openai_factory(self) -> None: + from agent_framework_azure_ai_contentunderstanding._file_search import OpenAIFileSearchBackend + + client = AsyncMock() + tool = {"type": "file_search", "vector_store_ids": ["vs_abc"]} + config = FileSearchConfig.from_openai(client, vector_store_id="vs_abc", file_search_tool=tool) + assert isinstance(config.backend, OpenAIFileSearchBackend) + assert config.vector_store_id == "vs_abc" + assert config.file_search_tool is tool diff --git a/python/pyproject.toml b/python/pyproject.toml index c19a6024d0..af98bb5fa1 100644 --- a/python/pyproject.toml +++ b/python/pyproject.toml @@ -87,6 +87,7 @@ agent-framework-openai = { workspace = true } agent-framework-purview = { workspace = true } agent-framework-redis = { workspace = true } agent-framework-github-copilot = { workspace = true } +agent-framework-azure-ai-contentunderstanding = { workspace = true } agent-framework-claude = { workspace = true } agent-framework-orchestrations = { workspace = true } litellm = { url = "https://files.pythonhosted.org/packages/57/77/0c6eca2cb049793ddf8ce9cdcd5123a35666c4962514788c4fc90edf1d3b/litellm-1.82.1-py3-none-any.whl" } @@ -187,6 +188,7 @@ executionEnvironments = [ { root = "packages/ag-ui/tests", reportPrivateUsage = "none" }, { root = "packages/anthropic/tests", reportPrivateUsage = "none" }, { root = "packages/azure-ai-search/tests", reportPrivateUsage = "none" }, + { root = "packages/azure-ai-contentunderstanding/tests", reportPrivateUsage = "none" }, { root = "packages/azure-ai/tests", reportPrivateUsage = "none" }, { root = "packages/azure-cosmos/tests", reportPrivateUsage = "none" }, { root = "packages/azurefunctions/tests", reportPrivateUsage = "none" }, diff --git a/python/uv.lock b/python/uv.lock index 317113161f..e08919b9ee 100644 --- a/python/uv.lock +++ b/python/uv.lock @@ -31,6 +31,7 @@ members = [ 
"agent-framework-ag-ui", "agent-framework-anthropic", "agent-framework-azure-ai", + "agent-framework-azure-ai-contentunderstanding", "agent-framework-azure-ai-search", "agent-framework-azure-cosmos", "agent-framework-azurefunctions", @@ -228,6 +229,25 @@ requires-dist = [ { name = "azure-identity", specifier = ">=1,<2" }, ] +[[package]] +name = "agent-framework-azure-ai-contentunderstanding" +version = "1.0.0b260401" +source = { editable = "packages/azure-ai-contentunderstanding" } +dependencies = [ + { name = "agent-framework-core", marker = "sys_platform == 'darwin' or sys_platform == 'linux' or sys_platform == 'win32'" }, + { name = "aiohttp", marker = "sys_platform == 'darwin' or sys_platform == 'linux' or sys_platform == 'win32'" }, + { name = "azure-ai-contentunderstanding", marker = "sys_platform == 'darwin' or sys_platform == 'linux' or sys_platform == 'win32'" }, + { name = "filetype", marker = "sys_platform == 'darwin' or sys_platform == 'linux' or sys_platform == 'win32'" }, +] + +[package.metadata] +requires-dist = [ + { name = "agent-framework-core", editable = "packages/core" }, + { name = "aiohttp", specifier = ">=3.9,<4" }, + { name = "azure-ai-contentunderstanding", specifier = ">=1.0.0,<1.1" }, + { name = "filetype", specifier = ">=1.2,<2" }, +] + [[package]] name = "agent-framework-azure-ai-search" version = "1.0.0b260319" @@ -1036,6 +1056,20 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/6d/6d/15070d23d7a94833a210da09d5d7ed3c24838bb84f0463895e5d159f1695/azure_ai_agents-1.2.0b5-py3-none-any.whl", hash = "sha256:257d0d24a6bf13eed4819cfa5c12fb222e5908deafb3cbfd5711d3a511cc4e88", size = 217948, upload-time = "2025-09-30T01:55:04.155Z" }, ] +[[package]] +name = "azure-ai-contentunderstanding" +version = "1.0.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "azure-core", marker = "sys_platform == 'darwin' or sys_platform == 'linux' or sys_platform == 'win32'" }, + { name = "isodate", marker = 
"sys_platform == 'darwin' or sys_platform == 'linux' or sys_platform == 'win32'" }, + { name = "typing-extensions", marker = "sys_platform == 'darwin' or sys_platform == 'linux' or sys_platform == 'win32'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/3d/97/6696d3fecb5650213c4b29dd45a306cc1da954e70e168605a5d372c51c3e/azure_ai_contentunderstanding-1.0.1.tar.gz", hash = "sha256:f653ea85a73df7d377ab55e39d7f02e271c66765f5fa5a3a56b59798bcb01e2c", size = 214634, upload-time = "2026-03-10T02:01:20.737Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ef/f4/bb26c5b347f18fc85a066b4360a93204466ef7026d28585f3bf77c1a73ed/azure_ai_contentunderstanding-1.0.1-py3-none-any.whl", hash = "sha256:8d34246482691229ef75fe25f18c066d5f6adfe03b638c47f9b784c2992e6611", size = 101275, upload-time = "2026-03-10T02:01:22.181Z" }, +] + [[package]] name = "azure-ai-inference" version = "1.0.0b9" @@ -2059,6 +2093,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/a4/a5/842ae8f0c08b61d6484b52f99a03510a3a72d23141942d216ebe81fefbce/filelock-3.25.2-py3-none-any.whl", hash = "sha256:ca8afb0da15f229774c9ad1b455ed96e85a81373065fb10446672f64444ddf70", size = 26759, upload-time = "2026-03-11T20:45:37.437Z" }, ] +[[package]] +name = "filetype" +version = "1.2.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/bb/29/745f7d30d47fe0f251d3ad3dc2978a23141917661998763bebb6da007eb1/filetype-1.2.0.tar.gz", hash = "sha256:66b56cd6474bf41d8c54660347d37afcc3f7d1970648de365c102ef77548aadb", size = 998020, upload-time = "2022-11-02T17:34:04.141Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/18/79/1b8fa1bb3568781e84c9200f951c735f3f157429f44be0495da55894d620/filetype-1.2.0-py2.py3-none-any.whl", hash = "sha256:7ce71b6880181241cf7ac8697a2f1eb6a8bd9b429f7ad6d27b8db9ba5f1c2d25", size = 19970, upload-time = "2022-11-02T17:34:01.425Z" }, +] + [[package]] name = "flask" version = "3.1.3"