Skip to content

aws-samples/intelligent-document-processing-with-amazon-bedrock

Intelligent Document Processing with Generative AI

πŸš€ Extract information from unstructured documents at scale with Amazon Bedrock

🌎 Open-source asset published at aws-samples GitHub

media/diagram.png

Converting documents into structured databases is a recurring business need. Common use cases include creating product feature tables from descriptions, extracting metadata from legal contracts, and analyzing customer reviews.

This repository provides an AWS CDK solution for intelligent document processing (IDP) using generative AI.

✨ Key features:

  • Extract structured information:
    • Well-defined entities (names, titles, etc.)
    • Numeric scores (sentiment, urgency, etc.)
    • Free-form content (summaries, responses, etc.)
  • Simply describe attributes to extract without data annotation or model training
  • Leverage Amazon Bedrock Data Automation and multi-modal LLMs (including VLMs for complex diagrams)
  • Process PDFs, MS Office files, images, and text via Python API or web interface
  • Deploy as MCP server to equip AI agents with document processing capabilities

See the Security section below before deployment.

Contents

πŸ“Ή Demo

API Example

See the demo notebook for complete implementation:

docs = ['doc1', 'doc2']

features = [
    {"name": "delay", "description": "delay of the shipment in days"},
    {"name": "shipment_id", "description": "unique shipment identifier"},
    {"name": "summary", "description": "one-sentence summary of the doc"},
]

run_idp_bedrock_api(documents=docs, features=features)
# [{'delay': 2, 'shipment_id': '123890', 'summary': 'summary1'},
#  {'delay': 3, 'shipment_id': '678623', 'summary': 'summary2'}]

Web Interface

idp_demo.mp4

πŸ—οΈ Architecture

media/architecture.png

πŸ”§ Deployment

Deploy to your AWS account using a local IDE or SageMaker Notebook instance.

Recommendation: Use SageMaker (ml.t3.large) to avoid local setup. Ensure the IAM role has CloudFormation deployment permissions.

1. Clone Repository

git clone https://github.com/aws-samples/intelligent-document-processing-with-amazon-bedrock.git
cd intelligent-document-processing-with-amazon-bedrock

2. Install Prerequisites

SageMaker Notebook:

sh install_deps.sh

Local Development: Ensure you have:

3. Setup Environment

sh install_env.sh
source .venv/bin/activate

4. Configure Stack

Copy config-example.yml to config.yml and customize the settings:

stack_name: idp-test  # Stack name and resource prefix (<16 chars, no "aws" prefix)
# ... other settings
frontend:
  deploy_ecs: True  # Deploy web interface

Add your email to the Cognito users list in the authentication section.

5. Enable Bedrock Models

  1. Open AWS Bedrock console in your target region
  2. Navigate to "Model Access"
  3. Request access for models specified in config.yml

6. Bootstrap CDK

cdk bootstrap --profile [PROFILE_NAME]

7. Deploy Stack

Ensure Docker is running, then:

cdk deploy --profile [PROFILE_NAME]

Cleanup

cdk destroy --profile [PROFILE_NAME]

Or delete the CloudFormation stack from AWS console.

Troubleshooting

Permission Issues CDK deployment requires near-admin permissions. See minimal required permissions.

S3 Bucket Deletion Empty the S3 bucket manually before stack deletion if it contains documents.

Python Path Error If you see /bin/sh: python3: command not found, update the Python path in cdk.json.

πŸ’» Usage

We offer three ways to interact with the IDP solution, including API calls, web application, and MCP server.

1. Python API

Follow the demo notebook to process documents programmatically:

  • Provide input documents
  • Define attributes to extract

2. Web Application

Access:

  • URL appears in CDK deployment output as "CloudfrontDistributionName"
  • Or find the CloudFront distribution domain in AWS console

Login:

  • Username: Your email from config.yml
  • Password: Temporary password sent to your email after deployment

Local testing:

cd src/ecs
# Set STACK_NAME in .env file
# Configure AWS credentials in .env or export AWS_PROFILE
uv venv && source .venv/bin/activate
uv sync --extra dev
streamlit run src/Home.py

Access the UI at http://localhost:8501

3. MCP Integration

⚠️ SECURITY NOTICE: The MCP server package is NOT available on PyPI. Only use the local installation methods described below. Any package named idp-bedrock-mcp-server on PyPI is not official and may be malicious.

Deploy MCP servers to expose document processing as standardized tools for AI agents.

Two Options Available:

Local Stdio Server (recommended for development)

  • Supports local file upload
  • Uses your AWS credentials
  • Easy installation via local deployment script
cd mcp/local_server/
sh deploy_stdio_server.sh

Bedrock AgentCore Server (recommended for production)

  • Remote hosted service
  • Cognito authentication
  • Scalable infrastructure
cd mcp/bedrock_server/
python deploy_idp_bedrock_mcp.py

Available Tools:

  • extract_document_attributes: Process documents with custom attributes
  • get_extraction_status: Check job status
  • list_supported_models: Get available Bedrock models
  • get_bucket_info: Get S3 bucket details

See detailed documentation:

πŸ‘₯ Team

Core Team:

image image
Nikita Kozodoi Nuno Castro

Contributors:

image image image image image image image image
Romain Besombes Zainab Afolabi Egor Krasheninnikov Huong Vu Aiham Taleb Elizaveta Zinovyeva Babs Khalidson Ennio Pastore

Acknowledgements:

🀝 Contributing

We welcome contributions to improve this project!

In order to keep coding standards and formatting consistent, we use pre-commit. This can be run from the terminal via uv run pre-commit run -a.

See CONTRIBUTING for more information.

πŸ”’οΈ Security

Note: this asset represents a proof-of-value for the services included and is not intended as a production-ready solution. The asset is not scoped for handling regulated and/or PII data. You must determine how the AWS Shared Responsibility applies to their specific use case and implement the needed controls to achieve their desired security outcomes. AWS offers a broad set of security tools and configurations to enable out customers.

  • Network & Delivery:
    • Amazon CloudFront:
      • Use geography-aware rules to block or allow access to CloudFront distributions where required.
      • Use AWS WAF on public CloudFront distributions.
      • Ensure that solution CloudFront distributions use a security policy with minimum TLSv1.1 or TLSv1.2 and appropriate security ciphers for HTTPS viewer connections. Currently, the CloudFront distribution allows for SSLv3 or TLSv1 for HTTPS viewer connections and uses SSLv3 or TLSv1 for communication to the origin.
    • Amazon API Gateway:
      • Activate request validation on API Gateway endpoints to do first-pass input validation.
      • Use AWS WAF on public-facing API Gateway Endpoints.
  • AI & Data:
    • Amazon Bedrock:
      • Enable model invocation logging and set alerts to ensure adherence to any responsible AI policies. Model invocation logging is disabled by default. See Bedrock model invocation logging documentation
      • Consider enabling Bedrock Guardrails to add baseline protections against analyzing documents or extracting attributes covering certain protected topics; as well as detecting and masking PII data in the user-uploaded inputs.
    • Note that the solution is not scoped for processing regulated data.
  • Security & Compliance:
    • Amazon Cognito:
      • Implement multi-factor authentication (MFA) in each Cognito User Pool.
      • Consider implementing AdvanceSecurityMode to ENFORCE in Cognito User Pools.
    • Amazon KMS:
      • Implement KMS key rotation for regulatory compliance or other specific cases.
      • Configure, monitor, and alert on KMS events according to lifecycle policies.
  • Serverless:
    • AWS Lambda:
      • Periodically scan all AWS Lambda container images for vulnerabilities according to lifecycle policies. AWS Inspector can be used for that.

πŸ“ License

Licensed under the MIT-0 License. See the LICENSE file.

About

πŸš€ Extract information from your documents with Generative AI

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 11