
Conversation

mochow13 (Contributor) commented Dec 7, 2025

Closes #3621

@DouweM DouweM changed the title #3621 - Pass s3:// file URLs directly to API in BedrockConverseModel Pass s3:// file URLs directly to API in BedrockConverseModel Dec 9, 2025
```diff
  format = item.media_type.split('/')[1]
  assert format in ('jpeg', 'png', 'gif', 'webp'), f'Unsupported image format: {format}'
- image: ImageBlockTypeDef = {'format': format, 'source': {'bytes': downloaded_item['data']}}
+ image: ImageBlockTypeDef = {'format': format, 'source': cast(Any, source)}
```
DouweM (Collaborator):

Instead of casting this to `Any`, can we fix the `source` type hint to be `DocumentSourceTypeDef`?

mochow13 (Contributor, Author):

Updated.

```diff
  'name': name,
  'format': item.format,
- 'source': {'bytes': downloaded_item['data']},
+ 'source': cast(Any, source),
```
DouweM (Collaborator):

Same as above.

mochow13 (Contributor, Author):

Updated.

```python
"""Test that s3:// image URLs are passed directly to Bedrock API without downloading."""
m = BedrockConverseModel('us.amazon.nova-pro-v1:0', provider=bedrock_provider)
agent = Agent(m, system_prompt='You are a helpful chatbot.')
image_url = ImageUrl(url='s3://my-bucket/images/test-image.jpg', media_type='image/jpeg')
```
DouweM (Collaborator):

I'm guessing these are not real files in S3 and the cassettes were manually modified? That'll get us into trouble if we re-record the cassettes :) So I'd rather either use a real S3 URL (which may be hard), or directly test the result of `_map_messages` as we already do in other tests.

mochow13 (Contributor, Author):

Yeah, the cassettes were manually created (with the help of AI, of course). I have updated the tests to test the `_map_messages` result directly. Not sure whether I should update the cassettes.

DouweM (Collaborator):

I think the cassettes can be deleted then.

```python
format = downloaded_item['data_type']
source: dict[str, Any]
if item.url.startswith('s3://'):
    source = {'s3Location': {'uri': item.url}}
```
DouweM (Collaborator):

There's also a `bucketOwner` field that users may want to set. Maybe we can tell them to encode it as a query param on the URL, and parse it out here?

mochow13 (Contributor, Author):

Do you mean something like `s3://my-bucket/key?bucketOwner=owner`?

DouweM (Collaborator):

Yep, that's what I was thinking.
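A minimal sketch of that query-param convention using only the stdlib (`parse_s3_location` is a hypothetical helper name; the merged implementation may differ):

```python
from typing import Any
from urllib.parse import parse_qs, urlsplit, urlunsplit


def parse_s3_location(url: str) -> dict[str, Any]:
    """Turn an s3:// URL into a Bedrock-style s3Location dict.

    Sketch of the convention discussed above: a `bucketOwner` query
    parameter is stripped from the URL and moved into the s3Location's
    `bucketOwner` field; the remaining URL becomes the `uri`.
    """
    parts = urlsplit(url)
    location: dict[str, Any] = {
        'uri': urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))
    }
    owner = parse_qs(parts.query).get('bucketOwner')
    if owner:
        location['bucketOwner'] = owner[0]
    return location
```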

```python
if item.url.startswith('s3://'):
    source = {'s3Location': {'uri': item.url}}
else:
    downloaded_item = await download_item(item, data_format='bytes', type_format='extension')
```
DouweM (Collaborator):

`download_item` currently has logic gating for `gs://` URLs; let's check `s3://` URLs there as well.

mochow13 (Contributor, Author):

It seems the existing code in `download_item` checks for `gs://` and YouTube URLs:

```python
if item.url.startswith('gs://'):
    raise UserError('Downloading from protocol "gs://" is not supported.')
elif isinstance(item, VideoUrl) and item.is_youtube:
    raise UserError('Downloading YouTube videos is not supported.')
```

What check do you mean for `s3://` here?

DouweM (Collaborator):

The same check, raising an error saying that `download_item` does not support `s3://` URLs.
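The guard being asked for could look like this (a self-contained sketch; `UserError` here is a local stand-in for pydantic-ai's exception, and the exact wording in the PR may differ):

```python
class UserError(Exception):
    """Local stand-in for pydantic-ai's UserError, keeping the sketch self-contained."""


def check_url_downloadable(url: str) -> None:
    # Mirror the existing gs:// gate in download_item, extended with the
    # s3:// check requested above: these schemes can't be fetched over HTTP.
    if url.startswith('gs://'):
        raise UserError('Downloading from protocol "gs://" is not supported.')
    if url.startswith('s3://'):
        raise UserError('Downloading from protocol "s3://" is not supported.')
```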

DouweM (Collaborator):

We should document this feature in `input.md`. At the bottom there's already a section on uploaded files for Google; can you mention S3 files + `BedrockConverseModel` there as well, please?
