[FLINK-AGENTS][integrations] Fix OpenSearchVectorStore for Amazon OpenSearch Serverless#678

Merged
wenjin272 merged 1 commit into apache:main from avichaym:fix/aoss-integration
May 18, 2026

Conversation

@avichaym
Contributor

Fixes #674

The integration defaults to service_type=serverless but doesn't actually work against AOSS. Five fixes:

- Add x-amz-content-sha256 and Content-Length headers for SigV4 signing (AOSS returns 403 without them).
- Skip _refresh calls on serverless (AOSS returns 404; the API is not exposed).
- Omit custom _id in _bulk actions on serverless (AOSS rejects them).
- Validate _bulk responses for partial failures (data was being lost silently).
- Use the FAISS engine instead of the default NMSLIB for index creation (NMSLIB doesn't support filtered KNN).
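
For readers unfamiliar with the first fix, here is a minimal sketch of how the two payload headers could be derived from a request body before the request is handed to a SigV4 signer. The class and method names (`AossPayloadHeaders`, `payloadHeaders`) are hypothetical and are not taken from this PR; only the header names come from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the PR's actual code.
final class AossPayloadHeaders {
    private AossPayloadHeaders() {}

    /** Lowercase hex SHA-256 of the body bytes, the value carried by x-amz-content-sha256. */
    static String sha256Hex(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(body);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Headers to set on the request before it is signed; a null body is treated as empty. */
    static Map<String, String> payloadHeaders(String body) {
        byte[] bytes = body == null ? new byte[0] : body.getBytes(StandardCharsets.UTF_8);
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("x-amz-content-sha256", sha256Hex(bytes));
        headers.put("Content-Length", String.valueOf(bytes.length));
        return headers;
    }
}
```

The headers have to be present before signing so that they are included in what the signer covers; adding them afterwards would not help.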

Also adds a 15-second settle delay after index creation on serverless, to give AOSS time to propagate the new index.

Tested end-to-end against a live AOSS VECTORSEARCH collection.
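
To illustrate the _bulk change, the sketch below builds an NDJSON body whose action lines carry an _id only when the target is not serverless. It is an illustration under stated assumptions, not the code in this PR: `BulkBodySketch` and `bulkBody` are hypothetical names, documents are assumed to be pre-serialized JSON strings, and index names and ids are assumed to need no JSON escaping.

```java
import java.util.List;

// Hypothetical helper, not the PR's actual code.
final class BulkBodySketch {
    private BulkBodySketch() {}

    /**
     * Builds an NDJSON _bulk body with one action line and one source line per document.
     * On serverless the action line omits _id so that AOSS assigns one itself.
     */
    static String bulkBody(String index, List<String> ids, List<String> docsJson, boolean serverless) {
        if (ids.size() != docsJson.size()) {
            throw new IllegalArgumentException("ids and documents must have the same length");
        }
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < docsJson.size(); i++) {
            body.append("{\"index\":{\"_index\":\"").append(index).append('"');
            if (!serverless) {
                // Assumption: ids contain no characters that need JSON escaping.
                body.append(",\"_id\":\"").append(ids.get(i)).append('"');
            }
            body.append("}}\n");
            body.append(docsJson.get(i)).append('\n');
        }
        return body.toString();
    }
}
```

Every line, including the last, ends with a newline, which the bulk endpoint requires.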

github-actions bot added the doc-label-missing, fixVersion/0.3.0, and priority/major labels on May 14, 2026
wenjin272 added the doc-not-needed label and removed the doc-label-missing label on May 15, 2026
Contributor

@wenjin272 wenjin272 left a comment

Hi, @avichaym, thanks for your work. I left two comments.

// OpenSearch's bulk index operation is upsert-by-id, so addEmbedding doubles as update.
// BaseVectorStore.update() already enforces that every document carries an id, so
// addEmbedding will not generate new ones here.
addEmbedding(documents, collection, extraArgs);
Contributor

Currently, updateEmbedding is implemented by inserting a document with the same ID. On AOSS the client cannot specify an ID, so this approach cannot actually update an existing document. In that case we should throw an "unsupported operation" exception instead.
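
A minimal sketch of the guard being requested here, assuming the store knows which service type it targets. All names (`UpdateGuardSketch`, `ServiceType`, `upsertedIds`) are hypothetical and the provisioned path is reduced to recording ids; it is not the store's real implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the vector store, reduced to the update path.
final class UpdateGuardSketch {
    enum ServiceType { DOMAIN, SERVERLESS }

    private final ServiceType serviceType;
    // Stand-in for the upsert-by-id bulk call made on provisioned domains.
    final List<String> upsertedIds = new ArrayList<>();

    UpdateGuardSketch(ServiceType serviceType) {
        this.serviceType = serviceType;
    }

    void updateEmbedding(List<String> ids) {
        if (serviceType == ServiceType.SERVERLESS) {
            // No client-controlled _id on serverless, so upsert-by-id cannot target a document.
            throw new UnsupportedOperationException(
                    "update is not supported on serverless; use a provisioned domain");
        }
        upsertedIds.addAll(ids);
    }
}
```

Failing fast keeps a serverless caller from believing an update happened when a new document was inserted instead.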

@@ -400,7 +418,14 @@
allIds.add(id);
Contributor

Since AOSS does not allow clients to specify IDs, returning this ID in the AOSS scenario is meaningless, as users cannot use it for get or delete operations. I believe that in the AOSS scenario, we should return the ID generated by AOSS.

Contributor Author

Both points addressed. updateEmbedding on AOSS now throws UnsupportedOperationException (with a message pointing to the provisioned domain as an alternative), and add() on serverless parses items[].index._id out of each _bulk response and returns those AOSS-generated ids instead of the client-side UUIDs. Provisioned-domain behaviour is unchanged. Verified end-to-end against the live AOSS test collection.
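
For illustration, the id extraction and the partial-failure check described above could look roughly like this. It is a sketch under assumptions, not the merged code: the _bulk response is assumed to be already parsed into nested `Map`/`List` objects by whatever JSON library is in use, every action is assumed to be an `index` action, and `BulkResponseSketch` and `generatedIds` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the PR's actual code.
final class BulkResponseSketch {
    private BulkResponseSketch() {}

    /**
     * Returns items[].index._id in request order.
     * Throws if any item reports an error, so a partial failure cannot pass unnoticed.
     */
    static List<String> generatedIds(Map<String, ?> response) {
        Object rawItems = response.get("items");
        if (!(rawItems instanceof List)) {
            throw new IllegalStateException("_bulk response has no items array");
        }
        List<?> items = (List<?>) rawItems;
        List<String> ids = new ArrayList<>();
        List<String> failures = new ArrayList<>();
        for (Object rawItem : items) {
            Object rawResult = ((Map<?, ?>) rawItem).get("index");
            if (!(rawResult instanceof Map)) {
                throw new IllegalStateException("bulk item has no index result: " + rawItem);
            }
            Map<?, ?> result = (Map<?, ?>) rawResult;
            Object error = result.get("error");
            if (error != null) {
                failures.add(String.valueOf(error));
            } else {
                ids.add(String.valueOf(result.get("_id")));
            }
        }
        if (!failures.isEmpty()) {
            throw new IllegalStateException(
                    failures.size() + " of " + items.size() + " bulk items failed: " + failures);
        }
        return ids;
    }
}
```

Checking each item rather than only the HTTP status matters because a _bulk request can return 200 while individual items fail.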

@avichaym avichaym force-pushed the fix/aoss-integration branch from 0ee2aed to 6e1be92 Compare May 15, 2026 13:49
…nSearch Serverless

Fixes apache#674

AOSS-specific differences from provisioned domains, addressed without changing
domain behaviour:

- SigV4: add x-amz-content-sha256 and Content-Length before signing.
- _refresh: skipped (not exposed on AOSS).
- _bulk: omit custom _id on serverless and return AOSS-generated ids from
  the response so add() callers get usable ids for later get/delete.
- updateEmbedding on serverless throws UnsupportedOperationException
  (no client-controllable _id, so update-by-id is impossible).
- _bulk partial failures now surfaced (was silently dropping data).
- createKnnIndex pins FAISS/HNSW (default NMSLIB on AOSS rejects filters).
- Index propagation: 15s settle after create on serverless.

Verified end-to-end against a live AOSS VECTORSEARCH collection.
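
As a rough illustration of the FAISS/HNSW pinning mentioned in the commit message, an index body along the following lines names the engine explicitly instead of relying on the default. The helper name, the vector field name, the dimension, and the `l2` space type are assumptions chosen for the example; the PR's actual createKnnIndex may differ.

```java
// Hypothetical helper, not the PR's actual code.
final class KnnIndexBodySketch {
    private KnnIndexBodySketch() {}

    /** JSON body for index creation with a knn_vector field using HNSW on the FAISS engine. */
    static String knnIndexBody(String vectorField, int dimension) {
        // Assumption: vectorField needs no JSON escaping; space_type "l2" is illustrative.
        return "{"
                + "\"settings\":{\"index\":{\"knn\":true}},"
                + "\"mappings\":{\"properties\":{\"" + vectorField + "\":{"
                + "\"type\":\"knn_vector\","
                + "\"dimension\":" + dimension + ","
                + "\"method\":{\"name\":\"hnsw\",\"engine\":\"faiss\",\"space_type\":\"l2\"}"
                + "}}}}";
    }
}
```

Calling `KnnIndexBodySketch.knnIndexBody("embedding", 768)` yields a body that can be sent with the index-creation request; on serverless, the settle delay described above would follow it.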
@avichaym avichaym force-pushed the fix/aoss-integration branch from 6e1be92 to c0b6ca0 Compare May 15, 2026 14:04
Contributor

@wenjin272 wenjin272 left a comment

LGTM

@wenjin272 wenjin272 merged commit 5eb9125 into apache:main May 18, 2026
84 of 92 checks passed

Labels

doc-not-needed: Your PR changes do not impact docs.
fixVersion/0.3.0: The feature or bug should be implemented/fixed in the 0.3.0 version.
priority/major: Default priority of the PR or issue.

Development

Successfully merging this pull request may close these issues.

[Bug] OpenSearchVectorStore fails against Amazon OpenSearch Serverless (AOSS)

2 participants