[FLINK-AGENTS][integrations] Fix OpenSearchVectorStore for Amazon OpenSearch Serverless#678
| // OpenSearch's bulk index operation is upsert-by-id, so addEmbedding doubles as update. | ||
| // BaseVectorStore.update() already enforces that every document carries an id, so | ||
| // addEmbedding will not generate new ones here. | ||
| addEmbedding(documents, collection, extraArgs); |
Currently, updateEmbedding works by re-indexing a document under the same ID. On AOSS the client cannot specify an ID, so this inserts a new document instead of updating the existing one. In the AOSS case we should therefore throw an "unsupported operation" exception.
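For illustration, the guard requested here could look roughly like the sketch below. The class name `UpdateGuardSketch` and the `serverless` flag are hypothetical stand-ins, not the PR's actual code; the provisioned branch only returns a count where a real store would delegate to addEmbedding.

```java
import java.util.List;

// Hypothetical stand-in for the vector store; only the guard logic is illustrated.
final class UpdateGuardSketch {
    // Assumed flag, e.g. derived from service_type=serverless.
    private final boolean serverless;

    UpdateGuardSketch(boolean serverless) {
        this.serverless = serverless;
    }

    /** Returns the number of documents that would be re-indexed by id. */
    int updateEmbedding(List<String> documentIds) {
        if (serverless) {
            // No client-controllable _id on serverless, so update-by-id cannot work.
            throw new UnsupportedOperationException(
                    "update-by-id is not supported on OpenSearch Serverless; "
                            + "use a provisioned domain if updates are required");
        }
        // Provisioned domain: bulk index is upsert-by-id, so a real store
        // would call addEmbedding(...) here.
        return documentIds.size();
    }
}
```

Failing loudly here avoids silently creating duplicates when a caller believes an update took place.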
| @@ -400,7 +418,14 @@ | |||
| allIds.add(id); | |||
AOSS does not allow clients to specify IDs, so returning this client-side ID is meaningless there: users cannot use it for get or delete operations. In the AOSS case we should return the ID generated by AOSS instead.
Both points addressed. updateEmbedding on AOSS now throws UnsupportedOperationException (with a message pointing to the provisioned domain as an alternative), and add() on serverless parses items[].index._id out of each _bulk response and returns those AOSS-generated ids instead of the client-side UUIDs. Provisioned-domain behaviour is unchanged. Verified end-to-end against the live AOSS test collection.
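A rough sketch of reading `items[].index._id` from a `_bulk` response, assuming the JSON body has already been parsed into nested `Map`/`List` values by whatever JSON library the store uses. The class and method names are hypothetical and this is not the PR's implementation; it also throws when any item carries an `error` entry, so a partial failure is not dropped silently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; operates on an already-parsed _bulk response body.
final class BulkResponseIds {
    private BulkResponseIds() {}

    /**
     * Collects items[].index._id in response order.
     * Throws if any item reports an error, so partial failures surface.
     */
    @SuppressWarnings("unchecked")
    static List<String> extractIds(Map<String, Object> bulkResponse) {
        List<Map<String, Object>> items =
                (List<Map<String, Object>>) bulkResponse.getOrDefault("items", List.of());
        List<String> ids = new ArrayList<>();
        List<String> failures = new ArrayList<>();
        for (Map<String, Object> item : items) {
            Map<String, Object> index = (Map<String, Object>) item.get("index");
            if (index == null) {
                continue; // not an index action
            }
            Object error = index.get("error");
            if (error != null) {
                failures.add(String.valueOf(error));
            } else {
                ids.add(String.valueOf(index.get("_id")));
            }
        }
        if (!failures.isEmpty()) {
            throw new IllegalStateException(
                    failures.size() + " of " + items.size()
                            + " bulk items failed: " + failures);
        }
        return ids;
    }
}
```

The returned list can then be handed back from add() so callers hold ids that later get and delete calls will accept.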
Force-pushed from 0ee2aed to 6e1be92
…nSearch Serverless

Fixes apache#674

AOSS-specific differences from provisioned domains, addressed without changing domain behaviour:

- SigV4: add x-amz-content-sha256 and Content-Length before signing.
- _refresh: skipped (not exposed on AOSS).
- _bulk: omit custom _id on serverless and return AOSS-generated ids from the response so add() callers get usable ids for later get/delete.
- updateEmbedding on serverless throws UnsupportedOperationException (no client-controllable _id, so update-by-id is impossible).
- _bulk partial failures now surfaced (was silently dropping data).
- createKnnIndex pins FAISS/HNSW (default NMSLIB on AOSS rejects filters).
- Index propagation: 15s settle after create on serverless.

Verified end-to-end against a live AOSS VECTORSEARCH collection.
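To make the SigV4 item concrete, here is a minimal sketch of computing the two headers named above from a request body. The value of `x-amz-content-sha256` is the lowercase hex SHA-256 of the payload bytes, and `Content-Length` is the UTF-8 byte count. The class `SigV4BodyHeaders` is hypothetical; the signing step itself (normally handled by an AWS signer) is not shown, and this is not necessarily how the commit wires it in.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the body-derived headers to attach before signing.
final class SigV4BodyHeaders {
    private SigV4BodyHeaders() {}

    /** Lowercase hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(body);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Headers derived from the body; length is in bytes, not chars. */
    static Map<String, String> bodyHeaders(String body) {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("x-amz-content-sha256", sha256Hex(bytes));
        headers.put("Content-Length", String.valueOf(bytes.length));
        return headers;
    }
}
```

Both values must be computed from the exact bytes that go on the wire, since a signature over a different payload hash will be rejected.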
Force-pushed from 6e1be92 to c0b6ca0
Fixes #674
The integration defaults to service_type=serverless but doesn't actually work against AOSS. Five fixes:
Also adds a 15s settle after index creation on serverless for AOSS propagation.
Tested end-to-end against a live AOSS VECTORSEARCH collection.