Python: Bug: Cosmos DB for MongoDB vector index uses the similarity code as the index kind #14104

Description

@EsraaKamel11

Describe the bug
CosmosMongoCollection._get_index_definitions (python/semantic_kernel/connectors/azure_cosmos_db.py, line 401) sets
cosmosSearchOptions["kind"] from DISTANCE_FUNCTION_MAP_MONGODB — a similarity code ("COS"/"IP"/"L2") —
instead of INDEX_KIND_MAP_MONGODB (the index kind: "vector-ivf"/"vector-hnsw"/"vector-diskann").

This causes three problems:

  1. The createIndexes command sends an invalid kind — Cosmos DB for MongoDB vCore requires kind to be one of
    vector-ivf/vector-hnsw/vector-diskann, so vector-index creation fails against a live account.
  2. kind ends up equal to similarity.
  3. The match index_kind block (line 411) can never match a vector-* case, so the HNSW/IVF/DiskANN tuning options
    (m, efConstruction, numList, maxDegree, lBuild) are silently dropped.

The tell: the mapped value of INDEX_KIND_MAP_MONGODB is never read; the map is only used for a membership check at
line 392. The sibling NoSQL path does it correctly at line 149: "type": INDEX_KIND_MAP_NOSQL[field.index_kind].

To Reproduce
Deterministic unit-level repro (no live account needed):

import asyncio
from unittest.mock import AsyncMock, MagicMock
from pymongo import AsyncMongoClient
from semantic_kernel.connectors.azure_cosmos_db import CosmosMongoCollection
from semantic_kernel.data.vector import VectorStoreCollectionDefinition, VectorStoreField

definition = VectorStoreCollectionDefinition(fields=[
    VectorStoreField("key", name="id"),
    VectorStoreField("data", name="content"),
    VectorStoreField("vector", name="vector", dimensions=5,
                     index_kind="hnsw", distance_function="cosine_similarity"),
])
db = AsyncMock()
db.create_collection = AsyncMock()
db.command = AsyncMock()
client = AsyncMock(spec=AsyncMongoClient)
client.get_database = MagicMock(return_value=db)
col = CosmosMongoCollection(collection_name="c", record_type=dict,
                            definition=definition, mongo_client=client, database_name="d")

asyncio.run(col.ensure_collection_exists(m=16, efConstruction=64))
opts = db.command.call_args.kwargs["command"]["indexes"][1]["cosmosSearchOptions"]
print(opts)
# Actual:   {'kind': 'COS', 'similarity': 'COS', 'dimensions': 5}
#           -> invalid kind, and m / efConstruction were silently dropped

Against a live Cosmos DB for MongoDB vCore account, ensure_collection_exists() fails because kind="COS" is not a valid vector index kind.

Expected behavior
cosmosSearchOptions["kind"] == "vector-hnsw" and cosmosSearchOptions["similarity"] == "COS" (the two must be distinct), and the HNSW tuning options (m, efConstruction) appear in cosmosSearchOptions.
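
The expectation above could be turned into a regression check along the following lines. This is only a sketch: find_option_problems and VALID_KINDS are hypothetical names introduced here, and the check works on a plain dict such as the opts value captured in the repro, so it does not depend on the connector.

```python
# The three index kinds this issue lists as valid for cosmosSearchOptions["kind"].
VALID_KINDS = {"vector-ivf", "vector-hnsw", "vector-diskann"}


def find_option_problems(opts, expected_tuning=()):
    """Return a list of human-readable problems found in a cosmosSearchOptions dict."""
    problems = []
    kind = opts.get("kind")
    if kind not in VALID_KINDS:
        problems.append(f"invalid kind: {kind!r}")
    if kind == opts.get("similarity"):
        problems.append("kind equals similarity")
    for name in expected_tuning:
        if name not in opts:
            problems.append(f"missing tuning option: {name}")
    return problems


# The options observed in the repro versus the options described as expected.
actual = {"kind": "COS", "similarity": "COS", "dimensions": 5}
expected = {"kind": "vector-hnsw", "similarity": "COS", "dimensions": 5, "m": 16, "efConstruction": 64}

print(find_option_problems(actual, expected_tuning=("m", "efConstruction")))
print(find_option_problems(expected, expected_tuning=("m", "efConstruction")))
```

The first call reports all three symptoms from the list in the description; the second returns an empty list.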

Screenshots
N/A

Platform

  • Language: Python
  • Source: main branch of repository (also affects current pip releases)
  • AI model: N/A
  • IDE: VS Code
  • OS: Windows

Additional context
Root cause is a single wrong map lookup at line 401 (it should be INDEX_KIND_MAP_MONGODB[field.index_kind]). I'll take this: I have a fix and tests ready and will open a PR shortly.

Labels

bug, python, triage