feat(semantic-cache): add Google AI (Gemini) embedding provider #160
Vswaroop04 wants to merge 3 commits into
Conversation
KIvanow
left a comment
Hey @Vswaroop04 thank you for your contribution! A few issues though:
- Unsigned commit (259e5f1) - same as your other three open PRs.
- API key passed as a URL query parameter - inconsistent with every other provider and a security regression.
Every existing provider (Cohere, Voyage, OpenAI, etc.) passes the key in a header:
// cohere.ts / voyage.ts
headers: { Authorization: `Bearer ${apiKey}` }
The Google implementation puts the key in the URL:
// google.ts - don't do this
`${baseUrl}/models/${model}:embedContent?key=${encodeURIComponent(apiKey)}`
And Python:
params={"key": key}
Keys in URLs end up in server access logs, proxy logs, and HTTP Referer headers. Google's API supports x-goog-api-key as a request header - please use that instead, consistent with how every other provider in this package works.
- httpx.AsyncClient is never closed (Python). The lazy-initialized client in _client[0] holds a connection pool with no cleanup path. In test environments this produces ResourceWarnings; in production it leaks file descriptors over time. Either expose a close() helper or document that callers are responsible for cleanup, but the current code silently leaks.
- title behaviour is inconsistent between TS and Python. TypeScript skips title when falsy (if (opts?.title)), so an empty string is silently dropped. Python sends it when title is not None, so "" would be included in the request body. Pick one semantics and apply it to both - !== undefined / is not None is the more predictable choice.
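For illustration only, the header-based key and the `is not None` title semantics requested above could look roughly like the sketch below. This is not the code from the PR: `build_embed_request` is a hypothetical helper, and the base URL is an assumption about the Google AI REST endpoint.

```python
from typing import Any, Optional

# Assumed base URL for the Google AI REST API; the PR's actual default may differ.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_embed_request(
    text: str,
    api_key: str,
    model: str = "text-embedding-004",
    task_type: str = "RETRIEVAL_QUERY",
    title: Optional[str] = None,
) -> dict[str, Any]:
    """Return the URL, headers and JSON body for an embedContent call.

    Hypothetical helper for illustration; not part of the package.
    """
    body: dict[str, Any] = {
        "model": f"models/{model}",
        "content": {"parts": [{"text": text}]},
        "taskType": task_type,
    }
    # "is not None" semantics: an empty string is still sent, only None is omitted.
    if title is not None:
        body["title"] = title
    return {
        "url": f"{BASE_URL}/models/{model}:embedContent",
        # The key travels in a header, so it never appears in the URL or in access logs.
        "headers": {"x-goog-api-key": api_key, "Content-Type": "application/json"},
        "json": body,
    }
```

The returned dictionary can be unpacked into an HTTP client call, e.g. `client.post(req["url"], headers=req["headers"], json=req["json"])`. The TypeScript side would mirror the same check with `opts?.title !== undefined`.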
Adds packages/semantic-cache/src/embed/google.ts and the matching packages/semantic-cache-py/betterdb_semantic_cache/embed/google.py, following the existing cohere/voyage patterns. Both providers target the Google AI (Gemini) embedContent REST API:
- Default model: text-embedding-004 (768-dim)
- Configurable taskType (RETRIEVAL_QUERY default, RETRIEVAL_DOCUMENT for storage)
- Optional title (improves retrieval quality for document task type)
- Optional outputDimensionality for text-embedding-004+ truncation
- Uses native fetch (TS) / httpx (Python); no Google SDK required
- API key from GOOGLE_API_KEY env var or explicit option
- Move API key from URL query param to x-goog-api-key header (security)
- Add close() helper to Python embed fn to clean up httpx client
- Fix title check in TS to use !== undefined (consistent with Python is not None)
259e5f1 to c083b2b
Thanks for the thorough review! All issues addressed. Should have caught these before opening, nice finds!
KIvanow
left a comment
Hey @Vswaroop04 - all four issues from the first review are fixed, thanks! One small thing before this merges:
The close() helper is a good addition but it's invisible to callers — nothing in the module docstring or create_google_embed docstring mentions it exists. Please add a note, e.g.:
When finished, release the connection pool:
await embed.close()
One line in the docstring is enough. After that this is good to go.
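As a rough sketch of what a documented close() helper might look like, assuming a lazily created client held in a one-slot list as described in the first review. `create_closable_embed` and its `client_factory` parameter are hypothetical names for illustration, not the PR's `create_google_embed`; the only httpx calls assumed are `AsyncClient()`, `post()` and `aclose()`.

```python
import asyncio
from typing import Any, Callable, Optional


def create_closable_embed(client_factory: Optional[Callable[[], Any]] = None):
    """Return an async embed function backed by a lazily created HTTP client.

    When finished, release the connection pool:
        await embed.close()
    """
    client: list[Any] = [None]  # one-slot holder, filled on first use

    def _get_client() -> Any:
        if client[0] is None:
            if client_factory is not None:
                client[0] = client_factory()
            else:
                import httpx  # imported lazily; only needed for real requests

                client[0] = httpx.AsyncClient()
        return client[0]

    async def embed(url: str, headers: dict, body: dict) -> Any:
        response = await _get_client().post(url, headers=headers, json=body)
        response.raise_for_status()
        return response.json()

    async def close() -> None:
        # Safe to call more than once, and before the client was ever created.
        if client[0] is not None:
            await client[0].aclose()
            client[0] = None

    embed.close = close  # expose cleanup on the returned function
    return embed
```

Attaching close() to the returned function keeps the call site unchanged for callers who only embed, while the docstring note makes the cleanup path discoverable.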
Done
Summary
Adds Google AI embedding support to both packages, following the same provider pattern already used for Cohere and Voyage.
Added providers
Both providers target the embedContent REST API and support:
Usage
TypeScript
Python
Test