The retrieval module provides semantic search over past emails, CRM data, and calendar events using Gemini embeddings stored in PostgreSQL with pgvector.
Wraps Gemini embeddings via LangChain's GoogleGenerativeAIEmbeddings:
- `embed_text(text)` → 768-dimensional float vector
- `embed_batch(texts)` → list of vectors with rate limiting
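
As an illustration of the rate limiting mentioned above, here is a minimal sketch that embeds texts in fixed-size chunks and pauses between chunks. The parameters `embed_fn`, `batch_size`, and `delay_s` are hypothetical and not taken from the module. In a real setup `embed_fn` could be the `embed_documents` method of a `GoogleGenerativeAIEmbeddings` instance; a stand-in is used here so the example runs without an API key.

```python
import time
from typing import Callable, Sequence

Vector = list[float]


def embed_batch(
    texts: Sequence[str],
    embed_fn: Callable[[list[str]], list[Vector]],
    batch_size: int = 16,
    delay_s: float = 1.0,
) -> list[Vector]:
    """Embed texts in fixed-size chunks, pausing between chunks.

    embed_fn, batch_size and delay_s are illustrative assumptions.
    """
    vectors: list[Vector] = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        vectors.extend(embed_fn(chunk))
        # Sleep only when another chunk follows, to stay under the API quota.
        if start + batch_size < len(texts):
            time.sleep(delay_s)
    return vectors


def fake_embed(chunk: list[str]) -> list[Vector]:
    """Stand-in for a real embedding call; returns 768-dimensional vectors."""
    return [[float(len(text))] * 768 for text in chunk]


vectors = embed_batch(
    ["hello", "meeting at 3pm", "invoice"],
    fake_embed,
    batch_size=2,
    delay_s=0.0,
)
```

A fixed delay is the simplest option; a token bucket or retry-with-backoff wrapper would be reasonable alternatives if the quota is enforced per minute.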
PostgreSQL-backed vector storage using pgvector extension:
- `store_embedding(text, embedding, metadata)` → INSERT with vector
- `search_similar(query_embedding, top_k=5)` → cosine similarity search
- IVFFlat index for fast approximate nearest neighbor queries
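
The two operations above might translate into SQL roughly as follows. This is a sketch under assumptions: the column names `content` and `metadata`, the `%s` placeholder style (as used by psycopg), and the helper names `to_pgvector` and `search_params` are all hypothetical. The `<=>` operator is pgvector's cosine distance, so similarity is computed as `1 - distance`.

```python
from typing import Sequence


def to_pgvector(vec: Sequence[float]) -> str:
    """Format a vector as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


# Hypothetical column names; adjust to the actual table definition.
INSERT_SQL = (
    "INSERT INTO embeddings (content, embedding, metadata) "
    "VALUES (%s, %s::vector, %s::jsonb)"
)

# <=> is pgvector's cosine distance operator; similarity = 1 - distance.
SEARCH_SQL = (
    "SELECT content, metadata, 1 - (embedding <=> %s::vector) AS similarity "
    "FROM embeddings "
    "ORDER BY embedding <=> %s::vector "
    "LIMIT %s"
)


def search_params(query_embedding: Sequence[float], top_k: int = 5) -> tuple:
    """Build the parameter tuple for SEARCH_SQL (the vector appears twice)."""
    literal = to_pgvector(query_embedding)
    return (literal, literal, top_k)


params = search_params([1, 0.5], top_k=3)
```

Ordering by the raw distance expression, rather than by the computed `similarity` alias, is what allows the planner to use the IVFFlat index.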
Aggregates relevant context for email processing:

    class ContextBuilder:
        async def build_context(self, email, classification) -> list[str]:
            # 1. Embed the email text
            # 2. Search for similar past emails
            # 3. Fetch CRM contact data (if recognized sender)
            # 4. Get recent calendar events (if meeting-related)
            # 5. Rank and deduplicate
            # 6. Return top-k formatted context strings
            ...

Using Google Gemini's embedding model via LangChain:
- Model: `models/embedding-001`
- Dimensions: 768
- Accessed through the `langchain-google-genai` package
PostgreSQL table embeddings:
- `embedding` column: `vector(768)` type
- Indexed with IVFFlat for cosine similarity
- Partitioned by `source_type` (email, crm, calendar)
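
One way the table described above could be declared is sketched below as a list of SQL statements. It assumes PostgreSQL declarative list partitioning on `source_type`; the columns other than `embedding` and `source_type`, the partition names, and the `lists = 100` setting are illustrative guesses, not the project's actual schema.

```python
def schema_statements(dim: int = 768, lists: int = 100) -> list[str]:
    """Return DDL for a partitioned embeddings table with IVFFlat indexes.

    Column names, partition names and the lists value are assumptions.
    """
    source_types = ("email", "crm", "calendar")
    statements = [
        "CREATE EXTENSION IF NOT EXISTS vector",
        "CREATE TABLE embeddings ("
        " id bigserial,"
        " user_id bigint NOT NULL,"
        " source_type text NOT NULL,"
        " content text NOT NULL,"
        " metadata jsonb,"
        f" embedding vector({dim})"
        ") PARTITION BY LIST (source_type)",
    ]
    for source_type in source_types:
        # One partition per source type.
        statements.append(
            f"CREATE TABLE embeddings_{source_type} PARTITION OF embeddings "
            f"FOR VALUES IN ('{source_type}')"
        )
    for source_type in source_types:
        # vector_cosine_ops makes the index usable for the <=> operator.
        statements.append(
            f"CREATE INDEX ON embeddings_{source_type} USING ivfflat "
            f"(embedding vector_cosine_ops) WITH (lists = {lists})"
        )
    return statements


ddl = schema_statements()
```

IVFFlat picks its cluster centroids when the index is built, so building the indexes after an initial batch of rows has been loaded generally gives better recall than indexing empty partitions.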
- Generate query embedding from email text
- Cosine similarity search in pgvector
- Filter by user_id and optional source_type
- Return top-k results with similarity scores
- Format results as context strings for LLM prompt
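
The steps above can be sketched in plain Python, with an in-memory list standing in for the pgvector table so the logic is easy to follow. The `Row` type, the `retrieve` function, and the context string format are hypothetical; in the real module the filtering and ranking would presumably happen inside the SQL query.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    user_id: int
    source_type: str
    content: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    rows: list[Row],
    query_embedding: list[float],
    user_id: int,
    source_type: Optional[str] = None,
    top_k: int = 5,
) -> list[str]:
    """Filter by user and source type, rank by similarity, format top-k."""
    candidates = [
        r for r in rows
        if r.user_id == user_id
        and (source_type is None or r.source_type == source_type)
    ]
    scored = [(cosine_similarity(query_embedding, r.embedding), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        f"[{r.source_type}] (score={score:.2f}) {r.content}"
        for score, r in scored[:top_k]
    ]


rows = [
    Row(1, "email", "Q3 invoice thread", [1.0, 0.0]),
    Row(1, "calendar", "Budget review meeting", [0.6, 0.8]),
    Row(2, "email", "Another user's message", [1.0, 0.0]),
]
context = retrieve(rows, [1.0, 0.0], user_id=1, top_k=2)
```

Filtering on `user_id` before ranking keeps one user's data from ever appearing in another user's context, regardless of how similar the vectors are.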
- Email sync: Each new email is embedded on fetch
- CRM updates: Contact data re-embedded on changes
- Calendar events: Event descriptions embedded on sync
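
A rough sketch of how these three triggers could share one code path: each event upserts a row keyed by source type and record ID, so a changed CRM contact replaces its earlier embedding instead of adding a duplicate. The `InMemoryIndex` class, its method names, and the key scheme are hypothetical stand-ins for the real store.

```python
from typing import Callable

Vector = list[float]
Key = tuple[str, str]  # (source_type, record_id)


class InMemoryIndex:
    """Illustrative stand-in for the pgvector-backed store."""

    def __init__(self, embed_fn: Callable[[str], Vector]) -> None:
        self._embed_fn = embed_fn
        self.rows: dict[Key, tuple[str, Vector]] = {}

    def upsert(self, source_type: str, record_id: str, text: str) -> None:
        # Re-embedding under the same key overwrites the previous vector.
        self.rows[(source_type, record_id)] = (text, self._embed_fn(text))

    def on_email_fetched(self, email_id: str, body: str) -> None:
        self.upsert("email", email_id, body)

    def on_contact_changed(self, contact_id: str, summary: str) -> None:
        self.upsert("crm", contact_id, summary)

    def on_calendar_synced(self, event_id: str, description: str) -> None:
        self.upsert("calendar", event_id, description)


def fake_embed(text: str) -> Vector:
    """Stand-in embedding: a single-element vector holding the text length."""
    return [float(len(text))]


index = InMemoryIndex(fake_embed)
index.on_email_fetched("e1", "Invoice attached")
index.on_contact_changed("c1", "Ada, CTO")
index.on_contact_changed("c1", "Ada, CEO at Example Corp")
index.on_calendar_synced("v1", "Quarterly planning")
```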