BBQ (Better Binary Quantization, derived from RaBitQ) is Elasticsearch and Lucene's implementation of 1-bit binary quantization for vector search, providing extreme compression while maintaining search accuracy through mathematical guarantees.
BBQ adapts the RaBitQ algorithm, with minor modifications, for integration with Lucene's HNSW implementation. RaBitQ was developed by researchers at Nanyang Technological University (NTU) and has since been adopted by major search platforms.
Key properties:
- Reduces each dimension from 32 bits (float) to 1 bit (sign)
- 32x compression: a 1024-dim float32 vector shrinks from 4 KB to 128 bytes
- Uses sign information only
- Mathematical optimality guarantees
- Asymptotically optimal error bound for distance estimation
- Inner product preservation for similarity computation
- Reliable ordering for nearest neighbor search
- Reranking support with corrective factors
How it works:
Normalization:
- Normalize input vectors to unit length
- Ensures consistent scale
Random Rotation:
- Apply random orthogonal transformation
- Distributes information across dimensions
- Rotation matrix computed once, reused
Sign Extraction:
- Store sign bit for each dimension
- 1 bit per dimension
- Results in binary vector
Corrective Factors:
- Store small correction term per vector
- Improves distance estimation
- Minimal additional storage (~8 bytes)
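These index-time steps are compact enough to sketch directly. Below is a minimal, illustrative NumPy version of BBQ-style quantization, not the actual Lucene code; the function names and the exact form of the corrective factor are simplifications:

```python
import numpy as np

def make_rotation(dim: int, seed: int = 42) -> np.ndarray:
    """Random orthogonal matrix via QR decomposition of a Gaussian matrix.
    Computed once per index, then reused for every vector and query."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q

def quantize_vector(vec: np.ndarray, rotation: np.ndarray):
    """Normalize, rotate, and keep only the sign of each dimension."""
    v = vec / np.linalg.norm(vec)   # unit length
    r = rotation @ v                # spread information across dimensions
    bits = np.packbits(r > 0)      # 1 bit per dimension: the sign
    # Simplified corrective factor: alignment between the rotated vector
    # and its sign pattern (real RaBitQ stores terms of this flavor).
    correction = float(np.abs(r).sum()) / np.sqrt(r.size)
    return bits, correction
```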
Query Quantization:
- Apply same rotation to query
- Extract sign vector
Hamming Distance:
- XOR between binary vectors
- POPCNT for bit counting
- Hardware-accelerated (CPU instruction)
Distance Estimation:
- Use Hamming distance + corrective factors
- Estimate original cosine similarity
- Reliable for ranking
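The query side can be sketched the same way, continuing the illustrative example above. Real implementations operate on packed 64-bit words, and the single scalar correction used here is a simplification of what is actually stored:

```python
import numpy as np

def hamming(a_bits: np.ndarray, b_bits: np.ndarray) -> int:
    """XOR the packed bytes, then count set bits (POPCNT in hardware)."""
    return int(np.unpackbits(np.bitwise_xor(a_bits, b_bits)).sum())

def estimate_cosine(q_bits: np.ndarray, d_bits: np.ndarray, dim: int,
                    q_corr: float, d_corr: float) -> float:
    """Estimate cosine similarity from Hamming distance plus corrections.
    Matching sign bits raise the estimate, mismatches lower it."""
    agreement = dim - 2 * hamming(q_bits, d_bits)   # in [-dim, dim]
    return (agreement / dim) * q_corr * d_corr
```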
Reranking (optional):
- Retrieve top-k candidates
- Rerank with full precision
- Improves final accuracy
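The rerank pass is then a straightforward second scoring step. In this sketch, full_vectors is an assumed in-memory array of the original unit-normalized vectors; in practice they would be read from a slower storage tier:

```python
import numpy as np

def rerank(query: np.ndarray, candidate_ids: list[int],
           full_vectors: np.ndarray, k: int) -> list[int]:
    """Re-score binary-stage candidates with exact dot products
    (equal to cosine similarity for unit-normalized vectors)."""
    scores = full_vectors[candidate_ids] @ query
    best = np.argsort(-scores)[:k]
    return [candidate_ids[i] for i in best]
```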
Storage efficiency:
- 32x compression vs float32
- No codebook storage required
- Minimal metadata overhead
- Scales to billions of vectors
Search speed:
- Hamming distance: An XOR plus a POPCNT per 64-bit word
- SIMD optimization: Process multiple comparisons
- Cache-friendly: Binary vectors fit in cache
- Low memory bandwidth: 32x less data movement
Theoretical properties:
- Optimal error bounds: Provably good approximation
- No training required: Deterministic process
- Rotation-based: Distributes information uniformly
- Inner product preservation: Maintains similarity structure
Storage impact:
- 32x reduction in raw vector data
- Example: 1 billion 1024-dim vectors
  - Original float32: ~4.1 TB
  - BBQ sign bits: 128 GB
  - Corrective factors: ~8 GB
  - Total: ~136 GB (~30x effective compression)
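A quick back-of-envelope check on these numbers (assuming the ~8 bytes of per-vector corrections mentioned earlier):

```python
n, dim = 1_000_000_000, 1024
float32_bytes = n * dim * 4     # ~4.1 TB of original vectors
bbq_bytes     = n * dim // 8    # 128 GB of sign bits
corr_bytes    = n * 8           # ~8 GB of corrective factors
print(float32_bytes / (bbq_bytes + corr_bytes))   # ~30x effective
```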
Query performance:
- Hamming distance: Nanoseconds per comparison
- Throughput: Millions of comparisons per second
- Latency: Sub-millisecond queries on large datasets
- Scalability: Comparison cost grows linearly with the number of vectors scanned
Accuracy:
- Recall: 95-98% @ k=10 (typical)
- With reranking: 98-99.5% recall
- Trade-off: Adjustable via candidate set size
Lucene integration:
- BBQ is integrated with Lucene's HNSW implementation
- Binary vectors are stored alongside the graph
- Fast graph traversal
- Efficient approximate search
Index mapping (BBQ is enabled through the dense_vector field's index_options; the bbq_hnsw type combines BBQ quantization with an HNSW graph):

```json
PUT /my-index
{
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "bbq_hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  }
}
```

Search:

```json
GET /my-index/_search
{
  "knn": {
    "field": "vector",
    "query_vector": [...],
    "k": 10,
    "num_candidates": 100
  }
}
```

Use cases:
- Large-scale semantic search
- Document retrieval
- Image similarity search
- Recommendation systems
- High-dimensional embeddings
- Cost-sensitive deployments
When to use BBQ:
- Dataset > 10 million vectors
- Storage costs significant
- High query volume
- Recall requirements: 95%+
- Cosine similarity metric
BBQ vs scalar quantization (SQ8):
- BBQ: 32x compression
- SQ8: 4x compression
- BBQ: Faster search
- SQ8: Better accuracy

BBQ vs product quantization (PQ):
- BBQ: No training required
- PQ: Higher compression possible
- BBQ: Faster search
- PQ: More complex setup (codebook training)

BBQ vs uncompressed dense float32:
- BBQ: 32x less storage
- Dense: Perfect accuracy (exact distances)
- BBQ: Faster search
- Dense: Simpler implementation
Best practices:
- Use with HNSW index
- Set appropriate num_candidates
- Enable reranking for high accuracy
- Monitor recall metrics
- Tune based on use case
Tuning guidelines:
- num_candidates: 2-5x k for good recall
- reranking: Enable for >98% recall
- HNSW parameters: Tune m and ef_construction
- Hardware: CPU with AVX2/AVX-512
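As an illustration of these knobs, here is a search issued through the official Python client (elasticsearch-py 8.x); the index name, field name, and placeholder query vector are made up:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query_vector = [0.0] * 1024   # replace with a real 1024-dim embedding

resp = es.search(
    index="my-index",
    knn={
        "field": "vector",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 50,   # 5x k, per the guideline above
    },
)
```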
Monitoring:
- Track recall@k metrics
- Measure query latency
- Monitor storage usage
- Compare with full precision
- A/B test configurations
Limitations:
- Metric constraint: Works best with cosine similarity
- Accuracy: Approximate, not exact (95-98% recall typical)
- Euclidean (L2) distance: Supported, but less optimal than cosine
- Reranking: May be needed to reach high recall targets
Rotation matrix:
- Computed once per index
- Stored as metadata
- Applied to all vectors
- Reused for queries
Corrective factors:
- Per-vector correction term
- ~8 bytes per vector
- Improves distance estimation
- Used in scoring
Hardware acceleration:
- POPCNT: Bit counting instruction
- SIMD: Parallel processing
- AVX-512: 512-bit operations
- Cache: Binary vectors fit well
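The primitive itself is tiny in any language; for example, Python 3.10+ exposes the same popcount operation via int.bit_count():

```python
a = 0b10110010
b = 0b11010110
print((a ^ b).bit_count())   # Hamming distance: 3 differing bits
```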
Based on RaBitQ research from NTU (Nanyang Technological University, Singapore):
- Published at SIGMOD 2024
- Theoretical foundations in high-dimensional geometry and quantization
- Implemented in major search platforms
- Active area of research
Future directions:
- Multi-bit extensions
- Adaptive quantization
- Better corrective factors
- Hardware-specific optimizations
- Integration with other indexes
Related concepts:
- RaBitQ: Original algorithm
- Binary Embeddings: Related concept
- Hamming Distance: Core primitive
- Product Quantization: Alternative approach
- Scalar Quantization: Simpler method
Available in:
- Elasticsearch (open source)
- Elastic Cloud (managed service)
- Pricing based on Elasticsearch licensing
Resources:
- Elastic blog: RaBitQ explainer
- Lucene implementation
- Research papers on RaBitQ
- Elasticsearch documentation
- Performance benchmarks