Add embeddings endpoint support #501
Adds support for benchmarking the `/v1/embeddings` endpoint, enabling performance testing of text embedding models.

- Add embeddings request type to schemas
- Implement `EmbeddingsResponseHandler` for processing embedding responses
- Add `EmbeddingsRequestFormatter` for request preparation
- Implement mock server handler with synthetic embedding generation
- Add e2e and unit tests for embeddings benchmarking
- Add embeddings guide documentation
## Summary

This PR adds support for benchmarking the `/v1/embeddings` endpoint, enabling performance testing of text embedding models. The implementation includes full integration with guidellm's existing infrastructure: request formatting, response handling, metrics collection, synthetic data generation, and the mock server for testing.
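For context, the endpoint follows the OpenAI-compatible embeddings schema. A minimal sketch of the request and response shapes involved (field names follow the OpenAI embeddings API; this is illustrative, not guidellm's internal representation):

```python
# OpenAI-compatible embeddings request body
request = {
    "input": "Test sentence",  # str or list[str]
    "model": "BAAI/bge-small-en-v1.5",
}

# Shape of the response a handler must parse
response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.01, -0.02, 0.03]},
    ],
    "model": "BAAI/bge-small-en-v1.5",
    "usage": {"prompt_tokens": 3, "total_tokens": 3},
}

# For benchmarking, the interesting parts are the vectors and token usage
vectors = [item["embedding"] for item in response["data"]]
prompt_tokens = response["usage"]["prompt_tokens"]
print(len(vectors), prompt_tokens)  # 1 3
```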
## Details

Core Implementation:

- Add `embeddings` to the `GenerativeRequestType` literal in schemas
- Implement `EmbeddingsResponseHandler` for processing embedding API responses
- Add `EmbeddingsRequestFormatter` for preparing embedding requests (`/v1/embeddings`)
- Export `EmbeddingsResponseHandler` from the backends module

Mock Server Support:

- Implement `EmbeddingsHandler` for the mock server
- Add `EmbeddingsRequest`, `EmbeddingObject`, and `EmbeddingsResponse` models
- Add the `/v1/embeddings` endpoint to the mock server

Testing:

- Unit tests for `EmbeddingsResponseHandler`
- E2E tests: `test_embeddings_max_requests_benchmark`, `test_embeddings_max_seconds_benchmark`, `test_embeddings_rate_benchmark`

Documentation:

- Embeddings guide (`docs/guides/embeddings.md`, 284 lines)

Code Quality:

- Passes linting (`tox -e quality`)
- Passes type checks (`tox -e types`)

## Test Plan
Automated Tests:

- `tox -e test-unit`: `TestEmbeddingsResponseHandler` tests pass
- `tox -e test-integration`
- `tox -e test-e2e`

Manual Testing:

```bash
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input":"Test sentence","model":"test"}'

guidellm benchmark run \
  --target http://localhost:8000 \
  --request-type embeddings \
  --rate 5 \
  --max-requests 20 \
  --data '{"type":"synthetic","prompt_tokens":128,"output_tokens":1}' \
  --processor gpt2
```
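The mock server's synthetic embedding generation can be sketched as below. The dimension count and seeding scheme here are illustrative assumptions, not the PR's actual implementation; the point is that deterministic seeding keeps mock responses reproducible across benchmark runs:

```python
import hashlib
import math
import random

def synthetic_embedding(text: str, dims: int = 384) -> list[float]:
    """Generate a deterministic, unit-length pseudo-embedding for `text`.

    Hashing the input to seed the RNG means the same text always yields
    the same vector, so mock-server output is stable between runs.
    """
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(dims)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]

emb = synthetic_embedding("Test sentence")
print(len(emb))  # 384
```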
```bash
vllm serve "BAAI/bge-small-en-v1.5"

guidellm benchmark run \
  --target http://localhost:8000 \
  --request-type embeddings \
  --rate 10 \
  --max-requests 100 \
  --data '{"type":"synthetic","prompt_tokens":256,"output_tokens":1}' \
  --processor gpt2
```
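Unlike chat completions, embeddings arrive as a single non-streaming payload, so the response handler mostly records token usage rather than per-token timings. A rough sketch of that parsing step (the function and dataclass names here are hypothetical, not the PR's actual interface):

```python
from dataclasses import dataclass

@dataclass
class EmbeddingsResult:
    """Illustrative container for the counts a benchmark cares about."""
    num_embeddings: int
    prompt_tokens: int
    total_tokens: int

def parse_embeddings_response(body: dict) -> EmbeddingsResult:
    """Extract counts from an OpenAI-compatible /v1/embeddings response.

    There is no generated text, so no streamed-token metrics apply;
    only request counts and prompt-token usage feed the benchmark.
    """
    usage = body.get("usage", {})
    return EmbeddingsResult(
        num_embeddings=len(body.get("data", [])),
        prompt_tokens=usage.get("prompt_tokens", 0),
        total_tokens=usage.get("total_tokens", 0),
    )
```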
## Related Issues
## Use of AI

`## WRITTEN BY AI ##`