Add embeddings endpoint support #501
Adds support for benchmarking the `/v1/embeddings` endpoint, enabling performance testing of text embedding models.

- Add embeddings request type to schemas
- Implement `EmbeddingsResponseHandler` for processing embedding responses
- Add `EmbeddingsRequestFormatter` for request preparation
- Implement mock server handler with synthetic embedding generation
- Add e2e and unit tests for embeddings benchmarking
- Add embeddings guide documentation
## Summary

This PR adds support for benchmarking the `/v1/embeddings` endpoint, enabling performance testing of text embedding models. The implementation includes full integration with guidellm's existing infrastructure: request formatting, response handling, metrics collection, synthetic data generation, and the mock server for testing.
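For context, the endpoint follows the OpenAI-compatible embeddings schema. A minimal sketch of the request and response shapes involved (field names follow the OpenAI embeddings API; this is illustrative, not guidellm's internal representation):

```python
# OpenAI-compatible embeddings request body
request = {
    "input": "Test sentence",  # str or list[str]
    "model": "BAAI/bge-small-en-v1.5",
}

# Shape of the response a handler must parse
response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.01, -0.02, 0.03]},
    ],
    "model": "BAAI/bge-small-en-v1.5",
    "usage": {"prompt_tokens": 3, "total_tokens": 3},
}

# For benchmarking, the interesting parts are the vectors and token usage
vectors = [item["embedding"] for item in response["data"]]
prompt_tokens = response["usage"]["prompt_tokens"]
print(len(vectors), prompt_tokens)  # 1 3
```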
## Details

Core Implementation:

- Add `embeddings` to the `GenerativeRequestType` literal in schemas
- Implement `EmbeddingsResponseHandler` for processing embedding API responses
- Add `EmbeddingsRequestFormatter` for preparing embedding requests (`/v1/embeddings`)
- Export `EmbeddingsResponseHandler` from the backends module

Mock Server Support:

- Implement `EmbeddingsHandler` for the mock server
- Add `EmbeddingsRequest`, `EmbeddingObject`, and `EmbeddingsResponse` models
- Add the `/v1/embeddings` endpoint to the mock server

Testing:

- Unit tests for `EmbeddingsResponseHandler`
- E2E tests: `test_embeddings_max_requests_benchmark`, `test_embeddings_max_seconds_benchmark`, `test_embeddings_rate_benchmark`

Documentation:

- Embeddings guide (`docs/guides/embeddings.md`, 284 lines)

Code Quality:

- Passes linting (`tox -e quality`)
- Passes type checks (`tox -e types`)

## Test Plan
Automated Tests:

- `tox -e test-unit`: `TestEmbeddingsResponseHandler` tests pass
- `tox -e test-integration`
- `tox -e test-e2e`

Manual Testing:

```bash
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input":"Test sentence","model":"test"}'

guidellm benchmark run \
  --target http://localhost:8000 \
  --request-type embeddings \
  --rate 5 \
  --max-requests 20 \
  --data '{"type":"synthetic","prompt_tokens":128,"output_tokens":1}' \
  --processor gpt2
```
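The mock server's synthetic embedding generation can be sketched as below. The dimension count and seeding scheme here are illustrative assumptions, not the PR's actual implementation; the point is that deterministic seeding keeps mock responses reproducible across benchmark runs:

```python
import hashlib
import math
import random

def synthetic_embedding(text: str, dims: int = 384) -> list[float]:
    """Generate a deterministic, unit-length pseudo-embedding for `text`.

    Hashing the input to seed the RNG means the same text always yields
    the same vector, so mock-server output is stable between runs.
    """
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(dims)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]

emb = synthetic_embedding("Test sentence")
print(len(emb))  # 384
```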
```bash
vllm serve "BAAI/bge-small-en-v1.5"

guidellm benchmark run \
  --target http://localhost:8000 \
  --request-type embeddings \
  --rate 10 \
  --max-requests 100 \
  --data '{"type":"synthetic","prompt_tokens":256,"output_tokens":1}' \
  --processor gpt2
```
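Unlike chat completions, embeddings arrive as a single non-streaming payload, so the response handler mostly records token usage rather than per-token timings. A rough sketch of that parsing step (the function and dataclass names here are hypothetical, not the PR's actual interface):

```python
from dataclasses import dataclass

@dataclass
class EmbeddingsResult:
    """Illustrative container for the counts a benchmark cares about."""
    num_embeddings: int
    prompt_tokens: int
    total_tokens: int

def parse_embeddings_response(body: dict) -> EmbeddingsResult:
    """Extract counts from an OpenAI-compatible /v1/embeddings response.

    There is no generated text, so no streamed-token metrics apply;
    only request counts and prompt-token usage feed the benchmark.
    """
    usage = body.get("usage", {})
    return EmbeddingsResult(
        num_embeddings=len(body.get("data", [])),
        prompt_tokens=usage.get("prompt_tokens", 0),
        total_tokens=usage.get("total_tokens", 0),
    )
```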
## Related Issues
## Use of AI

`## WRITTEN BY AI ##`