@maryamtahhan

Adds support for benchmarking the /v1/embeddings endpoint, enabling performance testing of text embedding models.

  • Add embeddings request type to schemas

  • Implement EmbeddingsResponseHandler for processing embedding responses

  • Add EmbeddingsRequestFormatter for request preparation

  • Implement mock server handler with synthetic embedding generation

  • Add e2e and unit tests for embeddings benchmarking

  • Add embeddings guide documentation

Summary

This PR adds support for benchmarking the /v1/embeddings endpoint, enabling performance testing of text embedding models.

The implementation includes full integration with guidellm's existing infrastructure: request formatting, response handling, metrics collection, synthetic data generation, and the mock server for testing.

Details

Core Implementation:

  • Add embeddings to GenerativeRequestType literal in schemas
  • Implement EmbeddingsResponseHandler for processing embedding API responses
  • Implement EmbeddingsRequestFormatter for preparing embedding requests
  • Add embeddings route to OpenAI backend (/v1/embeddings)
  • Export EmbeddingsResponseHandler from backends module
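As a rough illustration of what processing an embeddings response involves, the sketch below pulls the vector count, dimensionality, and prompt token usage out of an OpenAI-style `/v1/embeddings` response body. It is not the `EmbeddingsResponseHandler` from this PR: the `EmbeddingsSummary` class and `summarize_embeddings_response` function are hypothetical names, and the `data[].embedding` / `usage.prompt_tokens` layout is assumed from the OpenAI-compatible response format.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class EmbeddingsSummary:
    """Hypothetical container for illustration; not a guidellm class."""

    num_embeddings: int
    dimensions: int
    prompt_tokens: int


def summarize_embeddings_response(body: dict[str, Any]) -> EmbeddingsSummary:
    """Extract basic metrics from an OpenAI-style embeddings response body."""
    data = body.get("data", [])
    # All vectors in one response are assumed to share the same length.
    dimensions = len(data[0]["embedding"]) if data else 0
    # "usage" may be missing or null on some servers; treat both as empty.
    usage = body.get("usage") or {}
    return EmbeddingsSummary(
        num_embeddings=len(data),
        dimensions=dimensions,
        prompt_tokens=int(usage.get("prompt_tokens", 0)),
    )
```

Embedding requests produce no generated tokens, so a handler along these lines would only have prompt-side token counts to report.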

Mock Server Support:

  • Implement EmbeddingsHandler for mock server
  • Add EmbeddingsRequest, EmbeddingObject, and EmbeddingsResponse models
  • Register /v1/embeddings endpoint in mock server
  • Generate synthetic normalized embedding vectors (configurable dimensions)
  • Apply realistic timing delays based on token count
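The last two items can be sketched as follows. This is a minimal illustration under stated assumptions, not the mock server code from this PR: the function names are hypothetical, the Gaussian sampling is one common way to get a random unit vector, and the delay constants (`base_ms`, `per_token_ms`) are made-up placeholders. Only the 1536-dimension default comes from the PR description.

```python
import math
import random
from typing import Optional


def synthetic_embedding(dimensions: int = 1536, seed: Optional[int] = None) -> list[float]:
    """Return a random vector scaled to unit L2 norm (hypothetical helper)."""
    if dimensions < 1:
        raise ValueError("dimensions must be at least 1")
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(dimensions)]
    norm = math.sqrt(sum(x * x for x in raw))
    if norm == 0.0:
        # Practically unreachable, but avoids dividing by zero.
        return [1.0] + [0.0] * (dimensions - 1)
    return [x / norm for x in raw]


def simulated_delay_seconds(
    prompt_tokens: int, base_ms: float = 5.0, per_token_ms: float = 0.05
) -> float:
    """Return a delay that grows linearly with token count (assumed model)."""
    return (base_ms + per_token_ms * max(prompt_tokens, 0)) / 1000.0
```

Normalizing the vector keeps the synthetic output comparable to what many real embedding models return, and a seeded generator makes test runs reproducible.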

Testing:

  • Add unit tests for EmbeddingsResponseHandler
  • Add e2e tests: test_embeddings_max_requests_benchmark, test_embeddings_max_seconds_benchmark, test_embeddings_rate_benchmark
  • Update existing tests to account for new request type
  • All tests pass (1717 unit tests, 2 integration tests)

Documentation:

  • Create comprehensive embeddings guide (docs/guides/embeddings.md, 284 lines)
  • Update backends guide with embeddings endpoint information
  • Update datasets guide with embeddings-specific synthetic data examples
  • Add docstrings to all new classes and methods

Code Quality:

  • All linting checks pass (tox -e quality)
  • All type checks pass (tox -e types)
  • Pre-commit hooks pass
  • Code properly formatted with ruff and mdformat

Test Plan

Automated Tests:

  1. Run unit tests: tox -e test-unit
    • Verify TestEmbeddingsResponseHandler tests pass
  2. Run integration tests: tox -e test-integration
  3. Run e2e tests: tox -e test-e2e
    • Tests will pass in CI with the vllm-sim binary

Manual Testing:

  1. Start the mock server:
    guidellm mock-server --port 8000
    
  2. Test embeddings endpoint directly:
    curl -X POST http://localhost:8000/v1/embeddings \
      -H "Content-Type: application/json" \
      -d '{"input":"Test sentence","model":"test"}'
  3. Expected: Returns JSON with embedding vectors (1536 dimensions by default)
  4. Run an embeddings benchmark:
    guidellm benchmark run \
      --target http://localhost:8000 \
      --request-type embeddings \
      --rate 5 \
      --max-requests 20 \
      --data '{"type":"synthetic","prompt_tokens":128,"output_tokens":1}' \
      --processor gpt2
  5. Expected: Benchmark completes successfully with metrics for 20 requests
  6. Test with vLLM serving an embedding model (stop the mock server first, since vllm serve also listens on port 8000 by default):
    vllm serve "BAAI/bge-small-en-v1.5"
    guidellm benchmark run \
      --target http://localhost:8000 \
      --request-type embeddings \
      --rate 10 \
      --max-requests 100 \
      --data '{"type":"synthetic","prompt_tokens":256,"output_tokens":1}' \
      --processor gpt2
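
The direct endpoint check in step 2 can also be scripted. The sketch below mirrors the curl request using only the Python standard library and reports the dimensionality of each returned vector. The helper names are hypothetical, and the `data[].embedding` response layout is assumed from the OpenAI-compatible format rather than confirmed against the mock server.

```python
import json
import urllib.request
from typing import Any


def build_embeddings_request(
    base_url: str, text: str, model: str = "test"
) -> urllib.request.Request:
    """Build the same POST request as the curl command in step 2."""
    payload = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embedding_dimensions(body: dict[str, Any]) -> list[int]:
    """Return the length of every embedding vector in a response body."""
    return [len(item["embedding"]) for item in body.get("data", [])]


def fetch_embedding_dimensions(
    base_url: str, text: str, timeout: float = 10.0
) -> list[int]:
    """Send the request to a running server and return vector lengths."""
    request = build_embeddings_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return embedding_dimensions(json.load(response))
```

With the mock server from step 1 running, `fetch_embedding_dimensions("http://localhost:8000", "Test sentence")` should return a single length, which step 3 says is 1536 by default.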

Related Issues

  • Addresses need for embedding model benchmarking support
  • Complements existing text completions, chat completions, and audio benchmarking

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

@maryamtahhan force-pushed the feat-embedding-testing branch from 5ddf82e to 375eab2 on December 5, 2025 14:36
Adds support for benchmarking the /v1/embeddings endpoint,
enabling performance testing of text embedding models.

- Add embeddings request type to schemas
- Implement EmbeddingsResponseHandler for processing embedding responses
- Add EmbeddingsRequestFormatter for request preparation
- Implement mock server handler with synthetic embedding generation
- Add e2e and unit tests for embeddings benchmarking
- Add embeddings guide documentation

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
@maryamtahhan force-pushed the feat-embedding-testing branch from 375eab2 to 02ad329 on December 5, 2025 14:38