
Up to 9% decrease in output tokens per second between v0.3 and v0.4 #514

@sjmonson

Description


Describe the bug

The measured output token throughput in GuideLLM v0.4.0 and later is significantly lower than in v0.3.x, even though all other metrics are near-identical.

Expected behavior

Output token throughput should not differ from v0.3.x when all else is equal.

Environment

Include all relevant environment information:

  1. OS: Fedora 43 container on OCP 4.19
  2. Python version: 3.13.9
  3. GuideLLM version: 0.4.0 and 0.5.0
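
To compare releases side by side, each version can be installed into its own virtual environment. A minimal sketch, assuming both releases are published on PyPI under the guidellm package name; the directory names are arbitrary:

# One virtual environment per GuideLLM release (paths are arbitrary examples).
python -m venv ~/venvs/guidellm-0.3.0
~/venvs/guidellm-0.3.0/bin/pip install guidellm==0.3.0

python -m venv ~/venvs/guidellm-0.4.0
~/venvs/guidellm-0.4.0/bin/pip install guidellm==0.4.0

# Activate the environment under test before running the benchmark below.
source ~/venvs/guidellm-0.4.0/bin/activate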

To Reproduce

Run the following GuideLLM test on v0.3.0 and v0.4.0, then compare the metrics in the resulting output JSON:

# Ensure we use the same endpoint across versions
export GUIDELLM_REQUEST_TYPE=text_completions
export GUIDELLM_TARGET=http://localhost:8080

guidellm benchmark run \
            --target="${GUIDELLM_TARGET}" \
            --model=RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic \
            --processor=RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic \
            --rate-type=concurrent \
            --data=prompt_tokens=1000,output_tokens=1000 \
            --max-seconds=600 \
            --rate=1,50,100,200,300,500,650
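
For the comparison step, a minimal sketch assuming each run's JSON report was saved as v0.3.0.json and v0.4.0.json; the jq field paths are assumptions and may differ between GuideLLM versions, so adjust them to match the actual report structure:

# Hypothetical field paths: adjust '.benchmarks[]' and the metric path to
# whatever the report of each GuideLLM version actually uses.
for f in v0.3.0.json v0.4.0.json; do
    echo "== ${f} =="
    jq -r '.benchmarks[] | .metrics.output_tokens_per_second.successful.mean' "${f}"
done

With one throughput value per concurrency level from each version, the percent change against the v0.3.0 baseline can then be computed per rate.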

Additional context

Total output tokens per second over intended concurrency: [image]

Percent change from the v0.3.0 baseline: [image]
