Add benchmark results for anthropic/claude-haiku-4.5 by mentatbot[bot] · Pull Request #340 · AbanteAI/LoCoDiff-bench

mentatbot · 2025-10-15T18:43:59Z

This PR adds benchmark results for the anthropic/claude-haiku-4.5 model on the locodiff-250425 benchmark set.

Benchmark Summary

Model: anthropic/claude-haiku-4.5
Total benchmarks: 200
Successful: 82 (41.0% success rate)
Failed (output mismatch): 118
API errors: 0
Total cost: $16.17
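
The summary figures can be cross-checked with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative and not taken from the LoCoDiff-bench code, and the per-benchmark cost values are derived here rather than reported by the run.

```python
# Figures copied from the Benchmark Summary above.
total = 200
successful = 82
failed_mismatch = 118
api_errors = 0
total_cost_usd = 16.17

# Every benchmark should fall into exactly one outcome bucket.
assert successful + failed_mismatch + api_errors == total

# Derived values (not reported in the PR itself).
success_rate = successful / total
cost_per_benchmark = total_cost_usd / total
cost_per_success = total_cost_usd / successful

print(f"Success rate: {success_rate:.1%}")
print(f"Cost per benchmark: ${cost_per_benchmark:.4f}")
print(f"Cost per successful benchmark: ${cost_per_success:.4f}")
```

The three outcome counts sum to the 200 total, and 82 of 200 matches the stated 41.0% success rate.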

Results

The benchmark results have been saved to locodiff-250425/results/*/anthropic_claude-haiku-4.5/ with the following structure for each test case:

metadata.json - Test metadata and results
raw_response.txt - Raw model response
extracted_output.txt - Extracted code output
output.diff - Generated diff
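
To confirm that every test case directory contains all four files, something like the sketch below could be used. It assumes only the path pattern and file names listed above; the function name `find_incomplete_results` and its parameters are hypothetical and not part of the LoCoDiff-bench repository.

```python
from pathlib import Path

# File names as listed in the PR description.
EXPECTED_FILES = (
    "metadata.json",
    "raw_response.txt",
    "extracted_output.txt",
    "output.diff",
)


def find_incomplete_results(results_root, model_dir="anthropic_claude-haiku-4.5"):
    """Return {test-case directory: [missing file names]} for incomplete cases."""
    incomplete = {}
    # Mirrors the locodiff-250425/results/*/<model_dir>/ pattern.
    for case_dir in sorted(Path(results_root).glob(f"*/{model_dir}")):
        if not case_dir.is_dir():
            continue
        missing = [name for name in EXPECTED_FILES if not (case_dir / name).is_file()]
        if missing:
            incomplete[str(case_dir)] = missing
    return incomplete
```

Calling `find_incomplete_results("locodiff-250425/results")` from the repository root would return an empty dictionary if every test case has the full set of files.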

Performance Notes

Claude Haiku 4.5 achieved a 41% success rate (82 of 200) on this benchmark, which tests the model's ability to reconstruct code changes from git history. The model completed all 200 benchmarks without any API errors, so all 118 failures were output mismatches rather than request failures.

🤖 This PR was created with Mentat.

Benchmark Summary:
- Model: anthropic/claude-haiku-4.5
- Total benchmarks: 200
- Successful: 82 (41% success rate)
- Failed (output mismatch): 118
- API errors: 0
- Total cost: $16.17

Results saved to locodiff-250425/results/*/anthropic_claude-haiku-4.5/

Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/4096e202-549d-486f-9abe-de8c098f876a

Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>

- Added display name for anthropic/claude-haiku-4.5 in benchmark_config.yaml
- Generated visualization pages for all models including Haiku 4.5
- Updated docs/index.html and model-specific pages

Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/0104fe76-ae55-4aca-864b-c41eab846c3f

Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>

mentatbot bot requested a review from biobootloader October 15, 2025 18:44

biobootloader merged commit f6fb4be into main Oct 15, 2025
1 check passed

biobootloader merged 2 commits into main from mentat-264
