Add benchmark results for anthropic/claude-haiku-4.5 #340

Merged
biobootloader merged 2 commits into main from mentat-264 on Oct 15, 2025

@mentatbot mentatbot bot commented Oct 15, 2025

This PR adds benchmark results for the anthropic/claude-haiku-4.5 model on the locodiff-250425 benchmark set.

Benchmark Summary

  • Model: anthropic/claude-haiku-4.5
  • Total benchmarks: 200
  • Successful: 82 (41.0% success rate)
  • Failed (output mismatch): 118
  • API errors: 0
  • Total cost: $16.17
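
As a quick sanity check on the figures above, the following sketch recomputes the totals from the reported counts. The function name `summarize` and its keys are hypothetical, chosen for illustration; they are not part of the benchmark tooling. The per-benchmark cost is a derived average, not a number reported by the run.

```python
def summarize(successful: int, failed: int, api_errors: int, total_cost: float) -> dict:
    """Recompute headline figures from raw counts (hypothetical helper)."""
    total = successful + failed + api_errors
    return {
        "total": total,
        # Success rate as a percentage, rounded to one decimal place.
        "success_rate_pct": round(100 * successful / total, 1),
        # Average cost per benchmark in dollars, rounded to three decimals.
        "avg_cost_per_benchmark": round(total_cost / total, 3),
    }


# Counts and cost taken from the summary above.
summary = summarize(successful=82, failed=118, api_errors=0, total_cost=16.17)
print(summary)
```

The counts are consistent: 82 + 118 + 0 = 200, and 82 / 200 gives the stated 41.0%. The average works out to roughly $0.08 per benchmark.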

Results

The benchmark results have been saved to locodiff-250425/results/*/anthropic_claude-haiku-4.5/ with the following structure for each test case:

  • metadata.json - Test metadata and results
  • raw_response.txt - Raw model response
  • extracted_output.txt - Extracted code output
  • output.diff - Generated diff
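
To check that every test case directory contains the four files listed above, something like the sketch below could be used. It assumes only the path pattern and file names given in this description; the helper name `find_incomplete_cases` is hypothetical, and the contents of `metadata.json` are deliberately not inspected because its schema is not described here.

```python
from pathlib import Path

# File names as listed in the PR description.
EXPECTED_FILES = (
    "metadata.json",
    "raw_response.txt",
    "extracted_output.txt",
    "output.diff",
)


def find_incomplete_cases(results_root: Path, model_dir: str) -> dict[str, list[str]]:
    """Map each test-case name to the expected files missing from its model directory."""
    incomplete: dict[str, list[str]] = {}
    # Matches results/<test case>/<model_dir>/ for every test case.
    for case_dir in sorted(results_root.glob(f"*/{model_dir}")):
        if not case_dir.is_dir():
            continue
        missing = [name for name in EXPECTED_FILES if not (case_dir / name).is_file()]
        if missing:
            incomplete[case_dir.parent.name] = missing
    return incomplete


root = Path("locodiff-250425/results")
if root.is_dir():
    # An empty dict means every test case has all four files.
    print(find_incomplete_cases(root, "anthropic_claude-haiku-4.5"))
```

Run from the repository root; an empty result indicates that no test case is missing any of the listed files.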

Performance Notes

Claude Haiku 4.5 succeeded on 82 of 200 test cases (41%) on this benchmark, which tests the model's ability to reconstruct code changes from git history. All 200 benchmarks completed without API errors, so each of the 118 failures is an output mismatch rather than a missing response.


🤖 This PR was created with Mentat. See my steps and cost here


Benchmark Summary:
- Model: anthropic/claude-haiku-4.5
- Total benchmarks: 200
- Successful: 82 (41% success rate)
- Failed (output mismatch): 118
- API errors: 0
- Total cost: $16.17

Results saved to locodiff-250425/results/*/anthropic_claude-haiku-4.5/

Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/4096e202-549d-486f-9abe-de8c098f876a

Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>
@mentatbot mentatbot bot requested a review from biobootloader October 15, 2025 18:44
- Added display name for anthropic/claude-haiku-4.5 in benchmark_config.yaml
- Generated visualization pages for all models including Haiku 4.5
- Updated docs/index.html and model-specific pages

Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/0104fe76-ae55-4aca-864b-c41eab846c3f

Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>
@biobootloader biobootloader merged commit f6fb4be into main Oct 15, 2025
1 check passed