1 change: 1 addition & 0 deletions benchmark_pipeline/benchmark_config.yaml
Original file line number Diff line number Diff line change
@@ -28,6 +28,7 @@ model_display_names:
"anthropic/claude-3.7-sonnet": "Sonnet 3.7"
"anthropic/claude-3.7-sonnetthinking": "Sonnet 3.7 Thinking"
"anthropic/claude-4.5-sonnet": "Sonnet 4.5"
"anthropic/claude-haiku-4.5": "Haiku 4.5"
"anthropic/claude-opus-4.1": "Claude Opus 4.1"
"anthropic/claude-sonnet-4": "Sonnet 4"
"anthropic/claude-sonnet-4thinking": "Sonnet 4 Thinking"
1,209 changes: 1,209 additions & 0 deletions docs/cases.html

Large diffs are not rendered by default.

51 changes: 51 additions & 0 deletions docs/cases/anthropic_claude-haiku-4.5/aider_aider___init__.py.html
@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/__init__.py - Haiku 4.5</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/__init__.py</h1>
<h2>Model: Haiku 4.5</h2>
<p><a href="../../models/anthropic_claude-haiku-4.5.html">All Haiku 4.5 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Haiku 4.5</p>
<p><strong>Status:</strong> <span class="success">Success</span></p>
<p><strong>Prompt Tokens:</strong> 59517</p>
<p><strong>Native Prompt Tokens:</strong> 67626</p>
<p><strong>Native Completion Tokens:</strong> 156</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $0.068406</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider___init__.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider___init__.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider___init__.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<div class="success-message"><p>✓ No differences found (successful run)</p><p>Expected output matches the model output exactly.</p></div>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>
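
Review note: the Cost field in these case pages appears to follow from the native token counts by simple per-token pricing. The sketch below is a minimal check, assuming rates of $1 per million prompt tokens and $5 per million completion tokens. Those rates are inferred from the figures on the pages in this diff and are not stated anywhere in the PR, and `estimate_cost` is a hypothetical helper, not part of the benchmark pipeline.

```python
from decimal import Decimal

# Assumed per-token rates in USD. Inferred from the reported figures,
# not taken from the PR or from benchmark_config.yaml.
PROMPT_RATE = Decimal("1") / Decimal(1_000_000)
COMPLETION_RATE = Decimal("5") / Decimal(1_000_000)


def estimate_cost(native_prompt_tokens: int, native_completion_tokens: int) -> Decimal:
    """Hypothetical helper: cost = prompt tokens * rate + completion tokens * rate."""
    return (
        native_prompt_tokens * PROMPT_RATE
        + native_completion_tokens * COMPLETION_RATE
    )


# Figures from the aider/__init__.py case page above.
print(estimate_cost(67626, 156))
```

Under these assumed rates the result for the aider/__init__.py case agrees with the reported $0.068406. Decimal is used so the comparison is exact rather than subject to float rounding.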

@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/analytics.py - Haiku 4.5</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/analytics.py</h1>
<h2>Model: Haiku 4.5</h2>
<p><a href="../../models/anthropic_claude-haiku-4.5.html">All Haiku 4.5 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Haiku 4.5</p>
<p><strong>Status:</strong> <span class="failure">Failure</span></p>
<p><strong>Prompt Tokens:</strong> 24542</p>
<p><strong>Native Prompt Tokens:</strong> 30657</p>
<p><strong>Native Completion Tokens:</strong> 2105</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $0.041182</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_analytics.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_analytics.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_analytics.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<pre class="diff"><div></div><div>index bcb21745a..cb655cb85 100644</div><div class="diff-header">--- a/aider_aider_analytics.py_expectedoutput.txt (expected):tmp/tmps8oom0wz_expected.txt </div><div class="diff-header">+++ b/aider_aider_analytics.py_extracted.txt (actual):tmp/tmptbh4rhdr_actual.txt </div><div class="diff-info">@@ -72,6 +72,7 @@ class Analytics:</div><div> </div><div> def __init__(self, logfile=None, permanently_disable=False):</div><div> self.logfile = logfile</div><div class="diff-added">+ self.asked_opt_in = False</div><div> self.get_or_create_uuid()</div><div> </div><div> if self.permanently_disable or permanently_disable or not self.asked_opt_in:</div><div></div></pre>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>

51 changes: 51 additions & 0 deletions docs/cases/anthropic_claude-haiku-4.5/aider_aider_args.py.html
@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/args.py - Haiku 4.5</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/args.py</h1>
<h2>Model: Haiku 4.5</h2>
<p><a href="../../models/anthropic_claude-haiku-4.5.html">All Haiku 4.5 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Haiku 4.5</p>
<p><strong>Status:</strong> <span class="failure">Failure</span></p>
<p><strong>Prompt Tokens:</strong> 61543</p>
<p><strong>Native Prompt Tokens:</strong> 76671</p>
<p><strong>Native Completion Tokens:</strong> 7362</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $0.113481</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_args.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_args.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_args.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<pre class="diff"><div></div><div>index e64aa9deb..f214a8d7f 100644</div><div class="diff-header">--- a/aider_aider_args.py_expectedoutput.txt (expected):tmp/tmpflwcxu0a_expected.txt </div><div class="diff-header">+++ b/aider_aider_args.py_extracted.txt (actual):tmp/tmp703g0pr9_actual.txt </div><div class="diff-info">@@ -138,12 +138,6 @@ def get_parser(default_config_files, git_root):</div><div> default=True,</div><div> help="Verify the SSL cert when connecting to models (default: True)",</div><div> )</div><div class="diff-removed">- group.add_argument(</div><div class="diff-removed">- "--timeout",</div><div class="diff-removed">- type=float,</div><div class="diff-removed">- default=None,</div><div class="diff-removed">- help="Timeout in seconds for API calls (default: None)",</div><div class="diff-removed">- )</div><div> group.add_argument(</div><div> "--edit-format",</div><div> "--chat-mode",</div><div class="diff-info">@@ -524,26 +518,6 @@ def get_parser(default_config_files, git_root):</div><div> )</div><div> </div><div> ##########</div><div class="diff-removed">- group = parser.add_argument_group("Analytics")</div><div class="diff-removed">- group.add_argument(</div><div class="diff-removed">- "--analytics",</div><div class="diff-removed">- action=argparse.BooleanOptionalAction,</div><div class="diff-removed">- default=None,</div><div class="diff-removed">- help="Enable/disable analytics for current session (default: random)",</div><div class="diff-removed">- )</div><div class="diff-removed">- group.add_argument(</div><div class="diff-removed">- "--analytics-log",</div><div class="diff-removed">- metavar="ANALYTICS_LOG_FILE",</div><div class="diff-removed">- help="Specify a file to log analytics events",</div><div class="diff-removed">- )</div><div class="diff-removed">- group.add_argument(</div><div class="diff-removed">- "--analytics-disable",</div><div class="diff-removed">- action="store_true",</div><div class="diff-removed">- help="Permanently disable analytics",</div><div class="diff-removed">- default=False,</div><div class="diff-removed">- )</div><div class="diff-removed">-</div><div class="diff-removed">- #########</div><div> group = parser.add_argument_group("Upgrading")</div><div> group.add_argument(</div><div> "--just-check-update",</div><div class="diff-info">@@ -788,6 +762,26 @@ def get_parser(default_config_files, git_root):</div><div> help="Specify which editor to use for the /editor command",</div><div> )</div><div> </div><div class="diff-added">+ ##########</div><div class="diff-added">+ group = parser.add_argument_group("Analytics")</div><div class="diff-added">+ group.add_argument(</div><div class="diff-added">+ "--analytics",</div><div class="diff-added">+ action=argparse.BooleanOptionalAction,</div><div class="diff-added">+ default=None,</div><div class="diff-added">+ help="Enable/disable analytics for current session (default: random)",</div><div class="diff-added">+ )</div><div class="diff-added">+ group.add_argument(</div><div class="diff-added">+ "--analytics-log",</div><div class="diff-added">+ metavar="ANALYTICS_LOG_FILE",</div><div class="diff-added">+ help="Specify a file to log analytics events",</div><div class="diff-added">+ )</div><div class="diff-added">+ group.add_argument(</div><div class="diff-added">+ "--analytics-disable",</div><div class="diff-added">+ action="store_true",</div><div class="diff-added">+ help="Permanently disable analytics",</div><div class="diff-added">+ default=False,</div><div class="diff-added">+ )</div><div class="diff-added">+</div><div> ##########</div><div> group = parser.add_argument_group("Deprecated model settings")</div><div> # Add deprecated model shortcut arguments</div><div></div></pre>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>
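
Review note: in the args.py case above, the model output defines the "Analytics" argument group at a different position than the expected file (and omits `--timeout`). Reordering groups does not change which flags argparse accepts, but the benchmark compares file text exactly, and the order of groups in `--help` output follows creation order. The sketch below illustrates that with a toy parser; `build_parser` and the `demo` program name are hypothetical, and only two flag names are borrowed from the diff.

```python
import argparse


def build_parser(analytics_first: bool) -> argparse.ArgumentParser:
    """Hypothetical toy parser with two groups created in a chosen order."""
    parser = argparse.ArgumentParser(prog="demo", add_help=False)
    order = ["Analytics", "Upgrading"] if analytics_first else ["Upgrading", "Analytics"]
    for title in order:
        group = parser.add_argument_group(title)
        flag = "--analytics" if title == "Analytics" else "--just-check-update"
        group.add_argument(flag, action="store_true")
    return parser


first = build_parser(analytics_first=True)
last = build_parser(analytics_first=False)

# Parsing behaviour is the same regardless of group order.
same_result = first.parse_args(["--analytics"]) == last.parse_args(["--analytics"])

# The help text lists groups in the order they were created.
help_first = first.format_help()
help_last = last.format_help()
print(same_result, help_first != help_last)
```

So a reordered group is behaviourally harmless in this toy case, yet still counts as a failure under an exact-match comparison of the reconstructed file.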

@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/coders/editblock_coder.py - Haiku 4.5</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/coders/editblock_coder.py</h1>
<h2>Model: Haiku 4.5</h2>
<p><a href="../../models/anthropic_claude-haiku-4.5.html">All Haiku 4.5 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Haiku 4.5</p>
<p><strong>Status:</strong> <span class="failure">Failure</span></p>
<p><strong>Prompt Tokens:</strong> 56338</p>
<p><strong>Native Prompt Tokens:</strong> 72436</p>
<p><strong>Native Completion Tokens:</strong> 5769</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $0.101281</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_coders_editblock_coder.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_coders_editblock_coder.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_coders_editblock_coder.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<pre class="diff"><div></div><div>index 42fc8b445..57433d007 100644</div><div class="diff-header">--- a/aider_aider_coders_editblock_coder.py_expectedoutput.txt (expected):tmp/tmpdxz1qpnp_expected.txt </div><div class="diff-header">+++ b/aider_aider_coders_editblock_coder.py_extracted.txt (actual):tmp/tmpgntz377__actual.txt </div><div class="diff-info">@@ -181,10 +181,6 @@ def replace_most_similar_chunk(whole, part, replace):</div><div> pass</div><div> </div><div> return</div><div class="diff-removed">- # Try fuzzy matching</div><div class="diff-removed">- res = replace_closest_edit_distance(whole_lines, part, part_lines, replace_lines)</div><div class="diff-removed">- if res:</div><div class="diff-removed">- return res</div><div> </div><div> </div><div> def try_dotdotdots(whole, part, replace):</div><div class="diff-info">@@ -319,12 +315,17 @@ def replace_closest_edit_distance(whole_lines, part, part_lines, replace_lines):</div><div> if max_similarity < similarity_thresh:</div><div> return</div><div> </div><div class="diff-added">+ replace_lines = replace.splitlines()</div><div class="diff-added">+</div><div> modified_whole = (</div><div> whole_lines[:most_similar_chunk_start]</div><div> + replace_lines</div><div> + whole_lines[most_similar_chunk_end:]</div><div> )</div><div class="diff-removed">- modified_whole = "".join(modified_whole)</div><div class="diff-added">+ modified_whole = "\n".join(modified_whole)</div><div class="diff-added">+</div><div class="diff-added">+ if whole.endswith("\n"):</div><div class="diff-added">+ modified_whole += "\n"</div><div> </div><div> return modified_whole</div><div> </div><div></div></pre>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>
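
Review note: the editblock_coder.py diff above hinges on how lines are rejoined. The expected file uses `"".join(...)`, which is correct only if each line still carries its own line ending; the model output instead re-splits with `splitlines()` and joins with `"\n"`, re-adding a trailing newline. The sketch below shows the two conventions on a toy string. It assumes the line lists were produced with `splitlines(keepends=True)`, which this diff does not show, and the helper names are hypothetical.

```python
def rejoin_keepends(text: str) -> str:
    """Split keeping line endings, then join with the empty string."""
    return "".join(text.splitlines(keepends=True))


def rejoin_bare(text: str) -> str:
    """Split dropping line endings, join with newlines, restore the trailing one."""
    joined = "\n".join(text.splitlines())
    if text.endswith("\n"):
        joined += "\n"
    return joined


sample = "a\nb\nc\n"

# Both conventions round-trip the text when used consistently.
print(rejoin_keepends(sample) == sample, rejoin_bare(sample) == sample)

# Mixing them (newline-joining lines that already end in "\n") doubles the breaks.
mixed = "\n".join(sample.splitlines(keepends=True))
print(repr(mixed))
```

Either convention is internally consistent; the risk lies in applying one convention's join to the other convention's line lists.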

@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/coders/editblock_prompts.py - Haiku 4.5</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/coders/editblock_prompts.py</h1>
<h2>Model: Haiku 4.5</h2>
<p><a href="../../models/anthropic_claude-haiku-4.5.html">All Haiku 4.5 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Haiku 4.5</p>
<p><strong>Status:</strong> <span class="failure">Failure</span></p>
<p><strong>Prompt Tokens:</strong> 35371</p>
<p><strong>Native Prompt Tokens:</strong> 42180</p>
<p><strong>Native Completion Tokens:</strong> 2006</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $0.05221</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_coders_editblock_prompts.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_coders_editblock_prompts.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-haiku-4.5/aider_aider_coders_editblock_prompts.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<pre class="diff"><div></div><div>index b000ba510..c3db76f00 100644</div><div class="diff-header">--- a/aider_aider_coders_editblock_prompts.py_expectedoutput.txt (expected):tmp/tmpniyv0pxh_expected.txt </div><div class="diff-header">+++ b/aider_aider_coders_editblock_prompts.py_extracted.txt (actual):tmp/tmphw50qoll_actual.txt </div><div class="diff-info">@@ -55,6 +55,7 @@ Examples of when to suggest shell commands:</div><div> Keep in mind these details about the user's platform and environment:</div><div> {platform}</div><div> """</div><div class="diff-added">+</div><div> example_messages = [</div><div> dict(</div><div> role="user",</div><div></div></pre>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>
