1 change: 1 addition & 0 deletions benchmark_pipeline/benchmark_config.yaml
@@ -27,6 +27,7 @@ model_display_names:
 "anthropic/claude-3.5-sonnet-20240620": "Sonnet 3.5"
 "anthropic/claude-3.7-sonnet": "Sonnet 3.7"
 "anthropic/claude-3.7-sonnetthinking": "Sonnet 3.7 Thinking"
+"anthropic/claude-opus-4.1": "Claude Opus 4.1"
 "anthropic/claude-sonnet-4": "Sonnet 4"
 "anthropic/claude-sonnet-4thinking": "Sonnet 4 Thinking"
 "openai/gpt-4.1": "GPT-4.1"
1,209 changes: 1,209 additions & 0 deletions docs/cases.html

Large diffs are not rendered by default.

51 changes: 51 additions & 0 deletions docs/cases/anthropic_claude-opus-4.1/aider_aider___init__.py.html
@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/__init__.py - Claude Opus 4.1</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/__init__.py</h1>
<h2>Model: Claude Opus 4.1</h2>
<p><a href="../../models/anthropic_claude-opus-4.1.html">All Claude Opus 4.1 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Claude Opus 4.1</p>
<p><strong>Status:</strong> <span class="success">Success</span></p>
<p><strong>Prompt Tokens:</strong> 59517</p>
<p><strong>Native Prompt Tokens:</strong> 67626</p>
<p><strong>Native Completion Tokens:</strong> 327</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $1.038915</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider___init__.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider___init__.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider___init__.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<div class="success-message"><p>✓ No differences found (successful run)</p><p>Expected output matches the model output exactly.</p></div>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>
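
Reviewer note: the Cost field in the case above appears to follow directly from the native token counts. The sketch below reproduces the reported $1.038915 from 67626 native prompt tokens and 327 native completion tokens. The two per-million-token rates are an assumption inferred from the reported numbers, not values taken from this PR, and `estimate_cost` is a hypothetical helper name.

```python
# Assumed USD rates per one million tokens. These are inferred from the
# figures in the case page, not read from the benchmark pipeline itself.
ASSUMED_INPUT_RATE_PER_M = 15
ASSUMED_OUTPUT_RATE_PER_M = 75


def estimate_cost(native_prompt_tokens: int, native_completion_tokens: int) -> float:
    """Return the estimated cost in USD under the assumed rates."""
    # Work in integer micro-dollars first to avoid accumulating float error.
    micro_usd = (
        native_prompt_tokens * ASSUMED_INPUT_RATE_PER_M
        + native_completion_tokens * ASSUMED_OUTPUT_RATE_PER_M
    )
    return micro_usd / 1_000_000


# aider/__init__.py case: 67626 native prompt tokens, 327 native completion tokens.
reported = 1.038915
estimated = estimate_cost(67626, 327)
print(abs(estimated - reported) < 1e-9)  # prints True
```

If the rates change, only the two constants need updating; the token counts come straight from the case page.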

51 changes: 51 additions & 0 deletions docs/cases/anthropic_claude-opus-4.1/aider_aider_analytics.py.html
@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/analytics.py - Claude Opus 4.1</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/analytics.py</h1>
<h2>Model: Claude Opus 4.1</h2>
<p><a href="../../models/anthropic_claude-opus-4.1.html">All Claude Opus 4.1 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Claude Opus 4.1</p>
<p><strong>Status:</strong> <span class="failure">Failure</span></p>
<p><strong>Prompt Tokens:</strong> 24542</p>
<p><strong>Native Prompt Tokens:</strong> 30657</p>
<p><strong>Native Completion Tokens:</strong> 2152</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $0.621255</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_analytics.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_analytics.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_analytics.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<pre class="diff"><div></div><div>index bcb21745a..cb655cb85 100644</div><div class="diff-header">--- a/aider_aider_analytics.py_expectedoutput.txt (expected):tmp/tmp1jdkcki9_expected.txt </div><div class="diff-header">+++ b/aider_aider_analytics.py_extracted.txt (actual):tmp/tmp5cxdkvna_actual.txt </div><div class="diff-info">@@ -72,6 +72,7 @@ class Analytics:</div><div> </div><div> def __init__(self, logfile=None, permanently_disable=False):</div><div> self.logfile = logfile</div><div class="diff-added">+ self.asked_opt_in = False</div><div> self.get_or_create_uuid()</div><div> </div><div> if self.permanently_disable or permanently_disable or not self.asked_opt_in:</div><div></div></pre>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>
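
Reviewer note: the case pages report Success only when the expected output matches the model output exactly, and otherwise show a unified diff. A minimal sketch of that check is shown below, using Python's standard `difflib`. The function name `compare_outputs` is hypothetical, and the benchmark's real comparison may extract or normalize text before comparing.

```python
import difflib
from typing import Tuple


def compare_outputs(expected: str, actual: str) -> Tuple[bool, str]:
    """Return (is_success, unified_diff_text) for an exact-match comparison."""
    if expected == actual:
        # Exact match: nothing to show in the diff section.
        return True, ""
    diff_lines = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="expected",
        tofile="actual",
    )
    return False, "".join(diff_lines)


# Illustrative input modeled on the aider/analytics.py failure above:
# the actual output contains one extra line.
expected_text = "self.logfile = logfile\nself.get_or_create_uuid()\n"
actual_text = (
    "self.logfile = logfile\n"
    "self.asked_opt_in = False\n"
    "self.get_or_create_uuid()\n"
)
ok, diff_text = compare_outputs(expected_text, actual_text)
print(ok)
print(diff_text)
```

Under this strict rule a single added line, as in the analytics case, is enough to mark the run as a failure.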

51 changes: 51 additions & 0 deletions docs/cases/anthropic_claude-opus-4.1/aider_aider_args.py.html
@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/args.py - Claude Opus 4.1</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/args.py</h1>
<h2>Model: Claude Opus 4.1</h2>
<p><a href="../../models/anthropic_claude-opus-4.1.html">All Claude Opus 4.1 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Claude Opus 4.1</p>
<p><strong>Status:</strong> <span class="success">Success</span></p>
<p><strong>Prompt Tokens:</strong> 61543</p>
<p><strong>Native Prompt Tokens:</strong> 76671</p>
<p><strong>Native Completion Tokens:</strong> 7464</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $1.709865</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_args.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_args.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_args.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<div class="success-message"><p>✓ No differences found (successful run)</p><p>Expected output matches the model output exactly.</p></div>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>

@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/coders/editblock_prompts.py - Claude Opus 4.1</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/coders/editblock_prompts.py</h1>
<h2>Model: Claude Opus 4.1</h2>
<p><a href="../../models/anthropic_claude-opus-4.1.html">All Claude Opus 4.1 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Claude Opus 4.1</p>
<p><strong>Status:</strong> <span class="success">Success</span></p>
<p><strong>Prompt Tokens:</strong> 35371</p>
<p><strong>Native Prompt Tokens:</strong> 42180</p>
<p><strong>Native Completion Tokens:</strong> 2072</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $0.7881</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_coders_editblock_prompts.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_coders_editblock_prompts.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_coders_editblock_prompts.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<div class="success-message"><p>✓ No differences found (successful run)</p><p>Expected output matches the model output exactly.</p></div>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>

@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/coders/patch_coder.py - Claude Opus 4.1</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/coders/patch_coder.py</h1>
<h2>Model: Claude Opus 4.1</h2>
<p><a href="../../models/anthropic_claude-opus-4.1.html">All Claude Opus 4.1 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Claude Opus 4.1</p>
<p><strong>Status:</strong> <span class="failure">Failure</span></p>
<p><strong>Prompt Tokens:</strong> 22441</p>
<p><strong>Native Prompt Tokens:</strong> 28880</p>
<p><strong>Native Completion Tokens:</strong> 7800</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $1.0182</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_coders_patch_coder.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_coders_patch_coder.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_coders_patch_coder.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<pre class="diff"><div></div><div>index 1992834ec..ae4f0e21e 100644</div><div class="diff-header">--- a/aider_aider_coders_patch_coder.py_expectedoutput.txt (expected):tmp/tmp4qhok082_expected.txt </div><div class="diff-header">+++ b/aider_aider_coders_patch_coder.py_extracted.txt (actual):tmp/tmp06tdzz40_actual.txt </div><div class="diff-info">@@ -546,7 +546,7 @@ class PatchCoder(Coder):</div><div> action = PatchAction(type=ActionType.ADD, path="", new_content="\n".join(added_lines))</div><div> return action, index</div><div> </div><div class="diff-removed">- def apply_edits(self, edits: List[PatchAction]):</div><div class="diff-added">+ def apply_edits(self, edits: List[EditResult]):</div><div> """</div><div> Applies the parsed PatchActions to the corresponding files.</div><div> """</div><div></div></pre>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>

@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/coders/wholefile_coder.py - Claude Opus 4.1</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/coders/wholefile_coder.py</h1>
<h2>Model: Claude Opus 4.1</h2>
<p><a href="../../models/anthropic_claude-opus-4.1.html">All Claude Opus 4.1 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Claude Opus 4.1</p>
<p><strong>Status:</strong> <span class="success">Success</span></p>
<p><strong>Prompt Tokens:</strong> 20299</p>
<p><strong>Native Prompt Tokens:</strong> 26267</p>
<p><strong>Native Completion Tokens:</strong> 1389</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $0.49818</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_coders_wholefile_coder.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_coders_wholefile_coder.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_coders_wholefile_coder.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<div class="success-message"><p>✓ No differences found (successful run)</p><p>Expected output matches the model output exactly.</p></div>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>

@@ -0,0 +1,51 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Case: aider/exceptions.py - Claude Opus 4.1</title>
<link rel="stylesheet" href="../../styles.css">
</head>
<body>
<header>
<h1>Case: aider/exceptions.py</h1>
<h2>Model: Claude Opus 4.1</h2>
<p><a href="../../models/anthropic_claude-opus-4.1.html">All Claude Opus 4.1 Cases</a> | <a href="../../cases.html">All Cases</a> | <a href="../../index.html">Home</a></p>
</header>
<main>
<section class="case-details">
<div class="case-info">
<h2>Benchmark Case Information</h2>
<p><strong>Model:</strong> Claude Opus 4.1</p>
<p><strong>Status:</strong> <span class="success">Success</span></p>
<p><strong>Prompt Tokens:</strong> 7265</p>
<p><strong>Native Prompt Tokens:</strong> 8871</p>
<p><strong>Native Completion Tokens:</strong> 1313</p>
<p><strong>Native Tokens Reasoning:</strong> 0</p>
<p><strong>Native Finish Reason:</strong> stop</p>
<p><strong>Cost:</strong> $0.23154</p>
</div>

<div class="content-links">
<h2>View Content</h2>
<ul>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_exceptions.py/prompt.html" class="content-link">View Prompt</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_exceptions.py/expected.html" class="content-link">View Expected Output</a></li>
<li><a href="../../content/anthropic_claude-opus-4.1/aider_aider_exceptions.py/actual.html" class="content-link">View Actual Output</a></li>
</ul>
</div>

<div class="diff-section">
<h2>Diff (Expected vs Actual)</h2>
<div id="diff-output">
<div class="success-message"><p>✓ No differences found (successful run)</p><p>Expected output matches the model output exactly.</p></div>
</div>
</div>
</section>
</main>
<footer>
<p>LoCoDiff-bench - <a href="https://github.com/AbanteAI/LoCoDiff-bench">GitHub Repository</a></p>
</footer>
</body>
</html>
