Benchmark: LiveCodeBench — contamination-free code generation #29

@rajkumar42

Overview

Evaluate OpenSymbolicAI on LiveCodeBench, a continuously refreshed coding benchmark that sources problems from LeetCode, AtCoder, and Codeforces and limits contamination by tagging each problem with its release date.

Why this benchmark

  • Contamination-resistant: problems are refreshed regularly and dated, so evaluating only on problems released after a model's training cutoff limits any memorization advantage
  • Tests code generation, self-repair, code execution, and test output prediction
  • Well-known leaderboard tracked by Artificial Analysis
  • Would demonstrate that OpenSymbolicAI works on algorithmic problem-solving, not just tool orchestration
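
The contamination argument above depends on scoring only problems published after the model's training cutoff. A minimal sketch of that filter follows. The `Problem` record, its field names, and the sample entries are hypothetical and are not LiveCodeBench's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record: field names are illustrative only.
    problem_id: str
    source: str
    release_date: date


def filter_after_cutoff(problems, cutoff):
    """Keep only problems released strictly after the training cutoff."""
    return [p for p in problems if p.release_date > cutoff]


# Made-up sample entries, used only to exercise the filter.
problems = [
    Problem("a1", "LeetCode", date(2024, 1, 15)),
    Problem("b2", "AtCoder", date(2024, 6, 2)),
    Problem("c3", "Codeforces", date(2024, 9, 20)),
]

fresh = filter_after_cutoff(problems, cutoff=date(2024, 5, 1))
print([p.problem_id for p in fresh])  # ['b2', 'c3']
```

The cutoff date would have to come from the documentation of whichever model backs OpenSymbolicAI in a given run. It is a parameter here rather than a constant for that reason.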

References

Tasks

  • Review LiveCodeBench evaluation format and API
  • Design primitives for code generation and testing
  • Implement benchmark harness
  • Run evaluation and collect results
  • Document findings
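
As a starting point for the harness task, here is a rough sketch of the core loop for the code generation scenario: run a candidate program against stdin/stdout test cases and report pass@1. It is not LiveCodeBench's official runner. `IOCase`, `run_candidate`, `passes_all`, and `pass_at_1` are hypothetical names, the sample programs are made up, and a plain subprocess is not a sandbox, so a real harness would need proper isolation before running generated code.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class IOCase:
    # Hypothetical test-case record: stdin fed to the program, stdout expected back.
    stdin: str
    expected_stdout: str


def run_candidate(code: str, case: IOCase, timeout_s: float = 5.0) -> bool:
    """Run candidate code in a fresh interpreter and compare trimmed stdout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=case.stdin,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # treat a timeout as a failed test
    return proc.returncode == 0 and proc.stdout.strip() == case.expected_stdout.strip()


def passes_all(code: str, cases) -> bool:
    """A solution counts as correct only if every test case passes."""
    return all(run_candidate(code, c) for c in cases)


def pass_at_1(results) -> float:
    """Fraction of problems whose single sampled solution passed every test."""
    return sum(results) / len(results) if results else 0.0


# Toy problem: read two integers, print their sum.
cases = [IOCase("2 3\n", "5\n"), IOCase("10 -4\n", "6\n")]
good = "a, b = map(int, input().split())\nprint(a + b)"
bad = "a, b = map(int, input().split())\nprint(a - b)"

results = [passes_all(good, cases), passes_all(bad, cases)]
print(pass_at_1(results))  # 0.5
```

In this sketch the strings `good` and `bad` stand in for whatever OpenSymbolicAI generates for each problem. The self-repair, code execution, and test output prediction scenarios would each need their own scoring function.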
