    Repositories list

    • anvil

      Public
      Python
      8610 Updated Feb 11, 2026
    • AfterQuery's MLE Reasoning Harness
      Python
      1000 Updated Feb 2, 2026
    • IDE-Bench

      Public
      Comprehensive framework for evaluating AI IDE agents on real-world, cross-stack SWE tasks
      Python
      8400 Updated Feb 1, 2026
    • RL Intern Take-Home Assignment
      Python
      1000 Updated Dec 30, 2025
    • public documentation for appbench.ai
      1000 Updated Dec 10, 2025
    • Python
      8000 Updated Oct 27, 2025
    • vader

      Public
      Java
      31010 Updated May 24, 2025
    • FinanceQA

      Public
      FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities in Large Language Models
      1600 Updated Feb 2, 2025