Harden Harbor iterative fallback for Frontier-CS 2.0 #123

Merged
joyemang33 merged 2 commits into main from vector-db-ann-task on May 29, 2026
@joyemang33
Contributor

Summary

This PR hardens Harborized Frontier-CS evaluation around iterative submissions and timeout fallback, especially for Frontier-CS 2.0 directory-style tasks such as Vector DB ANN.

Main changes:

  • Route Frontier-CS 2.0 final verification through the judge sidecar instead of importing evaluator code directly.
  • Preserve judge-owned iterative submission records via the judge sidecar and mirror them into verifier artifacts.
  • Fix best-submission fallback when an agent times out or when the final workspace scores worse than an earlier submission.
  • Normalize CLI accounting for iterative submission rewards so raw 0-100 scores are reported as 0-1 rewards where appropriate.
  • Clarify that algorithmic Harbor artifacts are rebuilt from judge-owned submissions.
  • Document Vector DB ANN’s resource budget and set it to 8 vCPUs / 16 GiB.
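The best-submission fallback and reward normalization described above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the names `Selection` and `select_reward` are hypothetical, and it assumes judge scores are always on a 0-100 scale and that a timeout is represented by a missing final score.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Selection:
    raw_score: float            # score on the judge's 0-100 scale (assumed)
    reward: float               # the same score normalized to 0-1
    used_best_submission: bool  # True when an earlier submission was chosen


def select_reward(
    iterative_scores: Sequence[float],
    final_score: Optional[float],
) -> Selection:
    """Pick the score to report for a trial.

    `final_score` is None when the agent timed out before the final
    workspace could be verified (an assumption of this sketch).
    """
    best_iterative = max(iterative_scores, default=None)

    # Nothing was ever scored: report zero.
    if final_score is None and best_iterative is None:
        return Selection(0.0, 0.0, False)

    # Fall back to the best earlier submission on timeout, or when the
    # final workspace scores worse than that submission.
    if final_score is None or (
        best_iterative is not None and best_iterative > final_score
    ):
        raw, used_best = best_iterative, True
    else:
        raw, used_best = final_score, False

    # Clamp to the assumed 0-100 range, then normalize to a 0-1 reward.
    raw = min(max(raw, 0.0), 100.0)
    return Selection(raw, raw / 100.0, used_best)


if __name__ == "__main__":
    # Timeout case: no final score, so the best iterative score is used.
    print(select_reward([12.5, 31.389732738629883], None))
    # Final workspace beats every earlier submission, so it is kept.
    print(select_reward([10.0], 40.0))
```

Returning the raw score, the normalized reward, and the fallback flag together keeps the 0-100 and 0-1 scales from being confused downstream, which is the kind of accounting mix-up the CLI change targets.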

Please read CONTRIBUTING.md before submitting.

Type of Change

  • New research problem
  • New algorithmic problem
  • New Frontier-CS 2.0 problem
  • Bug fix
  • Documentation update
  • Other: Harbor adapter and CLI evaluation infrastructure

Testing

  • git diff --check
  • python3 -m py_compile on changed Python files
  • Generated vector_db_ann Harbor task successfully
  • Ran ANN Harbor trial and verified timeout fallback:
    • best iterative score: 31.3897
    • used_best_submission=1
    • recall_at_10=1.0
    • final reward: 0.31389732738629883

Checklist

  • Code follows the project structure and conventions
  • Self-review completed
  • Documentation updated (if applicable)

CI Validation (for new problems)

N/A. This PR updates an existing Frontier-CS 2.0 task and Harbor evaluation infrastructure; it does not add a new problem.

@joyemang33 force-pushed the vector-db-ann-task branch from ccb567f to d7afa7e on May 29, 2026, 12:58
@joyemang33 marked this pull request as ready for review on May 29, 2026, 13:00
@joyemang33 merged commit 384947d into main on May 29, 2026
4 checks passed