EinsiaLab · ahydchh · Apr 20, 2026 · Apr 18, 2026
diff --git a/.codex/skills/frontier-contributor.md b/.codex/skills/frontier-contributor.md
@@ -12,4 +12,4 @@ When asked to contribute or update a benchmark:
 4. Run:
    - `python verification/evaluator.py scripts/init.py`
    - `python -m frontier_eval task=unified task.benchmark=<Domain>/<Task> algorithm=openevolve algorithm.iterations=0`
-5. Keep runtime overrides unchanged and avoid secrets or machine-local paths.
+5. Keep runtime overrides unchanged and avoid secrets or machine‑local paths.
diff --git a/.codex/skills/frontier-evaluator.md b/.codex/skills/frontier-evaluator.md
@@ -6,7 +6,7 @@ user_invocable: true
 
 When asked to run or debug evaluation:
 
-1. Read `frontier_eval/README.md` and benchmark README instructions first.
+1. Read `frontier_eval/README.md` and benchmark README first.
 2. Discover env docs with:
    - `python .claude/skills/scripts/discover_env_docs.py <Domain>`
    - `python .claude/skills/scripts/discover_env_docs.py <Domain>/<Task>`

diff --git a/.cursor/skills/frontier-contributor.md b/.cursor/skills/frontier-contributor.md
@@ -12,4 +12,4 @@ When asked to contribute or update a benchmark:
 4. Run:
    - `python verification/evaluator.py scripts/init.py`
    - `python -m frontier_eval task=unified task.benchmark=<Domain>/<Task> algorithm=openevolve algorithm.iterations=0`
-5. Keep runtime overrides unchanged and avoid secrets or machine-local paths.
+5. Keep runtime overrides unchanged and avoid secrets or machine‑local paths.
diff --git a/.gitignore b/.gitignore
@@ -31,9 +31,9 @@ SKILL_zh-CN.md
 # Local personal agent configuration
 /AGENT.md
 /AGENT_zh-CN.md
-/.codex/
-/.claude/
-/.cursor/
+# /.codex/
+# /.claude/
+# /.cursor/
 !/.claude/
 !/.claude/skills/
 !/.claude/skills/*.md
@@ -43,7 +43,12 @@ submission.json
 outputlog.txt
 frontier_eval/conf/batch/*
 !frontier_eval/conf/batch/example_matrix.yaml
-!frontier_eval/conf/batch/v1_cpu_openevolve_p8_i100_gemini-3.1-pro-preview.yaml
-!frontier_eval/conf/batch/v1_engdesign_openevolve_qwen3codernext_100.yaml
-!frontier_eval/conf/batch/v1_flashattention_openevolve_qwen3codernext_100.yaml
-!frontier_eval/conf/batch/v1_gpu_openevolve_qwen3codernext_100.yaml
+!frontier_eval/conf/batch/v1.yaml
+metrics.json
+artifacts.json
+debug-*.log
+
+# Task outputs and artifacts
+**/outputs/
+**/artifacts/
+**/last_eval.json
diff --git a/README.md b/README.md
@@ -22,11 +22,33 @@ Frontier-Eng evaluates agents on problems where genuine improvement requires int
 
 ## Getting Started
 
-```bash
-bash init.sh && conda activate frontier-eval-2
-```
+Setup is split between a small **driver** conda env and per-task **runtime** envs.
+
+- **Driver** (`frontier-eval-2`): from `init.sh`; schedules jobs only.
+- **Runtimes** (`frontier-v1-main`, `frontier-v1-kernel`, …): where benchmarks actually run. Install merged runtimes with `bash scripts/setup_v1_merged_task_envs.sh`.
+- Before long runs: `export PYTHONNOUSERSITE=1` so user-site packages do not leak into tasks.
+- Default task launch uses `task.runtime.use_conda_run=false` and `task.runtime.python_path=conda-env:<env_name>`.
+
+**Task-specific bits**
+
+- **DuckDB / EV2Gym**: need their local verifier deps (see each task dir).
+- **Optics**: extra requirements under `benchmarks/Optics/` (also reflected in merged configs).
+- **MolecularMechanics**: OpenFF stack (e.g. `openff-toolkit`); see task README.
+- **GPU kernel tasks** (FlashAttention, MLA, …): need `frontier-v1-kernel`.
+
+**External assets**
+
+- **`dc-rl`**: clone + patch; paths under `third_party/` and `benchmarks/SustainableDataCenterControl/.../sustaindc/`.
+- **PhySense**, **SustainDC**, **CarAerodynamicsSensing**: need downloaded models/data/checkpoints or they fail at runtime.
+
+**Known issues**
+
+- **ReactionOptimisation**: `frontier-v1-summit` pip resolution can fail; treat as env noise, not necessarily a bug in the harness.
+- **EngDesign**: Docker tasks need a working Docker setup; use local mode if you cannot access the socket.
+
+**LLM / API keys**: copy `.env.example` to `.env` and set at least **`OPENAI_API_KEY`** (and `OPENAI_API_BASE` / `OPENAI_MODEL` if you use a compatible gateway). Details: **[run.md](run.md)** · [中文 run_zh-CN.md](run_zh-CN.md).
 
-Per-task runs, batch matrices, and runtime overrides are in **[frontier_eval/README.md](frontier_eval/README.md)**.
+Per-task commands, batch matrices, and overrides: **[frontier_eval/README.md](frontier_eval/README.md)**. **v1 batch** wrapper and host notes: **[run.md](run.md)** · [中文 run_zh-CN.md](run_zh-CN.md) (`bash scripts/run_v1_batch.sh`).
 
 ## Leaderboard
 
@@ -51,9 +73,9 @@ The full task list by domain is in **[TASK_DETAILS.md](TASK_DETAILS.md)**.
 
 The best solutions produced by our agent runs (across experiments, algorithms, models, and tasks) are archived in **[baseline_archive/README.md](baseline_archive/README.md)**. These serve as reference baselines for the community.
 
-## Join the Community
+## Community
 
-Welcome to our developer community! Whether you want to discuss new engineering problem concepts, find task collaborators, or encounter technical issues, reach us via [Feishu](https://applink.feishu.cn/client/chat/chatter/add_by_link?link_token=21ak5858-60ba-44fd-9085-01f165c8771c) or [Discord](https://discord.gg/hxeVhZNN).
+[Feishu](https://applink.feishu.cn/client/chat/chatter/add_by_link?link_token=21ak5858-60ba-44fd-9085-01f165c8771c) · [Discord](https://discord.gg/hxeVhZNN)
 
 ## Contributing
 

diff --git a/README_zh-CN.md b/README_zh-CN.md
@@ -2,7 +2,7 @@
 
 [English](README.md) | 简体中文
 
-[![Homepage](https://img.shields.io/badge/主页-lab.einsia.ai-0969DA?style=flat-square&logo=homepage&logoColor=white)](https://lab.einsia.ai/frontier-eng/) [![arXiv](https://img.shields.io/badge/arXiv-2604.12290-b31b1b?style=flat-square&logo=arxiv&logoColor=white)](http://arxiv.org/abs/2604.12290) [![Feishu](https://img.shields.io/badge/飞书-加入讨论群-3370FF?style=flat-square)](https://applink.feishu.cn/client/chat/chatter/add_by_link?link_token=21ak5858-60ba-44fd-9085-01f165c8771c) [![Discord](https://img.shields.io/badge/Discord-加入-5865F2?style=flat-square&logo=discord&logoColor=white)](https://discord.gg/hxeVhZNN)
+[Homepage](https://lab.einsia.ai/frontier-eng/) [arXiv](http://arxiv.org/abs/2604.12290) [Feishu](https://applink.feishu.cn/client/chat/chatter/add_by_link?link_token=21ak5858-60ba-44fd-9085-01f165c8771c) [Discord](https://discord.gg/hxeVhZNN)
 
 ## News
 
@@ -20,29 +20,53 @@ Frontier-Eng formalizes this paradigm as **generative optimization**, and notes
 
 Frontier-Eng requires agents to tightly combine domain knowledge, constrained code synthesis, and iterative refinement under a read-only, tamper-proof verifier.
 
-## Getting Started
+## Quick Start
 
-```bash
-bash init.sh && conda activate frontier-eval-2
-```
+The environment has two layers: a **driver conda env used for scheduling** and **per-task runtimes**.
 
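-For running specific tasks, batches, environment overrides, and more, see **[frontier_eval/README_zh-CN.md](frontier_eval/README_zh-CN.md)**.
+- **Driver** (`frontier-eval-2`): created by `init.sh`; handles scheduling only.
+- **Runtime** (`frontier-v1-main`, `frontier-v1-kernel`, etc.): where tasks actually run. Merged install: `bash scripts/setup_v1_merged_task_envs.sh`.
+- Before long runs, we recommend `export PYTHONNOUSERSITE=1` so that packages from the local user-site directory do not leak into task processes.
+- By default, processes are launched with `task.runtime.use_conda_run=false` and `task.runtime.python_path=conda-env:<env_name>`.
+
+**Extra per-task setup**
+
+- **DuckDB / EV2Gym**: install the verifier dependencies listed in each task directory.
+- **Optics**: see the dependency notes under `benchmarks/Optics/`.
+- **MolecularMechanics**: OpenFF and related packages; see the task README.
+- **GPU kernel tasks** (FlashAttention, etc.): require `frontier-v1-kernel`; the main env alone is not enough.
+
+**External resources**
+
+- **dc-rl**: clone and patch as documented; paths are under `third_party/` and `benchmarks/SustainableDataCenterControl/.../sustaindc/`.
+- **PhySense, SustainDC, CarAerodynamicsSensing**: you must supply the data, models, or weights yourself; runs fail without them.
+
+**Known issues**
+
+- **ReactionOptimisation**: pip resolution on `frontier-v1-summit` may fail; treat it first as an environment or dependency problem.
+- **EngDesign**: depends on Docker; if you lack permission, switch to the local mode described in the docs.
+
+**LLM / API keys**: copy `.env.example` to `.env` and fill in at least **`OPENAI_API_KEY`**; when using a compatible gateway, set **`OPENAI_API_BASE`** and **`OPENAI_MODEL`** as needed. See **[run_zh-CN.md](run_zh-CN.md)** · [English run.md](run.md).
+
+Single tasks, batch matrices, and overrides: **[frontier_eval/README_zh-CN.md](frontier_eval/README_zh-CN.md)**. v1 batch script and host notes: **[run_zh-CN.md](run_zh-CN.md)** · [English run.md](run.md) (`bash scripts/run_v1_batch.sh`).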
 
 ## Leaderboard
 
 Full leaderboard: **[lab.einsia.ai/frontier-eng/leaderboard.html](https://lab.einsia.ai/frontier-eng/leaderboard.html)**. **Frontier Models**: average within-task rank (47 tasks; the table below is sorted by average rank, lowest first).
 
-| Rank | Model | Average Rank |
-| :--: | :--- | --: |
-| 1 | Claude Opus 4.6 | 3.18 |
-| 2 | GLM-5 | 4.02 |
-| 3 | DeepSeek V3.2 | 4.41 |
-| 4 | GPT-OSS-120B | 4.46 |
-| 5 | Gemini 3.1 Pro Preview | 5.34 |
-| 6 | Grok 4.20 | 5.60 |
-| 7 | SEED 2.0 Pro | 5.63 |
-| 8 | GPT-5.4 | 5.68 |
-| 9 | Qwen3 Coder Next | 6.68 |
+
+| Rank | Model                  | Average Rank |
+| --- | ---------------------- | ------------ |
+| 1   | Claude Opus 4.6        | 3.18         |
+| 2   | GLM-5                  | 4.02         |
+| 3   | DeepSeek V3.2          | 4.41         |
+| 4   | GPT-OSS-120B           | 4.46         |
+| 5   | Gemini 3.1 Pro Preview | 5.34         |
+| 6   | Grok 4.20              | 5.60         |
+| 7   | SEED 2.0 Pro           | 5.63         |
+| 8   | GPT-5.4                | 5.68         |
+| 9   | Qwen3 Coder Next       | 6.68         |
+
 
 ## Task Details
 
@@ -52,9 +76,9 @@ bash init.sh && conda activate frontier-eval-2
 
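 The best code produced across our experiment / algorithm / model / task combinations is archived in **[baseline_archive/README.md](baseline_archive/README.md)** and can serve as a reference baseline for the community.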
 
-## Join the Community
+## Community
 
-Welcome to our developer community! Whether you want to discuss new engineering problem ideas, find task collaborators, or get help with a technical issue, contact us directly via [Feishu](https://applink.feishu.cn/client/chat/chatter/add_by_link?link_token=21ak5858-60ba-44fd-9085-01f165c8771c) or [Discord](https://discord.gg/hxeVhZNN).
+[Feishu](https://applink.feishu.cn/client/chat/chatter/add_by_link?link_token=21ak5858-60ba-44fd-9085-01f165c8771c) · [Discord](https://discord.gg/hxeVhZNN)
 
 ## Contributing Guide