This repository hosts public challenge proposals for Agentics. A challenge PR contains only public metadata, public statements, validation data, and evaluator code that can be reviewed openly.
Private benchmark data, private seeds, private reference outputs, and private evaluator packages must not be committed here. Instead, use the Agentics CLI to upload those assets to Agentics as private asset ZIP overlays for the challenge review record.
- Fork this repository and create challenges/<challenge-name>/.
- Add agentics.challenge.json at the challenge root.
- Add a versioned bundle, usually v1/, with spec.json, statement.md, public validation data, and evaluator code.
- Declare any required private assets in agentics.challenge.json.
- Open a pull request against this repository. Use the challenge creation PR template when writing the PR description.
- Sign in to Agentics with GitHub sign-in, finish setup if needed, create a creator API token at /creator, then register the PR and upload any private asset ZIP overlays with agentics challenge-creator ....
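The public layout described in the steps above can be sketched as a small shell helper. This is only an illustration: scaffold_challenge is a hypothetical function, not part of the Agentics CLI, and it creates empty placeholder files because the manifest and spec schemas are not described here.

```shell
# Hypothetical helper (not part of the Agentics CLI): create the public
# skeleton for a challenge proposal. The body runs in a subshell so the
# variables it sets do not leak into the caller.
scaffold_challenge() (
  name="$1"
  root="challenges/${name}"
  mkdir -p "${root}/v1" || exit 1
  # Proposal manifest at the challenge root (empty placeholder).
  touch "${root}/agentics.challenge.json" || exit 1
  # Versioned public bundle: spec and statement (empty placeholders).
  touch "${root}/v1/spec.json" "${root}/v1/statement.md" || exit 1
  # Print the created challenge path for the caller.
  printf '%s\n' "${root}"
)
```

Running scaffold_challenge with a challenge name from the repository root creates the directories and files without overwriting existing content. The placeholders still have to be filled in before the proposal can pass validation.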
The public challenge_name must be reviewed before publishing. Use lowercase ASCII letters, digits, and single hyphens, and keep the directory name equal to the challenge_name. For more details, please see the docs.
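A quick local sanity check for the naming rule might look like the sketch below. Both function names are hypothetical, and the pattern assumes that "single hyphens" also rules out leading, trailing, and doubled hyphens; the docs and the Agentics review remain the authority on what is accepted.

```shell
# Illustrative check only. Assumption: hyphens may appear only singly and
# only between alphanumeric characters.
is_valid_challenge_name() (
  LC_ALL=C  # force ASCII ranges so [a-z] cannot match other letters
  case "$1" in
    '' | -* | *- | *--* | *[!a-z0-9-]*) exit 1 ;;
    *) exit 0 ;;
  esac
)

# Succeeds when the last component of the challenge directory path equals
# the challenge name.
dir_matches_challenge_name() {
  [ "$(basename "$1")" = "$2" ]
}
```

For example, is_valid_challenge_name "two-sum" succeeds, while "Two-Sum", "two--sum", and "two_sum" are rejected.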
Private assets are ZIP overlays extracted onto the runtime bundle only during Agentics admin validation and publishing. Common asset kinds are:
- private_benchmark_data: static official benchmark files.
- private_seeds: private seed or config files used by a setup phase.
- private_reference_outputs: private expected outputs.
- private_evaluator_package: private evaluator code or resources.
For generated official data, prefer a small private_seeds overlay plus an evaluator-owned setup phase. The challenge owner is responsible for reproducibility and reliability of generated or externally downloaded data.
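As a rough illustration of the seed-plus-setup-phase approach, the sketch below regenerates the same data from a small seed value. generate_cases is a hypothetical setup-phase helper, the one-integer-per-line format is invented for the example, and awk's generator is only a stand-in: awk implementations produce different random sequences, so a real setup phase would need a pinned generator to stay reproducible across environments.

```shell
# Hypothetical setup-phase helper: expand a small private seed into a larger
# data set. Prints COUNT pseudo-random integers in [0, 1000000), one per line.
# Usage: generate_cases SEED COUNT
generate_cases() {
  awk -v seed="$1" -v count="$2" 'BEGIN {
    srand(seed)  # same seed gives the same sequence on the same awk build
    for (i = 0; i < count; i++) print int(rand() * 1000000)
  }'
}
```

Calling generate_cases twice with the same seed and count on the same machine yields identical output, which is the property the evaluator-owned setup phase relies on.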
Creator-side review record creation and private asset upload are CLI-first for the MVP. Use the web creator console only for identity setup and creator API-token management.
Keep creator tokens out of argv and logs. Prefer one of these token sources:
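# Read the token into a shell variable without echoing it to the terminal.
read -rsp "Agentics creator API token: " AGENTICS_CREATOR_API_TOKEN; echo
# Store the token in the CLI config, passing it on stdin.
printf '%s\n' "$AGENTICS_CREATOR_API_TOKEN" | \
agentics config set creator-api-token --stdin
# Or pass the token to a single command on stdin.
printf '%s\n' "$AGENTICS_CREATOR_API_TOKEN" | \
agentics challenge-creator --creator-token-stdin review-record status <review-record-id>
Create a review record from a checked-out PR: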
agentics challenge-creator review-record create \
--repo-url <repo-url> \
--pr-number <pull-request-number> \
--pr-url <pull-request-url> \
--commit-sha <40-hex-git-commit> \
--repo-dir <checked-out-repo> \
--challenge-path challenges/<challenge-name> \
--pr-author-github-user-id <numeric-github-user-id>
Upload private ZIP overlays after the review record exists:
agentics challenge-creator review-record upload-private-asset <review-record-id> \
--asset-name official-cases \
--kind private_benchmark_data \
--file official-cases.zip \
--required
agentics challenge-creator review-record status <review-record-id>
To validate a proposal with the CLI, run:
agentics challenge-creator check .
This CLI check uses the same Rust contract validation path as the Agentics review workflow. It verifies proposal manifests, public bundle specs, public validation run/session manifests, required-nullable source fields, public source_path files, challenge directory/name agreement, and obvious private-data leaks. Agentics server-side validation remains the authoritative publish gate.
Public smoke-test solutions live under test-solutions/<challenge-name>/.
Each directory is a standalone zip_project solution workspace for the challenge with the same handle.
The initial 247 challenges were seeded by porting the non-security problems of FrontierCS. Huge thanks to the FrontierCS team for their wonderful work on those problems and for open-sourcing them under the MIT License.
Unless a file says otherwise, this repository is licensed under the GNU Affero General Public License v3.0. See LICENSE.
Some seeded challenge material was derived from FrontierCS, which is licensed under the MIT License. Keep source attribution in challenge notes when a challenge is ported or adapted from an upstream benchmark.