Agentics Challenges

This repository hosts public challenge proposals for Agentics. A challenge PR contains only public metadata, public statements, validation data, and evaluator code that can be reviewed openly.

Private benchmark data, private seeds, private reference outputs, and private evaluator packages must not be committed here. Instead, use the Agentics CLI to upload those assets to Agentics as private asset ZIP overlays attached to the challenge review record.

Add a Challenge

1. Fork this repository and create challenges/<challenge-name>/.
2. Add agentics.challenge.json at the challenge root.
3. Add a versioned bundle, usually v1/, with spec.json, statement.md, public validation data, and evaluator code.
4. Declare any required private assets in agentics.challenge.json.
5. Open a pull request against this repository. Use the challenge creation PR template when writing the PR description.
6. Sign in to Agentics with GitHub sign-in, finish setup if needed, create a creator API token at /creator, then register the PR and upload any private asset ZIP overlays with agentics challenge-creator ....
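
As a rough pre-PR sanity check, the layout described in the steps above can be sketched in Python. The helper name missing_public_files is hypothetical and not part of the Agentics CLI, and the file list covers only the names mentioned in this README; the real contract may require more.

```python
from pathlib import Path

# File names taken from the steps above; "v1" is only the usual bundle name.
REQUIRED_ROOT_FILES = ["agentics.challenge.json"]
REQUIRED_BUNDLE_FILES = ["spec.json", "statement.md"]


def missing_public_files(challenge_dir, bundle="v1"):
    """Return relative paths of expected public files that are absent.

    Hypothetical helper for illustration, not an Agentics API.
    """
    root = Path(challenge_dir)
    expected = [Path(name) for name in REQUIRED_ROOT_FILES]
    expected += [Path(bundle) / name for name in REQUIRED_BUNDLE_FILES]
    return [p.as_posix() for p in expected if not (root / p).is_file()]
```

Calling missing_public_files("challenges/<challenge-name>") returns an empty list when all three files exist. It does not replace agentics challenge-creator check, which remains the tool to trust.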

The public challenge_name must be reviewed before the challenge is published. Use only lowercase ASCII letters, digits, and single hyphens, and keep the directory name identical to the challenge_name. For more details, see the docs.
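
The naming rule can be approximated with a regular expression. This is a minimal sketch, assuming that hyphens may appear only one at a time and only between alphanumeric runs (so no leading, trailing, or doubled hyphens); the authoritative rule is whatever the Agentics validator enforces. Both function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: one or more [a-z0-9] runs joined by single hyphens.
CHALLENGE_NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_challenge_name(name):
    """True if name uses only lowercase ASCII letters, digits, single hyphens."""
    return CHALLENGE_NAME_RE.fullmatch(name) is not None


def directory_matches_name(challenge_dir, challenge_name):
    """True if the last path component equals the challenge name."""
    return Path(challenge_dir).name == challenge_name
```

For example, is_valid_challenge_name("graph-coloring-2") holds, while "Graph-Coloring", "a--b", and "a_b" are rejected under this reading of the rule.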

Private Assets

Private assets are ZIP overlays extracted onto the runtime bundle only during Agentics admin validation and publishing. Common asset kinds are:

private_benchmark_data: static official benchmark files.
private_seeds: private seed or config files used by a setup phase.
private_reference_outputs: private expected outputs.
private_evaluator_package: private evaluator code or resources.
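
To make the overlay idea concrete, here is a small sketch of packing a private directory into a ZIP and extracting it on top of a bundle directory. It only illustrates the general overlay technique with Python's zipfile module; how Agentics actually extracts overlays, and whether it replaces or rejects files that collide with public ones, is not specified here. Both function names are hypothetical.

```python
import zipfile
from pathlib import Path


def build_overlay(source_dir, zip_path):
    """Pack every file under source_dir into zip_path using relative paths.

    zip_path should live outside source_dir so the archive does not
    include itself.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(source).as_posix())


def apply_overlay(zip_path, bundle_dir):
    """Extract the overlay on top of bundle_dir.

    Files already in bundle_dir are kept unless the archive contains a
    file at the same relative path, in which case it is overwritten.
    """
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(bundle_dir)
```

Because paths inside the archive are relative, the directory structure of the overlay should mirror the bundle paths the evaluator expects to read.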

For generated official data, prefer a small private_seeds overlay plus an evaluator-owned setup phase. The challenge owner is responsible for reproducibility and reliability of generated or externally downloaded data.
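
One way to keep generated data reproducible is to derive every case from a seeded, local random generator. The sketch below is illustrative only: generate_cases, its parameters, and the case shape are hypothetical, and a real setup phase would follow whatever interface the challenge's evaluator defines.

```python
import random


def generate_cases(seed, count=3, size=5):
    """Return `count` lists of `size` integers derived only from `seed`."""
    # A local Random instance keeps generation independent of global state.
    rng = random.Random(seed)
    return [[rng.randint(1, 100) for _ in range(size)] for _ in range(count)]
```

The same seed yields the same cases on a given interpreter. Library sampling methods can change between language versions, so pinning the runtime used by the setup phase is part of keeping the data reproducible.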

Creator-side review record creation and private asset upload are CLI-first for the MVP. Use the web creator console only for identity setup and creator API-token management.

Keep creator tokens out of argv and logs. Prefer one of these token sources:

read -rsp "Agentics creator API token: " AGENTICS_CREATOR_API_TOKEN; echo

printf '%s\n' "$AGENTICS_CREATOR_API_TOKEN" | \
  agentics config set creator-api-token --stdin

printf '%s\n' "$AGENTICS_CREATOR_API_TOKEN" | \
  agentics challenge-creator --creator-token-stdin review-record status <review-record-id>
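
When the CLI is driven from a script, the same principle applies: pass the token on standard input rather than as an argument. A minimal sketch, assuming the --creator-token-stdin flag behaves as in the command above; the function names are hypothetical.

```python
import subprocess


def status_command(review_record_id):
    """Build the argv for the status call; the token is never part of it."""
    return [
        "agentics", "challenge-creator", "--creator-token-stdin",
        "review-record", "status", review_record_id,
    ]


def run_status(review_record_id, token):
    """Run the status command, feeding the token through stdin only."""
    return subprocess.run(
        status_command(review_record_id),
        input=token + "\n",   # mirrors printf '%s\n' "$TOKEN" | agentics ...
        text=True,
        capture_output=True,
        check=False,
    )
```

Keeping the token out of the argument list means it does not show up in process listings or in logs that echo the command line.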

Create a review record from a checked-out PR:

agentics challenge-creator review-record create \
  --repo-url <repo-url> \
  --pr-number <pull-request-number> \
  --pr-url <pull-request-url> \
  --commit-sha <40-hex-git-commit> \
  --repo-dir <checked-out-repo> \
  --challenge-path challenges/<challenge-name> \
  --pr-author-github-user-id <numeric-github-user-id>
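
The --commit-sha placeholder asks for a full 40-character hexadecimal commit ID, not an abbreviated one. The sketch below checks that shape and reads the current commit with git rev-parse HEAD. It assumes lowercase hex, as git prints it; whether the CLI also accepts uppercase is not stated. Both function names are hypothetical.

```python
import re
import subprocess

# Assumption: lowercase hex only, matching git's own output.
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def is_full_commit_sha(value):
    """True if value is exactly 40 lowercase hexadecimal characters."""
    return FULL_SHA_RE.fullmatch(value) is not None


def head_commit_sha(repo_dir):
    """Return the full commit ID of HEAD in repo_dir via git rev-parse."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        text=True, capture_output=True, check=True,
    )
    return result.stdout.strip()
```

A short ID such as the 7-character form shown in many Git UIs fails is_full_commit_sha, which is a cheap way to catch a copy-paste mistake before creating the review record.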

Upload private ZIP overlays after the review record exists:

agentics challenge-creator review-record upload-private-asset <review-record-id> \
  --asset-name official-cases \
  --kind private_benchmark_data \
  --file official-cases.zip \
  --required

agentics challenge-creator review-record status <review-record-id>

Local Validation

Run:

agentics challenge-creator check .

This CLI check uses the same Rust contract validation path as the Agentics review workflow. It verifies proposal manifests, public bundle specs, public validation run/session manifests, required-nullable source fields, public source_path files, challenge directory/name agreement, and obvious private-data leaks. Agentics server-side validation remains the authoritative publish gate.

Test Solutions

Public smoke-test solutions live under test-solutions/<challenge-name>/. Each directory is a standalone zip_project solution workspace for the challenge with the same handle.

Thanks

The initial 247 challenges were seeded by porting the non-security problems of FrontierCS. Huge thanks to the FrontierCS team for their wonderful work on those problems and for open-sourcing them under the MIT License.

License

Unless a file says otherwise, this repository is licensed under the GNU Affero General Public License v3.0. See LICENSE.

Some seeded challenge material was derived from FrontierCS, which is licensed under the MIT License. Keep source attribution in challenge notes when a challenge is ported or adapted from an upstream benchmark.
