agentic skills for academic demo by vedika-saravanan · Pull Request #617 · NVIDIA/cudaqx

vedika-saravanan · 2026-06-16T22:34:23Z

Description

Adds an academic CUDA-QX VQE/QAOA skill plus supporting evaluation tooling for comparing skill-guided answers against a no-skill baseline.

Changes

Add the cudaq-academic-vqe-qaoa skill with install, VQE, QAOA, and metrics references
Add prompt and assertion files for the academic VQE/QAOA comparison workflow
Add evaluate_metrics.py for deterministic assertion-based scoring
Add compare_runs.py for initializing paired run files and comparing with-skill vs without-skill results
Add record_responses.py to simplify capturing multiline baseline responses
Add BASELINE_COLLECTION.md with clean baseline collection instructions
Document that generated run files under runs/ are local validation artifacts and are gitignored
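For reviewers unfamiliar with the approach, here is a minimal sketch of deterministic assertion-based scoring. It assumes each assertion is either a required phrase or a forbidden phrase, matched by case-insensitive substring search. The Assertion type, its fields, score_response, and the sample assertions are hypothetical illustrations. They are not the actual schema or API of evaluate_metrics.py or the PR's assertion files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """Hypothetical assertion record; not the PR's actual file schema."""
    name: str
    text: str
    forbidden: bool = False  # True means the phrase must NOT appear


def score_response(response: str, assertions: list[Assertion]) -> dict:
    """Score one response; identical inputs always give identical output."""
    haystack = response.lower()
    required = [a for a in assertions if not a.forbidden]
    forbidden = [a for a in assertions if a.forbidden]

    # Case-insensitive substring matching keeps scoring deterministic.
    covered = [a.name for a in required if a.text.lower() in haystack]
    forbidden_hits = [a.name for a in forbidden if a.text.lower() in haystack]

    return {
        "covered": len(covered),
        "required": len(required),
        "forbidden_hits": forbidden_hits,
        # Pass only if every required phrase appears and no forbidden one does.
        "passed": len(covered) == len(required) and not forbidden_hits,
    }


if __name__ == "__main__":
    # Example assertions invented for illustration only.
    assertions = [
        Assertion("uses-kernel", "cudaq.kernel"),
        Assertion("uses-observe", "cudaq.observe"),
        Assertion("no-placeholder", "todo", forbidden=True),
    ]
    answer = "Define the ansatz with @cudaq.kernel and evaluate it with cudaq.observe."
    result = score_response(answer, assertions)
    print(f"{result['covered']}/{result['required']} coverage, "
          f"{len(result['forbidden_hits'])} forbidden hits, passed={result['passed']}")
    # prints: 2/2 coverage, 0 forbidden hits, passed=True
```

Scoring is pure string matching, so rerunning it on the same response always gives the same result. This is what makes with-skill and no-skill runs comparable. The trade-off is that substring checks can miss correct answers that are phrased differently.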

Validation

Verified eval Python scripts compile
Verified eval JSON files parse
Ran compare_runs.py init in a temporary directory
Ran compare_runs.py compare against the initialized temp run files
Locally validated a populated with-skill run during development with 100% pass rate, 21/21 coverage, and 0 forbidden hits
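The init/compare round trip used in the validation steps above can be sketched as follows. The sketch assumes each run file holds a label and a list of per-prompt results keyed by prompt_id. The file names, JSON fields, prompt IDs, and the init_runs/summarize/compare functions are hypothetical. They do not reflect the actual interface of compare_runs.py.

```python
import json
import tempfile
from pathlib import Path


def init_runs(run_dir: Path, prompt_ids: list[str]) -> list[Path]:
    """Create paired, unscored run files that share the same prompt IDs."""
    run_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for label in ("with_skill", "without_skill"):  # hypothetical file names
        path = run_dir / f"{label}.json"
        entries = [{"prompt_id": pid, "response": ""} for pid in prompt_ids]
        path.write_text(json.dumps({"label": label, "results": entries}, indent=2))
        paths.append(path)
    return paths


def summarize(path: Path) -> dict:
    """Aggregate pass rate, coverage, and forbidden hits for one run file."""
    results = json.loads(path.read_text())["results"]
    passed = sum(1 for r in results if r.get("passed", False))
    covered = sum(r.get("covered", 0) for r in results)
    required = sum(r.get("required", 0) for r in results)
    return {
        "pass_rate": passed / len(results) if results else 0.0,
        "coverage": f"{covered}/{required}",
        "forbidden_hits": sum(r.get("forbidden_hits", 0) for r in results),
    }


def compare(with_path: Path, without_path: Path) -> dict:
    """Compare a with-skill run against its paired no-skill baseline."""
    ids = [
        [r["prompt_id"] for r in json.loads(p.read_text())["results"]]
        for p in (with_path, without_path)
    ]
    # Refuse to compare runs that were not answered on the same prompts.
    if ids[0] != ids[1]:
        raise ValueError("run files are not paired: prompt IDs differ")
    with_skill, baseline = summarize(with_path), summarize(without_path)
    return {
        "with_skill": with_skill,
        "without_skill": baseline,
        "pass_rate_delta": with_skill["pass_rate"] - baseline["pass_rate"],
    }


if __name__ == "__main__":
    # Mirror the validation steps: init in a temp directory, then compare.
    with tempfile.TemporaryDirectory() as tmp:
        with_path, without_path = init_runs(Path(tmp) / "runs", ["vqe-h2", "qaoa-maxcut"])
        print(compare(with_path, without_path))
```

Checking prompt IDs before comparing keeps the two runs paired prompt by prompt, so a pass-rate delta reflects the skill rather than a different prompt set.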

Notes

Generated run files under .agents/evals/academic-vqe-qaoa/runs/*.json (codex_with_skill.json and codex_without_skill.json) are not included in this PR because the runs/ directory is gitignored. They should be treated as local validation artifacts unless explicitly force-added for a validation snapshot.

Token counts should be treated as approximate unless exact model usage metadata is present.
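One way to keep that distinction explicit is to record whether each count is exact. The sketch below assumes usage metadata arrives as a dict with an integer total_tokens field. This is an assumption, since field names vary by provider. Without that field, the sketch falls back to a rough ~4 characters-per-token heuristic, which is only an approximation for English text. The token_count helper is hypothetical and not part of this PR.

```python
from typing import Optional


def token_count(text: str, usage: Optional[dict] = None) -> tuple[int, bool]:
    """Return (token_count, is_exact) for one response.

    Hypothetical helper: uses exact usage metadata when it is present,
    otherwise estimates with a rough ~4 characters-per-token heuristic.
    """
    # Assumed field name; real usage metadata may be shaped differently.
    if usage and isinstance(usage.get("total_tokens"), int):
        return usage["total_tokens"], True
    # Ceiling division so any non-empty text counts as at least one token.
    return (len(text) + 3) // 4, False


print(token_count("Run VQE on H2 with cudaq.observe."))   # (9, False): approximate
print(token_count("ignored", usage={"total_tokens": 42}))  # (42, True): exact
```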

compare_runs.py, record_responses.py, and BASELINE_COLLECTION.md are currently included as eval workflow tooling. Since this PR should contain only the skill and not the eval collection workflow, those helper files should be removed.

Runtime / performance impact

Self-review checklist

Please confirm each item before requesting review. Check [x] or strike through and explain.

Before requesting review

I reviewed my own full diff in GitHub or my editor.
PR is in Draft if it is not yet ready for review.
Temporary / debugging changes have been removed.
Local test logs reviewed; no unexplained warnings or errors.
CI logs reviewed; no unexplained warnings or errors.
Full CI has been run.

Scope and size

PR is under ~1000 lines, or an exception is justified in the description.
Refactoring-only changes are isolated in their own PR(s).
No existing tests were disabled or modified just to make this PR pass
(if so, an issue has been raised).

Tests

New functionality has new tests.
Tests fail if the new functionality is broken (including crashes), not
just when it is missing.
Negative tests added where exceptions are expected.
Truth data added where simple EXPECT_* / assert checks are
insufficient for algorithmic correctness.
CI runtime impact considered; team notified if significant.

Documentation

Public-facing APIs have Doxygen docs.
User-visible behavior changes have public docs, or a follow-up is
tracked.

Code style

Naming follows the existing convention (snake_case vs camelCase) for
the area being modified.

Dependencies

No new third-party dependencies, or the team has been notified and
OSRB tickets filed.

Signed-off-by: vedika-saravanan <vsaravanan@nvidia.com>

vedika-saravanan force-pushed the agentic-academia branch 3 times, most recently from dedb2d2 to 94ab1ab (June 17, 2026 00:48)

agentic skills for academic demo

ddd020d

Signed-off-by: vedika-saravanan <vsaravanan@nvidia.com>

vedika-saravanan force-pushed the agentic-academia branch from 94ab1ab to ddd020d (June 17, 2026 01:13)

vedika-saravanan assigned vedika-saravanan and kvmto Jun 17, 2026

vedika-saravanan wants to merge 1 commit into NVIDIA:main from vedika-saravanan:agentic-academia
