agentic skills for academic demo #617

Draft
vedika-saravanan wants to merge 1 commit into NVIDIA:main from vedika-saravanan:agentic-academia

vedika-saravanan commented Jun 16, 2026

Collaborator

Description

Adds an academic CUDA-QX VQE/QAOA skill plus supporting evaluation tooling for comparing skill-guided answers against a no-skill baseline.

Changes

  • Add the cudaq-academic-vqe-qaoa skill with install, VQE, QAOA, and metrics references
  • Add prompt and assertion files for the academic VQE/QAOA comparison workflow
  • Add evaluate_metrics.py for deterministic assertion-based scoring
  • Add compare_runs.py for initializing paired run files and comparing with-skill vs without-skill results
  • Add record_responses.py to simplify capturing multiline baseline responses
  • Add BASELINE_COLLECTION.md with clean baseline collection instructions
  • Document that generated run files under runs/ are local validation artifacts and are gitignored
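A minimal sketch of what deterministic, assertion-based scoring and a with-skill vs without-skill comparison could look like. This is not the code in evaluate_metrics.py or compare_runs.py: the `Score` class, the function names, and the substring-matching rule are all assumptions made for illustration, and the real scripts may define pass rate, coverage, and forbidden hits differently.

```python
from dataclasses import dataclass


@dataclass
class Score:
    """Hypothetical result record for one scored response."""
    passed: int          # required phrases found in the response
    total: int           # required phrases checked
    forbidden_hits: int  # forbidden phrases found in the response

    @property
    def pass_rate(self) -> float:
        # Avoid dividing by zero when no required phrases are configured.
        return self.passed / self.total if self.total else 0.0


def score_response(response: str, required: list[str], forbidden: list[str]) -> Score:
    """Case-insensitive substring checks: the same input always yields the same score."""
    text = response.lower()
    passed = sum(1 for phrase in required if phrase.lower() in text)
    hits = sum(1 for phrase in forbidden if phrase.lower() in text)
    return Score(passed=passed, total=len(required), forbidden_hits=hits)


def compare(with_skill: Score, without_skill: Score) -> dict[str, float]:
    """Report with-skill minus without-skill for each metric."""
    return {
        "pass_rate_delta": with_skill.pass_rate - without_skill.pass_rate,
        "forbidden_hits_delta": with_skill.forbidden_hits - without_skill.forbidden_hits,
    }
```

With illustrative phrase lists such as `required=["cudaq", "expectation value"]` and `forbidden=["qiskit"]`, a response that mentions both required phrases and no forbidden one scores a pass rate of 1.0 with 0 forbidden hits. Because the checks involve no model calls or randomness, rerunning them on the same run file gives the same numbers.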

Validation

  • Verified eval Python scripts compile
  • Verified eval JSON files parse
  • Ran compare_runs.py init in a temporary directory
  • Ran compare_runs.py compare against the initialized temp run files
  • Locally validated a populated with-skill run during development with 100% pass rate, 21/21 coverage, and 0 forbidden hits
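The first two checks above can be reproduced with a short script. The sketch below is one possible way to do it, not the procedure actually used for this PR: `check_eval_dir` is a hypothetical helper, and the directory passed to it is whatever folder holds the eval scripts and JSON files.

```python
import json
from pathlib import Path


def check_eval_dir(root: Path) -> list[str]:
    """Return human-readable problems; an empty list means every file passed."""
    problems = []
    for path in sorted(root.rglob("*.py")):
        try:
            # The built-in compile() parses the source without executing it
            # and without writing .pyc files.
            compile(path.read_text(encoding="utf-8"), str(path), "exec")
        except SyntaxError as exc:
            problems.append(f"{path}: syntax error: {exc.msg}")
    for path in sorted(root.rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path}: invalid JSON: {exc.msg}")
    return problems
```

Calling `check_eval_dir(Path(".agents/evals/academic-vqe-qaoa"))` from the repository root would scan that folder recursively. A compile check only catches syntax errors; it does not show that the scripts behave correctly at runtime.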

Notes

Generated .agents/evals/academic-vqe-qaoa/runs/*.json files (codex_with_skill.json, codex_without_skill.json) are not included in this PR because the runs/ directory is gitignored. They should be treated as local validation artifacts unless explicitly force-added for a validation snapshot.

Token counts should be treated as approximate unless exact model usage metadata is present.
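One way to make that distinction explicit in a run record is to return the count together with a flag saying whether it is exact. The sketch below is illustrative only: the `usage`, `total_tokens`, and `response` keys are hypothetical and may not match the run file schema in this PR, and the four-characters-per-token fallback is a rough rule of thumb, not a tokenizer.

```python
def token_count(run: dict) -> tuple[int, bool]:
    """Return (token_count, is_exact) for one run record."""
    usage = run.get("usage")
    if isinstance(usage, dict) and isinstance(usage.get("total_tokens"), int):
        # Exact figure reported by the model provider.
        return usage["total_tokens"], True
    # Fallback estimate: assume roughly 4 characters per token.
    text = run.get("response", "")
    return len(text) // 4, False
```

Keeping the `is_exact` flag next to the number lets a comparison report mark estimated counts instead of mixing them silently with measured ones.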

compare_runs.py, record_responses.py, and BASELINE_COLLECTION.md are included as eval workflow tooling. If this PR should contain only the skill and not the eval collection workflow, those helper files should be removed.

Runtime / performance impact

Self-review checklist

Please confirm each item before requesting review. Check [x] or strike
through and explain.

Before requesting review

  • I reviewed my own full diff in GitHub or my editor.
  • PR is in Draft if it is not yet ready for review.
  • Temporary / debugging changes have been removed.
  • Local test logs reviewed; no unexplained warnings or errors.
  • CI logs reviewed; no unexplained warnings or errors.
  • Full CI has been run.

Scope and size

  • PR is under ~1000 lines, or an exception is justified in the description.
  • Refactoring-only changes are isolated in their own PR(s).
  • No existing tests were disabled or modified just to make this PR pass
    (if so, an issue has been raised).

Tests

  • New functionality has new tests.
  • Tests fail if the new functionality is broken (including crashes), not
    just when it is missing.
  • Negative tests added where exceptions are expected.
  • Truth data added where simple EXPECT_* / assert checks are
    insufficient for algorithmic correctness.
  • CI runtime impact considered; team notified if significant.

Documentation

  • Public-facing APIs have Doxygen docs.
  • User-visible behavior changes have public docs, or a follow-up is
    tracked.

Code style

  • Naming follows the existing convention (snake_case vs camelCase) for
    the area being modified.

Dependencies

  • No new third-party dependencies, or the team has been notified and
    OSRB tickets filed.

vedika-saravanan force-pushed the agentic-academia branch 3 times, most recently from dedb2d2 to 94ab1ab on June 17, 2026 00:48
Signed-off-by: vedika-saravanan <vsaravanan@nvidia.com>