feat(ci): supply-chain & security hardening (#443, #689, #690, #691, #692, #468, #552) #718

Merged
dgenio merged 4 commits into main from claude/issue-triage-grouping-6pwo4k
Jun 22, 2026

@dgenio dgenio commented Jun 22, 2026

Owner

Summary

Coordinated supply-chain & security CI hardening, delivered as one PR under the
supply-chain hardening umbrella #443. Closes its decomposed sub-issues and
adjacent supply-chain items: #689 (CodeQL + dependency scanning), #690
(release attestations), #691 (SECURITY.md alignment + ownership checks),
#692 (security-exception runbook), #468 (release-integrity gates), and
#552 (OpenSSF Scorecard + badge).

The repo had strong functional CI but no automated security tooling — no CodeQL,
Dependabot, dependency audit, OpenSSF Scorecard, or release attestations. (The
push that opened this branch surfaced the gap directly: GitHub reports 6 open
Dependabot advisories on main.)

Closes #443
Closes #689
Closes #690
Closes #691
Closes #692
Closes #468
Closes #552

Changes

Why

Grounded in the triage of the open backlog: #689 to #692 are formal sub-issues of
the #443 umbrella (Parent: #443), and #468/#552 are line-items of #443's own
proposed scope. They share one code area (.github/ + SECURITY.md) and one
implementation path, so a single focused PR is cleaner than seven.

How verified

Ran in an isolated venv (no src/ changed, so the heavy example/demo legs are deferred to CI):

  • ruff format --check on changed scripts/tests — 3 files already formatted
  • ruff check on changed scripts/tests — All checks passed!
  • mypy scripts/check_security_policy.py scripts/check_readme_version.py → Success: no issues found
  • pytest tests/test_check_security_policy.py tests/test_check_readme_version.py tests/test_check_doc_snippets.py tests/test_check_module_size.py → 31 passed
  • make security-policy-check → in sync (0.16.0); make readme-version-check → in sync (0.16.0)
  • Fails-without-fix proof: reverting SECURITY.md to 0.14.x makes check_security_policy.py exit 1 with the exact drift message; restoring exits 0.
  • python scripts/check_readme_version.py --print-version → 0.16.0 (drives the publish tag-gate)
  • All 8 workflow YAMLs + dependabot.yml + mkdocs.yml parse via yaml.safe_load.
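
The fails-without-fix proof above can be illustrated with a minimal sketch of a supported-series drift check. This is not the code in scripts/check_security_policy.py: the function names, the `<major>.<minor>.x` pattern, and the message wording are assumptions made for illustration only.

```python
import re

# Assumed convention: a supported series is written as "<major>.<minor>.x".
SERIES_RE = re.compile(r"\b(\d+)\.(\d+)\.x\b")


def expected_series(version: str) -> str:
    """Map a package version such as '0.16.0' to its minor series '0.16.x'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}.x"


def check_drift(security_md: str, version: str) -> int:
    """Return 0 when the policy text names the current series, else 1."""
    wanted = expected_series(version)
    found = {f"{m.group(1)}.{m.group(2)}.x" for m in SERIES_RE.finditer(security_md)}
    if wanted in found:
        return 0
    # Hypothetical message; the real script's drift message may differ.
    print(f"SECURITY.md drift: expected {wanted}, found {sorted(found) or 'none'}")
    return 1
```

Under these assumptions, a policy text that still says `0.14.x` returns 1 against package version `0.16.0`, and restoring `0.16.x` returns 0, which mirrors the revert-and-restore proof in the list.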

Checklist

  • Tests added or updated for every new/changed public function (tests/test_check_security_policy.py)
  • [~] make ci passes locally — ran fmt + lint + type + targeted tests + both policy gates; the example/demo/full-matrix legs are deferred to CI (no src/ changes)
  • CHANGELOG.md updated under ## [Unreleased]
  • Docstrings added for all new public APIs (Google-style)
  • N/A — Public-API change? No src/ package surface changed (scripts/ is not part of api/public_api.txt)
  • Every modified module stays ≤ 300 lines (new script 151, test 84)
  • Related issues linked above
  • Agent-facing docs updated (AGENTS.md, docs/agent-context/workflows.md now list the new gate)

Notes for reviewers

  • Action SHA-pinning (one line-item of #468, "Add release-pipeline integrity gates: tag/version check, pre-publish tests, pinned actions, version-reference drift checks"): intentionally deferred to Dependabot's github-actions updates rather than hand-pinning every workflow to commit SHAs. Hand-pinning contradicts the repo's existing @v4/@v5 tag idiom and would be a large, noisy, drift-prone diff. The new Dependabot github-actions ecosystem keeps tags current; OpenSSF Scorecard will still flag pinned-deps as an advisory and can be revisited. Documented in docs/security_tooling.md.
  • OpenSSF Best Practices badge (#552, "Apply for the OpenSSF Best Practices badge and surface project-health signals") requires a manual application at bestpractices.dev; tracked as a step in the runbook. The automated Scorecard badge ships now (resolves after the first main run).
  • Action versions used (codeql-action@v3, ossf/scorecard-action@v2, attest-build-provenance@v2, upload-artifact@v4) match current majors and the repo's tag convention.
  • server.json still reads 0.15.0 (separate pre-existing release-metadata drift) — left out of scope here; the new publish tag-gate covers pyproject version, not server.json.

🤖 Generated with Claude Code

https://claude.ai/code/session_0195S6jDSNCgWjmmLXXiDRSH


…692, #468, #552)

Coordinated security-posture pass under the supply-chain hardening umbrella
(#443), delivered as one PR:

- CodeQL code scanning with the security-extended pack on PR/main/weekly (#689)
- pip-audit dependency scanning: gating on core deps, report-only dev extra (#689)
- OpenSSF Scorecard analysis + README badge; Best Practices badge tracked (#552)
- Dependabot weekly pip + github-actions updates, grouped (#443)
- Release-integrity verify job in publish.yml: tag<->version gate, pre-publish
  tests, twine check before upload (#468)
- Build-provenance attestations for released artifacts (#690)
- security-policy-check gate (scripts/check_security_policy.py) wired into
  make ci and ci.yml; refresh SECURITY.md supported series to 0.16.x (#691)
- Security tooling runbook docs/security_tooling.md: triage SLA, ownership,
  false-positive exception process (#692)
- check_readme_version.py gains --print-version for the release gate

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0195S6jDSNCgWjmmLXXiDRSH
Copilot AI review requested due to automatic review settings June 22, 2026 07:37
@github-advanced-security

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

Copilot AI left a comment

Pull request overview

This PR implements coordinated CI supply-chain and security hardening for contextweaver by adding automated security scanning (CodeQL, OpenSSF Scorecard, pip-audit, Dependabot), strengthening release integrity checks, and introducing a gating drift guard to keep SECURITY.md aligned with the package version and valid repo links.

Changes:

  • Added new security workflows: CodeQL scanning, OpenSSF Scorecard analysis (SARIF → code scanning), and pip-audit (gating core deps; report-only dev extra), plus Dependabot configuration.
  • Hardened the release pipeline with a pre-publish verification job and build-provenance attestations.
  • Added a gating security-policy-check (script + tests) and updated docs (SECURITY.md, runbook, mkdocs nav, agent/workflow docs, changelog) to reflect the new tooling.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test_check_security_policy.py | Adds unit tests for the new SECURITY.md drift/link guard, including a live “repo is in sync” assertion. |
| SECURITY.md | Updates supported minor series to 0.16.x and documents automated security tooling + runbook link. |
| scripts/check_security_policy.py | Introduces the SECURITY.md supported-series drift check + relative link validation gate. |
| scripts/check_readme_version.py | Adds --print-version to expose the package version for release gating without re-parsing TOML in shell. |
| README.md | Adds the OpenSSF Scorecard badge. |
| mkdocs.yml | Adds the security tooling runbook to the docs nav. |
| Makefile | Adds security-policy-check target and wires it into make ci. |
| docs/security_tooling.md | New runbook documenting tooling, triage SLAs, and exception/suppression process. |
| docs/agent-context/workflows.md | Documents the new make security-policy-check gate as part of make ci. |
| CHANGELOG.md | Records the security hardening work under Unreleased. |
| AGENTS.md | Updates the documented make ci gate list to include security-policy-check. |
| .github/workflows/publish.yml | Adds a verify job (tag↔version, tests, twine check) and build-provenance attestation step. |
| .github/workflows/pip-audit.yml | New workflow running pip-audit with gating core deps and report-only dev extra. |
| .github/workflows/ossf-scorecard.yml | New Scorecard workflow publishing SARIF + results for the badge endpoint. |
| .github/workflows/codeql.yml | New CodeQL workflow using security-extended queries on PR/main/weekly schedule. |
| .github/workflows/ci.yml | Wires the new scripts/check_security_policy.py drift check into the gating CI workflow. |
| .github/dependabot.yml | New Dependabot config for weekly grouped pip updates and GitHub Actions updates. |

Comment thread scripts/check_security_policy.py Outdated
Comment thread docs/security_tooling.md Outdated
Comment thread .github/workflows/publish.yml Outdated
claude added 2 commits June 22, 2026 07:45
…eQL label, harden link check, regen llms

Review feedback on #718:
- publish.yml: pin release-path actions (checkout, setup-python,
  attest-build-provenance, pypa publish) to immutable commit SHAs with `# vX`
  comments; Dependabot github-actions keeps them current. Addresses the #468
  SHA-pinning line-item for the high-trust release job.
- docs/security_tooling.md: correct the CodeQL row — code-scanning alerts are
  advisory and do not fail the PR check by default; it is not gating.
- check_security_policy.py: find_broken_links now rejects absolute paths and
  ../ traversal instead of letting an existing-but-non-repo-relative target
  pass; add tests.
- pip-audit.yml: report-only dev-extra audit uses `|| true` so it reports a
  green (non-blocking) check while keeping findings in the log.
- Regenerate llms-full.txt for the new/changed security docs (drift gate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0195S6jDSNCgWjmmLXXiDRSH
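
The link-check hardening described in the review-feedback commit above (reject absolute paths and `../` traversal, then check existence) could look roughly like the sketch below. Only the name `find_broken_links` comes from the commit message; its signature here, the `is_repo_relative` helper, and the use of `PurePosixPath` are assumptions, not the script's actual implementation.

```python
from pathlib import Path, PurePosixPath


def is_repo_relative(target: str) -> bool:
    """Accept only link targets that stay inside the repository root."""
    path = PurePosixPath(target)
    # Reject absolute paths and any '..' component (path traversal).
    return not path.is_absolute() and ".." not in path.parts


def find_broken_links(targets: list[str], repo_root: Path) -> list[str]:
    """Return targets that are not repo-relative or do not exist on disk.

    Hypothetical signature: the real function may take different arguments.
    """
    return [
        target
        for target in targets
        if not is_repo_relative(target) or not (repo_root / target).exists()
    ]
```

The ordering matters in this sketch: the repo-relative test runs before the existence test, so a target such as `/etc/hosts` is reported even though the file exists.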
numpy>=2.5 (a transitive [dev] dep via chromadb/langgraph/crewai) ships .pyi
stubs using PEP 695 `type` statements. Under the project's mypy
`python_version = "3.10"` target these raise a hard syntax error
("Type statement is only supported in Python 3.12 and greater") that aborts
the whole `mypy src/ examples/ scripts/` run on the 3.12/3.13 cells — a
pre-existing dependency-drift break unrelated to contextweaver code.

Add a scoped override (`follow_imports = "skip"` + `follow_imports_for_stubs`)
so mypy treats numpy as `Any` without parsing its stubs. `follow_imports_for_stubs`
is the load-bearing setting — without it the skip does not apply to .pyi files
and the error persists. Validated against numpy 2.5.0 + mypy 2.1.0 on Python
3.12: the type gate goes green and numpy resolves to `Any` with no false errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0195S6jDSNCgWjmmLXXiDRSH
@github-actions

github-actions Bot commented Jun 22, 2026

Benchmark delta (vs main)

Soft regression feedback only — this comment never blocks the PR.
Latency budget: ⚠️ when head > base × 1.3. Accuracy budget: ⚠️ when head < base - 1pp.
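
The two budgets above reduce to a pair of comparisons. The sketch below is an illustration, not the benchmark script itself: the function and constant names are hypothetical, and it assumes that "1pp" means 0.01 on the 0-to-1 scale used for recall@k and MRR.

```python
WARN = "⚠️"
OK = "✅"

LATENCY_FACTOR = 1.3   # warn when head p99 > base p99 * 1.3
ACCURACY_SLACK = 0.01  # assumed: 1 percentage point on a 0-1 metric scale


def latency_marker(head_ms: float, base_ms: float) -> str:
    """Flag a latency regression beyond the 1.3x budget."""
    return WARN if head_ms > base_ms * LATENCY_FACTOR else OK


def accuracy_marker(head: float, base: float) -> str:
    """Flag an accuracy drop of more than one percentage point."""
    return WARN if head < base - ACCURACY_SLACK else OK


# Size-50 routing row: 0.759 * 1.3 = 0.9867, and 1.046 exceeds it.
print(latency_marker(1.046, 0.759))
# embedding_hashing at size 100: 7.225 * 1.3 = 9.3925, and 8.890 stays under it.
print(latency_marker(8.890, 7.225))
# Unchanged recall stays within budget.
print(accuracy_marker(0.5649, 0.5649))
```

This matches the markers reported in the comment: the size-50 p99 is the only latency cell that crosses its budget, even though embedding_hashing at size 100 is also slower than base.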

Routing summary (single backend × catalog sizes)

| size | recall@k (head Δ vs base) | MRR (head Δ vs base) | p99 (ms) |
| --- | --- | --- | --- |
| 50 | ✅ 0.5649 (+0.0000) | ✅ 0.4978 (+0.0000) | ⚠️ 1.046 (base 0.759) |
| 83 | ✅ 0.3825 (+0.0000) | ✅ 0.3242 (+0.0000) | ✅ 0.840 (base 1.134) |
| 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 38.931 (base 41.711) |

Per-backend × per-size matrix

| backend | size | recall@k (Δ) | MRR (Δ) | p99 (ms) |
| --- | --- | --- | --- | --- |
| bm25 | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3399 (+0.0000) | ✅ 6.557 (base 8.140) |
| bm25 | 500 | ✅ 0.2250 (+0.0000) | ✅ 0.2165 (+0.0000) | ✅ 29.691 (base 38.989) |
| bm25 | 1000 | ✅ 0.1575 (+0.0000) | ✅ 0.1525 (+0.0000) | ✅ 86.725 (base 111.716) |
| embedding_hashing | 100 | ✅ 0.5175 (+0.0000) | ✅ 0.4360 (+0.0000) | ✅ 8.890 (base 7.225) |
| embedding_hashing | 500 | ✅ 0.2700 (+0.0000) | ✅ 0.2674 (+0.0000) | ✅ 42.234 (base 44.182) |
| embedding_hashing | 1000 | ✅ 0.2000 (+0.0000) | ✅ 0.1931 (+0.0000) | ✅ 99.769 (base 98.277) |
| embedding_st | 100 | skipped (skipped: missing sentence-transformers) | | |
| embedding_st | 500 | skipped (skipped: missing sentence-transformers) | | |
| embedding_st | 1000 | skipped (skipped: missing sentence-transformers) | | |
| fuzzy | 100 | skipped (skipped: missing rapidfuzz) | | |
| fuzzy | 500 | skipped (skipped: missing rapidfuzz) | | |
| fuzzy | 1000 | skipped (skipped: missing rapidfuzz) | | |
| tfidf | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3220 (+0.0000) | ✅ 1.070 (base 1.102) |
| tfidf | 500 | ✅ 0.2325 (+0.0000) | ✅ 0.2314 (+0.0000) | ✅ 9.595 (base 11.492) |
| tfidf | 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 37.372 (base 50.755) |

Context pipeline (per scenario)

| scenario | tokens | dropped | dedup |
| --- | --- | --- | --- |
| large_catalog | 1480 (base 1514, Δ-34) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| long_conversation | 2500 (base 2548, Δ-48) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| mixed_payload | 488 (base 497, Δ-9) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| short_conversation | 487 (base 496, Δ-9) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| stress_conversation | 6590 (base 6651, Δ-61) | 11 (base 7, Δ+4) | 4 (base 4, Δ+0) |
| tiny_payload | 256 (base 267, Δ-11) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |

Numbers come from make benchmark / make benchmark-matrix.
Latency is hardware-dependent — treat the markers as a rough guide.
See benchmarks/scorecard.md for the full picture.

… flag

The --print-version flag (added for the publish.yml release-integrity
tag-gate, #468) had no test. A regression in its output — a trailing
banner or extra line — would silently break the `[ "$tag" != "v$version" ]`
comparison and either block a valid release or pass a mistagged one.

Add a capsys test asserting `main(["--print-version"])` prints the bare
pyproject version (exactly `<version>\n`) and returns 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HCfLwmKtfqovpiFDuRuKea
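
To make the failure mode in the commit above concrete, here is a self-contained sketch of why the output of `--print-version` has to be exactly the bare version. The `main` below is a hypothetical stand-in, not the code in scripts/check_readme_version.py, and `tag_matches` is an assumed Python mirror of the shell comparison. The commit's actual test uses pytest's `capsys`; this sketch uses `contextlib.redirect_stdout` so it runs without pytest.

```python
import contextlib
import io


def main(argv: list[str], version: str = "0.16.0") -> int:
    """Hypothetical stand-in for the script's entry point."""
    if "--print-version" in argv:
        print(version)  # bare version only, nothing else on stdout
        return 0
    return 1


def tag_matches(tag: str, printed: str) -> bool:
    """Mirror the shell gate: the tag must equal 'v' + version."""
    # Shell command substitution strips trailing newlines, so do the same here.
    return tag == "v" + printed.rstrip("\n")


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exit_code = main(["--print-version"])

captured = buffer.getvalue()
print(repr(captured), exit_code)
print(tag_matches("v0.16.0", captured))
print(tag_matches("v0.16.0", "check_readme_version 1.0\n0.16.0\n"))
```

In this sketch an extra banner line makes `tag_matches` return False for a correctly tagged release, which is the "block a valid release" case the commit message describes.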
@dgenio dgenio merged commit d043fa0 into main Jun 22, 2026
13 checks passed
@dgenio dgenio deleted the claude/issue-triage-grouping-6pwo4k branch June 22, 2026 20:24