feat(ci): supply-chain & security hardening (#443, #689, #690, #691, #692, #468, #552) #718
…692, #468, #552)

Coordinated security-posture pass under the supply-chain hardening umbrella (#443), delivered as one PR:

- CodeQL code scanning with the security-extended pack on PR/main/weekly (#689)
- pip-audit dependency scanning: gating on core deps, report-only dev extra (#689)
- OpenSSF Scorecard analysis + README badge; Best Practices badge tracked (#552)
- Dependabot weekly pip + github-actions updates, grouped (#443)
- Release-integrity verify job in publish.yml: tag<->version gate, pre-publish tests, twine check before upload (#468)
- Build-provenance attestations for released artifacts (#690)
- security-policy-check gate (scripts/check_security_policy.py) wired into make ci and ci.yml; refresh SECURITY.md supported series to 0.16.x (#691)
- Security tooling runbook docs/security_tooling.md: triage SLA, ownership, false-positive exception process (#692)
- check_readme_version.py gains --print-version for the release gate

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0195S6jDSNCgWjmmLXXiDRSH
Pull request overview
This PR implements coordinated CI supply-chain and security hardening for contextweaver by adding automated security scanning (CodeQL, OpenSSF Scorecard, pip-audit, Dependabot), strengthening release integrity checks, and introducing a gating drift guard to keep SECURITY.md aligned with the package version and valid repo links.
Changes:
- Added new security workflows: CodeQL scanning, OpenSSF Scorecard analysis (SARIF → code scanning), and pip-audit (gating core deps; report-only dev extra), plus Dependabot configuration.
- Hardened the release pipeline with a pre-publish verification job and build-provenance attestations.
- Added a gating `security-policy-check` (script + tests) and updated docs (SECURITY.md, runbook, mkdocs nav, agent/workflow docs, changelog) to reflect the new tooling.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `tests/test_check_security_policy.py` | Adds unit tests for the new SECURITY.md drift/link guard, including a live "repo is in sync" assertion. |
| `SECURITY.md` | Updates supported minor series to 0.16.x and documents automated security tooling + runbook link. |
| `scripts/check_security_policy.py` | Introduces the SECURITY.md supported-series drift check + relative link validation gate. |
| `scripts/check_readme_version.py` | Adds `--print-version` to expose the package version for release gating without re-parsing TOML in shell. |
| `README.md` | Adds the OpenSSF Scorecard badge. |
| `mkdocs.yml` | Adds the security tooling runbook to the docs nav. |
| `Makefile` | Adds `security-policy-check` target and wires it into `make ci`. |
| `docs/security_tooling.md` | New runbook documenting tooling, triage SLAs, and exception/suppression process. |
| `docs/agent-context/workflows.md` | Documents the new `make security-policy-check` gate as part of `make ci`. |
| `CHANGELOG.md` | Records the security hardening work under Unreleased. |
| `AGENTS.md` | Updates the documented `make ci` gate list to include `security-policy-check`. |
| `.github/workflows/publish.yml` | Adds a verify job (tag↔version, tests, twine check) and build-provenance attestation step. |
| `.github/workflows/pip-audit.yml` | New workflow running pip-audit with gating core deps and report-only dev extra. |
| `.github/workflows/ossf-scorecard.yml` | New Scorecard workflow publishing SARIF + results for the badge endpoint. |
| `.github/workflows/codeql.yml` | New CodeQL workflow using security-extended queries on PR/main/weekly schedule. |
| `.github/workflows/ci.yml` | Wires the new `scripts/check_security_policy.py` drift check into the gating CI workflow. |
| `.github/dependabot.yml` | New Dependabot config for weekly grouped pip updates and GitHub Actions updates. |
…eQL label, harden link check, regen llms

Review feedback on #718:

- publish.yml: pin release-path actions (checkout, setup-python, attest-build-provenance, pypa publish) to immutable commit SHAs with `# vX` comments; Dependabot github-actions keeps them current. Addresses the #468 SHA-pinning line-item for the high-trust release job.
- docs/security_tooling.md: correct the CodeQL row. Code-scanning alerts are advisory and do not fail the PR check by default, so CodeQL is not gating.
- check_security_policy.py: find_broken_links now rejects absolute paths and ../ traversal instead of letting an existing-but-non-repo-relative target pass; add tests.
- pip-audit.yml: the report-only dev-extra audit uses `|| true` so it reports a green (non-blocking) check while keeping findings in the log.
- Regenerate llms-full.txt for the new/changed security docs (drift gate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0195S6jDSNCgWjmmLXXiDRSH
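The `find_broken_links` hardening described in this commit can be sketched as follows. This is an illustration, not the PR's implementation: the signature, the inline-link regex, and resolving targets against the repo root (which coincides with SECURITY.md's directory) are all assumptions.

```python
import re
from pathlib import Path

# Inline Markdown links only: [text](target). Reference-style links are ignored.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(markdown: str, repo_root: Path) -> list[str]:
    """Return link targets that are absolute, escape the repo, or do not exist."""
    root = repo_root.resolve()
    broken: list[str] = []
    for target in LINK_RE.findall(markdown):
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue  # external URLs and in-page anchors are out of scope
        path_part = target.split("#", 1)[0]  # drop any "#section" fragment
        candidate = Path(path_part)
        # Reject up front instead of resolving: an absolute or ../ target could
        # point at a real file outside the repo and wrongly pass an exists() check.
        if path_part.startswith("/") or candidate.is_absolute() or ".." in candidate.parts:
            broken.append(target)
            continue
        if not (root / candidate).exists():
            broken.append(target)
    return broken
```

Checking path shape before existence is the point of the fix: `exists()` alone cannot tell a repo-relative file from one that merely happens to exist elsewhere on the runner.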
numpy>=2.5 (a transitive [dev] dep via chromadb/langgraph/crewai) ships .pyi
stubs using PEP 695 `type` statements. Under the project's mypy
`python_version = "3.10"` target these raise a hard syntax error
("Type statement is only supported in Python 3.12 and greater") that aborts
the whole `mypy src/ examples/ scripts/` run on the 3.12/3.13 cells — a
pre-existing dependency-drift break unrelated to contextweaver code.
Add a scoped override (`follow_imports = "skip"` + `follow_imports_for_stubs`)
so mypy treats numpy as `Any` without parsing its stubs. `follow_imports_for_stubs`
is the load-bearing setting — without it the skip does not apply to .pyi files
and the error persists. Validated against numpy 2.5.0 + mypy 2.1.0 on Python
3.12: the type gate goes green and numpy resolves to `Any` with no false errors.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0195S6jDSNCgWjmmLXXiDRSH
Benchmark delta (vs
| size | recall@k (head Δ vs base) | MRR (head Δ vs base) | p99 (ms) |
|---|---|---|---|
| 50 | ✅ 0.5649 (+0.0000) | ✅ 0.4978 (+0.0000) | |
| 83 | ✅ 0.3825 (+0.0000) | ✅ 0.3242 (+0.0000) | ✅ 0.840 (base 1.134) |
| 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 38.931 (base 41.711) |
Per-backend × per-size matrix
| backend | size | recall@k (Δ) | MRR (Δ) | p99 (ms) |
|---|---|---|---|---|
| bm25 | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3399 (+0.0000) | ✅ 6.557 (base 8.140) |
| bm25 | 500 | ✅ 0.2250 (+0.0000) | ✅ 0.2165 (+0.0000) | ✅ 29.691 (base 38.989) |
| bm25 | 1000 | ✅ 0.1575 (+0.0000) | ✅ 0.1525 (+0.0000) | ✅ 86.725 (base 111.716) |
| embedding_hashing | 100 | ✅ 0.5175 (+0.0000) | ✅ 0.4360 (+0.0000) | ✅ 8.890 (base 7.225) |
| embedding_hashing | 500 | ✅ 0.2700 (+0.0000) | ✅ 0.2674 (+0.0000) | ✅ 42.234 (base 44.182) |
| embedding_hashing | 1000 | ✅ 0.2000 (+0.0000) | ✅ 0.1931 (+0.0000) | ✅ 99.769 (base 98.277) |
| embedding_st | 100 | skipped (skipped: missing sentence-transformers) | — | — |
| embedding_st | 500 | skipped (skipped: missing sentence-transformers) | — | — |
| embedding_st | 1000 | skipped (skipped: missing sentence-transformers) | — | — |
| fuzzy | 100 | skipped (skipped: missing rapidfuzz) | — | — |
| fuzzy | 500 | skipped (skipped: missing rapidfuzz) | — | — |
| fuzzy | 1000 | skipped (skipped: missing rapidfuzz) | — | — |
| tfidf | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3220 (+0.0000) | ✅ 1.070 (base 1.102) |
| tfidf | 500 | ✅ 0.2325 (+0.0000) | ✅ 0.2314 (+0.0000) | ✅ 9.595 (base 11.492) |
| tfidf | 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 37.372 (base 50.755) |
Context pipeline (per scenario)
| scenario | tokens | dropped | dedup |
|---|---|---|---|
| large_catalog | 1480 (base 1514, Δ-34) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| long_conversation | 2500 (base 2548, Δ-48) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| mixed_payload | 488 (base 497, Δ-9) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| short_conversation | 487 (base 496, Δ-9) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| stress_conversation | 6590 (base 6651, Δ-61) | 11 (base 7, Δ+4) | 4 (base 4, Δ+0) |
| tiny_payload | 256 (base 267, Δ-11) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
Numbers come from make benchmark / make benchmark-matrix.
Latency is hardware-dependent — treat the markers as a rough guide.
See benchmarks/scorecard.md for the full picture.
… flag

The --print-version flag (added for the publish.yml release-integrity tag-gate, #468) had no test. A regression in its output, such as a trailing banner or an extra line, would silently break the `[ "$tag" != "v$version" ]` comparison and either block a valid release or pass a mistagged one.

Add a capsys test asserting `main(["--print-version"])` prints the bare pyproject version (exactly `<version>\n`) and returns 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HCfLwmKtfqovpiFDuRuKea
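A hedged sketch of why the bare output matters: a hypothetical `main()` with a `--print-version` flag that prints only the `[project] version` from `pyproject.toml`, plus a helper that mimics the shell tag gate. The PR's actual test uses pytest's `capsys`; this sketch uses `contextlib.redirect_stdout` so it runs without pytest, and the `pyproject` parameter and `tag_matches_version` helper are assumptions added for illustration.

```python
from __future__ import annotations

import argparse
from pathlib import Path

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # Python 3.10: the `tomli` backport has the same load() API
    import tomli as tomllib


def read_version(pyproject: Path) -> str:
    with pyproject.open("rb") as fh:
        return tomllib.load(fh)["project"]["version"]


def main(argv: list[str] | None = None, pyproject: Path = Path("pyproject.toml")) -> int:
    parser = argparse.ArgumentParser(description="README version drift check (sketch)")
    parser.add_argument("--print-version", action="store_true",
                        help="print the bare pyproject version and exit")
    args = parser.parse_args(argv)
    version = read_version(pyproject)
    if args.print_version:
        # Exactly "<version>\n": no banner, no extra lines, so shell can compare it.
        print(version)
        return 0
    # The real script's README drift check would run here; omitted in this sketch.
    return 0


def tag_matches_version(tag: str, printed: str) -> bool:
    # Mirrors `[ "$tag" != "v$version" ]` where version=$(... --print-version):
    # command substitution strips trailing newlines and nothing else.
    return tag == "v" + printed.rstrip("\n")
```

Because shell command substitution strips only trailing newlines, an extra banner line survives into `$version` and the comparison fails, which is the regression the new test guards against.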
Summary
Coordinated supply-chain & security CI hardening, delivered as one PR under the
supply-chain hardening umbrella #443. Closes its decomposed sub-issues and
adjacent supply-chain items: #689 (CodeQL + dependency scanning), #690
(release attestations), #691 (SECURITY.md alignment + ownership checks),
#692 (security-exception runbook), #468 (release-integrity gates), and
#552 (OpenSSF Scorecard + badge).
The repo had strong functional CI but no automated security tooling — no CodeQL,
Dependabot, dependency audit, OpenSSF Scorecard, or release attestations. (The
push that opened this branch surfaced the gap directly: GitHub reports 6 open
Dependabot advisories on `main`.)

Closes #443
Closes #689
Closes #690
Closes #691
Closes #692
Closes #468
Closes #552
Changes
- `.github/workflows/codeql.yml` (new): CodeQL `security-extended` analysis on PR, `main`, and weekly (#689).
- `.github/workflows/pip-audit.yml` (new): dependency vulnerability scan, gating on core deps and report-only for the dev extra (#689).
- `.github/workflows/ossf-scorecard.yml` (new): OpenSSF Scorecard analysis; SARIF → code scanning; `publish_results` for the README badge (#552).
- `.github/dependabot.yml` (new): weekly grouped `pip` + `github-actions` updates (#443).
- `.github/workflows/publish.yml` (edit): new `verify` job gates publish with a tag↔`pyproject` version check, pre-publish `pytest`, and `twine check`; `attestations: write` + `actions/attest-build-provenance` on built artifacts (#468, #690).
- `scripts/check_security_policy.py` + `tests/test_check_security_policy.py` (new): gating drift guard. SECURITY.md supported series must match `pyproject.toml`, and relative links must resolve. Wired into `make ci` and `ci.yml` (#691).
- `SECURITY.md` (edit): supported series `0.14.x` → `0.16.x`; new "Automated Security Tooling" section linking the runbook (#691).
- `docs/security_tooling.md` (new) + `mkdocs.yml` nav: triage SLA, ownership, and false-positive exception process (#692).
- `README.md`: OpenSSF Scorecard badge (#552).
- `scripts/check_readme_version.py`: `--print-version` flag (single source of truth for the release gate).
- `Makefile`, `AGENTS.md`, `docs/agent-context/workflows.md`, `CHANGELOG.md`: wire/record the new `security-policy-check` gate.

Why
Grounded in the triage of the open backlog: #689–#692 are formal sub-issues of the #443 umbrella (Parent: #443), and #468/#552 are line-items of #443's own proposed scope. They share one code area (`.github/` + `SECURITY.md`) and one implementation path, so a single focused PR is cleaner than seven.
How verified
Ran in an isolated venv (no `src/` changed, so the heavy example/demo legs are deferred to CI):

- `ruff format --check` on changed scripts/tests: 3 files already formatted
- `ruff check` on changed scripts/tests: All checks passed!
- `mypy scripts/check_security_policy.py scripts/check_readme_version.py`: Success: no issues found
- `pytest tests/test_check_security_policy.py tests/test_check_readme_version.py tests/test_check_doc_snippets.py tests/test_check_module_size.py`: 31 passed
- `make security-policy-check`: in sync (0.16.0); `make readme-version-check`: in sync (0.16.0)
- `0.14.x` makes `check_security_policy.py` exit 1 with the exact drift message; restoring exits 0.
- `python scripts/check_readme_version.py --print-version` → `0.16.0` (drives the publish tag-gate)
- `dependabot.yml` + `mkdocs.yml` parse via `yaml.safe_load`.

Checklist
- `tests/test_check_security_policy.py`)
- `make ci` passes locally: ran fmt + lint + type + targeted tests + both policy gates; the example/demo/full-matrix legs are deferred to CI (no `src/` changes)
- `CHANGELOG.md` updated under `## [Unreleased]`
- `src/` package surface changed (`scripts/` is not part of `api/public_api.txt`)
- `AGENTS.md`, `docs/agent-context/workflows.md` now list the new gate)

Notes for reviewers
- `github-actions` updates rather than hand-pinning every workflow to commit SHAs. Hand-pinning contradicts the repo's existing `@v4`/`@v5` tag idiom and would be a large, noisy, drift-prone diff. The new Dependabot `github-actions` ecosystem keeps tags current; OpenSSF Scorecard will still flag pinned-deps as an advisory, and this can be revisited. Documented in `docs/security_tooling.md`.
- `main` run).
- `codeql-action@v3`, `ossf/scorecard-action@v2`, `attest-build-provenance@v2`, `upload-artifact@v4`) match current majors and the repo's tag convention.
- `server.json` still reads `0.15.0` (separate, pre-existing release-metadata drift) and is left out of scope here; the new publish tag-gate covers the `pyproject` version, not `server.json`.

🤖 Generated with Claude Code
https://claude.ai/code/session_0195S6jDSNCgWjmmLXXiDRSH