feat(adr-102): add kg admin verify-backup - validate a backup without restoring #493
Merged
Exposes the offline backup-object oracle as a first-class CLI command, backed by a
new server-side endpoint — single source of truth, no cross-language drift.
Architecture (per maintainer decision): the validation rules live once, in Python
(scripts/development/lint/lint_backup.py). The CLI does ZERO validation logic; it
uploads the file and renders the server's report.
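As a rough illustration of what "render only" means on the client side, here is a minimal Python sketch (the actual CLI is not written in Python and is not shown in this PR text). It assumes the report shape listed in the pieces of this commit, `{ok, format_version, errors, warnings, notices, issues}`. The per-issue fields `severity`, `code`, `path`, and `message`, the sample JSON path, and the "Invalid backup" wording are assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple


def render_report(report: Dict[str, Any]) -> Tuple[List[str], int]:
    """Turn a server report into display lines and a process exit code.

    No validation happens here: the verdict is read from the report,
    never recomputed on the client.
    """
    lines: List[str] = []
    for issue in report.get("issues", []):
        # Field names below are assumed, not taken from the real report.
        lines.append(
            f'{issue["severity"].upper():7} {issue["code"]} '
            f'at {issue["path"]}: {issue["message"]}'
        )
    ok = bool(report.get("ok"))
    lines.append("Valid backup" if ok else "Invalid backup")
    return lines, 0 if ok else 1


# Hypothetical report, shaped like the one described in this commit.
sample_report = {
    "ok": False,
    "format_version": "kg-backup/2",
    "errors": 1,
    "warnings": 0,
    "notices": 0,
    "issues": [
        {
            "severity": "error",
            "code": "E_CONCEPT_EMBEDDING_DIM",
            "path": "$.concepts[0].embedding",  # illustrative location
            "message": "embedding dimension mismatch",
        }
    ],
}

rendered_lines, exit_code = render_report(sample_report)
```

Because the verdict comes straight from `ok`, a rule change in the Python oracle needs no matching change in the client.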
- api/Dockerfile + Dockerfile.rocm-host: COPY scripts/development/lint/ into the API
image. The oracle was only present in dev via the repo mount; this ships it so the
endpoint works in production too. Oracle stays stdlib-only / standalone.
- api/app/lib/backup_oracle.py (new): thin adapter that loads lint_backup by path
(importlib spec_from_file_location, cached) and exposes validate_backup_object()
returning a JSON report {ok, format_version, errors, warnings, notices, issues}.
- POST /admin/backup/verify (new route): accepts a .tar.gz or .json upload (same
containers as /restore), runs the oracle + best-effort de-interned statistics,
returns the report. Read-only — no graph access, nothing queued. Gated by
backups:read (admin-default; grant to another role to delegate verification).
- CLI: client.verifyBackup() + `kg admin verify-backup [file]` (positional path,
--file from the backup dir, or interactive pick). Renders errors/warnings/notices
with codes + JSON-path locations, record counts, and a pass/fail verdict; exits
nonzero on errors.
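The adapter bullet above describes loading the oracle by file path with `importlib` and caching it. A minimal sketch of that pattern follows; it is not the code in `api/app/lib/backup_oracle.py`. The oracle's real entry point is not shown in this PR, so the stub module, its `lint()` function, the `format` key, the `E_DEMO_FORMAT` code, and the assumption that `errors`/`warnings`/`notices` are counts are all hypothetical.

```python
import importlib.util
import tempfile
from functools import lru_cache
from pathlib import Path
from types import ModuleType
from typing import Any, Dict

# Hypothetical stand-in for lint_backup.py, so the sketch runs on its own.
STUB_ORACLE = '''
def lint(obj):
    issues = []
    if obj.get("format") != "kg-backup/2":
        issues.append({"severity": "error", "code": "E_DEMO_FORMAT",
                       "path": "$.format", "message": "unsupported format"})
    return issues
'''


@lru_cache(maxsize=None)
def load_oracle(path: str) -> ModuleType:
    """Import a standalone script by file path, once per path."""
    spec = importlib.util.spec_from_file_location("lint_backup", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load oracle from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def validate_backup_object(obj: Dict[str, Any], oracle_path: str) -> Dict[str, Any]:
    """Run the oracle and wrap its findings in a JSON-friendly report."""
    issues = load_oracle(oracle_path).lint(obj)  # lint() is an assumed name

    def count(severity: str) -> int:
        return sum(1 for issue in issues if issue["severity"] == severity)

    errors = count("error")
    return {
        "ok": errors == 0,
        "format_version": obj.get("format"),
        "errors": errors,
        "warnings": count("warning"),
        "notices": count("notice"),
        "issues": issues,
    }


with tempfile.TemporaryDirectory() as tmp:
    stub_path = Path(tmp) / "lint_backup.py"
    stub_path.write_text(STUB_ORACLE)
    good = validate_backup_object({"format": "kg-backup/2"}, str(stub_path))
    bad = validate_backup_object({"format": "kg-backup/1"}, str(stub_path))
    # The second load hits the cache and returns the same module object.
    cached = load_oracle(str(stub_path)) is load_oracle(str(stub_path))
```

Loading by path keeps the oracle a plain standalone script: it does not need to be an installable package, which fits the "stdlib-only / standalone" constraint stated above.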
Why a sibling command (verify-backup) rather than `backup verify`: backup/restore/
backups are flat siblings under admin; converting backup into a group risked the
existing leaf command, so verify-backup is a clean, discoverable sibling.
Tests: tests/api/test_backup_verify.py (5) — valid ok, dimension-mismatch surfaced,
bad extension 400, invalid JSON 400, legacy format refused. Live end-to-end verified
against a real 16 MB archive (3994 concepts) → "Valid backup".
NOTE: the API image must be rebuilt/republished for verify to work in production
(the COPY line is new). Do not publish without maintainer approval.
- Clean up the saved .tar.gz in the finally block: extraction failures (corrupt gzip / missing manifest) previously leaked the uploaded archive in /tmp, exactly the malformed-archive case verify exists to catch. (The same pre-existing pattern in /restore is left for a separate fix.)
- Guard file.filename None -> treat as bad extension (400) instead of AttributeError 500.
- Set external_deps=0 in the stats-fallback branch for response-shape symmetry.
- Dockerfiles: COPY just lint_backup.py (not the whole lint dir) to avoid shipping __pycache__ and unrelated lint scripts into the API image.
5/5 verify route tests still pass.
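The first two fixes in this follow-up commit (temp-file cleanup in `finally`, and the `None` filename guard) can be sketched in framework-free Python. This is an illustration of the pattern only, not the route's code: `BadUpload`, `saved_suffix`, `verify_upload`, and the `inspect` callback are hypothetical names, and the real handler's web-framework details are left out.

```python
import os
import tempfile
from typing import Callable, List, Optional

ALLOWED_SUFFIXES = (".tar.gz", ".json")


class BadUpload(ValueError):
    """Stands in for the HTTP 400 the route returns."""


def saved_suffix(filename: Optional[str]) -> str:
    """Return the accepted suffix, treating a missing name as a bad extension."""
    if not filename:
        raise BadUpload("upload has no filename")
    lowered = filename.lower()
    for suffix in ALLOWED_SUFFIXES:
        if lowered.endswith(suffix):
            return suffix
    raise BadUpload("expected a .tar.gz or .json upload")


def verify_upload(
    filename: Optional[str], data: bytes, inspect: Callable[[str], dict]
) -> dict:
    """Save the upload, inspect it, and always remove the saved copy."""
    suffix = saved_suffix(filename)
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return inspect(tmp_path)
    finally:
        # Runs on success and on failure, so a corrupt archive cannot leak.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


seen_paths: List[str] = []


def failing_inspect(path: str) -> dict:
    seen_paths.append(path)
    raise RuntimeError("corrupt gzip")


try:
    verify_upload("backup.tar.gz", b"not a real archive", failing_inspect)
except RuntimeError:
    pass
leaked = os.path.exists(seen_paths[0])

try:
    verify_upload(None, b"", failing_inspect)
    rejected = False
except BadUpload:
    rejected = True
```

The rejected upload never reaches `mkstemp`, so nothing is written to disk for a bad or missing filename.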
What
A first-class CLI command to validate a `kg-backup/2` file without restoring it, backed by a new server-side endpoint. Implements the maintainer-chosen architecture (Option A): the validation rules live once, in Python (`scripts/development/lint/lint_backup.py`); the CLI does zero validation logic: it uploads the file and renders the server's report. No cross-language drift.
Pieces
- `api/Dockerfile` + `Dockerfile.rocm-host`: `COPY scripts/development/lint/` into the image. The oracle was only present in dev via the repo mount; this ships it so the endpoint works in production. Oracle stays stdlib-only / standalone.
- `api/app/lib/backup_oracle.py` (new): loads `lint_backup` by path (cached `importlib`), exposes `validate_backup_object()` → `{ok, format_version, errors, warnings, notices, issues}`.
- `POST /admin/backup/verify`: accepts `.tar.gz` / `.json` upload (same containers as `/restore`), runs the oracle + best-effort de-interned statistics, returns the report. Read-only: no graph access, nothing queued.
- `client.verifyBackup()` + `kg admin verify-backup [file]` (positional path / `--file` from backup dir / interactive pick). Renders issues with codes + JSON-path locations, record counts, pass/fail verdict; exits nonzero on errors.
Permissions
Gated by `backups:read`: admin-default (migration 028/037 grant it to `admin` + `platform_admin`), and an admin can grant `backups:read` to any other role to delegate verification. Verify is read-only and strictly less privileged than restore, so it intentionally does not reuse `backups:restore`. (If you'd prefer a distinct `backups:verify` action, that's a small migration; flagged, not done.)
Naming
`verify-backup` is a clean sibling of `backup` / `restore` / `backups` (which are flat under `admin`). Converting `backup` into a subcommand group would have risked the existing leaf command, so a sibling was the low-risk choice closest to the requested `kg admin backup verify`.
Tests
`tests/api/test_backup_verify.py` (5): valid → ok; dimension-mismatch surfaced (`E_CONCEPT_EMBEDDING_DIM`); bad extension → 400; invalid JSON → 400; legacy format refused (`E_LOWER_MAJOR`). Live end-to-end verified against a real 16 MB archive. CLI builds clean.
The `COPY scripts/development/lint/` line is new, so the API image must be rebuilt/republished for verify to work in production (`./publish.sh images`). Not published from here; needs maintainer approval.
Follow-on (parked)
"Pick one ontology to restore out of a full backup" — a filtered restore that builds on this inspect/verify flow. Deserves its own design/ADR.
🤖 Generated with Claude Code