Build an MVP local search and browse tool for python-github-backup output. The tool should help both humans and agents search issues, pull requests, discussions, releases, comments, reviews, and attachment metadata without requiring GitHub API access.
The MVP should be useful with this local backup:
C:\CodeBlocks\ggml-org-backup\backup
The project itself lives at:
C:\CodeBlocks\github-backup-browser
- No GitHub API calls.
- No authentication.
- No live sync.
- No import of binary attachment contents into SQLite.
- No OCR or indexing of images/files attached to issues/PRs/discussions.
- No source-code repository indexing.
- No multi-user permissions.
- No Go implementation.
- No raw JSON storage in the DB.
If a missing JSON field becomes important, re-run import after a schema change.
The tool should be easy for agents to call with minimal arguments.
Resolve the database path as:
- GHBB_DB, if the environment variable is set and non-empty.
- Otherwise ./ghbb.db, relative to the current working directory.
There should be no --db option in the MVP.
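A minimal sketch of this resolution rule, using only the standard library. The function name get_db_path comes from the task list in this plan; everything else is an assumption about how config.py might look.

```python
import os
from pathlib import Path


def get_db_path() -> Path:
    """Return GHBB_DB if it is set and non-empty, else ./ghbb.db."""
    override = os.environ.get("GHBB_DB", "")
    if override:
        return Path(override)
    # Fall back to a file in the current working directory.
    return Path.cwd() / "ghbb.db"
```

An empty GHBB_DB is treated the same as an unset one, which matches the "set and non-empty" wording above.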
ghbb import accepts one optional positional backup root:
ghbb import [BACKUP_ROOT]
If omitted, detect a backup root in this order:
- ./backup, if ./backup/repositories exists.
- . (the current directory), if ./repositories exists.
- Otherwise fail with a clear error message.
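The detection order above could be sketched as follows. The names resolve_backup_root and BackupRootError are hypothetical, and the cwd parameter exists only to make the function easy to test. An explicit BACKUP_ROOT is also checked for a repositories/ directory, as the import flow in this plan requires.

```python
from pathlib import Path
from typing import Optional


class BackupRootError(Exception):
    """Hypothetical error type for a missing or unusable backup root."""


def resolve_backup_root(arg: Optional[str] = None, cwd: Optional[Path] = None) -> Path:
    base = cwd if cwd is not None else Path.cwd()
    if arg:
        # An absolute arg replaces base; a relative one is resolved against it.
        root = base / arg
        if not (root / "repositories").is_dir():
            raise BackupRootError(f"{root} does not contain a repositories/ directory")
        return root
    if (base / "backup" / "repositories").is_dir():
        return base / "backup"
    if (base / "repositories").is_dir():
        return base
    raise BackupRootError(
        "No backup root found: expected ./backup/repositories or ./repositories. "
        "Pass BACKUP_ROOT explicitly."
    )
```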
There should be no --org option. Derive repository owner and repo name from JSON URLs. See importer.md.
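As a rough illustration of URL-based derivation, the sketch below handles two common GitHub URL shapes: web URLs (github.com/OWNER/REPO/...) and REST API URLs (api.github.com/repos/OWNER/REPO/...). Which JSON fields carry these URLs, and any further rules, are defined in importer.md. The function name derive_repo and the example URLs are assumptions.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def derive_repo(url: str) -> Optional[Tuple[str, str]]:
    """Return (owner, repo) parsed from a GitHub URL, or None if unrecognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]
    if host == "api.github.com":
        # API form: /repos/OWNER/REPO/...
        if len(parts) >= 3 and parts[0] == "repos":
            return parts[1], parts[2]
        return None
    if host == "github.com" and len(parts) >= 2:
        # Web form: /OWNER/REPO/...
        return parts[0], parts[1]
    return None
```

For example, derive_repo("https://github.com/ggml-org/llama.cpp/pull/10001") returns ("ggml-org", "llama.cpp").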
Use uv to manage dependencies.
Initial command:
uv add typer rich fastapi "uvicorn[standard]" jinja2 markdown-it-py bleach
Recommended use:
- typer: CLI command routing.
- rich: CLI tables, progress bars, readable errors.
- fastapi: local web app and JSON API.
- uvicorn: development/local server.
- jinja2: HTML templates.
- markdown-it-py: Markdown rendering for issue/PR/discussion bodies.
- bleach: sanitize rendered Markdown before serving HTML.
- sqlite3: Python stdlib; no SQLAlchemy needed for MVP.
- json, pathlib, datetime, re, urllib.parse: Python stdlib.
Before implementing FTS, confirm Python's SQLite has FTS5:
uv run python -c "import sqlite3; c=sqlite3.connect(':memory:'); c.execute('create virtual table t using fts5(x)'); print('fts5 ok')"
The current project root is already a standalone uv project.
C:\CodeBlocks\github-backup-browser\
  pyproject.toml
  README.md
  docs/
  src/
    ghbb/
      __init__.py      # can keep script entry or delegate to cli.main
      cli.py           # Typer app
      config.py        # DB path and backup root resolution
      db.py            # SQLite connection, schema setup, transactions
      schema.sql       # SQL schema
      importer.py      # backup traversal and import orchestration
      normalizers.py   # issue/PR/discussion/release parsing
      search.py        # FTS query construction and search SQL
      render.py        # Markdown rendering/sanitization helpers
      web.py           # FastAPI app factory
      templates/
        base.html
        index.html
        search.html
        repo.html
        item.html
      static/
        app.css
  tests/
    test_repo_derivation.py
    test_normalizers.py
    test_search_query.py
Update the script entry in pyproject.toml if needed:
[project.scripts]
ghbb = "ghbb.cli:main"
For MVP, a full rebuild on each import is acceptable and recommended for simplicity.
Suggested import flow:
- Resolve DB path from GHBB_DB or ./ghbb.db.
- Resolve backup root.
- Validate backup root contains repositories/.
- Open SQLite.
- Ensure schema exists.
- In one transaction:
  - clear imported tables;
  - insert one import_runs row;
  - scan all repositories;
  - import labels;
  - import issues;
  - import pull requests;
  - import discussions;
  - import releases and release asset metadata;
  - import attachment manifests as metadata;
  - rebuild FTS documents.
- Print counts.
This avoids subtle incremental import bugs. The current test backup has about 30k issue/PR/discussion/release JSON files, which is reasonable for full re-import.
A later version can add incremental import using source file size/mtime/hash columns.
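The full-rebuild transaction could look roughly like the sketch below. The real schema lives in schema.md. The items table, its columns, the import_runs columns, and the function name rebuild are placeholders, and clearing import_runs on each run is an assumption.

```python
import sqlite3
from typing import Iterable, Tuple

# Placeholder schema for illustration only; see schema.md for the real one.
SCHEMA = """
CREATE TABLE IF NOT EXISTS import_runs (
    id INTEGER PRIMARY KEY,
    backup_root TEXT NOT NULL,
    started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS items (
    id INTEGER PRIMARY KEY,
    repo TEXT NOT NULL,
    kind TEXT NOT NULL,
    number INTEGER NOT NULL,
    title TEXT NOT NULL
);
"""


def rebuild(
    conn: sqlite3.Connection,
    backup_root: str,
    items: Iterable[Tuple[str, str, int, str]],
) -> int:
    """Clear and re-import everything in one transaction; return the item count."""
    conn.executescript(SCHEMA)
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("DELETE FROM items")
        conn.execute("DELETE FROM import_runs")  # assumption: no run history kept
        conn.execute("INSERT INTO import_runs (backup_root) VALUES (?)", (backup_root,))
        conn.executemany(
            "INSERT INTO items (repo, kind, number, title) VALUES (?, ?, ?, ?)",
            items,
        )
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Because the deletes and inserts share one transaction, a failed import leaves the previous data in place, and re-running a successful import yields the same counts.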
Deliver:
uv run ghbb --help
uv run ghbb db-path
Tasks:
- Add Typer CLI.
- Implement get_db_path() using GHBB_DB or ./ghbb.db.
- Implement backup root resolution.
- Add basic error formatting.
Acceptance:
- uv run ghbb --help works.
- uv run ghbb db-path prints the resolved database path.
- GHBB_DB=C:\tmp\test.db uv run ghbb db-path prints the override (in PowerShell, set $env:GHBB_DB first).
db-path is not essential for end users, but it is useful for agents and smoke tests.
Deliver:
uv run ghbb import C:\CodeBlocks\ggml-org-backup\backup
uv run ghbb stats
Tasks:
- Add schema from schema.md.
- Implement normalizers from importer.md.
- Import item/comment/label/attachment metadata.
- Import release asset metadata.
- Rebuild FTS.
- Print import summary.
Acceptance:
- Import completes without reading binary attachment contents.
- stats shows expected repo and item counts from testing.md.
- Re-running import produces the same counts.
Deliver:
uv run ghbb search "Fabrice Bellard"
uv run ghbb show ggml issue 1
uv run ghbb show llama.cpp pull 10001
uv run ghbb search "KV cache" --json
uv run ghbb show ggml issue 1 --json
Tasks:
- Implement FTS query builder.
- Implement filters: --repo, --kind, --state, --author, --label, --limit, --offset, --json.
- Implement show by repo/kind/key.
- Format human output with Rich.
- Return stable JSON for agents.
Acceptance:
- Search finds matches in item bodies and comments.
- Results link comment/review hits to parent item.
- JSON output includes IDs needed for follow-up calls.
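One possible shape for the FTS query builder is sketched here. It quotes every whitespace-separated term as an FTS5 string, so user input such as AND, NOT, or stray punctuation is matched literally instead of being parsed as query syntax. The table name search_docs, its columns, and both function names are hypothetical. Implicit AND between terms is an assumption about the desired search semantics.

```python
from typing import List, Optional, Tuple


def build_match_query(text: str) -> str:
    """Turn free text into an FTS5 MATCH expression of quoted terms (implicit AND)."""
    # Inside an FTS5 string, a double quote is escaped by doubling it.
    return " ".join('"' + t.replace('"', '""') + '"' for t in text.split())


def build_search_sql(
    text: str,
    repo: Optional[str] = None,
    kind: Optional[str] = None,
    limit: int = 20,
    offset: int = 0,
) -> Tuple[str, List[object]]:
    """Return (sql, params) for a filtered, paginated search."""
    match = build_match_query(text)
    if not match:
        raise ValueError("search text is empty")
    where = ["search_docs MATCH ?"]  # hypothetical FTS5 table
    params: List[object] = [match]
    if repo is not None:
        where.append("repo = ?")
        params.append(repo)
    if kind is not None:
        where.append("kind = ?")
        params.append(kind)
    sql = (
        "SELECT item_id, repo, kind FROM search_docs WHERE "
        + " AND ".join(where)
        + " ORDER BY rank LIMIT ? OFFSET ?"
    )
    params += [limit, offset]
    return sql, params
```

All user-supplied values travel as bound parameters, so neither the search text nor the filters are interpolated into the SQL string.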
Deliver:
uv run ghbb serve
Tasks:
- Add FastAPI app.
- Add HTML pages:
  - home/search page;
  - search results;
  - repo browse;
  - item thread view.
- Add JSON API endpoints from cli-web-api.md.
- Render Markdown safely.
- Do not serve local attachment files by default.
Acceptance:
- Browser opens at http://127.0.0.1:8765/.
- Search works in the browser.
- Item page shows title, metadata, body, comments/reviews/replies, labels, and attachment metadata.
- API endpoints return JSON useful for agents.
Tasks:
- Add tests for repo derivation and normalizers.
- Add README usage examples.
- Add clear error messages for missing FTS5, bad backup roots, empty DB, malformed JSON.
- Add import progress bars.
- Add pagination for web and CLI.
- Add basic performance pragmas.
During import:
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA temp_store = MEMORY;
PRAGMA foreign_keys = ON;
Use executemany for bulk inserts where practical. Keep import inside a transaction.
For full rebuild import, clear tables in dependency order or disable foreign keys only if necessary. Prefer ON DELETE CASCADE and simple deletes from root tables.
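A small sketch of how the pragmas, executemany, and ON DELETE CASCADE fit together. The pragmas are applied right after connecting, before any transaction starts, because SQLite ignores PRAGMA foreign_keys inside a transaction and cannot switch journal modes mid-transaction. The repos/items tables and both function names are placeholders, not the real schema.

```python
import sqlite3

IMPORT_PRAGMAS = (
    "PRAGMA journal_mode = WAL",
    "PRAGMA synchronous = NORMAL",
    "PRAGMA temp_store = MEMORY",
    "PRAGMA foreign_keys = ON",
)


def open_for_import(path: str) -> sqlite3.Connection:
    """Open a connection and apply the import pragmas before any transaction."""
    conn = sqlite3.connect(path)
    for pragma in IMPORT_PRAGMAS:
        conn.execute(pragma)
    return conn


def demo_cascade(conn: sqlite3.Connection) -> int:
    """Bulk-insert child rows, delete the root row, return remaining child count."""
    conn.executescript(
        """
        CREATE TABLE repos (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE items (
            id INTEGER PRIMARY KEY,
            repo_id INTEGER NOT NULL REFERENCES repos(id) ON DELETE CASCADE,
            title TEXT NOT NULL
        );
        """
    )
    with conn:
        conn.execute("INSERT INTO repos (id, name) VALUES (1, 'ggml')")
        conn.executemany(
            "INSERT INTO items (repo_id, title) VALUES (?, ?)",
            [(1, "first"), (1, "second")],
        )
    with conn:
        # Deleting from the root table removes dependent rows via the cascade.
        conn.execute("DELETE FROM repos")
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

With foreign keys enabled, a single delete on the root table is enough to clear its dependents, which keeps the full-rebuild step short.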
The web app is a local browser for untrusted GitHub-authored content.
- Bind to 127.0.0.1 by default.
- Escape all plain text.
- Sanitize Markdown-rendered HTML with bleach.
- Do not execute scripts from GitHub content.
- Do not render raw bodyHTML from GitHub discussion JSON without sanitizing.
- Do not serve downloaded attachment files in the MVP.
- Link original attachment URLs as external links.
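For the plain-text escaping and external-link rules, a standard-library sketch might look like this. In the real app, Jinja2 autoescaping would normally handle plain text; the helper name external_link and the choice to allow only http/https links are assumptions. Markdown sanitization with bleach is not shown here.

```python
from html import escape
from urllib.parse import urlparse


def external_link(url: str, label: str) -> str:
    """Render an attachment URL as an escaped external link.

    URLs with a scheme other than http/https (for example javascript:) are not
    linked; only the escaped label is returned.
    """
    scheme = urlparse(url).scheme.lower()
    if scheme not in ("http", "https"):
        return escape(label)
    return (
        f'<a href="{escape(url, quote=True)}" '
        f'rel="noopener noreferrer" target="_blank">{escape(label)}</a>'
    )
```

Both the URL and the label are escaped, so GitHub-authored text cannot inject markup through either one.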
A different agent should be able to run:
cd C:\CodeBlocks\github-backup-browser
uv sync
uv run ghbb import C:\CodeBlocks\ggml-org-backup\backup
uv run ghbb stats
uv run ghbb search "Fabrice Bellard"
uv run ghbb show ggml issue 1
uv run ghbb serve
And get:
- correct import counts;
- useful CLI search results;
- useful JSON output with --json;
- a local web UI that can search and browse threads.