CodeBatch (mcp-tool-shop-org/code-batch)

Content-addressed batch execution engine with deterministic sharding and queryable outputs.

What it is: A filesystem-based execution substrate that snapshots code, shards work deterministically, and indexes every output for structured queries — no database required.

Who it's for: Developers building repeatable code analysis pipelines, CI integrations, or batch transformation workflows that need reproducibility and auditability.

Why it's different: Every input is content-addressed and every execution is deterministic. Re-run the same batch six months later and get identical results. Query outputs by semantic type without parsing logs.

Overview

CodeBatch provides a filesystem-based execution substrate for running deterministic transformations over codebases. It captures inputs as immutable snapshots, executes work in isolated shards, and indexes all semantic outputs for efficient querying—without requiring a database.
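
To make "deterministic sharding" concrete, here is a minimal sketch, not CodeBatch's actual algorithm (SPEC.md is the authority on that). It assumes a shard ID is a short hex prefix of a SHA-256 digest of the file path, which would look like the two-character `ab` passed to `--shard` in the low-level commands. The names `shard_for` and `assign_shards` are hypothetical.

```python
import hashlib


def shard_for(path: str, prefix_len: int = 2) -> str:
    """Return a shard ID: the first hex characters of the path's SHA-256 digest.

    Assumption: the real shard key and prefix length are defined by SPEC.md.
    """
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    return digest[:prefix_len]


def assign_shards(paths):
    """Group paths by shard ID; the result does not depend on input order."""
    shards = {}
    for path in sorted(paths):
        shards.setdefault(shard_for(path), []).append(path)
    return shards
```

Because the shard ID depends only on the input string, the same set of paths always lands in the same shards, regardless of machine, run order, or time.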

Documentation

SPEC.md — Full storage and execution specification
docs/TASKS.md — Task reference (parse, analyze, symbols, lint)
CHANGELOG.md — Version history

Quick Start

# Initialize a store
codebatch init ./store

# Create a snapshot of a directory
codebatch snapshot ./my-project --store ./store

# List available pipelines
codebatch pipelines

# Initialize a batch with a pipeline
codebatch batch init --snapshot <id> --pipeline full --store ./store

# Run all tasks and shards (Phase 5 workflow)
codebatch run --batch <id> --store ./store

# View progress
codebatch status --batch <id> --store ./store

# View summary
codebatch summary --batch <id> --store ./store

Human Workflow (Phase 5)

Phase 5 adds human-friendly commands that compose existing primitives:

# Run entire batch (no manual shard iteration needed)
codebatch run --batch <id> --store ./store

# Resume interrupted execution
codebatch resume --batch <id> --store ./store

# Progress summary
codebatch status --batch <id> --store ./store

# Output summary
codebatch summary --batch <id> --store ./store
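
The idea behind `resume` can be illustrated with a small sketch: work that has already completed is skipped, so an interrupted run picks up where it stopped. This is an illustration under stated assumptions, not how CodeBatch tracks shard state. The `.done` marker files and the `run_pending` helper are hypothetical.

```python
from pathlib import Path


def run_pending(state_dir: Path, shard_ids, work):
    """Run `work(shard)` for every shard that has no completion marker.

    Assumption: completion is recorded as an empty-ish `<shard>.done` file;
    the real store layout is defined by SPEC.md.
    """
    ran = []
    for shard in sorted(shard_ids):
        marker = state_dir / f"{shard}.done"
        if marker.exists():
            continue  # finished in an earlier run
        work(shard)  # may raise; the marker is only written on success
        marker.write_text("ok")
        ran.append(shard)
    return ran
```

Writing the marker only after `work` returns means a crash mid-shard leaves that shard pending, so the next call retries it.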

Discoverability

# List pipelines
codebatch pipelines

# Show pipeline details
codebatch pipeline full

# List tasks in a batch
codebatch tasks --batch <id> --store ./store

# List shards for a task
codebatch shards --batch <id> --task 01_parse --store ./store

Query Aliases

# Show errors
codebatch errors --batch <id> --store ./store

# List files in the batch's snapshot
codebatch files --batch <id> --store ./store

# Top output kinds
codebatch top --batch <id> --store ./store

Exploration & Comparison (Phase 6)

Phase 6 adds read-only views for exploring outputs and comparing batches—without modifying the store.

# Inspect all outputs for a file
codebatch inspect src/main.py --batch <id> --store ./store

# Compare two batches
codebatch diff <batchA> <batchB> --store ./store

# Show regressions (new/worsened diagnostics)
codebatch regressions <batchA> <batchB> --store ./store

# Show improvements (fixed/improved diagnostics)
codebatch improvements <batchA> <batchB> --store ./store

# Explain data sources (append --explain to any command)
codebatch inspect src/main.py --batch <id> --store ./store --explain
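
The regressions/improvements split can be pictured as a comparison of two diagnostic sets. The sketch below is an assumption about the general idea, not the tool's implementation: it keys each diagnostic by a hypothetical `(path, code)` pair, ranks severities with a made-up `SEVERITY` table, and treats "new or worsened" as a regression and "fixed or improved" as an improvement.

```python
# Hypothetical severity ranking; the real diagnostic schema lives in schemas/.
SEVERITY = {"info": 0, "warning": 1, "error": 2}


def classify(before: dict, after: dict):
    """Compare two {(path, code): severity} mappings.

    Returns (regressions, improvements) as sorted lists of keys.
    A missing diagnostic is ranked below every real severity.
    """
    regressions, improvements = [], []
    for key in sorted(before.keys() | after.keys()):
        old = SEVERITY.get(before.get(key), -1)
        new = SEVERITY.get(after.get(key), -1)
        if new > old:
            regressions.append(key)  # new or worsened
        elif new < old:
            improvements.append(key)  # fixed or improved
    return regressions, improvements
```

Neither input is modified, which matches the read-only nature of the Phase 6 commands.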

Low-Level Commands

For fine-grained control, the original commands remain available:

# Run a specific shard
codebatch run-shard --batch <id> --task 01_parse --shard ab --store ./store

# Query outputs
codebatch query outputs --batch <id> --task 01_parse --store ./store

# Query diagnostics
codebatch query diagnostics --batch <id> --task 01_parse --store ./store

# Build LMDB acceleration cache
codebatch index-build --batch <id> --store ./store

Spec Versioning

The specification uses semantic versioning with draft/stable markers. Each version is tagged in git (e.g., spec-v1.0-draft). Breaking changes increment the major version. Implementations should declare which spec version they target and tolerate unknown fields for forward compatibility.
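
As an illustration of "tolerate unknown fields", a reader can pick out the fields it understands and carry the rest along untouched instead of rejecting the record. This is a sketch only: the field names `schema_version` and `kind`, the `Record` class, and `parse_record` are hypothetical and not taken from the CodeBatch schemas.

```python
import json
from dataclasses import dataclass, field

# Hypothetical set of fields this reader understands.
KNOWN_FIELDS = ("schema_version", "kind")


@dataclass
class Record:
    schema_version: str
    kind: str
    extra: dict = field(default_factory=dict)  # unknown fields, preserved as-is


def parse_record(line: str) -> Record:
    """Parse one JSON record, keeping unrecognized fields instead of failing."""
    raw = json.loads(line)
    known = {name: raw[name] for name in KNOWN_FIELDS}
    extra = {k: v for k, v in raw.items() if k not in KNOWN_FIELDS}
    return Record(**known, extra=extra)
```

A record written by a newer spec version with an additional field still parses, and the extra data remains available in `extra` if the record has to be written back.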

Project Structure

schemas/      JSON Schema definitions for all record types
src/          Core implementation
tests/        Test suites and fixtures
docs/         Documentation
.github/      CI/CD workflows

Support

Questions / help: GitHub Discussions
Bug reports: GitHub Issues

Security & Data Scope

CodeBatch is a local-first CLI tool — no network requests, no telemetry, deterministic execution.

Data accessed: Reads source files for content-addressed snapshotting (SHA-256). Writes batch stores, shard outputs, and LMDB indexes to user-specified directories.
Data NOT accessed: No network requests. No telemetry. No cloud services. No credential storage.
Permissions required: File system read for source directories, write for store/output directories.
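
The content-addressed snapshotting mentioned above can be sketched with the standard library alone: each file is identified by the SHA-256 digest of its bytes, so identical content always yields the same address. The manifest shape and the `snapshot_manifest` helper are assumptions for illustration. The real snapshot format is specified in SPEC.md.

```python
import hashlib
from pathlib import Path


def hash_bytes(data: bytes) -> str:
    """Content address: hex SHA-256 digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


def snapshot_manifest(root: Path) -> dict:
    """Map each file's POSIX-style relative path to its content hash.

    Only reads from `root`; nothing is written and no network is used.
    """
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hash_bytes(path.read_bytes())
    return manifest
```

Two directories with byte-identical files produce equal manifests, which is what makes re-running against the same snapshot reproducible.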

See SECURITY.md for vulnerability reporting.

Scorecard

Category	Score
Security	10/10
Error Handling	10/10
Operator Docs	10/10
Shipping Hygiene	10/10
Identity	10/10
Overall	50/50

License

MIT

Built by MCP Tool Shop
