Content-addressed batch execution engine — deterministic sharding, queryable outputs, no database required

mcp-tool-shop-org/code-batch

日本語 | 中文 | Español | Français | हिन्दी | Italiano | Português (BR)

CodeBatch

Content-addressed batch execution engine with deterministic sharding and queryable outputs.

What it is: A filesystem-based execution substrate that snapshots code, shards work deterministically, and indexes every output for structured queries — no database required.

Who it's for: Developers building repeatable code analysis pipelines, CI integrations, or batch transformation workflows that need reproducibility and auditability.

Why it's different: Every input is content-addressed and every execution is deterministic. Re-run the same batch six months later and get identical results. Query outputs by semantic type without parsing logs.

Overview

CodeBatch provides a filesystem-based execution substrate for running deterministic transformations over codebases. It captures inputs as immutable snapshots, executes work in isolated shards, and indexes all semantic outputs for efficient querying—without requiring a database.
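
As a rough illustration of what "content-addressed" means here, the sketch below hashes every file under a directory with SHA-256 and derives a single ID from the resulting manifest. It is not CodeBatch's implementation: the function names (`hash_file`, `snapshot_manifest`, `snapshot_id`) and the manifest layout are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_manifest(root: Path) -> dict[str, str]:
    """Map each file's POSIX-style relative path to its content hash."""
    files = (p for p in root.rglob("*") if p.is_file())
    return {p.relative_to(root).as_posix(): hash_file(p) for p in files}


def snapshot_id(manifest: dict[str, str]) -> str:
    """Derive one ID from the whole manifest (hypothetical scheme).

    Entries are sorted first, so the ID depends only on paths and
    contents, never on filesystem traversal order.
    """
    digest = hashlib.sha256()
    for rel_path, file_hash in sorted(manifest.items()):
        digest.update(f"{rel_path}\0{file_hash}\n".encode("utf-8"))
    return digest.hexdigest()
```

Under this scheme, two directories with byte-identical files at the same relative paths produce the same ID, and changing any single byte changes it. That property is what makes re-running a batch against the same snapshot meaningful.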

Quick Start

# Initialize a store
codebatch init ./store

# Create a snapshot of a directory
codebatch snapshot ./my-project --store ./store

# List available pipelines
codebatch pipelines

# Initialize a batch with a pipeline
codebatch batch init --snapshot <id> --pipeline full --store ./store

# Run all tasks and shards (Phase 5 workflow)
codebatch run --batch <id> --store ./store

# View progress
codebatch status --batch <id> --store ./store

# View summary
codebatch summary --batch <id> --store ./store

Human Workflow (Phase 5)

Phase 5 adds human-friendly commands that compose existing primitives:

# Run entire batch (no manual shard iteration needed)
codebatch run --batch <id> --store ./store

# Resume interrupted execution
codebatch resume --batch <id> --store ./store

# Progress summary
codebatch status --batch <id> --store ./store

# Output summary
codebatch summary --batch <id> --store ./store
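
For unattended use, the `run` and `resume` commands above could be wrapped in a small retry loop. The following is a sketch under stated assumptions, not documented behavior: it assumes a non-zero exit code signals an interrupted or failed run, which this README does not confirm. The helper names (`run_cli`, `run_with_resume`) are hypothetical; only the command lines themselves come from the examples above.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_cli(argv: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_with_resume(batch_id: str, store: str, runner: Runner = run_cli,
                    max_resumes: int = 3) -> bool:
    """Run a batch, then call `resume` until it succeeds or retries run out.

    ASSUMPTION: a non-zero exit code means the run did not complete.
    """
    common = ["--batch", batch_id, "--store", store]
    if runner(["codebatch", "run", *common]) == 0:
        return True
    for _ in range(max_resumes):
        if runner(["codebatch", "resume", *common]) == 0:
            return True
    return False
```

Passing the runner as a parameter keeps the loop testable without the CLI installed: a fake runner can return a scripted sequence of exit codes.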

Discoverability

# List pipelines
codebatch pipelines

# Show pipeline details
codebatch pipeline full

# List tasks in a batch
codebatch tasks --batch <id> --store ./store

# List shards for a task
codebatch shards --batch <id> --task 01_parse --store ./store

Query Aliases

# Show errors
codebatch errors --batch <id> --store ./store

# List files in the batch's snapshot
codebatch files --batch <id> --store ./store

# Top output kinds
codebatch top --batch <id> --store ./store
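
Conceptually, a "top output kinds" view is a frequency count over indexed output records. The sketch below shows that idea over JSON Lines input. The record layout is an assumption: the `kind` field name and the one-JSON-object-per-line format are hypothetical, and the real record types are defined by the files in `schemas/`.

```python
import json
from collections import Counter
from typing import Iterable


def top_kinds(lines: Iterable[str], limit: int = 5) -> list[tuple[str, int]]:
    """Count records per `kind` and return the most frequent ones.

    ASSUMPTION: each non-empty line is a JSON object with a `kind` field.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts: Counter[str] = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        record = json.loads(line)
        counts[record.get("kind", "unknown")] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```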

Exploration & Comparison (Phase 6)

Phase 6 adds read-only views for exploring outputs and comparing batches—without modifying the store.

# Inspect all outputs for a file
codebatch inspect src/main.py --batch <id> --store ./store

# Compare two batches
codebatch diff <batchA> <batchB> --store ./store

# Show regressions (new/worsened diagnostics)
codebatch regressions <batchA> <batchB> --store ./store

# Show improvements (fixed/improved diagnostics)
codebatch improvements <batchA> <batchB> --store ./store

# Explain data sources (add --explain to any command)
codebatch inspect src/main.py --batch <id> --store ./store --explain
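
One simple way to think about `regressions` and `improvements` is as set differences between the diagnostics of two batches. The sketch below covers only the "new" and "fixed" cases; detecting "worsened" or "improved" diagnostics would additionally need a severity ordering, which is not modeled here. The `Diagnostic` fields are hypothetical and not taken from the CodeBatch schemas.

```python
from typing import Iterable, NamedTuple


class Diagnostic(NamedTuple):
    """Hypothetical diagnostic record used only for this example."""
    path: str
    code: str
    message: str


def regressions(before: Iterable[Diagnostic],
                after: Iterable[Diagnostic]) -> list[Diagnostic]:
    """Diagnostics present in `after` but not in `before` (new problems)."""
    return sorted(set(after) - set(before))


def improvements(before: Iterable[Diagnostic],
                 after: Iterable[Diagnostic]) -> list[Diagnostic]:
    """Diagnostics present in `before` but not in `after` (fixed problems)."""
    return sorted(set(before) - set(after))
```

Both functions only read their inputs, which mirrors the read-only nature of the Phase 6 commands.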

Low-Level Commands

For fine-grained control, the original commands remain available:

# Run a specific shard
codebatch run-shard --batch <id> --task 01_parse --shard ab --store ./store

# Query outputs
codebatch query outputs --batch <id> --task 01_parse --store ./store

# Query diagnostics
codebatch query diagnostics --batch <id> --task 01_parse --store ./store

# Build LMDB acceleration cache
codebatch index-build --batch <id> --store ./store
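
The `--shard ab` argument above suggests shard IDs that look like a two-character hex prefix. One common way to get deterministic shards of that shape is to take a prefix of a hash of each item's key, as sketched below. This is an assumption about the general technique, not a description of how CodeBatch assigns shards; `shard_for` and `group_by_shard` are hypothetical names.

```python
import hashlib
from typing import Iterable


def shard_for(key: str) -> str:
    """Return a two-hex-character shard ID derived from the key's SHA-256."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:2]


def group_by_shard(keys: Iterable[str]) -> dict[str, list[str]]:
    """Group keys by shard ID, with shards and their members sorted.

    The result depends only on the set of keys, never on input order,
    so repeated runs assign the same work to the same shard.
    """
    shards: dict[str, list[str]] = {}
    for key in sorted(keys):
        shards.setdefault(shard_for(key), []).append(key)
    return dict(sorted(shards.items()))
```

A two-character hex prefix gives at most 256 shards, and because assignment depends only on the key, a single shard can be re-run in isolation without recomputing the others.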

Spec Versioning

The specification uses semantic versioning with draft/stable markers. Each version is tagged in git (e.g., spec-v1.0-draft). Breaking changes increment the major version. Implementations should declare which spec version they target and tolerate unknown fields for forward compatibility.
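
A reader that follows these rules might parse the spec tag, compare major versions, and keep only the fields it understands. The sketch below is one possible shape for that logic. The tag pattern is inferred from the single example `spec-v1.0-draft`, the `-stable` suffix is an assumption, and the field names in the usage note are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Any, Iterable, Mapping

# ASSUMPTION: tags look like "spec-v<major>.<minor>-<draft|stable>".
TAG_PATTERN = re.compile(r"^spec-v(\d+)\.(\d+)-(draft|stable)$")


@dataclass(frozen=True)
class SpecVersion:
    major: int
    minor: int
    marker: str


def parse_spec_tag(tag: str) -> SpecVersion:
    """Parse a tag such as 'spec-v1.0-draft'; raise ValueError otherwise."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized spec tag: {tag!r}")
    return SpecVersion(int(match.group(1)), int(match.group(2)), match.group(3))


def is_compatible(target: SpecVersion, found: SpecVersion) -> bool:
    """Breaking changes bump the major version, so majors must match."""
    return target.major == found.major


def read_known_fields(record: Mapping[str, Any],
                      known: Iterable[str]) -> dict[str, Any]:
    """Keep the fields this reader understands and ignore the rest."""
    return {name: record[name] for name in known if name in record}
```

For example, `read_known_fields({"path": "a.py", "kind": "ast", "added_later": 1}, ("path", "kind"))` drops `added_later` instead of failing, which is the forward-compatibility behavior the paragraph above asks for.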

Project Structure

schemas/      JSON Schema definitions for all record types
src/          Core implementation
tests/        Test suites and fixtures
docs/         Documentation
.github/      CI/CD workflows

Security & Data Scope

CodeBatch is a local-first CLI tool: it makes no network requests, collects no telemetry, and executes deterministically.

  • Data accessed: Reads source files for content-addressed snapshotting (SHA-256). Writes batch stores, shard outputs, and LMDB indexes to user-specified directories.
  • Data NOT accessed: No network requests. No telemetry. No cloud services. No credential storage.
  • Permissions required: File system read for source directories, write for store/output directories.

See SECURITY.md for vulnerability reporting.


Scorecard

Category          Score
Security          10/10
Error Handling    10/10
Operator Docs     10/10
Shipping Hygiene  10/10
Identity          10/10
Overall           50/50

License

MIT


Built by MCP Tool Shop
