H-CSC: Hierarchical Certified Semantic Commitment

Byzantine-fault-tolerant finality control for LLM-agent collaboration.

License: MIT | Python 3.9+

Reference implementation, benchmarks, and reproduction scripts for the paper:

Hierarchical Certified Semantic Commitment: Typed Finality Control for Byzantine-Robust LLM-Agent Collaboration. Haoran Xu. arXiv:XXXX.XXXXX (2026). (update with the arXiv ID once live)


What is H-CSC?

Given n LLM agents of which up to f may be Byzantine, and a round of structured natural-language proposals, H-CSC decides what kind of finality the round supports and emits exactly one typed, certificate-backed outcome:

  • semantic_commit — a 2f+1 within-verdict semantic core backs the verdict; the protocol emits a parameter-bound digest over the quantised embedding aggregate.
  • verdict_commit — the verdict has a 2f+1 quorum and a margin, but the semantic rationale is dispersed; a verdict-level certificate is emitted with no semantic aggregate.
  • abort — neither finality signal is admissible; an explicit typed reason is returned.
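The three outcomes above can be read as a small decision rule. The following is a minimal sketch of that reading, not the code in fba/: the names `Outcome`, `Decision`, `decide`, `core_size`, and `min_margin` are hypothetical, the margin is assumed to be the gap between the leading verdict and the runner-up, and the abort reason strings are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Outcome(str, Enum):
    SEMANTIC_COMMIT = "semantic_commit"
    VERDICT_COMMIT = "verdict_commit"
    ABORT = "abort"


@dataclass(frozen=True)
class Decision:
    outcome: Outcome
    verdict: Optional[str] = None
    reason: Optional[str] = None  # typed reason, set only on abort


def decide(f: int, votes: Dict[str, int], core_size: int, min_margin: int = 1) -> Decision:
    """Pick exactly one typed outcome for a round.

    votes     : verdict label -> number of distinct signers backing it
    core_size : size of the semantic core inside the leading verdict
                (assumed to be computed elsewhere; hypothetical input)
    min_margin: assumed minimum lead over the runner-up verdict
    """
    quorum = 2 * f + 1
    if not votes:
        return Decision(Outcome.ABORT, reason="no_proposals")

    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    top_verdict, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0

    if top_count < quorum:
        return Decision(Outcome.ABORT, reason="no_verdict_quorum")
    if core_size >= quorum:
        # A 2f+1 semantic core inside the verdict backs the stronger outcome.
        return Decision(Outcome.SEMANTIC_COMMIT, verdict=top_verdict)
    if top_count - runner_up >= min_margin:
        # Quorum and margin hold, but the rationale is dispersed.
        return Decision(Outcome.VERDICT_COMMIT, verdict=top_verdict)
    return Decision(Outcome.ABORT, reason="insufficient_margin")
```

For example, with f = 1 the quorum is 3, so `decide(1, {"SUPPORTS": 3, "REFUTES": 1}, core_size=2)` falls through to `verdict_commit` in this sketch, while the same votes with `core_size=3` yield `semantic_commit`.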

Every outcome carries the same 2f+1 distinct-signer certificate envelope; only the underlying digest differs. The contribution is typed finality, not raw commit accuracy.
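To make the shared envelope concrete, here is a minimal sketch of a certificate that differs only in its digest. Everything in it is an assumption for illustration: the `Certificate` fields, the `quantise` scale, the use of SHA-256 over canonical JSON, and the helper names are hypothetical, and signature verification is deliberately left out. The actual envelope and digest format live in fba/.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


def quantise(vector: Sequence[float], scale: int = 1000) -> List[int]:
    """Map floats to integers so the digest does not depend on float noise.
    The scale is a hypothetical parameter."""
    return [int(round(x * scale)) for x in vector]


def parameter_bound_digest(params: dict, quantised: Sequence[int]) -> str:
    """Hash the quantised aggregate together with the parameters that produced it."""
    payload = json.dumps(
        {"params": params, "aggregate": list(quantised)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Certificate:
    outcome: str              # "semantic_commit", "verdict_commit", or "abort"
    digest: str               # what the signers attest to; varies by outcome
    signers: FrozenSet[str]   # distinct signer identifiers


def has_quorum(cert: Certificate, f: int) -> bool:
    """Check the 2f+1 distinct-signer condition (signatures not verified here)."""
    return len(cert.signers) >= 2 * f + 1
```

Storing signers as a set means a repeated signer cannot inflate the count, which is the point of requiring distinct signers.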


Repository layout

hcsc-release/
├── fba/                     # Core H-CSC protocol (typed commitment, certificates,
│                            #   semantic core, geometric-median aggregation, digest)
├── bench/                   # MVR-50 real-agent benchmark engine (Climate-FEVER)
│   ├── scripts/             #   honest/Byzantine generation, commitment, analysis
│   ├── scripts/coverage_recovery/   # baselines B0–B3 + bootstrap CIs + figures
│   ├── configs/ prompts/ examples/  # run configs, attack/agent prompts, schema samples
│   └── data/                # frozen tasks/views + frozen LLM proposals (for cached repro)
├── experiments_commitment/  # BCS_v1 controlled-diagnostic runners
├── experiments_corrected/   # corrected-pipeline audits + topology design-space ablation
├── tests_commitment/  tests_corrected/  tests_bench/   # test suites (all CPU, offline)
├── scripts/                 # one-command reproduction scripts
└── data/                    # download_artifacts.py → fetch CRSE checkpoint + datasets

The protocol core (fba/) and the benchmark (bench/) form one importable workspace. Run everything from the repository root with that root on PYTHONPATH; the reproduction scripts set PYTHONPATH=. themselves.
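The layout lists geometric-median aggregation as part of the protocol core. For readers unfamiliar with it, the sketch below shows the standard Weiszfeld iteration in plain Python. It is not the implementation in fba/; the function name, iteration cap, and tolerance are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def geometric_median(
    points: Sequence[Sequence[float]], iters: int = 200, tol: float = 1e-9
) -> List[float]:
    """Approximate the point minimising the sum of Euclidean distances to `points`."""
    dim = len(points[0])
    # Start from the centroid (the ordinary mean).
    current = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        # Each point is weighted by the inverse of its distance to the estimate;
        # the clamp avoids division by zero when the estimate lands on a point.
        weights = [1.0 / max(math.dist(p, current), 1e-12) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current
```

Unlike the mean, the geometric median is not dragged far by a minority of outliers: for three points at the origin and one at (10, 10), the mean sits at (2.5, 2.5) while the iteration above converges toward the origin.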


Install

git clone https://github.com/HrxuAlbert/H-CSC.git
cd H-CSC
pip install -r requirements.txt

A CPU-only machine is sufficient for the headline reproduction below.


Quick reproduction (CPU-only, no API key, no checkpoint)

The headline MVR-50 results are reproducible from cached embeddings shipped in bench/data/results/coverage_recovery/embeddings_cache.npz — no model download, no API spend. This regenerates the per-task outcomes, bootstrap CIs, and the trade-off figure:

export PYTHONPATH=.
python3 -m bench.scripts.coverage_recovery.run_coverage_recovery_variants
python3 -m bench.scripts.coverage_recovery.bootstrap_coverage_recovery_ci
python3 -m bench.scripts.coverage_recovery.plot_hcsc_tradeoff      # writes the forest plot PDF to a local (gitignored) output dir
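For orientation, a bootstrap confidence interval over per-task outcomes can be sketched as below. This is a generic percentile bootstrap under stated assumptions, not necessarily the procedure in bootstrap_coverage_recovery_ci: the function name, the resample count, the seed, and the choice of the mean as the statistic are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_ci(
    values: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of per-task values."""
    rng = random.Random(seed)  # fixed seed so reruns give the same interval
    n = len(values)
    means: List[float] = []
    for _ in range(n_resamples):
        # Resample tasks with replacement and recompute the statistic.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

A typical input would be one 0/1 indicator per task (for example, whether the round committed), so the interval describes the uncertainty in a rate over the task set.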

Run the test suites (all offline, CPU):

PYTHONPATH=. python3 tests_commitment/run_tests_commitment.py     # 57 protocol tests
PYTHONPATH=. python3 tests_bench/run_tests_bench.py               # benchmark tests

Full reproduction (with the CRSE encoder)

To reproduce the exact paper numbers (CRSE embeddings, not the base-encoder fallback) and to re-run the commitment pipeline over the frozen proposals, fetch the externally-hosted artifacts first:

python3 data/download_artifacts.py        # CRSE checkpoint + BCS_v1 / dataset files
bash scripts/repro_mvr50_from_cache.sh    # CPU-only; re-runs commitment from frozen proposals
bash scripts/repro_bcs_only.sh            # BCS_v1 controlled diagnostic

See data/download_artifacts.py for the externally-hosted artifacts and their URLs.

Re-running the live LLM-agent generation (optional, costs API credits)

export OPENAI_API_KEY=...      # and/or ANTHROPIC_API_KEY / OPENROUTER_API_KEY
# edit a config under bench/configs/ (set api_calls_enabled: true), then:
python3 -m bench.scripts.run_mvr_variant --config bench/configs/real_agent_mvr50.yaml

Generation is cache-first: existing responses are never re-requested. See bench/api_cost_plan.md for the cost model.
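The cache-first behaviour can be pictured with the following sketch. It is an illustration only: the function name, the on-disk layout (one JSON file per request, keyed by a SHA-256 of model and prompt), and the `generate` callback are assumptions, not the benchmark engine's actual cache format.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_generate(
    cache_dir: Path,
    model: str,
    prompt: str,
    generate: Callable[[str, str], str],
) -> str:
    """Return the cached response if present; call `generate` only on a miss."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"

    if path.exists():
        # Cache hit: no API request is made.
        return json.loads(path.read_text(encoding="utf-8"))["response"]

    # Cache miss: this is the only place a (paid) call would happen.
    response = generate(model, prompt)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps({"model": model, "prompt": prompt, "response": response}),
        encoding="utf-8",
    )
    return response
```

Keying on both model and prompt means that changing either one triggers a fresh request, while an interrupted run can be resumed without paying again for responses already on disk.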


Data & checkpoints (external)

This GitHub repository hosts only the code and the small frozen inputs needed to run and reproduce the benchmark. The CRSE checkpoint (~419 MB) and the BCS_v1 / Climate-FEVER source datasets (multi-GB) are hosted externally; data/download_artifacts.py fetches them into the paths the code expects (Colab_trained_Model/500pt_best_model.pt, data/). Set the URLs at the top of that script once the Zenodo/Hugging Face records are minted.

Citation

@article{xu2026hcsc,
  title   = {Hierarchical Certified Semantic Commitment: Typed Finality Control
             for Byzantine-Robust LLM-Agent Collaboration},
  author  = {Xu, Haoran},
  journal = {arXiv preprint arXiv:XXXX.XXXXX},
  year    = {2026}
}

License

MIT — see LICENSE.
