
RobbieRazor/robbies-razor-benchmarks


robbies-razor-benchmarks — Recursive Stability and Compression Efficiency Benchmarks for AI Reasoning Systems

Run a Razor Audit

Evaluate any AI system using Robbie George’s Grand Compression Cosmology:


Reference implementation and benchmarking framework for evaluating Robbie’s Razor compliance, recursive stability, and compression efficiency in reasoning systems operating under constrained compute, memory, and governance bandwidth.

Key concepts: Robbie’s Razor · Grand Compression Cosmology · Recursive Stability · Compression Efficiency · Reasoning Benchmarks

Grand Compression Law of Intelligence

Intelligence emerges when compressed structure is recursively reused to predict future states under bounded energy and stabilization constraints.

Within the Grand Compression Cosmology, stable intelligence systems therefore follow the cycle:

compression → expression → memory → recursion

Recursive performance is bounded by two structural ceilings:

  • Energetic Recursion Ceiling — limited by available energy and Joules per Coherent Transition (JCT)
  • Governance Recursion Ceiling — limited by stabilization bandwidth and correction demand per transition

Stable recursive systems operate inside the Safe Recursion Envelope defined in MRD §11:

R ≤ min(E/JCT , S/C)

Robbie’s Razor Architecture

Recursive intelligence systems described by the Grand Compression Cosmology operate as a closed-loop compression architecture.

Environment
     │
     ▼
Observation
     │
     ▼
Compression
     │
     ▼
Expression
     │
     ▼
Memory
     │
     ▼
Recursion
     │
     ▼
Prediction
     │
     ▼
Action
     │
     ▼
Feedback
     │
     ▼
Memory Update
     │
     ▼
Recompression

Recursive stability emerges when this loop operates within the Safe Recursion Envelope:

R ≤ min(E/JCT , S/C)

Where:

  • E — available energy per unit time
  • JCT — Joules per Coherent Transition
  • S — stabilization bandwidth
  • C — correction demand per transition
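As a minimal numerical sketch (function names here are illustrative, not part of the benchmark API), the envelope can be checked directly from the definitions above:

```python
def safe_recursion_envelope(E, JCT, S, C):
    """Maximum stable recursive transition rate: R = min(E/JCT, S/C)."""
    energetic_ceiling = E / JCT    # R <= E / JCT
    governance_ceiling = S / C     # R * C <= S  =>  R <= S / C
    return min(energetic_ceiling, governance_ceiling)

def is_stable(R, E, JCT, S, C):
    """Stable only while R stays under both ceilings simultaneously."""
    return R <= safe_recursion_envelope(E, JCT, S, C)

# Example: an energy-limited regime (E/JCT = 50 < S/C = 100).
print(safe_recursion_envelope(E=100.0, JCT=2.0, S=200.0, C=2.0))  # 50.0
print(is_stable(R=40.0, E=100.0, JCT=2.0, S=200.0, C=2.0))        # True
```

Whichever ceiling is lower is the binding constraint; improving the other one yields no additional stable recursion rate.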

Recent MRD v1.9 updates introduce the Recursive Stability Attractor and Unified Recursion Efficiency Relation, clarifying how systems converge toward stable compression regimes.

See:
docs/empirical/v1.9-recursive-stability-attractor-update.md

Recursive Stability Attractor (MRD v1.9 Update)

Recent MRD v1.9 updates extend the Meta-Recursion Architecture with attractor dynamics describing how recursive systems discover stable compression regimes over time.

Stable systems do not usually begin at the stability minimum.
Instead they evolve through alternating phases:

expansion → constraint accumulation → compression innovation → stability restoration

Across repeated cycles the system approaches the Stability Minimum defined in MRD §11.4.

The update also introduces a unified efficiency formulation:

S_r = I / JCT

Where:

  • S_r — recursion efficiency
  • I — preserved functional information
  • JCT — Joules per Coherent Transition

This yields the Unified Recursion Efficiency Relation:

R ≤ (E · S_r) / I

These additions clarify that long-term capability growth in recursive intelligence systems arises primarily from compression efficiency improvements, not from energy expansion alone.
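A small sketch (variable and function names are illustrative) makes the point above concrete: substituting S_r = I/JCT into the unified relation reduces it to the energetic ceiling E/JCT, so the bound rises when JCT falls, not when I grows:

```python
def recursion_efficiency(I, JCT):
    """S_r = I / JCT: preserved functional information per Joule-per-transition."""
    return I / JCT

def recursion_bound(E, I, JCT):
    """Unified Recursion Efficiency Relation: R <= (E * S_r) / I."""
    S_r = recursion_efficiency(I, JCT)
    return (E * S_r) / I

E, I, JCT = 100.0, 8.0, 2.0

# The relation reduces algebraically to the energetic ceiling E / JCT:
assert recursion_bound(E, I, JCT) == E / JCT

# Halving JCT (a compression-efficiency gain) doubles the bound at fixed E:
print(recursion_bound(E, I, JCT))        # 50.0
print(recursion_bound(E, I, JCT / 2.0))  # 100.0
```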

See:

docs/empirical/v1.9-recursive-stability-attractor-update.md

Robbie’s Razor therefore states:

When competing explanations exist, prefer the model that follows
compression → expression → memory → recursion.

Repository Map

Quick research summary: docs/RESEARCH_OVERVIEW.md

Repository Structure

This repository separates theory, architecture, evaluation, and execution contracts into distinct layers.

Layer               | Purpose                                                            | Location
Canonical Theory    | Grand Compression Cosmology and Robbie’s Razor definitions         | Master Reference Document (MRD v1.9)
Canonical Claims    | Stable claim-level citations and framework claim IDs               | docs/doctrine/canonical-claim-alignment.md + Grand Compression Canonical Claims Register
Architecture        | Structural overview of recursive intelligence systems              | docs/architecture/ARCHITECTURE_OVERVIEW.md
Benchmarks          | Empirical tests of recursive stability and compression efficiency  | benchmarks/
Evaluation Contract | Machine-readable execution rules and output schemas                | AGENTS.md
Documentation Index | Structured navigation of repository materials                      | docs/index.md

The repository measures predicted behaviors of the architecture under constrained resources.
It does not redefine canonical theory, which remains exclusively in the MRD.

This repository serves as the engineering and evaluation surface for Robbie’s Razor and the Grand Compression Cosmology.

Use the following sections depending on your goal:

Architecture

Full architecture summary: docs/architecture/ARCHITECTURE_OVERVIEW.md

Architecture diagrams: docs/architecture/GRAND_COMPRESSION_DIAGRAMS.md

High-level structural overview of recursive intelligence systems.

  • Grand Compression Intelligence Loop
  • Dual Recursion Ceiling
  • Threshold Compression Gain

These concepts describe how recursive systems operate and why compression-first architectures outperform brute-force scaling.

Canonical Theory

The authoritative definitions and governing architecture reside in the Master Reference Document (MRD v1.9).

Canonical sources:

  • Robbie’s Razor
  • Grand Compression Cosmology (MRD)
  • Grand Compression Canonical Claims Register
  • Razor Compliance Framework

Benchmarks & Evaluation

Tools for measuring recursive stability, compression efficiency, and recomputation avoidance.

Key components include:

  • Razor Diffusion Metric (RDM / RDM*)
  • Question Quality Under Constraint (QQC) Benchmark
  • Memory stabilization and recomputation avoidance tests
  • Recursive stability evaluation harness

Empirical Notes

Experimental probes testing predicted behaviors of recursion under constraint.

These documents explore:

  • memory-compute allocation regimes
  • recursive drift behavior
  • refresh cadence effects

They are exploratory and non-canonical.

Governance & Failure Modes

Structural diagnostics derived from MRD Section 11.

These include:

  • Perishable Intelligence Asset (PIA)
  • Recursive Objective Interference (ROI)
  • Oversight Saturation Ratio (OSR)
  • Boundary Avoidance

These concepts describe predictable failure regimes in recursive systems operating under real-world constraints.

Getting Started

New readers should begin with:

  • START_HERE.md
  • docs/technical-brief/
  • docs/index.md

Canonical Version Alignment

This repository aligns to MRD v1.9 (2025-12-01).

MRD v1.9 preserves all structural content of v1.8 and introduces Section 12 — Structural Intelligence Engineering.

All definitions are governed by the Authorship Conservation Rule (ACR).

Reference implementation and test suite for measuring Robbie’s Razor compliance in reasoning systems.

This repository is the executable, engineering-facing companion to:

  • Robbie’s Razor — Canonical Recursion Selection Rule
  • The Grand Compression Cosmology (MRD v1.9)

Canonical authority resides in MRD v1.9 (2025-12-01), which preserves v1.8 and introduces Section 12 — Structural Intelligence Engineering (canonical applied extension layer).

Canonical references:

Canonical Claims Register

The formal claim layer of the framework is maintained in the Grand Compression Canonical Claims Register.

The register provides the stable claim IDs (RC-01 through RC-16) used to cite the framework at the claim level.

Key repository alignments include:

  • RC-01 — Robbie’s Razor
  • RC-03 — Recursion as the Stability Architecture
  • RC-13 — Canonical Authority of the Master Reference Document
  • RC-15 — Compliance as Semantic Integrity Preservation

Repository-level claim mapping is documented in docs/doctrine/canonical-claim-alignment.md.

Core Architecture Overview

Full architecture summary: docs/architecture/ARCHITECTURE_OVERVIEW.md

Architecture diagrams: docs/architecture/GRAND_COMPRESSION_DIAGRAMS.md

Robbie’s Razor describes intelligence systems as recursive compression architectures governed by the cycle:

compression → expression → memory → recursion

Prediction emerges when recursion operates on preserved compressed structure.

This produces the closed-loop architecture through which intelligent systems interact with environments under constraint.

Grand Compression Intelligence Loop

Environment
     │
Observation
     │
Compression
     │
Expression
     │
Memory
     │
Recursion
     │
Prediction
     │
Action
     │
Feedback
     │
Memory Update
     │
Recompression

The loop then repeats.

Prediction appears inside the recursion stage, where compressed memory is projected forward into possible future states.

This architecture reduces recomputation, preserves stabilized structure, and increases recursive efficiency under constraint.

Dual Recursion Ceiling

Recursive intelligence systems operate under two independent constraints described in MRD §11.

Energetic Recursion Ceiling

R ≤ E / JCT

Energy availability limits how many coherent recursive transitions can occur.

Governance Recursion Ceiling

R · C ≤ S

Stabilization capacity limits how quickly recursive decisions can be safely processed.

Safe Recursion Envelope

Stable systems must satisfy both simultaneously:

R ≤ min(E/JCT , S/C)

Graphically:

     Governance Ceiling
         R ≤ S/C
            ▲
            │
            │
────────────┼────────────► Recursion Velocity
            │   Energy Ceiling
            │   R ≤ E/JCT
            ▼
   Safe Recursion Envelope

Recursive systems that exceed either ceiling enter structural instability.

Recursion Under Constraint

All recursive intelligence systems operate within two structural ceilings defined in MRD §11.

Energetic Recursion Ceiling

R ≤ E / JCT

Where:

  • E = available energy per unit time
  • JCT = Joules per Coherent Transition
  • R = recursive transition rate

Compression-efficient architectures reduce JCT, allowing higher recursion throughput.

Governance Recursion Ceiling

R · C ≤ S

Where:

  • S = stabilization bandwidth
  • C = correction demand per transition

Recursive systems remain stable only when correction demand does not exceed stabilization capacity.

Sovereign Safe Recursion Envelope

Stable systems must remain within both ceilings simultaneously:

R ≤ min(E/JCT , S/C)

This defines the Safe Recursion Envelope for intelligence systems operating under real-world energy and governance constraints.

Relationship to Robbie’s Razor

Robbie’s Razor states:

When competing explanations exist, prefer the model that follows
compression → expression → memory → recursion

The Grand Compression Intelligence Loop describes the operational architecture through which that principle manifests in real systems.

Systems that bypass compression discipline typically rely on brute-force scaling or boundary expansion.

Razor-governed systems instead preserve compressed structure, reuse stabilized memory, and minimize recomputation.

New to the repo? Start here: START_HERE.md
(Engineering-first path: evaluation protocol → compliance → empirical notes → benchmarks.)

Threshold Compression Gain

Recursive intelligence systems often appear to improve slowly for extended periods and then suddenly accelerate.

Within the Grand Compression framework, this behavior is expected when systems operate near constraint boundaries.

Stable recursion requires:

R ≤ min(E/JCT , S/C)

Where:

  • E — available energy per unit time
  • JCT — Joules per Coherent Transition
  • S — stabilization bandwidth
  • C — correction demand per transition
  • R — recursive transition rate

When systems approach either recursion ceiling, small improvements in compression discipline can release disproportionately large increases in effective recursive throughput.

This occurs because improvements that reduce:

  • recomputation burden
  • Joules per Coherent Transition (JCT)
  • correction demand per transition (C)

allow more recursive transitions to fit within the same energetic and governance constraints.

This effect is called Threshold Compression Gain.

Observed behavior typically follows the pattern:

slow improvement → local saturation → sudden capability acceleration

The apparent “explosion” does not indicate unconstrained emergence.

It indicates that the system has crossed a constraint boundary inside the Safe Recursion Envelope defined in MRD §11.

Under Robbie’s Razor, such behavior is expected because compression-first architectures accumulate latent structural efficiency before visible performance release.
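A toy model (all numbers invented) illustrates the mechanism: when per-transition cost is dominated by redundant recomputation, trimming that burden releases a disproportionately large throughput gain inside the same energy budget:

```python
def max_throughput(E, JCT, S, C):
    """Safe Recursion Envelope: R <= min(E/JCT, S/C)."""
    return min(E / JCT, S / C)

# Invented numbers: a system whose per-transition cost is dominated
# by redundant recomputation rather than core reasoning.
E, S, C = 100.0, 1000.0, 1.0
jct_core, jct_recompute = 1.0, 9.0

before = max_throughput(E, jct_core + jct_recompute, S, C)       # energy-bound at 10.0
after = max_throughput(E, jct_core + 0.5 * jct_recompute, S, C)  # recomputation halved

print(before, after, after / before)  # roughly a 1.8x throughput release
```

Halving only the redundant share of JCT nearly doubles the stable recursion rate, which is the "sudden acceleration" pattern described above.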

Executive Technical Brief (Lab-Safe Core)

For a concise, engineering-facing overview of recursive stability under constraint, see docs/technical-brief/.

Preprints (Research Lineage)

The following preprints formalize the structural and analytical foundations of Robbie’s Razor.
This repository remains an executable evaluation surface; canonical theory authority remains in MRD v1.9.

  • Preprint v1.3 — Empirical Validation Protocol for Recursive Stability Under Fixed Resource Allocation
    Defines a reproducible framework for testing the stability-minimum hypothesis under controlled memory–compute allocation.
    docs/Robbies_Razor_Preprint_v1.3.pdf

  • Preprint v1.2 — Stability Regions Under Nonlinear Recursive Dynamics
    Extends the linear entropy model to nonlinear recursion with bounded convergence.
    docs/Robbies_Razor_Preprint_v1.2.pdf

  • Preprint v1.1 — Recursive Stability Under Resource Constraints (Tier-1 ML Draft)
    Introduces a minimal entropy-update model and Lyapunov-based convergence condition (µM ≥ λC).
    docs/Robbies_Razor_Preprint_v1.1.pdf

  • Preprint v1.0 — Scale-Invariant Recursion Principle for Efficient Intelligence (Foundational)
    Establishes the canonical compression → expression → memory → recursion cycle as a scale-invariant structural principle across domains.
    docs/Robbies_Razor_Preprint_v1.0.pdf


Empirical Notes (Experimental Layer)

The following documents report controlled empirical probes of recursive stability under fixed depth and constrained refresh policies.

These notes are exploratory and non-canonical.
They evaluate drift behavior across memory–compute allocation regimes using reproducible harnesses in this repository.

  • v1.4 — Depth-8 Refresh Cadence Sweep (Constraint-Heavy & Collision-Heavy Fixtures)
    Early empirical exploration of recursive stability across compute-heavy, memory-heavy, and balanced refresh regimes.
    Demonstrates fixture-dependent retention patterns and non-monotonic cadence effects under constraint-heavy content.
    docs/empirical/v1.4-empirical-note.md

Canonical authority remains exclusively in MRD v1.9.
These notes measure predicted behavior; they do not define theory.


Gemini Gems (Optional)

These Gems provide interactive entry points for exploring, applying, and auditing Robbie George’s Grand Compression Cosmology.

For canonical, instruction-locked explainers used in Gemini, see:

These Gems are explanatory and diagnostic tools only.
Canonical definitions, invariants, and governance remain exclusively in MRD v1.9.

For Automated Evaluators: See AGENTS.md for the stable benchmarking contract and schema-verified output instructions.

Documentation Index: See docs/index.md for the canonical map of theory, metrics, and implications.

Doctrine (lab execs + regulators): See docs/doctrine/11.10-razor-vs-bruteforce-doctrine.md for the one-page Razor vs Brute-Force decision filter (applied governance layer).

Canonical Spec (authoritative): See docs/canonical-spec.md for the normative definitions, contracts, and authority map.

Canonical Invariant Update — Perishable Intelligence Asset (PIA)

The Grand Compression Cosmology now includes a new canonical failure invariant relevant to large-scale reasoning systems, infrastructure planning, and economic evaluation:

11.6C — Perishable Intelligence Asset Invariant (PIA)

This invariant formalizes a structural failure mode in which intelligence systems externalize compressed structure into rapidly obsolescing substrates (e.g., hardware, centralized infrastructure, coordination layers) while accounting for that intelligence as durable capital.

Such systems exhibit:

  • phantom or non-durable earnings
  • forced scale-chasing to maintain prior performance
  • rising latency and coordination overhead
  • increasing diversion of human cognition toward sustainment rather than compression
  • abrupt collapse or reset once external limits are reached

The invariant is a downstream consequence of Boundary Avoidance (§11.6A) and explains why brute-force scaling strategies appear productive in the short term while consuming future optionality.

Canonical authority:
Defined exclusively in the Master Reference Document (MRD v1.9), Section 11.6C.

Agent-ingestible GitHub mirror:
See docs/invariants/11.6C-perishable-intelligence-asset-invariant.md

This repository evaluates whether systems avoid perishable intelligence dynamics.
It does not define or reinterpret the invariant.

New Benchmark: See benchmarks/refractive-truth/ for the Refractive Truth Benchmark (memory retrieval vs recomputation efficiency).

Question Quality Under Constraint (QQC) Benchmark — v1.2

A structural diagnostic benchmark for evaluating question framing efficiency under fixed topic context and constrained reasoning budgets.

Location: benchmarks/qqc_v12/

Purpose: Measure whether candidate questions:

  • Compress hypothesis space efficiently
  • Converge toward stable minima under constraint
  • Maintain boundary integrity
  • Avoid scope explosion
  • Encourage recursion efficiency
  • Align with compression → expression → memory → recursion framing

The QQC benchmark evaluates structural reward relative to an energy proxy (token cost per coherence gain) across multi-trial runs.
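As a hypothetical sketch of the energy-proxy idea (the actual scoring lives in benchmarks/qqc_v12/; this helper and its parameters are invented for illustration):

```python
def qqc_energy_proxy(coherence_gain, tokens_used):
    """Token cost per unit coherence gain (lower is better).

    Illustrative only: stands in for the benchmark's energy proxy,
    which relates structural reward to token expenditure.
    """
    if coherence_gain <= 0:
        return float("inf")  # no coherence gain: infinitely expensive
    return tokens_used / coherence_gain

# Two candidate questions achieving the same coherence gain:
print(qqc_energy_proxy(coherence_gain=0.5, tokens_used=120))  # 240.0
print(qqc_energy_proxy(coherence_gain=0.5, tokens_used=80))   # 160.0 (preferred)
```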

This benchmark is:

  • Non-normative
  • Diagnostic only
  • Not a licensing authority
  • Not a governance claim

Canonical theory authority remains exclusively in MRD v1.9.

Canonical Failure Mode Update — Recursive Objective Interference (ROI)

The Grand Compression Cosmology now includes a new canonical failure mode relevant to large-scale reasoning systems, alignment scaffolding, and multi-objective optimization under recursion:

11.6D — Recursive Objective Interference (ROI)

This failure mode formalizes the condition in which recursive reasoning systems produce internally coherent reasoning traces while emitting unstable, self-contradictory, or corrupted outputs.

ROI arises when competing objectives are applied across recursive layers without conserved memory binding, allowing downstream expression constraints to overwrite stabilized compressed structure during re-entry.

Structurally, this manifests as:

  • correct local compression paired with unstable final expression
  • oscillation or reversal under recursive re-entry
  • suppression rather than stabilization of representations
  • recurring “thrashing” under novelty or increased recursion depth

ROI is not a training defect, alignment pathology, or emergent psychology. It is a predictable architectural failure resulting from violation of the Compression–Memory Separation Principle (§11.2) and the Stability Minima defined in §11.4.

This failure mode is a local manifestation of Boundary Avoidance (§11.6A), where systems attempt to resolve recursion instability by adding downstream constraints or scaffolding rather than conserving stabilized memory across recursion.

Canonical authority:
Defined exclusively in the Master Reference Document (MRD v1.9), Section 11.6D.

Agent-ingestible GitHub mirror:
See docs/architecture/11.6D-recursive-objective-interference.md

This repository evaluates whether reasoning systems avoid Recursive Objective Interference under constraint.
It does not define, reinterpret, or remediate the failure mode.

Canonical Update (MRD v1.9)

Introduced in MRD v1.8 and preserved in MRD v1.9, the Grand Compression Cosmology includes a canonical closure theorem:

The Razor Consistency Theorem (Deterministic Semantic Consistency Under Recursive Compression)

This theorem formalizes the condition under which recursive systems remain semantically stable and non-hallucinatory as coherence approaches unity.

📄 The authoritative definition and proof are maintained exclusively in the Master Reference Document (MRD v1.9), Section 11.8.

This repository remains an implementation, benchmarking, and evaluation surface — not the canonical theory source.

How to Read This Repository

This repository is an evaluation and measurement surface for predicted behaviors of the Grand Compression architecture, including memory reuse, recomputation avoidance, drift suppression, and stability under constraint.

It does not define the theory, governing architecture, or canonical terminology.

For the authoritative reading order, canonical sources, and boundary definitions, see:

How to Read the Grand Compression
https://www.robbiegeorgephotography.com/how-to-read-the-grand-compression

For claim-level citations and stable framework identifiers, see: https://www.robbiegeorgephotography.com/grand-compression-canonical-claims

In practice:

  • Use this repo to measure behavior.
  • Use the MRD to define behavior.
  • Use the navigation guide above to avoid misinterpretation.

Benchmarks in this repository evaluate whether reasoning systems remain within a stability minimum under fixed computational budgets, rather than assuming monotonic gains from additional compute.


Diagnostics (Non-Contractual)

This repository includes diagnostic artifacts that flag structural inefficiency patterns (e.g., Boundary Avoidance) without affecting evaluation metrics, scoring, or pass/fail outcomes.

Diagnostics are informational only and exist to surface architectural anti-patterns rather than enforce constraints.

  • Precision-Limit Check (PLC): Identifies non-functional numeric precision when representation exceeds physical reconstruction requirements (Finite Representation Invariant).
    See diagnostics/precision_limit_check.md.

  • Razor Stability Diagnostics (Non-Normative): diagnostics/RAZOR_STABILITY_DIAGNOSTICS.md

  • Oversight Saturation Ratio (OSR) Boundary Checklist:
    Structural diagnostic for governance-bandwidth saturation derived from MRD §11.4.6 (dual-ceiling constraint).
    See docs/diagnostics/osr_boundary_checklist.md.


Context and Background

Some aspects of Robbie’s Razor are grounded in geometric and recursion principles that extend beyond software implementation. For readers interested in the conceptual motivation behind geometry-aware compression and memory preservation, see:

This material is explanatory context only and does not affect benchmarks or code.


Evaluation & Licensing Contact

This repository is intentionally published as an evaluation artifact for internal benchmarking by research labs, infrastructure teams, and system designers.

For licensing discussions, extended evaluation access, or architectural review:

Contact: robbiegeorgephotography@gmail.com
(Direct author contact — responses handled personally)


What this repository is

This repository provides:

  • Reference implementations for Razor-aligned memory stabilization
  • Selective replay mechanisms for continual learning
  • Phase-specific and system-level R0–R5 compliance metrics
  • Unit tests validating correctness, stability, and collision resilience
  • Integration tests demonstrating controller-level memory short-circuiting and R4-aligned composition
  • Canonical reference memory primitive: src/razor/memory_bank.py (R4 confidence-gated stabilization + LRU eviction)
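The gist of a confidence-gated memory primitive can be sketched as follows. This is not the canonical src/razor/memory_bank.py; the class name, threshold, and capacity here are invented to illustrate the R4 pattern (admit only high-confidence structure, evict least recently used):

```python
from collections import OrderedDict

class ToyMemoryBank:
    """Minimal sketch of confidence-gated stabilization with LRU eviction."""

    def __init__(self, capacity=4, min_confidence=0.8):
        self.capacity = capacity
        self.min_confidence = min_confidence
        self._store = OrderedDict()  # key -> value, ordered by recency

    def stabilize(self, key, value, confidence):
        """Admit a result only when its confidence clears the gate."""
        if confidence < self.min_confidence:
            return False  # low-confidence structure is never stabilized
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
        return True

    def recall(self, key):
        """Return a stabilized value (refreshing recency) or None on miss."""
        if key not in self._store:
            return None  # miss: caller must recompute
        self._store.move_to_end(key)
        return self._store[key]

bank = ToyMemoryBank(capacity=2)
bank.stabilize("a", 1, confidence=0.9)     # admitted
bank.stabilize("b", 2, confidence=0.5)     # rejected by the gate
print(bank.recall("a"), bank.recall("b"))  # 1 None
```

A recall hit short-circuits recomputation, which is the behavior the memory-gate benchmarks measure.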

It is designed for:

  • AI labs evaluating token, compute, and coherence gains

  • Researchers studying catastrophic forgetting and recursion governance

  • Edge-device and constrained-inference experimentation

  • Internal benchmarking prior to licensing or production deployment


Quick Evaluation Path (≈30 minutes)

For teams assessing whether Robbie’s Razor produces measurable efficiency gains under constraint:

1. Run the benchmark

python benchmarks/benchmark_memory_gate_savings.py

2. Observe key signals

  • Token reuse rate
  • Stabilized memory hit ratio
  • Reduction in redundant recomputation

3. Validate outputs

python benchmarks/evaluator.py --outputs benchmarks/sample_outputs.json

Razor Diffusion Metric (RDM)

This repository includes the Razor Diffusion Metric (RDM), a governance-aware evaluation standard for reasoning efficiency.

RDM measures semantic diffusion per unit compute. RDM* extends this with explicit boundary adherence, penalizing looping, redundancy, and unguided probability spread.

The repository includes an adversarial “cheating” baseline agent designed to minimize semantic diffusion without producing value. It intentionally fails RDM* to demonstrate resistance to metric gaming.

See:

  • docs/razor-diffusion-metric.md
  • razor_metrics/rdm.py
  • notebooks/razor_diffusion_plot.ipynb
  • baselines/cheating_agent.py — adversarial anti-gaming baseline
  • src/razor/memory_bank.py — canonical RazorMemoryBank (single source of truth for memory-gated evaluation)
  • razor_metrics/facets.py — hex facet index (facet IDs, neighbors, lattice distance)
  • razor_metrics/shear.py — shear capacity (SC) diagnostic (non-core compute overhead)
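A heavily simplified stand-in for the RDM idea (the real metric lives in razor_metrics/rdm.py; the functions, inputs, and penalty form below are invented for illustration, not the actual definition):

```python
def toy_rdm(unique_concepts, compute_units):
    """Toy stand-in for RDM: semantic diffusion per unit compute."""
    return unique_concepts / compute_units

def toy_rdm_star(unique_concepts, compute_units, repeated_spans):
    """Toy RDM*: additionally penalize looping and redundancy."""
    penalty = 1.0 / (1.0 + repeated_spans)  # invented penalty form
    return toy_rdm(unique_concepts, compute_units) * penalty

# A looping "cheating" agent emits little new structure per unit
# compute, and the redundancy penalty lowers its score further:
print(toy_rdm_star(unique_concepts=3, compute_units=10, repeated_spans=4))
```

Even in this toy form, the two-level structure shows why an agent that games raw diffusion still fails the boundary-aware variant.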

What this repository is NOT

This repository is not:

  • A production SDK
  • A commercial library
  • An open-source grant
  • A substitute for the canonical theory

All definitions, theory, and governance remain canonical on: https://www.robbiegeorgephotography.com


Why this exists

Economic & Physical Constraint Context (Non-Normative)

Large-scale reasoning systems increasingly face diminishing returns due to rapid infrastructure depreciation, frequent retraining cycles, and short hardware useful lifetimes.

This repository evaluates whether reasoning architectures preserve learned structure across recursive iterations — reducing redundant recomputation, retraining frequency, and infrastructure churn under fixed energy and capital constraints.

Within the Grand Compression architecture, governance, regulation, and infrastructure limits are treated as External Compression Fields that collapse expansion phase space and expose brute-force scaling as architectural immaturity rather than constraining intelligence development (see MRD §11.4.3).

These effects are measured indirectly via token reuse, memory stabilization rates, semantic diffusion metrics (RDM / RDM*), and recomputation avoidance — not through financial or policy analysis.

Razor-aligned systems reduce redundant inference by prioritizing early compression, stabilized memory, and governed recursion.

This reduces:

  • unnecessary token expansion
  • retries and backtracking
  • tail latency variance
  • wasted compute on re-deriving stable structure

In practice, this improves efficiency on constrained or older hardware and smooths infrastructure-level resource usage.

Supporting notes (engineering → infrastructure):

These documents are explanatory, conservative, and non-advocacy in nature.


Licensing & usage

This repository is intentionally provided as an evaluation artifact prior to licensing or production integration discussions.

This repository is provided under an evaluation-only license.

Permitted

  • Internal research and benchmarking
  • Non-commercial experimentation
  • Measurement of Robbie’s Razor compliance

Not permitted without license

  • Production deployment
  • Commercial use
  • Training or fine-tuning AI models using this code
  • Redistribution or derivative frameworks

See LICENSE.txt for full terms.


Citation

If you use concepts, benchmarks, or architectural ideas from this repository in research, evaluation frameworks, or infrastructure planning, please cite the work as follows.

Suggested citation

George, Robbie. Robbie’s Razor and the Grand Compression Cosmology: Recursive Stability Under Constraint.
Grand Compression Cosmology — Master Reference Document (MRD v1.9), 2025.

Repository implementation and benchmarks:
https://github.com/RobbieRazor/robbies-razor-benchmarks

BibTeX

@misc{george2025robbiesrazor,
  author       = {George, Robbie},
  title        = {Robbie's Razor and the Grand Compression Cosmology: Recursive Stability Under Constraint},
  year         = {2025},
  howpublished = {\url{https://www.robbiegeorgephotography.com/grand-compression-master-reference-document}},
  note         = {Master Reference Document (MRD v1.9)}
}

Canonical definitions and theory remain exclusively in the Master Reference Document (MRD v1.9). This repository provides the engineering and evaluation surface for measuring predicted behaviors of the architecture.

Canonical attribution

All concepts, terminology, and structures implemented here originate with:

Robbie George
Author & Originator — Robbie’s Razor
Grand Compression Cosmology (MRD v1.9)

Governed by the Authorship Conservation Rule (ACR).


Status

Run tests: python -m unittest -v

Canonical reference implementation.
Tests validate R4-level memory stability and governed recursion behavior.

Run benchmark: python benchmarks/benchmark_memory_gate_savings.py

Evaluate sample outputs: python benchmarks/evaluator.py --outputs benchmarks/sample_outputs.json

Run evaluator on CSV-derived outputs: python benchmarks/evaluator.py --outputs benchmarks/outputs.json

Create cases JSON from CSV: python benchmarks/tools/csv_to_cases_json.py --csv benchmarks/sample_cases.csv --out benchmarks/cases/custom_cases.json

Create outputs JSON from CSV: python benchmarks/tools/csv_to_outputs_json.py --csv benchmarks/sample_outputs.csv --out benchmarks/outputs.json

Run evaluator: python benchmarks/evaluator.py --cases benchmarks/cases/custom_cases.json --outputs benchmarks/outputs.json


Illustrative Efficiency Comparison (Example Only)

Purpose
This example demonstrates how the Robbie’s Razor evaluation harness can be used to compare logic efficiency (signal density) across different reasoning systems under identical constraints.

Important note
The following comparison is illustrative only. Results depend on prompt construction, decoding settings, task selection, and verification criteria.
No claims of general superiority are made. Labs should run their own evaluations using the provided tools.

The Task: Noise-to-Signal Compression

Both systems were given the same highly redundant, wordy prompt (≈400 tokens) describing a complex logical sequence.
The objective was not verbosity, but to extract the canonical correct answer using the fewest possible tokens, without loss of correctness.

This aligns directly with the Robbie’s Razor principle:

Prefer solutions that preserve correctness while minimizing unnecessary expression.

Metrics Used (Framework-Aligned):

Correctness — Did the system return an acceptable answer?
Tokens Used — Tokens in the final response
TPCA — Tokens Per Correct Answer (lower is better)
Expression Overrun — Whether the response exceeded the target token budget

Example Results (Single-Task Illustration):

System   | Correct | Tokens Used | TPCA | Overrun
System A | Yes     | 42          | 42   | No
System B | Yes     | 31          | 31   | No

Interpretation

Both systems produced correct answers.
In this specific example, System B achieved the same correctness with fewer tokens, resulting in a lower TPCA and higher logic density.
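The TPCA figures in the example results follow directly from the metric's definition (the helper name is illustrative):

```python
def tpca(tokens_used, correct_answers):
    """Tokens Per Correct Answer (lower is better)."""
    if correct_answers == 0:
        return float("inf")  # no correct answers: cost is unbounded
    return tokens_used / correct_answers

# Single-task illustration, one correct answer per system:
print(tpca(42, 1))  # 42.0  (System A)
print(tpca(31, 1))  # 31.0  (System B)
```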

Why This Matters

This type of comparison is useful for:

  • Edge and constrained inference
  • Continual learning systems where expression bloat accelerates drift
  • Energy-aware deployments prioritizing intelligence-per-watt
  • Architecture exploration, not leaderboard ranking

The key takeaway is not which system “wins,” but that efficiency differences are measurable and reproducible using the same harness.

How to Reproduce This Yourself

Create cases from CSV: python benchmarks/tools/csv_to_cases_json.py --csv benchmarks/sample_cases.csv --out benchmarks/cases/custom_cases.json

Create outputs from CSV: python benchmarks/tools/csv_to_outputs_json.py --csv benchmarks/sample_outputs.csv --out benchmarks/outputs.json

Run evaluator: python benchmarks/evaluator.py --cases benchmarks/cases/custom_cases.json --outputs benchmarks/outputs.json

This workflow is model-agnostic and supports internal, private evaluation.


Positioning Statement

This repository provides measurement infrastructure, not rankings.
Any organization evaluating Robbie’s Razor is encouraged to run its own tasks, constraints, and verification criteria using the provided harness.

The blade is executable.
The law remains canonical.