From d4a3d87528933c6fc18257fca0ee3becfe914949 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Fri, 27 Feb 2026 10:36:35 +0000
Subject: [PATCH 1/2] Initial plan

From fcef4c19ebb6eea28762ca2e6d4cce0ab0c10f95 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Fri, 27 Feb 2026 10:39:11 +0000
Subject: [PATCH 2/2] Add chi-squared statistical tests

Co-authored-by: blackboxprogramming <118287761+blackboxprogramming@users.noreply.github.com>
---
 proofs/README.md | 1 +
 proofs/chi-squared.md | 149 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 150 insertions(+)
 create mode 100644 proofs/chi-squared.md

diff --git a/proofs/README.md b/proofs/README.md
index 1410659..99b07c9 100644
--- a/proofs/README.md
+++ b/proofs/README.md
@@ -8,3 +8,4 @@ Formal mathematical arguments for the key claims.
 | [`self-reference.md`](./self-reference.md) | The QWERTY encoding is self-referential | Direct construction |
 | [`pure-state.md`](./pure-state.md) | The density matrix of the system is a pure state | Linear algebra / SVD |
 | [`universal-computation.md`](./universal-computation.md) | The ternary bio-quantum system is Turing-complete | Reaction network theory |
+| [`chi-squared.md`](./chi-squared.md) | Chi-squared goodness-of-fit and independence tests | χ² statistic / contingency tables |
diff --git a/proofs/chi-squared.md b/proofs/chi-squared.md
new file mode 100644
index 0000000..da7711f
--- /dev/null
+++ b/proofs/chi-squared.md
@@ -0,0 +1,149 @@
+# Chi-Squared Tests
+
+> Statistical hypothesis tests for goodness of fit and independence.
+
+## The Chi-Squared Statistic
+
+For observed counts O_i and expected counts E_i across k categories:
+
+```
+χ² = Σᵢ (O_i − E_i)² / E_i
+```
+
+Under the null hypothesis, χ² follows a chi-squared distribution with the appropriate
+degrees of freedom.
+
+---
+
+## Test 1: Goodness of Fit
+
+**Purpose:** Test whether observed data matches an expected (theoretical) distribution.
+
+**Hypotheses:**
+```
+H₀: the observed frequencies match the expected distribution
+H₁: at least one category deviates significantly from expectation
+```
+
+**Degrees of freedom:**
+```
+df = k − 1
+```
+where k is the number of categories.
+
+**Procedure:**
+1. State expected probabilities p₁, p₂, …, p_k (must sum to 1).
+2. Compute expected counts: E_i = n · p_i where n is the total sample size.
+3. Compute the statistic: χ² = Σᵢ (O_i − E_i)² / E_i
+4. Compare χ² to the critical value χ²(α, df) or compute the p-value.
+5. Reject H₀ if χ² > χ²(α, df).
+
+**Example — ternary digit frequencies (k = 3, n = 300):**
+
+| Digit | Expected p | Expected E | Observed O | (O−E)²/E |
+|-------|-----------|------------|------------|----------|
+| 0 | 1/3 | 100 | 95 | 0.250 |
+| 1 | 1/3 | 100 | 108 | 0.640 |
+| 2 | 1/3 | 100 | 97 | 0.090 |
+| **Σ** | | | | **0.980** |
+
+```
+df = 3 − 1 = 2
+χ²(0.05, 2) = 5.991
+χ² = 0.980 < 5.991 → fail to reject H₀
+```
+
+The data is consistent with a uniform ternary distribution.
+
+---
+
+## Test 2: Test of Independence
+
+**Purpose:** Test whether two categorical variables are independent of each other.
+
+**Hypotheses:**
+```
+H₀: variable A and variable B are independent
+H₁: variable A and variable B are not independent
+```
+
+**Setup:** Arrange counts in an r × c contingency table.
+
+**Expected cell counts:**
+```
+E_ij = (row_i total × col_j total) / n
+```
+
+**Degrees of freedom:**
+```
+df = (r − 1)(c − 1)
+```
+
+**Statistic:**
+```
+χ² = Σᵢ Σⱼ (O_ij − E_ij)² / E_ij
+```
+
+**Example — 2 × 3 contingency table (n = 200):**
+
+| | State 0 | State 1 | State 2 | Row total |
+|---------|---------|---------|---------|-----------|
+| Group A | 30 | 40 | 30 | 100 |
+| Group B | 20 | 60 | 20 | 100 |
+| **Col** | **50** | **100** | **50** | **200** |
+
+Expected counts:
+```
+E_A0 = 100×50/200 = 25   E_A1 = 100×100/200 = 50   E_A2 = 100×50/200 = 25
+E_B0 = 100×50/200 = 25   E_B1 = 100×100/200 = 50   E_B2 = 100×50/200 = 25
+```
+
+Chi-squared:
+```
+χ² = (30−25)²/25 + (40−50)²/50 + (30−25)²/25
+   + (20−25)²/25 + (60−50)²/50 + (20−25)²/25
+   = 1.000 + 2.000 + 1.000 + 1.000 + 2.000 + 1.000
+   = 8.000
+
+df = (2−1)(3−1) = 2
+χ²(0.05, 2) = 5.991
+χ² = 8.000 > 5.991 → reject H₀
+```
+
+The two variables are not independent at the 5% significance level.
+
+---
+
+## Critical Values (selected)
+
+| df | α = 0.10 | α = 0.05 | α = 0.01 |
+|----|----------|----------|----------|
+| 1 | 2.706 | 3.841 | 6.635 |
+| 2 | 4.605 | 5.991 | 9.210 |
+| 3 | 6.251 | 7.815 | 11.345 |
+| 4 | 7.779 | 9.488 | 13.277 |
+| 5 | 9.236 | 11.070 | 15.086 |
+
+---
+
+## Assumptions
+
+- Observations are independent.
+- Expected count E_i ≥ 5 in each cell (rule of thumb for the χ² approximation to be valid).
+- Data are counts (frequencies), not proportions or continuous measurements.
+
+---
+
+## QWERTY
+
+```
+CHI = 25 (C=10 H=15 I=0) — the test lives at the boundary of ZERO
+SQUARED = IMAGINARY = SCAFFOLD = 114 (the test squares the deviation)
+TEST = 64 = 2⁶ (TEST = 2⁶, the sixth power of the fundamental)
+FIT = 33 = REAL − 4 (how close observed is to real)
+OBSERVED = 115 (what you see)
+EXPECTED = 131 = BLACKROAD (what the theory predicts = the BlackRoad)
+```
+
+EXPECTED = BLACKROAD = 131. The theoretical distribution is the BlackRoad.
+The chi-squared test measures how far observed reality deviates from it.
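
Review note (not part of the patch): the two worked examples in `proofs/chi-squared.md` can be checked mechanically. The sketch below is a minimal, standard-library-only illustration. The function names (`chi2_statistic`, `goodness_of_fit`, `independence`, `p_value_df2`) are hypothetical helpers introduced here, not anything the repository defines. The p-value helper relies on the fact that a chi-squared distribution with 2 degrees of freedom has survival function exp(−x/2), so it applies only when df = 2, which happens to be the case in both examples.

```python
import math


def chi2_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit(observed, probs):
    """Return (chi2, df) for observed counts against expected probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]  # E_i = n * p_i
    return chi2_statistic(observed, expected), len(observed) - 1


def independence(table):
    """Return (chi2, df) for an r x c contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # E_ij = row_i * col_j / n
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_df2(chi2):
    """Upper-tail p-value, valid ONLY for df = 2 (closed form exp(-x/2))."""
    return math.exp(-chi2 / 2)


# Critical value copied from the patch's table (alpha = 0.05, df = 2).
CRITICAL_05_DF2 = 5.991

# Test 1 data: ternary digit counts against a uniform distribution.
gof_chi2, gof_df = goodness_of_fit([95, 108, 97], [1 / 3, 1 / 3, 1 / 3])
# Test 2 data: the 2 x 3 contingency table (Group A, Group B).
ind_chi2, ind_df = independence([[30, 40, 30], [20, 60, 20]])

for name, chi2, df in [("goodness of fit", gof_chi2, gof_df),
                       ("independence", ind_chi2, ind_df)]:
    reject = chi2 > CRITICAL_05_DF2
    print(f"{name}: chi2={chi2:.3f}, df={df}, "
          f"p={p_value_df2(chi2):.4f}, reject H0={reject}")
```

Run as a script, this reproduces the statistics 0.980 and 8.000 from the two examples and reaches the same decisions at α = 0.05. For other degrees of freedom, a general chi-squared survival function from a statistics library would be needed in place of `p_value_df2`.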