Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions proofs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,3 +8,4 @@ Formal mathematical arguments for the key claims.
| [`self-reference.md`](./self-reference.md) | The QWERTY encoding is self-referential | Direct construction |
| [`pure-state.md`](./pure-state.md) | The density matrix of the system is a pure state | Linear algebra / SVD |
| [`universal-computation.md`](./universal-computation.md) | The ternary bio-quantum system is Turing-complete | Reaction network theory |
| [`chi-squared.md`](./chi-squared.md) | Chi-squared goodness-of-fit and independence tests | χ² statistic / contingency tables |
149 changes: 149 additions & 0 deletions proofs/chi-squared.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,149 @@
# Chi-Squared Tests

> Statistical hypothesis tests for goodness of fit and independence.

## The Chi-Squared Statistic

For observed counts O_i and expected counts E_i across k categories:

```
χ² = Σᵢ (O_i − E_i)² / E_i
```

Under the null hypothesis, χ² follows a chi-squared distribution with the appropriate
degrees of freedom.

---

## Test 1: Goodness of Fit

**Purpose:** Test whether observed data matches an expected (theoretical) distribution.

**Hypotheses:**
```
H₀: the observed frequencies match the expected distribution
H₁: at least one category deviates significantly from expectation
```

**Degrees of freedom:**
```
df = k − 1
```
where k is the number of categories.

**Procedure:**
1. State expected probabilities p₁, p₂, …, p_k (must sum to 1).
2. Compute expected counts: E_i = n · p_i where n is the total sample size.
3. Compute the statistic: χ² = Σᵢ (O_i − E_i)² / E_i
4. Compare χ² to the critical value χ²(α, df) or compute the p-value.
5. Reject H₀ if χ² > χ²(α, df).

**Example — ternary digit frequencies (k = 3, n = 300):**

| Digit | Expected p | Expected E | Observed O | (O−E)²/E |
|-------|-----------|------------|------------|----------|
| 0 | 1/3 | 100 | 95 | 0.250 |
| 1 | 1/3 | 100 | 108 | 0.640 |
| 2 | 1/3 | 100 | 97 | 0.090 |
| **Σ** | | | | **0.980** |

```
df = 3 − 1 = 2
χ²(0.05, 2) = 5.991
χ² = 0.980 < 5.991 → fail to reject H₀
```

The data is consistent with a uniform ternary distribution.

---

## Test 2: Test of Independence

**Purpose:** Test whether two categorical variables are independent of each other.

**Hypotheses:**
```
H₀: variable A and variable B are independent
H₁: variable A and variable B are not independent
```

**Setup:** Arrange counts in an r × c contingency table.

**Expected cell counts:**
```
E_ij = (row_i total × col_j total) / n
```

**Degrees of freedom:**
```
df = (r − 1)(c − 1)
```

**Statistic:**
```
χ² = Σᵢ Σⱼ (O_ij − E_ij)² / E_ij
```

**Example — 2 × 3 contingency table (n = 200):**

| | State 0 | State 1 | State 2 | Row total |
|---------|---------|---------|---------|-----------|
| Group A | 30 | 40 | 30 | 100 |
| Group B | 20 | 60 | 20 | 100 |
| **Col** | **50** | **100** | **50** | **200** |

Expected counts:
```
E_A0 = 100×50/200 = 25 E_A1 = 100×100/200 = 50 E_A2 = 100×50/200 = 25
E_B0 = 100×50/200 = 25 E_B1 = 100×100/200 = 50 E_B2 = 100×50/200 = 25
```

Chi-squared:
```
χ² = (30−25)²/25 + (40−50)²/50 + (30−25)²/25
+ (20−25)²/25 + (60−50)²/50 + (20−25)²/25
= 1.000 + 2.000 + 1.000 + 1.000 + 2.000 + 1.000
= 8.000

df = (2−1)(3−1) = 2
χ²(0.05, 2) = 5.991
χ² = 8.000 > 5.991 → reject H₀
```

The two variables are not independent at the 5% significance level.

---

## Critical Values (selected)

| df | α = 0.10 | α = 0.05 | α = 0.01 |
|----|----------|----------|----------|
| 1 | 2.706 | 3.841 | 6.635 |
| 2 | 4.605 | 5.991 | 9.210 |
| 3 | 6.251 | 7.815 | 11.345 |
| 4 | 7.779 | 9.488 | 13.277 |
| 5 | 9.236 | 11.070 | 15.086 |

---

## Assumptions

- Observations are independent.
- Expected count E_i ≥ 5 in each cell (rule of thumb for the χ² approximation to be valid).
- Data are counts (frequencies), not proportions or continuous measurements.

---

## QWERTY

```
CHI = 25 (C=10 H=15 I=0) — the test lives at the boundary of ZERO
SQUARED = IMAGINARY = SCAFFOLD = 114 (the test squares the deviation)
TEST = 64 = 2⁶ (TEST = 2⁶, the sixth power of the fundamental)
FIT = 33 = REAL − 4 (how close observed is to real)
OBSERVED = 115 (what you see)
EXPECTED = 131 = BLACKROAD (what the theory predicts = the BlackRoad)
```

EXPECTED = BLACKROAD = 131. The theoretical distribution is the BlackRoad.
The chi-squared test measures how far observed reality deviates from it.