test(survey): validate Survey Data Support methodology + promote to Complete by igerber · Pull Request #558 · igerber/diff-diff

igerber · 2026-06-27T14:51:29Z

Summary

Add tests/test_methodology_survey.py (33 tests, 10 Binder-equation-anchored classes) — the methodology validation suite for Survey Data Support. It isolates the design-based TSL and replicate-weight variance identities that the broad survey suite previously covered only indirectly: the multi-stratum Bessel decomposition, the fweight (df=Σw−k) and aweight (unweighted-meat) structures, the exact DEFF = design_var/srs_var ratio, and the residual-scale==score-scale cross-function identity. The other 6 core identities are equation-anchored and reference the existing direct oracles (no duplication).
Validation outcome: the core variance machinery (compute_survey_vcov / _compute_stratified_psu_meat / compute_survey_if_variance / compute_replicate_vcov / df_survey) was read against Binder (1983) Eq. 4.7 and docs/methodology/survey-theory.md §5/§6 and verified to implement the documented identities faithfully — no code change was required (consistent with the machine-precision R-parity scenarios already passing).
Correction (found by the fidelity walk): Korn & Graubard (1990) was mis-cited as JASA 85(409); the survey-df / Bonferroni-t paper is The American Statistician 44(4):270-276 (DOI 10.1080/00031305.1990.10475737). The citation is corrected in REGISTRY and the new tests, and added to docs/references.rst together with Lumley (2004) JSS 9(8) and Solon-Haider-Wooldridge (2015) JHR 50(2); all three were cited in REGISTRY/theory but absent from references.rst.
Promote Survey Data Support to Complete — the last In Progress methodology-review row, so the tracker is now fully Complete. The Complete entry has full Verified Components / R-parity table / Corrections Made / Deviations / cross-estimator gaps-boundary; the status row, Priority Order, and the now-stale "In Progress band" prose are swept.
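For readers less familiar with the identities named in this summary, the multi-stratum Bessel decomposition and the DEFF ratio can be sketched for a scalar statistic as follows. This is a minimal illustration of the textbook form, V = sum_h (1 - f_h) * n_h/(n_h - 1) * sum_j (z_hj - zbar_h)^2, and is not the library's code: the function names, the `fpc` argument (population PSU counts per stratum), and the choice to skip singleton strata are all assumptions made for the example.

```python
from collections import defaultdict


def stratified_psu_variance(scores, strata, psus, fpc=None):
    """Scalar stratified-PSU variance with a per-stratum Bessel factor.

    Hypothetical helper: `fpc` maps stratum -> population PSU count.
    """
    # Aggregate observation-level scores to PSU totals within each stratum.
    totals = defaultdict(lambda: defaultdict(float))
    for z, h, j in zip(scores, strata, psus):
        totals[h][j] += z

    variance = 0.0
    for h, psu_totals in totals.items():
        z = list(psu_totals.values())
        n_h = len(z)
        if n_h < 2:
            # Assumption: a singleton stratum contributes nothing,
            # in the spirit of a "remove"-style lonely-PSU rule.
            continue
        zbar = sum(z) / n_h
        ss = sum((v - zbar) ** 2 for v in z)
        f_h = 0.0 if fpc is None else n_h / fpc[h]  # sampling fraction
        variance += (1.0 - f_h) * n_h / (n_h - 1) * ss
    return variance


def design_effect(design_var, srs_var):
    """DEFF as the plain ratio design_var / srs_var."""
    return design_var / srs_var
```

With two strata of two PSUs each, the per-stratum factor n_h/(n_h - 1) equals 2, so each stratum contributes twice its centered sum of squares; passing `fpc` scales each contribution by (1 - f_h).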

Methodology references

Method: Survey Data Support — design-based TSL + replicate-weight variance.
Sources: Binder (1983) ISR 51(3):279-292 (Eq. 4.7); Lumley (2004) JSS 9(8); Korn-Graubard (1990) The American Statistician 44(4):270-276; Solon-Haider-Wooldridge (2015) JHR 50(2).
Deviations (documented in REGISTRY ## Survey Data Support + the tracker Complete entry; none undocumented): lonely_psu default "remove" vs R "fail"; replicate factor divides by design R not n_valid; PSU-level Hall-Mammen wild bootstrap; strata-vs-no-strata non-bit-equality (RNG path).
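To make the replicate-factor deviation concrete, here is a hedged sketch of what "divides by design R, not n_valid" could look like for a scalar estimate. It is not taken from compute_replicate_vcov: the function name and signature are hypothetical, failed replicates are assumed to be marked as NaN, and only the standard BRR (1/R) and JK1 ((R - 1)/R) factors are shown.

```python
import math


def replicate_variance(theta_full, theta_reps, method="BRR"):
    """Scalar replicate-weight variance (illustrative only).

    The scale factor uses the design replicate count R, even when
    some replicate estimates are NaN and drop out of the sum.
    """
    n_design = len(theta_reps)  # design R, including failed replicates
    valid = [t for t in theta_reps if not math.isnan(t)]
    ss = sum((t - theta_full) ** 2 for t in valid)

    if method == "BRR":
        factor = 1.0 / n_design
    elif method == "JK1":
        factor = (n_design - 1) / n_design
    else:
        raise ValueError(f"unsupported method: {method}")
    return factor * ss
```

Dividing by the number of valid replicates instead would inflate the estimate whenever a replicate fails; using the fixed design R keeps the scaling tied to the design rather than to which replicates happened to converge.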

Validation

Tests added: tests/test_methodology_survey.py (33 tests, all passing; black/ruff clean).
R-parity suites re-confirmed: tests/test_survey_r_crossvalidation.py + tests/test_survey_estimator_validation.py (48 passed, 6 skipped where real-data goldens are absent in this checkout).
No diff_diff/ source changed — this PR is test + documentation only.

Security / privacy

Confirm no secrets/PII in this PR: Yes

github-actions · 2026-06-27T14:54:57Z

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

No estimator behavior or production variance code changed; this PR is tests and documentation only.
The new survey methodology tests align with the documented Binder TSL, replicate-weight, df, weight-type, and DEFF contracts in REGISTRY.md.
Documented deviations are clearly labeled in the registry, so they are P3-informational, not blockers.
One minor documentation wording issue: the tracker says several survey/non-HC1 paths are “silently overridden,” but the cited estimator paths now explicitly reject those combinations.
I could not run the test suite here because the environment lacks pytest and numpy; git diff --check passed.

Methodology

Severity: P3 informational
Finding: The documented survey deviations are correctly labeled and therefore not defects: lonely_psu="remove", simple n_PSU - n_strata df, fixed replicate R scaling, and known R-parity gaps are all covered in docs/methodology/REGISTRY.md:L4111-L4189 and docs/methodology/REGISTRY.md:L4307-L4382.
Impact: No undocumented methodology mismatch found in the changed survey validation material.
Concrete fix: None required.
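The "simple n_PSU - n_strata df" rule mentioned in this finding can be written in a few lines. The sketch below is illustrative only and is not the library's df_survey; the function name is hypothetical, and PSU identifiers are assumed to be nested within strata, so PSUs are counted as (stratum, PSU) pairs.

```python
def survey_df(strata, psus):
    """Design degrees of freedom: number of PSUs minus number of strata."""
    # Count PSUs as (stratum, psu) pairs so that reused PSU labels
    # in different strata are treated as distinct units.
    n_psu = len(set(zip(strata, psus)))
    n_strata = len(set(strata))
    return n_psu - n_strata
```

For example, two strata with two PSUs each give 4 - 2 = 2 degrees of freedom, regardless of how many observations sit inside each PSU.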

Code Quality

Severity: None
Finding: No production code changed. The new tests use clear hand-oracle assertions for TSL meat, FPC, weight-type meat, replicate factors, and DEFF, e.g. tests/test_methodology_survey.py:L265-L322, tests/test_methodology_survey.py:L453-L516, and tests/test_methodology_survey.py:L525-L596.
Impact: No code-quality regression identified.
Concrete fix: None.
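As an illustration of the hand-oracle style described in this finding, the fweight contract (df = sum(w) - k) can be checked against explicit row expansion. The sketch below is not taken from tests/test_methodology_survey.py; the helper names are hypothetical, and it covers only the simplest case of a weighted mean (k = 1) with integer frequency weights.

```python
import statistics


def fweight_mean_variance(x, w, k=1):
    """Mean and variance of the mean under frequency weights.

    Uses residual df = sum(w) - k, so the result should match the
    same calculation on data with each row repeated w times.
    """
    n = sum(w)
    mean = sum(wi * xi for xi, wi in zip(x, w)) / n
    resid_ss = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w))
    sigma2 = resid_ss / (n - k)
    return mean, sigma2 / n


def expand_rows(x, w):
    """Oracle input: repeat each value according to its integer weight."""
    return [xi for xi, wi in zip(x, w) for _ in range(wi)]


x, w = [1.0, 2.0, 4.0], [2, 1, 3]
rows = expand_rows(x, w)
mean, var = fweight_mean_variance(x, w)
oracle_var = statistics.variance(rows) / len(rows)
```

Here `var` and `oracle_var` should agree up to floating-point error, because `statistics.variance` uses the n - 1 denominator on the expanded rows, which is sum(w) - k with k = 1.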

Performance

Severity: None
Finding: No runtime code paths changed; added tests are deterministic unit/methodology tests.
Impact: No library performance impact.
Concrete fix: None.

Maintainability

Severity: None
Finding: The methodology tracker now has a complete survey entry with verified components, corrections, deviations, and R-parity summary at METHODOLOGY_REVIEW.md:L1432-L1463.
Impact: Improves future reviewability.
Concrete fix: None.

Tech Debt

Severity: P3 informational
Finding: Cross-estimator survey boundaries remain documented as outstanding concerns in METHODOLOGY_REVIEW.md:L1465-L1472. These are not introduced by the PR and are mostly fail-closed/documented deferrals.
Impact: No new untracked correctness debt introduced by this PR.
Concrete fix: None required for approval.

Security

Severity: None
Finding: No secrets or sensitive tokens found in changed files by static scan.
Impact: No security issue identified.
Concrete fix: None.

Documentation/Tests

Severity: P3
Finding: METHODOLOGY_REVIEW.md:L1468 says the cited survey + non-HC1 paths are “silently overridden,” but at least StackedDiD, SunAbraham, WooldridgeDiD, and TWFE replicate HC2/HC2-BM now explicitly raise at diff_diff/stacked_did.py:L427-L435, diff_diff/sun_abraham.py:L750-L756, diff_diff/wooldridge.py:L701-L708, and diff_diff/twfe.py:L251-L258.
Impact: Internal tracker wording is stale and could mislead future reviewers, but it does not affect runtime behavior.
Concrete fix: Reword that bullet to distinguish explicit rejections from any remaining override-style boundary, and update the line refs accordingly.
Severity: P3 informational
Finding: I could not execute tests/test_methodology_survey.py in this environment because pytest is not installed, and direct Python import checks showed numpy is also missing.
Impact: Static review only; CI should be relied on for execution.
Concrete fix: None in the PR unless CI lacks these dependencies.

…omplete

Add tests/test_methodology_survey.py (33 tests, anchored to Binder 1983 Eq. 4.7 and survey-theory.md sections 5/6) isolating the design-based TSL and replicate-weight variance identities that the broad survey suite previously covered only indirectly: the multi-stratum Bessel decomposition, the fweight (df=sum(w)-k) and aweight (unweighted-meat) structures, the exact DEFF = design_var/srs_var ratio, and the residual-scale==score-scale cross-function identity.

The core variance machinery (compute_survey_vcov / _compute_stratified_psu_meat / compute_replicate_vcov / df_survey) was read against Binder Eq. 4.7 and verified to implement the documented identities faithfully -- no code change was required.

Promote the Survey Data Support methodology-review row to Complete (the last In Progress row, so the tracker is now fully Complete).

Correct the Korn & Graubard (1990) citation venue (JASA 85(409) -> The American Statistician 44(4):270-276) and add Lumley (2004) JSS, Korn-Graubard (1990), and Solon-Haider-Wooldridge (2015) to docs/references.rst.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-27T14:58:42Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2898b0e5fa874fb51ff8f122e7ac3348a34a545f

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

This PR changes tests and documentation only; no estimator or production inference code changed.
Affected methodology surface: Survey Data Support, including Binder TSL variance, IF variance, replicate-weight variance, survey df, fweight/aweight inference, and DEFF diagnostics.
The new methodology tests match the documented formulas in REGISTRY.md and survey-theory.md.
Prior P3 wording issue is resolved: the tracker now correctly describes survey + non-HC1 paths as explicit NotImplementedError guards.
I could not run pytest here because pytest is not installed; AST syntax parsing of the new test file passed.

Methodology

Severity: P3 informational
Finding: Documented survey deviations remain properly labeled in docs/methodology/REGISTRY.md:L4122-L4132, docs/methodology/REGISTRY.md:L4180-L4189, and docs/methodology/REGISTRY.md:L4307-L4381, so they are not methodology defects.
Impact: No undocumented mismatch found against Binder Eq. 4.7, survey-theory §5/§6, or the registry contracts.
Concrete fix: None required.

Code Quality

Severity: None
Finding: No production code changed. The added tests use hand-oracle checks for multi-stratum Bessel/FPC, singleton handling, weight-type meat, replicate factors, and DEFF, e.g. tests/test_methodology_survey.py:L265-L322, tests/test_methodology_survey.py:L453-L516, and tests/test_methodology_survey.py:L525-L641.
Impact: No code-quality regression identified.
Concrete fix: None.

Performance

Severity: None
Finding: Added deterministic methodology tests only; no runtime paths changed.
Impact: No library performance impact.
Concrete fix: None.

Maintainability

Severity: None
Finding: The Survey Data Support tracker entry now has verified components, R-parity summary, corrections, deviations, and explicit coverage boundaries at METHODOLOGY_REVIEW.md:L1432-L1472.
Impact: Improves future methodology reviewability.
Concrete fix: None.

Tech Debt

Severity: None
Finding: Cross-estimator survey limitations are documented as fail-closed boundaries rather than silent behavior, including replicate-weight gaps and survey + non-HC1 rejections at METHODOLOGY_REVIEW.md:L1465-L1472.
Impact: No new untracked correctness debt introduced.
Concrete fix: None.

Security

Severity: None
Finding: Narrow secret-pattern scan of changed files found no private keys, common API tokens, or credential assignments.
Impact: No security issue identified.
Concrete fix: None.

Documentation/Tests

Severity: P3 informational
Finding: Prior review’s stale wording issue is resolved: the tracker now says survey + non-HC1 paths explicitly raise, matching stacked_did.py:L427-L435, sun_abraham.py:L750-L758, wooldridge.py:L701-L710, and twfe.py:L251-L260.
Impact: Documentation no longer mischaracterizes those paths as silent overrides.
Concrete fix: None.
Severity: P3 informational
Finding: tests/test_methodology_survey.py parses successfully and contains 33 tests, but I could not execute it because pytest is not installed in this environment. git diff --check passed.
Impact: Static review only; CI should verify execution.
Concrete fix: None unless CI lacks pytest.
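For context on the fail-closed boundary discussed in this section, a guard of the kind the tracker now describes might look like the sketch below. It is an assumption-laden illustration and not a copy of the code in stacked_did.py, sun_abraham.py, wooldridge.py, or twfe.py: the function name, argument names, and the "hc1" string are hypothetical.

```python
def check_survey_vcov(survey_design, vcov_type):
    """Reject survey + non-HC1 combinations instead of overriding them.

    Hypothetical guard: returns the requested vcov_type when the
    combination is supported, and raises otherwise.
    """
    if survey_design is not None and vcov_type != "hc1":
        # Fail closed: the caller learns that the requested variance
        # type is unsupported rather than silently receiving another.
        raise NotImplementedError(
            f"vcov_type={vcov_type!r} is not supported with a survey design"
        )
    return vcov_type
```

The distinction matters for review: a silent override returns a result computed under a different variance type than requested, whereas an explicit raise makes the unsupported combination visible at the call site.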

igerber force-pushed the feature/survey-methodology-validation branch from bcec813 to 2898b0e June 27, 2026 14:56

igerber added the ready-for-ci label (Triggers CI test workflows) Jun 27, 2026

igerber merged commit 59ed280 into main Jun 27, 2026
47 of 49 checks passed

igerber deleted the feature/survey-methodology-validation branch June 27, 2026 18:08
