
test(survey): validate Survey Data Support methodology + promote to Complete #558

Merged
igerber merged 1 commit into main from feature/survey-methodology-validation
Jun 27, 2026

@igerber igerber commented Jun 27, 2026

Owner

Summary

  • Add tests/test_methodology_survey.py (33 tests, 10 Binder-equation-anchored classes) — the methodology validation suite for Survey Data Support. It isolates the design-based TSL and replicate-weight variance identities that the broad survey suite previously covered only indirectly: the multi-stratum Bessel decomposition, the fweight (df=Σw−k) and aweight (unweighted-meat) structures, the exact DEFF = design_var/srs_var ratio, and the residual-scale==score-scale cross-function identity. The other 6 core identities are equation-anchored and reference the existing direct oracles (no duplication).
  • Validation outcome: the core variance machinery (compute_survey_vcov / _compute_stratified_psu_meat / compute_survey_if_variance / compute_replicate_vcov / df_survey) was read against Binder (1983) Eq. 4.7 and docs/methodology/survey-theory.md §5/§6 and verified to implement the documented identities faithfully — no code change was required (consistent with the machine-precision R-parity scenarios already passing).
  • Correction (found by the fidelity walk): Korn & Graubard (1990) was mis-cited as JASA 85(409); the survey-df / Bonferroni-t paper is The American Statistician 44(4):270-276 (DOI 10.1080/00031305.1990.10475737). Corrected in REGISTRY + the new tests, and added, along with Lumley (2004) JSS 9(8) and Solon-Haider-Wooldridge (2015) JHR 50(2), to docs/references.rst (all three were cited in REGISTRY/theory but absent from references.rst).
  • Promote Survey Data Support to Complete. It was the last In Progress methodology-review row, so the tracker is now fully Complete. The Complete entry has full Verified Components / R-parity table / Corrections Made / Deviations / cross-estimator gaps-boundary; the status row, Priority Order, and the now-stale "In Progress band" prose are updated to match.
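
For readers unfamiliar with the multi-stratum Bessel decomposition named in the first bullet, here is a minimal scalar sketch, assuming PSU-level score totals are already computed. It is not the library's `_compute_stratified_psu_meat`; the function name `stratified_psu_variance`, its arguments, and the choice to skip singleton strata are assumptions made for illustration only.

```python
from collections import defaultdict


def stratified_psu_variance(psu_totals, strata, fpc=None):
    """Scalar TSL-style variance: sum over strata of
    (1 - f_h) * n_h / (n_h - 1) * sum_i (z_hi - zbar_h)^2.

    psu_totals: one score total z_hi per PSU (hypothetical input layout).
    strata:     stratum label for each PSU, same length as psu_totals.
    fpc:        optional dict mapping stratum -> sampling fraction f_h.
    """
    by_stratum = defaultdict(list)
    for z, h in zip(psu_totals, strata):
        by_stratum[h].append(z)

    variance = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            # Assumption: a singleton stratum contributes nothing,
            # in the spirit of a "remove"-style lonely-PSU rule.
            continue
        mean_h = sum(zs) / n_h
        ss = sum((z - mean_h) ** 2 for z in zs)
        f_h = fpc.get(h, 0.0) if fpc else 0.0
        # n_h / (n_h - 1) is the per-stratum Bessel correction.
        variance += (1.0 - f_h) * n_h / (n_h - 1) * ss
    return variance


totals = [1.0, 3.0, 2.0, 6.0, 10.0]
labels = ["a", "a", "b", "b", "b"]
no_fpc = stratified_psu_variance(totals, labels)
with_fpc = stratified_psu_variance(totals, labels, fpc={"a": 0.5})
```

Each stratum is centred and Bessel-corrected on its own, so the total is a sum of independent per-stratum terms rather than one pooled correction.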

Methodology references

  • Method: Survey Data Support — design-based TSL + replicate-weight variance.
  • Sources: Binder (1983) ISR 51(3):279-292 (Eq. 4.7); Lumley (2004) JSS 9(8); Korn-Graubard (1990) The American Statistician 44(4):270-276; Solon-Haider-Wooldridge (2015) JHR 50(2).
  • Deviations (documented in REGISTRY ## Survey Data Support + the tracker Complete entry; none undocumented): lonely_psu default "remove" vs R "fail"; replicate factor divides by design R not n_valid; PSU-level Hall-Mammen wild bootstrap; strata-vs-no-strata non-bit-equality (RNG path).
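
The replicate-factor deviation listed above (divide by the design R, not by the number of valid replicates) can be illustrated with a small sketch. It assumes a BRR-style factor of 1/R and treats a NaN estimate as an invalid replicate; both are assumptions for illustration, and `replicate_variance` is a hypothetical helper, not the library's `compute_replicate_vcov`.

```python
import math


def replicate_variance(theta_hat, replicate_estimates):
    """Scalar replicate-weight variance with a fixed 1/R factor.

    R is the number of replicates in the design, even when some
    replicate estimates are invalid (NaN) and drop out of the sum.
    """
    design_r = len(replicate_estimates)
    valid = [t for t in replicate_estimates if not math.isnan(t)]
    squared_dev = sum((t - theta_hat) ** 2 for t in valid)
    return squared_dev / design_r  # design R, not len(valid)


reps = [1.0, 3.0, float("nan"), 2.0]
var_fixed_r = replicate_variance(2.0, reps)
```

With one of four replicates invalid, the fixed-R factor gives 2/4, whereas dividing by the three valid replicates would give 2/3.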

Validation

  • Tests added: tests/test_methodology_survey.py (33 tests, all passing; black/ruff clean).
  • R-parity suites re-confirmed: tests/test_survey_r_crossvalidation.py + tests/test_survey_estimator_validation.py (48 passed, 6 skipped where real-data goldens are absent in this checkout).
  • No diff_diff/ source changed — this PR is test + documentation only.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

@github-actions


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • No estimator behavior or production variance code changed; this PR is tests and documentation only.
  • The new survey methodology tests align with the documented Binder TSL, replicate-weight, df, weight-type, and DEFF contracts in REGISTRY.md.
  • Documented deviations are clearly labeled in the registry, so they are P3-informational, not blockers.
  • One minor documentation wording issue: the tracker says several survey/non-HC1 paths are “silently overridden,” but the cited estimator paths now explicitly reject those combinations.
  • I could not run the test suite here because the environment lacks pytest and numpy; git diff --check passed.

Methodology

  • Severity: P3 informational
    Finding: The documented survey deviations are correctly labeled and therefore not defects: lonely_psu="remove", simple n_PSU - n_strata df, fixed replicate R scaling, and known R-parity gaps are all covered in docs/methodology/REGISTRY.md:L4111-L4189 and docs/methodology/REGISTRY.md:L4307-L4382.
    Impact: No undocumented methodology mismatch found in the changed survey validation material.
    Concrete fix: None required.
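
As a small illustration of the "simple n_PSU - n_strata df" rule mentioned in this finding, the sketch below counts distinct PSUs and strata from observation-level labels. The helper `survey_df` is hypothetical and assumes PSU ids are nested within strata, so a PSU is identified by the (stratum, PSU) pair; it is not the library's `df_survey`.

```python
def survey_df(psu_ids, strata):
    """Design degrees of freedom: number of PSUs minus number of strata."""
    # Assumption: PSU ids may repeat across strata, so pair them up.
    n_psu = len(set(zip(strata, psu_ids)))
    n_strata = len(set(strata))
    return n_psu - n_strata


obs_strata = ["a", "a", "a", "b", "b", "b", "b"]
obs_psu = [1, 1, 2, 1, 2, 2, 3]
df = survey_df(obs_psu, obs_strata)
```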

Code Quality

  • Severity: None
    Finding: No production code changed. The new tests use clear hand-oracle assertions for TSL meat, FPC, weight-type meat, replicate factors, and DEFF, e.g. tests/test_methodology_survey.py:L265-L322, tests/test_methodology_survey.py:L453-L516, and tests/test_methodology_survey.py:L525-L596.
    Impact: No code-quality regression identified.
    Concrete fix: None.

Performance

  • Severity: None
    Finding: No runtime code paths changed; added tests are deterministic unit/methodology tests.
    Impact: No library performance impact.
    Concrete fix: None.

Maintainability

  • Severity: None
    Finding: The methodology tracker now has a complete survey entry with verified components, corrections, deviations, and R-parity summary at METHODOLOGY_REVIEW.md:L1432-L1463.
    Impact: Improves future reviewability.
    Concrete fix: None.

Tech Debt

  • Severity: P3 informational
    Finding: Cross-estimator survey boundaries remain documented as outstanding concerns in METHODOLOGY_REVIEW.md:L1465-L1472. These are not introduced by the PR and are mostly fail-closed/documented deferrals.
    Impact: No new untracked correctness debt introduced by this PR.
    Concrete fix: None required for approval.

Security

  • Severity: None
    Finding: No secrets or sensitive tokens found in changed files by static scan.
    Impact: No security issue identified.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3
    Finding: METHODOLOGY_REVIEW.md:L1468 says the cited survey + non-HC1 paths are “silently overridden,” but at least StackedDiD, SunAbraham, WooldridgeDiD, and TWFE replicate HC2/HC2-BM now explicitly raise at diff_diff/stacked_did.py:L427-L435, diff_diff/sun_abraham.py:L750-L756, diff_diff/wooldridge.py:L701-L708, and diff_diff/twfe.py:L251-L258.
    Impact: Internal tracker wording is stale and could mislead future reviewers, but it does not affect runtime behavior.
    Concrete fix: Reword that bullet to distinguish explicit rejections from any remaining override-style boundary, and update the line refs accordingly.

  • Severity: P3 informational
    Finding: I could not execute tests/test_methodology_survey.py in this environment because pytest is not installed, and direct Python import checks showed numpy is also missing.
    Impact: Static review only; CI should be relied on for execution.
    Concrete fix: None in the PR unless CI lacks these dependencies.
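
To make the distinction between a silent override and an explicit rejection concrete, here is a minimal sketch of a fail-closed guard. The function name, parameter names, and message are hypothetical and do not reproduce the code at the cited estimator paths; the use of `NotImplementedError` follows the later rerun's description of those guards.

```python
def reject_survey_non_hc1(survey_design, vcov_type):
    """Raise instead of silently replacing an unsupported vcov_type."""
    if survey_design is not None and vcov_type.lower() != "hc1":
        raise NotImplementedError(
            f"survey_design with vcov_type={vcov_type!r} is not supported; "
            "use vcov_type='hc1' or drop the survey design."
        )


# No survey design: any vcov_type passes through untouched.
reject_survey_non_hc1(None, "hc2")

try:
    reject_survey_non_hc1(object(), "hc2")
    raised = False
except NotImplementedError:
    raised = True
```

A silent override would instead swap in a supported variance type and continue, which is the behavior the stale tracker wording implied.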

…omplete

Add tests/test_methodology_survey.py (33 tests, anchored to Binder 1983 Eq. 4.7
and survey-theory.md sections 5/6) isolating the design-based TSL and replicate-
weight variance identities that the broad survey suite previously covered only
indirectly: the multi-stratum Bessel decomposition, the fweight (df=sum(w)-k) and
aweight (unweighted-meat) structures, the exact DEFF = design_var/srs_var ratio,
and the residual-scale==score-scale cross-function identity. The core variance
machinery (compute_survey_vcov / _compute_stratified_psu_meat /
compute_replicate_vcov / df_survey) was read against Binder Eq. 4.7 and verified
to implement the documented identities faithfully -- no code change was required.

Promote the Survey Data Support methodology-review row to Complete (the last
In Progress row, so the tracker is now fully Complete). Correct the Korn & Graubard
(1990) citation venue (JASA 85(409) -> The American Statistician 44(4):270-276) and
add Lumley (2004) JSS, Korn-Graubard (1990), and Solon-Haider-Wooldridge (2015) to
docs/references.rst.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the feature/survey-methodology-validation branch from bcec813 to 2898b0e Compare June 27, 2026 14:56
@github-actions


🔁 AI review rerun (requested by @igerber)

Head SHA: 2898b0e5fa874fb51ff8f122e7ac3348a34a545f


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • This PR changes tests and documentation only; no estimator or production inference code changed.
  • Affected methodology surface: Survey Data Support, including Binder TSL variance, IF variance, replicate-weight variance, survey df, fweight/aweight inference, and DEFF diagnostics.
  • The new methodology tests match the documented formulas in REGISTRY.md and survey-theory.md.
  • Prior P3 wording issue is resolved: the tracker now correctly describes survey + non-HC1 paths as explicit NotImplementedError guards.
  • I could not run pytest here because pytest is not installed; AST syntax parsing of the new test file passed.

Methodology

  • Severity: P3 informational
    Finding: Documented survey deviations remain properly labeled in docs/methodology/REGISTRY.md:L4122-L4132, docs/methodology/REGISTRY.md:L4180-L4189, and docs/methodology/REGISTRY.md:L4307-L4381, so they are not methodology defects.
    Impact: No undocumented mismatch found against Binder Eq. 4.7, survey-theory §5/§6, or the registry contracts.
    Concrete fix: None required.

Code Quality

  • Severity: None
    Finding: No production code changed. The added tests use hand-oracle checks for multi-stratum Bessel/FPC, singleton handling, weight-type meat, replicate factors, and DEFF, e.g. tests/test_methodology_survey.py:L265-L322, tests/test_methodology_survey.py:L453-L516, and tests/test_methodology_survey.py:L525-L641.
    Impact: No code-quality regression identified.
    Concrete fix: None.

Performance

  • Severity: None
    Finding: Added deterministic methodology tests only; no runtime paths changed.
    Impact: No library performance impact.
    Concrete fix: None.

Maintainability

  • Severity: None
    Finding: The Survey Data Support tracker entry now has verified components, R-parity summary, corrections, deviations, and explicit coverage boundaries at METHODOLOGY_REVIEW.md:L1432-L1472.
    Impact: Improves future methodology reviewability.
    Concrete fix: None.

Tech Debt

  • Severity: None
    Finding: Cross-estimator survey limitations are documented as fail-closed boundaries rather than silent behavior, including replicate-weight gaps and survey + non-HC1 rejections at METHODOLOGY_REVIEW.md:L1465-L1472.
    Impact: No new untracked correctness debt introduced.
    Concrete fix: None.

Security

  • Severity: None
    Finding: Narrow secret-pattern scan of changed files found no private keys, common API tokens, or credential assignments.
    Impact: No security issue identified.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3 informational
    Finding: Prior review’s stale wording issue is resolved: the tracker now says survey + non-HC1 paths explicitly raise, matching stacked_did.py:L427-L435, sun_abraham.py:L750-L758, wooldridge.py:L701-L710, and twfe.py:L251-L260.
    Impact: Documentation no longer mischaracterizes those paths as silent overrides.
    Concrete fix: None.

  • Severity: P3 informational
    Finding: tests/test_methodology_survey.py parses successfully and contains 33 tests, but I could not execute it because pytest is not installed in this environment. git diff --check passed.
    Impact: Static review only; CI should verify execution.
    Concrete fix: None unless CI lacks pytest.
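
A static check of the kind described in the last finding (parse the file and count tests without running pytest) can be sketched with the standard-library `ast` module. This is an assumed reconstruction, not the reviewer's actual script; `count_tests` is a hypothetical helper that counts functions by the `test_` naming convention, so a parametrized test counts once.

```python
import ast


def count_tests(source):
    """Count function definitions whose name starts with 'test_'."""
    tree = ast.parse(source)  # raises SyntaxError if the file does not parse
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("test_")
    )


sample = '''
class TestExample:
    def helper(self):
        return 1

    def test_one(self):
        assert self.helper() == 1

    def test_two(self):
        assert True
'''
n_tests = count_tests(sample)
```

For a real file, the source would come from reading the path, for example `count_tests(open(path).read())`.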

@igerber igerber added the ready-for-ci Triggers CI test workflows label Jun 27, 2026
@igerber igerber merged commit 59ed280 into main Jun 27, 2026
47 of 49 checks passed
@igerber igerber deleted the feature/survey-methodology-validation branch June 27, 2026 18:08