imputation: thread vcov_type as narrow {hc1} contract per Theorem 3
Phase 1b interstitial #3 for ImputationDiD. Mirrors the CallawaySantAnna
(PR #487) + TripleDifference (PR #488) template for IF-based estimators:
vcov_type is permanently narrowed to {"hc1"} because the per-unit influence
function aggregation (Borusyak-Jaravel-Spiess 2024 Theorem 3) has no
single design matrix on which hat-matrix leverage or Bell-McCaffrey
Satterthwaite DOF can be defined.
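As a minimal sketch of the `hc1` paths this contract keeps (assuming NumPy, and assuming the per-observation contributions `psi_it = v_it · epsilon_tilde_it` are already computed), the variance is a sum of squared group totals: grouping by unit gives the `cluster=None` path and grouping by cluster gives the `cluster=X` path, following the `sigma_sq = (cluster_psi_sums**2).sum()` expression in the ImputationDiD CHANGELOG entry. `if_variance` and `if_standard_error` are hypothetical names, not the diff_diff API. No leverage or design-matrix term appears anywhere, which is why HC2 and Bell-McCaffrey have nothing to act on.

```python
import numpy as np


def if_variance(psi, group_ids):
    """Sum IF contributions within each group, then sum the squared totals.

    psi: per-observation influence-function contributions (v_it * eps_it).
    group_ids: unit ids (cluster=None path) or cluster ids (cluster=X path).
    """
    psi = np.asarray(psi, dtype=float)
    _, inverse = np.unique(np.asarray(group_ids), return_inverse=True)
    group_sums = np.bincount(inverse.ravel(), weights=psi)
    # Plain sum of squares: no (n-1)/(n-p) factor, since there is no design-matrix p.
    return float((group_sums ** 2).sum())


def if_standard_error(psi, unit_ids, cluster_ids=None):
    # hc1 with cluster=None aggregates by unit; hc1 with cluster=X aggregates by cluster.
    groups = unit_ids if cluster_ids is None else cluster_ids
    return float(np.sqrt(if_variance(psi, groups)))


# Toy data: three units nested in two clusters ("states").
psi = np.array([0.1, -0.2, 0.3, 0.05, -0.15, 0.1])
units = np.array([1, 1, 2, 2, 3, 3])
states = np.array(["A", "A", "A", "A", "B", "B"])
per_unit = if_variance(psi, units)     # ≈ 0.135
clustered = if_variance(psi, states)   # ≈ 0.065
```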
Source surface:
- diff_diff/imputation.py: vcov_type param + @staticmethod
_validate_vcov_type + fit()-time revalidation +
cluster+replicate-weights NotImplementedError guard +
Results cluster_name/n_clusters resolution
- diff_diff/imputation_results.py: vcov_type/cluster_name/n_clusters
fields + new to_dict() + variance-estimator line in summary() routing
through shared _format_vcov_label helper
- diff_diff/imputation_bootstrap.py: dual-site n_clusters<2 /
n_psu<2 NaN guards via new _build_nan_bootstrap_results helper
(closes the BLAS-roundoff zero-SE bug class predicted to recur
on IF-based estimators)
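The dual-site NaN guard can be sketched as follows, assuming a Rademacher multiplier bootstrap over cluster-summed IF contributions (the library's multiplier distribution is not stated here). `multiplier_bootstrap_se`, `build_nan_bootstrap_results`, and the result-field names are hypothetical stand-ins for the real `_build_nan_bootstrap_results` helper; the point is only that the `< 2` check runs before any draws and that NaN fills every overall, per-horizon, and per-group slot.

```python
import numpy as np

NAN = float("nan")
# Hypothetical field names; the real ImputationDiDResults fields may differ.
INFERENCE_FIELDS = ("se", "t_stat", "p_value", "conf_int_lower", "conf_int_upper")


def _nan_inference():
    return {field: NAN for field in INFERENCE_FIELDS}


def build_nan_bootstrap_results(horizons, groups):
    """All-NaN overall, per-horizon, and per-group inference (hypothetical layout)."""
    return {
        "overall": _nan_inference(),
        "event_study": {h: _nan_inference() for h in horizons},
        "group": {g: _nan_inference() for g in groups},
    }


def multiplier_bootstrap_se(psi, cluster_ids, n_bootstrap=999, seed=0):
    """Rademacher multiplier bootstrap over cluster-summed IF contributions."""
    _, inverse = np.unique(np.asarray(cluster_ids), return_inverse=True)
    cluster_sums = np.bincount(inverse.ravel(), weights=np.asarray(psi, dtype=float))
    if cluster_sums.size < 2:
        # Guard: with fewer than 2 independent clusters/PSUs, report NaN
        # instead of letting a degenerate near-zero SE through.
        return NAN
    rng = np.random.default_rng(seed)
    multipliers = rng.choice([-1.0, 1.0], size=(n_bootstrap, cluster_sums.size))
    # Each draw is sum_c w_c * S_c; its spread estimates the SE.
    return float(np.std(multipliers @ cluster_sums, ddof=1))
```

A caller would check `np.isnan(se)` and, if so, return `build_nan_bootstrap_results(...)` so downstream zero-SE checks never see a spurious finite value.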
Tests: 34 new tests in TestImputationDiDVcovType covering default /
cluster / TSL-survey / replicate-survey bit-equality (parametrized over
aggregate modes), bootstrap × cluster + bootstrap × survey bit-equality,
fit()-time revalidation after set_params bypass, bootstrap n_psu<2 /
n_clusters<2 NaN propagation, pretrends bit-equality, and the full
introspection + safety-gate surface (8 tests).
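The mutate-then-validate-at-use pattern exercised by the `set_params` bypass tests can be sketched with a toy class. `ToyIFEstimator` and its rejection messages are hypothetical; only the accepted set `{"hc1"}` and the rejected families (`classical`, `hc2`, `hc2_bm`, `conley`) come from this commit. Each family gets a distinct keyword so a test can pin it with `match=`.

```python
_ALLOWED_VCOV = frozenset({"hc1"})
# Hypothetical wording; each message carries a distinct keyword for match= pins.
_REJECTION_REASONS = {
    "classical": "classical OLS variance needs a single design matrix",
    "hc2": "HC2 needs hat-matrix leverage 1/(1-h_ii) from a single design matrix",
    "hc2_bm": "Bell-McCaffrey Satterthwaite DOF needs a single design matrix",
    "conley": "Conley spatial-HAC is not supported for IF-based variance",
}


class ToyIFEstimator:
    def __init__(self, vcov_type="hc1"):
        self._validate_vcov_type(vcov_type)  # reject at construction time
        self.vcov_type = vcov_type

    @staticmethod
    def _validate_vcov_type(vcov_type):
        if vcov_type in _ALLOWED_VCOV:
            return
        reason = _REJECTION_REASONS.get(vcov_type, "unknown variance family")
        raise ValueError(
            f"vcov_type={vcov_type!r} is not supported: {reason}; only 'hc1' is accepted"
        )

    def set_params(self, **params):
        # Mutate only: no atomic validation here, mirroring the described pattern.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self):
        # Re-validate at use time so a set_params bypass fails loudly here.
        self._validate_vcov_type(self.vcov_type)
        return {"vcov_type": self.vcov_type}
```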
Docs: REGISTRY.md (IF-based taxonomy + 4 new Notes), CHANGELOG.md,
TODO.md (row narrowed, Conley follow-up added), llms-full.txt
(vcov_type + pretrends signature drift fix).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
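The `summary()` label routing through `_format_vcov_label` can be sketched as a small pure function. The signature and the `survey_backed` flag are assumptions for illustration (the real helper in `results.py:49-89` is not reproduced here); the three outputs are the ones given in the CHANGELOG entries for bare `hc1`, clustered `hc1`, and survey-backed fits.

```python
def format_vcov_label(vcov_type, cluster_name=None, n_clusters=None, survey_backed=False):
    """Return the variance-estimator line for summary(), or None to suppress it.

    Hypothetical signature; only the rendered strings follow the CHANGELOG text.
    """
    if survey_backed:
        # The Survey Design block already names design, n_psu and df.
        return None
    if vcov_type != "hc1":
        raise ValueError(f"unsupported vcov_type: {vcov_type!r}")
    if cluster_name is None:
        return "HC1 heteroskedasticity-robust"
    return f"CR1 cluster-robust at {cluster_name}, G={n_clusters}"
```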
CHANGELOG.md (+1 line changed: 1 addition, 0 deletions)
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- **TROP methodology-review-tracker promotion: In Progress → Complete.** Closes the Athey, Imbens, Qu & Viviano (2025) *Triply Robust Panel Estimators* (arXiv:2508.21536) primary-source review on the methodology-review tracker. PR-A (the paper review on file at `docs/methodology/papers/athey-2025-review.md`) was previously merged as #443; this PR is the F.L.I.P. consolidation — new `tests/test_methodology_trop.py` with paper-equation-numbered Verified Components walk-through (10 classes, 36 tests covering Eq. 2 soft-threshold SVD prox / plain prox-gradient monotonicity on a toy setup / weighted-prox solver (the shipped accelerated FISTA outer loop is NOT directly tested for per-step monotonicity because Nesterov momentum does not guarantee it), Eq. 3 unit + time weights, Eqs. 4-5 + Algorithm 1 LOOCV with two-stage cycling, Corollary 1 three-condition unbiasedness, Theorem 5.1 MC-ranking realisation of the triply-robust bias bound, Section 2.2 DID + MC reductions, Eq. 13 + Algorithm 2 per-(i, t) estimation, Algorithm 3 stratified pairs bootstrap, Section 3 / Eq. 6 factor-DGP recovery, plus a `TestTROPDeviations` class locking 11 documented library deviations). Migrated from `tests/test_trop.py`: `TestMethodologyVerification` (5 tests → `TestTROPEquation6FactorDGPRecovery`), four paper-conformance tests + one weighted-solver convergence test from `TestPaperConformanceFixes` (→ `TestTROPEquation3Weights` / `TestTROPAlgorithm1LOOCV` / `TestTROPNuclearNormProx` / `TestTROPAlgorithm3Bootstrap`), three prox / plain prox-gradient monotonicity / weighted-objective tests from `TestTROPNuclearNormSolver` (→ `TestTROPNuclearNormProx`), plus a cycling-convergence test from `TestCyclingSearch` and the factor-DGP smoke from `TestTROPvsSDID`; the `TestPaperConformanceFixes` and `TestTROPvsSDID` shells are deleted. `TestTROPNuclearNormSolver` retains its single defensive `test_zero_weights_no_division_error`. 
`METHODOLOGY_REVIEW.md` TROP row promoted with merge date 2026-05-24, full Verified Components / Test Coverage / Deviations / Outstanding Concerns / R Parity structure mirroring HAD (PR #473) / ContinuousDiD (PR #476) / DCDH (PR #481) / WooldridgeDiD (PR #486) precedents. **Documented deviations:** Gap #5 (unnormalised weights match Eq. 2, not Section 5 sum-to-one), Gap #9 (unbalanced panels supported beyond paper's balanced-panel assumption), rank selection is implicit via nuclear-norm soft-thresholding with no discrete `rank_selection` constructor parameter (matches paper Section 5.3 + Appendix; corrects an earlier REGISTRY overclaim that listed cv / ic / elbow methods), `λ_nn = ∞` → 1e10 internal sentinel with original-value storage on results. **Outstanding Concerns (deferred):** Equation 14 covariate extension (`TROP.fit()` does not accept a `covariates` kwarg; non-support locked by `TestTROPDeviations::test_covariates_not_supported` via `inspect.signature` to guard against future `**kwargs`) and Theorem 8.1 (covariate triple robustness) deferred until use cases motivate; SC / SDID reductions paper-claimed under "specific (omega, theta) weight choices" not provided in the paper text — cross-language anchor deferred. **R parity:** deferred until paper-author reference implementation is released ("forthcoming" per the paper). REGISTRY.md `## TROP` section gains a "Verified Components" expansion: 10 ticked requirements + four `**Note:**` / `**Note (library-side choice):**` / `**Note (deferral):**` annotations consolidating the deviation surface (Eq. 10 balancing-decomposition pointer, Gap #5 weight-normalisation library-side choice, Eq. 14 covariate deferral). No source-code changes to `diff_diff/trop*.py`. Methodology sign-off scope: paper-aligned identification ingredients (Eq. 2 prox, Eq. 3 weights, Eqs. 4-5 LOOCV, Algorithms 1-3, Corollary 1 unbiasedness, Eq. 6 simulation recovery, DID reduction, documented deviations) are directly locked by the new tests. 
Theorem 5.1 is verified as a simulation sanity check (TROP RMSE < DID RMSE under LOOCV-tuned weights), NOT as a direct fixed-weight conditional-bias-bound lock; the Matrix Completion reduction is verified as code-path activation (effective_rank > 0 + beats DID baseline), NOT as equivalence against an independent MC reference. The Eq. 14 covariate extension is documented as deferred (TROP.fit() has no `covariates` kwarg).
- **ImputationDiD `vcov_type` input contract (Phase 1b interstitial #3, permanently narrow).** `ImputationDiD(vcov_type=...)` now accepts `{"hc1"}` only (default). Analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages mirroring the CallawaySantAnna (PR #487) and TripleDifference (PR #488) interstitials. The rejection is **library-architectural, not paper-prescribed**: ImputationDiD uses influence-function-based variance per Borusyak-Jaravel-Spiess (2024) Theorem 3 — the per-unit IF aggregation `psi_it = v_it · epsilon_tilde_it` has no equivalent single design matrix on which hat-matrix leverage `1/(1−h_ii)` or Bell-McCaffrey Satterthwaite DOF can be defined. `hc1` with `cluster=None` ≡ per-unit IF variance (Theorem 3 equation 7); `hc1` with `cluster=X` ≡ per-cluster IF summation `sigma_sq = (cluster_psi_sums**2).sum()` (plain CR1 — no Stata-style `(n-1)/(n-p)` finite-sample factor because the IF has no design-matrix `p`); `hc1` with `survey_design=` ≡ TSL on the combined IF via `compute_survey_if_variance()` (analytical strata/PSU/FPC or replicate BRR/Fay/JK1/JKn/SDR). All paths are unchanged at machine precision (default behavior bit-equal across `aggregate ∈ {None, "event_study", "group"}` and across analytical + bootstrap inference). `vcov_type`, `cluster_name`, and `n_clusters` fields added to `ImputationDiDResults`; threaded through new `to_dict()` method (also net-new, mirrors `TripleDifferenceResults.to_dict()`). `summary()` routes the variance-family label through the shared `_format_vcov_label` (`results.py:49-89`): bare fits render `"HC1 heteroskedasticity-robust"`, clustered fits render `"CR1 cluster-robust at <cluster_name>, G=<n>"`, and survey-backed fits suppress the variance-estimator line (the Survey Design block already names design + n_psu + df). 
**`cluster= + SurveyDesign(replicate_weights=[...])` raises `NotImplementedError`** at `fit()`: replicate-weight variance is computed by replicate reweighting and ignores PSU/cluster entirely; honoring bare `cluster=` would silently no-op while populating `cluster_name`/`n_clusters` on Results dishonestly. Mirrors the CS PR #487 + TD PR #488 fail-closed guards. **Bootstrap path returns NaN SE when fewer than 2 independent clusters/PSUs are available** (`n_clusters < 2` analytical path, `n_psu < 2` survey-PSU path); without this guard the multiplier bootstrap SE collapses to ≈0 from BLAS roundoff (NOT NaN) and downstream zero-SE checks miss the degenerate case. NaN propagates to all overall ATT inference fields plus per-horizon and per-group bootstrap dicts via the new `_build_nan_bootstrap_results` helper in `imputation_bootstrap.py`. `set_params(vcov_type=...)` mirrors CS+TD pattern (mutate-then-validate-at-use, no atomic validation); `fit()` re-validates `vcov_type` at use time. New `TestImputationDiDVcovType` class in `tests/test_imputation.py` covers the 7-surface contract (default / cluster / TSL-survey / replicate-survey bit-equal parametrized over `aggregate`, bootstrap × cluster + bootstrap × survey bit-equal, `fit()`-time revalidation, bootstrap n_psu<2 + n_clusters<2 NaN propagation including `coef_var` NaN, `pretrends=True` × `vcov_type='hc1'` × cluster bit-equality) plus introspection (default attr, `get_params`, Results carries, `to_dict`, summary label, cluster_name suppression under survey, fit-clone idempotence, convenience function) and input-rejection tests with distinct keyword `match=` pins per family. REGISTRY.md "IF-based variance estimators vs analytical-sandwich estimators" cross-reference section updated to list `ImputationDiD` alongside `CallawaySantAnna` and `TripleDifference` in the "Enforced today" tier. **Interstitial PR #3** in the Phase 1b sequence (after CS #487, TD #488). 
Two estimators remain: TwoStageDiD (methodology-heavy, sandwich + Gardner GMM-corrected meat) and EfficientDiD (IF-based interstitial #4, follows the same narrow-contract template).
- **TripleDifference `vcov_type` input contract (Phase 1b interstitial #2, permanently narrow).** `TripleDifference(vcov_type=...)` now accepts `{"hc1"}` only (default). The analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages mirroring the CS interstitial. The rejection is **library-architectural, not paper-prescribed**: TripleDifference uses influence-function-based variance per Ortiz-Villavicencio & Sant'Anna (2025) arXiv:2505.09942 — the 3-pairwise-DiD decomposition `inf = w3·IF_3 + w2·IF_2 - w1·IF_1` has no single design matrix on which to compute hat-matrix leverage `1/(1-h_ii)` or Bell-McCaffrey Satterthwaite DOF. The narrow contract is permanent and applies to the remaining IF-based estimators (`ImputationDiD`, `EfficientDiD`) when their `vcov_type` threading PRs land. `hc1` with `cluster=None` ≡ per-unit IF variance (`std(inf)/sqrt(n)`); `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the combined IF (`(G/(G-1)) · Σ_c (Σ_{i∈c} ψ_i)² / n²`, plain CR1 — no Stata-style `(n-1)/(n-p)` finite-sample factor because the IF has no design-matrix `p` in the OLS sense); `hc1` with `survey_design=` ≡ TSL on the combined IF (analytical or replicate). All three paths are unchanged at machine precision (default behavior bit-equal across all 3 estimation methods `{dr, reg, ipw}`). `vcov_type` and `cluster_name` fields added to `TripleDifferenceResults`, threaded through `to_dict()`. 
`summary()` routes the variance-family label through the shared `_format_vcov_label` (`results.py:49-89`): bare fits render `"HC1 heteroskedasticity-robust"`, clustered fits render `"CR1 cluster-robust at <cluster_name>, G=<n>"` (since the actual algebra is Liang-Zeger CR1 on the combined IF), and survey-backed fits suppress the variance-estimator line entirely (the Survey Design block already names design + n_psu + df, and the analytical SE is TSL on the combined IF — a raw "hc1" label would misstate the inference path). **`cluster= + SurveyDesign(replicate_weights=[...])` raises `NotImplementedError`** at `fit()`: replicate-weight variance is computed by replicate reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores PSU/cluster entirely; honoring bare `cluster=` would silently have no effect on the variance estimate while populating `cluster_name`/`n_clusters` on Results dishonestly. Mirrors the `CallawaySantAnna` guard from PR #487. Under `survey_design.psu` (non-replicate path) `cluster_name`/`n_clusters` on Results are suppressed (set to None) so they can't misreport the raw cluster argument when the resolver picks the survey PSU instead. `set_params(vcov_type=...)` mirrors CS pattern (mutate-then-validate-at-use, no atomic validation); `fit()` re-validates `vcov_type` at use time so a `set_params(vcov_type="hc4")` mutation surfaces a clear error at fit-time rather than silently propagating to Results metadata. **Interstitial PR #2** (after CS PR #487) rather than full Phase 1b PR 4/8 vcov_type threading — the narrow surface is methodologically dictated by TripleDifference's IF-based variance, not a deferral. New `TestTripleDifferenceVcovType` class in `tests/test_triple_diff.py` covers the 5-surface contract (default/cluster/survey bit-equal, `__init__` rejection per family, `fit()`-time revalidation) plus 8 introspection / convenience-function tests. 
REGISTRY.md "IF-based variance estimators vs analytical-sandwich estimators" cross-reference section updated to list `TripleDifference` alongside `CallawaySantAnna` in the "Enforced today" tier. Phase 1b PR 4/8 (full `{classical, hc1, hc2, hc2_bm}` threading) resumes on a different estimator (TwoStageDiD) post-merge; the two remaining IF-based estimators (`ImputationDiD`, `EfficientDiD`) follow the same narrow-contract template.
- **CallawaySantAnna `vcov_type` input contract (Phase 1b interstitial, permanently narrow).** `CallawaySantAnna(vcov_type=...)` now accepts `{"hc1"}` only (default). The analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages. The rejection is **library-architectural, not paper-prescribed**: CS uses influence-function-based variance per Callaway & Sant'Anna (2021) — per-(g,t) doubly-robust / IPW / outcome-regression structure — and has no single design matrix on which to compute hat-matrix leverage `1/(1-h_ii)` or Bell-McCaffrey Satterthwaite DOF. The narrow contract is permanent and applies to other IF-based estimators (ImputationDiD, EfficientDiD) when their `vcov_type` threading PRs land. `hc1` with `cluster=None` ≡ per-unit IF variance (Williams 2000 form); `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the IF, activated via the `cluster=` wiring fix above. Documented in the "IF-based variance estimators vs analytical-sandwich estimators" subsection of `docs/methodology/REGISTRY.md`. `vcov_type`, `cluster_name`, `n_clusters`, `df_inference` added to `CallawaySantAnnaResults` (the canonical PSU column wins for `cluster_name` reporting — `survey_design.psu` when explicit PSU is provided, `self.cluster` when bare cluster synthesizes/injects). `set_params(vcov_type=...)` mirrors SA pattern (mutate-then-refresh `_vcov_type_explicit`, no atomic validation); `fit()` re-validates `vcov_type` at use time so a `set_params(vcov_type="hc4")` mutation surfaces a clear error at fit-time rather than silently propagating to Results metadata. **Interstitial PR** rather than full Phase 1b PR 4/8 vcov_type threading — the narrow surface is methodologically dictated by CS's IF-based variance, not a deferral. Phase 1b PR 4/8 (full {classical, hc1, hc2, hc2_bm} threading) resumes on a different estimator post-merge.
- **TripleDifference cluster-changes-SE defensive regression test.** Added `tests/test_triple_diff.py::TestTripleDifferenceClusterDefensive::test_cluster_changes_ses` asserting that `TripleDifference(cluster="state")` produces an SE that differs from the `cluster=None` SE by `>1e-6` on a fixed-seed panel with state-level random effects. This defensive coverage closes a test gap identified during the Phase 1b cluster-wiring audit; TripleDifference's bare-cluster code path (`triple_diff.py:1245-1259`) was already correct but lacked a positive regression test. Mirrors `tests/test_two_stage.py::test_cluster_changes_ses`.
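The TripleDifference formulas in the entry above translate directly into a sketch: combine the three pairwise IFs as `inf = w3·IF_3 + w2·IF_2 - w1·IF_1`, then use `std(inf)/sqrt(n)` without clusters or `(G/(G-1)) · Σ_c (Σ_{i∈c} ψ_i)² / n²` with them. Function names are hypothetical; `np.std` uses NumPy's default `ddof=0` (the entry does not say which the library uses), and the NaN return for `G < 2` is an added assumption to avoid dividing by zero.

```python
import numpy as np


def combine_pairwise_ifs(if_1, if_2, if_3, w1, w2, w3):
    """Combined IF from the 3-pairwise-DiD decomposition."""
    return (w3 * np.asarray(if_3, dtype=float)
            + w2 * np.asarray(if_2, dtype=float)
            - w1 * np.asarray(if_1, dtype=float))


def triple_diff_if_se(inf, cluster_ids=None):
    inf = np.asarray(inf, dtype=float)
    n = inf.size
    if cluster_ids is None:
        # hc1, cluster=None: per-unit IF variance, std(inf) / sqrt(n).
        return float(np.std(inf) / np.sqrt(n))
    _, inverse = np.unique(np.asarray(cluster_ids), return_inverse=True)
    cluster_sums = np.bincount(inverse.ravel(), weights=inf)
    g = cluster_sums.size
    if g < 2:
        return float("nan")  # assumption: G/(G-1) is undefined for a single cluster
    # hc1, cluster=X: CR1 Liang-Zeger on the combined IF, no (n-1)/(n-p) factor.
    variance = (g / (g - 1)) * np.sum(cluster_sums ** 2) / n ** 2
    return float(np.sqrt(variance))


inf = np.array([1.0, 0.5, -1.0, -0.5])
clusters = np.array([0, 0, 1, 1])
se_unit = triple_diff_if_se(inf)               # sqrt(0.625) / 2 ≈ 0.3953
se_cluster = triple_diff_if_se(inf, clusters)  # sqrt(2 * 4.5 / 16) = 0.75
```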