igerber
diff --git a/‎CHANGELOG.md‎
Lines changed: 1 addition & 0 deletions b/‎CHANGELOG.md‎
Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 - **TROP methodology-review-tracker promotion: In Progress → Complete.** Closes the Athey, Imbens, Qu & Viviano (2025) *Triply Robust Panel Estimators* (arXiv:2508.21536) primary-source review on the methodology-review tracker. PR-A (the paper review on file at `docs/methodology/papers/athey-2025-review.md`) was previously merged as #443; this PR is the F.L.I.P. consolidation — new `tests/test_methodology_trop.py` with paper-equation-numbered Verified Components walk-through (10 classes, 36 tests covering Eq. 2 soft-threshold SVD prox / plain prox-gradient monotonicity on a toy setup / weighted-prox solver (the shipped accelerated FISTA outer loop is NOT directly tested for per-step monotonicity because Nesterov momentum does not guarantee it), Eq. 3 unit + time weights, Eqs. 4-5 + Algorithm 1 LOOCV with two-stage cycling, Corollary 1 three-condition unbiasedness, Theorem 5.1 MC-ranking realisation of the triply-robust bias bound, Section 2.2 DID + MC reductions, Eq. 13 + Algorithm 2 per-(i, t) estimation, Algorithm 3 stratified pairs bootstrap, Section 3 / Eq. 6 factor-DGP recovery, plus a `TestTROPDeviations` class locking 11 documented library deviations). Migrated from `tests/test_trop.py`: `TestMethodologyVerification` (5 tests → `TestTROPEquation6FactorDGPRecovery`), four paper-conformance tests + one weighted-solver convergence test from `TestPaperConformanceFixes` (→ `TestTROPEquation3Weights` / `TestTROPAlgorithm1LOOCV` / `TestTROPNuclearNormProx` / `TestTROPAlgorithm3Bootstrap`), three prox / plain prox-gradient monotonicity / weighted-objective tests from `TestTROPNuclearNormSolver` (→ `TestTROPNuclearNormProx`), plus a cycling-convergence test from `TestCyclingSearch` and the factor-DGP smoke from `TestTROPvsSDID`; the `TestPaperConformanceFixes` and `TestTROPvsSDID` shells are deleted. `TestTROPNuclearNormSolver` retains its single defensive `test_zero_weights_no_division_error`. `METHODOLOGY_REVIEW.md` TROP row promoted with merge date 2026-05-24, full Verified Components / Test Coverage / Deviations / Outstanding Concerns / R Parity structure mirroring HAD (PR #473) / ContinuousDiD (PR #476) / DCDH (PR #481) / WooldridgeDiD (PR #486) precedents. **Documented deviations:** Gap #5 (unnormalised weights match Eq. 2, not Section 5 sum-to-one), Gap #9 (unbalanced panels supported beyond paper's balanced-panel assumption), rank selection is implicit via nuclear-norm soft-thresholding with no discrete `rank_selection` constructor parameter (matches paper Section 5.3 + Appendix; corrects an earlier REGISTRY overclaim that listed cv / ic / elbow methods), `λ_nn = ∞` → 1e10 internal sentinel with original-value storage on results. **Outstanding Concerns (deferred):** Equation 14 covariate extension (`TROP.fit()` does not accept a `covariates` kwarg; non-support locked by `TestTROPDeviations::test_covariates_not_supported` via `inspect.signature` to guard against future `**kwargs`) and Theorem 8.1 (covariate triple robustness) deferred until use cases motivate; SC / SDID reductions paper-claimed under "specific (omega, theta) weight choices" not provided in the paper text — cross-language anchor deferred. **R parity:** deferred until paper-author reference implementation is released ("forthcoming" per the paper). REGISTRY.md `## TROP` section gains a "Verified Components" expansion: 10 ticked requirements + four `**Note:**` / `**Note (library-side choice):**` / `**Note (deferral):**` annotations consolidating the deviation surface (Eq. 10 balancing-decomposition pointer, Gap #5 weight-normalisation library-side choice, Eq. 14 covariate deferral). No source-code changes to `diff_diff/trop*.py`. Methodology sign-off scope: paper-aligned identification ingredients (Eq. 2 prox, Eq. 3 weights, Eqs. 4-5 LOOCV, Algorithms 1-3, Corollary 1 unbiasedness, Eq. 6 simulation recovery, DID reduction, documented deviations) are directly locked by the new tests. Theorem 5.1 is verified as a simulation sanity check (TROP RMSE < DID RMSE under LOOCV-tuned weights), NOT as a direct fixed-weight conditional-bias-bound lock; the Matrix Completion reduction is verified as code-path activation (effective_rank > 0 + beats DID baseline), NOT as equivalence against an independent MC reference. The Eq. 14 covariate extension is documented as deferred (TROP.fit() has no `covariates` kwarg).
+- **ImputationDiD `vcov_type` input contract (Phase 1b interstitial #3, permanently narrow).** `ImputationDiD(vcov_type=...)` now accepts `{"hc1"}` only (default). Analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages mirroring the CallawaySantAnna (PR #487) and TripleDifference (PR #488) interstitials. The rejection is **library-architectural, not paper-prescribed**: ImputationDiD uses influence-function-based variance per Borusyak-Jaravel-Spiess (2024) Theorem 3 — the per-unit IF aggregation `psi_it = v_it · epsilon_tilde_it` has no equivalent single design matrix on which hat-matrix leverage `1/(1−h_ii)` or Bell-McCaffrey Satterthwaite DOF can be defined. `hc1` with `cluster=None` ≡ per-unit IF variance (Theorem 3 equation 7); `hc1` with `cluster=X` ≡ per-cluster IF summation `sigma_sq = (cluster_psi_sums**2).sum()` (plain CR1 — no Stata-style `(n-1)/(n-p)` finite-sample factor because the IF has no design-matrix `p`); `hc1` with `survey_design=` ≡ TSL on the combined IF via `compute_survey_if_variance()` (analytical strata/PSU/FPC or replicate BRR/Fay/JK1/JKn/SDR). All paths are unchanged at machine precision (default behavior bit-equal across `aggregate ∈ {None, "event_study", "group"}` and across analytical + bootstrap inference). `vcov_type`, `cluster_name`, and `n_clusters` fields added to `ImputationDiDResults`; threaded through new `to_dict()` method (also net-new, mirrors `TripleDifferenceResults.to_dict()`). `summary()` routes the variance-family label through the shared `_format_vcov_label` (`results.py:49-89`): bare fits render `"HC1 heteroskedasticity-robust"`, clustered fits render `"CR1 cluster-robust at <cluster_name>, G=<n>"`, and survey-backed fits suppress the variance-estimator line (the Survey Design block already names design + n_psu + df). **`cluster= + SurveyDesign(replicate_weights=[...])` raises `NotImplementedError`** at `fit()`: replicate-weight variance is computed by replicate reweighting and ignores PSU/cluster entirely; honoring bare `cluster=` would silently no-op while populating `cluster_name`/`n_clusters` on Results dishonestly. Mirrors the CS PR #487 + TD PR #488 fail-closed guards. **Bootstrap path returns NaN SE when fewer than 2 independent clusters/PSUs are available** (`n_clusters < 2` analytical path, `n_psu < 2` survey-PSU path); without this guard the multiplier bootstrap SE collapses to ≈0 from BLAS roundoff (NOT NaN) and downstream zero-SE checks miss the degenerate case. NaN propagates to all overall ATT inference fields plus per-horizon and per-group bootstrap dicts via the new `_build_nan_bootstrap_results` helper in `imputation_bootstrap.py`. `set_params(vcov_type=...)` mirrors CS+TD pattern (mutate-then-validate-at-use, no atomic validation); `fit()` re-validates `vcov_type` at use time. New `TestImputationDiDVcovType` class in `tests/test_imputation.py` covers the 7-surface contract (default / cluster / TSL-survey / replicate-survey bit-equal parametrized over `aggregate`, bootstrap × cluster + bootstrap × survey bit-equal, `fit()`-time revalidation, bootstrap n_psu<2 + n_clusters<2 NaN propagation including `coef_var` NaN, `pretrends=True` × `vcov_type='hc1'` × cluster bit-equality) plus introspection (default attr, `get_params`, Results carries, `to_dict`, summary label, cluster_name suppression under survey, fit-clone idempotence, convenience function) and input-rejection tests with distinct keyword `match=` pins per family. REGISTRY.md "IF-based variance estimators vs analytical-sandwich estimators" cross-reference section updated to list `ImputationDiD` alongside `CallawaySantAnna` and `TripleDifference` in the "Enforced today" tier. **Interstitial PR #3** in the Phase 1b sequence (after CS #487, TD #488). Two estimators remaining: TwoStageDiD (methodology-heavy, sandwich + Gardner GMM-corrected meat) and EfficientDiD (IF-based interstitial #4, follows the same narrow-contract template).
 - **TripleDifference `vcov_type` input contract (Phase 1b interstitial #2, permanently narrow).** `TripleDifference(vcov_type=...)` now accepts `{"hc1"}` only (default). The analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages mirroring the CS interstitial. The rejection is **library-architectural, not paper-prescribed**: TripleDifference uses influence-function-based variance per Ortiz-Villavicencio & Sant'Anna (2025) arXiv:2505.09942 — the 3-pairwise-DiD decomposition `inf = w3·IF_3 + w2·IF_2 - w1·IF_1` has no single design matrix to compute hat-matrix leverage `1/(1-h_ii)` or Bell-McCaffrey Satterthwaite DOF on. The narrow contract is permanent and applies to the remaining IF-based estimators (`ImputationDiD`, `EfficientDiD`) when their `vcov_type` threading PRs land. `hc1` with `cluster=None` ≡ per-unit IF variance (`std(inf)/sqrt(n)`); `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the combined IF (`(G/(G-1)) · Σ_c (Σ_{i∈c} ψ_i)² / n²`, plain CR1 — no Stata-style `(n-1)/(n-p)` finite-sample factor because the IF has no design-matrix `p` in the OLS sense); `hc1` with `survey_design=` ≡ TSL on the combined IF (analytical or replicate). All three paths are unchanged at machine precision (default behavior bit-equal across all 3 estimation methods `{dr, reg, ipw}`). `vcov_type` and `cluster_name` fields added to `TripleDifferenceResults`, threaded through `to_dict()`. `summary()` routes the variance-family label through the shared `_format_vcov_label` (`results.py:49-89`): bare fits render `"HC1 heteroskedasticity-robust"`, clustered fits render `"CR1 cluster-robust at <cluster_name>, G=<n>"` (since the actual algebra is Liang-Zeger CR1 on the combined IF), and survey-backed fits suppress the variance-estimator line entirely (the Survey Design block already names design + n_psu + df, and the analytical SE is TSL on the combined IF — a raw "hc1" label would misstate the inference path). **`cluster= + SurveyDesign(replicate_weights=[...])` raises `NotImplementedError`** at `fit()`: replicate-weight variance is computed by replicate reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores PSU/cluster entirely; honoring bare `cluster=` would silently have no effect on the variance estimate while populating `cluster_name`/`n_clusters` on Results dishonestly. Mirrors the `CallawaySantAnna` guard from PR #487. Under `survey_design.psu` (non-replicate path) `cluster_name`/`n_clusters` on Results are suppressed (set to None) so they can't misreport the raw cluster argument when the resolver picks the survey PSU instead. `set_params(vcov_type=...)` mirrors CS pattern (mutate-then-validate-at-use, no atomic validation); `fit()` re-validates `vcov_type` at use time so a `set_params(vcov_type="hc4")` mutation surfaces a clear error at fit-time rather than silently propagating to Results metadata. **Interstitial PR #2** (after CS PR #487) rather than full Phase 1b PR 4/8 vcov_type threading — the narrow surface is methodologically dictated by TripleDifference's IF-based variance, not a deferral. New `TestTripleDifferenceVcovType` class in `tests/test_triple_diff.py` covers the 5-surface contract (default/cluster/survey bit-equal, `__init__` rejection per family, `fit()`-time revalidation) plus 8 introspection / convenience-function tests. REGISTRY.md "IF-based variance estimators vs analytical-sandwich estimators" cross-reference section updated to list `TripleDifference` alongside `CallawaySantAnna` in the "Enforced today" tier. Phase 1b PR 4/8 (full `{classical, hc1, hc2, hc2_bm}` threading) resumes on a different estimator (TwoStageDiD) post-merge; the two remaining IF-based estimators (`ImputationDiD`, `EfficientDiD`) follow the same narrow-contract template.
 - **CallawaySantAnna `vcov_type` input contract (Phase 1b interstitial, permanently narrow).** `CallawaySantAnna(vcov_type=...)` now accepts `{"hc1"}` only (default). The analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages. The rejection is **library-architectural, not paper-prescribed**: CS uses influence-function-based variance per Callaway & Sant'Anna (2021) — per-(g,t) doubly-robust / IPW / outcome-regression structure — and has no single design matrix to compute hat-matrix leverage `1/(1-h_ii)` or Bell-McCaffrey Satterthwaite DOF on. The narrow contract is permanent and applies to other IF-based estimators (ImputationDiD, EfficientDiD) when their `vcov_type` threading PRs land. `hc1` with `cluster=None` ≡ per-unit IF variance (Williams 2000 form); `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the IF activated via the cluster= wiring fix above. Documentation in `docs/methodology/REGISTRY.md` "IF-based variance estimators vs analytical-sandwich estimators" subsection. `vcov_type`, `cluster_name`, `n_clusters`, `df_inference` added to `CallawaySantAnnaResults` (the canonical PSU column wins for `cluster_name` reporting — `survey_design.psu` when explicit PSU is provided, `self.cluster` when bare cluster synthesizes/injects). `set_params(vcov_type=...)` mirrors SA pattern (mutate-then-refresh `_vcov_type_explicit`, no atomic validation); `fit()` re-validates `vcov_type` at use time so a `set_params(vcov_type="hc4")` mutation surfaces a clear error at fit-time rather than silently propagating to Results metadata. **Interstitial PR** rather than full Phase 1b PR 4/8 vcov_type threading — the narrow surface is methodologically dictated by CS's IF-based variance, not a deferral. Phase 1b PR 4/8 (full {classical, hc1, hc2, hc2_bm} threading) resumes on a different estimator post-merge.
 - **TripleDifference cluster-changes-SE defensive regression test.** Added `tests/test_triple_diff.py::TestTripleDifferenceClusterDefensive::test_cluster_changes_ses` asserting that `TripleDifference(cluster="state")` produces SE differing from `cluster=None` SE by `>1e-6` on a fixed-seed panel with state-level random effects. Defensive coverage closes a test gap identified during the Phase 1b cluster-wiring audit; TripleDifference's bare-cluster code path (`triple_diff.py:1245-1259`) was already correct but lacked a positive regression test. Mirrors `tests/test_two_stage.py::test_cluster_changes_ses`.