Skip to content

Conley Wave A: DiD/cluster combo, sparse k-d-tree, callable metric#433

Merged
igerber merged 13 commits into
mainfrom
spillover-conley-wave-a
May 14, 2026
Merged

Conley Wave A: DiD/cluster combo, sparse k-d-tree, callable metric#433
igerber merged 13 commits into
mainfrom
spillover-conley-wave-a

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 14, 2026

Summary

  • DiD + Conley (Fix NaN t-statistics across 7 locations for consistent undefined inference #118): DifferenceInDifferences(vcov_type="conley").fit(..., unit="<col>") lifts the prior fit-time redirect. unit is a fit-time kwarg (not on __init__; unused unless Conley is set; not part of get_params / set_params). On a 2-period panel, ATT + SE match MultiPeriodDiD(...).fit(post_periods=[1], reference_period=0) bit-exactly.
  • Combined spatial + cluster product kernel (Align TROP lambda conventions with paper (Athey et al. 2025) #119): compute_robust_vcov / LinearRegression / TWFE / DiD now accept cluster_ids / cluster= alongside vcov_type="conley". Meat applies K_total[i,j] = K_space(d_ij/h) · 1{c_i = c_j}. On the panel block-decomposed path the validator enforces cluster-time invariance within each unit (raises ValueError otherwise). TWFE's default auto-cluster on the Conley path is still silently dropped; explicit cluster=<col> opts into the combined kernel.
  • Sparse k-d-tree fast path (Add git worktree management slash commands #120): auto-activates for the spatial Bartlett meat when n > 5_000 AND metric ∈ {haversine, euclidean} AND kernel = bartlett. Builds a CSR sparse kernel matrix via scipy.spatial.cKDTree.query_ball_tree; haversine projects to a 3-D unit-sphere chord representation with the arc clamped at π (so cutoffs above half-Earth circumference still capture all geometrically eligible pairs). Bit-identity parity vs dense at atol=1e-10; R parity at atol=1e-6 preserved with the sparse path force-enabled on the 3 panel R fixtures. Constants: _CONLEY_SPARSE_N_THRESHOLD = 5_000; _CONLEY_DENSE_WARN_N renamed to _CONLEY_DENSE_OOM_WARN_N = 20_000 to disambiguate from the new sparse threshold.
  • Callable conley_metric validation (Fix worktree-rm: detect squash-merged branches via GitHub PR status #123): _validate_callable_metric_result enforces shape (n, n), finite, non-negative, symmetric to atol=1e-10. Each failure raises a targeted ValueError naming the violated invariant — replaces the prior opaque BLAS errors for malformed callables.

Methodology references (required if estimator / math changes)

  • Method name(s): Conley (1999) spatial-HAC sandwich, panel block-decomposed form, combined spatial + cluster product kernel
  • Paper / source link(s): Conley, T. G. (1999). GMM Estimation with Cross-Sectional Dependence. Journal of Econometrics 92(1), 1-45 (https://doi.org/10.1016/S0304-4076(98)00084-0). R conleyreg (Düsterhöft 2021, CRAN v0.1.9, https://github.com/cdueben/conleyreg) — parity benchmark. Stata acreg (Colella, Lalive, Sakalli, Thoenig 2019, IZA DP 12584) — combined-kernel parallel canonical implementation (not parity-tested).
  • Any intentional deviations from the source (and why): (a) Combined spatial + cluster product kernel — diff-diff convention; R conleyreg does not support this combination. Validated by two limit fixtures (all-unique-clusters reduces to HC0; huge-cutoff reduces to pure within-cluster CR1) rather than R parity. (b) Sparse haversine arc clamped at π — matches dense path's np.clip(a, 0, 1) saturation; preserves correctness for cutoffs above half-Earth circumference. (c) Panel time-invariance contract on cluster_ids — required so the within-unit serial sandwich's cluster mask is trivially all-ones; cross-sectional path has no such constraint. All documented as **Note:** / **Deviation from R:** labels in docs/methodology/REGISTRY.md § ConleySpatialHAC.

Validation

  • Tests added/updated: tests/test_conley_vcov.py extended by ~1100 lines covering TestConleySparse (sparse k-d-tree + arc-clamp edge cases), TestConleySparseRParityForced (R parity with sparse forced), TestConleyCluster (combined kernel + time-invariance contract), TestConleyDistanceMetrics (callable validation), plus DiD-with-Conley integration tests (raw-time-label propagation under absorb, MPD parity at atol=1e-10).
  • Backtest / simulation / notebook evidence (if applicable): R conleyreg parity at atol=1e-6 preserved on the 6 existing benchmark fixtures (3 cross-sectional + 3 panel) under both dense and forced-sparse paths. Bit-identity dense ↔ sparse parity at atol=1e-10 on synthetic euclidean + haversine fixtures including the boundary case cutoff = π·R_earth.
  • Local codex AI review (2 rounds): R1 surfaced 2 P0 + 1 P2 + 1 P3; R2 confirmed all addressed with no new findings. Full targeted regression sweep: 514 passed across tests/test_conley_vcov.py, tests/test_linalg.py, tests/test_estimators.py, tests/test_estimators_vcov_type.py, tests/test_methodology_twfe.py, tests/test_linalg_hc2_bm.py.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@github-actions
Copy link
Copy Markdown

Overall Assessment
⚠️ Needs changes

Executive Summary

  • Using docs/methodology/REGISTRY.md and the in-code Conley docstrings/references, I did not find an undocumented change to the Conley variance formulas or SE construction; the main deviations introduced here are documented.
  • The new DifferenceInDifferences(vcov_type="conley") path adds a fit-time unit contract but never validates that the named unit column actually exists, so bad inputs fall through to a raw pandas KeyError.
  • The new vcov_type="conley" + cluster= estimator plumbing is only regression-tested at the low level and on TWFE; the DiD and MultiPeriodDiD public estimator paths are still untested.
  • I did not find a new inline-inference / partial-NaN anti-pattern in the changed code.
  • I could not execute the test suite in this sandbox because the environment lacks pandas.

Methodology

  • Severity: P3 (informational). Methods affected: ConleySpatialHAC cross-sectional/panel variance, DiD/MPD/TWFE estimator wiring, the combined spatial+cluster product kernel, the sparse Bartlett path, and callable metric validation. These non-paper / non-R behaviors are explicitly documented in docs/methodology/REGISTRY.md:L2963-L2972, docs/methodology/REGISTRY.md:L3005-L3016, and docs/methodology/REGISTRY.md:L3053-L3135, so I do not count them as defects. Impact: none. Concrete fix: none.

Code Quality

  • Severity: P1. DifferenceInDifferences.fit validates only that unit is provided, not that the named column exists. The new Conley path checks unit is not None at diff_diff/estimators.py:L394-L427, then later indexes data[unit] at diff_diff/estimators.py:L535-L543; unlike the peer panel paths, it never performs the front-door existence check already present in MultiPeriodDiD and TwoWayFixedEffects at diff_diff/estimators.py:L1288-L1291 and diff_diff/twfe.py:L136-L138. The added tests cover unit=None (tests/test_conley_vcov.py:L1069-L1079, tests/test_conley_vcov.py:L1818-L1863) but not an invalid unit column name. Impact: bad inputs on a new public API path error out as raw pandas exceptions instead of clear estimator-level validation, which is missing edge-case handling in production. Concrete fix: add if unit not in data.columns: raise ValueError(...) in DifferenceInDifferences.fit before Conley array extraction, and add a regression for unit="missing_unit".

Performance

  • No findings.

Maintainability

  • No findings beyond the validation inconsistency above.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity: P1. The new combined-kernel estimator support is not regression-tested across all affected public estimator paths. The PR/docs now advertise explicit cluster= support on DiD and MPD/TWFE Conley paths (docs/methodology/REGISTRY.md:L2947-L2959, docs/methodology/REGISTRY.md:L3141-L3142, diff_diff/estimators.py:L1124-L1139), but the added estimator-level tests cover DiD without cluster= (tests/test_conley_vcov.py:L1052-L1160), MPD without cluster= (tests/test_conley_vcov.py:L1196-L1517), and TWFE with cluster= (tests/test_conley_vcov.py:L1562-L1586). The low-level helper is covered (tests/test_conley_vcov.py:L872-L896), but the DiD and MPD estimator plumbing is not. Impact: two new public parameter-interaction paths can regress without detection even though the low-level kernel math is tested. Concrete fix: add direct regressions for DifferenceInDifferences(vcov_type="conley", cluster=...) and MultiPeriodDiD(vcov_type="conley", cluster=...), including a positive case with a time-invariant above-unit cluster and a negative case where cluster varies within unit and must raise ValueError.
  • Review note: I did not execute the suite in this environment because the sandbox is missing pandas.

Path to Approval

  1. Add front-door unit column existence validation in DifferenceInDifferences.fit, mirroring the MPD/TWFE guards, and cover it with a regression test.
  2. Add estimator-level combined-kernel tests for DiD and MultiPeriodDiD, including both success and time-varying-cluster rejection cases.

igerber added a commit that referenced this pull request May 14, 2026
…el tests

Address CI codex review round 1 (PR #433).

P1 #1 — DiD.fit() missing unit-column existence check
  - The new Conley path validated `unit is not None` but never confirmed
    the named column actually exists, so a bad column name fell through
    to a raw `pandas` KeyError at `data[unit]` instead of a clear
    estimator-level message. Adds the same front-door guard that
    `MultiPeriodDiD.fit` and `TwoWayFixedEffects.fit` already have:
        if unit not in data.columns:
            raise ValueError(f"Unit column '{unit}' not found in data")

P1 #2 — Combined-kernel estimator-path coverage
  - The low-level helper and TWFE-with-cluster were covered, but the DiD
    and MultiPeriodDiD public estimator paths with explicit `cluster=`
    were not. Adds four new tests in `TestConleyCluster`:
      * test_did_combined_kernel_finite_se_and_cluster_name: DiD +
        Conley + cluster='region' produces finite SE, propagates
        cluster_name to result + to_dict(), differs from no-cluster
        baseline (combined kernel zeros out cross-cluster pairs).
      * test_did_combined_kernel_time_varying_cluster_raises: panel
        time-invariance contract enforced on the DiD path.
      * test_mpd_combined_kernel_finite_se_and_cluster_name: same
        finite-SE + cluster_name + to_dict() invariants on MPD.
      * test_mpd_combined_kernel_time_varying_cluster_raises: panel
        time-invariance contract on MPD.
  - test_did_conley_unknown_unit_column_raises (Code Quality P1
    regression): asserts the new ValueError fires before the bad
    pandas lookup.

Targeted regression: tests/test_conley_vcov.py +
test_estimators.py + test_methodology_twfe.py: 332 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: be1b527f5af28ecf11377da8882491c11fcb5ff4


Overall Assessment
⚠️ Needs changes

Executive Summary

  • Previous P1 items appear addressed: DiD now front-door validates unit, and estimator-level combined-kernel tests were added for DiD/MPD (diff_diff/estimators.py:L394-L403, tests/test_conley_vcov.py:L1081-L1100, tests/test_conley_vcov.py:L3210-L3309).
  • Methodology-wise, I did not find an undocumented change to the Conley math or SE construction. The combined kernel, time-label normalization, sparse Bartlett path, and callable-metric validation are explicitly documented in docs/methodology/REGISTRY.md:L2963-L2972, docs/methodology/REGISTRY.md:L3005-L3016, and docs/methodology/REGISTRY.md:L3053-L3135.
  • Severity: P1 [Newly identified]. The new DifferenceInDifferences(vcov_type="conley") path validates only that conley_coords is non-None; malformed tuples or missing coord columns still fall through to raw IndexError/KeyError during coordinate extraction at diff_diff/estimators.py:L410-L415 and diff_diff/estimators.py:L537-L545.
  • I could not execute the tests in this sandbox because pandas is unavailable.

Methodology

  • Severity: P3 (informational). Impact: none. Concrete fix: none. The non-paper / non-R choices introduced here are documented rather than implicit: DiD-vs-TWFE cluster asymmetry (docs/methodology/REGISTRY.md:L2963-L2972), dense-code time normalization (docs/methodology/REGISTRY.md:L3005-L3016), combined spatial + cluster kernel (docs/methodology/REGISTRY.md:L3053-L3090), sparse Bartlett-only kd-tree path (docs/methodology/REGISTRY.md:L3092-L3121), and callable-metric boundary validation (docs/methodology/REGISTRY.md:L3123-L3135). I did not find an undocumented estimator, weighting, or variance deviation in diff_diff/conley.py:L225-L375 and diff_diff/conley.py:L584-L816.

Code Quality

  • Severity: P1 [Newly identified]. Impact: bad conley_coords inputs on the newly added DiD Conley path still fail as raw container/indexing exceptions instead of estimator-level ValueError, which is the same production-validation class the earlier unit fix addressed. DifferenceInDifferences.fit only checks conley_coords is not None at diff_diff/estimators.py:L410-L415, then blindly indexes self.conley_coords[0], self.conley_coords[1], and data[...] at diff_diff/estimators.py:L537-L545. The updated DiD validation table in tests/test_conley_vcov.py:L1839-L1884 covers missing cutoff/lag/unit but not malformed or absent coord columns. Concrete fix: validate at fit-time that conley_coords is a length-2 tuple/list of existing column names before building _conley_coords_arr, and add regressions for conley_coords=("missing_lat", "lon") and a malformed 1-element tuple.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3 (informational). Impact: none. Concrete fix: none. The remaining unsupported Conley surfaces are still properly tracked in TODO.md:L119-L120, so I do not treat them as blockers.

Security

  • No findings.

Documentation/Tests

  • No additional findings. The previous review’s estimator-level combined-kernel test gap appears resolved by tests/test_conley_vcov.py:L3210-L3309, and the earlier DiD unit validation gap is covered at tests/test_conley_vcov.py:L1081-L1100.
  • Review note: I could not run the suite here because pandas is not installed in the sandbox.

Path to Approval

  1. Add front-door DiD validation for conley_coords: require a 2-element column-name tuple/list and verify both columns exist before data[...] extraction in diff_diff/estimators.py.
  2. Add regression tests for invalid DiD conley_coords inputs, at minimum a missing column name and a malformed tuple shape.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: eea4463a8b62256a183491766a48a2ae6fc55ed4


Overall Assessment
✅ Looks good

Executive Summary

  • Re-review: the prior DiD front-door validation finding is resolved. DifferenceInDifferences.fit() now validates unit and conley_coords up front at diff_diff/estimators.py:L394-L431, with regressions for missing/unknown/malformed inputs at tests/test_conley_vcov.py:L1069-L1155.
  • The prior absorb/raw-time-label correctness issue is also resolved. The DiD Conley path now reads raw time/unit/coord columns from data, not absorb-demeaned working_data, at diff_diff/estimators.py:L538-L561, with a dedicated regression at tests/test_conley_vcov.py:L1203-L1249.
  • Methodology alignment looks sound. The new DiD support, combined spatial+cluster product kernel, sparse Bartlett kd-tree path, and callable-metric validation are all documented in docs/methodology/REGISTRY.md:L2947-L3145, and the implementation matches those documented contracts in diff_diff/conley.py:L118-L191, diff_diff/conley.py:L225-L375, diff_diff/conley.py:L507-L581, and diff_diff/conley.py:L695-L777.
  • I did not find any unmitigated P0/P1 issues in the estimator math, SE construction, or inference handling.
  • One minor P3 remains: summary() does not distinguish pure Conley from the new Conley+cluster path because the Conley variance label ignores cluster_name in diff_diff/results.py:L77-L83.

Methodology

  • Severity: P3 (informational). Impact: none. Concrete fix: none. The PR’s non-paper / non-R choices are documented rather than implicit: DiD support and panel restrictions docs/methodology/REGISTRY.md:L2947-L2972, time-label normalization docs/methodology/REGISTRY.md:L3005-L3016, combined spatial+cluster kernel docs/methodology/REGISTRY.md:L3053-L3090, sparse Bartlett path docs/methodology/REGISTRY.md:L3092-L3121, and callable-metric validation docs/methodology/REGISTRY.md:L3123-L3135. I did not find an undocumented methodology deviation in the changed estimator/Conley code.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity: P3. Impact: human-readable summaries for explicit-cluster Conley fits still print the same generic Conley label as pure Conley because _format_vcov_label() ignores cluster_name for vcov_type=="conley" at diff_diff/results.py:L77-L83, even though summary rendering passes cluster_name into that helper at diff_diff/results.py:L217-L230. The new tests verify cluster_name/to_dict() for combined-kernel fits at tests/test_conley_vcov.py:L1638-L1662 and tests/test_conley_vcov.py:L3265-L3335, but the only summary regression still covers plain Conley at tests/test_conley_vcov.py:L1727-L1745.
  • Concrete fix: include cluster_name in the Conley summary label (for example, “Conley spatial HAC + cluster product kernel at , lag_cutoff=L”) and add a summary regression test for an explicit-cluster Conley fit.

Tech Debt

  • No findings. Remaining Conley follow-ups are still properly tracked in TODO.md:L119-L120.

Security

  • No findings.

Documentation/Tests

  • No findings.
  • Review note: I could not execute tests/test_conley_vcov.py in this sandbox because pytest and numpy are unavailable.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 0a8b4389d66e8f18c5e865469b4747f64e4a0665


Overall Assessment
⚠️ Needs changes

Executive Summary

  • Re-review: the prior DiD Conley front-door validation, absorb/raw-time handling, and Conley summary-label issues look resolved in diff_diff/estimators.py:L394-L587, diff_diff/results.py:L49-L88, and the corresponding regressions at tests/test_conley_vcov.py:L1174-L1249 and tests/test_conley_vcov.py:L1750-L1768.
  • The Wave A methodology changes are documented in docs/methodology/REGISTRY.md:L2947-L3151, and the DiD wire-up, combined spatial+cluster kernel, and sparse Bartlett path generally match those documented contracts in diff_diff/conley.py:L378-L816, diff_diff/estimators.py:L385-L587, and diff_diff/twfe.py:L169-L381.
  • Severity P1 [Newly identified]: the new callable conley_metric validator omits the zero-diagonal/self-distance invariant. That leaves malformed callables that still pass validation and can silently misstate Conley SEs by downweighting the HC0 diagonal.
  • Remaining deferred non-blockers are appropriately tracked in TODO.md:L119-L120.

Methodology

  • Severity: P1 [Newly identified]. Impact: _validate_callable_metric_result() accepts any symmetric, finite, non-negative (n, n) matrix, even when d(i,i) > 0. But the Conley estimator relies on d(i,i)=0 so the i=j diagonal contributes the HC0 term via K(0)=1; that invariant is explicitly relied on by the registry’s variance formula and the existing diagonal-reduction test. A callable with positive diagonal therefore passes the new boundary validation yet can silently attenuate or eliminate the HC0 diagonal contribution, yielding incorrect Conley SEs. This is not mitigated by REGISTRY.md; the current registry text at L3123-L3135 describes only the partial check set, not a deliberate deviation. Concrete fix: extend the validator to require a zero diagonal within tolerance, raise a targeted ValueError when violated, and document that invariant alongside the other callable checks. diff_diff/conley.py:L117-L167, docs/methodology/REGISTRY.md:L2976-L2980, docs/methodology/REGISTRY.md:L3123-L3135, tests/test_conley_vcov.py:L183-L257, tests/test_conley_vcov.py:L2217-L2233

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. Remaining Conley follow-ups are properly tracked in TODO.md:L119-L120.

Security

  • No findings.

Documentation/Tests

  • No additional findings beyond the methodology item above. Review note: I did not execute the test suite in this sandbox because numpy/pytest are unavailable.

Path to Approval

  1. In diff_diff/conley.py, reject callable distance outputs whose diagonal is nonzero within tolerance, with an explicit error message that names the d(i,i)=0 contract.
  2. Add a regression in tests/test_conley_vcov.py for a symmetric/finite/non-negative callable with positive diagonal, and update docs/methodology/REGISTRY.md so the callable contract explicitly includes zero self-distance.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 484620ecfd00afe4ed7c26ca2bc7b6514f9cba72


Overall Assessment
⛔ Blocker

Executive Summary

  • Re-review: the prior callable-conley_metric diagonal-invariant issue is resolved in diff_diff/conley.py (L117-L186), documented in docs/methodology/REGISTRY.md (L3123-L3142), and regression-covered in tests/test_conley_vcov.py (L253-L288).
  • The Wave A Conley changes themselves are generally aligned with the Methodology Registry: DiD Conley support, the documented spatial+cluster product kernel deviation, and the sparse Bartlett implementation all match the stated contract.
  • Severity P0 [Newly identified]: MultiPeriodDiD still builds Conley kernel inputs from working_data after absorb demeaning, so overlapping covariates can silently corrupt the geography/time inputs and misstate Conley SEs.
  • I could not execute pytest here: this sandbox is missing pytest and runtime Python deps such as numpy, so this is a static review.

Methodology

  • Severity: P0 [Newly identified]. Impact: in MultiPeriodDiD, the absorb path demeans every listed covariate in working_data at diff_diff/estimators.py (L1549-L1563), but the Conley branch later constructs conley_coords, conley_time, and conley_unit from that same working_data at diff_diff/estimators.py (L1671-L1683). If a caller includes lat/lon or time in covariates, the Conley kernel is then built on residualized inputs rather than the original geography/period labels. With absorb=["unit"] and time-invariant coordinates, that can collapse coordinates to zero within unit and silently produce wrong Conley SEs. This is the same raw-data contract the PR already fixed for DiD at diff_diff/estimators.py (L538-L561) and regression-tested at tests/test_conley_vcov.py (L1246-L1288), so the pattern fix is only partial. Concrete fix: mirror the DiD/TWFE contract in MultiPeriodDiD.fit and always source conley_coords, conley_time, and conley_unit from raw data, not working_data, then add a regression that proves the raw arrays reach _compute_conley_vcov under absorb with overlapping covariates.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Update MultiPeriodDiD.fit so the Conley kernel inputs (conley_coords, conley_time, conley_unit) are always taken from raw data, matching the new DiD/TWFE contract.
  2. Add a MultiPeriodDiD regression with absorb=["unit"] and overlapping Conley-input covariates that fails on the current code and proves the raw arrays are what reach _compute_conley_vcov.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: e03318dcbf8e7c683a2f9e97e4b9278ee10005ca


Overall Assessment
⚠️ Needs changes

Executive Summary

  • Re-review: the prior blocker on MultiPeriodDiD sourcing Conley inputs from absorb-demeaned working_data is resolved; it now uses raw data and has regression coverage in diff_diff/estimators.py:L1668-L1692 and tests/test_conley_vcov.py:L1294-L1375.
  • I did not find an unmitigated methodology mismatch in the Conley math itself. DiD support, the documented spatial+cluster deviation, and callable-metric validation are aligned with the updated Registry in docs/methodology/REGISTRY.md:L2947-L3155 and the implementation in diff_diff/conley.py:L117-L186 and diff_diff/conley.py:L640-L648.
  • Severity: P1 [Newly identified]. The PR only front-door-fixes malformed/missing conley_coords on DiD; the corresponding MultiPeriodDiD and TwoWayFixedEffects Conley paths still fall through to raw KeyError/IndexError on the same bad inputs.
  • Severity: P2. The sparse Conley auto-toggle keys only on n/metric/kernel, so dense-neighborhood cutoffs can still materialize O(n^2) CSR storage and negate the claimed memory benefit.
  • I could not run the test suite here: pytest and runtime deps such as numpy are not installed in this sandbox.

Methodology
No findings. The Conley changes that deviate from the paper/R are explicitly documented in docs/methodology/REGISTRY.md with note/deviation labels, so under the review rubric I am treating those as non-defects.

Code Quality

  • Severity: P1 [Newly identified]. Impact: DiD now validates malformed or missing conley_coords up front in diff_diff/estimators.py:L416-L431, with regression coverage in tests/test_conley_vcov.py:L1145-L1198, but MultiPeriodDiD and TwoWayFixedEffects still only check for None in diff_diff/estimators.py:L1499-L1505 and diff_diff/twfe.py:L175-L188 and then directly index self.conley_coords[0] / [1] in diff_diff/estimators.py:L1680-L1684 and diff_diff/twfe.py:L359-L363. That is a partial fix of the same pattern bug: bad inputs that now raise clear estimator-level ValueError on DiD still crash opaquely on MPD/TWFE. Concrete fix: extract the DiD conley_coords arity/column validation into a shared helper and call it from all three Conley estimator entry points before raw column access.

Performance

  • Severity: P2. Impact: the sparse path auto-activates solely from n > 5_000, supported metric, and Bartlett kernel in diff_diff/conley.py:L697-L744, but _compute_spatial_bartlett_meat_sparse still stores every in-range pair in CSR form in diff_diff/conley.py:L348-L394. With a large cutoff or compact geography, that can make the “sparse” matrix effectively dense, use more memory than the dense float64 matrix, and turn the new auto-toggle into an OOM risk rather than an optimization. Concrete fix: after query_ball_tree, estimate nnz/density and fall back to dense or warn once neighbor density crosses a threshold; alternatively gate auto-sparse on cutoff relative to the data span instead of n alone.

Maintainability
No findings.

Tech Debt
No findings.

Security
No findings.

Documentation/Tests

  • Severity: P3. Impact: the new validation note for the documented combined-kernel deviation says the “huge-cutoff” fixture reduces exactly to pure within-cluster CR1 in docs/methodology/REGISTRY.md:L3077-L3090, and the test encodes that claim with Bartlett at huge_cutoff = 1e9 in tests/test_conley_vcov.py:L3186-L3208. For Bartlett, finite off-diagonal weights are 1 - d/h, so this is an asymptotic limit, not an exact finite-cutoff identity. Concrete fix: make the exact reduction anchor use conley_kernel="uniform", or keep Bartlett and rewrite both the note and the test as an approximation against the exact finite-cutoff Bartlett weights.

Path to Approval

  1. Add the same malformed/missing conley_coords front-door validation now used by DifferenceInDifferences.fit() to MultiPeriodDiD.fit() and TwoWayFixedEffects.fit().
  2. Add MPD/TWFE regressions that conley_coords=("missing_lat", "lon") and conley_coords=("lat",) raise estimator-level ValueError instead of raw pandas/Index errors.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 68678e9a4eadaed3ffd592c5442cf7902b7d3a88


Overall Assessment
⚠️ Needs changes

Executive Summary

  • Re-review: the prior Conley re-review items appear addressed. The raw-vs-demeaned Conley input bug is fixed by sourcing time/unit/coords from raw data on the DiD and MPD Conley paths (diff_diff/estimators.py:L503-L551, diff_diff/estimators.py:L1609-L1658), and the prior MPD/TWFE front-door conley_coords gap is closed via the shared validator in diff_diff/conley.py:L201-L301, wired from diff_diff/estimators.py:L394-L410, diff_diff/estimators.py:L1464-L1476, and diff_diff/twfe.py:L175-L187.
  • Methodology cross-check: the affected method is ConleySpatialHAC. DiD support reuses the documented block-decomposed panel sandwich, and the combined spatial+cluster kernel / time-label normalization are explicitly documented Note deviations in docs/methodology/REGISTRY.md:L2947-L3178, so I do not count them as methodology defects.
  • Severity: P1 [Newly identified]. The new Conley combined-kernel estimator paths still do not validate a bad cluster= column before data[self.cluster] access, so a misspelled cluster column falls through to a raw pandas KeyError on DiD, MPD, and TWFE instead of the estimator-level ValueError pattern the PR added for unit/conley_coords (diff_diff/estimators.py:L472-L477, diff_diff/estimators.py:L1562-L1567, diff_diff/twfe.py:L300-L305).
  • The earlier sparse-path dense-neighborhood concern is materially mitigated by the new density gate and corresponding warning (diff_diff/conley.py:L462-L479, docs/methodology/REGISTRY.md:L3129-L3141).
  • I did not execute the test suite here; python -m pytest --version fails because pytest is not installed in this sandbox.

Methodology

  • No findings. ConleySpatialHAC’s changed estimator math is aligned with the updated registry contract in docs/methodology/REGISTRY.md:L2947-L3178. The combined spatial+cluster kernel, time-label normalization, and sparse bartlett-only path are documented deviations or implementation choices rather than undocumented methodology drift.

Code Quality

  • Severity: P1 [Newly identified]. Impact: cluster= is newly load-bearing on the Conley combined-kernel estimator paths, but none of the three entry points validates that the named cluster column exists before raw DataFrame indexing. A typo like cluster="missing_region" now raises an opaque pandas KeyError instead of a clear estimator-level ValueError on DifferenceInDifferences.fit, MultiPeriodDiD.fit, and TwoWayFixedEffects.fit (diff_diff/estimators.py:L472-L477, diff_diff/estimators.py:L1562-L1567, diff_diff/twfe.py:L300-L305). Concrete fix: add a shared front-door cluster-column existence check alongside _validate_conley_estimator_inputs (or immediately before each data[self.cluster] access) and raise ValueError("Cluster column '<name>' not found in data").

Performance

  • No findings. The prior “auto-sparse can still materialize near-dense storage” concern is meaningfully mitigated by _CONLEY_SPARSE_DENSITY_THRESHOLD and the dense fallback warning in diff_diff/conley.py:L462-L479.

Maintainability

  • No findings. The Conley metadata propagation and labeling gaps noted in the prior review are fixed: result serialization now carries conley_lag_cutoff, and TWFE’s summary label distinguishes bare Conley from the combined kernel (diff_diff/results.py:L77-L88, diff_diff/results.py:L269-L298, diff_diff/results.py:L568-L585, diff_diff/twfe.py:L551-L596).

Tech Debt

  • No findings. The TODO cleanup is consistent with the shipped scope: completed Wave A items are removed, while still-deferred survey-design and SyntheticDiD Conley follow-ups remain tracked in TODO.md:L119-L120.

Security

  • No findings.

Documentation/Tests

  • No additional findings. The earlier exactness mismatch on the “huge-cutoff reduces to pure within-cluster CR1” note is now corrected in the methodology registry, and the test anchor now uses the exact uniform-kernel identity (docs/methodology/REGISTRY.md:L3077-L3092, tests/test_conley_vcov.py:L3262-L3283).
  • Verification note: tests were not run in this environment because pytest is unavailable.

Path to Approval

  1. Add estimator-level validation that an explicit cluster= column exists before any data[self.cluster] access on the Conley combined-kernel paths in DifferenceInDifferences.fit, MultiPeriodDiD.fit, and TwoWayFixedEffects.fit.
  2. Add regressions for cluster="missing_region" on all three Conley estimator surfaces and assert the new estimator-level ValueError instead of a raw pandas KeyError.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 04a5fa1e3b98e2b8ea6394591585a51e188b0847


Overall Assessment
⚠️ Needs changes

Executive Summary

Methodology

  • Severity: P3-informational. Impact: No unmitigated methodology defects found. The Wave A changes all land under the documented ConleySpatialHAC contract, and the non-paper extensions are explicitly labeled as notes/deviations in the registry rather than shipped silently. Concrete fix: None.

Code Quality

  • No findings.

Performance

  • Severity: P1. Impact: _compute_spatial_bartlett_meat_sparse() computes density from count_neighbors() before applying cluster_codes, but the actual sparse kernel drops cross-cluster pairs afterward. On large datasets with many small clusters, the helper can therefore misclassify a genuinely sparse combined kernel as “dense” and fall back to full pairwise distances, which is exactly the new production-error mode this PR is supposed to avoid for vcov_type="conley", cluster=. See diff_diff/conley.py:L476 and diff_diff/conley.py:L503. Concrete fix: make the density gate cluster-aware by counting only within-cluster in-range pairs, or thresholding per cluster slice, before returning None to the dense path.

Maintainability

  • No findings.

Tech Debt

  • No findings. The deferred weighted/survey and SyntheticDiD Conley work remains tracked in TODO.md:L117 and TODO.md:L118.

Security

  • No findings.

Documentation/Tests

  • Severity: P1. Impact: The new cluster_ids parameter is not tested against all new Conley code paths. Sparse coverage exercises unclustered dense-vs-sparse parity, and cluster coverage exercises dense/estimator behavior, but there is no regression for sparse clustered Conley even though the clustered sparse branch is now load-bearing at diff_diff/conley.py:L374, diff_diff/conley.py:L503, and diff_diff/conley.py:L883. Concrete fix: add dense-vs-sparse parity tests for clustered Conley on both cross-sectional and panel block-decomposed fixtures, including a case where global spatial density is high but within-cluster density is low.

Path to Approval

  1. Make the sparse density heuristic cluster-aware so vcov_type="conley", cluster= does not spuriously fall back to dense O(n²) work when the post-mask kernel is still sparse.
  2. Add regressions that force _conley_sparse=True and _conley_sparse=False on clustered Conley inputs and assert parity on both the cross-sectional and panel paths.
  3. Add one clustered sparse test where overall spatial neighborhoods are dense but within-cluster neighborhoods are sparse, and assert the helper stays on the sparse path instead of warning and falling back to dense.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 75beec7314b72faf5b29e8e7f107d533a41d94e2


Overall Assessment
✅ Looks good

Executive Summary

Methodology

  • Severity: P3-informational. Impact: No unmitigated methodology defects found in the implementation. DiD support, the combined spatial+cluster product kernel, the sparse Bartlett path, and callable-metric validation are all documented in the registry rather than shipped as silent deviations. Concrete fix: None.

Code Quality

  • No findings.

Performance

Maintainability

  • No findings.

Tech Debt

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: The registry’s huge-cutoff note writes the within-cluster CR1 limit as Σ_g X_g' ε_g ε_g' X_g', which is dimensionally wrong, and CHANGELOG.md:L11 omits both the registry’s uniform-kernel-only caveat for the exact finite-cutoff identity and the new zero-diagonal callable-metric requirement. The implementation and tests are correct; this is documentation-only. See docs/methodology/REGISTRY.md:L3083-L3092 and docs/methodology/REGISTRY.md:L3143-L3162. Concrete fix: change the registry expression to Σ_g X_g' ε_g ε_g' X_g, and mirror the “exact only for conley_kernel="uniform"; Bartlett is asymptotic” plus zero-diagonal callable contract in the changelog summary.

Static review only; I did not execute the test suite in this sandbox.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: cd08440aae5f44cdf41af52a95b2c23f594ceca0


Overall assessment

✅ Looks good

Executive summary

  • Re-review: the earlier sparse-density, front-door validation, and raw-label/absorb concerns appear addressed in diff_diff/conley.py, diff_diff/conley.py, diff_diff/estimators.py, and diff_diff/twfe.py, with targeted regressions in tests/test_conley_vcov.py, tests/test_conley_vcov.py, and tests/test_conley_vcov.py.
  • The affected method is ConleySpatialHAC, plus estimator-surface wiring for DifferenceInDifferences, MultiPeriodDiD, and TwoWayFixedEffects. I did not find an undocumented methodology mismatch in the implementation.
  • The combined spatial+cluster kernel is explicitly documented as a project convention in docs/methodology/REGISTRY.md, so it is non-blocking under the stated review rubric.
  • Remaining issues are documentation-only P3s: one broken llms-full.txt example and minor public-doc drift on the callable zero-diagonal contract / MultiPeriodDiD combined-kernel surface.
  • Static review only; I did not execute the test suite in this sandbox.

Methodology

  • Severity: P3-informational. Impact: No unmitigated methodology findings. The changed math still maps to the existing ConleySpatialHAC contract, and the non-paper extensions in this PR are documented rather than shipped as silent deviations in docs/methodology/REGISTRY.md and docs/methodology/REGISTRY.md. Concrete fix: None.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new findings. The lack of snippet smoke tests for .txt AI guides that let one doc example drift is already tracked in TODO.md.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: The new Conley code sample in diff_diff/guides/llms-full.txt uses DifferenceInDifferences at diff_diff/guides/llms-full.txt without importing it, so the example is not runnable as written. Because .txt AI guides are outside doc-snippet CI, this is non-blocking but still worth fixing. Concrete fix: add DifferenceInDifferences to the import list or change the example to use only imported classes; optionally fold this snippet into the smoke-test set tracked in TODO.md.
  • Severity: P3. Impact: The public Conley docs still have minor contract drift. The main callable-validation subsection correctly documents the zero-diagonal invariant in docs/methodology/REGISTRY.md, but the summary bullet in diff_diff/guides/llms-full.txt and the edge-case bullet in docs/methodology/REGISTRY.md omit it; separately, the Wave A changelog enumerates DiD/TWFE combined-kernel estimator surfaces but not the implemented MultiPeriodDiD path shown in diff_diff/estimators.py and tests/test_conley_vcov.py, see CHANGELOG.md. Concrete fix: align the guide/registry/changelog prose with the shipped contract by adding the zero-diagonal callable rule everywhere it is summarized and explicitly naming MultiPeriodDiD(cluster=..., vcov_type="conley") in the combined-kernel surface list.

igerber and others added 12 commits May 14, 2026 17:16
… callable metric validation

Four mechanical extensions on top of the Phase 1+2 Conley sandwich
(PR #426). All four touch the same surface (conley.py + linalg.py +
estimators.py + twfe.py + tests/test_conley_vcov.py).

  - DiD.fit() accepts `unit=<col>` as a fit-time kwarg (NOT on __init__;
    unused unless Conley is set; not part of get_params/set_params).
  - Drops the prior fit-time redirect to MultiPeriodDiD.
  - Validates unit/lag_cutoff/coords/cutoff_km; rejects survey_design
    and wild_bootstrap with NotImplementedError.
  - On a 2-period panel, matches MPD(...).fit(post_periods=[1],
    reference_period=0) bit-exactly (atol=1e-10).

  - compute_robust_vcov / LinearRegression / TWFE / DiD now accept
    cluster_ids alongside vcov_type='conley'; meat applies
    K_total[i,j] = K_space(d_ij/h) * 1{c_i == c_j}.
  - Validator enforces cluster-time invariance on the panel block-
    decomposed path (cluster constant within unit across periods).
  - Per-slice mask construction (NOT full n×n) preserves memory on
    panel paths; serial-component mask is trivially all-ones under
    the invariance contract.
  - TWFE auto-cluster on Conley path still silently dropped; explicit
    cluster=<col> opts into combined kernel. DiD has no auto-cluster,
    so opt-in is fully explicit. DiD-vs-TWFE asymmetry documented.
  - linalg validator's conley + cluster_ids reject and twfe's
    explicit-cluster reject both dropped.

  - _compute_spatial_bartlett_meat_sparse uses
    scipy.spatial.cKDTree.query_ball_tree to build a CSR sparse
    kernel matrix instead of materializing the dense n×n distance.
  - Auto-activates for n > _CONLEY_SPARSE_N_THRESHOLD (5_000) AND
    metric in {haversine, euclidean} AND kernel == "bartlett".
  - Bartlett-only gate: bartlett at u=1 returns exactly 0 so the
    sparse path safely drops at-cutoff pairs; uniform at u=1 is 1
    and would require closed-interval query semantics incompatible
    with haversine chord projection roundoff.
  - Haversine projects to 3-D unit-sphere Cartesian; chord query
    radius matches arc-length cutoff with a 1e-12 relative epsilon
    for projection roundoff; exact great-circle is recomputed only
    for in-range neighbors.
  - Private _conley_sparse: Optional[bool] kwarg controls the toggle
    (None=auto, True=force, False=force dense; True with unsupported
    config raises).
  - Bit-identity parity vs dense at atol=1e-10 on synthetic fixtures;
    R parity at atol=1e-6 preserved on all 3 panel R fixtures with
    _conley_sparse=True forced.
  - Renames _CONLEY_DENSE_WARN_N -> _CONLEY_DENSE_OOM_WARN_N (20_000)
    to disambiguate from the new 5_000 sparse threshold; warning text
    differentiates sparse-eligible vs dense-fallback paths.

  - _validate_callable_metric_result checks shape (n,n), finite,
    non-negative, symmetric within atol=1e-10.
  - Each failure raises a targeted ValueError naming the violated
    invariant. Previously, malformed callables produced opaque BLAS
    errors deep in the pipeline.

Tests
  - tests/test_conley_vcov.py: 36 new tests across TestConleySparse,
    TestConleySparseRParityForced, TestConleyCluster,
    TestConleyDistanceMetrics extension.
  - Existing DiD-rejection / TWFE-explicit-cluster-reject /
    linalg-conley-cluster-reject tests flipped to behavioral asserts.
  - test_did_conley_matches_mpd_post_periods_1 locks the DiD-vs-MPD
    bit-exact agreement on a 2-period panel.
  - Full regression sweep (test_conley_vcov + test_linalg +
    test_estimators + test_estimators_vcov_type + test_methodology_twfe
    + test_linalg_hc2_bm): 511 passed.

Docs
  - docs/methodology/REGISTRY.md: new "Combined spatial + cluster
    product kernel" subsection with math + cluster-time-invariance
    contract + two-limit-fixture anchors; new "Performance / scale"
    subsection on the sparse path; new "Callable conley_metric
    validation" subsection; updated panel-API restrictions table;
    DiD-vs-TWFE asymmetry paragraph.
  - CHANGELOG.md: Unreleased Wave A entry; Phase 1+2 entry's
    "DiD continues to raise" / "cluster_ids raises" text updated to
    reflect the lifted rejects.
  - diff_diff/guides/llms.txt + llms-full.txt: DiD support + combined
    kernel + sparse path documented; restrictions list refreshed.
  - README.md: catalog one-line refresh.
  - TODO.md: rows for the four Wave A items removed; rows 121 (Conley
    + survey/weights, Bertanha-Imbens 2014) and 122 (SyntheticDiD
    Conley path, spatial-block bootstrap) retained for later waves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… leak

Address Wave A codex review round 1 findings.

P0 #1 — Sparse haversine wrong at cutoff > π·R_earth
  - `_compute_spatial_bartlett_meat_sparse` computed
    `chord_radius = 2·sin(cutoff/(2R))` directly. That function is
    monotone only on [0, πR]; for cutoff > π·R (~20,015 km, half-Earth
    circumference) it shrinks again, and the kd-tree silently drops
    pairs that still have positive Bartlett weight. The dense path
    saturates at πR via `_haversine_km`'s `np.clip(a, 0, 1)`; the
    sparse path now mirrors that by clamping the arc to π radians
    before the chord computation:
        arc_radians = min(cutoff / R_earth, π)
        chord_radius = 2 · sin(arc_radians / 2)
    At cutoff >= πR, chord_radius reaches 2 (sphere diameter), so the
    kd-tree captures all pairs.

P0 #2 — DiD + Conley + absorb leaks demeaned time into the helper
  - `DifferenceInDifferences.fit()` built `_conley_time_arr`,
    `_conley_unit_arr`, and `_conley_coords_arr` from `working_data`,
    but `absorb` demeans columns in `working_data` (including `time`
    when `time` is listed in `absorb`, or whenever `time` appears in
    `vars_to_demean`). The Conley helper would then partition the
    within-period spatial sandwich on residualized floats instead of
    the true pre/post periods. Now reads from the original `data`
    frame, mirroring `TwoWayFixedEffects.fit` which has the same
    FWL-composability contract: meat is computed on demeaned scores
    but the kernel grid uses raw coords + time/unit.

P2 — Regression coverage
  - `test_sparse_haversine_cutoff_above_half_earth_circumference`:
    forces sparse path with cutoff = 25,000 km (> π·R_earth ≈ 20,015 km);
    asserts dense/sparse parity at atol=1e-10.
  - `test_sparse_haversine_cutoff_at_exactly_half_earth_circumference`:
    boundary test at cutoff = π·R_earth exactly.
  - `test_did_conley_with_absorb_uses_raw_time_labels`: monkeypatches
    `_compute_conley_vcov` to capture the `time` arg, asserts the
    helper sees raw integer 0/1 labels (not absorb-demeaned floats).

P3 — Stale docstrings reconciled
  - `diff_diff/estimators.py` DiD class docstring `vcov_type` entry:
    "rejected at fit-time" → Wave A support description.
  - `diff_diff/estimators.py` MultiPeriodDiD class docstring `vcov_type`
    entry: dropped the "cluster= raises" restriction (Wave A #119 lifts
    it via the combined kernel).
  - `diff_diff/linalg.py` `compute_robust_vcov` docstring: `vcov_type`
    entry updated for the combined spatial + cluster kernel.
  - `diff_diff/linalg.py` `LinearRegression.__init__` docstring:
    Conley contract updated for combined cluster kernel + DiD support.

Full Wave A regression: 514 passed (up from 511 with the 3 new tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…el tests

Address CI codex review round 1 (PR #433).

P1 #1 — DiD.fit() missing unit-column existence check
  - The new Conley path validated `unit is not None` but never confirmed
    the named column actually exists, so a bad column name fell through
    to a raw `pandas` KeyError at `data[unit]` instead of a clear
    estimator-level message. Adds the same front-door guard that
    `MultiPeriodDiD.fit` and `TwoWayFixedEffects.fit` already have:
        if unit not in data.columns:
            raise ValueError(f"Unit column '{unit}' not found in data")

P1 #2 — Combined-kernel estimator-path coverage
  - The low-level helper and TWFE-with-cluster were covered, but the DiD
    and MultiPeriodDiD public estimator paths with explicit `cluster=`
    were not. Adds four new tests in `TestConleyCluster`:
      * test_did_combined_kernel_finite_se_and_cluster_name: DiD +
        Conley + cluster='region' produces finite SE, propagates
        cluster_name to result + to_dict(), differs from no-cluster
        baseline (combined kernel zeros out cross-cluster pairs).
      * test_did_combined_kernel_time_varying_cluster_raises: panel
        time-invariance contract enforced on the DiD path.
      * test_mpd_combined_kernel_finite_se_and_cluster_name: same
        finite-SE + cluster_name + to_dict() invariants on MPD.
      * test_mpd_combined_kernel_time_varying_cluster_raises: panel
        time-invariance contract on MPD.
  - test_did_conley_unknown_unit_column_raises (Code Quality P1
    regression): asserts the new ValueError fires before the bad
    pandas lookup.

Targeted regression: tests/test_conley_vcov.py +
test_estimators.py + test_methodology_twfe.py: 332 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1 [new in R2] — DiD.fit() didn't validate conley_coords shape or
column existence; malformed tuples or missing column names fell through
to opaque IndexError/KeyError downstream. Mirrors the unit-column guard
from R1.

Adds two checks in the DiD Conley block:
  - conley_coords must be a 2-element tuple/list of strings (raises if
    arity wrong or any element non-string).
  - Each named column must exist in `data` (raises ValueError naming
    the missing column).

Adds two regression tests in TestConleyEstimatorIntegration:
  - test_did_conley_unknown_coord_column_raises
  - test_did_conley_malformed_coord_tuple_raises (covers 1-element
    tuple and non-string element)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an explicit cluster=<col> is combined with vcov_type='conley',
the result summary now distinguishes the combined spatial + cluster
product kernel from bare Conley and names the cluster column:

  Bare Conley:
    "Conley spatial HAC (1999), lag_cutoff=1"

  Combined kernel (Wave A #119):
    "Conley spatial HAC (1999) + cluster product kernel at region, lag_cutoff=1"

Previously `_format_vcov_label()` ignored `cluster_name` on the Conley
path, so combined-kernel fits printed the same label as pure Conley.

Adds a regression test
`test_twfe_conley_with_cluster_summary_label_names_kernel_and_cluster`
covering the new label format. The existing bare-Conley summary test
now also asserts the absence of the "+ cluster product kernel" suffix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1 [new in R4] — `_validate_callable_metric_result` accepted any
symmetric / finite / non-negative matrix, even when d(i, i) > 0. The
Conley sandwich relies on K(0) = 1 to reduce the i=j term to the HC0
diagonal X_i ε_i² X_i'; a positive self-distance silently attenuates
the HC0 contribution by K(d_ii / h) < 1 and misstates Conley SEs.
Built-in metrics ("haversine", "euclidean") satisfy d(i, i) = 0 by
construction, so the existing parity tests didn't catch this gap on
the callable surface.

Adds a sixth check to the callable validator:
  |d(i, i)| <= 1e-10 for all i, else ValueError naming the violated
  invariant ("nonzero self-distance ... requires d(i, i) = 0 so the
  kernel reduces to K(0) = 1 on the HC0 diagonal contribution").

Tests:
  - test_callable_metric_nonzero_diagonal_raises: symmetric / finite /
    non-negative with d(i, i) = 0.5 raises.
  - test_callable_metric_near_zero_diagonal_accepted: sub-tolerance
    diagonal noise (1e-13) accepted, mirroring the symmetry contract.
  - test_pairwise_distance_callable updated: the constant_metric
    fixture now zeroes its diagonal (otherwise it would fail the new
    check); behaviorally equivalent off-diagonal coverage.

Docs:
  - docs/methodology/REGISTRY.md § "Callable conley_metric validation"
    extended: 6th check added to the list + a paragraph explaining the
    HC0-reduction rationale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P0 Methodology [new in R5] — `MultiPeriodDiD.fit` built
`_conley_coords_arr` / `_conley_time_arr` / `_conley_unit_arr` from
`working_data`, which is absorb-demeaned. If a caller listed any of
those columns (coords, time, or unit) in `absorb`, the Conley helper
would silently partition the within-period spatial sandwich on
residualized values instead of the true geography/period labels and
misstate Conley SEs.

This is the same raw-data contract the local R1 fix established for
DifferenceInDifferences (`estimators.py:L538-L561`); MultiPeriodDiD
shared the same pattern but wasn't updated in that pass — codex CI R5
caught the holistic-fix omission.

Now reads from `data` in MPD, mirroring DiD + TWFE
(`twfe.py:L370-L371`). FWL composability holds: the meat is computed
on demeaned scores but the kernel grid uses raw coords + time/unit.

Adds `test_mpd_conley_with_absorb_uses_raw_coords_and_time` regression:
monkeypatches `_compute_conley_vcov` to capture the time + coords args,
asserts the helper sees raw integer time labels (0..T-1, not absorb-
demeaned floats) AND n_units distinct (lat, lon) pairs (not collapsed
to one demeaned mean per unit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… P3 uniform-kernel limit fix

Three findings on R6, addressed via pattern-level fixes rather than
single-site patches, after 5 cross-round findings across 2 mirror
classes (validation-gap, raw-data-vs-working-data) signaled the symptoms
were drifting between estimator surfaces.

P1 (Code Quality) — shared estimator validator
  Validation gaps mirrored across DiD (R1, R2) and MPD/TWFE (R6) under
  the same `vcov_type='conley'` entry-point contract: missing/unknown
  unit column, malformed conley_coords arity, missing coord column,
  survey_design rejection, wild_bootstrap rejection. Replaces three
  separate inline blocks (one per estimator) with a single
  `_validate_conley_estimator_inputs(...)` helper in `conley.py`,
  called from `DifferenceInDifferences.fit()`,
  `MultiPeriodDiD.fit()`, and `TwoWayFixedEffects.fit()`. Single
  source of truth for the seven-check union; future estimator
  surfaces (HAD-Conley, SDID-Conley) pick up the same contract by
  one-line opt-in.

P2 (Performance) — sparse density gate
  The sparse k-d-tree path's CSR storage (~12 bytes/nnz: data +
  indices + indptr) loses its memory advantage over dense float64
  (8 bytes/cell) at ~67% density; at high density "sparse" can
  silently use MORE memory than dense. Adds
  `_CONLEY_SPARSE_DENSITY_THRESHOLD = 0.3` (well below break-even
  for a comfortable safety margin). After kd-tree build,
  `cKDTree.count_neighbors` (shares traversal with the subsequent
  `query_ball_tree`) measures actual neighbor density; if it exceeds
  the threshold, `_compute_spatial_bartlett_meat_sparse` returns
  None, the dispatcher falls back to the dense path, and a
  UserWarning surfaces the reason. Locks the auto-sparse contract:
  it only fires when it actually saves memory.

P3 (Documentation/Tests) — uniform kernel for huge-cutoff limit
  `test_combined_kernel_reduces_to_pure_cluster_at_huge_cutoff` used
  Bartlett at cutoff=1e9; Bartlett gives K=1-d/h<1 for d>0, so the
  reduction to pure CR1 is asymptotic, not exact. Switched to
  uniform kernel (K=1 exactly for all in-cutoff pairs). REGISTRY
  note clarifies the kernel choice rationale.

Tests:
  - test_sparse_density_gate_falls_back_to_dense_and_warns:
    tight-cluster + huge cutoff → 100% density → fallback +
    warning + result equals dense.
  - test_sparse_density_gate_does_not_trigger_below_threshold:
    realistic cutoff → no fallback / no warning.

Full regression sweep: 493 passed (no regressions across
conley_vcov, linalg, estimators, estimators_vcov_type,
methodology_twfe).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ence

P1 (Code Quality) [new in R7] — `cluster=<col>` is load-bearing on the
Conley combined-kernel paths (Wave A #119) but none of DiD / MPD / TWFE
validated that the named cluster column exists in `data` before the
downstream `data[self.cluster]` access. A typo like
`cluster="missing_region"` fell through to a raw pandas KeyError
instead of the estimator-level ValueError pattern the rest of the
Conley validation surface now uses.

Same class as R1's unit-column guard and R2/R6's conley_coords guard:
extends the shared `_validate_conley_estimator_inputs` helper added
in R6 with an 8th check `if cluster is not None and cluster not in
data.columns: raise ValueError("Cluster column ... not found in data")`.
The three call sites in DiD/MPD/TWFE now pass `cluster=self.cluster`
through and pick up the new guard via one-line opt-in. Future Conley
surfaces that add cluster support get the validator's behavior for
free.

Tests: regressions on all three estimator surfaces (DiD/MPD/TWFE)
asserting `cluster="missing_region"` raises the estimator-level
ValueError before any pandas-level error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gressions

Two related P1s on R8 (both about the sparse+cluster interaction).

P1 (Performance) — density gate is now cluster-aware
  The R6 density gate counted ALL spatial in-range pairs before the
  cluster mask was applied, but the actual CSR matrix stores only
  within-cluster pairs (the inner row-loop drops cross-cluster
  neighbors before populating row/col/data lists). On large clustered
  panels — exactly the use case sparse+cluster is supposed to win —
  spatial density could be ~100% while within-cluster density is tiny,
  and we'd spuriously fall back to dense O(n²) work.

  Fix: when the unrefined spatial density exceeds the threshold AND
  `cluster_codes is not None`, refine by counting only within-cluster
  in-range pairs (per-cluster kd-trees, summed). The refinement runs
  only when needed (the gate is already cleared if spatial density is
  below threshold), so the per-cluster sub-tree builds are amortized.

  Stores the projected coordinates as `tree_coords` once on the
  haversine branch (was `xyz` locally before; renaming makes the
  per-cluster build symmetric across metrics).

P1 (Documentation/Tests) — sparse+cluster regression coverage
  Sparse coverage previously exercised unclustered dense-vs-sparse
  parity; cluster coverage previously exercised dense/estimator
  behavior — but the combined sparse+cluster path was load-bearing
  and untested. Adds three regressions:
    - test_sparse_with_cluster_matches_dense: cross-sectional 600-row
      panel, 8 clusters, dense vs forced-sparse parity at atol=1e-10.
    - test_sparse_with_cluster_panel_matches_dense: 3-period block-
      decomposed panel, 200 units, 5 time-invariant clusters, dense
      vs forced-sparse parity at atol=1e-10.
    - test_sparse_density_gate_cluster_aware: tight spatial cluster
      (100% global density) + 50 small clusters → within-cluster
      density << 30%. Locks the refinement contract: NO density
      warning, sparse path executes, result matches dense.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P3 (Documentation/Tests, cosmetic) — pure docs polish.

(1) docs/methodology/REGISTRY.md L3083-3092: huge-cutoff CR1 limit
    expression had an erroneous trailing prime on X_g, producing a
    dimensionally-inconsistent form. The CR1 meat for cluster g is
    `(X_g' ε_g)(X_g' ε_g)' = X_g' ε_g ε_g' X_g` (the last X_g is
    unprimed; the result is (k, k)). Fix: drop the trailing prime.

(2) CHANGELOG.md Wave A entry: mirror two REGISTRY clarifications that
    were already in the methodology document but missing from the
    release note prose:
    - "Huge-cutoff reduction is EXACT only for `conley_kernel="uniform"`;
       for `bartlett` the identity is asymptotic since K_bartlett(u) < 1
       for u > 0. The fixture anchor uses uniform for an exact identity
       check."
    - "Callable conley_metric validation now requires zero diagonal
       (`|d(i, i)| ≤ 1e-10`); rationale is the K(0) = 1 reduction to
       the HC0 diagonal X_i ε_i² X_i'."

Implementation and tests are correct; this is documentation-only
alignment with the codex R9 P3 finding. Loop stops here — R9 was
already green per the stop criterion, but the user opted to polish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two P3 doc-only items from R10 (final ✅ Looks good review).

(1) diff_diff/guides/llms-full.txt: the new DiD-Conley example used
    DifferenceInDifferences(...) but the import block at the top of
    the code sample didn't include it (only LinearRegression,
    MultiPeriodDiD, TwoWayFixedEffects). Example wasn't runnable as
    written. Added DifferenceInDifferences to the imports.

(2) Doc-contract drift across three summary surfaces — the main
    methodology subsections already documented these contracts
    correctly, but downstream summary bullets/release notes were stale:
    - llms-full.txt:L2031 callable-validation bullet now spells out
      "shape (n, n), finite, non-negative, symmetric to atol=1e-10,
      AND zero on the diagonal" (was missing the zero-diagonal rule).
    - REGISTRY.md edge-case bullet for callable now names "non-zero-
      diagonal matrix" alongside the other rejection conditions, with
      pointer back to the main subsection.
    - CHANGELOG.md Wave A item (2) (combined kernel) now explicitly
      lists MultiPeriodDiD(vcov_type="conley", cluster="<col>") along-
      side DiD/TWFE; the MPD path is implemented + tested but was
      previously omitted from the surface enumeration.

Implementation and tests unchanged; pure docs alignment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the spillover-conley-wave-a branch from cd08440 to 3b05cd2 Compare May 14, 2026 21:23
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 3b05cd28cd432985103154d433977b0a27bcea24


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review: the earlier P3 doc drift from the last AI pass appears addressed. The Conley guide now imports DifferenceInDifferences, documents the zero-diagonal callable contract, and the changelog now names the shipped MPD combined-kernel surface in diff_diff/guides/llms-full.txt:L1915-L1962, diff_diff/guides/llms-full.txt:L2030-L2031, and CHANGELOG.md:L12-L13.
  • Affected methods are ConleySpatialHAC, the panel block-decomposed Conley form, and the estimator wiring for DifferenceInDifferences, MultiPeriodDiD, and TwoWayFixedEffects. I did not find any unmitigated P0/P1 methodology issues after cross-checking against docs/methodology/REGISTRY.md:L2954-L3196.
  • The earlier raw-label/absorb correctness risk looks closed: DiD, MPD, and TWFE now all source Conley time/unit/coord arrays from raw data rather than absorbed working frames in diff_diff/estimators.py:L504-L553, diff_diff/estimators.py:L1611-L1639, and diff_diff/twfe.py:L346-L379.
  • Remaining issues are minor P3s: one contradictory MPD warning/error path and one user-facing doc restriction drift.
  • Static review only; I did not execute the test suite in this sandbox.

Methodology

No findings. The documented Wave A extensions in docs/methodology/REGISTRY.md:L3071-L3196 are reflected in the implementation at diff_diff/conley.py:L562-L1006, diff_diff/estimators.py:L385-L553, diff_diff/estimators.py:L1460-L1661, and diff_diff/twfe.py:L169-L398.

Code Quality

  • Severity: P3
  • Impact: MultiPeriodDiD(vcov_type="conley", inference="wild_bootstrap") first warns that it will fall back to analytical inference at diff_diff/estimators.py:L1251-L1260, but the later Conley validator still raises NotImplementedError at diff_diff/estimators.py:L1465-L1478. That produces contradictory guidance on the same call.
  • Concrete fix: Skip the generic analytical-fallback warning when self.vcov_type == "conley" or otherwise centralize the Conley inference decision in one branch, then add a regression that asserts the intended MPD+Conley+wild_bootstrap behavior cleanly.

Performance

No findings.

Maintainability

No findings. The new shared _validate_conley_estimator_inputs helper in diff_diff/conley.py:L201-L310 reduces the validation drift that existed across DiD/MPD/TWFE.

Tech Debt

No findings. Remaining deferred Conley work is still explicitly tracked in TODO.md:L119-L120, so I would not block on it.

Security

No findings.

Documentation/Tests

  • Severity: P3
  • Impact: the user-facing restriction bullets still understate the MultiPeriodDiD Conley reject surface. diff_diff/guides/llms-full.txt:L2022-L2029 only names DiD/TWFE for the inference="wild_bootstrap" rejection and only names DiD for the estimator-level survey_design= rejection, while code rejects MPD on the same Conley combinations at diff_diff/estimators.py:L1465-L1478. docs/methodology/REGISTRY.md:L3182-L3190 is also partially inconsistent with its earlier generic restriction at docs/methodology/REGISTRY.md:L2978-L2979.
  • Concrete fix: Harmonize the guide and registry bullets so they consistently say DifferenceInDifferences / MultiPeriodDiD / TwoWayFixedEffects reject Conley + inference="wild_bootstrap", and make the estimator-level survey_design= rejection equally explicit for MPD/TWFE.

…ey path + harmonize doc restriction bullets

Two P3 doc-and-UX items from R11 (✅ Looks good overall).

P3 #1 (Code Quality) — contradictory warning + error on MPD+Conley
+wild_bootstrap
  `MultiPeriodDiD.fit()` had an unconditional UserWarning at the top
  of fit() saying wild_bootstrap "is not yet supported, using
  analytical instead" — but the Conley validator below raises
  NotImplementedError on `vcov_type="conley", inference="wild_bootstrap"`.
  Users hitting that combination got "warn we're falling back" followed
  immediately by "actually we're raising," which is contradictory
  guidance on the same call.

  Fix: gate the warning on `self.vcov_type != "conley"`. The Conley
  validator's raise takes precedence. Adds regression
  `test_mpd_conley_wild_bootstrap_raises_without_warning` asserting
  the NotImplementedError fires with no analytical-fallback warning.

P3 #2 (Documentation/Tests) — restriction bullets understated MPD's
reject surface
  - `llms-full.txt:L2026` only named DiD/TWFE for the
    `inference="wild_bootstrap"` reject; MPD also rejects this combo
    via the shared validator. Updated to enumerate all three.
  - `llms-full.txt:L2027` only named DiD for the estimator-level
    `survey_design=` reject; MPD and TWFE both raise the same on this
    combo. Updated to enumerate all three.
  - `docs/methodology/REGISTRY.md:L3188` had the same drift; updated
    in parallel and added a parenthetical noting the MPD warning
    suppression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 054e2ee962bf0799989ab41f0783ae50637fcda2


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review outcome: the prior P3s appear addressed. The MultiPeriodDiD Conley path now suppresses the contradictory wild-bootstrap fallback warning in diff_diff/estimators.py:L1251-L1265, and the user-facing restrictions are harmonized in diff_diff/guides/llms-full.txt:L2022-L2031 and docs/methodology/REGISTRY.md:L3183-L3197.
  • Affected methods are ConleySpatialHAC, the R conleyreg-form panel block-decomposed sandwich, the documented combined spatial + cluster product kernel, the sparse Bartlett fast path, and callable metric validation.
  • I did not find any unmitigated P0/P1 methodology issues. The implementation in diff_diff/conley.py:L368-L1006, estimator wiring in diff_diff/estimators.py:L385-L552, diff_diff/estimators.py:L1465-L1665, and diff_diff/twfe.py:L169-L419, and the Registry text in docs/methodology/REGISTRY.md:L2965-L3197 are aligned.
  • The earlier raw-label / absorb composability correctness risk remains closed: DiD and MPD now source Conley coords/time/unit from raw data, matching TWFE’s existing pattern in diff_diff/estimators.py:L504-L552, diff_diff/estimators.py:L1616-L1644, and diff_diff/twfe.py:L346-L373.
  • Static review only. I could not execute the tests here because this sandbox does not have pytest, and the Python environment also lacks the project deps (numpy import failed).

Methodology

  • No findings. The documented deviations and extensions in docs/methodology/REGISTRY.md:L3015-L3048 and docs/methodology/REGISTRY.md:L3071-L3197 are reflected in code, including the panel time-label normalization, combined-kernel contract, sparse bartlett-only gate, and callable zero-diagonal requirement.

Code Quality

  • No findings. The shared _validate_conley_estimator_inputs() helper in diff_diff/conley.py:L201-L310 removes the validation drift that had previously existed across DiD / MPD / TWFE entry points.

Performance

  • No findings. The sparse Bartlett implementation, density gate, and dense fallback behavior are internally consistent with the documented performance contract in diff_diff/conley.py:L368-L559 and docs/methodology/REGISTRY.md:L3116-L3159.

Maintainability

  • No findings. Variance provenance is now surfaced consistently for the changed result surfaces through cluster_name / conley_lag_cutoff handling in diff_diff/results.py:L49-L89, diff_diff/results.py:L269-L312, and diff_diff/results.py:L708-L758.

Tech Debt

  • No findings. The remaining Conley follow-ups are still explicitly tracked in TODO.md:L119-L120, and the completed Wave A items were removed from the table.

Security

  • No findings.

Documentation/Tests

  • No findings. The earlier documentation drift is resolved in diff_diff/guides/llms-full.txt:L2022-L2031 and docs/methodology/REGISTRY.md:L3183-L3197.
  • Validation note: test execution was not possible in this sandbox because pytest and the Python dependencies are unavailable.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 14, 2026
@igerber igerber merged commit 9ac8f68 into main May 14, 2026
31 of 32 checks passed
@igerber igerber deleted the spillover-conley-wave-a branch May 14, 2026 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant