Conley Wave A: DiD/cluster combo, sparse k-d-tree, callable metric by igerber · Pull Request #433 · igerber/diff-diff

igerber · 2026-05-14T16:34:33Z

Summary

DiD + Conley (Fix NaN t-statistics across 7 locations for consistent undefined inference #118): DifferenceInDifferences(vcov_type="conley").fit(..., unit="<col>") lifts the prior fit-time redirect. unit is a fit-time kwarg (not on __init__; unused unless Conley is set; not part of get_params / set_params). On a 2-period panel, ATT + SE match MultiPeriodDiD(...).fit(post_periods=[1], reference_period=0) bit-exactly.
Combined spatial + cluster product kernel (Align TROP lambda conventions with paper (Athey et al. 2025) #119): compute_robust_vcov / LinearRegression / TWFE / DiD now accept cluster_ids / cluster= alongside vcov_type="conley". Meat applies K_total[i,j] = K_space(d_ij/h) · 1{c_i = c_j}. On the panel block-decomposed path the validator enforces cluster-time invariance within each unit (raises ValueError otherwise). TWFE's default auto-cluster on the Conley path is still silently dropped; explicit cluster=<col> opts into the combined kernel.
Sparse k-d-tree fast path (Add git worktree management slash commands #120): auto-activates for the spatial Bartlett meat when n > 5_000 AND metric ∈ {haversine, euclidean} AND kernel = bartlett. Builds a CSR sparse kernel matrix via scipy.spatial.cKDTree.query_ball_tree; haversine projects to a 3-D unit-sphere chord representation with the arc clamped at π (so cutoffs above half-Earth circumference still capture all geometrically eligible pairs). Bit-identity parity vs dense at atol=1e-10; R parity at atol=1e-6 preserved with the sparse path force-enabled on the 3 panel R fixtures. Constants: _CONLEY_SPARSE_N_THRESHOLD = 5_000; _CONLEY_DENSE_WARN_N renamed to _CONLEY_DENSE_OOM_WARN_N = 20_000 to disambiguate from the new sparse threshold.
Callable conley_metric validation (Fix worktree-rm: detect squash-merged branches via GitHub PR status #123): _validate_callable_metric_result enforces shape (n, n), finite, non-negative, symmetric to atol=1e-10. Each failure raises a targeted ValueError naming the violated invariant — replaces the prior opaque BLAS errors for malformed callables.

Methodology references (required if estimator / math changes)

Method name(s): Conley (1999) spatial-HAC sandwich, panel block-decomposed form, combined spatial + cluster product kernel
Paper / source link(s): Conley, T. G. (1999). GMM Estimation with Cross-Sectional Dependence. Journal of Econometrics 92(1), 1-45 (https://doi.org/10.1016/S0304-4076(98)00084-0). R conleyreg (Düsterhöft 2021, CRAN v0.1.9, https://github.com/cdueben/conleyreg) — parity benchmark. Stata acreg (Colella, Lalive, Sakalli, Thoenig 2019, IZA DP 12584) — combined-kernel parallel canonical implementation (not parity-tested).
Any intentional deviations from the source (and why): (a) Combined spatial + cluster product kernel — diff-diff convention; R conleyreg does not support this combination. Validated by two limit fixtures (all-unique-clusters reduces to HC0; huge-cutoff reduces to pure within-cluster CR1) rather than R parity. (b) Sparse haversine arc clamped at π — matches dense path's np.clip(a, 0, 1) saturation; preserves correctness for cutoffs above half-Earth circumference. (c) Panel time-invariance contract on cluster_ids — required so the within-unit serial sandwich's cluster mask is trivially all-ones; cross-sectional path has no such constraint. All documented as **Note:** / **Deviation from R:** labels in docs/methodology/REGISTRY.md § ConleySpatialHAC.

Validation

Tests added/updated: tests/test_conley_vcov.py extended by ~1100 lines covering TestConleySparse (sparse k-d-tree + arc-clamp edge cases), TestConleySparseRParityForced (R parity with sparse forced), TestConleyCluster (combined kernel + time-invariance contract), TestConleyDistanceMetrics (callable validation), plus DiD-with-Conley integration tests (raw-time-label propagation under absorb, MPD parity at atol=1e-10).
Backtest / simulation / notebook evidence (if applicable): R conleyreg parity at atol=1e-6 preserved on the 6 existing benchmark fixtures (3 cross-sectional + 3 panel) under both dense and forced-sparse paths. Bit-identity dense ↔ sparse parity at atol=1e-10 on synthetic euclidean + haversine fixtures including the boundary case cutoff = π·R_earth.
Local codex AI review (2 rounds): R1 surfaced 2 P0 + 1 P2 + 1 P3; R2 confirmed all addressed with no new findings. Full targeted regression sweep: 514 passed across tests/test_conley_vcov.py, tests/test_linalg.py, tests/test_estimators.py, tests/test_estimators_vcov_type.py, tests/test_methodology_twfe.py, tests/test_linalg_hc2_bm.py.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

github-actions · 2026-05-14T16:42:17Z

Overall Assessment
⚠️ Needs changes

Executive Summary

Using docs/methodology/REGISTRY.md and the in-code Conley docstrings/references, I did not find an undocumented change to the Conley variance formulas or SE construction; the main deviations introduced here are documented.
The new DifferenceInDifferences(vcov_type="conley") path adds a fit-time unit contract but never validates that the named unit column actually exists, so bad inputs fall through to a raw pandas KeyError.
The new vcov_type="conley" + cluster= estimator plumbing is only regression-tested at the low level and on TWFE; the DiD and MultiPeriodDiD public estimator paths are still untested.
I did not find a new inline-inference / partial-NaN anti-pattern in the changed code.
I could not execute the test suite in this sandbox because the environment lacks pandas.

Methodology

Severity: P3 (informational). Methods affected: ConleySpatialHAC cross-sectional/panel variance, DiD/MPD/TWFE estimator wiring, the combined spatial+cluster product kernel, the sparse Bartlett path, and callable metric validation. These non-paper / non-R behaviors are explicitly documented in docs/methodology/REGISTRY.md:L2963-L2972, docs/methodology/REGISTRY.md:L3005-L3016, and docs/methodology/REGISTRY.md:L3053-L3135, so I do not count them as defects. Impact: none. Concrete fix: none.

Code Quality

Severity: P1. DifferenceInDifferences.fit validates only that unit is provided, not that the named column exists. The new Conley path checks unit is not None at diff_diff/estimators.py:L394-L427, then later indexes data[unit] at diff_diff/estimators.py:L535-L543; unlike the peer panel paths, it never performs the front-door existence check already present in MultiPeriodDiD and TwoWayFixedEffects at diff_diff/estimators.py:L1288-L1291 and diff_diff/twfe.py:L136-L138. The added tests cover unit=None (tests/test_conley_vcov.py:L1069-L1079, tests/test_conley_vcov.py:L1818-L1863) but not an invalid unit column name. Impact: bad inputs on a new public API path error out as raw pandas exceptions instead of clear estimator-level validation, which is missing edge-case handling in production. Concrete fix: add if unit not in data.columns: raise ValueError(...) in DifferenceInDifferences.fit before Conley array extraction, and add a regression for unit="missing_unit".

Performance

No findings.

Maintainability

No findings beyond the validation inconsistency above.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Severity: P1. The new combined-kernel estimator support is not regression-tested across all affected public estimator paths. The PR/docs now advertise explicit cluster= support on DiD and MPD/TWFE Conley paths (docs/methodology/REGISTRY.md:L2947-L2959, docs/methodology/REGISTRY.md:L3141-L3142, diff_diff/estimators.py:L1124-L1139), but the added estimator-level tests cover DiD without cluster= (tests/test_conley_vcov.py:L1052-L1160), MPD without cluster= (tests/test_conley_vcov.py:L1196-L1517), and TWFE with cluster= (tests/test_conley_vcov.py:L1562-L1586). The low-level helper is covered (tests/test_conley_vcov.py:L872-L896), but the DiD and MPD estimator plumbing is not. Impact: two new public parameter-interaction paths can regress without detection even though the low-level kernel math is tested. Concrete fix: add direct regressions for DifferenceInDifferences(vcov_type="conley", cluster=...) and MultiPeriodDiD(vcov_type="conley", cluster=...), including a positive case with a time-invariant above-unit cluster and a negative case where cluster varies within unit and must raise ValueError.
Review note: I did not execute the suite in this environment because the sandbox is missing pandas.

Path to Approval

Add front-door unit column existence validation in DifferenceInDifferences.fit, mirroring the MPD/TWFE guards, and cover it with a regression test.
Add estimator-level combined-kernel tests for DiD and MultiPeriodDiD, including both success and time-varying-cluster rejection cases.

…el tests Address CI codex review round 1 (PR #433). P1 #1 — DiD.fit() missing unit-column existence check - The new Conley path validated `unit is not None` but never confirmed the named column actually exists, so a bad column name fell through to a raw `pandas` KeyError at `data[unit]` instead of a clear estimator-level message. Adds the same front-door guard that `MultiPeriodDiD.fit` and `TwoWayFixedEffects.fit` already have: if unit not in data.columns: raise ValueError(f"Unit column '{unit}' not found in data") P1 #2 — Combined-kernel estimator-path coverage - The low-level helper and TWFE-with-cluster were covered, but the DiD and MultiPeriodDiD public estimator paths with explicit `cluster=` were not. Adds four new tests in `TestConleyCluster`: * test_did_combined_kernel_finite_se_and_cluster_name: DiD + Conley + cluster='region' produces finite SE, propagates cluster_name to result + to_dict(), differs from no-cluster baseline (combined kernel zeros out cross-cluster pairs). * test_did_combined_kernel_time_varying_cluster_raises: panel time-invariance contract enforced on the DiD path. * test_mpd_combined_kernel_finite_se_and_cluster_name: same finite-SE + cluster_name + to_dict() invariants on MPD. * test_mpd_combined_kernel_time_varying_cluster_raises: panel time-invariance contract on MPD. - test_did_conley_unknown_unit_column_raises (Code Quality P1 regression): asserts the new ValueError fires before the bad pandas lookup. Targeted regression: tests/test_conley_vcov.py + test_estimators.py + test_methodology_twfe.py: 332 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-14T17:03:43Z

🔁 AI review rerun (requested by @igerber)

Head SHA: be1b527f5af28ecf11377da8882491c11fcb5ff4

Overall Assessment
⚠️ Needs changes

Executive Summary

Previous P1 items appear addressed: DiD now front-door validates unit, and estimator-level combined-kernel tests were added for DiD/MPD (diff_diff/estimators.py:L394-L403, tests/test_conley_vcov.py:L1081-L1100, tests/test_conley_vcov.py:L3210-L3309).
Methodology-wise, I did not find an undocumented change to the Conley math or SE construction. The combined kernel, time-label normalization, sparse Bartlett path, and callable-metric validation are explicitly documented in docs/methodology/REGISTRY.md:L2963-L2972, docs/methodology/REGISTRY.md:L3005-L3016, and docs/methodology/REGISTRY.md:L3053-L3135.
Severity: P1 [Newly identified]. The new DifferenceInDifferences(vcov_type="conley") path validates only that conley_coords is non-None; malformed tuples or missing coord columns still fall through to raw IndexError/KeyError during coordinate extraction at diff_diff/estimators.py:L410-L415 and diff_diff/estimators.py:L537-L545.
I could not execute the tests in this sandbox because pandas is unavailable.

Methodology

Severity: P3 (informational). Impact: none. Concrete fix: none. The non-paper / non-R choices introduced here are documented rather than implicit: DiD-vs-TWFE cluster asymmetry (docs/methodology/REGISTRY.md:L2963-L2972), dense-code time normalization (docs/methodology/REGISTRY.md:L3005-L3016), combined spatial + cluster kernel (docs/methodology/REGISTRY.md:L3053-L3090), sparse Bartlett-only kd-tree path (docs/methodology/REGISTRY.md:L3092-L3121), and callable-metric boundary validation (docs/methodology/REGISTRY.md:L3123-L3135). I did not find an undocumented estimator, weighting, or variance deviation in diff_diff/conley.py:L225-L375 and diff_diff/conley.py:L584-L816.

Code Quality

Severity: P1 [Newly identified]. Impact: bad conley_coords inputs on the newly added DiD Conley path still fail as raw container/indexing exceptions instead of estimator-level ValueError, which is the same production-validation class the earlier unit fix addressed. DifferenceInDifferences.fit only checks conley_coords is not None at diff_diff/estimators.py:L410-L415, then blindly indexes self.conley_coords[0], self.conley_coords[1], and data[...] at diff_diff/estimators.py:L537-L545. The updated DiD validation table in tests/test_conley_vcov.py:L1839-L1884 covers missing cutoff/lag/unit but not malformed or absent coord columns. Concrete fix: validate at fit-time that conley_coords is a length-2 tuple/list of existing column names before building _conley_coords_arr, and add regressions for conley_coords=("missing_lat", "lon") and a malformed 1-element tuple.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 (informational). Impact: none. Concrete fix: none. The remaining unsupported Conley surfaces are still properly tracked in TODO.md:L119-L120, so I do not treat them as blockers.

Security

No findings.

Documentation/Tests

No additional findings. The previous review’s estimator-level combined-kernel test gap appears resolved by tests/test_conley_vcov.py:L3210-L3309, and the earlier DiD unit validation gap is covered at tests/test_conley_vcov.py:L1081-L1100.
Review note: I could not run the suite here because pandas is not installed in the sandbox.

Path to Approval

Add front-door DiD validation for conley_coords: require a 2-element column-name tuple/list and verify both columns exist before data[...] extraction in diff_diff/estimators.py.
Add regression tests for invalid DiD conley_coords inputs, at minimum a missing column name and a malformed tuple shape.

github-actions · 2026-05-14T17:24:51Z

🔁 AI review rerun (requested by @igerber)

Head SHA: eea4463a8b62256a183491766a48a2ae6fc55ed4

Overall Assessment
✅ Looks good

Executive Summary

Re-review: the prior DiD front-door validation finding is resolved. DifferenceInDifferences.fit() now validates unit and conley_coords up front at diff_diff/estimators.py:L394-L431, with regressions for missing/unknown/malformed inputs at tests/test_conley_vcov.py:L1069-L1155.
The prior absorb/raw-time-label correctness issue is also resolved. The DiD Conley path now reads raw time/unit/coord columns from data, not absorb-demeaned working_data, at diff_diff/estimators.py:L538-L561, with a dedicated regression at tests/test_conley_vcov.py:L1203-L1249.
Methodology alignment looks sound. The new DiD support, combined spatial+cluster product kernel, sparse Bartlett kd-tree path, and callable-metric validation are all documented in docs/methodology/REGISTRY.md:L2947-L3145, and the implementation matches those documented contracts in diff_diff/conley.py:L118-L191, diff_diff/conley.py:L225-L375, diff_diff/conley.py:L507-L581, and diff_diff/conley.py:L695-L777.
I did not find any unmitigated P0/P1 issues in the estimator math, SE construction, or inference handling.
One minor P3 remains: summary() does not distinguish pure Conley from the new Conley+cluster path because the Conley variance label ignores cluster_name in diff_diff/results.py:L77-L83.

Methodology

Severity: P3 (informational). Impact: none. Concrete fix: none. The PR’s non-paper / non-R choices are documented rather than implicit: DiD support and panel restrictions docs/methodology/REGISTRY.md:L2947-L2972, time-label normalization docs/methodology/REGISTRY.md:L3005-L3016, combined spatial+cluster kernel docs/methodology/REGISTRY.md:L3053-L3090, sparse Bartlett path docs/methodology/REGISTRY.md:L3092-L3121, and callable-metric validation docs/methodology/REGISTRY.md:L3123-L3135. I did not find an undocumented methodology deviation in the changed estimator/Conley code.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity: P3. Impact: human-readable summaries for explicit-cluster Conley fits still print the same generic Conley label as pure Conley because _format_vcov_label() ignores cluster_name for vcov_type=="conley" at diff_diff/results.py:L77-L83, even though summary rendering passes cluster_name into that helper at diff_diff/results.py:L217-L230. The new tests verify cluster_name/to_dict() for combined-kernel fits at tests/test_conley_vcov.py:L1638-L1662 and tests/test_conley_vcov.py:L3265-L3335, but the only summary regression still covers plain Conley at tests/test_conley_vcov.py:L1727-L1745.
Concrete fix: include cluster_name in the Conley summary label (for example, “Conley spatial HAC + cluster product kernel at , lag_cutoff=L”) and add a summary regression test for an explicit-cluster Conley fit.

Tech Debt

No findings. Remaining Conley follow-ups are still properly tracked in TODO.md:L119-L120.

Security

No findings.

Documentation/Tests

No findings.
Review note: I could not execute tests/test_conley_vcov.py in this sandbox because pytest and numpy are unavailable.

github-actions · 2026-05-14T18:08:51Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 0a8b4389d66e8f18c5e865469b4747f64e4a0665

Overall Assessment
⚠️ Needs changes

Executive Summary

Re-review: the prior DiD Conley front-door validation, absorb/raw-time handling, and Conley summary-label issues look resolved in diff_diff/estimators.py:L394-L587, diff_diff/results.py:L49-L88, and the corresponding regressions at tests/test_conley_vcov.py:L1174-L1249 and tests/test_conley_vcov.py:L1750-L1768.
The Wave A methodology changes are documented in docs/methodology/REGISTRY.md:L2947-L3151, and the DiD wire-up, combined spatial+cluster kernel, and sparse Bartlett path generally match those documented contracts in diff_diff/conley.py:L378-L816, diff_diff/estimators.py:L385-L587, and diff_diff/twfe.py:L169-L381.
Severity P1 [Newly identified]: the new callable conley_metric validator omits the zero-diagonal/self-distance invariant. That leaves malformed callables that still pass validation and can silently misstate Conley SEs by downweighting the HC0 diagonal.
Remaining deferred non-blockers are appropriately tracked in TODO.md:L119-L120.

Methodology

Severity: P1 [Newly identified]. Impact: _validate_callable_metric_result() accepts any symmetric, finite, non-negative (n, n) matrix, even when d(i,i) > 0. But the Conley estimator relies on d(i,i)=0 so the i=j diagonal contributes the HC0 term via K(0)=1; that invariant is explicitly relied on by the registry’s variance formula and the existing diagonal-reduction test. A callable with positive diagonal therefore passes the new boundary validation yet can silently attenuate or eliminate the HC0 diagonal contribution, yielding incorrect Conley SEs. This is not mitigated by REGISTRY.md; the current registry text at L3123-L3135 describes only the partial check set, not a deliberate deviation. Concrete fix: extend the validator to require a zero diagonal within tolerance, raise a targeted ValueError when violated, and document that invariant alongside the other callable checks. diff_diff/conley.py:L117-L167, docs/methodology/REGISTRY.md:L2976-L2980, docs/methodology/REGISTRY.md:L3123-L3135, tests/test_conley_vcov.py:L183-L257, tests/test_conley_vcov.py:L2217-L2233

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Remaining Conley follow-ups are properly tracked in TODO.md:L119-L120.

Security

No findings.

Documentation/Tests

No additional findings beyond the methodology item above. Review note: I did not execute the test suite in this sandbox because numpy/pytest are unavailable.

Path to Approval

In diff_diff/conley.py, reject callable distance outputs whose diagonal is nonzero within tolerance, with an explicit error message that names the d(i,i)=0 contract.
Add a regression in tests/test_conley_vcov.py for a symmetric/finite/non-negative callable with positive diagonal, and update docs/methodology/REGISTRY.md so the callable contract explicitly includes zero self-distance.

github-actions · 2026-05-14T18:25:37Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 484620ecfd00afe4ed7c26ca2bc7b6514f9cba72

Overall Assessment
⛔ Blocker

Executive Summary

Re-review: the prior callable-conley_metric diagonal-invariant issue is resolved in diff_diff/conley.py (L117-L186), documented in docs/methodology/REGISTRY.md (L3123-L3142), and regression-covered in tests/test_conley_vcov.py (L253-L288).
The Wave A Conley changes themselves are generally aligned with the Methodology Registry: DiD Conley support, the documented spatial+cluster product kernel deviation, and the sparse Bartlett implementation all match the stated contract.
Severity P0 [Newly identified]: MultiPeriodDiD still builds Conley kernel inputs from working_data after absorb demeaning, so overlapping covariates can silently corrupt the geography/time inputs and misstate Conley SEs.
I could not execute pytest here: this sandbox is missing pytest and runtime Python deps such as numpy, so this is a static review.

Methodology

Severity: P0 [Newly identified]. Impact: in MultiPeriodDiD, the absorb path demeans every listed covariate in working_data at diff_diff/estimators.py (L1549-L1563), but the Conley branch later constructs conley_coords, conley_time, and conley_unit from that same working_data at diff_diff/estimators.py (L1671-L1683). If a caller includes lat/lon or time in covariates, the Conley kernel is then built on residualized inputs rather than the original geography/period labels. With absorb=["unit"] and time-invariant coordinates, that can collapse coordinates to zero within unit and silently produce wrong Conley SEs. This is the same raw-data contract the PR already fixed for DiD at diff_diff/estimators.py (L538-L561) and regression-tested at tests/test_conley_vcov.py (L1246-L1288), so the pattern fix is only partial. Concrete fix: mirror the DiD/TWFE contract in MultiPeriodDiD.fit and always source conley_coords, conley_time, and conley_unit from raw data, not working_data, then add a regression that proves the raw arrays reach _compute_conley_vcov under absorb with overlapping covariates.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

No separate findings beyond the methodology blocker. The previously reported callable-metric validator gap is now documented and tested at diff_diff/conley.py, docs/methodology/REGISTRY.md, and tests/test_conley_vcov.py. I was not able to run the suite in this environment.

Path to Approval

Update MultiPeriodDiD.fit so the Conley kernel inputs (conley_coords, conley_time, conley_unit) are always taken from raw data, matching the new DiD/TWFE contract.
Add a MultiPeriodDiD regression with absorb=["unit"] and overlapping Conley-input covariates that fails on the current code and proves the raw arrays are what reach _compute_conley_vcov.

github-actions · 2026-05-14T18:39:45Z

🔁 AI review rerun (requested by @igerber)

Head SHA: e03318dcbf8e7c683a2f9e97e4b9278ee10005ca

Overall Assessment
⚠️ Needs changes

Executive Summary

Re-review: the prior blocker on MultiPeriodDiD sourcing Conley inputs from absorb-demeaned working_data is resolved; it now uses raw data and has regression coverage in diff_diff/estimators.py:L1668-L1692 and tests/test_conley_vcov.py:L1294-L1375.
I did not find an unmitigated methodology mismatch in the Conley math itself. DiD support, the documented spatial+cluster deviation, and callable-metric validation are aligned with the updated Registry in docs/methodology/REGISTRY.md:L2947-L3155 and the implementation in diff_diff/conley.py:L117-L186 and diff_diff/conley.py:L640-L648.
Severity: P1 [Newly identified]. The PR only front-door-fixes malformed/missing conley_coords on DiD; the corresponding MultiPeriodDiD and TwoWayFixedEffects Conley paths still fall through to raw KeyError/IndexError on the same bad inputs.
Severity: P2. The sparse Conley auto-toggle keys only on n/metric/kernel, so dense-neighborhood cutoffs can still materialize O(n^2) CSR storage and negate the claimed memory benefit.
I could not run the test suite here: pytest and runtime deps such as numpy are not installed in this sandbox.

Methodology
No findings. The Conley changes that deviate from the paper/R are explicitly documented in docs/methodology/REGISTRY.md with note/deviation labels, so under the review rubric I am treating those as non-defects.

Code Quality

Severity: P1 [Newly identified]. Impact: DiD now validates malformed or missing conley_coords up front in diff_diff/estimators.py:L416-L431, with regression coverage in tests/test_conley_vcov.py:L1145-L1198, but MultiPeriodDiD and TwoWayFixedEffects still only check for None in diff_diff/estimators.py:L1499-L1505 and diff_diff/twfe.py:L175-L188 and then directly index self.conley_coords[0] / [1] in diff_diff/estimators.py:L1680-L1684 and diff_diff/twfe.py:L359-L363. That is a partial fix of the same pattern bug: bad inputs that now raise clear estimator-level ValueError on DiD still crash opaquely on MPD/TWFE. Concrete fix: extract the DiD conley_coords arity/column validation into a shared helper and call it from all three Conley estimator entry points before raw column access.

Performance

Severity: P2. Impact: the sparse path auto-activates solely from n > 5_000, supported metric, and Bartlett kernel in diff_diff/conley.py:L697-L744, but _compute_spatial_bartlett_meat_sparse still stores every in-range pair in CSR form in diff_diff/conley.py:L348-L394. With a large cutoff or compact geography, that can make the “sparse” matrix effectively dense, use more memory than the dense float64 matrix, and turn the new auto-toggle into an OOM risk rather than an optimization. Concrete fix: after query_ball_tree, estimate nnz/density and fall back to dense or warn once neighbor density crosses a threshold; alternatively gate auto-sparse on cutoff relative to the data span instead of n alone.

Maintainability
No findings.

Tech Debt
No findings.

Security
No findings.

Documentation/Tests

Severity: P3. Impact: the new validation note for the documented combined-kernel deviation says the “huge-cutoff” fixture reduces exactly to pure within-cluster CR1 in docs/methodology/REGISTRY.md:L3077-L3090, and the test encodes that claim with Bartlett at huge_cutoff = 1e9 in tests/test_conley_vcov.py:L3186-L3208. For Bartlett, finite off-diagonal weights are 1 - d/h, so this is an asymptotic limit, not an exact finite-cutoff identity. Concrete fix: make the exact reduction anchor use conley_kernel="uniform", or keep Bartlett and rewrite both the note and the test as an approximation against the exact finite-cutoff Bartlett weights.

Path to Approval

Add the same malformed/missing conley_coords front-door validation now used by DifferenceInDifferences.fit() to MultiPeriodDiD.fit() and TwoWayFixedEffects.fit().
Add MPD/TWFE regressions that conley_coords=("missing_lat", "lon") and conley_coords=("lat",) raise estimator-level ValueError instead of raw pandas/Index errors.

github-actions · 2026-05-14T19:20:31Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 68678e9a4eadaed3ffd592c5442cf7902b7d3a88

Overall Assessment
⚠️ Needs changes

Executive Summary

Re-review: the prior Conley re-review items appear addressed. The raw-vs-demeaned Conley input bug is fixed by sourcing time/unit/coords from raw data on the DiD and MPD Conley paths (diff_diff/estimators.py:L503-L551, diff_diff/estimators.py:L1609-L1658), and the prior MPD/TWFE front-door conley_coords gap is closed via the shared validator in diff_diff/conley.py:L201-L301, wired from diff_diff/estimators.py:L394-L410, diff_diff/estimators.py:L1464-L1476, and diff_diff/twfe.py:L175-L187.
Methodology cross-check: the affected method is ConleySpatialHAC. DiD support reuses the documented block-decomposed panel sandwich, and the combined spatial+cluster kernel / time-label normalization are explicitly documented Note deviations in docs/methodology/REGISTRY.md:L2947-L3178, so I do not count them as methodology defects.
Severity: P1 [Newly identified]. The new Conley combined-kernel estimator paths still do not validate a bad cluster= column before data[self.cluster] access, so a misspelled cluster column falls through to a raw pandas KeyError on DiD, MPD, and TWFE instead of the estimator-level ValueError pattern the PR added for unit/conley_coords (diff_diff/estimators.py:L472-L477, diff_diff/estimators.py:L1562-L1567, diff_diff/twfe.py:L300-L305).
The earlier sparse-path dense-neighborhood concern is materially mitigated by the new density gate and corresponding warning (diff_diff/conley.py:L462-L479, docs/methodology/REGISTRY.md:L3129-L3141).
I did not execute the test suite here; python -m pytest --version fails because pytest is not installed in this sandbox.

Methodology

No findings. ConleySpatialHAC’s changed estimator math is aligned with the updated registry contract in docs/methodology/REGISTRY.md:L2947-L3178. The combined spatial+cluster kernel, time-label normalization, and sparse bartlett-only path are documented deviations or implementation choices rather than undocumented methodology drift.

Code Quality

Severity: P1 [Newly identified]. Impact: cluster= is newly load-bearing on the Conley combined-kernel estimator paths, but none of the three entry points validates that the named cluster column exists before raw DataFrame indexing. A typo like cluster="missing_region" now raises an opaque pandas KeyError instead of a clear estimator-level ValueError on DifferenceInDifferences.fit, MultiPeriodDiD.fit, and TwoWayFixedEffects.fit (diff_diff/estimators.py:L472-L477, diff_diff/estimators.py:L1562-L1567, diff_diff/twfe.py:L300-L305). Concrete fix: add a shared front-door cluster-column existence check alongside _validate_conley_estimator_inputs (or immediately before each data[self.cluster] access) and raise ValueError("Cluster column '<name>' not found in data").

Performance

No findings. The prior “auto-sparse can still materialize near-dense storage” concern is meaningfully mitigated by _CONLEY_SPARSE_DENSITY_THRESHOLD and the dense fallback warning in diff_diff/conley.py:L462-L479.

Maintainability

No findings. The Conley metadata propagation and labeling gaps noted in the prior review are fixed: result serialization now carries conley_lag_cutoff, and TWFE’s summary label distinguishes bare Conley from the combined kernel (diff_diff/results.py:L77-L88, diff_diff/results.py:L269-L298, diff_diff/results.py:L568-L585, diff_diff/twfe.py:L551-L596).

Tech Debt

No findings. The TODO cleanup is consistent with the shipped scope: completed Wave A items are removed, while still-deferred survey-design and SyntheticDiD Conley follow-ups remain tracked in TODO.md:L119-L120.

Security

No findings.

Documentation/Tests

No additional findings. The earlier exactness mismatch on the “huge-cutoff reduces to pure within-cluster CR1” note is now corrected in the methodology registry, and the test anchor now uses the exact uniform-kernel identity (docs/methodology/REGISTRY.md:L3077-L3092, tests/test_conley_vcov.py:L3262-L3283).
Verification note: tests were not run in this environment because pytest is unavailable.

Path to Approval

Add estimator-level validation that an explicit cluster= column exists before any data[self.cluster] access on the Conley combined-kernel paths in DifferenceInDifferences.fit, MultiPeriodDiD.fit, and TwoWayFixedEffects.fit.
Add regressions for cluster="missing_region" on all three Conley estimator surfaces and assert the new estimator-level ValueError instead of a raw pandas KeyError.

github-actions · 2026-05-14T19:28:42Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 04a5fa1e3b98e2b8ea6394591585a51e188b0847

Overall Assessment
⚠️ Needs changes

Executive Summary

Re-review: the prior P1 about bad cluster= columns falling through to raw pandas KeyError is fixed by the shared front-door validator in diff_diff/conley.py:L201, now wired into DiD/MPD/TWFE at diff_diff/estimators.py:L394, diff_diff/estimators.py:L1465, and diff_diff/twfe.py:L175.
Methodology cross-check: the affected method is ConleySpatialHAC. DiD support, the combined spatial+cluster product kernel, time-label normalization, and the bartlett-only sparse path are all documented in the Methodology Registry, so I do not count them as methodology defects. See docs/methodology/REGISTRY.md:L2947, docs/methodology/REGISTRY.md:L3005, docs/methodology/REGISTRY.md:L3053, docs/methodology/REGISTRY.md:L3098, and docs/methodology/REGISTRY.md:L3143.
Severity: P1. On the new sparse+cluster path, the density gate measures all spatial neighbors before the cluster mask is applied, so clustered Conley fits can be forced onto the dense O(n²) fallback even when the actual combined kernel is sparse. That can reintroduce the production-time memory/runtime failure this PR is trying to remove on large clustered panels. See diff_diff/conley.py:L476, diff_diff/conley.py:L503, and diff_diff/conley.py:L883.
The new cluster_ids/cluster= interaction is also not directly regression-tested on the sparse branch: sparse tests are separate from cluster tests in tests/test_conley_vcov.py:L2751 and tests/test_conley_vcov.py:L3272.
I did not execute the test suite in this sandbox; this is a static review.

Methodology

Severity: P3-informational. Impact: No unmitigated methodology defects found. The Wave A changes all land under the documented ConleySpatialHAC contract, and the non-paper extensions are explicitly labeled as notes/deviations in the registry rather than shipped silently. Concrete fix: None.

Code Quality

No findings.

Performance

Severity: P1. Impact: _compute_spatial_bartlett_meat_sparse() computes density from count_neighbors() before applying cluster_codes, but the actual sparse kernel drops cross-cluster pairs afterward. On large datasets with many small clusters, the helper can therefore misclassify a genuinely sparse combined kernel as “dense” and fall back to full pairwise distances, which is exactly the new production-error mode this PR is supposed to avoid for vcov_type="conley", cluster=. See diff_diff/conley.py:L476 and diff_diff/conley.py:L503. Concrete fix: make the density gate cluster-aware by counting only within-cluster in-range pairs, or thresholding per cluster slice, before returning None to the dense path.

Maintainability

No findings.

Tech Debt

No findings. The deferred weighted/survey and SyntheticDiD Conley work remains tracked in TODO.md:L117 and TODO.md:L118.

Security

No findings.

Documentation/Tests

Severity: P1. Impact: The new cluster_ids parameter is not tested against all new Conley code paths. Sparse coverage exercises unclustered dense-vs-sparse parity, and cluster coverage exercises dense/estimator behavior, but there is no regression for sparse clustered Conley even though the clustered sparse branch is now load-bearing at diff_diff/conley.py:L374, diff_diff/conley.py:L503, and diff_diff/conley.py:L883. Concrete fix: add dense-vs-sparse parity tests for clustered Conley on both cross-sectional and panel block-decomposed fixtures, including a case where global spatial density is high but within-cluster density is low.

Path to Approval

Make the sparse density heuristic cluster-aware so vcov_type="conley", cluster= does not spuriously fall back to dense O(n²) work when the post-mask kernel is still sparse.
Add regressions that force _conley_sparse=True and _conley_sparse=False on clustered Conley inputs and assert parity on both the cross-sectional and panel paths.
Add one clustered sparse test where overall spatial neighborhoods are dense but within-cluster neighborhoods are sparse, and assert the helper stays on the sparse path instead of warning and falling back to dense.

github-actions · 2026-05-14T19:40:01Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 75beec7314b72faf5b29e8e7f107d533a41d94e2

Overall Assessment
✅ Looks good

Executive Summary

Re-review: the prior sparse+cluster P1 is addressed by the cluster-aware density refinement in diff_diff/conley.py:L473-L509 and the new clustered sparse parity/density-gate regressions in tests/test_conley_vcov.py:L3141-L3265.
The affected method is ConleySpatialHAC. I did not find an unmitigated implementation mismatch against the project’s documented methodology contract in docs/methodology/REGISTRY.md:L2947-L3178.
The earlier estimator-surface validation drift is fixed by the shared front-door validator in diff_diff/conley.py:L201-L310, now wired into DiD/MPD/TWFE at diff_diff/estimators.py:L394-L411, diff_diff/estimators.py:L1465-L1478, and diff_diff/twfe.py:L175-L188.
The absorb/raw-label bug is also addressed: DiD and MPD now source Conley coords/time/unit from raw data, not residualized working_data, in diff_diff/estimators.py:L504-L553 and diff_diff/estimators.py:L1611-L1635, with targeted regressions at tests/test_conley_vcov.py:L1343-L1475.
One P3 docs issue remains: the new methodology prose has a small math/wording inconsistency, but the code and tests are fine.

Methodology

Severity: P3-informational. Impact: No unmitigated methodology defects found in the implementation. DiD support, the combined spatial+cluster product kernel, the sparse Bartlett path, and callable-metric validation are all documented in the registry rather than shipped as silent deviations. Concrete fix: None.

Code Quality

No findings.

Performance

Severity: P3-informational. Impact: No remaining performance blocker in the changed code. The previous cluster-blind sparse fallback issue is fixed in diff_diff/conley.py:L473-L509 and covered by the new sparse+cluster tests in tests/test_conley_vcov.py:L3141-L3265. Concrete fix: None.

Maintainability

No findings.

Tech Debt

No findings. Remaining deferred Conley work is still tracked in TODO.md:L119-L120.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: The registry’s huge-cutoff note writes the within-cluster CR1 limit as Σ_g X_g' ε_g ε_g' X_g', which is dimensionally wrong, and CHANGELOG.md:L11 omits both the registry’s uniform-kernel-only caveat for the exact finite-cutoff identity and the new zero-diagonal callable-metric requirement. The implementation and tests are correct; this is documentation-only. See docs/methodology/REGISTRY.md:L3083-L3092 and docs/methodology/REGISTRY.md:L3143-L3162. Concrete fix: change the registry expression to Σ_g X_g' ε_g ε_g' X_g, and mirror the “exact only for conley_kernel="uniform"; Bartlett is asymptotic” plus zero-diagonal callable contract in the changelog summary.

Static review only; I did not execute the test suite in this sandbox.

github-actions · 2026-05-14T20:10:48Z

🔁 AI review rerun (requested by @igerber)

Head SHA: cd08440aae5f44cdf41af52a95b2c23f594ceca0

Overall assessment

✅ Looks good

Executive summary

Re-review: the earlier sparse-density, front-door validation, and raw-label/absorb concerns appear addressed in diff_diff/conley.py, diff_diff/conley.py, diff_diff/estimators.py, and diff_diff/twfe.py, with targeted regressions in tests/test_conley_vcov.py, tests/test_conley_vcov.py, and tests/test_conley_vcov.py.
The affected method is ConleySpatialHAC, plus estimator-surface wiring for DifferenceInDifferences, MultiPeriodDiD, and TwoWayFixedEffects. I did not find an undocumented methodology mismatch in the implementation.
The combined spatial+cluster kernel is explicitly documented as a project convention in docs/methodology/REGISTRY.md, so it is non-blocking under the stated review rubric.
Remaining issues are documentation-only P3s: one broken llms-full.txt example and minor public-doc drift on the callable zero-diagonal contract / MultiPeriodDiD combined-kernel surface.
Static review only; I did not execute the test suite in this sandbox.

Methodology

Severity: P3-informational. Impact: No unmitigated methodology findings. The changed math still maps to the existing ConleySpatialHAC contract, and the non-paper extensions in this PR are documented rather than shipped as silent deviations in docs/methodology/REGISTRY.md and docs/methodology/REGISTRY.md. Concrete fix: None.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new findings. The lack of snippet smoke tests for .txt AI guides that let one doc example drift is already tracked in TODO.md.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: The new Conley code sample in diff_diff/guides/llms-full.txt uses DifferenceInDifferences at diff_diff/guides/llms-full.txt without importing it, so the example is not runnable as written. Because .txt AI guides are outside doc-snippet CI, this is non-blocking but still worth fixing. Concrete fix: add DifferenceInDifferences to the import list or change the example to use only imported classes; optionally fold this snippet into the smoke-test set tracked in TODO.md.
Severity: P3. Impact: The public Conley docs still have minor contract drift. The main callable-validation subsection correctly documents the zero-diagonal invariant in docs/methodology/REGISTRY.md, but the summary bullet in diff_diff/guides/llms-full.txt and the edge-case bullet in docs/methodology/REGISTRY.md omit it; separately, the Wave A changelog enumerates DiD/TWFE combined-kernel estimator surfaces but not the implemented MultiPeriodDiD path shown in diff_diff/estimators.py and tests/test_conley_vcov.py, see CHANGELOG.md. Concrete fix: align the guide/registry/changelog prose with the shipped contract by adding the zero-diagonal callable rule everywhere it is summarized and explicitly naming MultiPeriodDiD(cluster=..., vcov_type="conley") in the combined-kernel surface list.

… callable metric validation Four mechanical extensions on top of the Phase 1+2 Conley sandwich (PR #426). All four touch the same surface (conley.py + linalg.py + estimators.py + twfe.py + tests/test_conley_vcov.py). - DiD.fit() accepts `unit=<col>` as a fit-time kwarg (NOT on __init__; unused unless Conley is set; not part of get_params/set_params). - Drops the prior fit-time redirect to MultiPeriodDiD. - Validates unit/lag_cutoff/coords/cutoff_km; rejects survey_design and wild_bootstrap with NotImplementedError. - On a 2-period panel, matches MPD(...).fit(post_periods=[1], reference_period=0) bit-exactly (atol=1e-10). - compute_robust_vcov / LinearRegression / TWFE / DiD now accept cluster_ids alongside vcov_type='conley'; meat applies K_total[i,j] = K_space(d_ij/h) * 1{c_i == c_j}. - Validator enforces cluster-time invariance on the panel block- decomposed path (cluster constant within unit across periods). - Per-slice mask construction (NOT full n×n) preserves memory on panel paths; serial-component mask is trivially all-ones under the invariance contract. - TWFE auto-cluster on Conley path still silently dropped; explicit cluster=<col> opts into combined kernel. DiD has no auto-cluster, so opt-in is fully explicit. DiD-vs-TWFE asymmetry documented. - linalg validator's conley + cluster_ids reject and twfe's explicit-cluster reject both dropped. - _compute_spatial_bartlett_meat_sparse uses scipy.spatial.cKDTree.query_ball_tree to build a CSR sparse kernel matrix instead of materializing the dense n×n distance. - Auto-activates for n > _CONLEY_SPARSE_N_THRESHOLD (5_000) AND metric in {haversine, euclidean} AND kernel == "bartlett". - Bartlett-only gate: bartlett at u=1 returns exactly 0 so the sparse path safely drops at-cutoff pairs; uniform at u=1 is 1 and would require closed-interval query semantics incompatible with haversine chord projection roundoff. - Haversine projects to 3-D unit-sphere Cartesian; chord query radius matches arc-length cutoff with a 1e-12 relative epsilon for projection roundoff; exact great-circle is recomputed only for in-range neighbors. - Private _conley_sparse: Optional[bool] kwarg controls the toggle (None=auto, True=force, False=force dense; True with unsupported config raises). - Bit-identity parity vs dense at atol=1e-10 on synthetic fixtures; R parity at atol=1e-6 preserved on all 3 panel R fixtures with _conley_sparse=True forced. - Renames _CONLEY_DENSE_WARN_N -> _CONLEY_DENSE_OOM_WARN_N (20_000) to disambiguate from the new 5_000 sparse threshold; warning text differentiates sparse-eligible vs dense-fallback paths. - _validate_callable_metric_result checks shape (n,n), finite, non-negative, symmetric within atol=1e-10. - Each failure raises a targeted ValueError naming the violated invariant. Previously, malformed callables produced opaque BLAS errors deep in the pipeline. Tests - tests/test_conley_vcov.py: 36 new tests across TestConleySparse, TestConleySparseRParityForced, TestConleyCluster, TestConleyDistanceMetrics extension. - Existing DiD-rejection / TWFE-explicit-cluster-reject / linalg-conley-cluster-reject tests flipped to behavioral asserts. - test_did_conley_matches_mpd_post_periods_1 locks the DiD-vs-MPD bit-exact agreement on a 2-period panel. - Full regression sweep (test_conley_vcov + test_linalg + test_estimators + test_estimators_vcov_type + test_methodology_twfe + test_linalg_hc2_bm): 511 passed. Docs - docs/methodology/REGISTRY.md: new "Combined spatial + cluster product kernel" subsection with math + cluster-time-invariance contract + two-limit-fixture anchors; new "Performance / scale" subsection on the sparse path; new "Callable conley_metric validation" subsection; updated panel-API restrictions table; DiD-vs-TWFE asymmetry paragraph. - CHANGELOG.md: Unreleased Wave A entry; Phase 1+2 entry's "DiD continues to raise" / "cluster_ids raises" text updated to reflect the lifted rejects. - diff_diff/guides/llms.txt + llms-full.txt: DiD support + combined kernel + sparse path documented; restrictions list refreshed. - README.md: catalog one-line refresh. - TODO.md: rows for the four Wave A items removed; rows 121 (Conley + survey/weights, Bertanha-Imbens 2014) and 122 (SyntheticDiD Conley path, spatial-block bootstrap) retained for later waves. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… leak Address Wave A codex review round 1 findings. P0 #1 — Sparse haversine wrong at cutoff > π·R_earth - `_compute_spatial_bartlett_meat_sparse` computed `chord_radius = 2·sin(cutoff/(2R))` directly. That function is monotone only on [0, πR]; for cutoff > π·R (~20,015 km, half-Earth circumference) it shrinks again, and the kd-tree silently drops pairs that still have positive Bartlett weight. The dense path saturates at πR via `_haversine_km`'s `np.clip(a, 0, 1)`; the sparse path now mirrors that by clamping the arc to π radians before the chord computation: arc_radians = min(cutoff / R_earth, π) chord_radius = 2 · sin(arc_radians / 2) At cutoff >= πR, chord_radius reaches 2 (sphere diameter), so the kd-tree captures all pairs. P0 #2 — DiD + Conley + absorb leaks demeaned time into the helper - `DifferenceInDifferences.fit()` built `_conley_time_arr`, `_conley_unit_arr`, and `_conley_coords_arr` from `working_data`, but `absorb` demeans columns in `working_data` (including `time` when `time` is listed in `absorb`, or whenever `time` appears in `vars_to_demean`). The Conley helper would then partition the within-period spatial sandwich on residualized floats instead of the true pre/post periods. Now reads from the original `data` frame, mirroring `TwoWayFixedEffects.fit` which has the same FWL-composability contract: meat is computed on demeaned scores but the kernel grid uses raw coords + time/unit. P2 — Regression coverage - `test_sparse_haversine_cutoff_above_half_earth_circumference`: forces sparse path with cutoff = 25,000 km (> π·R_earth ≈ 20,015 km); asserts dense/sparse parity at atol=1e-10. - `test_sparse_haversine_cutoff_at_exactly_half_earth_circumference`: boundary test at cutoff = π·R_earth exactly. - `test_did_conley_with_absorb_uses_raw_time_labels`: monkeypatches `_compute_conley_vcov` to capture the `time` arg, asserts the helper sees raw integer 0/1 labels (not absorb-demeaned floats). P3 — Stale docstrings reconciled - `diff_diff/estimators.py` DiD class docstring `vcov_type` entry: "rejected at fit-time" → Wave A support description. - `diff_diff/estimators.py` MultiPeriodDiD class docstring `vcov_type` entry: dropped the "cluster= raises" restriction (Wave A #119 lifts it via the combined kernel). - `diff_diff/linalg.py` `compute_robust_vcov` docstring: `vcov_type` entry updated for the combined spatial + cluster kernel. - `diff_diff/linalg.py` `LinearRegression.__init__` docstring: Conley contract updated for combined cluster kernel + DiD support. Full Wave A regression: 514 passed (up from 511 with the 3 new tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…el tests Address CI codex review round 1 (PR #433). P1 #1 — DiD.fit() missing unit-column existence check - The new Conley path validated `unit is not None` but never confirmed the named column actually exists, so a bad column name fell through to a raw `pandas` KeyError at `data[unit]` instead of a clear estimator-level message. Adds the same front-door guard that `MultiPeriodDiD.fit` and `TwoWayFixedEffects.fit` already have: if unit not in data.columns: raise ValueError(f"Unit column '{unit}' not found in data") P1 #2 — Combined-kernel estimator-path coverage - The low-level helper and TWFE-with-cluster were covered, but the DiD and MultiPeriodDiD public estimator paths with explicit `cluster=` were not. Adds four new tests in `TestConleyCluster`: * test_did_combined_kernel_finite_se_and_cluster_name: DiD + Conley + cluster='region' produces finite SE, propagates cluster_name to result + to_dict(), differs from no-cluster baseline (combined kernel zeros out cross-cluster pairs). * test_did_combined_kernel_time_varying_cluster_raises: panel time-invariance contract enforced on the DiD path. * test_mpd_combined_kernel_finite_se_and_cluster_name: same finite-SE + cluster_name + to_dict() invariants on MPD. * test_mpd_combined_kernel_time_varying_cluster_raises: panel time-invariance contract on MPD. - test_did_conley_unknown_unit_column_raises (Code Quality P1 regression): asserts the new ValueError fires before the bad pandas lookup. Targeted regression: tests/test_conley_vcov.py + test_estimators.py + test_methodology_twfe.py: 332 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P1 [new in R2] — DiD.fit() didn't validate conley_coords shape or column existence; malformed tuples or missing column names fell through to opaque IndexError/KeyError downstream. Mirrors the unit-column guard from R1. Adds two checks in the DiD Conley block: - conley_coords must be a 2-element tuple/list of strings (raises if arity wrong or any element non-string). - Each named column must exist in `data` (raises ValueError naming the missing column). Adds two regression tests in TestConleyEstimatorIntegration: - test_did_conley_unknown_coord_column_raises - test_did_conley_malformed_coord_tuple_raises (covers 1-element tuple and non-string element) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When an explicit cluster=<col> is combined with vcov_type='conley', the result summary now distinguishes the combined spatial + cluster product kernel from bare Conley and names the cluster column: Bare Conley: "Conley spatial HAC (1999), lag_cutoff=1" Combined kernel (Wave A #119): "Conley spatial HAC (1999) + cluster product kernel at region, lag_cutoff=1" Previously `_format_vcov_label()` ignored `cluster_name` on the Conley path, so combined-kernel fits printed the same label as pure Conley. Adds a regression test `test_twfe_conley_with_cluster_summary_label_names_kernel_and_cluster` covering the new label format. The existing bare-Conley summary test now also asserts the absence of the "+ cluster product kernel" suffix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P1 [new in R4] — `_validate_callable_metric_result` accepted any symmetric / finite / non-negative matrix, even when d(i, i) > 0. The Conley sandwich relies on K(0) = 1 to reduce the i=j term to the HC0 diagonal X_i ε_i² X_i'; a positive self-distance silently attenuates the HC0 contribution by K(d_ii / h) < 1 and misstates Conley SEs. Built-in metrics ("haversine", "euclidean") satisfy d(i, i) = 0 by construction, so the existing parity tests didn't catch this gap on the callable surface. Adds a sixth check to the callable validator: |d(i, i)| <= 1e-10 for all i, else ValueError naming the violated invariant ("nonzero self-distance ... requires d(i, i) = 0 so the kernel reduces to K(0) = 1 on the HC0 diagonal contribution"). Tests: - test_callable_metric_nonzero_diagonal_raises: symmetric / finite / non-negative with d(i, i) = 0.5 raises. - test_callable_metric_near_zero_diagonal_accepted: sub-tolerance diagonal noise (1e-13) accepted, mirroring the symmetry contract. - test_pairwise_distance_callable updated: the constant_metric fixture now zeroes its diagonal (otherwise it would fail the new check); behaviorally equivalent off-diagonal coverage. Docs: - docs/methodology/REGISTRY.md § "Callable conley_metric validation" extended: 6th check added to the list + a paragraph explaining the HC0-reduction rationale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P0 Methodology [new in R5] — `MultiPeriodDiD.fit` built `_conley_coords_arr` / `_conley_time_arr` / `_conley_unit_arr` from `working_data`, which is absorb-demeaned. If a caller listed any of those columns (coords, time, or unit) in `absorb`, the Conley helper would silently partition the within-period spatial sandwich on residualized values instead of the true geography/period labels and misstate Conley SEs. This is the same raw-data contract the local R1 fix established for DifferenceInDifferences (`estimators.py:L538-L561`); MultiPeriodDiD shared the same pattern but wasn't updated in that pass — codex CI R5 caught the holistic-fix omission. Now reads from `data` in MPD, mirroring DiD + TWFE (`twfe.py:L370-L371`). FWL composability holds: the meat is computed on demeaned scores but the kernel grid uses raw coords + time/unit. Adds `test_mpd_conley_with_absorb_uses_raw_coords_and_time` regression: monkeypatches `_compute_conley_vcov` to capture the time + coords args, asserts the helper sees raw integer time labels (0..T-1, not absorb- demeaned floats) AND n_units distinct (lat, lon) pairs (not collapsed to one demeaned mean per unit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… P3 uniform-kernel limit fix Three findings on R6, addressed via pattern-level fixes rather than single-site patches, after 5 cross-round findings across 2 mirror classes (validation-gap, raw-data-vs-working-data) signaled the symptoms were drifting between estimator surfaces. P1 (Code Quality) — shared estimator validator Validation gaps mirrored across DiD (R1, R2) and MPD/TWFE (R6) under the same `vcov_type='conley'` entry-point contract: missing/unknown unit column, malformed conley_coords arity, missing coord column, survey_design rejection, wild_bootstrap rejection. Replaces three separate inline blocks (one per estimator) with a single `_validate_conley_estimator_inputs(...)` helper in `conley.py`, called from `DifferenceInDifferences.fit()`, `MultiPeriodDiD.fit()`, and `TwoWayFixedEffects.fit()`. Single source of truth for the seven-check union; future estimator surfaces (HAD-Conley, SDID-Conley) pick up the same contract by one-line opt-in. P2 (Performance) — sparse density gate The sparse k-d-tree path's CSR storage (~12 bytes/nnz: data + indices + indptr) loses its memory advantage over dense float64 (8 bytes/cell) at ~67% density; at high density "sparse" can silently use MORE memory than dense. Adds `_CONLEY_SPARSE_DENSITY_THRESHOLD = 0.3` (well below break-even for a comfortable safety margin). After kd-tree build, `cKDTree.count_neighbors` (shares traversal with the subsequent `query_ball_tree`) measures actual neighbor density; if it exceeds the threshold, `_compute_spatial_bartlett_meat_sparse` returns None, the dispatcher falls back to the dense path, and a UserWarning surfaces the reason. Locks the auto-sparse contract: it only fires when it actually saves memory. P3 (Documentation/Tests) — uniform kernel for huge-cutoff limit `test_combined_kernel_reduces_to_pure_cluster_at_huge_cutoff` used Bartlett at cutoff=1e9; Bartlett gives K=1-d/h<1 for d>0, so the reduction to pure CR1 is asymptotic, not exact. Switched to uniform kernel (K=1 exactly for all in-cutoff pairs). REGISTRY note clarifies the kernel choice rationale. Tests: - test_sparse_density_gate_falls_back_to_dense_and_warns: tight-cluster + huge cutoff → 100% density → fallback + warning + result equals dense. - test_sparse_density_gate_does_not_trigger_below_threshold: realistic cutoff → no fallback / no warning. Full regression sweep: 493 passed (no regressions across conley_vcov, linalg, estimators, estimators_vcov_type, methodology_twfe). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ence P1 (Code Quality) [new in R7] — `cluster=<col>` is load-bearing on the Conley combined-kernel paths (Wave A #119) but none of DiD / MPD / TWFE validated that the named cluster column exists in `data` before the downstream `data[self.cluster]` access. A typo like `cluster="missing_region"` fell through to a raw pandas KeyError instead of the estimator-level ValueError pattern the rest of the Conley validation surface now uses. Same class as R1's unit-column guard and R2/R6's conley_coords guard: extends the shared `_validate_conley_estimator_inputs` helper added in R6 with an 8th check `if cluster is not None and cluster not in data.columns: raise ValueError("Cluster column ... not found in data")`. The three call sites in DiD/MPD/TWFE now pass `cluster=self.cluster` through and pick up the new guard via one-line opt-in. Future Conley surfaces that add cluster support get the validator's behavior for free. Tests: regressions on all three estimator surfaces (DiD/MPD/TWFE) asserting `cluster="missing_region"` raises the estimator-level ValueError before any pandas-level error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…gressions Two related P1s on R8 (both about the sparse+cluster interaction). P1 (Performance) — density gate is now cluster-aware The R6 density gate counted ALL spatial in-range pairs before the cluster mask was applied, but the actual CSR matrix stores only within-cluster pairs (the inner row-loop drops cross-cluster neighbors before populating row/col/data lists). On large clustered panels — exactly the use case sparse+cluster is supposed to win — spatial density could be ~100% while within-cluster density is tiny, and we'd spuriously fall back to dense O(n²) work. Fix: when the unrefined spatial density exceeds the threshold AND `cluster_codes is not None`, refine by counting only within-cluster in-range pairs (per-cluster kd-trees, summed). The refinement runs only when needed (the gate is already cleared if spatial density is below threshold), so the per-cluster sub-tree builds are amortized. Stores the projected coordinates as `tree_coords` once on the haversine branch (was `xyz` locally before; renaming makes the per-cluster build symmetric across metrics). P1 (Documentation/Tests) — sparse+cluster regression coverage Sparse coverage previously exercised unclustered dense-vs-sparse parity; cluster coverage previously exercised dense/estimator behavior — but the combined sparse+cluster path was load-bearing and untested. Adds three regressions: - test_sparse_with_cluster_matches_dense: cross-sectional 600-row panel, 8 clusters, dense vs forced-sparse parity at atol=1e-10. - test_sparse_with_cluster_panel_matches_dense: 3-period block- decomposed panel, 200 units, 5 time-invariant clusters, dense vs forced-sparse parity at atol=1e-10. - test_sparse_density_gate_cluster_aware: tight spatial cluster (100% global density) + 50 small clusters → within-cluster density << 30%. Locks the refinement contract: NO density warning, sparse path executes, result matches dense. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P3 (Documentation/Tests, cosmetic) — pure docs polish. (1) docs/methodology/REGISTRY.md L3083-3092: huge-cutoff CR1 limit expression had an erroneous trailing prime on X_g, producing a dimensionally-inconsistent form. The CR1 meat for cluster g is `(X_g' ε_g)(X_g' ε_g)' = X_g' ε_g ε_g' X_g` (the last X_g is unprimed; the result is (k, k)). Fix: drop the trailing prime. (2) CHANGELOG.md Wave A entry: mirror two REGISTRY clarifications that were already in the methodology document but missing from the release note prose: - "Huge-cutoff reduction is EXACT only for `conley_kernel="uniform"`; for `bartlett` the identity is asymptotic since K_bartlett(u) < 1 for u > 0. The fixture anchor uses uniform for an exact identity check." - "Callable conley_metric validation now requires zero diagonal (`|d(i, i)| ≤ 1e-10`); rationale is the K(0) = 1 reduction to the HC0 diagonal X_i ε_i² X_i'." Implementation and tests are correct; this is documentation-only alignment with the codex R9 P3 finding. Loop stops here — R9 was already green per the stop criterion, but the user opted to polish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two P3 doc-only items from R10 (final ✅ Looks good review). (1) diff_diff/guides/llms-full.txt: the new DiD-Conley example used DifferenceInDifferences(...) but the import block at the top of the code sample didn't include it (only LinearRegression, MultiPeriodDiD, TwoWayFixedEffects). Example wasn't runnable as written. Added DifferenceInDifferences to the imports. (2) Doc-contract drift across three summary surfaces — the main methodology subsections already documented these contracts correctly, but downstream summary bullets/release notes were stale: - llms-full.txt:L2031 callable-validation bullet now spells out "shape (n, n), finite, non-negative, symmetric to atol=1e-10, AND zero on the diagonal" (was missing the zero-diagonal rule). - REGISTRY.md edge-case bullet for callable now names "non-zero- diagonal matrix" alongside the other rejection conditions, with pointer back to the main subsection. - CHANGELOG.md Wave A item (2) (combined kernel) now explicitly lists MultiPeriodDiD(vcov_type="conley", cluster="<col>") along- side DiD/TWFE; the MPD path is implemented + tested but was previously omitted from the surface enumeration. Implementation and tests unchanged; pure docs alignment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-14T21:31:16Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3b05cd28cd432985103154d433977b0a27bcea24

Overall Assessment

✅ Looks good

Executive Summary

Re-review: the earlier P3 doc drift from the last AI pass appears addressed. The Conley guide now imports DifferenceInDifferences, documents the zero-diagonal callable contract, and the changelog now names the shipped MPD combined-kernel surface in diff_diff/guides/llms-full.txt:L1915-L1962, diff_diff/guides/llms-full.txt:L2030-L2031, and CHANGELOG.md:L12-L13.
Affected methods are ConleySpatialHAC, the panel block-decomposed Conley form, and the estimator wiring for DifferenceInDifferences, MultiPeriodDiD, and TwoWayFixedEffects. I did not find any unmitigated P0/P1 methodology issues after cross-checking against docs/methodology/REGISTRY.md:L2954-L3196.
The earlier raw-label/absorb correctness risk looks closed: DiD, MPD, and TWFE now all source Conley time/unit/coord arrays from raw data rather than absorbed working frames in diff_diff/estimators.py:L504-L553, diff_diff/estimators.py:L1611-L1639, and diff_diff/twfe.py:L346-L379.
Remaining issues are minor P3s: one contradictory MPD warning/error path and one user-facing doc restriction drift.
Static review only; I did not execute the test suite in this sandbox.

Methodology

No findings. The documented Wave A extensions in docs/methodology/REGISTRY.md:L3071-L3196 are reflected in the implementation at diff_diff/conley.py:L562-L1006, diff_diff/estimators.py:L385-L553, diff_diff/estimators.py:L1460-L1661, and diff_diff/twfe.py:L169-L398.

Code Quality

Severity: P3
Impact: MultiPeriodDiD(vcov_type="conley", inference="wild_bootstrap") first warns that it will fall back to analytical inference at diff_diff/estimators.py:L1251-L1260, but the later Conley validator still raises NotImplementedError at diff_diff/estimators.py:L1465-L1478. That produces contradictory guidance on the same call.
Concrete fix: Skip the generic analytical-fallback warning when self.vcov_type == "conley" or otherwise centralize the Conley inference decision in one branch, then add a regression that asserts the intended MPD+Conley+wild_bootstrap behavior cleanly.

Performance

No findings.

Maintainability

No findings. The new shared _validate_conley_estimator_inputs helper in diff_diff/conley.py:L201-L310 reduces the validation drift that existed across DiD/MPD/TWFE.

Tech Debt

No findings. Remaining deferred Conley work is still explicitly tracked in TODO.md:L119-L120, so I would not block on it.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: the user-facing restriction bullets still understate the MultiPeriodDiD Conley reject surface. diff_diff/guides/llms-full.txt:L2022-L2029 only names DiD/TWFE for the inference="wild_bootstrap" rejection and only names DiD for the estimator-level survey_design= rejection, while code rejects MPD on the same Conley combinations at diff_diff/estimators.py:L1465-L1478. docs/methodology/REGISTRY.md:L3182-L3190 is also partially inconsistent with its earlier generic restriction at docs/methodology/REGISTRY.md:L2978-L2979.
Concrete fix: Harmonize the guide and registry bullets so they consistently say DifferenceInDifferences / MultiPeriodDiD / TwoWayFixedEffects reject Conley + inference="wild_bootstrap", and make the estimator-level survey_design= rejection equally explicit for MPD/TWFE.

…ey path + harmonize doc restriction bullets Two P3 doc-and-UX items from R11 (✅ Looks good overall). P3 #1 (Code Quality) — contradictory warning + error on MPD+Conley +wild_bootstrap `MultiPeriodDiD.fit()` had an unconditional UserWarning at the top of fit() saying wild_bootstrap "is not yet supported, using analytical instead" — but the Conley validator below raises NotImplementedError on `vcov_type="conley", inference="wild_bootstrap"`. Users hitting that combination got "warn we're falling back" followed immediately by "actually we're raising," which is contradictory guidance on the same call. Fix: gate the warning on `self.vcov_type != "conley"`. The Conley validator's raise takes precedence. Adds regression `test_mpd_conley_wild_bootstrap_raises_without_warning` asserting the NotImplementedError fires with no analytical-fallback warning. P3 #2 (Documentation/Tests) — restriction bullets understated MPD's reject surface - `llms-full.txt:L2026` only named DiD/TWFE for the `inference="wild_bootstrap"` reject; MPD also rejects this combo via the shared validator. Updated to enumerate all three. - `llms-full.txt:L2027` only named DiD for the estimator-level `survey_design=` reject; MPD and TWFE both raise the same on this combo. Updated to enumerate all three. - `docs/methodology/REGISTRY.md:L3188` had the same drift; updated in parallel and added a parenthetical noting the MPD warning suppression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-14T21:39:48Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 054e2ee962bf0799989ab41f0783ae50637fcda2

Overall Assessment

✅ Looks good

Executive Summary

Re-review outcome: the prior P3s appear addressed. The MultiPeriodDiD Conley path now suppresses the contradictory wild-bootstrap fallback warning in diff_diff/estimators.py:L1251-L1265, and the user-facing restrictions are harmonized in diff_diff/guides/llms-full.txt:L2022-L2031 and docs/methodology/REGISTRY.md:L3183-L3197.
Affected methods are ConleySpatialHAC, the R conleyreg-form panel block-decomposed sandwich, the documented combined spatial + cluster product kernel, the sparse Bartlett fast path, and callable metric validation.
I did not find any unmitigated P0/P1 methodology issues. The implementation in diff_diff/conley.py:L368-L1006, estimator wiring in diff_diff/estimators.py:L385-L552, diff_diff/estimators.py:L1465-L1665, and diff_diff/twfe.py:L169-L419, and the Registry text in docs/methodology/REGISTRY.md:L2965-L3197 are aligned.
The earlier raw-label / absorb composability correctness risk remains closed: DiD and MPD now source Conley coords/time/unit from raw data, matching TWFE’s existing pattern in diff_diff/estimators.py:L504-L552, diff_diff/estimators.py:L1616-L1644, and diff_diff/twfe.py:L346-L373.
Static review only. I could not execute the tests here because this sandbox does not have pytest, and the Python environment also lacks the project deps (numpy import failed).

Methodology

No findings. The documented deviations and extensions in docs/methodology/REGISTRY.md:L3015-L3048 and docs/methodology/REGISTRY.md:L3071-L3197 are reflected in code, including the panel time-label normalization, combined-kernel contract, sparse bartlett-only gate, and callable zero-diagonal requirement.

Code Quality

No findings. The shared _validate_conley_estimator_inputs() helper in diff_diff/conley.py:L201-L310 removes the validation drift that had previously existed across DiD / MPD / TWFE entry points.

Performance

No findings. The sparse Bartlett implementation, density gate, and dense fallback behavior are internally consistent with the documented performance contract in diff_diff/conley.py:L368-L559 and docs/methodology/REGISTRY.md:L3116-L3159.

Maintainability

No findings. Variance provenance is now surfaced consistently for the changed result surfaces through cluster_name / conley_lag_cutoff handling in diff_diff/results.py:L49-L89, diff_diff/results.py:L269-L312, and diff_diff/results.py:L708-L758.

Tech Debt

No findings. The remaining Conley follow-ups are still explicitly tracked in TODO.md:L119-L120, and the completed Wave A items were removed from the table.

Security

No findings.

Documentation/Tests

No findings. The earlier documentation drift is resolved in diff_diff/guides/llms-full.txt:L2022-L2031 and docs/methodology/REGISTRY.md:L3183-L3197.
Validation note: test execution was not possible in this sandbox because pytest and the Python dependencies are unavailable.

igerber and others added 12 commits May 14, 2026 17:16

igerber force-pushed the spillover-conley-wave-a branch from cd08440 to 3b05cd2 Compare May 14, 2026 21:23

igerber added the ready-for-ci Triggers CI test workflows label May 14, 2026

igerber merged commit 9ac8f68 into main May 14, 2026
31 of 32 checks passed

igerber deleted the spillover-conley-wave-a branch May 14, 2026 23:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Conley Wave A: DiD/cluster combo, sparse k-d-tree, callable metric#433

Conley Wave A: DiD/cluster combo, sparse k-d-tree, callable metric#433
igerber merged 13 commits into
mainfrom
spillover-conley-wave-a

igerber commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 14, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant