Fix #401 holistic audit residuals: two stale docstrings around dCDH by_path / paths_of_interest #434
…y_path / paths_of_interest support

Holistic re-audit of merged #401 (dCDH by_path non-binary integer treatment + `paths_of_interest` Python-only selector) + #419 (cleanup PR broadening R-parser caveat). Per-PR CI review on #419 couldn't see the combined post-PR holistic state. Local agentic codex review against the combined diff (3 rounds) surfaced 2 P2 docstring fixes:

- `_validate_and_aggregate_to_cells` (`chaisemartin_dhaultfoeuille.py` aggregator helper): the `Raises` block still said the helper raises on "non-binary raw treatment values", but the shipped implementation now accepts continuous `d_gt` cell means and defers integer-only enforcement to `fit()` time (the `by_path` / `paths_of_interest` contract). Updated the Raises section to enumerate the actual failure modes (missing columns, NaN, non-coercible, within-cell-varying) and explain where the integer-only check actually lives.
- `path_sup_t_bands` dataclass field-adjacent comment (`chaisemartin_dhaultfoeuille_results.py`): comment still said the field is populated when `by_path` is a positive int AND `n_bootstrap > 0`, but `paths_of_interest` also activates the field. Updated to match the public docstring above ((EITHER `by_path` OR `paths_of_interest`) AND `n_bootstrap > 0`).

No methodology changes, no behavior changes to fit() outputs; both are documentation-only edits, one on a dataclass field comment and one on a helper Raises block. The methodology contract has been correct on the public docstrings + REGISTRY all along; only the source-adjacent comments lagged.

Empirical observation: pilot-401 converged in 3 codex rounds (vs 11 for the #402 pilot). #401's cleanup PR #419 was a narrow R-parser caveat broadening, so the holistic state didn't have many cross-surface drift opportunities. This is consistent with the holistic-pilot pattern's expected return-on-investment scaling with cleanup-PR scope.
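The corrected activation condition for `path_sup_t_bands` can be read as a simple predicate. The following is a minimal sketch of that reading only, not the library's code: the function name `path_sup_t_bands_populated` is hypothetical, and treating "not set" as `None` for both `by_path` and `paths_of_interest` is an assumption.

```python
from typing import Optional, Sequence


def path_sup_t_bands_populated(
    by_path: Optional[int],
    paths_of_interest: Optional[Sequence[tuple]],
    n_bootstrap: int,
) -> bool:
    """Hypothetical helper: True when the field would be populated.

    Assumption: an unset selector is represented as None.
    """
    # Either selector switches on per-path output ...
    path_mode = (by_path is not None) or (paths_of_interest is not None)
    # ... but the bands additionally require a bootstrap run.
    return path_mode and n_bootstrap > 0


# The stale comment covered only the first case; the second is the one it missed.
by_path_only = path_sup_t_bands_populated(2, None, 200)
paths_only = path_sup_t_bands_populated(None, [(0, 1, 1)], 200)
no_bootstrap = path_sup_t_bands_populated(2, None, 0)
```

The stale comment described only the `by_path_only` case; the public docstring also covers `paths_only`.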
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
CI AI review R1 P3: the revised Raises block was incomplete — it documented treatment/outcome NaN failures but omitted the existing group/time NaN rejection at chaisemartin_dhaultfoeuille.py:177-193, which exists because groupby silently drops NaN keys. Added the group/time NaN clause to the Raises text with the rationale inline.
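The rationale for the group/time NaN rejection can be reproduced with pandas directly. This is a minimal sketch on toy data, not the helper's actual code: the column names and the `check_no_nan_keys` function are hypothetical, and only the default `dropna=True` behavior of `DataFrame.groupby` is taken as given.

```python
import numpy as np
import pandas as pd

# Toy panel (hypothetical column names); the second row has no group id.
df = pd.DataFrame(
    {
        "group": ["a", np.nan, "b"],
        "time": [1, 1, 1],
        "d": [0.0, 1.0, 1.0],
    }
)

# With the default dropna=True, the NaN-keyed row vanishes without any error.
cells = df.groupby(["group", "time"])["d"].mean()
n_input_rows = len(df)
n_rows_in_cells = int(df.groupby(["group", "time"]).size().sum())


def check_no_nan_keys(frame: pd.DataFrame, keys: list) -> None:
    """Hypothetical guard: fail loudly before groupby can drop rows."""
    bad = [k for k in keys if frame[k].isna().any()]
    if bad:
        raise ValueError(f"NaN values in key column(s): {bad}")
```

Here `n_rows_in_cells` is smaller than `n_input_rows`, which is the silent loss that an explicit pre-check turns into a `ValueError`.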
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
CI AI review R3 P3: the comment referenced chaisemartin_dhaultfoeuille.py:2865-2875 but the OVERALL event_study_effects[l]['cband_conf_int'] propagation now lives at ~3329-3338 in the same file. The OVERALL site is inlined inside fit() with no stable enclosing function name, so the cleanest fix is to drop the line range entirely and reference the symbol + enclosing fit() method (stable across moves). Per feedback memory 'audit-residual-drift-classes' item 6 (Hardcoded line refs).
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
CI AI review R4 P3: the Raises text reads as unconditional but survey-weighted fits drop zero-weight rows BEFORE running NaN / coercion / within-cell validation, per the SurveyDesign.subpopulation() out-of-sample contract (chaisemartin_dhaultfoeuille.py:170-178). NaN values in zero-weight rows therefore do not raise. Added a second paragraph to Raises describing this scope qualifier so the contract is complete.
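The ordering that this scope qualifier describes can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `validate_rows`, the dict-based rows, and the column names are all hypothetical; only the "drop zero-weight rows first, then validate" ordering comes from the review comment.

```python
import math
from typing import Optional


def validate_rows(rows: list, weight_key: Optional[str] = None) -> list:
    """Hypothetical sketch: filter zero-weight rows, then reject NaN."""
    if weight_key is not None:
        # Out-of-sample (zero-weight) rows are removed before any check runs.
        rows = [r for r in rows if r[weight_key] > 0]
    for r in rows:
        for col in ("treatment", "outcome"):
            if math.isnan(r[col]):
                raise ValueError(f"NaN in column {col!r}")
    return rows


rows = [
    {"treatment": 1.0, "outcome": 2.5, "w": 1.0},
    {"treatment": float("nan"), "outcome": 3.0, "w": 0.0},  # zero weight
]

# Survey-weighted path: the NaN sits in a zero-weight row, so nothing raises.
kept = validate_rows(rows, weight_key="w")
```

Calling `validate_rows(rows)` without a weight column on the same data raises, which is why the Raises text needs the qualifier to be accurate for both paths.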
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
Summary
Holistic re-audit of merged #401 (dCDH `by_path` non-binary integer treatment + `paths_of_interest` Python-only selector) + #419 (R-parser caveat broadening cleanup). The per-PR CI cleanup review on #419 couldn't see the combined post-PR holistic state; 3 rounds of local agentic codex review surfaced 2 P2 docstring fixes.

- `_validate_and_aggregate_to_cells` Raises docstring (`chaisemartin_dhaultfoeuille.py`): said the helper raises on "non-binary raw treatment values", but the shipped implementation accepts continuous `d_gt` cell means and defers integer-only enforcement to `fit()` time per the `by_path` / `paths_of_interest` contract. Rewritten to enumerate the actual failure modes.
- `path_sup_t_bands` field-adjacent comment (`chaisemartin_dhaultfoeuille_results.py`): said the field is populated only when `by_path` is a positive int, but `paths_of_interest` also activates it. Updated to match the public docstring above.

2 files, +19/-12 lines. No methodology changes, no behavior changes; purely documentation alignment on source-adjacent comments. Public docstrings and REGISTRY have been correct all along.
Empirical: pilot-401 converged in 3 codex rounds (vs 11 for the #402 pilot). The R-parser caveat broadening in #419 is narrow, so the holistic state didn't have many cross-surface drift opportunities — consistent with the holistic-pilot pattern's expected return scaling with cleanup-PR scope.
Test plan
🤖 Generated with Claude Code