Fix #401 holistic audit residuals: two stale docstrings around dCDH by_path / paths_of_interest by igerber · Pull Request #434 · igerber/diff-diff

igerber · 2026-05-14T17:13:23Z

Summary

Holistic re-audit of merged #401 (dCDH by_path non-binary integer treatment + paths_of_interest Python-only selector) + #419 (R-parser caveat broadening cleanup). The per-PR CI cleanup review on #419 couldn't see the combined post-PR holistic state; 3 rounds of local agentic codex review surfaced 2 P2 docstring fixes.

_validate_and_aggregate_to_cells Raises docstring (chaisemartin_dhaultfoeuille.py): said the helper raises on "non-binary raw treatment values", but the shipped implementation accepts continuous d_gt cell means and defers integer-only enforcement to fit() time per the by_path / paths_of_interest contract. Rewritten to enumerate the actual failure modes.
path_sup_t_bands field-adjacent comment (chaisemartin_dhaultfoeuille_results.py): said the field is populated only when by_path is a positive int, but paths_of_interest also activates it. Updated to match the public docstring above.

2 files, +19/-12 lines. No methodology changes, no behavior changes — purely documentation alignment on source-adjacent comments. Public docstrings and REGISTRY have been correct all along.

Empirical: pilot-401 converged in 3 codex rounds (vs 11 for the #402 pilot). The R-parser caveat broadening in #419 is narrow, so the holistic state didn't have many cross-surface drift opportunities — consistent with the holistic-pilot pattern's expected return scaling with cleanup-PR scope.

Test plan

CI green (no test changes; docstring-only edits)
CI AI review on the residual-fix diff

🤖 Generated with Claude Code

…y_path / paths_of_interest support Holistic re-audit of merged #401 (dCDH by_path non-binary integer treatment + `paths_of_interest` Python-only selector) + #419 (cleanup PR broadening R-parser caveat). Per-PR CI review on #419 couldn't see the combined post-PR holistic state. Local agentic codex review against the combined diff (3 rounds) surfaced 2 P2 docstring fixes: - `_validate_and_aggregate_to_cells` (`chaisemartin_dhaultfoeuille.py` aggregator helper): the `Raises` block still said the helper raises on "non-binary raw treatment values", but the shipped implementation now accepts continuous `d_gt` cell means and defers integer-only enforcement to `fit()` time (the `by_path` / `paths_of_interest` contract). Updated the Raises section to enumerate the actual failure modes (missing columns, NaN, non-coercible, within-cell-varying) and explain where the integer-only check actually lives. - `path_sup_t_bands` dataclass field-adjacent comment (`chaisemartin_dhaultfoeuille_results.py`): comment still said the field is populated when `by_path` is a positive int AND `n_bootstrap > 0`, but `paths_of_interest` also activates the field. Updated to match the public docstring above (EITHER `by_path` OR `paths_of_interest` AND `n_bootstrap > 0`). No methodology changes, no behavior changes to fit() outputs — both are documentation-only edits on dataclass field comments and a helper Raises block. The methodology contract has been correct on the public docstrings + REGISTRY all along; only the source-adjacent comments lagged. Empirical observation: pilot-401 converged in 3 codex rounds (vs 11 for the #402 pilot). #401's cleanup PR #419 was a narrow R-parser caveat broadening, so the holistic state didn't have many cross-surface drift opportunities — consistent with the holistic-pilot pattern's expected return-on-investment scaling with cleanup-PR scope.

github-actions · 2026-05-14T17:17:39Z

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

The diff is documentation-only in diff_diff/chaisemartin_dhaultfoeuille.py:L143-L155 and diff_diff/chaisemartin_dhaultfoeuille_results.py:L626-L639; no estimator logic, weighting, variance, identification check, or default changed.
The _validate_and_aggregate_to_cells() edit correctly removes the stale claim that non-binary raw treatment is rejected at aggregation time; the actual integer-only gate for by_path / paths_of_interest remains in fit() at diff_diff/chaisemartin_dhaultfoeuille.py:L2067-L2077, consistent with docs/methodology/REGISTRY.md:L485-L487 and docs/methodology/REGISTRY.md:L645-L647.
The path_sup_t_bands comment now matches the implemented gate ((by_path or paths_of_interest) and n_bootstrap > 0) at diff_diff/chaisemartin_dhaultfoeuille.py:L4208-L4248 and the public results docstring at diff_diff/chaisemartin_dhaultfoeuille_results.py:L462-L486.
One P3 remains: the revised helper docstring still omits the existing NaN checks on group and time, so it does not yet fully enumerate the function’s failure modes diff_diff/chaisemartin_dhaultfoeuille.py:L109-L155, diff_diff/chaisemartin_dhaultfoeuille.py:L177-L193.

Methodology

Severity: None. Impact: No methodology defect. The changed text is consistent with the shipped dCDH contract that within-cell-varying treatment is rejected during aggregation diff_diff/chaisemartin_dhaultfoeuille.py:L263-L281, while integer-coded treatment becomes required only when per-path surfaces are requested in fit() diff_diff/chaisemartin_dhaultfoeuille.py:L2067-L2077. Concrete fix: None.

Code Quality

Severity: None. Impact: No executable code changed, so there is no new inline-inference anti-pattern, NaN-propagation regression, control-group change, or parameter-propagation risk in scope. Concrete fix: None.

Performance

Severity: None. Impact: Runtime and memory behavior are unchanged by a docstring/comment-only diff. Concrete fix: None.

Maintainability

Severity: None. Impact: The updated path_sup_t_bands comment reduces drift between the private field comment and the public results contract diff_diff/chaisemartin_dhaultfoeuille_results.py:L626-L639, diff_diff/chaisemartin_dhaultfoeuille_results.py:L462-L486. Concrete fix: None.

Tech Debt

Severity: None. Impact: No new deferred correctness debt was introduced, and the diff does not rely on a TODO.md mitigation. Concrete fix: None.

Security

Severity: None. Impact: No new I/O, parsing, network, secret, or permissions surface changed. Concrete fix: None.

Documentation/Tests

Severity: P3. Impact: _validate_and_aggregate_to_cells() still documents only treatment/outcome NaN failures, but the implementation also raises on NaN in group or time before groupby to avoid silent key drops diff_diff/chaisemartin_dhaultfoeuille.py:L177-L193. That leaves the revised “actual failure modes” text incomplete for callers of both fit() and twowayfeweights(). Concrete fix: extend the helper contract / Raises text to mention group/time NaN rejection; if you want the “matching REGISTRY.md” claim to be literal, mirror that note in docs/methodology/REGISTRY.md:L486-L487.
Severity: None. Impact: I did not run the test suite; based on static inspection, this doc-only diff does not require new behavioral tests. Concrete fix: None.

CI AI review R1 P3: the revised Raises block was incomplete — it documented treatment/outcome NaN failures but omitted the existing group/time NaN rejection at chaisemartin_dhaultfoeuille.py:177-193, which exists because groupby silently drops NaN keys. Added the group/time NaN clause to the Raises text with the rationale inline.

github-actions · 2026-05-14T17:24:57Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 34cf5a1e9fa06faf878747db937636029f04a70d

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Re-review result: the prior documentation finding is resolved. _validate_and_aggregate_to_cells() now documents the pre-groupby group/time NaN rejection that the implementation already performs at diff_diff/chaisemartin_dhaultfoeuille.py:L143-L158 and diff_diff/chaisemartin_dhaultfoeuille.py:L180-L196.
No estimator logic, weighting, variance/SE, identification check, or default behavior changed in scope. The diff is documentation-only and remains consistent with the shipped dCDH contract in docs/methodology/REGISTRY.md:L485-L487 and the by-path notes at docs/methodology/REGISTRY.md:L645-L647.
The path_sup_t_bands comment now correctly includes the paths_of_interest activation path, matching the runtime gate at diff_diff/chaisemartin_dhaultfoeuille.py:L4211-L4250 and the public results docstring at diff_diff/chaisemartin_dhaultfoeuille_results.py:L462-L486.
One minor P3 remains: the touched path_sup_t_bands comment still points readers to chaisemartin_dhaultfoeuille.py:2865-2875, but the overall event_study_effects[l]["cband_conf_int"] propagation now lives at diff_diff/chaisemartin_dhaultfoeuille.py:L3329-L3338. This is documentation drift only, not a behavioral issue.

Methodology

Severity: None. Impact: No methodology defect in scope. The revised helper docstring matches the implemented separation between aggregation-time within-cell treatment-constancy checks and fit-time integer-coded by_path / paths_of_interest enforcement, consistent with diff_diff/chaisemartin_dhaultfoeuille.py:L266-L284, diff_diff/chaisemartin_dhaultfoeuille.py:L2070-L2080, and docs/methodology/REGISTRY.md:L485-L487. Concrete fix: None.

Code Quality

Severity: None. Impact: No executable code changed, so there is no new inline-inference anti-pattern, NaN-propagation regression, control-group change, or parameter-propagation risk in scope. Concrete fix: None.

Performance

Severity: None. Impact: Runtime and memory behavior are unchanged by this doc/comment-only diff. Concrete fix: None.

Maintainability

Severity: P3. Impact: The refreshed path_sup_t_bands comment still contains a stale hard-coded source reference at diff_diff/chaisemartin_dhaultfoeuille_results.py:L631-L633. That reference no longer points to the overall cband_conf_int write path, so future readers following it will land on unrelated warning code. Concrete fix: update that reference to diff_diff/chaisemartin_dhaultfoeuille.py:L3329-L3338, or better, replace literal line numbers with a symbolic description; ideally also update the sibling stale comment at diff_diff/chaisemartin_dhaultfoeuille.py:L3488-L3492.

Tech Debt

Severity: None. Impact: No new deferred correctness debt is introduced, and no TODO.md mitigation is needed for the in-scope changes. Concrete fix: None.

Security

Severity: None. Impact: No new I/O, parsing, secret, permission, or network surface changed. Concrete fix: None.

Documentation/Tests

Severity: None. Impact: The previous AI-review P3 is addressed by the expanded Raises block at diff_diff/chaisemartin_dhaultfoeuille.py:L143-L158, which now includes group/time NaN rejection matching diff_diff/chaisemartin_dhaultfoeuille.py:L180-L196. I did not run tests; for this doc-only diff I did not identify a behavior change that requires new coverage. Concrete fix: None.

CI AI review R3 P3: the comment referenced chaisemartin_dhaultfoeuille.py:2865-2875 but the OVERALL event_study_effects[l]['cband_conf_int'] propagation now lives at ~3329-3338 in the same file. The OVERALL site is inlined inside fit() with no stable enclosing function name, so the cleanest fix is to drop the line range entirely and reference the symbol + enclosing fit() method (stable across moves). Per feedback memory 'audit-residual-drift-classes' item 6 (Hardcoded line refs).

github-actions · 2026-05-14T17:30:53Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 5d1e361cf374f6d9e3ff7590a84dcf1bc5bf0ece

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Re-review result: the prior stale path_sup_t_bands comment-reference issue is resolved. The updated field comment now matches the public results docstring and the actual runtime gate at diff_diff/chaisemartin_dhaultfoeuille_results.py:L462-L486, diff_diff/chaisemartin_dhaultfoeuille_results.py:L626-L640, and diff_diff/chaisemartin_dhaultfoeuille.py:L4211-L4250.
No estimator logic, weighting, variance/SE, identification assumptions, or defaults changed. This remains a documentation-only diff, and the updated text is consistent with the dCDH registry notes on non-binary treatment and per-path sup-t bands at docs/methodology/REGISTRY.md:L485-L487 and docs/methodology/REGISTRY.md:L645-L647.
_validate_and_aggregate_to_cells() now correctly documents the group/time NaN rejection and the separation between aggregation-time within-cell constancy and fit-time integer-coded by_path / paths_of_interest enforcement at diff_diff/chaisemartin_dhaultfoeuille.py:L145-L158, diff_diff/chaisemartin_dhaultfoeuille.py:L180-L196, and diff_diff/chaisemartin_dhaultfoeuille.py:L2070-L2080.
One minor P3 remains: that revised Raises text still reads as unconditional, but survey-weighted fits intentionally drop zero-weight rows before NaN/coercion/within-cell validation at diff_diff/chaisemartin_dhaultfoeuille.py:L170-L178, as exercised in tests/test_survey_dcdh.py:L855-L916.

Methodology

Severity: None. Impact: The updated wording matches the shipped methodology contract: within-cell treatment constancy is enforced in the aggregation helper, while integer-coded non-binary treatment is only required when by_path / paths_of_interest is active in fit(). That is consistent with docs/methodology/REGISTRY.md:L485-L487, docs/methodology/REGISTRY.md:L645-L647, diff_diff/chaisemartin_dhaultfoeuille.py:L266-L284, and diff_diff/chaisemartin_dhaultfoeuille.py:L2070-L2080. Concrete fix: None.

Code Quality

Severity: None. Impact: No executable paths changed, so this diff does not introduce new inline-inference, NaN-propagation, control-group, or parameter-propagation defects. Concrete fix: None.

Performance

Severity: None. Impact: Runtime and memory behavior are unchanged by these doc/comment edits. Concrete fix: None.

Maintainability

Severity: None. Impact: The prior re-review P3 about the stale hard-coded path_sup_t_bands source pointer is addressed by the updated wording at diff_diff/chaisemartin_dhaultfoeuille_results.py:L626-L640. Concrete fix: None.

Tech Debt

Severity: None. Impact: No new deferred correctness debt is introduced in scope, and I did not find a relevant TODO.md entry that changes the assessment. Concrete fix: None.

Security

Severity: None. Impact: No I/O, secret-handling, permission, or parsing surface changed. Concrete fix: None.

Documentation/Tests

Severity: P3. Impact: The rewritten Raises block at diff_diff/chaisemartin_dhaultfoeuille.py:L145-L158 still overstates the weighted-path contract. _validate_and_aggregate_to_cells() filters zero-weight rows before NaN/coercion and within-cell-constancy checks (diff_diff/chaisemartin_dhaultfoeuille.py:L170-L178), and that out-of-sample behavior is regression-tested for mixed-treatment and NaN rows in tests/test_survey_dcdh.py:L855-L916. As written, the docstring implies those zero-weight rows would raise, which is not the shipped behavior. Concrete fix: qualify the Raises text to apply after zero-weight subpopulation filtering, e.g. “on in-sample / positive-weight rows” for weighted calls, and apply that qualifier consistently to the NaN, coercion, and within-cell-variation clauses.
Severity: None. Impact: I did not run tests; for this doc/comment-only diff I did not identify a behavioral change that requires new coverage. Concrete fix: None.

CI AI review R4 P3: the Raises text reads as unconditional but survey-weighted fits drop zero-weight rows BEFORE running NaN / coercion / within-cell validation, per the SurveyDesign.subpopulation() out-of-sample contract (chaisemartin_dhaultfoeuille.py:170-178). NaN values in zero-weight rows therefore do not raise. Added a second paragraph to Raises describing this scope qualifier so the contract is complete.

github-actions · 2026-05-14T17:35:29Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 0434408bc4d854e5841e1fd795d30a2ad4816196

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Re-review result: the prior P3 on the _validate_and_aggregate_to_cells() Raises docstring is resolved. The revised text now correctly scopes NaN/coercion/within-cell validation to positive-weight rows under survey weighting, matching the pre-filter in diff_diff/chaisemartin_dhaultfoeuille.py:L160-L165 and diff_diff/chaisemartin_dhaultfoeuille.py:L177-L185, with existing regression coverage at tests/test_survey_dcdh.py:L855-L916.
The helper docstring now also matches the shipped dCDH contract that continuous d_gt cell means are accepted at aggregation time, while integer-coded non-binary treatment is enforced later only when by_path / paths_of_interest is active in fit(), at diff_diff/chaisemartin_dhaultfoeuille.py:L154-L158 and diff_diff/chaisemartin_dhaultfoeuille.py:L2077-L2087.
The path_sup_t_bands field-adjacent comment now matches both the public results docstring and the runtime gate: (by_path or paths_of_interest) and n_bootstrap > 0, with the documented None vs {} empty-state contract preserved at diff_diff/chaisemartin_dhaultfoeuille_results.py:L626-L640, diff_diff/chaisemartin_dhaultfoeuille_results.py:L462-L485, and diff_diff/chaisemartin_dhaultfoeuille.py:L4218-L4258.
No estimator logic, weighting, variance/SE, identification assumptions, or defaults changed. This remains a documentation-only diff.
The updated wording is consistent with the Methodology Registry notes on non-binary treatment, within-cell treatment constancy, paths_of_interest, and per-path sup-t bands at docs/methodology/REGISTRY.md:L485-L487 and docs/methodology/REGISTRY.md:L645-L647.

Methodology

Severity: None. Impact: No methodology deviation introduced. The updated docs align with the implemented dCDH contract for non-binary treatment, within-cell constancy, by_path / paths_of_interest, and per-path sup-t bands at diff_diff/chaisemartin_dhaultfoeuille.py:L145-L165, diff_diff/chaisemartin_dhaultfoeuille.py:L2077-L2087, diff_diff/chaisemartin_dhaultfoeuille.py:L4218-L4258, and docs/methodology/REGISTRY.md:L485-L487, docs/methodology/REGISTRY.md:L645-L647. Concrete fix: None.

Code Quality

Severity: None. Impact: The diff changes only docstrings/comments and does not introduce new inline inference, NaN-propagation, control-group, or parameter-propagation defects. Concrete fix: None.

Performance

Severity: None. Impact: Runtime and memory behavior are unchanged. Concrete fix: None.

Maintainability

Severity: None. Impact: The modified comments now match the adjacent implementation and public API docs, removing the stale-doc drift that prompted the prior re-review note at diff_diff/chaisemartin_dhaultfoeuille.py:L145-L165 and diff_diff/chaisemartin_dhaultfoeuille_results.py:L626-L640. Concrete fix: None.

Tech Debt

Severity: None. Impact: I did not find a relevant TODO.md entry that changes the assessment, and this PR does not add new deferred debt in scope. Concrete fix: None.

Security

Severity: None. Impact: No I/O, secret-handling, permission, or parsing surface changed. Concrete fix: None.

Documentation/Tests

Severity: None. Impact: The previous documentation finding is addressed, and the updated helper wording now matches already-tested survey-weight behavior at tests/test_survey_dcdh.py:L855-L916. I did not run tests for this review-only pass. Concrete fix: None.

igerber added the ready-for-ci Triggers CI test workflows label May 14, 2026

igerber merged commit 7c1e9d0 into main May 14, 2026
31 of 32 checks passed

igerber deleted the fix-audit-401-r2 branch May 14, 2026 19:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #401 holistic audit residuals: two stale docstrings around dCDH by_path / paths_of_interest#434

Fix #401 holistic audit residuals: two stale docstrings around dCDH by_path / paths_of_interest#434
igerber merged 4 commits into
mainfrom
fix-audit-401-r2

igerber commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 14, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant