Skip to content

Fix #401 holistic audit residuals: two stale docstrings around dCDH by_path / paths_of_interest#434

Merged
igerber merged 4 commits into
mainfrom
fix-audit-401-r2
May 14, 2026
Merged

Fix #401 holistic audit residuals: two stale docstrings around dCDH by_path / paths_of_interest#434
igerber merged 4 commits into
mainfrom
fix-audit-401-r2

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 14, 2026

Summary

Holistic re-audit of merged #401 (dCDH by_path non-binary integer treatment + paths_of_interest Python-only selector) + #419 (R-parser caveat broadening cleanup). The per-PR CI cleanup review on #419 couldn't see the combined post-PR holistic state; 3 rounds of local agentic codex review surfaced 2 P2 docstring fixes.

  • _validate_and_aggregate_to_cells Raises docstring (chaisemartin_dhaultfoeuille.py): said the helper raises on "non-binary raw treatment values", but the shipped implementation accepts continuous d_gt cell means and defers integer-only enforcement to fit() time per the by_path / paths_of_interest contract. Rewritten to enumerate the actual failure modes.
  • path_sup_t_bands field-adjacent comment (chaisemartin_dhaultfoeuille_results.py): said the field is populated only when by_path is a positive int, but paths_of_interest also activates it. Updated to match the public docstring above.

2 files, +19/-12 lines. No methodology changes, no behavior changes — purely documentation alignment on source-adjacent comments. Public docstrings and REGISTRY have been correct all along.

Empirical: pilot-401 converged in 3 codex rounds (vs 11 for the #402 pilot). The R-parser caveat broadening in #419 is narrow, so the holistic state didn't have many cross-surface drift opportunities — consistent with the holistic-pilot pattern's expected return scaling with cleanup-PR scope.

Test plan

  • CI green (no test changes; docstring-only edits)
  • CI AI review on the residual-fix diff

🤖 Generated with Claude Code

…y_path / paths_of_interest support

Holistic re-audit of merged #401 (dCDH by_path non-binary integer treatment + `paths_of_interest` Python-only selector) + #419 (cleanup PR broadening R-parser caveat). Per-PR CI review on #419 couldn't see the combined post-PR holistic state. Local agentic codex review against the combined diff (3 rounds) surfaced 2 P2 docstring fixes:

- `_validate_and_aggregate_to_cells` (`chaisemartin_dhaultfoeuille.py` aggregator helper): the `Raises` block still said the helper raises on "non-binary raw treatment values", but the shipped implementation now accepts continuous `d_gt` cell means and defers integer-only enforcement to `fit()` time (the `by_path` / `paths_of_interest` contract). Updated the Raises section to enumerate the actual failure modes (missing columns, NaN, non-coercible, within-cell-varying) and explain where the integer-only check actually lives.
- `path_sup_t_bands` dataclass field-adjacent comment (`chaisemartin_dhaultfoeuille_results.py`): comment still said the field is populated when `by_path` is a positive int AND `n_bootstrap > 0`, but `paths_of_interest` also activates the field. Updated to match the public docstring above (EITHER `by_path` OR `paths_of_interest` AND `n_bootstrap > 0`).

No methodology changes, no behavior changes to fit() outputs — both are documentation-only edits on dataclass field comments and a helper Raises block. The methodology contract has been correct on the public docstrings + REGISTRY all along; only the source-adjacent comments lagged.

Empirical observation: pilot-401 converged in 3 codex rounds (vs 11 for the #402 pilot). #401's cleanup PR #419 was a narrow R-parser caveat broadening, so the holistic state didn't have many cross-surface drift opportunities — consistent with the holistic-pilot pattern's expected return-on-investment scaling with cleanup-PR scope.
@github-actions
Copy link
Copy Markdown

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Methodology

Code Quality

  • Severity: None. Impact: No executable code changed, so there is no new inline-inference anti-pattern, NaN-propagation regression, control-group change, or parameter-propagation risk in scope. Concrete fix: None.

Performance

  • Severity: None. Impact: Runtime and memory behavior are unchanged by a docstring/comment-only diff. Concrete fix: None.

Maintainability

Tech Debt

  • Severity: None. Impact: No new deferred correctness debt was introduced, and the diff does not rely on a TODO.md mitigation. Concrete fix: None.

Security

  • Severity: None. Impact: No new I/O, parsing, network, secret, or permissions surface changed. Concrete fix: None.

Documentation/Tests

  • Severity: P3. Impact: _validate_and_aggregate_to_cells() still documents only treatment/outcome NaN failures, but the implementation also raises on NaN in group or time before groupby to avoid silent key drops diff_diff/chaisemartin_dhaultfoeuille.py:L177-L193. That leaves the revised “actual failure modes” text incomplete for callers of both fit() and twowayfeweights(). Concrete fix: extend the helper contract / Raises text to mention group/time NaN rejection; if you want the “matching REGISTRY.md” claim to be literal, mirror that note in docs/methodology/REGISTRY.md:L486-L487.
  • Severity: None. Impact: I did not run the test suite; based on static inspection, this doc-only diff does not require new behavioral tests. Concrete fix: None.

CI AI review R1 P3: the revised Raises block was incomplete — it
documented treatment/outcome NaN failures but omitted the existing
group/time NaN rejection at chaisemartin_dhaultfoeuille.py:177-193,
which exists because groupby silently drops NaN keys. Added the
group/time NaN clause to the Raises text with the rationale inline.
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 34cf5a1e9fa06faf878747db937636029f04a70d


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Re-review result: the prior documentation finding is resolved. _validate_and_aggregate_to_cells() now documents the pre-groupby group/time NaN rejection that the implementation already performs at diff_diff/chaisemartin_dhaultfoeuille.py:L143-L158 and diff_diff/chaisemartin_dhaultfoeuille.py:L180-L196.
  • No estimator logic, weighting, variance/SE, identification check, or default behavior changed in scope. The diff is documentation-only and remains consistent with the shipped dCDH contract in docs/methodology/REGISTRY.md:L485-L487 and the by-path notes at docs/methodology/REGISTRY.md:L645-L647.
  • The path_sup_t_bands comment now correctly includes the paths_of_interest activation path, matching the runtime gate at diff_diff/chaisemartin_dhaultfoeuille.py:L4211-L4250 and the public results docstring at diff_diff/chaisemartin_dhaultfoeuille_results.py:L462-L486.
  • One minor P3 remains: the touched path_sup_t_bands comment still points readers to chaisemartin_dhaultfoeuille.py:2865-2875, but the overall event_study_effects[l]["cband_conf_int"] propagation now lives at diff_diff/chaisemartin_dhaultfoeuille.py:L3329-L3338. This is documentation drift only, not a behavioral issue.

Methodology

  • Severity: None. Impact: No methodology defect in scope. The revised helper docstring matches the implemented separation between aggregation-time within-cell treatment-constancy checks and fit-time integer-coded by_path / paths_of_interest enforcement, consistent with diff_diff/chaisemartin_dhaultfoeuille.py:L266-L284, diff_diff/chaisemartin_dhaultfoeuille.py:L2070-L2080, and docs/methodology/REGISTRY.md:L485-L487. Concrete fix: None.

Code Quality

  • Severity: None. Impact: No executable code changed, so there is no new inline-inference anti-pattern, NaN-propagation regression, control-group change, or parameter-propagation risk in scope. Concrete fix: None.

Performance

  • Severity: None. Impact: Runtime and memory behavior are unchanged by this doc/comment-only diff. Concrete fix: None.

Maintainability

  • Severity: P3. Impact: The refreshed path_sup_t_bands comment still contains a stale hard-coded source reference at diff_diff/chaisemartin_dhaultfoeuille_results.py:L631-L633. That reference no longer points to the overall cband_conf_int write path, so future readers following it will land on unrelated warning code. Concrete fix: update that reference to diff_diff/chaisemartin_dhaultfoeuille.py:L3329-L3338, or better, replace literal line numbers with a symbolic description; ideally also update the sibling stale comment at diff_diff/chaisemartin_dhaultfoeuille.py:L3488-L3492.

Tech Debt

  • Severity: None. Impact: No new deferred correctness debt is introduced, and no TODO.md mitigation is needed for the in-scope changes. Concrete fix: None.

Security

  • Severity: None. Impact: No new I/O, parsing, secret, permission, or network surface changed. Concrete fix: None.

Documentation/Tests

  • Severity: None. Impact: The previous AI-review P3 is addressed by the expanded Raises block at diff_diff/chaisemartin_dhaultfoeuille.py:L143-L158, which now includes group/time NaN rejection matching diff_diff/chaisemartin_dhaultfoeuille.py:L180-L196. I did not run tests; for this doc-only diff I did not identify a behavior change that requires new coverage. Concrete fix: None.

CI AI review R3 P3: the comment referenced
chaisemartin_dhaultfoeuille.py:2865-2875 but the OVERALL
event_study_effects[l]['cband_conf_int'] propagation now lives at
~3329-3338 in the same file. The OVERALL site is inlined inside
fit() with no stable enclosing function name, so the cleanest fix
is to drop the line range entirely and reference the symbol +
enclosing fit() method (stable across moves). Per feedback memory
'audit-residual-drift-classes' item 6 (Hardcoded line refs).
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 5d1e361cf374f6d9e3ff7590a84dcf1bc5bf0ece


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Re-review result: the prior stale path_sup_t_bands comment-reference issue is resolved. The updated field comment now matches the public results docstring and the actual runtime gate at diff_diff/chaisemartin_dhaultfoeuille_results.py:L462-L486, diff_diff/chaisemartin_dhaultfoeuille_results.py:L626-L640, and diff_diff/chaisemartin_dhaultfoeuille.py:L4211-L4250.
  • No estimator logic, weighting, variance/SE, identification assumptions, or defaults changed. This remains a documentation-only diff, and the updated text is consistent with the dCDH registry notes on non-binary treatment and per-path sup-t bands at docs/methodology/REGISTRY.md:L485-L487 and docs/methodology/REGISTRY.md:L645-L647.
  • _validate_and_aggregate_to_cells() now correctly documents the group/time NaN rejection and the separation between aggregation-time within-cell constancy and fit-time integer-coded by_path / paths_of_interest enforcement at diff_diff/chaisemartin_dhaultfoeuille.py:L145-L158, diff_diff/chaisemartin_dhaultfoeuille.py:L180-L196, and diff_diff/chaisemartin_dhaultfoeuille.py:L2070-L2080.
  • One minor P3 remains: that revised Raises text still reads as unconditional, but survey-weighted fits intentionally drop zero-weight rows before NaN/coercion/within-cell validation at diff_diff/chaisemartin_dhaultfoeuille.py:L170-L178, as exercised in tests/test_survey_dcdh.py:L855-L916.

Methodology

  • Severity: None. Impact: The updated wording matches the shipped methodology contract: within-cell treatment constancy is enforced in the aggregation helper, while integer-coded non-binary treatment is only required when by_path / paths_of_interest is active in fit(). That is consistent with docs/methodology/REGISTRY.md:L485-L487, docs/methodology/REGISTRY.md:L645-L647, diff_diff/chaisemartin_dhaultfoeuille.py:L266-L284, and diff_diff/chaisemartin_dhaultfoeuille.py:L2070-L2080. Concrete fix: None.

Code Quality

  • Severity: None. Impact: No executable paths changed, so this diff does not introduce new inline-inference, NaN-propagation, control-group, or parameter-propagation defects. Concrete fix: None.

Performance

  • Severity: None. Impact: Runtime and memory behavior are unchanged by these doc/comment edits. Concrete fix: None.

Maintainability

  • Severity: None. Impact: The prior re-review P3 about the stale hard-coded path_sup_t_bands source pointer is addressed by the updated wording at diff_diff/chaisemartin_dhaultfoeuille_results.py:L626-L640. Concrete fix: None.

Tech Debt

  • Severity: None. Impact: No new deferred correctness debt is introduced in scope, and I did not find a relevant TODO.md entry that changes the assessment. Concrete fix: None.

Security

  • Severity: None. Impact: No I/O, secret-handling, permission, or parsing surface changed. Concrete fix: None.

Documentation/Tests

  • Severity: P3. Impact: The rewritten Raises block at diff_diff/chaisemartin_dhaultfoeuille.py:L145-L158 still overstates the weighted-path contract. _validate_and_aggregate_to_cells() filters zero-weight rows before NaN/coercion and within-cell-constancy checks (diff_diff/chaisemartin_dhaultfoeuille.py:L170-L178), and that out-of-sample behavior is regression-tested for mixed-treatment and NaN rows in tests/test_survey_dcdh.py:L855-L916. As written, the docstring implies those zero-weight rows would raise, which is not the shipped behavior. Concrete fix: qualify the Raises text to apply after zero-weight subpopulation filtering, e.g. “on in-sample / positive-weight rows” for weighted calls, and apply that qualifier consistently to the NaN, coercion, and within-cell-variation clauses.
  • Severity: None. Impact: I did not run tests; for this doc/comment-only diff I did not identify a behavioral change that requires new coverage. Concrete fix: None.

CI AI review R4 P3: the Raises text reads as unconditional but
survey-weighted fits drop zero-weight rows BEFORE running NaN /
coercion / within-cell validation, per the
SurveyDesign.subpopulation() out-of-sample contract
(chaisemartin_dhaultfoeuille.py:170-178). NaN values in zero-weight
rows therefore do not raise. Added a second paragraph to Raises
describing this scope qualifier so the contract is complete.
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 0434408bc4d854e5841e1fd795d30a2ad4816196


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Re-review result: the prior P3 on the _validate_and_aggregate_to_cells() Raises docstring is resolved. The revised text now correctly scopes NaN/coercion/within-cell validation to positive-weight rows under survey weighting, matching the pre-filter in diff_diff/chaisemartin_dhaultfoeuille.py:L160-L165 and diff_diff/chaisemartin_dhaultfoeuille.py:L177-L185, with existing regression coverage at tests/test_survey_dcdh.py:L855-L916.
  • The helper docstring now also matches the shipped dCDH contract that continuous d_gt cell means are accepted at aggregation time, while integer-coded non-binary treatment is enforced later only when by_path / paths_of_interest is active in fit(), at diff_diff/chaisemartin_dhaultfoeuille.py:L154-L158 and diff_diff/chaisemartin_dhaultfoeuille.py:L2077-L2087.
  • The path_sup_t_bands field-adjacent comment now matches both the public results docstring and the runtime gate: (by_path or paths_of_interest) and n_bootstrap > 0, with the documented None vs {} empty-state contract preserved at diff_diff/chaisemartin_dhaultfoeuille_results.py:L626-L640, diff_diff/chaisemartin_dhaultfoeuille_results.py:L462-L485, and diff_diff/chaisemartin_dhaultfoeuille.py:L4218-L4258.
  • No estimator logic, weighting, variance/SE, identification assumptions, or defaults changed. This remains a documentation-only diff.
  • The updated wording is consistent with the Methodology Registry notes on non-binary treatment, within-cell treatment constancy, paths_of_interest, and per-path sup-t bands at docs/methodology/REGISTRY.md:L485-L487 and docs/methodology/REGISTRY.md:L645-L647.

Methodology

  • Severity: None. Impact: No methodology deviation introduced. The updated docs align with the implemented dCDH contract for non-binary treatment, within-cell constancy, by_path / paths_of_interest, and per-path sup-t bands at diff_diff/chaisemartin_dhaultfoeuille.py:L145-L165, diff_diff/chaisemartin_dhaultfoeuille.py:L2077-L2087, diff_diff/chaisemartin_dhaultfoeuille.py:L4218-L4258, and docs/methodology/REGISTRY.md:L485-L487, docs/methodology/REGISTRY.md:L645-L647. Concrete fix: None.

Code Quality

  • Severity: None. Impact: The diff changes only docstrings/comments and does not introduce new inline inference, NaN-propagation, control-group, or parameter-propagation defects. Concrete fix: None.

Performance

  • Severity: None. Impact: Runtime and memory behavior are unchanged. Concrete fix: None.

Maintainability

  • Severity: None. Impact: The modified comments now match the adjacent implementation and public API docs, removing the stale-doc drift that prompted the prior re-review note at diff_diff/chaisemartin_dhaultfoeuille.py:L145-L165 and diff_diff/chaisemartin_dhaultfoeuille_results.py:L626-L640. Concrete fix: None.

Tech Debt

  • Severity: None. Impact: I did not find a relevant TODO.md entry that changes the assessment, and this PR does not add new deferred debt in scope. Concrete fix: None.

Security

  • Severity: None. Impact: No I/O, secret-handling, permission, or parsing surface changed. Concrete fix: None.

Documentation/Tests

  • Severity: None. Impact: The previous documentation finding is addressed, and the updated helper wording now matches already-tested survey-weight behavior at tests/test_survey_dcdh.py:L855-L916. I did not run tests for this review-only pass. Concrete fix: None.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 14, 2026
@igerber igerber merged commit 7c1e9d0 into main May 14, 2026
31 of 32 checks passed
@igerber igerber deleted the fix-audit-401-r2 branch May 14, 2026 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant