From 466c05dbb9332fb3ee0fc787fba2e89ad0c59396 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Thu, 14 May 2026 13:13:07 -0400
Subject: [PATCH 1/4] Fix #401 holistic audit residuals: two stale docstrings
 around dCDH by_path / paths_of_interest support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Holistic re-audit of merged #401 (dCDH by_path non-binary integer treatment + `paths_of_interest` Python-only selector) + #419 (cleanup PR broadening R-parser caveat). Per-PR CI review on #419 couldn't see the combined post-PR holistic state. Local agentic codex review against the combined diff (3 rounds) surfaced 2 P2 docstring fixes:

- `_validate_and_aggregate_to_cells` (`chaisemartin_dhaultfoeuille.py` aggregator helper): the `Raises` block still said the helper raises on "non-binary raw treatment values", but the shipped implementation now accepts continuous `d_gt` cell means and defers integer-only enforcement to `fit()` time (the `by_path` / `paths_of_interest` contract). Updated the Raises section to enumerate the actual failure modes (missing columns, NaN, non-coercible, within-cell-varying) and explain where the integer-only check actually lives.
- `path_sup_t_bands` dataclass field-adjacent comment (`chaisemartin_dhaultfoeuille_results.py`): comment still said the field is populated when `by_path` is a positive int AND `n_bootstrap > 0`, but `paths_of_interest` also activates the field. Updated to match the public docstring above (EITHER `by_path` OR `paths_of_interest` AND `n_bootstrap > 0`).

No methodology changes, no behavior changes to fit() outputs — both are documentation-only edits on dataclass field comments and a helper Raises block. The methodology contract has been correct on the public docstrings + REGISTRY all along; only the source-adjacent comments lagged.

Empirical observation: pilot-401 converged in 3 codex rounds (vs 11 for the #402 pilot). #401's cleanup PR #419 was a narrow R-parser caveat broadening, so the holistic state didn't have many cross-surface drift opportunities — consistent with the holistic-pilot pattern's expected return-on-investment scaling with cleanup-PR scope.
---
 diff_diff/chaisemartin_dhaultfoeuille.py       | 13 ++++++++++---
 .../chaisemartin_dhaultfoeuille_results.py     | 18 +++++++++---------
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
index 7d5a7a94..36cc1242 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -143,9 +143,16 @@ def _validate_and_aggregate_to_cells(
     Raises
     ------
     ValueError
-        On missing columns, NaN treatment / outcome values, non-numeric
-        treatment / outcome that cannot be coerced, or non-binary raw
-        treatment values.
+        On missing columns; NaN values in the treatment / outcome columns;
+        non-numeric treatment / outcome that cannot be coerced via
+        ``pd.to_numeric``; or within-cell-varying treatment (any
+        ``(group, time)`` cell where ``d_min != d_max``, since fuzzy DiD
+        is out of scope and deferred to a separate dCdH 2018 paper).
+        Integer-coded non-binary treatment (the ``by_path`` /
+        ``paths_of_interest`` requirement) is enforced separately at
+        ``fit()`` time, not here at aggregation time — this helper
+        accepts continuous ``d_gt`` cell means and lets ``fit()`` decide
+        whether the integer-only contract applies.
     """
     # 1. Required columns
     missing = [c for c in (outcome, group, time, treatment) if c not in data.columns]
diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py
index 21286864..6b7c5f5f 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille_results.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille_results.py
@@ -625,16 +625,16 @@ class ChaisemartinDHaultfoeuilleResults:
     )
     # Per-path joint sup-t simultaneous-band metadata. Keyed by path
     # tuple; each entry holds `{"crit_value", "alpha", "n_bootstrap",
-    # "method", "n_valid_horizons"}`. Populated when `by_path` is a
-    # positive int AND `n_bootstrap > 0`. The joint band itself is
-    # written per-horizon as `cband_conf_int` on
-    # `path_effects[path]["horizons"][l]` (mirrors the OVERALL
-    # `event_study_effects[l]["cband_conf_int"]` pattern at
+    # "method", "n_valid_horizons"}`. Populated when EITHER `by_path` is
+    # a positive int OR `paths_of_interest` is non-empty AND
+    # `n_bootstrap > 0`. The joint band itself is written per-horizon as
+    # `cband_conf_int` on `path_effects[path]["horizons"][l]` (mirrors
+    # the OVERALL `event_study_effects[l]["cband_conf_int"]` pattern at
     # `chaisemartin_dhaultfoeuille.py:2865-2875`). Empty-state contract:
-    # `None` when not requested (no bootstrap or `by_path is None`); `{}`
-    # when requested but no path passed both gates (>=2 valid horizons
-    # AND a strict majority — more than 50% — of finite sup-t draws).
-    # The bands cover joint inference
+    # `None` when not requested (no bootstrap, or both `by_path` and
+    # `paths_of_interest` are `None`); `{}` when requested but no path
+    # passed both gates (>=2 valid horizons AND a strict majority — more
+    # than 50% — of finite sup-t draws). The bands cover joint inference
     # WITHIN a single path across horizons; they do NOT provide
     # simultaneous coverage across paths.
     path_sup_t_bands: Optional[Dict[Tuple[int, ...], Dict[str, Any]]] = field(

From 34cf5a1e9fa06faf878747db937636029f04a70d Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Thu, 14 May 2026 13:21:21 -0400
Subject: [PATCH 2/4] R3 follow-up: extend Raises docstring with group/time NaN
 rejection
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CI AI review R1 P3: the revised Raises block was incomplete — it
documented treatment/outcome NaN failures but omitted the existing
group/time NaN rejection at chaisemartin_dhaultfoeuille.py:177-193,
which exists because groupby silently drops NaN keys. Added the
group/time NaN clause to the Raises text with the rationale inline.
---
 diff_diff/chaisemartin_dhaultfoeuille.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
index 36cc1242..f79609d9 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -143,7 +143,10 @@ def _validate_and_aggregate_to_cells(
     Raises
     ------
     ValueError
-        On missing columns; NaN values in the treatment / outcome columns;
+        On missing columns; NaN values in any of the ``group``, ``time``,
+        ``treatment``, or ``outcome`` columns (``group`` and ``time`` are
+        rejected pre-``groupby`` because ``groupby`` silently drops NaN
+        keys, which would change the estimation sample without warning);
         non-numeric treatment / outcome that cannot be coerced via
         ``pd.to_numeric``; or within-cell-varying treatment (any
         ``(group, time)`` cell where ``d_min != d_max``, since fuzzy DiD

From 5d1e361cf374f6d9e3ff7590a84dcf1bc5bf0ece Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Thu, 14 May 2026 13:27:07 -0400
Subject: [PATCH 3/4] R4: drop stale line-number ref in path_sup_t_bands
 comment

CI AI review R3 P3: the comment referenced
chaisemartin_dhaultfoeuille.py:2865-2875 but the OVERALL
event_study_effects[l]['cband_conf_int'] propagation now lives at
~3329-3338 in the same file. The OVERALL site is inlined inside
fit() with no stable enclosing function name, so the cleanest fix
is to drop the line range entirely and reference the symbol +
enclosing fit() method (stable across moves). Per feedback memory
'audit-residual-drift-classes' item 6 (Hardcoded line refs).
---
 diff_diff/chaisemartin_dhaultfoeuille_results.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py
index 6b7c5f5f..8c6dd470 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille_results.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille_results.py
@@ -629,8 +629,9 @@ class ChaisemartinDHaultfoeuilleResults:
     # a positive int OR `paths_of_interest` is non-empty AND
     # `n_bootstrap > 0`. The joint band itself is written per-horizon as
     # `cband_conf_int` on `path_effects[path]["horizons"][l]` (mirrors
-    # the OVERALL `event_study_effects[l]["cband_conf_int"]` pattern at
-    # `chaisemartin_dhaultfoeuille.py:2865-2875`). Empty-state contract:
+    # the OVERALL `event_study_effects[l]["cband_conf_int"]` pattern
+    # populated alongside the bootstrap propagation in
+    # `chaisemartin_dhaultfoeuille.py::fit`). Empty-state contract:
     # `None` when not requested (no bootstrap, or both `by_path` and
     # `paths_of_interest` are `None`); `{}` when requested but no path
     # passed both gates (>=2 valid horizons AND a strict majority — more

From 0434408bc4d854e5841e1fd795d30a2ad4816196 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Thu, 14 May 2026 13:32:38 -0400
Subject: [PATCH 4/4] R5: document survey zero-weight pre-filter in Raises text

CI AI review R4 P3: the Raises text reads as unconditional but
survey-weighted fits drop zero-weight rows BEFORE running NaN /
coercion / within-cell validation, per the
SurveyDesign.subpopulation() out-of-sample contract
(chaisemartin_dhaultfoeuille.py:170-178). NaN values in zero-weight
rows therefore do not raise. Added a second paragraph to Raises
describing this scope qualifier so the contract is complete.
---
 diff_diff/chaisemartin_dhaultfoeuille.py | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
index f79609d9..92ba1476 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -156,6 +156,13 @@ def _validate_and_aggregate_to_cells(
         ``fit()`` time, not here at aggregation time — this helper
         accepts continuous ``d_gt`` cell means and lets ``fit()`` decide
         whether the integer-only contract applies.
+
+        Under the survey-weighted path (``weights`` is not ``None``),
+        zero-weight rows are pre-filtered before any NaN / coercion /
+        within-cell validation per the ``SurveyDesign.subpopulation()``
+        out-of-sample contract — invalid values in zero-weight rows
+        therefore do NOT raise. NaN / coercion / within-cell checks
+        still apply to all positive-weight rows.
     """
     # 1. Required columns
     missing = [c for c in (outcome, group, time, treatment) if c not in data.columns]