Skip to content

Migrate Benchmark (review follow-up to #731 + master merge)#761

Open
waltsims wants to merge 11 commits into
masterfrom
migrate-benchmark-followup
Open

Migrate Benchmark (review follow-up to #731 + master merge)#761
waltsims wants to merge 11 commits into
masterfrom
migrate-benchmark-followup

Conversation

@waltsims

@waltsims waltsims commented Jun 21, 2026

Copy link
Copy Markdown
Owner

Why a separate PR

This PR carries forward @faridyagubbayli's benchmark migration from #731 and adds the fixes that came out of code review on that thread. I'd have pushed directly to #731 but its maintainerCanModify flag is false, so a new PR was the path of least friction. All of @faridyagubbayli's original commits are preserved as-is; my changes are stacked on top as a single new commit with co-authorship. Once this merges, #731 can be closed as superseded.

What's included

From #731 (faridyagubbayli, unchanged)

  • Port of MATLAB k-Wave's benchmark.m to benchmarks/benchmark.py + benchmarks/helpers.py
  • 9 hermetic tests (tests/test_benchmark.py)
  • README with usage and output format

Merge of master

Review-feedback fixes (this PR's contribution)

cpp-backend memory measurement (P1)

benchmark.py exposes --backend cpp, but PeakMemorySampler reads the Python process RSS. When the cpp backend is active, the simulation runs in a separate subprocess (the kspaceFirstOrder-OMP/-CUDA binary) — so Python RSS reflects ~nothing about its memory footprint and mem_usage becomes silently meaningless.

The MATLAB original only ever exercised in-process kspaceFirstOrder3D (never the subprocess kspaceFirstOrder3DC), so the equivalent question didn't come up there. The Python port introduced the cpp-backend benchmark option without extending memory measurement to match.

Fix: add ChildPeakMemorySampler using resource.getrusage(RUSAGE_CHILDREN).ru_maxrss before/after the subprocess. Zero new dependencies (stdlib only). Linux returns KB, macOS returns bytes — normalized to bytes. Windows is unsupported (resource is POSIX-only); --report-mem-usage + backend=\"cpp\" on Windows now refuses with a clear ValueError at startup, before any output file is written. benchmark.run() picks the sampler factory automatically; tests override via the new mem_sampler_factory= kwarg.

Rename peak_memory_bytescurrent_memory_bytes (P2)

The Linux (/proc/self/statm field [1]), macOS (ps -o rss=), and Windows (WorkingSetSize) readers all return current RSS, not historical peak. Only PeakMemorySampler (which polls in a background thread) finds a peak over time. Rename to reflect what the readers actually do. peak_memory_bytes retained as a back-compat alias.

__post_init__ validation guards (P3)

Three cheap guards that prevent opaque ZeroDivisionErrors downstream: domain_size > 0, sensor_radius > 0, pml_size >= 0.

Code comments

  • Density slab [max(ny//4 - 1, 0):, :] — why the -1 is asymmetric with the head slice [:nx//4, :] (1-indexed vs 0-indexed).
  • make_ball(... Vector([nx//2, ny//2, nz//2]) ...)make_ball treats the supplied center as 1-indexed, so this matches MATLAB's Nx/2 despite looking off-by-one.
  • source.p0 = smooth(...).astype(dtype)smooth() upcasts to float64 via the FFT path; the trailing astype restores user dtype.

README clarifications

  • data_cast=\"single\" is a no-op for backend=\"cpp\" (binary always serializes as float32); only meaningful for backend=\"python\".
  • --report-mem-usage + backend interaction documented per above (with the MATLAB context).

Inline bon_a

The local was always computed but only used in the options.nonlinear_media branch (default False).

Test plan

  • 9 new test cases (18 total): back-compat alias, three validation guards (parametrized × 5 cases), cpp-backend factory invocation, Windows-cpp clear-error path, ChildPeakMemorySampler Windows refusal.
  • All 18 pass locally (Linux + Python 3.12).
  • Existing tests unchanged.

Out of scope for this PR

  • The optional Windows PeakWorkingSetSize shortcut (already in the struct definition at helpers.py:208, just unused) — would let Windows skip the sampler thread for backend=\"python\". Easy follow-up.
  • benchmarks/ is correctly excluded from the wheel ([tool.hatch.build.targets.wheel] packages allowlist); it does land in the sdist (cosmetic — sdists routinely carry dev material). Left as-is.

Closes #731.

Greptile Summary

This PR ports MATLAB k-Wave's benchmark.m to Python (benchmarks/benchmark.py + benchmarks/helpers.py), carrying forward the original author's work with review-feedback fixes stacked on top: a ChildPeakMemorySampler for cpp-backend subprocess memory measurement, a PeakMemorySampler rename (peak_memory_bytescurrent_memory_bytes), __post_init__ validation guards, and psutil added to the test extras.

  • New benchmark runner (benchmark.py): runs kspaceFirstOrder over a sequence of growing 3D grids, averages timings, saves partial JSON results after each case, and optionally reports peak memory via --report-mem-usage.
  • Memory measurement split: PeakMemorySampler (polling thread, Python-backend RSS) vs ChildPeakMemorySampler (resource.RUSAGE_CHILDREN, cpp-backend subprocess), with Windows refusing the latter cleanly at startup.
  • 18 new tests covering timing aggregation, float32 dtype, validation guards, Windows refusal, and partial-result preservation; psutil>=5.9 added to the test and new benchmark extras.

Confidence Score: 4/5

Safe to merge with one fix: the cpp-backend memory figures for all cases after the first are wrong and will silently mislead any analysis of the output JSON.

In benchmark.py a new ChildPeakMemorySampler() is created inside the case loop, so its _baseline is set to the cumulative ru_maxrss after all previous cases' children. For the second and subsequent cases, peak_bytes reports the incremental growth from the prior case's peak rather than the absolute peak for that case. With the default 12-case run, mem_usage[1] through mem_usage[11] all understate true child RSS. The test suite only exercises max_cases=1 for the cpp path so the defect is untested. All other logic — timing aggregation, partial saves, Windows refusal, validation guards — looks correct.

benchmarks/benchmark.py lines 80–89 (ChildPeakMemorySampler construction placement) and the corresponding test in tests/test_benchmark.py which should add a multi-case cpp-backend assertion.

Important Files Changed

Filename Overview
benchmarks/benchmark.py New benchmark runner with a cross-case baseline bug in ChildPeakMemorySampler construction that causes mem_usage values after index 0 to report incremental deltas rather than absolute per-case peaks.
benchmarks/helpers.py New helpers module: BenchmarkOptions dataclass, build_case, ChildPeakMemorySampler (POSIX-only), and PeakMemorySampler. Individual classes are correct; the cross-case contamination issue is in how benchmark.py instantiates ChildPeakMemorySampler.
tests/test_benchmark.py 18 new tests covering timing, memory sampling, validation guards, and Windows refusal; all cpp-backend memory tests use max_cases=1, leaving the cross-case baseline contamination untested.
pyproject.toml Adds psutil>=5.9 to both the test and new benchmark extras, resolving the earlier CI collection failure.
benchmarks/README.md New README covering usage, output format, and backend/memory-measurement interaction notes.
benchmarks/init.py Trivial package init docstring.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[run called] --> B{report_mem_usage?}
    B -- yes --> C{backend == cpp?}
    C -- yes --> D[ChildPeakMemorySampler probe\nWindows raises ValueError]
    C -- no --> E[validate memory_reader]
    D --> F[result includes mem_usage list]
    E --> F
    B -- no --> G[skip mem probe]
    F --> H[for each grid case]
    G --> H

    H --> I[build_case]
    I --> J{report_mem_usage AND cpp?}
    J -- yes --> K[grid_sampler = new ChildPeakMemorySampler\nbaseline set to current ru_maxrss]
    J -- no --> L[grid_sampler = None]
    K --> M[for each iteration]
    L --> M

    M --> N{grid_sampler present?}
    N -- yes --> O[memory_cm = shared grid_sampler]
    N -- no --> P[memory_cm = new PeakMemorySampler]
    O --> Q[with memory_cm: run solver]
    P --> Q
    Q --> R[update rolling averages]
    R --> S[store_case_result + save_results]
    S --> M

    M -- all iterations done --> H
    H -- all cases done --> T[return result]
    Q -- exception --> U[error_reached true\nsave partial + break]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[run called] --> B{report_mem_usage?}
    B -- yes --> C{backend == cpp?}
    C -- yes --> D[ChildPeakMemorySampler probe\nWindows raises ValueError]
    C -- no --> E[validate memory_reader]
    D --> F[result includes mem_usage list]
    E --> F
    B -- no --> G[skip mem probe]
    F --> H[for each grid case]
    G --> H

    H --> I[build_case]
    I --> J{report_mem_usage AND cpp?}
    J -- yes --> K[grid_sampler = new ChildPeakMemorySampler\nbaseline set to current ru_maxrss]
    J -- no --> L[grid_sampler = None]
    K --> M[for each iteration]
    L --> M

    M --> N{grid_sampler present?}
    N -- yes --> O[memory_cm = shared grid_sampler]
    N -- no --> P[memory_cm = new PeakMemorySampler]
    O --> Q[with memory_cm: run solver]
    P --> Q
    Q --> R[update rolling averages]
    R --> S[store_case_result + save_results]
    S --> M

    M -- all iterations done --> H
    H -- all cases done --> T[return result]
    Q -- exception --> U[error_reached true\nsave partial + break]
Loading

Comments Outside Diff (2)

  1. tests/test_benchmark.py, line 824-860 (link)

    P2 Test doesn't verify memory value correctness for multiple averages

    test_cpp_backend_uses_child_sampler_via_factory uses num_averages=2 but only asserts len(result["mem_usage"]) == 1 — it never checks the actual reported value. The FakeChildSampler also returns increasing values (1024 * len(sampler_calls)) which would not expose the real-world ChildPeakMemorySampler bug where the cumulative ru_maxrss baseline for iteration 2 equals iteration 1's peak, producing delta=0.

    Adding an assertion like assert result["mem_usage"][0] == pytest.approx(expected_average) with a FakeChildSampler whose peak_bytes is constant (e.g., always 4096.0) would both verify averaging logic and catch the cumulative-delta regression when ChildPeakMemorySampler is used with num_averages > 1.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. pyproject.toml, line 46-53 (link)

    P1 benchmarks/helpers.py imports psutil unconditionally at the top of the module. tests/test_benchmark.py imports from that module, so pytest collection of the test file will fail with ModuleNotFoundError: No module named 'psutil' whenever only the test extra is installed. The CI workflow (.github/workflows/pytest.yml line 108) runs exactly pip install '.[test]', so the benchmark tests will fail to collect on every CI run.

Reviews (7): Last reviewed commit: "benchmark: /simplify pass — drop kwarg, ..." | Re-trigger Greptile

faridyagubbayli and others added 6 commits May 14, 2026 14:16
Validate benchmark inputs and memory reporting so generated results stay complete and JSON-compatible.
…nups

Addresses reviewer feedback (PR #731 thread, comment thread):

## cpp-backend memory measurement (P1)

`benchmark.py` exposes a `--backend cpp` option, but the existing
PeakMemorySampler reads the *Python* process RSS. When the cpp backend
is active the simulation runs in a separate subprocess (the
`kspaceFirstOrder-OMP`/`-CUDA` binary), so Python RSS reflects nothing
about its memory footprint and `mem_usage` becomes silently meaningless.

(The MATLAB original `benchmark.m` only ever exercised the in-process
`kspaceFirstOrder3D` solver, so the equivalent question — measuring
the subprocess `kspaceFirstOrder3DC` backend — didn't come up there.)

Add `ChildPeakMemorySampler` that uses `resource.getrusage(RUSAGE_CHILDREN)`
before/after the subprocess to capture true peak RSS of all reaped
children. Zero new dependencies (stdlib only). Linux returns KB,
macOS returns bytes — normalized to bytes. Windows is unsupported
(`resource` is POSIX-only); `--report-mem-usage` + `backend="cpp"` on
Windows now refuses with a clear `ValueError` at startup, before any
output file is written.

`benchmark.run()` picks the sampler factory based on backend
automatically; tests override via the new `mem_sampler_factory=` kwarg.

## Rename peak_memory_bytes → current_memory_bytes (P2)

The Linux (`/proc/self/statm` field [1]), macOS (`ps -o rss=`), and
Windows (`WorkingSetSize`) readers all return *current* RSS, not
historical peak. Only `PeakMemorySampler` (which polls in a background
thread) finds a peak over time. Rename to reflect what the readers
actually do. `peak_memory_bytes` retained as a back-compat alias.

## README clarifications

- `data_cast="single"` is a no-op for `backend="cpp"` (binary always
  serializes as float32); only meaningful for `backend="python"`.
- `--report-mem-usage` + backend interaction documented per above.

## __post_init__ validation guards (P3)

Add three cheap guards that prevent opaque `ZeroDivisionError`s
downstream: `domain_size > 0`, `sensor_radius > 0`, `pml_size >= 0`.

## Code comments

- Density slab `[max(ny//4 - 1, 0):, :]`: comment why the `-1` is
  asymmetric with the head slice `[:nx//4, :]` (1-indexed inclusive
  start ↔ 0-indexed start).
- `make_ball(... Vector([nx//2, ny//2, nz//2]) ...)`: comment that
  `make_ball` treats the supplied center as 1-indexed, so this matches
  MATLAB's `Nx/2` despite looking off-by-one.
- `source.p0 = smooth(...).astype(dtype)`: comment that smooth() upcasts
  to float64 via the FFT path; the trailing astype restores user dtype.

## Inline `bon_a`

The local was always computed but only used in the
`options.nonlinear_media` branch (default `False`). Inline at the use
site.

## Tests

- 9 new test cases (18 total): back-compat alias, three validation
  guards (parametrized × 5 cases), cpp-backend factory invocation,
  Windows-cpp clear-error path, ChildPeakMemorySampler Windows refusal.
- All existing tests still pass.

Co-authored-by: Farid Yagubbayli <faridyagubbayli@users.noreply.github.com>
Co-authored-by: Walter Simson <walter.a.simson@gmail.com>
@waltsims waltsims mentioned this pull request Jun 21, 2026
@codecov

codecov Bot commented Jun 21, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.83%. Comparing base (e303fde) to head (6268b9e).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #761      +/-   ##
==========================================
+ Coverage   75.73%   75.83%   +0.10%     
==========================================
  Files          58       58              
  Lines        8274     8274              
  Branches     1614     1614              
==========================================
+ Hits         6266     6275       +9     
+ Misses       1384     1373      -11     
- Partials      624      626       +2     
Flag Coverage Δ
3.10 75.80% <ø> (+0.10%) ⬆️
3.11 75.80% <ø> (+0.10%) ⬆️
3.12 75.80% <ø> (+0.10%) ⬆️
3.13 75.80% <ø> (+0.10%) ⬆️
macos-latest 75.71% <ø> (+0.10%) ⬆️
ubuntu-latest 75.71% <ø> (+0.10%) ⬆️
windows-latest 75.64% <ø> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread benchmarks/helpers.py Outdated
Comment on lines +266 to +309
class ChildPeakMemorySampler:
"""Capture the peak RSS of child processes spawned during the ``with`` block.

Use this for ``backend="cpp"`` benchmark runs where the simulation work
happens in a subprocess — ``PeakMemorySampler`` only sees the Python
parent's RSS and reports a number that has nothing to do with the C++
process's actual footprint.

Implementation: ``resource.getrusage(RUSAGE_CHILDREN).ru_maxrss`` is the
cumulative max across all reaped children, so we record before/after and
return the increment. On Linux ``ru_maxrss`` is reported in kilobytes; on
macOS it's in bytes (per BSD historical convention). Windows has no
equivalent in ``resource`` — the sampler raises ``NotImplementedError``
on Windows so the caller can refuse the combination cleanly.
"""

def __init__(self) -> None:
if platform.system().lower() == "windows":
raise NotImplementedError(
"ChildPeakMemorySampler is not supported on Windows "
"(resource.RUSAGE_CHILDREN is POSIX-only). "
"Use backend='python' to measure simulation memory on Windows."
)
self._before = 0
self._after = 0

def __enter__(self) -> ChildPeakMemorySampler:
import resource # POSIX-only; gated by the Windows check above

self._before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
return self

def __exit__(self, exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: Any) -> None:
import resource

self._after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

@property
def peak_bytes(self) -> float:
delta = max(0, self._after - self._before)
# ru_maxrss units differ by platform (Linux: KB, macOS: bytes).
if platform.system().lower() == "linux":
return float(delta * 1024)
return float(delta)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 ChildPeakMemorySampler reports 0 bytes for every iteration after the first when num_averages > 1

resource.getrusage(RUSAGE_CHILDREN).ru_maxrss is a monotonically non-decreasing cumulative maximum across all reaped children since the parent process started — it never decreases between iterations. With the default num_averages=3:

  • Iteration 1: _before=0, child runs and peaks at ~X KB, _after=X, delta=X → correct
  • Iteration 2: _before=X (unchanged), child runs again with the same workload peaking at ~X KB, _after=X (max(X,X)=X), delta=0 → silently wrong
  • Iteration 3: same, delta=0

The rolling average accumulates as (X + 0 + 0) / 3 = X/3, reporting one-third of the actual peak. Because max(0, ...) clamps negatives, the reported number is never obviously wrong — it just silently deflates.

A fix that works without requiring each child to exceed the prior peak is to capture the cumulative ru_maxrss once at the very first __enter__ of a benchmark run (not once per inner loop), compute after - first_baseline at the end, and expose the result via peak_bytes. Alternatively, skip averaging memory across iterations for the cpp backend (since peak RSS of a given grid size isn't expected to vary per run) and only record the single-run measurement.

waltsims added 3 commits June 21, 2026 22:52
The cpp-backend memory measurement is now handled correctly by
ChildPeakMemorySampler (Windows refused with a clear error), so the
three-paragraph README explanation was redundant. The behavior is
self-documenting via the error message; the script is the source of
truth, not the README.
The previous implementation captured the ru_maxrss baseline at every
__enter__ — but ru_maxrss is monotonically non-decreasing cumulative
across the parent's lifetime, so any iteration after the first with
the same workload reported delta=0. Rolling average across N
iterations silently deflated the memory measurement by a factor of N.

Fix:
- helpers.py: ChildPeakMemorySampler now captures baseline in __init__
  (once per construction), not __enter__. __enter__ is a no-op; __exit__
  records the latest ru_maxrss. peak_bytes returns growth since baseline.
- benchmark.py: for backend="cpp", create the sampler ONCE per grid
  (outside the inner num_averages loop) and reuse across iterations.
  For backend="python", PeakMemorySampler stays per-iteration (its
  polling-based per-call peak is independent and averages meaningfully).
- tests: replaced the prior cpp factory test with one that explicitly
  verifies (a) factory is called twice for one grid (1 probe + 1 grid,
  not once per iteration), (b) enter/exit happen 3× on the SAME instance
  for num_averages=3, (c) peak_bytes correctly accumulates instead of
  zeroing out.
…aders)

Collapses the per-platform memory readers into one psutil call:

  - _linux_current_memory_bytes  (parsed /proc/self/statm)
  - _ps_current_memory_bytes     (shelled out to ps -o rss=)
  - _windows_current_memory_bytes (ctypes ProcessMemoryCounters struct)

→ psutil.Process().memory_info().rss

~70 LOC removed (including the Windows ctypes struct and try/except
chain). Cross-platform from one call. Added as an optional dependency
under the new `[benchmark]` extra in pyproject.toml so users not
running benchmarks don't pull it in:

  pip install k-wave-python[benchmark]

Scope intentionally limited: ChildPeakMemorySampler still uses
resource.getrusage(RUSAGE_CHILDREN). Switching its subprocess
measurement to psutil would require exposing kspaceFirstOrder's
subprocess.Popen handle from kwave.solvers.cpp_simulation, which
belongs in a separate PR.

All 18 tests pass on Linux.
@waltsims

Copy link
Copy Markdown
Owner Author

@greptileai review

Three commits since the previous review (commit 08d8da6):

  • af5361b — addresses your P1 on ChildPeakMemorySampler. Baseline now captured once in __init__ (not per __enter__); benchmark.py creates ONE sampler per grid for backend="cpp" and reuses across iterations. peak_bytes now correctly reports cumulative growth, not zero on iters ≥ 2. New test test_cpp_backend_shares_child_sampler_across_iterations verifies the factory-call-once + enter/exit-N-times pattern.
  • 5bc96bb — trimmed the README per maintainer feedback (the verbose data_cast / mem_usage caveats were redundant now that the cpp memory measurement works).
  • 95e802f — replaced the three platform-specific RSS readers (_linux_current_memory_bytes, _ps_current_memory_bytes, _windows_current_memory_bytes + Windows ctypes struct) with a single psutil.Process().memory_info().rss. psutil added under the new [benchmark] optional-dependency extra, not the core deps. ChildPeakMemorySampler unchanged (still getrusage(RUSAGE_CHILDREN) — subprocess measurement via psutil would require refactoring cpp_simulation.py).

18/18 tests pass locally. Please re-review.

…chmark] extra

CI pytest workflow installs the [test] extra, not [benchmark]. The
previous commit's top-level `import psutil` in benchmarks/helpers.py
made the import unconditional, so pytest collection of
tests/test_benchmark.py failed with ModuleNotFoundError before any test
ran (collection imports `from benchmarks.benchmark import ...` which
transitively imports helpers).

Move the import inside `current_memory_bytes()`. The module itself
imports cleanly without psutil; only callers that take a real memory
reading need it. Tests that inject `memory_reader=` (the existing
mocking pattern) work unchanged.

Two options were considered: lazy import (this), or adding psutil to
the [test] extra. Lazy import is more honest about when psutil is
actually needed — measuring real memory, not running tests.
@waltsims

Copy link
Copy Markdown
Owner Author

@greptile review

…consts

Cleanups from the /simplify review:

- Drop `mem_sampler_factory` parameter from `run()`. Tests now monkeypatch
  `benchmarks.benchmark.ChildPeakMemorySampler` via `unittest.mock.patch`
  (the pattern already used by the Windows-refusal test). The factory
  was a 9th kwarg whose only consumer was the test suite; removing it
  also eliminates the `_default_sampler_factory` closure and the
  redundant ternary that duplicated the backend predicate.

- Hoist `psutil.Process()` to a module-level `_THIS_PROCESS` constant.
  Constructing per call opens /proc/<pid>/stat on Linux (a real syscall);
  PeakMemorySampler ticks every 50 ms by default. ~23 μs saved per
  sample. Combined with adding psutil to the [test] extra, the previous
  lazy-import inside `current_memory_bytes` is no longer needed —
  reverted to top-of-module import.

- Add `psutil>=5.9` to the [test] extra (was [benchmark] only). pytest
  collection of tests/test_benchmark.py transitively imports
  benchmarks.helpers, so psutil needs to be present whenever tests
  run. The [benchmark] extra is kept for parity (end users who install
  only [benchmark] get the same dep).

- Fold `platform.system().lower()` and the Linux/macOS ru_maxrss unit
  difference into module-level `_PLATFORM` and `_RU_MAXRSS_UNIT_BYTES`
  constants. Removes two scattered platform checks and lets the
  POSIX-only `resource` module be imported at module scope (gated on
  `_PLATFORM != 'windows'`) instead of lazily on every __init__/__exit__.

- Drop `ChildPeakMemorySampler._latest` attribute. `__exit__` now writes
  the final delta directly into `_peak_bytes` (matching PeakMemorySampler's
  shape); `peak_bytes` is a trivial accessor.

- Drop `peak_memory_bytes` back-compat alias and the identity-pinning
  test that locked it in. `benchmarks/` is not in the wheel (verified
  via [tool.hatch.build.targets.wheel].packages allowlist), so there
  are no external importers to break with a rename.

- Sharpen the Windows-refusal error message in `ChildPeakMemorySampler.__init__`
  to point at the real blocker (Executor.run_simulation doesn't expose
  the subprocess.Popen handle) rather than implying psutil is missing
  — psutil is already a hard dep of the [benchmark] extra.

Test count: 17 (was 18; one removed was the alias-identity assertion).
All pass on Linux.
@waltsims waltsims requested a review from faridyagubbayli June 22, 2026 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants