Migrate Benchmark (review follow-up to #731 + master merge)#761
Migrate Benchmark (review follow-up to #731 + master merge)#761waltsims wants to merge 11 commits into
Conversation
Validate benchmark inputs and memory reporting so generated results stay complete and JSON-compatible.
…nups Addresses reviewer feedback (PR #731 thread, comment thread): ## cpp-backend memory measurement (P1) `benchmark.py` exposes a `--backend cpp` option, but the existing PeakMemorySampler reads the *Python* process RSS. When the cpp backend is active the simulation runs in a separate subprocess (the `kspaceFirstOrder-OMP`/`-CUDA` binary), so Python RSS reflects nothing about its memory footprint and `mem_usage` becomes silently meaningless. (The MATLAB original `benchmark.m` only ever exercised the in-process `kspaceFirstOrder3D` solver, so the equivalent question — measuring the subprocess `kspaceFirstOrder3DC` backend — didn't come up there.) Add `ChildPeakMemorySampler` that uses `resource.getrusage(RUSAGE_CHILDREN)` before/after the subprocess to capture true peak RSS of all reaped children. Zero new dependencies (stdlib only). Linux returns KB, macOS returns bytes — normalized to bytes. Windows is unsupported (`resource` is POSIX-only); `--report-mem-usage` + `backend="cpp"` on Windows now refuses with a clear `ValueError` at startup, before any output file is written. `benchmark.run()` picks the sampler factory based on backend automatically; tests override via the new `mem_sampler_factory=` kwarg. ## Rename peak_memory_bytes → current_memory_bytes (P2) The Linux (`/proc/self/statm` field [1]), macOS (`ps -o rss=`), and Windows (`WorkingSetSize`) readers all return *current* RSS, not historical peak. Only `PeakMemorySampler` (which polls in a background thread) finds a peak over time. Rename to reflect what the readers actually do. `peak_memory_bytes` retained as a back-compat alias. ## README clarifications - `data_cast="single"` is a no-op for `backend="cpp"` (binary always serializes as float32); only meaningful for `backend="python"`. - `--report-mem-usage` + backend interaction documented per above. ## __post_init__ validation guards (P3) Add three cheap guards that prevent opaque `ZeroDivisionError`s downstream: `domain_size > 0`, `sensor_radius > 0`, `pml_size >= 0`. ## Code comments - Density slab `[max(ny//4 - 1, 0):, :]`: comment why the `-1` is asymmetric with the head slice `[:nx//4, :]` (1-indexed inclusive start ↔ 0-indexed start). - `make_ball(... Vector([nx//2, ny//2, nz//2]) ...)`: comment that `make_ball` treats the supplied center as 1-indexed, so this matches MATLAB's `Nx/2` despite looking off-by-one. - `source.p0 = smooth(...).astype(dtype)`: comment that smooth() upcasts to float64 via the FFT path; the trailing astype restores user dtype. ## Inline `bon_a` The local was always computed but only used in the `options.nonlinear_media` branch (default `False`). Inline at the use site. ## Tests - 9 new test cases (18 total): back-compat alias, three validation guards (parametrized × 5 cases), cpp-backend factory invocation, Windows-cpp clear-error path, ChildPeakMemorySampler Windows refusal. - All existing tests still pass. Co-authored-by: Farid Yagubbayli <faridyagubbayli@users.noreply.github.com> Co-authored-by: Walter Simson <walter.a.simson@gmail.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #761 +/- ##
==========================================
+ Coverage 75.73% 75.83% +0.10%
==========================================
Files 58 58
Lines 8274 8274
Branches 1614 1614
==========================================
+ Hits 6266 6275 +9
+ Misses 1384 1373 -11
- Partials 624 626 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
| class ChildPeakMemorySampler: | ||
| """Capture the peak RSS of child processes spawned during the ``with`` block. | ||
|
|
||
| Use this for ``backend="cpp"`` benchmark runs where the simulation work | ||
| happens in a subprocess — ``PeakMemorySampler`` only sees the Python | ||
| parent's RSS and reports a number that has nothing to do with the C++ | ||
| process's actual footprint. | ||
|
|
||
| Implementation: ``resource.getrusage(RUSAGE_CHILDREN).ru_maxrss`` is the | ||
| cumulative max across all reaped children, so we record before/after and | ||
| return the increment. On Linux ``ru_maxrss`` is reported in kilobytes; on | ||
| macOS it's in bytes (per BSD historical convention). Windows has no | ||
| equivalent in ``resource`` — the sampler raises ``NotImplementedError`` | ||
| on Windows so the caller can refuse the combination cleanly. | ||
| """ | ||
|
|
||
| def __init__(self) -> None: | ||
| if platform.system().lower() == "windows": | ||
| raise NotImplementedError( | ||
| "ChildPeakMemorySampler is not supported on Windows " | ||
| "(resource.RUSAGE_CHILDREN is POSIX-only). " | ||
| "Use backend='python' to measure simulation memory on Windows." | ||
| ) | ||
| self._before = 0 | ||
| self._after = 0 | ||
|
|
||
| def __enter__(self) -> ChildPeakMemorySampler: | ||
| import resource # POSIX-only; gated by the Windows check above | ||
|
|
||
| self._before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss | ||
| return self | ||
|
|
||
| def __exit__(self, exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: Any) -> None: | ||
| import resource | ||
|
|
||
| self._after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss | ||
|
|
||
| @property | ||
| def peak_bytes(self) -> float: | ||
| delta = max(0, self._after - self._before) | ||
| # ru_maxrss units differ by platform (Linux: KB, macOS: bytes). | ||
| if platform.system().lower() == "linux": | ||
| return float(delta * 1024) | ||
| return float(delta) |
There was a problem hiding this comment.
ChildPeakMemorySampler reports 0 bytes for every iteration after the first when num_averages > 1
resource.getrusage(RUSAGE_CHILDREN).ru_maxrss is a monotonically non-decreasing cumulative maximum across all reaped children since the parent process started — it never decreases between iterations. With the default num_averages=3:
- Iteration 1:
_before=0, child runs and peaks at ~X KB,_after=X,delta=X → correct - Iteration 2:
_before=X (unchanged), child runs again with the same workload peaking at ~X KB,_after=X (max(X,X)=X),delta=0 → silently wrong - Iteration 3: same,
delta=0
The rolling average accumulates as (X + 0 + 0) / 3 = X/3, reporting one-third of the actual peak. Because max(0, ...) clamps negatives, the reported number is never obviously wrong — it just silently deflates.
A fix that works without requiring each child to exceed the prior peak is to capture the cumulative ru_maxrss once at the very first __enter__ of a benchmark run (not once per inner loop), compute after - first_baseline at the end, and expose the result via peak_bytes. Alternatively, skip averaging memory across iterations for the cpp backend (since peak RSS of a given grid size isn't expected to vary per run) and only record the single-run measurement.
The cpp-backend memory measurement is now handled correctly by ChildPeakMemorySampler (Windows refused with a clear error), so the three-paragraph README explanation was redundant. The behavior is self-documenting via the error message; the script is the source of truth, not the README.
The previous implementation captured the ru_maxrss baseline at every __enter__ — but ru_maxrss is monotonically non-decreasing cumulative across the parent's lifetime, so any iteration after the first with the same workload reported delta=0. Rolling average across N iterations silently deflated the memory measurement by a factor of N. Fix: - helpers.py: ChildPeakMemorySampler now captures baseline in __init__ (once per construction), not __enter__. __enter__ is a no-op; __exit__ records the latest ru_maxrss. peak_bytes returns growth since baseline. - benchmark.py: for backend="cpp", create the sampler ONCE per grid (outside the inner num_averages loop) and reuse across iterations. For backend="python", PeakMemorySampler stays per-iteration (its polling-based per-call peak is independent and averages meaningfully). - tests: replaced the prior cpp factory test with one that explicitly verifies (a) factory is called twice for one grid (1 probe + 1 grid, not once per iteration), (b) enter/exit happen 3× on the SAME instance for num_averages=3, (c) peak_bytes correctly accumulates instead of zeroing out.
…aders) Collapses the per-platform memory readers into one psutil call: - _linux_current_memory_bytes (parsed /proc/self/statm) - _ps_current_memory_bytes (shelled out to ps -o rss=) - _windows_current_memory_bytes (ctypes ProcessMemoryCounters struct) → psutil.Process().memory_info().rss ~70 LOC removed (including the Windows ctypes struct and try/except chain). Cross-platform from one call. Added as an optional dependency under the new `[benchmark]` extra in pyproject.toml so users not running benchmarks don't pull it in: pip install k-wave-python[benchmark] Scope intentionally limited: ChildPeakMemorySampler still uses resource.getrusage(RUSAGE_CHILDREN). Switching its subprocess measurement to psutil would require exposing kspaceFirstOrder's subprocess.Popen handle from kwave.solvers.cpp_simulation, which belongs in a separate PR. All 18 tests pass on Linux.
|
@greptileai review Three commits since the previous review (commit 08d8da6):
18/18 tests pass locally. Please re-review. |
…chmark] extra CI pytest workflow installs the [test] extra, not [benchmark]. The previous commit's top-level `import psutil` in benchmarks/helpers.py made the import unconditional, so pytest collection of tests/test_benchmark.py failed with ModuleNotFoundError before any test ran (collection imports `from benchmarks.benchmark import ...` which transitively imports helpers). Move the import inside `current_memory_bytes()`. The module itself imports cleanly without psutil; only callers that take a real memory reading need it. Tests that inject `memory_reader=` (the existing mocking pattern) work unchanged. Two options were considered: lazy import (this), or adding psutil to the [test] extra. Lazy import is more honest about when psutil is actually needed — measuring real memory, not running tests.
|
@greptile review |
…consts Cleanups from the /simplify review: - Drop `mem_sampler_factory` parameter from `run()`. Tests now monkeypatch `benchmarks.benchmark.ChildPeakMemorySampler` via `unittest.mock.patch` (the pattern already used by the Windows-refusal test). The factory was a 9th kwarg whose only consumer was the test suite; removing it also eliminates the `_default_sampler_factory` closure and the redundant ternary that duplicated the backend predicate. - Hoist `psutil.Process()` to a module-level `_THIS_PROCESS` constant. Constructing per call opens /proc/<pid>/stat on Linux (a real syscall); PeakMemorySampler ticks every 50 ms by default. ~23 μs saved per sample. Combined with adding psutil to the [test] extra, the previous lazy-import inside `current_memory_bytes` is no longer needed — reverted to top-of-module import. - Add `psutil>=5.9` to the [test] extra (was [benchmark] only). pytest collection of tests/test_benchmark.py transitively imports benchmarks.helpers, so psutil needs to be present whenever tests run. The [benchmark] extra is kept for parity (end users who install only [benchmark] get the same dep). - Fold `platform.system().lower()` and the Linux/macOS ru_maxrss unit difference into module-level `_PLATFORM` and `_RU_MAXRSS_UNIT_BYTES` constants. Removes two scattered platform checks and lets the POSIX-only `resource` module be imported at module scope (gated on `_PLATFORM != 'windows'`) instead of lazily on every __init__/__exit__. - Drop `ChildPeakMemorySampler._latest` attribute. `__exit__` now writes the final delta directly into `_peak_bytes` (matching PeakMemorySampler's shape); `peak_bytes` is a trivial accessor. - Drop `peak_memory_bytes` back-compat alias and the identity-pinning test that locked it in. `benchmarks/` is not in the wheel (verified via [tool.hatch.build.targets.wheel].packages allowlist), so there are no external importers to break with a rename. - Sharpen the Windows-refusal error message in `ChildPeakMemorySampler.__init__` to point at the real blocker (Executor.run_simulation doesn't expose the subprocess.Popen handle) rather than implying psutil is missing — psutil is already a hard dep of the [benchmark] extra. Test count: 17 (was 18; one removed was the alias-identity assertion). All pass on Linux.
Why a separate PR
This PR carries forward @faridyagubbayli's benchmark migration from #731 and adds the fixes that came out of code review on that thread. I'd have pushed directly to #731 but its
maintainerCanModifyflag isfalse, so a new PR was the path of least friction. All of @faridyagubbayli's original commits are preserved as-is; my changes are stacked on top as a single new commit with co-authorship. Once this merges, #731 can be closed as superseded.What's included
From #731 (faridyagubbayli, unchanged)
benchmark.mtobenchmarks/benchmark.py+benchmarks/helpers.pytests/test_benchmark.py)Merge of
masterReview-feedback fixes (this PR's contribution)
cpp-backend memory measurement (P1)
benchmark.pyexposes--backend cpp, butPeakMemorySamplerreads the Python process RSS. When the cpp backend is active, the simulation runs in a separate subprocess (thekspaceFirstOrder-OMP/-CUDAbinary) — so Python RSS reflects ~nothing about its memory footprint andmem_usagebecomes silently meaningless.The MATLAB original only ever exercised in-process
kspaceFirstOrder3D(never the subprocesskspaceFirstOrder3DC), so the equivalent question didn't come up there. The Python port introduced the cpp-backend benchmark option without extending memory measurement to match.Fix: add
ChildPeakMemorySamplerusingresource.getrusage(RUSAGE_CHILDREN).ru_maxrssbefore/after the subprocess. Zero new dependencies (stdlib only). Linux returns KB, macOS returns bytes — normalized to bytes. Windows is unsupported (resourceis POSIX-only);--report-mem-usage+backend=\"cpp\"on Windows now refuses with a clearValueErrorat startup, before any output file is written.benchmark.run()picks the sampler factory automatically; tests override via the newmem_sampler_factory=kwarg.Rename
peak_memory_bytes→current_memory_bytes(P2)The Linux (
/proc/self/statmfield [1]), macOS (ps -o rss=), and Windows (WorkingSetSize) readers all return current RSS, not historical peak. OnlyPeakMemorySampler(which polls in a background thread) finds a peak over time. Rename to reflect what the readers actually do.peak_memory_bytesretained as a back-compat alias.__post_init__validation guards (P3)Three cheap guards that prevent opaque
ZeroDivisionErrors downstream:domain_size > 0,sensor_radius > 0,pml_size >= 0.Code comments
[max(ny//4 - 1, 0):, :]— why the-1is asymmetric with the head slice[:nx//4, :](1-indexed vs 0-indexed).make_ball(... Vector([nx//2, ny//2, nz//2]) ...)—make_balltreats the supplied center as 1-indexed, so this matches MATLAB'sNx/2despite looking off-by-one.source.p0 = smooth(...).astype(dtype)—smooth()upcasts to float64 via the FFT path; the trailing astype restores user dtype.README clarifications
data_cast=\"single\"is a no-op forbackend=\"cpp\"(binary always serializes as float32); only meaningful forbackend=\"python\".--report-mem-usage+ backend interaction documented per above (with the MATLAB context).Inline
bon_aThe local was always computed but only used in the
options.nonlinear_mediabranch (defaultFalse).Test plan
ChildPeakMemorySamplerWindows refusal.Out of scope for this PR
PeakWorkingSetSizeshortcut (already in the struct definition athelpers.py:208, just unused) — would let Windows skip the sampler thread forbackend=\"python\". Easy follow-up.benchmarks/is correctly excluded from the wheel ([tool.hatch.build.targets.wheel] packagesallowlist); it does land in the sdist (cosmetic — sdists routinely carry dev material). Left as-is.Closes #731.
Greptile Summary
This PR ports MATLAB k-Wave's
benchmark.mto Python (benchmarks/benchmark.py+benchmarks/helpers.py), carrying forward the original author's work with review-feedback fixes stacked on top: aChildPeakMemorySamplerfor cpp-backend subprocess memory measurement, aPeakMemorySamplerrename (peak_memory_bytes→current_memory_bytes),__post_init__validation guards, andpsutiladded to the test extras.benchmark.py): runskspaceFirstOrderover a sequence of growing 3D grids, averages timings, saves partial JSON results after each case, and optionally reports peak memory via--report-mem-usage.PeakMemorySampler(polling thread, Python-backend RSS) vsChildPeakMemorySampler(resource.RUSAGE_CHILDREN, cpp-backend subprocess), with Windows refusing the latter cleanly at startup.psutil>=5.9added to thetestand newbenchmarkextras.Confidence Score: 4/5
Safe to merge with one fix: the cpp-backend memory figures for all cases after the first are wrong and will silently mislead any analysis of the output JSON.
In benchmark.py a new ChildPeakMemorySampler() is created inside the case loop, so its _baseline is set to the cumulative ru_maxrss after all previous cases' children. For the second and subsequent cases, peak_bytes reports the incremental growth from the prior case's peak rather than the absolute peak for that case. With the default 12-case run, mem_usage[1] through mem_usage[11] all understate true child RSS. The test suite only exercises max_cases=1 for the cpp path so the defect is untested. All other logic — timing aggregation, partial saves, Windows refusal, validation guards — looks correct.
benchmarks/benchmark.py lines 80–89 (ChildPeakMemorySampler construction placement) and the corresponding test in tests/test_benchmark.py which should add a multi-case cpp-backend assertion.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[run called] --> B{report_mem_usage?} B -- yes --> C{backend == cpp?} C -- yes --> D[ChildPeakMemorySampler probe\nWindows raises ValueError] C -- no --> E[validate memory_reader] D --> F[result includes mem_usage list] E --> F B -- no --> G[skip mem probe] F --> H[for each grid case] G --> H H --> I[build_case] I --> J{report_mem_usage AND cpp?} J -- yes --> K[grid_sampler = new ChildPeakMemorySampler\nbaseline set to current ru_maxrss] J -- no --> L[grid_sampler = None] K --> M[for each iteration] L --> M M --> N{grid_sampler present?} N -- yes --> O[memory_cm = shared grid_sampler] N -- no --> P[memory_cm = new PeakMemorySampler] O --> Q[with memory_cm: run solver] P --> Q Q --> R[update rolling averages] R --> S[store_case_result + save_results] S --> M M -- all iterations done --> H H -- all cases done --> T[return result] Q -- exception --> U[error_reached true\nsave partial + break]%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% flowchart TD A[run called] --> B{report_mem_usage?} B -- yes --> C{backend == cpp?} C -- yes --> D[ChildPeakMemorySampler probe\nWindows raises ValueError] C -- no --> E[validate memory_reader] D --> F[result includes mem_usage list] E --> F B -- no --> G[skip mem probe] F --> H[for each grid case] G --> H H --> I[build_case] I --> J{report_mem_usage AND cpp?} J -- yes --> K[grid_sampler = new ChildPeakMemorySampler\nbaseline set to current ru_maxrss] J -- no --> L[grid_sampler = None] K --> M[for each iteration] L --> M M --> N{grid_sampler present?} N -- yes --> O[memory_cm = shared grid_sampler] N -- no --> P[memory_cm = new PeakMemorySampler] O --> Q[with memory_cm: run solver] P --> Q Q --> R[update rolling averages] R --> S[store_case_result + save_results] S --> M M -- all iterations done --> H H -- all cases done --> T[return result] Q -- exception --> U[error_reached true\nsave partial + break]Comments Outside Diff (2)
tests/test_benchmark.py, line 824-860 (link)test_cpp_backend_uses_child_sampler_via_factoryusesnum_averages=2but only assertslen(result["mem_usage"]) == 1— it never checks the actual reported value. TheFakeChildSampleralso returns increasing values (1024 * len(sampler_calls)) which would not expose the real-worldChildPeakMemorySamplerbug where the cumulativeru_maxrssbaseline for iteration 2 equals iteration 1's peak, producingdelta=0.Adding an assertion like
assert result["mem_usage"][0] == pytest.approx(expected_average)with aFakeChildSamplerwhosepeak_bytesis constant (e.g., always4096.0) would both verify averaging logic and catch the cumulative-delta regression whenChildPeakMemorySampleris used withnum_averages > 1.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
pyproject.toml, line 46-53 (link)benchmarks/helpers.pyimportspsutilunconditionally at the top of the module.tests/test_benchmark.pyimports from that module, so pytest collection of the test file will fail withModuleNotFoundError: No module named 'psutil'whenever only thetestextra is installed. The CI workflow (.github/workflows/pytest.ymlline 108) runs exactlypip install '.[test]', so the benchmark tests will fail to collect on every CI run.Reviews (7): Last reviewed commit: "benchmark: /simplify pass — drop kwarg, ..." | Re-trigger Greptile