Migrate Benchmark (review follow-up to #731 + master merge) by waltsims · Pull Request #761 · waltsims/k-wave-python

waltsims · 2026-06-21T21:20:59Z

Why a separate PR

This PR carries forward @faridyagubbayli's benchmark migration from #731 and adds the fixes that came out of code review on that thread. I'd have pushed directly to #731 but its maintainerCanModify flag is false, so a new PR was the path of least friction. All of @faridyagubbayli's original commits are preserved as-is; my changes are stacked on top as a single new commit with co-authorship. Once this merges, #731 can be closed as superseded.

What's included

From #731 (faridyagubbayli, unchanged)

Port of MATLAB k-Wave's benchmark.m to benchmarks/benchmark.py + benchmarks/helpers.py
9 hermetic tests (tests/test_benchmark.py)
README with usage and output format

Merge of `master`

Brings in everything that landed since the branch's last sync (docs: document Blackwell (sm 12.0) GPU support in README #750, Warn at runtime when GPU compute capability < 7.5 #755, Flip URL_DICT to unified release + bump to 0.6.3 #756, Fix-forward: rename-on-download + Greptile P1/P2 follow-ups for v0.6.3 #757, Bump to 0.6.3rc1 (pre-release for hardware validation) #758, dependabot bumps). Clean merge, no conflicts.

Review-feedback fixes (this PR's contribution)

cpp-backend memory measurement (P1)

benchmark.py exposes --backend cpp, but PeakMemorySampler reads the Python process RSS. When the cpp backend is active, the simulation runs in a separate subprocess (the kspaceFirstOrder-OMP/-CUDA binary) — so Python RSS reflects ~nothing about its memory footprint and mem_usage becomes silently meaningless.

The MATLAB original only ever exercised in-process kspaceFirstOrder3D (never the subprocess kspaceFirstOrder3DC), so the equivalent question didn't come up there. The Python port introduced the cpp-backend benchmark option without extending memory measurement to match.

Fix: add ChildPeakMemorySampler using resource.getrusage(RUSAGE_CHILDREN).ru_maxrss before/after the subprocess. Zero new dependencies (stdlib only). Linux returns KB, macOS returns bytes — normalized to bytes. Windows is unsupported (resource is POSIX-only); --report-mem-usage + backend=\"cpp\" on Windows now refuses with a clear ValueError at startup, before any output file is written. benchmark.run() picks the sampler factory automatically; tests override via the new mem_sampler_factory= kwarg.

Rename `peak_memory_bytes` → `current_memory_bytes` (P2)

The Linux (/proc/self/statm field [1]), macOS (ps -o rss=), and Windows (WorkingSetSize) readers all return current RSS, not historical peak. Only PeakMemorySampler (which polls in a background thread) finds a peak over time. Rename to reflect what the readers actually do. peak_memory_bytes retained as a back-compat alias.

`__post_init__` validation guards (P3)

Three cheap guards that prevent opaque ZeroDivisionErrors downstream: domain_size > 0, sensor_radius > 0, pml_size >= 0.

Code comments

Density slab [max(ny//4 - 1, 0):, :] — why the -1 is asymmetric with the head slice [:nx//4, :] (1-indexed vs 0-indexed).
make_ball(... Vector([nx//2, ny//2, nz//2]) ...) — make_ball treats the supplied center as 1-indexed, so this matches MATLAB's Nx/2 despite looking off-by-one.
source.p0 = smooth(...).astype(dtype) — smooth() upcasts to float64 via the FFT path; the trailing astype restores user dtype.

README clarifications

data_cast=\"single\" is a no-op for backend=\"cpp\" (binary always serializes as float32); only meaningful for backend=\"python\".
--report-mem-usage + backend interaction documented per above (with the MATLAB context).

Inline `bon_a`

The local was always computed but only used in the options.nonlinear_media branch (default False).

Test plan

9 new test cases (18 total): back-compat alias, three validation guards (parametrized × 5 cases), cpp-backend factory invocation, Windows-cpp clear-error path, ChildPeakMemorySampler Windows refusal.
All 18 pass locally (Linux + Python 3.12).
Existing tests unchanged.

Out of scope for this PR

The optional Windows PeakWorkingSetSize shortcut (already in the struct definition at helpers.py:208, just unused) — would let Windows skip the sampler thread for backend=\"python\". Easy follow-up.
benchmarks/ is correctly excluded from the wheel ([tool.hatch.build.targets.wheel] packages allowlist); it does land in the sdist (cosmetic — sdists routinely carry dev material). Left as-is.

Closes #731.

Greptile Summary

This PR ports MATLAB k-Wave's benchmark.m to Python (benchmarks/benchmark.py + benchmarks/helpers.py), carrying forward the original author's work with review-feedback fixes stacked on top: a ChildPeakMemorySampler for cpp-backend subprocess memory measurement, a PeakMemorySampler rename (peak_memory_bytes → current_memory_bytes), __post_init__ validation guards, and psutil added to the test extras.

New benchmark runner (benchmark.py): runs kspaceFirstOrder over a sequence of growing 3D grids, averages timings, saves partial JSON results after each case, and optionally reports peak memory via --report-mem-usage.
Memory measurement split: PeakMemorySampler (polling thread, Python-backend RSS) vs ChildPeakMemorySampler (resource.RUSAGE_CHILDREN, cpp-backend subprocess), with Windows refusing the latter cleanly at startup.
18 new tests covering timing aggregation, float32 dtype, validation guards, Windows refusal, and partial-result preservation; psutil>=5.9 added to the test and new benchmark extras.

Confidence Score: 4/5

Safe to merge with one fix: the cpp-backend memory figures for all cases after the first are wrong and will silently mislead any analysis of the output JSON.

In benchmark.py a new ChildPeakMemorySampler() is created inside the case loop, so its _baseline is set to the cumulative ru_maxrss after all previous cases' children. For the second and subsequent cases, peak_bytes reports the incremental growth from the prior case's peak rather than the absolute peak for that case. With the default 12-case run, mem_usage[1] through mem_usage[11] all understate true child RSS. The test suite only exercises max_cases=1 for the cpp path so the defect is untested. All other logic — timing aggregation, partial saves, Windows refusal, validation guards — looks correct.

benchmarks/benchmark.py lines 80–89 (ChildPeakMemorySampler construction placement) and the corresponding test in tests/test_benchmark.py which should add a multi-case cpp-backend assertion.

Important Files Changed

Filename	Overview
benchmarks/benchmark.py	New benchmark runner with a cross-case baseline bug in ChildPeakMemorySampler construction that causes mem_usage values after index 0 to report incremental deltas rather than absolute per-case peaks.
benchmarks/helpers.py	New helpers module: BenchmarkOptions dataclass, build_case, ChildPeakMemorySampler (POSIX-only), and PeakMemorySampler. Individual classes are correct; the cross-case contamination issue is in how benchmark.py instantiates ChildPeakMemorySampler.
tests/test_benchmark.py	18 new tests covering timing, memory sampling, validation guards, and Windows refusal; all cpp-backend memory tests use max_cases=1, leaving the cross-case baseline contamination untested.
pyproject.toml	Adds psutil>=5.9 to both the test and new benchmark extras, resolving the earlier CI collection failure.
benchmarks/README.md	New README covering usage, output format, and backend/memory-measurement interaction notes.
benchmarks/init.py	Trivial package init docstring.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[run called] --> B{report_mem_usage?}
    B -- yes --> C{backend == cpp?}
    C -- yes --> D[ChildPeakMemorySampler probe\nWindows raises ValueError]
    C -- no --> E[validate memory_reader]
    D --> F[result includes mem_usage list]
    E --> F
    B -- no --> G[skip mem probe]
    F --> H[for each grid case]
    G --> H

    H --> I[build_case]
    I --> J{report_mem_usage AND cpp?}
    J -- yes --> K[grid_sampler = new ChildPeakMemorySampler\nbaseline set to current ru_maxrss]
    J -- no --> L[grid_sampler = None]
    K --> M[for each iteration]
    L --> M

    M --> N{grid_sampler present?}
    N -- yes --> O[memory_cm = shared grid_sampler]
    N -- no --> P[memory_cm = new PeakMemorySampler]
    O --> Q[with memory_cm: run solver]
    P --> Q
    Q --> R[update rolling averages]
    R --> S[store_case_result + save_results]
    S --> M

    M -- all iterations done --> H
    H -- all cases done --> T[return result]
    Q -- exception --> U[error_reached true\nsave partial + break]

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[run called] --> B{report_mem_usage?}
    B -- yes --> C{backend == cpp?}
    C -- yes --> D[ChildPeakMemorySampler probe\nWindows raises ValueError]
    C -- no --> E[validate memory_reader]
    D --> F[result includes mem_usage list]
    E --> F
    B -- no --> G[skip mem probe]
    F --> H[for each grid case]
    G --> H

    H --> I[build_case]
    I --> J{report_mem_usage AND cpp?}
    J -- yes --> K[grid_sampler = new ChildPeakMemorySampler\nbaseline set to current ru_maxrss]
    J -- no --> L[grid_sampler = None]
    K --> M[for each iteration]
    L --> M

    M --> N{grid_sampler present?}
    N -- yes --> O[memory_cm = shared grid_sampler]
    N -- no --> P[memory_cm = new PeakMemorySampler]
    O --> Q[with memory_cm: run solver]
    P --> Q
    Q --> R[update rolling averages]
    R --> S[store_case_result + save_results]
    S --> M

    M -- all iterations done --> H
    H -- all cases done --> T[return result]
    Q -- exception --> U[error_reached true\nsave partial + break]

Comments Outside Diff (2)

tests/test_benchmark.py, line 824-860 (link)

Test doesn't verify memory value correctness for multiple averages

test_cpp_backend_uses_child_sampler_via_factory uses num_averages=2 but only asserts len(result["mem_usage"]) == 1 — it never checks the actual reported value. The FakeChildSampler also returns increasing values (1024 * len(sampler_calls)) which would not expose the real-world ChildPeakMemorySampler bug where the cumulative ru_maxrss baseline for iteration 2 equals iteration 1's peak, producing delta=0.

Adding an assertion like assert result["mem_usage"][0] == pytest.approx(expected_average) with a FakeChildSampler whose peak_bytes is constant (e.g., always 4096.0) would both verify averaging logic and catch the cumulative-delta regression when ChildPeakMemorySampler is used with num_averages > 1.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
pyproject.toml, line 46-53 (link)

benchmarks/helpers.py imports psutil unconditionally at the top of the module. tests/test_benchmark.py imports from that module, so pytest collection of the test file will fail with ModuleNotFoundError: No module named 'psutil' whenever only the test extra is installed. The CI workflow (.github/workflows/pytest.yml line 108) runs exactly pip install '.[test]', so the benchmark tests will fail to collect on every CI run.

_{Reviews (7): Last reviewed commit: "benchmark: /simplify pass — drop kwarg, ..." | Re-trigger Greptile}

Validate benchmark inputs and memory reporting so generated results stay complete and JSON-compatible.

…nups Addresses reviewer feedback (PR #731 thread, comment thread): ## cpp-backend memory measurement (P1) `benchmark.py` exposes a `--backend cpp` option, but the existing PeakMemorySampler reads the *Python* process RSS. When the cpp backend is active the simulation runs in a separate subprocess (the `kspaceFirstOrder-OMP`/`-CUDA` binary), so Python RSS reflects nothing about its memory footprint and `mem_usage` becomes silently meaningless. (The MATLAB original `benchmark.m` only ever exercised the in-process `kspaceFirstOrder3D` solver, so the equivalent question — measuring the subprocess `kspaceFirstOrder3DC` backend — didn't come up there.) Add `ChildPeakMemorySampler` that uses `resource.getrusage(RUSAGE_CHILDREN)` before/after the subprocess to capture true peak RSS of all reaped children. Zero new dependencies (stdlib only). Linux returns KB, macOS returns bytes — normalized to bytes. Windows is unsupported (`resource` is POSIX-only); `--report-mem-usage` + `backend="cpp"` on Windows now refuses with a clear `ValueError` at startup, before any output file is written. `benchmark.run()` picks the sampler factory based on backend automatically; tests override via the new `mem_sampler_factory=` kwarg. ## Rename peak_memory_bytes → current_memory_bytes (P2) The Linux (`/proc/self/statm` field [1]), macOS (`ps -o rss=`), and Windows (`WorkingSetSize`) readers all return *current* RSS, not historical peak. Only `PeakMemorySampler` (which polls in a background thread) finds a peak over time. Rename to reflect what the readers actually do. `peak_memory_bytes` retained as a back-compat alias. ## README clarifications - `data_cast="single"` is a no-op for `backend="cpp"` (binary always serializes as float32); only meaningful for `backend="python"`. - `--report-mem-usage` + backend interaction documented per above. ## __post_init__ validation guards (P3) Add three cheap guards that prevent opaque `ZeroDivisionError`s downstream: `domain_size > 0`, `sensor_radius > 0`, `pml_size >= 0`. ## Code comments - Density slab `[max(ny//4 - 1, 0):, :]`: comment why the `-1` is asymmetric with the head slice `[:nx//4, :]` (1-indexed inclusive start ↔ 0-indexed start). - `make_ball(... Vector([nx//2, ny//2, nz//2]) ...)`: comment that `make_ball` treats the supplied center as 1-indexed, so this matches MATLAB's `Nx/2` despite looking off-by-one. - `source.p0 = smooth(...).astype(dtype)`: comment that smooth() upcasts to float64 via the FFT path; the trailing astype restores user dtype. ## Inline `bon_a` The local was always computed but only used in the `options.nonlinear_media` branch (default `False`). Inline at the use site. ## Tests - 9 new test cases (18 total): back-compat alias, three validation guards (parametrized × 5 cases), cpp-backend factory invocation, Windows-cpp clear-error path, ChildPeakMemorySampler Windows refusal. - All existing tests still pass. Co-authored-by: Farid Yagubbayli <faridyagubbayli@users.noreply.github.com> Co-authored-by: Walter Simson <walter.a.simson@gmail.com>

codecov · 2026-06-21T21:25:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.83%. Comparing base (e303fde) to head (6268b9e).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #761      +/-   ##
==========================================
+ Coverage   75.73%   75.83%   +0.10%     
==========================================
  Files          58       58              
  Lines        8274     8274              
  Branches     1614     1614              
==========================================
+ Hits         6266     6275       +9     
+ Misses       1384     1373      -11     
- Partials      624      626       +2

Flag	Coverage Δ
3.10	`75.80% <ø> (+0.10%)`	⬆️
3.11	`75.80% <ø> (+0.10%)`	⬆️
3.12	`75.80% <ø> (+0.10%)`	⬆️
3.13	`75.80% <ø> (+0.10%)`	⬆️
macos-latest	`75.71% <ø> (+0.10%)`	⬆️
ubuntu-latest	`75.71% <ø> (+0.10%)`	⬆️
windows-latest	`75.64% <ø> (+0.10%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

greptile-apps · 2026-06-21T21:25:44Z

+class ChildPeakMemorySampler:
+    """Capture the peak RSS of child processes spawned during the ``with`` block.
+
+    Use this for ``backend="cpp"`` benchmark runs where the simulation work
+    happens in a subprocess — ``PeakMemorySampler`` only sees the Python
+    parent's RSS and reports a number that has nothing to do with the C++
+    process's actual footprint.
+
+    Implementation: ``resource.getrusage(RUSAGE_CHILDREN).ru_maxrss`` is the
+    cumulative max across all reaped children, so we record before/after and
+    return the increment. On Linux ``ru_maxrss`` is reported in kilobytes; on
+    macOS it's in bytes (per BSD historical convention). Windows has no
+    equivalent in ``resource`` — the sampler raises ``NotImplementedError``
+    on Windows so the caller can refuse the combination cleanly.
+    """
+
+    def __init__(self) -> None:
+        if platform.system().lower() == "windows":
+            raise NotImplementedError(
+                "ChildPeakMemorySampler is not supported on Windows "
+                "(resource.RUSAGE_CHILDREN is POSIX-only). "
+                "Use backend='python' to measure simulation memory on Windows."
+            )
+        self._before = 0
+        self._after = 0
+
+    def __enter__(self) -> ChildPeakMemorySampler:
+        import resource  # POSIX-only; gated by the Windows check above
+
+        self._before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
+        return self
+
+    def __exit__(self, exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: Any) -> None:
+        import resource
+
+        self._after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
+
+    @property
+    def peak_bytes(self) -> float:
+        delta = max(0, self._after - self._before)
+        # ru_maxrss units differ by platform (Linux: KB, macOS: bytes).
+        if platform.system().lower() == "linux":
+            return float(delta * 1024)
+        return float(delta)


ChildPeakMemorySampler reports 0 bytes for every iteration after the first when num_averages > 1

resource.getrusage(RUSAGE_CHILDREN).ru_maxrss is a monotonically non-decreasing cumulative maximum across all reaped children since the parent process started — it never decreases between iterations. With the default num_averages=3:

Iteration 1: _before=0, child runs and peaks at ~X KB, _after=X, delta=X → correct

Iteration 2: _before=X (unchanged), child runs again with the same workload peaking at ~X KB, _after=X (max(X,X)=X), delta=0 → silently wrong

Iteration 3: same, delta=0

The rolling average accumulates as (X + 0 + 0) / 3 = X/3, reporting one-third of the actual peak. Because max(0, ...) clamps negatives, the reported number is never obviously wrong — it just silently deflates.

A fix that works without requiring each child to exceed the prior peak is to capture the cumulative ru_maxrss once at the very first __enter__ of a benchmark run (not once per inner loop), compute after - first_baseline at the end, and expose the result via peak_bytes. Alternatively, skip averaging memory across iterations for the cpp backend (since peak RSS of a given grid size isn't expected to vary per run) and only record the single-run measurement.

The cpp-backend memory measurement is now handled correctly by ChildPeakMemorySampler (Windows refused with a clear error), so the three-paragraph README explanation was redundant. The behavior is self-documenting via the error message; the script is the source of truth, not the README.

The previous implementation captured the ru_maxrss baseline at every __enter__ — but ru_maxrss is monotonically non-decreasing cumulative across the parent's lifetime, so any iteration after the first with the same workload reported delta=0. Rolling average across N iterations silently deflated the memory measurement by a factor of N. Fix: - helpers.py: ChildPeakMemorySampler now captures baseline in __init__ (once per construction), not __enter__. __enter__ is a no-op; __exit__ records the latest ru_maxrss. peak_bytes returns growth since baseline. - benchmark.py: for backend="cpp", create the sampler ONCE per grid (outside the inner num_averages loop) and reuse across iterations. For backend="python", PeakMemorySampler stays per-iteration (its polling-based per-call peak is independent and averages meaningfully). - tests: replaced the prior cpp factory test with one that explicitly verifies (a) factory is called twice for one grid (1 probe + 1 grid, not once per iteration), (b) enter/exit happen 3× on the SAME instance for num_averages=3, (c) peak_bytes correctly accumulates instead of zeroing out.

…aders) Collapses the per-platform memory readers into one psutil call: - _linux_current_memory_bytes (parsed /proc/self/statm) - _ps_current_memory_bytes (shelled out to ps -o rss=) - _windows_current_memory_bytes (ctypes ProcessMemoryCounters struct) → psutil.Process().memory_info().rss ~70 LOC removed (including the Windows ctypes struct and try/except chain). Cross-platform from one call. Added as an optional dependency under the new `[benchmark]` extra in pyproject.toml so users not running benchmarks don't pull it in: pip install k-wave-python[benchmark] Scope intentionally limited: ChildPeakMemorySampler still uses resource.getrusage(RUSAGE_CHILDREN). Switching its subprocess measurement to psutil would require exposing kspaceFirstOrder's subprocess.Popen handle from kwave.solvers.cpp_simulation, which belongs in a separate PR. All 18 tests pass on Linux.

waltsims · 2026-06-21T23:22:09Z

@greptileai review

Three commits since the previous review (commit 08d8da6):

af5361b — addresses your P1 on ChildPeakMemorySampler. Baseline now captured once in __init__ (not per __enter__); benchmark.py creates ONE sampler per grid for backend="cpp" and reuses across iterations. peak_bytes now correctly reports cumulative growth, not zero on iters ≥ 2. New test test_cpp_backend_shares_child_sampler_across_iterations verifies the factory-call-once + enter/exit-N-times pattern.
5bc96bb — trimmed the README per maintainer feedback (the verbose data_cast / mem_usage caveats were redundant now that the cpp memory measurement works).
95e802f — replaced the three platform-specific RSS readers (_linux_current_memory_bytes, _ps_current_memory_bytes, _windows_current_memory_bytes + Windows ctypes struct) with a single psutil.Process().memory_info().rss. psutil added under the new [benchmark] optional-dependency extra, not the core deps. ChildPeakMemorySampler unchanged (still getrusage(RUSAGE_CHILDREN) — subprocess measurement via psutil would require refactoring cpp_simulation.py).

18/18 tests pass locally. Please re-review.

…chmark] extra CI pytest workflow installs the [test] extra, not [benchmark]. The previous commit's top-level `import psutil` in benchmarks/helpers.py made the import unconditional, so pytest collection of tests/test_benchmark.py failed with ModuleNotFoundError before any test ran (collection imports `from benchmarks.benchmark import ...` which transitively imports helpers). Move the import inside `current_memory_bytes()`. The module itself imports cleanly without psutil; only callers that take a real memory reading need it. Tests that inject `memory_reader=` (the existing mocking pattern) work unchanged. Two options were considered: lazy import (this), or adding psutil to the [test] extra. Lazy import is more honest about when psutil is actually needed — measuring real memory, not running tests.

waltsims · 2026-06-22T04:00:45Z

@greptile review

…consts Cleanups from the /simplify review: - Drop `mem_sampler_factory` parameter from `run()`. Tests now monkeypatch `benchmarks.benchmark.ChildPeakMemorySampler` via `unittest.mock.patch` (the pattern already used by the Windows-refusal test). The factory was a 9th kwarg whose only consumer was the test suite; removing it also eliminates the `_default_sampler_factory` closure and the redundant ternary that duplicated the backend predicate. - Hoist `psutil.Process()` to a module-level `_THIS_PROCESS` constant. Constructing per call opens /proc/<pid>/stat on Linux (a real syscall); PeakMemorySampler ticks every 50 ms by default. ~23 μs saved per sample. Combined with adding psutil to the [test] extra, the previous lazy-import inside `current_memory_bytes` is no longer needed — reverted to top-of-module import. - Add `psutil>=5.9` to the [test] extra (was [benchmark] only). pytest collection of tests/test_benchmark.py transitively imports benchmarks.helpers, so psutil needs to be present whenever tests run. The [benchmark] extra is kept for parity (end users who install only [benchmark] get the same dep). - Fold `platform.system().lower()` and the Linux/macOS ru_maxrss unit difference into module-level `_PLATFORM` and `_RU_MAXRSS_UNIT_BYTES` constants. Removes two scattered platform checks and lets the POSIX-only `resource` module be imported at module scope (gated on `_PLATFORM != 'windows'`) instead of lazily on every __init__/__exit__. - Drop `ChildPeakMemorySampler._latest` attribute. `__exit__` now writes the final delta directly into `_peak_bytes` (matching PeakMemorySampler's shape); `peak_bytes` is a trivial accessor. - Drop `peak_memory_bytes` back-compat alias and the identity-pinning test that locked it in. `benchmarks/` is not in the wheel (verified via [tool.hatch.build.targets.wheel].packages allowlist), so there are no external importers to break with a rename. - Sharpen the Windows-refusal error message in `ChildPeakMemorySampler.__init__` to point at the real blocker (Executor.run_simulation doesn't expose the subprocess.Popen handle) rather than implying psutil is missing — psutil is already a hard dep of the [benchmark] extra. Test count: 17 (was 18; one removed was the alias-identity assertion). All pass on Linux.

faridyagubbayli and others added 6 commits May 14, 2026 14:16

Migrate Benchmark

e956c91

helpers

968b5f8

Fix benchmark review issues.

6abd4ed

Validate benchmark inputs and memory reporting so generated results stay complete and JSON-compatible.

Merge branch 'master' into migrate-benchmark

f3109c7

Merge remote-tracking branch 'origin/master' into migrate-benchmark

74b3fb2

waltsims mentioned this pull request Jun 21, 2026

Migrate Benchmark #731

Closed

greptile-apps Bot reviewed Jun 21, 2026

View reviewed changes

waltsims added 3 commits June 21, 2026 22:52

waltsims requested a review from faridyagubbayli June 22, 2026 17:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migrate Benchmark (review follow-up to #731 + master merge)#761

Migrate Benchmark (review follow-up to #731 + master merge)#761
waltsims wants to merge 11 commits into
masterfrom
migrate-benchmark-followup

waltsims commented Jun 21, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

codecov Bot commented Jun 21, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot Jun 21, 2026

Uh oh!

waltsims commented Jun 21, 2026

Uh oh!

waltsims commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

waltsims commented Jun 21, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why a separate PR

What's included

From #731 (faridyagubbayli, unchanged)

Merge of master

Review-feedback fixes (this PR's contribution)

cpp-backend memory measurement (P1)

Rename peak_memory_bytes → current_memory_bytes (P2)

__post_init__ validation guards (P3)

Code comments

README clarifications

Inline bon_a

Test plan

Out of scope for this PR

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Comments Outside Diff (2)

Uh oh!

codecov Bot commented Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

greptile-apps Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

waltsims commented Jun 21, 2026

Uh oh!

waltsims commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

waltsims commented Jun 21, 2026 •

edited by greptile-apps Bot

Loading

Merge of `master`

Rename `peak_memory_bytes` → `current_memory_bytes` (P2)

`__post_init__` validation guards (P3)

Inline `bon_a`

codecov Bot commented Jun 21, 2026 •

edited

Loading