Migrate Benchmark by faridyagubbayli · Pull Request #731 · waltsims/k-wave-python

faridyagubbayli · 2026-05-14T14:26:26Z

Migrates the MATLAB benchmark.m script to Python, with some additions.

benchmarks/README.md shows sample usage.

Example output JSON file:

{
  "comp_size": [
    32768,
    65536,
    131072,
    262144,
    524288
  ],
  "comp_time": [
    2.813780007263025,
    2.5228265166903534,
    2.5327161628132067,
    4.796307735455533,
    10.499971745846173
  ],
  "options": {
    "data_cast": "off",
    "heterogeneous_media": true,
    "absorbing_media": true,
    "nonlinear_media": false,
    "binary_sensor_mask": true,
    "number_sensor_points": 100,
    "number_time_points": 1000,
    "num_averages": 3,
    "start_size": 32,
    "x_scale_array": [
      1,
      2,
      2,
      2,
      4,
      4,
      4,
      8,
      8,
      8,
      16,
      16
    ],
    "y_scale_array": [
      1,
      1,
      2,
      2,
      2,
      4,
      4,
      4,
      8,
      8,
      8,
      16
    ],
    "z_scale_array": [
      1,
      1,
      1,
      2,
      2,
      2,
      4,
      4,
      4,
      8,
      8,
      8
    ],
    "domain_size": 0.022,
    "sensor_radius": 0.01,
    "pml_size": 10,
    "pml_inside": true,
    "report_mem_usage": false,
    "backend": "python",
    "device": "gpu",
    "computer_name": "k-instance",
    "python_version": "3.11.14",
    "platform": "Linux-5.10.0-39-cloud-amd64-x86_64-with-glibc2.31",
    "kwave_python_version": "0.6.1"
  },
  "output_path": "<filename>",
  "error_reached": false,
  "error_message": ""
}
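
As a rough illustration of how a result file like the one above could be consumed, the sketch below normalizes each timing by grid size and step count. It assumes that `comp_size` holds total grid points (32 × 32 × 32 = 32768 for the first case) and that `comp_time` is the averaged wall time in seconds; neither is stated explicitly in this thread. The `summarize` helper is hypothetical and not part of the PR.

```python
import json

# Trimmed copy of the example output; only the fields used below are kept.
SAMPLE = """
{
  "comp_size": [32768, 65536],
  "comp_time": [2.813780007263025, 2.5228265166903534],
  "options": {"number_time_points": 1000}
}
"""


def summarize(result: dict) -> list[tuple[int, float, float]]:
    """Return (grid points, seconds, microseconds per grid point per time step)."""
    steps = result["options"]["number_time_points"]
    rows = []
    for size, seconds in zip(result["comp_size"], result["comp_time"]):
        # Assumption: comp_time is in seconds, so scale to microseconds.
        rows.append((size, seconds, seconds / (size * steps) * 1e6))
    return rows


rows = summarize(json.loads(SAMPLE))
for size, seconds, per_point in rows:
    print(f"{size:>8d} points  {seconds:7.3f} s  {per_point:.4f} us/point/step")
```

Normalizing this way makes it easier to see where the cost per grid point stops falling as the grid grows.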

Greptile Summary

This PR ports MATLAB's benchmark.m script to Python, adding a benchmarks/ package with a 3D solver scaling benchmark (benchmark.py), a helper library (helpers.py), a README, and a comprehensive test suite. The implementation corrects all issues flagged in previous review rounds — including start_size validation, the NaN-in-JSON Windows path, cumulative-peak ru_maxrss semantics, and the grid-size-collision identity key.

benchmark.py runs kspaceFirstOrder across a sequence of increasing 3D grid sizes, averaging elapsed time per case and writing partial results after each run to preserve progress on failure.
helpers.py provides BenchmarkOptions, platform-specific RSS readers, and two context-manager samplers (PeakMemorySampler for the Python backend, ChildPeakMemorySampler for the C++ subprocess backend) that track peak memory during each solver call.
tests/test_benchmark.py covers timing aggregation, memory sampling (including the cpp/python sampler split), validation errors, partial-result durability, and the allow_nan=False JSON guard.
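
The `allow_nan=False` guard mentioned above relies on standard-library behaviour: by default `json.dumps` emits the non-standard token `NaN`, while `allow_nan=False` raises `ValueError` instead. A minimal sketch follows; `dump_strict` is a hypothetical name, not the PR's `save_results`.

```python
import json
import math


def dump_strict(result: dict) -> str:
    """Serialize a result dict, refusing NaN/Infinity rather than emitting invalid JSON."""
    return json.dumps(result, indent=2, allow_nan=False)


# Default behaviour: NaN is written as a bare token that strict JSON parsers reject.
lenient = json.dumps({"mem_usage": [math.nan]})

# Strict behaviour: the same payload is rejected at write time.
try:
    dump_strict({"mem_usage": [math.nan]})
except ValueError:
    rejected = True
else:
    rejected = False

valid = dump_strict({"comp_time": [2.8]})
```

Failing at write time means a broken memory reader shows up as an error instead of as an unreadable results file.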

Confidence Score: 5/5

This PR is safe to merge. All previously flagged issues have been resolved and no new defects were found.

Every issue from earlier review rounds has been addressed: start_size is validated, Windows memory sampling no longer silently produces invalid JSON (both via validate_memory_bytes and allow_nan=False in json.dumps), the ru_maxrss cumulative-peak problem is replaced by a threaded current-RSS polling approach, and the grid-size collision key is replaced by case_index. The test suite is thorough, covering timing aggregation, both sampler paths (Python vs cpp backend), platform guard failures, and durability on solver error.

No files require special attention.

Important Files Changed

Filename	Overview
benchmarks/benchmark.py	Core benchmark runner: iterates grid cases, wraps solver calls with optional memory sampling, accumulates rolling averages, and saves partial results after each inner loop iteration. Logic is clean and all previously flagged issues are resolved.
benchmarks/helpers.py	All helper utilities: BenchmarkOptions dataclass with full validation, platform-specific RSS readers, PeakMemorySampler (threaded polling), ChildPeakMemorySampler (RUSAGE_CHILDREN delta), and save_results with allow_nan=False guard. No issues found.
tests/test_benchmark.py	Comprehensive test suite with injectable solver, timer, and memory_reader; covers timing aggregation, memory sampling, error durability, platform guards, and JSON validity. All previously uncovered paths are now tested.
benchmarks/README.md	New documentation file describing benchmark usage, CLI flags, and output format. Accurate and matches the implemented interface.
benchmarks/__init__.py	Trivial module init with a single docstring.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[run called] --> B[grid_sizes: build case list]
    B --> C{report_mem_usage?}
    C -- Yes --> D[Early probe: sampler_factory + memory_reader]
    D --> E{probe OK?}
    E -- No --> F[raise ValueError before writing output]
    E -- Yes --> G[result mem_usage = list]
    C -- No --> H[result without mem_usage]
    G --> I
    H --> I[for each case_index nx,ny,nz,scale]
    I --> J[build_case: kgrid, medium, source, sensor]
    J --> K[for loop_num in 1..num_averages]
    K --> L{report_mem_usage?}
    L -- Yes --> M[with sampler_factory as memory_sampler]
    M --> N[start timer - solver - stop timer]
    N --> O[exit sampler context - final RSS sample]
    O --> P[rolling_average loop_mem_usage]
    L -- No --> Q[start timer - solver - stop timer]
    P --> R[rolling_average loop_time]
    Q --> R
    R --> S[store_case_result by case_index]
    S --> T[save_results to JSON]
    T --> K
    K -- loop done --> I
    I -- all cases done --> U[return result]
    J -- exception --> V[error_reached = True]
    N -- exception --> V
    V --> W[save_results with error info]
    W --> X[break]
    X --> U

Reviews (5): Last reviewed commit: "benchmark: trim README — drop verbose da..."
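
The `rolling_average` step in the flowchart can be sketched as an incremental mean, so that a partial result is meaningful after every solver call. This is an assumption about how the averaging works, written from the flowchart alone; the function body and the fake timings are illustrative only.

```python
def rolling_average(previous: float, sample: float, count: int) -> float:
    """Mean of `count` samples, given the mean of the first `count - 1` and the newest one."""
    return (previous * (count - 1) + sample) / count


# Fake per-run timings standing in for three solver calls (num_averages = 3).
timings = [3.0, 5.0, 10.0]

loop_time = 0.0
history = []
for loop_num, elapsed in enumerate(timings, start=1):
    loop_time = rolling_average(loop_time, elapsed, loop_num)
    # At this point the runner in the flowchart would store the case result and save JSON.
    history.append(loop_time)
```

Because the mean is updated in place, a crash on the third run still leaves the average of the first two on disk.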

codecov · 2026-05-14T14:28:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.68%. Comparing base (66c256d) to head (f3109c7).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #731      +/-   ##
==========================================
+ Coverage   75.57%   75.68%   +0.10%     
==========================================
  Files          57       57              
  Lines        8195     8195              
  Branches     1600     1600              
==========================================
+ Hits         6193     6202       +9     
+ Misses       1381     1370      -11     
- Partials      621      623       +2

Flag	Coverage Δ
3.10	`75.64% <ø> (+0.10%)`	⬆️
3.11	`75.64% <ø> (+0.10%)`	⬆️
3.12	`75.64% <ø> (+0.10%)`	⬆️
3.13	`75.64% <ø> (+0.10%)`	⬆️
macos-latest	`75.58% <ø> (+0.10%)`	⬆️
ubuntu-latest	`75.58% <ø> (+0.10%)`	⬆️
windows-latest	`75.42% <ø> (+0.10%)`	⬆️

Validate benchmark inputs and memory reporting so generated results stay complete and JSON-compatible.

faridyagubbayli · 2026-06-04T14:31:25Z

@greptile review

greptile-apps · 2026-06-04T14:40:44Z

waltsims

Took a careful pass against the MATLAB reference (ucl-bug/k-wave → k-Wave/benchmark.m) and the surrounding k-wave-python integration. Nicely done port — faithful where it matters, idiomatic where it diverges. Specifics below.

Faithfulness to `benchmark.m`

Verified matches (cross-checked against the MATLAB source):

Defaults, scale arrays, physical constants (c0=1500, ρ0=1000, α=0.75, α-power=1.5, B/A=6), 100 sensor points, 1000 time steps, 3 averages, PML 10
Scale arrays (1,2,2,2,4,4,4,8,8,8,16,16) × y × z — identical to MATLAB lines 84–86
Heterogeneity: sound_speed first nx/4 slab × 1.2; density from ny/4 onward × 1.2
Index translation is correct — I traced the arithmetic for ny ∈ {4, 8, 16, 32, 128}:
- [:nx//4] ≡ MATLAB 1:Nx/4 (head slice — no -1)
- [max(ny//4-1, 0):] ≡ MATLAB Ny/4:end (tail slice — needs -1 to convert 1-indexed inclusive start)
- The max(.., 0) guard handles small-ny edge cases
kgrid.makeTime(max(c)) then kgrid.setTime(Nt, dt) (helpers.py:116–117) mirrors MATLAB lines 167–168 exactly — both languages do the same "let CFL pick dt, then override Nt" dance. Not redundant.
smooth(p0, restore_max=True) matches MATLAB smooth(source.p0, true) semantics (verified smooth() signature at kwave/utils/filters.py:549)
make_ball(Vector([nx, ny, nz]), Vector([nx//2, ny//2, nz//2]), 2 * scale) matches MATLAB's call. Looks off-by-one but isn't — make_ball treats the supplied center as 1-indexed internally (see kwave/utils/mapgen.py:547 where it subtracts ceil(grid_size/2)), so a future reader who tries to "fix" the apparent off-by-one would break parity. One-line code comment on that line would prevent that.
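
The index translation above can be checked mechanically. The sketch below uses plain Python lists rather than the PR's arrays and compares the 0-indexed positions selected by each MATLAB range with those selected by the corresponding Python slice. The helper names are hypothetical.

```python
def matlab_head(n: int) -> list[int]:
    """MATLAB 1:N/4 (1-indexed, inclusive), converted to 0-indexed positions."""
    return [i - 1 for i in range(1, n // 4 + 1)]


def matlab_tail(n: int) -> list[int]:
    """MATLAB N/4:end (1-indexed, inclusive), converted to 0-indexed positions."""
    return [i - 1 for i in range(n // 4, n + 1)]


def python_head(n: int) -> list[int]:
    return list(range(n))[: n // 4]


def python_tail(n: int) -> list[int]:
    return list(range(n))[max(n // 4 - 1, 0):]


sizes = (4, 8, 16, 32, 128)
head_ok = all(matlab_head(n) == python_head(n) for n in sizes)
tail_ok = all(matlab_tail(n) == python_tail(n) for n in sizes)
```

The head slice needs no adjustment because an exclusive 0-indexed stop equals an inclusive 1-indexed stop. The tail slice needs the `-1` because its start is inclusive in both conventions.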

Where the Python port improves on MATLAB

Cross-platform memory measurement — MATLAB only ran the memory branch on Windows; the Python port supports Linux (/proc/self/statm), macOS (ps), Windows (Win32 GetProcessMemoryInfo)
JSON output instead of .mat — sensible in the Python ecosystem
CLI argparse with explicit choices + smoke-friendly --max-cases
8 hermetic tests via DI of solver=, timer=, memory_reader= — clean separation, no real subprocess or /proc reads, tests pass on any platform
PeakMemorySampler threading is correct — Thread.join() in __exit__ provides a happens-before barrier for self._error, so the read in __exit__ always sees the writer's final state
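
For readers unfamiliar with the pattern, here is a minimal sketch of a polling sampler with an injected reader. It follows the description above (background thread, `join()` in `__exit__`, one final sample) but is not the PR's `PeakMemorySampler`; the class name, constructor arguments, and polling interval are assumptions.

```python
import threading
import time


class PollingPeakSampler:
    """Track the largest value returned by `reader` while the context is active."""

    def __init__(self, reader, interval: float = 0.01):
        self._reader = reader
        self._interval = interval
        self._stop = threading.Event()
        self._error = None
        self.peak = 0
        self._thread = threading.Thread(target=self._poll, daemon=True)

    def _poll(self):
        try:
            while not self._stop.is_set():
                self.peak = max(self.peak, self._reader())
                self._stop.wait(self._interval)
        except Exception as exc:  # surfaced to the caller in __exit__
            self._error = exc

    def __enter__(self):
        self.peak = self._reader()
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stop.set()
        self._thread.join()  # after join, the worker's writes are visible here
        if self._error is not None and exc_type is None:
            raise self._error
        self.peak = max(self.peak, self._reader())  # final sample
        return False


# Hermetic usage: a fake reader replaces any real RSS lookup.
values = iter([10, 50, 20])
with PollingPeakSampler(lambda: next(values, 20), interval=0.001) as sampler:
    time.sleep(0.02)  # stands in for the solver call
```

Injecting the reader is what lets a test exercise the sampler without touching `/proc` or spawning processes.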

Real concerns

1. cpp-backend memory measurement is broken (P1, doc-worthy)

peak_memory_bytes samples the Python process's RSS. When backend="cpp", the actual simulation runs in a separate subprocess (cpp_simulation.py:388 is subprocess.run(command, ...)). The Python RSS reflects almost nothing about the C++ process's memory, so mem_usage silently becomes meaningless for --backend cpp.

The MATLAB original didn't have this problem (in-process). The Python port silently inherits a wrong number.

Fix options, in order of preference:

getrusage(RUSAGE_CHILDREN).ru_maxrss after subprocess.wait() — zero new dependencies, gives true peak RSS of all reaped children. Available on Linux + macOS via stdlib resource module. The cleanest solution by far.
Read /proc/<child_pid>/status VmHWM before the child gets reaped — Linux-only but precise.
At minimum, detect backend="cpp" + --report-mem-usage and refuse with a clear "subprocess memory not supported by this measurement path; mem_usage will be Python-process-only" error.

psutil.Process(child_pid).memory_info() would also work but adds a new project dependency for one benchmark feature — not worth it when resource.getrusage is in the stdlib.

In any case, the README should explicitly say "memory measurement only meaningful for backend=\"python\" today."
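
A minimal sketch of the first option, assuming a POSIX system (the `resource` module is not available on Windows). The helper names are hypothetical. The unit handling reflects the documented platform difference: Linux reports `ru_maxrss` in kilobytes, macOS in bytes.

```python
import resource
import subprocess
import sys


def children_peak_rss_bytes() -> int:
    """Peak RSS, in bytes, of the largest child this process has waited for so far."""
    raw = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports kilobytes; macOS reports bytes.
    return raw if sys.platform == "darwin" else raw * 1024


def run_and_measure(command: list[str]) -> int:
    # subprocess.run waits for the child, so it is reaped before we read rusage.
    subprocess.run(command, check=True)
    return children_peak_rss_bytes()


peak = run_and_measure([sys.executable, "-c", "pass"])
print(f"largest reaped child peaked at {peak / 1e6:.1f} MB")
```

One caveat with this approach: `RUSAGE_CHILDREN` is a high-water mark over every child reaped so far, so a small run that follows a larger one reports the larger figure. In a benchmark with monotonically growing grids that is usually acceptable, but it is worth keeping in mind.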

2. `peak_memory_bytes()` is misnamed (P2)

On Linux/macOS, the readers return current RSS, not historical peak:

/proc/self/statm field [1] is resident (per man proc); peak lives in /proc/self/status:VmHWM
ps -o rss= is current
Windows reads WorkingSetSize (current)

Only PeakMemorySampler (which polls) finds a peak over time. The reader should be current_memory_bytes and PeakMemorySampler is what samples-for-peak.
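
To make the current-versus-peak distinction concrete on Linux, the sketch below parses both sources from text. The parsers are hypothetical helpers operating on sample strings, so the example runs anywhere; on a real Linux system the inputs would come from `/proc/self/statm` and `/proc/self/status`.

```python
def current_rss_pages(statm_text: str) -> int:
    """Second field of /proc/<pid>/statm: resident set size, in pages (current, not peak)."""
    return int(statm_text.split()[1])


def peak_rss_kb(status_text: str) -> int:
    """VmHWM line of /proc/<pid>/status: peak resident set size, in kB."""
    for line in status_text.splitlines():
        if line.startswith("VmHWM:"):
            return int(line.split()[1])
    raise ValueError("VmHWM not found")


# Sample contents in the format the kernel uses (the values are made up).
statm_sample = "12345 678 90 1 0 234 0\n"
status_sample = "Name:\tpython\nVmHWM:\t    4096 kB\nVmRSS:\t    2712 kB\n"

page_size = 4096  # assumption for the sample; use os.sysconf("SC_PAGE_SIZE") on a real system
current_bytes = current_rss_pages(statm_sample) * page_size
peak_bytes = peak_rss_kb(status_sample) * 1024
```

A reader built on `VmHWM` would not need a polling thread, but it would report the process-lifetime peak rather than the peak within one solver call.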

Bonus opportunity: PeakWorkingSetSize is already declared in the ProcessMemoryCounters struct at helpers.py:208 — Windows could read it directly for a true single-shot peak without the sampler thread. Worth noting as a follow-up.

3. `data_cast="single"` is effectively cosmetic for the cpp backend (worth a README line)

cpp_simulation.py always casts inputs to np.float32 before serializing to HDF5 (lines ~167/257/269), regardless of the user's Python-side dtype. So data_cast="single" matches the cpp behavior (no-op), and data_cast="off" (default float64) gets silently downcast at the serialize step. The flag is meaningful for backend="python" (where it actually drives FFT precision), but cosmetic for backend="cpp".

Not a bug — just an asymmetry users should know about.

4. `__post_init__` validation gaps (P3)

Currently validates: data_cast, scale-array equal length, number_time_points > 0, num_averages > 0, start_size > 0, number_sensor_points > 1.

Missing: domain_size > 0, sensor_radius > 0, pml_size >= 0. With domain_size = 0, the build_case line dx = options.domain_size / nx produces dx = 0, which leads to an opaque ZeroDivisionError downstream. The three guards are cheap to add.
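
The three guards could look roughly like this. `GuardedOptions` is a hypothetical stand-in with only the relevant fields, not the PR's `BenchmarkOptions`; the defaults are taken from the example output file.

```python
from dataclasses import dataclass


@dataclass
class GuardedOptions:
    """Hypothetical subset of the benchmark options, showing only the new guards."""

    domain_size: float = 0.022
    sensor_radius: float = 0.01
    pml_size: int = 10

    def __post_init__(self):
        if self.domain_size <= 0:
            raise ValueError("domain_size must be > 0")
        if self.sensor_radius <= 0:
            raise ValueError("sensor_radius must be > 0")
        if self.pml_size < 0:
            raise ValueError("pml_size must be >= 0")


def is_rejected(**kwargs) -> bool:
    try:
        GuardedOptions(**kwargs)
    except ValueError:
        return True
    return False


checks = {
    "zero domain": is_rejected(domain_size=0),
    "negative radius": is_rejected(sensor_radius=-0.01),
    "negative pml": is_rejected(pml_size=-1),
    "zero pml allowed": not is_rejected(pml_size=0),
}
```

Validating at construction time turns a confusing arithmetic failure deep in the run into an immediate, named error.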

5. Minor

Density-slab line [max(ny//4-1, 0):] — please add a # MATLAB Ny/4:end (1-indexed) → Python ny//4-1: (0-indexed); max() handles tiny grids comment so the asymmetry vs the head slice doesn't look like a bug.
bon_a = dtype(6) local is dead code unless options.nonlinear_media is set (default False). Cheap to inline.
smooth() re-cast at helpers.py:107 — smooth() returns float64 even when given float32 (FFT path), so the trailing .astype(dtype, copy=False) is necessary, not redundant. Worth a one-line comment ("smooth returns float64; re-cast to user dtype") to prevent a "clean up the double-cast" future PR that silently changes dtype.

Packaging — already fine

Verified pyproject.toml has [tool.hatch.build.targets.wheel] packages = ["kwave", "kwave.utils", "kwave.reconstruction", "kwave.kWaveSimulation_helper"] — explicit allowlist, so benchmarks/ is not in the wheel. ✓

benchmarks/ does land in the sdist (the sdist excludes only /.github, /docs, /examples, /tests). Stylistic choice — sdists routinely carry dev material — but worth a one-line decision either way.

CI / test integration — confirmed working

pyproject.toml's [tool.pytest.ini_options] testpaths = ["tests"] picks up tests/test_benchmark.py via standard discovery
No changes needed to .github/workflows/pytest.yml
All tests are hermetic (no real subprocess, no real /proc, injected memory_reader= / solver= / timer=), so they run cross-platform without skips
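
The dependency-injection style described above can be illustrated with a tiny timing helper. `time_call` and the fakes are hypothetical and only show the shape of such a test, not the PR's actual `run()` signature.

```python
def time_call(solver, timer) -> float:
    """Return the elapsed time of one solver call, as measured by the injected timer."""
    start = timer()
    solver()
    return timer() - start


# Fake timer: returns scripted timestamps instead of reading a real clock.
ticks = iter([0.0, 1.5])
calls = []

elapsed = time_call(solver=lambda: calls.append("solved"), timer=lambda: next(ticks))
```

With both the solver and the clock replaced, the result is deterministic and the test takes no measurable time.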

Summary


Faithfulness	✓ verified against MATLAB
Test quality	✓ DI, hermetic, 8 cases
Code quality	✓ idiomatic; minor renames + comments
cpp-backend memory	✗ silently wrong — needs fix or doc
Misc	small validation + naming + comments

Recommended pre-merge changes

(P1) README: "--report-mem-usage is only meaningful for backend=\"python\" today" — even better, refuse the combo with a clear error in code. Or implement getrusage(RUSAGE_CHILDREN) if you want it to actually work for cpp.
(P2) Rename peak_memory_bytes → current_memory_bytes. Keep PeakMemorySampler as the layer that finds peak.
(P3) Three small additions: index-translation comment on the density slab line, dtype re-cast comment after smooth(), three __post_init__ validation guards (domain_size, sensor_radius, pml_size).
(README) One line noting data_cast="single" is cosmetic for backend="cpp".

The rest can land as follow-ups.

Big picture: this is a clean port, faithful to the MATLAB original, with appropriate Pythonic enhancements. Ready to merge once the cpp-memory caveat is addressed (doc or code).

…nups

Addresses reviewer feedback (PR #731 thread, comment thread):

## cpp-backend memory measurement (P1)

`benchmark.py` exposes a `--backend cpp` option, but the existing PeakMemorySampler reads the *Python* process RSS. When the cpp backend is active the simulation runs in a separate subprocess (the `kspaceFirstOrder-OMP`/`-CUDA` binary), so Python RSS reflects nothing about its memory footprint and `mem_usage` becomes silently meaningless. (The MATLAB original `benchmark.m` only ever exercised the in-process `kspaceFirstOrder3D` solver, so the equivalent question, measuring the subprocess `kspaceFirstOrder3DC` backend, didn't come up there.)

Add `ChildPeakMemorySampler` that uses `resource.getrusage(RUSAGE_CHILDREN)` before/after the subprocess to capture true peak RSS of all reaped children. Zero new dependencies (stdlib only). Linux returns KB, macOS returns bytes; both are normalized to bytes. Windows is unsupported (`resource` is POSIX-only); `--report-mem-usage` + `backend="cpp"` on Windows now refuses with a clear `ValueError` at startup, before any output file is written.

`benchmark.run()` picks the sampler factory based on backend automatically; tests override via the new `mem_sampler_factory=` kwarg.

## Rename peak_memory_bytes → current_memory_bytes (P2)

The Linux (`/proc/self/statm` field [1]), macOS (`ps -o rss=`), and Windows (`WorkingSetSize`) readers all return *current* RSS, not historical peak. Only `PeakMemorySampler` (which polls in a background thread) finds a peak over time. Rename to reflect what the readers actually do. `peak_memory_bytes` retained as a back-compat alias.

## README clarifications

- `data_cast="single"` is a no-op for `backend="cpp"` (binary always serializes as float32); only meaningful for `backend="python"`.
- `--report-mem-usage` + backend interaction documented per above.

## __post_init__ validation guards (P3)

Add three cheap guards that prevent opaque `ZeroDivisionError`s downstream: `domain_size > 0`, `sensor_radius > 0`, `pml_size >= 0`.

## Code comments

- Density slab `[max(ny//4 - 1, 0):, :]`: comment why the `-1` is asymmetric with the head slice `[:nx//4, :]` (1-indexed inclusive start ↔ 0-indexed start).
- `make_ball(... Vector([nx//2, ny//2, nz//2]) ...)`: comment that `make_ball` treats the supplied center as 1-indexed, so this matches MATLAB's `Nx/2` despite looking off-by-one.
- `source.p0 = smooth(...).astype(dtype)`: comment that smooth() upcasts to float64 via the FFT path; the trailing astype restores user dtype.

## Inline `bon_a`

The local was always computed but only used in the `options.nonlinear_media` branch (default `False`). Inline at the use site.

## Tests

- 9 new test cases (18 total): back-compat alias, three validation guards (parametrized × 5 cases), cpp-backend factory invocation, Windows-cpp clear-error path, ChildPeakMemorySampler Windows refusal.
- All existing tests still pass.

Co-authored-by: Farid Yagubbayli <faridyagubbayli@users.noreply.github.com>
Co-authored-by: Walter Simson <walter.a.simson@gmail.com>

waltsims · 2026-06-21T21:21:13Z

Thanks for the migration @faridyagubbayli! I rolled the review-feedback fixes (cpp-backend memory via getrusage(RUSAGE_CHILDREN), the peak_memory_bytes→current_memory_bytes rename, three __post_init__ guards, a few comments, plus an origin/master merge) into #761 — would have pushed straight here but maintainerCanModify is false on this PR, so a new branch was the path of least friction.

All of your original commits are preserved as-is in #761; the cleanups stack on top as a single co-authored commit. Once #761 merges this can be closed as superseded. 18/18 tests pass locally on Linux. 🙏

The cpp-backend memory measurement is now handled correctly by ChildPeakMemorySampler (Windows refused with a clear error), so the three-paragraph README explanation was redundant. The behavior is self-documenting via the error message; the script is the source of truth, not the README.

Migrate Benchmark

e956c91

greptile-apps Bot reviewed May 14, 2026

Comment thread benchmarks/benchmark.py Outdated

Comment thread benchmarks/benchmark.py Outdated

Comment thread benchmarks/benchmark.py Outdated

Comment thread tests/test_benchmark.py

helpers

968b5f8

greptile-apps Bot reviewed May 14, 2026

Comment thread benchmarks/helpers.py Outdated

faridyagubbayli and others added 2 commits June 4, 2026 14:24

Fix benchmark review issues.

6abd4ed

Validate benchmark inputs and memory reporting so generated results stay complete and JSON-compatible.

Merge branch 'master' into migrate-benchmark

f3109c7

faridyagubbayli requested a review from waltsims June 4, 2026 14:31

waltsims reviewed Jun 21, 2026

waltsims and others added 2 commits June 21, 2026 21:16

Merge remote-tracking branch 'origin/master' into migrate-benchmark

74b3fb2

waltsims mentioned this pull request Jun 21, 2026

Migrate Benchmark (review follow-up to #731 + master merge) #761

Open

waltsims closed this Jun 21, 2026

waltsims deleted the migrate-benchmark branch June 21, 2026 22:53

Migrate Benchmark #731: faridyagubbayli wants to merge 7 commits into master from migrate-benchmark

2 participants
