Migrate Benchmark #731

Closed

faridyagubbayli wants to merge 7 commits into master from migrate-benchmark

faridyagubbayli (Collaborator) commented May 14, 2026

Migrates the MATLAB benchmark.m script to Python, with some additions.

benchmarks/README.md shows sample usage.

Example output JSON file:
{
  "comp_size": [32768, 65536, 131072, 262144, 524288],
  "comp_time": [
    2.813780007263025,
    2.5228265166903534,
    2.5327161628132067,
    4.796307735455533,
    10.499971745846173
  ],
  "options": {
    "data_cast": "off",
    "heterogeneous_media": true,
    "absorbing_media": true,
    "nonlinear_media": false,
    "binary_sensor_mask": true,
    "number_sensor_points": 100,
    "number_time_points": 1000,
    "num_averages": 3,
    "start_size": 32,
    "x_scale_array": [1, 2, 2, 2, 4, 4, 4, 8, 8, 8, 16, 16],
    "y_scale_array": [1, 1, 2, 2, 2, 4, 4, 4, 8, 8, 8, 16],
    "z_scale_array": [1, 1, 1, 2, 2, 2, 4, 4, 4, 8, 8, 8],
    "domain_size": 0.022,
    "sensor_radius": 0.01,
    "pml_size": 10,
    "pml_inside": true,
    "report_mem_usage": false,
    "backend": "python",
    "device": "gpu",
    "computer_name": "k-instance",
    "python_version": "3.11.14",
    "platform": "Linux-5.10.0-39-cloud-amd64-x86_64-with-glibc2.31",
    "kwave_python_version": "0.6.1"
  },
  "output_path": "<filename>",
  "error_reached": false,
  "error_message": ""
}
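The comp_size values in this example are consistent with start_size cubed times the product of each case's x, y and z scale factors; that relationship is inferred from the numbers, not stated in the PR. A small sketch that rebuilds comp_size from the options block and prints a per-grid-point cost (all variable names are illustrative):

```python
import json

# Trimmed copy of the example output (only the fields used here).
result = json.loads("""
{
  "comp_size": [32768, 65536, 131072, 262144, 524288],
  "comp_time": [2.813780007263025, 2.5228265166903534, 2.5327161628132067,
                4.796307735455533, 10.499971745846173],
  "options": {
    "start_size": 32,
    "x_scale_array": [1, 2, 2, 2, 4, 4, 4, 8, 8, 8, 16, 16],
    "y_scale_array": [1, 1, 2, 2, 2, 4, 4, 4, 8, 8, 8, 16],
    "z_scale_array": [1, 1, 1, 2, 2, 2, 4, 4, 4, 8, 8, 8]
  }
}
""")

opts = result["options"]
n = opts["start_size"]
# Assumed: grid points = (start_size * sx) * (start_size * sy) * (start_size * sz).
expected_sizes = [
    (n * sx) * (n * sy) * (n * sz)
    for sx, sy, sz in zip(opts["x_scale_array"], opts["y_scale_array"], opts["z_scale_array"])
]

# Only the first len(comp_size) cases were run in this example.
ran = len(result["comp_size"])
assert expected_sizes[:ran] == result["comp_size"]

# comp_time is the averaged elapsed time per case; normalise it per grid point.
for size, t in zip(result["comp_size"], result["comp_time"]):
    print(f"{size:>8} points: {t:6.2f} s ({t / size * 1e6:.1f} us per grid point)")
```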

Greptile Summary

This PR ports MATLAB's benchmark.m script to Python, adding a benchmarks/ package with a 3D solver scaling benchmark (benchmark.py), a helper library (helpers.py), a README, and a comprehensive test suite. The implementation corrects all issues flagged in previous review rounds — including start_size validation, the NaN-in-JSON Windows path, cumulative-peak ru_maxrss semantics, and the grid-size-collision identity key.

  • benchmark.py runs kspaceFirstOrder across a sequence of increasing 3D grid sizes, averaging elapsed time per case and writing partial results after each run to preserve progress on failure.
  • helpers.py provides BenchmarkOptions, platform-specific RSS readers, and two context-manager samplers (PeakMemorySampler for the Python backend, ChildPeakMemorySampler for the C++ subprocess backend) that track peak memory during each solver call.
  • tests/test_benchmark.py covers timing aggregation, memory sampling (including the cpp/python sampler split), validation errors, partial-result durability, and the allow_nan=False JSON guard.
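A minimal sketch of the threaded-polling idea described for PeakMemorySampler, assuming only a zero-argument reader that returns current RSS in bytes. The class name PollingPeakSampler and its interval parameter are hypothetical; this is not the PR's implementation, just an illustration of why joining the thread before reading its error state is safe.

```python
import threading
import time
from typing import Callable, Optional


class PollingPeakSampler:
    """Poll `reader()` on a background thread and keep the largest value seen.

    Hypothetical sketch: `reader` is any zero-argument callable returning the
    current RSS in bytes (for example a /proc/self/statm reader).
    """

    def __init__(self, reader: Callable[[], int], interval: float = 0.01) -> None:
        self._reader = reader
        self._interval = interval
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self._error: Optional[Exception] = None
        self.peak_bytes = 0

    def _sample(self) -> None:
        self.peak_bytes = max(self.peak_bytes, self._reader())

    def _poll(self) -> None:
        try:
            # Event.wait() returns False on timeout and True once stop is set.
            while not self._stop.wait(self._interval):
                self._sample()
        except Exception as exc:  # reported to the caller in __exit__
            self._error = exc

    def __enter__(self) -> "PollingPeakSampler":
        self._sample()  # baseline reading before the workload starts
        self._thread = threading.Thread(target=self._poll, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._stop.set()
        self._thread.join()  # after join(), everything _poll wrote is visible here
        if self._error is None:
            try:
                self._sample()  # final reading after the workload returns
            except Exception as err:
                self._error = err
        if self._error is not None and exc_type is None:
            raise RuntimeError("memory sampling failed") from self._error
        return False  # never swallow an exception raised by the workload


# Usage: a fake reader whose value spikes while the "solver" runs.
state = {"rss": 100}
with PollingPeakSampler(lambda: state["rss"], interval=0.005) as sampler:
    state["rss"] = 500
    time.sleep(0.2)
    state["rss"] = 200
print(sampler.peak_bytes)
```

Injecting the reader is what keeps such a sampler testable without touching /proc, in the same spirit as the memory_reader= injection mentioned for the test suite.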

Confidence Score: 5/5

This PR is safe to merge. All previously flagged issues have been resolved and no new defects were found.

Every issue from earlier review rounds has been addressed: start_size is validated, Windows memory sampling no longer silently produces invalid JSON (both via validate_memory_bytes and allow_nan=False in json.dumps), the ru_maxrss cumulative-peak problem is replaced by a threaded current-RSS polling approach, and the grid-size collision key is replaced by case_index. The test suite is thorough, covering timing aggregation, both sampler paths (Python vs cpp backend), platform guard failures, and durability on solver error.
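The NaN guard is easy to demonstrate with the standard library: by default json.dumps writes a bare NaN token that strict JSON parsers reject, while allow_nan=False raises ValueError instead. The check_memory_bytes helper below is a hypothetical stand-in for a validate_memory_bytes-style check, not the PR's function.

```python
import json
import math


def check_memory_bytes(value: float) -> float:
    # Hypothetical guard: reject NaN/inf and negative readings before they
    # reach the results dict.
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"invalid memory reading: {value!r}")
    return value


def dumps_strict(result: dict) -> str:
    # allow_nan=False makes json.dumps raise ValueError instead of emitting
    # the non-standard NaN / Infinity tokens.
    return json.dumps(result, indent=2, allow_nan=False)


# The default serializer happily writes a token that strict parsers reject.
lenient = json.dumps({"mem_usage": [math.nan]})
strict_ok = dumps_strict({"mem_usage": [check_memory_bytes(1.5e9)]})

try:
    dumps_strict({"mem_usage": [math.nan]})
    strict_rejected = False
except ValueError:
    strict_rejected = True
```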

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| benchmarks/benchmark.py | Core benchmark runner: iterates grid cases, wraps solver calls with optional memory sampling, accumulates rolling averages, and saves partial results after each inner loop iteration. Logic is clean and all previously flagged issues are resolved. |
| benchmarks/helpers.py | All helper utilities: BenchmarkOptions dataclass with full validation, platform-specific RSS readers, PeakMemorySampler (threaded polling), ChildPeakMemorySampler (RUSAGE_CHILDREN delta), and save_results with allow_nan=False guard. No issues found. |
| tests/test_benchmark.py | Comprehensive test suite with injectable solver, timer, and memory_reader; covers timing aggregation, memory sampling, error durability, platform guards, and JSON validity. All previously uncovered paths are now tested. |
| benchmarks/README.md | New documentation file describing benchmark usage, CLI flags, and output format. Accurate and matches the implemented interface. |
| benchmarks/__init__.py | Trivial module init with a single docstring. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[run called] --> B[grid_sizes: build case list]
    B --> C{report_mem_usage?}
    C -- Yes --> D[Early probe: sampler_factory + memory_reader]
    D --> E{probe OK?}
    E -- No --> F[raise ValueError before writing output]
    E -- Yes --> G[result mem_usage = list]
    C -- No --> H[result without mem_usage]
    G --> I
    H --> I[for each case_index nx,ny,nz,scale]
    I --> J[build_case: kgrid, medium, source, sensor]
    J --> K[for loop_num in 1..num_averages]
    K --> L{report_mem_usage?}
    L -- Yes --> M[with sampler_factory as memory_sampler]
    M --> N[start timer - solver - stop timer]
    N --> O[exit sampler context - final RSS sample]
    O --> P[rolling_average loop_mem_usage]
    L -- No --> Q[start timer - solver - stop timer]
    P --> R[rolling_average loop_time]
    Q --> R
    R --> S[store_case_result by case_index]
    S --> T[save_results to JSON]
    T --> K
    K -- loop done --> I
    I -- all cases done --> U[return result]
    J -- exception --> V[error_reached = True]
    N -- exception --> V
    V --> W[save_results with error info]
    W --> X[break]
    X --> U
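The flowchart's timing loop can be sketched in plain Python as follows. This is a hypothetical reduction, not benchmark.py: run_sketch, rolling_average and the injected solver/save/timer callables are illustrative names, and memory sampling and build_case are left out. It shows the three behaviours the diagram emphasises: an incremental mean over num_averages runs, results keyed by case_index, and a save after every run plus one on error.

```python
import time
from typing import Callable, List, Sequence, Tuple

Shape = Tuple[int, int, int]


def rolling_average(previous: float, sample: float, count: int) -> float:
    """Mean of `count` samples, given the mean of the first `count - 1`."""
    return previous + (sample - previous) / count


def run_sketch(
    cases: Sequence[Shape],
    solver: Callable[[Shape], None],
    save: Callable[[dict], None],
    num_averages: int = 3,
    timer: Callable[[], float] = time.perf_counter,
) -> dict:
    """Hypothetical reduction of the flowchart (no memory sampling, no build_case)."""
    sizes: List[int] = []
    times: List[float] = []
    result = {"comp_size": sizes, "comp_time": times,
              "error_reached": False, "error_message": ""}
    for case_index, (nx, ny, nz) in enumerate(cases):
        loop_time = 0.0
        try:
            for loop_num in range(1, num_averages + 1):
                start = timer()
                solver((nx, ny, nz))
                loop_time = rolling_average(loop_time, timer() - start, loop_num)
                # Key by case_index (not grid size) so equal-sized cases never collide.
                if case_index == len(times):
                    sizes.append(nx * ny * nz)
                    times.append(loop_time)
                else:
                    times[case_index] = loop_time
                save(result)  # partial results survive a failure in a later case
        except Exception as exc:
            result["error_reached"] = True
            result["error_message"] = str(exc)
            save(result)
            break
    return result
```

Passing a fake timer such as `lambda: next(ticks)` makes the averages deterministic, which is the same dependency-injection style the test suite is described as using.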

Reviews (5): Last reviewed commit: "benchmark: trim README — drop verbose da..."

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.68%. Comparing base (66c256d) to head (f3109c7).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #731      +/-   ##
==========================================
+ Coverage   75.57%   75.68%   +0.10%     
==========================================
  Files          57       57              
  Lines        8195     8195              
  Branches     1600     1600              
==========================================
+ Hits         6193     6202       +9     
+ Misses       1381     1370      -11     
- Partials      621      623       +2     
| Flag | Coverage Δ |
|---|---|
| 3.10 | 75.64% <ø> (+0.10%) ⬆️ |
| 3.11 | 75.64% <ø> (+0.10%) ⬆️ |
| 3.12 | 75.64% <ø> (+0.10%) ⬆️ |
| 3.13 | 75.64% <ø> (+0.10%) ⬆️ |
| macos-latest | 75.58% <ø> (+0.10%) ⬆️ |
| ubuntu-latest | 75.58% <ø> (+0.10%) ⬆️ |
| windows-latest | 75.42% <ø> (+0.10%) ⬆️ |

Flags with carried forward coverage won't be shown.


faridyagubbayli and others added 2 commits June 4, 2026 14:24
Validate benchmark inputs and memory reporting so generated results stay complete and JSON-compatible.
faridyagubbayli requested a review from waltsims June 4, 2026 14:31

faridyagubbayli (Collaborator, Author):

@greptile review

greptile-apps Bot (Contributor) commented Jun 4, 2026

waltsims (Owner) left a comment

Took a careful pass against the MATLAB reference (ucl-bug/k-wave, k-Wave/benchmark.m) and the surrounding k-wave-python integration. Nicely done port: faithful where it matters, idiomatic where it diverges. Specifics below.

Faithfulness to benchmark.m

Verified matches (cross-checked against the MATLAB source):

  • Defaults, scale arrays, physical constants (c0=1500, ρ0=1000, α=0.75, α-power=1.5, B/A=6), 100 sensor points, 1000 time steps, 3 averages, PML 10
  • Scale arrays: x = (1,2,2,2,4,4,4,8,8,8,16,16), together with the matching y and z arrays, identical to MATLAB lines 84–86
  • Heterogeneity: sound_speed first nx/4 slab × 1.2; density from ny/4 onward × 1.2
  • Index translation is correct — I traced the arithmetic for ny ∈ {4, 8, 16, 32, 128}:
    • [:nx//4] ≡ MATLAB 1:Nx/4 (head slice — no -1)
    • [max(ny//4-1, 0):] ≡ MATLAB Ny/4:end (tail slice — needs -1 to convert 1-indexed inclusive start)
    • The max(.., 0) guard handles small-ny edge cases
  • kgrid.makeTime(max(c)) then kgrid.setTime(Nt, dt) (helpers.py:116–117) mirrors MATLAB lines 167–168 exactly — both languages do the same "let CFL pick dt, then override Nt" dance. Not redundant.
  • smooth(p0, restore_max=True) matches MATLAB smooth(source.p0, true) semantics (verified smooth() signature at kwave/utils/filters.py:549)
  • make_ball(Vector([nx, ny, nz]), Vector([nx//2, ny//2, nz//2]), 2 * scale) matches MATLAB's call. It looks off-by-one but isn't: make_ball treats the supplied center as 1-indexed internally (see kwave/utils/mapgen.py:547, where it subtracts ceil(grid_size/2)), so a future reader who tries to "fix" the apparent off-by-one would break parity. A one-line code comment on that line would prevent that.
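The head/tail index translation can be checked mechanically. In this stdlib-only sketch, Python lists stand in for the NumPy axes, and matlab_slice is a hypothetical helper that lists the 0-indexed positions covered by MATLAB's 1-indexed, inclusive start:stop.

```python
def matlab_slice(start: int, stop: int) -> list:
    """0-indexed positions covered by MATLAB's 1-indexed, inclusive start:stop."""
    return list(range(start - 1, stop))


for n in (4, 8, 16, 32, 128):
    idx = list(range(n))
    # Head slice: MATLAB 1:N/4 -> Python [:n//4]; the exclusive stop absorbs the shift.
    assert idx[: n // 4] == matlab_slice(1, n // 4)
    # Tail slice: MATLAB N/4:end -> Python [n//4 - 1:]; the start needs the -1.
    assert idx[max(n // 4 - 1, 0):] == matlab_slice(n // 4, n)

# Why max(.., 0) matters for tiny grids (n < 4): without it the start becomes -1,
# and Python's negative indexing would select only the last element.
tiny = list(range(3))
assert tiny[3 // 4 - 1:] == [2]                 # unguarded: just the last element
assert tiny[max(3 // 4 - 1, 0):] == [0, 1, 2]   # guarded: the whole axis
```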

Where the Python port improves on MATLAB

  • Cross-platform memory measurement — MATLAB only ran the memory branch on Windows; the Python port supports Linux (/proc/self/statm), macOS (ps), Windows (Win32 GetProcessMemoryInfo)
  • JSON output instead of .mat — sensible in the Python ecosystem
  • CLI argparse with explicit choices + smoke-friendly --max-cases
  • 8 hermetic tests via DI of solver=, timer=, memory_reader= — clean separation, no real subprocess or /proc reads, tests pass on any platform
  • PeakMemorySampler threading is correct — Thread.join() in __exit__ provides a happens-before barrier for self._error, so the read in __exit__ always sees the writer's final state

Real concerns

1. cpp-backend memory measurement is broken (P1, doc-worthy)

peak_memory_bytes samples the Python process's RSS. When backend="cpp", the actual simulation runs in a separate subprocess (cpp_simulation.py:388 is subprocess.run(command, ...)). The Python RSS reflects ~nothing about the C++ process's memory, so mem_usage becomes meaningless silently for --backend cpp.

The MATLAB original didn't have this problem (in-process). The Python port silently inherits a wrong number.

Fix options, in order of preference:

  1. getrusage(RUSAGE_CHILDREN).ru_maxrss after subprocess.wait(): zero new dependencies, and it reports the peak RSS of the largest reaped child (a high-water mark across all reaped children, not a sum). Available on Linux + macOS via the stdlib resource module. The cleanest solution by far.
  2. Read /proc/<child_pid>/status VmHWM before the child gets reaped — Linux-only but precise.
  3. At minimum, detect backend="cpp" + --report-mem-usage and refuse with a clear "subprocess memory not supported by this measurement path; mem_usage will be Python-process-only" error.

psutil.Process(child_pid).memory_info() would also work but adds a new project dependency for one benchmark feature — not worth it when resource.getrusage is in the stdlib.
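A minimal sketch of option 1, assuming a POSIX host (the resource module is not available on Windows). For RUSAGE_CHILDREN, ru_maxrss is a running maximum over all reaped children rather than a per-call figure, so a reading taken after a child that peaked below an earlier one is unchanged. The helper names are hypothetical, and treating non-macOS platforms as reporting KiB is an assumption beyond the Linux/macOS split described in this thread.

```python
import resource
import subprocess
import sys


def ru_maxrss_to_bytes(value: int, platform: str = sys.platform) -> int:
    # macOS reports ru_maxrss in bytes; Linux reports KiB (assumed for other POSIX too).
    return value if platform == "darwin" else value * 1024


def reaped_children_peak_bytes() -> int:
    """Largest peak RSS among children that have already been waited for."""
    return ru_maxrss_to_bytes(resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss)


def run_and_read_child_peak(cmd: list) -> tuple:
    before = reaped_children_peak_bytes()
    subprocess.run(cmd, check=True)  # run() waits, so the child is reaped on return
    after = reaped_children_peak_bytes()
    # `after` is a high-water mark: if this child peaked below an earlier one,
    # after == before and this child's own peak cannot be recovered this way.
    return before, after


before, after = run_and_read_child_peak(
    [sys.executable, "-c", "buf = bytearray(64 * 1024 * 1024)"]
)
print(f"children peak RSS: {before} -> {after} bytes")
```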

In any case, the README should explicitly say "memory measurement only meaningful for backend=\"python\" today."

2. peak_memory_bytes() is misnamed (P2)

On Linux/macOS, the readers return current RSS, not historical peak:

  • /proc/self/statm field [1] is resident (per man proc); peak lives in /proc/self/status:VmHWM
  • ps -o rss= is current
  • Windows reads WorkingSetSize (current)

Only PeakMemorySampler (which polls) finds a peak over time. The reader should be current_memory_bytes and PeakMemorySampler is what samples-for-peak.
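To make the current-vs-peak distinction concrete, here is a Linux-only sketch of both readings: field [1] of /proc/self/statm is resident pages (current RSS), while the VmHWM line in /proc/self/status is the peak, reported in kB. The function names are hypothetical and only loosely modelled on the helpers discussed here; the parsers take text so they can be tested without /proc.

```python
import os
import sys


def parse_statm_resident_bytes(statm_text: str, page_size: int) -> int:
    """Current RSS from /proc/<pid>/statm: field [1] is resident pages."""
    return int(statm_text.split()[1]) * page_size


def parse_status_vmhwm_bytes(status_text: str) -> int:
    """Peak RSS (high-water mark) from /proc/<pid>/status, reported in kB."""
    for line in status_text.splitlines():
        if line.startswith("VmHWM:"):
            return int(line.split()[1]) * 1024
    raise ValueError("VmHWM not found")


def current_memory_bytes() -> int:
    # Linux-only sketch; the PR also covers macOS (ps) and Windows (Win32 API).
    if not sys.platform.startswith("linux"):
        raise NotImplementedError("sketch only handles Linux")
    with open("/proc/self/statm") as fh:
        return parse_statm_resident_bytes(fh.read(), os.sysconf("SC_PAGE_SIZE"))
```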

Bonus opportunity: PeakWorkingSetSize is already declared in the ProcessMemoryCounters struct at helpers.py:208 — Windows could read it directly for a true single-shot peak without the sampler thread. Worth noting as a follow-up.

3. data_cast="single" is effectively cosmetic for the cpp backend (worth a README line)

cpp_simulation.py always casts inputs to np.float32 before serializing to HDF5 (lines ~167/257/269), regardless of the user's Python-side dtype. So data_cast="single" matches the cpp behavior (no-op), and data_cast="off" (default float64) gets silently downcast at the serialize step. The flag is meaningful for backend="python" (where it actually drives FFT precision), but cosmetic for backend="cpp".

Not a bug — just an asymmetry users should know about.
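The asymmetry comes down to float64 inputs being rounded to float32 when the cpp path serializes them. A stdlib-only illustration, using struct as a stand-in for the NumPy float32 cast the review refers to:

```python
import struct


def roundtrip_float32(value: float) -> float:
    """Round a Python float (IEEE float64) through float32 and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


x64 = 0.1
x32 = roundtrip_float32(x64)
print(x64, x32)
print(abs(x32 - x64))  # about 1.5e-09: precision that data_cast="off" cannot keep on the cpp path
```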

4. __post_init__ validation gaps (P3)

Currently validates: data_cast, scale-array equal length, number_time_points > 0, num_averages > 0, start_size > 0, number_sensor_points > 1.

Missing: domain_size > 0, sensor_radius > 0, pml_size >= 0. domain_size = 0 does not fail at build_case's dx = options.domain_size / nx (it simply yields dx = 0); the failure only surfaces later, as an opaque error downstream. Cheap to add the three guards.
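The three guards fit naturally in __post_init__, so an invalid value fails at construction with the offending field named. A sketch on a hypothetical cut-down dataclass (OptionsSketch, not the real BenchmarkOptions), using the defaults from the example JSON in the PR description:

```python
from dataclasses import dataclass


@dataclass
class OptionsSketch:
    """Hypothetical subset of BenchmarkOptions fields, with the proposed guards."""
    domain_size: float = 0.022
    sensor_radius: float = 0.01
    pml_size: int = 10
    start_size: int = 32

    def __post_init__(self) -> None:
        if self.domain_size <= 0:
            raise ValueError(f"domain_size must be > 0, got {self.domain_size}")
        if self.sensor_radius <= 0:
            raise ValueError(f"sensor_radius must be > 0, got {self.sensor_radius}")
        if self.pml_size < 0:
            raise ValueError(f"pml_size must be >= 0, got {self.pml_size}")
        if self.start_size <= 0:
            raise ValueError(f"start_size must be > 0, got {self.start_size}")

    def grid_spacing(self, nx: int) -> float:
        # Mirrors the build_case expression dx = domain_size / nx.
        return self.domain_size / nx
```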

5. Minor

  • Density-slab line [max(ny//4-1, 0):] — please add a # MATLAB Ny/4:end (1-indexed) → Python ny//4-1: (0-indexed); max() handles tiny grids comment so the asymmetry vs the head slice doesn't look like a bug.
  • bon_a = dtype(6) local is dead code unless options.nonlinear_media is set (default False). Cheap to inline.
  • smooth() re-cast at helpers.py:107: smooth() returns float64 even when given float32 (FFT path), so the trailing .astype(dtype, copy=False) is necessary, not redundant. Worth a one-line comment ("smooth returns float64; re-cast to user dtype") to prevent a "clean up the double-cast" future PR that silently changes dtype.

Packaging — already fine

Verified pyproject.toml has [tool.hatch.build.targets.wheel] packages = ["kwave", "kwave.utils", "kwave.reconstruction", "kwave.kWaveSimulation_helper"] — explicit allowlist, so benchmarks/ is not in the wheel. ✓

benchmarks/ does land in the sdist (the sdist excludes only /.github, /docs, /examples, /tests). Stylistic choice — sdists routinely carry dev material — but worth a one-line decision either way.

CI / test integration — confirmed working

  • pyproject.toml's [tool.pytest.ini_options] testpaths = ["tests"] picks up tests/test_benchmark.py via standard discovery
  • No changes needed to .github/workflows/pytest.yml
  • All tests are hermetic (no real subprocess, no real /proc, injected memory_reader= / solver= / timer=), so they run cross-platform without skips

Summary

| Area | Assessment |
|---|---|
| Faithfulness | ✓ verified against MATLAB |
| Test quality | ✓ DI, hermetic, 8 cases |
| Code quality | ✓ idiomatic; minor renames + comments |
| cpp-backend memory | ✗ silently wrong; needs fix or doc |
| Misc | small validation + naming + comments |

Recommended pre-merge changes

  1. (P1) README: "--report-mem-usage is only meaningful for backend=\"python\" today" — even better, refuse the combo with a clear error in code. Or implement getrusage(RUSAGE_CHILDREN) if you want it to actually work for cpp.
  2. (P2) Rename peak_memory_bytes → current_memory_bytes. Keep PeakMemorySampler as the layer that finds peak.
  3. (P3) Three small additions: index-translation comment on the density slab line, dtype re-cast comment after smooth(), three __post_init__ validation guards (domain_size, sensor_radius, pml_size).
  4. (README) One line noting data_cast="single" is cosmetic for backend="cpp".

The rest can land as follow-ups.

Big picture: this is a clean port, faithful to the MATLAB original, with appropriate Pythonic enhancements. Ready to merge once the cpp-memory caveat is addressed (doc or code).

waltsims and others added 2 commits June 21, 2026 21:16
…nups

Addresses reviewer feedback (PR #731 thread, comment thread):

## cpp-backend memory measurement (P1)

`benchmark.py` exposes a `--backend cpp` option, but the existing
PeakMemorySampler reads the *Python* process RSS. When the cpp backend
is active the simulation runs in a separate subprocess (the
`kspaceFirstOrder-OMP`/`-CUDA` binary), so Python RSS reflects nothing
about its memory footprint and `mem_usage` becomes silently meaningless.

(The MATLAB original `benchmark.m` only ever exercised the in-process
`kspaceFirstOrder3D` solver, so the equivalent question — measuring
the subprocess `kspaceFirstOrder3DC` backend — didn't come up there.)

Add `ChildPeakMemorySampler` that uses `resource.getrusage(RUSAGE_CHILDREN)`
before/after the subprocess to capture true peak RSS of all reaped
children. Zero new dependencies (stdlib only). Linux returns KB,
macOS returns bytes — normalized to bytes. Windows is unsupported
(`resource` is POSIX-only); `--report-mem-usage` + `backend="cpp"` on
Windows now refuses with a clear `ValueError` at startup, before any
output file is written.

`benchmark.run()` picks the sampler factory based on backend
automatically; tests override via the new `mem_sampler_factory=` kwarg.
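The backend-dependent choice could look roughly like the sketch below. The function name and signature are hypothetical (the commit only states that run() picks the factory by backend and that tests override it via mem_sampler_factory=); the factories are passed in so the selection logic stays testable, and the Windows refusal mirrors the behaviour described above.

```python
import sys
from typing import Callable, ContextManager


def select_sampler_factory(
    backend: str,
    python_factory: Callable[[], ContextManager],
    child_factory: Callable[[], ContextManager],
    platform: str = sys.platform,
) -> Callable[[], ContextManager]:
    """Pick the memory sampler for a backend; refuse unsupported combos up front."""
    if backend == "python":
        return python_factory  # solver runs in-process: poll this process's RSS
    if backend == "cpp":
        if platform == "win32":
            # resource.getrusage is POSIX-only, so there is no child-RSS source here.
            raise ValueError("--report-mem-usage with backend='cpp' is not supported on Windows")
        return child_factory  # solver runs as a subprocess: read reaped-child RSS
    raise ValueError(f"unknown backend: {backend!r}")
```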

## Rename peak_memory_bytes → current_memory_bytes (P2)

The Linux (`/proc/self/statm` field [1]), macOS (`ps -o rss=`), and
Windows (`WorkingSetSize`) readers all return *current* RSS, not
historical peak. Only `PeakMemorySampler` (which polls in a background
thread) finds a peak over time. Rename to reflect what the readers
actually do. `peak_memory_bytes` retained as a back-compat alias.

## README clarifications

- `data_cast="single"` is a no-op for `backend="cpp"` (binary always
  serializes as float32); only meaningful for `backend="python"`.
- `--report-mem-usage` + backend interaction documented per above.

## __post_init__ validation guards (P3)

Add three cheap guards that prevent opaque `ZeroDivisionError`s
downstream: `domain_size > 0`, `sensor_radius > 0`, `pml_size >= 0`.

## Code comments

- Density slab `[max(ny//4 - 1, 0):, :]`: comment why the `-1` is
  asymmetric with the head slice `[:nx//4, :]` (1-indexed inclusive
  start ↔ 0-indexed start).
- `make_ball(... Vector([nx//2, ny//2, nz//2]) ...)`: comment that
  `make_ball` treats the supplied center as 1-indexed, so this matches
  MATLAB's `Nx/2` despite looking off-by-one.
- `source.p0 = smooth(...).astype(dtype)`: comment that smooth() upcasts
  to float64 via the FFT path; the trailing astype restores user dtype.

## Inline `bon_a`

The local was always computed but only used in the
`options.nonlinear_media` branch (default `False`). Inline at the use
site.

## Tests

- 9 new test cases (18 total): back-compat alias, three validation
  guards (parametrized × 5 cases), cpp-backend factory invocation,
  Windows-cpp clear-error path, ChildPeakMemorySampler Windows refusal.
- All existing tests still pass.

Co-authored-by: Farid Yagubbayli <faridyagubbayli@users.noreply.github.com>
Co-authored-by: Walter Simson <walter.a.simson@gmail.com>
waltsims (Owner)

Thanks for the migration @faridyagubbayli! I rolled the review-feedback fixes (cpp-backend memory via getrusage(RUSAGE_CHILDREN), the peak_memory_bytes → current_memory_bytes rename, three __post_init__ guards, a few comments, plus an origin/master merge) into #761; I would have pushed straight here, but maintainerCanModify is false on this PR, so a new branch was the path of least friction.

All of your original commits are preserved as-is in #761; the cleanups stack on top as a single co-authored commit. Once #761 merges this can be closed as superseded. 18/18 tests pass locally on Linux. 🙏

The cpp-backend memory measurement is now handled correctly by
ChildPeakMemorySampler (Windows refused with a clear error), so the
three-paragraph README explanation was redundant. The behavior is
self-documenting via the error message; the script is the source of
truth, not the README.
waltsims closed this Jun 21, 2026
waltsims deleted the migrate-benchmark branch June 21, 2026 22:53