Migrate Benchmark#731
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #731 +/- ##
==========================================
+ Coverage 75.57% 75.68% +0.10%
==========================================
Files 57 57
Lines 8195 8195
Branches 1600 1600
==========================================
+ Hits 6193 6202 +9
+ Misses 1381 1370 -11
- Partials 621 623 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Validate benchmark inputs and memory reporting so generated results stay complete and JSON-compatible.
|
@greptile review |
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
waltsims
left a comment
There was a problem hiding this comment.
Took a careful pass against the MATLAB reference (ucl-bug/k-wave → k-Wave/benchmark.m) and the surrounding k-wave-python integration. Nicely done port — faithful where it matters, idiomatic where it diverges. Specifics below.
Faithfulness to benchmark.m
Verified matches (cross-checked against the MATLAB source):
- Defaults, scale arrays, physical constants (c0=1500, ρ0=1000, α=0.75, α-power=1.5, B/A=6), 100 sensor points, 1000 time steps, 3 averages, PML 10
- Scale arrays
(1,2,2,2,4,4,4,8,8,8,16,16)× y × z — identical to MATLAB lines 84–86 - Heterogeneity: sound_speed first
nx/4slab × 1.2; density fromny/4onward × 1.2 - Index translation is correct — I traced the arithmetic for ny ∈ {4, 8, 16, 32, 128}:
[:nx//4]≡ MATLAB1:Nx/4(head slice — no-1)[max(ny//4-1, 0):]≡ MATLABNy/4:end(tail slice — needs-1to convert 1-indexed inclusive start)- The
max(.., 0)guard handles small-ny edge cases
kgrid.makeTime(max(c))thenkgrid.setTime(Nt, dt)(helpers.py:116–117) mirrors MATLAB lines 167–168 exactly — both languages do the same "let CFL pick dt, then override Nt" dance. Not redundant.smooth(p0, restore_max=True)matches MATLABsmooth(source.p0, true)semantics (verifiedsmooth()signature atkwave/utils/filters.py:549)make_ball(Vector([nx, ny, nz]), Vector([nx//2, ny//2, nz//2]), 2 * scale)matches MATLAB's call. Looks off-by-one but isn't —make_balltreats the supplied center as 1-indexed internally (seekwave/utils/mapgen.py:547where it subtractsceil(grid_size/2)), so a future reader who tries to "fix" the apparent off-by-one would break parity. One-line code comment on that line would prevent that.
Where the Python port improves on MATLAB
- Cross-platform memory measurement — MATLAB only ran the memory branch on Windows; the Python port supports Linux (
/proc/self/statm), macOS (ps), Windows (Win32GetProcessMemoryInfo) - JSON output instead of
.mat— sensible in the Python ecosystem - CLI argparse with explicit choices + smoke-friendly
--max-cases - 8 hermetic tests via DI of
solver=,timer=,memory_reader=— clean separation, no real subprocess or/procreads, tests pass on any platform PeakMemorySamplerthreading is correct —Thread.join()in__exit__provides a happens-before barrier forself._error, so the read in__exit__always sees the writer's final state
Real concerns
1. cpp-backend memory measurement is broken (P1, doc-worthy)
peak_memory_bytes samples the Python process's RSS. When backend="cpp", the actual simulation runs in a separate subprocess (cpp_simulation.py:388 is subprocess.run(command, ...)). The Python RSS reflects ~nothing about the C++ process's memory, so mem_usage becomes meaningless silently for --backend cpp.
The MATLAB original didn't have this problem (in-process). The Python port silently inherits a wrong number.
Fix options, in order of preference:
getrusage(RUSAGE_CHILDREN).ru_maxrssaftersubprocess.wait()— zero new dependencies, gives true peak RSS of all reaped children. Available on Linux + macOS via stdlibresourcemodule. The cleanest solution by far.- Read
/proc/<child_pid>/statusVmHWMbefore the child gets reaped — Linux-only but precise. - At minimum, detect
backend="cpp"+--report-mem-usageand refuse with a clear "subprocess memory not supported by this measurement path; mem_usage will be Python-process-only" error.
psutil.Process(child_pid).memory_info() would also work but adds a new project dependency for one benchmark feature — not worth it when resource.getrusage is in the stdlib.
In any case, the README should explicitly say "memory measurement only meaningful for backend=\"python\" today."
2. peak_memory_bytes() is misnamed (P2)
On Linux/macOS, the readers return current RSS, not historical peak:
/proc/self/statmfield[1]isresident(perman proc); peak lives in/proc/self/status:VmHWMps -o rss=is current- Windows reads
WorkingSetSize(current)
Only PeakMemorySampler (which polls) finds a peak over time. The reader should be current_memory_bytes and PeakMemorySampler is what samples-for-peak.
Bonus opportunity: PeakWorkingSetSize is already declared in the ProcessMemoryCounters struct at helpers.py:208 — Windows could read it directly for a true single-shot peak without the sampler thread. Worth noting as a follow-up.
3. data_cast="single" is effectively cosmetic for the cpp backend (worth a README line)
cpp_simulation.py always casts inputs to np.float32 before serializing to HDF5 (lines ~167/257/269), regardless of the user's Python-side dtype. So data_cast="single" matches the cpp behavior (no-op), and data_cast="off" (default float64) gets silently downcast at the serialize step. The flag is meaningful for backend="python" (where it actually drives FFT precision), but cosmetic for backend="cpp".
Not a bug — just an asymmetry users should know about.
4. __post_init__ validation gaps (P3)
Currently validates: data_cast, scale-array equal length, number_time_points > 0, num_averages > 0, start_size > 0, number_sensor_points > 1.
Missing: domain_size > 0, sensor_radius > 0, pml_size >= 0. domain_size = 0 would ZeroDivisionError at build_case line dx = options.domain_size / nx — opaque. Cheap to add the three guards.
5. Minor
- Density-slab line
[max(ny//4-1, 0):]— please add a# MATLAB Ny/4:end (1-indexed) → Python ny//4-1: (0-indexed); max() handles tiny gridscomment so the asymmetry vs the head slice doesn't look like a bug. bon_a = dtype(6)local is dead code unlessoptions.nonlinear_mediais set (defaultFalse). Cheap to inline.smooth()re-cast athelpers.py:107—smooth()returns float64 even when given float32 (FFT path), so the trailing.astype(dtype, copy=False)is necessary, not redundant. Worth a one-line comment ("smooth returns float64; re-cast to user dtype") to prevent a "clean up the double-cast" future PR that silently changes dtype.
Packaging — already fine
Verified pyproject.toml has [tool.hatch.build.targets.wheel] packages = ["kwave", "kwave.utils", "kwave.reconstruction", "kwave.kWaveSimulation_helper"] — explicit allowlist, so benchmarks/ is not in the wheel. ✓
benchmarks/ does land in the sdist (the sdist excludes only /.github, /docs, /examples, /tests). Stylistic choice — sdists routinely carry dev material — but worth a one-line decision either way.
CI / test integration — confirmed working
pyproject.toml's[tool.pytest.ini_options] testpaths = ["tests"]picks uptests/test_benchmark.pyvia standard discovery- No changes needed to
.github/workflows/pytest.yml - All tests are hermetic (no real subprocess, no real
/proc, injectedmemory_reader=/solver=/timer=), so they run cross-platform without skips
Summary
| Faithfulness | ✓ verified against MATLAB |
| Test quality | ✓ DI, hermetic, 8 cases |
| Code quality | ✓ idiomatic; minor renames + comments |
| cpp-backend memory | ✗ silently wrong — needs fix or doc |
| Misc | small validation + naming + comments |
Recommended pre-merge changes
- (P1) README: "
--report-mem-usageis only meaningful forbackend=\"python\"today" — even better, refuse the combo with a clear error in code. Or implementgetrusage(RUSAGE_CHILDREN)if you want it to actually work for cpp. - (P2) Rename
peak_memory_bytes→current_memory_bytes. KeepPeakMemorySampleras the layer that finds peak. - (P3) Three small additions: index-translation comment on the density slab line, dtype re-cast comment after
smooth(), three__post_init__validation guards (domain_size,sensor_radius,pml_size). - (README) One line noting
data_cast="single"is cosmetic forbackend="cpp".
The rest can land as follow-ups.
Big picture: this is a clean port, faithful to the MATLAB original, with appropriate Pythonic enhancements. Ready to merge once the cpp-memory caveat is addressed (doc or code).
…nups Addresses reviewer feedback (PR #731 thread, comment thread): ## cpp-backend memory measurement (P1) `benchmark.py` exposes a `--backend cpp` option, but the existing PeakMemorySampler reads the *Python* process RSS. When the cpp backend is active the simulation runs in a separate subprocess (the `kspaceFirstOrder-OMP`/`-CUDA` binary), so Python RSS reflects nothing about its memory footprint and `mem_usage` becomes silently meaningless. (The MATLAB original `benchmark.m` only ever exercised the in-process `kspaceFirstOrder3D` solver, so the equivalent question — measuring the subprocess `kspaceFirstOrder3DC` backend — didn't come up there.) Add `ChildPeakMemorySampler` that uses `resource.getrusage(RUSAGE_CHILDREN)` before/after the subprocess to capture true peak RSS of all reaped children. Zero new dependencies (stdlib only). Linux returns KB, macOS returns bytes — normalized to bytes. Windows is unsupported (`resource` is POSIX-only); `--report-mem-usage` + `backend="cpp"` on Windows now refuses with a clear `ValueError` at startup, before any output file is written. `benchmark.run()` picks the sampler factory based on backend automatically; tests override via the new `mem_sampler_factory=` kwarg. ## Rename peak_memory_bytes → current_memory_bytes (P2) The Linux (`/proc/self/statm` field [1]), macOS (`ps -o rss=`), and Windows (`WorkingSetSize`) readers all return *current* RSS, not historical peak. Only `PeakMemorySampler` (which polls in a background thread) finds a peak over time. Rename to reflect what the readers actually do. `peak_memory_bytes` retained as a back-compat alias. ## README clarifications - `data_cast="single"` is a no-op for `backend="cpp"` (binary always serializes as float32); only meaningful for `backend="python"`. - `--report-mem-usage` + backend interaction documented per above. ## __post_init__ validation guards (P3) Add three cheap guards that prevent opaque `ZeroDivisionError`s downstream: `domain_size > 0`, `sensor_radius > 0`, `pml_size >= 0`. ## Code comments - Density slab `[max(ny//4 - 1, 0):, :]`: comment why the `-1` is asymmetric with the head slice `[:nx//4, :]` (1-indexed inclusive start ↔ 0-indexed start). - `make_ball(... Vector([nx//2, ny//2, nz//2]) ...)`: comment that `make_ball` treats the supplied center as 1-indexed, so this matches MATLAB's `Nx/2` despite looking off-by-one. - `source.p0 = smooth(...).astype(dtype)`: comment that smooth() upcasts to float64 via the FFT path; the trailing astype restores user dtype. ## Inline `bon_a` The local was always computed but only used in the `options.nonlinear_media` branch (default `False`). Inline at the use site. ## Tests - 9 new test cases (18 total): back-compat alias, three validation guards (parametrized × 5 cases), cpp-backend factory invocation, Windows-cpp clear-error path, ChildPeakMemorySampler Windows refusal. - All existing tests still pass. Co-authored-by: Farid Yagubbayli <faridyagubbayli@users.noreply.github.com> Co-authored-by: Walter Simson <walter.a.simson@gmail.com>
|
Thanks for the migration @faridyagubbayli! I rolled the review-feedback fixes (cpp-backend memory via All of your original commits are preserved as-is in #761; the cleanups stack on top as a single co-authored commit. Once #761 merges this can be closed as superseded. 18/18 tests pass locally on Linux. 🙏 |
The cpp-backend memory measurement is now handled correctly by ChildPeakMemorySampler (Windows refused with a clear error), so the three-paragraph README explanation was redundant. The behavior is self-documenting via the error message; the script is the source of truth, not the README.
Migrates
benchmark.mscript with some additions.benchmarks/README.mdshows sample usage.Example output json file
Greptile Summary
This PR ports MATLAB's
benchmark.mscript to Python, adding abenchmarks/package with a 3D solver scaling benchmark (benchmark.py), a helper library (helpers.py), a README, and a comprehensive test suite. The implementation corrects all issues flagged in previous review rounds — includingstart_sizevalidation, theNaN-in-JSON Windows path, cumulative-peakru_maxrsssemantics, and the grid-size-collision identity key.kspaceFirstOrderacross a sequence of increasing 3D grid sizes, averaging elapsed time per case and writing partial results after each run to preserve progress on failure.BenchmarkOptions, platform-specific RSS readers, and two context-manager samplers (PeakMemorySamplerfor the Python backend,ChildPeakMemorySamplerfor the C++ subprocess backend) that track peak memory during each solver call.cpp/pythonsampler split), validation errors, partial-result durability, and theallow_nan=FalseJSON guard.Confidence Score: 5/5
This PR is safe to merge. All previously flagged issues have been resolved and no new defects were found.
Every issue from earlier review rounds has been addressed: start_size is validated, Windows memory sampling no longer silently produces invalid JSON (both via validate_memory_bytes and allow_nan=False in json.dumps), the ru_maxrss cumulative-peak problem is replaced by a threaded current-RSS polling approach, and the grid-size collision key is replaced by case_index. The test suite is thorough, covering timing aggregation, both sampler paths (Python vs cpp backend), platform guard failures, and durability on solver error.
No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[run called] --> B[grid_sizes: build case list] B --> C{report_mem_usage?} C -- Yes --> D[Early probe: sampler_factory + memory_reader] D --> E{probe OK?} E -- No --> F[raise ValueError before writing output] E -- Yes --> G[result mem_usage = list] C -- No --> H[result without mem_usage] G --> I H --> I[for each case_index nx,ny,nz,scale] I --> J[build_case: kgrid, medium, source, sensor] J --> K[for loop_num in 1..num_averages] K --> L{report_mem_usage?} L -- Yes --> M[with sampler_factory as memory_sampler] M --> N[start timer - solver - stop timer] N --> O[exit sampler context - final RSS sample] O --> P[rolling_average loop_mem_usage] L -- No --> Q[start timer - solver - stop timer] P --> R[rolling_average loop_time] Q --> R R --> S[store_case_result by case_index] S --> T[save_results to JSON] T --> K K -- loop done --> I I -- all cases done --> U[return result] J -- exception --> V[error_reached = True] N -- exception --> V V --> W[save_results with error info] W --> X[break] X --> U%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% flowchart TD A[run called] --> B[grid_sizes: build case list] B --> C{report_mem_usage?} C -- Yes --> D[Early probe: sampler_factory + memory_reader] D --> E{probe OK?} E -- No --> F[raise ValueError before writing output] E -- Yes --> G[result mem_usage = list] C -- No --> H[result without mem_usage] G --> I H --> I[for each case_index nx,ny,nz,scale] I --> J[build_case: kgrid, medium, source, sensor] J --> K[for loop_num in 1..num_averages] K --> L{report_mem_usage?} L -- Yes --> M[with sampler_factory as memory_sampler] M --> N[start timer - solver - stop timer] N --> O[exit sampler context - final RSS sample] O --> P[rolling_average loop_mem_usage] L -- No --> Q[start timer - solver - stop timer] P --> R[rolling_average loop_time] Q --> R R --> S[store_case_result by case_index] S --> T[save_results to JSON] T --> K K -- loop done --> I I -- all cases done --> U[return result] J -- exception --> V[error_reached = True] N -- exception --> V V --> W[save_results with error info] W --> X[break] X --> UReviews (5): Last reviewed commit: "benchmark: trim README — drop verbose da..." | Re-trigger Greptile