v0.6.3rc1: hardware validation across macOS / Windows / Linux + GPUs

Tracking pre-release validation of [v0.6.3rc1](https://github.com/waltsims/k-wave-python/releases/tag/v0.6.3rc1) before promoting it to stable v0.6.3. The binary install path was substantially rewritten: this is the first release on the unified binary pipeline (kspacefirstorder-unified v1.4.2). The goal is to confirm that the URL flip, rename-on-download, the 16-arch CUDA binary, and the Windows multi-arch fix all behave correctly on real hardware.

## Install

```bash
pip install --pre k-wave-python
# or pinned:
pip install k-wave-python==0.6.3rc1
```

## Smoke test recipe (any platform)

```bash
# 1. Confirm install + binary download
python -c "import kwave; print(kwave.__version__, kwave.BINARY_VERSION)"
# expected: 0.6.3rc1 v1.4.2

# 2. Confirm binaries landed at the expected paths (rename-on-download verified)
python -c "import kwave, os; print(sorted(p for p in os.listdir(kwave.BINARY_PATH) if not p.endswith('.json')))"
# expected on linux:   ['kspaceFirstOrder-CUDA', 'kspaceFirstOrder-OMP']  (no -linux suffix)
# expected on darwin:  ['kspaceFirstOrder-OMP']  (no -darwin suffix; URL_DICT['darwin']['cuda'] is [], so no CUDA binary)
# expected on windows: 'kspaceFirstOrder-CUDA.exe', 'kspaceFirstOrder-OMP.exe', and the 19 shared DLLs

# 3. Run an OMP example end-to-end
uv run examples/ivp_homogeneous_medium.py   # or whatever your env runner is

# 4. If GPU available: run the same example with backend='cpp', device='gpu'
# (or use any example that exercises the CUDA path)
```
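
Step 2 can also be checked programmatically. The following is a minimal sketch, not part of k-wave-python: `find_unrenamed_binaries` is a hypothetical helper that flags any file still carrying one of the platform suffixes seen in the asset names in this issue (`-linux`, `-darwin`, `-windows`).

```python
import os

# Platform suffixes that rename-on-download is expected to strip.
# Assumption: these three cover all asset names; they are taken from this issue.
PLATFORM_SUFFIXES = ("-linux", "-darwin", "-windows")


def find_unrenamed_binaries(binary_dir):
    """Return sorted file names in binary_dir that still end in a platform suffix."""
    leftovers = []
    for name in os.listdir(binary_dir):
        # Drop a trailing extension such as ".exe" before checking the suffix.
        stem, _ext = os.path.splitext(name)
        if stem.endswith(PLATFORM_SUFFIXES):
            leftovers.append(name)
    return sorted(leftovers)
```

On a correct install, `find_unrenamed_binaries(kwave.BINARY_PATH)` should return an empty list.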

## Validation matrix

### macOS
- [x] **Apple Silicon (arm64) + OMP** — primary supported config; downloads `kspaceFirstOrder-OMP-darwin` and runs cleanly. Verified 2026-06-21 on M1 (macOS 15.1): `0.6.3rc1` installs from `--pre`, binary lands at the expected path, python and cpp backends both run and agree **bit-for-bit** (corr=1.0, 0 diff) on an on-grid sensor.
- [ ] **Apple Silicon (arm64) — Homebrew runtime deps present** — the darwin OMP binary is *not* self-contained: `otool -L` shows it links `fftw`, `hdf5`, `zlib`, `libomp` at hardcoded `/opt/homebrew/opt/...` paths, so `backend="cpp"` fails at launch with `dyld: Library not loaded …` unless they're installed. Confirm a clean machine follows the documented `brew install fftw hdf5 zlib libomp` (docs index / `docs/get_started/new_api.rst`) and that the cpp backend then runs. (Consider bundling via `@rpath`/delocate so this isn't a manual prerequisite.)
- [ ] **Intel Mac (x86_64) + OMP** — should emit the `_darwin_unsupported` `RuntimeWarning` at import time; no CUDA path (`URL_DICT['darwin']['cuda']` is `[]`)
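
For the Homebrew row, a quick pre-flight check can tell testers which runtime dependencies are absent before they hit the `dyld` error. This is a sketch under one assumption: that each formula is reachable as a directory (or symlink) under `/opt/homebrew/opt/<formula>`, which is Homebrew's usual layout on Apple Silicon. `missing_homebrew_formulae` is a hypothetical helper, not a k-wave-python API.

```python
import os

# Formulae from the documented `brew install fftw hdf5 zlib libomp` command.
REQUIRED_FORMULAE = ("fftw", "hdf5", "zlib", "libomp")


def missing_homebrew_formulae(opt_dir="/opt/homebrew/opt"):
    """Return the required formulae that have no entry under opt_dir."""
    # os.path.isdir follows symlinks, which is how Homebrew exposes opt/<formula>.
    return [
        formula
        for formula in REQUIRED_FORMULAE
        if not os.path.isdir(os.path.join(opt_dir, formula))
    ]
```

An empty result only shows that the formulae are installed; `otool -L` on the binary remains the authoritative check of what it links against.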

### Linux
- [x] **Linux + OMP (no GPU)** — `kspaceFirstOrder-OMP` downloads, example runs on CPU
- [ ] **Linux + CUDA on Turing** (RTX 20xx / T4) — exercises sm_75 SASS section
- [ ] **Linux + CUDA on Ampere** (A100 / RTX 30xx) — exercises sm_80 or sm_86
- [ ] **Linux + CUDA on Ada** (RTX 40xx / L40) — exercises sm_89
- [x] **Linux + CUDA on Hopper** (H100 / H200) — exercises sm_90 / sm_90a
- [x] **Linux + CUDA on Blackwell consumer** (RTX 50xx / RTX PRO 6000) — exercises sm_120 / sm_120a (the new arch)
- [ ] **Linux + CUDA on Blackwell datacenter** (B200 / GB200) — exercises sm_100 / sm_100a (the new arch)
- [ ] **Linux + CUDA on Volta** (V100) — should emit the runtime cc<7.5 warning, binary load expected to fail with `no kernel image is available for execution on the device` (the warning's role is to tell users why)
- [ ] **Linux + CUDA on Pascal** (GTX 10xx / P100) — same: runtime warning + expected binary load failure
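
To work out which row a given card belongs to, the compute capability can be mapped onto the families listed above. The sketch below is illustrative only: `parse_compute_caps` and `expected_outcome` are hypothetical helpers, and the parser assumes the two-column output of `nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader`, which needs a reasonably recent driver.

```python
# Compute capabilities and families as listed in the matrix above.
MATRIX_FAMILIES = {
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 9): "Ada",
    (9, 0): "Hopper",
    (10, 0): "Blackwell datacenter",
    (12, 0): "Blackwell consumer",
}


def parse_compute_caps(smi_output):
    """Parse 'name, major.minor' lines into (name, major, minor) tuples."""
    rows = []
    for line in smi_output.strip().splitlines():
        # Split on the last comma so GPU names containing commas stay intact.
        name, cap = line.rsplit(",", 1)
        major, minor = cap.strip().split(".")
        rows.append((name.strip(), int(major), int(minor)))
    return rows


def expected_outcome(major, minor):
    """Describe what this issue expects for a given compute capability."""
    if (major, minor) < (7, 5):
        return "warning + load failure expected"
    family = MATRIX_FAMILIES.get((major, minor))
    if family is None:
        return "not covered by this matrix"
    return f"should run ({family})"
```

For example, a V100 reports 7.0 and falls into the warning path, while an H100 reports 9.0 and should run.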

### Windows
- [ ] **Windows + OMP (no CUDA)** — `kspaceFirstOrder-OMP-windows.exe` + 19 shared DLLs download, example runs
- [ ] **Windows + CUDA on Turing** (RTX 20xx) — `kspaceFirstOrder-CUDA-windows.exe` (14.8 MB) + cudart64_13.dll + cufft64_12.dll download; GPU example runs. Proves the v1.4.1 Windows regression (sm_75-only 3.4 MB binary) is fixed.
- [ ] **Windows + CUDA on Ampere / Ada / Hopper / Blackwell** — any one card sufficient; same flow

## What success looks like

For each row above:
- Install completes (binaries download to the expected paths)
- `python -c "import kwave"` runs without unexpected warnings. The expected ones are the runtime cc<7.5 warning on Maxwell/Pascal/Volta and the `_darwin_unsupported` `RuntimeWarning` on Intel Macs.
- An example (any IVP / OMP / CUDA example as appropriate) runs to completion and produces sensible output
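
The "no unexpected warnings" criterion can be checked with the standard `warnings` module. This is a rough sketch: `collect_import_warnings` and `unexpected_warnings` are hypothetical helpers, and the substrings used to recognise an expected warning are up to the tester, since the exact warning text is not quoted in this issue.

```python
import importlib
import warnings


def collect_import_warnings(module_name):
    """Import module_name and return the warnings raised while importing it."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let default filters hide anything
        importlib.import_module(module_name)
    return caught


def unexpected_warnings(caught, expected_substrings=()):
    """Return messages of caught warnings that match none of the expected substrings."""
    messages = [str(w.message) for w in caught]
    return [m for m in messages if not any(s in m for s in expected_substrings)]
```

Run it in a fresh interpreter, because a module that is already imported does not emit its import-time warnings again.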

If any row fails, **comment on this issue with**:
- Platform + GPU model + compute capability
- Output of `pip show k-wave-python` and `python -c "import kwave; print(kwave.__version__, kwave.BINARY_VERSION, kwave.BINARY_PATH)"`
- The actual error / unexpected behavior
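
To make failure reports uniform, the version and path details can be gathered in one go. The sketch below is a hypothetical helper, not part of k-wave-python; it only assumes the `__version__`, `BINARY_VERSION` and `BINARY_PATH` attributes used in the commands above and records an import failure instead of raising.

```python
import importlib
import platform


def collect_report(module_name="kwave"):
    """Gather platform details plus the module's version and binary attributes."""
    report = {
        "platform": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # keep the failure in the report rather than crashing
        report["import_error"] = repr(exc)
        return report
    for attr in ("__version__", "BINARY_VERSION", "BINARY_PATH"):
        report[attr] = str(getattr(module, attr, "<missing>"))
    return report
```

Printing `collect_report()` covers the Python side of the report; the GPU model, compute capability, and `pip show k-wave-python` output still need to be added by hand.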

## What this gates

Promoting to stable **v0.6.3**. Once the matrix is reasonably covered (at minimum: one Linux + CUDA per supported arch family, one Windows + CUDA, one macOS + OMP, one Linux + OMP), I'll prep the one-line promotion PR.

## Related

- Pre-release: https://github.com/waltsims/k-wave-python/releases/tag/v0.6.3rc1
- Unified release: https://github.com/waltsims/kspacefirstorder-unified/releases/tag/v1.4.2
- Consolidation tracking: waltsims/kspacefirstorder-unified#13
- Closes #738 once stable v0.6.3 ships


