[NV] Add MiniMax-M3 FP8 B300 Dynamo vLLM recipes #1788
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you. PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs will usually fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
c8b26a9 to
28cbfa6
Compare
|
Claude finished @Oseltamivir's task in 2m 39s
Review of PR #1788
Summary
Clean PR. Six MiniMax-M3 FP8 B300 Dynamo vLLM disaggregated recipes imported from upstream, with matching master config entries, launcher routing, and perf-changelog. Validation passed.
One warning found: all 6 decode configs set max-cudagraph-capture-size: 8196, which is likely a typo for 8192. No blocking issues found. |
| stream-interval: 32 | ||
| max-num-seqs: 4096 | ||
| max-num-batched-tokens: 16384 | ||
| max-cudagraph-capture-size: 8196 |
🟡 WARNING: max-cudagraph-capture-size: 8196 is likely a typo for 8192 (2^13)
Why it matters: 8196 is not a power of 2. Every other recipe in this repo uses power-of-2 values (e.g. 2048). vLLM uses this as a batch-size threshold for CUDA graph capture — a non-power-of-2 value is unusual and likely unintentional. This same value appears across all 6 decode configs in this PR.
Fix: Since these are imported verbatim from NVIDIA/srt-slurm#223, you may want to confirm with upstream whether 8196 is intentional or should be 8192. If it's an upstream typo, it's worth fixing here too.
Suggested change:
- max-cudagraph-capture-size: 8196
+ max-cudagraph-capture-size: 8192
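A check along these lines could catch the same pattern mechanically. This is only a minimal sketch, not something in the repo: the helper names `is_power_of_two` and `find_suspect_sizes` are hypothetical, the sample text just mirrors the four config lines quoted above, and it assumes the key appears as a plain `key: integer` line. It flags values that are not powers of two. That is a heuristic for spotting likely typos, not a rule that vLLM is known to enforce.

```python
import re

# Hypothetical helper, not part of the repo: flags capture sizes that are
# not powers of two so a reviewer can confirm them with upstream.
KEY = "max-cudagraph-capture-size"
PATTERN = re.compile(rf"^\s*{re.escape(KEY)}:\s*(\d+)\s*$", re.MULTILINE)


def is_power_of_two(n: int) -> bool:
    """True when n is a positive integer with exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def find_suspect_sizes(yaml_text: str) -> list[int]:
    """Return every capture size in the text that is not a power of two."""
    return [int(v) for v in PATTERN.findall(yaml_text) if not is_power_of_two(int(v))]


# Sample text copied from the config lines under review.
sample = """\
stream-interval: 32
max-num-seqs: 4096
max-num-batched-tokens: 16384
max-cudagraph-capture-size: 8196
"""

print(find_suspect_sizes(sample))  # [8196]
```

Running it over each decode recipe's text would list any value worth confirming with upstream. It deliberately reports rather than rewrites, since 8196 may turn out to be intentional.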
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27574720651 |
|
I'm doing the same on #1787 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27638068927 |
Summary
5caabe364e1ef531fab9926c75e32ae8927b1553. nvidia-master.yaml.
Impact
Adds minimaxm3-fp8-b300-dynamo-vllm with fourteen disaggregated STP matrix entries covering the upstream concurrency points from 4 through 4096. Existing single-node MiniMax-M3 configurations are unchanged.
Upstream source: NVIDIA/srt-slurm#223
Validation
- 5caabe3 exactly.
- python -m pytest utils/matrix_logic/ -v (156 passed)
- bash -n runners/launch_b300-nv.sh
- git diff --cached --check -- . ':!perf-changelog.yaml'
Note
Low Risk
Benchmark and CI launcher configuration only; no changes to application serving or auth paths.
Overview
Adds MiniMax-M3 MXFP8 disaggregated Dynamo + vLLM benchmark coverage on B300 for fixed 1k/1k and 8k/1k sequence lengths.
Registers a new minimaxm3-fp8-b300-dynamo-vllm matrix key in nvidia-master.yaml with a search space that maps concurrency tiers to 14 checked-in STP recipes (DEP2 prefill with TEP8, DEP8, or DEP4 decode layouts), sourced from NVIDIA/srt-slurm#223. The recipe YAMLs live under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b300-fp8/ and wire Dynamo, Nixl KV transfer, and sa-bench concurrencies per topology.
runners/launch_b300-nv.sh now resolves the MiniMax-M3 model path and copies those recipes into a cloned srt-slurm checkout on sa-submission-q2-2026 for B300 Dynamo vLLM runs.
perf-changelog.yaml documents the new config key.
Reviewed by Cursor Bugbot for commit 929391c. Bugbot is set up for automated code reviews on this repo.