[NV] Add MiniMax M3 B300 Dynamo vLLM recipes#1787
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 54b829a. Configure here.
stream-interval: 32
max-num-seqs: 4096
max-num-batched-tokens: 16384
max-cudagraph-capture-size: 8196
Wrong cudagraph capture size
Medium Severity
All six new MiniMax M3 B300 decode blocks set max-cudagraph-capture-size to 8196, while prefill uses 2048 and decode sets max-num-seqs to 4096. That value is not used elsewhere in the repo and sits four above the usual power-of-two 8192 paired with 4096-sequence decode configs, so CUDA graph capture may not align with intended batch sizes.
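The off-by-four concern raised above can be checked mechanically. The following is a minimal sketch, assuming (as the review comment implies, not as a documented vLLM requirement) that capture sizes in these recipes are expected to be powers of two. The helper names are hypothetical and do not exist in the repo.

```python
def is_power_of_two(n: int) -> bool:
    """Return True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def nearest_power_of_two(n: int) -> int:
    """Return the power of two closest to n (ties go to the lower one)."""
    if n < 1:
        raise ValueError("n must be positive")
    lower = 1 << (n.bit_length() - 1)  # largest power of two <= n
    upper = lower << 1                 # smallest power of two > n
    return lower if n - lower <= upper - n else upper


def check_capture_size(value: int) -> str:
    """Hypothetical lint message for a max-cudagraph-capture-size value."""
    if is_power_of_two(value):
        return f"{value}: ok"
    return f"{value}: not a power of two, nearest is {nearest_power_of_two(value)}"


if __name__ == "__main__":
    # 2048 is the prefill value and 8196 the decode value quoted in the review.
    for size in (2048, 8192, 8196):
        print(check_capture_size(size))
```

Whether 8196 is a typo for 8192 or a deliberate choice is for the PR author to confirm; the sketch only flags the mismatch.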
54b829a to a2d9824
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27637791525


Adds MiniMax M3 MXFP8 B300 disaggregated vLLM benchmarks via Dynamo for 1k1k STP.
Validation:
bash -n runners/launch_b300-nv.sh
git diff --check
CONFIG_FILE path consistency check

Note: local matrix generation was not run because pydantic is not installed in this environment.

Note
Low Risk
Benchmark and CI launcher/config only; no application runtime, auth, or data-path changes.
Overview
Adds MiniMax-M3 MXFP8 disaggregated Dynamo + vLLM benchmark coverage on B300, including a new minimaxm3-fp8-b300-dynamo-vllm entry in nvidia-master.yaml with fixed-seq-len scenarios for 1k1k and 8k1k (multiple prefill/decode worker and TP/EP/DP-attention search-space rows).

Introduces local srt-slurm recipes under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b300-fp8/ and wires runners/launch_b300-nv.sh to set the M3 model path and copy those recipes into the cloned srt-slurm repo at job time. Selected recipes use TP8 decode with Marlin MoE (expert parallelism off) for low-concurrency points.

Documents the change in perf-changelog.yaml.

Reviewed by Cursor Bugbot for commit a2d9824. Bugbot is set up for automated code reviews on this repo. Configure here.
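The recipe-copy step described in the overview can be illustrated with a short sketch. The real launcher is a bash script whose contents are not shown here, so this is a Python illustration only; the function name and the destination layout inside the cloned repo are assumptions.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def copy_recipes(src_dir: Path, clone_root: Path, rel_dest: str) -> list[Path]:
    """Copy a local recipe directory into a cloned repo checkout.

    rel_dest is a hypothetical destination path relative to clone_root;
    the layout the real srt-slurm repo expects is not stated in the PR.
    """
    dest = clone_root / rel_dest
    # dirs_exist_ok=True lets the copy merge into an existing directory.
    shutil.copytree(src_dir, dest, dirs_exist_ok=True)
    return sorted(dest.rglob("*.yaml"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "local-recipes"
        src.mkdir()
        (src / "example.yaml").write_text("max-num-seqs: 4096\n")
        clone = root / "srt-slurm"
        clone.mkdir()
        copied = copy_recipes(src, clone, "recipes/vllm/minimax-m3/b300-fp8")
        print([p.name for p in copied])
```

Copying at job time keeps the recipes versioned alongside the benchmark configs instead of requiring a change in the cloned repo.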