[NV] Add MiniMax-M3 FP8 B300 Dynamo vLLM recipes by Oseltamivir · Pull Request #1788 · SemiAnalysisAI/InferenceX

Oseltamivir · 2026-06-15T20:28:05Z

Summary

Import all fourteen MiniMax-M3 FP8 B300 vLLM recipes from NVIDIA/srt-slurm#223 ("Add MiniMax M3 FP8 non-MTP disagg configs for 1k1k/8k1k") at commit 5caabe364e1ef531fab9926c75e32ae8927b1553.
Register the submitted 1k/1k and 8k/1k DEP2-prefill with TEP8, DEP8, and DEP4 decode topologies in nvidia-master.yaml.
Update the B300 launcher to map the MiniMax-M3 model path and overlay the checked-in recipes until the upstream PR is available on the base branch.

Impact

Adds minimaxm3-fp8-b300-dynamo-vllm with fourteen disaggregated STP matrix entries covering the upstream concurrency points from 4 through 4096. Existing single-node MiniMax-M3 configurations are unchanged.

Upstream source: NVIDIA/srt-slurm#223

Validation

Confirmed all fourteen checked-in recipe files match upstream commit 5caabe3 exactly.
Generated the filtered B300 Dynamo vLLM sweep successfully; it emits fourteen entries.
Validated recipe resources, parallelism, and concurrency lists against the master config.
python -m pytest utils/matrix_logic/ -v (156 passed)
bash -n runners/launch_b300-nv.sh
git diff --cached --check -- . ':!perf-changelog.yaml'
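
The "match upstream exactly" check can be reproduced with a byte-level comparison of the two recipe directories. The sketch below is one way to do that, not the procedure used for this PR; the function names and the `*.yaml` glob are assumptions, and both directories are expected to be checked out locally.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path, pattern: str = "*.yaml") -> dict:
    """Map each matching file (path relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob(pattern))
    }


def diff_recipe_trees(local: Path, upstream: Path) -> dict:
    """Report files present on only one side, or present on both with different bytes."""
    a, b = file_digests(local), file_digests(upstream)
    return {
        "only_local": sorted(a.keys() - b.keys()),
        "only_upstream": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

An exact match corresponds to all three lists being empty. Hashing raw bytes means whitespace-only differences are also reported.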

Note

Low Risk
Benchmark and CI launcher configuration only; no changes to application serving or auth paths.

Overview
Adds MiniMax-M3 MXFP8 disaggregated Dynamo + vLLM benchmark coverage on B300 for fixed 1k/1k and 8k/1k sequence lengths.

Registers a new minimaxm3-fp8-b300-dynamo-vllm matrix key in nvidia-master.yaml with a search space that maps concurrency tiers to 14 checked-in STP recipes (DEP2 prefill with TEP8, DEP8, or DEP4 decode layouts), sourced from NVIDIA/srt-slurm#223. The recipe YAMLs live under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b300-fp8/ and wire Dynamo, Nixl KV transfer, and sa-bench concurrencies per topology.
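
To illustrate the kind of consistency this mapping requires between the master config and the recipe files, here is a minimal sketch. The `Entry` fields, the recipe file names, and the concurrency values are hypothetical; the real schema of nvidia-master.yaml and of the recipe YAMLs is not shown in this PR description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One search-space entry (hypothetical shape, for illustration only)."""
    recipe: str                      # recipe file name
    decode_topology: str             # e.g. "TEP8", "DEP8", "DEP4"
    concurrencies: tuple[int, ...]   # concurrency points this entry sweeps


def mismatches(master: list[Entry], recipes: dict[str, tuple[int, ...]]) -> list[str]:
    """Return one message per entry whose recipe is missing or disagrees on concurrencies."""
    problems = []
    for entry in master:
        got = recipes.get(entry.recipe)
        if got is None:
            problems.append(f"{entry.recipe}: recipe file missing")
        elif tuple(got) != entry.concurrencies:
            problems.append(
                f"{entry.recipe}: recipe has {list(got)}, master has {list(entry.concurrencies)}"
            )
    return problems
```

An empty result means every master entry points at an existing recipe with the same concurrency list. A fuller check would compare worker counts and parallelism settings in the same way.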

runners/launch_b300-nv.sh now resolves the MiniMax-M3 model path and copies those recipes into a cloned srt-slurm checkout on sa-submission-q2-2026 for B300 Dynamo vLLM runs. perf-changelog.yaml documents the new config key.
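
The overlay step amounts to copying the checked-in recipe directory on top of the cloned checkout. The launcher itself is a shell script; the Python sketch below only illustrates the idea. The function name and the assumption that the clone uses the same `vllm/minimax-m3/b300-fp8` sub-path are hypothetical.

```python
import shutil
from pathlib import Path

# Sub-path taken from the recipe location named above; mirroring it in the
# srt-slurm clone is an assumption made for this sketch.
RECIPE_SUBDIR = Path("vllm/minimax-m3/b300-fp8")


def overlay_recipes(repo_recipes: Path, srt_slurm_recipes: Path) -> list:
    """Copy checked-in recipes over the clone, keeping files the clone already has."""
    src = repo_recipes / RECIPE_SUBDIR
    dst = srt_slurm_recipes / RECIPE_SUBDIR
    # dirs_exist_ok=True (Python 3.8+) merges into an existing directory and
    # overwrites files that share a name.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.name for p in dst.glob("*.yaml"))
```

Because the copy merges rather than replaces, the overlay can be dropped without other changes once the upstream PR lands on the base branch.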

Reviewed by Cursor Bugbot for commit 929391c. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-06-15T20:28:17Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please open a PR there first, before we merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from their company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-06-15T20:36:19Z

Claude finished @Oseltamivir's task in 2m 39s

Review of PR #1788

Read CLAUDE.md and repo guidelines
Review nvidia-master.yaml changes
Review recipe YAML files (all 6)
Review perf-changelog.yaml changes
Review launch_b300-nv.sh changes
Validate model prefix, image accessibility, perf-changelog ordering
Post review findings

Summary

Clean PR. Six MiniMax-M3 FP8 B300 Dynamo vLLM disaggregated recipes imported from upstream, with matching master config entries, launcher routing, and perf-changelog.

Validation passed:

perf-changelog.yaml entry is correctly appended at the end of the file ✅
nvidia-master.yaml has a corresponding perf-changelog entry ✅
Image vllm/vllm-openai:minimax-m3 is valid Docker Hub format ✅
Enroot import pattern present in the launcher (shared path at line 117) ✅
Recipe worker counts, parallelism, and concurrency lists all match the master config search-space entries ✅
Launcher correctly maps MODEL_PATH and SRT_SLURM_MODEL_PREFIX to match the recipe's model.path field ✅

One warning found: All 6 decode configs set max-cudagraph-capture-size: 8196 — likely a typo for 8192 (2^13). Since these are imported verbatim from upstream, worth confirming with NVIDIA/srt-slurm#223 whether this is intentional. See inline comment.

No blocking issues found.

claude · 2026-06-15T20:38:54Z

+      stream-interval: 32
+      max-num-seqs: 4096
+      max-num-batched-tokens: 16384
+      max-cudagraph-capture-size: 8196


🟡 WARNING: max-cudagraph-capture-size: 8196 is likely a typo for 8192 (2^13)
Why it matters: 8196 is not a power of 2. Every other recipe in this repo uses power-of-2 values (e.g. 2048). vLLM uses this as a batch-size threshold for CUDA graph capture — a non-power-of-2 value is unusual and likely unintentional. This same value appears across all 6 decode configs in this PR.
Fix: Since these are imported verbatim from NVIDIA/srt-slurm#223, you may want to confirm with upstream whether 8196 is intentional or should be 8192. If it's an upstream typo, it's worth fixing here too.

Suggested change

-      max-cudagraph-capture-size: 8196
+      max-cudagraph-capture-size: 8192
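
The power-of-2 claim is easy to verify with a bit trick. A minimal sketch follows (the helper names are illustrative, not part of the repo or of vLLM):

```python
def is_power_of_two(n: int) -> bool:
    """A positive integer is a power of two iff it has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def nearest_power_of_two(n: int) -> int:
    """Return the power of two closest to n (n >= 1); ties go to the lower one."""
    lower = 1 << (n.bit_length() - 1)
    upper = lower << 1
    return lower if n - lower <= upper - n else upper


print(is_power_of_two(8196))       # False
print(is_power_of_two(8192))       # True
print(nearest_power_of_two(8196))  # 8192
```

This only shows that 8196 is four above 2^13; whether upstream chose the value deliberately still needs confirmation on NVIDIA/srt-slurm#223.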

github-actions · 2026-06-15T22:51:17Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27574720651
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27574720651

github-actions · 2026-06-16T00:33:33Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27574720651
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27574720651

jasonlizhengjian · 2026-06-16T17:49:32Z

I'm doing the same on #1787

github-actions · 2026-06-16T18:11:11Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27638068927
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27638068927

github-project-automation Bot added this to InferenceMAX Board Jun 15, 2026

feat: add MiniMax-M3 B300 Dynamo vLLM recipes

28cbfa6

Oseltamivir force-pushed the apply-minimax-m3-b300-recipes branch from c8b26a9 to 28cbfa6 Compare June 15, 2026 20:28

Oseltamivir added the full-sweep-enabled label Jun 15, 2026

Oseltamivir marked this pull request as ready for review June 15, 2026 20:35

Oseltamivir requested a review from a team June 15, 2026 20:35

Oseltamivir requested review from jgangani and kedarpotdar-nv as code owners June 15, 2026 20:35

claude Bot reviewed Jun 15, 2026

Merge main and add MiniMax-M3 B300 8k recipes

929391c

Oseltamivir closed this Jun 16, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 16, 2026

Oseltamivir wants to merge 2 commits into main from apply-minimax-m3-b300-recipes
