[NV] Add MiniMax-M3 FP8 B300 Dynamo vLLM recipes#1788

Closed
Oseltamivir wants to merge 2 commits into main from apply-minimax-m3-b300-recipes

Conversation

@Oseltamivir Oseltamivir commented Jun 15, 2026

Collaborator

Summary

  • Import all fourteen MiniMax-M3 FP8 B300 vLLM recipes from NVIDIA/srt-slurm#223 ("Add MiniMax M3 FP8 non-MTP disagg configs for 1k1k/8k1k") at commit 5caabe364e1ef531fab9926c75e32ae8927b1553.
  • Register the submitted 1k/1k and 8k/1k DEP2-prefill with TEP8, DEP8, and DEP4 decode topologies in nvidia-master.yaml.
  • Update the B300 launcher to map the MiniMax-M3 model path and overlay the checked-in recipes until the upstream PR is available on the base branch.

Impact

Adds minimaxm3-fp8-b300-dynamo-vllm with fourteen disaggregated STP matrix entries covering the upstream concurrency points from 4 through 4096. Existing single-node MiniMax-M3 configurations are unchanged.

Upstream source: NVIDIA/srt-slurm#223

Validation

  • Confirmed all fourteen checked-in recipe files match upstream commit 5caabe3 exactly.
  • Generated the filtered B300 Dynamo vLLM sweep successfully; it emits fourteen entries.
  • Validated recipe resources, parallelism, and concurrency lists against the master config.
  • python -m pytest utils/matrix_logic/ -v (156 passed)
  • bash -n runners/launch_b300-nv.sh
  • git diff --cached --check -- . ':!perf-changelog.yaml'
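
The first check in the list above (checked-in recipes match the upstream commit exactly) can be reproduced with a byte-level comparison. This is a minimal sketch, assuming the checked-in recipes and an upstream checkout at 5caabe3 are both available as local directories. The function names and the directory arguments are hypothetical and are not taken from this PR.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_recipe_dirs(local_dir: Path, upstream_dir: Path) -> dict[str, list[str]]:
    """Compare *.yaml files in two directories by relative path and content hash.

    Both arguments are hypothetical: point them at the checked-in recipe
    folder and at the corresponding folder in an upstream checkout.
    """
    local = {p.relative_to(local_dir).as_posix(): sha256_of(p)
             for p in local_dir.rglob("*.yaml")}
    upstream = {p.relative_to(upstream_dir).as_posix(): sha256_of(p)
                for p in upstream_dir.rglob("*.yaml")}
    return {
        "missing_locally": sorted(set(upstream) - set(local)),
        "extra_locally": sorted(set(local) - set(upstream)),
        "content_differs": sorted(
            name for name in set(local) & set(upstream)
            if local[name] != upstream[name]
        ),
    }
```

An exact match corresponds to all three lists being empty.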

Note

Low Risk
Benchmark and CI launcher configuration only; no changes to application serving or auth paths.

Overview
Adds MiniMax-M3 MXFP8 disaggregated Dynamo + vLLM benchmark coverage on B300 for fixed 1k/1k and 8k/1k sequence lengths.

Registers a new minimaxm3-fp8-b300-dynamo-vllm matrix key in nvidia-master.yaml with a search space that maps concurrency tiers to 14 checked-in STP recipes (DEP2 prefill with TEP8, DEP8, or DEP4 decode layouts), sourced from NVIDIA/srt-slurm#223. The recipe YAMLs live under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b300-fp8/ and wire Dynamo, Nixl KV transfer, and sa-bench concurrencies per topology.

runners/launch_b300-nv.sh now resolves the MiniMax-M3 model path and copies those recipes into a cloned srt-slurm checkout on sa-submission-q2-2026 for B300 Dynamo vLLM runs. perf-changelog.yaml documents the new config key.
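
The overlay step described above amounts to copying the checked-in recipe tree on top of a cloned checkout. The launcher is a bash script whose contents are not shown in this thread, so the following Python sketch only illustrates the idea. The function name, argument names, and paths are hypothetical.

```python
import shutil
from pathlib import Path


def overlay_recipes(src: Path, checkout: Path, rel_dest: str) -> list[str]:
    """Copy every file under src into checkout/rel_dest.

    Files already present at the destination are overwritten; files that
    exist only at the destination are left untouched. Returns the relative
    paths of the files that were copied from src.
    """
    dest = checkout / rel_dest
    # dirs_exist_ok=True (Python 3.8+) lets the copy merge into an existing tree.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return sorted(p.relative_to(src).as_posix()
                  for p in src.rglob("*") if p.is_file())
```

Merging into the existing tree, rather than replacing it, keeps any recipes the checkout already ships that the overlay does not provide.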

Reviewed by Cursor Bugbot for commit 929391c. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@Oseltamivir Oseltamivir force-pushed the apply-minimax-m3-b300-recipes branch from c8b26a9 to 28cbfa6 Compare June 15, 2026 20:28
@Oseltamivir Oseltamivir marked this pull request as ready for review June 15, 2026 20:35
@Oseltamivir Oseltamivir requested a review from a team June 15, 2026 20:35
@claude

claude Bot commented Jun 15, 2026

Contributor

Claude finished @Oseltamivir's task in 2m 39s


Review of PR #1788

  • Read CLAUDE.md and repo guidelines
  • Review nvidia-master.yaml changes
  • Review recipe YAML files (all 6)
  • Review perf-changelog.yaml changes
  • Review launch_b300-nv.sh changes
  • Validate model prefix, image accessibility, perf-changelog ordering
  • Post review findings

Summary

Clean PR. Six MiniMax-M3 FP8 B300 Dynamo vLLM disaggregated recipes imported from upstream, with matching master config entries, launcher routing, and perf-changelog.

Validation passed:

  • perf-changelog.yaml entry is correctly appended at the end of the file ✅
  • nvidia-master.yaml has a corresponding perf-changelog entry ✅
  • Image vllm/vllm-openai:minimax-m3 is valid Docker Hub format ✅
  • Enroot import pattern present in the launcher (shared path at line 117) ✅
  • Recipe worker counts, parallelism, and concurrency lists all match the master config search-space entries ✅
  • Launcher correctly maps MODEL_PATH and SRT_SLURM_MODEL_PREFIX to match the recipe's model.path field ✅

One warning found: All 6 decode configs set max-cudagraph-capture-size: 8196 — likely a typo for 8192 (2^13). Since these are imported verbatim from upstream, worth confirming with NVIDIA/srt-slurm#223 whether this is intentional. See inline comment.

No blocking issues found.

stream-interval: 32
max-num-seqs: 4096
max-num-batched-tokens: 16384
max-cudagraph-capture-size: 8196

🟡 WARNING: max-cudagraph-capture-size: 8196 is likely a typo for 8192 (2^13)
Why it matters: 8196 is not a power of 2. Every other recipe in this repo uses power-of-2 values (e.g. 2048). vLLM uses this as a batch-size threshold for CUDA graph capture — a non-power-of-2 value is unusual and likely unintentional. This same value appears across all 6 decode configs in this PR.
Fix: Since these are imported verbatim from NVIDIA/srt-slurm#223, you may want to confirm with upstream whether 8196 is intentional or should be 8192. If it's an upstream typo, it's worth fixing here too.

Suggested change
- max-cudagraph-capture-size: 8196
+ max-cudagraph-capture-size: 8192
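
The arithmetic behind the warning is easy to verify. The sketch below only checks whether a value is a power of two and finds the closest one; it makes no claim about what vLLM accepts or requires for this setting. The helper names are illustrative.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive integer with exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def nearest_power_of_two(n: int) -> int:
    """Return the power of two closest to n (ties go to the smaller one)."""
    if n < 1:
        raise ValueError("n must be positive")
    lower = 1 << (n.bit_length() - 1)  # largest power of two <= n
    upper = lower << 1                 # smallest power of two > n
    return lower if (n - lower) <= (upper - n) else upper


for value in (8196, 8192):
    print(value, is_power_of_two(value), nearest_power_of_two(value))
# 8196 False 8192
# 8192 True 8192
```

8196 is 8192 + 4, which fits a transposed-digit typo rather than a deliberately chosen size, though only upstream can confirm the intent.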

@jasonlizhengjian

Collaborator

I'm doing the same on #1787
