Add Qwen3.5-FP8 GB200 SGLang disaggregated benchmark #1810
Qwen3.5-397B-A17B-FP8 GB200 disaggregated SGLang-via-Dynamo, 6 topologies across 1k/1k and 8k/1k (1P1D TP4 STP plus wide-EP DEP4 prefill / DEP16 decode from 1P1D up to 8P1D). Adds the recipe set, the nvidia-master entry, the gb200 launch-script model-path and recipe-copy branches, and the perf-changelog entry.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit de00324. Configure here.
osl: 1024
req_rate: "inf"
random_range_ratio: 0.8
concurrencies: "2048x4096"
4096 concurrency exceeds decode cap
Medium Severity
The 8P1D recipe and master config sweep concurrency up to 4096, but decode max-running-requests and max-mamba-cache-size stay at 2048. The 2P1D recipe sets both to 4096 when benchmarking at 4096, so in the 8P1D case the 4096 sweep point cannot apply the intended load.
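The mismatch described here can be caught before a run with a small check. The following is a minimal sketch, not code from this repo: it assumes the concurrencies string is an x-separated list of sweep points, and that decode can serve at most the smaller of max-running-requests and max-mamba-cache-size at once. The function names are hypothetical.

```python
def parse_concurrencies(spec: str) -> list[int]:
    """Split a sweep string such as "2048x4096" into integer sweep points.

    Assumption: sweep points are separated by the letter "x".
    """
    return [int(part) for part in spec.split("x") if part]


def points_over_cap(
    spec: str, max_running_requests: int, max_mamba_cache_size: int
) -> list[int]:
    """Return the sweep points that exceed the effective decode cap.

    Assumption: the effective cap is the smaller of the two decode limits.
    """
    cap = min(max_running_requests, max_mamba_cache_size)
    return [c for c in parse_concurrencies(spec) if c > cap]


# 8P1D values quoted in the comment: sweep to 4096, decode limits at 2048.
print(points_over_cap("2048x4096", 2048, 2048))  # → [4096]
# 2P1D values quoted in the comment: both decode limits raised to 4096.
print(points_over_cap("2048x4096", 4096, 4096))  # → []
```

A non-empty result means at least one sweep point asks for more concurrent requests than decode is configured to run, so either the limits need raising or the sweep point needs dropping.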
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27652043520

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27653706015


Adds `qwen3.5-fp8-gb200-dynamo-sglang`: Qwen3.5-397B-A17B-FP8 disaggregated SGLang-via-Dynamo on GB200.

Note

Low Risk
Benchmark and CI launch configuration only; no application runtime or auth/data-path changes.

Overview
Introduces `qwen3.5-fp8-gb200-dynamo-sglang` in `nvidia-master.yaml` for Qwen/Qwen3.5-397B-A17B-FP8 on GB200 with disaggregated multinode SGLang via Dynamo, covering 1k/1k and 8k/1k fixed-seq scenarios.

The search space adds six topologies: 1P1D TP4 (STP) and wide-EP layouts (DEP4 prefill / DEP16 decode), scaling from 1P1D through 2P1D, 4P1D, and 8P1D, each wired to a `CONFIG_FILE` under the new recipe tree.

Adds six Slurm recipe YAMLs under `benchmarks/multi_node/srt-slurm-recipes/sglang/qwen3.5/gb200-fp8/` (Dynamo frontend, disagg prefill/decode, `sa-bench` concurrencies). `launch_gb200-nv.sh` maps `qwen3.5` + fp8 to Lustre weights and overlays those recipes into `srt-slurm` like other dynamo-sglang models. `perf-changelog.yaml` documents the new config key.