Add Qwen3.5-FP8 GB200 SGLang disaggregated benchmark by RohitNagraj · Pull Request #1810 · SemiAnalysisAI/InferenceX

RohitNagraj · 2026-06-16T22:16:30Z

Adds qwen3.5-fp8-gb200-dynamo-sglang: Qwen3.5-397B-A17B-FP8 disaggregated SGLang-via-Dynamo on GB200.

6 topologies across 1k/1k and 8k/1k: 1P1D TP4 STP plus wide-EP (DEP4 prefill / DEP16 decode), from 1P1D up to 8P1D
Recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/qwen3.5/gb200-fp8/
Image: lmsysorg/sglang:nightly-dev-cu13-20260608-303757cc
Adds the qwen3.5/fp8 model-path branch to launch_gb200-nv.sh

Note

Low Risk
Benchmark and CI launch configuration only; no application runtime or auth/data-path changes.

Overview
Introduces qwen3.5-fp8-gb200-dynamo-sglang in nvidia-master.yaml for Qwen/Qwen3.5-397B-A17B-FP8 on GB200 with disaggregated multinode SGLang via Dynamo, covering 1k/1k and 8k/1k fixed-seq scenarios.

The search space adds six topologies: 1P1D TP4 (STP) and wide-EP layouts (DEP4 prefill / DEP16 decode), scaling from 1P1D through 2P1D, 4P1D, and 8P1D, each wired to a CONFIG_FILE under the new recipe tree.

Adds six Slurm recipe YAMLs under benchmarks/multi_node/srt-slurm-recipes/sglang/qwen3.5/gb200-fp8/ (Dynamo frontend, disagg prefill/decode, sa-bench concurrencies). launch_gb200-nv.sh maps qwen3.5 + fp8 to Lustre weights and overlays those recipes into srt-slurm like other dynamo-sglang models. perf-changelog.yaml documents the new config key.

Reviewed by Cursor Bugbot for commit de00324.

Qwen3.5-397B-A17B-FP8 GB200 disaggregated SGLang-via-Dynamo, 6 topologies across 1k/1k and 8k/1k (1P1D TP4 STP plus wide-EP DEP4 prefill / DEP16 decode from 1P1D up to 8P1D). Adds the recipe set, the nvidia-master entry, the gb200 launch-script model-path and recipe-copy branches, and the perf-changelog entry.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-06-16T22:59:55Z

+  osl: 1024
+  req_rate: "inf"
+  random_range_ratio: 0.8
+  concurrencies: "2048x4096"


4096 concurrency exceeds decode cap

Medium Severity

The 8P1D recipe and the master config include a concurrency of 4096 in their sweep, but the decode max-running-requests and max-mamba-cache-size stay at 2048. The 2P1D recipe sets both to 4096 when benchmarking at 4096, so the 8P1D sweep point at 4096 cannot honor the intended load.

Additional Locations (1)

.github/configs/nvidia-master.yaml#L8975-L8977
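The mismatch flagged above can be caught mechanically before a sweep is launched. The following is a minimal sketch, not part of this PR: it assumes the `concurrencies` string is a list of sweep points separated by `x` (as the `"2048x4096"` value in the diff suggests) and that each sweep point should fit within the decode-side caps named in the review. The function names and the `decode_caps` dictionary are hypothetical; the cap values are the ones quoted in the review comment.

```python
def parse_concurrencies(spec: str) -> list[int]:
    """Split an 'x'-separated sweep string such as "2048x4096" into integers.

    Assumption: 'x' separates sweep points, as suggested by the recipe diff.
    """
    return [int(part) for part in spec.split("x") if part]


def oversubscribed_points(spec: str, decode_caps: dict[str, int]) -> dict[int, list[str]]:
    """Map each sweep point to the names of the decode caps it exceeds."""
    problems: dict[int, list[str]] = {}
    for concurrency in parse_concurrencies(spec):
        exceeded = [name for name, cap in decode_caps.items() if concurrency > cap]
        if exceeded:
            problems[concurrency] = exceeded
    return problems


# Hypothetical input mirroring the values quoted in the review comment.
decode_caps = {"max-running-requests": 2048, "max-mamba-cache-size": 2048}
print(oversubscribed_points("2048x4096", decode_caps))
# → {4096: ['max-running-requests', 'max-mamba-cache-size']}
```

With both caps raised to 4096, as the review says the 2P1D recipe does, the same call returns an empty dictionary.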


github-actions · 2026-06-16T23:02:08Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27652043520
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27652043520

github-actions · 2026-06-17T01:57:00Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27653706015
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27653706015

RohitNagraj requested a review from a team June 16, 2026 22:16

RohitNagraj requested review from jgangani and kedarpotdar-nv as code owners June 16, 2026 22:16

github-project-automation Bot added this to InferenceMAX Board Jun 16, 2026

Update perf-changelog pr-link for #1810

b3537f8

RohitNagraj added the full-sweep-enabled label Jun 16, 2026

cursor Bot reviewed Jun 16, 2026

Comment thread benchmarks/multi_node/srt-slurm-recipes/sglang/qwen3.5/gb200-fp8/1k1k/1p1d-tp4-tp4.yaml Outdated

RohitNagraj and others added 2 commits June 16, 2026 15:22

Merge branch 'main' into qwen3.5-fp8-gb200-dynamo-sglang

2c94721

Set 1k1k 1p1d-tp4-tp4 context-length to 4096

de00324

cursor Bot reviewed Jun 16, 2026

RohitNagraj wants to merge 4 commits into main from qwen3.5-fp8-gb200-dynamo-sglang
