[WIP][NV] add glm5-fp4-gb200-dynamo-sglang #1780
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27575447726
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit aa5f207.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27652197300
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27652967695
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27653968669

Note
Low Risk
Benchmark and CI launcher/config only; no application runtime or auth changes. Main review surface is recipe/topology correctness and cluster resource assumptions.
Overview
Adds GLM-5 NVFP4 disaggregated Dynamo + SGLang benchmark coverage on GB200, mirroring the existing GB300 glm5 entry pattern.
nvidia-master.yaml introduces glm5-fp4-gb200-dynamo-sglang with fixed-seq-len scenarios for 8k1k and 1k1k: wide-EP decode (TP=32) max-throughput topologies (4p–10p prefill variants) and per-node TP=4 low-latency decode workers, each wired to a concrete CONFIG_FILE under recipes/sglang/glm5/gb200-fp4/.
New srt-slurm recipe YAMLs (ported from upstream gb200-fp4/glm5.yaml, one file per topology) live under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/gb200-fp4/ with Slurm resources, Dynamo frontend, nixl disagg, and tuned sglang_config/sa-bench concurrency per recipe.
runners/launch_gb200-nv.sh maps glm5 + fp4 to lustre GLM-5-NVFP4 and overlays the glm5 recipe tree onto NVIDIA/srt-slurm (sa-submission-q2-2026).
perf-changelog.yaml documents the new config key.
Reviewed by Cursor Bugbot for commit ba74df2. Bugbot is set up for automated code reviews on this repo.
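
For reviewers unfamiliar with the launcher, the model/precision mapping described in the overview could look roughly like the sketch below. This is an illustration only: the function name resolve_model_dir and the error handling are hypothetical and not taken from runners/launch_gb200-nv.sh; only the glm5 + fp4 to GLM-5-NVFP4 pairing comes from the summary above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resolve_model_dir is an illustrative name, not a
# function known to exist in runners/launch_gb200-nv.sh.
resolve_model_dir() {
  local model="$1" precision="$2"
  case "${model}+${precision}" in
    # Pairing stated in the PR overview: glm5 + fp4 -> GLM-5-NVFP4
    glm5+fp4) echo "GLM-5-NVFP4" ;;
    # Assumed behavior: reject combinations the launcher does not know about.
    *)
      echo "unsupported model/precision: ${model}+${precision}" >&2
      return 1
      ;;
  esac
}
```

Calling resolve_model_dir glm5 fp4 prints the directory name, which a launcher would then join with its own model root on lustre; that root is not given in this PR summary, so it is left out here.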