[WIP][NV] add glm5-fp4-gb200-dynamo-sglang #1780

Open: hshrivastava-droid wants to merge 6 commits into main from nv/gl5-gb200-fp4-v2

hshrivastava-droid (Collaborator) commented Jun 15, 2026

Note

Low Risk
Benchmark and CI launcher/config only; no application runtime or auth changes. Main review surface is recipe/topology correctness and cluster resource assumptions.

Overview
Adds GLM-5 NVFP4 disaggregated Dynamo + SGLang benchmark coverage on GB200, mirroring the existing GB300 glm5 entry pattern.

nvidia-master.yaml introduces glm5-fp4-gb200-dynamo-sglang with fixed-seq-len scenarios for 8k1k and 1k1k. The scenarios cover two kinds of topology: max-throughput topologies with wide-EP decode (TP=32) and 4p–10p prefill variants, and low-latency topologies with per-node TP=4 decode workers. Each topology is wired to a concrete CONFIG_FILE under recipes/sglang/glm5/gb200-fp4/.
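As a rough aid for reviewing the cluster resource assumptions, the GPU and node footprint of one topology can be estimated with a few lines of Python. This is only a sketch, not code from the PR: the `Topology` class, the 4-GPUs-per-node figure (inferred from "per-node TP=4"), and the prefill TP value used in the usage note are all assumptions; only the decode TP=32 and the 4p to 10p prefill range come from the description above.

```python
import math
from dataclasses import dataclass

# Assumption: one GB200 node exposes 4 GPUs (inferred from "per-node TP=4").
GPUS_PER_NODE = 4


@dataclass(frozen=True)
class Topology:
    """Hypothetical description of one disaggregated prefill/decode layout."""
    prefill_workers: int
    prefill_tp: int      # assumption: not stated in the PR description
    decode_workers: int
    decode_tp: int

    def gpus(self) -> int:
        # Total GPUs: every worker uses as many GPUs as its TP degree.
        return (self.prefill_workers * self.prefill_tp
                + self.decode_workers * self.decode_tp)

    def nodes(self) -> int:
        # Each worker is assumed to occupy whole nodes (no node sharing).
        prefill_nodes = self.prefill_workers * math.ceil(self.prefill_tp / GPUS_PER_NODE)
        decode_nodes = self.decode_workers * math.ceil(self.decode_tp / GPUS_PER_NODE)
        return prefill_nodes + decode_nodes
```

For example, `Topology(prefill_workers=4, prefill_tp=4, decode_workers=1, decode_tp=32)` would need 48 GPUs on 12 nodes under these assumptions; the actual Slurm resources are whatever each recipe YAML declares.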

New srt-slurm recipe YAMLs (ported from upstream gb200-fp4/glm5.yaml, one file per topology) live under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/gb200-fp4/ with Slurm resources, Dynamo frontend, nixl disagg, and tuned sglang_config / sa-bench concurrency per recipe.

runners/launch_gb200-nv.sh maps glm5 + fp4 to lustre GLM-5-NVFP4 and overlays the glm5 recipe tree onto NVIDIA/srt-slurm (sa-submission-q2-2026). perf-changelog.yaml documents the new config key.
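The model mapping described above amounts to a lookup keyed on model and precision. The Python sketch below illustrates that idea only; the launcher itself is a shell script and its real logic is not shown in this conversation. `MODEL_DIRS`, `resolve_model_dir`, and the `lustre_root` parameter are hypothetical names, and only the glm5 + fp4 to GLM-5-NVFP4 pairing is taken from the description.

```python
from pathlib import PurePosixPath

# Hypothetical table: only the ("glm5", "fp4") -> "GLM-5-NVFP4" pairing
# comes from the PR description; any other entry would be an assumption.
MODEL_DIRS = {
    ("glm5", "fp4"): "GLM-5-NVFP4",
}


def resolve_model_dir(model: str, precision: str, lustre_root: str) -> PurePosixPath:
    """Return the model directory under lustre_root for a (model, precision) pair.

    lustre_root is supplied by the caller because the real mount point
    is not given in the PR description.
    """
    try:
        name = MODEL_DIRS[(model, precision)]
    except KeyError:
        # Fail loudly instead of silently launching with the wrong weights.
        raise ValueError(f"no model directory mapped for {model!r} + {precision!r}") from None
    return PurePosixPath(lustre_root) / name
```

Failing on an unmapped pair, rather than falling back to a default, is a design choice of this sketch: a benchmark that runs against the wrong checkpoint is harder to notice than one that refuses to start.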

Reviewed by Cursor Bugbot for commit ba74df2.

Comment thread .github/configs/nvidia-master.yaml

cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit aa5f207.

Comment thread runners/launch_gb200-nv.sh