[NV]dsr1-fp4-b200-sglang: add DPA PDL lane by hshrivastava-droid · Pull Request #1792 · SemiAnalysisAI/InferenceX

hshrivastava-droid · 2026-06-15T21:58:00Z

Summary

Adds a data-parallel attention (DPA) benchmark lane for the DeepSeek-R1 FP4 B200 SGLang (dsr1-fp4-b200-sglang) fixed-sequence-length recipe and retunes the concurrency sweep matrix.

Changes

Config (`.github/configs/nvidia-master.yaml`)

- Image bump: `lmsysorg/sglang:v0.5.12-cu130` → `lmsysorg/sglang:v0.5.12.post1`
- 1k/1k search space:
  - TP4/EP4: concurrency range expanded from 1–128 to 1–256
  - TP8/EP8: changed from a conc 1–128 sweep to `conc-list: [1]` (single point)
- 8k/1k search space:
  - TP4/EP4: conc 1–128 retained
  - New: TP4/EP4 with `dp-attn: true`, conc 64–256
  - TP8/EP8: changed from a conc 1–16 sweep to `conc-list: [1]` (single point)

Script (`benchmarks/single_node/fixed_seq_len/dsr1_fp4_b200.sh`)

- Adds a `DP_ATTENTION` env var (default `false`) with input validation
- When `DP_ATTENTION=true`, launches SGLang with:
  - TP-sized data parallelism (`--data-parallel-size=$TP`)
  - `--enable-dp-attention`, `--enable-dp-attention-local-control-broadcast`, `--enable-dp-lm-head`
  - `--enable-prefill-delayer`, `--schedule-conservativeness 3.33`
  - Tighter scheduler recv interval (1 instead of 10/30)
  - Larger chunked prefill size (32768 instead of 16384)
- All runs now set the `SGLANG_RADIX_FORCE_MISS=1` env var
- Replaces `--disable-radix-cache` with `--disable-piecewise-cuda-graph`
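
The flag selection described in this section can be sketched roughly as below. This is a Python mirror of the branching, for illustration only; the actual change lives in the bash script and may differ. The function name `build_launch_args` is hypothetical, the `--chunked-prefill-size` spelling is an assumption about which flag carries the "chunked prefill size" value, and the scheduler recv interval is left out because the description does not say when the baseline is 10 versus 30.

```python
def build_launch_args(tp: int, dp_attention: str = "false") -> tuple[dict[str, str], list[str]]:
    """Return (env, flags) for one run. Hypothetical helper, not the script's code."""
    # Input validation: only the literal strings "true" and "false" are accepted.
    if dp_attention not in ("true", "false"):
        raise ValueError(f"DP_ATTENTION must be 'true' or 'false', got {dp_attention!r}")

    # Per the PR description, every run sets this env var and this flag.
    env = {"SGLANG_RADIX_FORCE_MISS": "1"}
    flags = ["--disable-piecewise-cuda-graph"]

    if dp_attention == "true":
        flags += [
            f"--data-parallel-size={tp}",  # TP-sized data parallelism
            "--enable-dp-attention",
            "--enable-dp-attention-local-control-broadcast",
            "--enable-dp-lm-head",
            "--enable-prefill-delayer",
            "--schedule-conservativeness", "3.33",
            "--chunked-prefill-size", "32768",  # assumed flag spelling
        ]
    else:
        flags += ["--chunked-prefill-size", "16384"]  # assumed flag spelling
    return env, flags


if __name__ == "__main__":
    env, flags = build_launch_args(tp=4, dp_attention="true")
    print(env, " ".join(flags))
```

Keeping validation ahead of flag assembly means a typo such as `DP_ATTENTION=True` fails fast instead of silently running the non-DPA configuration.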

Perf Changelog (`perf-changelog.yaml`)

Documents all config and script changes under dsr1-fp4-b200-sglang

github-actions · 2026-06-15T21:58:09Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 4e4e9d3.

cursor · 2026-06-15T21:59:43Z

-      - { tp: 4, ep: 4, conc-start: 1, conc-end: 128 }
-      - { tp: 8, ep: 8, conc-start: 1, conc-end: 128 }
+      - { tp: 4, ep: 4, conc-start: 1, conc-end: 256 }
+      - { tp: 8, ep: 8, conc-list: [1] }


Single-node conc-list breaks sweep

High Severity

The TP8/EP8 search-space rows now use conc-list: [1] instead of a conc-start/conc-end range. For single-node fixed-seq-len, generate_full_sweep still reads only conc-start and conc-end, so matrix generation fails on those rows and TP8 benchmarks for this config are not scheduled.

Additional Locations (1)

.github/configs/nvidia-master.yaml#L1691-L1692
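
One way the reported failure could be avoided is for the matrix generator to accept either form of concurrency spec. The sketch below is not the repository's `generate_full_sweep`; `expand_concurrencies` is a hypothetical helper, and the doubling step for ranges is an assumption, since the review comment does not state how `conc-start`/`conc-end` are swept.

```python
def expand_concurrencies(row: dict) -> list[int]:
    """Expand one search-space row into the concurrency values to run (hypothetical)."""
    has_list = "conc-list" in row
    has_range = "conc-start" in row and "conc-end" in row
    # Exactly one of the two forms must be present.
    if has_list == has_range:
        raise ValueError(f"row needs either conc-list or conc-start/conc-end: {row}")

    if has_list:
        return list(row["conc-list"])

    start, end = row["conc-start"], row["conc-end"]
    if start < 1 or end < start:
        raise ValueError(f"invalid concurrency range {start}..{end}")

    # Assumption: ranges are swept in powers-of-two steps from start to end.
    values = []
    conc = start
    while conc <= end:
        values.append(conc)
        conc *= 2
    return values


if __name__ == "__main__":
    rows = [
        {"tp": 4, "ep": 4, "conc-start": 1, "conc-end": 256},
        {"tp": 8, "ep": 8, "conc-list": [1]},
    ]
    for row in rows:
        print(row["tp"], expand_concurrencies(row))
```

Rejecting rows that carry both forms, or neither, surfaces config mistakes at matrix-generation time rather than as silently unscheduled benchmarks.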

github-actions · 2026-06-16T02:27:16Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27579102530
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27579102530

hshrivastava-droid · 2026-06-16T19:42:34Z

@claude add PR description

hshrivastava-droid · 2026-06-16T19:42:54Z

/reuse-sweep-run

Klaud-Cold · 2026-06-16T19:43:06Z

Claude finished @hshrivastava-droid's task in 1m 24s.

Adding PR Description

- Gather context from changed files
- Write and update PR description

PR description has been added with a structured summary covering the three changed files: config matrix updates, script DPA support, and perf-changelog entry.

hshrivastava-droid · 2026-06-16T19:51:31Z

@functionstackx could you please review this

jgangani

LGTM

add DPA PDL lane

4e4e9d3

hshrivastava-droid requested a review from a team June 15, 2026 21:58

hshrivastava-droid requested a review from kedarpotdar-nv as a code owner June 15, 2026 21:58

hshrivastava-droid added the full-sweep-enabled label Jun 15, 2026

hshrivastava-droid requested a review from jgangani as a code owner June 15, 2026 21:58

github-project-automation Bot added this to InferenceMAX Board Jun 15, 2026

cursor Bot reviewed Jun 15, 2026

hshrivastava-droid changed the title ~~[WIP][NV]dsr1-fp4-b200-sglang: add DPA PDL lane~~ [NV]dsr1-fp4-b200-sglang: add DPA PDL lane Jun 16, 2026

kedarpotdar-nv approved these changes Jun 16, 2026

jgangani approved these changes Jun 16, 2026

hshrivastava-droid wants to merge 1 commit into main from nv/dsr1-fp4-v2
