[NV]dsr1-fp4-b200-sglang: add DPA PDL lane #1792

Open
hshrivastava-droid wants to merge 1 commit into main from nv/dsr1-fp4-v2


@hshrivastava-droid hshrivastava-droid commented Jun 15, 2026

Collaborator

Summary

Adds a data-parallel attention (DPA) benchmark lane for the DeepSeek-R1 FP4 B200 SGLang (dsr1-fp4-b200-sglang) fixed-sequence-length recipe and retunes the concurrency sweep matrix.

Changes

Config (.github/configs/nvidia-master.yaml)

  • Image bump: lmsysorg/sglang:v0.5.12-cu130 → lmsysorg/sglang:v0.5.12.post1
  • 1k/1k search space:
    • TP4/EP4 concurrency expanded from 1–128 → 1–256
    • TP8/EP8 changed from conc 1–128 sweep → conc-list: [1] (single-point)
  • 8k/1k search space:
    • TP4/EP4 conc 1–128 retained
    • New: TP4/EP4 with dp-attn: true, conc 64–256
    • TP8/EP8 changed from conc 1–16 sweep → conc-list: [1] (single-point)

Script (benchmarks/single_node/fixed_seq_len/dsr1_fp4_b200.sh)

  • Adds DP_ATTENTION env var (default false) with input validation
  • When DP_ATTENTION=true, launches SGLang with:
    • TP-sized data parallelism (--data-parallel-size=$TP)
    • --enable-dp-attention, --enable-dp-attention-local-control-broadcast, --enable-dp-lm-head
    • --enable-prefill-delayer, --schedule-conservativeness 3.33
    • Tighter scheduler recv interval (1 instead of 10/30)
    • Larger chunked prefill size (32768 instead of 16384)
  • All runs now set SGLANG_RADIX_FORCE_MISS=1 env var
  • Replaces --disable-radix-cache with --disable-piecewise-cuda-graph
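
The flag selection described in the Script section can be sketched as follows. This is a minimal illustration in Python rather than the actual shell script: `parse_dp_attention` and `build_launch` are hypothetical helpers, the rule that only the literal strings `true` and `false` are accepted is an assumption about the input validation, and the flag names `--chunked-prefill-size` and `--scheduler-recv-interval` are assumed spellings for the chunked prefill and scheduler recv interval settings named above. The non-DPA recv interval (10/30) is left out because the summary does not say when each value applies.

```python
def parse_dp_attention(raw: str) -> bool:
    """Validate DP_ATTENTION; accepting only 'true'/'false' is an assumed rule."""
    if raw not in ("true", "false"):
        raise ValueError(f"DP_ATTENTION must be 'true' or 'false', got {raw!r}")
    return raw == "true"


def build_launch(tp: int, dp_attention: str = "false"):
    """Return (env, args) for a hypothetical SGLang launch wrapper."""
    dpa = parse_dp_attention(dp_attention)

    # Set for every run, per the PR summary.
    env = {"SGLANG_RADIX_FORCE_MISS": "1"}

    # Replaces --disable-radix-cache, per the PR summary.
    args = ["--disable-piecewise-cuda-graph"]

    # Values come from the PR summary; the flag spelling is an assumption.
    args += ["--chunked-prefill-size", "32768" if dpa else "16384"]

    if dpa:
        args += [
            f"--data-parallel-size={tp}",  # TP-sized data parallelism
            "--enable-dp-attention",
            "--enable-dp-attention-local-control-broadcast",
            "--enable-dp-lm-head",
            "--enable-prefill-delayer",
            "--schedule-conservativeness", "3.33",
            "--scheduler-recv-interval", "1",  # assumed flag spelling
        ]
    return env, args


if __name__ == "__main__":
    env, args = build_launch(tp=4, dp_attention="true")
    print(env)
    print(" ".join(args))
```

Keeping the DPA-only flags in one conditional block makes it easy to confirm that the default lane (`DP_ATTENTION=false`) only picks up the changes that apply to all runs.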

Perf Changelog (perf-changelog.yaml)

  • Documents all config and script changes under dsr1-fp4-b200-sglang

@github-actions

Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 4e4e9d3. Configure here.

-  - { tp: 4, ep: 4, conc-start: 1, conc-end: 128 }
-  - { tp: 8, ep: 8, conc-start: 1, conc-end: 128 }
+  - { tp: 4, ep: 4, conc-start: 1, conc-end: 256 }
+  - { tp: 8, ep: 8, conc-list: [1] }

Single-node conc-list breaks sweep

High Severity

The TP8/EP8 search-space rows now use conc-list: [1] instead of a conc-start/conc-end range. For single-node fixed-seq-len, generate_full_sweep still reads only conc-start and conc-end, so matrix generation fails on those rows and TP8 benchmarks for this config are not scheduled.
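
One way to address the issue Bugbot describes is for row expansion to accept either form. The sketch below is an illustration only, not the repository's `generate_full_sweep`: `expand_concurrency` is a hypothetical helper, and stepping the range by doubling is an assumption about how the sweep picks points between `conc-start` and `conc-end`.

```python
def expand_concurrency(row: dict) -> list[int]:
    """Expand one search-space row into a list of concurrency points.

    Accepts either an explicit `conc-list` or a `conc-start`/`conc-end`
    range. The doubling step for ranges is an assumption.
    """
    if "conc-list" in row:
        return list(row["conc-list"])

    if "conc-start" in row and "conc-end" in row:
        start, end = row["conc-start"], row["conc-end"]
        if start < 1 or end < start:
            raise ValueError(f"invalid concurrency range: {start}..{end}")
        points = []
        conc = start
        while conc <= end:
            points.append(conc)
            conc *= 2
        return points

    raise KeyError("row needs either conc-list or conc-start/conc-end")


if __name__ == "__main__":
    rows = [
        {"tp": 4, "ep": 4, "conc-start": 1, "conc-end": 256},
        {"tp": 8, "ep": 8, "conc-list": [1]},
        {"tp": 4, "ep": 4, "dp-attn": True, "conc-start": 64, "conc-end": 256},
    ]
    for row in rows:
        print(row["tp"], row.get("dp-attn", False), expand_concurrency(row))
```

Raising on a row that has neither form keeps a malformed entry from silently dropping a lane from the matrix.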


@hshrivastava-droid hshrivastava-droid changed the title from "[WIP][NV]dsr1-fp4-b200-sglang: add DPA PDL lane" to "[NV]dsr1-fp4-b200-sglang: add DPA PDL lane" on Jun 16, 2026
@hshrivastava-droid

Collaborator Author

@claude add PR description

@hshrivastava-droid

Collaborator Author

/reuse-sweep-run

@Klaud-Cold

Klaud-Cold commented Jun 16, 2026

Collaborator

Claude finished @hshrivastava-droid's task in 1m 24s.


Adding PR Description

  • Gather context from changed files
  • Write and update PR description

PR description has been added with a structured summary covering the three changed files: config matrix updates, script DPA support, and perf-changelog entry.

@hshrivastava-droid

Collaborator Author

@functionstackx could you please review this

@jgangani jgangani left a comment

Collaborator

LGTM
