[NV] Multi-node synthetic AL injection for MTP (dsv4 mtp2 bring-up)#1789
Draft
qiching wants to merge 1 commit into
Port of internal PR SemiAnalysisAI#95. Adds opt-in synthetic-acceptance injection for the multi-node dsv4 MTP2 agg recipe:

- runners/inject_synthetic_acceptance.py: rewrites each speculative-config in the srt-slurm recipe to use synthetic rejection sampling when SYNTHETIC_ACCEPTANCE=true (no-op otherwise).
- runners/launch_gb200-nv.sh: adds a USE_SHARED_FS flag (dsv4 dynamo-vllm now uses the same compute-visible shared-FS staging as minimax on watchtower) and invokes the injection after the name override, before srtctl apply.
- .github/configs/nvidia-master.yaml: enables SYNTHETIC_ACCEPTANCE on the dsv4 gb200 dynamo-vllm mtp2 agg cell (length 2.27) for the e2e test.
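As a rough illustration of the injection step described above, here is a minimal sketch, not the script in this PR. It assumes the recipe has already been parsed into nested dicts and lists, and that a `speculative-config` value is either a mapping or a JSON string. The field names `rejection_sample_method` and `synthetic_acceptance_length`, and the helper names `synthetic_enabled`, `inject`, `_patch`, and `_walk`, are hypothetical placeholders; the real keys written by `runners/inject_synthetic_acceptance.py` are not shown in this conversation.

```python
import copy
import json
import os


def synthetic_enabled(env=None):
    """True only when SYNTHETIC_ACCEPTANCE is set to 'true' (case-insensitive)."""
    env = os.environ if env is None else env
    return env.get("SYNTHETIC_ACCEPTANCE", "").strip().lower() == "true"


def _patch(spec, acceptance_length):
    """Rewrite one speculative-config, preserving its original type (dict or JSON string)."""
    as_json = isinstance(spec, str)
    cfg = json.loads(spec) if as_json else dict(spec)
    # Hypothetical field names, used for illustration only.
    cfg["rejection_sample_method"] = "synthetic"
    cfg["synthetic_acceptance_length"] = acceptance_length
    return json.dumps(cfg) if as_json else cfg


def _walk(node, acceptance_length):
    """Recursively visit the recipe and patch every 'speculative-config' entry in place."""
    if isinstance(node, dict):
        for key, value in list(node.items()):
            if key == "speculative-config" and isinstance(value, (dict, str)):
                node[key] = _patch(value, acceptance_length)
            else:
                _walk(value, acceptance_length)
    elif isinstance(node, list):
        for item in node:
            _walk(item, acceptance_length)


def inject(recipe, acceptance_length, env=None):
    """Return a patched copy of the recipe, or the recipe itself when the flag is off."""
    if not synthetic_enabled(env):
        return recipe  # no-op when the env var is unset or false
    patched = copy.deepcopy(recipe)
    _walk(patched, acceptance_length)
    return patched
```

For example, `inject(recipe, 2.27, env={"SYNTHETIC_ACCEPTANCE": "true"})` returns a patched copy and leaves `recipe` untouched, while `inject(recipe, 2.27, env={})` returns the original object unchanged. Working on a deep copy keeps the opt-in path easy to diff against the unmodified recipe during bring-up.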
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-gb200-dynamo-vllm-mtp2 --conc 1 --no-evals
@qiching Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27575304265
Summary
Adds opt-in synthetic-acceptance injection for multi-node MTP recipes, enabled on a single dsv4 MTP2 agg cell as the bring-up / e2e test.
- `runners/inject_synthetic_acceptance.py` (new): when `SYNTHETIC_ACCEPTANCE=true`, rewrites each `speculative-config` in the srt-slurm recipe to use synthetic rejection sampling. No-op when the env var is unset/false.
- `runners/launch_gb200-nv.sh`: generalize the watchtower shared-FS staging from minimax-only to a `USE_SHARED_FS` flag (now also dsv4 dynamo-vllm), and invoke the injection after the name override / before `srtctl apply`.
- `.github/configs/nvidia-master.yaml`: enable `SYNTHETIC_ACCEPTANCE` (length 2.27) on the dsv4 gb200 dynamo-vllm mtp2 agg cell only.

Design
Bring-up approach: keep it opt-in so we can enable support incrementally per fw/model/config and roll back easily. The flag is currently set per recipe via `additional-settings`; promote it to a first-class field once it is enforced everywhere.

Not in this PR (follow-ups)
`benchmarks/speedbench-reference-al.yaml`, this cell hardcodes `2.27`; the reference YAML isn't in this repo yet.