dflash: enable Qwen3-Coder-Next on Vulkan #79

Draft
gboddaer wants to merge 4 commits into Anbeeld:main from gboddaer:enable-dflash-qwen3-coder-next-on-vulkan

gboddaer commented Jun 21, 2026

Summary

Enable Qwen3-Coder-Next DFlash on Vulkan by fixing the runtime path that caused 0% draft acceptance when the GPU cross ring is unavailable.

Key changes:

  • Add dflash_target_kv_available to the drafter context parameters.
  • Set that flag from DFlash setup based on whether the GPU cross ring exists.
  • Gate full-attention DFlash KV-cache reads on that flag so Vulkan CPU-hidden capture falls back to fresh K/V projection from target_hidden instead of reading empty/stale target KV.
  • Add Qwen3Next pre-conv QKV tape aliases so DFlash conv-state replay advances r_l correctly.
  • Keep flat/batched DFlash drafting on the full block_size query block while consuming only n_draft + 1 output rows.
  • Add focused plumbing tests and a concise rationale document.
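The gating described in the key changes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: only the flag name dflash_target_kv_available, block_size, and n_draft come from the description above, while the struct, enum, and function names are hypothetical stand-ins.

```cpp
#include <cassert>

// Hypothetical stand-in for the drafter context parameters. Only the flag
// name dflash_target_kv_available is taken from the PR description.
struct drafter_params {
    bool dflash_target_kv_available = false;
};

// Where full-attention layers take K/V from (hypothetical enum).
enum class kv_source { target_kv_cache, fresh_projection };

// Setup step: the flag mirrors whether the GPU cross ring exists.
drafter_params make_drafter_params(bool gpu_cross_ring_exists) {
    drafter_params p;
    p.dflash_target_kv_available = gpu_cross_ring_exists;
    return p;
}

// Gate: read the target KV cache only when it is known to be populated;
// otherwise project fresh K/V from target_hidden.
kv_source select_kv_source(const drafter_params & p) {
    return p.dflash_target_kv_available ? kv_source::target_kv_cache
                                        : kv_source::fresh_projection;
}

// Query rows submitted versus output rows consumed (hypothetical struct).
struct draft_shape {
    int query_rows;
    int output_rows;
};

// Draft on the full block_size query block, consume only n_draft + 1 rows.
// Assumption: n_draft + 1 never exceeds block_size.
draft_shape plan_draft_shape(int block_size, int n_draft) {
    assert(block_size > 0 && n_draft >= 0 && n_draft + 1 <= block_size);
    return draft_shape{block_size, n_draft + 1};
}
```

With this shape, a Vulkan run without a cross ring would call make_drafter_params(false) and always land on the fresh-projection path, which is the behavior the summary describes for CPU-hidden capture.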

Verification

Ran in an isolated worktree based on the latest origin/main:

CPU CMake configure: rc=0
CPU test-dflash-plumbing target build: rc=0
build_cpu/bin/test-dflash-plumbing: rc=0
CPU full build: rc=0

Vulkan configure was re-tested after installing glslc locally on Debian 11:

/home/gbo/.local/bin/glslc --version: shaderc v2023.2
cmake -B build_vulkan -DGGML_VULKAN=ON ...: rc=0
Found Vulkan: /usr/lib/x86_64-linux-gnu/libvulkan.so (found version "1.2.162") found components: glslc

Vulkan compilation then failed because the Debian 11 Vulkan headers are too old for symbols used by the current ggml-vulkan.cpp, for example:

vk::PhysicalDeviceMaintenance4Properties is not a member of vk
vk::DriverId::eMesaTurnip is not a member of vk::DriverId

So the original missing-glslc configure blocker is resolved on this host; full Vulkan compilation requires newer Vulkan headers/SDK than Debian 11's vulkan.hpp 1.2.162.

Notes

This PR intentionally excludes drafter training scripts and broad investigation artifacts. It includes one canonical rationale doc:

  • docs/enable-dflash-qwen3-coder-next-on-vulkan.md

gboddaer and others added 2 commits June 21, 2026 12:53
Verified full Vulkan build on native AMD hardware:
- AMD Ryzen AI MAX+ 395 / Radeon 8060S (RADV GFX1151, Mesa 25.0.7)
- GCC 14.2.0, CMake 3.31.6, Vulkan 1.4.309, glslc
- GGML_VULKAN=ON, GGML_NATIVE=ON, Release mode
- 100% build success, all binaries produced, working tree clean
gboddaer marked this pull request as draft June 21, 2026 13:09
gboddaer and others added 2 commits June 21, 2026 15:41
)

Full Vulkan build verified on a second host after supplying glslc and
Vulkan-Headers 1.4.309 locally (Debian 11 ships neither):

- Debian 11 (bullseye), Linux 5.10.0-43-amd64
- AMD Ryzen 9 5950X (32 threads), 62 GiB RAM
- NVIDIA GeForce RTX 3090 (host)
- GCC 10.2.1, CMake 3.31.11, Ninja 1.10.1
- glslc: shaderc v2023.2 (built from source -> ~/.local/bin)
- Vulkan headers 1.4.309 (KhronosGroup Vulkan-Headers -> ~/.local)
- GGML_VULKAN=ON, GGML_NATIVE=ON, Release, -j32
- Full build rc=0, all binaries produced, test-dflash-plumbing rc=0
- 0 errors; only benign -Wdouble-promotion/-Wmissing-field-initializers warnings

No PR source changes required; host-side additions only.
…- tensor alias, cross-ring infrastructure

- Change llama_dflash_rollback/tape_replay return types from void to int,
  returning positions needing re-decode when tape replay is unavailable
- Add dflash_reduced_verify_broken flag to disable reduced verify on
  backends lacking TOPK support (Vulkan), falling back to full logits
- Add b- beta tensor alias for Qwen3Next tape recording
- Add linear_attn_qkv_mixed- tape alias for Qwen3-Coder-Next conv replay
- Add GGML_DFLASH_FORCE_REDECODE env var for testing re-decode paths
- Add Vulkan get_proc_address resolver for DFlash cross-ring functions
- Add Vulkan dflash_is_cuda_compatible_tensor support
- Include placeholder ggml-vulkan-cross-ring.cpp (not used at runtime)
- Add docs/vulkan-cross-ring-plan.md with Approach A/C documentation

Fixes crash on Vulkan: GGML_ASSERT(logits != nullptr) in get_logits_ith
when reduced verify failed due to missing TOPK support.
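The fallback behavior in this commit message could look roughly like the sketch below. It is illustrative only and makes several assumptions: the flag name dflash_reduced_verify_broken and the environment variable GGML_DFLASH_FORCE_REDECODE come from the message, but every struct and function name here is hypothetical, the return value is assumed to be a count of positions, and the env var is assumed to be enabled by any non-empty value other than one starting with "0".

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical config holder. Only the flag name is taken from the commit.
struct verify_config {
    bool dflash_reduced_verify_broken = false;
};

enum class verify_mode { reduced, full_logits };

// Backends lacking TOPK support (Vulkan in this PR) skip reduced verify
// and fall back to full logits.
verify_mode select_verify_mode(const verify_config & cfg) {
    return cfg.dflash_reduced_verify_broken ? verify_mode::full_logits
                                            : verify_mode::reduced;
}

// Assumed parsing of the test-only override: set and not starting with '0'.
bool force_redecode_from_env() {
    const char * v = std::getenv("GGML_DFLASH_FORCE_REDECODE");
    return v != nullptr && v[0] != '\0' && v[0] != '0';
}

// Hypothetical rollback helper returning int instead of void.
// Returns 0 when tape replay handled the rollback; otherwise returns how
// many positions the caller must re-decode (assumed to be a count).
int rollback_positions_to_redecode(int n_rejected,
                                   bool tape_replay_available,
                                   bool force_redecode) {
    assert(n_rejected >= 0);
    if (tape_replay_available && !force_redecode) {
        return 0;
    }
    return n_rejected;
}
```

Returning a value lets the caller tell "state already restored" apart from "re-decode these positions", which a void return cannot express.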