dflash: enable Qwen3-Coder-Next on Vulkan by gboddaer · Pull Request #79 · Anbeeld/beellama.cpp

gboddaer · 2026-06-21T10:54:13Z

Summary

Enable Qwen3-Coder-Next DFlash on Vulkan by fixing the runtime path that caused 0% draft acceptance when the GPU cross ring is unavailable.

Key changes:

Add dflash_target_kv_available to the drafter context parameters.
Set that flag from DFlash setup based on whether the GPU cross ring exists.
Gate full-attention DFlash KV-cache reads on that flag so Vulkan CPU-hidden capture falls back to fresh K/V projection from target_hidden instead of reading empty/stale target KV.
Add Qwen3Next pre-conv QKV tape aliases so DFlash conv-state replay advances r_l correctly.
Keep flat/batched DFlash drafting on the full block_size query block while consuming only n_draft + 1 output rows.
Add focused plumbing tests and a concise rationale document.
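The KV gating and output-row changes above can be sketched roughly as follows. This is a minimal sketch, not the PR's code: DrafterParams, KvSource, select_kv_source and consume_draft_rows are hypothetical names. Only the dflash_target_kv_available flag, block_size, n_draft and target_hidden come from this PR. The output is assumed to be a row-major buffer of block_size rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the drafter context parameters. Only the
// dflash_target_kv_available field name comes from this PR.
struct DrafterParams {
    bool dflash_target_kv_available = false;
};

// Where a full-attention drafter layer takes its K/V from.
enum class KvSource {
    TargetKvCache,     // target KV is populated (GPU cross ring exists)
    ProjectFromHidden, // fresh K/V projection from target_hidden
};

// Read the target KV cache only when DFlash setup reported it as usable;
// otherwise (e.g. Vulkan CPU-hidden capture) project K/V from target_hidden.
KvSource select_kv_source(const DrafterParams & p) {
    return p.dflash_target_kv_available ? KvSource::TargetKvCache
                                        : KvSource::ProjectFromHidden;
}

// The drafter runs the full block_size query block, but only the first
// n_draft + 1 output rows are kept. `out` is assumed to be row-major,
// with block_size rows of n_cols values each.
std::vector<float> consume_draft_rows(const std::vector<float> & out,
                                      std::size_t block_size,
                                      std::size_t n_cols,
                                      std::size_t n_draft) {
    assert(out.size() == block_size * n_cols);
    assert(n_draft + 1 <= block_size);
    const std::size_t n_keep = (n_draft + 1) * n_cols;
    return std::vector<float>(out.begin(),
                              out.begin() + static_cast<std::ptrdiff_t>(n_keep));
}
```

In this sketch the query block size never depends on n_draft. Only the slice taken from the output does.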

Verification

Ran in an isolated worktree based on the latest origin/main:

CPU CMake configure: rc=0
CPU test-dflash-plumbing target build: rc=0
build_cpu/bin/test-dflash-plumbing: rc=0
CPU full build: rc=0

Vulkan configure was re-tested after installing glslc locally on Debian 11:

/home/gbo/.local/bin/glslc --version: shaderc v2023.2
cmake -B build_vulkan -DGGML_VULKAN=ON ...: rc=0
Found Vulkan: /usr/lib/x86_64-linux-gnu/libvulkan.so (found version "1.2.162") found components: glslc

Vulkan compilation then failed because the Debian 11 Vulkan headers are too old for the symbols used by the current ggml-vulkan.cpp, for example:

vk::PhysicalDeviceMaintenance4Properties is not a member of vk
vk::DriverId::eMesaTurnip is not a member of vk::DriverId

So the original missing-glslc configure blocker is resolved on this host, but full Vulkan compilation requires newer Vulkan headers/SDK than the vulkan.hpp 1.2.162 shipped with Debian 11.
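Because this failure depends on the installed Vulkan header version rather than on the PR's code, a build helper could report the mismatch up front instead of failing deep in ggml-vulkan.cpp. The sketch below is hypothetical (parse_version and version_at_least are not part of the repository). The PR does not state a minimum header version. 1.4.309 is used as the bar in the example only because it is the version verified to build, and 1.2.162 is the version that failed.

```cpp
#include <array>
#include <cstdio>
#include <string>

// Parse "major.minor.patch" (e.g. "1.2.162"). Returns false on malformed input.
bool parse_version(const std::string & s, std::array<int, 3> & out) {
    int a = 0, b = 0, c = 0;
    char tail = 0;
    // The trailing %c rejects extra characters such as "1.2.3x":
    // a clean string yields exactly 3 conversions.
    if (std::sscanf(s.c_str(), "%d.%d.%d%c", &a, &b, &c, &tail) != 3) {
        return false;
    }
    out = {a, b, c};
    return true;
}

// True if `have` >= `need`, comparing major, then minor, then patch
// numerically (so 1.10.0 is newer than 1.4.309).
bool version_at_least(const std::string & have, const std::string & need) {
    std::array<int, 3> h{}, n{};
    if (!parse_version(have, h) || !parse_version(need, n)) {
        return false;
    }
    return h >= n; // std::array compares lexicographically, element by element
}
```

Comparing parsed integers rather than raw strings avoids treating "1.10.0" as older than "1.4.309".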

Notes

This PR intentionally excludes drafter training scripts and broad investigation artifacts. It includes one canonical rationale doc:

docs/enable-dflash-qwen3-coder-next-on-vulkan.md

Verified full Vulkan build on native AMD hardware:
- AMD Ryzen AI MAX+ 395 / Radeon 8060S (RADV GFX1151, Mesa 25.0.7)
- GCC 14.2.0, CMake 3.31.6, Vulkan 1.4.309, glslc
- GGML_VULKAN=ON, GGML_NATIVE=ON, Release mode
- 100% build success, all binaries produced, working tree clean

Full Vulkan build verified on a second host after supplying glslc and Vulkan-Headers 1.4.309 locally (Debian 11 ships neither):
- Debian 11 (bullseye), Linux 5.10.0-43-amd64
- AMD Ryzen 9 5950X (32 threads), 62 GiB RAM
- NVIDIA GeForce RTX 3090 (host)
- GCC 10.2.1, CMake 3.31.11, Ninja 1.10.1
- glslc: shaderc v2023.2 (built from source -> ~/.local/bin)
- Vulkan headers 1.4.309 (KhronosGroup Vulkan-Headers -> ~/.local)
- GGML_VULKAN=ON, GGML_NATIVE=ON, Release, -j32
- Full build rc=0, all binaries produced, test-dflash-plumbing rc=0
- 0 errors; only benign -Wdouble-promotion/-Wmissing-field-initializers warnings

No PR source changes required; host-side additions only.

…- tensor alias, cross-ring infrastructure
- Change llama_dflash_rollback/tape_replay return types from void to int, returning positions needing re-decode when tape replay is unavailable
- Add dflash_reduced_verify_broken flag to disable reduced verify on backends lacking TOPK support (Vulkan), falling back to full logits
- Add b- beta tensor alias for Qwen3Next tape recording
- Add linear_attn_qkv_mixed- tape alias for Qwen3-Coder-Next conv replay
- Add GGML_DFLASH_FORCE_REDECODE env var for testing re-decode paths
- Add Vulkan get_proc_address resolver for DFlash cross-ring functions
- Add Vulkan dflash_is_cuda_compatible_tensor support
- Include placeholder ggml-vulkan-cross-ring.cpp (not used at runtime)
- Add docs/vulkan-cross-ring-plan.md with Approach A/C documentation

Fixes crash on Vulkan: GGML_ASSERT(logits != nullptr) in get_logits_ith when reduced verify failed due to missing TOPK support.
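The reduced-verify fallback and the re-decode path described in this commit can be sketched roughly as follows. This is a minimal sketch, not the PR's code: BackendCaps, VerifyMode, choose_verify_mode, force_redecode_from_env and rollback_positions_to_redecode are hypothetical. Only the names dflash_reduced_verify_broken and GGML_DFLASH_FORCE_REDECODE come from the commit message. The accepted values of the environment variable and the exact meaning of the returned int are assumptions.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical backend capability summary. The commit only says reduced
// verify is disabled on backends lacking TOPK support (Vulkan).
struct BackendCaps {
    bool has_topk = false;
};

enum class VerifyMode { Reduced, FullLogits };

// Models the idea behind dflash_reduced_verify_broken: without TOPK, request
// full logits so that get_logits_ith never sees a null logits buffer.
VerifyMode choose_verify_mode(const BackendCaps & caps) {
    const bool dflash_reduced_verify_broken = !caps.has_topk;
    return dflash_reduced_verify_broken ? VerifyMode::FullLogits : VerifyMode::Reduced;
}

// Assumption: any non-empty value other than "0" enables forced re-decode.
bool force_redecode_from_env() {
    const char * v = std::getenv("GGML_DFLASH_FORCE_REDECODE");
    return v != nullptr && v[0] != '\0' && std::strcmp(v, "0") != 0;
}

// Hypothetical rollback with an int result instead of void: 0 means tape
// replay restored the state; a positive value is the number of positions the
// caller must re-decode because replay was unavailable (or forced off).
int rollback_positions_to_redecode(int n_positions, bool tape_replay_available) {
    assert(n_positions >= 0);
    if (tape_replay_available && !force_redecode_from_env()) {
        return 0;
    }
    return n_positions;
}
```

Returning a count instead of void lets the caller choose between "nothing to do" and "re-decode N positions" without a separate query. The environment variable then forces the second branch for testing.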

gboddaer and others added 2 commits June 21, 2026 12:53

dflash: enable Qwen3-Coder-Next on Vulkan

b788b4a

gboddaer marked this pull request as draft June 21, 2026 13:09

gboddaer and others added 2 commits June 21, 2026 15:41

gboddaer wants to merge 4 commits into Anbeeld:main from gboddaer:enable-dflash-qwen3-coder-next-on-vulkan
