
dflash

Here are 20 public repositories matching this topic...

Fully uncensored, capability-enhanced abliteration of Qwen3.6-27B. NVFP4 + z-lab DFlash speculative decoding (n=12) on the unified ghcr.io/aeon-7/aeon-vllm-ultimate:latest container, tuned for long-context draft acceptance on DGX Spark. 6 HF variants (BF16/NVFP4/MTP/MTP-XS), docker-compose, and QuickStart.

  • Updated Jun 20, 2026
  • Python

vLLM patcher for Qwen3.6 on consumer NVIDIA — Qwen3.6-35B-A3B-FP8 (192 tok/s, +68% over stock) + Qwen3.6-27B-int4-AutoRound + 256K context. 126 patches: TurboQuant k8v4 KV, MTP/DFlash spec-decode, FULL cudagraph, hybrid GDN streaming, structured boot summary, one-command installer, 1958 tests. v7.72.2.

  • Updated May 12, 2026
  • Python
ChaosEngineAI

Local AI workstation — discover, run, chat, benchmark, and generate images from open-weight models. DFlash/DDTree speculative decoding, TurboQuant & TriAttention cache compression strategies, MLX + llama.cpp + vLLM + MTPLX backends.

  • Updated Jun 19, 2026
  • Python
