DFlash on RTX 3090: 207 tok/s Qwen3.5-27B with Speculative Decoding
7 min read
Run Qwen3.5-27B at 3.43x the speed of plain autoregressive decoding on a single RTX 3090. Lucebox's DFlash port brings block-diffusion speculative decoding to GGUF: build it, download the weights, and start generating in under 20 minutes.