DFlash on RTX 3090: 207 tok/s Qwen3.5-27B with Speculative Decoding

Run Qwen3.5-27B at 3.43x the speed of plain autoregressive decoding on a single RTX 3090. Lucebox's DFlash port brings block-diffusion speculative decoding to GGUF: build it, download the weights, and start generating in under 20 minutes.

Latest Articles

Hermes Agent: Installation Deep Dive and Optimization

· 12 min read

A practical walkthrough of installing Hermes Agent by Nous Research — covering the installer script internals, PyTorch CPU optimization, Bun runtime compatibility, RL training vs. built-in learning, and setting up CLI skills for Tavily, Context7, and Beads.

Hermes Agent: Self-Improving Autonomous AI Agent

· 9 min read

An open-source autonomous agent with a built-in learning loop that creates skills from experience, improves them during use, and remembers across sessions. Unlike typical chatbots or coding copilots, Hermes runs on your server, integrates with messaging platforms, and gets smarter the longer you use it.