Running Qwen3-Next-80B-A3B on Limited VRAM with Selective MoE Offloading

Run the 80B-parameter MoE model Qwen3-Next locally using llama.cpp with selective offloading of FFN expert layers to CPU. Unsloth's UD-Q4_K_XL quantization, combined with the regex-based -ot (--override-tensor) flag, lets you keep the dense attention and shared layers on the GPU while the bulky MoE expert tensors stay in system RAM.
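In practice the technique comes down to a single llama.cpp flag. A minimal launch sketch, assuming a llama-server build with `--override-tensor` support — the model filename, context size, and port below are illustrative placeholders, not values from this article:

```shell
# Offload everything to the GPU by default (-ngl 99), then use -ot
# (--override-tensor) to pin tensors matching a regex to a CPU buffer.
# MoE expert weights in GGUF are named blk.N.ffn_{gate,up,down}_exps.weight,
# so the pattern below routes exactly those to system RAM.
./llama-server \
  -m Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_XL.gguf \
  -ngl 99 \
  -ot ".ffn_.*_exps.=CPU" \
  -c 8192 \
  --port 8080
```

If you have spare VRAM, you can narrow the regex so that only some blocks' expert tensors are offloaded, trading system-RAM traffic for GPU memory.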
