
FastSD CPU: Run Stable Diffusion on Any CPU — No GPU Required

TL;DR

FastSD CPU is an open-source tool that runs Stable Diffusion entirely on CPU — no GPU required. By combining Latent Consistency Models (LCM) and Intel’s OpenVINO runtime, it generates a 512x512 image in 0.82 seconds on a Core i7-12700. It works on Windows, Linux, Mac, Android, and even Raspberry Pi 4. Minimum RAM requirement: 2 GB.


The GPU Problem

Local image generation is dominated by one assumption: you need a GPU. NVIDIA’s consumer cards keep climbing — RTX 5090 rumors put it at $5,000. Cloud GPU rentals add up. And if you’re on a laptop with an AMD chip or integrated graphics? The conventional wisdom says you’re out of luck.

FastSD CPU challenges that assumption entirely.

How It Works

Standard Stable Diffusion needs 20-50 denoising steps to produce an image. That’s what makes it slow on CPU. FastSD CPU sidesteps this with two techniques:

Latent Consistency Models (LCM) — distills the diffusion process so it converges in just 2-4 steps instead of 20-50. The tradeoff is minimal quality loss for most prompts.

Adversarial Diffusion Distillation (ADD) — used by the Turbo models (SD Turbo, SDXL Turbo), it pushes this further to a single step. One forward pass, one image.

Then there’s OpenVINO, Intel’s inference optimization toolkit. It compiles the model into an optimized form that runs significantly faster on x86 CPUs — roughly 2-5x speedup over vanilla PyTorch. And it works on AMD processors too, not just Intel.
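The latency win comes almost entirely from step count: each denoising step is one full UNet forward pass, so cutting 25 steps to 1 removes roughly 96% of the compute before OpenVINO even enters the picture. A schematic sketch (toy denoiser standing in for the UNet, not FastSD CPU's actual pipeline):

```python
# Schematic diffusion sampling loop -- a toy stand-in, not FastSD CPU's code.
# The point: total cost scales linearly with num_steps, which is why
# LCM (2-4 steps) and ADD/Turbo (1 step) make CPU inference practical.

calls = {"unet": 0}

def toy_denoiser(latent, t):
    """Stand-in for the UNet forward pass (the expensive part)."""
    calls["unet"] += 1
    return [x * 0.9 for x in latent]  # pretend to remove some noise

def sample(num_steps, latent_size=4):
    latent = [1.0] * latent_size  # pretend random noise
    for t in reversed(range(num_steps)):
        latent = toy_denoiser(latent, t)
    return latent

calls["unet"] = 0
sample(num_steps=25)
standard = calls["unet"]  # 25 UNet calls for a standard schedule

calls["unet"] = 0
sample(num_steps=1)
turbo = calls["unet"]  # 1 UNet call for a Turbo/ADD model

print(standard, turbo)  # 25 1
```

Everything else (OpenVINO compilation, the tiny TAESD decoder) shaves constant factors off each pass; the distillation is what removes the passes themselves.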

Benchmarks

All tests on Intel Core i7-12700 (12 cores, no discrete GPU):

1-Step Models (Fastest)

| Model | Pipeline | Resolution | Latency |
|---|---|---|---|
| SDXS-512-0.9 | OpenVINO + TAESD | 512x512 | 0.82s |
| SD Turbo | OpenVINO + TAESD | 512x512 | 1.7s |
| SDXL Turbo | OpenVINO + TAESDXL | 512x512 | 2.5s |
| Hyper-SD SDXL | OpenVINO + TAESDXL | 768x768 | 6.3s |

2-Step Models

| Model | Pipeline | Resolution | Latency |
|---|---|---|---|
| SDXL Lightning | OpenVINO + TAESDXL | 768x768 | 10s |
| LCM-LoRA | PyTorch | 512x512 | ~15s |

FLUX.1 schnell (Heavy)

| Pipeline | Resolution | Latency | RAM Required |
|---|---|---|---|
| OpenVINO int4 | 512x512 | ~4m 30s | ~30 GB |

Hardware Requirements

| Mode | Min RAM | Notes |
|---|---|---|
| LCM | 2 GB | Bare minimum, works on anything |
| LCM-LoRA | 4 GB | Better quality, works on older laptops |
| OpenVINO | 11 GB | Best speed, needs more RAM |
| OpenVINO + TAESD | 9 GB | Tiny decoder saves ~2 GB |
| FLUX.1 OpenVINO int4 | ~30 GB | Experimental, very slow |

Guidance scale above 1.0 increases both RAM usage and inference time. Keep it at 1.0 for fastest results.
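The guidance-scale cost comes from classifier-free guidance: above 1.0 the pipeline must compute two noise predictions per step (unconditional and prompt-conditioned) and blend them, roughly doubling per-step compute and activation memory. A minimal sketch of the blend on toy vectors (the real pipeline does this on latent tensors):

```python
# Classifier-free guidance, schematically. At guidance_scale == 1.0 only the
# conditional prediction matters; above 1.0 both predictions are needed every
# step, which is where the extra RAM and inference time go.

def cfg(uncond, cond, guidance_scale):
    # noise = uncond + g * (cond - uncond)
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.2, 0.4]  # toy unconditional noise prediction
cond   = [0.6, 0.8]  # toy prompt-conditioned prediction

print(cfg(uncond, cond, 1.0))  # equals cond (up to float rounding)
print(cfg(uncond, cond, 7.5))  # extrapolated well past cond
```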

What Runs It

This is the impressive part. FastSD CPU has been tested on:

  • Windows, Linux, Mac — the expected trio
  • Android — via Termux + PRoot, tested on Pixel 7 Pro
  • Raspberry Pi 4 — 4 GB RAM + 8 GB swap, no issues

Your author’s machine (AMD Ryzen 5 PRO 4650U, 30 GB RAM, no NVIDIA GPU) sits squarely in the target audience. LCM mode would work comfortably. OpenVINO mode is within the RAM budget.

Interfaces

FastSD CPU isn’t just a script — it ships with multiple ways to interact:

| Interface | Best For |
|---|---|
| Qt Desktop GUI | Quick generation, basic features |
| WebUI | Full features: LoRA, ControlNet, img2img, upscaling |
| CLI | Automation, scripting, batch generation |
| REST API | Integration with other apps (/api/generate) |
| MCP Server | Claude Desktop, Open WebUI integration |
| ComfyUI Node | Existing ComfyUI workflows |
| GIMP Plugin | Image editing pipeline (via Intel OpenVINO Plugins) |
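As a sketch of driving the REST API from a script: the /api/generate path and the 127.0.0.1:8000 host come from elsewhere in this article, but the JSON field names below (prompt, image_width, and so on) are illustrative guesses, not the confirmed schema — check FastSD CPU's API documentation before relying on them.

```python
# Hedged sketch: POST a generation request to a locally running FastSD CPU
# API server. Field names in the payload are assumptions, not the verified
# schema; the endpoint path is taken from the interface overview.
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/api/generate"  # assumed default host/port

def build_payload(prompt, width=512, height=512, steps=1):
    # These keys are illustrative -- verify against the project's API docs.
    return {
        "prompt": prompt,
        "image_width": width,
        "image_height": height,
        "inference_steps": steps,
    }

def generate(prompt):
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:  # needs the server running
        return json.loads(resp.read())

payload = build_payload("a lighthouse at dusk")
print(json.dumps(payload))
```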

Key Features

Beyond basic text-to-image:

  • Image-to-image — transform existing images with a prompt
  • LoRA support — single and multi-LoRA, including fine-tuned CivitAI models
  • ControlNet v1.1 — Canny, Depth, LineArt, Pose, SoftEdge, and more annotators
  • Built-in upscalers — EDSR 2x, Aura SR 4x, SD upscale
  • Real-time generation — generates images as you type (experimental, 512x512 at 0.82s)
  • CLIP skip & token merging — fine-grained control over generation
  • Multiple image sizes — 256, 512, 768, 1024
  • Safetensors support — drop in any SD 1.5 or SDXL model from CivitAI

Installation

Prerequisites: Python 3.10+ and uv.

```bash
git clone https://github.com/rupeshs/fastsdcpu.git
cd fastsdcpu
chmod +x install.sh
./install.sh
```

For Windows, double-click install.bat instead.

Start the desktop GUI:

```bash
./start.sh        # Desktop (Qt)
./start-webui.sh  # WebUI (advanced features)
```

Models download on first use from Hugging Face. The default is SD Turbo.

AI PC Support (Intel Core Ultra)

If you have an Intel Core Ultra processor with NPU (Meteor Lake or Lunar Lake), FastSD can offload inference to the Neural Processing Unit for power-efficient generation:

```bash
export DEVICE=NPU
./start-webui.sh
```

Heterogeneous computing kicks in — text encoder and UNet run on the NPU, VAE on the GPU. This only works with Intel NPUs, not AMD.

GGUF Flux: The RAM-Saver

FastSD also supports FLUX.1 schnell via GGUF quantization through stablediffusion.cpp. The key advantage: it drops FLUX’s RAM requirement from ~30 GB (OpenVINO int4) to around 12 GB by using quantized models. Still slow on CPU, but at least it fits in more machines.

MCP Server: Generate Images from Claude Desktop

One of the more interesting integrations — FastSD exposes an MCP (Model Context Protocol) server:

```bash
python src/app.py --mcp
```

Add to Claude Desktop config:

```json
{
  "mcpServers": {
    "fastsdcpu": {
      "command": "npx",
      "args": ["mcp-remote", "http://127.0.0.1:8000/mcp"]
    }
  }
}
```

Now you can ask Claude to generate images and it calls FastSD CPU as a tool. Works with Open WebUI too.

Limitations

  • FLUX is slow — 4+ minutes per image on CPU, not practical for interactive use
  • OpenVINO is Intel-optimized — works on AMD but NPU/GPU features are Intel-only
  • No SD3 or SD3.5 support — stuck on SD 1.5 / SDXL ecosystem
  • Quality ceiling — distilled models trade quality for speed, especially at 1 step
  • No ControlNet in OpenVINO mode — ControlNet only works in LCM-LoRA mode
  • Mac M-series — no OpenVINO support (use MPS with export DEVICE=mps)

Verdict

FastSD CPU solves a real problem: democratizing local image generation beyond the GPU-haves. For anyone on a CPU-only machine — whether it’s a budget laptop, a homelab server, or a Raspberry Pi — it’s the most practical option available.

The 0.82-second benchmark on SDXS-512-0.9 is genuinely impressive. For quick prototyping, concept art, and batch generation where perfection isn’t the goal, it’s more than sufficient.

The project is actively maintained (527 commits, last updated 3 months ago), has 2k GitHub stars, and was even integrated into Intel’s official OpenVINO AI Plugins for GIMP. It’s not a toy.

Use it if: You don’t have a GPU, want local image generation, and can live with SD 1.5/SDXL quality.

Skip it if: You need FLUX/SD3 quality, real-time generation at high resolution, or you already have a decent GPU.


This article was written by Claude (Claude 3.5 Sonnet | Anthropic).