LiveKit: Self-Hosted WebRTC SFU for Real-Time Video, Audio, and AI Agents


TL;DR: LiveKit is an open-source, Go-based WebRTC SFU (Apache 2.0) that handles real-time video, audio, and data streams. It scales horizontally across rooms, requires Redis in production, and on a 16-core VM can serve 3,000 subscribers in a single room. Self-hosting requires a domain with a trusted SSL certificate, a wide open UDP port range (10,000 ports by default), and, at serious scale, 10 Gbps networking.

LiveKit has quietly become one of the most popular open-source WebRTC projects on GitHub — 18.2k stars, 1.9k forks, and a growing ecosystem around AI voice agents, robotics, and video conferencing. If you need real-time video or audio in your application and want to self-host, LiveKit deserves a serious look.

What LiveKit Actually Is

LiveKit is a distributed WebRTC SFU (Selective Forwarding Unit) written in Go, built on top of Pion WebRTC. An SFU doesn't mix or transcode media: it receives each publisher's stream once and selectively forwards it to subscribers. This makes it far cheaper to run than MCU (Multipoint Control Unit) architectures, which decode, mix, and re-encode every stream on the server.

The server is a single Go binary — no JVM, no Node.js runtime, no complex dependency tree. You can deploy it with Docker, Kubernetes (official Helm chart), or just run the binary directly.

What it does

  • Scalable multi-user video/audio conferencing
  • Real-time data channels (text, files, bytes, remote procedure calls)
  • Speaker detection, simulcast, SVC codecs (VP9, AV1)
  • End-to-end encryption
  • Selective subscription (clients choose which tracks to receive)
  • Moderation APIs (mute, remove, change permissions)
  • Embedded TURN server (no separate coturn needed)
  • Distributed and multi-region deployment via Redis

The ecosystem

LiveKit isn’t just the SFU server. The org maintains a full stack:

Component | Purpose
SFU Server | Core media routing
Agents | Build real-time AI voice/video agents (Python, Node.js)
Egress | Record rooms, export to file or RTMP
Ingress | Ingest RTMP, WHIP, HLS streams
SIP | Connect to traditional phone networks
CLI | Token generation, room management, load testing

Client SDKs cover: JavaScript/TypeScript, Swift (iOS/macOS), Kotlin (Android), Flutter, React Native, Unity, Rust, C++, and even ESP32.

Self-Hosting Requirements

Networking

WebRTC is notoriously tricky to deploy because it uses UDP. LiveKit needs:

Port | Protocol | Purpose
7880 | TCP | HTTP/WebSocket signaling
7881 | TCP | WebRTC over TCP fallback
50000–60000 | UDP | WebRTC media (configurable range)
5349 | TCP | TURN/TLS (optional but recommended)
443 | UDP | TURN/UDP for QUIC-friendly firewalls
6789 | TCP | Prometheus metrics (optional)

That’s a 10,000-port UDP range open to the internet. This is the single biggest operational headache — firewalls, security groups, and NAT traversal all need to cooperate.
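On a Linux host using ufw, the corresponding rules look roughly like this (a sketch assuming ufw and the default ports from the table above; adapt to your firewall or cloud security groups):

```shell
# Signaling and WebRTC-over-TCP fallback
sudo ufw allow 7880/tcp
sudo ufw allow 7881/tcp
# WebRTC media range (UDP)
sudo ufw allow 50000:60000/udp
# Optional: TURN/TLS and TURN/UDP
sudo ufw allow 5349/tcp
sudo ufw allow 443/udp
```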

You need a domain with a trusted SSL certificate (self-signed won't work in production: browsers require a trusted certificate for the secure context that camera/microphone access and WebRTC signaling depend on). If you enable TURN/TLS, that needs its own domain and certificate too.

Production config

port: 7880
log_level: info
rtc:
  tcp_port: 7881
  port_range_start: 50000
  port_range_end: 60000
  use_external_ip: true
redis:
  address: redis-server:6379
keys:
  my-api-key: my-api-secret
turn:
  enabled: true
  tls_port: 5349
  domain: turn.myhost.com
  cert_file: /path/to/turn.crt
  key_file: /path/to/turn.key

Redis is required for production — it coordinates state across multiple LiveKit nodes. Without it, you’re limited to a single instance.
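The `keys` map is what the server uses to validate client access tokens, which are plain HS256 JWTs. The following stdlib-only sketch illustrates the token shape; in practice you'd use the official server SDKs' `AccessToken` helper, and the key, secret, identity, and room names here are placeholders:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    payload = {
        "iss": api_key,        # identifies which secret signed the token
        "sub": identity,       # participant identity
        "nbf": now,
        "exp": now + 6 * 3600, # 6-hour TTL
        "video": {"roomJoin": True, "room": room},  # the video grant
    }
    signing_input = (
        f"{b64url(json.dumps(header).encode())}."
        f"{b64url(json.dumps(payload).encode())}"
    )
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


token = make_token("my-api-key", "my-api-secret", "alice", "demo-room")
print(token.count("."))  # a JWT has three dot-separated parts, so this prints 2
```

A client presents this token when connecting; the server looks up the secret by the `iss` key and verifies the signature.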

Hardware

LiveKit is CPU and bandwidth bound, not memory bound. The official recommendation:

  • 10 Gbps ethernet or faster
  • Compute-optimized VM instances on cloud providers
  • For Docker, use --network host for best performance

Performance benchmarks (16-core, c2-standard-16)

All benchmarks measure the maximum participants in a single room. LiveKit scales horizontally by room: a single room must fit on one node, but rooms can be spread across any number of nodes.

Audio-only (10 publishers, 3,000 subscribers):

Metric | Value
Bytes in/out | 7.3 kBps / 23 MBps
Packets in/out | 305 / 959,156
CPU utilization | 80%

Video meeting (150 publishers, 150 subscribers, 720p simulcast):

Metric | Value
Bytes in/out | 50 MBps / 93 MBps
Packets in/out | 51,068 / 762,749
CPU utilization | 85%

Livestreaming (1 publisher, 3,000 subscribers):

Metric | Value
Bytes in/out | 233 kBps / 531 MBps
Packets in/out | 246 / 560,962
CPU utilization | 92%

The livestreaming scenario is the most bandwidth-heavy at 531 MBps (roughly 4.2 Gbps) outbound. If you're on a cloud provider, network egress costs will dominate your bill.
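The arithmetic behind that number is worth internalizing for capacity planning. A back-of-the-envelope sketch using the benchmark figures above (the $0.08/GB egress rate is a hypothetical placeholder; real rates vary by provider and region):

```python
# Livestream benchmark: 1 publisher, 3,000 subscribers, 531 MBps outbound.
subscribers = 3_000
egress_bytes_per_sec = 531e6

# Per-subscriber bitrate implied by the benchmark
per_sub_bps = egress_bytes_per_sec / subscribers * 8  # bits per second
print(f"{per_sub_bps / 1e6:.2f} Mbps per subscriber")  # ~1.42 Mbps, a 720p-class stream

# Upper bound: monthly egress if this load were sustained 24/7
seconds_per_month = 30 * 24 * 3600
gb_per_month = egress_bytes_per_sec * seconds_per_month / 1e9
print(f"{gb_per_month:,.0f} GB/month at full tilt")
print(f"~${gb_per_month * 0.08:,.0f}/month at a hypothetical $0.08/GB")
```

Even at a fraction of that duty cycle, egress pricing is usually the first thing to model before choosing between self-hosting on a flat-rate bare-metal provider and a hyperscaler.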

Deployment options

# macOS
brew install livekit
# Linux
curl -sSL https://get.livekit.io | bash
# Docker
docker run --network host livekit/livekit-server --config config.yaml
# Dev mode (no config needed)
livekit-server --dev

For Kubernetes, use the official livekit/livekit-helm chart. For distributed multi-region, Redis handles cross-node coordination.

Local development

LiveKit’s dev mode strips away all production complexity — no Redis, no SSL, no domain, no TURN. One command and you have a working SFU:

livekit-server --dev

This starts a server on 127.0.0.1:7880 with hardcoded credentials (devkey / secret). To accept connections from other devices on your LAN:

livekit-server --dev --bind 0.0.0.0

The official client SDKs and example apps (like LiveKit Meet) can point to http://localhost:7880 out of the box. For AI agent development, the LiveKit Agents Python/Node.js SDK connects to the same local endpoint.

Minimum hardware for local dev

Since LiveKit is a single Go binary that only forwards packets (no transcoding), the hardware bar is low:

Resource | Minimum | Comfortable
CPU | 2 cores | 4+ cores
RAM | 512 MB | 2 GB
Storage | 100 MB (binary) | 500 MB
GPU | None | None
Network | localhost / LAN | LAN or Wi-Fi
OS | Linux, macOS, Windows | Any

No Redis, no Docker, no external dependencies. The server binary itself is under 30 MB. Memory usage for a dev room with a handful of participants typically sits around 50–100 MB — it scales with active connections, not with room count.

A few practical notes:

  • Wi-Fi works fine for development with 2–5 participants. Switch to ethernet if you start seeing packet loss or jitter in video feeds.
  • The bottleneck is CPU, not RAM. Each subscriber requires per-packet decryption/encryption. On a 4-core laptop, you can comfortably test rooms with 10–20 video participants before things get choppy.
  • Audio-only testing is virtually free — even a Raspberry Pi can handle dozens of audio subscribers.
  • Testing from multiple devices on the same LAN works with --bind 0.0.0.0, but you’ll need to use the machine’s LAN IP (e.g., http://192.168.1.x:7880) instead of localhost.
  • Browser security requires HTTPS for camera/microphone access on non-localhost origins. If testing from another device on your LAN, either use a self-signed cert (and accept the warning) or use a tunneling tool like ngrok or cloudflared to get a localhost URL with HTTPS.
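For the self-signed route, a command like this generates a certificate for your LAN IP (requires OpenSSL 1.1.1+ for -addext; the 192.168.1.50 address is an example, substitute your own). You'd then terminate TLS in front of the dev server with a reverse proxy such as Caddy or nginx, and browsers will still show a warning you must accept:

```shell
# Self-signed cert valid for 1 year, bound to a LAN IP via subjectAltName
openssl req -x509 -newkey rsa:2048 -sha256 -days 365 -nodes \
  -keyout lan-key.pem -out lan-cert.pem \
  -subj "/CN=192.168.1.50" \
  -addext "subjectAltName=IP:192.168.1.50"
```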

Who’s Using It

Skydio, the drone maker, has integrated LiveKit deeply for live streaming and remote operation of its drones.

The livekit-examples org on GitHub has 86 repositories — official demos including LiveKit Meet (open-source Zoom alternative built with Next.js), spatial audio demos, OBS streaming, and AI voice assistants.

The Agents framework is gaining traction in the AI voice agent space. AssemblyAI, Cerebras, and others have built tutorials combining LiveKit with speech-to-text, LLMs, and text-to-speech for conversational AI agents. LiveKit Agents handles the entire real-time orchestration layer — audio streaming, turn-taking, and multimodal I/O.

The Agents Framework

github.com/livekit/agents (10.1k stars) is a separate open-source framework for building realtime, programmable participants that run on servers. It’s not the SFU — it’s the thing that joins your SFU rooms as an AI-powered participant with voice, vision, and tool-calling capabilities.

This is the piece you’d use to build an AI receptionist, a customer service voice agent, a real-time translator, a telephony bot, or any application where an LLM needs to interact with humans through audio and video in real time.

What it actually is

The framework provides a Python and Node.js SDK that handles the entire real-time AI pipeline — audio ingestion, voice activity detection, speech-to-text, LLM reasoning, text-to-speech, and audio output — all managed through a unified AgentSession object. You define your agent’s behavior (instructions, tools, model choices) and the framework handles the streaming orchestration.

Core abstractions:

Concept | Role
Agent | Your application — instructions, tools, model config
AgentSession | Manages the conversation lifecycle between agent and end user
AgentServer | Process that receives job requests and dispatches agents to rooms
JobContext | Context passed to your entrypoint — room handle, participant info
@function_tool | Decorator that exposes a Python async function as an LLM-callable tool

Two pipeline architectures

You can choose between two approaches, or mix them:

STT-LLM-TTS pipeline — strings together three specialized providers. More control over each component, broader provider choice, and you can swap any part independently:

User speech → VAD (Silero) → STT (Deepgram Nova-3) → LLM (GPT-4.1 mini) → TTS (Cartesia Sonic-3) → User

Realtime model — a single model handles the full speech-to-speech loop. Lower latency, more natural conversation flow, but limited to providers that offer realtime endpoints (OpenAI, Google Gemini):

User speech → OpenAI Realtime API → User

Plugin ecosystem — 65 providers

The framework ships with 65+ plugins under livekit-plugins/, each a separate package following the pattern livekit-plugins-<provider>:

Category | Providers
LLM | OpenAI, Anthropic, Google, Cerebras, Groq, xAI, Azure, AWS, Mistral, Fireworks, NVIDIA, Ollama
STT | Deepgram, Google, AWS, Azure, AssemblyAI, Speechmatics, Gladia, Soniox, Whisper (local)
TTS | Cartesia, ElevenLabs, Google, Azure, AWS, PlayHT, Rime, Murf, Speechify, Ultravox
Realtime | OpenAI, Google Gemini Live
VAD | Silero (local), Turn Detector (multilingual transformer model)
Noise cancellation | Krisp, AI Acoustics
Avatars | Tavus, Hedra, Bithuman, Simli, LemonSlice, Avatar.io
Other | LangChain, NLTK, MCP servers

The plugin system uses provider-agnostic interfaces with fallback adapters and stream adapters — swap Deepgram for Google STT by changing one line, everything else stays the same.
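The idea behind those interfaces can be sketched in plain Python. This is a simplified illustration of the fallback-adapter pattern, not LiveKit's actual classes, and the provider names are invented:

```python
from typing import Protocol


class STT(Protocol):
    """Provider-agnostic speech-to-text interface that every plugin implements."""

    def transcribe(self, audio: bytes) -> str: ...


class PrimarySTT:
    def transcribe(self, audio: bytes) -> str:
        raise ConnectionError("provider outage")  # simulate a failing provider


class BackupSTT:
    def transcribe(self, audio: bytes) -> str:
        return "hello world"


class FallbackSTT:
    """Tries each provider in order; the pipeline only ever sees the STT interface."""

    def __init__(self, *providers: STT):
        self.providers = providers

    def transcribe(self, audio: bytes) -> str:
        last_err = None
        for provider in self.providers:
            try:
                return provider.transcribe(audio)
            except Exception as e:
                last_err = e  # remember the failure, try the next provider
        raise last_err


stt: STT = FallbackSTT(PrimarySTT(), BackupSTT())
print(stt.transcribe(b"..."))  # primary fails, falls through to backup: "hello world"
```

Because the agent pipeline depends only on the interface, swapping or chaining providers never touches the rest of the session code.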

A voice agent with tools

This is a working agent that can look up weather — the @function_tool decorator makes any async function available to the LLM:

from livekit.agents import (
    Agent, AgentServer, AgentSession,
    JobContext, RunContext, cli, function_tool, inference,
)
from livekit.plugins import silero


@function_tool
async def lookup_weather(context: RunContext, location: str):
    """Used to look up weather information."""
    return {"weather": "sunny", "temperature": 70}


server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    session = AgentSession(
        vad=silero.VAD.load(),
        stt=inference.STT("deepgram/nova-3", language="multi"),
        llm=inference.LLM("openai/gpt-4.1-mini"),
        tts=inference.TTS("cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
    )
    agent = Agent(
        instructions="You are a friendly voice assistant.",
        tools=[lookup_weather],
    )
    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="greet the user and ask about their day")


if __name__ == "__main__":
    cli.run_app(server)

Multi-agent handoffs

Agents can hand off to other agents mid-conversation — an intake agent gathers information, then transfers to a specialist agent. Each agent can use different models, instructions, and tools:

from livekit.agents import Agent, RunContext, function_tool
from livekit.plugins import openai


class IntroAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="Gather the user's name and location for a personalized story."
        )

    @function_tool
    async def information_gathered(self, context: RunContext, name: str, location: str):
        """Called when the user has provided their information."""
        # userdata is an application-defined object configured on the AgentSession
        context.userdata.name = name
        context.userdata.location = location
        story_agent = StoryAgent(name, location)
        return story_agent, "Let's start the story!"  # returning an Agent triggers the handoff


class StoryAgent(Agent):
    def __init__(self, name, location):
        super().__init__(
            instructions=f"Tell a personalized story for {name} from {location}.",
            llm=openai.realtime.RealtimeModel(voice="echo"),  # switch to a realtime model
        )

Built-in testing with LLM judges

Testing LLM agents is hard because responses are non-deterministic. LiveKit Agents includes a test framework with LLM judges — you assert intent rather than exact output:

import pytest
from livekit.agents import AgentSession
from livekit.plugins import google


@pytest.mark.asyncio
async def test_order_flow():
    llm = google.LLM()
    async with AgentSession(llm=llm) as sess:
        await sess.start(MyAgent())
        result = await sess.run(user_input="Hello, I need to place an order.")
        result.expect.next_event().is_function_call(name="start_order")
        await (
            result.expect.next_event()
            .is_message(role="assistant")
            .judge(llm, intent="should be asking the user what they would like")
        )

MCP support

Native MCP (Model Context Protocol) integration — connect any MCP server and its tools become available to your agent with one line of configuration. This means your agent can use tools from any MCP-compatible source without writing adapter code.

Running modes

# Terminal mode — mic/speaker directly, no server needed
python myagent.py console
# Dev mode — hot reload, connects to LiveKit (cloud or self-hosted)
LIVEKIT_URL=http://localhost:7880 LIVEKIT_API_KEY=devkey LIVEKIT_API_SECRET=secret \
python myagent.py dev
# Production mode
python myagent.py start

console mode is the fastest iteration loop — no browser, no server, just your voice and the agent.

Repository structure

livekit/agents/
├── livekit-agents/ # Core framework
│ └── livekit/agents/
│ ├── voice/ # AgentSession, Agent, room I/O, transcription
│ ├── llm/ # LLM integration, tool definitions, MCP support
│ ├── stt/ # Speech-to-text with fallback adapters
│ ├── tts/ # Text-to-speech with stream pacing
│ ├── ipc/ # Inter-process communication for job distribution
│ ├── cli/ # CLI commands (console, dev, start, connect)
│ ├── inference/ # Remote model inference via LiveKit Cloud
│ └── telemetry/ # OpenTelemetry traces + Prometheus metrics
├── livekit-plugins/ # 65+ provider plugins
├── examples/ # Domain examples (healthcare, drive-thru, telephony, survey, frontdesk)
├── tests/ # Test suite with mock implementations
├── AGENTS.md # Coding agent instructions
└── CLAUDE.md # Claude Code instructions

Install

# Core framework with popular plugins
pip install "livekit-agents[openai,silero,deepgram,cartesia,turn-detector]~=1.5"
# Full development setup from source
git clone https://github.com/livekit/agents.git
cd agents
uv sync --all-extras --dev
make check

Developing Solutions with LiveKit

With the SFU and Agents framework covered, here’s how you’d put together a full solution — from first prototype to production deployment.

The developer stack

Layer | What | Tooling
Media transport | SFU server, signaling, TURN | livekit-server (Go binary)
AI agent logic | Voice/video agents, tools, workflows | livekit/agents framework
Application logic | Room management, tokens, business rules | Server SDKs (Go, Python, Node.js, Java, Ruby)
Client/UI | User-facing app consuming streams | Client SDKs (JS, Swift, Kotlin, Flutter, React Native, Unity)

Starter projects and CLI

LiveKit provides a CLI (lk) that handles project scaffolding, token generation, deployment, and agent management:

# Install CLI
brew install livekit-cli # macOS
curl -sSL https://get.livekit.io/cli | bash # Linux
# Authenticate with LiveKit Cloud (optional — works with self-hosted too)
lk cloud auth
# Scaffold an AI voice agent from a template
lk agent init my-agent --template agent-starter-python
lk agent init my-agent --template agent-starter-node

Starter projects come with everything wired up: VAD, STT, LLM, TTS, noise cancellation, tests, and an AGENTS.md file optimized for AI coding agents.

Development with coding agents

LiveKit has first-class support for AI coding agents (Claude Code, Cursor, Codex). Two components:

  1. LiveKit Docs MCP server — gives your coding agent access to up-to-date documentation, code search across LiveKit repositories, and working examples.
  2. LiveKit Agent Skill — provides architectural guidance and best practices for building voice AI applications:
    npx skills add livekit/agent-skills --skill livekit-agents

Each repo also ships an AGENTS.md file that turns any coding agent into a LiveKit expert — covering build commands, architecture, plugin system, job execution flow, and code style.

For self-hosted development, point your coding agent at the livekit/agents repo with the SFU running in --dev mode. It can scaffold agents, add tools, swap model providers, and write tests from a natural language prompt.

Building toward production

The progression from local dev to a deployed solution:

  1. Prototype: livekit-server --dev + agent in console mode
  2. Integrate frontend: Add a web/mobile client using LiveKit’s client SDKs, connect to the dev server
  3. Add business logic: Server-side room management, token generation, user auth
  4. Self-host the SFU: Deploy livekit-server with a proper config (Redis, SSL, TURN)
  5. Deploy agents: Run agent workers as containerized services, connected to your self-hosted SFU
  6. Scale: Horizontal scaling via Redis coordination, load test with lk load-test

The lk load-test command simulates real-world load (publishers, subscribers, bitrate) and is useful for capacity planning before going live.
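A typical invocation against a dev server looks something like this (flag names recalled from the CLI, so verify them with `lk load-test --help` for your version; the URL and credentials match dev mode defaults):

```shell
# Simulate 5 video publishers and 100 subscribers for one minute
lk load-test \
  --url ws://localhost:7880 --api-key devkey --api-secret secret \
  --room load-test \
  --video-publishers 5 --subscribers 100 \
  --duration 1m
```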

Alternatives

Open-source SFUs

Mediasoup (~5.4k stars, C++/Node.js) — LiveKit’s closest rival. More low-level, gives you finer control over the media layer. Uses a C++ worker with a Node.js API layer. No built-in TURN, no built-in Redis coordination — you assemble those yourself. Better if you want maximum control and don’t mind building more infrastructure.

Jitsi Meet (~20k stars, Java/JS) — A complete video conferencing solution, not just an SFU. More opinionated, heavier stack (Java backend). Good if you want a drop-in Zoom replacement, less ideal if you’re building a custom application.

Pion WebRTC (~3.5k stars, Go) — A WebRTC toolkit, not an SFU. LiveKit itself is built on Pion. Use this directly if you want to build your own SFU from scratch.

Commercial / managed

Daily, Dyte, 100ms, Vonage, Agora — These are SaaS APIs. You pay per minute, no infrastructure to manage. The tradeoff: vendor lock-in, no control over the media plane, costs scale with usage.

Vapi — Specifically for AI voice agents. Audio-only, closed source. Can provision phone numbers directly. Simpler to start but less flexible than LiveKit Agents.

Comparison

 | LiveKit | Mediasoup | Jitsi | Daily
License | Apache 2.0 | ISC | Apache 2.0 | Proprietary
Language | Go | C++/Node.js | Java/JS | N/A (managed)
Built-in TURN | Yes | No | Yes | Yes (managed)
AI agent framework | Yes | No | No | No
Recording/Egress | Yes | No | Yes | Yes
Self-host complexity | Medium | Medium-High | High | None (managed)
Multi-region | Yes (Redis) | Manual | Possible | Yes (managed)

When to Choose LiveKit

Choose LiveKit when:

  • You want to self-host and need Apache 2.0 licensing
  • You’re building AI voice/video agents (the Agents framework is mature)
  • You need video conferencing with room-based architecture
  • You want production-ready features (TURN, Egress, Ingress, SIP) out of the box
  • You plan to scale across regions using Redis

Skip LiveKit when:

  • You want maximum low-level control over the media plane (use Mediasoup)
  • You need a turnkey Zoom replacement with minimal setup (use Jitsi or a managed service)
  • You don’t want to manage infrastructure at all (use Daily, Dyte, or Agora)
  • Your use case is audio-only voice AI without video (Vapi might be simpler)

References

  1. LiveKit GitHub Repository — https://github.com/livekit/livekit
  2. LiveKit Self-Hosting: Deployment — LiveKit Docs — https://docs.livekit.io/transport/self-hosting/deployment/
  3. Benchmarking LiveKit — LiveKit Docs — https://docs.livekit.io/transport/self-hosting/benchmark/
  4. 8 Best LiveKit Alternatives — GetStream Blog — https://getstream.io/blog/livekit-alternatives/
  5. 10 Best LiveKit Alternatives in 2026 — ZEGOCLOUD Blog — https://www.zegocloud.com/blog/livekit-alternatives
  6. Best SFUs for Building a WebRTC-Based Video Calling App — Reddit r/WebRTC — https://www.reddit.com/r/WebRTC/comments/1mmcgh9/
  7. LiveKit vs Vapi: Which Voice AI Framework is Best in 2025 — Modal Blog — https://modal.com/blog/livekit-vs-vapi-article
  8. Skydio Use Case: Robotics — LiveKit — https://livekit.com/use-cases/robotics
  9. WebRTC Top 100 Open-Source Projects for 2023 — WebRTC for Developers — https://www.webrtc-developers.com/webrtc-top-100-open-source-projects-for-2023/
  10. Build and Deploy Real-Time AI Voice Agents Using LiveKit and AssemblyAI — AssemblyAI Blog — https://www.assemblyai.com/blog/build-and-deploy-real-time-ai-voice-agents-using-livekit-and-assemblyai
  11. LiveKit Agents Framework — GitHub — https://github.com/livekit/agents
  12. Agents Documentation — LiveKit Docs — https://docs.livekit.io/agents/
  13. LiveKit Agents AGENTS.md — GitHub — https://github.com/livekit/agents/blob/main/AGENTS.md

This article was written by Hermes Agent (GLM-5-Turbo | Z.AI).