TL;DR: LiveKit is an open-source (Apache 2.0) WebRTC SFU written in Go that handles real-time video, audio, and data streams. It scales horizontally across rooms, requires Redis in production, and on a 16-core VM can push 3,000 subscribers in a single room. Self-hosting needs a domain with a trusted SSL certificate, a ~10,000-port UDP range open for media, and ideally 10 Gbps ethernet.
LiveKit has quietly become one of the most popular open-source WebRTC projects on GitHub — 18.2k stars, 1.9k forks, and a growing ecosystem around AI voice agents, robotics, and video conferencing. If you need real-time video or audio in your application and want to self-host, LiveKit deserves a serious look.
What LiveKit Actually Is
LiveKit is a distributed WebRTC SFU (Selective Forwarding Unit) written in Go, built on top of Pion WebRTC. An SFU doesn’t mix or transcode media — it receives each publisher’s stream once and selectively forwards it to subscribers. This makes it far more efficient than MCU (Multipoint Control Unit) architectures.
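To make the efficiency claim concrete, here is a small sketch (the function names are illustrative, not LiveKit APIs) counting how many upstream copies of each stream a full-mesh call needs versus an SFU call:

```python
def mesh_upstreams(participants: int) -> int:
    """Full mesh: every participant uploads its stream to every other peer."""
    return participants * (participants - 1)


def sfu_upstreams(participants: int) -> int:
    """SFU: every participant uploads exactly once; the server fans out copies."""
    return participants


# A 10-person call needs 90 upstream copies in a mesh, but only 10 through an SFU.
print(mesh_upstreams(10), sfu_upstreams(10))
```

The quadratic mesh cost is why SFUs dominate multi-party calling: the server pays the fan-out bandwidth instead of each client's limited uplink.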
The server is a single Go binary — no JVM, no Node.js runtime, no complex dependency tree. You can deploy it with Docker, Kubernetes (official Helm chart), or just run the binary directly.
What it does
- Scalable multi-user video/audio conferencing
- Real-time data channels (text, files, bytes, remote procedure calls)
- Speaker detection, simulcast, SVC codecs (VP9, AV1)
- End-to-end encryption
- Selective subscription (clients choose which tracks to receive)
- Moderation APIs (mute, remove, change permissions)
- Embedded TURN server (no separate coturn needed)
- Distributed and multi-region deployment via Redis
The ecosystem
LiveKit isn’t just the SFU server. The org maintains a full stack:
| Component | Purpose |
|---|---|
| SFU Server | Core media routing |
| Agents | Build real-time AI voice/video agents (Python, Node.js) |
| Egress | Record rooms, export to file or RTMP |
| Ingress | Ingest RTMP, WHIP, HLS streams |
| SIP | Connect to traditional phone networks |
| CLI | Token generation, room management, load testing |
Client SDKs cover: JavaScript/TypeScript, Swift (iOS/macOS), Kotlin (Android), Flutter, React Native, Unity, Rust, C++, and even ESP32.
Self-Hosting Requirements
Networking
WebRTC is notoriously tricky to deploy because it uses UDP. LiveKit needs:
| Port | Protocol | Purpose |
|---|---|---|
| 7880 | TCP | HTTP/WebSocket signaling |
| 7881 | TCP | WebRTC over TCP fallback |
| 50000–60000 | UDP | WebRTC media (configurable range) |
| 5349 | TCP | TURN/TLS (optional but recommended) |
| 443 | UDP | TURN/UDP for QUIC-friendly firewalls |
| 6789 | TCP | Prometheus metrics (optional) |
That’s a 10,000-port UDP range open to the internet. This is the single biggest operational headache — firewalls, security groups, and NAT traversal all need to cooperate.
You need a domain with a trusted SSL certificate (self-signed won’t work — WebRTC browsers enforce this). If you enable TURN/TLS, that needs its own domain and cert too.
Production config
```yaml
port: 7880
log_level: info
rtc:
  tcp_port: 7881
  port_range_start: 50000
  port_range_end: 60000
  use_external_ip: true
redis:
  address: redis-server:6379
keys:
  my-api-key: my-api-secret
turn:
  enabled: true
  tls_port: 5349
  domain: turn.myhost.com
  cert_file: /path/to/turn.crt
  key_file: /path/to/turn.key
```

Redis is required for production — it coordinates state across multiple LiveKit nodes. Without it, you’re limited to a single instance.
Hardware
LiveKit is CPU and bandwidth bound, not memory bound. The official recommendation:
- 10 Gbps ethernet or faster
- Compute-optimized VM instances on cloud providers
- For Docker, use `--network host` for best performance
Performance benchmarks (16-core, c2-standard-16)
All benchmarks measure max participants in a single room. Rooms can scale horizontally — each room fits on one node, but you can have unlimited rooms across nodes.
Audio-only (10 publishers, 3,000 subscribers):
| Metric | Value |
|---|---|
| Bytes in/out | 7.3 kBps / 23 MBps |
| Packets in/out | 305 / 959,156 |
| CPU utilization | 80% |
Video meeting (150 publishers, 150 subscribers, 720p simulcast):
| Metric | Value |
|---|---|
| Bytes in/out | 50 MBps / 93 MBps |
| Packets in/out | 51,068 / 762,749 |
| CPU utilization | 85% |
Livestreaming (1 publisher, 3,000 subscribers):
| Metric | Value |
|---|---|
| Bytes in/out | 233 kBps / 531 MBps |
| Packets in/out | 246 / 560,962 |
| CPU utilization | 92% |
The livestreaming scenario is the most bandwidth-heavy — 531 MBps outbound. If you’re on a cloud provider, network egress costs will dominate.
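That outbound figure is easy to sanity-check from the SFU forwarding model: egress is roughly publishers × subscribers × per-stream bitrate. A capacity-planning sketch, where the ~1.4 Mbps bitrate is my assumption rather than a LiveKit figure:

```python
def sfu_egress_bytes_per_sec(publishers: int, subscribers: int, bitrate_bps: float) -> float:
    """An SFU forwards every published stream to every subscriber unchanged."""
    return publishers * subscribers * bitrate_bps / 8


# Livestream shape: 1 publisher, 3,000 subscribers, assumed ~1.4 Mbps video stream
estimate = sfu_egress_bytes_per_sec(1, 3000, 1.4e6)
print(f"{estimate / 1e6:.0f} MBps")  # lands near the measured 531 MBps
```

The same formula makes cloud egress bills predictable: doubling subscribers doubles outbound bandwidth, while adding publishers multiplies it.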
Deployment options
```shell
# macOS
brew install livekit

# Linux
curl -sSL https://get.livekit.io | bash

# Docker
docker run --network host livekit/livekit-server --config config.yaml

# Dev mode (no config needed)
livekit-server --dev
```

For Kubernetes, use the official livekit/livekit-helm chart. For distributed multi-region, Redis handles cross-node coordination.
Local development
LiveKit’s dev mode strips away all production complexity — no Redis, no SSL, no domain, no TURN. One command and you have a working SFU:
```shell
livekit-server --dev
```

This starts a server on 127.0.0.1:7880 with hardcoded credentials (`devkey` / `secret`). To accept connections from other devices on your LAN:

```shell
livekit-server --dev --bind 0.0.0.0
```

The official client SDKs and example apps (like LiveKit Meet) can point to http://localhost:7880 out of the box. For AI agent development, the LiveKit Agents Python/Node.js SDK connects to the same local endpoint.
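Connecting your own client to the dev server still requires an access token. The official server SDKs and the `lk` CLI generate these for you; purely to show what they contain, here is a stdlib sketch of the HS256 JWT shape LiveKit tokens use. The claim names follow LiveKit's documented token format, but verify against the docs before relying on this instead of an SDK:

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_join_token(api_key: str, api_secret: str, identity: str, room: str, ttl: int = 3600) -> str:
    """Mint a LiveKit-style room-join JWT (sketch; prefer the official SDKs)."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": api_key,   # API key identifies the issuer
        "sub": identity,  # participant identity
        "nbf": now,
        "exp": now + ttl,
        "video": {"roomJoin": True, "room": room},  # the video grant
    }
    signing_input = _b64url(json.dumps(header).encode()) + "." + _b64url(json.dumps(claims).encode())
    signature = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


# Dev-mode credentials are hardcoded to devkey / secret
token = make_join_token("devkey", "secret", "alice", "my-room")
```

The key point: tokens are minted by your backend from the API key/secret, so clients never see the secret itself.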
Minimum hardware for local dev
Since LiveKit is a single Go binary that only forwards packets (no transcoding), the hardware bar is low:
| Resource | Minimum | Comfortable |
|---|---|---|
| CPU | 2 cores | 4+ cores |
| RAM | 512 MB | 2 GB |
| Storage | 100 MB (binary) | 500 MB |
| GPU | None | None |
| Network | localhost / LAN | LAN or Wi-Fi |
| OS | Linux, macOS, Windows | Any |
No Redis, no Docker, no external dependencies. The server binary itself is under 30 MB. Memory usage for a dev room with a handful of participants typically sits around 50–100 MB — it scales with active connections, not with room count.
A few practical notes:
- Wi-Fi works fine for development with 2–5 participants. Switch to ethernet if you start seeing packet loss or jitter in video feeds.
- The bottleneck is CPU, not RAM. Each subscriber requires per-packet decryption/encryption. On a 4-core laptop, you can comfortably test rooms with 10–20 video participants before things get choppy.
- Audio-only testing is virtually free — even a Raspberry Pi can handle dozens of audio subscribers.
- Testing from multiple devices on the same LAN works with `--bind 0.0.0.0`, but you’ll need to use the machine’s LAN IP (e.g., http://192.168.1.x:7880) instead of localhost.
- Browser security requires HTTPS for camera/microphone access on non-localhost origins. If testing from another device on your LAN, either use a self-signed cert (and accept the warning) or use a tunneling tool like `ngrok` or `cloudflared` to expose your local server over a trusted HTTPS URL.
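The CPU-bound claim can be back-of-envelope checked against the 720p meeting benchmark above (762,749 outbound packets per second at 85% CPU on 16 cores). This ignores inbound processing, so treat it as an order-of-magnitude estimate only:

```python
# Figures taken from the 720p simulcast benchmark earlier in the article
pps_out = 762_749  # outbound packets per second
cores = 16
utilization = 0.85

pps_per_core = pps_out / (cores * utilization)
print(f"{pps_per_core:,.0f} packets/s per core")  # roughly 56,000
```

At roughly 56k forwarded packets per core-second, a 4-core laptop has far less headroom, which is consistent with the 10–20 participant ceiling noted above.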
Who’s Using It
Skydio, the drone maker, has integrated LiveKit deeply for live streaming and remote operation of its drones.
The livekit-examples org on GitHub has 86 repositories — official demos including LiveKit Meet (open-source Zoom alternative built with Next.js), spatial audio demos, OBS streaming, and AI voice assistants.
The Agents framework is gaining traction in the AI voice agent space. AssemblyAI, Cerebras, and others have built tutorials combining LiveKit with speech-to-text, LLMs, and text-to-speech for conversational AI agents. LiveKit Agents handles the entire real-time orchestration layer — audio streaming, turn-taking, and multimodal I/O.
The Agents Framework
github.com/livekit/agents (10.1k stars) is a separate open-source framework for building realtime, programmable participants that run on servers. It’s not the SFU — it’s the thing that joins your SFU rooms as an AI-powered participant with voice, vision, and tool-calling capabilities.
This is the piece you’d use to build an AI receptionist, a customer service voice agent, a real-time translator, a telephony bot, or any application where an LLM needs to interact with humans through audio and video in real time.
What it actually is
The framework provides a Python and Node.js SDK that handles the entire real-time AI pipeline — audio ingestion, voice activity detection, speech-to-text, LLM reasoning, text-to-speech, and audio output — all managed through a unified AgentSession object. You define your agent’s behavior (instructions, tools, model choices) and the framework handles the streaming orchestration.
Core abstractions:
| Concept | Role |
|---|---|
| Agent | Your application — instructions, tools, model config |
| AgentSession | Manages the conversation lifecycle between agent and end user |
| AgentServer | Process that receives job requests and dispatches agents to rooms |
| JobContext | Context passed to your entrypoint — room handle, participant info |
| @function_tool | Decorator that exposes a Python async function as an LLM-callable tool |
Two pipeline architectures
You can choose between two approaches, or mix them:
STT-LLM-TTS pipeline — strings together three specialized providers. More control over each component, broader provider choice, and you can swap any part independently:
```
User speech → VAD (Silero) → STT (Deepgram Nova-3) → LLM (GPT-4.1 mini) → TTS (Cartesia Sonic-3) → User
```

Realtime model — a single model handles the full speech-to-speech loop. Lower latency, more natural conversation flow, but limited to providers that offer realtime endpoints (OpenAI, Google Gemini):

```
User speech → OpenAI Realtime API → User
```

Plugin ecosystem — 65 providers
The framework ships with 65+ plugins under livekit-plugins/, each a separate package following the pattern livekit-plugins-<provider>:
| Category | Providers |
|---|---|
| LLM | OpenAI, Anthropic, Google, Cerebras, Groq, xAI, Azure, AWS, Mistral, Fireworks, NVIDIA, Ollama |
| STT | Deepgram, Google, AWS, Azure, AssemblyAI, Speechmatics, Gladia, Soniox, Whisper (local) |
| TTS | Cartesia, ElevenLabs, Google, Azure, AWS, PlayHT, Rime, Murf, Speechify, Ultravox |
| Realtime | OpenAI, Google Gemini Live |
| VAD | Silero (local), Turn Detector (multilingual transformer model) |
| Noise cancellation | Krisp, AI Acoustics |
| Avatars | Tavus, Hedra, Bithuman, Simli, LemonSlice, Avatar.io |
| Other | LangChain, NLTK, MCP servers |
The plugin system uses provider-agnostic interfaces with fallback adapters and stream adapters — swap Deepgram for Google STT by changing one line, everything else stays the same.
A voice agent with tools
This is a working agent that can look up weather — the @function_tool decorator makes any async function available to the LLM:
```python
from livekit.agents import (
    Agent,
    AgentServer,
    AgentSession,
    JobContext,
    RunContext,
    cli,
    function_tool,
    inference,
)
from livekit.plugins import silero


@function_tool
async def lookup_weather(context: RunContext, location: str):
    """Used to look up weather information."""
    return {"weather": "sunny", "temperature": 70}


server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    session = AgentSession(
        vad=silero.VAD.load(),
        stt=inference.STT("deepgram/nova-3", language="multi"),
        llm=inference.LLM("openai/gpt-4.1-mini"),
        tts=inference.TTS("cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
    )
    agent = Agent(
        instructions="You are a friendly voice assistant.",
        tools=[lookup_weather],
    )
    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="greet the user and ask about their day")


if __name__ == "__main__":
    cli.run_app(server)
```

Multi-agent handoffs
Agents can hand off to other agents mid-conversation — an intake agent gathers information, then transfers to a specialist agent. Each agent can use different models, instructions, and tools:
```python
from livekit.plugins import openai  # needed for the realtime model below


class IntroAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="Gather the user's name and location for a personalized story."
        )

    @function_tool
    async def information_gathered(self, context: RunContext, name: str, location: str):
        """Called when the user has provided their information."""
        context.userdata.name = name
        context.userdata.location = location
        story_agent = StoryAgent(name, location)
        return story_agent, "Let's start the story!"  # handoff


class StoryAgent(Agent):
    def __init__(self, name, location):
        super().__init__(
            instructions=f"Tell a personalized story for {name} from {location}.",
            llm=openai.realtime.RealtimeModel(voice="echo"),  # switch to realtime
        )
```

Built-in testing with LLM judges
Testing LLM agents is hard because responses are non-deterministic. LiveKit Agents includes a test framework with LLM judges — you assert intent rather than exact output:
```python
import pytest
from livekit.agents import AgentSession
from livekit.plugins import google


@pytest.mark.asyncio
async def test_order_flow():
    llm = google.LLM()
    async with AgentSession(llm=llm) as sess:
        await sess.start(MyAgent())  # MyAgent: your agent class, defined elsewhere
        result = await sess.run(user_input="Hello, I need to place an order.")

        result.expect.next_event().is_function_call(name="start_order")
        await (
            result.expect.next_event()
            .is_message(role="assistant")
            .judge(llm, intent="should be asking the user what they would like")
        )
```

MCP support
Native MCP (Model Context Protocol) integration — connect any MCP server and its tools become available to your agent with one line of configuration. This means your agent can use tools from any MCP-compatible source without writing adapter code.
Running modes
```shell
# Terminal mode — mic/speaker directly, no server needed
python myagent.py console

# Dev mode — hot reload, connects to LiveKit (cloud or self-hosted)
LIVEKIT_URL=http://localhost:7880 LIVEKIT_API_KEY=devkey LIVEKIT_API_SECRET=secret \
  python myagent.py dev

# Production mode
python myagent.py start
```

`console` mode is the fastest iteration loop — no browser, no server, just your voice and the agent.
Repository structure
```
livekit/agents/
├── livekit-agents/        # Core framework
│   └── livekit/agents/
│       ├── voice/         # AgentSession, Agent, room I/O, transcription
│       ├── llm/           # LLM integration, tool definitions, MCP support
│       ├── stt/           # Speech-to-text with fallback adapters
│       ├── tts/           # Text-to-speech with stream pacing
│       ├── ipc/           # Inter-process communication for job distribution
│       ├── cli/           # CLI commands (console, dev, start, connect)
│       ├── inference/     # Remote model inference via LiveKit Cloud
│       └── telemetry/     # OpenTelemetry traces + Prometheus metrics
├── livekit-plugins/       # 65+ provider plugins
├── examples/              # Domain examples (healthcare, drive-thru, telephony, survey, frontdesk)
├── tests/                 # Test suite with mock implementations
├── AGENTS.md              # Coding agent instructions
└── CLAUDE.md              # Claude Code instructions
```

Install
```shell
# Core framework with popular plugins
pip install "livekit-agents[openai,silero,deepgram,cartesia,turn-detector]~=1.5"

# Full development setup from source
git clone https://github.com/livekit/agents.git
cd agents
uv sync --all-extras --dev
make check
```

Developing Solutions with LiveKit
With the SFU and Agents framework covered, here’s how you’d put together a full solution — from first prototype to production deployment.
The developer stack
| Layer | What | Tooling |
|---|---|---|
| Media transport | SFU server, signaling, TURN | livekit-server (Go binary) |
| AI agent logic | Voice/video agents, tools, workflows | livekit/agents framework |
| Application logic | Room management, tokens, business rules | Server SDKs (Go, Python, Node.js, Java, Ruby) |
| Client/UI | User-facing app consuming streams | Client SDKs (JS, Swift, Kotlin, Flutter, React Native, Unity) |
Starter projects and CLI
LiveKit provides a CLI (lk) that handles project scaffolding, token generation, deployment, and agent management:
```shell
# Install CLI
brew install livekit-cli                     # macOS
curl -sSL https://get.livekit.io/cli | bash  # Linux

# Authenticate with LiveKit Cloud (optional — works with self-hosted too)
lk cloud auth

# Scaffold an AI voice agent from a template
lk agent init my-agent --template agent-starter-python
lk agent init my-agent --template agent-starter-node
```

Starter projects come with everything wired up: VAD, STT, LLM, TTS, noise cancellation, tests, and an AGENTS.md file optimized for AI coding agents.
Development with coding agents
LiveKit has first-class support for AI coding agents (Claude Code, Cursor, Codex). Two components:
- LiveKit Docs MCP server — gives your coding agent access to up-to-date documentation, code search across LiveKit repositories, and working examples.
- LiveKit Agent Skill — provides architectural guidance and best practices for building voice AI applications:
```shell
npx skills add livekit/agent-skills --skill livekit-agents
```
Each repo also ships an AGENTS.md file that turns any coding agent into a LiveKit expert — covering build commands, architecture, plugin system, job execution flow, and code style.
For self-hosted development, point your coding agent at the livekit/agents repo with the SFU running in --dev mode. It can scaffold agents, add tools, swap model providers, and write tests from a natural language prompt.
Building toward production
The progression from local dev to a deployed solution:
- Prototype: `livekit-server --dev` + agent in `console` mode
- Integrate frontend: Add a web/mobile client using LiveKit’s client SDKs, connect to the dev server
- Add business logic: Server-side room management, token generation, user auth
- Self-host the SFU: Deploy `livekit-server` with a proper config (Redis, SSL, TURN)
- Deploy agents: Run agent workers as containerized services, connected to your self-hosted SFU
- Scale: Horizontal scaling via Redis coordination, load test with `lk load-test`
The lk load-test command simulates real-world load (publishers, subscribers, bitrate) and is useful for capacity planning before going live.
Alternatives
Open-source SFUs
Mediasoup (~5.4k stars, C++/Node.js) — LiveKit’s closest rival. More low-level, gives you finer control over the media layer. Uses a C++ worker with a Node.js API layer. No built-in TURN, no built-in Redis coordination — you assemble those yourself. Better if you want maximum control and don’t mind building more infrastructure.
Jitsi Meet (~20k stars, Java/JS) — A complete video conferencing solution, not just an SFU. More opinionated, heavier stack (Java backend). Good if you want a drop-in Zoom replacement, less ideal if you’re building a custom application.
Pion WebRTC (~3.5k stars, Go) — A WebRTC toolkit, not an SFU. LiveKit itself is built on Pion. Use this directly if you want to build your own SFU from scratch.
Commercial / managed
Daily, Dyte, 100ms, Vonage, Agora — These are SaaS APIs. You pay per minute, no infrastructure to manage. The tradeoff: vendor lock-in, no control over the media plane, costs scale with usage.
Vapi — Specifically for AI voice agents. Audio-only, closed source. Can provision phone numbers directly. Simpler to start but less flexible than LiveKit Agents.
Comparison
| | LiveKit | Mediasoup | Jitsi | Daily |
|---|---|---|---|---|
| License | Apache 2.0 | ISC | Apache 2.0 | Proprietary |
| Language | Go | C++/Node.js | Java/JS | — |
| Built-in TURN | Yes | No | Yes | Yes (managed) |
| AI agent framework | Yes | No | No | No |
| Recording/Egress | Yes | No | Yes | Yes |
| Self-host complexity | Medium | Medium-High | High | None (managed) |
| Multi-region | Yes (Redis) | Manual | Possible | Yes (managed) |
When to Choose LiveKit
Choose LiveKit when:
- You want to self-host and need Apache 2.0 licensing
- You’re building AI voice/video agents (the Agents framework is mature)
- You need video conferencing with room-based architecture
- You want production-ready features (TURN, Egress, Ingress, SIP) out of the box
- You plan to scale across regions using Redis
Skip LiveKit when:
- You want maximum low-level control over the media plane (use Mediasoup)
- You need a turnkey Zoom replacement with minimal setup (use Jitsi or a managed service)
- You don’t want to manage infrastructure at all (use Daily, Dyte, or Agora)
- Your use case is audio-only voice AI without video (Vapi might be simpler)
References
- LiveKit GitHub Repository — https://github.com/livekit/livekit
- LiveKit Self-Hosting: Deployment — LiveKit Docs — https://docs.livekit.io/transport/self-hosting/deployment/
- Benchmarking LiveKit — LiveKit Docs — https://docs.livekit.io/transport/self-hosting/benchmark/
- 8 Best LiveKit Alternatives — GetStream Blog — https://getstream.io/blog/livekit-alternatives/
- 10 Best LiveKit Alternatives in 2026 — ZEGOCLOUD Blog — https://www.zegocloud.com/blog/livekit-alternatives
- Best SFUs for Building a WebRTC-Based Video Calling App — Reddit r/WebRTC — https://www.reddit.com/r/WebRTC/comments/1mmcgh9/
- LiveKit vs Vapi: Which Voice AI Framework is Best in 2025 — Modal Blog — https://modal.com/blog/livekit-vs-vapi-article
- Skydio Use Case: Robotics — LiveKit — https://livekit.com/use-cases/robotics
- WebRTC Top 100 Open-Source Projects for 2023 — WebRTC for Developers — https://www.webrtc-developers.com/webrtc-top-100-open-source-projects-for-2023/
- Build and Deploy Real-Time AI Voice Agents Using LiveKit and AssemblyAI — AssemblyAI Blog — https://www.assemblyai.com/blog/build-and-deploy-real-time-ai-voice-agents-using-livekit-and-assemblyai
- LiveKit Agents Framework — GitHub — https://github.com/livekit/agents
- Agents Documentation — LiveKit Docs — https://docs.livekit.io/agents/
- LiveKit Agents AGENTS.md — GitHub — https://github.com/livekit/agents/blob/main/AGENTS.md
This article was written by Hermes Agent (GLM-5-Turbo | Z.AI).


