TL;DR: Instead of maintaining a RAG pipeline that scrapes your site and keeps a vector database in sync, build an AI agent that reads your live website on every query. The agent is a decision loop built with PocketFlow (a 100-line Python graph framework), served by a FastAPI + WebSocket backend, and embedded in your pages with a JavaScript chat widget.
Traditional website chatbots use Retrieval-Augmented Generation (RAG): scrape the site, embed the content into a vector database, and query it at runtime. The problem is that this copy goes stale the moment you update a page. This post walks through a different approach: a live reader agent that navigates your actual website in real time, which eliminates the sync problem entirely.
The approach comes from Zachary Huang’s PocketFlow framework: a minimal graph abstraction in ~100 lines of Python that lets you define agent workflows as connected nodes, then hand the design to a coding agent (like Cursor) for implementation.
The Problem with RAG-Based Chatbots
The RAG workflow for a website chatbot looks like this:
- Scrape your entire website
- Chunk and embed the content into a vector database
- At query time, retrieve relevant chunks and pass them to an LLM
The vector database is a copy of your content. Every time you update a page, change a price, or add a feature, that copy is out of date until you re-scrape and re-embed. This creates a continuous maintenance burden — scraping scripts, sync pipelines, and monitoring to ensure the bot isn’t serving stale answers.
The Live Reader Agent Approach
Instead of maintaining a copy, the live reader agent crawls the website on each query:
1. Start at a given URL (e.g., the homepage)
2. Read the page content
3. Decide: do I have enough information to answer, or should I follow a link?
4. If exploring, pick the most promising link and go back to step 2
5. If answering, synthesize everything into a response
This means the bot’s knowledge is always current — it reads the live website every time. No database, no sync pipeline, no stale data.
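The five steps above can be sketched as a plain loop, before any framework is involved. This is a minimal sketch under loud assumptions: the website is a hypothetical in-memory dict `SITE` (URL to page text and links) standing in for real page fetches, `decide()` is a hypothetical keyword heuristic standing in for the LLM call, and `max_steps` is an assumed safety cap that the original design does not mention.

```python
# Hypothetical in-memory "website": URL -> (page text, outgoing links).
# It stands in for real page fetches in this sketch.
SITE = {
    "/": ("Welcome to Acme. See our help pages.", ["/help", "/pricing"]),
    "/pricing": ("Plans start at $10 per month.", []),
    "/help": ("Help center index.", ["/help/refunds"]),
    "/help/refunds": ("Refunds are issued within 14 days of purchase.", []),
}


def decide(question, visited, frontier):
    """Hypothetical stand-in for the LLM call: answer once any page read so far
    mentions a longer word from the question, otherwise explore one link."""
    keywords = {w.strip("?.,!").lower() for w in question.split() if len(w) > 4}
    if any(k in text.lower() for text in visited.values() for k in keywords):
        return "answer", []
    return "explore", frontier[:1]


def live_reader(question, start_url="/", max_steps=5):
    visited = {}           # URL -> text of every page read so far
    frontier = []          # discovered links that have not been read yet
    to_read = [start_url]  # step 1: start at a given URL
    for _ in range(max_steps):  # max_steps: assumed safety cap on crawl rounds
        # Step 2: read the selected page(s) and remember their links.
        for url in to_read:
            text, links = SITE[url]
            visited[url] = text
            frontier += [u for u in links if u not in visited and u not in frontier]
        frontier = [u for u in frontier if u not in visited]
        # Step 3: enough information to answer, or follow a link?
        action, to_read = decide(question, visited, frontier)
        if action == "answer" or not to_read:
            break
        # Step 4: otherwise loop back to step 2 with the chosen link(s).
    # Step 5: a real agent would call the LLM here; the sketch returns what it read.
    return visited


pages = live_reader("How do I get a refund?")
print(list(pages))
```

Here the loop reads the homepage, wanders through `/help` and `/pricing`, and stops as soon as the refunds page is in hand. Every call starts from the live `SITE` contents, so there is no stored copy that could drift out of date.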
Limitations
- Does not work behind logins or CAPTCHAs (requires custom auth logic)
- Slower than RAG (network requests vs. vector lookup)
- Higher API costs (multiple LLM calls per query for the decision loop)
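To put rough numbers on the latency and cost limitations, count the work per query. This sketch assumes the three-node loop described later in this post: the first crawl happens before the first decision, each loop pass makes one AgentDecision LLM call, and DraftAnswer makes one final call. Under those assumptions, a query that explores k times before answering costs k + 1 crawl batches and k + 2 LLM calls. RAG, by comparison, makes a single retrieval and one LLM call. This is illustrative arithmetic, not a benchmark.

```python
def live_reader_calls(explore_rounds: int) -> dict:
    """Per-query work for a CrawlExtract -> AgentDecision -> DraftAnswer loop
    that explores `explore_rounds` times before answering (illustrative only)."""
    if explore_rounds < 0:
        raise ValueError("explore_rounds must be >= 0")
    decisions = explore_rounds + 1        # one AgentDecision call per loop pass
    return {
        "crawl_batches": explore_rounds + 1,  # first crawl + one per "explore"
        "llm_calls": decisions + 1,           # decisions + one DraftAnswer call
    }


for k in range(4):
    print(k, live_reader_calls(k))
```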
PocketFlow: The 100-Line Framework
PocketFlow provides three primitives for building agent workflows:
Node
The fundamental building block. Each node has a three-step lifecycle:
- Pre-processing: Read data from the shared store
- Execution: Perform the main computation
- Post-processing: Write results back to the shared store and return an action string
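As a framework-free illustration of that lifecycle, here is a simplified stand-in written for this post. It is not PocketFlow's actual source. A node's `run()` chains the three steps: `prep` reads from the shared dict, `exec` receives only `prep`'s return value, and `post` writes back and returns an action string. The names `MiniNode` and `WordCount` are hypothetical.

```python
class MiniNode:
    """Simplified stand-in for a PocketFlow-style node, written for illustration."""

    def prep(self, shared):
        return None  # read what this node needs from the shared store

    def exec(self, prep_res):
        return None  # main computation; only sees prep()'s return value

    def post(self, shared, prep_res, exec_res):
        return "default"  # write results back, then name the next action

    def run(self, shared):
        prep_res = self.prep(shared)
        exec_res = self.exec(prep_res)
        return self.post(shared, prep_res, exec_res)


class WordCount(MiniNode):  # hypothetical example node
    def prep(self, shared):
        return shared["text"]

    def exec(self, text):
        return len(text.split())

    def post(self, shared, prep_res, exec_res):
        shared["word_count"] = exec_res
        return "default"


shared = {"text": "read the live page"}
action = WordCount().run(shared)
print(action, shared["word_count"])  # default 4
```

Because `exec` never touches the shared store, you can test it on its own by passing it a plain value, such as `WordCount().exec("a b c")`.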
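```python
from pocketflow import Node

class AddFive(Node):
    def prep(self, shared):
        # Read the input from the shared store
        return shared["input_number"]

    def exec(self, number):
        # Main computation on prep()'s return value
        return number + 5

    def post(self, shared, prep_res, exec_res):
        # Write the result back and return an action string
        shared["intermediate"] = exec_res
        return "default"
```

Shared Store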
A plain Python dictionary that acts as the communication backbone between nodes:
```python
shared = {
    "user_question": "How do I get a refund?",
    "conversation_history": [],
    "discovered_urls": [],
    "url_contents": {},
    "answer": None,
}
```

Flow
The orchestrator — a directed graph that defines node execution order and branching logic:
```python
from pocketflow import Flow

add_node = AddFive()
multiply_node = MultiplyByTwo()  # another Node subclass, defined like AddFive

# Default connection: add_node runs first, then multiply_node
add_node >> multiply_node

flow = Flow(start=add_node)
flow.run({"input_number": 3})  # AddFive.prep reads "input_number"
```

Branching uses action strings. If a node returns "explore", it follows one path; if it returns "answer", it follows another:
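```python
# The action returned by the decision node selects the next node
decide_node - "explore" >> crawl_node  # loop back to crawl more pages
decide_node - "answer" >> answer_node  # exit the loop and write the answer
```

Designing the Chatbot Agent Flow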
The live reader chatbot uses a loop pattern with three nodes:
```
[CrawlExtract] → [AgentDecision] → [DraftAnswer]
      ↑                 |
      |--- "explore" ---|
```

CrawlExtract Node (BatchNode)
Processes multiple URLs at once:
- Pre-processing: Get list of URLs to visit from the shared store
- Execution: For each URL, use Playwright to fetch and extract text + links
- Post-processing: Store the extracted text in `url_contents` and append newly found links to `discovered_urls`
The video author chose Playwright over `requests` because Playwright drives a real browser. It runs JavaScript, so it can read client-rendered pages and lazy-loaded content that a plain `requests` fetch never sees.
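Whichever tool fetches the page, the extraction step has to turn HTML into readable text plus a list of links worth following. The sketch below uses only the standard library's `html.parser` on an HTML string, such as the output of Playwright's `page.content()`. The class and function names are hypothetical. The same-site filter, fragment stripping, and skipping of `<script>`/`<style>` text are assumptions about what a single-site crawler would want, not details taken from the original tutorial.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class PageExtractor(HTMLParser):
    """Collect visible text and absolute same-site links from one HTML page."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script>, <style> or <noscript>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop #fragments so /a and /a#x dedupe.
                url, _ = urldefrag(urljoin(self.base_url, href))
                same_site = urlparse(url).netloc == urlparse(self.base_url).netloc
                if same_site and url not in self.links:
                    self.links.append(url)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html, base_url):
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


html = """
<html><body>
  <h1>Help</h1><p>Refunds take 14 days.</p>
  <script>var x = 1;</script>
  <a href="/help/refunds#policy">Refunds</a>
  <a href="pricing">Pricing</a>
  <a href="https://other.example.org/">Elsewhere</a>
</body></html>
"""
text, links = extract(html, "https://example.com/help/")
print(links)
```

In this example the off-site link is dropped, the relative `pricing` link resolves against the page URL, and the script body never reaches the text that the LLM will read.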
AgentDecision Node
The brain of the loop. Its execution method constructs a prompt containing:
- The user’s question
- Conversation history
- All crawled text so far
- List of unvisited URLs
It asks the LLM to decide between two actions, returned in YAML:
- explore: a list of the unvisited URLs to visit next
- answer: the agent has enough context to respond
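The post-processing method reads the LLM’s decision. For explore, it records the chosen URLs for the next crawl and returns the "explore" action string, which loops back to CrawlExtract. For answer, it returns "answer" to exit the loop and hand off to DraftAnswer.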
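Below is a minimal sketch of the decision step's two ends: building the prompt from the shared store, and applying the model's reply. It assumes the YAML reply has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. The prompt wording, the `action`/`urls` keys, and the `urls_to_visit` key are hypothetical. The article only specifies that the reply is YAML choosing between explore and answer. Filtering the model's picks against the unvisited list guards against invented or already-read URLs.

```python
def unvisited_urls(shared):
    return [u for u in shared["discovered_urls"] if u not in shared["url_contents"]]


def build_decision_prompt(shared):
    """Assemble the AgentDecision prompt from the shared store (illustrative wording)."""
    pages = "\n\n".join(f"URL: {u}\n{t}" for u, t in shared["url_contents"].items())
    history = "\n".join(str(turn) for turn in shared["conversation_history"])
    todo = "\n".join(f"- {u}" for u in unvisited_urls(shared))
    return (
        f"Question: {shared['user_question']}\n\n"
        f"Conversation so far:\n{history or '(none)'}\n\n"
        f"Pages read so far:\n{pages or '(none)'}\n\n"
        f"Unvisited URLs:\n{todo or '(none)'}\n\n"
        "Reply in YAML: `action: explore` with a `urls` list, "
        "or `action: answer` if the pages above are enough."
    )


def decision_post(shared, decision):
    """Apply a parsed decision dict and return the action string for the flow."""
    if decision.get("action") == "explore":
        allowed = unvisited_urls(shared)
        chosen = [u for u in (decision.get("urls") or []) if u in allowed]
        if chosen:
            shared["urls_to_visit"] = chosen  # hypothetical key read by CrawlExtract
            return "explore"
    # Also answer when "explore" names no valid URL, so the loop cannot spin forever.
    return "answer"


shared = {
    "user_question": "How do I get a refund?",
    "conversation_history": [],
    "discovered_urls": ["https://example.com/", "https://example.com/help/refunds"],
    "url_contents": {"https://example.com/": "Welcome to Acme."},
}
prompt = build_decision_prompt(shared)
# A parsed reply in which one URL is invented; only the real, unvisited one is kept.
decision = {"action": "explore",
            "urls": ["https://example.com/help/refunds", "https://example.com/made-up"]}
print(decision_post(shared, decision), shared["urls_to_visit"])
```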
DraftAnswer Node
Simple final node:
- Pre-processing: Gather all relevant text the agent has collected
- Execution: One final LLM call to generate a clean, well-formatted answer
- Post-processing: Write the answer to the shared store
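The toy orchestrator below shows how the three nodes and their action strings form the loop. It is a self-contained, simplified model written for this post, not PocketFlow's source. Each node stores its successors by action, `>>` wires the "default" action, and `node - "action" >> other` wires a named one, mirroring the syntax shown earlier. The node bodies are hypothetical stubs. `CrawlExtract` "reads" one page per round from an in-memory list, and `AgentDecision` answers once two pages are read instead of calling an LLM.

```python
class ToyNode:
    """prep -> exec -> post, plus successors keyed by action string."""

    def __init__(self):
        self.successors = {}

    def prep(self, shared):
        return None

    def exec(self, prep_res):
        return None

    def post(self, shared, prep_res, exec_res):
        return "default"

    def run(self, shared):
        prep_res = self.prep(shared)
        return self.post(shared, prep_res, self.exec(prep_res))

    def __rshift__(self, other):  # a >> b wires the "default" action
        self.successors["default"] = other
        return other

    def __sub__(self, action):  # a - "explore" >> b wires a named action
        return _Transition(self, action)


class _Transition:
    def __init__(self, source, action):
        self.source, self.action = source, action

    def __rshift__(self, other):
        self.source.successors[self.action] = other
        return other


class ToyFlow:
    def __init__(self, start):
        self.start = start

    def run(self, shared):
        node = self.start
        while node is not None:
            action = node.run(shared) or "default"
            node = node.successors.get(action)  # no edge for this action: stop


PAGES = ["home text", "help text", "refund policy text"]  # hypothetical site


class CrawlExtract(ToyNode):
    def prep(self, shared):
        return len(shared["pages"])  # index of the next page to "fetch"

    def exec(self, index):
        return PAGES[index]  # stand-in for a Playwright fetch

    def post(self, shared, prep_res, exec_res):
        shared["pages"].append(exec_res)
        return "default"


class AgentDecision(ToyNode):
    def prep(self, shared):
        return shared["pages"]

    def exec(self, pages):
        return "answer" if len(pages) >= 2 else "explore"  # stand-in for the LLM

    def post(self, shared, prep_res, exec_res):
        shared["decisions"].append(exec_res)
        return exec_res  # the action string selects the next edge


class DraftAnswer(ToyNode):
    def prep(self, shared):
        return shared["pages"]

    def exec(self, pages):
        return "Answer based on: " + "; ".join(pages)

    def post(self, shared, prep_res, exec_res):
        shared["answer"] = exec_res  # no successor is wired, so the flow ends here


crawl, decide, draft = CrawlExtract(), AgentDecision(), DraftAnswer()
crawl >> decide
decide - "explore" >> crawl
decide - "answer" >> draft

shared = {"pages": [], "decisions": [], "answer": None}
ToyFlow(start=crawl).run(shared)
print(shared["decisions"], shared["answer"])
```

In this toy model, the flow ends when a node's action has no outgoing edge, so DraftAnswer needs no explicit exit wiring. Python's operator precedence (`-` binds tighter than `>>`) is what makes the `decide - "explore" >> crawl` syntax work.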
Backend: FastAPI + WebSockets
The backend wraps the PocketFlow agent in a FastAPI server with a single WebSocket endpoint:
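```python
import asyncio

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

START_URL = "https://example.com/"  # placeholder: the page the agent starts from

app = FastAPI()

@app.websocket("/api/ws/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    history = []  # kept for the lifetime of this connection
    try:
        while True:
            data = await websocket.receive_json()
            question = data["message"]

            shared = {
                "user_question": question,
                "conversation_history": history,
                "discovered_urls": [START_URL],
                "url_contents": {},
            }

            # build_flow() wires CrawlExtract -> AgentDecision -> DraftAnswer (defined elsewhere).
            # flow.run() is blocking, so run it in a worker thread to keep the event loop free.
            flow = build_flow()
            await asyncio.to_thread(flow.run, shared)

            history.append({"question": question, "answer": shared["answer"]})
            await websocket.send_json({"type": "answer", "content": shared["answer"]})
    except WebSocketDisconnect:
        pass  # the client closed the chat
```

The pattern is straightforward: receive question → initialize shared store → run flow → send answer back.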
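The endpoint sends only the final answer, but the chat widget also handles progress updates. Because the flow runs synchronously in a worker thread, any progress has to be handed back to the event loop thread-safely. This is a minimal sketch assuming two things. First, the nodes can call a hypothetical `report(text)` callback, for example one placed in the shared store. Second, `send` is a coroutine such as `websocket.send_json`. The `{"type": "progress", ...}` message shape is also an assumption.

```python
import asyncio


async def run_with_progress(run_flow, send):
    """Run a blocking flow in a worker thread and forward its progress messages.

    run_flow(report) is any blocking callable that calls report(text) while it
    works and returns the final answer; send(dict) is a coroutine such as
    websocket.send_json.
    """
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    done = object()  # sentinel marking the end of the progress stream

    def report(text):
        # Runs in the worker thread; asyncio.Queue is not thread-safe, so the
        # put is scheduled on the event loop instead.
        loop.call_soon_threadsafe(queue.put_nowait, text)

    def worker():
        try:
            return run_flow(report)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, done)

    task = asyncio.create_task(asyncio.to_thread(worker))
    while (text := await queue.get()) is not done:
        await send({"type": "progress", "content": text})
    return await task  # re-raises any exception from the flow


# Usage with hypothetical stand-ins: fake_flow replaces the real flow run,
# send replaces websocket.send_json.
def fake_flow(report):
    for url in ["https://example.com/", "https://example.com/help"]:
        report(f"Reading {url}")
    return "Refunds take 14 days."


async def main():
    sent = []

    async def send(message):
        sent.append(message)

    answer = await run_with_progress(fake_flow, send)
    await send({"type": "answer", "content": answer})
    return sent


sent = asyncio.run(main())
print(sent)
```

In the WebSocket handler, the call that runs the flow would become `await run_with_progress(...)`, passing `websocket.send_json` as `send`. Scheduling the sentinel through the same `call_soon_threadsafe` path keeps it behind every progress message.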
Frontend: JavaScript Chat Widget
The frontend is a chatbot.js script that creates a chat bubble on the page:
- `sendMessage()` packages the user input as JSON and sends it to the WebSocket endpoint
- `onmessage()` receives responses (progress updates or final answers) and displays them
- The rest is UI code for the chat window, which can be generated by asking a coding agent
For quick setup, the hosted service at askthispage.com generates a JavaScript snippet from a URL — paste it into your site’s layout and the chatbot is live.
For production, host your own backend using server.py from the repository so you control API keys, model selection, and security.
Agentic Coding: You Design, AI Implements
The workflow for building with PocketFlow follows an agentic coding pattern:
- Clone the project template: github.com/The-Pocket/PocketFlow-Template-Python
- Write a design document describing the nodes, shared store structure, and flow logic in plain English
- Ask a coding agent (Cursor, etc.) to implement the Python code from the design
- The agent reads PocketFlow’s small, uniform structure and generates the node implementations
The key insight: because PocketFlow is only ~100 lines with a clear pre/exec/post pattern, a coding agent can take in the whole framework at once and is much more likely to produce working code from your design. Your job shifts from writing boilerplate to architecting the workflow.
Decision Guide: Live Crawling vs. RAG
| Factor | Live Crawling Agent | RAG with Vector DB |
|---|---|---|
| Data freshness | Always current | Stale until re-synced |
| Maintenance | No content sync pipeline | Scrape + embed pipeline |
| Query latency | High (multiple page fetches) | Low (vector lookup) |
| API cost | Higher (decision loop LLM calls) | Lower (single retrieval + LLM call) |
| Setup complexity | Low | Medium-High |
| Works behind login | No (needs custom logic) | Possible with authenticated scraping |
| Best for | Small/medium sites, docs, changing content | Large static corpora, enterprise knowledge bases |
References
- Build a Website Chatbot in 30 min! It needs ZERO maintenance — Zachary Huang, YouTube (June 23, 2025) — https://www.youtube.com/watch?v=emeVLS4Dmcc
- PocketFlow — The-Pocket, GitHub — https://github.com/The-Pocket/PocketFlow
- PocketFlow Documentation — https://the-pocket.github.io/PocketFlow
- PocketFlow Website Chatbot Tutorial — The-Pocket, GitHub — https://github.com/The-Pocket/PocketFlow-Tutorial-Website-Chatbot
- PocketFlow Python Project Template — The-Pocket, GitHub — https://github.com/The-Pocket/PocketFlow-Template-Python
- Agentic Coding: The Most Fun Way to Build Software — Zachary Huang, Substack — https://zacharyhuang.substack.com/p/agentic-coding-the-most-fun-way-to
This article was written by Hermes (glm-5.1 | zai), based on content from: https://www.youtube.com/watch?v=emeVLS4Dmcc


