I got tired of seeing small businesses miss calls. So I built Vokio — a voice AI agent that answers real phone calls, remembers callers between sessions, and generates a post-call summary automatically.
Here's the full architecture and how I solved the hard parts.
Stack
- Python + Flask (webhook server)
- Vapi (telephony + STT)
- Claude Haiku (conversation)
- Deepgram Nova 3 (Spanish STT)
- Azure TTS (Spanish voices)
- SQLite (memory between calls)
How it works
Vapi handles the phone call and speech recognition. Instead of using a built-in LLM, I configured Vapi to use a Custom LLM — pointing it to my Flask server. Every time the caller says something, Vapi sends a POST request to my endpoint and expects an OpenAI-compatible streaming response.
@app.route("/chat/completions", methods=["POST"])
def chat():
data = request.json or {}
phone = data["call"]["customer"]["number"]
last_message = [m for m in data["messages"] if m["role"] == "user"][-1]["content"]
response = generate_response(phone, last_message)
# return SSE streaming response
The memory system
This is the interesting part. Every caller is stored in SQLite by phone number. When a call comes in, I query the DB and inject the caller's history into the system prompt.
def generate_response(phone: str, user_message: str) -> str:
caller = get_caller(phone)
history = get_history(phone)
if caller and caller.get("name"):
system += f"\n\nYou already know this caller, their name is {caller['name']}."
if caller and caller.get("sentiment") == "frustrated":
system += "\n\nNOTE: This caller was frustrated last time. Be especially patient."
If the caller was frustrated last time, Claude knows and adjusts its tone automatically.
Post-call analysis
When the call ends, Vapi sends a webhook to /end-call. I pass the transcript to Claude and get back:
- 2-line summary
- Urgency score (1-5)
- Required action ("call back today", "send quote", "none")
- Customer sentiment (satisfied / neutral / frustrated)
All stored in SQLite. The business knows at a glance which calls need attention today.
The tricky parts
Streaming is required. Vapi expects SSE streaming responses. If you return regular JSON the call hangs up immediately. Took me a while to figure that out.
Tool use + verbal response. When Claude uses a tool (like saving the caller's name) without producing text, you get an empty response. I added a follow-up API call to get the verbal response without Claude announcing what it just saved.
dotenv loading. If environment variables already exist in the system, load_dotenv() won't override them. Use override=True.
Result
A 5-minute call costs approximately €0.28. The agent answers in Spanish, detects if the caller switches to English or Catalan, and responds in the same language automatically.
I packaged it as a ready-to-deploy template with 6 business sector prompts included (dental clinic, restaurant, hair salon, real estate, mechanic, hostel).
United States
NORTH AMERICA
Related News
How Braze’s CTO is rethinking engineering for the agentic area
10h ago
Amazon Employees Are 'Tokenmaxxing' Due To Pressure To Use AI Tools
21h ago

Implementing Multicloud Data Sharding with Hexagonal Storage Adapters
15h ago

DeepMind’s CEO Says AGI May Be ~4 Years Away. The Last Three Missing Pieces Are Not What Most People Think.
15h ago

CCSnapshot - A Claude Code Configs Transfer Tool
21h ago