Lead-to-Appointment for Financial Services (NDA)

We built a phone/SMS concierge that answers inbound leads, places outbound check-ins, asks targeted questions, recommends products in real time, books with the right team, and follows up so the meeting actually happens. It runs on Twilio over the PSTN, with no app for callers to install, while keeping end-to-end turn latency around 700–800 ms and sustaining high throughput during peak load.
Engagement signal
Some early calls lasted up to 5 minutes, and users stayed engaged. The assistant reflected answers accurately and kept momentum toward a booking.
Transport & streaming
All call control and model streaming ran over WebSockets (audio frames, partial transcripts, token streams). The pipeline (PSTN → STT → LLM orchestration → cloud TTS) held an average response time of ~700–800 ms per turn under load while preserving barge-in, so callers could interrupt the assistant mid-sentence.
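A minimal sketch of how barge-in can be handled on the media WebSocket, assuming Twilio's bidirectional Media Streams, where the server streams `media` messages back into the call and can send a `clear` message to drop audio Twilio has already buffered. `TurnManager`, `speak`, and `on_caller_speech` are hypothetical names, and an in-memory `outbox` list stands in for the WebSocket so the example runs on its own; in practice `on_caller_speech` would be triggered by the VAD.

```python
from __future__ import annotations

import asyncio
import json


class TurnManager:
    """Hypothetical per-call coordinator: plays one assistant response at a time
    and stops it as soon as the caller starts talking (barge-in)."""

    def __init__(self, stream_sid: str):
        self.stream_sid = stream_sid
        self.outbox: list[str] = []  # stand-in for websocket.send(...)
        self._playback: asyncio.Task | None = None

    async def _play(self, frames: list[str]) -> None:
        for payload in frames:
            # In production the payload would be base64-encoded 8 kHz mu-law audio.
            self.outbox.append(json.dumps(
                {"event": "media", "streamSid": self.stream_sid, "media": {"payload": payload}}))
            await asyncio.sleep(0.02)  # pace frames at roughly real time (20 ms each)

    def speak(self, frames: list[str]) -> asyncio.Task:
        self._playback = asyncio.create_task(self._play(frames))
        return self._playback

    def on_caller_speech(self) -> None:
        """Called when the VAD reports caller speech while the assistant is talking."""
        if self._playback is not None and not self._playback.done():
            self._playback.cancel()  # stop sending further frames
            # Ask Twilio to discard audio it has buffered but not yet played.
            self.outbox.append(json.dumps({"event": "clear", "streamSid": self.stream_sid}))


async def demo() -> list[str]:
    tm = TurnManager("MZ-demo")  # hypothetical stream SID
    task = tm.speak(["f1", "f2", "f3", "f4", "f5"])
    await asyncio.sleep(0.05)  # caller interrupts after a few frames
    tm.on_caller_speech()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return tm.outbox


outbox = asyncio.run(demo())
```

Cancelling the playback task stops new frames on the server side, while `clear` handles audio already queued at Twilio; both are needed for an interruption to feel immediate.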
Outbound check-ins
Automated calls and SMS went out after booking and again after the meeting. The first wave reduced no-shows; the second captured outcomes and reactivated stalled leads.
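A minimal sketch of how the two waves could be scheduled, assuming hypothetical offsets (24 h and 2 h before the meeting, 3 h after) that the source does not specify. `plan_touchpoints` only computes send times; in a Celery-based stack each entry could then be queued with `apply_async(eta=when)` on a task that places the call or sends the SMS.

```python
from datetime import datetime, timedelta

# Hypothetical cadence; the real offsets are not given in the source.
PRE_MEETING = [(timedelta(hours=24), "reminder_sms"), (timedelta(hours=2), "reminder_call")]
POST_MEETING = [(timedelta(hours=3), "outcome_checkin")]


def plan_touchpoints(booked_at: datetime, meeting_at: datetime) -> list[tuple[datetime, str]]:
    """Return (send_time, kind) pairs: a confirmation right away, reminders before
    the meeting (skipping any that would already be in the past), and a check-in after."""
    plan = [(booked_at, "confirmation_sms")]
    for offset, kind in PRE_MEETING:
        when = meeting_at - offset
        if when > booked_at:  # e.g. a same-day booking skips the 24 h reminder
            plan.append((when, kind))
    for offset, kind in POST_MEETING:
        plan.append((meeting_at + offset, kind))
    return sorted(plan)


for when, kind in plan_touchpoints(datetime(2024, 5, 6, 9, 0), datetime(2024, 5, 6, 15, 0)):
    print(when.strftime("%H:%M"), kind)
# 09:00 confirmation_sms
# 13:00 reminder_call
# 18:00 outcome_checkin
```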
Rescheduling anywhere
Users could reschedule via call or SMS. The assistant handled change requests inline, updated the calendar, and re-issued confirmations automatically.
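A minimal sketch of an inline reschedule, using a hypothetical in-memory `Calendar` in place of the team's shared calendar. If the requested slot is open, the booking moves, the old slot is freed, and a fresh confirmation is returned for SMS; if not, the assistant can read back the two nearest open slots. Repeating the same request is harmless, which matters when a caller confirms twice or an SMS is retried.

```python
from dataclasses import dataclass, field
from datetime import datetime


def confirmation(booking_id: str, slot: datetime) -> str:
    return f"Booking {booking_id} confirmed for {slot:%a %d %b at %H:%M}."


@dataclass
class Calendar:
    """Hypothetical stand-in for a team's shared calendar."""
    bookings: dict[str, datetime] = field(default_factory=dict)  # booking_id -> slot start
    open_slots: set[datetime] = field(default_factory=set)

    def reschedule(self, booking_id: str, new_slot: datetime) -> str:
        old_slot = self.bookings[booking_id]  # KeyError if the booking does not exist
        if new_slot == old_slot:
            return confirmation(booking_id, new_slot)  # repeated request: nothing to change
        if new_slot not in self.open_slots:
            # Offer the closest alternatives instead (ties broken by earlier time).
            nearest = sorted(self.open_slots, key=lambda s: (abs(s - new_slot), s))[:2]
            if not nearest:
                return "That time is taken and no other slots are open."
            return "That time is taken. I can offer " + " or ".join(f"{s:%H:%M}" for s in nearest) + "."
        self.open_slots.remove(new_slot)
        self.open_slots.add(old_slot)  # free the previous slot for other callers
        self.bookings[booking_id] = new_slot
        return confirmation(booking_id, new_slot)
```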
Guided dialogue length
Typical guided conversations ran 3–5 minutes: brief qualification, one or two product suggestions, and immediate booking if the user was ready.
On-the-fly product suggestion (RecSys)
During the call, we ranked products using rules + a lightweight learning-to-rank layer (embeddings over product descriptors). When scores were close, a gentle multi-armed bandit bias learned which option converted better by segment. The assistant offered 1–2 options and booked the appropriate line (advisors / underwriters / loans).
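A minimal sketch of the ranking step, with several assumptions: the catalogue, the toy three-dimensional embeddings, the eligibility rule, and the `margin` value are hypothetical; cosine similarity between a lead embedding and product-descriptor embeddings stands in for the learned ranker; and the bandit is Thompson sampling over per-segment Beta posteriors, one common choice rather than necessarily the one used here.

```python
import math
import random

# Hypothetical catalogue: product -> (team to book, descriptor embedding).
PRODUCTS = {
    "fixed_mortgage": ("loans", [0.9, 0.1, 0.0]),
    "variable_mortgage": ("loans", [0.8, 0.3, 0.1]),
    "term_life": ("underwriters", [0.1, 0.9, 0.2]),
    "retirement_plan": ("advisors", [0.2, 0.2, 0.9]),
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def eligible(product: str, lead: dict) -> bool:
    """Rule layer (hypothetical): only offer mortgages to leads with a purchase timeline."""
    return not (product.endswith("_mortgage") and lead.get("timeline_months") is None)


def recommend(lead: dict, stats: dict, margin: float = 0.05, rng=random) -> list[str]:
    """Rank eligible products by similarity; if the top two are within `margin`,
    order them by a Thompson-sampling draw from per-segment conversion counts."""
    scored = sorted(
        ((cosine(lead["embedding"], emb), name)
         for name, (_team, emb) in PRODUCTS.items() if eligible(name, lead)),
        reverse=True,
    )
    top = [name for _score, name in scored[:2]]
    if len(top) == 2 and scored[0][0] - scored[1][0] < margin:
        def sample(name: str) -> float:
            wins, losses = stats.get((lead["segment"], name), (0, 0))
            return rng.betavariate(wins + 1, losses + 1)  # Beta(1, 1) prior
        top.sort(key=sample, reverse=True)
    return top
```

The team to book follows from the winning product, e.g. `PRODUCTS[top[0]][0]`; `stats` would be updated with a win or a loss per (segment, product) once the booking outcome is known.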
Speech stack choices
TTS: We standardized on cloud TTS for consistent prosody and fast starts, controlling cost through concise responses and cached audio for common prompts.
STT: Self-hosted Whisper-family models plus major cloud engines. Non-English calls were the hardest (accents, numerals, domain terms), so we added exact-phrase lists and domain lexicons for critical slots.
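A minimal sketch of one complementary safeguard: snapping near-miss words in a transcript to a canonical domain lexicon with Python's `difflib`. This is post-processing on the transcript and may differ from how the phrase lists were actually applied; the lexicon and the 0.8 similarity cutoff are hypothetical, and in practice the list would be maintained per locale.

```python
import difflib
import re

# Hypothetical domain lexicon for critical slots.
LEXICON = ["annuity", "amortization", "underwriting", "refinance", "escrow", "APR"]
_CANON = {term.lower(): term for term in LEXICON}


def snap_to_lexicon(transcript: str, cutoff: float = 0.8) -> str:
    """Replace words that closely match a lexicon term (difflib similarity ratio
    >= cutoff) with the canonical spelling; leave all other words untouched."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        hits = difflib.get_close_matches(word.lower(), list(_CANON), n=1, cutoff=cutoff)
        return _CANON[hits[0]] if hits else word
    return re.sub(r"[A-Za-z]+", fix, transcript)
```

A high cutoff keeps ordinary words from being pulled toward the lexicon; numerals would need their own normalization step.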
What we adjusted
Wordy non-English intros raised AHT without improving bookings → kept openings concise.
Full local TTS for rare languages underperformed → cloud TTS with custom pronunciations for key terms proved more reliable.
Long first-call questionnaires reduced completion → trimmed to 6–8 core fields; moved the rest to short SMS follow-ups.
LLM + structured extraction
An LLM-orchestrated flow produced structured outputs (JSON) for key fields (income band, timeline, product interest, constraints). A light CoT-style planner asked clarifying follow-ups (“net or gross?”, “monthly or yearly?”) when confidence dipped, then emitted validated JSON to the CRM.
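A minimal sketch of the validation gate between extraction and the CRM. It assumes the LLM has already filled each field with a value and a confidence score; the allowed values, the 0.7 threshold, and the question wording are hypothetical, and this rule-based gate is a simplification of the planner described above. Free-text `constraints` are passed through unchecked.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Hypothetical enumerations; the field names mirror the text, the values do not come from the source.
ALLOWED = {
    "income_band": {"<30k", "30-60k", "60-100k", ">100k"},
    "timeline": {"now", "3m", "6m", "12m+"},
    "product_interest": {"mortgage", "insurance", "investment", "loan"},
}
CLARIFY = {
    "income_band": "Is that figure net or gross, and monthly or yearly?",
    "timeline": "Roughly when would you like to get started?",
    "product_interest": "Which is closest to what you need: a mortgage, insurance, investing, or a loan?",
}


@dataclass
class Slot:
    value: str | None
    confidence: float  # 0..1, as reported by the extraction step


def next_step(slots: dict[str, Slot], threshold: float = 0.7) -> tuple[str, str]:
    """Return ("ask", question) for the first missing, invalid, or low-confidence
    field, or ("emit", json_payload) once every required field validates."""
    record: dict[str, str | None] = {}
    for name, allowed in ALLOWED.items():
        slot = slots.get(name)
        if slot is None or slot.value not in allowed or slot.confidence < threshold:
            return "ask", CLARIFY[name]
        record[name] = slot.value
    constraints = slots.get("constraints")
    record["constraints"] = constraints.value if constraints else None  # free text
    return "emit", json.dumps(record)
```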
Natural turn-taking
We trialed energy-based and neural VAD. Neural was steadier in noise; we tuned sensitivity per locale for quieter speakers and kept a hybrid approach so callers could interject naturally.
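A minimal sketch of a hybrid decision rule, assuming frames have already been decoded from Twilio's μ-law audio to 16-bit little-endian PCM and that a neural VAD model (not shown) supplies a speech probability per frame. The per-locale dBFS thresholds, the two probability cut-offs, and the hangover length are hypothetical tuning knobs; a lower energy threshold for a locale makes quieter speakers easier to pick up.

```python
import math
import struct

# Hypothetical per-locale energy thresholds (dBFS); real values would be tuned on call audio.
ENERGY_DBFS = {"en-US": -38.0, "es-MX": -42.0}
DEFAULT_DBFS = -40.0


def frame_dbfs(frame: bytes) -> float:
    """RMS level of a 16-bit little-endian PCM frame in dB relative to full scale."""
    n = len(frame) // 2
    if n == 0:
        return -120.0
    samples = struct.unpack(f"<{n}h", frame[: 2 * n])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return 20 * math.log10(rms / 32768) if rms > 0 else -120.0


class HybridVAD:
    """Speech if the neural model is confident, or if it is unsure but the frame is
    loud enough for the locale. A short hangover avoids clipping word endings."""

    def __init__(self, locale: str, p_high: float = 0.8, p_low: float = 0.4, hangover_frames: int = 10):
        self.energy_thr = ENERGY_DBFS.get(locale, DEFAULT_DBFS)
        self.p_high, self.p_low = p_high, p_low
        self.hangover_frames = hangover_frames
        self._hang = 0

    def is_speech(self, frame: bytes, neural_prob: float) -> bool:
        loud = frame_dbfs(frame) > self.energy_thr
        if neural_prob >= self.p_high or (neural_prob >= self.p_low and loud):
            self._hang = self.hangover_frames  # re-arm the hangover on every speech frame
            return True
        if self._hang > 0:
            self._hang -= 1  # keep reporting speech briefly after it stops
            return True
        return False
```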
Start small, iterate fast
The first pass was simple: incoming call → consent → short qualification → product suggestion → booking directly into the team’s shared calendar. Then we tightened timing, extraction accuracy, and scheduling UX.
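A minimal sketch of the first-pass flow as an explicit state machine, in the spirit of the "State Machine + LLM Tools" entry in the stack. The stage names, the extra `ENDED` exit, and the allowed back-transitions (for example returning from a suggestion to qualification) are assumptions; the idea is that the LLM proposes the next stage, typically via a tool call, and the machine rejects anything the flow does not allow.

```python
from enum import Enum, auto


class Stage(Enum):
    CONSENT = auto()
    QUALIFY = auto()
    SUGGEST = auto()
    BOOK = auto()
    DONE = auto()
    ENDED = auto()  # hypothetical exit: consent declined or caller hangs up early


# Allowed transitions for the first-pass flow; anything else is rejected.
TRANSITIONS = {
    Stage.CONSENT: {Stage.QUALIFY, Stage.ENDED},
    Stage.QUALIFY: {Stage.SUGGEST, Stage.ENDED},
    Stage.SUGGEST: {Stage.BOOK, Stage.QUALIFY, Stage.ENDED},
    Stage.BOOK: {Stage.DONE, Stage.SUGGEST},
    Stage.DONE: set(),
    Stage.ENDED: set(),
}


class CallFlow:
    def __init__(self) -> None:
        self.stage = Stage.CONSENT  # every incoming call starts at consent
        self.history = [self.stage]

    def advance(self, target: Stage) -> Stage:
        """Apply a transition proposed by the LLM, if the flow allows it."""
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(f"illegal transition {self.stage.name} -> {target.name}")
        self.stage = target
        self.history.append(target)
        return target
```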
What we built
User journey
Call/SMS → 3–5 minutes of clear questions and guided dialogue → 1–2 relevant products suggested in real time → immediate booking → SMS confirmation with self-serve reschedule link → reminder → quick post-meeting check-in.
Metrics we tracked
Show-up rate (no-show reduction)
Speed to answer (seconds)
Turn latency (target 700–800 ms)
Average qualification time (AHT, minutes)
Booking rate (% conversations ending with a slot)
Contact rate (% leads reached)
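A minimal sketch of how four of these metrics could be computed from per-lead records; the `Lead` fields are hypothetical. Each rate uses the previous funnel stage as its denominator, so a drop can be traced to the stage where it happens. Speed to answer and turn latency come from call timing logs rather than lead records and are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Lead:
    reached: bool                              # contact made by call or SMS
    qualification_minutes: float | None = None
    booked: bool = False
    showed_up: bool | None = None              # None until the meeting has happened


def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0


def funnel_metrics(leads: list[Lead]) -> dict[str, float]:
    reached = [lead for lead in leads if lead.reached]
    booked = [lead for lead in reached if lead.booked]
    resolved = [lead for lead in booked if lead.showed_up is not None]
    timed = [lead.qualification_minutes for lead in reached if lead.qualification_minutes is not None]
    return {
        "contact_rate": _ratio(len(reached), len(leads)),
        "booking_rate": _ratio(len(booked), len(reached)),   # conversations ending with a slot
        "show_up_rate": _ratio(sum(lead.showed_up for lead in resolved), len(resolved)),
        "aht_minutes": _ratio(sum(timed), len(timed)),
    }
```

The no-show rate is simply `1 - show_up_rate` over the same resolved bookings.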
Results
The system turned inbound interest into a predictable appointment engine: faster follow-ups, consistent qualification, and on-call product suggestions that lifted conversion at each stage. Advisors received cleaner, context-rich handoffs, improving utilization and closing momentum. Automated rescheduling and check-ins reduced operational drag and protected pipeline health.

Net effect: more coverage from the same team, higher quality appointments, and a steadier path from lead to revenue.
Stack:
WebSockets
PostgreSQL
Twilio
ElevenLabs
Celery
FastAPI
Redis
OpenAI Realtime API
Custom LLM Tracing Interface
State Machine + LLM Tools
Qdrant
Schema-Guided Reasoning
Whisper Large
Extraction Module