Some early calls lasted up to 5 minutes, and users stayed engaged. The assistant reflected answers accurately and kept momentum toward a booking.
All call control and model streaming ran over WebSockets (audio frames, partial transcripts, token streams). The pipeline (PSTN → STT → LLM orchestration → cloud TTS) maintained a ~700–800 ms average response time per turn under load, while preserving barge-in behavior.
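For illustration, here is a minimal asyncio sketch of the barge-in pattern: TTS audio streams out on a task that gets cancelled the moment the STT side signals speech. The message schema, frame sizes, placeholder URL, and the `synthesize` stub are assumptions for the sketch, not our production protocol.

```python
import asyncio
import json

import websockets  # pip install websockets


def synthesize(text):
    """Stand-in for the cloud-TTS call; yields 20 ms PCM frames of silence."""
    return [b"\x00" * 640 for _ in range(50)]  # ~1 s placeholder audio


async def stream_tts(ws, frames):
    """Pace synthesized audio downstream; cancellable mid-utterance for barge-in."""
    for frame in frames:
        await ws.send(frame)        # binary frame out to the telephony leg
        await asyncio.sleep(0.02)   # roughly real-time pacing


async def call_loop(ws):
    tts_task = None
    async for message in ws:
        if isinstance(message, bytes):
            continue                # inbound caller audio -> STT (omitted here)
        event = json.loads(message)
        if event["type"] == "speech_started":
            if tts_task and not tts_task.done():
                tts_task.cancel()   # barge-in: stop speaking when the caller does
        elif event["type"] == "assistant_reply":
            tts_task = asyncio.create_task(stream_tts(ws, synthesize(event["text"])))


async def main():
    async with websockets.connect("wss://example.invalid/call") as ws:  # placeholder
        await call_loop(ws)

if __name__ == "__main__":
    asyncio.run(main())
```

Keeping synthesis on its own cancellable task is what makes the ~700–800 ms turn budget compatible with interruptions: the generation pipeline never blocks the listening path.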
We automated calls/SMS after booking and post-meeting. The first wave reduced no-shows; the second captured outcomes and reactivated stalled leads.
Users could reschedule via call or SMS. The assistant handled change requests inline, updated the calendar, and re-issued confirmations automatically.
Typical guided conversations ran 3–5 minutes: brief qualification, one or two product suggestions, and immediate booking if the user was ready.
On-the-fly product suggestion (RecSys)
During the call, we ranked products using rules + a lightweight learning-to-rank layer (embeddings over product descriptors). When scores were close, a gentle multi-armed bandit bias learned which option converted better by segment. The assistant offered 1–2 options and booked the appropriate line (advisors / underwriters / loans).
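A compressed sketch of that ranking step is below. The product fields, the 0.05 "close scores" margin, and the epsilon-greedy tie-break (standing in for the production bandit) are all assumptions for illustration.

```python
import random

import numpy as np


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def suggest(user_vec, products, rules, segment, conv_rates, margin=0.05, eps=0.1):
    """Return 1-2 product options: rule score + embedding score, bandit-ish tie-break."""
    scored = sorted(
        ((sum(w for cond, w in rules if cond(p)) + cosine(user_vec, p["embedding"]), p)
         for p in products),
        key=lambda t: t[0], reverse=True,
    )
    top = [p for _, p in scored[:2]]
    # When the top two are nearly tied, bias toward whichever converts better
    # for this segment, with a small exploration rate (epsilon-greedy stand-in
    # for the production multi-armed bandit).
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        if random.random() < eps:
            random.shuffle(top)
        else:
            top.sort(key=lambda p: conv_rates.get(segment, {}).get(p["id"], 0.0),
                     reverse=True)
    return top
```

The bandit only intervenes inside the margin, so hard business rules and the learned ranking stay in charge when one option is clearly better.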
TTS: We standardized on cloud TTS for consistent prosody and fast starts, controlling cost via concise responses and caching of common prompts.
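The caching half of that cost control is simple enough to show. A sketch, assuming a `synthesize_cloud` callable and hash-based keying (production would also bound the cache, e.g. LRU):

```python
import hashlib


class TtsCache:
    """Synthesize recurring short responses once, replay from memory after."""

    def __init__(self, synthesize_cloud):
        self._synth = synthesize_cloud    # callable: (text, voice) -> audio bytes
        self._store = {}

    def get_audio(self, text, voice="default"):
        key = hashlib.sha256(f"{voice}:{text}".encode()).hexdigest()
        if key not in self._store:
            self._store[key] = self._synth(text, voice)   # pay the API cost once
        return self._store[key]
```

Greetings, confirmations, and compliance lines repeat on nearly every call, so they dominate the cache hit rate.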
STT: Self-hosted Whisper-family plus major cloud engines. Non-English calls were the hardest: accents, numerals, and domain terms all tripped recognition, so we added exact-phrase lists and domain lexicons for critical slots.
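One way to apply that on a Whisper-family model is to seed the decoder with an `initial_prompt` and run an exact-phrase pass afterward; the toy lexicon below is illustrative, the production lists were per-locale and per-slot.

```python
import whisper  # pip install openai-whisper

DOMAIN_LEXICON = {"a p r": "APR", "under writer": "underwriter"}  # illustrative pairs

model = whisper.load_model("small")
result = model.transcribe(
    "call_segment.wav",
    initial_prompt="Mortgage, APR, underwriter, amortization, fixed-rate",  # bias seed
)

text = result["text"].lower()
for wrong, right in DOMAIN_LEXICON.items():
    text = text.replace(wrong, right)  # exact-phrase fix-ups for critical slots
```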
Wordy non-English intros raised average handle time (AHT) without improving bookings → kept openings concise.
Full local TTS for rare languages underperformed → cloud TTS with custom pronunciations for key terms proved more reliable.
Long first-call questionnaires reduced completion → trimmed to 6–8 core fields; moved the rest to short SMS follow-ups.
LLM + structured extraction
An LLM-orchestrated flow produced structured outputs (JSON) for key fields (income band, timeline, product interest, constraints). A light CoT-style planner asked clarifying follow-ups (“net or gross?”, “monthly or yearly?”) when confidence dipped, then emitted validated JSON to the CRM.
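A compressed sketch of that extract-validate-clarify loop, using pydantic for validation; the schema fields, the 0.7 confidence threshold, and the `call_llm` callable are assumptions, and the real flow ran inside the dialogue orchestrator:

```python
import json
from typing import Optional

from pydantic import BaseModel, ValidationError


class Qualification(BaseModel):
    income_band: Optional[str] = None     # e.g. "50-75k"
    income_basis: Optional[str] = None    # "net" or "gross"
    timeline: Optional[str] = None
    product_interest: Optional[str] = None
    confidence: float = 0.0


CLARIFIERS = {
    "income_basis": "Is that net or gross?",
    "timeline": "Monthly or yearly?",
}


def extract_turn(call_llm, transcript):
    """Return (fields, follow_up_question); follow_up is None when we can proceed."""
    raw = call_llm(f"Extract qualification fields as JSON:\n{transcript}")
    try:
        fields = Qualification(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError, TypeError):
        return None, "Sorry, could you repeat that?"
    if fields.confidence < 0.7:           # threshold is an assumption
        missing = next((k for k, q in CLARIFIERS.items()
                        if getattr(fields, k) is None), None)
        if missing:
            return fields, CLARIFIERS[missing]
    return fields, None                   # validated JSON ready for the CRM
```

Validation failures turn into a re-prompt rather than a crash, which is what keeps malformed model output from ever reaching the CRM.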
We trialed energy-based and neural VAD. Neural was steadier in noise; we tuned sensitivity per locale for quieter speakers and kept a hybrid approach so callers could interject naturally.
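A sketch of that hybrid gate: a cheap detector (webrtcvad here) screens every frame, and a neural model (Silero VAD, as one example) confirms in noise, with a per-locale threshold. The locale table and frame handling are placeholders, not our production tuning.

```python
import numpy as np
import torch
import webrtcvad  # pip install webrtcvad

silero, _ = torch.hub.load("snakers4/silero-vad", "silero_vad")
fast_vad = webrtcvad.Vad(2)                # aggressiveness 0-3

LOCALE_THRESHOLD = {"en": 0.5, "de": 0.35}  # lower = more sensitive (placeholder values)


def is_speech(frame_bytes: bytes, locale: str = "en", sr: int = 16000) -> bool:
    """30 ms 16-bit mono PCM frame -> speech decision."""
    if not fast_vad.is_speech(frame_bytes, sr):    # cheap first pass
        return False
    pcm = np.frombuffer(frame_bytes, dtype=np.int16).astype(np.float32) / 32768.0
    if pcm.shape[0] < 512:
        pcm = np.pad(pcm, (0, 512 - pcm.shape[0]))  # Silero's 16 kHz chunk size
    prob = silero(torch.from_numpy(pcm), sr).item()  # neural confirmation
    return prob >= LOCALE_THRESHOLD.get(locale, 0.5)
```

The cheap pass keeps per-frame cost low; the neural pass only pays off where energy-based detection was flaky, which matches what we saw in noisy calls.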
Start small, iterate fast
First pass was simple: incoming call → consent → short qualification → product suggestion → book directly into the team’s shared calendar. Then we tightened timing, extraction accuracy, and scheduling UX.
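That first pass is small enough to express as a toy state machine; the state names mirror the steps above, and the transition flags are assumptions simplified to the happy path.

```python
from enum import Enum, auto


class CallState(Enum):
    CONSENT = auto()
    QUALIFY = auto()
    SUGGEST = auto()
    BOOK = auto()
    DONE = auto()


def next_state(state: CallState, turn: dict) -> CallState:
    """Advance the call flow based on flags produced by the current turn."""
    if state is CallState.CONSENT:
        return CallState.QUALIFY if turn.get("consented") else CallState.DONE
    if state is CallState.QUALIFY:
        return CallState.SUGGEST if turn.get("fields_complete") else CallState.QUALIFY
    if state is CallState.SUGGEST:
        return CallState.BOOK if turn.get("accepted_option") else CallState.SUGGEST
    if state is CallState.BOOK:
        return CallState.DONE   # write to the shared calendar (side effect omitted)
    return CallState.DONE
```

Starting from a flow this explicit made the later iterations (timing, extraction accuracy, scheduling UX) local changes rather than rewrites.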