How much does it cost to have an outsourced receptionist? Even if it's $500 a month, if we're really talking about thousands of dollars per month lost, your ROI is still crazy.
But maybe soon we won't even realize we're speaking to a robot, given the current speed of AI development.
I wonder how that will erode trust in calls. I moved from cold emailing and cold LinkedIn outreach to cold calling because of the massive amount of AI spam I have to compete with. But maybe cold calling will die soon as well if the robots emerge.
For example, even if it shows a boost of $100,000 per month in revenue, that could likely have been achieved with a shared virtual assistant/receptionist for about $200–1,000 per month (depending on exact call volumes).
So really, the revenue was already lost and going forward you’re just deciding to capture it. You've created a more complicated mouse trap than what was already available to you. The difference is saving a couple hundred dollars of labor less whatever your AI/tech costs are. I’d still go the human route because it’s more future proof and if this is a luxury service, human service is always going to feel more luxurious.
If I were already an existing customer and just wanted to schedule an oil change, it'd be fine, though I'd probably just schedule on the website anyway. I'm really only going to call in if I have an unusual circumstance and actually need to speak with someone.
I was on the wrong end of some (presumably) LLM-powered support via eBay's chatbot earlier this week and it was a completely terrible experience. But that's because eBay hasn't done a very good job, not because the idea of LLM-powered support is fundamentally flawed.
When implemented well it can work great.
Unfortunately, the human behind it was not technically-savvy enough to clarify a point, so I had to either accept the LLM response, or quit trying. But at least it saved me the time from trying to explain to a level 1 support person that I knew exactly what I was asking about.
Here’s the video: https://youtu.be/QmH9b27xm6k
It was very impressive at that time. They did raise money after that pitch, but they ended up pivoting (multiple times). They IPO'd in 2017.
* I'd love to hear a sample/customer call, even if it's just a test.
* A blog without RSS? How can I subscribe for part 2?
I went through hell on a home remodel project 6 months ago around this stuff. I got a quote from a reputable plumber and went to schedule the rough-in session. An AI receptionist answered, got confused during the scheduling flow, and could not understand my address, asking me to repeat it over and over. And it couldn't forward me to a human.
If I'm paying you tens of thousands of dollars for remodeling work, I damn well better be able to get in touch with you. I found a different contractor and never looked back.
Claude will hallucinate anyway, sometimes.
I don't think there's any way around this other than a CLI or MCP tool that says "press the 'play prerecorded .WAV file' button that says the brake repair service info and prices."
Like, c'mon, this is the bare minimum here.
> He’s under the hood all day. The phone rings, he can’t answer, the customer hangs up and calls someone else
The mechanic is already very busy in the first place, so unless he plans on expanding the shop, the whole thing is a waste of time.
If you only have 4 options, just give me the old school list of voice options and I'll press 1 through 4, in less time, and being only moderately annoyed.
But a knowledgeable AI system as described in the article - that knows what it knows and tells you when it doesn't - could work great. If it had access to inventory and calendar, it might have worked for you. The question is whether the implementation lives up to the high expectations set by the articles.
I appreciated your post and have some takeaways around text formatting for TTS in my own projects. Thanks!
Regarding the AI receptionists, from the calls I've listened to, there's still a bit of the uncanny valley/overlapping speech issues that I'm unsure are ever fixable just due to latency.
But for low margin businesses like contracting and (I imagine) auto repair where labor is your most expensive cost, these owners are doing anything they can to reduce their overhead.
This isn't to disparage the project - I think this sort of usage will become very common and a decent standard that produces good consumer surplus in terms of reduced costs etc. Especially impressive is that it's a DIY family-first implementation that seems to be working. It's great hacker work.
But be warned it will erode - in general - the luxury previously associated with your brand, and also turn some customers away entirely.
Would love to see benchmarks on Mac Studio with its 7.4 GB/s SSD bandwidth — feels like the sweet spot for this technique.
However, does the regular "joe/jane" feel the same way? I imagine my mom or dad would most likely not notice or care if they did.
More generally, when done well, RAG is really great. I was recently trying out a new bookkeeping software (manager.io), and really appreciated the chatbot they've added to their website. Basically, instead of digging through the documentation and forums to try to find answers to questions, I can just ask. It's great.
If my mechanic answered with an LLM I’d take my car elsewhere.
Why should people be impressed by this?
Then you tell it to just not answer off-the-wall questions, etc., and if you are using a good model it will resist casual attempts.
I don't see being able to ask nonsense questions as being a big deal for an average small business. But you could put a guardrail model in front to make it a lot harder if it was worth it.
If I had to call four different places and spend five minutes on the phone with each shop, that'd eat up my entire lunch time.
I assume the OP, being a programmer and not a car mechanic, just assumed they meant the same thing.
The entire discussion here about how AI undercuts luxury brands has absolutely nothing to do with the actual post.
Bingo.
You can't get away with AI slop in a service oriented for wealthy customers.
The day my dealership starts answering me with AI they lose a customer 100%.
This solution screams "built by a tech bro with no idea about economics and marketing" which is the VC playbook into modernizing (and failing) businesses they don't understand.
How are they measuring the success rate? A project like this is a great time to dive into the problem and define the parameters of success, if only to inform how you design the AI's presentation of the shop, e.g., how quickly it gets the customer's profile and discovers their issue.
Thinking about my experiences with mechanics' shops—with the exception of dealerships and larger operations—if you're talking to a principal, the conversation is brief. It's possible customers will respond positively if the bot is effective for scheduling and if the price communicated by phone and the final price are somehow aligned with expectations.
But a speech-to-text and text-to-speech system that I know is "understanding" me would be great, rather than hold music. The shop could even sell it as "As a small shop, most of our employees are busy fixing cars, so we are using AI to help with calls" (although then people who are anxious about AI stealing jobs might hang up). The robot can ask me what I need, and then say "So for [this service], the price would be..." (to tell the caller what it has understood).
If the AI can even look at gaps in the shop's schedule and set an appointment time, the customer might even be happy that they just spent a minute on the phone instead of 10+...
So we cannot always assume that the business owner (especially the solo mom and pops) wants more business. Good ones are already very busy.
He wouldn’t care to get these extra jobs if he’s full, so why do this to begin with? He could, however, hire another mechanic if he books more jobs, and grow his business into that of a shop owner instead of a mechanic (no idea if this is his motivation or not).
It’s likely he’s not actually under the hood all day, but if the phone rings twice a day and he happens to be under the hood both times, he misses the calls and it’s as if he’s under the hood all day. It doesn’t mean he has no capacity; it just means he’s missing some calls throughout the day.
I know it's not that simple, but my gut says there's value in at least hearing out the people taking action to call you. Especially if that's automated and low cost to you.
Just to be clear, the LLM assistant could be a great supplement to the app for people with disabilities or those who struggle with phone apps for whatever reason, but for most people the LLM phone call seems worse.
If we take OP’s post at face value, presumably his brother is already at 100% capacity otherwise he wouldn’t be missing all these calls.
That said, a good service writer is worth their weight in gold. Also, they are typically going to be the person you end up selling the business to when you retire. Most mechanics aren't good enough at the business side of things to actually buy, but service writers are.
Christ, just hire some local teenager or whomever. There are people who will work for minimum wage.
Nothing pisses people off faster than calling up and getting put on the line with a robot. If we're thinking about this problem and how to solve it, we can look at other options: a website with a booking form, calling the mechanic's cell directly, hiring a receptionist, or worst case outsourcing the receptionist to a booking agency.
But if you say "talk to a manager" it'll still force a human to answer, which is the only thing I ever do.
More to the point - does this garage even have the time and space to service more vehicles? Generating a bunch of new low-value/low-loyalty customers takes up time and space and might have a lower return-per-hour while making it harder to retain higher value returning customers.
Additionally, as "luxury mechanic" (apparently specializing in BMW but servicing other makes) you'll need to appeal to "luxury drivers" and bolting on more crap that makes the experience worse is probably not the way to do that.
"Hmm, this user seems to really understand network topology, better get him over to engineering"
vs.
"Hmm, the user doesn't know the difference between their router and their modem, I should help them identify the router then walk them through a power cycle".
It would be somewhat odd to specialize in both American and European luxury cars. It'd be significantly less odd to service a RR and a BMW 3er next to each other.
A BMW owner has fussier standards (on average) than a Toyota owner. The 'higher touch' a service you're trying to provide, the less welcome these interventions will be. If there's a distinction between a normal-car garage and a luxury-car garage, this probably comes down to some sort of licensing or certification from those luxury brands. Seems plausible to me that luxury brand X could stipulate things like availability of human contact points.
Re: not being a car mechanic, it's true, but I'll have you know that I replaced my own blower motor a few months ago :)
OP's brother is by all accounts running a successful boutique workshop, but the various luxury annotations were completely unnecessary and just detract from the actual project. If they do want to lean into the luxury segment, being cheap with AI receptionists is not the way to go. They need to hire actual staff who have experience with HNW individuals.
I guess as a plumber having enough of the type of jobs that can wait a week that you can turn away the urgent calls might be one of those feature-not-a-bug type situations.
In fact, decision trees are nice because they tell you more or less up front what they're capable of.
What really sucks (AI or decision tree, either way) is when they don't let you easily speak with someone.
"Every program attempts to expand until it has a built in LLM."
In that medium, LLMs are so much better than old phone trees and waiting on hold.
I'll switch to the AI chat where it lets you select your order and I'll do the same thing, and it has no issue telling me it can give me a refund and process it instantly.
So my case, the two seem to behave differently. And these are on items that say they're eligible for refunds to begin with when you first order them.
I think most folks already wouldn't be able to tell, with the modern TTS.
It's like AI photos, they fool you unless you're looking for it.
This is the critical data: how many people hang up on the AI chatbot vs. how many people hang up on the voicemail prompt.
If it is even close, well, the AI needs to be improved.
If the AI is way ahead, but still loses/drops more than a live receptionist (outsourced or in-house), the AI either needs improvement, or to be dumped for a live receptionist, and that's kind of a spreadsheet problem (how many jobs lost in each case, vs costs).
Asking a small business to hire a receptionist is probably a bit unrealistic in today's environment.
I don’t know if he’s “tested”, but he said he’s happy enough with the service. We don’t always have to AB test every possible option - sometimes good enough is good enough.
Obviously that process could happen purely via voice but I think there's not as much love for walking through forms in a phone call.
If Joe has a PC in the shop with a tailored UI, he could get pings of pending requests and when he comes up for air, update the intake (via voice to minimize greasy hands) and initiate a call back then and there?
This garage is for those older cars and has no connection to the actual manufacturers, so there is no licensing required.
A friend of mine worked for a call center that did car rentals, old people would call them and ask to rent a car.
Maybe the AI system should have "Press 1 to talk to AI, press 2 to leave a message" so experts like you can press 2.
Y'all are in the wrong business :D
Not every job a plumber does is an emergency situation. I used a plumber to help me set up a backyard project: a portable propane tankless gas water heater. I took a look at buying the parts and pieces I would need, but they required special tools that would only be used once if I were to buy them. Instead, I had the plumber do it for me with all of the necessary parts/pieces on the truck, plus the tools to do it. It cost me less than it would have to buy everything. Now I just need a cold water feed, and I have a portable hot/cold running system.
Not everyone works all three or wants to do more than one of these groups. There’s different levels of demand, pay, competition at each.
You can shut the entire water network off, shower/poop at neighbours' places or at work, do laundry at the local self-service laundromat, and brush your teeth with a bottle of water. Inconvenient, sure, but it would be just as problematic to be denied electricity for a long time: lights off, fridge off, no heating, boiler off… There are alternatives, but the usual way for us is to share a long electric cord through an open window… so it's obligatory work-and-stay-at-home, if you're lucky enough to have an appropriate activity.
It is not always about getting more customers.
Often the relevant information is a pain to find on a website, but even if it isn't, the people who answer the phone often have important context like "Usually we do offer that recently but one of our suppliers..." or "We can do that, but maybe instead..." or "Oh the website isn't updated with..."
"Hi, I'm the LargeBank AI Assistant. How can I help you?" "I'd like to know the balance of my checking account."
And then authenticate and get the balance as usual. Simpler and faster. Agreed that it becomes a problem if it's seen as a replacement for human agents though. In an ideal world it would actually free up the human agents for when they're actually needed. In reality it'll probably be some of each.
Spoken word is still the most information dense way for humans to communicate abstract ideas in real time.
But the real question you should also ask is what else can that human do for you that the AI can't because they have eyes and ears and hands?
The model is exactly like Planet Fitness or similar gyms: It doesn't work if everyone visits at once, but you plan on most people using it once a week.
“Hey can you look out and see if Joe’s almost done with the blue Chrysler?” is an easy ask for the phone answerer at my local Joe’s shop (it’s his wife, and as a bonus she’ll also holler at him or his crew to hurry up because @alwa is waiting on it).
Contrast with the grant-funded pharmacy I use. Some management type suggested they could deal with their insane level of overwork by automating away the phones to a hostile and labyrinthine network of IVRs. Oh, it has “AI,” but only to force choices between forks in decision trees corresponding to questions I didn’t have—and every path still eventually ends in “this voice mailbox is full, goodbye.”
After literal hours of my life trying to wrestle their IVRs into helping—I do sympathize with their workload and don’t want to be a special snowflake—I now drive 30 minutes to ask questions face to face.
In general I’ve maxed out what’s discoverable by automated means before I call. So a call center is both useless and insulting.
So, I agree. But I believe the problem is pretty solvable with enough tokens.
Even if the new model that came out last week totally fixed all the problems this time for real, most people's experience with chatbots is that they are prone to misunderstanding or making false statements ("hallucinations").
I have yet to experience any degree of confidence in any output from an LLM, so I'd rather leave the message. I don't know how common this point of view is.
Get a 5 gallon bucket with lid. Put garbage bag inside. Put toilet seat from broken toilet on it.
Use it, remove refuse if needed, put lid on.
I paid quite a lot for hauling and fixing the alternator.
Same with basic house maintenance; prices are through the roof.
Reading > Listening
Speaking > Typing
If you want raw performance on both sides, it is better to dictate an email that gets read later.
Additionally, for many use cases it's not feasible from an engineering standpoint to expose a separate API for each entire workflow; instead there are typically many smaller composable steps that need to be strung together in a certain order depending on the situation.
That's a good fit for an LLM + tools.
"I'd like to schedule a smog check tomorrow or Wednesday?" rather than leaving a message and hoping for a callback that you don't miss either (and have go to voice mail).
Being able to have a voice appointment scheduling system (assuming that it isn't being jail broken https://www.youtube.com/shorts/GJVSDjRXVoo ) could be useful... though there are problems with giving it agency over decisions ( https://www.bbc.com/travel/article/20240222-air-canada-chatb... ).
para español, marque beep
if you have a quest beep
for beep
beep*beep*beep*beepbeep*
The account balance for account ending in NNNN is: $375.86
I shouldn't have to navigate a conversation in a situation where muscle memory will take me through the phone system decision tree in seconds.
I agree with you on the dealership dynamics though.
Partly as a preventative measure: we trust them. In the rare cases when they find something, it’s real. As a consequence we get ahead of brewing problems.
Plus loyalty, to some extent; we try to throw work their way when we can, even if we probably could handle it ourselves. The relationship between our families goes back a good 60 years by now.
Fully grant that my situation is unlikely to be representative. And no shade toward OP—it sounds like a cool project thoughtfully done, and a real improvement over the status quo for her relative!
My brother is a luxury mechanic shop owner, and he’s losing thousands of dollars per month because he misses hundreds of calls per week. He’s under the hood all day. The phone rings, he can’t answer, the customer hangs up and calls someone else. That’s a lost job — sometimes a $450 brake service, sometimes a $2,000 engine repair — just gone because no one picked up.
So I’m building him an AI receptionist. I named it Axle — like a car axle — because of course I did. 😏
This isn’t a generic chatbot. It’s a custom-built voice agent that answers his phone, knows his exact prices, his hours, his policies, and can collect a callback when it doesn’t know something. To get this right requires a custom build, so first I scraped his website data, created a product requirements document (PRD), and scoped the project into a 3-part build.
The first step was making sure the AI could actually answer questions accurately — without hallucinating prices or making things up.
A raw LLM is dangerous here. If a customer asks “how much for brakes?” and the AI guesses $200 when the real answer is $450, that’s a broken expectation and a frustrated customer. The fix is Retrieval-Augmented Generation (RAG): instead of letting the model guess, you give it a knowledge base of real information and make it answer only from that.
Here’s what I did:
Scraped Dane’s website — I pulled his service pages and pricing into markdown files. From there I built a structured knowledge base covering 21+ documents: every service type, pricing, turnaround times, hours, payment methods, cancellation policies, warranty info, loaner vehicles, and what car makes he specializes in.
Embedded the knowledge base into MongoDB Atlas — Each document gets converted into a 1024-dimensional vector using Voyage AI (voyage-3-large). These vectors capture the semantic meaning of each document, not just the keywords. They’re stored in MongoDB Atlas alongside the raw text, with an Atlas Vector Search index on the embedding field.
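For reference, an Atlas Vector Search index over that embedding field can be defined with a JSON index definition along these lines (the layout follows Atlas's vector index format; `numDimensions` matches voyage-3-large's 1024-dimensional output, while the index field name and `cosine` similarity metric here are illustrative):

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1024,
      "similarity": "cosine"
    }
  ]
}
```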
Built the retrieval pipeline — When a customer asks a question, the query gets embedded using the same Voyage AI model and then run against the Atlas Vector Search index. It returns the top 3 most semantically similar documents — so “how much for a brake job?” correctly retrieves the brake service pricing doc even if those exact words don’t appear together.
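As a minimal sketch, the retrieval step boils down to a `$vectorSearch` aggregation stage run against the collection. The index name, candidate count, and projected fields below are assumptions for illustration, not the exact values from my build:

```python
def build_vector_search_pipeline(query_embedding, k=3):
    """Build a MongoDB Atlas $vectorSearch aggregation pipeline.

    Assumes a vector index named "kb_vector_index" over the
    "embedding" field (both names are illustrative).
    """
    return [
        {
            "$vectorSearch": {
                "index": "kb_vector_index",
                "path": "embedding",
                "queryVector": query_embedding,
                "numCandidates": 50,  # oversample candidates, then keep the top k
                "limit": k,
            }
        },
        # Keep only the raw text and the similarity score for the LLM step.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

The pipeline is then run with something like `collection.aggregate(build_vector_search_pipeline(query_vector))`, where `query_vector` is the Voyage AI embedding of the caller's question.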
Wired up Claude for response generation — The retrieved documents get passed as context to Anthropic Claude (claude-sonnet-4-6) along with a strict system prompt: answer only from the knowledge base, keep responses short and conversational, and if you don’t know — say so and offer to take a message. No hallucinations allowed.
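The grounding itself is just careful prompt assembly: the retrieved documents get injected into the system prompt, and the instructions forbid answering outside of them. A sketch of that assembly step (the exact wording of my real system prompt is longer; the fallback phrasing here is illustrative):

```python
FALLBACK = "I don't have that information, but I can take a message."

def build_grounded_request(question, retrieved_docs, history=None):
    """Assemble a grounded chat request from vector-search results.

    retrieved_docs: top documents from vector search, each {"text": ...}.
    history: optional prior conversation turns as {"role", "content"} dicts.
    Returns the system prompt and message list to pass to the LLM call.
    """
    context = "\n\n".join(doc["text"] for doc in retrieved_docs)
    system = (
        "You are a phone receptionist for an auto shop. Answer ONLY from "
        "the knowledge base below. Keep answers short and conversational. "
        f"If the answer is not in the knowledge base, say: '{FALLBACK}'\n\n"
        f"KNOWLEDGE BASE:\n{context}"
    )
    messages = list(history or []) + [{"role": "user", "content": question}]
    return {"system": system, "messages": messages}
```

The returned `system` and `messages` map directly onto the corresponding parameters of Anthropic's Messages API.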
By the end of Part 1, I could type a question in the terminal and get a grounded, accurate answer back. “How much is an oil change?” → “$45 for conventional, $75 for synthetic. Includes oil filter, fluid top-off, and tire pressure check. Takes about 30 minutes.”
💡 I’m learning as I go using the MongoDB AI Learning Hub — and you can too! Check it out to build your own AI Agents!
Next I had to get this brain onto an actual phone line that customers could call.
I chose Vapi as the voice platform. It handles everything on the telephony side: purchasing a phone number, speech-to-text (via Deepgram), text-to-speech (via ElevenLabs), and real-time function calling back to my server. The whole voice infrastructure is handled — I just needed to build the webhook it calls.
Built a FastAPI webhook server — Every time a caller asks a question, Vapi sends a tool-calls request to my /webhook endpoint with the caller’s query. The server routes that to the RAG pipeline, gets a response from Claude, and sends it back to Vapi, which reads it aloud to the caller. The whole round trip has to be fast enough to feel like a natural conversation.
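The core of that webhook is a small payload-to-response translation. This sketch assumes a Vapi-style tool-calls payload shape (`message.toolCallList` in, `results` out); treat the field names as assumptions and check them against the current Vapi API reference rather than copying them verbatim:

```python
def handle_tool_calls(payload, answer_fn):
    """Translate a tool-calls webhook payload into the tool-results response.

    payload: the webhook request body (assumed Vapi-style shape).
    answer_fn: the RAG pipeline, mapping a query string to an answer string.
    Malformed or empty payloads yield an empty results list instead of crashing.
    """
    results = []
    for call in payload.get("message", {}).get("toolCallList", []):
        query = call.get("function", {}).get("arguments", {}).get("query", "")
        results.append({
            "toolCallId": call.get("id"),
            "result": answer_fn(query) if query else "Sorry, I didn't catch that.",
        })
    return {"results": results}
```

Inside the FastAPI route, this is just `return handle_tool_calls(await request.json(), rag_answer)`; keeping the translation in a plain function makes the edge cases easy to unit-test.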
Exposed it with Ngrok — During development, the server runs locally on port 8000. Ngrok punches a tunnel through to a public HTTPS URL, which I paste into the Vapi dashboard as the webhook endpoint. Vapi can now reach my local server in real time as calls come in. For production this would move to a cloud host, but for building and testing Ngrok gets the job done in two minutes.
Configured the Vapi assistant — In the Vapi dashboard I set up the assistant with a greeting (“Hi, thanks for calling Dane’s Motorsport, how can I help you today?”), wired up two tools (answerQuestion for RAG-backed responses and saveCallback for collecting a name and number when a question can’t be answered), and pointed both at the webhook URL.
Added conversation memory — Vapi sends the full conversation history with each request, so the RAG pipeline gets the prior turns as context. If a caller asks “what are your hours?” and then follows up with “and how much for a tire rotation?”, the AI handles both coherently.
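Feeding that history to the model is a small mapping step. This sketch assumes the transcript arrives with "user"/"bot" roles (an assumption about Vapi's transcript format), maps them to the "user"/"assistant" roles the LLM expects, and trims to the most recent turns to keep latency down:

```python
def to_llm_history(transcript, max_turns=10):
    """Map a voice-platform transcript to LLM-style chat messages.

    transcript: list of {"role", "content"} dicts; "bot" is assumed to be
    the assistant's role name in the incoming transcript.
    Only the last max_turns messages are kept so the context stays small.
    """
    role_map = {"user": "user", "bot": "assistant", "assistant": "assistant"}
    out = []
    for msg in transcript[-max_turns:]:
        role = role_map.get(msg.get("role"))
        if role and msg.get("content"):
            out.append({"role": role, "content": msg["content"]})
    return out
```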
Logged every call to MongoDB — Each interaction gets stored in a calls collection: the caller’s number, the query, the AI’s response, whether it escalated to a human, and the timestamp. Callback requests from unknown questions go into a separate callbacks collection so Dane can follow up. This turns the phone system into a data asset — he can see what customers are asking most, when call volume spikes, and how often the AI hands off to a human.
Then finally, the thing that took the most iteration: making it sound right.
Text responses and voice responses are completely different. A response that reads fine on screen — with bullet points, dollar signs formatted as “$45.00”, or a sentence that starts with “Certainly!” — sounds awful when spoken aloud. I had to tune the system prompt specifically for voice delivery.
Picked the right voice — Vapi integrates with ElevenLabs and gives you access to a huge library of AI voices. I went through about 20 of them, reading the same test script each time: a greeting, a price quote, an escalation. Most sounded either too robotic, too enthusiastic, or just wrong for a mechanic shop. I landed on Christopher — calm, natural, unhurried. The kind of voice that sounds like someone who actually knows cars. Getting this right mattered more than I expected; a great AI response delivered in the wrong voice still feels off.
Rewrote the system prompt for voice — Short sentences. No markdown. No filler phrases like “Great question!” or “Certainly!”. Prices spoken naturally (“forty-five dollars” instead of “$45”). Responses capped at 2–4 sentences max. The goal is to sound like a knowledgeable, friendly human — not a chatbot reading a webpage.
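Some of this can also be enforced in code rather than left entirely to the prompt. A minimal sketch of post-processing dollar amounts into spoken words before they reach the TTS layer (handles whole-dollar amounts only; a helper I wrote for illustration, not part of the build described above):

```python
import re

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]

def _words(n):
    # Whole-dollar amounts up to 999,999 -- enough for shop prices.
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return _ONES[hundreds] + " hundred" + (" " + _words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return _words(thousands) + " thousand" + (" " + _words(rest) if rest else "")

def speak_prices(text):
    """Replace "$450"-style amounts with spoken words for TTS."""
    def repl(match):
        amount = int(match.group(1).replace(",", ""))
        return _words(amount) + (" dollar" if amount == 1 else " dollars")
    return re.sub(r"\$([\d,]+)\b", repl, text)
```

So `speak_prices("An oil change is $45.")` yields "An oil change is forty-five dollars." — the form the TTS voice reads naturally.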
Tested the escalation flow — When a caller asks something that isn’t in the knowledge base, the AI doesn’t guess. It tells the caller it doesn’t have that information, asks for their name and a good callback number, and saves that to MongoDB. Dane gets a list of callbacks to return — no lost leads.
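The decision point is a confidence gate on the retrieval scores. A sketch under stated assumptions (the 0.75 threshold is a placeholder to tune against real calls, not the value from my build):

```python
CONFIDENCE_THRESHOLD = 0.75  # assumed cutoff; tune against real call logs

def answer_or_escalate(hits):
    """Decide between a grounded answer and the callback flow.

    hits: vector-search results, best first, each {"text", "score"}.
    Below the threshold we never guess: we return an escalation marker
    so the voice layer can ask for a name and callback number.
    """
    if not hits or hits[0]["score"] < CONFIDENCE_THRESHOLD:
        return {
            "escalate": True,
            "say": ("I don't have that information on hand. Can I take your "
                    "name and number so someone can call you back?"),
        }
    return {"escalate": False, "context": [h["text"] for h in hits]}
```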
Wrote integration tests — I built a test suite covering the RAG pipeline, the webhook handler, and the full end-to-end flow. This was especially important for catching edge cases: what happens when Vapi sends a malformed request, what happens when the vector search returns no results above the confidence threshold, what happens when the caller doesn’t leave a callback number.
Here’s everything wired together:
* Voyage AI (voyage-3-large) — text embeddings for semantic retrieval
* Anthropic Claude (claude-sonnet-4-6) — response generation, grounded in the knowledge base
* Python libraries: pymongo, voyageai, anthropic, fastapi

Right now the AI answers questions and takes callbacks. The next phase has a few pieces: connecting it to a real calendar so it can book appointments directly during the call; adding text message notifications so Dane gets an instant SMS whenever a new callback comes in; building a simple dashboard so he can see and manage all his pending callbacks in one place; locking down the security for production robustness; deploying to Railway so it runs on a persistent public URL; and then handing it over for him to actually use with real customers.
Dane misses 100+ calls a week. Each missed call is a potential job. Some of those jobs are $50, some are $2,000. This system runs 24/7, never puts someone on hold, and knows every price and policy as well as he does.
The build took three focused sprints. The hardest part wasn’t the code — it was getting the voice tone right so it actually sounds like someone who works at a mechanic shop and not a Silicon Valley startup.
If you’re building something similar, the core insight is this: don’t use a raw LLM for a business-specific voice agent. Ground it in a real knowledge base, constrain it to only answer from that base, and design the fallback flow before anything else. The escalation path is not an edge case — it’s a core feature.
Jaguar-of-Theseus
If they were to have an app on their website, I wouldn't know because I don't use the webpage for that purpose - I call them.
Now, they've all got receptionists there that work full time and handle the appointments and take that first tier of service. These are larger places that have two receptionists working the full day (handling walk-ins, calling confirmations, and the other administrative tasks)... I don't think that an LLM (even with access to appointments) would do a better job than what they do (and certainly wouldn't be able to handle the "OK, I showed up, now what do I do?").
However, I could see this for a small mechanic shop. When I lived in California, I went to what is now Shoreline Auto Care on El Camino and Shoreline - a small two bay mechanic... and that's not the type of place that has the business that can afford a full time receptionist.
So the question for a place like that... "what do you get for the phone calls you miss?"
I love the design of it though; I'd never even thought about flow-diverting toilets, but this design is so simple and elegant.
Running a small website with a calendar booking link just sounds much easier, cheaper, less error prone, and a better UX than running a voice LLM that is connected to a RAG and calendar. And I still don't think the technology around us has been built to support small websites or small businesses.
If the LLM augmented voicemail is not much more than the business voicemail service that such places have now, is it enough value add?
That also implies other things, such as the capability to integrate with the calendar and appointment system, which I'm still very hesitant about, but it could be an interesting service add-on if it were properly limited.