Transcribing foreign names
# support
r
Hi, the transcriber is having issues transcribing foreign names. When it has this problem, it doesn't even try to transcribe, so the model doesn't receive any message, and the call ends in a silence timeout. Our config is:

{
  provider: 'deepgram',
  model: 'nova-2',
  language: 'fr-CA',
  keywords: [],
  endpointing: 300,
}

Do you have a suggestion on how to progress?
v
You can continue this conversation by mentioning me in the message.
@radu Hello! I understand that you're experiencing issues with the transcriber not recognizing foreign names, leading to silence timeouts. Here are a couple of suggestions to help improve the transcription of foreign names:

1. **Custom Vocabulary**: If you have a list of foreign names that are commonly used, you can add them to a custom vocabulary list. This helps the transcriber recognize and accurately transcribe these names.
2. **Language Model**: Since you're using the 'fr-CA' language model, ensure that it's the best fit for the accent and dialect of the speakers. If the speakers have a different accent, consider using a model that more closely matches their speech patterns.
3. **Disable Endpointing**: The endpointing feature might be cutting off audio before the foreign names are spoken. Try increasing the endpointing value, or disabling it, to see if that allows more time for the names to be captured and transcribed.
4. **Contact Deepgram Support**: If the issue persists, reaching out to Deepgram's support team might provide more tailored solutions. They could offer insights into optimizing the transcriber settings or updating the model to better handle foreign names.

Would you like to try adjusting the endpointing value or disabling it to see if that resolves the issue? If so, I can guide you through the process.
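For reference, a minimal sketch of what suggestions 1 and 3 could look like in the transcriber config shared earlier in the thread. The names are placeholders, and the `term:boost` syntax follows Deepgram's keyword boosting convention; treat this as an illustration under those assumptions rather than a confirmed Vapi schema.

```typescript
// Sketch only: the same transcriber config as above, with a custom vocabulary
// (Deepgram keyword boosting, "term:boost") and a longer endpointing window.
const transcriber = {
  provider: 'deepgram',
  model: 'nova-2',
  language: 'fr-CA',
  // Placeholder names -- replace with the foreign names your callers actually use.
  keywords: ['Nguyen:2', 'Bogdanescu:2', 'Tremblay:2'],
  // Give the caller more time before the utterance is considered finished (ms).
  endpointing: 500,
};
```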
r
Here's a call ID: f697314b-cafa-439f-80d7-ce904640174b After this message:
{
  "role": "bot",
  "time": 1738003027309,
  "source": "",
  "endTime": 1738003032239,
  "message": "Merci. Puis-je avoir votre nom et le nom de votre entreprise s'il vous plaît.",
  "duration": 4689.9970703125,
  "secondsFromStart": 35.63
}
The caller said his name, and the name of his company. Only his company was transcribed.
{
  "role": "user",
  "time": 1738003036579,
  "endTime": 1738003039199,
  "message": "Into un neuf sept deux inc.",
  "duration": 2620,
  "secondsFromStart": 44.9
}
Then, the model asked for his name. He said his name; nothing was transcribed. Then the idle message was triggered, telling him to repeat. He said his name again, enunciating it to be very clear. Nothing was transcribed again. Then the call timed out.
@User
I gave the audio file to Deepgram, with the same model and it properly transcribed his name. For some reason, when it goes through Vapi, it completely ignores it. Do you know why? Are you filtering out transcriptions for certain confidence values?
s
@radu Looking into why the name is not transcribed.
@radu Try another call with **startSpeakingPlan.waitSeconds** set to **0.8**. This will give the model ample time to recognize and capture names correctly; most likely the endpointing plan fires too early and the name doesn't get captured. Give it a try and let me know how it goes.
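A minimal sketch of that change, assuming the `startSpeakingPlan` block sits alongside the transcriber in the assistant configuration; the field placement here is illustrative, not a confirmed schema.

```typescript
// Sketch only: raise waitSeconds so the assistant waits longer after the
// caller stops speaking before deciding the turn is over.
const assistantPatch = {
  startSpeakingPlan: {
    waitSeconds: 0.8, // larger values leave more room for short replies like a name
  },
};
```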
r
I’ll give this a try and let you know. However, I noticed that the name had only 40-60% confidence when I gave the audio file to Deepgram. Does Vapi filter out low-confidence transcriptions?
s
Yes, we do filter out low-confidence words.
r
When we ask the caller for their name, and they have a foreign name or accent, and that's all they say in their reply, it gets filtered out and the transcriber doesn't send anything to the model. I think we would really benefit from a new transcriber parameter to control the confidence threshold. Do you think that's something that could be done?
It doesn't only happen with foreign names; sometimes it happens with ordinary phrases too. In most cases, we'd rather send something somewhat wrong to the model than nothing at all.
I tried this, and it's the same problem.
Here's the call ID: ac0cef98-013b-481e-a098-0ed53ebf5f10
Since you confirmed the filtering, I'm convinced that the name just has a low confidence score and gets filtered out. The problem is that the model then doesn't receive anything at all, as if nothing was spoken. Foreign names aren't the only case where this happens. We'd much rather the model get at least something, even if it's wrong, than nothing, which leads to silence timeouts and a horrible experience for the callers.
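To make the feature request concrete, here is a sketch of what such a knob could look like on the transcriber config. The `confidenceThreshold` field is hypothetical: it is the parameter being requested in this thread, not an existing Vapi option.

```typescript
// Hypothetical: 'confidenceThreshold' does not exist today; it is the
// parameter being requested in this thread.
const transcriber = {
  provider: 'deepgram',
  model: 'nova-2',
  language: 'fr-CA',
  endpointing: 300,
  // Forward any transcript whose confidence is >= 0.4 instead of dropping it.
  confidenceThreshold: 0.4,
};
```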
s
By filtering I mean that the default threshold is high, and words whose confidence score is lower than that threshold get filtered out automatically.
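For readers following along, a minimal sketch of the kind of filtering being described here (not Vapi's actual implementation, and the threshold value is an assumption): each word in a Deepgram result carries a confidence score, and anything below the threshold is dropped before the text reaches the model.

```typescript
// Illustration only, not Vapi's real code. Deepgram word results look
// roughly like { word: string, confidence: number } with confidence in 0..1.
interface Word {
  word: string;
  confidence: number;
}

const THRESHOLD = 0.8; // assumed value for illustration; the real default is not stated in this thread

function filterTranscript(words: Word[]): string {
  // Drop low-confidence words. If everything is dropped (e.g. a foreign name),
  // the model receives nothing at all, which is the failure mode in this thread.
  return words
    .filter((w) => w.confidence >= THRESHOLD)
    .map((w) => w.word)
    .join(' ');
}
```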
r
Yes, that's what I understand too, but this ends up filtering out a caller's entire reply when they say a foreign name. Would it be possible to let us control that threshold? I've noticed we're not the only ones with this problem, and it could be a simple fix.
s
Out of this, what was spoken by the user? "Bonjour, quel est votre nom" / "Je n'ai pas compris. Pouvez-vous répéter s'il vous plaît."
r
The user only repeated their name multiple times
s
What was the input? Can you write it out here?
r
Since nothing was transcribed for 10 seconds, the idle message was spoken, and then 15 seconds later the call timed out since nothing was transcribed again.
I’m going to send you a PM with the full name
s
You can also write it here, to keep everything in the thread.
r
I sent you the name via PM for privacy reasons, since this is public
s
0.00-1.02 [Interim Result] Bonjour
0.00-2.05 [Interim Result] Bonjour, quel est votre
0.00-2.91 [Speech Final] Bonjour, quel est votre nom
2.91-3.97 [Interim Result] Alors
10.56-13.70 [Interim Result] Je n'ai pas compris
10.56-14.72 [Interim Result] Je n'ai pas compris. Pouvez-vous répéter ces
10.56-15.53 [Is Final] Je n'ai pas compris. Pouvez-vous répéter s'il vous plaît.
15.53-16.77 [Interim Result] Ma
[Utterance End] Je n'ai pas compris. Pouvez-vous répéter s'il vous plaît.
r
Yes, nothing about the name the caller said 5-6 times
If only we could control that confidence filter threshold with, say, a "confidenceThreshold" parameter 😊
s
@radu Something may be wrong on Deepgram's side. I'm checking with them; allow me some time to get back to you on this.
r
If you put the audio file into the Deepgram Playground with the same model, it does transcribe something, but with very low confidence, like 40-60%. What's the threshold you filter from on your side?
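One way to check those word-level scores outside the Playground is Deepgram's pre-recorded `/v1/listen` endpoint, which returns a confidence value per word. A minimal sketch follows; the API key variable and file path are placeholders, and (as noted below) the streaming model used for live calls can still behave differently.

```typescript
// Sketch: re-run the same audio through Deepgram's pre-recorded API and print
// per-word confidence, to see how low the name actually scores.
// Assumes DEEPGRAM_API_KEY is set and './call-audio.wav' is the exported recording.
import { readFileSync } from 'node:fs';

async function main() {
  const audio = readFileSync('./call-audio.wav');
  const res = await fetch(
    'https://api.deepgram.com/v1/listen?model=nova-2&language=fr-CA',
    {
      method: 'POST',
      headers: {
        Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`,
        'Content-Type': 'audio/wav',
      },
      body: audio,
    },
  );
  const data = await res.json();
  const words = data.results.channels[0].alternatives[0].words;
  for (const w of words) {
    console.log(`${w.start.toFixed(2)}-${w.end.toFixed(2)}  ${w.word}  conf=${w.confidence.toFixed(2)}`);
  }
}

main().catch(console.error);
```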
s
The Deepgram Playground uses a different model than the one used for live audio streaming. Unfortunately, they haven't gotten back to us about your call yet, which limits how confident we can be about this. However, we appreciate your feature request and will certainly consider adding it.
r
In any case, the confidence of the spoken name was low. I hope you do consider adding this, since we have a couple of fairly large clients in Quebec that need foreign names to be transcribed (even incorrectly); otherwise it's a definite "no" for adopting this, since most people calling them have foreign names.
I’d like to hear back about this as well, please 🙏
s
@radu The current update is that I am following up on your issue with Deepgram. However, it appears the audio may not have been sufficiently clear for the speech models, even after increasing the volume by several decibels.
r
Thank you for the update. The first call I shared is a real client calling, and he was pretty clear; everything else was properly understood and at the same volume. It was only when he replied with his name that there was no transcription.
It was also clear enough for the Deepgram Playground
s
There is a difference in quality and results between live (streaming) transcription and pre-recorded transcription.
r
Okay, let's see what Deepgram has to say about this.
s
FYI: still waiting for a response from their side.
r
Hi, I want to follow up on this. We had continued in our PMs. Did Deepgram answer? @Shubham Bajaj
p
Closing this ticket; had a discussion with Radu over DM.