Transcribing foreign names
# support
r
Hi, the transcriber is having issues transcribing foreign names. When it has this problem, it doesn't even try to transcribe, so the model doesn't receive any message, and the call ends in a silence timeout. Our config is:

{
  provider: 'deepgram',
  model: 'nova-2',
  language: 'fr-CA',
  keywords: [],
  endpointing: 300,
}

Do you have a suggestion on how to progress?
v
You can continue this conversation by mentioning me in the message.
@radu Hello! I understand that you're experiencing issues with the transcriber not recognizing foreign names, leading to silence timeouts. Here are a couple of suggestions to help improve the transcription of foreign names:

1. **Custom Vocabulary**: If you have a list of foreign names that are commonly used, you can add them to a custom vocabulary list. This helps the transcriber recognize and accurately transcribe these names.
2. **Language Model**: Since you're using the 'fr-CA' language model, ensure that it's the best fit for the accent and dialect of the speakers. If the speakers have a different accent, consider using a model that more closely matches their speech patterns.
3. **Disable Endpointing**: The endpointing feature might be cutting off audio before the foreign names are spoken. Try increasing the endpointing value, or disabling it, to see if that allows more time for the names to be captured and transcribed.
4. **Contact Deepgram Support**: If the issue persists, reaching out to Deepgram's support team might provide more tailored solutions. They could offer insights into optimizing the transcriber settings or updating the model to better handle foreign names.

Would you like to try adjusting the endpointing value or disabling it to see if that resolves the issue? If so, I can guide you through the process.
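For reference, a minimal sketch of what suggestions 1 and 3 could look like in the transcriber config shared earlier in the thread. The names are placeholders, and the `term:boost` syntax follows Deepgram's keyword boosting convention; treat this as an illustration under those assumptions rather than a confirmed Vapi schema.

```typescript
// Sketch only: the same transcriber config as above, with a custom vocabulary
// (Deepgram keyword boosting, "term:boost") and a longer endpointing window.
const transcriber = {
  provider: 'deepgram',
  model: 'nova-2',
  language: 'fr-CA',
  // Placeholder names -- replace with the foreign names your callers actually use.
  keywords: ['Nguyen:2', 'Bogdanescu:2', 'Tremblay:2'],
  // Give the caller more time before the utterance is considered finished (ms).
  endpointing: 500,
};
```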
r
Here's a call ID: f697314b-cafa-439f-80d7-ce904640174b After this message:
{
  "role": "bot",
  "time": 1738003027309,
  "source": "",
  "endTime": 1738003032239,
  "message": "Merci. Puis-je avoir votre nom et le nom de votre entreprise s'il vous plaît.",
  "duration": 4689.9970703125,
  "secondsFromStart": 35.63
}
The caller said his name, and the name of his company. Only his company was transcribed.
{
  "role": "user",
  "time": 1738003036579,
  "endTime": 1738003039199,
  "message": "Into un neuf sept deux inc.",
  "duration": 2620,
  "secondsFromStart": 44.9
}
Then, the model asked for his name. He said his name; nothing was transcribed. Then the idle message was triggered, telling him to repeat. He said his name again, enunciating it to be very clear. Nothing was transcribed again. Then the call timed out.
@User
I gave the audio file to Deepgram, with the same model and it properly transcribed his name. For some reason, when it goes through Vapi, it completely ignores it. Do you know why? Are you filtering out transcriptions for certain confidence values?
s
@radu Looking into why the name is not transcribed.
@radu Try another call with **startSpeakingPlan.waitSeconds** set to **0.8**. This will give the model ample time to recognize and capture names correctly; most likely the endpointing plan fires too early and the name doesn't get captured. Give it a try and let me know how it goes.
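A minimal sketch of that change, assuming the `startSpeakingPlan` block sits alongside the transcriber in the assistant configuration; the field placement here is illustrative, not a confirmed schema.

```typescript
// Sketch only: raise waitSeconds so the assistant waits longer after the
// caller stops speaking before deciding the turn is over.
const assistantPatch = {
  startSpeakingPlan: {
    waitSeconds: 0.8, // larger values leave more room for short replies like a name
  },
};
```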
r
I’ll give this a try and let you know. However, I noticed that the name had only 40-60% confidence when I gave the audio file to Deepgram. Does Vapi filter out low-confidence transcriptions?
s
Yes, we do filter out low-confidence words.
r
When we ask the caller for their name, and they have a foreign name or accent, and that's all they say in their reply, it gets filtered out and the transcriber doesn't send anything to the model. I think we would really benefit from a new transcriber parameter to control the confidence threshold. Do you think that's something that could be done?
It doesn't only happen with foreign names; sometimes it happens with ordinary phrases too. In most cases, we'd rather send something somewhat wrong to the model than nothing at all.
I tried this, and it's the same problem.
Here's the call ID: ac0cef98-013b-481e-a098-0ed53ebf5f10
Since you confirmed the filtering, I'm convinced that the name just has a low confidence score and gets filtered out. The problem is that the model then doesn't receive anything at all, as if nothing was spoken. Foreign names aren't the only case where this happens. We'd much rather the model get at least something, even if it's wrong, than nothing, which leads to silence timeouts and a horrible experience for the callers.
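To make the feature request concrete, here is a sketch of what such a knob could look like on the transcriber config. The `confidenceThreshold` field is hypothetical: it is the parameter being requested in this thread, not an existing Vapi option.

```typescript
// Hypothetical: 'confidenceThreshold' does not exist today; it is the
// parameter being requested in this thread.
const transcriber = {
  provider: 'deepgram',
  model: 'nova-2',
  language: 'fr-CA',
  endpointing: 300,
  // Forward any transcript whose confidence is >= 0.4 instead of dropping it.
  confidenceThreshold: 0.4,
};
```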
s
By filtering I mean that the default threshold is high, and words whose confidence score is lower than that threshold get filtered out automatically.
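For readers following along, a minimal sketch of the kind of filtering being described here (not Vapi's actual implementation, and the threshold value is an assumption): each word in a Deepgram result carries a confidence score, and anything below the threshold is dropped before the text reaches the model.

```typescript
// Illustration only, not Vapi's real code. Deepgram word results look
// roughly like { word: string, confidence: number } with confidence in 0..1.
interface Word {
  word: string;
  confidence: number;
}

const THRESHOLD = 0.8; // assumed value for illustration; the real default is not stated in this thread

function filterTranscript(words: Word[]): string {
  // Drop low-confidence words. If everything is dropped (e.g. a foreign name),
  // the model receives nothing at all, which is the failure mode in this thread.
  return words
    .filter((w) => w.confidence >= THRESHOLD)
    .map((w) => w.word)
    .join(' ');
}
```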
r
Yes, that's what I understand too, but this ends up filtering out a caller's entire reply when they say a foreign name. Would it be possible to let us control that threshold? I've noticed we're not the only ones with this problem, and it could be a simple fix.
s
Out of this, what was spoken by the user? "Bonjour, quel est votre nom" / "Je n'ai pas compris. Pouvez-vous répéter s'il vous plaît."
r
The user only repeated their name multiple times
s
What was the input? Can you write it out here?
r
Since nothing was transcribed for 10 seconds, the idle message was spoken, and then 15 seconds later the call timed out since nothing was transcribed again.
I’m going to send you a PM with the full name
s
You can also write it here, to keep everything in the thread.
r
I sent you the name via PM for privacy reasons, since this is public
s
0.00-1.02 [Interim Result] Bonjour
0.00-2.05 [Interim Result] Bonjour, quel est votre
0.00-2.91 [Speech Final] Bonjour, quel est votre nom
2.91-3.97 [Interim Result] Alors
10.56-13.70 [Interim Result] Je n'ai pas compris
10.56-14.72 [Interim Result] Je n'ai pas compris. Pouvez-vous répéter ces
10.56-15.53 [Is Final] Je n'ai pas compris. Pouvez-vous répéter s'il vous plaît.
15.53-16.77 [Interim Result] Ma
[Utterance End] Je n'ai pas compris. Pouvez-vous répéter s'il vous plaît.
r
Yes, nothing about the name the caller said 5-6 times
If only we could control that confidence filter threshold with, say, a "confidenceThreshold" parameter 😊
s
@radu Something may be wrong on Deepgram's side. I'm checking with them; allow me some time to get back to you on this.
r
If you put the audio file into the Deepgram Playground with the same model, it does transcribe something, but with very low confidence, like 40-60%. What's the threshold you filter from on your side?
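One way to check those word-level scores outside the Playground is Deepgram's pre-recorded `/v1/listen` endpoint, which returns a confidence value per word. A minimal sketch follows; the API key variable and file path are placeholders, and (as noted below) the streaming model used for live calls can still behave differently.

```typescript
// Sketch: re-run the same audio through Deepgram's pre-recorded API and print
// per-word confidence, to see how low the name actually scores.
// Assumes DEEPGRAM_API_KEY is set and './call-audio.wav' is the exported recording.
import { readFileSync } from 'node:fs';

async function main() {
  const audio = readFileSync('./call-audio.wav');
  const res = await fetch(
    'https://api.deepgram.com/v1/listen?model=nova-2&language=fr-CA',
    {
      method: 'POST',
      headers: {
        Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`,
        'Content-Type': 'audio/wav',
      },
      body: audio,
    },
  );
  const data = await res.json();
  const words = data.results.channels[0].alternatives[0].words;
  for (const w of words) {
    console.log(`${w.start.toFixed(2)}-${w.end.toFixed(2)}  ${w.word}  conf=${w.confidence.toFixed(2)}`);
  }
}

main().catch(console.error);
```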
s
The Deepgram Playground uses a different model than the one used for live audio streaming. Unfortunately, they haven't gotten back to us about your call yet, which limits how confident we can be about this. However, we appreciate your feature request and will certainly consider adding it.
r
In any case, the confidence of the spoken name was low. I hope you do consider adding this, since we have a couple of fairly large clients in Quebec that need foreign names to be transcribed (even incorrectly); otherwise it's a definite "no" for adopting this, since most people calling them have foreign names.
I’d like to hear back about this as well, please 🙏
s
@radu The current update is that I am following up on your issue with Deepgram. However, it appears the audio may not have been sufficiently clear for the speech models, even after increasing the volume by several decibels.
r
Thank you for the update. The first call I shared is a real client calling, and he was pretty clear; everything else was properly understood and at the same volume. It was only when he replied with his name that there was no transcription.
It was also clear enough for the Deepgram Playground
s
There is a difference in quality and results between live (streaming) transcription and pre-recorded transcription.
r
Okay, let's see what Deepgram has to say about this.
s
FYI: still waiting for a response from their side.
r
Hi, I want to follow up on this. We had continued in our PMs. Did Deepgram answer? @Shubham Bajaj
p
Closing this ticket; had a discussion with Radu over DM.