Random Words Changing
# support
j
We seem to have some random words changing and we're not sure why. For instance, the LLM understands the acronym SEIS, but the output we get from the transcript and from Elevenlabs changes it to "Saizen EIS". We have no idea why this is happening. Similarly, we are talking about a town called Hove. Sometimes it comes out as "Hoe", sometimes "Hovde". Any idea why this is happening?
v
Hi @Jeremy, I'm an AI assistant for Vapi. While a team member reviews this question, I'll look through relevant sources and see if I can provide suggestions. Please tag @Vapi Support with any follow-up questions.
### Answer
The issue you're facing with the alteration of certain words or acronyms, like SEIS becoming "Saizen EIS," and the inconsistency in recognizing the town "Hove," can be attributed to limitations within the speech-to-text and text-to-speech processes used by voice assistants. These components are crucial stages in voice AI interaction, involving the conversion of speech to text and vice versa, and they are susceptible to inaccuracies due to factors such as pronunciation, context understanding, or the inherent capabilities of the underlying models and providers like Eleven Labs [(1)].

Voice AI systems, including those driven by LLMs (Large Language Models), process information in stages: speech-to-text (transcribing spoken language), LLM computation (interpreting and generating a response), and text-to-speech (converting the response back to spoken language). Each step has its own set of challenges and potential for error, especially with nuanced or less common terms [(2)].

To minimize such discrepancies, consider using custom configurations or training data that include the specific terms, acronyms, or names you're working with. Adjustments at the transcriber level, choosing suitable models or providers, or even specifying pronunciation guides could enhance accuracy. However, completely eliminating such issues may not be feasible due to the inherent complexities of natural language processing and speech recognition. For detailed guidance on customizing your voice assistant or exploring provider options that may handle specific terms better, reviewing the documentation and available customization options is recommended [(3)].
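To make those stages concrete, here is a toy sketch (purely illustrative, not Vapi's actual code; the `stt`, `llm`, and `tts` callables are hypothetical stand-ins) of a single voice turn, where each hop is a point at which a term like "SEIS" can be distorted:

```python
# Hypothetical illustration of the three-stage voice pipeline described above.
def handle_turn(user_audio, stt, llm, tts):
    text_in = stt(user_audio)   # speech-to-text (e.g. Deepgram): "SEIS" may be misheard
    reply = llm(text_in)        # LLM computation: unfamiliar acronyms may be rewritten
    audio_out = tts(reply)      # text-to-speech (e.g. ElevenLabs): words may be mispronounced
    return audio_out
```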
s
Can you share the call_id of that particular call?
j
Here is the one about Hove: 6e7ddca6-3295-47f9-b2ef-16c465048722 Here is the SEIS one: 3b181ac4-b6a6-4c2a-b106-80bfed443c2e @Sahil
s
I would suggest you do two things to resolve this issue. First, use Deepgram's keyword boosting with the words "SEIS" and "Hove," which you can configure via an API call. Second, change the Deepgram model to "Nova 2 Conversation" if you are using it via a web call; if you are using it via phone, change the model to the "Nova 2 Phonecall" model. You need to add this property inside the transcriber in your API call:
"keywords": ["SEIS:1", "Hove:1"]
Some helpful references: https://docs.vapi.ai/api-reference/assistants/create-assistant (Check the transcriber section)
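For reference, here's a minimal sketch of what that API call could look like. This follows the update-assistant endpoint shape from the docs above, but the assistant ID, API key, and exact model strings below are placeholder assumptions; check the transcriber section of the docs for the current identifiers:

```python
# Hedged sketch: enabling Deepgram keyword boosting on an assistant's transcriber.
import requests

VAPI_API_KEY = "YOUR_VAPI_API_KEY"   # placeholder
ASSISTANT_ID = "YOUR_ASSISTANT_ID"   # placeholder

resp = requests.patch(
    f"https://api.vapi.ai/assistant/{ASSISTANT_ID}",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={
        "transcriber": {
            "provider": "deepgram",
            "model": "nova-2-phonecall",       # assumed id; use the conversation variant for web calls
            "keywords": ["SEIS:1", "Hove:1"],  # "word:boost" syntax per Deepgram keyword boosting
        }
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json().get("transcriber"))
```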
Also, to answer your question about why this is happening: Deepgram doesn't always produce a 100% accurate transcription. Sometimes, due to different accents, it produces inaccurate results.
j
I thought something similar originally, but then wasn't quite sure what Deepgram would have to do with it. For instance, look at the SEIS call: at no point does the user say "SEIS", only the assistant, so Deepgram has nothing to do with this, as Deepgram is just doing the transcription. The issue appears to be somewhere between the LLM (GPT-3.5 in this case) and Elevenlabs.
s
It is being generated by the assistant. Can you try changing your LLM to GPT-4 and testing it out?
j
Yep, exactly the same issue with GPT-4, if not somehow even worse. So many different ways it said SEIS. One time it even said "seasonally". It even changed "Neural Voice" to "Neuro Voice", then literally one word later spelled it correctly. Call ID: bd8c0dba-1ae9-4ad2-8204-6b4b4d9535fa. It's like it's trying to guess how it might be pronounced? If you just run the same prompt directly into GPT-4 or 3.5, you get the expected results, so I'm not sure what's happening.
s
This is really strange. @nikhil could you please help me with this?
j
It's odd as well that it's happening on multiple assistants, not just one. But it also seems to be very specific words, not general words.
n
yes, we transcribe the bot speech. That's the issue: Deepgram is mishearing it and hence producing the wrong transcription
we have fixed this in staging-api.vapi.ai / staging-dashboard.vapi.ai
try it
and lmk
j
@nikhil If you take a listen to the recordings though, the bot speech is actually saying those words. It's not even trying to say the correct word. But if you go directly into Elevenlabs with the correct transcript, it has no problem saying it.
I'll give the staging a try
2bee03c7-f27b-4dc5-831b-5ff3e59ccbb6 Here is a call I just had with the assistant on the staging version. It pronounces SEIS as "SEs", "say ACRS", "Says", and "Seas". It also did "Neural Voice" as "Neuro Voice" again. It seems to be these two words quite consistently.
d6d1d940-72dc-4d7d-a335-5d951958cb9c Another one using Groq instead of GPT, same issues
@Sahil @nikhil Any further thoughts on this?
Is this being looked into? We have this sort of issue on quite a few of our assistants, and we display the transcript live to users, so they see these errors in real time.
s
We are working with the Deepgram team to explore ways to improve response quality. Specifically, we're seeking methods to capture certain data more clearly.
j
Out of interest, why not just put what the LLM responds with into the transcript? Why is Deepgram involved in the assistant's side of the transcript?
s
We process the Assistant's response sentence by sentence, which helps reduce latency.
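For intuition, here's a rough sketch of why sentence-by-sentence chunking cuts latency: playback can start on the first sentence while later ones are still being generated. This is not Vapi's actual implementation; the sentence splitting and the `synthesize` callable are hypothetical stand-ins:

```python
# Illustrative sketch: flush each completed sentence of a streamed LLM reply
# to TTS immediately, instead of waiting for the full completion.
import re

def stream_reply_to_tts(llm_token_stream, synthesize):
    buffer = ""
    for token in llm_token_stream:
        buffer += token
        # Naive sentence boundary: ., !, or ? followed by whitespace.
        while (match := re.search(r"[.!?]\s", buffer)):
            sentence, buffer = buffer[:match.end()], buffer[match.end():]
            synthesize(sentence.strip())  # e.g. one TTS request per sentence
    if buffer.strip():
        synthesize(buffer.strip())        # flush the trailing fragment

# Example usage with a fake token stream:
# stream_reply_to_tts(iter(["SEIS is a UK tax relief. ", "It helps startups."]),
#                     synthesize=print)
```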