Problems with voice transcription and unexpected language changes in the assistant's voice Vapi AI #support

juliointheworld

04/08/2025, 9:16 PM

Call IDs:
- f1e8ae95-322f-4d67-b1ae-6a08d5301121
- a76bcbc5-54b1-4038-be35-e18abbdb5c2d
- 51061192-4aab-415a-9bd9-2ed4f09095f5

1. In multiple calls, the assistant correctly pronounces the user's phone number, but when it is transcribed (voice to text), it is captured incorrectly, which generates errors in the execution of functions.
2. When repeating or confirming phone numbers, the assistant's voice changes to what appears to be Portuguese, even though the entire workflow is set to Spanish.

I appreciate your help in resolving these issues. I look forward to your response or recommendations for improving system performance.

Vapi Support Bot

04/08/2025, 9:19 PM

Hi there! 👋 I understand you're experiencing some frustrating issues with your voice AI system. While I don't have access to the specific call recordings from the IDs you shared, I can definitely help with both the phone number transcription issues and the unexpected language changes you're experiencing.

## Improving Voice-to-Text Accuracy for Phone Numbers

For the phone number transcription issues, here are some adjustments you can make:

1. **Adjust Transcription Endpointing Settings**: Increase the `onNumberSeconds` parameter above its default of 0.5 seconds to give the system more time to process number sequences accurately:

json
{
  "startSpeakingPlan": {
    "transcriptionEndpointingPlan": {
      "onNumberSeconds": 1.0
    }
  }
}

[Source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1342583376707850250)

2. **Adjust the `numberToDigitsCutoff` Setting**: This determines when numbers are transcribed as words or digits. Setting it higher than the numbers you expect (for example, 100000 for numbers like "12345") ensures they are converted to words:

json
{
  "numberToDigitsCutoff": 100000
}

[Source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1342349738032500736)

3. **Consider DTMF as Alternative**: For critical number capture, enabling keypad input (DTMF) can be more accurate than voice transcription [Source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1341108728115757138).

## Preventing Unexpected Language Changes

For the language switching issue (Spanish to Portuguese):

1. **Explicitly Set the Language**: Use the `language` property in your voice configuration:

json
{
  "voice": {
    "provider": "azure",
    "voiceId": "es-ES-ElviraNeural",
    "language": "es-ES"
  }
}

[Source](https://docs.vapi.ai/changelog/2024/11/24)

Vapi Support Bot

04/08/2025, 9:19 PM

2. **Ensure Voice Model Language Consistency**: Make sure the voice model's language matches your desired language (Spanish) [Source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1270786139355090995).
3. **Configure Voice Fallback Plan**: If using fallback voices, ensure they also use Spanish to prevent unexpected language switches [Source](https://docs.vapi.ai/changelog/2024/11/24).
4. **Disable Smart Endpointing**: Set `smartEndpointingEnabled` to `false`, as it only works for English transcriptions [Source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334385998423130174).
5. **Configure chunkPlan for Spanish**: Ensure `chunkPlan.enabled` is set to `true` to improve sentence boundary detection for Spanish responses [Source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1343543475383046185).

## General Performance Improvements

1. **Adjust Voice Recognition Settings**: Fine-tune wait time and sensitivity settings to improve understanding [Source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1333067714729607228).
2. **Optimize Prompt Engineering**: Structure prompts effectively for voice interactions and add clear error handling [Source](https://docs.vapi.ai/assistants).
3. **Test and Iterate**: Use Voice Testing to automate evaluation of your AI voice agents [Source](https://docs.vapi.ai/test/voice-testing).

Would you like me to elaborate on any specific aspect of these recommendations or help with implementing any particular solution?

Sources:
- [Discord Thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1342583376707850250)
- [Discord Thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1342349738032500736)
- [Discord Thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1341108728115757138)

Vapi Support Bot

04/08/2025, 9:19 PM

- [Vapi Changelog](https://docs.vapi.ai/changelog/2024/11/24)
- [Discord Thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1270786139355090995)
- [Discord Thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334385998423130174)
- [Discord Thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1343543475383046185)
- [Discord Thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1333067714729607228)
- [Vapi Assistants Documentation](https://docs.vapi.ai/assistants)
- [Vapi Voice Testing Documentation](https://docs.vapi.ai/test/voice-testing)

juliointheworld

04/09/2025, 2:23 AM

Thanks. Why does Vapi's Neha voice change language when saying numbers?

juliointheworld

04/09/2025, 4:12 PM

@User Hi

Kings_big💫

04/09/2025, 4:20 PM

Check your multilingual settings. Make sure the transcriber and its language settings are configured to match the desired language for both text input and speech output. Choosing a voice that aligns with the intended language also helps keep pronunciation consistent.

Kyle Brunker

04/10/2025, 11:26 AM

To help track down this issue, could you share for each call ID:
- When exactly this happened (the timestamp)
- What response you expected to get
- What response you actually got instead

This would really help us figure out what went wrong!

Kyle Brunker

04/10/2025, 11:28 AM

I know you have already shared most of the information, but sharing these details will help me investigate without any confusion or assumptions.

juliointheworld

04/11/2025, 8:49 PM

CALL-ID: f1e8ae95-322f-4d67-b1ae-6a08d5301121

4:15 - 4:27 -> The user gives their cell phone number and the transcription is fine.

4:28 - 4:39 -> The assistant says the correct cell phone number, but the transcription omits one digit (4).
* Expected response - 304(4)126258: Tres cero cuatro CUATRO uno dos seis dos, cinco ocho.
* Actual response - 304126258: Tres cero cuatro. Uno dos seis dos cinco ocho.

4:43 - 5:05 -> Because the number was not transcribed correctly, the function returns an error: the required 10 digits are not all present (even though the number was correctly confirmed to the user). The response is then re-entered and the error is corrected.

juliointheworld

04/11/2025, 8:49 PM

CALL_ID: a76bcbc5-54b1-4038-be35-e18abbdb5c2d

3:19 - 3:30 -> The user says the document number and the transcription misses two digits.
* User response - 986(6)(9)538: Cédula de ciudadanía, nueve ocho seis SEIS NUEVE cinco tres ocho
* User Transcription - 986538: Cédula de ciudadanía, nueve ocho seis cinco tres ocho

3:30 - 3:42 -> The assistant says the document number and the transcription misses one digit.
* AI response - (9)86538: Tipo de documento, cédula de ciudadanía, número de documento. NUEVE Ocho seis cinco tres ocho. Es correcto.
* AI Transcription - 86538: Tipo de documento, cédula de ciudadanía, número de documento. Ocho seis cinco tres ocho. Es correcto.

4:03 - 4:14 -> The assistant says the user's number correctly, but the transcription misses one digit.
* AI response - 9866953(8): Tipo di documento, cédula de ciudadanía, número de documento, nueve ocho seis seis nueve cinco tres OCHO, es correcto?
* AI Transcription - 9866953: Tipo di documento, cédula de ciudadanía, número de documento, nueve ocho seis seis nueve cinco tres correcto?

4:52 - 5:05 -> The user says the phone number, but the transcriber misses one digit.
* User response - 30441262(5)8: Tres cero cuatro cuatro uno dos seis dos CINCO ocho.
* User Transcription - 304412628: Tres cero cuatro cuatro uno dos seis dos ocho.

5:41 - 5:51 -> The assistant says the phone number, but the transcriber misses one digit.
* AI response - 304412625(8): Número de celular. Tres cero cuatro cuatro uno dos seis dos cinco OCHO¿Es correcto?
* AI Transcription - 304412625: Número de celular. Tres cero cuatro cuatro uno dos seis dos cinco ¿Es correcto?

5:55 -> In this call, although the transcriber did not capture the complete numbers, the assistant still sent the correct number to the appointment creation function.

juliointheworld

04/11/2025, 8:50 PM

CALL-ID: 51061192-4aab-415a-9bd9-2ed4f09095f5

3:31 - 3:45 -> The assistant correctly repeats the user's document number, but the transcriber misses three digits in different positions.
* AI response - 986(6)(9)5(3)7: Claro, tipo de documento. Cédula de ciudadanía, número de documento, nueve ocho, seis SEIS NUEVE cinco. TRES Siete. ¿Es correcto?
* AI Transcription - 98657: Claro, tipo de documento. Cédula de ciudadanía, número de documento, nueve ocho, seis cinco. Siete. ¿Es correcto?

juliointheworld

04/11/2025, 8:50 PM

NOTES:
1. I added the numbers as digits to make the errors easier to follow.
2. You can hear the voice changing to a Portuguese accent at most of the specified moments. I opted to switch to an Azure voice, which should not present these issues.
3. After posting this ticket, I tested the two best Deepgram versions for Latin American Spanish: Nova 2 (es-419) and Nova 3 (multi). Both continued to present issues. I chose to keep the Nova 3 (multi) model with the Confidence Threshold set to 0.23; it still works with difficulty.
4. I appreciate any help that lets me lower the failure rate when capturing numbers, as these are critical processes within my workflow. The DTMF keypad option is not something we want to implement.

Thanks in advance for the help. I'm excited about what we are creating and will create with VAPI.

Kyle Brunker

04/11/2025, 10:26 PM

Looking into it

juliointheworld

04/16/2025, 2:50 AM

Checking in to see if you've had a chance to review the issue. I’d really appreciate any updates or input

juliointheworld

04/16/2025, 3:57 PM

@Vapi @Shubham Bajaj

Shubham Bajaj

04/21/2025, 12:24 PM

@juliointheworld Hey, sorry I was out of office so couldn't reply to you earlier. I'll take a look at all of your call IDs now

Shubham Bajaj

04/21/2025, 12:46 PM

@juliointheworld Your callers do say every digit, and the assistant does pronounce every digit, but the STT (speech‑to‑text) engine often collapses two identical digits that come back‑to‑back (“44”→“4”, “66”→“6”, etc.). Because your downstream function expects exactly 10 digits, any missing digit produces validation errors and a painful correction loop
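One way to catch a dropped digit before it reaches the downstream function is sketched below, purely as an illustration. It assumes the transcript arrives as Spanish digit words (as in the call excerpts above) and that the function needs exactly 10 digits. The names `words_to_digits` and `has_expected_length` are hypothetical helpers, not part of any Vapi API.

```python
import re
import unicodedata

# Spanish digit words mapped to digits (illustrative helper data, not a Vapi API).
SPANISH_DIGITS = {
    "cero": "0", "uno": "1", "dos": "2", "tres": "3", "cuatro": "4",
    "cinco": "5", "seis": "6", "siete": "7", "ocho": "8", "nueve": "9",
}


def words_to_digits(transcript: str) -> str:
    """Return the digits spoken in a transcript, in order, ignoring other words."""
    # Lowercase and strip accents so "Número" and "numero" tokenize the same way.
    decomposed = unicodedata.normalize("NFKD", transcript.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    digits = []
    for token in re.findall(r"[a-z]+|[0-9]", plain):
        if token in SPANISH_DIGITS:
            digits.append(SPANISH_DIGITS[token])
        elif token.isdigit():
            digits.append(token)
    return "".join(digits)


def has_expected_length(transcript: str, expected_length: int = 10) -> bool:
    """True when the transcript contains exactly the expected number of digits."""
    return len(words_to_digits(transcript)) == expected_length
```

Run against the first reported call, the expected utterance passes the check and the transcribed one fails it, so the assistant could ask the caller to repeat the number instead of sending nine digits to the function.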

Shubham Bajaj

04/21/2025, 12:52 PM

What you can do here:
• SSML Parsing for 11Labs: Enable SSML parsing for 11Labs and the assistant, then pass digits to the 11Labs voice provider with a break (pause) time between digits.
• Number Chunking: Instruct the assistant to break two-digit groups into individual digits during output generation.
• Automatic Self-Critique: Implement an automatic self-critique that checks the tool call response. If an error is detected, the assistant should apologize, re-spell the digits slowly, confirm with the user, and then retrigger the tool.
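The first and third suggestions could look roughly like the sketch below. It assumes the voice provider honors SSML-style `<break time="..." />` tags once SSML parsing is enabled, and that the tool expects 10 digits. The function names, the 0.4 second pause, and the retry wording are illustrative choices, not values taken from Vapi or 11Labs.

```python
from typing import Optional

# Spanish words for 0-9, indexed by digit value (illustrative helper data).
DIGIT_WORDS_ES = ["cero", "uno", "dos", "tres", "cuatro",
                  "cinco", "seis", "siete", "ocho", "nueve"]


def spell_digits_with_breaks(number: str, pause_seconds: float = 0.4) -> str:
    """Spell each digit as a Spanish word, separated by SSML-style break tags."""
    digits = [ch for ch in number if ch in "0123456789"]
    pause = f' <break time="{pause_seconds}s" /> '
    return pause.join(DIGIT_WORDS_ES[int(d)] for d in digits)


def retry_prompt(number: str, expected_length: int = 10) -> Optional[str]:
    """Return None when the number has the expected length.

    Otherwise return an apology that re-spells the digits slowly so the
    caller can confirm or correct them before the tool is triggered again.
    """
    digits = "".join(ch for ch in number if ch in "0123456789")
    if len(digits) == expected_length:
        return None
    spelled = spell_digits_with_breaks(digits, pause_seconds=0.8)
    return f"Disculpe, entendí {spelled}. ¿Podría repetir el número completo?"
```

Spelling the digits as words with an explicit pause keeps repeated digits such as "cuatro cuatro" audibly separate, and running the length check before the tool call avoids sending a nine-digit number into the validation error loop described above.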
