Number pronunciation is correct in the audio but t...
# support
g
Hello, for the following call 2da027a4-a29f-4ed3-a168-61ca4ded7cf9: at around minute 3:01 you can hear the assistant correctly pronounce the number, but the transcription is completely different and wrong. It's not the first time I've noticed this. How is the transcription handled?
s
Are you using the API? You can opt in for it to return the model's response instead of the transcription response. You might also be able to do this in the dashboard, but I'm not aware of that method.
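For what it's worth, the opt-in described here is an assistant-level flag; a minimal sketch, assuming the flag is named `modelOutputInMessagesEnabled` (verify the exact name against the current Vapi API reference):

    {
      "modelOutputInMessagesEnabled": true
    }

With this enabled, the messages/analysis should be built from what the model actually said rather than from the re-transcribed audio, which sidesteps transcription errors like this one.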
s
logs
🔵 13:34:09:337 Voice Input Formatted: "The salary range for the retail store manager position is between fifty-five thousand and seventy thousand dollars per year.", Original: "The salary range for the retail store manager position is between fifty-five thousand and seventy thousand dollars per year."
🔵 13:34:16:478 assistant Final Transcript: between 125000 dollars per year.: 0.98535156

@giorgior27 could you please share other call IDs that are experiencing this issue? It looks like it can be solved using endpointing, but I wanted to be sure of it.
g
I will search for the others. About endpointing, do you mean the setting in the transcriber? I changed it to 300.
Just reproduced that bug.
Call ID: e779bcfa-ff70-44b6-83b9-fadfa860a928
Time: minute 3:19
Same issue: the voice says between 55k and 70k, but the transcription (and consequently the analysis that is based on the transcription) reports 125k.
s
🔵 09:09:17:474 Voice Input Formatted: "The salary range for the retail store manager position is between fifty-five thousand and seventy thousand dollars per year.", Original: "The salary range for the retail store manager position is between fifty-five thousand and seventy thousand dollars per year."
🔵 09:09:20:876 Conversation Buffer Appending Bot Message. Emitting "conversation-update"...

    {
      "role": "bot",
      "message": "The salary range for the retail store manager position is",
      "time": 1740474559081,
      "endTime": 1740474562131,
      "secondsFromStart": 200.51,
      "duration": 3050,
      "source": ""
    }

🔵 09:09:24:781 Conversation Buffer Appending Bot Message. Emitting "conversation-update"...

    {
      "role": "bot",
      "message": "between 125000 dollars per year.",
      "time": 1740474562131,
      "endTime": 1740474565891,
      "secondsFromStart": 203.56,
      "duration": 3760,
      "source": ""
    }

@giorgior27 https://cdn.discordapp.com/attachments/1342583376707850250/1344143625394851920/Screenshot_2025-02-26_at_8.32.17_AM.png?ex=67bfd6cd&is=67be854d&hm=b198af0fcf7719a5d66d16e91f9903f3e660b711a9609f781c9f0826701eec34&
@giorgior27 To improve this, you have a few options:

1. Use the model output instead of the transcription for analysis.
2. Adjust the transcription endpointing settings. This helps by giving the transcriber more time to process number sequences accurately:

    startSpeakingPlan: {
      transcriptionEndpointingPlan: {
        onNumberSeconds: 1.0 // increase from the default 0.5 s to give more time for number processing
        // keep the defaults for the other punctuation settings, or adjust based on your needs
      }
    }

By increasing onNumberSeconds, you give the transcription system more time to accurately process number sequences before finalizing the transcript, which can help improve accuracy for complex number patterns like ranges.
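To make the effect of `onNumberSeconds` concrete, here is a conceptual sketch (not Vapi's actual implementation) of what number-aware endpointing does: when the partial transcript ends in a number word, the endpointer waits longer before finalizing, so a range like "between fifty-five thousand and seventy thousand" isn't cut off mid-phrase. The regex and timing constants are illustrative assumptions.

```python
import re

# Illustrative defaults: short wait after punctuation, longer wait after numbers
# (mirroring the onNumberSeconds: 1.0 setting discussed above).
DEFAULT_WAIT_SECONDS = 0.1
ON_NUMBER_SECONDS = 1.0

def endpoint_wait(partial_transcript: str) -> float:
    """Return how long to wait for more speech before finalizing the transcript."""
    # If the utterance ends in a digit or a number word, it is likely mid-sequence
    # (e.g. "between fifty-five thousand ..."), so wait longer.
    if re.search(r"(\d|thousand|hundred|million)\s*$", partial_transcript.strip(), re.I):
        return ON_NUMBER_SECONDS
    return DEFAULT_WAIT_SECONDS

print(endpoint_wait("between fifty-five thousand"))  # 1.0 -> keep listening
print(endpoint_wait("per year."))                    # 0.1 -> safe to finalize
```

The design point: finalizing too eagerly mid-number forces the transcriber to guess at an incomplete sequence, which is exactly how "fifty-five thousand and seventy thousand" can collapse into a wrong figure like "125000".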
g
Yes, playing with the startSpeakingPlan is what I'm trying now. Thank you for the suggestions.
Does the 300 setting for the Deepgram transcriber endpointing also impact this? I saw it was 10 by default, but I followed the suggestion to switch to 300 ms.
@Shubham Bajaj sorry, can you tell me whether changing the endpointing setting in the transcriber from 10 to 300 was the right move?
a
Hey @giorgior27, sorry for the delay. Yes, it is advised to set the Deepgram endpointing value to 300 ms for better results.
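For reference, that value lives on the transcriber config; a minimal sketch, assuming Deepgram as the provider (endpointing is in milliseconds, so 300 means "wait 300 ms of silence before finalizing"):

    {
      "transcriber": {
        "provider": "deepgram",
        "endpointing": 300
      }
    }

This works alongside the startSpeakingPlan settings discussed earlier: the transcriber-level endpointing controls Deepgram's own silence threshold, while transcriptionEndpointingPlan controls how long the platform waits on top of that.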