Number pronunciation is correct in the audio but t...
# support
g
Hello, for the following call 2da027a4-a29f-4ed3-a168-61ca4ded7cf9: at around minute 3:01 you can hear the assistant correctly pronounce the number, but the transcription is completely different and wrong. It's not the first time I've noticed this. How is the transcription handled?
s
Are you using the API? You can opt in for it to return the model's response instead of the transcription response. You might also be able to do this in the dashboard, but I'm not aware of that method.
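For what it's worth, the opt-in described here is an assistant-level flag; a minimal sketch, assuming the flag is named `modelOutputInMessagesEnabled` (verify the exact name against the current Vapi API reference):

    {
      "modelOutputInMessagesEnabled": true
    }

With this enabled, the messages/analysis should be built from what the model actually said rather than from the re-transcribed audio, which sidesteps transcription errors like this one.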
s
logs
🔵 13:34:09:337 Voice Input Formatted: "The salary range for the retail store manager position is between fifty-five thousand and seventy thousand dollars per year.", Original: "The salary range for the retail store manager position is between fifty-five thousand and seventy thousand dollars per year."
🔵 13:34:16:478 assistant Final Transcript: between 125000 dollars per year.: 0.98535156

@giorgior27 could you please share other call IDs that are experiencing this issue? It looks like it can be solved using endpointing, but I wanted to be sure of it.
g
I will search for the others. About endpointing, do you mean the setting in the transcriber? I changed it to 300.
Just reproduced that bug.
Call ID: e779bcfa-ff70-44b6-83b9-fadfa860a928
Time: minute 3:19
Same issue: the voice says between 55k and 70k, but the transcription (and consequently the analysis that is based on the transcription) reports 125k.
s
🔵 09:09:17:474 Voice Input Formatted: "The salary range for the retail store manager position is between fifty-five thousand and seventy thousand dollars per year.", Original: "The salary range for the retail store manager position is between fifty-five thousand and seventy thousand dollars per year."
🔵 09:09:20:876 Conversation Buffer Appending Bot Message. Emitting "conversation-update"...

    {
      "role": "bot",
      "message": "The salary range for the retail store manager position is",
      "time": 1740474559081,
      "endTime": 1740474562131,
      "secondsFromStart": 200.51,
      "duration": 3050,
      "source": ""
    }

🔵 09:09:24:781 Conversation Buffer Appending Bot Message. Emitting "conversation-update"...

    {
      "role": "bot",
      "message": "between 125000 dollars per year.",
      "time": 1740474562131,
      "endTime": 1740474565891,
      "secondsFromStart": 203.56,
      "duration": 3760,
      "source": ""
    }

@giorgior27 https://cdn.discordapp.com/attachments/1342583376707850250/1344143625394851920/Screenshot_2025-02-26_at_8.32.17_AM.png?ex=67bfd6cd&is=67be854d&hm=b198af0fcf7719a5d66d16e91f9903f3e660b711a9609f781c9f0826701eec34&
@giorgior27 To improve this, you have a few options:

1. Use the model output instead of the transcription for analysis.
2. Adjust the transcription endpointing settings. This helps by giving the transcriber more time to process number sequences accurately:

    startSpeakingPlan: {
      transcriptionEndpointingPlan: {
        onNumberSeconds: 1.0 // increase from the default 0.5 s to give more time for number processing
        // keep the defaults for the other punctuation settings, or adjust based on your needs
      }
    }

By increasing onNumberSeconds, you give the transcription system more time to accurately process number sequences before finalizing the transcript, which can help improve accuracy for complex number patterns like ranges.
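To make the effect of `onNumberSeconds` concrete, here is a conceptual sketch (not Vapi's actual implementation) of what number-aware endpointing does: when the partial transcript ends in a number word, the endpointer waits longer before finalizing, so a range like "between fifty-five thousand and seventy thousand" isn't cut off mid-phrase. The regex and timing constants are illustrative assumptions.

```python
import re

# Illustrative defaults: short wait after punctuation, longer wait after numbers
# (mirroring the onNumberSeconds: 1.0 setting discussed above).
DEFAULT_WAIT_SECONDS = 0.1
ON_NUMBER_SECONDS = 1.0

def endpoint_wait(partial_transcript: str) -> float:
    """Return how long to wait for more speech before finalizing the transcript."""
    # If the utterance ends in a digit or a number word, it is likely mid-sequence
    # (e.g. "between fifty-five thousand ..."), so wait longer.
    if re.search(r"(\d|thousand|hundred|million)\s*$", partial_transcript.strip(), re.I):
        return ON_NUMBER_SECONDS
    return DEFAULT_WAIT_SECONDS

print(endpoint_wait("between fifty-five thousand"))  # 1.0 -> keep listening
print(endpoint_wait("per year."))                    # 0.1 -> safe to finalize
```

The design point: finalizing too eagerly mid-number forces the transcriber to guess at an incomplete sequence, which is exactly how "fifty-five thousand and seventy thousand" can collapse into a wrong figure like "125000".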
g
Yes, playing with the startSpeakingPlan is what I'm trying now. Thank you for the suggestions.
Does the 300 setting for the Deepgram transcriber endpointing also impact this? I saw it was 10 by default, but I followed the suggestion to switch to 300 ms.
@Shubham Bajaj sorry, can you tell me whether changing the endpointing setting in the transcriber from 10 to 300 was the right move?
a
Hey @giorgior27, sorry for the delay. Yes, it is advised to set the Deepgram endpointing value to 300 ms for better results.
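For reference, that value lives on the transcriber config; a minimal sketch, assuming Deepgram as the provider (endpointing is in milliseconds, so 300 means "wait 300 ms of silence before finalizing"):

    {
      "transcriber": {
        "provider": "deepgram",
        "endpointing": 300
      }
    }

This works alongside the startSpeakingPlan settings discussed earlier: the transcriber-level endpointing controls Deepgram's own silence threshold, while transcriptionEndpointingPlan controls how long the platform waits on top of that.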