11labs Voice not being generated with high latency...
# support
a
Our team ran into a problem: Vapi does not generate assistant speech during long function calls on our custom LLM backend (10 s, for example). Nothing we have tried fixes it; we tried using tags, but that doesn't help. Example call IDs: 0818c973-6730-42b4-8c8c-351821489856 9e0d2bdd-5722-4ecd-bbcd-d6317758a607 Sometimes we see "[CHECKPOINT] Voice audio requested" in the logs, but no audio arrives even after that. In some cases there is no "Voice audio requested" at all! We use 11labs Eleven Turbo v2.5 with the Sarah voice. I've even made a repo to test this case, you can test it too! https://github.com/ArtemPolydom/advanced-concepts-custom-llm Is there a solution? It is a disaster!
The funny thing is that it does generate the audio sometimes, randomly.
@User @User
s
@ArtemHorik Here are a few things you need to do to make it work: First, set Deepgram's endpointing to 300 ms, then set startSpeakingPlan waitSeconds to 0.8 seconds. With those settings, the user input will be captured correctly and there will be time to process it. Also note that using ngrok adds some latency.
Basically, what happened is this: the user input was captured and a request was sent to your server. Before the server responded, the user spoke again and another request was sent, which caused the previous one to be discarded. The server then took time to respond, which resulted in the few back-and-forth cycles visible in the logs screenshot.
My suggestion is to adjust the endpointing and startSpeakingPlan settings, and to host the backend on a platform like Replit. When you try another call, you should notice the difference in latency. Do let me know how it goes!
https://cdn.discordapp.com/attachments/1331005546349985883/1331394807310057492/Screenshot_2025-01-22_at_04.13.32.png?ex=6791758a&is=6790240a&hm=cfa8463e27795a8cb292eb93a3e1452bb2cacb3fc35f2adfc11ee1247164c5f1&
https://cdn.discordapp.com/attachments/1331005546349985883/1331394807880224839/Screenshot_2025-01-22_at_04.16.34.png?ex=6791758a&is=6790240a&hm=662fe23261b63b501c14a4fcfdd81024714290ed2134e717f230824d2dd2af97&
a
@Shubham Bajaj There are no Deepgram endpointing settings in the agent dashboard. But I'll try somehow, thank you.
s
@ArtemHorik Here's an example curl request that updates your assistant config with the requested changes. Do let me know if you need further help.
curl -X PATCH https://api.vapi.ai/assistant/your-assistant-id-here \
     -H "Authorization: Bearer token" \
     -H "Content-Type: application/json" \
     -d '{
  "transcriber": {
    "provider": "deepgram",
    "language": "en",
    "model": "nova-2-phonecall",
    "endpointing": 300
  },
  "startSpeakingPlan": {
    "waitSeconds": 0.8
  }
}'
a
@Shubham Bajaj This didn't really solve the problem. I'm not saying anything after the request, so there is no interruption. But your backend still sometimes doesn't make a request for audio generation (it seems to wait until my speech ends, a freaking 10 seconds; how is that possible?). Why does it work perfectly with other voice providers but not with 11labs? If the problem only occurs with 11labs, then it does not depend on the transcriber settings. Is there a solution?
024fdcb8-637f-4b8c-9fd5-1550ce4625f2 This call was made with endpointing set to 300, and it failed to generate a voice twice. You can check.
s
@ArtemHorik Thank you for bringing this to my attention. It might be an issue on our end. Please allow me some time to investigate whether it is related to our system or 11labs.
@ArtemHorik As the screenshots and logs show, in two cases the 11labs connection had already timed out and closed by the time the first output tokens were received. That is what caused the awkward silence during the call. I suggest sending the tokens as early as possible to prevent this in the future.
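One way to read "send the tokens as early as possible" is to stream a short filler phrase before the slow function call starts. The sketch below is only an illustration under assumptions: it assumes the custom LLM endpoint streams OpenAI-style `chat.completion.chunk` events over SSE, and the names `sse_chunk`, `stream_with_early_filler`, the filler text, and the `id`/`model` values are all hypothetical. Whether the filler reaches ElevenLabs early enough to avoid the timeout is not confirmed in this thread.

```python
import json
import time
from typing import Callable, Iterator, Optional


def sse_chunk(text: str, finish_reason: Optional[str] = None) -> str:
    """Format one OpenAI-style chat.completion.chunk as a server-sent event."""
    payload = {
        "id": "chatcmpl-sketch",  # hypothetical id, for illustration only
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": "custom-llm",  # hypothetical model name
        "choices": [
            {
                "index": 0,
                "delta": {"content": text} if text else {},
                "finish_reason": finish_reason,
            }
        ],
    }
    return f"data: {json.dumps(payload)}\n\n"


def stream_with_early_filler(
    slow_call: Callable[[], str],
    filler: str = "One moment while I check that. ",
) -> Iterator[str]:
    """Yield speakable text first, then run the slow call and stream its result."""
    # Emit the filler before the slow work starts, so the TTS side
    # receives input instead of waiting in silence.
    yield sse_chunk(filler)
    # The long-running function call (10 s in the reported case) happens here.
    result = slow_call()
    yield sse_chunk(result)
    yield sse_chunk("", finish_reason="stop")
    yield "data: [DONE]\n\n"
```

In a web framework, the generator would be returned as a streaming response with the `text/event-stream` content type, so the first chunk leaves the server before `slow_call` begins.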
logs
🔵 23:15:22:026 ElevenLabs (WebSocket #21) Connecting... URL: wss://api.elevenlabs.io/v1/text-to-speech/oooop-hidden/stream-input?output_format=pcm_44100&optimize_streaming_latency=3&model_id=eleven_turbo_v2_5&enable_logging=false
🔵 23:15:22:121 ElevenLabs (WebSocket #21) Connected.
🔵 23:15:22:121 ElevenLabs (WebSocket #21) Sending BOS Message...
🔵 23:15:34:739 CustomLLMRequest Messages: --request-aborted-because-of-user-input-recieved-again--
🔵 23:15:34:849 CustomLLMRequest Messages:
🔵 23:15:35:914 [user CHECKPOINT] Model sent start token
🔵 23:15:42:188 ElevenLabs (WebSocket #21) Input Timeout Exceeded.
🔵 23:15:46:313 [user CHECKPOINT] Model sent first output token
🟡 23:15:46:434 ElevenLabs (Websocket #21) State "closed". Punting 56... "Thank you for your patience. Here is the payment link: ["

🔵 23:13:43:408 ElevenLabs (WebSocket #6) Connecting... URL: wss://api.elevenlabs.io/v1/text-to-speech/hidden/stream-input?output_format=pcm_44100&optimize_streaming_latency=3&model_id=eleven_turbo_v2_5&enable_logging=false
🔵 23:13:43:495 ElevenLabs (WebSocket #6) Sending BOS Message...
🔵 23:13:54:846 CustomLLMRequest Messages:
🔵 23:14:03:570 ElevenLabs (WebSocket #6) Input Timeout Exceeded.
🔵 23:14:06:539 [user CHECKPOINT] Model sent first output token
🟡 23:14:06:858 ElevenLabs (Websocket #6) State "closed". Punting 72... "Apologies for the confusion. Here is the correct payment link for you: ["

https://cdn.discordapp.com/attachments/1331005546349985883/1332618742408155189/Screenshot_2025-01-25_at_1.18.38_PM.png?ex=6795e96b&is=679497eb&hm=5d2200fe72702b7ae7773ad97138d976893402a05a546554428014d4da92c175&
https://cdn.discordapp.com/attachments/1331005546349985883/1332618742739238964/Screenshot_2025-01-25_at_1.07.59_PM.png?ex=6795e96b&is=679497eb&hm=df8842a20f01bd4b9d969d4175288514e010a40ce2d645004186a053b78baa7e&
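The timing in these two excerpts can be checked with a few lines of Python. This is a small sketch that only does arithmetic on the timestamps copied from the logs above; the helper name `seconds_between` and the dictionary layout are made up for illustration, and it assumes the last timestamp field is milliseconds.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Seconds between two same-day log timestamps in HH:MM:SS:mmm form."""
    fmt = "%H:%M:%S:%f"  # %f reads the trailing 3 digits as milliseconds
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the two log excerpts above.
excerpts = {
    "WebSocket #21": {
        "bos": "23:15:22:121",
        "timeout": "23:15:42:188",
        "first_token": "23:15:46:313",
    },
    "WebSocket #6": {
        "bos": "23:13:43:495",
        "timeout": "23:14:03:570",
        "first_token": "23:14:06:539",
    },
}

for name, t in excerpts.items():
    to_timeout = seconds_between(t["bos"], t["timeout"])
    late_by = seconds_between(t["timeout"], t["first_token"])
    print(f"{name}: timeout {to_timeout:.3f}s after BOS, "
          f"first token {late_by:.3f}s after the timeout")
```

In both excerpts the "Input Timeout Exceeded" line appears about 20 seconds after the BOS message (20.067 s and 20.075 s), and the first output token arrives about 4.1 s and 3.0 s after that. This suggests an input timeout of roughly 20 seconds in this setup, counted from when the WebSocket was opened rather than from when the LLM request was sent.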