Improving latency
# support
m
I'm trying to improve latency for my agents. When I check the call logs, I see logs about turn latency which most of the time contain `endpointing` around 1500ms. What does `endpointing` mean? Is there a way to improve this time?
```
Turn latency: 2719ms (transcriber: 232ms, endpointing: 1501ms, kb: N/A, model: 532ms, voice: 406ms)
```
v
Hi there! I see you're looking to improve latency for your Vapi agents 👋 I can definitely explain what "endpointing" means and share some ways to improve this time in your call logs.

## What is endpointing?

Endpointing in Vapi is the process of determining when a user has finished speaking. According to the [Vapi documentation](https://docs.vapi.ai/how-vapi-works), it's "a fancy word for knowing when the user is done speaking." Vapi uses a custom fusion audio-text model to detect when a user has completed their turn, considering both the user's tone and what they're saying to decide how long to pause before sending the input to the LLM. Traditional methods use silence detection with a timeout, but Vapi's approach combines silence detection with machine learning models to properly endpoint conversational speech, preventing improper interruptions while encouraging proper backchanneling.

## How to improve endpointing time

Since your logs show endpointing taking around 1500ms, here are several ways to improve this time:

1. **Adjust Wait Timeout**: Set `waitSeconds` to around 0.8 seconds in your `startSpeakingPlan` configuration:
```json
   "startSpeakingPlan": {
     "waitSeconds": 0.8,
     "smartEndpointingEnabled": false
   }
```

This is suggested as a median value in [Vapi's Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336360452208332831).

2. **Configure Smart Endpointing**: You can enable or disable Smart Endpointing based on your needs. It's off by default but can be turned on for scenarios where you want to avoid interrupting users while they think through responses, as mentioned in the [speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration).

3. **Use Custom Endpointing**: Implement phrase-based rules for custom endpointing, which can help improve response times. There's a [gist with custom endpointing rules](https://gist.github.com/bajajcodes/cd664c81344ef9f27a4d304de43bd313) that might be helpful.

4. **Optimize Vapi Configuration**: Review settings like `silenceTimeoutSeconds`, `responseDelaySeconds`, and `llmRequestDelaySeconds` that might affect response times (see the sketch after the sources below).

5. **If using Deepgram**: Set Deepgram's endpointing to 300ms, as suggested in [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272).

6. **Consider Hosting Location**: Host Vapi in a region closer to your users to reduce latency. By default, Vapi servers are in US regions, but you can consider on-premises solutions for specific regions.

Is there anything specific about these suggestions you'd like me to elaborate on?

Sources:
- [Vapi documentation](https://docs.vapi.ai/how-vapi-works)
- [Vapi glossary](https://docs.vapi.ai/glossary)
- [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336360452208332831)
- [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
- [Discord discussions on Deepgram](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272)
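For items 4 and 5, here's a minimal sketch of how those settings sit on the assistant object (the values are illustrative rather than recommendations, and the Deepgram `endpointing` field should be verified against the transcriber schema in the API reference):

```json
{
  "silenceTimeoutSeconds": 30,
  "responseDelaySeconds": 0.4,
  "llmRequestDelaySeconds": 0.1,
  "transcriber": {
    "provider": "deepgram",
    "endpointing": 300
  }
}
```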
m
We use the default "startSpeakingPlan" config, whose default value for waitSeconds is 0.4. Why are you suggesting I increase it? @User
How can I configure "On No Punctuation Seconds" using the API @Vapi Support Bot?
v
Hi there! 👋 I can help you with configuring the "On No Punctuation Seconds" setting using the Vapi API. I'm unable to address why the previous response suggested increasing waitSeconds from 0.4 to 0.8, as I don't have that specific reasoning. However, I can definitely help with your current question about configuring "On No Punctuation Seconds" via the API!

## How to configure "On No Punctuation Seconds" using the API

The `onNoPunctuationSeconds` parameter is located within the `transcriptionEndpointingPlan` object, which is nested inside the `startSpeakingPlan` configuration. According to the [Vapi API documentation](https://docs.vapi.ai/api-reference/assistants/create-assistant), you can configure it as follows:
```json
"startSpeakingPlan": {
    "waitSeconds": 0.4,
    "smartEndpointingEnabled": false,
    "customEndpointingRules": [
        {
            "type": "assistant",
            "regex": "regex",
            "timeoutSeconds": 1.1
        }
    ],
    "transcriptionEndpointingPlan": {
        "onPunctuationSeconds": 0.1,
        "onNoPunctuationSeconds": 1.5,
        "onNumberSeconds": 0.5
    }
}
```
To update this setting, you'll need to use the Vapi API's update endpoint for assistants. Based on the [information from Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1333057366748565514), you can do this using either the Vapi API client or another API client like Postman. The API endpoint documentation can be found at: [https://docs.vapi.ai/api-reference/assistants/update](https://docs.vapi.ai/api-reference/assistants/update?playground=%2Fapi-reference%2Fassistants%2Fupdate)
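As a hedged sketch (the assistant ID is a placeholder, and the merge behavior for nested objects should be checked against the update-assistant reference above), the body of a `PATCH https://api.vapi.ai/assistant/{assistantId}` request could look like:

```json
{
  "startSpeakingPlan": {
    "transcriptionEndpointingPlan": {
      "onPunctuationSeconds": 0.1,
      "onNoPunctuationSeconds": 1.5,
      "onNumberSeconds": 0.5
    }
  }
}
```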
For context, the `onNoPunctuationSeconds` parameter (default value 1.5) is used when a customer has been silent for a while and has finished speaking, but the transcriber isn't confident enough to add punctuation to the transcription, as explained in the [speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration).

Is there anything specific about configuring this parameter that you'd like me to explain further?

Sources:
- [Vapi API documentation](https://docs.vapi.ai/api-reference/assistants/create-assistant)
- [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1333057366748565514)
- [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
m
OK. My goal now is to improve the TTS so it includes punctuation at the end of calls. Are there some settings I can play with? @Vapi Support Bot
v
Hi there! 👋 I'd be happy to help you improve TTS punctuation at the end of calls in Vapi. There are several settings you can adjust to improve how TTS handles punctuation at the end of calls:

## Voice Input Formatting

Enable the `Voice Input Formatted` feature, which specifically fixes punctuation and makes text more readable for TTS. According to the [Vapi documentation](https://docs.vapi.ai/assistants/voice-formatting-plan), this feature orchestrates mini-functions that together fix punctuation and make text more readable out loud.

## Chunk Plan Configuration

The `chunkPlan` settings control how text is processed before being sent to the TTS provider:
```json
{
  "voice": {
    "chunkPlan": {
      "enabled": true,
      "minCharacters": 30, // Adjust between 1-80
      "punctuationBoundaries": [".", "!", "?", ";"] // Customize based on your needs
    }
  }
}
```

According to [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1343543475383046185), keeping `chunkPlan.enabled: true` is particularly important for proper sentence boundary detection using language-appropriate punctuation.

## Transcription Endpointing Plan

The `onNoPunctuationSeconds` parameter (which you asked about earlier) is particularly relevant here:
```json
"startSpeakingPlan": {
    "transcriptionEndpointingPlan": {
        "onPunctuationSeconds": 0.1,
        "onNoPunctuationSeconds": 1.5,
        "onNumberSeconds": 0.5
    }
}
```

This parameter is used when a customer has been silent for a while and has finished speaking, but the transcriber isn't confident enough to add punctuation to the transcription.

## Input Punctuation Boundaries

Use the `assistant.voice.inputPunctuationBoundaries` setting to help determine the end of a complete message. When the voice input contains punctuation listed in this setting, it's considered a valid boundary, indicating the end of a complete message.

## SSML Support

Enable SSML (Speech Synthesis Markup Language) parsing with the `enableSsmlParsing` parameter, which allows for more control over speech synthesis, including punctuation.

## Voice Provider Settings

If you're using ElevenLabs, you can adjust parameters like `stability` and `similarityBoost`, which can influence the overall quality of the TTS output. Try a call with `stability: 0.5` and `similarityBoost: 0.75`.
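Putting those voice-level options together, here's a hedged sketch (field names come from the settings above; `stability` and `similarityBoost` apply only to ElevenLabs voices, and the exact placement should be verified against the assistant schema):

```json
{
  "voice": {
    "inputPunctuationBoundaries": [".", "!", "?"],
    "enableSsmlParsing": true,
    "stability": 0.5,
    "similarityBoost": 0.75
  }
}
```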
## Model Output in Messages

Enable `modelOutputInMessagesEnabled` to use the model's output instead of transcribed output in messages, which may improve TTS:
```bash
curl -X PATCH https://api.vapi.ai/assistant/your-assistant-id \
     -H "Authorization: Bearer your-private-key" \
     -H "Content-Type: application/json" \
     -d '{
          "modelOutputInMessagesEnabled": true
        }'
```

Would you like me to elaborate on any of these specific settings?

Sources:
- [Vapi documentation](https://docs.vapi.ai/assistants/voice-formatting-plan)
- [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1343543475383046185)
- [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
a
Hey! To help track down this issue, could you share:
- The call ID
- When exactly this happened (the timestamp)

This would really help us figure out what went wrong!
m
Hey @Shubham Bajaj, here is a callId: https://dashboard.vapi.ai/calls/24203066-9f79-4ff1-bfd0-7e8d11c35227

Log:

```
12:11:39:025
[INFO]
Turn latency: 2349ms (transcriber: 211ms, endpointing: 1500ms, kb: N/A, model: 406ms, voice: 188ms)
```

I want to improve endpointing time.
a
Hey Mareem, can you try one call without your 11Labs key? Can you try another call with your 11Labs key and Deepgram endpointing set to 300ms? Can you try a final third call with Deepgram endpointing set to 300ms and without your 11Labs key? You will notice a difference in all three of them, and my bet is on the last option, the third call.

To give you more on this: you need to set the Deepgram endpointing to 300ms, configure your Start Speaking Plan waitSeconds to 0.6, and play around with the Start Speaking Plan transcription endpointing plan according to your requirements and the call. You cannot use the Smart Endpointing Plan of the Start Speaking Plan because it only works for English transcriptions.

The conclusion is to try one call with Deepgram endpointing set to 300ms and without your 11Labs key, and then tune the Start Speaking Plan according to your call requirements. The Start Speaking Plan smart endpointing plan does not work for non-English transcriptions.
m
Lots of insights, thanks @Shubham Bajaj. A couple of questions before starting my tests:
- I use Cartesia for these calls. Why should I remove the 11Labs key? (Is there a side effect?)
- I didn't activate the Smart Endpointing Plan because it's mentioned it only works for English. So OK for that.
- My current start speaking plan uses the default waitSeconds value, which is 0.4. You suggest changing it to 0.6? Is that because I need to increase Deepgram endpointing to 300ms?
a
looking into it.
Mareem, thanks for pointing it out. I mistakenly mixed up your call with some other call, sorry for that. You are using Cartesia, and your 11Labs key has no side effects on it. Again, smart endpointing was a mistake on my side.

Now coming to endpointing. What's happening here? It's the time taken to identify when the user has stopped speaking. Setting Deepgram endpointing to 300ms gives Deepgram enough time to identify that the user has finished speaking. Meanwhile, we are also checking in parallel whether the user has finished speaking, and setting waitSeconds to 0.6 complements the Deepgram endpointing.

The next thing you need to do is play around with the transcription endpointing plan, because your call language is non-English. In this case you will want to set onPunctuationSeconds to 0.1, onNumberSeconds to 0.5, and onNoPunctuationSeconds between 1 and 1.2 (because it might get a little aggressive on non-English transcriptions). Finally, to get the best results, use A/B testing, where you compare two calls and see what works best for you in the real environment.
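Putting these suggestions into one place, a hedged config sketch might look like this (the Deepgram `endpointing` field name should be verified against the transcriber schema, and the numbers are just the starting points discussed above, to be tuned through A/B testing):

```json
{
  "transcriber": {
    "provider": "deepgram",
    "endpointing": 300
  },
  "startSpeakingPlan": {
    "waitSeconds": 0.6,
    "transcriptionEndpointingPlan": {
      "onPunctuationSeconds": 0.1,
      "onNoPunctuationSeconds": 1.2,
      "onNumberSeconds": 0.5
    }
  }
}
```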
Hello, how can I help you further?
m
OK thanks. During my tests, I realised most of the issues come from single utterances not being caught sometimes ("oui", "Allo", ...). I increased Deepgram endpointing to 300ms and it seems better. I am also performing some tests with other transcribers, nova-3 with multi and Google gemini-2.0-flash, to check if they behave better in the case of single utterances. Maybe you have insights about these two transcription models? Also, I was wondering if there is a way to override some of the transcription, for instance with Deepgram. When Deepgram returns "Âllo", I would like to override it to "Âllo?" to trigger the onPunctuationSeconds. Do you know if it's feasible in some way?
I found `customEndpointingRules` on the startSpeakingPlan, I'll experiment with it.
What's the expected format for the customEndpointingRules regex? I am experimenting with `"regex": "/^oui(?:, allô)?$/i"` but that does not seem to match the transcriber output. Maybe it's not the expected format?
@Vapi Support Bot
v
I see you're working with the `customEndpointingRules` in Vapi and having trouble with the regex format. Based on the information I have, the `regex` field within `customEndpointingRules` is expected to be a string, but I don't have specific details on the exact regex syntax or format that Vapi expects.

Could you share what you're trying to accomplish with this regex pattern? Are you trying to detect specific French greetings like "oui" and "allô" to trigger custom endpointing behavior?

Also, have you tried the regex without the JavaScript-style delimiters (`/` and `/i`)? For example, instead of:
```json
"regex": "/^oui(?:, allô)?$/i"
```
Have you tried:
```json
"regex": "^oui(?:, allô)?$"
```
Or perhaps a simpler pattern to test if the basic functionality works?

Sources:
- [Vapi API documentation](https://docs.vapi.ai/api-reference/assistants/create-assistant)
- [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1335676323846291537)
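For reference, the create-assistant schema linked above also lists a `regexOptions` field on each rule, so a case-insensitive variant could be sketched like this (a hedged example rather than a confirmed format; the `type` and `timeoutSeconds` values are illustrative):

```json
{
  "type": "customer",
  "regex": "^oui(?:, allô)?$",
  "regexOptions": [
    { "type": "ignore-case", "enabled": true }
  ],
  "timeoutSeconds": 0.1
}
```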
m
I tried your suggestion on this call: 9d6def19-565a-424c-825e-05d3cf79e613

```json
"regex": "^oui(?:, allô)?$",
        "regexOptions": [
          {
            "type": "ignore-case",
            "enabled": true
          }
        ],
"timeoutSeconds": 0.1
```

I see in the logs:
```
11:22:49:890
[LOG]
Transcriber output: Oui,

11:22:49:891
[LOG]
Endpointing timeout 1000ms (rule: `heuristic`)
```

So it didn't match.
OK, it seemed to work on another call: [LOG] Endpointing timeout 100ms (rule `customer`: ^oui(?:, allô)?$ on Oui). So I can iterate from here.
Thanks for the inputs @Shubham Bajaj, you can close the issue.
a
Hi, checking if this is resolved/solved for you?
m
Yes, it's resolved. Thanks
a
Marking this ticket as Solved ✅