Improving latency
# support
m
I'm trying to improve latency for my agents. When I check the call logs, I see logs about turn latency which most of the time contain `endpointing` around 1500ms. What does `endpointing` mean? Is there a way to improve this time?
```
Turn latency: 2719ms (transcriber: 232ms, endpointing: 1501ms, kb: N/A, model: 532ms, voice: 406ms)
```
v
Hi there! I see you're looking to improve latency for your Vapi agents 👋 I can definitely explain what "endpointing" means and share some ways to improve this time in your call logs.

## What is endpointing?

Endpointing in Vapi is the process of determining when a user has finished speaking. According to the [Vapi documentation](https://docs.vapi.ai/how-vapi-works), it's "a fancy word for knowing when the user is done speaking." Vapi uses a custom fusion audio-text model to detect when a user has completed their turn, considering both the user's tone and what they're saying to decide how long to pause before sending the input to the LLM. Traditional methods use silence detection with a timeout, but Vapi's approach combines silence detection with machine learning models to properly endpoint conversational speech, preventing improper interruptions while encouraging proper backchanneling.

## How to improve endpointing time

Since your logs show endpointing taking around 1500ms, here are several ways to improve this time:

1. **Adjust Wait Timeout**: Set `waitSeconds` to around 0.8 seconds in your `startSpeakingPlan` configuration:
```json
   "startSpeakingPlan": {
     "waitSeconds": 0.8,
     "smartEndpointingEnabled": false
   }
```

This is suggested as a median value in [Vapi's Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336360452208332831).

2. **Configure Smart Endpointing**: You can enable or disable Smart Endpointing based on your needs. It's off by default but can be turned on for scenarios where you want to avoid interrupting users while they think through responses, as mentioned in the [speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration).

3. **Use Custom Endpointing**: Implement phrase-based rules for custom endpointing, which can help improve response times. There's a [gist with custom endpointing rules](https://gist.github.com/bajajcodes/cd664c81344ef9f27a4d304de43bd313) that might be helpful.

4. **Optimize Vapi Configuration**: Review settings like `silenceTimeoutSeconds`, `responseDelaySeconds`, and `llmRequestDelaySeconds` that might affect response times (see the sketch after the sources below).

5. **If using Deepgram**: Set Deepgram's endpointing to 300ms, as suggested in [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272).

6. **Consider Hosting Location**: Host Vapi in a region closer to your users to reduce latency. By default, Vapi servers are in US regions, but you can consider on-premises solutions for specific regions.

Is there anything specific about these suggestions you'd like me to elaborate on?

Sources:
- [Vapi documentation](https://docs.vapi.ai/how-vapi-works)
- [Vapi glossary](https://docs.vapi.ai/glossary)
- [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336360452208332831)
- [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
- [Discord discussions on Deepgram](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272)
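For items 4 and 5, here's a minimal sketch of how those settings sit on the assistant object (the values are illustrative rather than recommendations, and the Deepgram `endpointing` field should be verified against the transcriber schema in the API reference):

```json
{
  "silenceTimeoutSeconds": 30,
  "responseDelaySeconds": 0.4,
  "llmRequestDelaySeconds": 0.1,
  "transcriber": {
    "provider": "deepgram",
    "endpointing": 300
  }
}
```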
m
We use the default "startSpeakingPlan" config, whose default value for waitSeconds is 0.4. Why are you suggesting I increase it? @User
How can I configure "On No Punctuation Seconds" using the API @Vapi Support Bot?
v
Hi there! 👋 I can help you with configuring the "On No Punctuation Seconds" setting using the Vapi API. I'm unable to address why the previous response suggested increasing waitSeconds from 0.4 to 0.8, as I don't have that specific reasoning. However, I can definitely help with your current question about configuring "On No Punctuation Seconds" via the API!

## How to configure "On No Punctuation Seconds" using the API

The `onNoPunctuationSeconds` parameter is located within the `transcriptionEndpointingPlan` object, which is nested inside the `startSpeakingPlan` configuration. According to the [Vapi API documentation](https://docs.vapi.ai/api-reference/assistants/create-assistant), you can configure it as follows:
```json
"startSpeakingPlan": {
    "waitSeconds": 0.4,
    "smartEndpointingEnabled": false,
    "customEndpointingRules": [
        {
            "type": "assistant",
            "regex": "regex",
            "timeoutSeconds": 1.1
        }
    ],
    "transcriptionEndpointingPlan": {
        "onPunctuationSeconds": 0.1,
        "onNoPunctuationSeconds": 1.5,
        "onNumberSeconds": 0.5
    }
}
```
To update this setting, you'll need to use the Vapi API's update endpoint for assistants. Based on the [information from Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1333057366748565514), you can do this using either the Vapi API client or another API client like Postman. The API endpoint documentation can be found at: [https://docs.vapi.ai/api-reference/assistants/update](https://docs.vapi.ai/api-reference/assistants/update?playground=%2Fapi-reference%2Fassistants%2Fupdate)
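As a hedged sketch (the assistant ID is a placeholder, and the merge behavior for nested objects should be checked against the update-assistant reference above), the body of a `PATCH https://api.vapi.ai/assistant/{assistantId}` request could look like:

```json
{
  "startSpeakingPlan": {
    "transcriptionEndpointingPlan": {
      "onPunctuationSeconds": 0.1,
      "onNoPunctuationSeconds": 1.5,
      "onNumberSeconds": 0.5
    }
  }
}
```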
For context, the `onNoPunctuationSeconds` parameter (default value 1.5) is used when a customer has been silent for a while and has finished speaking, but the transcriber isn't confident enough to add punctuation to the transcription, as explained in the [speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration).

Is there anything specific about configuring this parameter that you'd like me to explain further?

Sources:
- [Vapi API documentation](https://docs.vapi.ai/api-reference/assistants/create-assistant)
- [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1333057366748565514)
- [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
m
OK. My goal now is to improve the TTS so it includes punctuation at the end of calls. Are there some settings I can play with? @Vapi Support Bot
v
Hi there! 👋 I'd be happy to help you improve TTS punctuation at the end of calls in Vapi. There are several settings you can adjust to improve how TTS handles punctuation at the end of calls:

## Voice Input Formatting

Enable the `Voice Input Formatted` feature, which specifically fixes punctuation and makes text more readable for TTS. According to the [Vapi documentation](https://docs.vapi.ai/assistants/voice-formatting-plan), this feature orchestrates mini-functions that together fix punctuation and make text more readable out loud.

## Chunk Plan Configuration

The `chunkPlan` settings control how text is processed before being sent to the TTS provider:
```json
{
  "voice": {
    "chunkPlan": {
      "enabled": true,
      "minCharacters": 30, // Adjust between 1-80
      "punctuationBoundaries": [".", "!", "?", ";"] // Customize based on your needs
    }
  }
}
```

According to [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1343543475383046185), keeping `chunkPlan.enabled: true` is particularly important for proper sentence boundary detection using language-appropriate punctuation.

## Transcription Endpointing Plan

The `onNoPunctuationSeconds` parameter (which you asked about earlier) is particularly relevant here:
```json
"startSpeakingPlan": {
    "transcriptionEndpointingPlan": {
        "onPunctuationSeconds": 0.1,
        "onNoPunctuationSeconds": 1.5,
        "onNumberSeconds": 0.5
    }
}
```

This parameter is used when a customer has been silent for a while and has finished speaking, but the transcriber isn't confident enough to add punctuation to the transcription.

## Input Punctuation Boundaries

Use the `assistant.voice.inputPunctuationBoundaries` setting to help determine the end of a complete message. When the voice input contains punctuation listed in this setting, it's considered a valid boundary, indicating the end of a complete message.

## SSML Support

Enable SSML (Speech Synthesis Markup Language) parsing with the `enableSsmlParsing` parameter, which allows for more control over speech synthesis, including punctuation.

## Voice Provider Settings

If you're using ElevenLabs, you can adjust parameters like `stability` and `similarityBoost`, which can influence the overall quality of the TTS output. Try a call with `stability: 0.5` and `similarityBoost: 0.75`.
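Putting those voice-level options together, here's a hedged sketch (field names come from the settings above; `stability` and `similarityBoost` apply only to ElevenLabs voices, and the exact placement should be verified against the assistant schema):

```json
{
  "voice": {
    "inputPunctuationBoundaries": [".", "!", "?"],
    "enableSsmlParsing": true,
    "stability": 0.5,
    "similarityBoost": 0.75
  }
}
```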
## Model Output in Messages

Enable `modelOutputInMessagesEnabled` to use the model's output instead of transcribed output in messages, which may improve TTS:
```bash
curl -X PATCH https://api.vapi.ai/assistant/your-assistant-id \
     -H "Authorization: Bearer your-private-key" \
     -H "Content-Type: application/json" \
     -d '{
          "modelOutputInMessagesEnabled": true
        }'
```

Would you like me to elaborate on any of these specific settings?

Sources:
- [Vapi documentation](https://docs.vapi.ai/assistants/voice-formatting-plan)
- [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1343543475383046185)
- [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
a
Hey! To help track down this issue, could you share:
- The call ID
- When exactly this happened (the timestamp)

This would really help us figure out what went wrong!
m
Hey @Shubham Bajaj, here is a callId: https://dashboard.vapi.ai/calls/24203066-9f79-4ff1-bfd0-7e8d11c35227

Log:

```
12:11:39:025
[INFO]
Turn latency: 2349ms (transcriber: 211ms, endpointing: 1500ms, kb: N/A, model: 406ms, voice: 188ms)
```

I want to improve endpointing time.
a
Hey Mareem, can you try one call without your 11Labs key? Can you try another call with your 11Labs key and Deepgram endpointing set to 300ms? Can you try a final third call with Deepgram endpointing set to 300ms and without your 11Labs key? You will notice a difference in all three of them, and my bet is on the last option, the third call.

To give you more on this: you need to set the Deepgram endpointing to 300ms, configure your Start Speaking Plan waitSeconds to 0.6, and play around with the Start Speaking Plan transcription endpointing plan according to your requirements and the call. You cannot use the Smart Endpointing Plan of the Start Speaking Plan because it only works for English transcriptions.

The conclusion is to try one call with Deepgram endpointing set to 300ms and without your 11Labs key, and then tune the Start Speaking Plan according to your call requirements. The Start Speaking Plan smart endpointing plan does not work for non-English transcriptions.
m
Lots of insights, thanks @Shubham Bajaj. A couple of questions before starting my tests:
- I use Cartesia for these calls. Why should I remove the 11Labs key? (Is there a side effect?)
- I didn't activate the Smart Endpointing Plan because it's mentioned it only works for English. So OK for that.
- My current start speaking plan uses the default waitSeconds value, which is 0.4. You suggest changing it to 0.6? Is that because I need to increase Deepgram endpointing to 300ms?
a
looking into it.
Mareem, thanks for pointing it out. I mistakenly mixed up your call with some other call, sorry for that. You are using Cartesia, and your 11Labs key has no side effects on it. Again, smart endpointing was a mistake on my side.

Now coming to endpointing. What's happening here? It's the time taken to identify when the user has stopped speaking. Setting Deepgram endpointing to 300ms gives Deepgram enough time to identify that the user has finished speaking. Meanwhile, we are also checking in parallel whether the user has finished speaking, and setting waitSeconds to 0.6 complements the Deepgram endpointing.

The next thing you need to do is play around with the transcription endpointing plan, because your call language is non-English. In this case you will want to set onPunctuationSeconds to 0.1, onNumberSeconds to 0.5, and onNoPunctuationSeconds between 1 and 1.2 (because it might get a little aggressive on non-English transcriptions). Finally, to get the best results, use A/B testing, where you compare two calls and see what works best for you in the real environment.
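Putting these suggestions into one place, a hedged config sketch might look like this (the Deepgram `endpointing` field name should be verified against the transcriber schema, and the numbers are just the starting points discussed above, to be tuned through A/B testing):

```json
{
  "transcriber": {
    "provider": "deepgram",
    "endpointing": 300
  },
  "startSpeakingPlan": {
    "waitSeconds": 0.6,
    "transcriptionEndpointingPlan": {
      "onPunctuationSeconds": 0.1,
      "onNoPunctuationSeconds": 1.2,
      "onNumberSeconds": 0.5
    }
  }
}
```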
Hello, how can I help you further?
m
OK thanks. During my tests, I realised most of the issues come from single utterances not being caught sometimes ("oui", "Allo", ...). I increased Deepgram endpointing to 300ms and it seems better. I am also performing some tests with other transcribers, nova-3 with multi and Google gemini-2.0-flash, to check if they behave better in the case of single utterances. Maybe you have insights about these two transcription models? Also, I was wondering if there is a way to override some of the transcription, for instance with Deepgram. When Deepgram returns "Âllo", I would like to override it to "Âllo?" to trigger the onPunctuationSeconds. Do you know if it's feasible in some way?
I found `customEndpointingRules` on the startSpeakingPlan, I'll experiment with it.
What's the expected format for the customEndpointingRules regex? I am experimenting with `"regex": "/^oui(?:, allô)?$/i"` but that does not seem to match the transcriber output. Maybe it's not the expected format?
@Vapi Support Bot
v
I see you're working with the `customEndpointingRules` in Vapi and having trouble with the regex format. Based on the information I have, the `regex` field within `customEndpointingRules` is expected to be a string, but I don't have specific details on the exact regex syntax or format that Vapi expects.

Could you share what you're trying to accomplish with this regex pattern? Are you trying to detect specific French greetings like "oui" and "allô" to trigger custom endpointing behavior?

Also, have you tried the regex without the JavaScript-style delimiters (`/` and `/i`)? For example, instead of:
```json
"regex": "/^oui(?:, allô)?$/i"
```
Have you tried:
```json
"regex": "^oui(?:, allô)?$"
```
Or perhaps a simpler pattern to test if the basic functionality works?

Sources:
- [Vapi API documentation](https://docs.vapi.ai/api-reference/assistants/create-assistant)
- [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1335676323846291537)
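For reference, the create-assistant schema linked above also lists a `regexOptions` field on each rule, so a case-insensitive variant could be sketched like this (a hedged example rather than a confirmed format; the `type` and `timeoutSeconds` values are illustrative):

```json
{
  "type": "customer",
  "regex": "^oui(?:, allô)?$",
  "regexOptions": [
    { "type": "ignore-case", "enabled": true }
  ],
  "timeoutSeconds": 0.1
}
```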
m
I tried your suggestion on this call: 9d6def19-565a-424c-825e-05d3cf79e613

```json
"regex": "^oui(?:, allô)?$",
        "regexOptions": [
          {
            "type": "ignore-case",
            "enabled": true
          }
        ],
"timeoutSeconds": 0.1
```

I see in the logs:
```
11:22:49:890
[LOG]
Transcriber output: Oui,

11:22:49:891
[LOG]
Endpointing timeout 1000ms (rule: `heuristic`)
```

So it didn't match.
OK, it seemed to work on another call: [LOG] Endpointing timeout 100ms (rule `customer`: ^oui(?:, allô)?$ on Oui). So I can iterate from here.
Thanks for the inputs @Shubham Bajaj, you can close the issue.
a
Hi, checking if this is resolved/solved for you?
m
Yes, it's resolved. Thanks
a
Marking this ticket as Solved ✅