URGENT: Latency isn't consistent from assistant to...
# support
r
I have an assistant that I created a few months ago with an estimated latency of ~1050 ms that works perfectly. I created a new assistant with identical latency estimations and nearly identical settings other than a slightly different prompt, and the delay between speaking turns is 3-4 seconds. How can I troubleshoot or fix this?
v
Hi there! I understand you're experiencing inconsistent latency between your assistants, which can be frustrating when you need reliable performance 🔍 Unfortunately, I can't diagnose the exact cause without seeing your specific configurations, but I can definitely help you troubleshoot the discrepancy. Here are several potential causes and troubleshooting steps you can take:

### Prompt-Related Factors

Since you mentioned the main difference is a "slightly different prompt," this could be the culprit:

- **Prompt Complexity**: Even small differences in prompt structure can affect processing time. According to [Vapi documentation](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334772094898016317), more complex prompts require more processing time.
- **Prompt Size**: Larger prompts can increase latency. Try to keep your prompt concise and well-structured.
- **First Message Type**: If your new assistant uses a model-generated first message instead of a static one, this can [significantly increase latency](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1339303957675245669). See the sketch just below this list for pinning a static first message.
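To rule this factor out, you can pin a static first message on the new assistant. A minimal sketch, assuming your assistant config supports the `firstMessage` / `firstMessageMode` fields (worth double-checking against the current API reference):

```json
{
  "firstMessageMode": "assistant-speaks-first", // assumed field name; avoids a model-generated greeting
  "firstMessage": "Hi! How can I help you today?"
}
```

With a static `firstMessage`, the greeting plays immediately instead of waiting on a model round-trip.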
### Assistant Configuration Settings

Check these specific settings that affect latency:

- **Silence Timeout Settings**: Review `silenceTimeoutSeconds`, `responseDelaySeconds`, and `llmRequestDelaySeconds`, as these [directly impact response times](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272).
- **Chunking Configuration**: The `chunkPlan.enabled` setting and parameters like `minCharacters` and `punctuationBoundaries` affect text processing speed. You might want to adjust these:
```json
{
  "voice": {
    "chunkPlan": {
      "enabled": true,
      "minCharacters": 30, // Try adjusting between 1-80
      "punctuationBoundaries": [".", "!", "?", ";"]
    }
  }
}
```
- **Voice Model**: Different voice models have different performance characteristics. Check that you're using the same voice model (PlayHT2.0, PlayHT2.0-turbo, or Play3.0-mini) in both assistants.
- **Minimum Wait Time**: There's a setting that controls the minimum wait time before the assistant speaks (it defaults to 0.4 seconds). If this is set differently, it could [affect perceived latency](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1343915991670853654); see the snippet at the end of this message.

### Network and Infrastructure

- **Network Latency**: Ensure both assistants are running in the same network environment. According to [Vapi documentation](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336685825227755591), network infrastructure significantly impacts latency.
- **Server Performance**: If your assistants are running on different servers or at different times, server load could be affecting performance.

### Other Factors

- **Transcription Settings**: Check for differences in transcription services or endpointing settings between your assistants.
- **Cold Boot Start Time**: The first few interactions might have higher latency due to "cold boot" effects, which [can take more than 5 seconds](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1340022322454855781).

Would you like me to elaborate on any of these specific areas, or do you have more details about your configuration that might help narrow down the issue?

Sources:
- [Discord Thread on Prompt Complexity](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334772094898016317)
- [Discord Thread on First Message Type](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1339303957675245669)
- [Discord Thread on Silence Timeout Settings](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272)
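On the **Minimum Wait Time** point above, here's a minimal sketch for comparing that value across your two assistants, assuming it maps to `startSpeakingPlan.waitSeconds` (worth verifying in the API reference):

```json
{
  "startSpeakingPlan": {
    "waitSeconds": 0.4 // the stated default; a higher value adds directly to perceived latency
  }
}
```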
m
@Vapi where are `responseDelaySeconds` and `llmRequestDelaySeconds`? I don't see them in the API reference.
r
@User I still need help with this; I tried adjusting my settings, but nothing is working. Is there a way to get this on a different Vapi server or host it on my own server?
I can see that my endpointing is really high on these calls:

Call ID: dfcc23b8-8940-45c2-8814-70c14a211506

```
06:07:56:502 [INFO] Turn latency: 3376ms (transcriber: 0ms, endpointing: 1245ms, kb: N/A, model: 1477ms, voice: 624ms)
06:08:08:894 [INFO] Turn latency: 5135ms (transcriber: 403ms, endpointing: 3426ms, kb: N/A, model: 473ms, voice: 803ms)
06:08:20:827 [INFO] Turn latency: 3780ms (transcriber: 678ms, endpointing: 1756ms, kb: N/A, model: 695ms, voice: 608ms)
06:08:31:295 [INFO] Turn latency: 2932ms (transcriber: 0ms, endpointing: 1579ms, kb: N/A, model: 651ms, voice: 700ms)
06:08:40:864 [INFO] Turn latency: 4257ms (transcriber: 714ms, endpointing: 2491ms, kb: N/A, model: 595ms, voice: 440ms)
06:08:48:394 [INFO] Turn latency: 3268ms (transcriber: 594ms, endpointing: 1690ms, kb: N/A, model: 517ms, voice: 466ms)
06:09:01:698 [INFO] Turn latency: 4577ms (transcriber: 0ms, endpointing: 2871ms, kb: N/A, model: 721ms, voice: 981ms)
```

https://internal-api.aws-us-west-2-backend-production2.vapi.ai:10443/log?callId=dfcc23b8-8940-45c2-8814-70c14a211506&limit=100&page=5&sortOrder=ASC
v
```json
{
  "startSpeakingPlan": {
    "waitSeconds": 0.6,
    "smartEndpointingPlan": {
      "provider": "livekit",
      "waitFunction": "50 + 400 * x"
    }
  }
}
```
Hey Robert, the latency is coming from endpointing, as you can see in the logs. You need to update your assistant's `startSpeakingPlan` so that its `smartEndpointingPlan` uses the LiveKit provider, as in the config above.
Also, with this `waitFunction` the assistant waits roughly 50-100ms to decide whether the user has finished speaking. These values work well in practice, but I recommend A/B testing different values with native speakers to find the optimal balance; the right settings often emerge through iterative testing.
The wait function has this format: `baseValue + scalarValue * x`

- `baseValue` is the minimum wait time in milliseconds
- `scalarValue` determines how much the prediction affects the wait
- `x` is the prediction value (0-1) indicating how likely you are to keep speaking

The default function in Vapi is `70 + 4000 * x`, which gives a range from 70ms (minimum wait) to 4070ms (maximum wait).
For example, using `50 + 400 * x`:
- This function gives you a range from 50ms (minimum wait) to 450ms (maximum wait)
- When the model is confident you're done speaking, it waits close to 50ms
- When the model thinks you might continue, it waits closer to 450ms
You can use this example wait function to test if it works for your needs. If it's not quite right, you can adjust the values based on your preferences.
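For instance, if `50 + 400 * x` turns out to cut callers off too often, a slightly more patient variant to A/B test against (the `100 + 800 * x` values here are purely illustrative, not a Vapi recommendation) might look like:

```json
{
  "startSpeakingPlan": {
    "waitSeconds": 0.6,
    "smartEndpointingPlan": {
      "provider": "livekit",
      "waitFunction": "100 + 800 * x" // illustrative values: waits at least 100ms, up to 900ms
    }
  }
}
```

That variant waits a minimum of 100ms and up to 900ms when the model predicts the caller is likely to continue speaking.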