URGENT: Latency isn't consistent from assistant to...
# support
r
I have an assistant that I created a few months ago with an estimated latency of ~1050 ms that works perfectly. I created a new assistant with identical latency estimations and nearly identical settings other than a slightly different prompt, and the delay between speaking turns is 3-4 seconds. How can I troubleshoot or fix this?
v
Hi there! I understand you're experiencing inconsistent latency between your assistants, which can be frustrating when you need reliable performance 🔍 Unfortunately, I can't diagnose the exact cause without seeing your specific configurations, but I can definitely help you troubleshoot the discrepancy. Here are several potential causes and troubleshooting steps you can take:

### Prompt-Related Factors

Since you mentioned the main difference is a "slightly different prompt," this could be the culprit:

- **Prompt Complexity**: Even small differences in prompt structure can affect processing time. According to [Vapi documentation](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334772094898016317), more complex prompts require more processing time.
- **Prompt Size**: Larger prompts can increase latency. Try to keep your prompt concise and well-structured.
- **First Message Type**: If your new assistant uses a model-generated first message instead of a static one, this can [significantly increase latency](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1339303957675245669). See the sketch just below this list for pinning a static first message.
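To rule this factor out, you can pin a static first message on the new assistant. A minimal sketch, assuming your assistant config supports the `firstMessage` / `firstMessageMode` fields (worth double-checking against the current API reference):

```json
{
  "firstMessageMode": "assistant-speaks-first", // assumed field name; avoids a model-generated greeting
  "firstMessage": "Hi! How can I help you today?"
}
```

With a static `firstMessage`, the greeting plays immediately instead of waiting on a model round-trip.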
### Assistant Configuration Settings

Check these specific settings that affect latency:

- **Silence Timeout Settings**: Review `silenceTimeoutSeconds`, `responseDelaySeconds`, and `llmRequestDelaySeconds`, as these [directly impact response times](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272).
- **Chunking Configuration**: The `chunkPlan.enabled` setting and parameters like `minCharacters` and `punctuationBoundaries` affect text processing speed. You might want to adjust these:
```json
{
  "voice": {
    "chunkPlan": {
      "enabled": true,
      "minCharacters": 30, // Try adjusting between 1-80
      "punctuationBoundaries": [".", "!", "?", ";"]
    }
  }
}
```
- **Voice Model**: Different voice models have different performance characteristics. Check that you're using the same voice model (PlayHT2.0, PlayHT2.0-turbo, or Play3.0-mini) in both assistants.
- **Minimum Wait Time**: There's a setting that controls the minimum wait time before the assistant speaks (it defaults to 0.4 seconds). If this is set differently, it could [affect perceived latency](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1343915991670853654); see the snippet at the end of this message.

### Network and Infrastructure

- **Network Latency**: Ensure both assistants are running in the same network environment. According to [Vapi documentation](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336685825227755591), network infrastructure significantly impacts latency.
- **Server Performance**: If your assistants are running on different servers or at different times, server load could be affecting performance.

### Other Factors

- **Transcription Settings**: Check for differences in transcription services or endpointing settings between your assistants.
- **Cold Boot Start Time**: The first few interactions might have higher latency due to "cold boot" effects, which [can take more than 5 seconds](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1340022322454855781).

Would you like me to elaborate on any of these specific areas, or do you have more details about your configuration that might help narrow down the issue?

Sources:
- [Discord Thread on Prompt Complexity](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334772094898016317)
- [Discord Thread on First Message Type](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1339303957675245669)
- [Discord Thread on Silence Timeout Settings](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272)
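On the **Minimum Wait Time** point above, here's a minimal sketch for comparing that value across your two assistants, assuming it maps to `startSpeakingPlan.waitSeconds` (worth verifying in the API reference):

```json
{
  "startSpeakingPlan": {
    "waitSeconds": 0.4 // the stated default; a higher value adds directly to perceived latency
  }
}
```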
m
@Vapi where are `responseDelaySeconds` and `llmRequestDelaySeconds`? I don't see them in the API reference.
r
@User I still need help with this; I tried adjusting my settings, but nothing is working. Is there a way to get this on a different Vapi server or host it on my own server?
I can see that my endpointing is really high on these calls:

Call ID: dfcc23b8-8940-45c2-8814-70c14a211506

```
06:07:56:502 [INFO] Turn latency: 3376ms (transcriber: 0ms, endpointing: 1245ms, kb: N/A, model: 1477ms, voice: 624ms)
06:08:08:894 [INFO] Turn latency: 5135ms (transcriber: 403ms, endpointing: 3426ms, kb: N/A, model: 473ms, voice: 803ms)
06:08:20:827 [INFO] Turn latency: 3780ms (transcriber: 678ms, endpointing: 1756ms, kb: N/A, model: 695ms, voice: 608ms)
06:08:31:295 [INFO] Turn latency: 2932ms (transcriber: 0ms, endpointing: 1579ms, kb: N/A, model: 651ms, voice: 700ms)
06:08:40:864 [INFO] Turn latency: 4257ms (transcriber: 714ms, endpointing: 2491ms, kb: N/A, model: 595ms, voice: 440ms)
06:08:48:394 [INFO] Turn latency: 3268ms (transcriber: 594ms, endpointing: 1690ms, kb: N/A, model: 517ms, voice: 466ms)
06:09:01:698 [INFO] Turn latency: 4577ms (transcriber: 0ms, endpointing: 2871ms, kb: N/A, model: 721ms, voice: 981ms)
```

https://internal-api.aws-us-west-2-backend-production2.vapi.ai:10443/log?callId=dfcc23b8-8940-45c2-8814-70c14a211506&limit=100&page=5&sortOrder=ASC
v
```json
{
  "startSpeakingPlan": {
    "waitSeconds": 0.6,
    "smartEndpointingPlan": {
      "provider": "livekit",
      "waitFunction": "50 + 400 * x"
    }
  }
}
```
Hey Robert, the latency is coming from endpointing, as you can see in the logs. You need to update your assistant's `startSpeakingPlan` so that its `smartEndpointingPlan` uses the LiveKit provider, as in the config above.
Also, with this `waitFunction` the assistant waits roughly 50-100ms to decide whether the user has finished speaking. These values work well in practice, but I recommend A/B testing different values with native speakers to find the optimal balance; the right settings often emerge through iterative testing.
The wait function has this format: `baseValue + scalarValue * x`

- `baseValue` is the minimum wait time in milliseconds
- `scalarValue` determines how much the prediction affects the wait
- `x` is the prediction value (0-1) indicating how likely you are to keep speaking

The default function in Vapi is `70 + 4000 * x`, which gives a range from 70ms (minimum wait) to 4070ms (maximum wait).
For example, using `50 + 400 * x`:
- This function gives you a range from 50ms (minimum wait) to 450ms (maximum wait)
- When the model is confident you're done speaking, it waits close to 50ms
- When the model thinks you might continue, it waits closer to 450ms
You can use this example wait function to test if it works for your needs. If it's not quite right, you can adjust the values based on your preferences.
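For instance, if `50 + 400 * x` turns out to cut callers off too often, a slightly more patient variant to A/B test against (the `100 + 800 * x` values here are purely illustrative, not a Vapi recommendation) might look like:

```json
{
  "startSpeakingPlan": {
    "waitSeconds": 0.6,
    "smartEndpointingPlan": {
      "provider": "livekit",
      "waitFunction": "100 + 800 * x" // illustrative values: waits at least 100ms, up to 900ms
    }
  }
}
```

That variant waits a minimum of 100ms and up to 900ms when the model predicts the caller is likely to continue speaking.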