I have an issue with the OpenAI models in Vapi.
# support
m
When GPT-4.1 nano first appeared in the LLM options for Vapi assistants, it had a 250 ms latency and was the fastest model available. However, for the past several weeks it has shown a 510 ms latency, which is noticeably slower than the older GPT-4o mini. Do you know what should be done to restore it to its original 250 ms latency?
k
Ensure the model settings are correct, then tweak the stopSpeakingPlan and waitSeconds.
v
To help reduce perceived latency, you can adjust the assistant's Start Speaking Plan by lowering the `waitSeconds` value. Here's an example API configuration:

```json
"startSpeakingPlan": {
  "waitSeconds": 0.2
}
```

Set this via the API to make your assistant respond faster after the user finishes speaking. Note: this may not affect backend model latency if the slowdown is due to external factors. Source: [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
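For context, a minimal sketch of what "set this via the API" might look like as an actual request, assuming Vapi's `PATCH /assistant/{id}` endpoint and placeholder `VAPI_API_KEY` / `ASSISTANT_ID` environment variables (check the Vapi API reference for the exact request shape):

```typescript
// Sketch: lower waitSeconds on an existing assistant via the Vapi REST API.
// ASSUMPTIONS: PATCH /assistant/{id} accepts a partial update, and
// VAPI_API_KEY / ASSISTANT_ID are set in the environment (placeholders).
const res = await fetch(`https://api.vapi.ai/assistant/${process.env.ASSISTANT_ID}`, {
  method: "PATCH",
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    startSpeakingPlan: { waitSeconds: 0.2 },
  }),
});
if (!res.ok) throw new Error(`Update failed: ${res.status}`);
console.log("startSpeakingPlan updated:", (await res.json()).startSpeakingPlan);
```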
u
Hello @Matei,
Currently, the fastest model is GPT-4o Mini Cluster (390 ms). Please try using that one.
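If it helps, switching the model through the same assistant-update endpoint might look like the sketch below; the `"gpt-4o-mini-cluster"` identifier is my assumption for the "GPT-4o Mini Cluster" dropdown entry, so verify the exact string against the dashboard or the Vapi model list:

```typescript
// Sketch: point the assistant at GPT-4o Mini Cluster.
// ASSUMPTION: "gpt-4o-mini-cluster" is the API-side identifier for the
// dropdown entry; confirm the exact value before using it.
await fetch(`https://api.vapi.ai/assistant/${process.env.ASSISTANT_ID}`, {
  method: "PATCH",
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: { provider: "openai", model: "gpt-4o-mini-cluster" },
  }),
});
```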
v
To help reduce perceived latency, you can lower the `waitSeconds` value in your assistant's `startSpeakingPlan` like this:

```json
"startSpeakingPlan": {
  "waitSeconds": 0.2
}
```

This will make your assistant respond faster after the user finishes speaking, but it won't change the backend model's actual processing speed if the latency is due to external factors. Currently, the fastest available model is GPT-4o Mini Cluster (390 ms). Source: [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
v
Message marked as helpful by @! AetherMind! 🎉
m
As you can see, this graph is taken directly from the OpenAI website, and it shows that the fastest model is GPT-4.1 nano, which is faster than GPT-4o mini. You can also see that GPT-4.1 mini is faster than GPT-4.1. However, when you look at the screenshot I took of Vapi's choose-LLM-model section, you can see that the latencies are all wrong.
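One way to sanity-check the OpenAI-side numbers independently of either dashboard is to time short completions against the models directly; a rough sketch, assuming an `OPENAI_API_KEY` in the environment:

```typescript
// Rough sketch: measure wall-clock latency of a short completion per model.
// ASSUMPTION: OPENAI_API_KEY is set; model IDs are OpenAI's public identifiers.
// This measures end-to-end request time, not Vapi's own pipeline latency.
const models = ["gpt-4.1-nano", "gpt-4.1-mini", "gpt-4.1", "gpt-4o-mini"];
for (const model of models) {
  const start = Date.now();
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: "Say OK." }],
      max_tokens: 5,
    }),
  });
  await res.json();
  console.log(`${model}: ${Date.now() - start} ms`);
}
```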
h
I'm amazed people keep asking about the benchmark latencies changing, more than doubling in some cases, and the responses are just to check your prompt and Start Speaking Plan settings.
v
To help your assistant respond faster after the user finishes speaking, set the `waitSeconds` value in your `startSpeakingPlan` to a lower value via the API:

```json
"startSpeakingPlan": {
  "waitSeconds": 0.2
}
```

This reduces perceived latency, but it does not affect backend model processing speed, which may differ from OpenAI's published benchmarks due to provider-side or infrastructure factors. Source: [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
p
Latency is also location-dependent, so it can be lower for some users and higher for others. If you have latency issues or any other questions, please let us know.