I have an issue with the OpenAI models in Vapi.
# support
m
When GPT-4.1 nano first appeared in the LLM options for Vapi assistants, it had a 250 ms latency and was the fastest model available. However, for the past several weeks it has shown a 510 ms latency, which is noticeably slower than the older GPT-4o mini. Do you know what should be done to restore it to its original 250 ms latency?
k
Ensure the model settings are correct, then tweak the stopSpeakingPlan and waitSeconds.
v
To help reduce perceived latency, you can adjust the assistant's Start Speaking Plan by lowering the `waitSeconds` value. Here's an example API configuration:

```json
"startSpeakingPlan": {
  "waitSeconds": 0.2
}
```

Set this via the API to make your assistant respond faster after the user finishes speaking. Note: this may not affect backend model latency if the slowdown is due to external factors. Source: [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
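For context, a minimal sketch of what "set this via the API" might look like as an actual request, assuming Vapi's `PATCH /assistant/{id}` endpoint and placeholder `VAPI_API_KEY` / `ASSISTANT_ID` environment variables (check the Vapi API reference for the exact request shape):

```typescript
// Sketch: lower waitSeconds on an existing assistant via the Vapi REST API.
// ASSUMPTIONS: PATCH /assistant/{id} accepts a partial update, and
// VAPI_API_KEY / ASSISTANT_ID are set in the environment (placeholders).
const res = await fetch(`https://api.vapi.ai/assistant/${process.env.ASSISTANT_ID}`, {
  method: "PATCH",
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    startSpeakingPlan: { waitSeconds: 0.2 },
  }),
});
if (!res.ok) throw new Error(`Update failed: ${res.status}`);
console.log("startSpeakingPlan updated:", (await res.json()).startSpeakingPlan);
```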
u
Hello @Matei,
Currently, the fastest model is GPT-4o Mini Cluster (390 ms). Please try using that one.
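If it helps, switching the model through the same assistant-update endpoint might look like the sketch below; the `"gpt-4o-mini-cluster"` identifier is my assumption for the "GPT-4o Mini Cluster" dropdown entry, so verify the exact string against the dashboard or the Vapi model list:

```typescript
// Sketch: point the assistant at GPT-4o Mini Cluster.
// ASSUMPTION: "gpt-4o-mini-cluster" is the API-side identifier for the
// dropdown entry; confirm the exact value before using it.
await fetch(`https://api.vapi.ai/assistant/${process.env.ASSISTANT_ID}`, {
  method: "PATCH",
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: { provider: "openai", model: "gpt-4o-mini-cluster" },
  }),
});
```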
v
To help reduce perceived latency, you can lower the `waitSeconds` value in your assistant's `startSpeakingPlan` like this:

```json
"startSpeakingPlan": {
  "waitSeconds": 0.2
}
```

This will make your assistant respond faster after the user finishes speaking, but it won't change the backend model's actual processing speed if the latency is due to external factors. Currently, the fastest available model is GPT-4o Mini Cluster (390 ms). Source: [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
v
Message marked as helpful by @! AetherMind! 🎉
m
As you can see, this graph is taken directly from the OpenAI website, and it shows that the fastest model is GPT-4.1 nano, which is faster than GPT-4o mini. You can also see that GPT-4.1 mini is faster than GPT-4.1. However, when you look at the screenshot I took of Vapi's choose-LLM-model section, you can see that the latencies are all wrong.
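One way to sanity-check the OpenAI-side numbers independently of either dashboard is to time short completions against the models directly; a rough sketch, assuming an `OPENAI_API_KEY` in the environment:

```typescript
// Rough sketch: measure wall-clock latency of a short completion per model.
// ASSUMPTION: OPENAI_API_KEY is set; model IDs are OpenAI's public identifiers.
// This measures end-to-end request time, not Vapi's own pipeline latency.
const models = ["gpt-4.1-nano", "gpt-4.1-mini", "gpt-4.1", "gpt-4o-mini"];
for (const model of models) {
  const start = Date.now();
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: "Say OK." }],
      max_tokens: 5,
    }),
  });
  await res.json();
  console.log(`${model}: ${Date.now() - start} ms`);
}
```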
h
I'm amazed people keep asking about the benchmark latencies changing, more than doubling in some cases, and the responses are just to check your prompt and Start Speaking Plan settings.
v
To help your assistant respond faster after the user finishes speaking, set the `waitSeconds` value in your `startSpeakingPlan` to a lower value via the API:

```json
"startSpeakingPlan": {
  "waitSeconds": 0.2
}
```

This reduces perceived latency, but it does not affect backend model processing speed, which may differ from OpenAI's published benchmarks due to provider-side or infrastructure factors. Source: [Speech configuration documentation](https://docs.vapi.ai/customization/speech-configuration)
p
Latency is also location-dependent, so it can be lower for some users and higher for others. If you have latency issues or any other questions, please let us know.