Hi there! 👋
I wasn't able to find information confirming whether GPT-4o-realtime will cut latency by exactly 350ms in your case, or detailed experiences with the openai-realtime-twilio-demo. However, I can share what's documented about GPT-4o-realtime's potential for latency reduction!
According to the [Vapi documentation](https://docs.vapi.ai/openai-realtime), GPT-4o-realtime does have the potential to reduce latency because it "natively processes audio in and audio out," unlike other configurations that "orchestrate a transcriber, model and voice API to simulate speech-to-speech." This integrated approach could eliminate some of the latency associated with separate STT and TTS steps.
However, there are some important considerations:
1. **Beta Status**: GPT-4o-realtime is currently in beta and may have some stability issues. According to [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336440478727667844), "Realtime model is still in beta mode, and because of this you can expect few hiccups."
2. **Stability Concerns**: Some users have reported that the realtime cluster "will not work properly" and "is not stable right now, according to OpenAI," as mentioned in [Discord conversations](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1337465970557452299).
3. **Voice Limitations**: When using GPT-4o-realtime, "Only OpenAI voices may be selected" and "the voice selection will not act as a TTS (text-to-speech) model, but rather as the voice used within the speech-to-speech model," according to the [Vapi documentation](https://docs.vapi.ai/openai-realtime).
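If you decide to try it despite the beta caveats, the realtime model is selected in the assistant's model configuration, with the voice restricted to OpenAI's. Here's a minimal sketch; the exact model string and field names are assumptions on my part, so check the Vapi docs linked above for the current values:

```json
{
  "model": {
    "provider": "openai",
    "model": "gpt-4o-realtime-preview"
  },
  "voice": {
    "provider": "openai",
    "voiceId": "alloy"
  }
}
```

Note that per the voice limitation above, the `voice` block here selects the voice used *inside* the speech-to-speech model rather than a separate TTS stage.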
For your AI receptionist via PSTN, you might also want to consider other latency optimization strategies such as:
- Adjusting Deepgram's endpointing settings, which can impact transcription latency
- Optimizing network connectivity and ensuring TURN servers are geographically close to users
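As a sketch of the first point, Deepgram's endpointing is an utterance-end threshold in milliseconds: a lower value returns final transcripts sooner (reducing perceived latency) at the risk of cutting off slower speakers. The field names below are assumptions based on how Vapi exposes the transcriber config, so verify them against the current docs:

```json
{
  "transcriber": {
    "provider": "deepgram",
    "model": "nova-2",
    "endpointing": 150
  }
}
```

Tuning this is a trade-off: for an AI receptionist, values that are too aggressive can make the agent interrupt callers mid-sentence, so it's worth testing against real call audio.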