Hi there! 👋
I wasn't able to find information confirming whether GPT-4o-realtime will cut latency by exactly 350ms in your case, or detailed experiences with the openai-realtime-twilio-demo. However, I can share what's documented about GPT-4o-realtime's potential for latency reduction!
According to the [Vapi documentation](https://docs.vapi.ai/openai-realtime), GPT-4o-realtime does have the potential to reduce latency because it "natively processes audio in and audio out," unlike other configurations that "orchestrate a transcriber, model and voice API to simulate speech-to-speech." This integrated approach could eliminate some of the latency associated with separate STT and TTS steps.
However, there are some important considerations:
1. **Beta Status**: GPT-4o-realtime is currently in beta and may have some stability issues. According to [Discord discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336440478727667844), "Realtime model is still in beta mode, and because of this you can expect few hiccups."
2. **Stability Concerns**: Some users have reported that the realtime cluster "will not work properly" and "is not stable right now, according to OpenAI," as mentioned in [Discord conversations](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1337465970557452299).
3. **Voice Limitations**: When using GPT-4o-realtime, "Only OpenAI voices may be selected" and "the voice selection will not act as a TTS (text-to-speech) model, but rather as the voice used within the speech-to-speech model," according to the [Vapi documentation](https://docs.vapi.ai/openai-realtime).
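If you decide to try it despite the beta caveats, the realtime model is selected in the assistant's model configuration, with the voice restricted to OpenAI's. Here's a minimal sketch; the exact model string and field names are assumptions on my part, so check the Vapi docs linked above for the current values:

```json
{
  "model": {
    "provider": "openai",
    "model": "gpt-4o-realtime-preview"
  },
  "voice": {
    "provider": "openai",
    "voiceId": "alloy"
  }
}
```

Note that per the voice limitation above, the `voice` block here selects the voice used *inside* the speech-to-speech model rather than a separate TTS stage.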
For your AI receptionist via PSTN, you might also want to consider other latency optimization strategies such as:
- Adjusting Deepgram's endpointing settings, which can impact transcription latency
- Optimizing network connectivity and ensuring TURN servers are geographically close to users
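As a sketch of the first point, Deepgram's endpointing is an utterance-end threshold in milliseconds: a lower value returns final transcripts sooner (reducing perceived latency) at the risk of cutting off slower speakers. The field names below are assumptions based on how Vapi exposes the transcriber config, so verify them against the current docs:

```json
{
  "transcriber": {
    "provider": "deepgram",
    "model": "nova-2",
    "endpointing": 150
  }
}
```

Tuning this is a trade-off: for an AI receptionist, values that are too aggressive can make the agent interrupt callers mid-sentence, so it's worth testing against real call audio.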