What is the recommended region to host my custom model?
# support
m
The response latency when using a custom model is very high, because the request needs to go from Vapi to our endpoint, then from our endpoint to the LLM, and back to Vapi. Are there any recommendations to reduce this latency?
v
To reduce response latency when using a custom model with Vapi, here are some effective strategies you can implement:

1. **Optimize Network Infrastructure**: Use a Content Delivery Network (CDN) with edge locations close to your users. This minimizes the physical distance and network hops, reducing latency significantly.
2. **Enhance Server Performance**: Ensure that your servers are high-performance and optimized for quick request handling. Keep your hardware up to date and consider geographical proximity to your users to further reduce latency.
3. **Efficient Communication Protocols**: Utilize lightweight communication protocols like WebSockets or WebRTC for real-time data transfer. These protocols are designed for low-latency communication.
4. **Code Optimization**: Review and optimize your application code to eliminate bottlenecks that can slow down processing times.
5. **Audio Processing Improvements**: Implement audio codecs that balance compression and quality to speed up audio transmission without sacrificing clarity.
6. **ChunkPlan Configuration**: Adjust the ChunkPlan settings in Vapi. For example, enabling chunking can help manage output more efficiently, and setting a lower minCharacters value can reduce latency at the cost of some quality (see the config sketch after the sources below).
7. **Load Testing and Monitoring**: Conduct load testing to identify potential latency issues under various conditions. Continuous monitoring can help you proactively address latency problems.
8. **Background Noise Filtering**: Use real-time noise filtering to ensure audio clarity without adding latency, allowing your models to perform efficiently.

By implementing these strategies, you should see a noticeable improvement in response latency when using your custom model with Vapi. If you have any specific configurations or setups in mind, feel free to share, and I can provide more tailored advice! 😊

Sources:
- [Best Practices for Reducing Response Latency in Vapi When Using Custom Models](https://discord.com/channels/1211482211119796234/1211483291191083018/1336685825227755591)
- [Optimizing Communication Between VAPI and a Custom Model to Minimize Latency](https://discord.com/channels/1211482211119796234/1211483291191083018/1336685825227755591)
- [Vapi Configuration to Decrease Response Latency for Custom Models](https://discord.com/channels/1211482211119796234/1211483291191083018/1343543475383046185)
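As a rough illustration of point 6, here is a minimal sketch of lowering `minCharacters` on an assistant's voice chunk plan through the Vapi REST API. The `PATCH /assistant/{id}` route and the `voice.chunkPlan` field placement reflect my reading of Vapi's API; the assistant ID is a placeholder, and the exact shape should be checked against the current API reference before use.

```typescript
// Sketch: lower minCharacters in an assistant's voice chunk plan via the Vapi REST API.
// The PATCH /assistant/{id} route and voice.chunkPlan field placement are assumptions
// based on the public docs; verify against the current reference before relying on this.
const VAPI_API_KEY = process.env.VAPI_API_KEY ?? "";
const ASSISTANT_ID = "your-assistant-id"; // hypothetical placeholder

async function lowerChunkPlanLatency(): Promise<void> {
  const res = await fetch(`https://api.vapi.ai/assistant/${ASSISTANT_ID}`, {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${VAPI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      voice: {
        // Keep your existing provider/voiceId fields when patching; only the
        // chunkPlan part is shown here.
        chunkPlan: {
          enabled: true,     // chunk LLM output before it is sent to TTS
          minCharacters: 10, // smaller chunks start speaking sooner, at some quality cost
        },
      },
    }),
  });
  if (!res.ok) throw new Error(`Vapi update failed: ${res.status}`);
  console.log("chunkPlan updated:", await res.json());
}

lowerChunkPlanLatency().catch(console.error);
```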
m
what's your time to first token on your model?
m
It takes around 4 seconds for Vapi to start responding using the custom model. This covers the time between when the user stops speaking and when Vapi starts responding. The time for the LLM to start responding is around 1.5 seconds. The rest is basically from Vapi to our endpoint and back.
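For reference, one way to separate the ~1.5 s of model time-to-first-token from the remaining ~2.5 s of transport and orchestration overhead is to time a streaming request against the custom endpoint directly, bypassing Vapi. This is a minimal sketch assuming the endpoint exposes an OpenAI-compatible streaming `/chat/completions` route; the URL and model name are placeholders.

```typescript
// Sketch: time a streaming request against the custom LLM endpoint directly
// to separate model time-to-first-token from Vapi-side transport/orchestration.
// The endpoint URL and model name are hypothetical placeholders.
const ENDPOINT = "https://your-custom-llm.example.com/chat/completions";

async function measureTimeToFirstToken(): Promise<void> {
  const start = performance.now();
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-4o", // whatever the proxy forwards to OpenAI
      stream: true,    // request a stream so we can time the first chunk
      messages: [{ role: "user", content: "Say hello." }],
    }),
  });

  const reader = res.body!.getReader();
  const { value } = await reader.read(); // first streamed chunk ≈ first token
  const elapsedMs = performance.now() - start;
  console.log(`First chunk after ${elapsedMs.toFixed(0)} ms`);
  if (value) console.log(new TextDecoder().decode(value).slice(0, 120));
  await reader.cancel();
}

measureTimeToFirstToken().catch(console.error);
```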
k
Hello Mohab, could you please share the call ID and, if you're comfortable, more details about your custom LLM model and its deployment? This will allow me to consider any potential suggestions.
m
Our custom LLM is just an endpoint that sends the payload we get from Vapi to OpenAI after doing some modifications. It is hosted on AWS us-west-2. I can't share the call ID because that'd contain customer information.
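For context, a custom-LLM endpoint like the one described is typically a thin OpenAI-compatible proxy: it receives the chat-completions payload from Vapi, tweaks it, forwards it to OpenAI with streaming enabled, and pipes the stream straight back. Below is a minimal sketch under those assumptions; the Express route, header handling, and the nature of the "modifications" are illustrative, not a description of the actual implementation. Streaming the response through rather than buffering it is what keeps the extra hop close to a single network round trip.

```typescript
// Sketch of an OpenAI-compatible proxy for Vapi's custom-llm provider.
// Illustrative only: the Express route, header handling, and the "modifications"
// step are assumptions, not the author's actual implementation.
import express from "express";

const app = express();
app.use(express.json({ limit: "2mb" }));

app.post("/chat/completions", async (req, res) => {
  // 1. Apply whatever modifications are needed to the payload Vapi sent.
  const body = { ...req.body /* , your changes here */ };

  // 2. Forward to OpenAI with streaming enabled so tokens flow back as they arrive.
  const upstream = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ ...body, stream: true }),
  });

  // 3. Pipe the SSE stream straight back without buffering, so the proxy adds
  //    little beyond one extra network round trip.
  res.status(upstream.status);
  res.setHeader("Content-Type", "text/event-stream");
  const reader = upstream.body!.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    res.write(Buffer.from(value));
  }
  res.end();
});

app.listen(3000, () => console.log("Custom LLM proxy listening on :3000"));
```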
s
Hey @Mohab, if you can't share the details then it won't be possible for me to look into them and help you out. Let me know your thoughts on this.
m
@Shubham Bajaj Here's a call ID: cbe505fc-8042-4632-beaa-c95e42d7ac43
s
@Mohab can you share a more recent call ID? We only keep call logs for two weeks, so I can't look into this one for latency details.