I am using a custom fine-tuned model from OpenAI ...
# support
MILANCHeJ:
It does work, but the first response takes 4-5 seconds, and subsequent responses are also slower than usual. Is there anything that can be done to reduce this time?
Vapi Support Bot:
@MILANCHeJ I understand that response time is a critical aspect of your experience with VAPI. There are several strategies we can explore to improve it:

1. **Optimization**: We can look into optimizing the model or the infrastructure it runs on to ensure faster processing times.
2. **Caching**: Implementing caching for common queries can reduce the need to process the same request multiple times.
3. **Preloading**: If there are predictable patterns in usage, we can preload certain parts of the model to reduce initial latency.
4. **Concurrency**: Adjusting the concurrency settings might help if the delays are due to queuing.
5. **Network Latency**: Investigating and optimizing network latency between your application and the VAPI servers could also be beneficial.

If you'd like, I can guide you through some of these optimizations or put you in touch with our support team for more in-depth assistance. Would you like to proceed with any specific optimization, or should I connect you with support for further help?
You can continue this conversation by mentioning me in the message.
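As a side note on the caching strategy mentioned above, a minimal sketch of the idea looks like the following. This assumes identical prompts can safely return identical answers; `call_model` is a hypothetical wrapper around the actual OpenAI request, not part of any real API:

```python
import hashlib

# Simple in-memory cache keyed by a hash of the prompt text.
_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_model) -> str:
    """Return a cached answer for a previously seen prompt, else call the model.

    `call_model` is a hypothetical callable that takes a prompt string and
    returns the model's answer (e.g. a wrapper around an OpenAI request).
    """
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)  # only hit the model on a cache miss
    return _cache[key]
```

Repeated identical prompts then skip the model call entirely, which removes the multi-second round trip for those queries; it does nothing for the first (cold) request.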
Shubham Bajaj:
@MILANCHeJ Could you share the call ID so I can take a look?
MILANCHeJ:
@Shubham Bajaj Here is a call ID from today: 2975e931-9440-4461-a1bf-0e765cd4ac26
Shubham Bajaj:
@MILANCHeJ As you can see in this screenshot (latency-info), the model is slow to respond, which delays the conversation and makes it hard to hold. You will need the help of an expert or ChatGPT to work out how your fine-tuned model can be optimized; there is no alternative.

Apart from this, I would also suggest setting the Deepgram transcriber endpointing to 300 ms and startSpeakingPlan.waitSeconds to 0.8 seconds, so that user input (even single words) is captured and the end of the user's speech is detected correctly. This will help, but it won't make a big difference on its own, because your model itself requires optimization.

https://cdn.discordapp.com/attachments/1331272194998796308/1331502912844599421/Screenshot_2025-01-22_at_11.23.45.png?ex=6791da38&is=679088b8&hm=dd88e5a58e351cd48a4589c94875fbcb4e5fcab6f84d677a1c48aa5808f9066c&
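For reference, the two settings suggested above would sit in the assistant configuration roughly like this (a sketch only; field placement assumes Vapi's assistant config schema, so verify the exact shape against the current Vapi API docs):

```json
{
  "transcriber": {
    "provider": "deepgram",
    "endpointing": 300
  },
  "startSpeakingPlan": {
    "waitSeconds": 0.8
  }
}
```

Here `endpointing` is in milliseconds (how long Deepgram waits in silence before finalizing a transcript) and `waitSeconds` is how long the assistant waits after the user stops speaking before it starts responding.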