@its.jcw
It seems like there might be an issue with the implementation of the delay between responses in your custom LLM endpoint. The behavior you're describing suggests that the responses are being buffered and then sent all at once, rather than being streamed with the intended delay.
To troubleshoot this issue, please check the following in your custom LLM implementation:
1. **Asynchronous Handling**: Ensure that your endpoint handles asynchronous operations correctly. It should send each chunk, then wait for the specified delay before sending the next one, without blocking the event loop or the rest of the stream.
2. **Streaming Configuration**: Verify that your server is set up to send data in real time without buffering. This might involve flushing the output after each chunk rather than waiting until the full response has been generated.
3. **Delay Implementation**: Double-check the code where you've implemented the delay. It should pause between sending individual chunks, not hold everything back and emit the whole batch at the end (see the first sketch after this list).
4. **Testing Endpoint Independently**: Test your custom LLM endpoint independently of Vapi to confirm that chunks really arrive one at a time when the endpoint is accessed directly (the second sketch after this list shows one way to do that).
5. **Reviewing Call Log**: Since you've provided a call log ID, reviewing the detailed logs might give you more insight into where the delay is occurring.
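For points 1–3, here's a minimal sketch of the shape the endpoint usually needs, not your actual implementation: a FastAPI route that streams chunks with an `asyncio.sleep()` between them. The route path, payload shape, and 1-second delay are all assumptions on my side.

```python
# Minimal sketch (assumed FastAPI setup): stream chunks one at a time,
# pausing between them instead of returning everything at the end.
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def token_stream(chunks, delay_seconds: float = 1.0):
    # Yield each chunk as a server-sent event, then pause.
    # Because this is an async generator, the pause does not block the
    # event loop, and each chunk is sent as soon as it is yielded.
    for chunk in chunks:
        payload = {"choices": [{"delta": {"content": chunk}}]}
        yield f"data: {json.dumps(payload)}\n\n"
        await asyncio.sleep(delay_seconds)
    yield "data: [DONE]\n\n"

@app.post("/chat/completions")  # assumed path, adjust to your own
async def chat_completions():
    chunks = ["Hello", ", ", "world", "!"]  # stand-in for real model output
    return StreamingResponse(token_stream(chunks), media_type="text/event-stream")
```

The common mistake is collecting all the chunks into a list inside the loop and returning them after the loop (and the delays) have finished; that produces exactly the "everything arrives at once" behavior you're describing.

For point 4, here's a small sketch of a direct test client that bypasses Vapi entirely; the URL and payload are placeholders for your own endpoint. You can do the same check with `curl -N` if you prefer.

```python
# Minimal sketch: call the endpoint directly and print when each line arrives.
import time

import requests

resp = requests.post(
    "http://localhost:8000/chat/completions",  # placeholder URL
    json={"messages": [{"role": "user", "content": "hello"}]},
    stream=True,  # don't let requests buffer the whole body
)

start = time.time()
for line in resp.iter_lines(decode_unicode=True):
    if line:
        # If streaming works, these timestamps should be spaced by roughly
        # your configured delay; if they all print at once, something in
        # the endpoint or in front of it is buffering.
        print(f"{time.time() - start:6.2f}s  {line}")
```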
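If the direct test shows properly spaced chunks but Vapi still receives them all at once, the buffering is likely happening somewhere between your app and Vapi (for example a proxy in front of your server), rather than in your delay logic itself.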
If you're still facing issues after checking these points, it would help to share more details about how you've implemented the delay in your custom LLM endpoint. If you're unable to resolve the problem, please reach out to support with those details.