Streaming example from custom-llm doesn't actually...
# support
i
I have pointed my agent (mostly default settings - Deepgram and 11Labs) to a custom LLM endpoint that is exactly this implementation: https://dump.sahilsuman.me/streaming-custom-llm-vapi.txt
Instead of getting three responses separated by five seconds, I get a 15-second delay followed by all three responses back to back. This reproduces the same issue I'm getting when streaming back from my custom-llm implementation.
Call log ID: 63389b50-e65f-4cee-800e-1b87b6781f23
What am I missing?
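For reference, a minimal sketch of the kind of test endpoint being described: three chunks, five seconds apart, in an OpenAI-compatible SSE stream. This assumes FastAPI; the route path, model name, and reply text are illustrative placeholders, not the actual contents of the linked dump.

```python
# Sketch of a streaming custom-LLM endpoint (assumed: FastAPI + SSE,
# OpenAI-style chat.completion.chunk payloads). Illustrative only.
import asyncio
import json
import time

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

REPLIES = ["First response.", "Second response.", "Third response."]


async def generate_chunks():
    for i, text in enumerate(REPLIES):
        chunk = {
            "id": f"chatcmpl-demo-{i}",                 # placeholder id
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": "custom-llm",                       # placeholder model name
            "choices": [
                {"index": 0, "delta": {"content": text}, "finish_reason": None}
            ],
        }
        # Each SSE event is yielded (and should be flushed) immediately,
        # not collected and sent at the end.
        yield f"data: {json.dumps(chunk)}\n\n"
        await asyncio.sleep(5)  # the intended 5-second gap between responses
    yield "data: [DONE]\n\n"


@app.post("/chat/completions")
async def chat_completions():
    return StreamingResponse(generate_chunks(), media_type="text/event-stream")
```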
v
@its.jcw It seems like there might be an issue with the implementation of the delay between responses in your custom LLM endpoint. The behavior you're describing suggests that the responses are being buffered and then sent all at once, rather than being streamed with the intended delay. To troubleshoot this issue, please check the following in your custom LLM implementation:
1. **Asynchronous handling**: Ensure that your endpoint is designed to handle asynchronous operations correctly. It should be able to send a response and then wait for the specified delay before sending the next one.
2. **Streaming configuration**: Verify that the streaming configuration on your server is set up to allow data to be sent in real time without buffering. This might involve configuring the server to flush the output buffer after each response.
3. **Delay implementation**: Double-check the code where you've implemented the delay to ensure it's functioning as expected. It should introduce a pause between the processing of each response without holding up the entire batch of responses.
4. **Testing the endpoint independently**: Test your custom LLM endpoint independently of Vapi to confirm that it behaves as expected when accessed directly (a quick way to do this is sketched below).
5. **Reviewing the call log**: Since you've provided a call log ID, reviewing the detailed logs might give you more insight into where the delay is occurring.
If after checking these points you're still facing issues, it might be helpful to provide more details about how you've implemented the delay in your custom LLM endpoint. If you're unable to resolve the problem, please ask support for more info on this.
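To make point 4 concrete, one way to test the endpoint independently is to stream it from a small client and timestamp each SSE line as it arrives. This is a sketch assuming the `requests` library and a locally running server; the URL and request payload are placeholders. If every line prints at once after roughly 15 seconds, the buffering is happening on the server or tunnel side rather than in Vapi.

```python
# Stream the endpoint directly and print each SSE line with its arrival time.
# Assumes the sketch server above is running locally; URL is a placeholder.
import time

import requests

with requests.post(
    "http://localhost:8000/chat/completions",
    json={"messages": [{"role": "user", "content": "hi"}], "stream": True},
    stream=True,
    timeout=60,
) as resp:
    start = time.time()
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(f"[{time.time() - start:5.1f}s] {line}")
```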
You can continue this conversation by mentioning me in the message.
Got it. Tagging @User for additional help.
i
I forgot to mention - that example is from @Sahil, from another support post.
s
You can use this example @its.jcw
You just need to send the data in this format.
i
aren't I though?
that looks like the same data format i'm using: https://dump.sahilsuman.me/streaming-custom-llm-vapi.txt
I can try that one too real quick... trying now.
s
Also, the delay is because of the ngrok thing - when you put it on a server, this issue will be fixed.
i
Oh! It's true, I'm using ngrok.
Is there a fix for this over ngrok, or is it an inherent limitation of ngrok?
And while you're here, if you don't mind a side question - we added an org to a subscribed account, but it's not clear whether the org also uses that same credit card subscription or if we have to add the card again.
s
Nope, it is not meant for this purpose - it was just made for the demo/testing part. You will need to use a proper server for it. Just a tip: host your server in us-west and you will get the least latency.
i
How much latency savings might I expect between us-east-1 (what we're using now) and us-west?
s
When they are using your org, they will be using your subscription.
You will need to test it out; I suggested us-west because our server is hosted in the us-west region.
i
ah ok, will do
so subscription is per account, and not per org?
s
It is per account.
i
awesome - thanks so much for the help!