Streaming example from custom-llm doesn't actually...
# support
i
I have pointed my agent (mostly default settings - Deepgram and 11Labs) to a custom LLM endpoint that is exactly this implementation: https://dump.sahilsuman.me/streaming-custom-llm-vapi.txt
Instead of getting three responses separated by five seconds, I get a 15-second delay followed by all three responses back to back. This reproduces the same issue I'm getting when streaming back from my custom-llm implementation.
Call log ID: 63389b50-e65f-4cee-800e-1b87b6781f23
What am I missing?
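For reference, a minimal sketch of the kind of test endpoint being described: three chunks, five seconds apart, in an OpenAI-compatible SSE stream. This assumes FastAPI; the route path, model name, and reply text are illustrative placeholders, not the actual contents of the linked dump.

```python
# Sketch of a streaming custom-LLM endpoint (assumed: FastAPI + SSE,
# OpenAI-style chat.completion.chunk payloads). Illustrative only.
import asyncio
import json
import time

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

REPLIES = ["First response.", "Second response.", "Third response."]


async def generate_chunks():
    for i, text in enumerate(REPLIES):
        chunk = {
            "id": f"chatcmpl-demo-{i}",                 # placeholder id
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": "custom-llm",                       # placeholder model name
            "choices": [
                {"index": 0, "delta": {"content": text}, "finish_reason": None}
            ],
        }
        # Each SSE event is yielded (and should be flushed) immediately,
        # not collected and sent at the end.
        yield f"data: {json.dumps(chunk)}\n\n"
        await asyncio.sleep(5)  # the intended 5-second gap between responses
    yield "data: [DONE]\n\n"


@app.post("/chat/completions")
async def chat_completions():
    return StreamingResponse(generate_chunks(), media_type="text/event-stream")
```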
v
@its.jcw It seems like there might be an issue with the implementation of the delay between responses in your custom LLM endpoint. The behavior you're describing suggests that the responses are being buffered and then sent all at once, rather than being streamed with the intended delay. To troubleshoot this issue, please check the following in your custom LLM implementation:
1. **Asynchronous handling**: Ensure that your endpoint is designed to handle asynchronous operations correctly. It should be able to send a response and then wait for the specified delay before sending the next one.
2. **Streaming configuration**: Verify that the streaming configuration on your server is set up to allow data to be sent in real time without buffering. This might involve configuring the server to flush the output buffer after each response.
3. **Delay implementation**: Double-check the code where you've implemented the delay to ensure it's functioning as expected. It should introduce a pause between the processing of each response without holding up the entire batch of responses.
4. **Testing the endpoint independently**: Test your custom LLM endpoint independently of Vapi to confirm that it behaves as expected when accessed directly (a quick way to do this is sketched below).
5. **Reviewing the call log**: Since you've provided a call log ID, reviewing the detailed logs might give you more insight into where the delay is occurring.
If after checking these points you're still facing issues, it might be helpful to provide more details about how you've implemented the delay in your custom LLM endpoint. If you're unable to resolve the problem, please ask support for more info on this.
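To make point 4 concrete, one way to test the endpoint independently is to stream it from a small client and timestamp each SSE line as it arrives. This is a sketch assuming the `requests` library and a locally running server; the URL and request payload are placeholders. If every line prints at once after roughly 15 seconds, the buffering is happening on the server or tunnel side rather than in Vapi.

```python
# Stream the endpoint directly and print each SSE line with its arrival time.
# Assumes the sketch server above is running locally; URL is a placeholder.
import time

import requests

with requests.post(
    "http://localhost:8000/chat/completions",
    json={"messages": [{"role": "user", "content": "hi"}], "stream": True},
    stream=True,
    timeout=60,
) as resp:
    start = time.time()
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(f"[{time.time() - start:5.1f}s] {line}")
```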
You can continue this conversation by mentioning me in the message.
Got it. Tagging @User for additional help.
i
I forgot to mention - that example is from @Sahil, from another support post.
s
You can use this example @its.jcw
You just need to send the data in this format.
i
aren't I though?
that looks like the same data format i'm using: https://dump.sahilsuman.me/streaming-custom-llm-vapi.txt
I can try that one too real quick... trying now.
s
Also, the delay is because of the ngrok thing - when you put it on a server, this issue will be fixed.
i
Oh! It's true, I'm using ngrok.
Is there a fix for this over ngrok, or is it an inherent limitation of ngrok?
And while you're here, if you don't mind a side question - we added an org to a subscribed account, but it's not clear whether the org also uses that same credit card subscription or if we have to add the card again.
s
Nope, it is not meant for this purpose - it was just made for the demo/testing part. You will need to use a proper server for it. Just a tip: host your server in us-west and you will get the least latency.
i
How much latency savings might I expect between us-east-1 (what we're using now) and us-west?
s
When they are using your org, they will be using your subscription.
You will need to test it out; I suggested us-west because our server is hosted in the us-west region.
i
ah ok, will do
so subscription is per account, and not per org?
s
It is per account.
i
awesome - thanks so much for the help!