Custom LLM streaming responses not getting voiced ...
# support
i
I'm running one of the NodeJS samples that have been posted in other support threads for how to do streaming, and I'm struggling to get it to actually trigger TTS correctly. I've confirmed (via curl) that the responses are streaming back when expected - you can also see this via the timestamps in the log below - but they're inconsistently getting voiced. Sometimes one of the three will get voiced, but usually at least two or three of the responses won't get voiced until the end of the call. What am I doing wrong? Call ID: a2098cfd-3f76-435c-b94a-ae8aea6a52e8
```
03:30:53:917 [CHECKPOINT] Model request started
03:30:53:918 [LOG]        Model request started (gpt-4o, custom-llm)
03:31:04:034 [CHECKPOINT] Model sent start token
03:31:04:035 [LOG]        Model output: Let me think.
03:31:04:035 [CHECKPOINT] Model sent first output token
03:31:04:037 [LOG]        Voice input: Let me think.
03:31:09:033 [LOG]        Model output: still thinking.
03:31:09:034 [LOG]        Voice input: still thinking.
03:31:13:830 [LOG]        Model output: still thinking.
03:31:13:832 [LOG]        Voice input: still thinking.
03:31:13:834 [CHECKPOINT] Model sent end token
03:31:14:322 [CHECKPOINT] 11labs: audio received
03:31:14:361 [CHECKPOINT] Assistant speech started
03:31:14:361 [INFO]       Turn Latency: 20446ms (Endpointing 3ms, Model 10118ms, Voice: 10288ms)
03:31:17:269 [CHECKPOINT] Assistant speech ended
```
s
I think you are using a chat completion model example. For that, you need to send the complete message at once. Check out this repository: https://github.com/VapiAI/server-side-example-python-flask/blob/main/app/api/custom_llm.py
i
@Sahil I don't see anything relevant in that example? The streaming in that example is coming entirely from OpenAI. I'm talking about using a hand-crafted streaming response, like another example you've posted before: https://dump.sahilsuman.me/streaming-custom-llm-vapi.txt In fact, the logs and call ID above are literally from that example, where three separate chunks are sent 5 seconds apart. It should be reasonable to expect those to be voiced as they are streamed, and not all clustered at the end of the custom-llm call, right?
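To make the pattern concrete, the shape I'm streaming is roughly this (a minimal sketch, not the exact sample from the link; the OpenAI-style chunk format and the `streamReply` helper name are my own illustration of what a custom-llm endpoint sends back):

```javascript
// Sketch of hand-crafted SSE streaming for a custom-llm endpoint.
// Assumption: chunks follow the OpenAI chat-completions streaming shape.
function sseChunk(content) {
  return 'data: ' + JSON.stringify({
    object: 'chat.completion.chunk',
    choices: [{ index: 0, delta: { content }, finish_reason: null }],
  }) + '\n\n';
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// res is the HTTP response (e.g. from Express); it only needs write()/end().
// The real handler waits 5000 ms between chunks; shortened here.
async function streamReply(res, delayMs = 50) {
  for (const text of ['Let me think. ', 'still thinking. ', 'still thinking. ']) {
    res.write(sseChunk(text));
    await sleep(delayMs);
  }
  res.write('data: [DONE]\n\n'); // end-of-stream sentinel
  res.end();
}
```

The expectation is that each `res.write` reaches the TTS pipeline as it is sent, rather than being batched until `[DONE]`.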
I'm continuing to try various things to get this to work - here's an example where there seems to be enough text to voice (including a punctuation delimiter), but nothing is sent to be voiced - why? https://cdn.discordapp.com/attachments/1251247053598621727/1251403003953680404/image.png?ex=666e735e&is=666d21de&hm=8a1dbd29b4146bc6017bc4368b234e1543e9b10997f8e5260eccbb0a7b139404&
Some learnings so far:
- Streaming doesn't seem to be working AT ALL for `11labs`. However, it DOES seem to be working for `playht`.
- I'm bumping into what I think is input buffering somewhere downstream from my custom-llm, and I've worked around it by emitting an SSE "comment" when I want to "flush" the output through - and it seems to be working! I'm now seeing the messages appear in the VAPI log at the correct timestamps.
- It seems like the default configuration won't actually take advantage of this streaming capability unless you ALSO configure Punctuation Boundaries. Once I set this to `period`, I finally seem to be making some progress.
- It still seems not to voice text input that is too short, even though I have min chars set to 3 and there is a punctuation mark at the end. I'm going to keep experimenting with how to force it to voice these shorter "filler" words, because I know there won't be any more text streamed for a bit.
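The flush workaround in the second bullet is just this (a sketch; the comment text is arbitrary, since SSE lines beginning with `:` are ignored by event-stream consumers):

```javascript
// SSE comments (lines starting with ':') are discarded by event-stream
// parsers, but writing one can push a buffered data chunk through
// intermediaries that wait for more bytes before forwarding.
function writeAndFlush(res, payload) {
  res.write('data: ' + payload + '\n\n'); // the real chunk
  res.write(': flush\n\n');               // comment line; consumer ignores it
}
```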
are there any other special "directives" I can use to influence when TTS happens?
```javascript
res.write('data: [DONE]\n\n');
```
s
Give me time till Monday. I will build an example and share it with you!
i
That would be amazing, thanks!
My last update for now: I was able to get TTS happening "mid-stream" by using `playht` and ensuring that:
- the sentence is long enough
- the sentence ends in a punctuation mark (which is also explicitly listed in the voice config)
- the punctuation mark has a trailing space

I'm trying to get TTS happening at three places over 10 seconds: the beginning, the middle, and the end. So far I'm only able to get it to happen at the beginning and the end, and I'm not sure how else to influence VAPI to engage the TTS service - hopefully the upcoming sample will provide that answer! I actually don't want to use `playht`, though; I want to use `11labs`. If possible, can you get the sample working with `11labs`?
s
I will need some more context about how exactly you are doing things. Let's have a meeting so I can understand your process; then I will be able to help you out in a better way.
@its.jcw Here you go
i
Thanks for the call today @Sahil - let me know what you hear back! Here are the call IDs again if needed:
- 11labs: 1794217e-aa8e-4cac-8939-ddbebf880b37
- playht: f5d92aea-ae82-4d4f-b0a2-12b977a2dc61
s
Will discuss it tonight.
I talked to Nikhil about this, and he told me that different voice providers handle data in different ways. It's not just about the word characters; punctuation and other factors also play a role. He told me that to fix this for 11labs, just add a trailing space " " at the end of the string message. For example: "Hello, How are you? "
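In code, that fix is just something like this (a sketch; the `voiceReady` name and the punctuation set are my own, match them to whatever your voice config lists as punctuation boundaries):

```javascript
// Ensure a chunk ends with a punctuation mark followed by a trailing
// space, which is what reportedly gets 11labs to voice it mid-stream.
function voiceReady(text) {
  const trimmed = text.trim();
  // Assumed punctuation set; align with your configured punctuation boundaries.
  const punctuated = /[.,!?]$/.test(trimmed) ? trimmed : trimmed + '.';
  return punctuated + ' ';
}
```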
i
omg that... works! wow, thanks so much - we'll explore this further