its.jcw
06/14/2024, 6:49 PM
03:30:53:917
[CHECKPOINT]
Model request started
03:30:53:918
[LOG]
Model request started (gpt-4o, custom-llm)
03:31:04:034
[CHECKPOINT]
Model sent start token
03:31:04:035
[LOG]
Model output: Let me think.
03:31:04:035
[CHECKPOINT]
Model sent first output token
03:31:04:037
[LOG]
Voice input: Let me think.
03:31:09:033
[LOG]
Model output: still thinking.
03:31:09:034
[LOG]
Voice input: still thinking.
03:31:13:830
[LOG]
Model output: still thinking.
03:31:13:832
[LOG]
Voice input: still thinking.
03:31:13:834
[CHECKPOINT]
Model sent end token
03:31:14:322
[CHECKPOINT]
11labs: audio received
03:31:14:361
[CHECKPOINT]
Assistant speech started
03:31:14:361
[INFO]
Turn Latency: 20446ms (Endpointing 3ms, Model 10118ms, Voice: 10288ms)
03:31:17:269
[CHECKPOINT]
Assistant speech ended
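For reference, the figures in the Turn Latency line can be roughly reproduced from the checkpoint timestamps in the log above. This is a minimal sketch that assumes the timestamps are HH:MM:SS:mmm and that the model and voice figures span the checkpoint pairs chosen below. That pairing is a guess, not documented Vapi behavior, and the results differ from the logged values by 1 to 2 ms.

```javascript
// Parse a log timestamp of the assumed form HH:MM:SS:mmm into milliseconds.
function toMs(stamp) {
  const [h, m, s, ms] = stamp.split(':').map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + ms;
}

// Checkpoint timestamps copied from the log above.
const t = {
  requestStarted: toMs('03:30:53:917'),
  startToken: toMs('03:31:04:034'),
  firstOutputToken: toMs('03:31:04:035'),
  endToken: toMs('03:31:13:834'),
  audioReceived: toMs('03:31:14:322'),
  speechStarted: toMs('03:31:14:361'),
};

// Assumed pairings; the log reports Model 10118ms, Voice 10288ms, total 20446ms.
const modelMs = t.startToken - t.requestStarted;
const voiceMs = t.audioReceived - t.firstOutputToken;
const totalMs = t.speechStarted - t.requestStarted;
// How long after the model's end token the first audio arrived.
const audioAfterEndTokenMs = t.audioReceived - t.endToken;

console.log({ modelMs, voiceMs, totalMs, audioAfterEndTokenMs });
```

Under these assumptions the first audio arrives 488 ms after the end token but about ten seconds after the first output token, which is consistent with none of the text being voiced until the stream had finished.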
Vapi
06/14/2024, 6:50 PM
Sahil
06/14/2024, 9:29 PM
its.jcw
06/15/2024, 3:59 AM
its.jcw
06/15/2024, 5:08 AM
its.jcw
06/15/2024, 6:07 AM
11labs. However, it DOES seem to be working for playht.
- I'm bumping into what I think is input buffering somewhere downstream from my custom-llm, and I've worked around it by emitting an SSE "comment" when I want to "flush" the output through, and it seems to be working! I'm now seeing the messages appearing in the VAPI log at the correct timestamps.
- It seems like the default configuration won't actually take advantage of this streaming capability though, unless you ALSO configure Punctuation Boundaries. Once I set this to period, I finally seem to be making some progress.
- It still seems not to voice text input that is too short though, even though I have min chars set to 3 and there is a punctuation mark at the end. Going to continue to experiment with how I can force it to voice these shorter "filler" words, because I know there won't be any more text streamed for a bit.
its.jcw
06/15/2024, 7:15 AM
res.write('data: [DONE]\n\n');
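To make the workaround described earlier in the thread concrete, here is a minimal sketch of writing a text chunk followed by an SSE comment, then closing the stream with the `[DONE]` line shown above. The helper names (`sseData`, `sseComment`, `writeAndFlush`, `endStream`) are hypothetical, and the OpenAI-style chunk shape is an assumption about what the custom-llm endpoint is expected to return. Whether the comment actually forces a flush depends on whatever is buffering downstream.

```javascript
// Sketch only: helper names and the chunk shape are assumptions, not Vapi's API.

// Format one OpenAI-style streaming chunk as an SSE "data:" frame.
function sseData(text) {
  const chunk = {
    object: 'chat.completion.chunk',
    choices: [{ index: 0, delta: { content: text } }],
  };
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// An SSE comment is a line starting with ":"; event-stream parsers ignore it.
function sseComment(text = 'flush') {
  return `: ${text}\n\n`;
}

// Write a text chunk, then a comment intended to push it past any buffering.
function writeAndFlush(res, text) {
  res.write(sseData(text));
  res.write(sseComment());
}

// Terminate the stream with the [DONE] sentinel, then close the response.
function endStream(res) {
  res.write('data: [DONE]\n\n');
  res.end();
}
```

In a Node HTTP handler, `res` would be the response object after setting `Content-Type: text/event-stream`. A typical sequence would be `writeAndFlush(res, 'Let me think. ')`, a pause while the real answer is computed, further `writeAndFlush` calls, and finally `endStream(res)`.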
Sahil
06/15/2024, 9:10 AM
its.jcw
06/15/2024, 9:20 AM
its.jcw
06/15/2024, 9:24 AM
playht and ensuring that:
- the sentence is long enough
- the sentence ends in a punctuation mark (which is also explicitly listed in the voice config)
- the punctuation mark has a trailing space
I'm trying to get TTS happening at three places over 10 seconds - the beginning, the middle and the end. So far I'm only able to get it to happen at the beginning and the end - I'm not sure how else to influence VAPI to engage the TTS service, hopefully this upcoming sample will provide that answer!
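The three conditions in the checklist above could be applied to each filler phrase before it is streamed, roughly as follows. This is a hypothetical helper: `prepareFiller` and its options are made-up names, the defaults (min chars 3, period as the boundary) are the values mentioned in this thread, and nothing here is confirmed to be what Vapi checks before engaging the TTS service.

```javascript
// Hypothetical helper: mirrors the checklist above, not documented Vapi behavior.
const DEFAULTS = { minChars: 3, boundaries: ['.'] }; // values mentioned in the thread

function prepareFiller(text, options = {}) {
  const { minChars, boundaries } = { ...DEFAULTS, ...options };
  let sentence = text.trim();
  // Make sure the sentence ends in one of the configured punctuation marks.
  if (!boundaries.some((mark) => sentence.endsWith(mark))) {
    sentence += boundaries[0];
  }
  // Report whether the sentence meets the configured minimum length.
  const longEnough = sentence.length >= minChars;
  // Add the trailing space after the punctuation mark.
  return { text: `${sentence} `, longEnough };
}

console.log(prepareFiller('still thinking'));
```

A caller could skip or pad any phrase where `longEnough` is false instead of streaming it and waiting for speech that may not come.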
I actually don't want to use playht, I want to use 11labs. If possible, can you get the sample working with 11labs?
Sahil
06/15/2024, 10:20 AM
Sahil
06/17/2024, 3:27 AM
Sahil
06/17/2024, 3:27 AM
its.jcw
06/17/2024, 10:47 PM
Sahil
06/18/2024, 2:28 PM
Sahil
06/18/2024, 5:36 PM
its.jcw
06/18/2024, 10:10 PM