IS VAPI Postprocessing LLM responses for text to s...
# support
a
Hey team! When I look at the logs, sometimes the model output is not exactly what is passed to the voice input. I need it to be the same as the output from the LLM. It looks like there may be a prompt altering the output a bit, can you confirm? For example, I ask the model to respond with dashes between numbers, like 5 - 5 - 5, but I notice that what gets passed to text to speech is "five five five", without the dashes. I need dashes or other XML in there for ElevenLabs to correctly pronounce, say, or pause. How can I do this? ---> https://elevenlabs.io/docs/product/speech-synthesis/prompting Thanks!
v
@AiTitus :) Hello! It sounds like you're encountering an issue where the formatting of the output from the language model isn't being preserved when passed to the text-to-speech (TTS) system. To ensure that the TTS system, such as ElevenLabs, correctly interprets the output with dashes or other SSML (Speech Synthesis Markup Language) tags, you'll need to make sure that the output from the language model includes these characters or tags as part of the response. If the model is currently set up to remove or alter these characters, you might need to adjust the prompt or the post-processing of the model's output to ensure that the desired formatting is maintained. This could involve modifying the prompt to explicitly instruct the model to include dashes or SSML tags, or it could involve a post-processing step where you add these elements back into the model's output before passing it to the TTS system. If you have control over the prompt or the post-processing logic, you can make these adjustments yourself. If not, you may need to reach out to the development team responsible for the voice pipeline to request these changes. If you need further assistance with this or have specific requirements for the SSML tags, please let us know, and we'll do our best to help you out!
You can continue this conversation by mentioning me in the message.
s
Hey @AiTitus :) yes, we do input formatting before passing text to the voice providers, and dashes are replaced there. Disabling voice formatting may have a negative effect; instead you can use <break>-or-similar from 11labs along with a replacement plan to add the " post formatting. Do let me know if you have any questions.
a
How do I disable voice formatting?
I handle the LLM output on my end to make sure it is good for voice.
What do you mean by "instead you can use -or-similar from 11labs"?
Will your formatting mess with this format, for example? If I can rely on this, then I can pass that from the LLM to TTS.
@Shubham Bajaj please provide an example of how this would look: "you can use -or-similar from 11labs along with replacement plan to add the " post formatting."
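For reference, a rough sketch of what doing this on the LLM-output side could look like: replace the dashes between digits with a break tag before the text goes to TTS. The `<break time="..." />` syntax is the one described in the ElevenLabs prompting docs linked above; the helper name `dashes_to_breaks` and the 0.5s pause are made up for illustration, and nothing in this thread confirms that VAPI's own formatting leaves the tag untouched.

```python
import re


def dashes_to_breaks(text: str, pause: str = "0.5s") -> str:
    """Replace a dash between two digits with an ElevenLabs-style break tag.

    Hypothetical helper, not part of VAPI or ElevenLabs.
    """
    tag = f' <break time="{pause}" /> '
    # Only match dashes that sit directly between digits (optional spaces),
    # so ordinary hyphens between words are left alone.
    return re.sub(r"(?<=\d)\s*-\s*(?=\d)", tag, text)


print(dashes_to_breaks("5 - 5 - 5"))
# 5 <break time="0.5s" /> 5 <break time="0.5s" /> 5
```

Note that this also rewrites things like "10-5" or phone numbers, so the pattern would need tightening if the LLM output can contain those.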
s
Hey @AiTitus :) we have disabled the quotes and voice formatting, can you try now?
a
@Shubham Bajaj looks like it's stayin MINUS so i am trying the break XML to see if that works. Just FYI, call ID: 55ad44a6-6671-4f17-9bec-ec2410b5fcda, starts at 1:12
s
> stayin MINUS so i am trying the break XML
@AiTitus :) can you please clarify what you're trying to say here?
Hey @AiTitus :) just checking in: how is it going, and what are you trying to convey?
a
it is saying the dashes
so i will try the see if it works
One issue is that in JSON the " has to be escaped. Are you unescaping the " when sending to ElevenLabs?
s
Hey @AiTitus :) can you rephrase that? I cannot understand what you're trying to say.
a
I try to pass it in JSON, but then the " " are escaped and ElevenLabs does not use it properly when this happens.
Any suggestions?
I inject it into the response from the LLM like this: "Hey @Shubham Bajaj how are you you?" so when it goes to VAPI it works as proper JSON, but then it looks like it breaks the ElevenLabs feature.
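To make the escaping concern concrete, here is a small sketch using only Python's standard `json` module. The reply text and the `content` key are made up for illustration. On the wire the quotes in the tag are escaped as `\"`, but a standard JSON parse restores the original string, so the tag only stays broken if something downstream forwards the raw escaped text instead of the parsed value.

```python
import json

# Made-up LLM reply containing an ElevenLabs-style break tag.
reply = 'Your code is 5 <break time="0.5s" /> 5'

# Serializing escapes the inner double quotes, which is required for valid JSON.
payload = json.dumps({"content": reply})
print(payload)

# Parsing the payload restores the quotes exactly as they were.
decoded = json.loads(payload)["content"]
print(decoded == reply)
```

So escaping on my side should be harmless on its own; the open question is what VAPI does with the string between parsing it and sending it to ElevenLabs.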
s
@AiTitus :) we have disabled formatting of " in the voice formatting plan. Can you share the most recent call ID where it's still breaking? Apologies for not replying within sub-second; because of the long list of tickets, some messages sometimes get lost in the stack.