Voice Input does not match Model Output
# support
a
Hey Team, I've isolated what I think is an issue. It seems that the Model Output is being modified before it becomes the Voice Input. In the image you can see the model is outputting "--" as per my prompt request. I need this to go into the TTS as-is. However, the Voice Input seems to strip it out, and this is impacting the generated audio. See logs [here](https://dashboard.vapi.ai/calls/0986a34e-9d5f-44d1-83d5-7297fb397956). Can this be prevented? It's happening to SSML tags as well (see [here](https://dashboard.vapi.ai/calls/3752243f-07eb-4370-82bc-bce0ca0efad8)).
v
@arash Hello! I understand that you're experiencing an issue where certain characters or SSML tags are being stripped out before the text is passed to the Text-to-Speech (TTS) engine, which is affecting the generated audio. To address this, we need to ensure that the `inputMinCharacters` and `inputPunctuationBoundaries` settings are configured correctly to allow such characters and tags to be passed through without modification. For the `inputMinCharacters`, we can set an appropriate minimum character count that includes your SSML tags or special characters. And for `inputPunctuationBoundaries`, we can include characters like "--" or any specific SSML tags you're using to the array so that they are recognized as valid input boundaries. Could you please provide me with the values you'd like to set for `inputMinCharacters` and the specific characters or SSML tags you want to include in `inputPunctuationBoundaries`? This way, I can assist you in configuring these settings correctly.
You can continue this conversation by mentioning me in the message.
a
@Vapi The `inputPunctuationBoundaries` setting doesn't allow custom values (i.e. I can't specify `--`). I hadn't set `inputMinCharacters` before; I have now set it to 20, but I'm not seeing a difference.
v
@arash I see, if the `inputPunctuationBoundaries` cannot be customized to include specific characters like `--`, and adjusting the `inputMinCharacters` to 20 hasn't resolved the issue, it seems we might be dealing with a limitation or a bug in the system. Since the settings you've adjusted do not appear to influence the behavior you're describing, I recommend reaching out to our support team for a more in-depth investigation. They can provide further assistance and potentially escalate the issue to our technical team if necessary. Please provide them with the details you've shared here, including the logs and the specific problem with the SSML tags and characters being stripped out. They will need this information to understand the context and provide an accurate solution. Would you like me to guide you on how to contact support or is there anything else I can assist you with at this moment?
a
yep - how can i message support?
@Vapi Please help me contact support?
v
@arash Certainly! You can contact our support team by sending an email with the details of your issue. Please include the logs, the behavior you're experiencing with the SSML tags and characters, and any other relevant information that could help the support team understand and address your problem. Here is the email address to reach out to our support team: [support@vapi.ai](support@vapi.ai) Feel free to compose an email to them, and they will get back to you as soon as possible to assist with your issue. If you need any help drafting your message or if there's anything else I can do for you, please let me know!
a
@User this issue is still occurring, as per my most recent test as well: https://dashboard.vapi.ai/calls/61fcddf1-02c2-4dca-a83b-68f539f9c578
s
@arash 1. For Cartesia we don't support SSML tags yet. 2. Because of voice input formatting, the tags were replaced.
logs
🔵 15:22:56:943 Voice Input Formatted: "Absolutely. Can you share your full name?", Original: "Absolutely. Can you share your full name?"
https://cdn.discordapp.com/attachments/1305681916984299560/1305928617338867753/Screenshot_2024-11-12_at_9.42.26_PM.png?ex=6734d04d&is=67337ecd&hm=e217c977fb38e1878923c2a17557cf93ba3f8185c757c311e7472d376b480b44&
Hey @arash you need to use 11labs instead to use SSML.
a
Hi @Shubham Bajaj all these examples are via 11labs!
Hi @Shubham Bajaj - my apologies, one of these demos was with cartesia, but i see the same thing with the 11labs lab voice here: https://dashboard.vapi.ai/assistants/27f57585-a295-4149-aa6a-b1c32e9b94c1
In the example I shared, the "--" markers are being removed, and those are needed for my input. I can reproduce a specific case with SSML tags for you now as well.
a
Hey @Shubham Bajaj, I'm using the VAPI UI to create this. How can I specify that parameter?
Hey @Shubham Bajaj - I used the API to update the assistant to set that parameter. It seems, though, that the quotation marks are being stripped out. These quotes are necessary to make it work - I've tested on 11labs without quotes and with single quotes (`'`), and it only works with double quotes (`"`). Is there a parameter where I can just put in the raw input?
https://dashboard.vapi.ai/calls/81de4432-b0a7-4131-9497-c86ed6aff478
https://cdn.discordapp.com/attachments/1305681916984299560/1305951114176364674/image.png?ex=6734e541&is=673393c1&hm=a1d033445d357d20b975b719bc12e900e0b24f772e6be57c69e274dbbe4f6a3c&
s
Hey @arash you can turn the voice formatting `off`, but it will have a negative effect. I will check with the team for an alternative; please allow me some time.
a
Thanks @Shubham Bajaj - at minimum, is there a way to keep `--` from being filtered out? This is a special indicator within 11labs.
s
Hey @arash It just came to my mind: you can use the chunk plan to replace the special tags back so they can be parsed as SSML.
```json
{
  "voice": {
    "provider": "11labs",
    "voiceId": "burt",
    "chunkPlan": {
      "formatPlan": {
        "replacements": [
          {
            "type": "exact",
            "key": "<break time=3s />",
            "value": "<break time=\"3s\" />"
          }
        ]
      }
    }
  }
}
```
Try using replacements and let me know how it goes. The replacement text is part of the formatting plan, so it won't be changed or formatted, and best of all it is applied last, after all other formatting, so you're covered.
Also, dashes will be replaced with spaces. So try this out and let me know; meanwhile I will check what other alternatives there are.
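For anyone setting this through the API rather than the UI, here is a minimal sketch of building that `voice` fragment in Python so the escaped double quotes come out right. The helper name `build_voice_patch` is hypothetical, and the field names simply mirror the JSON above; how the body is sent to the assistant update endpoint (URL, auth) is not shown and should be taken from the Vapi API reference.

```python
import json


def build_voice_patch(replacements, provider="11labs", voice_id="burt"):
    """Return a partial assistant body with exact-match replacements.

    `replacements` maps the text as it appears after voice formatting
    (quotes stripped) to the text that should reach the TTS provider.
    Field names are copied from the JSON in this thread, not verified here.
    """
    return {
        "voice": {
            "provider": provider,
            "voiceId": voice_id,
            "chunkPlan": {
                "formatPlan": {
                    "replacements": [
                        {"type": "exact", "key": key, "value": value}
                        for key, value in replacements.items()
                    ]
                }
            },
        }
    }


# Restore the double quotes that formatting removed from the break tag.
patch = build_voice_patch({"<break time=3s />": '<break time="3s" />'})

# json.dumps handles the quote escaping, so no hand-written backslashes.
print(json.dumps(patch, indent=2))
```

Letting `json.dumps` do the serialization avoids the easy mistake of dropping or doubling the backslashes around `"3s"` when writing the body by hand.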
t
I'm having a similar issue. Is there somewhere with more detail on how the voice formatting is done? Using replacements is somewhat unpredictable without more explicit understanding of what gets replaced and when.
s
A replacement swaps the key content for the value content you provide, applied to the LLM model output, so it is predictable what will be replaced and when. If you have more doubts about it, create a support ticket listing them and I will help you for sure.
Hey @arash checking if you got a chance to try it out?
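To reason about what an `exact` rule will touch before trying it on a live call, the behavior described here can be modeled locally. This is only a sketch under the assumption that an exact rule is a plain substring substitution applied in list order after other formatting; it is not Vapi's implementation, and `apply_exact_replacements` is a hypothetical helper.

```python
def apply_exact_replacements(text, replacements):
    """Apply each exact rule in order as a plain substring substitution.

    Assumption: rules of other types are ignored in this local model.
    """
    for rule in replacements:
        if rule.get("type") == "exact":
            text = text.replace(rule["key"], rule["value"])
    return text


rules = [
    {"type": "exact", "key": "<break time=3s />", "value": '<break time="3s" />'},
]

# Text as it might look after formatting has stripped the quotes.
formatted = "Sure. <break time=3s /> Can you share your full name?"
result = apply_exact_replacements(formatted, rules)
print(result)
```

Running candidate keys against the "Voice Input Formatted" strings from the call logs is a cheap way to check that a key actually matches the formatted text before updating the assistant.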