Voice Input does not match Model Output Vapi AI #support

arash

11/11/2024, 11:53 PM

Hey Team, I've isolated what I think is an issue. It seems that the Model Output is being modified before it becomes the Voice Input. In the image you can see the model is outputting "--" as per my prompt request. I need this input to go into the TTS. However, the Voice Input seems to strip that out. This is impacting the generated audio. See logs [here](https://dashboard.vapi.ai/calls/0986a34e-9d5f-44d1-83d5-7297fb397956). Can this be prevented? It's happening to other SSML tags as well (see [here](https://dashboard.vapi.ai/calls/3752243f-07eb-4370-82bc-bce0ca0efad8)).

Vapi

11/11/2024, 11:56 PM

@arash Hello! I understand that you're experiencing an issue where certain characters or SSML tags are being stripped out before the text is passed to the Text-to-Speech (TTS) engine, which is affecting the generated audio. To address this, we need to ensure that the `inputMinCharacters` and `inputPunctuationBoundaries` settings are configured correctly to allow such characters and tags to be passed through without modification. For `inputMinCharacters`, we can set an appropriate minimum character count that includes your SSML tags or special characters. And for `inputPunctuationBoundaries`, we can add characters like "--" or any specific SSML tags you're using to the array so that they are recognized as valid input boundaries. Could you please provide me with the value you'd like to set for `inputMinCharacters` and the specific characters or SSML tags you want to include in `inputPunctuationBoundaries`? This way, I can assist you in configuring these settings correctly.

Vapi

11/11/2024, 11:56 PM

You can continue this conversation by mentioning me in the message.

arash

11/12/2024, 12:00 AM

@Vapi The `inputPunctuationBoundaries` setting doesn't allow custom values (i.e. I can't specify `--`). I hadn't set `inputMinCharacters`; I have just set it to 20, but I'm not seeing a difference.

Vapi

11/12/2024, 12:00 AM

@arash I see. If `inputPunctuationBoundaries` cannot be customized to include specific characters like `--`, and adjusting `inputMinCharacters` to 20 hasn't resolved the issue, it seems we might be dealing with a limitation or a bug in the system. Since the settings you've adjusted do not appear to influence the behavior you're describing, I recommend reaching out to our support team for a more in-depth investigation. They can provide further assistance and potentially escalate the issue to our technical team if necessary. Please provide them with the details you've shared here, including the logs and the specific problem with the SSML tags and characters being stripped out. They will need this information to understand the context and provide an accurate solution. Would you like me to guide you on how to contact support, or is there anything else I can assist you with at this moment?

arash

11/12/2024, 12:00 AM

yep - how can i message support?

arash

11/12/2024, 12:04 AM

@Vapi Please help me contact support?

Vapi

11/12/2024, 12:04 AM

@arash Certainly! You can contact our support team by sending an email with the details of your issue. Please include the logs, the behavior you're experiencing with the SSML tags and characters, and any other relevant information that could help the support team understand and address your problem. Here is the email address to reach our support team: [support@vapi.ai](mailto:support@vapi.ai). Feel free to compose an email to them, and they will get back to you as soon as possible to assist with your issue. If you need any help drafting your message or if there's anything else I can do for you, please let me know!

arash

11/12/2024, 3:29 PM

@User this issue is still occurring, as per my most recent test as well: https://dashboard.vapi.ai/calls/61fcddf1-02c2-4dca-a83b-68f539f9c578

Shubham Bajaj

11/12/2024, 4:14 PM

@arash 1. For Cartesia we don't support SSML tags yet. 2. Because of voice input formatting, the tags were replaced.

logs

🔵 15:22:56:943 Voice Input Formatted: "Absolutely. Can you share your full name?", Original: "Absolutely. Can you share your full name?"

https://cdn.discordapp.com/attachments/1305681916984299560/1305928617338867753/Screenshot_2024-11-12_at_9.42.26_PM.png?ex=6734d04d&is=67337ecd&hm=e217c977fb38e1878923c2a17557cf93ba3f8185c757c311e7472d376b480b44&

Shubham Bajaj

11/12/2024, 4:14 PM

Hey @arash you need to use 11labs instead to use SSML.

arash

11/12/2024, 4:14 PM

Hi @Shubham Bajaj all these examples are via 11labs!

arash

11/12/2024, 4:17 PM

Hi @Shubham Bajaj - my apologies, one of these demos was with Cartesia, but I see the same thing with the 11labs voice here: https://dashboard.vapi.ai/assistants/27f57585-a295-4149-aa6a-b1c32e9b94c1

arash

11/12/2024, 4:18 PM

In the example I shared, the "--" tags are being removed, and those are needed for my input. I can reproduce a specific one with SSML tags for you now as well.

arash

11/12/2024, 4:21 PM

Here is the same demo with 11labs: https://dashboard.vapi.ai/calls/3aaa84e8-ea1c-435f-8d8e-80acabc11e62

Shubham Bajaj

11/12/2024, 4:33 PM

@arash can you set enableSSMLParsing and then try it out? For the exact property name, check the image shared. https://cdn.discordapp.com/attachments/1305681916984299560/1305933618903912459/Screenshot_2024-11-12_at_10.03.30_PM.png?ex=6734d4f5&is=67338375&hm=0f4ef4f66202839c7940d54ed274507910c395c1561d53e721dd2e2e84266234&

arash

11/12/2024, 4:49 PM

Hey @Shubham Bajaj, I'm using the VAPI UI to create this. How can I specify that parameter?

arash

11/12/2024, 5:43 PM

Hey @Shubham Bajaj - I used the API to update the assistant to set that parameter. It seems, though, that the quotation marks are being stripped out. These quotes are necessary to make it work - I've tested on 11labs w/o quotes, with single quotes ( ) and it only works with double quotes ( ). Is there a parameter where I can just put in the raw input? https://dashboard.vapi.ai/calls/81de4432-b0a7-4131-9497-c86ed6aff478 https://cdn.discordapp.com/attachments/1305681916984299560/1305951114176364674/image.png?ex=6734e541&is=673393c1&hm=a1d033445d357d20b975b719bc12e900e0b24f772e6be57c69e274dbbe4f6a3c&

Shubham Bajaj

11/13/2024, 9:09 AM

Hey @arash you can turn the voice formatting `off`, but it will have a negative effect. I will check with the team for an alternative; please allow me some time.

arash

11/13/2024, 9:35 PM

Thanks @Shubham Bajaj - at minimum, is there a way to keep `--` from being filtered out? This is a special indicator within 11labs.

Shubham Bajaj

11/14/2024, 12:59 PM

Hey @arash It just came to my mind: you can use the chunk plan to put the special tags back for SSML parsing.

```json
{
  "voice": {
    "provider": "11labs",
    "voiceId": "burt",
    "chunkPlan": {
      "formatPlan": {
        "replacements": [
          {
            "type": "exact",
            "key": "<break time=3s />",
            "value": "<break time=\"3s\" />"
          }
        ]
      }
    }
  }
}
```

Try using replacements and let me know how it goes. The replacement text is part of the format plan, so it won't be changed or formatted, and the best part is that replacements are applied last, after all other formatting, so you're covered.
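For anyone applying this through the API rather than the dashboard, here is a minimal Python sketch that builds the same body as the JSON above and sends it as a PATCH. The helper names `build_voice_patch` and `send_patch` are hypothetical, and the endpoint URL and bearer-token header are assumptions that should be checked against the API reference before use.

```python
import json
import urllib.request


def build_voice_patch(replacements):
    """Build a request body shaped like the JSON in the message above.

    `replacements` is a list of (key, value) pairs; each becomes one
    "exact" replacement entry.
    """
    return {
        "voice": {
            "provider": "11labs",
            "voiceId": "burt",
            "chunkPlan": {
                "formatPlan": {
                    "replacements": [
                        {"type": "exact", "key": key, "value": value}
                        for key, value in replacements
                    ]
                }
            },
        }
    }


def send_patch(assistant_id, api_key, body):
    """Send the body to the assistant (not called in this sketch)."""
    # Assumed endpoint and auth scheme; confirm against the API reference.
    request = urllib.request.Request(
        f"https://api.vapi.ai/assistant/{assistant_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# The key is the tag as it looks once the quotes have been stripped;
# the value is the tag with double quotes restored.
body = build_voice_patch([("<break time=3s />", '<break time="3s" />')])
print(json.dumps(body, indent=2))
```

Building the body separately from sending it makes it easy to inspect the exact JSON before it touches a live assistant.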

Shubham Bajaj

11/14/2024, 1:00 PM

Also, `dashes` will be replaced with spaces. So try this out and let me know; meanwhile I will check what other alternatives there are.

tom_blaser

11/14/2024, 4:02 PM

I'm having a similar issue. Is there somewhere with more detail on how the voice formatting is done? Using replacements is somewhat unpredictable without more explicit understanding of what gets replaced and when.

Shubham Bajaj

11/15/2024, 1:51 PM

A replacement swaps the key text for the value text you provide, applied to the LLM model output, so it is predictable what will be replaced and when. If you have more questions about it, create a support ticket listing them and I will help you for sure.
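To make the ordering concrete, here is a small Python sketch of how an exact replacement could restore a tag after formatting. It is an illustration only: `hypothetical_format` is a stand-in that merely strips double quotes, because the real formatting rules are not described in this thread, and `apply_exact_replacements` is an assumed model of "exact" matching, not the platform's code.

```python
def hypothetical_format(text):
    # Stand-in for the platform's voice formatting (assumption): the only
    # behavior modeled is the quote stripping reported in this thread.
    return text.replace('"', "")


def apply_exact_replacements(text, replacements):
    # Swap every literal occurrence of each key for its value, in order.
    for item in replacements:
        if item["type"] == "exact":
            text = text.replace(item["key"], item["value"])
    return text


replacements = [
    {"type": "exact", "key": "<break time=3s />", "value": '<break time="3s" />'}
]

model_output = 'One moment. <break time="3s" /> Thanks for waiting.'

# Formatting runs first and loses the quotes...
formatted = hypothetical_format(model_output)

# ...then replacements run last and put them back.
voice_input = apply_exact_replacements(formatted, replacements)
print(voice_input)
```

Under this model the key has to match the text as it looks after formatting, not as the model wrote it, which is why the key in the example has no quotes.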

Shubham Bajaj

11/15/2024, 1:53 PM

Hey @arash checking if you got a chance to try it out?
