Pronunciation Dictionary
# support
i
I am noticing that text-to-speech models are having trouble pronouncing some of the words passed to them by the LLM. Is there any way to pass in a pronunciation dictionary that will replace the LLM's output with the correct phonetic spelling? Here is the documentation on how it works with the Cartesia API: https://docs.cartesia.ai/2024-11-13/build-with-cartesia/capability-guides/specify-custom-pronunciations
v
Hi there! 👋 I understand you're having issues with text-to-speech models mispronouncing certain words. Based on the information I have, there are several ways to handle pronunciation issues in Vapi:

## Using Cartesia's Pronunciation Dictionary

Since Vapi integrates with Cartesia as a text-to-speech provider, you can use Cartesia's custom pronunciation feature. According to the [documentation you shared](https://docs.cartesia.ai/2024-11-13/build-with-cartesia/capability-guides/specify-custom-pronunciations), you can specify custom pronunciations with the Cartesia API, and this should work when using Cartesia as your TTS provider in Vapi.

## Using SSML for Phonetic Replacement

Another approach is to use SSML (Speech Synthesis Markup Language) for phonetic replacement. To implement this in Vapi:

1. Enable SSML parsing by setting `enableSsmlParsing: true` in your configuration
2. Use SSML tags to define how specific words should be pronounced

This method is particularly effective when using ElevenLabs with Vapi, as mentioned in [Vapi's documentation](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1338644573207007304).

## Other Options in Vapi

- **Custom TTS Models**: You can connect custom TTS models to Vapi that might handle pronunciations better. Check the `CustomVoice` API documentation for more details.
- **Voice Pipeline Tweaks**: Some aspects of configuring your voice pipeline may require tweaks to get the desired pronunciation behavior.
- **Custom Formatters**: Vapi may allow the use of custom formatters, especially for languages other than English, to address pronunciation issues.

Would you like more specific information about implementing any of these approaches? I'd be happy to help further!

Sources:
- [Vapi Documentation on Pronunciation Issues](https://docs.vapi.ai/quickstart/phone/inbound)
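
As a rough illustration of the SSML approach, the sketch below wraps selected words in the standard SSML `<phoneme>` element before the text is sent on for synthesis. The helper name `wrap_with_phonemes` is hypothetical, the IPA string for "Zenzedi" is an unverified guess used only for demonstration, and whether a given TTS provider honors `<phoneme>` tags is an assumption to check against that provider's documentation.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def wrap_with_phonemes(text, ipa_by_word):
    """Return SSML where each word listed in ipa_by_word is wrapped in <phoneme>.

    Hypothetical helper: matching is case-sensitive and whole-word only.
    All other text is XML-escaped so the result stays well-formed.
    """
    if not ipa_by_word:
        return "<speak>" + escape(text) + "</speak>"

    # Longest words first so a longer entry wins over a shorter prefix.
    words = sorted(ipa_by_word, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")

    parts = []
    last = 0
    for match in pattern.finditer(text):
        word = match.group(0)
        parts.append(escape(text[last:match.start()]))
        parts.append(
            '<phoneme alphabet="ipa" ph='
            + quoteattr(ipa_by_word[word])  # quoteattr adds the surrounding quotes
            + ">" + escape(word) + "</phoneme>"
        )
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


# Illustrative only: this IPA transcription is an assumption, not a verified one.
EXAMPLE_IPA = {"Zenzedi": "zɛn.zɛ.di"}
ssml = wrap_with_phonemes("Zenzedi is a medication.", EXAMPLE_IPA)
print(ssml)
```

The output would only affect pronunciation if SSML parsing is enabled and the selected voice provider supports the `<phoneme>` element.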
k
Can you share the call ID and list the words you want pronounced correctly, along with how they are currently being pronounced in the call and their timestamps, so I can understand what is going wrong?
i
@Shubham Bajaj
ID: 94d3f137-0a01-4a0b-afd7-10cd73fe7c83
Timestamp: 0:03
Word: Zenzedi
Pronunciation: zen-ze-di
k
🔵 16:11:23:323 Cartesia Pushing (1-366830a)... 60 "Zenzadi is a medication often used for conditions like ADHD."

If you can't share an audio recording sample of the correct pronunciation, I can try checking with the Cartesia team.

Right now we don't support ElevenLabs (11labs) pronunciation dictionaries, but you can use their prompting guide to get the word pronounced correctly.
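
For readers who want the dictionary-style replacement asked about at the top of this thread, here is a minimal sketch of doing it on the application side: the LLM's text is passed through a lookup table of respellings before it reaches the TTS provider. The function name `apply_pronunciations` and the `PRONUNCIATIONS` table are hypothetical, and the sketch assumes you have a place in your own pipeline to intercept the text; it is not a Vapi, Cartesia, or ElevenLabs API.

```python
import re

# Hypothetical table: word as the LLM writes it -> respelling the TTS reads well.
# The single entry uses the word and pronunciation given earlier in this thread.
PRONUNCIATIONS = {"Zenzedi": "zen-ze-di"}


def apply_pronunciations(text, dictionary):
    """Replace whole-word, case-insensitive matches with their respellings."""
    if not dictionary:
        return text

    lookup = {word.lower(): respelling for word, respelling in dictionary.items()}
    # Longest keys first so a longer entry wins over a shorter prefix.
    keys = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


line = "Zenzedi is a medication often used for conditions like ADHD."
print(apply_pronunciations(line, PRONUNCIATIONS))
```

Note that a lookup like this only matches exact spellings, so variants produced by the LLM (such as the "Zenzadi" spelling in the log line above) would each need their own entry.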