Need assistance in pronunciation of AI agents in s...
# support
f
Hello Vapi assistant, I am looking for your wisdom on the best overall Vapi setup (LLM + Transcriber + TTS/Voice combo) for smaller languages like Slovak, Czech, etc. My dilemma is the classic trade-off: my current "fast & accurate" setup (Deepgram Nova-2 + OpenAI TTS-1 w/ GPT-4o Realtime Mini Cluster, Shimmer voice) has great low latency and is reliable for specifics like reading numbers correctly. I've also experimented with both GPT-4o Realtime Mini and the standard GPT-4o Realtime as the LLM in this setup, but honestly didn't perceive a major difference between them for my needs. Using ElevenLabs for that beautiful voice quality, however, adds noticeable latency. What combos are you finding give the best balance between low latency, high reliability/accuracy, AND natural-sounding voice in these languages within Vapi? Side note: I hit validation-failed errors trying to use Anthropic/Claude, so I couldn't even properly test its performance. Has anyone had success using Claude models smoothly in voice agents? I would like to test Claude 3.7 Sonnet. Really appreciate any shared experiences or recommendations! Thanks!
v
Hey there! 👋 I understand you're looking for the optimal Vapi setup for smaller languages like Slovak and Czech. While I can't provide a definitive "best" setup specifically for these languages or confirm whether anyone has used Claude models smoothly, I can share what I know about the available options and common experiences.

## Multilingual Support Options

For transcription (speech-to-text), you're already using a solid option:

- Deepgram Nova-2 is known for its accuracy in various languages, including support for features like code switching and keywords, with noted accuracy in languages like Russian, according to [Vapi documentation](https://docs.vapi.ai/customization/multilingual).

For text-to-speech, you have several options:

- Azure offers the most extensive language support with approximately 400 prebuilt voices across 140 languages and variants, making it a strong candidate for smaller languages like Slovak and Czech, according to [Vapi's multilingual documentation](https://docs.vapi.ai/customization/multilingual).
- ElevenLabs provides high-quality, natural-sounding voices with fine-tuning capabilities for speech parameters, though as you've experienced, it comes with higher latency, according to [Vapi's documentation](https://docs.vapi.ai/providers/voice/elevenlabs).
- Neets.ai offers ultrafast TTS with low latency streaming and expressive voices, which might be worth exploring for your latency concerns, according to [Vapi's documentation](https://docs.vapi.ai/providers/voice/neets).

## LLM Options

For LLMs, you mentioned using GPT-4o Realtime Mini:

- GPT-4o-mini has a 128k context window but lower reasoning capability compared to GPT-4o, while the Realtime model is still in beta, according to Vapi's documentation.
- GPT-4o Realtime is an OpenAI native speech-to-speech model that processes audio in and audio out, currently in beta.

## Latency Optimization

To optimize for low latency, consider:

- Optimizing network infrastructure, server configuration, and code efficiency
- Using efficient protocols like WebRTC or WebSocket
- Implementing efficient audio processing
- Conducting load testing and using monitoring tools
- Hosting your server in the EU to minimize latency for EU-based users, if that's where your users are located

According to [a Discord discussion on latency optimization](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336685825227755591), these strategies can significantly improve response times.

## Anthropic/Claude Issues

Regarding the validation-failed errors with Anthropic/Claude models:

- Several users have reported similar validation errors when trying to use these models
- The errors may be due to schema differences between Anthropic and OpenAI
- For Anthropic models, ensure all tool properties are alphanumeric
- Issues can occur if `tool_result` blocks are missing or incorrectly formatted

According to [user discussions](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332510949940133960), these validation issues are a known challenge when working with Claude models in Vapi.

Is there a specific aspect of your setup you'd like me to explore further?

Sources:

- [Vapi documentation](https://docs.vapi.ai/customization/multilingual)
- [Vapi's documentation on ElevenLabs](https://docs.vapi.ai/providers/voice/elevenlabs)
- [Vapi's documentation on Neets](https://docs.vapi.ai/providers/voice/neets)
- [Discord discussion on latency optimization](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336685825227755591)
- [Discord discussion on Anthropic/Claude issues](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332510949940133960)
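
As a follow-up to the Anthropic/Claude point about keeping tool properties alphanumeric, here is a minimal sketch of a pre-flight check you could run on your tool definitions before attaching them to an assistant. Everything in it is an assumption for illustration: the `find_unsafe_names` helper is hypothetical, the dict shape (`name`, `input_schema.properties`) follows Anthropic's tool format rather than Vapi's own schema, and the allowed-character pattern (letters, digits, underscores, hyphens, up to 64 characters) is an assumed rule that may not match what Vapi or Anthropic actually enforce. It only inspects top-level property keys, not nested objects.

```python
import re

# Assumed rule: letters, digits, underscores and hyphens only, 1 to 64 characters.
# The real validation applied by Vapi or Anthropic may differ.
SAFE_NAME = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def find_unsafe_names(tools):
    """Return (tool_name, offending_name) pairs for names that break the assumed rule.

    Each tool is expected to be a dict with a "name" key and an optional
    "input_schema" dict holding a "properties" mapping (hypothetical shape).
    Only top-level property keys are checked.
    """
    problems = []
    for tool in tools:
        name = tool.get("name", "")
        if not SAFE_NAME.fullmatch(name):
            problems.append((name, name))
        properties = tool.get("input_schema", {}).get("properties", {})
        for key in properties:
            if not SAFE_NAME.fullmatch(key):
                problems.append((name, key))
    return problems


if __name__ == "__main__":
    # Hypothetical tool definitions, for illustration only.
    example_tools = [
        {
            "name": "get_weather",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "postal code": {"type": "string"},  # contains a space
                },
            },
        },
        {"name": "book.slot", "input_schema": {"type": "object", "properties": {}}},
    ]
    for tool_name, bad_name in find_unsafe_names(example_tools):
        print(f"tool {tool_name!r}: rename {bad_name!r}")
```

If the list comes back empty and the validation error persists, the remaining suspects from the list above are the schema differences between Anthropic and OpenAI and missing or malformed `tool_result` blocks.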
v
Message marked as helpful by @Fanfantulipan! 🎉
p
Checking in: has this been resolved for you?