@Dululu For Chinese voice AI on Vapi.ai, the best voice model depends on what you're prioritizing: TTS (text-to-speech) quality, STT (speech-to-text) accuracy, latency, or cost.
Here's a breakdown of the main current options for Chinese among Vapi-compatible providers:
---
1. ElevenLabs (for TTS)
Pros: Very high-quality, expressive, near-human voices.
Chinese Support: Limited. Currently, it's experimental and not fully natural for Mandarin.
Use if: You prioritize ultra-realistic voice quality for English. It is not ideal for Chinese yet.
---
2. OpenAI TTS (via Vapi)
Voices: onyx, nova, echo, etc.
Chinese Support: Good. Works well with Mandarin, especially if the input text is clean.
Pros: Low latency, good tone, natural pacing.
Use if: You want smooth, multi-language support including Chinese, and high-quality TTS.
---
3. Azure Neural TTS (Microsoft)
Voices: zh-CN-XiaoxiaoNeural, zh-CN-YunjianNeural, etc.
Chinese Support: Excellent, with regional accents and expressive style options.
Pros: Highly realistic, supports SSML, stable.
Use if: You want the best Chinese TTS, especially for a production use case.
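To make the SSML and expressive-style point concrete, here is a minimal sketch that builds an SSML document for zh-CN-XiaoxiaoNeural using Azure's mstts:express-as element. The helper name build_ssml and the "cheerful" style are illustrative choices, and whether Vapi passes SSML through to Azure unchanged is an assumption you should confirm in Vapi's docs before relying on it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="zh-CN-XiaoxiaoNeural", style="cheerful"):
    """Build an Azure-style SSML string (build_ssml is a hypothetical helper)."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        'xml:lang="zh-CN">'
        f"<voice name={quoteattr(voice)}>"
        # express-as selects a speaking style; available styles vary per voice
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"  # escape &, <, > so user text cannot break the XML
        "</mstts:express-as>"
        "</voice>"
        "</speak>"
    )


ssml = build_ssml("您好，请问有什么可以帮您？")
ET.fromstring(ssml)  # raises ParseError if the SSML is not well-formed
print(ssml)
```

Escaping the text matters in practice because LLM output can contain characters like & or <, which would otherwise produce invalid SSML.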
---
4. Google Cloud TTS
Voices: cmn-CN-Wavenet-A, cmn-CN-Wavenet-D, etc.
Chinese Support: Solid, reliable.
Pros: Good quality, decent latency, covers Mandarin (cmn-CN, cmn-TW) and Cantonese (yue-HK).
Use if: You're already in the Google ecosystem and need solid TTS.
---
5. Deepgram (for STT - speech-to-text)
Language: Mandarin (zh)
Accuracy: Very good, especially with clean audio.
Use if: You want real-time transcription of Chinese voice input. Works well with Vapi.
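If you want to check Deepgram's Mandarin accuracy on your own audio before wiring it into Vapi, a rough sketch of building the request URL looks like this. The helper name is hypothetical, and the "nova-2" model name is an assumption; check Deepgram's language and model table for what currently supports zh.

```python
from urllib.parse import urlencode

# Deepgram's pre-recorded transcription endpoint
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def deepgram_listen_url(language="zh", model="nova-2", **extra):
    """Return a listen URL with the language set to Mandarin (hypothetical helper)."""
    # model="nova-2" is an assumed default, not something confirmed by this thread
    params = {"model": model, "language": language, **extra}
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


url = deepgram_listen_url(punctuate="true")
print(url)
```

You would then POST an audio file to that URL with your own API key in the Authorization header. Inside Vapi you don't build this URL yourself; you only set the transcriber language.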
---
Recommended Pair for Chinese Vapi Bot:
TTS: Azure zh-CN-XiaoxiaoNeural or OpenAI onyx
STT: Deepgram with language set to zh
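
As a rough sketch, the recommended pair could be expressed as an assistant config like the one below. The field names (transcriber, voice, provider, voiceId, language) are my assumption of how Vapi's assistant payload is shaped, and the assistant name is made up, so verify both against Vapi's API reference before sending anything.

```python
import json


def chinese_assistant_config(voice_provider="azure", voice_id="zh-CN-XiaoxiaoNeural"):
    """Build an assistant payload for a Mandarin bot (field names are assumptions)."""
    return {
        "name": "chinese-voice-bot",  # hypothetical assistant name
        # STT: Deepgram with the language set to zh
        "transcriber": {"provider": "deepgram", "language": "zh"},
        # TTS: Azure Xiaoxiao by default; pass ("openai", "onyx") for the alternative
        "voice": {"provider": voice_provider, "voiceId": voice_id},
    }


config = chinese_assistant_config()
print(json.dumps(config, ensure_ascii=False, indent=2))
```

Keeping the voice as a parameter makes it easy to A/B test Azure against OpenAI on the same prompts and pick whichever sounds better for your callers.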