OK, thank you @User!
So just to clarify:
- Endpointing, Interruptions and Background noise filtering are all fine with Japanese
- Backchanneling: if the user says a Japanese equivalent of "uh-huh", e.g. そうだね, it would NOT recognise that, since it only recognises English; but vice versa (the agent backchanneling the user) it could respond with そうだね, because it translates the English "uh-huh" into Japanese
- Emotion Detection: it does work on Japanese text in principle, but it wasn't trained for Japanese, so it's not very effective
- Filler Injection: does it know Japanese filler words (an "umm..." in Japanese would be e.g. あのう。。。) well enough to achieve this? Could you explain a little more about how it works, as that would help me understand - i.e. is it taking the text from the LLM and using another LLM to inject filler into the output before it gets sent to text-to-speech? In which case it would depend on that post-processing LLM's support for Japanese, I suppose. Rough sketch of what I have in mind just below.
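To be concrete, this is roughly the pipeline I'm imagining - purely a hypothetical sketch, none of these function names are Vapi's actual API, they're just placeholders to show where Japanese support would matter:

```python
# Hypothetical sketch of the filler-injection flow I'm imagining.
# All function names are placeholders, NOT Vapi's real API.

def main_llm_reply(user_utterance: str) -> str:
    """Placeholder for the assistant's main LLM response."""
    return "はい、ご予約は明日の午後3時で承りました。"  # example Japanese reply

def inject_fillers(text: str, language: str) -> str:
    """Placeholder for a post-processing step that adds natural filler words.

    My question: if this step is done by a second LLM, does that model know
    Japanese fillers such as あのう… / えっと…, or only English ones (umm, uh)?
    """
    if language == "ja":
        return "えっと、" + text  # naive illustration, not a real implementation
    return "Umm, " + text

def text_to_speech(text: str) -> bytes:
    """Placeholder for the TTS provider (e.g. Azure) that voices the final text."""
    return text.encode("utf-8")  # stand-in for audio bytes

# The flow I'm guessing at: LLM reply -> filler injection -> TTS
reply = main_llm_reply("明日の午後3時に予約できますか？")
audio = text_to_speech(inject_fillers(reply, language="ja"))
```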
The other related question I had was about latency in Japan: I can't see any option in Vapi to specify the region. I guess Vapi uses a mixture of regions depending on the individual APIs each user selects, so it would be hard to say which region(s) it's currently running in?
I ask because, for example, I think Azure's text-to-speech can in principle be hosted in a Japanese region for lower latency, but am I right that with Vapi we would not be able to make use of that?
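For reference, this is what I mean about Azure: calling their Speech SDK directly, you can pin the resource to a Japanese region yourself. A minimal sketch, assuming a Speech resource created in japaneast and `azure-cognitiveservices-speech` installed (the key is a placeholder) - I'm just wondering whether anything equivalent is possible when the TTS call goes through Vapi:

```python
# Minimal sketch of Azure TTS pinned to the Japan East region, outside of Vapi.
# Assumes a Speech resource in japaneast; the subscription key is a placeholder.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="YOUR_SPEECH_KEY",  # placeholder key
    region="japaneast",              # Japanese region, to keep TTS round-trips short
)
speech_config.speech_synthesis_voice_name = "ja-JP-NanamiNeural"  # a Japanese neural voice

synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("こんにちは、ご予約を承ります。").get()
print(result.reason)  # e.g. ResultReason.SynthesizingAudioCompleted
```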