@Muppet
Hello! I'm here to help with any issues you're encountering with VAPI. It's great that you're exploring the capabilities with a non-English assistant, and I understand how important accurate transcription is for maintaining the quality of the service.
The behavior you're describing, where the platform runs speech-to-text (STT) over the entire conversation turn, including the assistant's own output, can indeed introduce errors: the assistant's text is converted to speech and then transcribed back to text, and each conversion can add mistakes. This is a known challenge in voice technology, particularly for non-English languages, where STT models may be less robust.
A common way to address this is to handle the assistant's output separately from the user's input: take the assistant's side of the transcript directly from the text the language model generated, rather than from a transcription of the spoken output. That avoids the compounding errors you mentioned.
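As a rough illustration of that separation, here is a minimal, platform-agnostic sketch in Python. It is not VAPI's API: `Turn`, `ConversationLog`, and their methods are hypothetical names made up for this example, and it assumes your own code can see both the STT result for the user and the LLM text before it is sent to text-to-speech.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    role: str    # "user" or "assistant"
    text: str
    source: str  # "stt" for transcribed speech, "llm" for generated text


class ConversationLog:
    """Hypothetical transcript store; not part of any VAPI SDK."""

    def __init__(self) -> None:
        self.turns: List[Turn] = []

    def add_user_turn(self, stt_transcript: str) -> None:
        # User speech has no text original, so the STT result is the only source.
        self.turns.append(Turn("user", stt_transcript.strip(), "stt"))

    def add_assistant_turn(self, llm_text: str) -> None:
        # Store the exact text that was sent to TTS instead of
        # transcribing the synthesized audio back into text.
        self.turns.append(Turn("assistant", llm_text.strip(), "llm"))

    def as_messages(self) -> List[Dict[str, str]]:
        # Chat-style message list, e.g. for the next LLM request.
        return [{"role": t.role, "content": t.text} for t in self.turns]


log = ConversationLog()
log.add_user_turn(" hola, necesito ayuda ")
log.add_assistant_turn("¡Hola! ¿En qué puedo ayudarte?")
```

With this layout, only the user's side of the transcript ever passes through STT, so accents and punctuation in the assistant's replies are kept exactly as generated.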
For the user's input, STT accuracy can be improved by fine-tuning the language models, providing more training data in the target language, or adjusting the acoustic models to better handle specific accents and dialects.
If you're looking for more detailed feedback or assistance, the VAPI community forums or the official support channels would be the best places to discuss these issues. There, you can share your experiences with other developers and get insights from the VAPI team on best practices and potential solutions.
If you have any specific questions or need further help with your current setup, please let me know, and I'll do my best to assist you!