Issue with Mode Parameter in TalkScriber Configuration
# support
t
Good morning, I've been working with TalkScriber for multilingual transcription (via the web SDK) and noticed an issue with preserving the original spoken language. Despite setting the mode parameter to "transcribe" and removing the language parameter to allow auto-detection, transcripts are still translated to English by default. This behavior persists unless the language parameter is explicitly set, which defeats the purpose of using "transcribe" mode to preserve the spoken language. Interestingly, the transcription works correctly for a few seconds, maintaining the spoken language, but then it falls back to English and translates everything, even though the spoken language was detected. Occasionally it also bugs out entirely. This happens through both the web SDK and the Vapi dashboard. Here's the configuration I'm using:
```js
transcriber: {
  model: "whisper",
  mode: "transcribe", // set mode to transcribe rather than the default of translate
}
```
I also tried the configuration below with hidden parameters, but it had the exact same behavior:
```js
transcriber: {
  model: "whisper",
  mode: "transcribe",  // Keeps the transcript in the original language
  language: null,      // Ensures auto-detection
  multilingual: true   // Enables multiple languages
}
```
I found the following documentation for TalkScriber’s Whisper model that confirms the purpose of the mode parameter:
```python
parser.add_argument("--task", type=str, default="transcribe", choices=["transcribe", "translate"], help="whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate')")
parser.add_argument("--language", type=str, default=None, choices=sorted(LANGUAGES.keys()) + sorted([k.title() for k in TO_LANGUAGE_CODE.keys()]), help="language spoken in the audio, specify None to perform language detection")
```
This documentation suggests:
• Transcribe: transcribes audio in the detected or selected language.
• Translate: translates audio from the detected or selected language to English.

Based on this, "transcribe" mode should preserve the spoken language, but in practice:
• When I speak in Spanish without explicitly setting a language, the transcript starts in Spanish for a couple of seconds and then switches to English.
• Explicitly setting the language parameter maintains the spoken language (sketched at the end of this message), but that negates the value of auto-detection and of using "transcribe" mode.

It seems the issue could be caused by one of the following:
1. A default language (possibly English) in Vapi's implementation overriding "transcribe" mode.
2. The mode parameter being ignored or incorrectly configured.

Would it be possible to:
1. Remove any default language for "transcribe" mode, or make language a required parameter, to ensure correct behavior?
2. Confirm that the mode parameter is functioning as intended, preserving the spoken language without auto-translating to English?

This issue is critical for us, as preserving the original language is essential for our multilingual applications. Let me know if you need any additional details, and thank you so much for your support!
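For reference, here is a minimal sketch of that explicit-language workaround, using the same config shape as the snippets above ("es" is only an example value, not something TalkScriber requires):

```js
// Sketch of the workaround only: pinning the language keeps the transcript
// in the spoken language, but gives up auto-detection entirely.
const transcriber = {
  model: "whisper",
  mode: "transcribe",
  language: "es", // example value (Spanish); any other spoken language is lost
};
```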
v
@Tedros Good morning! Thank you for bringing this to our attention. It sounds like you've done a thorough job of diagnosing the issue with TalkScriber's Whisper model. Your understanding of "transcribe" mode is correct: it should indeed preserve the spoken language without translating it to English. The behavior you're experiencing is not expected, and I appreciate the detailed explanation you've provided. It seems there might be a bug or a misconfiguration that's causing the language auto-detection to switch to English after a few seconds.

To address your requests:
1. We can look into removing any default language settings that might be affecting "transcribe" mode, so that the transcription sticks to the detected language without reverting to English.
2. We will confirm that the mode parameter is functioning correctly and not being overridden or misinterpreted by the system.

To proceed, I'll escalate this to our technical team for further investigation. They will check the implementation details and make sure "transcribe" mode preserves the original spoken language throughout the transcription. In the meantime, if you have any additional details or specific examples that could help us pinpoint the issue, please share them. Thank you for your patience; we're committed to resolving this as quickly as possible to support your multilingual applications.
s
Hey @Tedros, pasting the recent response from the email thread here in case you want to continue from Discord.
```
They haven't provided any specific details about the issue or an ETA yet, but I'll do my best to get an update from them. After reading through your emails, I can truly understand the frustration you're facing. While it may not be as ideal as your current suggestions, would it be possible to resolve the issue at the telephony level? By this, I mean routing the calls to language-specific assistants based on the caller's country of origin. For this, only the telephony system can expose the necessary APIs.
```

Let me know what you think.
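To make the routing idea above concrete, a rough sketch (every name below is hypothetical; the actual lookup depends on which APIs the telephony provider exposes):

```js
// Hypothetical sketch: pick a language-specific assistant based on the
// caller's country code before handing the call off.
const ASSISTANT_BY_COUNTRY = {
  ES: "spanish-assistant-id", // placeholder assistant IDs
  FR: "french-assistant-id",
  US: "english-assistant-id",
};

function pickAssistant(callerCountryCode) {
  // Fall back to the English assistant for unmapped countries.
  return ASSISTANT_BY_COUNTRY[callerCountryCode] ?? ASSISTANT_BY_COUNTRY.US;
}
```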
t
Hey, thanks for getting back to me. Unfortunately that route wouldn't work, as I'm not connecting to assistant IDs. I am creating the assistant from configs loaded from an external DB on page render, so I'm building the assistant through a code config rather than connecting to an assistant ID and pulling its info.
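Roughly what that looks like, as a sketch only (this assumes the @vapi-ai/web SDK's start() accepts an inline assistant config; loadTranscriberConfig and the endpoint it calls are hypothetical stand-ins for my DB lookup):

```js
import Vapi from "@vapi-ai/web";

const vapi = new Vapi("your-public-key"); // placeholder public key

// Hypothetical stand-in for the external-DB lookup done on page render.
async function loadTranscriberConfig() {
  const res = await fetch("/api/transcriber-config"); // hypothetical endpoint
  return res.json(); // e.g. { model: "whisper", mode: "transcribe" }
}

async function startCall() {
  const transcriber = await loadTranscriberConfig();

  // Start the call with an inline assistant config rather than an assistant ID.
  vapi.start({
    transcriber,
    // ...model, voice, and the rest of the assistant settings also come
    // from the DB-loaded config
  });
}
```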
I'm a bit confused, as all the multilingual-capable transcription services I've tried in Vapi have the same behaviour of falling back to English (Deepgram's code switching just wasn't reliable, but Gladia behaves the same as TalkScriber). Is there anything in the integration that might be causing it, or is it just an issue on Gladia's and TalkScriber's side? I'd much prefer to use TalkScriber like I mentioned before (if possible), but was hoping at least one of them would work for the time being. The TalkScriber issue seems like it can absolutely be fixed, assuming they are willing to provide some more details, so hopefully they do. Is there anything I can do in the meantime given these not-so-ideal constraints?
Also, after it successfully records the transcript in the original language for the first 20 seconds or so, it will begin responding to itself. It looks as if the transcription as a whole totally bugs out and the call can't be continued. From my observations, multilingual functionality just seems incredibly unstable within TalkScriber.
s
If you're available, we can connect in the next 5 minutes and brainstorm possibilities.
The issue is on the transcriber's side, so there isn't much more we can do about it.
Yes, they are currently more focused on single-language transcription accuracy and latency to give you a better experience, but they will pick up multilingual support soon.
t
That works for me
s
Check your DM.