I did not speak, but the model detected sound
# support
w
Hi, I found that when using this model for dialogue, some content that I never said appears in the transcript, such as "okay, thank you" and the like. What is the reason for this, and how can it be solved?
v
@WadeSong0121 Hello! It sounds like you're experiencing unexpected phrases like "okay, thank you" appearing in the dialogue text. This could be due to the backchanneling feature, which, when enabled, allows the model to insert small acknowledgments or affirmations to create a more natural conversation flow. If you find these insertions inappropriate or excessive, you can disable the backchanneling feature. Would you like to proceed with disabling it?
w
Yes, how can I disable it?
v
@WadeSong0121 To disable the backchanneling feature, you would typically need to adjust the settings in your Voice API or dialogue system configuration. Since this is a general Voice API support, I would need to know the specific system or platform you're using to provide accurate instructions. If you're using a platform that allows you to modify the behavior of the dialogue model through settings or parameters, look for options related to response generation or conversational settings, and you should be able to find the toggle for backchanneling or similar features there. If you're not sure where to find these settings or if the platform you're using doesn't provide direct access to such configurations, please reach out to the support team of the specific Voice API or dialogue system you're using for more detailed assistance.
w
I'm using the model that comes with the vapi, the transcriber is using the talkscriber model, and the voice is using the OpenAI model, so how do I set up the code to disable this property?
v
@WadeSong0121 Since you're using VAPI with the OpenAI model for voice, you can disable the filler injection feature by setting the `fillerInjectionEnabled` property to `false`. This property controls whether fillers are injected into the model output before inputting it into the voice provider. Here's how you can set it up in your code:
```javascript
// Assuming you have a configuration object for VAPI
const vapiConfig = {
  // ... other configuration properties
  voice: {
    fillerInjectionEnabled: false, // Set this to false to disable filler injections
    provider: 'openai' // Your current voice provider (lowercase, like 'azure')
  },
  // ... other configuration properties
};

// Use this configuration when initializing or updating your VAPI instance
```
Make sure to apply this configuration before starting your voice interactions to ensure that the filler injection feature is disabled. If you need further assistance with the code or have any other questions, feel free to ask!
w
What if I'm using Azure's model as my voice model? Is it the same?
v
@WadeSong0121 Yes, the process to disable the filler injection feature is the same regardless of the voice provider you are using. If you're using Azure as your voice provider, you would still set the `fillerInjectionEnabled` property to `false` in your configuration. Here's how you can do it:
```javascript
const vapiConfig = {
  // ... other configuration properties
  voice: {
    fillerInjectionEnabled: false, // Set this to false to disable filler injections
    provider: 'azure', // Specify Azure as your voice provider
    voiceId: 'your-azure-voice-id' // Replace with your specific Azure voice ID
  },
  // ... other configuration properties
};

// Apply this configuration to your VAPI instance
```
Make sure to replace `'your-azure-voice-id'` with the actual voice ID you want to use from Azure. This configuration will prevent the model from injecting fillers like "okay, thank you" into the output provided to Azure's voice model. If you have any more questions or need further assistance, feel free to ask!
w
I have set this up, but the transcript still seems to contain "thank you" and similar phrases. Some background noise may also be turned into "thank you". What else can I do to avoid it?
v
@WadeSong0121 If you've already disabled the filler injection feature and are still encountering issues with unexpected phrases like "thank you" appearing in the transcripts, it could be due to the speech recognition model misinterpreting background noise or other sounds as speech. Here are a few steps you can take to mitigate this issue:
1. **Improve Audio Quality**: Ensure that the audio input is as clear as possible. Use noise-cancellation microphones or apply noise-reduction techniques to minimize background noise.
2. **Adjust Model Sensitivity**: Some speech recognition services allow you to adjust the sensitivity or confidence threshold. By increasing the threshold, the model will require a higher confidence level before transcribing speech, which may reduce false positives.
3. **Post-Processing**: Implement a post-processing step in your application to filter out common misinterpretations or unwanted phrases from the transcripts.
4. **Custom Speech Recognition Models**: If you're using Azure, consider using Custom Speech in Azure Cognitive Services to train a model on your specific audio environment and use cases, which can improve accuracy.
5. **Feedback to the Provider**: Provide feedback to the speech recognition service provider (Azure in this case) with examples of the misinterpretations. They may be able to improve their models or offer specific advice.
If you've tried these steps and are still facing issues, it might be necessary to reach out to Azure support for further assistance with their speech recognition service. They may have additional tools or settings that can help fine-tune the recognition process to better suit your needs.
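The post-processing step mentioned in the list above could look roughly like the sketch below. It is application-side code, not a Vapi or Azure feature: the phrase list and the function names `normalize`, `isLikelyPhantom`, and `dropPhantomTranscripts` are all hypothetical, and it assumes your application receives each final transcript as a plain string.

```javascript
// Hypothetical application-side filter; none of these names come from a vendor API.
// Phrases that the transcriber tends to emit when only background noise is present.
const PHANTOM_PHRASES = new Set([
  "thank you",
  "thanks",
  "okay thank you",
  "ok thank you",
]);

// Lowercase, strip punctuation, and collapse whitespace so that
// "Okay, thank you." and "okay thank you" compare equal.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// True only when the whole utterance is one of the known phantom phrases.
function isLikelyPhantom(text) {
  return PHANTOM_PHRASES.has(normalize(text));
}

// Return a new array without the utterances that look like phantom transcripts.
function dropPhantomTranscripts(transcripts) {
  return transcripts.filter((text) => !isLikelyPhantom(text));
}
```

Only exact whole-utterance matches are dropped, so a longer sentence that merely starts with "thank you" is kept. The trade-off is that a genuine, standalone "thank you" from the user is dropped as well.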
w
My voice recognition model uses the Talkscriber model configured in the backend. How do I adjust the model sensitivity?
v
@WadeSong0121 If you're using the Talkscriber model and you want to adjust the sensitivity to reduce the likelihood of background noise being interpreted as words like "thank you," you'll need to look into the settings or parameters that the model exposes for this purpose. Typically, speech recognition models may offer parameters such as:
- **Confidence Threshold**: A setting that allows you to specify the minimum confidence level for a piece of recognized speech to be accepted.
- **Acoustic Model Tuning**: Adjustments to the acoustic model to better handle noisy environments.
- **Language Model Tuning**: Adjustments to the language model to prioritize or deprioritize certain phrases.
Since these settings are specific to the Talkscriber model and its implementation, you would need to consult the documentation provided for that model or reach out to the support team responsible for the Talkscriber model to get detailed instructions on how to adjust these parameters. If the Talkscriber model is part of a service or platform that you have access to, there might be an API or a configuration file where you can set these parameters. If you're unsure about how to proceed, please ask support for more info on this.
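If the transcriber does not expose a confidence threshold, the same idea can be approximated in your own code, provided the transcripts you receive carry a confidence score. The sketch below assumes a hypothetical segment shape of `{ text, confidence }` with `confidence` between 0 and 1; whether Talkscriber or VAPI actually delivers such a field is not confirmed in this thread, and `filterByConfidence` is an illustrative name.

```javascript
// Assumed segment shape (not confirmed for any provider):
//   { text: string, confidence: number between 0 and 1 }
// Keep only segments whose confidence meets the threshold.
// Segments without a numeric confidence are dropped as well.
function filterByConfidence(segments, minConfidence = 0.8) {
  return segments.filter(
    (segment) =>
      typeof segment.confidence === "number" &&
      segment.confidence >= minConfidence
  );
}
```

Raising `minConfidence` trades missed quiet speech for fewer phantom transcripts, so the value is worth tuning against real call recordings.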
w
In vapi, how should I set these parameters? @User
v
@WadeSong0121 In VAPI, to adjust the sensitivity of the Talkscriber model, you would typically need to modify the configuration settings related to speech recognition. However, VAPI's standard configuration options may not directly expose a parameter named "sensitivity" or "confidence threshold." If VAPI allows for custom configuration of the Talkscriber model, you would look for settings related to confidence levels or similar parameters that control the strictness of the speech-to-text conversion process. This could be in the form of a configuration object or API parameters that you can set when initializing or making requests to the VAPI. Since the specific details on how to set these parameters in VAPI are not provided here, and the mention `<@&1211491065807507517>` does not correspond to a known command or setting within the context of this conversation, I would recommend checking the documentation for VAPI or contacting the support team for detailed instructions on adjusting the model sensitivity. If there are specific parameters or settings you are aware of and would like to adjust, please provide those details, and I can guide you on how to set them within VAPI. If you're unsure or need more information, please ask support for more info on this.
s
Hey @WadeSong0121 To help track down this issue, could you share:
- The call ID
- When exactly this happened (the timestamp)
- What response you expected to get
- What response you actually got instead
This would really help me figure out what went wrong!
w
Thanks for your help. Here is the information. I hope you can help identify the cause; it is now having a very serious impact on normal use by our users. How should I configure things to fix it?
1. Call ID: 93a9a127-be81-4c2c-b42a-7df8f822aa74
2. Time: 2024-12-01 08:09:45
3. In this conversation the user never says anything, but the transcript shows many phrases like "Thank you" being recognized, as shown in the figure. The ambient sound is actually very quiet, yet there are still this many misrecognitions.
https://cdn.discordapp.com/attachments/1311140522785767475/1312706887753400340/image.png?ex=674d790e&is=674c278e&hm=05b42ba7aa0c38f2c0ade5833a5209ddbd866ef665b3f90a5bfd4ad629bf3803&
I have also tried another model, the Gladia fast model, and it seems to behave the same. The config is: "backgroundSound": "off", "backchannelingEnabled": false, "backgroundDenoisingEnabled": false, "fillerInjectionEnabled": false
1. Call ID: 81f0361a-cca7-4e13-85e3-74ed489f5a29
2. Time: 2024-12-02T09:56:32
3. There shouldn't be any words, because I didn't say anything.
https://cdn.discordapp.com/attachments/1311140522785767475/1313082145262272532/image.png?ex=674ed68b&is=674d850b&hm=fd379f8bdd85a5a1f5ba17a0940cbb8d66f96515fc3a5c6a719eaeaa78b810e5&
s
@WadeSong0121 Allow me some time to get back to you on this. If I don't reply within a day, do ping me!
w
Hi, what is the current status of this issue? @Shubham Bajaj
s
@WadeSong0121 I am waiting for the service provider's response.
w
Hi, is there any update? @Shubham Bajaj
I hope the problem can be solved as soon as possible, because these bad cases have already cost us a lot of users. I posted the question on the 1st, and today is the 6th. @User @User
s
Hey @WadeSong0121 First, updates take time because I need to diagnose the issue on my end, then within the team, followed by cross-team collaboration. After that, we verify what happened and identify possible causes. Now, regarding your issue: the Talkscriber model sometimes hallucinates, which produces the `out-of-nowhere` transcriptions you observed. To address this, you may need to use a different transcriber or a custom transcriber.
Do let me know your thoughts on this.
w
I also encountered this problem when using the Gladia model, as shown above, so it is not only the Talkscriber model. @Shubham Bajaj
s
@WadeSong0121 Can you share the call ID for the Gladia call as well? I will reach out to both teams. But it's definitely coming from the transcriber side.
w
this is @User
@Shubham Bajaj I have also tried another model, the Gladia fast model, and it seems to behave the same. The config is: "backgroundSound": "off", "backchannelingEnabled": false, "backgroundDenoisingEnabled": false, "fillerInjectionEnabled": false
1. Call ID: 81f0361a-cca7-4e13-85e3-74ed489f5a29
2. Time: 2024-12-02T09:56:32
3. There shouldn't be any words, because I didn't say anything.
Hi, is there any update on this? @Shubham Bajaj
Hi, I still have this problem and it is not resolved yet. I'd appreciate your help. @henry [Vapi Team] @AiTitus :)
s
w
Thanks for your help. When will it be solved?
s
@WadeSong0121 I just got to know that their models tend to hallucinate with background noise. Do you mind using deepgram or a custom-transcriber instead for best results?
w
Hi, thanks for your response. My users often encounter this problem. They are listening to the response when it suddenly stops talking and recognizes something like "thank you" instead. It's very strange. Basically every background sound that gets recognized is recognized as "thank you". I haven't seen any other users in the community experiencing this issue, so I'm not sure whether something in my configuration is causing it. Is there an officially recommended parameter configuration? Or does the model have something like a volume threshold or other parameters to avoid such bad cases?
s
1. Enable `backgroundDenoisingEnabled`
2. Use deepgram if possible
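As a rough sketch of those two changes applied to an existing assistant configuration: the `backgroundDenoisingEnabled` key and the `deepgram` provider name come from this thread, while the `transcriber: { provider }` shape is an assumption modeled on the `voice: { provider }` examples earlier in the thread. `applyNoiseMitigation` is a hypothetical helper, and any Deepgram-specific options such as model or language are deliberately left out, so check the VAPI documentation before relying on it.

```javascript
// Hypothetical helper: returns a copy of an assistant config with the two
// recommended changes applied. The input object is not mutated.
function applyNoiseMitigation(assistantConfig) {
  return {
    ...assistantConfig,
    // Recommendation 1: turn background denoising on.
    backgroundDenoisingEnabled: true,
    // Recommendation 2: switch the transcriber to deepgram. The previous
    // transcriber block is replaced entirely, so provider-specific options
    // (for example a Gladia model name) do not carry over.
    transcriber: { provider: "deepgram" },
  };
}

// Example input, based on the flags listed earlier in the thread.
const currentConfig = {
  backgroundSound: "off",
  backchannelingEnabled: false,
  backgroundDenoisingEnabled: false,
  transcriber: { provider: "gladia", model: "fast" },
};

const updatedConfig = applyNoiseMitigation(currentConfig);
```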
w
Thx, I will give it a try.