VAPI input audio problems
# support
**San Sanych:**
Look, I ran an experiment. My calls go through the web version, and I have huge problems with how the AI hears the client, so I decided to check, because I couldn't believe such problems could exist. I took a service like Zoom, in its web version, and set up a connection so that I could hear in the browser what the client says. And believe it or not, I hear everything absolutely perfectly, but when I listen to the recording of the conversation through VAPI, it is terrible. d7a35618-d008-46ef-bb30-0ba6660cdf01 is a call ID as an example. Every word the client said can be made out 100%; the AI just doesn't understand them because the sound your service receives is distorted. I experimented with many other services and they transmit the sound amazingly well. What can I do? How can I fix this problem? Listen to how the client sounds, given that I have a recording of the same conversation through another service where the client sounds AMAZING. If this problem is solved, there will be no recognition problems.
**Vapi:**
@San Sanych I'm sorry to hear that you're experiencing audio quality issues with our service. It's great that you've taken the initiative to conduct an experiment to compare the audio quality. To address this, we can look into the specific call ID you've provided and investigate any potential issues with the audio processing or transmission within VAPI. To proceed, I'll need to gather some additional information. Could you please provide the following details?
1. The date and time of the call associated with the provided call ID.
2. The specific issues you noticed with the audio quality (e.g., is it too quiet, distorted, cutting out, etc.).
3. Any specific configurations or settings you are using with VAPI that might affect the audio quality.
Once I have this information, I can look into the matter more closely and work towards a resolution. If you have any other details that you think might be helpful, please feel free to share them as well.
**San Sanych:**
@Vapi I can't ask for help; I got "this interaction failed".
**Vapi:**
@San Sanych I'm sorry to hear that you're encountering difficulties. It seems there might be an issue with the system you're using to request help. While I can't directly fix the interaction failure, I can certainly guide you on what steps to take next. Firstly, could you please try the following:
1. Refresh your browser or restart the application you're using to ensure it's not a temporary glitch.
2. Check your internet connection to make sure it's stable and reliable, as connectivity issues can sometimes cause these problems.
3. If possible, try using a different device or browser to see if the issue persists.
If after these steps you're still facing the "this interaction failed" message, it would be best to reach out to the technical support team of the platform you're using. They should be able to provide more specific assistance and resolve the issue for you. In the meantime, if there's anything else I can assist you with regarding VAPI or any other concerns, please let me know!
**San Sanych:**
@Vapi
**Vapi:**
@San Sanych Please ask support for more info on this.
**San Sanych:**
@User @User
@Shubham Bajaj
@Vapi Can I manually increase the input volume from the client side in the VAPI web version? I want to turn up the client's speech.
**Vapi:**
@San Sanych Yes, you can adjust the input volume of the client's speech in the VAPI WEB version, but this would typically be done before the audio stream is sent to VAPI. You would need to manipulate the audio stream in the browser using JavaScript's Web Audio API. Here's a basic example of how you might increase the volume:
```javascript
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    const audioContext = new AudioContext();
    const source = audioContext.createMediaStreamSource(stream);
    const gainNode = audioContext.createGain();
    gainNode.gain.value = 1.5; // Increase the volume by 50%
    // Route the boosted audio into a new MediaStream rather than the
    // speakers, so it can be used in place of the raw microphone stream.
    const destination = audioContext.createMediaStreamDestination();
    source.connect(gainNode);
    gainNode.connect(destination);
    const boostedStream = destination.stream; // hand this to the consumer
  })
  .catch(err => {
    console.error('Error accessing audio stream:', err);
  });
```
This code requests access to the user's microphone and applies gain to the audio stream, which increases the volume. `gainNode.gain.value` can be adjusted to set the desired level. Please note that manipulating the audio stream can introduce distortion if the gain is set too high, so it's important to test and choose a level that enhances the audio without degrading quality. If you're experiencing issues with the audio volume in VAPI, it might also be worth checking the settings and configurations on the client's side to ensure their microphone levels are set correctly before the audio stream is sent to VAPI.
**San Sanych:**
@Vapi I tried this; it's still not working. From the VAPI call recording I hear maybe 5% of the volume at which the user speaks. I have recorded every call with an external app and I hear all voices and volumes; it's very different from the sound that VAPI hears. This is strange and makes the work process very hard, because the AI doesn't understand about 50% of the words.
**Vapi:**
@San Sanych I understand your concern, and it's certainly frustrating when the audio quality isn't meeting expectations, especially when it impacts the AI's ability to understand speech. Since you've already tried adjusting the volume on the client side without success, and there's a significant difference between the audio recorded by VAPI and an external app, it suggests there might be an issue with how the audio is being processed or transmitted to VAPI. Here's what we can do to troubleshoot and resolve this issue:
1. Check microphone settings: Ensure that the client's microphone settings are correctly configured in their operating system and browser, with no automatic volume adjustments that could be reducing the input level.
2. Test with different devices: If possible, test with different microphones and devices to rule out hardware-related issues.
3. Network quality: Poor network quality can affect the audio stream. Verify that both the client and the server have a stable and fast internet connection.
4. Contact support: Since the issue seems to be specific to the audio processing within VAPI, it would be best to contact VAPI support directly with the details of your issue, including the call ID and the discrepancies you've noticed between VAPI's recordings and external recordings.
5. Provide feedback to VAPI: If there's a consistent issue with the audio quality, providing feedback to the VAPI development team with examples and recordings can help them identify and fix the problem.
6. Alternative solutions: As a temporary workaround, consider using the external app's recordings to process the audio through VAPI manually, if that's a possibility.
I'm here to help you through this process, so if you need further assistance or have more information to provide, please let me know.
**San Sanych:**
@Vapi What is the assistant's Speaker Boost for? Does it boost the sound output to the client, or will it turn up the client's voice?
**Vapi:**
@San Sanych The term "Speaker Boost" typically refers to increasing the output volume of the speaker, not the input volume of a microphone. If you're referring to a feature within VAPI or another system, a "Speaker Boost" would generally be used to amplify the sound being played out of the speaker, which could be the assistant's voice or any other audio output.

However, if you're looking to boost the volume of the client's voice (the input), that would be a different feature, often referred to as "Microphone Boost" or "Input Gain." This would increase the volume of the client's speech before it's processed by the assistant, potentially making it clearer and easier for the AI to understand.

If you're experiencing issues with the AI not understanding the client due to low volume, you would need to look for ways to increase the input gain or microphone boost, not the speaker boost. VAPI itself does not provide a direct feature to boost the input volume; this would need to be done at the system or browser level, or through preprocessing of the audio stream before it reaches VAPI. If you're looking for a way to increase the volume of the client's voice within VAPI, please provide more context or clarify the feature you're referring to, and I'll do my best to assist you further.
**Shubham Bajaj:**
Hey @San Sanych, for these two words the user's voice was audible and they were transcribed immediately. Since, as you've stated, something is wrong, I believe it's on the user's side, because the same problem doesn't occur for us, and the noise coming from the user's side is audible in the call recording. I'm open to hearing the same conversation recorded through a different medium. User: Слава, User: -Вала
**San Sanych:**
@Shubham Bajaj Hello, I don't even know how to explain it to you correctly. I compared two audio recordings: 1) a recording of the conversation made via the MicroSIP program (which places the call and carries the conversation), and 2) the recording from the VAPI server that I received after the call. Result: the audio from the client's microphone to the VAPI server is terrible; you can't hear anything or make anything out. The real audio that comes from the client's microphone through the PC is great, and you can hear all the words clearly. I don't understand how the problem can be only on my side. I tested different methods: I fed the sound from the client's microphone into a service like Zoom and heard everything perfectly, while the same sound fed into VAPI was heard and understood maybe 3 times out of 10. I don't know what conclusions to draw from this, but this problem is very important to me, since I need the AI to understand the client's speech, and it can't do this because the client's sound is simply terrible. I tried increasing the sound by 200 and 300% through different amplifiers, but VAPI still gets the same sound. I don't know how to fix this.
**Shubham Bajaj:**
Hey @San Sanych, thanks for explaining in depth and for your workarounds. Let me see what I can do for you.
**San Sanych:**
Hey @Shubham Bajaj, do you have any updates for me? Call ID: 966e3563-3ad5-4cff-b118-d0c51270d819. You can see that the client spoke twice: 1) "Да" ("yes") 2) "На связи, что вы хотели" ("on the line, what did you want"). The AI didn't recognize or react to this speech. Why? The AI said its idle message twice, but it should recognize ANY speech; it saw that user speech had started, yet still decided it needed to say the idle message.
```
12:15:06:109 [CHECKPOINT] Assistant speech started
12:15:08:618 [CHECKPOINT] Assistant speech stopped
12:15:10:286 [CHECKPOINT] User speech possibly starting
12:15:15:570 [LOG] Voice cached: Вы на связи???
12:15:18:545 [CHECKPOINT] User speech possibly starting
12:15:18:606 [CHECKPOINT] User speech started
12:15:18:646 [CHECKPOINT] User speech possibly stopping
12:15:19:166 [CHECKPOINT] User speech stopped
12:15:23:487 [LOG] Voice cached: Алло, вы меня слышите???
```
df8cffec-5587-4837-8b6d-66111855a99b is the next call ID. Why didn't the AI recognize the client's first utterance, "Алло" ("hello")? It was so loud and clear; it should react somehow. I don't understand how I can use this AI if it can't recognize typical phrases or sounds. You should add some function (enable/disable) that gives the AI the opportunity to react in some way to any speech, even if the transcriber can't recognize it. I don't understand why Deepgram can't recognize "Алло!" when it's very loud and clear. Maybe it's some problem on the VAPI side? @Shubham Bajaj
**Shubham Bajaj:**
Call ID: df8cffec-5587-4837-8b6d-66111855a99b
Logs:
```
12:07:41:915 [user] Final Transcript: Алло!  (confidence 0.42260742)
12:07:41:916 Ignoring Feedback: Алло!
12:07:47:127 Idle Timeout Triggered (Count #1). Message: Алло, вы на связи?
12:07:52:419 [user] Partial Transcript: Для  (confidence 0.20037842)
12:07:52:420 Transcribed user speech but not confident enough to interrupt, ignoring... (confidence: 0.20037842, numWordsToInterruptAssistant: 1, numWords: 1)
12:08:02:199 Language detected en with confidence 0.06594796
12:08:11:817 [user] Partial Transcript: Извините,  (confidence 0.6435547)
12:08:12:116 VAD Detected: Nope, still speaking...
12:08:13:156 [user] Final Transcript: Имените кальций на цветной.  (confidence 0.51326495)
12:08:13:261 Completion Request Starting (#1, provider: azure-openai, model: gpt-4o-2024-08-06, region: westus3, credential: false)...
12:08:13:628 [user LOG] Model called tool: endCall()
```
Hey @San Sanych, it's clear from the logs: Deepgram isn't confident that the transcribed user audio is correct and has returned a confidence below 0.5. At Vapi, any transcription with confidence below 0.5 is simply ignored by default.
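The default behavior described above can be sketched like this (a hypothetical illustration of the rule, not Vapi's actual code; the function name and threshold constant are made up for the example, while the confidence values are taken from the call logs):

```javascript
// Hypothetical sketch: Vapi ignores transcripts whose confidence is below 0.5.
const CONFIDENCE_THRESHOLD = 0.5;

function shouldIgnoreTranscript(confidence, threshold = CONFIDENCE_THRESHOLD) {
  return confidence < threshold;
}

// Confidence values from the logs above:
console.log(shouldIgnoreTranscript(0.42260742)); // "Алло!" → true (ignored)
console.log(shouldIgnoreTranscript(0.6435547));  // "Извините," → false (kept)
```

This is why "Алло!" was dropped even though it was clearly audible: the transcriber returned it with confidence 0.42, just under the cutoff.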
What you can do to resolve this: try playing with the endpointing parameter. Your current config:
```json
"transcriber": {
  "model": "nova-2",
  "language": "ru",
  "provider": "deepgram"
}
```
Set the endpointing to 300 ms and see how it goes. https://cdn.discordapp.com/attachments/1306529820909502506/1307342073934123060/Screenshot_2024-11-16_at_7.20.33_PM.png?ex=6739f4af&is=6738a32f&hm=ea7a4162a533e3b6691dfab3fad41586d36c4ac96df0df7b10b62ca46bd8ba69&
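Assuming the `endpointing` field (in milliseconds) sits alongside the other Deepgram options, as shown in the screenshot, the updated config would look roughly like:

```json
"transcriber": {
  "model": "nova-2",
  "language": "ru",
  "provider": "deepgram",
  "endpointing": 300
}
```

Verify the exact field name against the Vapi API reference before using it.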
If this also doesn't work:
1. try other endpointing values
2. change the transcriber
3. use a custom transcriber that supports streaming
@San Sanych do let me know how it goes.
**San Sanych:**
Can I manually disable this ignoring and let the transcriber return every message it transcribes? @Shubham Bajaj I understand that Deepgram is maybe the best STT model, but I need to configure it the way I need. Maybe you can make this parameter configurable so that I can specify whatever value suits me, even 0.1? It sees the words correctly, but because of your settings it does not accept them.
So how can I remove your ignoring of confidence below 0.5?
**Shubham Bajaj:**
As of now, the number isn't configurable, so you cannot change it.
You can try using a custom transcriber or another transcriber until I get an update from the team.
**San Sanych:**
Do you have docs on how I can implement a custom transcriber?
**Shubham Bajaj:**
You can integrate any transcriber via the CustomTranscriber option. For more information, please visit https://api.vapi.ai/api and search for the CustomTranscriber schema.
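For context, the assistant's transcriber section would then point at your own server. A rough sketch, assuming the schema uses a `custom-transcriber` provider with a `server.url` field (the URL is a placeholder, and the field names should be verified against the CustomTranscriber schema linked above):

```json
"transcriber": {
  "provider": "custom-transcriber",
  "server": {
    "url": "wss://your-server.example.com/transcribe"
  }
}
```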
Rough example to handle the CustomTranscriber data:
```javascript
const WebSocket = require('ws');
const http = require('http');

const server = http.createServer();
const wss = new WebSocket.Server({ server });

wss.on('connection', (ws) => {
  console.log('WebSocket client connected');

  ws.on('message', (message, isBinary) => {
    if (isBinary) {
      // Raw call audio arrives as binary frames.
      console.log('Received binary message', message.length, 'bytes');
    } else {
      console.log('Received text message:', JSON.parse(message));
    }
  });

  // Send a dummy transcription response every 5 seconds
  const intervalId = setInterval(() => {
    if (ws.readyState === WebSocket.OPEN) {
      const response = {
        type: 'transcriber-response',
        transcription: 'hey',
        channel: 'customer',
      };
      ws.send(JSON.stringify(response));
      console.log('Sent transcription response:', response);
    }
  }, 5000);

  ws.on('close', () => {
    clearInterval(intervalId);
    console.log('WebSocket client disconnected');
  });
});

server.listen(3000);
```
@San Sanych if you require proper documentation, create a support ticket and I will get it done.
**San Sanych:**
As I understand it and have tested, Deepgram is the best one, so I just need the ability to manually change some transcriber settings.
@Shubham Bajaj Do you have any updates from the team? And is there some way to turn up the volume of the input stream from the client? Whatever variants I tried, on the VAPI recordings I hear only one volume level. I need the AI to hear the client's speech better (at increased volume), because my clients don't have good microphones and I need to boost the sound artificially somehow.
**Shubham Bajaj:**
No, I mean you should use your own custom transcriber instead of Deepgram, one that works best for your language.
What I got from the team: we cannot do much here, because we receive the audio from the telephony provider. The voice issue again comes down to transcription, so give me some time and I will update you on how the confidence parameter works.
**San Sanych:**
I don't have one. I think Deepgram is good for my languages for now; I just need the ability to add some custom configuration for the transcriber.