Listen URL audio distortion Vapi AI #support

VapiLLM

03/21/2025, 7:51 PM

Hello, I am able to listen in on a live call or play back the audio using this code from the Vapi documentation. However, when I play back the audio, it sounds like some sort of audio distortion or filter has been applied:

javascript
const WebSocket = require('ws');
const fs = require('fs');

let pcmBuffer = Buffer.alloc(0);
const ws = new WebSocket("");

ws.on('open', () => console.log('WebSocket connection established'));

ws.on('message', (data, isBinary) => {
  if (isBinary) {
    pcmBuffer = Buffer.concat([pcmBuffer, data]);
    console.log(`Received PCM data, buffer size: ${pcmBuffer.length}`);
  } else {
    console.log('Received message:', JSON.parse(data.toString()));
  }
});

ws.on('close', () => {
  if (pcmBuffer.length > 0) {
    fs.writeFileSync('audio.pcm', pcmBuffer);
    console.log('Audio data saved to audio.pcm');
  }
});

ws.on('error', (error) => console.error('WebSocket error:', error));

SlaviSavanovic

03/21/2025, 10:47 PM

good luck with this my friend, I fought getting the audio to play back over the browser correctly for over 100 hours and had no luck. It would be amazing if vapi had documentation on this

Kyle

03/24/2025, 2:55 PM

Looking into it.

Shubham Bajaj

03/24/2025, 10:18 PM

@VapiLLM You're collecting audio data from Vapi's WebSocket connection and saving it as a raw PCM file. When you try to play this file, it sounds distorted because most audio players can't properly interpret raw PCM data without metadata about its format (sample rate, bit depth, channels).

## Overview of the Solution

We need to either:

1. Convert your raw PCM file to a standard audio format like WAV that includes proper headers
2. Use a specialized tool that can play raw PCM with the correct parameters

## Detailed Solution

### Option 1: Convert your PCM file to WAV using FFmpeg

FFmpeg can convert your raw PCM file to WAV with the correct parameters:

bash
ffmpeg -f s16le -ar 16000 -ac 1 -i audio.pcm output.wav

This command specifies:

- `-f s16le`: The format is signed 16-bit little-endian PCM
- `-ar 16000`: The sample rate is 16kHz (based on Vapi's common settings)
- `-ac 1`: One audio channel (mono)

### Option 2: Modify your code to save as WAV directly

Here's an updated version of your code that will save the audio as a WAV file directly:

javascript
const WebSocket = require('ws');
const fs = require('fs');

let pcmBuffer = Buffer.alloc(0);
const ws = new WebSocket("");

ws.on('open', () => console.log('WebSocket connection established'));

ws.on('message', (data, isBinary) => {
  if (isBinary) {
    pcmBuffer = Buffer.concat([pcmBuffer, data]);
    console.log(`Received PCM data, buffer size: ${pcmBuffer.length}`);
  } else {
    console.log('Received message:', JSON.parse(data.toString()));
  }
});

ws.on('close', () => {
  if (pcmBuffer.length > 0) {
    // Define WAV parameters based on Vapi's PCM format
    const sampleRate = 16000;  // 16kHz is common for Vapi
    const numChannels = 1;     // Usually mono
    const bitsPerSample = 16;  // 16-bit PCM
    const blockAlign = numChannels * bitsPerSample / 8;  // Bytes per sample frame
    const byteRate = sampleRate * blockAlign;            // Bytes per second

    // Build the 44-byte canonical WAV header
    const header = Buffer.alloc(44);

    // RIFF chunk descriptor
    header.write('RIFF', 0);
    header.writeUInt32LE(36 + pcmBuffer.length, 4); // ChunkSize: file size minus 8 bytes
    header.write('WAVE', 8);

    // "fmt " sub-chunk
    header.write('fmt ', 12);
    header.writeUInt32LE(16, 16);            // Subchunk1Size: 16 for PCM
    header.writeUInt16LE(1, 20);             // AudioFormat: 1 = linear PCM
    header.writeUInt16LE(numChannels, 22);   // NumChannels
    header.writeUInt32LE(sampleRate, 24);    // SampleRate
    header.writeUInt32LE(byteRate, 28);      // ByteRate
    header.writeUInt16LE(blockAlign, 32);    // BlockAlign
    header.writeUInt16LE(bitsPerSample, 34); // BitsPerSample

    // "data" sub-chunk
    header.write('data', 36);
    header.writeUInt32LE(pcmBuffer.length, 40); // Subchunk2Size: number of PCM bytes

    // Prepend the header to the raw PCM data
    const wavBuffer = Buffer.concat([header, pcmBuffer]);
    
    fs.writeFileSync('audio.wav', wavBuffer);
    console.log('Audio data saved to audio.wav');
  }
});

ws.on('error', (error) => console.error('WebSocket error:', error));

This modified code adds a standard WAV header to your PCM data, making it playable in any audio player.
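
For anyone following along in Python rather than Node, the same header-wrapping step can be sketched with the standard-library `wave` module, which writes the WAV header for you. This is only a minimal sketch, not Vapi's documented method: the function name `pcm_to_wav` is made up for illustration, and the 16 kHz, mono, 16-bit defaults are the assumptions from the message above and may need changing if playback speed sounds wrong.

```python
import wave


def pcm_to_wav(pcm_bytes: bytes, wav_path: str, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> None:
    """Wrap raw little-endian PCM bytes in a WAV container.

    sample_width is in bytes per sample (2 means 16-bit).
    All defaults are assumptions about the stream, not confirmed values.
    """
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


# Hypothetical usage with the file saved by the earlier script:
# with open("audio.pcm", "rb") as f:
#     pcm_to_wav(f.read(), "audio.wav")
```

The sample rate is only a label in the header. Nothing is resampled, so a wrong value makes the file play too fast or too slow rather than fail.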

Shubham Bajaj

03/24/2025, 10:18 PM

@VapiLLM The most reliable solution is Option 2 (modifying your code to save as WAV directly), as this creates a file that's playable in any standard audio player without requiring additional conversion steps.

VapiLLM

03/26/2025, 11:26 PM

Hello, are you sure the sampling frequency is 16000?

VapiLLM

03/26/2025, 11:26 PM

What is the buffer size?

Kyle

03/29/2025, 11:44 AM

My belief is that the sampling frequency should be 16kHz. The expected format is: sample rate 16000 Hz, bit depth 16-bit, format linear16/PCM.

As for the buffer size, we're continuously appending incoming binary WebSocket data to a buffer. The actual size of each chunk received depends on:

- WebSocket configuration
- Network conditions
- How Vapi is sending data
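
Since the sample rate is in doubt, one way to sanity-check it is to compare the number of bytes captured against how long the call actually lasted. The sketch below assumes mono 16-bit PCM. The helper names and the example numbers are hypothetical and are not measurements from this thread.

```python
def pcm_duration_seconds(num_bytes, sample_rate, channels=1, bytes_per_sample=2):
    """Playback length of a raw PCM buffer at a given sample rate."""
    return num_bytes / (sample_rate * channels * bytes_per_sample)


def implied_sample_rate(num_bytes, known_duration_s, channels=1, bytes_per_sample=2):
    """Sample rate that makes the buffer last exactly known_duration_s."""
    return num_bytes / (known_duration_s * channels * bytes_per_sample)


# Made-up example: 480,000 bytes captured during a call that lasted about 10 s.
captured_bytes = 480_000
print(pcm_duration_seconds(captured_bytes, 16000))  # 15.0 -> plays too slowly at 16 kHz
print(implied_sample_rate(captured_bytes, 10))      # 24000.0
```

If the file plays for longer than the call lasted, the assumed rate is too low. If it plays for less time, the rate is too high.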

VapiLLM

03/29/2025, 12:31 PM

Hi Shubham, when I try 16k, it sounds very slow. Here is my Python code to stream it:

python
import json
from datetime import datetime

import numpy as np
import pandas as pd
import pyaudio
import websocket
from scipy.io.wavfile import write

# FORMAT, CHANNELS, RATE, CHUNK, window_size, buffer_list and audio_df
# are defined elsewhere in the script (not shown in this message).
audio = pyaudio.PyAudio()
stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE, output=True)


def save_audio_window():
    """Save and analyze each 10-second window of audio."""
    global buffer_list, audio_df
    if len(buffer_list) >= window_size:
        audio_array = np.concatenate(buffer_list[:window_size])  # Take the first 10 seconds
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

        # Convert to int16 format (PCM)
        audio_int16 = np.array(audio_array, dtype=np.int16)

        # Save to WAV file
        filename = f"audio_{timestamp}.wav"
        write(filename, RATE, audio_int16)
        print(f"Saved: {filename}")

        # Append to DataFrame
        audio_df = pd.concat(
            [audio_df, pd.DataFrame({"timestamp": [timestamp], "audio_data": [filename]})],
            ignore_index=True,
        )

        # Remove used data from buffer
        buffer_list = buffer_list[window_size:]


def on_message(ws, message):
    """Handles incoming WebSocket messages."""
    global buffer_list
    if isinstance(message, bytes):
        stream.write(message)
        buffer_list.append(np.frombuffer(message, dtype=np.int16))
        print(f"Playing PCM audio, received {len(message)} bytes")
        if len(buffer_list) * CHUNK >= window_size:
            save_audio_window()
    else:
        print(f"Received message: {json.loads(message)}")


def on_error(ws, error):
    print(f"WebSocket error: {error}")


def on_open(ws):
    print("WebSocket connection established")


def on_close(ws, close_status_code, close_msg):
    print('closed')
    print("WebSocket closed")
    stream.stop_stream()
    stream.close()
    audio.terminate()

VapiLLM

03/29/2025, 12:32 PM

python
ws = websocket.WebSocketApp(listen_url, on_message=on_message, on_error=on_error, on_close=on_close)
ws.on_open = on_open
ws.run_forever()

VapiLLM

03/29/2025, 12:32 PM

This code streams well, but the audio sounds very distorted. With 16k it's too slow; with 44k something sounds off. I don't know what to set the buffer size to.

Shubham Bajaj

03/30/2025, 11:08 AM

- **Sample Rate**: 8kHz or 24kHz (not 16kHz)
- **Format**: PCM signed 16-bit little-endian (S16LE)

Here's how to fix your Python code:

python
# Try these audio parameters
FORMAT = pyaudio.paInt16  # 16-bit PCM
CHANNELS = 1              # Mono
RATE = 24000             # Try 24kHz instead of 16kHz
CHUNK = 960              # For 24kHz, 40ms chunks (24000 * 0.04)

# Alternatively, try 8kHz
# RATE = 8000
# CHUNK = 320            # For 8kHz, 40ms chunks (8000 * 0.04)

In your `on_message` function, you might need to add some buffering to ensure smooth playback:

python
def on_message(ws, message):
    """Handles incoming WebSocket messages."""
    global buffer_list

    if isinstance(message, bytes):  
        # Play the audio data directly
        stream.write(message)

        # For analysis, store the data as 16-bit samples
        audio_data = np.frombuffer(message, dtype=np.int16)

        # Keep the samples so save_audio_window() can write them out later
        buffer_list.append(audio_data)
        
        print(f"Playing PCM audio, received {len(message)} bytes")
        
        # Save audio data when enough has accumulated
        if len(buffer_list) * CHUNK >= window_size:
            save_audio_window()
    else:  
        print(f"Received message: {json.loads(message)}")
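
The handler above plays each WebSocket message as soon as it arrives, so it does not itself hold anything back. A small playback buffer of the kind suggested could be sketched as follows. This is a hedged illustration only: the class name `PrebufferedPlayer` and its parameters are hypothetical, and `write_frame` stands in for something like `stream.write`. It waits until a couple of frames have accumulated, then emits fixed-size frames, which also keeps 16-bit samples aligned when messages arrive with irregular sizes.

```python
class PrebufferedPlayer:
    """Hold back audio until a few frames have arrived, then emit fixed-size frames."""

    def __init__(self, write_frame, frame_bytes, prebuffer_frames=2):
        self._write_frame = write_frame      # callable taking bytes, e.g. stream.write
        self._frame_bytes = frame_bytes      # e.g. CHUNK * 2 for 16-bit mono
        self._prebuffer_bytes = frame_bytes * prebuffer_frames
        self._pending = bytearray()
        self._started = False

    def feed(self, chunk: bytes) -> None:
        """Add newly received bytes and play any complete frames."""
        self._pending.extend(chunk)
        if not self._started:
            if len(self._pending) < self._prebuffer_bytes:
                return  # still filling the initial buffer
            self._started = True
        while len(self._pending) >= self._frame_bytes:
            frame = bytes(self._pending[:self._frame_bytes])
            del self._pending[:self._frame_bytes]
            self._write_frame(frame)


# Hypothetical usage inside on_message:
# player = PrebufferedPlayer(stream.write, frame_bytes=CHUNK * 2)
# player.feed(message)
```

The trade-off is latency: every prebuffered frame delays playback by one frame duration, so a larger `prebuffer_frames` smooths jitter at the cost of added lag. This sketch does not re-buffer after an underrun.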

## For Playing Back Saved Files

If you want to play back the saved PCM files correctly, you need to use the same parameters when opening them:

python
# To play back your saved PCM files
from scipy.io.wavfile import write
import numpy as np

# Convert PCM to WAV with correct parameters
pcm_data = np.fromfile('your_saved_file.pcm', dtype=np.int16)
write('converted_file.wav', 24000, pcm_data)  # Try 24kHz

## Recommendations

1. **Try different sample rates**: Start with 24kHz, then try 8kHz if that doesn't work.
2. **Check chunk size**: Make sure your chunk size matches the expected frame size for your sample rate.
3. **Add buffering**: A small buffer (1-2 frames) can help smooth out playback issues.
4. **FFmpeg conversion**: If you still have issues, try converting with FFmpeg:

ffmpeg -f s16le -ar 24000 -ac 1 -i audio.pcm output.wav

Please let me know if any of these adjustments help with the audio distortion, and I can further refine the solution.
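
One way to try the different sample rates without editing and rerunning the streaming script each time is to write the same raw capture out at several candidate rates and listen to each file. The sketch below is an illustration under assumptions: mono 16-bit little-endian PCM, a hypothetical helper named `write_candidates`, and a candidate list taken from the rates mentioned in this thread.

```python
import wave

CANDIDATE_RATES = (8000, 16000, 24000)  # rates discussed in this thread


def write_candidates(pcm_bytes, prefix="candidate", rates=CANDIDATE_RATES):
    """Write one WAV file per candidate sample rate and return the file paths."""
    paths = []
    for rate in rates:
        path = f"{prefix}_{rate}.wav"
        with wave.open(path, "wb") as wav_file:
            wav_file.setnchannels(1)   # assumed mono
            wav_file.setsampwidth(2)   # assumed 16-bit samples
            wav_file.setframerate(rate)
            wav_file.writeframes(pcm_bytes)
        paths.append(path)
    return paths


# Hypothetical usage with a previously saved capture:
# with open("audio.pcm", "rb") as f:
#     print(write_candidates(f.read()))
```

The file that plays at natural speed and pitch indicates the rate the stream was actually sent at. If every candidate sounds wrong, the encoding or channel count is likely different from what is assumed here.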

Ryan Opfer

04/01/2025, 1:32 AM

Someone needs to fix the bugs around sample rates of the audio files. 16000 Hz doesn't work when calling by telephone, but it does work via the browser. 8000 Hz works by telephone (at least temporarily) but plays back at 2x speed (chipmunk) via the browser. And when I get it to work on both, it doesn't last very long. Here's the Assistant ID I'm currently using: 133e21d5-96c3-4ae2-bcfc-cca419b3f630

Ryan Opfer

04/01/2025, 3:38 AM

And now it's broken again!! This just doesn't work at all. Switching back to text.

Kyle

04/01/2025, 10:06 PM

Thanks for sharing the feedback. I'll check with the team about this and let you know once I get a new development update.

aurelien-ldp

04/02/2025, 7:56 PM

Hi there, I have the same issue when using Twilio with a basic test from a bot on the dashboard. I hear the voice distorted over the phone. I redirect calls from Twilio to my Vapi SIP address.

Kyle

04/06/2025, 12:38 PM

Hi, checking if this is resolved/solved for you?

VapiLLM

04/23/2025, 1:39 AM

Hi, I tried the following, and the audio was better though not great. However, there is a lag now:

python
FORMAT = pyaudio.paInt16  # 16-bit PCM
CHANNELS = 1              # Mono
RATE = 24000              # Try 24kHz instead of 16kHz
CHUNK = 960               # For 24kHz, 40ms chunks (24000 * 0.04)

Kyle

04/24/2025, 12:11 PM

can you provide more details about the lag? Ideally, you shouldn't experience it. If you can share more details, I can try to suggest better alternatives.
