Listen URL audio distortion
# support
v
Hello, I am able to listen in to a live call or to play back an audio using this code from the vapi documentation. However, when I play back the audio, it sounds like some sort of audio distortion or filter:

```javascript
const WebSocket = require('ws');
const fs = require('fs');

let pcmBuffer = Buffer.alloc(0);
const ws = new WebSocket("");

ws.on('open', () => console.log('WebSocket connection established'));

ws.on('message', (data, isBinary) => {
  if (isBinary) {
    pcmBuffer = Buffer.concat([pcmBuffer, data]);
    console.log(`Received PCM data, buffer size: ${pcmBuffer.length}`);
  } else {
    console.log('Received message:', JSON.parse(data.toString()));
  }
});

ws.on('close', () => {
  if (pcmBuffer.length > 0) {
    fs.writeFileSync('audio.pcm', pcmBuffer);
    console.log('Audio data saved to audio.pcm');
  }
});

ws.on('error', (error) => console.error('WebSocket error:', error));
```
s
good luck with this my friend, I fought getting the audio to play back over the browser correctly for over 100 hours and had no luck. It would be amazing if vapi had documentation on this
k
Looking into it.
s
@VapiLLM You're collecting audio data from Vapi's WebSocket connection and saving it as a raw PCM file. When you try to play this file, it sounds distorted because most audio players can't properly interpret raw PCM data without metadata about its format (sample rate, bit depth, channels).

## Overview of the Solution

We need to either:

1. Convert your raw PCM file to a standard audio format like WAV that includes proper headers
2. Use a specialized tool that can play raw PCM with the correct parameters

## Detailed Solution

### Option 1: Convert your PCM file to WAV using FFmpeg

FFmpeg can convert your raw PCM file to WAV with the correct parameters:

```bash
ffmpeg -f s16le -ar 16000 -ac 1 -i audio.pcm output.wav
```

This command specifies:

- `-f s16le`: The format is signed 16-bit little-endian PCM
- `-ar 16000`: The sample rate is 16kHz (based on Vapi's common settings)
- `-ac 1`: One audio channel (mono)

### Option 2: Modify your code to save as WAV directly

Here's an updated version of your code that will save the audio as a WAV file directly:
```javascript
const WebSocket = require('ws');
const fs = require('fs');

let pcmBuffer = Buffer.alloc(0);
const ws = new WebSocket("");

ws.on('open', () => console.log('WebSocket connection established'));

ws.on('message', (data, isBinary) => {
  if (isBinary) {
    pcmBuffer = Buffer.concat([pcmBuffer, data]);
    console.log(`Received PCM data, buffer size: ${pcmBuffer.length}`);
  } else {
    console.log('Received message:', JSON.parse(data.toString()));
  }
});

ws.on('close', () => {
  if (pcmBuffer.length > 0) {
    // Define WAV parameters based on Vapi's PCM format
    const sampleRate = 16000;  // 16kHz is common for Vapi
    const numChannels = 1;     // Usually mono
    const bitsPerSample = 16;  // 16-bit PCM

    // A canonical PCM WAV header is 44 bytes long
    const header = Buffer.alloc(44);

    // RIFF chunk descriptor
    header.write('RIFF', 0);
    header.writeUInt32LE(36 + pcmBuffer.length, 4); // ChunkSize
    header.write('WAVE', 8);

    // "fmt " sub-chunk
    header.write('fmt ', 12);
    header.writeUInt32LE(16, 16);          // Subchunk1Size (16 for PCM)
    header.writeUInt16LE(1, 20);           // AudioFormat (1 = PCM)
    header.writeUInt16LE(numChannels, 22); // NumChannels
    header.writeUInt32LE(sampleRate, 24);  // SampleRate
    // ByteRate = SampleRate * NumChannels * BitsPerSample / 8
    header.writeUInt32LE(sampleRate * numChannels * bitsPerSample / 8, 28);
    // BlockAlign = NumChannels * BitsPerSample / 8
    header.writeUInt16LE(numChannels * bitsPerSample / 8, 32);
    header.writeUInt16LE(bitsPerSample, 34); // BitsPerSample

    // "data" sub-chunk
    header.write('data', 36);
    header.writeUInt32LE(pcmBuffer.length, 40); // Subchunk2Size

    // Prepend the header to the raw PCM samples
    const wavBuffer = Buffer.concat([header, pcmBuffer]);

    fs.writeFileSync('audio.wav', wavBuffer);
    console.log('Audio data saved to audio.wav');
  }
});

ws.on('error', (error) => console.error('WebSocket error:', error));
```
This modified code adds a standard WAV header to your PCM data, making it playable in any audio player, provided the sample rate in the header matches the rate of the incoming stream.
@VapiLLM The most reliable solution is Option 2 (modifying your code to save as WAV directly), as this creates a file that's playable in any standard audio player without requiring additional conversion steps.
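For anyone doing the same thing in Python, here is a minimal sketch of the same idea using the standard-library `wave` module, which writes the header for you. The function name `pcm_to_wav` is hypothetical, and the defaults (16 kHz, mono, 16-bit) are assumptions carried over from the answer above rather than confirmed Vapi settings.

```python
import wave


def pcm_to_wav(pcm_bytes: bytes, wav_path: str, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> None:
    """Wrap raw PCM bytes in a WAV container.

    sample_rate, channels and sample_width are assumptions about the stream;
    if they do not match the real format, the file plays at the wrong speed.
    """
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)      # header sizes are filled in on close
```

Calling it again with a different `sample_rate` on the same bytes is a quick way to compare candidate rates by ear.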
v
hello, are you sure the sampling frequency is 16000
what is the buffer size
k
My belief is that the sampling frequency should be 16 kHz: sample rate 16000 Hz, bit depth 16-bit, format linear16 (PCM).

As for the buffer size, we're continuously appending incoming binary WebSocket data to a buffer. The actual size of each chunk received depends on:

- WebSocket configuration
- Network conditions
- How Vapi is sending data
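One way to check a sample rate rather than guess it is to compare how many bytes arrive with how much wall-clock time has passed. The sketch below assumes the stream is delivered in real time and is mono 16-bit PCM; both function names are hypothetical helpers, not part of any Vapi SDK.

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int,
                         channels: int = 1, bits_per_sample: int = 16) -> float:
    """Duration of a PCM buffer if it is played at sample_rate."""
    bytes_per_frame = channels * bits_per_sample // 8
    return num_bytes / (bytes_per_frame * sample_rate)


def estimate_sample_rate(num_bytes: int, elapsed_seconds: float,
                         channels: int = 1, bits_per_sample: int = 16) -> float:
    """Rough sample rate implied by num_bytes arriving over elapsed_seconds.

    Only meaningful if the sender streams in real time (an assumption).
    """
    bytes_per_frame = channels * bits_per_sample // 8
    return num_bytes / (bytes_per_frame * elapsed_seconds)
```

If ten seconds of a call produce roughly 320,000 bytes, the estimate is about 16 kHz; roughly 480,000 bytes would point to 24 kHz instead.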
v
Hi Shubham, when I try 16k, it sounds very slow. Here is my Python code to stream it:

```python
import json
from datetime import datetime

import numpy as np
import pandas as pd
import pyaudio
import websocket
from scipy.io.wavfile import write

# FORMAT, CHANNELS, RATE, CHUNK, window_size, listen_url, buffer_list and
# audio_df are defined elsewhere (not shown in this post).

audio = pyaudio.PyAudio()
stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE, output=True)


def save_audio_window():
    """Save and analyze each 10-second window of audio."""
    global buffer_list, audio_df
    if len(buffer_list) >= window_size:
        audio_array = np.concatenate(buffer_list[:window_size])  # Take the first 10 seconds
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

        # Convert to int16 format (PCM)
        audio_int16 = np.array(audio_array, dtype=np.int16)

        # Save to WAV file
        filename = f"audio_{timestamp}.wav"
        write(filename, RATE, audio_int16)
        print(f"Saved: {filename}")

        # Append to DataFrame
        audio_df = pd.concat([audio_df, pd.DataFrame({"timestamp": [timestamp], "audio_data": [filename]})], ignore_index=True)

        # Remove used data from buffer
        buffer_list = buffer_list[window_size:]


def on_message(ws, message):
    """Handles incoming WebSocket messages."""
    global buffer_list
    if isinstance(message, bytes):
        stream.write(message)
        buffer_list.append(np.frombuffer(message, dtype=np.int16))
        print(f"Playing PCM audio, received {len(message)} bytes")
        if len(buffer_list) * CHUNK >= window_size:
            save_audio_window()
    else:
        print(f"Received message: {json.loads(message)}")


def on_error(ws, error):
    print(f"WebSocket error: {error}")


def on_open(ws):
    print("WebSocket connection established")


def on_close(ws, close_status_code, close_msg):
    print("closed")
    print("WebSocket closed")
    stream.stop_stream()
    stream.close()
    audio.terminate()


ws = websocket.WebSocketApp(listen_url, on_message=on_message, on_error=on_error, on_close=on_close)
ws.on_open = on_open
ws.run_forever()
```

This code streams well, but the audio sounds very distorted. With 16k it's too slow; with 44k something sounds off. I don't know what to set the buffer size to.
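"Too slow" at one rate and "off" at another is what a sample-rate mismatch sounds like: the speed-up is simply the playback rate divided by the rate the audio was produced at. A small sketch of that arithmetic follows; the 24 kHz source rate in the loop is purely a hypothetical value for illustration, not a confirmed Vapi setting.

```python
def playback_speed_factor(playback_rate: int, source_rate: int) -> float:
    """How fast audio plays when source_rate samples are played at playback_rate.

    Values below 1.0 sound slow and low-pitched; values above 1.0 sound fast
    and high-pitched.
    """
    return playback_rate / source_rate


# Hypothetical example: suppose the stream were really 24 kHz.
ASSUMED_SOURCE_RATE = 24000
for candidate in (16000, 24000, 44100):
    factor = playback_speed_factor(candidate, ASSUMED_SOURCE_RATE)
    print(f"play at {candidate} Hz -> {factor:.2f}x speed")
```

Under that assumption, 16 kHz playback would be slower than real time and 44.1 kHz playback faster, which is at least consistent with the symptoms described.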
s
- **Sample Rate**: 8kHz or 24kHz (not 16kHz)
- **Format**: PCM signed 16-bit little-endian (S16LE)

Here's how to fix your Python code:

```python
# Try these audio parameters
FORMAT = pyaudio.paInt16  # 16-bit PCM
CHANNELS = 1              # Mono
RATE = 24000              # Try 24kHz instead of 16kHz
CHUNK = 960               # For 24kHz, 40ms chunks (24000 * 0.04)

# Alternatively, try 8kHz
# RATE = 8000
# CHUNK = 320             # For 8kHz, 40ms chunks (8000 * 0.04)
```
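The chunk sizes quoted above are just the sample rate multiplied by the chunk duration. A short sketch of that calculation, with hypothetical helper names and the 40 ms duration taken from the comments above, makes it easy to recompute for any other rate:

```python
def frames_per_chunk(sample_rate: int, chunk_ms: int = 40) -> int:
    """Number of audio frames in one chunk of chunk_ms milliseconds."""
    return sample_rate * chunk_ms // 1000


def bytes_per_chunk(sample_rate: int, chunk_ms: int = 40,
                    channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Size in bytes of one chunk (defaults assume mono 16-bit PCM)."""
    return frames_per_chunk(sample_rate, chunk_ms) * channels * bytes_per_sample
```

For example, `frames_per_chunk(24000)` gives 960 frames (1920 bytes of mono 16-bit audio) and `frames_per_chunk(8000)` gives 320 frames (640 bytes).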
In your `on_message` function, you might need to add some buffering to ensure smooth playback:

```python
def on_message(ws, message):
    """Handles incoming WebSocket messages."""
    global buffer_list

    if isinstance(message, bytes):
        # Play the audio data directly
        stream.write(message)

        # For analysis, store the data
        audio_data = np.frombuffer(message, dtype=np.int16)

        # Apply a small amount of buffering (1-2 frames) for smoother playback
        buffer_list.append(audio_data)

        print(f"Playing PCM audio, received {len(message)} bytes")

        # Save audio data when enough has accumulated
        if len(buffer_list) * CHUNK >= window_size:
            save_audio_window()
    else:
        print(f"Received message: {json.loads(message)}")
```
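Note that appending to `buffer_list` only stores samples for saving; it does not delay playback. If playback buffering is wanted, one possible approach is to hold back the first few chunks before writing anything to the output stream. The class below is a hypothetical sketch of that idea in plain Python, not a PyAudio or Vapi API, and the default of two chunks is an arbitrary assumption.

```python
from collections import deque
from typing import Deque, List


class JitterBuffer:
    """Hold back the first few chunks so playback starts with some headroom."""

    def __init__(self, prefill_chunks: int = 2) -> None:
        self.prefill_chunks = prefill_chunks
        self._chunks: Deque[bytes] = deque()
        self._started = False

    def push(self, chunk: bytes) -> None:
        """Queue an incoming chunk; playback is released once prefilled."""
        self._chunks.append(chunk)
        if len(self._chunks) >= self.prefill_chunks:
            self._started = True

    def pop_ready(self) -> List[bytes]:
        """Return the chunks that may be played now (empty until prefilled)."""
        if not self._started:
            return []
        ready = list(self._chunks)
        self._chunks.clear()
        return ready
```

Inside `on_message` this would be used as `jitter.push(message)` followed by `for chunk in jitter.pop_ready(): stream.write(chunk)`. The trade-off is added latency of roughly `prefill_chunks` times the chunk duration, so a larger prefill makes playback smoother but laggier.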
## For Playing Back Saved Files

If you want to play back the saved PCM files correctly, you need to use the same parameters when opening them:

```python
# To play back your saved PCM files
from scipy.io.wavfile import write
import numpy as np

# Convert PCM to WAV with correct parameters
pcm_data = np.fromfile('your_saved_file.pcm', dtype=np.int16)
write('converted_file.wav', 24000, pcm_data)  # Try 24kHz
```
## Recommendations

1. **Try different sample rates**: Start with 24kHz, then try 8kHz if that doesn't work.
2. **Check chunk size**: Make sure your chunk size matches the expected frame size for your sample rate.
3. **Add buffering**: A small buffer (1-2 frames) can help smooth out playback issues.
4. **FFmpeg conversion**: If you still have issues, try converting with FFmpeg:

```bash
ffmpeg -f s16le -ar 24000 -ac 1 -i audio.pcm output.wav
```

Please let me know if any of these adjustments help with the audio distortion, and I can further refine the solution.
r
Someone needs to fix the bugs around sample rates of the audio files. 16000 Hz doesn't work when calling by telephone, but it does work via the browser. 8000 Hz works by telephone (at least temporarily) but plays back at 2x speed (chipmunk) via the browser. And when I get it to work on both, it doesn't last very long. Here's the Assistant ID I'm currently using: 133e21d5-96c3-4ae2-bcfc-cca419b3f630
And now it's broken again!! This just doesn't work at all. Switching back to text.
k
Thanks for sharing the feedback. I'll check with the team about this and let you know once I get an update on development.
a
Hi there, I've got the same issue when using Twilio with a basic test from a bot on the dashboard. The voice sounds distorted over the phone. I redirect calls from Twilio to my Vapi SIP address.
k
Hi, checking if this is resolved/solved for you?
v
Hi, I tried the following, and the audio was better though not great, however there is a lag now:

```python
FORMAT = pyaudio.paInt16  # 16-bit PCM
CHANNELS = 1              # Mono
RATE = 24000              # Try 24kHz instead of 16kHz
CHUNK = 960               # For 24kHz, 40ms chunks (24000 * 0.04)
```
k
Can you provide more details about the lag? Ideally, you shouldn't experience any. If you can share more details, I can try to suggest better alternatives.