Custom TTS with Vapi Vapi AI #support

kalishukla

03/20/2025, 10:49 AM

Hi, can anyone help me with this: how do I connect a custom TTS model to Vapi?

Shubham Bajaj

03/22/2025, 6:28 AM

@kalishukla Vapi allows you to integrate your own Text-to-Speech (TTS) model through the `custom-voice` provider option. This guide explains how to configure your custom TTS solution with Vapi using our API, and details the exact request/response formats your server must handle.

## Configuration Process

### 1. Set Up Your Custom TTS Endpoint

First, create a server endpoint that can:

- Receive POST requests with text content
- Process this text through your TTS model
- Return audio as a raw PCM stream

### 2. Configure Through the Vapi API

Custom TTS must be configured through our API (it is not available in the dashboard):

```javascript
// API request to update an assistant with custom TTS.
// Uses top-level `await`, so run it in an ES module or inside an async function.
const assistantId = 'YOUR_ASSISTANT_ID';

const response = await fetch(`https://api.vapi.ai/assistant/${assistantId}`, {
  method: 'PATCH',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    voice: {
      provider: 'custom-voice',
      server: {
        url: 'https://your-tts-server.com/endpoint',
        timeoutSeconds: 30, // Optional, defaults to 20 seconds
        secret: 'your-secret-key' // Optional but recommended for security
      },
      // Optional chunking configuration
      chunkPlan: {
        enabled: true,
        formatPlan: {
          enabled: true
        }
      }
    }
  })
});

// fetch() does not reject on HTTP errors, so check the status explicitly.
if (!response.ok) {
  throw new Error(`Vapi API error ${response.status}: ${await response.text()}`);
}
```
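
The `secret` is optional but recommended. A minimal sketch for generating one, assuming you work in Node.js: it uses only the built-in `crypto.randomBytes`, `generateVapiSecret` is a hypothetical helper name, and the 32-byte length is an arbitrary choice rather than a documented Vapi requirement. Use the same value in `server.secret` above and in your TTS server's configuration.

```javascript
const { randomBytes } = require("node:crypto");

// Hypothetical helper: create a random shared secret for voice.server.secret.
// 32 bytes (64 hex characters) is an arbitrary length chosen for this sketch.
function generateVapiSecret(byteLength = 32) {
  return randomBytes(byteLength).toString("hex");
}

const secret = generateVapiSecret();
console.log(secret); // store it on both sides as a secret; don't commit it
```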

## Request & Response Specifications

### Incoming Request From Vapi

Your TTS endpoint will receive requests from Vapi in this format:

```http
POST https://your-tts-server.com/endpoint
Content-Type: application/json
X-Vapi-Secret: your-secret-key (if configured)

{
  "message": {
    "type": "voice-request",
    "text": "The text that needs to be converted to speech",
    "sampleRate": 24000
  }
}
```
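
A minimal validation sketch for that payload, in the same JavaScript as the configuration example. The helper name `parseVoiceRequest` is hypothetical, and falling back to 24000 Hz when `sampleRate` is missing or invalid is an assumption based on the example above, not documented Vapi behavior.

```javascript
// Hypothetical helper: validate the JSON body Vapi sends to a custom-voice endpoint.
// Field names follow the example payload; the 24000 Hz fallback is an assumption.
function parseVoiceRequest(rawBody) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    throw new Error("Request body is not valid JSON");
  }
  const message = payload && payload.message;
  if (!message || message.type !== "voice-request") {
    throw new Error("Expected message.type to be 'voice-request'");
  }
  if (typeof message.text !== "string" || message.text.length === 0) {
    throw new Error("message.text must be a non-empty string");
  }
  const sampleRate =
    Number.isInteger(message.sampleRate) && message.sampleRate > 0 ? message.sampleRate : 24000;
  return { text: message.text, sampleRate };
}

const req = parseVoiceRequest(JSON.stringify({
  message: { type: "voice-request", text: "Hello from Vapi", sampleRate: 24000 },
}));
console.log(req); // { text: 'Hello from Vapi', sampleRate: 24000 }
```

Throwing on malformed input lets your route handler map every validation failure to a single JSON error response, as described under Error Handling.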

### Expected Response From Your Server

Your server MUST respond with a stream of raw audio data with these specifications.

Headers:

```http
Content-Type: audio/raw
Transfer-Encoding: chunked
```

Body format:

- Raw PCM audio data (no WAV headers or container formats)
- Single-channel (mono)
- 16-bit signed integer PCM encoding
- Little-endian byte order
- Sample rate matching the requested `sampleRate` (typically 24000 Hz)

**Important:** The response is NOT JSON. It must be raw binary audio data streamed directly in the response body.

### Error Handling

If your server encounters an error, return a standard HTTP error response:

```http
HTTP/1.1 500 Internal Server Error
Content-Type: application/json

{
  "error": "Error description"
}
```
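
Putting the request and response rules together, here is a minimal sketch of an endpoint built only on Node's built-in `http` module; it is an illustration, not Vapi's reference implementation. Assumptions: `synthesizeFloat32` is a hypothetical placeholder that generates a 440 Hz tone instead of calling a real TTS model, the secret is read from a hypothetical `VAPI_SECRET` environment variable, and the port comes from `PORT` (0 lets the OS pick a free one). In practice the server must be reachable from Vapi at the public URL you configured, for example behind an HTTPS reverse proxy.

```javascript
const http = require("node:http");
const { timingSafeEqual } = require("node:crypto");

// Assumption: the secret configured in Vapi is provided via this hypothetical env var.
const EXPECTED_SECRET = process.env.VAPI_SECRET || "";

// Compare X-Vapi-Secret in constant time; skip the check if no secret is configured.
function secretMatches(headerValue, expected) {
  if (!expected) return true;
  const a = Buffer.from(String(headerValue || ""));
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}

// Float samples in [-1, 1] -> raw 16-bit signed little-endian mono PCM, no WAV header.
function floatToPcm16le(samples) {
  const buf = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    buf.writeInt16LE(Math.round(s * 32767), i * 2);
  }
  return buf;
}

// Hypothetical stand-in for a real TTS model: yields a quiet 440 Hz tone in 20 ms chunks,
// 50 ms of audio per input character, capped at 5 seconds.
function* synthesizeFloat32(text, sampleRate) {
  const chunkSize = Math.max(1, Math.floor(sampleRate / 50));
  const total = Math.min(sampleRate * 5, text.length * Math.floor(sampleRate / 20));
  for (let start = 0; start < total; start += chunkSize) {
    const chunk = new Float32Array(Math.min(chunkSize, total - start));
    for (let i = 0; i < chunk.length; i++) {
      chunk[i] = 0.2 * Math.sin((2 * Math.PI * 440 * (start + i)) / sampleRate);
    }
    yield chunk;
  }
}

function sendError(res, status, message) {
  res.writeHead(status, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ error: message }));
}

const server = http.createServer((req, res) => {
  if (req.method !== "POST") return sendError(res, 405, "Method not allowed");
  if (!secretMatches(req.headers["x-vapi-secret"], EXPECTED_SECRET)) {
    return sendError(res, 401, "Invalid X-Vapi-Secret");
  }
  let body = "";
  req.setEncoding("utf8");
  req.on("data", (chunk) => { body += chunk; });
  req.on("end", () => {
    let message;
    try {
      message = JSON.parse(body).message;
    } catch {
      return sendError(res, 400, "Invalid JSON body");
    }
    if (!message || message.type !== "voice-request" || typeof message.text !== "string") {
      return sendError(res, 400, "Expected a voice-request message with text");
    }
    const rate = Number.isInteger(message.sampleRate) && message.sampleRate > 0
      ? message.sampleRate : 24000;
    try {
      // No Content-Length is set, so Node sends the body with Transfer-Encoding: chunked.
      res.writeHead(200, { "Content-Type": "audio/raw" });
      for (const chunk of synthesizeFloat32(message.text, rate)) {
        res.write(floatToPcm16le(chunk)); // forward each chunk as soon as it is produced
      }
      res.end();
    } catch (err) {
      // Once audio headers are sent, the only way to signal failure is to abort the stream.
      if (res.headersSent) res.destroy(err);
      else sendError(res, 500, err.message);
    }
  });
});

server.listen(Number(process.env.PORT) || 0, () => {
  console.log(`Custom TTS sketch listening on port ${server.address().port}`);
});
```

This sketch ignores backpressure. With a real model that produces audio asynchronously, wait for the `drain` event whenever `res.write` returns `false`, and keep writing chunks as they arrive rather than buffering the whole utterance.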

## Testing Your Integration

1. After configuring the assistant, make a test call with it.
2. Monitor your server logs to ensure it's receiving requests correctly.
3. Verify that the audio is streaming back to Vapi successfully.
4. Test error scenarios to ensure they are handled gracefully.

## Common Issues and Solutions

1. **Audio Format Issues**: Ensure you're not including WAV headers or other metadata in your audio stream.
2. **Streaming Problems**: Make sure you're returning a proper stream rather than waiting to generate the entire audio before responding.
3. **Latency Concerns**: If latency is high, optimize your TTS service for a faster initial response time.
4. **Authentication Errors**: If you've configured a secret, verify that your server correctly checks the `X-Vapi-Secret` header.
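
While working through these checks, it can help to capture a raw response from your endpoint (for example, by sending it the sample `voice-request` JSON yourself) and inspect it locally. The sketch below uses a hypothetical helper, `inspectTtsResponse`, that flags the format problems described in this guide: a non-200 status, a `Content-Type` other than `audio/raw`, an empty body, a leading WAV `RIFF` header, or an odd byte count (which cannot be a whole number of 16-bit samples). It also estimates duration, assuming mono 16-bit audio at the given sample rate.

```javascript
// Hypothetical helper: check a captured response body (a Buffer) against the
// raw-PCM rules. Assumes mono, 16-bit samples (2 bytes per sample).
function inspectTtsResponse({ statusCode, contentType, body, sampleRate = 24000 }) {
  const problems = [];
  if (statusCode !== 200) problems.push(`unexpected status ${statusCode}`);
  if (!String(contentType || "").startsWith("audio/raw")) {
    problems.push(`Content-Type is "${contentType}", expected audio/raw`);
  }
  if (body.length === 0) problems.push("empty body");
  if (body.subarray(0, 4).toString("ascii") === "RIFF") {
    problems.push("body starts with a WAV (RIFF) header");
  }
  if (body.length % 2 !== 0) {
    problems.push("odd byte count: not a whole number of 16-bit samples");
  }
  return { ok: problems.length === 0, problems, durationSeconds: body.length / 2 / sampleRate };
}

// Example: a WAV file returned with the wrong Content-Type is flagged twice.
const wavLike = Buffer.concat([Buffer.from("RIFF"), Buffer.alloc(40)]);
console.log(inspectTtsResponse({ statusCode: 200, contentType: "audio/wav", body: wavLike }).problems);
```

A clean result only means the bytes are plausibly raw PCM; listening to the audio (for example, importing it as raw 16-bit little-endian mono in an audio editor) is still the quickest way to catch a wrong sample rate.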
