Custom TTS with Vapi
Hi, can anyone help me with how to connect a custom TTS model to Vapi?
@kalishukla Vapi allows you to integrate your own Text-to-Speech (TTS) model through the `custom-voice` provider option. This guide explains how to configure your custom TTS solution with Vapi using our API, and details the exact request and response formats your server must handle.

## Configuration Process

### 1. Set Up Your Custom TTS Endpoint

First, create a server endpoint that can:

- Receive POST requests with text content
- Process this text through your TTS model
- Return audio as a raw PCM stream

### 2. Configure Through the Vapi API

Custom TTS must be configured through our API (it is not available in the dashboard):
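```javascript
// API request to update an existing assistant so it uses your custom TTS server.
// Runs as an ES module (top-level await) in Node 18+, which provides fetch globally.
const assistantId = 'YOUR_ASSISTANT_ID';
const apiKey = 'YOUR_API_KEY';

const response = await fetch(`https://api.vapi.ai/assistant/${assistantId}`, {
  method: 'PATCH',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    voice: {
      provider: 'custom-voice',
      server: {
        url: 'https://your-tts-server.com/endpoint',
        timeoutSeconds: 30, // Optional, defaults to 20 seconds
        secret: 'your-secret-key' // Optional but recommended for security
      },
      // Optional chunking configuration
      chunkPlan: {
        enabled: true,
        formatPlan: {
          enabled: true
        }
      }
    }
  })
});

// Surface API errors instead of failing silently.
if (!response.ok) {
  throw new Error(`Assistant update failed: ${response.status} ${await response.text()}`);
}
```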

## Request & Response Specifications

### Incoming Request From Vapi

Your TTS endpoint will receive requests from Vapi in this format:
```http
POST https://your-tts-server.com/endpoint
Content-Type: application/json
X-Vapi-Secret: your-secret-key

{
  "message": {
    "type": "voice-request",
    "text": "The text that needs to be converted to speech",
    "sampleRate": 24000
  }
}
```

The `X-Vapi-Secret` header is only sent if you configured a `secret`.
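
Before synthesizing anything, the endpoint has to authenticate the request and pull out `text` and `sampleRate`. Below is a minimal Node.js sketch of those two checks, assuming the request body has already been read into a string; `isAuthorized` and `parseVoiceRequest` are hypothetical helper names, not part of any Vapi SDK. It relies only on the fields shown above and ignores any others.

```javascript
const { timingSafeEqual } = require('node:crypto');

// Hypothetical helper: compares the X-Vapi-Secret header with the secret set on
// the assistant. Node lower-cases incoming header names, hence 'x-vapi-secret'.
function isAuthorized(headers, expectedSecret) {
  if (!expectedSecret) return true; // no secret configured: nothing to check
  const received = headers['x-vapi-secret'];
  if (typeof received !== 'string') return false;
  const a = Buffer.from(received);
  const b = Buffer.from(expectedSecret);
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Hypothetical helper: validates the JSON body and returns only the fields the
// TTS model needs. Throws on anything that is not a well-formed voice-request.
function parseVoiceRequest(rawBody) {
  const payload = JSON.parse(rawBody);
  const message = payload?.message;
  if (!message || message.type !== 'voice-request') {
    throw new Error('expected message.type "voice-request"');
  }
  if (typeof message.text !== 'string' || message.text.length === 0) {
    throw new Error('message.text must be a non-empty string');
  }
  if (!Number.isInteger(message.sampleRate) || message.sampleRate <= 0) {
    throw new Error('message.sampleRate must be a positive integer');
  }
  return { text: message.text, sampleRate: message.sampleRate };
}
```

A failed `isAuthorized` check would typically map to a 401 response, and a `parseVoiceRequest` error to a 400, both using the JSON error shape described under Error Handling.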

### Expected Response From Your Server

Your server MUST respond with a stream of raw audio data with these specifications:

Headers:
```http
Content-Type: audio/raw
Transfer-Encoding: chunked
```

Body Format:

- Raw PCM audio data (no WAV headers or container formats)
- Single-channel (mono)
- 16-bit signed integer PCM encoding
- Little-endian byte order
- Sample rate matching the `sampleRate` in the request (typically 24000 Hz)

**Important:** The response is NOT JSON. It must be raw binary audio data streamed directly in the response body.

### Error Handling

If your server encounters an error, return a standard HTTP error response:
```http
HTTP/1.1 500 Internal Server Error
Content-Type: application/json

{
  "error": "Error description"
}
```
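
Putting the request format, response format, and error handling together, here is a minimal Node.js (18+) server sketch. It is not an official Vapi example: `createTtsServer`, `floatToPcm16LE`, `sendJson`, and `synthesize` are hypothetical names, and `synthesize` is a placeholder that emits a sine tone where your TTS model would go. Each PCM chunk is written as soon as it is produced, so audio starts flowing before synthesis finishes.

```javascript
// Minimal custom TTS endpoint for Vapi (sketch). Requires Node 18+.
const http = require('node:http');

// Convert float samples in [-1, 1] to 16-bit signed little-endian PCM.
// The output has no WAV header, matching the raw mono format described above.
function floatToPcm16LE(samples) {
  const buf = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    buf.writeInt16LE(Math.round(s < 0 ? s * 32768 : s * 32767), i * 2);
  }
  return buf;
}

// HYPOTHETICAL placeholder for your TTS model: yields 20 ms chunks of a quiet
// 440 Hz tone, roughly 60 ms per input character. Replace with real inference.
async function* synthesize(text, sampleRate) {
  const total = Math.round(text.length * 0.06 * sampleRate);
  const chunkSize = Math.round(0.02 * sampleRate);
  for (let start = 0; start < total; start += chunkSize) {
    const n = Math.min(chunkSize, total - start);
    const chunk = new Float32Array(n);
    for (let i = 0; i < n; i++) {
      chunk[i] = 0.2 * Math.sin((2 * Math.PI * 440 * (start + i)) / sampleRate);
    }
    yield chunk;
  }
}

// Send a JSON error body in the shape shown under Error Handling.
function sendJson(res, status, payload) {
  res.writeHead(status, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(payload));
}

// Hypothetical factory: returns an http.Server (not yet listening).
function createTtsServer(expectedSecret) {
  return http.createServer(async (req, res) => {
    try {
      if (req.method !== 'POST') return sendJson(res, 405, { error: 'Use POST' });
      // Plain comparison for brevity; crypto.timingSafeEqual is preferable.
      if (expectedSecret && req.headers['x-vapi-secret'] !== expectedSecret) {
        return sendJson(res, 401, { error: 'Invalid secret' });
      }

      // Read the whole JSON body before parsing it.
      const parts = [];
      for await (const part of req) parts.push(part);
      const { message } = JSON.parse(Buffer.concat(parts).toString('utf8'));
      if (message?.type !== 'voice-request' || typeof message.text !== 'string' ||
          !Number.isInteger(message.sampleRate)) {
        return sendJson(res, 400, { error: 'Expected a voice-request message' });
      }

      // No Content-Length is set, so Node sends Transfer-Encoding: chunked.
      // Backpressure (res.write returning false) is ignored for brevity.
      res.writeHead(200, { 'Content-Type': 'audio/raw' });
      for await (const samples of synthesize(message.text, message.sampleRate)) {
        res.write(floatToPcm16LE(samples));
      }
      res.end();
    } catch (err) {
      if (!res.headersSent) {
        sendJson(res, 500, { error: err.message });
      } else {
        res.destroy(); // audio already started: abort the stream
      }
    }
  });
}
```

To try it locally, something like `createTtsServer(process.env.VAPI_SECRET).listen(3000)` starts the server; Vapi has to reach it at the URL you set in `server.url`, so it needs to be publicly accessible (deployed or behind a tunnel). Once the 200 headers are sent, an error can no longer be reported as JSON, which is why the sketch aborts the connection in that case.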

## Testing Your Integration

1. After configuration, make a test call using your assistant.
2. Monitor your server logs to confirm it is receiving requests correctly.
3. Verify that audio is streaming back to Vapi successfully.
4. Test error scenarios to make sure they are handled gracefully.

## Common Issues and Solutions

1. **Audio Format Issues**: Make sure you are not including WAV headers or other metadata in your audio stream.
2. **Streaming Problems**: Make sure you return a proper stream rather than waiting for the entire audio to be generated before responding.
3. **Latency Concerns**: If latency is high, optimize your TTS service for a faster time to the first audio chunk.
4. **Authentication Errors**: If you configured a secret, verify that your server correctly checks the `X-Vapi-Secret` header.
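
For the Audio Format Issues item: if your model or TTS library hands back a complete WAV file rather than raw samples, the RIFF header has to be stripped before streaming. A minimal sketch, assuming a standard RIFF/WAVE layout; `stripWavHeader` is a hypothetical helper name, and files using the extensible format (format code other than 1) are rejected rather than parsed.

```javascript
// Hypothetical helper: return only the raw PCM bytes from a WAV buffer's
// "data" chunk, checking that the format is 16-bit mono integer PCM.
// Buffers that are not WAV containers are returned unchanged.
function stripWavHeader(buf) {
  if (buf.length < 12 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    return { pcm: buf, sampleRate: null }; // assume it is already raw PCM
  }
  let offset = 12; // first chunk follows "RIFF" + size + "WAVE"
  let sampleRate = null;
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === 'fmt ') {
      const audioFormat = buf.readUInt16LE(body); // 1 = integer PCM
      const channels = buf.readUInt16LE(body + 2);
      sampleRate = buf.readUInt32LE(body + 4);
      const bitsPerSample = buf.readUInt16LE(body + 14);
      if (audioFormat !== 1 || channels !== 1 || bitsPerSample !== 16) {
        throw new Error(`unsupported WAV: format=${audioFormat}, channels=${channels}, bits=${bitsPerSample}`);
      }
    } else if (id === 'data') {
      // Clamp to the buffer end: some streaming encoders write a placeholder size.
      return { pcm: buf.subarray(body, Math.min(body + size, buf.length)), sampleRate };
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('WAV file has no data chunk');
}
```

The returned `sampleRate` can be compared with the `sampleRate` from the request; if they differ, the audio needs resampling before it is sent to Vapi.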