kalishukla
03/20/2025, 10:49 AM
Shubham Bajaj
03/22/2025, 6:28 AM
custom-voice provider option. This guide explains how to configure your custom TTS solution with Vapi using our API, and details the exact request/response formats your server must handle.
## Configuration Process
### 1. Set Up Your Custom TTS Endpoint
First, create a server endpoint that can:
- Receive POST requests with text content
- Process this text through your TTS model
- Return audio as a raw PCM stream
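The three requirements above can be sketched as a small Node.js handler. This is only an illustration under stated assumptions, not a reference implementation: `synthesizePcm`, `handleVoiceRequest`, and port 3000 are hypothetical names and values, and the placeholder synthesizer emits a test tone where a real TTS model would be called. The `message.text` and `message.sampleRate` fields follow the request format described in this guide.

```javascript
const http = require('node:http');

// Placeholder "TTS" (assumption): returns a 440 Hz tone as 16-bit
// little-endian mono PCM. A real server would call its TTS model here.
function synthesizePcm(text, sampleRate) {
  const samplesPerChar = Math.floor(sampleRate / 20); // 50 ms per character, arbitrary
  const totalSamples = text.length * samplesPerChar;
  const pcm = Buffer.alloc(totalSamples * 2); // 2 bytes per 16-bit sample
  for (let i = 0; i < totalSamples; i++) {
    const value = Math.round(Math.sin((2 * Math.PI * 440 * i) / sampleRate) * 0.3 * 32767);
    pcm.writeInt16LE(value, i * 2);
  }
  return pcm;
}

// Hypothetical request handler: reads the JSON body, synthesizes audio,
// and writes raw PCM back to the caller.
function handleVoiceRequest(req, res) {
  if (req.method !== 'POST') {
    res.writeHead(405, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'Method not allowed' }));
    return;
  }
  const chunks = [];
  req.on('data', (chunk) => chunks.push(chunk));
  req.on('end', () => {
    try {
      const { message } = JSON.parse(Buffer.concat(chunks).toString('utf8'));
      const pcm = synthesizePcm(message.text, message.sampleRate);
      // No Content-Length is set, so Node sends the body with chunked encoding.
      res.writeHead(200, { 'Content-Type': 'audio/raw' });
      res.write(pcm);
      res.end();
    } catch (err) {
      res.writeHead(500, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: String(err.message || err) }));
    }
  });
}

// To run locally (the port is an arbitrary choice):
// http.createServer(handleVoiceRequest).listen(3000);
```

A production server would write audio to the response as the model produces it rather than synthesizing the whole utterance first.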
### 2. Configure Through the Vapi API
Custom TTS must be configured through our API (not available in the dashboard):
```javascript
// API request to update an assistant with custom TTS
const response = await fetch('https://api.vapi.ai/api/v1/assistants/{assistantId}', {
  method: 'PATCH',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    voice: {
      provider: 'custom-voice',
      server: {
        url: 'https://your-tts-server.com/endpoint',
        timeoutSeconds: 30, // Optional, defaults to 20 seconds
        secret: 'your-secret-key' // Optional but recommended for security
      },
      // Optional chunking configuration
      chunkPlan: {
        enabled: true,
        formatPlan: {
          enabled: true
        }
      }
    }
  })
});
```
## Request & Response Specifications
### Incoming Request From Vapi
Your TTS endpoint will receive requests from Vapi in this format:
```
POST https://your-tts-server.com/endpoint
Content-Type: application/json
X-Vapi-Secret: your-secret-key (if configured)

{
  "message": {
    "type": "voice-request",
    "text": "The text that needs to be converted to speech",
    "sampleRate": 24000
  }
}
```
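One way to handle this request on the server side is to check the secret header and validate the body before any synthesis starts. The sketch below assumes a Node.js server; `isAuthorized` and `parseVoiceRequest` are hypothetical helper names, and the validation rules are one reasonable reading of the format above rather than anything Vapi mandates.

```javascript
const crypto = require('node:crypto');

// Compare the X-Vapi-Secret header with the configured secret in constant time.
// Node lowercases incoming header names, hence 'x-vapi-secret'.
function isAuthorized(headers, expectedSecret) {
  const received = Buffer.from(String(headers['x-vapi-secret'] || ''), 'utf8');
  const expected = Buffer.from(expectedSecret, 'utf8');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === expected.length && crypto.timingSafeEqual(received, expected);
}

// Parse the JSON body and return { text, sampleRate }; throw on malformed input.
function parseVoiceRequest(rawBody) {
  const body = JSON.parse(rawBody);
  const message = body && body.message;
  if (!message || message.type !== 'voice-request') {
    throw new Error('Expected message.type to be "voice-request"');
  }
  if (typeof message.text !== 'string' || message.text.length === 0) {
    throw new Error('message.text must be a non-empty string');
  }
  if (!Number.isInteger(message.sampleRate) || message.sampleRate <= 0) {
    throw new Error('message.sampleRate must be a positive integer');
  }
  return { text: message.text, sampleRate: message.sampleRate };
}
```

Rejecting bad requests this early keeps malformed input away from the TTS model and makes the failure easy to spot in server logs.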
### Expected Response From Your Server
Your server MUST respond with a stream of raw audio data with these specifications:
Headers:
```
Content-Type: audio/raw
Transfer-Encoding: chunked
```
Body Format:
- Raw PCM audio data (no WAV headers or container formats)
- Single-channel (mono)
- 16-bit signed integer PCM encoding
- Little-endian byte order
- Sample rate matching the sampleRate value in the request (typically 24000 Hz)
Important: The response is NOT JSON - it must be raw binary audio data streamed directly in the response body.
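Many TTS models produce floating-point samples, so a conversion step is often needed to meet the body format above. The following is a minimal sketch, assuming samples in the range [-1, 1] and a Node.js response object; `floatToPcm16le`, `streamPcm`, and the 4800-byte chunk size are illustrative choices, not part of the Vapi specification.

```javascript
// Convert floating-point samples in [-1, 1] to 16-bit signed little-endian PCM.
function floatToPcm16le(samples) {
  const pcm = Buffer.alloc(samples.length * 2); // 2 bytes per mono sample
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    const scaled = clamped < 0 ? clamped * 32768 : clamped * 32767;
    pcm.writeInt16LE(Math.round(scaled), i * 2);
  }
  return pcm;
}

// Write PCM to an HTTP response in fixed-size pieces. Node adds
// Transfer-Encoding: chunked itself when no Content-Length is set.
function streamPcm(res, pcm, chunkBytes = 4800) {
  res.writeHead(200, { 'Content-Type': 'audio/raw' });
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    res.write(pcm.subarray(offset, offset + chunkBytes));
  }
  res.end();
}
```

Keeping `chunkBytes` even means no 16-bit sample is split across two writes.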
### Error Handling
If your server encounters an error, return a standard HTTP error response:
```
HTTP/1.1 500 Internal Server Error
Content-Type: application/json

{
  "error": "Error description"
}
```
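A small helper can keep error responses in this shape consistent across the server. In the sketch below, `sendJsonError` and `statusForError` are hypothetical names, and returning 400 for unparseable JSON is an assumption on top of this guide, which only shows the 500 case.

```javascript
// Send a JSON error body with the given HTTP status code.
function sendJsonError(res, statusCode, description) {
  const body = JSON.stringify({ error: description });
  res.writeHead(statusCode, {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body),
  });
  res.end(body);
}

// Assumption: JSON parse failures are the caller's fault (400);
// anything else is treated as a server-side failure (500).
function statusForError(err) {
  return err instanceof SyntaxError ? 400 : 500;
}
```

For example, a `catch` block in the request handler could call `sendJsonError(res, statusForError(err), err.message)`. This only works before any audio bytes have been written, because the status line cannot change once streaming has started.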
## Testing Your Integration
1. After configuration, make a test call using your assistant
2. Monitor your server logs to ensure it's receiving requests correctly
3. Verify the audio is streaming back to Vapi successfully
4. Test error scenarios to ensure graceful handling
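Before placing a live call, it can help to send your endpoint a request yourself and inspect what comes back. The checker below is a rough sketch: `checkPcmResponse` is a hypothetical helper, and its checks simply mirror the response specification in this guide.

```javascript
// Inspect a captured response and report anything that violates the
// expected format: audio/raw content type, whole 16-bit samples, no WAV header.
function checkPcmResponse(contentType, body, sampleRate) {
  const problems = [];
  if (!/^audio\/raw\b/i.test(contentType || '')) {
    problems.push('Content-Type is not audio/raw');
  }
  if (body.length === 0) {
    problems.push('body is empty');
  }
  if (body.length % 2 !== 0) {
    problems.push('odd byte count, so the last 16-bit sample is incomplete');
  }
  if (body.length >= 4 && body.toString('latin1', 0, 4) === 'RIFF') {
    problems.push('body starts with a RIFF/WAV header');
  }
  // 2 bytes per sample, one channel.
  const durationSeconds = body.length / 2 / sampleRate;
  return { problems, durationSeconds };
}

// Example use with Node 18+ fetch (URL and port are placeholders):
// const res = await fetch('http://localhost:3000/', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify({ message: { type: 'voice-request', text: 'Hello', sampleRate: 24000 } }),
// });
// console.log(checkPcmResponse(res.headers.get('content-type'),
//   Buffer.from(await res.arrayBuffer()), 24000));
```

A reported duration far from what the text should take to speak usually points to a sample-rate mismatch.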
## Common Issues and Solutions
1. **Audio Format Issues**: Ensure you're not including WAV headers or other metadata in your audio stream.
2. **Streaming Problems**: Make sure you're returning a proper stream and not waiting to generate the entire audio before responding.
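3. **Latency Concerns**: If callers hear a long pause before speech starts, reduce the time your TTS service takes to return its first audio chunk, and make sure responses arrive within the configured timeoutSeconds.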
4. **Authentication Errors**: Verify that your server is correctly checking the X-Vapi-Secret header if you've configured a secret.
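For the first issue, if your TTS engine can only produce WAV output, the header can be removed before the audio is sent. The sketch below walks the RIFF chunks to find the `data` chunk; `stripWavHeader` is a hypothetical helper that assumes the whole WAV file is in memory and already holds 16-bit mono PCM at the requested sample rate (it does not convert formats).

```javascript
// Return only the PCM payload of a WAV buffer; pass raw PCM through unchanged.
function stripWavHeader(buf) {
  const isWav =
    buf.length >= 12 &&
    buf.toString('latin1', 0, 4) === 'RIFF' &&
    buf.toString('latin1', 8, 12) === 'WAVE';
  if (!isWav) return buf; // no RIFF/WAVE signature, assume raw PCM already

  let offset = 12; // first chunk starts after 'RIFF', size, 'WAVE'
  while (offset + 8 <= buf.length) {
    const id = buf.toString('latin1', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const start = offset + 8;
    if (id === 'data') {
      // Clamp in case the declared size exceeds the bytes actually present.
      return buf.subarray(start, Math.min(start + size, buf.length));
    }
    offset = start + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('WAV buffer has no data chunk');
}
```

Searching for the `data` chunk instead of skipping a fixed 44 bytes matters because some encoders insert extra chunks (such as `LIST`) before the audio.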