What is optimal voice configuration to sound most ...
# support
m
What is optimal voice configuration to sound most human? I wish to get the voice quality you've shown in your Vapi youtube demos but everything I've tried so far sounds horrible - bad speech patterns, bad speech timing, too much computer and not enough human.
v
To achieve a more human-like voice quality in Vapi, you can optimize several configuration settings. Here’s a detailed guide to help you get started:

### 1. Chunk Plan Configuration

- **Enable Chunk Plan**: This helps with proper sentence boundary detection and improves speech flow.
```json
{
  "voice": {
    "chunkPlan": {
      "enabled": true,
      "minCharacters": 30,
      "punctuationBoundaries": [".", "!", "?", ";"]
    }
  }
}
```
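To illustrate what the chunk plan settings above control, here is a rough Python sketch of punctuation-based chunking. It is not Vapi's implementation: the function name `chunk_text` and its exact behavior are assumptions made for illustration. It emits a chunk at a punctuation boundary only once at least `min_characters` have accumulated, so very short fragments get merged into the next sentence.

```python
def chunk_text(text, min_characters=30, boundaries=(".", "!", "?", ";")):
    """Split text into chunks that end at a punctuation boundary.

    Hypothetical illustration only: a chunk is emitted at a boundary
    character once it holds at least `min_characters` characters.
    """
    chunks = []
    current = ""
    for ch in text:
        current += ch
        if ch in boundaries and len(current.strip()) >= min_characters:
            chunks.append(current.strip())
            current = ""
    # Whatever is left over (no qualifying boundary) becomes the final chunk.
    if current.strip():
        chunks.append(current.strip())
    return chunks


if __name__ == "__main__":
    sample = "Hi! Thanks for calling. How can I help you with your order today?"
    for chunk in chunk_text(sample):
        print(repr(chunk))
```

A higher `min_characters` means fewer, longer chunks; a lower value splits at almost every boundary, which is one plausible source of mid-sentence pauses.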
### 2. Voice Model and Provider Settings

- **Adjust Stability and Similarity**: Fine-tune these settings to enhance voice quality.
```json
{
  "stability": 0.5,
  "similarityBoost": 0.75
}
```
### 3. Speech Rate and Volume Adjustments

- **Control Speech Rate**: Look for parameters like `rate`, `speed`, or `tempo` to adjust the speed of speech.
- **Volume Adjustment**: Lowering the volume can help mask artificial sounds and make the voice feel more natural.

### 4. Adjusting Speech Patterns and Timing

- **Interrupt Settings**: Set `numWordsToInterruptAssistant` to around 3-5 words to allow for a more natural conversation pace.
- **Max Duration**: Set `maxDurationSeconds` to 5-10 seconds to prevent hasty interactions.

### Example Configuration

Here’s a sample JSON configuration that incorporates these settings:
```json
{
  "numWordsToInterruptAssistant": 4,
  "maxDurationSeconds": 8,
  "voice": {
    "provider": "your-provider",
    "settings": {
      "rate": 0.85,
      "volume": 0.8
    }
  }
}
```
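The interrupt setting described above is a word-count threshold. A minimal Python sketch of that idea follows, using a hypothetical helper `should_interrupt` that is not part of any Vapi SDK: the assistant keeps talking until the caller's partial transcript reaches the configured number of words.

```python
def should_interrupt(partial_transcript, num_words_to_interrupt=4):
    """Return True once the caller has spoken enough words to interrupt.

    Hypothetical helper for illustration; words are counted by splitting
    the partial transcript on whitespace.
    """
    spoken_words = partial_transcript.split()
    return len(spoken_words) >= num_words_to_interrupt


if __name__ == "__main__":
    for heard in ["uh huh", "wait can I ask"]:
        print(heard, "->", should_interrupt(heard))
```

With a threshold of 4, short backchannels like "uh huh" would not cut the assistant off, while a real question would.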
### Additional Features

- **Background Noise Filtering**: This can improve clarity and reduce distractions.
- **Emotion Detection**: Helps align the tone of the voice with the emotional context of the conversation.
- **Filler Word Injection**: Adding natural fillers like "umm" or "ahh" can make speech sound more human-like.
By carefully adjusting these settings and testing them, you should be able to achieve a voice quality closer to what you see in our Vapi demos. Let me know if you need further assistance! 😊

Sources:

- [Recommended Voice Configuration Settings in Vapi](https://discord.com/channels/1211482211119796234/1211483291191083018/1343543475383046185)
- [Adjusting Speech Patterns and Timing in Vapi](https://discord.com/channels/1211482211119796234/1211483291191083018/1334027136305991750)
- [Features to Reduce Computer-like Sound in Voice Outputs](https://docs.vapi.ai/how-vapi-works)
Message marked as helpful by @MrFreePress! 🎉
m
This is great @Vapi Support Bot thank you! Of all the voice model and provider choices which ones do you deem highest quality, most responsive, and fit in with your previous advice?
v
I'm glad you found the information helpful! 😊 For the highest quality and most responsive voice models and providers in Vapi, here are my top recommendations based on the previous advice:

### Recommended Voice Models and Providers:

1. **ElevenLabs**: Known for its advanced emotional tones and high-quality voice synthesis, making it a great choice for natural-sounding speech.
2. **PlayHT**: Offers a wide array of expressive voices, which can enhance the overall quality of your voice outputs.
3. **Azure**: With support for approximately 400 voices across 140 languages, it's particularly versatile for multilingual applications.
4. **Rime-ai**: Provides dynamic and customizable voice options, allowing for tailored voice experiences.
5. **OpenAI**: Known for sophisticated models that can power text-to-speech, ensuring high-quality outputs.

### Key Considerations:

- **Quality and Responsiveness**: Each provider has its strengths, so it may be beneficial to experiment with a few to see which one best fits your specific needs.
- **Customization**: Vapi allows you to mix and match providers, so you can optimize for both quality and responsiveness based on your application requirements.

By selecting one of these providers and applying the optimal configurations we discussed earlier, you should be able to achieve the voice quality similar to what you see in the Vapi demos. If you have any more questions or need further assistance, feel free to ask!

Sources:

- [High-Quality and Responsive Voice Models in Vapi](https://docs.vapi.ai/providers/voice/playht)
- [Voice Models Comparison in Vapi](https://docs.vapi.ai/introduction)
- [Choosing a Voice Model or Provider in Vapi](https://docs.vapi.ai/customization/multilingual)
Message marked as helpful by @MrFreePress! 🎉
s
@MrFreePress checking if this is resolved for you?
m
Thank you. It's good information but I set Vapi aside a month ago after hitting a wall and I'll head back to test it out after I finish up a couple priorities this week. I appreciate the detail. I had tried several eleven labs voices earlier, some better than others, but nothing close to Vapi demo quality. Hopefully I can get there by putting all these new pieces together.
s
@MrFreePress let me know once you're back.
h
message 1 voice.property settings should not exist
v
I notice you mentioned "voice.property settings should not exist" - could you please clarify what you mean by this? Are you referring to:

- A specific issue you're encountering with voice property settings in Vapi?
- A belief that certain voice configuration properties shouldn't be used?
- An error message you're receiving related to voice properties?

This will help me better understand your concern and provide more relevant assistance with your Vapi voice configuration.
h
It's an error message for the settings you posted. It also told me maxDurationSeconds must not be less than 10.
My agent is also sounding robotic a lot: sometimes words are garbled, and there are weird pauses after words in the middle of a sentence (hopefully the chunk plan fixes this).
I used 10 for max duration but it won't accept the settings:
```json
{
  "numWordsToInterruptAssistant": 4,
  "maxDurationSeconds": 8,
  "voice": {
    "provider": "your-provider",
    "settings": {
      "rate": 0.85,
      "volume": 0.8
    }
  }
}
```
is this under stop speaking plan?
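A minimal Python sketch of checking a payload against the two errors reported in this thread, before sending it anywhere. The helper names `check_payload` and `apply_reported_fixes` are hypothetical, and the checks mirror only the two quoted messages; the API may reject other fields too, and this does not settle where `numWordsToInterruptAssistant` belongs.

```python
import copy

# Lower bound taken from the error message quoted in this thread.
MIN_MAX_DURATION_SECONDS = 10


def check_payload(payload):
    """Return the reported error messages that this payload would trigger."""
    problems = []
    if "settings" in payload.get("voice", {}):
        problems.append("voice.property settings should not exist")
    max_duration = payload.get("maxDurationSeconds")
    if max_duration is not None and max_duration < MIN_MAX_DURATION_SECONDS:
        problems.append("maxDurationSeconds must not be less than 10")
    return problems


def apply_reported_fixes(payload):
    """Return a copy with voice.settings dropped and maxDurationSeconds raised."""
    fixed = copy.deepcopy(payload)
    fixed.get("voice", {}).pop("settings", None)
    max_duration = fixed.get("maxDurationSeconds")
    if max_duration is not None and max_duration < MIN_MAX_DURATION_SECONDS:
        fixed["maxDurationSeconds"] = MIN_MAX_DURATION_SECONDS
    return fixed


if __name__ == "__main__":
    posted = {
        "numWordsToInterruptAssistant": 4,
        "maxDurationSeconds": 8,
        "voice": {
            "provider": "your-provider",
            "settings": {"rate": 0.85, "volume": 0.8},
        },
    }
    print(check_payload(posted))
    print(check_payload(apply_reported_fixes(posted)))
```

Note that raising `maxDurationSeconds` alone would not be enough here, since the nested `settings` object is rejected independently.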
s
Hello Ham, could you please create your own ticket and we can definitely help you out with improving your setup!