Realtime API, changing the voice has no effect
# support
a
Hi Vapi support, whether you change the voice in the Vapi dashboard and test a web call there, or create an assistant using the API, the `VoiceId` parameter has no effect: it's always the same female voice. For example, if you set it to "echo" (which is supposed to be a male voice), you still get the same female voice. Several other people on Discord have noticed it too. Note that this applies to the realtime OpenAI model: `"model": "gpt-4o-realtime-preview-2024-10-01"`.
Another strange thing: on the Vapi website, the voice dropdown menu offers voices that aren't supported by the OpenAI realtime API, such as "onyx".
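For reference, here's a minimal sketch of the kind of assistant-creation payload involved. The endpoint (`POST https://api.vapi.ai/assistant`) and the `model`/`voice` field names follow Vapi's public API as I understand it; treat the exact shape as an assumption and check the docs before relying on it:

```python
import json

def build_assistant_payload(voice_id: str) -> dict:
    """Payload sketch for POST https://api.vapi.ai/assistant
    (field names assumed from Vapi's docs; verify before use)."""
    return {
        "name": "realtime-voice-test",  # hypothetical assistant name
        "model": {
            "provider": "openai",
            "model": "gpt-4o-realtime-preview-2024-10-01",
        },
        # This voiceId is the parameter the report says has no effect.
        "voice": {
            "provider": "openai",
            "voiceId": voice_id,
        },
    }

payload = build_assistant_payload("echo")
print(json.dumps(payload, indent=2))
```

Sending this with a valid `Authorization: Bearer <API key>` header should create an assistant whose voice is "echo", which is what the bug report says does not happen.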
v
@AdeB Hello! Thank you for bringing this to our attention. It seems like there might be an issue with the `VoiceId` parameter not updating the voice as expected when using `"model": "gpt-4o-realtime-preview-2024-10-01"`. To address this, I would recommend the following steps:
1. Ensure that you are selecting the correct voice provider and syncing your chosen voice properly. Since you mentioned Play.ht, make sure you've synced with Play.ht by following the instructions provided in the Vapi documentation.
2. Double-check the `VoiceId` you are using to ensure it corresponds to the correct voice from the provider you've synced with.
3. If the issue persists after these checks, it could be a bug that needs to be reported to the technical team.
Regarding the voices that are not supported by the OpenAI realtime API: it's possible that the Vapi voice dropdown menu includes voices from various providers, and not all of them are compatible with the OpenAI realtime model. It's important to select a voice that is confirmed to be supported by the model you are using. If you've already tried the above steps and the issue is still unresolved, please let us know, and we can escalate this to the technical team for further investigation. If there's anything else you need assistance with, feel free to ask!
You can continue this conversation by mentioning me in the message.
b
Hi @AdeB, is this issue still occurring for you? I believe a fix was released a few hours ago. As for the voices, you are right that they aren't supported, and we will be adding a warning to our dashboard for that shortly.
a
Thank you bryant, I confirm that it's now working 🙂 As you may already know, new voices have been introduced recently https://community.openai.com/t/new-realtime-api-voices-and-cache-pricing/998238
b
@AdeB Hey Alex, all these should be available too!
a
Thank you @bryant but I just tried the new voice "ash" in the API and it isn't working.
(alloy, shimmer, echo work well)
b
Hey @AdeB, I was able to get it working through the API.
Due to limitations, certain features such as first messages won't work with realtime-only voices (for now)
a
Thank you @bryant , I confirm it works if you don't set a first message. If you set one, it doesn't! Just out of curiosity, why do you still require a STT model with the realtime API? Is it legacy to allow other features to work, like the transcript? (I assume the transcript is from the STT model, not the realtime API)
b
Hey @AdeB, you're right that it's from our STT model. We use the STT to help with things such as interruptions, and at the moment OpenAI's STT with realtime is slower
a
Makes sense, thank you for your fast answers 🙂
b
No worries @AdeB , feel free to play around on https://staging-dashboard.vapi.ai with the new fixes
a
Thank you! (I just tried to create an assistant on the staging dashboard but it wasn't working.)
> We use the STT to help with things such as interruptions
Does that mean you aren't using the OpenAI realtime API's VAD (see screenshot)?
- If you don't, do the Vapi parameters like `wait seconds` have an effect when you choose the realtime model?
- If you use OpenAI's VAD, do you expose the parameters of the screenshot in your API? The default params of the realtime API tend to interrupt the user a lot, so it would be great to be able to adjust those VAD parameters. Thank you!
https://cdn.discordapp.com/attachments/1306984728905256980/1307781807755890769/image.png?ex=673b8e38&is=673a3cb8&hm=1005aef8d065d6b6973b4b1c1dc04c23c5f69511b9243e2a7eb7cbd240cb9348&
More specifically, the realtime OpenAI parameters are:
```json
"turn_detection": {
    "type": "server_vad",
    "threshold": 0.5,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 500
}
```
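For context, these turn-detection parameters are configured on the OpenAI Realtime API via a `session.update` client event sent over the websocket. A minimal sketch of building that event (the event shape follows OpenAI's Realtime API docs; the tweaked values are just an illustration of a less trigger-happy VAD):

```python
import json

def build_session_update(threshold: float = 0.5,
                         prefix_padding_ms: int = 300,
                         silence_duration_ms: int = 500) -> dict:
    """Client event that updates server-side VAD settings mid-session."""
    return {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "prefix_padding_ms": prefix_padding_ms,
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }

# Higher threshold and a longer silence window make the model
# less likely to cut the user off mid-sentence.
event = build_session_update(threshold=0.7, silence_duration_ms=800)
print(json.dumps(event))
```

Exposing these three knobs would be enough for callers to tune how aggressively the realtime model interrupts.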
A great addition (and a way to differentiate from the competition) would be to allow the user to limit the history context of the realtime API. At the moment, the cost per minute increases as the conversation lasts longer, which is a big issue with the realtime API.
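One way such a history cap could work client-side is a sketch like the following. It assumes the Realtime API's `conversation.item.delete` client event (which removes an item from the server-side context); the item-count cap and eviction heuristic are mine, not anything Vapi implements:

```python
from collections import deque

MAX_ITEMS = 20  # arbitrary cap on retained conversation items

item_ids: deque = deque()   # ids of items still in server context
outbox: list = []           # events that would be sent over the websocket

def on_item_created(item_id: str) -> None:
    """Track a new conversation item; evict the oldest past the cap."""
    item_ids.append(item_id)
    while len(item_ids) > MAX_ITEMS:
        oldest = item_ids.popleft()
        # conversation.item.delete removes an item from the session,
        # so it no longer contributes to per-turn input tokens.
        outbox.append({"type": "conversation.item.delete",
                       "item_id": oldest})

# Simulate 25 items arriving: the 5 oldest get evicted.
for i in range(25):
    on_item_created(f"item_{i}")
```

With a rolling window like this, per-minute cost would plateau instead of growing with conversation length (at the cost of the model forgetting the earliest turns).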
b
Hey @AdeB, we don't use OpenAI's VAD. RE: the staging dashboard, it's not as stable as production, as we push changes daily to it.
Through my own testing, I've noticed that it sometimes responds to itself while using OpenAI's VAD, although providing the option for users to use their VAD is possibly something we could explore in the future.
a
makes sense, thank you!!
I will continue to report issues if I find any.
Hi @bryant! Any chance you have a date for adding first-message support to the new voices? The new voices are a lot more natural and truly show the power of the realtime API. Thank you!
b
@AdeB Unfortunately it's dependent on OpenAI's release dates, but rest assured we will support them as soon as we can