Support of assistant-speaks-first-with-model-gener...
# support
i
I tried OpenAI Realtime in Vapi today and noticed that static greetings work surprisingly well ("firstMessageMode": "assistant-speaks-first"), but AI-generated ones don’t ("firstMessageMode": "assistant-speaks-first-with-model-generated-message"). Am I correct in assuming that you are aware of this issue and have an implementation plan to address it? Thank you.
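For readers comparing the two modes, here is a minimal sketch of how the two configurations differ. Only the `firstMessageMode` values come from the message above; the `model` block, the model name, and the placeholder greeting text are assumptions added for illustration, not a confirmed Vapi payload.

```python
import json

# The two mode strings quoted in the question above.
STATIC_GREETING = "assistant-speaks-first"
GENERATED_GREETING = "assistant-speaks-first-with-model-generated-message"


def assistant_config(first_message_mode: str) -> dict:
    """Build an illustrative assistant config for the given greeting mode."""
    config = {
        "firstMessageMode": first_message_mode,
        # Assumed model block; the thread only says "OpenAI Realtime".
        "model": {
            "provider": "openai",
            "model": "gpt-4o-realtime-preview-2024-12-17",
        },
    }
    if first_message_mode == STATIC_GREETING:
        # A static greeting needs fixed text; this sentence is a placeholder.
        config["firstMessage"] = "Hi, thanks for calling. How can I help?"
    return config


if __name__ == "__main__":
    for mode in (STATIC_GREETING, GENERATED_GREETING):
        print(json.dumps(assistant_config(mode), indent=2))
```

In this sketch the generated-greeting variant carries no fixed `firstMessage`, since the model is expected to produce the opening line itself.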
v
@Ivan I'm glad to hear that the static greetings are working well for you! Regarding the AI-generated greetings, I appreciate you bringing this to our attention. While I don't have specific details about an implementation plan to address this issue, I can assure you that feedback like yours is crucial for continuous improvement. If you have any more insights or specific feedback on how the AI-generated greetings could be improved, please feel free to share, and I'll make sure it gets passed along to the development team. If you need immediate assistance or further information on this matter, please ask support for more info on this.
b
Hi @Ivan, thanks for bringing this to our attention. The realtime model is still in beta and is a very different paradigm from our existing platform, so we are incorporating it gradually, and not all features may work perfectly at the moment. I will look into this.
i
> The realtime model is ... a very different paradigm
Hi @bryant! Yes, I know it’s not easy. I just wanted to bring your attention to the feature we use, which is something of a blocker for switching to the Realtime API (beta). And if you’d like a bit more feedback, here are a couple of suggestions you probably already know about:
1. Handling tool responses with "endCallAfterSpokenEnabled": true and a text content would be great. We currently use this with the conventional STT-LLM-TTS pipeline.
2. Adding cost optimization mechanisms for the Realtime API would be highly beneficial. For example, allowing a maximum number of recent lines/phrases in the dialog history to remain as audio tokens, while older lines are automatically converted to text tokens, which are much cheaper. According to OpenAI, it’s possible to use `conversation.item.delete` and then `conversation.item.create` as `input_text` to insert specific items back into the dialog history.
Merry Christmas and Happy New Year!
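The history-trimming suggestion above can be sketched as a pure function that builds the client events to send over the Realtime connection. This is an illustration under stated assumptions, not something Vapi or OpenAI is confirmed to do: the history entry keys (`id`, `role`, `transcript`), the client-chosen `txt_<n>` item ids, and the use of `"root"` as `previous_item_id` for the first position are all assumptions. Only user items are converted, since the message above mentions `input_text` specifically; assistant items are left untouched here.

```python
from typing import Dict, List


def build_text_downgrade_events(
    history: List[Dict[str, str]], keep_recent: int
) -> List[dict]:
    """Return client events that replace older user items with text items.

    `history` is oldest-first; each entry has the hypothetical keys
    "id", "role" and "transcript". The last `keep_recent` entries are kept.
    """
    events: List[dict] = []
    cutoff = max(len(history) - keep_recent, 0)
    # Assumption: "root" as previous_item_id means "insert at the start".
    previous_id = "root"
    for index, entry in enumerate(history):
        if index < cutoff and entry["role"] == "user":
            new_id = f"txt_{index}"  # hypothetical client-chosen item id
            # Delete the audio-backed item first ...
            events.append(
                {"type": "conversation.item.delete", "item_id": entry["id"]}
            )
            # ... then re-insert its transcript at the same position.
            events.append(
                {
                    "type": "conversation.item.create",
                    "previous_item_id": previous_id,
                    "item": {
                        "id": new_id,
                        "type": "message",
                        "role": "user",
                        "content": [
                            {"type": "input_text", "text": entry["transcript"]}
                        ],
                    },
                }
            )
            previous_id = new_id
        else:
            # Kept as-is: recent items, and assistant items (not covered here).
            previous_id = entry["id"]
    return events


if __name__ == "__main__":
    sample = [
        {"id": "a", "role": "user", "transcript": "hello"},
        {"id": "b", "role": "assistant", "transcript": "hi there"},
        {"id": "c", "role": "user", "transcript": "what does it cost?"},
    ]
    for event in build_text_downgrade_events(sample, keep_recent=1):
        print(event["type"])
```

Tracking `previous_id` keeps each re-created item in its original position, so the order of the dialog is preserved even when several older items are replaced.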
b
1. I believe this should already work? Please correct me if I'm wrong on this.
2. Definitely possible, though we wouldn't want to do anything that would impact the quality of the conversation, especially if transcription is less accurate than whatever OpenAI uses to represent their version of the conversation history.
i
1. I’ve checked today, and it works, but only sometimes. In the call 159c889a-a48d-4a58-81fa-938a4a4cc5f6, it didn’t work. The bot called the tool but then got stuck. The tool’s response was:
{
  "results": [
    {
      "tool_call_id": "call_ZueutcNilVwPrPVQ",
      "name": "sayAndEndCall",
      "result": "Success.",
      "message": {
        "type": "request-complete",
        "role": "assistant",
        "end_call_after_spoken_enabled": true,
        "content": "Thank you for your time, and have a great day!"
      }
    }
  ]
}
However, in the next call (23117e67-98ec-480e-b0a7-ac21e88bec3c), a similar scenario worked well. Do you want me to create a separate support topic?
2. Any text transcript will certainly be less accurate than an audio token representation. However, in the trade-off between the cost of a 5-minute real-time API dialog and losing audio nuances from previous minutes of the dialog, I would choose cost reduction. Of course, this should be an optional optimization; that's just my suggestion.
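For anyone reproducing point 1, the tool response quoted above could be built on the tool server with a small helper like the following. It is a minimal sketch that only mirrors the shape of the quoted payload; the function name and parameters are hypothetical. The key spelling follows the payload as posted (`end_call_after_spoken_enabled`), while the earlier message writes `endCallAfterSpokenEnabled`; the thread does not establish which spelling the platform expects.

```python
def end_call_tool_result(tool_call_id: str, name: str, farewell: str) -> dict:
    """Build a tool response with the same shape as the one quoted above."""
    return {
        "results": [
            {
                "tool_call_id": tool_call_id,
                "name": name,
                "result": "Success.",
                "message": {
                    "type": "request-complete",
                    "role": "assistant",
                    # Spelled as in the quoted payload (snake_case).
                    "end_call_after_spoken_enabled": True,
                    "content": farewell,
                },
            }
        ]
    }


if __name__ == "__main__":
    response = end_call_tool_result(
        "call_ZueutcNilVwPrPVQ",
        "sayAndEndCall",
        "Thank you for your time, and have a great day!",
    )
    print(response["results"][0]["message"]["content"])
```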
b
That shouldn't be necessary, let me investigate it
b
@bryant I just stumbled upon this thread after searching Discord. Is there an overview somewhere of which Vapi and Realtime features are supported or not supported, and how they get mapped? We find ourselves tweaking many settings on Vapi that seemingly don't have any effect on how calls happen. We use 4o-realtime. I would just like to know which features we can expect to work and which ones we can't.
b
With the 2024-10-01 variant, first messages and idle messages are compatible. However, this was made possible with the 2024-12-17 variants. As of right now, only KB is not compatible with it.
b
@bryant thanks for the clarification here. What we're trying to fix is that our voice bot is often interrupting users. Could you tell me which Vapi settings map to these OpenAI realtime settings, if any? https://cdn.discordapp.com/attachments/1320824982569226332/1325911711429165066/image.png?ex=677d8302&is=677c3182&hm=dde0a50f4d94cbf35f2870e62b3242f53f7096209e3bd470a6d632237be795ec&