Support of assistant-speaks-first-with-model-generated-message with OpenAI Realtime Vapi AI #support

Ivan

12/23/2024, 6:46 PM

I tried OpenAI Realtime in Vapi today and noticed that static greetings work surprisingly well ("firstMessageMode": "assistant-speaks-first"), but AI-generated ones don’t ("firstMessageMode": "assistant-speaks-first-with-model-generated-message"). Am I correct in assuming that you are aware of this issue and have an implementation plan to address it? Thank you.
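For reference, the two greeting modes compared above differ only in the "firstMessageMode" value. The sketch below builds both variants as plain dictionaries. The "firstMessageMode" values come from the message above; the model block, the model name, the "firstMessage" field, and the helper name `build_assistant_config` are assumptions for illustration, not a confirmed Vapi payload.

```python
def build_assistant_config(generated_greeting: bool) -> dict:
    """Return an illustrative assistant config for one of the two greeting modes."""
    config = {
        # Assumed model block; the thread only says "OpenAI Realtime".
        "model": {
            "provider": "openai",
            "model": "gpt-4o-realtime-preview-2024-12-17",
        },
    }
    if generated_greeting:
        # Mode reported as not working with Realtime in this thread.
        config["firstMessageMode"] = (
            "assistant-speaks-first-with-model-generated-message"
        )
    else:
        # Mode reported as working: a static greeting is spoken first.
        config["firstMessageMode"] = "assistant-speaks-first"
        config["firstMessage"] = "Hello! How can I help you today?"  # assumed field
    return config


static_cfg = build_assistant_config(generated_greeting=False)
generated_cfg = build_assistant_config(generated_greeting=True)
```

Keeping both variants behind one helper makes it easy to fall back to the static greeting while the generated mode is unreliable.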

Vapi

12/23/2024, 6:49 PM

@Ivan I'm glad to hear that the static greetings are working well for you! Regarding the AI-generated greetings, I appreciate you bringing this to our attention. While I don't have specific details about an implementation plan to address this issue, I can assure you that feedback like yours is crucial for continuous improvement. If you have any more insights or specific feedback on how the AI-generated greetings could be improved, please feel free to share, and I'll make sure it gets passed along to the development team. If you need immediate assistance or further information on this matter, please ask support for more info on this.

Vapi

12/23/2024, 6:50 PM

Got it. Tagging @User for additional help.

bryant

12/24/2024, 3:32 AM

Hi @Ivan, thanks for bringing this to our attention. The realtime model is still in beta and a very different paradigm from our existing platform, so we are gradually incorporating it, and not all features may work perfectly at the moment. I will look into this.

Ivan

12/24/2024, 1:48 PM

> The realtime model is ... a very different paradigm

Hi @bryant! Yes, I know it’s not easy. I just wanted to bring your attention to the feature we use, which is somewhat of a blocker for switching to the Realtime API (beta). And if you’d like a bit more feedback, here are a couple of suggestions you probably already know about:

1. Handling tool responses with "endCallAfterSpokenEnabled": true and text content would be great. We currently use this with the conventional STT-LLM-TTS pipeline.

2. Adding cost optimization mechanisms for the Realtime API would be highly beneficial. For example, allowing a maximum number of recent lines/phrases in the dialog history to remain as audio tokens, while older lines are automatically converted to text tokens, which are much cheaper. According to OpenAI, it’s possible to use conversation.item.delete and then conversation.item.create with input_text content to insert specific items back into the dialog history.

Merry Christmas and Happy New Year!
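The delete-then-recreate idea in suggestion 2 can be sketched as a pure function that plans the Realtime client events for older turns. This is only a sketch under assumptions: the history record shape ("id", "role", "transcript", "is_audio"), the helper name, and the "_txt" id suffix are hypothetical, and the use of "text" content for assistant turns and "root" as previous_item_id reflects my reading of the OpenAI Realtime docs rather than anything confirmed in this thread.

```python
from typing import Any


def plan_audio_to_text_events(
    history: list[dict[str, Any]], keep_recent: int
) -> list[dict[str, Any]]:
    """Plan events that replace older audio turns with text transcripts.

    history is ordered oldest first; the newest keep_recent items are untouched.
    """
    events: list[dict[str, Any]] = []
    cutoff = max(len(history) - keep_recent, 0)
    previous_id = "root"  # assumed marker for "insert at the start"
    for index, item in enumerate(history):
        if index < cutoff and item["is_audio"]:
            new_id = f"{item['id']}_txt"  # hypothetical id scheme
            # Assumption: user turns use "input_text", assistant turns use "text".
            content_type = "input_text" if item["role"] == "user" else "text"
            events.append(
                {"type": "conversation.item.delete", "item_id": item["id"]}
            )
            events.append(
                {
                    "type": "conversation.item.create",
                    # Re-insert at the position the audio item occupied.
                    "previous_item_id": previous_id,
                    "item": {
                        "id": new_id,
                        "type": "message",
                        "role": item["role"],
                        "content": [
                            {"type": content_type, "text": item["transcript"]}
                        ],
                    },
                }
            )
            previous_id = new_id
        else:
            previous_id = item["id"]
    return events


history = [
    {"id": "a", "role": "user", "transcript": "hi", "is_audio": True},
    {"id": "b", "role": "assistant", "transcript": "hello", "is_audio": True},
    {"id": "c", "role": "user", "transcript": "bye", "is_audio": True},
]
events = plan_audio_to_text_events(history, keep_recent=1)
```

Each planned event would then be serialized to JSON and sent over the Realtime connection in order. The quality of the re-inserted text depends entirely on the transcript, which is the trade-off discussed in the replies.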

bryant

12/26/2024, 5:26 AM

1. I believe this should already work? Please correct me if I'm wrong on this.

2. Definitely possible, though we wouldn't want to do anything that would impact the quality of the conversation, especially if transcription is less accurate than whatever OpenAI uses to represent their version of the conversation history.

Ivan

12/26/2024, 11:06 AM

1. I’ve checked today, and it works, but only sometimes. In the call 159c889a-a48d-4a58-81fa-938a4a4cc5f6, it didn’t work. The bot called the tool but then got stuck. The tool’s response was:

{
  "results": [
    {
      "tool_call_id": "call_ZueutcNilVwPrPVQ",
      "name": "sayAndEndCall",
      "result": "Success.",
      "message": {
        "type": "request-complete",
        "role": "assistant",
        "end_call_after_spoken_enabled": true,
        "content": "Thank you for your time, and have a great day!"
      }
    }
  ]
}

However, in the next call (23117e67-98ec-480e-b0a7-ac21e88bec3c), a similar scenario worked well. Do you want me to create a separate support topic?

2. Any text transcript will certainly be less accurate than an audio token representation. However, in the trade-off between the cost of a 5-minute real-time API dialog and losing audio nuances from previous minutes of the dialog, I would choose cost reduction. Of course, this should be an optional optimization. Just my suggestion.
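For anyone reproducing point 1, the tool response quoted above could be produced by a small server-side helper like the sketch below. The field names and values are copied from the posted payload; the helper name `build_end_call_tool_response` is hypothetical. Note that the payload uses "end_call_after_spoken_enabled" while the earlier message writes "endCallAfterSpokenEnabled"; which spelling the platform expects is not settled in this thread.

```python
import json


def build_end_call_tool_response(tool_call_id: str, farewell: str) -> dict:
    """Build a tool result that asks the assistant to speak, then end the call."""
    return {
        "results": [
            {
                "tool_call_id": tool_call_id,
                "name": "sayAndEndCall",
                "result": "Success.",
                "message": {
                    "type": "request-complete",
                    "role": "assistant",
                    # Key spelling as posted in the thread (snake_case).
                    "end_call_after_spoken_enabled": True,
                    "content": farewell,
                },
            }
        ]
    }


response = build_end_call_tool_response(
    "call_ZueutcNilVwPrPVQ",
    "Thank you for your time, and have a great day!",
)
body = json.dumps(response)  # what the tool server would return as JSON
```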

bryant

12/27/2024, 2:44 AM

That shouldn't be necessary, let me investigate it

boris_be

01/03/2025, 5:50 PM

@bryant just stumbled upon this thread after searching Discord. Is there an overview somewhere of which Vapi and Realtime features are supported or not supported, and how they get mapped? We find ourselves tweaking many settings on Vapi that seemingly don't have any effect on how calls happen. We use 4o-realtime. I would just like to know which features we can expect to work and which ones not.

bryant

01/03/2025, 7:32 PM

With the 2024-10-01 variant, first messages and idle messages are compatible. However, this was made possible with the 2024-12-17 variants. As of right now, only KB is not compatible with it.

boris_be

01/06/2025, 7:39 PM

@bryant thanks for the clarification here. What we're trying to fix is that our voice bot is often interrupting users. Could you tell me which Vapi settings map to these OpenAI realtime settings, if any? https://cdn.discordapp.com/attachments/1320824982569226332/1325911711429165066/image.png?ex=677d8302&is=677c3182&hm=dde0a50f4d94cbf35f2870e62b3242f53f7096209e3bd470a6d632237be795ec&

boris_be

01/06/2025, 7:40 PM

(this is from the openai realtime api docs: https://platform.openai.com/docs/api-reference/realtime-sessions/create)
