First reply latency
# support
t
Currently I have the following setup: Deepgram, GPT-4o-mini, 11Labs, and a phone number bought from Vapi. The dashboard says it takes 900 ms in web mode. The problem is that when the agent passes the IVR system (call tree) and a human picks up and answers with something like "Hi, how can I help you?", there is a latency lag of about 1-2 seconds, and then for 5-7 seconds the human and the agent talk over each other, trying to understand one another. For example: adc54466-104c-48bf-b18d-fe12e8b0b6c2. How can I solve this? I'm wondering about some phrase the agent could say as soon as the human picks up, and then sending both greetings (the agent's and the human's) to the LLM to generate the next message.
v
@Timur It sounds like you're experiencing latency and overlap issues during the transition from an IVR system to a live conversation. To address this, you could implement a brief pause after the IVR system connects to a human operator, allowing the agent to wait for the human to finish their greeting before responding.

Here's a phrase that the agent could use: "Hello, thank you for taking my call. Please give me a moment to ensure our connection is clear." This phrase acknowledges the human operator and buys a few seconds to prevent the overlap issue. After the human responds, you can then send the greetings from both the agent and the human to the LLM to generate the next message.

If you're still experiencing issues after trying this, it might be necessary to adjust the timing settings in your system or consult the support team of the respective APIs you're using to optimize the interaction flow. Please ask support for more info on this if the problem persists.
You can continue this conversation by mentioning me in the message.
t
No, the problem is that when the human starts speaking, the agent needs some time to generate the next phrase, and the human starts repeating "Hello? Hello?". I want the agent to reply to the human immediately after they speak.
v
@Timur I understand the issue now. To minimize the delay and prevent the human from repeating their greeting, you could implement an immediate acknowledgment from the agent that confirms the connection and informs the human that a response is being prepared. This can be a simple, pre-recorded message that plays automatically when the human picks up, such as: "Hello, I can hear you. One moment please while I process your response."

This message serves two purposes: it assures the human that they have been heard, and it buys the agent a few seconds to generate the next phrase. This should help reduce the confusion and prevent the human from repeating their greeting because they think there is no one on the line.

If you have the capability to detect when the human has finished speaking and trigger the agent's response immediately, you could also consider implementing voice activity detection to optimize the timing of the agent's responses. This would require adjustments to your system's configuration to ensure that the agent is able to process and respond to speech as quickly as possible.
t
Nah, I want some ready-to-say message that will be played once the human says their greeting
v
@Timur Certainly! A ready-to-say message that can be played immediately after the human's greeting could be: "Hi there! Just a moment while I get everything set up for our conversation." This quick response acknowledges the human's greeting and buys a little time for the agent to prepare the next part of the conversation. It's friendly and informs the caller that the conversation will continue shortly, which should prevent them from repeating their greeting. Would you like to implement this message into your system now? If so, please provide any necessary parameters or additional context needed to proceed.
t
I know about assistant.firstMessageMode, but as far as I know, the agent will try to say this message as soon as anyone picks up the phone, including the IVR system
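For context, firstMessageMode is set at the top level of the assistant config alongside firstMessage. A minimal sketch of the mode being discussed (the exact value names should be checked against the current Vapi assistant API reference, and the greeting text is a placeholder):

```json
{
  "firstMessage": "Hi there! How can I help?",
  "firstMessageMode": "assistant-speaks-first"
}
```

As noted above, this mode triggers whenever the far end answers, which is why it fires on the IVR pickup as well as on a human pickup.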
@User
s
Initially the model calls the DTMF tool; afterwards the model is prompted for voice tokens.
log:
```
šŸ”µ 07:10:05:752 [user LOG] Model called tool: dial-keypad-tool({"key":"one"})
```
Now, regarding connecting with the agent: you can configure startSpeakingPlan to suit your needs, and stopSpeakingPlan if required.
```json
"startSpeakingPlan": {
    "waitSeconds": 0.2,
    "smartEndpointingEnabled": true
},
"stopSpeakingPlan": {
    "numWords": 0,
    "voiceSeconds": 0.2,
    "backoffSeconds": 2
}
```
t
The problem is that the agent can only handle the IVR system with this config:
```js
startSpeakingPlan: {
    waitSeconds: 0.6,
    transcriptionEndpointingPlan: {
      onPunctuationSeconds: 0.4,
      onNoPunctuationSeconds: 1.3,
      onNumberSeconds: 0.9,
    },
  },
```
If we decrease waitSeconds, the system chunks the IVR menu ("to contact this person, press 1, ...") into smaller chunks; if we increase it, the menu is chunked into bigger chunks. The DTMF tool then interacts with only one chunk and could respond with the wrong menu numbers. But on the other hand, if we increase waitSeconds, there is a lag in the conversation with a real human 😦
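The tradeoff above amounts to two incompatible startSpeakingPlan profiles: one tolerant enough to hear a full IVR menu chunk, and one fast enough for a human conversation. A small sketch with the values from this thread (the per-stage selection is purely illustrative; the stock setup applies a single static plan for the whole call, which is exactly the problem):

```python
# The two startSpeakingPlan profiles discussed in this thread.
# Values are taken from the messages above; switching plans by call
# stage is hypothetical here -- Vapi applies one plan per call in
# this setup, hence the conflict.

IVR_PLAN = {
    "waitSeconds": 0.6,  # long enough that a full IVR menu chunk is transcribed
    "transcriptionEndpointingPlan": {
        "onPunctuationSeconds": 0.4,
        "onNoPunctuationSeconds": 1.3,
        "onNumberSeconds": 0.9,
    },
}

HUMAN_PLAN = {
    "waitSeconds": 0.2,  # respond quickly so the caller doesn't repeat "Hello?"
    "smartEndpointingEnabled": True,
}

def plan_for(stage: str) -> dict:
    """Pick a plan profile for a (hypothetical) call stage."""
    return IVR_PLAN if stage == "ivr" else HUMAN_PLAN

print(plan_for("ivr")["waitSeconds"])    # 0.6
print(plan_for("human")["waitSeconds"])  # 0.2
```

Since one static plan cannot serve both stages, the suggestion that follows is to take the IVR stage out of Vapi's hands entirely.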
s
hmm, then do one thing: handle the IVR yourself using a Twilio Studio flow (or Flex), and then transfer the call to Vapi.
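A minimal TwiML sketch of that idea, assuming the IVR needs a single "1" keypress and that +15550100000 stands in for your Vapi number (the pause length, digits, and number are all placeholders; a Studio flow would express the same steps graphically):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- Wait out the IVR greeting -->
  <Pause length="5"/>
  <!-- Each "w" waits 0.5s, then send the DTMF "1" -->
  <Play digits="wwww1"/>
  <!-- Hand the now-live call to the Vapi agent -->
  <Dial>+15550100000</Dial>
</Response>
```

This keeps Vapi's startSpeakingPlan tuned purely for human conversation (low waitSeconds), since the agent never has to listen to the IVR menu at all.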
t
Thank you. Do you have a link to the related Twilio docs?
s
you can find it on twilio docs
just google it