Unexpected latency on assistant
# support
j
The AI agent can take up to 7-8 seconds to respond. It's unclear whether this is an infrastructure problem or an endpointing problem. Twilio's metrics suggest it's not an infrastructure problem, since packet loss is 0, jitter is small, and latency to the PBX is < 10ms. Call ID: `a9aa60f6-a2b6-4e27-8554-2a0fb9393c67`
example log:
21:52:13:764 [LOG] Transcriber output: plan. Are you calling today for the out of network benefit?
21:52:13:784 [LOG] Endpointing timeout 0ms (rule: `smart`)
21:52:13:785 [CHECKPOINT] Model request started
21:52:13:795 [LOG] Model request started (attempt #1, gpt-4o-2024-05-13, openai)
21:52:14:231 [CHECKPOINT] User speech stopped
21:52:14:312 [CHECKPOINT] Model sent start token
21:52:14:314 [LOG] Model output: Yes
21:52:14:314 [CHECKPOINT] Model sent first output token
21:52:14:325 [LOG] Model output: .
21:52:14:326 [LOG] Model request cost (attempt #1, $0.020025, 3996 prompt, 3 completion)
21:52:14:327 [CHECKPOINT] Model sent end token
21:52:14:327 [LOG] Voice input: Yes.
21:52:14:327 [CHECKPOINT] Voice audio requested
21:52:14:653 [CHECKPOINT] Voice audio received
21:52:14:657 [CHECKPOINT] Assistant speech started
21:52:14:657 [INFO] Turn latency: 1064ms (transcriber: 172ms, endpointing: 21ms, model: 529ms, voice: 339ms)
21:52:15:360 [CHECKPOINT] Assistant speech stopped
21:52:31:471 [CHECKPOINT] User speech possibly starting
21:52:31:711 [CHECKPOINT] User speech possibly starting
When listening to the recordings (we recorded on Twilio), the time between "Are you calling today for the out of network benefit?" and the model saying "Yes" is ~15 seconds. We're looking for potential root causes for this latency, and also for why the log timestamps diverge from the actual speech.
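As a quick aid for lining the log up against the recording, here is a minimal sketch (not part of the original thread) that diffs consecutive log timestamps so the largest gaps stand out; it assumes the `HH:MM:SS:mmm` prefix format shown in the excerpt above.

```python
# Minimal sketch: diff consecutive log events so the largest gaps stand out
# when comparing the log timeline against the Twilio recording.
# Assumes the "HH:MM:SS:mmm [LEVEL] message" format shown in the excerpt above.
import re
from datetime import timedelta

LINE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}):(\d{3}) \[(\w+)\] (.+)$")

def parse(log_text):
    """Yield (timestamp, level, message) for each well-formed log line."""
    for raw in log_text.strip().splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        h, mi, s, ms, level, msg = m.groups()
        ts = timedelta(hours=int(h), minutes=int(mi),
                       seconds=int(s), milliseconds=int(ms))
        yield ts, level, msg

def print_gaps(log_text):
    """Print the delta from the previous event next to each message."""
    prev = None
    for ts, level, msg in parse(log_text):
        delta = "" if prev is None else f"+{int((ts - prev).total_seconds() * 1000)} ms"
        print(f"{delta:>10}  [{level}] {msg}")
        prev = ts
```

Run against the excerpt above, the outlier is the ~16-second gap between "Assistant speech stopped" (21:52:15:360) and the next "User speech possibly starting" (21:52:31:471), which is worth checking against what the recording shows actually happening at that point in the call.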
v
@jxxxxxxx It appears that the delay you're experiencing may not be related to network infrastructure, given the metrics you've provided. The issue could be related to the endpointing settings or the processing time within the AI model itself.

The endpointing timeout is set to 0ms, which means the system is relying on the `smart` rule to determine when the user has finished speaking. If the `smart` rule isn't accurately detecting the end of the user's speech, this could introduce a delay. Additionally, the model request time seems to be a significant portion of the latency (529ms), which could be due to the complexity of the input, server load, or other factors within the AI processing pipeline.

To further investigate the discrepancy between the logs and the actual speech, we would need to examine the endpointing configuration and the AI model's processing in more detail. If you have access to the endpointing settings and the AI model's configuration, reviewing these might provide more insights into the cause of the latency. If you need assistance with checking the endpointing settings or have other questions regarding the AI model's processing time, please let me know, and I can help guide you through the necessary steps or provide additional support.
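For reference, here is a rough sketch of the kind of endpointing guardrails described above; the field names are hypothetical placeholders for illustration, not confirmed configuration keys of any particular platform.

```python
# Hypothetical endpointing guardrails (illustrative only; the key names below
# are placeholders, not confirmed settings of any specific platform).
endpointing = {
    "rule": "smart",        # keep semantic end-of-utterance detection
    "min_silence_ms": 300,  # never end the turn before 300 ms of silence
    "max_wait_ms": 1500,    # hard cap so a missed detection can't stall the turn
}

def should_end_turn(silence_ms: int, smart_rule_fired: bool) -> bool:
    """End the user's turn when the smart rule fires, bounded on both sides:
    wait at least min_silence_ms, but never longer than max_wait_ms."""
    if silence_ms < endpointing["min_silence_ms"]:
        return False
    if silence_ms >= endpointing["max_wait_ms"]:
        return True
    return smart_rule_fired
```

The idea is that even if the `smart` rule misses the end of speech, the hard upper bound keeps the endpointing contribution to turn latency predictable.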
You can continue this conversation by mentioning me in the message.
j
@kev3522 @Shubham Bajaj Hi Shubham, would you be able to take a look at the logs on your side for `a9aa60f6-a2b6-4e27-8554-2a0fb9393c67` around `21:52:14:657` and help us figure out exactly where the latency is coming from? This is urgent, as our assistant can't properly handle these phone calls. Thank you!
k
We can close this - Nikhil helped us during office hours and pointed out that it's a Twilio issue. We're going to try other telephony providers.
s
@jxxxxxxx Can you share the public recording URL?