I do have the same complaint. For example, on this call: 9c662e0f-fbc6-4033-a683-8edf91255656
It takes 4 seconds for the AI to respond to the caller. That seems to be because it pulls the KB right after the caller speaks; the LLM then takes 3 seconds to do I don't know what, since the KB is supposed to provide the answer if there is one. After that, the AI asks where the caller is from, just as instructed in the prompt, and then it takes another 10 seconds to look things up.
This back and forth makes no sense. Why pull the KB as soon as the user speaks instead of just following its prompt, which is to gather the caller's location and THEN look up their info? I can see the KB logs in Trieve showing data being fetched for every word that's spoken, so it seems like a setting issue, maybe?
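For what it's worth, here's the kind of flow I'd expect: retrieval gated behind the location step instead of firing on every utterance. This is just a sketch in plain Python; `kb_search`, `handle_turn`, and the `state` dict are made-up names to illustrate the idea, not the platform's actual API or settings:

```python
# Hypothetical sketch: gate KB retrieval behind an explicit step that only
# runs once the caller's location is known, instead of querying the KB on
# every spoken word. All names here are illustrative, not a real API.

def kb_search(query: str, location: str) -> list[str]:
    """Stub standing in for a single, targeted KB query."""
    return [f"doc matching '{query}' scoped to {location}"]

def handle_turn(transcript: str, state: dict) -> str:
    if "location" not in state:
        # Turn 1: no retrieval at all; follow the prompt and
        # gather the caller's location first.
        state["pending_question"] = transcript
        return "Sure, where are you calling from?"

    # Later turns: exactly one KB pull, now that the query can be scoped.
    docs = kb_search(state.get("pending_question", transcript), state["location"])
    return f"Here's what I found: {docs[0]}"

state: dict = {}
print(handle_turn("What are your store hours?", state))  # asks for location, no KB hit
state["location"] = "Los Angeles"
print(handle_turn("Los Angeles", state))                 # single scoped KB query
```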
Beyond that, after the caller says "Los Angeles" the KB gets pulled (fine), but then you see this:
```
17:23:10:287  [LOG]         Model request started (attempt #1, claude-3-7-sonnet-20250219, anthropic)
17:23:16:962  [CHECKPOINT]  Model sent start token
```
Almost 7 seconds for the model to send its start token? Why?
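For reference, the gap between those two log lines works out to 6.675 s of pure time-to-first-token, before any transcription or KB time is even counted. Quick check from the timestamps (note the log puts a colon before the milliseconds, so it needs normalizing first):

```python
from datetime import datetime

# Time-to-first-token from the two log lines above, with the trailing
# ":mmm" rewritten to ".mmm" so strptime can parse it.
started = datetime.strptime("17:23:10.287", "%H:%M:%S.%f")
first_token = datetime.strptime("17:23:16.962", "%H:%M:%S.%f")
print((first_token - started).total_seconds())  # 6.675
```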
So the question is how do we improve latency?