Vapi agents are more prone to hallucinations
# support
g
I have seen this time and time again: the GPT-4 Turbo model is relatively dumber. It is way easier to break and make it hallucinate
v
Hi @GeneralKugelBlitz, I'm an AI assistant for Vapi. While a team member reviews this question, I'll look through relevant sources and see if I can provide suggestions. Please tag @Vapi Support with any follow-up questions.
-----------------------------
**While I generate an answer, here are some sources that might be helpful**:
* [Set up your Server URL]()
* [Call Functions]()
* [Billing Examples]()
* [Inbound Support Example]()
-----------------------------
### Answer
I understand your concerns about the performance of models, including instances of hallucinations. Vapi integrates with OpenAI's models, offering flexibility in model usage and function calling to enhance performance and reliability in various scenarios [(1)](). For specific issues or optimization advice, it might be beneficial to explore Vapi's documentation or contact support directly for tailored assistance.
g
the model in the ChatGPT UI is smart enough to know there are no availabilities on Saturday, but the gpt-4-1106-preview model just makes availabilities up
using the same exact prompt on both
Azure OpenAI is just dumber
"AI: Great. Let's look at Tuesday. For Tuesday, April 20 third, we have a wide range of times available. We're open from 9 8 to 4 acid. Have a preferred time in the morning or afternoon?
User: 9 what to for what?
AI: My apologies for the confusion. On Tuesday, April 20 third, we have openings from 9 at to 4 Aje. You prefer a morning or an afternoon appointment?
User: Uh, afternoon?"
There is hallucination, and then there is this.
"from 9 8 to 4 acid" ... it seems like it's on acid
@User GPT-4 would be too expensive. Could you suggest any model that doesn't trip like this and is reliable?
m
I've had good results with Anthropic, if you get the prompting right. Together with the low latency, it's a great provider, plus Haiku and Sonnet are pretty cheap.
have you given any thoughts on this?
g
thanks
I was thinking the same
m
yeah man, give it a try. But it's interesting how long the sentences are, so I would recommend you keep the temperature and the tokens very low
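To make that concrete, here is a minimal sketch of what a low-temperature, token-capped model config could look like. The field names (`provider`, `model`, `temperature`, `maxTokens`) are assumptions about Vapi's assistant config shape, so verify them against the current API docs; the Haiku model id is Anthropic's dated snapshot name.

```python
import json

# Hypothetical sketch of a Vapi-style assistant model config.
# Field names are assumed, not confirmed against Vapi's API reference.
model_config = {
    "provider": "anthropic",
    "model": "claude-3-haiku-20240307",
    "temperature": 0.2,   # low temperature -> less creative drift
    "maxTokens": 150,     # tight cap keeps spoken replies short
}

print(json.dumps(model_config, indent=2))
```

The low cap matters for voice specifically: every extra token is extra seconds of the agent talking.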
j
Yeah I'd say the same as above ^ opus or haiku
g
opus is expensive and slow
what about sonnet?
and why haiku, isn't it the smallest model?
and do function calls work as well with all of these?
I will test all of them but would appreciate some pointers from people who have already tested
m
well for me Opus is out of the equation. But I feel like the difference between the other two models performance-wise is subtle but perceivable. The reason I chose Haiku is the latency, which makes up for the lower processing capabilities. Regarding function calling, I haven't noticed much difference, but you definitely have to get the prompting right
from what you have shown, I believe you should make the instructions shorter. The descriptions for the functions on the functions panel already do a lot by themselves
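As an illustration of letting the function description carry the instruction: below is an OpenAI-style tool definition where the anti-hallucination rules live in the description rather than in a long system prompt. The function name and parameters are made up for this example, not anything from Vapi's panel.

```python
# Illustrative only: a tool schema in the common OpenAI function-calling
# format. The name "check_availability" and its parameters are invented.
check_availability_tool = {
    "type": "function",
    "function": {
        "name": "check_availability",
        "description": (
            "Look up open appointment slots for a given date. "
            "Only offer slots returned by this function; never invent times. "
            "If no slots are returned, say the day is fully booked."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "date": {
                    "type": "string",
                    "description": "Requested date in YYYY-MM-DD format",
                }
            },
            "required": ["date"],
        },
    },
}
```

With the guardrails in the description, the system prompt can shrink to tone and persona, which is what the shorter-instructions advice above is getting at.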
g
I don't follow which instructions need to be shorter?
if you are talking about the system prompt: when I gave it general instructions it was always messing up because it's too dumb, so I had to give it a specific example for each scenario. my prompt is like 1400 tokens
no matter how many times I say don't just dump all the slots as a list, it would still do it
it's prone to printing out information, then explaining and summarizing it
so examples are needed to change its behaviour
"User: Hey, Melissa. How's it going?
AI: Speaking in a friendly professional tone, hello? This is Melissa from California Dental. I'm doing well. Thank you for asking. How can I assist you today? Do you have a dental concern or would you like to schedule an appointment.
User: Right. What? No. The first thing is tell me how are you doing.
AI: Chuckles warmly, oh, I'm doing splendidly. Thank you for asking. As an AI assistant, I don't have personal feelings, but I'm here and ready to help you with anything related to scheduling a dental appointment. How can I assist you today?
User: Yeah. Well, I have a tooth issue. Like, uh, pain?"
Haiku is all sorts of wrong
also didn't call the function and hallucinated that it did
j
Hm yeah, one more thing you can try is specifying an exact model in the model field. The 0125 models are less prone to function hallucinations
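A quick sketch of what that pinning looks like: use a dated snapshot id instead of a floating alias. The model ids below are real OpenAI snapshot names; the `{"provider", "model"}` config shape is an assumption about how Vapi forwards the model field, so check the docs.

```python
# Pin an exact, dated snapshot rather than an alias that can silently
# change snapshots underneath you. Config shape is assumed, not confirmed.
pinned = {"provider": "openai", "model": "gpt-4-0125-preview"}
floating = {"provider": "openai", "model": "gpt-4-turbo"}  # alias, may move

print(pinned["model"])
```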
s
You can try other models, but make sure to follow the provider's LLM prompting guide, because you can see a significant difference from it.
g
I know there are a bunch. which one would be best for less hallucination and good function calling?