Margin for OpenAI tokens
# support
p
Hey do you add any margin for OpenAI tokens? I thought I heard somewhere in your marketing materials that you don't, but I see that I've been charged much more for the LLM than what OpenAI charges for GPT-3.5
v
Hi @piechos, I'm an AI assistant for Vapi. While a team member reviews this question, I'll look through relevant sources and see if I can provide suggestions. Please tag @Vapi Support with any follow-up questions.
**While I generate an answer, here are some sources that might be helpful**:
* [Dashboard]()
* [Quickstart]()
* [Pricing]()
* [Custom LLMs]()
### Answer
Vapi bills you for the Transcription and Voice services at cost. If you choose an LLM we offer through the platform, you will be billed for the Model at cost. Additionally, Vapi charges a base fee of $0.05 per minute on top of what you are billed for providers [(1)](). If you've noticed a discrepancy in pricing, it might be worth checking the detailed breakdown of charges on your Vapi dashboard or contacting support for clarification.
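For illustration, here is roughly what that breakdown works out to in code (a minimal sketch; only the $0.05/min Vapi fee comes from the answer above, the transcriber, voice, and LLM per-minute rates are hypothetical placeholders):

```typescript
// Vapi per-minute cost = provider costs passed through at cost + $0.05/min Vapi fee.
// The provider rates used below are hypothetical placeholders, not real prices.
const VAPI_FEE_PER_MIN = 0.05;

function estimatePerMinute(
  transcriberPerMin: number,
  voicePerMin: number,
  llmPerMin: number,
): number {
  return transcriberPerMin + voicePerMin + llmPerMin + VAPI_FEE_PER_MIN;
}

// e.g. a call where the providers cost roughly $0.04/min in total:
console.log(estimatePerMinute(0.01, 0.02, 0.01)); // ≈ 0.09 per minute
```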
Got it. Tagging @User for additional help.
p
$2.71 seems too much for the amount of tokens used. gpt3.5-turbo charges less
s
600k prompt tokens? That's too long! which llm is this?
p
GPT-3.5 Turbo. What do you mean too long? Even with 600k it's too much. OpenAI charges $0.5 per million prompt tokens
s
https://yourgpt.ai/tools/openai-and-other-llm-api-pricing-calculator this calculator should help double check. i think it adds up for this amount of token usage.
actually I'm not sure, better have the @User check this..
p
Thanks. The calculator confirms my calculations. I should be charged $0.304 instead of $2.71
n
mind sending me a callid. we likely haven't updated the values
p
fa801d80-25ab-406f-8151-4807164f0405
Which values?
Now I inserted my OpenAI API keys and you don't charge me for the LLM, but OpenAI also doesn't charge me, so you probably still pay for the tokens. Seems like you have some issue here
n
yup that issue i did fix recently on staging-api.vapi.ai and will be released to prod soon. for now, enjoy the free tokens:)
the bug regarding us mischarging you, let me take a look
p
Ok, thanks
Today OpenAI started charging me for GPT-4-0125-preview when using the voice agent. But in my agent I've chosen GPT-3.5. Please check this. My assistant id is 76d28212-ea18-4705-9140-edd6ff9bd953
And now on one call I've been charged for GPT-3.5 and then on another for GPT-4 again. The number of tokens I've been charged for is smaller than shown in the Vapi dashboard. Really weird things are happening with your billing
Please fix it, because when I get charged at the GPT-4 price it's 20 times more expensive, which is a big problem
s
@piechos Could you please provide the call_id for the call where you were charged a high amount?
p
If you are talking about the previous issue where I was charged for the LLM by Vapi, it's this one - fa801d80-25ab-406f-8151-4807164f0405
If you are talking about today's issue where I've been charged for GPT-4 by OpenAI when using GPT-3.5, it's one of these: f22a49f1-7de0-4cc9-9f17-a9624bcbc776 e6347236-1d22-40eb-b30d-e417495912b8
s
...this one
p
The problem is that OpenAI charges me for GPT-4 (and once for GPT-3.5) but the agent had GPT-3.5 chosen the whole time
@Sahil did you find the issue?
Today another call to the OpenAI API (GPT-4) was issued using my API keys that I created for Vapi. Why? Sometimes Vapi uses GPT-4, sometimes GPT-3.5, and sometimes doesn't call any model (I don't get charged by the OpenAI API). The chosen model is GPT-3.5 Turbo and I haven't touched this setting for a few days. Do you at least know what's going on there?
s
Hey Piechos, I will provide you the answer by tonight. If anything went wrong with the pricing we will refund you. Don’t worry about it.
s
And if this is happening to him, shouldn't it be happening to everyone? 😅
p
I'm not worried about the refund. I just wanna make sure that you solve it so I can start fully using my agent reliably without being afraid of being charged 20x more for the LLM
n
Could you DM me screenshot of OpenAI charging you for GPT4? I don't see GPT4 getting used anywhere in the call logs for the ids you shared
It might be easier to hop on a quick call, DM'ing you
p
Today it happened again. I've been charged for GPT-4 for this call... fa49e72a-5591-44cd-bb6c-2011a5ba16be
So I removed my OpenAI API keys to check how Vapi charges (because OpenAI kept charging me for GPT-4 even though I used GPT-3.5 in my assistant) and I see this. @nikhil could you please explain the calculation behind the LLM cost?
If you are using gpt-3.5-turbo-0125 then it should be (162647 / 1000000 * 0.5) + (1359 / 1000000 * 1.5) = 0.083362
If you are using gpt-3.5-turbo-instruct then it should be (162647 / 1000000 * 1.5) + (1359 / 1000000 * 2) = 0.2466885
But you charged me $0.83. How did you come up with this number? I know these amounts are small and you say that you will reimburse me, but this issue is blocking me from scaling up and from using it in production. It's been 10 days since I reported it and nothing has changed. Please help.
Call id - 009b27bf-dd28-45bb-a80a-f0e1ab31e290
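For reference, the same arithmetic as code (a minimal sketch; the per-million-token prices are the public OpenAI list prices quoted in this thread, so verify them against the current pricing page):

```typescript
// Per-million-token prices quoted in the thread (USD); check OpenAI's
// current pricing page before relying on them.
const PRICES: Record<string, { inputPerM: number; outputPerM: number }> = {
  "gpt-3.5-turbo-0125": { inputPerM: 0.5, outputPerM: 1.5 },
  "gpt-3.5-turbo-instruct": { inputPerM: 1.5, outputPerM: 2.0 },
};

// Cost = promptTokens/1M * inputPrice + completionTokens/1M * outputPrice
function llmCost(model: string, promptTokens: number, completionTokens: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`unknown model: ${model}`);
  return (promptTokens / 1_000_000) * p.inputPerM + (completionTokens / 1_000_000) * p.outputPerM;
}

// The call in question: 162,647 prompt tokens and 1,359 completion tokens
console.log(llmCost("gpt-3.5-turbo-0125", 162_647, 1_359));     // ≈ 0.0834
console.log(llmCost("gpt-3.5-turbo-instruct", 162_647, 1_359)); // ≈ 0.2467
```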
n
Taking another look. Gimme 10
(Still Checking)
Found the "bug":
```
🔵 08:27:51:795 Getting Sequence For `gpt-3.5-turbo` (26774 tokens, functions: true, hipaa: false)...
🔵 08:27:51:798 Selected Models: gpt-4-0125-preview, gpt-4-turbo, gpt-4-turbo-preview, gpt-4-turbo-2024-04-09, gpt-4-0125-eastus1, gpt-4-1106-preview...
```
when your token length exceeds the context window of your requested model, we automagically find a model that works. this is why, with your key, you were seeing gpt-4 requests too. we could give up and end the call but not sure if that's any better. for now, i'll just leave this as is. on your end, try either less context or use haiku
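A minimal sketch of what that fallback behavior amounts to (the model list and context-window sizes here are illustrative assumptions, not Vapi's actual implementation):

```typescript
// Illustrative only: pick the first model whose context window fits the prompt,
// starting from the requested model and falling back to larger ones.
// Window sizes and fallback order are assumptions, not Vapi internals.
const CONTEXT_WINDOWS: Record<string, number> = {
  "gpt-3.5-turbo": 16_385,
  "gpt-4-0125-preview": 128_000,
  "gpt-4-turbo": 128_000,
};

function selectModel(requested: string, promptTokens: number): string | null {
  const candidates = [requested, ...Object.keys(CONTEXT_WINDOWS).filter(m => m !== requested)];
  const fit = candidates.find(m => promptTokens < (CONTEXT_WINDOWS[m] ?? 0));
  return fit ?? null; // null would mean "give up and end the call"
}

// 26,774 prompt tokens overflow gpt-3.5-turbo's 16k window, so a GPT-4-class
// model gets picked instead — which is why GPT-4 showed up on the OpenAI bill.
console.log(selectModel("gpt-3.5-turbo", 26_774)); // "gpt-4-0125-preview"
```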
a
Why isn't it the case that only the last 5 messages get remembered each time?
To make the context smaller
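A minimal sketch of that idea, truncating the conversation to the last few messages before each LLM request (purely illustrative, not how Vapi manages context):

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keep the system prompt plus only the most recent N turns. This bounds the
// prompt size at the cost of forgetting older parts of the conversation.
function truncateHistory(messages: ChatMessage[], keepLast = 5): ChatMessage[] {
  const system = messages.filter(m => m.role === "system");
  const rest = messages.filter(m => m.role !== "system");
  return [...system, ...rest.slice(-keepLast)];
}
```

The trade-off is that the assistant loses anything said before the retained window, which is presumably why it isn't the default.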
p
So for the context, you are using the whole history of the conversation?
If you are switching models and using a different one than the one chosen, you should at least indicate this somehow in the call history. Otherwise, how could I know that the context was exceeded and I should amend my assistant? I don't want to use GPT-4 for sure. It's 20 times more expensive than GPT-3.5 and its cost adds up pretty quickly
I tried with Haiku but the assistant hangs whenever I tell it to call a function. Call ids 95742ff0-6897-475e-8f32-46b95b1cdf61 bd028ccf-07bd-49d6-9a19-6a419beafcbf
@nikhil are you going to do anything about this? I can't use the agent if I can't control the model and the costs. How many last messages do you use for context?
v
Got it. Tagging @User for additional help.
m
In the API can't you choose your own fallback model?
p
Unfortunately, there is no such option. You can't choose a fallback model and you don't even see how many tokens were processed by the main vs. the fallback model
m
There is
`model.fallbackModels` enum[]
**`"fallbackModels": ["gpt-4-0125-preview", "gpt-4-0613"]`**
p
I'm building the assistant via the web UI and there is no such option there
m
Gotcha well there is on the API
p
Thanks, good to know, but I would still need it in the UI. Using the API wouldn't be very convenient in my case
m
and then edit it further in the UI
p
Yea, that could work, but workarounds like this are far from a good user experience with the tool, don't you agree?
But this is a good idea, I will probably do it while waiting for this issue to be addressed
m
Not really, I think it's targeted at developers so most of us use the API. If they add too much to the UI of the web app it will be way too much
p
If it was like that then they wouldn't have created a web app at all
Anyway, thanks for your help. Hopefully @nikhil responds soon
m
no problem boss