Incorrect OpenAI LLM cost
# support
i
In the cost breakdown on the dashboard for a call with gpt-4o-2024-11-20, I see the following:
• $0.79
• 209,307 prompt tokens, 1,651 completion tokens
https://dashboard.vapi.ai/calls/4d5ef79c-6ee1-476d-8a04-dc22ed8470fc

I calculated the cost based on OpenAI’s pricing:
209,307 × 2.50 / 1,000,000 + 1,651 × 10 / 1,000,000 = $0.54

The calculated cost ($0.54) is roughly 1.5x lower than the reported cost ($0.79). Where is the discrepancy coming from?

Also, a side question: do you apply OpenAI’s discounted cached-input pricing for GPT-4o? If so, the actual cost should be even lower than $0.54.
https://cdn.discordapp.com/attachments/1354143550987702352/1354143551411191930/image.png?ex=67e437f5&is=67e2e675&hm=65c70a5b05d2ac28286ac789e4b25aa7222d75a8b58fc2d0ac71ee6cd6b26114&
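The manual calculation above can be sketched as a small script. The per-token rates are taken from OpenAI's public pricing page for gpt-4o-2024-11-20 at the time of the thread (non-cached rates); this is a sanity check, not Vapi's billing logic:

```python
# OpenAI's published rates for gpt-4o-2024-11-20, USD per 1M tokens
# (non-cached input; taken from the pricing page quoted in this thread).
INPUT_RATE = 2.50
OUTPUT_RATE = 10.00

def llm_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD at the published non-cached rates."""
    return (prompt_tokens * INPUT_RATE
            + completion_tokens * OUTPUT_RATE) / 1_000_000

# The call in question: 209,307 prompt tokens, 1,651 completion tokens.
print(f"${llm_cost(209_307, 1_651):.2f}")  # → $0.54, vs. $0.79 on the dashboard
```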
v
I'm sorry, I'm having trouble with your request. A member of the team will be notified and will get back to you as soon as possible.
k
@ivan checking if this is resolved for you?
i
Not really. The cost is still higher than when calculated manually:
https://dashboard.vapi.ai/calls/cb919376-a8b2-4f59-9d2f-5b2b55bdd324

105,313 × 2.50/1,000,000 + 915 × 10/1,000,000 = $0.272

Maybe your dashboard doesn’t show cached tokens?
https://cdn.discordapp.com/attachments/1354143550987702352/1355114048987074663/image.png?ex=67e7bfce&is=67e66e4e&hm=7fa9acb28f47756be3d3a6c7dbf90ee181b5fc364c08c207fa0124bc6898478b&
s
## Transcription Costs
- Provider: Deepgram
- Model: Nova-2
- Usage: 6.798283 minutes of audio
- Cost: $0.03375953
- Calculation:
  - Rate appears to be approximately $0.00496/minute
  - 6.798283 minutes × $0.00496/minute = $0.03375953

## LLM Costs
- Provider: OpenAI
- Model: GPT-4o-2024-11-20
- Usage:
  - Prompt: 105,313 tokens
  - Completion: 915 tokens
- Cost: $0.3687125
- Calculation:
  - Input rate: ~$0.0034/1K tokens
  - Output rate: ~$0.0168/1K tokens
  - (105,313 tokens × $0.0034/1K) + (915 tokens × $0.0168/1K) = $0.3580 + $0.0154 = $0.3687125

## Voice Synthesis Costs
- Provider: Eleven Labs
- Model: Eleven Turbo v2.5
- Voice ID: ymDCYd8puC7gYjxIamPt
- Usage: 2,908 characters
- Cost: $0.1454
- Calculation:
  - Rate: $0.05/1K characters
  - 2,908 characters × $0.05/1K = $0.1454

## VAPI Platform Costs
- Type: Normal usage
- Usage: 6.7211 minutes
- Cost: $0.336055
- Calculation:
  - Rate: $0.05/minute
  - 6.7211 minutes × $0.05/minute = $0.336055

## Knowledge Base Costs
- Provider: Google
- Model: Gemini-1.5-Flash
- Usage: 0 tokens (not used)
- Cost: $0.00
- Calculation: No usage, so no cost

## Total Cost
- Sum of all services: $0.03375953 + $0.3687125 + $0.1454 + $0.336055 + $0 = $0.88392703

This breakdown shows how each component contributes to the total cost based on their specific usage units and pricing structure. The largest cost component is the OpenAI model usage at approximately 42% of the total, followed by the VAPI platform costs at approximately 38%.
Token caching is managed by OpenAI itself, so we have no visibility into how they do it. We are not involved in OpenAI's caching process.
i
According to https://platform.openai.com/docs/pricing, GPT-4o-2024-11-20 is priced as follows:
- Input: **$0.0025**/1K
- Cached input: $0.00125/1K
- Output: **$0.0100**/1K

However, you are calculating the cost using different rates:
(105,313 tokens × **$0.0034**/1K) + (915 tokens × **$0.0168**/1K) = $0.3580 + $0.0154 = $0.3687125

Could you explain why you’re billing 36% more for input tokens and 68% more for output tokens? I understand you’re not involved in the caching process, but you should still calculate token usage, including cached tokens, correctly. I’m fairly certain some of the input tokens were cached, but for now, let’s focus on the pricing without considering caching.
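The 36% and 68% markup figures quoted above follow directly from the two rate pairs; a minimal check, using the per-1K rates stated in this thread:

```python
# Markup of the dashboard's effective rate over OpenAI's published rate
# for gpt-4o-2024-11-20 (both in USD per 1K tokens).
def markup(dashboard_rate: float, openai_rate: float) -> float:
    """Fractional markup, e.g. 0.36 means 36% above the published rate."""
    return dashboard_rate / openai_rate - 1

print(f"input markup:  {markup(0.0034, 0.0025):.0%}")  # → 36%
print(f"output markup: {markup(0.0168, 0.0100):.0%}")  # → 68%
```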
k
This is the transparent pricing we receive from OpenAI; there's nothing to hide regarding pricing. If the user interrupts, the cost of the in-flight LLM request is still aggregated into the final cost, along with other charges depending on the call.
i
How can the pricing be transparent if the formula you provided includes an incorrect price for GPT-4o tokens? Either the token price is wrong, the token count is miscalculated, or both. None of these scenarios reflects transparency. As a Vapi customer for 10 months, I find this conversation very frustrating.

> This is the transparent pricing which we receive from openAI

OpenAI's API does not return the completion cost, only the number of tokens used.
k
Hey Ivan, regarding transparency: I've shared the raw cost and token counts with you, and they're visible in the dashboard. Nothing is made up here. I understand your frustration, but saying it isn't transparent isn't right. For more transparency, I can send you all of the call logs so you can calculate it yourself. Let me know if you'd like that and I'll DM them to you.
Hey buddy, in case you're feeling like something is wrong: nothing is wrong here. We are here to help you out. Happy to give your credits back, happy to answer all of your questions, and happy to sit down and share the logs with breakdowns. Let me know how I can help you further. Do you require a turn-by-turn cost breakdown?
i
Hi Shubham, thanks for your willingness to help. I still don't understand why the rates in your dashboard and your breakdown ($0.0034/1K for input and $0.0168/1K for output) are higher than OpenAI's official pricing for GPT-4o-2024-11-20 ($0.0025 and $0.0100 respectively). Can you please confirm whether these rates come from OpenAI? If not, what explains the markup?
k
Hi, checking if this is resolved/solved for you?
i
It is not
s
Taking a look into it, @Ivan. Give me some time to go through all of this with the team and solve it for you.
Marking this ticket as resolved.
j
@Sahil @Ivan I'm interested to know as well. I've used gpt-4o for a long time, apparently with incorrect rates? The 4.1 rates seem correct compared to OpenAI's public pricing, although I'm also curious about cached input, which doesn't seem to be applied?