Pricing of GPT-4o-mini and Gemini-2.0-Flash
# support
t
Hello. I am testing calls with both OpenAI GPT-4o-mini and Google Gemini-2.0-flash. Generally Gemini-2.0-flash is cheaper than GPT-4o-mini, but for some reason my Gemini-2.0-flash calls are costing more. These are the call IDs:

- 7fe16f9c-e8ec-4731-add6-654abe6c004a (GPT-4o-mini)
- de1fa024-3f09-47e2-ac81-6f8afc60f683 (Gemini-2.0-flash)
v
Hi there! 👋 I don't have access to the specific cost breakdowns for your particular call IDs, but I can explain the general pricing structures and some likely reasons for the cost discrepancy you're seeing.

## Pricing Structures

**Gemini-2.0-flash**: According to the information I found, Gemini Flash is priced at [$0.10 per million tokens for input and $0.40 per million tokens for output](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1333867863462903868).

**GPT-4o-mini**: While I don't have the exact pricing for GPT-4o-mini specifically, I found examples of GPT-4o calls where costs were based on token usage. For instance, one call using GPT-4o cost [$0.1388 for processing 21,864 prompt tokens and 87 completion tokens](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1338521531101483040).

## Potential Reasons for Higher Gemini Costs

Several factors might explain why Gemini-2.0-flash is costing more in your specific case:

1. **Tool Calling Issues**: There have been reports of errors with Gemini Flash 2.0 and tool calling, specifically related to invalid JSON payloads. These errors could lead to retries and increased processing time, [ultimately increasing costs](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1344710846865084616).
2. **Beta Status**: The Gemini 2.0 realtime models (including flash) are still in beta, which means you might experience "hiccups" that lead to [unexpected cost behavior](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336440478727667844).
3. **Looping Issues**: There have been mentions of "looping" issues with Gemini models that could [contribute to higher costs](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336440478727667844).
4. **Configuration Problems**: Incorrect configurations can cause the model to fail or retry, leading to higher costs. For example, issues with parameter formatting in function declarations [could cause problems](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332510949940133960).
5. **Knowledge Base Issues**: If your calls involve a knowledge base, issues with retrieving information could lead to the model generating more tokens or providing incorrect information, [increasing costs](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1339404492424089660).
6. **Long Prompts**: It's suggested to break long prompts into smaller parts when using Gemini-2.0-flash, implying that [long prompts could be less efficient or more costly](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1333867863462903868).

Would you like me to help you investigate any of these specific factors in more detail for your use case?

Sources:
- [Discord Thread about Gemini Flash pricing](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1333867863462903868)
- [Discord Thread about GPT-4o token usage and cost](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1338521531101483040)
- [Discord Thread about Gemini Flash tool calling issues](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1344710846865084616)
- [Discord Thread about realtime model beta status](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336440478727667844)
- [Discord Thread about configuration problems](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332510949940133960)
- [Discord Thread about knowledge base issues](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1339404492424089660)
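As a quick sanity check, you can estimate the model portion of a call's cost directly from token counts. A minimal sketch, assuming the Gemini Flash rates quoted above ($0.10 per million input tokens, $0.40 per million output tokens); the function name is illustrative, not an API:

```python
def model_cost(prompt_tokens: int, completion_tokens: int,
               input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Estimate model cost in dollars from token counts and per-million rates."""
    return (prompt_tokens * input_rate_per_m
            + completion_tokens * output_rate_per_m) / 1_000_000

# Example with the assumed Gemini-2.0-flash rates and a 16k-token prompt:
cost = model_cost(prompt_tokens=15_973, completion_tokens=122,
                  input_rate_per_m=0.10, output_rate_per_m=0.40)
print(f"${cost:.4f}")  # ≈ $0.0016 — input tokens dominate at this ratio
```

Note that with a prompt this long, the input side accounts for nearly all of the model cost, which is why trimming long prompts (point 6 above) can matter more than shortening completions.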
k
Looking at the data, let me break down the costs for each call:

**Call with GPT-4o-mini (7fe16f9c-e8ec-4731-add6-654abe6c004a)**:
- Model cost: $0.002 (18,127 prompt tokens, 85 completion tokens)
- Voice cost: $0.006 (408 characters)
- Total cost (all components): $0.05

**Call with Gemini-2.0-flash (de1fa024-3f09-47e2-ac81-6f8afc60f683)**:
- Model cost: $0.002 (15,973 prompt tokens, 122 completion tokens)
- Voice cost: $0.008 (541 characters)
- Total cost (all components): $0.06

You're correct that Gemini-2.0-flash's per-token rates are cheaper; at this level of rounding, both model costs display as $0.002. The total cost of the Gemini call is nonetheless slightly higher because:

1. The voice cost is higher in the Gemini call ($0.008 vs $0.006) because it used more characters (541 vs 408)
2. The completion tokens are higher for Gemini (122 vs 85)

This explains why the overall cost of the Gemini call is higher even though the model itself is cheaper. The difference comes primarily from the other components of the call, especially the voice component, which used about 33% more characters in the Gemini call.
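The comparison above can be reproduced from the reported per-component figures. A small sketch using the numbers from this thread; the component names ("model", "voice", "total") are illustrative labels, not the platform's actual billing fields:

```python
# Per-component costs in dollars, as reported for the two calls above.
gpt4o_mini   = {"model": 0.002, "voice": 0.006, "total": 0.05}
gemini_flash = {"model": 0.002, "voice": 0.008, "total": 0.06}

# Show where the Gemini call's extra cost comes from.
for part in ("model", "voice", "total"):
    diff = gemini_flash[part] - gpt4o_mini[part]
    print(f"{part:>5}: Gemini costs ${diff:+.3f} vs GPT-4o-mini")
```

The per-component deltas make the point directly: the model costs match at this precision, so the entire gap sits in the non-model components, led by the voice cost.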