Clarification about prompt tokens and cost
# support
l
CallID: c6ae88ad-bd20-440b-9928-e7dd26633bee Timestamp: 1741023821641 Can you please help us clarify how prompt tokens are calculated? I can see the call above is charging us for ~27K tokens. The call lasted ~2 mins and there are only 300 words in the transcript. The assistant prompt contains ~700 words. There is no KB attached to the assistant. What contributed to the overall ~27K tokens we were charged for? https://cdn.discordapp.com/attachments/1346207930889797764/1346207931091390605/Screenshot_2025-03-03_at_17.56.48.png?ex=67c75956&is=67c607d6&hm=7bfff084f539fdd1611ab8719273b60db59d82e29831a55062147bd07f2c259a&
v
Sure! Let's break down how the prompt tokens were calculated for your call.

1. **Total Tokens Charged**: You were charged for approximately 29,958 prompt tokens in total, which includes:
   - 27,657 prompt tokens from the main interaction.
   - 530 prompt tokens from the summary.
   - 1,771 prompt tokens from the success evaluation.
2. **Understanding the Breakdown**:
   - The main interaction (27,657 tokens) includes the assistant's prompt, the user's input, and any additional context processed during the call. Even though the transcript has only ~300 words, the assistant's ~700-word prompt can significantly increase the token count, since a single word or punctuation mark can be split into multiple tokens.
   - The summary and success evaluation contribute additional tokens, which can add up quickly, especially if they involve complex processing or extra context.
3. **Factors Influencing Token Count**:
   - **Length and Complexity of Prompts**: Longer prompts and complex sentences lead to higher token usage. Each word, punctuation mark, and piece of formatting contributes to the total.
   - **Context Management**: If the assistant is designed to maintain context over multiple interactions, token usage increases as it retains more information.
   - **No Knowledge Base**: While you mentioned there is no knowledge base attached, the assistant's internal processing and the way it handles prompts can still lead to high token counts.
4. **Cost Implications**: The total cost of $0.22 reflects the usage of these tokens across different services (STT, LLM, Vapi), which are charged based on the number of tokens processed.

If you have any more questions or need further clarification, feel free to ask! 😊

Sources:
- [Breakdown of Costs Associated with Prompt and Completion Tokens in VAPI](https://discord.com/channels/1211482211119796234/1211483291191083018/1338521531101483040)
- [Factors Contributing to Total Token Count Charged in VAPI Calls](https://discord.com/channels/1211482211119796234/1211483291191083018/1336010280819359845)
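For reference, the three components quoted above sum as follows (a quick check using the figures from the call data, not Vapi's billing code):

```python
main_interaction = 27_657  # prompt tokens from the main interaction
summary = 530              # prompt tokens from the summary
success_eval = 1_771       # prompt tokens from the success evaluation

print(main_interaction + summary + success_eval)  # 29958 total prompt tokens
```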
l
Can someone shed some light on this, please?
s
@Lucio Apologies for the delay in response.
I see you're looking at a call that's charging approximately 27K tokens. Let me explain how prompt tokens are calculated in Vapi.

## What Are Prompt Tokens?

In the context of your Vapi call, prompt tokens represent the input text sent to the language model (in this case, GPT-4o) for each turn of the conversation. Looking at your call data, I can see:
"llmPromptTokens": 27657,
"llmCompletionTokens": 341,
## How Prompt Tokens Accumulate

For voice AI assistants like the one in your call, the prompt tokens include:

1. **System Instructions**: The detailed instructions given to the model about how to behave (personality, conversation style, clinic information, etc.)
2. **Conversation History**: All previous messages in the conversation
3. **Tool Calls & Results**: Function calls and their returned data

## Cost Implications

For this call:
- 27,657 prompt tokens × GPT-4o rate = $0.0816 for LLM costs
- Other costs (STT, Vapi service, etc.) = $0.1383
- Total call cost = $0.219
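As a sanity check on the LLM line item, here's a minimal sketch of the cost arithmetic, assuming GPT-4o list prices of $2.50 per 1M input tokens and $10 per 1M output tokens (an assumption; the rate actually applied may differ):

```python
# Hedged cost estimate from the token counts above; the exact rate
# Vapi applies may differ from these assumed GPT-4o list prices.
INPUT_RATE = 2.50 / 1_000_000    # assumed $ per input token
OUTPUT_RATE = 10.00 / 1_000_000  # assumed $ per output token

prompt_tokens = 27_657 + 530 + 1_771  # main + summary + success evaluation
completion_tokens = 341               # main interaction only

llm_cost = prompt_tokens * INPUT_RATE + completion_tokens * OUTPUT_RATE
print(f"${llm_cost:.4f}")  # ≈ $0.0783; the summary/evaluation completion
                           # tokens (not itemized here) account for the rest
```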
@Lucio lmk if you have more doubts.
l
Yeah, I still don't really get how you reached 27K. The prompt was only ~700 words and the full transcript of the conversation ~300. There were no tool calls. It's like something is compounding or being double-counted?
s
cc: @Sahil
s
After each turn, the complete message array is sent, which includes the system prompt + previous conversation + current message.
So the context length keeps increasing with every turn.
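Here's a minimal sketch of that compounding effect, using illustrative assumptions (~1,000 tokens for a 700-word system prompt, ~400 tokens for a 300-word transcript, ~20 exchanges over a 2-minute call), not Vapi's actual billing code:

```python
# Rough simulation of prompt-token compounding: every LLM call is
# billed for the system prompt plus the entire conversation so far.

SYSTEM_PROMPT_TOKENS = 1_000  # assumed: ~700-word system prompt
TRANSCRIPT_TOKENS = 400       # assumed: ~300-word total transcript
TURNS = 20                    # assumed: exchanges in a ~2-minute call

tokens_added_per_turn = TRANSCRIPT_TOKENS // TURNS  # new speech per turn

total_prompt_tokens = 0
history_tokens = 0
for _ in range(TURNS):
    # Each turn resends the system prompt plus all prior messages.
    total_prompt_tokens += SYSTEM_PROMPT_TOKENS + history_tokens
    history_tokens += tokens_added_per_turn

print(total_prompt_tokens)  # 23800 — same ballpark as the 27,657 billed;
                            # per-message overhead and tool schemas add more
```

So even though only ~300 words were spoken, the ~700-word system prompt is billed on every one of the ~20 turns, which is where most of the 27K comes from.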