Is VAPI using prompt caching with OpenAI? Is there...
# support
Manuel Saelices
e.g. by not sending the whole conversation after each caller's answer?
Vapi Support Bot
@Manuel Saelices Absolutely, we can optimize the process by not sending the entire conversation after each caller's response. Instead, we can send incremental updates or wait until the end of the call to send a complete summary. This approach can save bandwidth and reduce processing overhead. If you need assistance with implementing this or have any specific requirements, please let me know, and I'll be glad to help!
You can continue this conversation by mentioning me in the message.
Got it. Tagging @User for additional help.
Manuel Saelices
That's not how this works. On every interaction with the caller, after the caller says anything, VAPI sends the whole conversation, including the system messages, past function calls, and so on, to the LLM in order to get the next message for the bot to say, the next function call to make, etc.
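For context, here is a minimal sketch of that per-turn pattern using the OpenAI Python SDK directly (this is not VAPI's actual code; the model name and system prompt are illustrative). The full message history is resent on every turn, so the request prefix (system prompt plus earlier turns) is identical across requests:

```python
# Minimal sketch (not VAPI's code): the entire message history, starting with the
# system prompt, is resent to the LLM on every caller turn.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [
    {"role": "system", "content": "You are a phone assistant for Acme Clinic."},
]

def handle_caller_turn(caller_text: str) -> str:
    """Append the caller's utterance and request the next assistant message,
    sending the full conversation (system prompt, past turns, etc.) each time."""
    messages.append({"role": "user", "content": caller_text})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,  # whole history goes out on every request
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply
```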
The first question is whether VAPI is using [OpenAI prompt caching](https://platform.openai.com/docs/guides/prompt-caching), since the dashboard does not show how many tokens are cached on each request.
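For what it's worth, when calling OpenAI directly, cached prompt tokens are reported on the response's usage object (`usage.prompt_tokens_details.cached_tokens`). A minimal sketch below, assuming a model that supports prompt caching; whether VAPI surfaces this field is exactly the open question here:

```python
# Minimal sketch: inspecting cached prompt tokens on a direct OpenAI call.
# Caching only applies to prompts longer than 1024 tokens with a repeated prefix,
# so this toy conversation will report 0 cached tokens; it just shows where to look.
from openai import OpenAI

client = OpenAI()

conversation = [
    {"role": "system", "content": "You are a phone assistant."},
    {"role": "user", "content": "Hello"},
]

response = client.chat.completions.create(model="gpt-4o", messages=conversation)

usage = response.usage
details = getattr(usage, "prompt_tokens_details", None)
cached = getattr(details, "cached_tokens", 0) if details else 0
print(f"prompt tokens: {usage.prompt_tokens}, cached tokens: {cached}")
```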
The second question is whether there is some other mechanism we can use to save LLM costs, e.g. a sliding context window that stops sending the earliest messages to the LLM once the conversation gets long, or something along those lines.
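A rough sketch of that second idea, assuming we keep the system prompt and only the most recent turns (the function name and cutoff are made up, not a VAPI setting). One caveat: since OpenAI prompt caching matches on the exact prompt prefix, dropping early messages changes the prefix each turn and can reduce cache hits, so the two savings can work against each other.

```python
# Rough sketch of a sliding context window: keep the system message(s) and only
# the most recent messages. `max_recent_messages` is an illustrative parameter.
def trim_history(messages: list[dict], max_recent_messages: int = 20) -> list[dict]:
    system_msgs = [m for m in messages if m["role"] == "system"]
    other_msgs = [m for m in messages if m["role"] != "system"]
    return system_msgs + other_msgs[-max_recent_messages:]


# Example: send only the trimmed history to the LLM on the next turn.
# trimmed = trim_history(conversation_so_far, max_recent_messages=20)
```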