Is VAPI using prompt caching with OpenAI? Is there...
# support
Manuel Saelices
e.g. by not sending the whole conversation after each caller's answer?
Vapi Support Bot
@Manuel Saelices Absolutely, we can optimize the process by not sending the entire conversation after each caller's response. Instead, we can send incremental updates or wait until the end of the call to send a complete summary. This approach can save bandwidth and reduce processing overhead. If you need assistance with implementing this or have any specific requirements, please let me know, and I'll be glad to help!
You can continue this conversation by mentioning me in the message.
Got it. Tagging @User for additional help.
Manuel Saelices
That's not how this works. On every interaction with the caller, after the caller says anything, VAPI sends the whole conversation, including the system messages, past function calls, and so on, to the LLM in order to get the next message for the bot to say, the next function call to make, etc.
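For context, here is a minimal sketch of that per-turn pattern using the OpenAI Python SDK directly (this is not VAPI's actual code; the model name and system prompt are illustrative). The full message history is resent on every turn, so the request prefix (system prompt plus earlier turns) is identical across requests:

```python
# Minimal sketch (not VAPI's code): the entire message history, starting with the
# system prompt, is resent to the LLM on every caller turn.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [
    {"role": "system", "content": "You are a phone assistant for Acme Clinic."},
]

def handle_caller_turn(caller_text: str) -> str:
    """Append the caller's utterance and request the next assistant message,
    sending the full conversation (system prompt, past turns, etc.) each time."""
    messages.append({"role": "user", "content": caller_text})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,  # whole history goes out on every request
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply
```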
The first question is whether VAPI is using [OpenAI prompt caching](https://platform.openai.com/docs/guides/prompt-caching), since the dashboard does not show how many tokens are cached on each request.
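For what it's worth, when calling OpenAI directly, cached prompt tokens are reported on the response's usage object (`usage.prompt_tokens_details.cached_tokens`). A minimal sketch below, assuming a model that supports prompt caching; whether VAPI surfaces this field is exactly the open question here:

```python
# Minimal sketch: inspecting cached prompt tokens on a direct OpenAI call.
# Caching only applies to prompts longer than 1024 tokens with a repeated prefix,
# so this toy conversation will report 0 cached tokens; it just shows where to look.
from openai import OpenAI

client = OpenAI()

conversation = [
    {"role": "system", "content": "You are a phone assistant."},
    {"role": "user", "content": "Hello"},
]

response = client.chat.completions.create(model="gpt-4o", messages=conversation)

usage = response.usage
details = getattr(usage, "prompt_tokens_details", None)
cached = getattr(details, "cached_tokens", 0) if details else 0
print(f"prompt tokens: {usage.prompt_tokens}, cached tokens: {cached}")
```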
The second question is whether there is some other mechanism we can use to save LLM costs, e.g. a sliding context window that stops sending the earliest messages to the LLM once the conversation gets long, or something along those lines.
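A rough sketch of that second idea, assuming we keep the system prompt and only the most recent turns (the function name and cutoff are made up, not a VAPI setting). One caveat: since OpenAI prompt caching matches on the exact prompt prefix, dropping early messages changes the prefix each turn and can reduce cache hits, so the two savings can work against each other.

```python
# Rough sketch of a sliding context window: keep the system message(s) and only
# the most recent messages. `max_recent_messages` is an illustrative parameter.
def trim_history(messages: list[dict], max_recent_messages: int = 20) -> list[dict]:
    system_msgs = [m for m in messages if m["role"] == "system"]
    other_msgs = [m for m in messages if m["role"] != "system"]
    return system_msgs + other_msgs[-max_recent_messages:]


# Example: send only the trimmed history to the LLM on the next turn.
# trimmed = trim_history(conversation_so_far, max_recent_messages=20)
```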