Hello!
I have a question about VAPI's custom LLM integration.
Our custom LLM runs on a stateful LangGraph graph. Every time VAPI sends a POST request to our /chat/completions endpoint, we take the latest message from the VAPI transcript and send it to LangGraph for processing, then stream the response back to VAPI token by token.
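For context, here's a minimal sketch of the handler (FastAPI is just for illustration, `my_agent.graph` is a placeholder for our compiled, checkpointed LangGraph graph, and the exact shape of VAPI's request body is from memory):

```python
import json

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

from my_agent import graph  # placeholder: our compiled LangGraph graph

app = FastAPI()

@app.post("/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()
    # VAPI sends the running transcript; today we only forward the newest turn.
    latest = body["messages"][-1]["content"]
    # Key the LangGraph checkpointer off the call (field name is an assumption).
    config = {"configurable": {"thread_id": body.get("call", {}).get("id", "default")}}

    async def tokens():
        # stream_mode="messages" yields (message_chunk, metadata) pairs
        async for chunk, _meta in graph.astream(
            {"messages": [("human", latest)]}, config, stream_mode="messages"
        ):
            if chunk.content:
                payload = {
                    "object": "chat.completion.chunk",
                    "choices": [{"index": 0, "delta": {"content": chunk.content}}],
                }
                yield f"data: {json.dumps(payload)}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(tokens(), media_type="text/event-stream")
```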
The issue: if the user speaks again before we have streamed back the full response, a new POST request arrives, a new message is sent to LangGraph, and the response to the prior human message never reaches VAPI, because VAPI seems to close the earlier POST connection.
At this point the LangGraph agent's state and the VAPI-side conversation between the user and the agent are out of sync (e.g., the LangGraph agent assumes the user is aware of things that were never actually vocalized).
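One mitigation I've been sketching is to patch the LangGraph thread whenever the stream gets torn down early, roughly like this (assumes Starlette cancels the streaming generator on disconnect; `sse()` is a hypothetical helper that applies the same chunk framing as above):

```python
import asyncio

from langchain_core.messages import SystemMessage

async def tokens(graph, config, user_input):
    spoken, completed = [], False  # tokens that actually made it back to VAPI
    try:
        async for chunk, _meta in graph.astream(
            {"messages": [("human", user_input)]}, config, stream_mode="messages"
        ):
            if chunk.content:
                spoken.append(chunk.content)
                yield sse(chunk.content)
        completed = True
        yield "data: [DONE]\n\n"
    finally:
        if not completed:
            # VAPI dropped the connection mid-stream (user barged in).
            # Tell the agent which part of its reply was never vocalized.
            await asyncio.shield(graph.aupdate_state(
                config,
                {"messages": [SystemMessage(
                    "User interrupted; only this was spoken: " + "".join(spoken)
                )]},
            ))
```

This feels brittle, though, so I'm wondering: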
What kind of strategies do people put in place to mitigate these issues?
Do people just send the full transcript of the conversation to the LangGraph agent on each request? (Rough sketch of what I'm imagining at the bottom of this post.)
Is it possible to prevent VAPI from sending a new POST request before the previous one is fully answered?
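On the full-transcript question: if that's the usual answer, I imagine it looks roughly like this on our side (hypothetical sketch; assumes the graph can run without a thread_id so VAPI's transcript is the only memory):

```python
from my_agent import graph  # placeholder: same compiled LangGraph graph

ROLE_MAP = {"user": "human", "assistant": "ai", "system": "system"}

async def stream_stateless(body: dict):
    # Rebuild the whole conversation from VAPI's transcript on every request,
    # so the graph only ever "knows" what the user actually heard.
    messages = [
        (ROLE_MAP.get(m["role"], "human"), m["content"]) for m in body["messages"]
    ]
    async for chunk, _meta in graph.astream(
        {"messages": messages},   # full transcript, not just the latest turn
        stream_mode="messages",   # no thread_id config -> no server-side memory
    ):
        if chunk.content:
            yield chunk.content
```

The tradeoff I see is losing LangGraph's checkpointed state between turns (tool results, summaries, etc.), which is why I'm not sure it's the right move.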