API to generate next assistant response
# support
u
Hi team, is there an API that produces the assistant's next message when given a chain of messages? That is, I provide a chain like:
[
  {
    role: "assistant",
    content: "Hi! am i speaking with Donald Brasco?",
  },
  {
    role: "user",
    content: "Yup. You are.",
  },
  {
    role: "tool_calls",
    content: "Create_Cal_Meeting: {\n  \"start\": \"2025-01-10T19:00:00Z\",\n  \"attendeeName\": \"Donald Brasco\"}",
  },
  {
    role: "assistant",
    content: "One moment",
  },
  {
    role: "tool_call_result",
    content: "\"New Quote Review: Meet Organizer and Donald Brasco\"",
  },
]
And the API responds with the next message the assistant would produce? I need this to program some evals for my assistants.
v
@udAIan Hello! Currently, VAPI does not provide a direct API that takes a chain of messages and produces the next assistant message as a response. The assistant's behavior is typically defined by the logic you implement in your server, which would determine the next message based on the conversation history and any tool calls made during the interaction. However, you can simulate the assistant's behavior by creating a function that processes the given chain of messages and decides the next step based on the context and the results of any tool calls. This would involve analyzing the last message in the chain and, if necessary, making appropriate tool calls to generate the next message. If you need assistance with setting up such a function or have any specific requirements for your evaluations, please let us know, and we'll be happy to help you further.
You can continue this conversation by mentioning me in the message.
s
@udAIan We use OpenAI's Chat Completions format by default, where the assistant or LLM receives messages as a sequential chain. If I am not addressing your request correctly, please rephrase it with a clear example.
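For illustration, a minimal sketch of asking OpenAI for the next assistant message from a chain that is already in Chat Completions format (the model name is a placeholder; use whatever model your assistant is configured with):

import OpenAI from "openai";

// Sketch only: produce the next assistant message for a chain of messages
// that is already in OpenAI Chat Completions format.
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function nextAssistantMessage(
  messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[]
): Promise<string | null> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o", // placeholder; match your assistant's configured model
    messages,
  });
  return completion.choices[0].message.content;
}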
u
The message schema for a Vapi assistant is:
{
  role: "user" | "bot" | "system" | "tool_calls" | "tool_call_result",
  message: string,
  // some other fields - skipping for brevity 
}[]
While the message schema for OpenAI is:
{
  role: "user" | "assistant" | "system" | "tool",
  content: string, 
  // some other fields - skipping for brevity 
}[]
There is a difference in the allowed role values, and the OpenAI key content differs from the Vapi key message.
Therefore, when I send messages taken from a Vapi assistant's call to OpenAI without any transformation, I get an error due to the schema difference. I want to produce the next message programmatically for evaluating an assistant. So if Vapi has some API that I can send a series of messages to in order to get the next assistant message, it would work great for evaluating the assistant.
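For reference, a rough sketch of that transformation. The role mapping is an assumption based on the two schemas above, and tool_calls / tool_call_result entries are flattened into plain assistant text rather than rebuilt as structured OpenAI tool messages (which would require tool_call_id values and a tools definition):

// Rough sketch of the Vapi -> OpenAI message transformation described above.
type VapiMessage = {
  role: "user" | "bot" | "system" | "tool_calls" | "tool_call_result";
  message: string;
};

type OpenAIMessage = {
  role: "user" | "assistant" | "system";
  content: string;
};

function toOpenAIMessages(vapiMessages: VapiMessage[]): OpenAIMessage[] {
  return vapiMessages.map((m) => {
    switch (m.role) {
      case "bot":
        // Assumption: Vapi's "bot" role corresponds to OpenAI's "assistant".
        return { role: "assistant", content: m.message };
      case "tool_calls":
        // Simplification: fold tool activity into plain assistant text.
        return { role: "assistant", content: `[tool call] ${m.message}` };
      case "tool_call_result":
        return { role: "assistant", content: `[tool result] ${m.message}` };
      default: // "user" | "system" map across unchanged
        return { role: m.role, content: m.message };
    }
  });
}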
cc @Shubham Bajaj
s
Do let me know if any help is required.
u
@Shubham Bajaj Thanks for following up! We spotted this field too today and can definitely use it to call the OpenAI API, generate the next message, and run the evaluation. However, there is still one scenario we do not know how to evaluate: if an assistant has a knowledge base associated with it, how will we get the next assistant message from OpenAI such that it is aware of the knowledge base?
s
@udAIan You can pass the chunks of knowledge base source content to OpenAI.
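For example, a minimal sketch of that, assuming you already have the retrieved chunk text (e.g. from a separate Trieve search) and simply prepend it as extra system context; the wrapper wording is illustrative, not anything Vapi-specific:

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Sketch: prepend retrieved knowledge-base chunks as extra system context
// before asking OpenAI for the next assistant message. How the chunks are
// obtained (e.g. a separate Trieve search call) is outside this snippet.
function withKnowledgeChunks(
  messages: ChatMessage[],
  chunks: string[]
): ChatMessage[] {
  if (chunks.length === 0) return messages;
  const kbContext: ChatMessage = {
    role: "system",
    content: `Relevant knowledge base excerpts:\n${chunks.join("\n---\n")}`,
  };
  return [kbContext, ...messages];
}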
u
@Shubham Bajaj How will I find the chunks of knowledge base source content that were used in a call? For example, for one call (763a2965-d7bd-4284-bf5f-c9174d92c5fd), the messagesOpenAIFormatted are like:
[
  {
    "content": "you are intelligent",
    "role": "system"
  },
  {
    "content": "Hello. Ask me the knowledge.",
    "role": "assistant"
  },
  {
    "content": "How does temporal work?",
    "role": "user"
  },
  {
    "content": "Temporal is an open source durable execution platform designed to simplify the creation of scalable distributed systems. It automates the saga pattern, which is essential for ensuring state in complex, multistep business transactions. Temporal abstracts away the complexity by autosaving application state for long running processes and handling failure and retry logic. Developers only need to focus on programming the business rules and compensation behavior within temporal workflows. While temporal takes care of scheduling. Retry behavior, and tracking progress. This makes implementing the saga pattern easy, safe, and reliable.",
    "role": "assistant"
  }
]
The last assistant message here leverages the knowledge base (646be61c-5b1e-43f3-ab3a-493bee4d06d9, created using the Vapi Trieve integration). I used the API GET https://api.vapi.ai/call/763a2965-d7bd-4284-bf5f-c9174d92c5fd to get the call data, but I do not see the chunks that were sent to OpenAI anywhere.
To clarify my objective: for the eval I'm creating, I'll remove the last assistant message from the message chain, send the rest to OpenAI, and assert that the new response from OpenAI is similar to what I expect. The eval will run anytime we change the assistant's system prompt, to ensure that the required functionality is maintained across changes.
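For context, a sketch of that eval loop under a few assumptions: the call is fetched with GET /call/{id} using a Vapi API key, messagesOpenAIFormatted is read from the call payload (check where exactly it appears in your call object), and isSimilar is a naive placeholder for whatever comparison you actually use (embeddings, LLM-as-judge, string matching):

import OpenAI from "openai";

const openai = new OpenAI();

async function evalCall(callId: string, expected: string): Promise<boolean> {
  // Fetch the call and pull out the OpenAI-formatted transcript.
  const res = await fetch(`https://api.vapi.ai/call/${callId}`, {
    headers: { Authorization: `Bearer ${process.env.VAPI_API_KEY}` },
  });
  const call = await res.json();
  // Assumption about where the field lives; adjust to your call payload.
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] =
    call.artifact?.messagesOpenAIFormatted ?? call.messagesOpenAIFormatted ?? [];

  // Drop the final assistant message and ask the model to regenerate it.
  const history = messages.slice(0, -1);
  const completion = await openai.chat.completions.create({
    model: "gpt-4o", // placeholder; match the assistant's configured model
    messages: history,
  });
  const regenerated = completion.choices[0].message.content ?? "";

  return isSimilar(regenerated, expected);
}

// Naive placeholder; swap in embeddings or an LLM-as-judge comparison.
function isSimilar(a: string, b: string): boolean {
  return a.trim().toLowerCase() === b.trim().toLowerCase();
}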
a
Great question...
I know for a fact that within Trieve's own interface I can see the chunks that were retrieved, so integrating that should not be a problem, right?
u
Yeah, if I can get the chunks that were sent, I can easily use them when making calls for the eval. Ideally, it would be better to have them through a Vapi endpoint instead of making another call to Trieve.
s
@udAIan Yeah, for now we don't expose the KB chunks through call logs; let me check with the team if we can expose this as well.
a
@Shubham Bajaj On this note, can you share how the KB is invoked via the agent in VAPI? Is it the same as a tool call? If so, how is the tool configured (i.e., what parameters are defined and how is the tool described)?
s
@arash With VAPI, the LLM decides when to invoke the Knowledge Base (KB). After that, the user query is fed to the KB provider, which performs a similarity search and returns relevant file content chunks. Finally, we apply our RAG template on top of those results using the LLM, and that is how the response is generated. It's not related to tool calling. Feel free to ask more about it.
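As a conceptual illustration of that flow only (not Vapi's actual code; every helper below is a hypothetical stand-in):

// Conceptual outline of the flow described above; the declared helpers are
// hypothetical stand-ins, not real Vapi or Trieve APIs.
declare function modelWantsKnowledgeBase(query: string): Promise<boolean>;
declare function similaritySearch(query: string): Promise<string[]>; // KB provider search
declare function generateReply(prompt: string): Promise<string>;     // LLM call

async function answerWithKB(userQuery: string): Promise<string> {
  // 1. The LLM decides whether the system prompt already covers the question.
  if (!(await modelWantsKnowledgeBase(userQuery))) {
    return generateReply(userQuery);
  }
  // 2. The query is fed to the KB provider, which returns relevant chunks.
  const chunks = await similaritySearch(userQuery);
  // 3. A RAG template is applied on top of the chunks, and the LLM answers.
  const ragPrompt =
    `Answer using the context below.\n\nContext:\n${chunks.join("\n---\n")}\n\nQuestion: ${userQuery}`;
  return generateReply(ragPrompt);
}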
a
@Shubham Bajaj That is helpful, and I spoke with the guys from Trieve as well, so I have a better understanding of how the RAG template is being used. A few follow-ups: 1. Why then do I not see the RAG template + retrieved chunks in messagesOpenAIFormatted? Are you removing the modified user message with the RAG template when you show it to the user? If you can provide this raw user/assistant query with the chunks injected, OR the chunks themselves, that would be very helpful for our evals (this is ultimately what @udAIan is looking for). 2. When you say the "LLM decides when to invoke the Knowledge Base (KB)", what is the assistant or user message you are injecting to make this determination? Or are you calling the KB provider's search for every user message and, if there are relevant chunks, applying the RAG template?
s
@arash 1. Regarding the RAG template and chunk visibility: currently, we don't expose the KB query chunks in messagesOpenAIFormatted. I'll check with the team about the ETA for implementing this functionality. 2. Regarding KB invocation: the LLM first checks if relevant information exists in the system prompt. If it finds the necessary information there, it generates the response directly. If not, the LLM decides to use the KB, and we proceed with retrieving chunks and applying the RAG template.
a
Thanks @Shubham Bajaj ! Kindly keep this thread posted re: ETA, when you have it. Thanks again!