Test Automation With Vapi
# support
b
As with any production system, we'll need some form of automated testing so we can ensure there are no serious regressions when we edit the prompt, swap the LLM, etc.

1. What does the VAPI team recommend for automated testing? I appreciate we could use the "analysis" feature to evaluate entire transcripts after a call, but that is human-in-the-loop testing, and I'd prefer something that can be run by CI on each change to the prompt.

2. Is there a way we can call just the LLM without incurring voice-related (STT, TTS) charges, whilst still working through the VAPI framework? I think if we do not use the new "Blocks" functionality, we could just use the vanilla API of the LLM to write automated tests of the agent's LLM behaviour against different inputs (e.g. asserting that when the user asks X, it responds Y). However, if we use the new "Blocks" functionality, I am unclear how we would emulate it outside of VAPI in automated tests.
v
@Ben

1. For automated testing, VAPI recommends implementing a testing framework that can simulate voice interactions and validate the responses from your voice application. You can create mock requests and expected responses to test your application's logic, and integrate this into your continuous integration (CI) pipeline so tests run automatically with each change to the prompt or underlying logic. Depending on your language, you can use unit testing frameworks like Jest, Mocha, or others to write and run these tests.

2. Yes, you can interact with the LLM without incurring voice-related charges by using the API to send text-based requests directly to the LLM, bypassing the STT (Speech-to-Text) and TTS (Text-to-Speech) components. This lets you test the logic and responses of the LLM within the VAPI framework. If you're using the "Blocks" functionality, you would need to simulate the Blocks environment in your tests, e.g. by creating a mock environment that replicates the behaviour of Blocks so you can test the LLM's responses to different inputs as if it were running within the actual VAPI framework. If you need more specific guidance on setting up such a testing environment or on interacting with the LLM directly for testing, please ask support for more info.
You can continue this conversation by mentioning me in the message.
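As an illustration of the text-only approach described above, here is a minimal sketch of a CI-friendly regression test that talks to the LLM directly and skips STT/TTS. It assumes you have copied the assistant's system prompt into the test; the model name, prompt, and assertion are placeholders, not a Vapi-provided API.

```python
# Hypothetical sketch: text-only regression test for the assistant's LLM logic,
# bypassing STT/TTS. The system prompt is assumed to be a copy of the one
# configured on the Vapi assistant; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "You are a claims assistant. ..."  # copy of the assistant's prompt

def ask(user_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; use whichever model the assistant is configured with
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        temperature=0,  # near-deterministic output keeps CI assertions more stable
    )
    return response.choices[0].message.content

def test_open_claims_intent():
    reply = ask("Which claims are open?")
    # Assert on intent/keywords rather than exact wording, since LLM output varies.
    assert "claim" in reply.lower()
```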
b
I see from the linked thread that we should avoid using Blocks for anything serious as it's undergoing a complete rework, OK. Still, I'd be interested in the Vapi team's thoughts on best practices for automated testing with VAPI
@User
s
I'd be interested in knowing this as well.
s
@Ben You could have one bot call another bot and use a model like GPT4-mini to review the transcript. Currently, we're working on testing modules - specifically automated testing for voice AI agents and LLM text-based testing. For now, keeping a human in the loop seems to be the best approach, but I'd love to hear about any experiments you've done and potential solutions you've considered.
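A rough sketch of the bot-reviews-transcript idea above: after a simulated call, hand the transcript to a small, cheap model acting as a judge. The model name and rubric below are illustrative assumptions, not part of Vapi.

```python
# Hypothetical sketch of transcript review by an LLM judge.
from openai import OpenAI

client = OpenAI()

RUBRIC = """You are reviewing a voice-agent transcript.
Answer PASS if the agent listed the caller's open claims when asked, otherwise FAIL.
Reply with a single word: PASS or FAIL."""

def review_transcript(transcript: str) -> bool:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # a small, cheap model is usually enough for grading
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": transcript},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper().startswith("PASS")
```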
b
Thanks for your response, it's great to hear you are considering more in this area, as it seems to be one area where VAPI presents challenges if we proceed to production. I think if there were a way to dump the precise prompt being provided on VAPI's side to the LLM, then we'd be able to do any type of testing using existing third-party tools.

Keeping a human in the loop is always going to be needed, but I'm conscious that if we want to e.g. switch out the LLM model, we'd need to re-evaluate all possible paths of the conversation, which would be very expensive - if we can verify some paths are working as expected automatically, I could see it being very useful. I'm keen to keep the tests fast and cheap so I can run them regularly without racking up a huge CI bill, and I'm conscious that if we need to go through STT and TTS each time, that adds a lot (as well as needing to wait in real time for calls to proceed, etc.).

For vanilla LLM prompt testing of conversational agents, I'm considering this framework via directly messaging the LLM: https://awslabs.github.io/agent-evaluation/
```yaml
tests:
  get_open_claims_with_details:
    steps:
    - Ask the agent which claims are open.
    - Ask the agent for details on claim-006.
    expected_results:
    - The agent returns a list of open claims.
    - The agent returns the details on claim-006.
```
I think I could use this in the simple case of a VAPI assistant with a vanilla prompt, but it would get more difficult if we used e.g. the VAPI Blocks feature, as it's unclear what the prompt would be to emulate that via a direct LLM call
s
@Ben Thanks for sharing the agent-eval tool. I am completely aligned with you, and in fact, we will have something for testing VAPI Blocks with as much transparency as possible.
b
@Shubham Bajaj For the time being, suppose I wanted to replicate the same LLM being used by Vapi in isolation from the VAPI orchestration layer and STT/TTS. If I do a VAPI GET assistant API call (where the tools are specified via tools= rather than tools-id, so we see the tool details), it shows many model details and the tool details. Would this be enough detail to reconstruct an OpenAI model using their API, or are there 'hidden' prompts/messages injected by the VAPI orchestration layer when communicating with it? I imagine e.g. if emotion detection is enabled that might inject some extra items, or if we use the default endCall tool, I think GET assistant doesn't show the details of what prompt/tool config is being used
s
@Ben There are no hidden prompts or messages injected. We convert tools to functions internally, and you can get the complete tools configuration from the GET /tool/:id endpoint if a tool ID is used. The reserved tools are converted into utility functions internally, so everything is transparent - just pure logic and engineering.
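Given that confirmation, one possible way to replay the assistant's model configuration against the provider directly might look like the sketch below. The field names (model.model, model.messages, model.tools) reflect one reading of the GET /assistant/:id response and should be verified against your own payload; Vapi's tool objects may also carry extra fields that would need stripping before being passed to OpenAI.

```python
# Hypothetical sketch: fetch the assistant's model config from Vapi and replay it
# against the OpenAI API directly, per the confirmation above that nothing extra
# is injected. Assistant ID and field names are assumptions to verify.
import os
import requests
from openai import OpenAI

VAPI_API_KEY = os.environ["VAPI_API_KEY"]
ASSISTANT_ID = "your-assistant-id"  # placeholder

assistant = requests.get(
    f"https://api.vapi.ai/assistant/{ASSISTANT_ID}",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    timeout=30,
).json()

model_cfg = assistant["model"]
system_messages = model_cfg.get("messages", [])

client = OpenAI()

def ask(user_message: str) -> str:
    """Send one text turn to the same model/prompt the assistant is configured with."""
    response = client.chat.completions.create(
        model=model_cfg["model"],
        messages=system_messages + [{"role": "user", "content": user_message}],
        # Tools are omitted here for brevity; convert and pass model_cfg.get("tools")
        # if the behaviour under test involves tool calls.
    )
    return response.choices[0].message.content

print(ask("Which claims are open?"))
```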
b
I see! Okies, so it sounds like I can probably replicate the LLM in isolation then and test it with more unitary tests of the core logic