[Need Help] Custom LLM. Multiple Assistant message...
# support
p
I'm using https://github.com/VapiAI/advanced-concepts-custom-llm, but with PydanticAI and Groq (I don't think that matters). I can make simple chat work with these; what I can't do is this:
- Assistant says its initial message, defined in the UI
- User explains their problem
- I want the assistant to say "One moment", as there is an agentic workflow to craft the answer and it might take a few seconds
- I want the agentic AI to respond with the crafted answer about the user's problem
  - This may include further questions for the user, so the loop continues

I can do one or the other ("one minute please" or the agentic response), but not both. I guess once I send one of the responses back to Vapi, Vapi expects user input. How do I send two or more assistant messages, have them spoken to the user, and then wait for the user's answer?
v
Hi there! 👋 I understand you're trying to have your assistant say "One moment" before delivering an agentic response, without requiring user input in between. I can help with this! Based on the information I found, there are several approaches you can try:

### 1. Using the `triggerResponseEnabled` Flag
You can use the `triggerResponseEnabled` flag to control whether a message triggers an LLM response:
```javascript
vapi.send({
  type: "add-message",
  message: {
    role: "assistant",
    content: "One moment, I'm thinking...",
  },
  triggerResponseEnabled: false // This prevents waiting for user input
});

// Then after your agentic processing is complete:
vapi.send({
  type: "add-message",
  message: {
    role: "assistant",
    content: "Here's your answer...",
  }
});
```
### 2. Using the `.send()` Method
During a call, you can send intermediate messages using the `.send()` method:
```javascript
vapi.send({
  type: "add-message",
  message: {
    role: "system",
    content: "The assistant should say 'One moment' and then continue without waiting for user input",
  },
});
```
### 3. Sending Silent System Messages Server-Side
According to [information I found](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1339634926248525935), you can "send silent system messages server side" using a "control url" to inject messages from the backend.

### 4. Using Webhooks as a Workaround
Some users have reported success [using webhooks instead of the Vapi module](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334661059298066514) to send multiple messages.

### 5. Building Workflows with Multiple "Say" Tasks
If you're using Vapi's workflow builder, you can add multiple "say" tasks to send multiple messages in sequence, according to the [Vapi workflows documentation](https://docs.vapi.ai/workflows).

### 6. Adjusting Conversation Flow in Your Script
You can modify your script to include "additional prompts or logic that triggers the next part of the conversation automatically, without waiting for a response", as mentioned in [this discussion](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1335054553250533439).

Would you like me to elaborate on any of these approaches?

Sources:
- [Vapi Web SDK Documentation](https://docs.vapi.ai/sdk/web)
- [Discord Discussion on Turn-Taking Behavior](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1335054553250533439)
- [Discord Discussion on Silent System Messages](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1339634926248525935)
- [Discord Discussion on Webhooks](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334661059298066514)
- [Vapi Workflows Documentation](https://docs.vapi.ai/workflows)
p
I want it done from server side with https://github.com/VapiAI/advanced-concepts-custom-llm
bump
s
@Pigeon let me check what can be done.
k
1. **Use streaming responses in your custom-llm webhook:**
   - Immediately stream "One moment" as the first part of your response
   - Keep the connection open (don't send the stop signal)
   - Continue processing your agentic workflow
   - Stream the full response when ready
   - Finally, send the stop signal
2. **Implementation would look like:**
   - Return a streaming response using OpenAI's format
   - First chunk contains "One moment"
   - Subsequent chunks contain your full response
   - This creates the appearance of a continuous assistant message
3. **Benefits:**
   - No need to track conversation state
   - No complex auto-triggering mechanism
   - Works naturally with Vapi's existing architecture
   - Appears as a single assistant turn from Vapi's perspective

This approach leverages the streaming nature of LLM responses, which Vapi is already designed to handle. Your custom-llm endpoint just needs to properly implement the OpenAI-compatible streaming format.
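For illustration, here is a minimal sketch of that pattern as an OpenAI-compatible streaming endpoint (FastAPI here; the `/chat/completions` path, the model name, and the `slow_agentic_workflow` placeholder are assumptions for the example, not anything the Vapi repo mandates):

```python
import asyncio
import json
import time
import uuid

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()


def sse_chunk(completion_id: str, content: str = "", finish_reason=None) -> str:
    """Format one OpenAI-compatible chat.completion.chunk as an SSE event."""
    delta = {"role": "assistant", "content": content} if content else {}
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": "custom-llm",
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(payload)}\n\n"


async def slow_agentic_workflow(messages) -> str:
    # Stand-in for the real PydanticAI/Groq workflow that takes a few seconds.
    await asyncio.sleep(4)
    return "Here is the crafted answer to your problem."


@app.post("/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()
    completion_id = f"chatcmpl-{uuid.uuid4()}"

    async def stream():
        # 1. Send the filler text right away and keep the connection open.
        yield sse_chunk(completion_id, "One moment please. ")
        # 2. Run the slow agentic workflow while the stream stays open.
        answer = await slow_agentic_workflow(body.get("messages", []))
        # 3. Stream the crafted answer, then the stop chunk and the [DONE] terminator.
        yield sse_chunk(completion_id, answer)
        yield sse_chunk(completion_id, finish_reason="stop")
        yield "data: [DONE]\n\n"

    return StreamingResponse(stream(), media_type="text/event-stream")
```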
s
@Pigeon Let me know if you require further help, happy to contribute.
p
So I've sent a custom message like:

```python
content_only = f"data:{json.dumps(first_chunk.model_dump())}\n\n"
```

which doesn't have any end markers. Then the LLM workflow generated another message with `stop` and `[DONE]`.

Result: after the initial Vapi message there is a long wait as before, and then "One moment" + the LLM message come at the same time. So no "One moment" straight away with the LLM message a few seconds later; everything is TTSed after a few seconds, probably when `[DONE]` and `stop` are sent.

If I send the custom message like:

```python
content_only = f"data: {json.dumps(first_chunk.model_dump())}\n\ndata: [DONE]\n\n"
```

which sends `[DONE]` but not `stop`, there is nothing initially, and after the LLM runs I only hear "One moment".

And if I send the custom message with both `stop` and `[DONE]`, I get "One moment" only after the LLM workflow as well.

I don't think Vapi starts TTS without `[DONE]` and/or `stop`? Can you confirm: once Vapi starts getting the SSE stream, does it send received chunks straight to TTS, or does it only send them after receiving `[DONE]` and/or `stop`? Maybe I'm sending "One moment" the wrong way?
k
Hey, apologies for the delay. Could you share the call ID for the mentioned issues so I can take a look at what chunk stream was sent out to us and what action was associated with it?
p
Hey, yep:

1. Sending a `"One moment please."` chunk before running the LLM workflow, which sends its own chunk afterwards; no interference between them, I think. Call ID: `71b7e0d2-6fb8-4380-b20c-b43b7e9637d8`

In my terminal I see:
```
-----------Message chunk:
data: {"id": "x", "object": "chat.completion.chunk", "created": x, "model": "llama-3.3-70b-versatile", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "One moment please."}, "finish_reason": null}]}
```
From code:
```python
async for chunk in self._send_message(
    "One moment please.",
    completion_id,
    vapi_payload,
    send_stop=False,
    send_end_only=False,
):
    print("-----------Message chunk:")
    print(chunk)
    yield chunk
```
So it created:
```python
first_chunk = ChatCompletionChunk(
    id=completion_id,
    model=model_name,
    choices=[Choice(delta=DeltaContent(role="assistant", content=message))],
)
```
And it is sent/streamed like this:
```python
content_only = f"data: {json.dumps(first_chunk.model_dump())}\n\n"
yield content_only
```
So no `[DONE]` and no final chunk with `stop`. All of this is called before running the LLM workflow, which takes 3-5 seconds. Then the LLM workflow generates a response and it is sent like this:
```python
final_chunk = ChatCompletionChunk(
    id=completion_id,
    model=model_name,
    choices=[Choice(delta=DeltaContent(), finish_reason="stop")],
)

combined = f"data: {json.dumps(first_chunk.model_dump())}\n\ndata: {json.dumps(final_chunk.model_dump())}\n\ndata: [DONE]\n\n"
yield combined
```
So the LLM response is sent with a final chunk that has `[DONE]` and `finish_reason="stop"`.

Result: I hear "One moment please." AFTER the LLM response is sent, together with the LLM response. I could be doing something wrong; it would be amazing if this worked, as I'd have everything I need to build AI agents for a huge logistics company with Vapi.

It kinda seems that Vapi doesn't do TTS without getting `[DONE]` or something, but I might be wrong. Would be nice to have some `[INTERRUPT]` marker or something for that case.
k
When sending responses from your Custom LLM, insert the flush tag at points where you want text to be immediately processed by TTS:
```
Hello, I'm your virtual assistant.<flush />
Let me check that information for you.<flush />
I've found what you're looking for.
```
This will cause "Hello, I'm your virtual assistant." to be immediately sent to TTS, followed by "Let me check that information for you." as soon as it's generated, without waiting for the rest of the response. When VAPI detects this tag, it immediately sends all text before the tag to TTS. Give it a try and let me know how it goes for you.
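For instance, in a streaming setup like the sketch earlier in this thread, the tag just goes inside the streamed content; `sse_chunk` and `run_workflow` below are the illustrative helpers from that sketch, not part of Vapi's API:

```python
async def stream_with_flush(completion_id, run_workflow):
    # "<flush />" asks Vapi to hand the text before it to TTS immediately,
    # instead of buffering until the stop chunk / [DONE] terminator arrives.
    yield sse_chunk(completion_id, "One moment please.<flush />")
    answer = await run_workflow()          # the slow agentic step (3-5 s)
    yield sse_chunk(completion_id, answer)
    yield sse_chunk(completion_id, finish_reason="stop")
    yield "data: [DONE]\n\n"
```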
p
Yes, it's a lot better. Not as soon as I'd like, but I guess that's mostly network/TTS latency. Thanks a lot, @Shubham Bajaj!!!
k
Marking this ticket as Solved ✅