[Need Help] Custom LLM. Multiple Assistant message...
# support
p
I'm using https://github.com/VapiAI/advanced-concepts-custom-llm, but with PydanticAI and Groq (I don't think that matters). I can make simple chat work with these; what I can't do is this:
- Assistant says its initial message, defined in the UI
- User explains their problem
- I want the assistant to say "One moment", as there is an agentic workflow to craft the answer and it might take a few seconds
- I want the agentic AI to respond with the crafted answer about the user's problem
  - This may include further questions for the user, so the loop continues

I can do one or the other ("one minute please" or the agentic response), but not both. I guess once I send one of the responses back to Vapi, Vapi expects user input. How do I send two or more assistant messages, have them spoken to the user, and then wait for the user's answer?
v
Hi there! 👋 I understand you're trying to have your assistant say "One moment" before delivering an agentic response, without requiring user input in between. I can help with this! Based on the information I found, there are several approaches you can try:

### 1. Using the `triggerResponseEnabled` Flag
You can use the `triggerResponseEnabled` flag to control whether a message triggers an LLM response:
```javascript
vapi.send({
  type: "add-message",
  message: {
    role: "assistant",
    content: "One moment, I'm thinking...",
  },
  triggerResponseEnabled: false // This prevents waiting for user input
});

// Then after your agentic processing is complete:
vapi.send({
  type: "add-message",
  message: {
    role: "assistant",
    content: "Here's your answer...",
  }
});
```
### 2. Using the `.send()` Method
During a call, you can send intermediate messages using the `.send()` method:
```javascript
vapi.send({
  type: "add-message",
  message: {
    role: "system",
    content: "The assistant should say 'One moment' and then continue without waiting for user input",
  },
});
```
### 3. Sending Silent System Messages Server-Side
According to [information I found](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1339634926248525935), you can "send silent system messages server side" using a "control url" to inject messages from the backend.

### 4. Using Webhooks as a Workaround
Some users have reported success [using webhooks instead of the Vapi module](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334661059298066514) to send multiple messages.

### 5. Building Workflows with Multiple "Say" Tasks
If you're using Vapi's workflow builder, you can add multiple "say" tasks to send multiple messages in sequence, according to the [Vapi workflows documentation](https://docs.vapi.ai/workflows).

### 6. Adjusting Conversation Flow in Your Script
You can modify your script to include "additional prompts or logic that triggers the next part of the conversation automatically, without waiting for a response", as mentioned in [this discussion](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1335054553250533439).

Would you like me to elaborate on any of these approaches?

Sources:
- [Vapi Web SDK Documentation](https://docs.vapi.ai/sdk/web)
- [Discord Discussion on Turn-Taking Behavior](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1335054553250533439)
- [Discord Discussion on Silent System Messages](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1339634926248525935)
- [Discord Discussion on Webhooks](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334661059298066514)
- [Vapi Workflows Documentation](https://docs.vapi.ai/workflows)
p
I want it done from server side with https://github.com/VapiAI/advanced-concepts-custom-llm
bump
s
@Pigeon let me check what can be done.
k
1. **Use streaming responses in your custom-llm webhook:**
   - Immediately stream "One moment" as the first part of your response
   - Keep the connection open (don't send the stop signal)
   - Continue processing your agentic workflow
   - Stream the full response when ready
   - Finally, send the stop signal
2. **Implementation would look like:**
   - Return a streaming response using OpenAI's format
   - First chunk contains "One moment"
   - Subsequent chunks contain your full response
   - This creates the appearance of a continuous assistant message
3. **Benefits:**
   - No need to track conversation state
   - No complex auto-triggering mechanism
   - Works naturally with Vapi's existing architecture
   - Appears as a single assistant turn from Vapi's perspective

This approach leverages the streaming nature of LLM responses, which Vapi is already designed to handle. Your custom-llm endpoint just needs to properly implement the OpenAI-compatible streaming format.
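For illustration, here is a minimal sketch of that pattern as an OpenAI-compatible streaming endpoint (FastAPI here; the `/chat/completions` path, the model name, and the `slow_agentic_workflow` placeholder are assumptions for the example, not anything the Vapi repo mandates):

```python
import asyncio
import json
import time
import uuid

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()


def sse_chunk(completion_id: str, content: str = "", finish_reason=None) -> str:
    """Format one OpenAI-compatible chat.completion.chunk as an SSE event."""
    delta = {"role": "assistant", "content": content} if content else {}
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": "custom-llm",
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(payload)}\n\n"


async def slow_agentic_workflow(messages) -> str:
    # Stand-in for the real PydanticAI/Groq workflow that takes a few seconds.
    await asyncio.sleep(4)
    return "Here is the crafted answer to your problem."


@app.post("/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()
    completion_id = f"chatcmpl-{uuid.uuid4()}"

    async def stream():
        # 1. Send the filler text right away and keep the connection open.
        yield sse_chunk(completion_id, "One moment please. ")
        # 2. Run the slow agentic workflow while the stream stays open.
        answer = await slow_agentic_workflow(body.get("messages", []))
        # 3. Stream the crafted answer, then the stop chunk and the [DONE] terminator.
        yield sse_chunk(completion_id, answer)
        yield sse_chunk(completion_id, finish_reason="stop")
        yield "data: [DONE]\n\n"

    return StreamingResponse(stream(), media_type="text/event-stream")
```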
s
@Pigeon Let me know if you require further help, happy to contribute.
p
So I've sent a custom message like:

```python
content_only = f"data:{json.dumps(first_chunk.model_dump())}\n\n"
```

which doesn't have any end markers. Then the LLM workflow generated another message with `stop` and `[DONE]`.

Result: after the initial Vapi message there is a long wait as before, and then "One moment" + the LLM message come at the same time. So no "One moment" straight away with the LLM message a few seconds later; everything is TTSed after a few seconds, probably when `[DONE]` and `stop` are sent.

If I send the custom message like:

```python
content_only = f"data: {json.dumps(first_chunk.model_dump())}\n\ndata: [DONE]\n\n"
```

which sends `[DONE]` but not `stop`, there is nothing initially, and after the LLM runs I only hear "One moment".

And if I send the custom message with both `stop` and `[DONE]`, I get "One moment" only after the LLM workflow as well.

I don't think Vapi starts TTS without `[DONE]` and/or `stop`? Can you confirm: once Vapi starts getting the SSE stream, does it send received chunks straight to TTS, or does it only send them after receiving `[DONE]` and/or `stop`? Maybe I'm sending "One moment" the wrong way?
k
Hey, apologies for the delay. Could you share the call ID for the mentioned issues so I can take a look at what chunk stream was sent out to us and what action was associated with it?
p
Hey, yep:

1. Sending a `"One moment please."` chunk before running the LLM workflow, which sends its own chunk afterwards; no interference between them, I think. Call ID: `71b7e0d2-6fb8-4380-b20c-b43b7e9637d8`

In my terminal I see:
```
-----------Message chunk:
data: {"id": "x", "object": "chat.completion.chunk", "created": x, "model": "llama-3.3-70b-versatile", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "One moment please."}, "finish_reason": null}]}
```
From code:
```python
async for chunk in self._send_message(
    "One moment please.",
    completion_id,
    vapi_payload,
    send_stop=False,
    send_end_only=False,
):
    print("-----------Message chunk:")
    print(chunk)
    yield chunk
```
So it created:
```python
first_chunk = ChatCompletionChunk(
    id=completion_id,
    model=model_name,
    choices=[Choice(delta=DeltaContent(role="assistant", content=message))],
)
```
And it is sent/streamed like this:
```python
content_only = f"data: {json.dumps(first_chunk.model_dump())}\n\n"
yield content_only
```
So no `[DONE]` and no final chunk with `stop`. All of this is called before running the LLM workflow, which takes 3-5 seconds. Then the LLM workflow generates a response and it is sent like this:
```python
final_chunk = ChatCompletionChunk(
    id=completion_id,
    model=model_name,
    choices=[Choice(delta=DeltaContent(), finish_reason="stop")],
)

combined = f"data: {json.dumps(first_chunk.model_dump())}\n\ndata: {json.dumps(final_chunk.model_dump())}\n\ndata: [DONE]\n\n"
yield combined
```
So the LLM response is sent with a final chunk that has `[DONE]` and `finish_reason="stop"`.

Result: I hear "One moment please." AFTER the LLM response is sent, together with the LLM response. I could be doing something wrong; it would be amazing if this worked, as I'd have everything I need to build AI agents for a huge logistics company with Vapi.

It kinda seems that Vapi doesn't do TTS without getting `[DONE]` or something, but I might be wrong. Would be nice to have some `[INTERRUPT]` marker or something for that case.
k
When sending responses from your Custom LLM, insert the flush tag at points where you want text to be immediately processed by TTS:
```
Hello, I'm your virtual assistant.<flush />
Let me check that information for you.<flush />
I've found what you're looking for.
```
This will cause "Hello, I'm your virtual assistant." to be immediately sent to TTS, followed by "Let me check that information for you." as soon as it's generated, without waiting for the rest of the response. When VAPI detects this tag, it immediately sends all text before the tag to TTS. Give it a try and let me know how it goes for you.
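For instance, in a streaming setup like the sketch earlier in this thread, the tag just goes inside the streamed content; `sse_chunk` and `run_workflow` below are the illustrative helpers from that sketch, not part of Vapi's API:

```python
async def stream_with_flush(completion_id, run_workflow):
    # "<flush />" asks Vapi to hand the text before it to TTS immediately,
    # instead of buffering until the stop chunk / [DONE] terminator arrives.
    yield sse_chunk(completion_id, "One moment please.<flush />")
    answer = await run_workflow()          # the slow agentic step (3-5 s)
    yield sse_chunk(completion_id, answer)
    yield sse_chunk(completion_id, finish_reason="stop")
    yield "data: [DONE]\n\n"
```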
p
Yes, it's a lot better. Not as soon as I'd like, but I guess that's mostly network/TTS latency. Thanks a lot, @Shubham Bajaj!!!
k
Marking this ticket as Solved ✅