Multiple input messages to custom LLM
# support
m
Hello 👋 I have a weird behaviour with messages sent to my custom LLM. Sometimes I receive multiple messages for the same input, each one with a partial transcription, until I receive the complete one. Looking at the conversation logs I can see just one message, so I can't understand why this is happening. For example, in conversation https://dashboard.vapi.ai/calls/2659f63c-9541-4f8e-9c48-9c735a3587f5 the last user message was sent as described (in multiple incremental requests). Is there any option I can disable, or some metadata I can look at to understand whether the message is complete or not? Thanks for your support
v
You can continue this conversation by mentioning me in the message.
@Marco Falcone Hello! 😊 Ah, I see what you're describing. It sounds like you're experiencing an issue where you receive fragmented transcriptions before getting the full message. This can happen due to the way real-time transcription works, where the system tries to provide immediate feedback as the user is speaking. To address this, you might want to check if there are any settings related to real-time or streaming transcription in your custom LLM setup. Sometimes, there are options to adjust the behavior of the transcription service, such as setting a longer delay before the transcription is sent, to allow for more complete messages. Regarding metadata, many transcription services include a flag or a field in the metadata that indicates whether the transcription is final or interim. Interim transcriptions are those partial updates you mentioned, while a final transcription means the system believes the user has finished speaking. You should review the documentation of your LLM or the API responses to identify any such fields. If you're still unsure or can't find the relevant settings or metadata, please reach out to the support team for more detailed assistance on this matter. They should be able to guide you through the specific configuration of your custom LLM to resolve this issue. Would you like me to help you with anything else regarding this issue?
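For illustration of the interim/final flag described above: Deepgram (which turns out to be the transcriber in use later in this thread) marks each live-streaming result with `is_final` and `speech_final` fields. A trimmed sketch of one streaming result, with invented values (shape per Deepgram's live API docs; real responses carry more fields):
```json
{
  "type": "Results",
  "is_final": false,
  "speech_final": false,
  "channel": {
    "alternatives": [
      { "transcript": "i want a piece", "confidence": 0.93 }
    ]
  }
}
```
An interim result arrives with `is_final: false`; the final pass repeats the text with `is_final: true`.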
m
@Vapi I use Deepgram as the transcriber but I can't see any option in the dashboard. How can I tell it to send only complete transcriptions?
v
@Marco Falcone Using Deepgram as your transcriber, you can control the behavior of the transcription results through the API parameters when setting up your transcription request. To ensure that you only receive the complete transcription, you should look into the `interim_results` parameter. When you set `interim_results` to `false`, Deepgram will only send the final transcription result, rather than sending incremental updates as the audio is being processed. This should prevent the issue of receiving multiple partial transcriptions. Here's an example of how you might set this parameter in your API request:
```json
{
  "interim_results": false
}
```
Make sure to include this parameter in your transcription request configuration. If you're not directly handling the API requests and are using a dashboard or an integration, you may need to look for a setting that corresponds to this parameter or reach out to support for assistance in configuring your transcriptions to send only complete results. If you need further assistance with setting this up or have any other questions, please let me know!
s
@Marco Falcone If you look at the screenshots, due to endpointing and the user's input pattern, your model receives requests from the Vapi side twice for similar user input, which gives the appearance of multiple inputs. To overcome this, set Deepgram's endpointing to 300ms and startSpeakingPlan waitSeconds to 0.8 seconds. Do let me know your thoughts on this. https://cdn.discordapp.com/attachments/1331202688741933087/1331400256595824670/Screenshot_2025-01-22_at_04.36.32.png?ex=67917a9d&is=6790291d&hm=b9a1d353f680076600ecc7563141356323a2ee503f294e60b4e4de4ba5d44546& https://cdn.discordapp.com/attachments/1331202688741933087/1331400257208062003/Screenshot_2025-01-22_at_04.37.01.png?ex=67917a9d&is=6790291d&hm=9947b87f4736c2048edd1afabb600c4a6b783663f9180fe0c6d508a3846998ba&
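A minimal sketch of the assistant configuration this advice maps to (field names assumed from the Vapi assistant API, e.g. in a PATCH to the assistant; verify against your own setup):
```json
{
  "transcriber": {
    "provider": "deepgram",
    "endpointing": 300
  },
  "startSpeakingPlan": {
    "waitSeconds": 0.8
  }
}
```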
m
Thank you @Shubham Bajaj, I'll try the suggested configuration. Is there a way to completely disable partial transcriptions, or some value in the input that tells me the input is partial?
s
These are not partial transcriptions. Instead, the conversation with the user input is sent to generate an LLM response when the user has stopped speaking, according to the endpointing rule.
m
Ok, but the inputs are incremental. For example, if I say the sentence "i want a piece of cake" I can receive the following inputs:
- i want
- i want a piece
- i want a piece of cake
That's why I refer to them as "partial transcriptions": the later messages include the content of the previous ones.
s
@Marco Falcone could you share the call ID so I could take a look? Also, did you get a chance to test the custom LLM changes?
m
Hi @Shubham Bajaj, the call is the one shared before: https://dashboard.vapi.ai/calls/2659f63c-9541-4f8e-9c48-9c735a3587f5 I tested the configuration changes and noticed an improvement; I just want to be sure I avoid the behaviour completely.
s
@Marco Falcone As visible in the provided screenshots, the user input was duplicated in the last two messages due to the endpointing model, which required additional time to recognize the user's final input. I would appreciate it if you could tell me how often this occurs on your end and, if possible, share a few more call IDs for further analysis. https://cdn.discordapp.com/attachments/1331202688741933087/1333407897270620241/Screenshot_2025-01-27_at_5.09.11_PM.png?ex=6798c860&is=679776e0&hm=173c0cb62edb62d22b14f4095fcac9e7fd0a5b332abd85f070d6607db9383020& https://cdn.discordapp.com/attachments/1331202688741933087/1333407897668943952/Screenshot_2025-01-27_at_5.09.18_PM.png?ex=6798c860&is=679776e0&hm=76f5bad260d80f6af9c21c9d0af8db3c2e999a3188b9c295c4c886b2aa5f457a&
m
I have no more examples since I changed the parameters as suggested. I'd just like to know if there is a way to recognize the partial transcriptions. By endpointing model, do you mean the transcriber?
s
Yes, endpointing relates to the transcriber. You can also use the server event of type transcript, but I personally don't recommend it.
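For reference, a transcript server event looks roughly like this (shape assumed from Vapi's server-message docs; the `transcriptType` field, "partial" or "final", is what distinguishes the two):
```json
{
  "message": {
    "type": "transcript",
    "role": "user",
    "transcriptType": "partial",
    "transcript": "i want a piece"
  }
}
```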
m
Understood, then I think there is nothing more to handle. I'd really like the input message to the LLM to carry some information about the transcription (e.g. partial/complete) in some field (maybe metadata), but it's probably something bothering just me 😅 Thanks again for your kind and precious support 🙏
Hi @Shubham Bajaj, unfortunately the problem happened again with the configuration on my assistant ad661bd8-d418-4408-ba01-c2d4825740d4 Some conversation examples:
- 6f3feff6-7479-4424-b2c0-61e46bcb5cab
- 38483201-42fc-44ef-bf6e-370633c00a41
- d31b03ff-5716-489b-80f0-e8abedf99042
s
@Marco Falcone let's connect over a call and get this fixed for you. Works for you?
m
Today I'm stuck in other calls 😦 Can you start by telling me whether we still receive multiple transcriptions from Deepgram in all cases? We can solve this issue if we can tell, from the input to our custom LLM, whether the transcription is partial or complete in some way.
s
Call ID: d31b03ff-5716-489b-80f0-e8abedf99042
logs
🔵 16:30:27:818 CustomLLMRequest Messages: [ { "role": "assistant", "content": "Buongiorno, sono l'assistente virtuale di GPI Tieni il codice fiscale a portata di mano, mi occuperò di raccogliere I tuoi dati scrivervi scandire bene il tuo nome e cognome e il luogo e la data di nascita" }, { "role": "user", "content": "Cavolo Sansone brindisi cinque ottobre ottantanove" }, { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." }, { "role": "user", "content": "Ecco il mio codice" } ]
🔵 16:30:28:042 [user] Partial Transcript: S: 0.68603516
🔵 16:30:28:043 [user LOG] Endpointing timeout 1500ms (rule: heuristic)
🔵 16:30:28:043 Completion Request Aborted (#1, provider: custom-llm, model: ndg, region: undefined, credential: true)
🔵 16:30:28:394 [user] Final Transcript: S: 0.5131836
🔵 16:30:28:744 Endpointing Postponed Timeout Based On VAD Speech Detected. Remaining 1500ms... (Won't Log For Next 1s)
🔵 16:30:32:484 CustomLLMRequest Messages: [ { "role": "assistant", "content": "Buongiorno, sono l'assistente virtuale di GPI Tieni il codice fiscale a portata di mano, mi occuperò di raccogliere I tuoi dati scrivervi scandire bene il tuo nome e cognome e il luogo e la data di nascita" }, { "role": "user", "content": "Cavolo Sansone brindisi cinque ottobre ottantanove" }, { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." }, { "role": "user", "content": "ecco il mio codice S ns" } ]
🔵 16:30:33:394 [user] Partial Transcript: otto nove: 0.8364258
🔵 16:30:40:005 [non-user CHECKPOINT] Endpoint timeout triggered, leaving Endpointing Buffer
🔵 16:30:40:009 CustomLLMRequest Messages: [ { "role": "assistant", "content": "Buongiorno, sono l'assistente virtuale di GPI Tieni il codice fiscale a portata di mano, mi occuperò di raccogliere I tuoi dati scrivervi scandire bene il tuo nome e cognome e il luogo e la data di nascita" }, { "role": "user", "content": "Cavolo Sansone brindisi cinque ottobre ottantanove" }, { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." }, { "role": "user", "content": "ecco il mio codice S otto nove R 0 cinque B uno otto 0" } ]
🔵 16:30:41:610 CustomLLMRequest Messages: [ { "role": "assistant", "content": "Buongiorno, sono l'assistente virtuale di GPI Tieni il codice fiscale a portata di mano, mi occuperò di raccogliere I tuoi dati scrivervi scandire bene il tuo nome e cognome e il luogo e la data di nascita" }, { "role": "user", "content": "Cavolo Sansone brindisi cinque ottobre ottantanove" }, { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." }, { "role": "user", "content": "ecco il mio codice S otto nove R 0 cinque B uno otto 0 L" } ]
🔵 16:30:47:174 [user CHECKPOINT] Model sent first output token
🔵 16:31:04:700 CustomLLMRequest Messages: [ { "role": "assistant", "content": "Buongiorno, sono l'assistente virtuale di GPI Tieni il codice fiscale a portata di mano, mi occuperò di raccogliere I tuoi dati scrivervi scandire bene il tuo nome e cognome e il luogo e la data di nascita" }, { "role": "user", "content": "Cavolo Sansone brindisi cinque ottobre ottantanove" }, { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." }, { "role": "user", "content": "ecco il mio codice S otto nove R 0 cinque B uno otto 0 L" }, { "role": "assistant", "content": "Grazie per aver fornito il tuo codice." }, { "role": "user", "content": "Hai bisogno di altro?" } ]
@Marco Falcone As you can see from the logs, the requests sent to your LLM carry no partial/final flag. Instead, based on the endpointing parameters you define and the endpointing model, we decide whether the user has stopped speaking and it's time to send a request to the LLM.
Now what you have to do here is play with the startSpeakingPlan parameters to configure the model according to your requirements, even specific to certain turns of the conversation. At most, I can help you explore the possibilities of startSpeakingPlan.
@Marco Falcone Do let me know your thoughts on this.
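For reference, a sketch of the knobs startSpeakingPlan exposes (field names assumed from Vapi's docs; the values are illustrative, to be tuned per use case):
```json
{
  "startSpeakingPlan": {
    "waitSeconds": 0.8,
    "smartEndpointingEnabled": true,
    "transcriptionEndpointingPlan": {
      "onPunctuationSeconds": 0.1,
      "onNoPunctuationSeconds": 1.5,
      "onNumberSeconds": 0.5
    }
  }
}
```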
m
That's not what we see on our side: we receive multiple incremental messages with partial transcriptions, and each one is a separate input for us.
Could it be some interruption mechanism?
s
What you get on your side are these messages, in the OpenAI messages format: the CustomLLMRequest Messages shown in the logs above.
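Reconstructed from the logs above, each CustomLLMRequest arrives as a plain OpenAI chat-completions body, with the full history repeated and no partial/final marker (messages abbreviated here; the model name and `stream` flag are taken from the logs, other Vapi fields omitted):
```json
{
  "model": "ndg",
  "stream": true,
  "messages": [
    { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." },
    { "role": "user", "content": "ecco il mio codice S ns" }
  ]
}
```
When endpointing fires again on a longer utterance, a second, near-identical request follows with the extended user content, which is exactly the incremental behaviour described.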
@Marco Falcone can you share the logs so I can better understand what's going wrong for you?
@Marco Falcone a gentle reminder to continue this thread.
m
Thanks @Shubham Bajaj 🙏 I'm really busy these days but I'll try to send you some logs asap
s
@Marco Falcone If that's the case, you have to open a new #1211483291191083018 ticket.
m
Ok, I'll open a new ticket as soon as I'm able to produce the required logs
f
I am facing a similar issue, call ID: 9956079c-db54-4bbc-95e5-8a2a4f9ca10c. When I set interim_results, Vapi gives this error: {"message":["transcriber.property interim_results should not exist"],"error":"Bad Request","statusCode":400} The transcripts come in incrementally: "Avocado charm." then "Avocado charm. And club sandwich." then "Avocado charm. And club sandwich. and..." and so on. I have set smart endpointing to 500 but it didn't work.
I also see this bit set to stream: true in the request; how can I make it false, in case it's causing the issue? 'customer': {'number': '+XXXXXXXXX'}, 'credentials': [], 'toolDefinitionsExcluded': [], 'metadata': {}, 'stream': True
@Vapi
k
Hey @faisalrehmanktk, please create a new support ticket for your issue. This ticket is solved/closed, so it's not covered by the SLA and won't receive any responses.