Multiple input messages to custom LLM
# support
m
Hello 👋 I have a weird behaviour with messages sent to my custom LLM. Sometimes I receive multiple messages for the same input, each one with a partial transcription, until I receive the complete one. Looking at the conversation logs I can see just one message, so I can't understand why this is happening. For example, in conversation https://dashboard.vapi.ai/calls/2659f63c-9541-4f8e-9c48-9c735a3587f5 the last user message was sent as described (in multiple incremental requests). Is there any option I can disable, or some metadata I can look at to understand whether the message is complete or not? Thanks for your support
v
You can continue this conversation by mentioning me in the message.
@Marco Falcone Hello! 😊 Ah, I see what you're describing. It sounds like you're experiencing an issue where you receive fragmented transcriptions before getting the full message. This can happen due to the way real-time transcription works, where the system tries to provide immediate feedback as the user is speaking. To address this, you might want to check if there are any settings related to real-time or streaming transcription in your custom LLM setup. Sometimes, there are options to adjust the behavior of the transcription service, such as setting a longer delay before the transcription is sent, to allow for more complete messages. Regarding metadata, many transcription services include a flag or a field in the metadata that indicates whether the transcription is final or interim. Interim transcriptions are those partial updates you mentioned, while a final transcription means the system believes the user has finished speaking. You should review the documentation of your LLM or the API responses to identify any such fields. If you're still unsure or can't find the relevant settings or metadata, please reach out to the support team for more detailed assistance on this matter. They should be able to guide you through the specific configuration of your custom LLM to resolve this issue. Would you like me to help you with anything else regarding this issue?
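For illustration of the interim/final flag described above: Deepgram (which turns out to be the transcriber in use later in this thread) marks each live-streaming result with `is_final` and `speech_final` fields. A trimmed sketch of one streaming result, with invented values (shape per Deepgram's live API docs; real responses carry more fields):
```json
{
  "type": "Results",
  "is_final": false,
  "speech_final": false,
  "channel": {
    "alternatives": [
      { "transcript": "i want a piece", "confidence": 0.93 }
    ]
  }
}
```
An interim result arrives with `is_final: false`; the final pass repeats the text with `is_final: true`.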
m
@Vapi I use Deepgram as the transcriber but I can't see any option in the dashboard. How can I tell it to send only complete transcriptions?
v
@Marco Falcone Using Deepgram as your transcriber, you can control the behavior of the transcription results through the API parameters when setting up your transcription request. To ensure that you only receive the complete transcription, you should look into the `interim_results` parameter. When you set `interim_results` to `false`, Deepgram will only send the final transcription result, rather than sending incremental updates as the audio is being processed. This should prevent the issue of receiving multiple partial transcriptions. Here's an example of how you might set this parameter in your API request:
```json
{
  "interim_results": false
}
```
Make sure to include this parameter in your transcription request configuration. If you're not directly handling the API requests and are using a dashboard or an integration, you may need to look for a setting that corresponds to this parameter or reach out to support for assistance in configuring your transcriptions to send only complete results. If you need further assistance with setting this up or have any other questions, please let me know!
s
@Marco Falcone If you look at the screenshots, due to endpointing and the user's input pattern, your model receives requests from the Vapi side twice for similar user input, which gives the appearance of multiple inputs. To overcome this, set Deepgram's endpointing to 300ms and startSpeakingPlan waitSeconds to 0.8 seconds. Do let me know your thoughts on this. https://cdn.discordapp.com/attachments/1331202688741933087/1331400256595824670/Screenshot_2025-01-22_at_04.36.32.png?ex=67917a9d&is=6790291d&hm=b9a1d353f680076600ecc7563141356323a2ee503f294e60b4e4de4ba5d44546& https://cdn.discordapp.com/attachments/1331202688741933087/1331400257208062003/Screenshot_2025-01-22_at_04.37.01.png?ex=67917a9d&is=6790291d&hm=9947b87f4736c2048edd1afabb600c4a6b783663f9180fe0c6d508a3846998ba&
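A minimal sketch of the assistant configuration this advice maps to (field names assumed from the Vapi assistant API, e.g. in a PATCH to the assistant; verify against your own setup):
```json
{
  "transcriber": {
    "provider": "deepgram",
    "endpointing": 300
  },
  "startSpeakingPlan": {
    "waitSeconds": 0.8
  }
}
```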
m
Thank you @Shubham Bajaj, I'll try the suggested configuration. Is there a way to completely disable partial transcriptions, or some value in the input that tells me the input is partial?
s
These are not partial transcriptions. Instead, the conversation with the user input is sent to generate an LLM response when the user has stopped speaking, according to the endpointing rule.
m
Ok, but the inputs are incremental. For example, if I say the sentence "i want a piece of cake" I can receive the following inputs:
- i want
- i want a piece
- i want a piece of cake
That's why I refer to them as "partial transcriptions": the later messages include the content of the previous ones.
s
@Marco Falcone could you share the call ID so I could take a look? Also, did you get a chance to test the custom LLM changes?
m
Hi @Shubham Bajaj, the call is the one shared before: https://dashboard.vapi.ai/calls/2659f63c-9541-4f8e-9c48-9c735a3587f5 I tested the configuration changes and noticed an improvement; I just want to be sure I avoid the behaviour completely.
s
@Marco Falcone As visible in the provided screenshots, the user input was duplicated in the last two messages due to the endpointing model, which required additional time to recognize the user's final input. I would appreciate it if you could tell me how often this occurs on your end and, if possible, share a few more call IDs for further analysis. https://cdn.discordapp.com/attachments/1331202688741933087/1333407897270620241/Screenshot_2025-01-27_at_5.09.11_PM.png?ex=6798c860&is=679776e0&hm=173c0cb62edb62d22b14f4095fcac9e7fd0a5b332abd85f070d6607db9383020& https://cdn.discordapp.com/attachments/1331202688741933087/1333407897668943952/Screenshot_2025-01-27_at_5.09.18_PM.png?ex=6798c860&is=679776e0&hm=76f5bad260d80f6af9c21c9d0af8db3c2e999a3188b9c295c4c886b2aa5f457a&
m
I have no more examples since I changed the parameters as suggested. I'd just like to know if there is a way to recognize the partial transcriptions. By endpointing model, do you mean the transcriber?
s
Yes, endpointing relates to the transcriber. You can also use the server event of type transcript, but I personally don't recommend it.
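For reference, a transcript server event looks roughly like this (shape assumed from Vapi's server-message docs; the `transcriptType` field, "partial" or "final", is what distinguishes the two):
```json
{
  "message": {
    "type": "transcript",
    "role": "user",
    "transcriptType": "partial",
    "transcript": "i want a piece"
  }
}
```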
m
Understood, then I think there is nothing more to handle. I'd really like the input message to the LLM to carry some information about the transcription (e.g. partial/complete) in some field (maybe metadata), but it's probably something bothering just me 😅 Thanks again for your kind and precious support 🙏
Hi @Shubham Bajaj, unfortunately the problem happened again with the configuration on my assistant ad661bd8-d418-4408-ba01-c2d4825740d4 Some conversation examples:
- 6f3feff6-7479-4424-b2c0-61e46bcb5cab
- 38483201-42fc-44ef-bf6e-370633c00a41
- d31b03ff-5716-489b-80f0-e8abedf99042
s
@Marco Falcone let's connect over a call and get this fixed for you. Works for you?
m
Today I'm stuck in other calls 😦 Can you start by telling me whether we still receive multiple transcriptions from Deepgram in all cases? We can solve this issue if we can tell, from the input to our custom LLM, whether the transcription is partial or complete in some way.
s
Call ID: d31b03ff-5716-489b-80f0-e8abedf99042
logs
🔵 16:30:27:818 CustomLLMRequest Messages: [ { "role": "assistant", "content": "Buongiorno, sono l'assistente virtuale di GPI Tieni il codice fiscale a portata di mano, mi occuperò di raccogliere I tuoi dati scrivervi scandire bene il tuo nome e cognome e il luogo e la data di nascita" }, { "role": "user", "content": "Cavolo Sansone brindisi cinque ottobre ottantanove" }, { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." }, { "role": "user", "content": "Ecco il mio codice" } ]
🔵 16:30:28:042 [user] Partial Transcript: S: 0.68603516
🔵 16:30:28:043 [user LOG] Endpointing timeout 1500ms (rule: heuristic)
🔵 16:30:28:043 Completion Request Aborted (#1, provider: custom-llm, model: ndg, region: undefined, credential: true)
🔵 16:30:28:394 [user] Final Transcript: S: 0.5131836
🔵 16:30:28:744 Endpointing Postponed Timeout Based On VAD Speech Detected. Remaining 1500ms... (Won't Log For Next 1s)
🔵 16:30:32:484 CustomLLMRequest Messages: [ { "role": "assistant", "content": "Buongiorno, sono l'assistente virtuale di GPI Tieni il codice fiscale a portata di mano, mi occuperò di raccogliere I tuoi dati scrivervi scandire bene il tuo nome e cognome e il luogo e la data di nascita" }, { "role": "user", "content": "Cavolo Sansone brindisi cinque ottobre ottantanove" }, { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." }, { "role": "user", "content": "ecco il mio codice S ns" } ]
🔵 16:30:33:394 [user] Partial Transcript: otto nove: 0.8364258
🔵 16:30:40:005 [non-user CHECKPOINT] Endpoint timeout triggered, leaving Endpointing Buffer
🔵 16:30:40:009 CustomLLMRequest Messages: [ { "role": "assistant", "content": "Buongiorno, sono l'assistente virtuale di GPI Tieni il codice fiscale a portata di mano, mi occuperò di raccogliere I tuoi dati scrivervi scandire bene il tuo nome e cognome e il luogo e la data di nascita" }, { "role": "user", "content": "Cavolo Sansone brindisi cinque ottobre ottantanove" }, { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." }, { "role": "user", "content": "ecco il mio codice S otto nove R 0 cinque B uno otto 0" } ]
🔵 16:30:41:610 CustomLLMRequest Messages: [ { "role": "assistant", "content": "Buongiorno, sono l'assistente virtuale di GPI Tieni il codice fiscale a portata di mano, mi occuperò di raccogliere I tuoi dati scrivervi scandire bene il tuo nome e cognome e il luogo e la data di nascita" }, { "role": "user", "content": "Cavolo Sansone brindisi cinque ottobre ottantanove" }, { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." }, { "role": "user", "content": "ecco il mio codice S otto nove R 0 cinque B uno otto 0 L" } ]
🔵 16:30:47:174 [user CHECKPOINT] Model sent first output token
🔵 16:31:04:700 CustomLLMRequest Messages: [ { "role": "assistant", "content": "Buongiorno, sono l'assistente virtuale di GPI Tieni il codice fiscale a portata di mano, mi occuperò di raccogliere I tuoi dati scrivervi scandire bene il tuo nome e cognome e il luogo e la data di nascita" }, { "role": "user", "content": "Cavolo Sansone brindisi cinque ottobre ottantanove" }, { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." }, { "role": "user", "content": "ecco il mio codice S otto nove R 0 cinque B uno otto 0 L" }, { "role": "assistant", "content": "Grazie per aver fornito il tuo codice." }, { "role": "user", "content": "Hai bisogno di altro?" } ]
@Marco Falcone As you can see from the logs, the requests sent to your LLM carry no partial/final flag. Instead, based on the endpointing parameters you define and the endpointing model, we decide whether the user has stopped speaking and it's time to send a request to the LLM.
Now what you have to do here is play with the startSpeakingPlan parameters to configure the model according to your requirements, even specific to certain turns of the conversation. At most, I can help you explore the possibilities of startSpeakingPlan.
@Marco Falcone Do let me know your thoughts on this.
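For reference, a sketch of the knobs startSpeakingPlan exposes (field names assumed from Vapi's docs; the values are illustrative, to be tuned per use case):
```json
{
  "startSpeakingPlan": {
    "waitSeconds": 0.8,
    "smartEndpointingEnabled": true,
    "transcriptionEndpointingPlan": {
      "onPunctuationSeconds": 0.1,
      "onNoPunctuationSeconds": 1.5,
      "onNumberSeconds": 0.5
    }
  }
}
```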
m
That's not what we see on our side: we receive multiple incremental messages with partial transcriptions, and each one is a separate input for us.
Could it be some interruption mechanism?
s
What you get on your side are these messages, in the OpenAI messages format: the CustomLLMRequest Messages shown in the logs above.
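Reconstructed from the logs above, each CustomLLMRequest arrives as a plain OpenAI chat-completions body, with the full history repeated and no partial/final marker (messages abbreviated here; the model name and `stream` flag are taken from the logs, other Vapi fields omitted):
```json
{
  "model": "ndg",
  "stream": true,
  "messages": [
    { "role": "assistant", "content": "Perfetto, ora leggi il codice completo lentamente ma senza pause. Per esempio RMN. Inizia con: Ecco il mio codice." },
    { "role": "user", "content": "ecco il mio codice S ns" }
  ]
}
```
When endpointing fires again on a longer utterance, a second, near-identical request follows with the extended user content, which is exactly the incremental behaviour described.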
@Marco Falcone can you share the logs so I can better understand what's going wrong for you?
@Marco Falcone a gentle reminder to continue this thread.
m
Thanks @Shubham Bajaj 🙏 I'm really busy these days but I'll try to send you some logs asap
s
@Marco Falcone If that's the case, you have to open a new #1211483291191083018 ticket.
m
Ok, I'll open a new ticket as soon as I'm able to produce the required logs
f
I am facing a similar issue, call ID: 9956079c-db54-4bbc-95e5-8a2a4f9ca10c. When I set interim_results, Vapi gives this error: {"message":["transcriber.property interim_results should not exist"],"error":"Bad Request","statusCode":400} The transcripts come in incrementally: "Avocado charm." then "Avocado charm. And club sandwich." then "Avocado charm. And club sandwich. and..." and so on. I have set smart endpointing to 500 but it didn't work.
I also see this bit set to stream: true in the request; how can I make it false, in case it's causing the issue? 'customer': {'number': '+XXXXXXXXX'}, 'credentials': [], 'toolDefinitionsExcluded': [], 'metadata': {}, 'stream': True
@Vapi
k
Hey @faisalrehmanktk, please create a new support ticket for your issue. This ticket is solved/closed, so it's not covered by the SLA and won't receive any responses.