Duplicated messages in history with custom LLM
# support
n
I'm experiencing an issue with duplicate messages in my chat history when using Vapi.ai and a custom LLM server.

**Current setup:**
- Vapi.ai for voice processing
- Connected to my custom LLM server backend

When a user speaks, the same utterance appears multiple times in the chat history on our side. For example, when a user says "And, also, I want to talk about something, uh, that I think will work with AI," this message appears as multiple separate entries in the conversation history rather than as a single message.

It appears that the speech is being segmented during transcription, and each segment is being added to the chat history as a separate message, each with its own response. I'd appreciate any guidance on how to avoid these duplications and improve the user experience.

https://cdn.discordapp.com/attachments/1349510034693816330/1349510034966450217/Screenshot_2025-03-13_at_12.30.15_AM.png?ex=67d35ca9&is=67d20b29&hm=b69cf79ea3db140c1de087af9e320342e7db57d2771d7c565f36898e3cca9f5c&
v
Hi there! I understand you're experiencing duplicate messages in your chat history when using Vapi.ai with your custom LLM server. This is a common issue when speech is segmented during transcription. Here are some solutions to help you fix this:

1. **Configure Input Punctuation Boundaries**: Set `inputPunctuationBoundaries` in your assistant's voice settings to specify punctuation marks that indicate the end of a complete message:

```json
{
    "assistant": {
        "voice": {
            "inputPunctuationBoundaries": [".", "!", "?"]
        }
    }
}
```
This helps Vapi recognize complete sentences based on punctuation, reducing message segmentation [source](https://discord.com/channels/1211482211119796234/1211483291191083018/1336360452208332831).

2. **Adjust the Wait Seconds Parameter**: Increase the `waitSeconds` value to allow longer pauses after voice input:

```json
{
    "startSpeakingPlan": {
        "waitSeconds": 0.8
    }
}
```
This gives the AI more time to process complete thoughts [source](https://discord.com/channels/1211482211119796234/1211483291191083018/1336360452208332831).

3. **Review Your Server Implementation**: Implement logic in your custom LLM server to merge segmented utterances, so that only a single consolidated response is sent back to Vapi:

```javascript
// Example of message merging logic: combine consecutive
// messages from the same role into a single entry.
const mergedMessages = messages.reduce((acc, curr) => {
    if (acc.length === 0) {
        return [curr];
    }
    const lastMessage = acc[acc.length - 1];
    if (lastMessage.role === curr.role) {
        // Combine content of messages with the same role
        lastMessage.content = `${lastMessage.content} ${curr.content}`;
        return acc;
    }
    return [...acc, curr];
}, []);
```
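The merging step above can be packaged as a self-contained function. Note the `{ role, content }` message shape here is an assumption modeled on OpenAI-style chat payloads, not necessarily the exact structure your server receives; adapt the field names as needed:

```javascript
// Merge consecutive messages from the same role into one entry.
// Assumes an OpenAI-style { role, content } shape (verify against
// the payload your custom LLM server actually receives).
function mergeMessages(messages) {
  return messages.reduce((acc, curr) => {
    const last = acc[acc.length - 1];
    if (last && last.role === curr.role) {
      last.content = `${last.content} ${curr.content}`;
      return acc;
    }
    return [...acc, { ...curr }]; // copy so the input array is not mutated
  }, []);
}

const segmented = [
  { role: "user", content: "And, also, I want to talk about something," },
  { role: "user", content: "uh, that I think will work with AI." },
  { role: "assistant", content: "Sure, go ahead." },
];

console.log(mergeMessages(segmented));
// The two user segments collapse into a single user message.
```

Running the merge before you append to your chat history keeps each spoken turn as one entry, regardless of how the transcriber chunked it.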
4. **Check Event Handling**: Examine your server's message event handling to verify there are no loops or repeated triggers that could cause duplicate messages [source](https://discord.com/channels/1211482211119796234/1211483291191083018/1333406713931632731).

5. **Review the clientMessages Field**: Ensure that the `clientMessages` field in your assistant's configuration contains only the messages intended to be sent back, without duplicates [source](https://discord.com/channels/1211482211119796234/1211483291191083018/1333406713931632731).

Let me know if you need more specific guidance on any of these solutions! 👍

Sources:
- [Discord Channel Vapi.ai](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336360452208332831)
- [Discord Channel Vapi.ai](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1333406713931632731)
p
Hello Nazar, the transcriber chunks the user's speech and then updates the final utterance with a final-response flag. That final result is the one you should use to display what the user said.
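A minimal sketch of that filtering, assuming incoming transcript events carry a `transcriptType` field of `"partial"` or `"final"` (the event shape shown here is an assumption; verify the exact field names against the Vapi documentation for your setup):

```javascript
// Only append a transcript to the chat history once it is final.
// Assumed event shape (check against Vapi's docs):
// { type: "transcript", transcriptType: "partial" | "final", role, transcript }
function handleTranscriptMessage(history, message) {
  if (message.type !== "transcript") return history;
  if (message.transcriptType !== "final") return history; // drop partial chunks
  return [...history, { role: message.role, content: message.transcript }];
}

let history = [];
const events = [
  { type: "transcript", transcriptType: "partial", role: "user",
    transcript: "And, also," },
  { type: "transcript", transcriptType: "partial", role: "user",
    transcript: "And, also, I want to talk" },
  { type: "transcript", transcriptType: "final", role: "user",
    transcript: "And, also, I want to talk about something, uh, that I think will work with AI." },
];
for (const e of events) history = handleTranscriptMessage(history, e);
console.log(history.length); // 1
```

Ignoring the partial chunks and storing only the flagged final utterance is what prevents the duplicated entries in the history.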