<flush /> tag only works once
# support
Our application with Vapi uses a custom LLM. We have an agentic framework set up on this custom LLM where occasionally two separate AI messages make up one response in the conversation. Here is an example:

> User: I'm doing well
> AI Message: I'm glad to hear you are doing well.
> AI Message: What do you have planned for the day?

We use the `<flush />` tag to force TTS from the AI messages (otherwise, the entire AI message has to stream before TTS begins). This works well, but it only works for one of the AI messages. For example:

> User: I'm doing well
> AI Message: I'm glad to hear you are doing well. (this begins speaking as soon as it receives the correct number of characters)
> AI Message: What do you have planned for the day? (this does not begin speaking until the entire AI message has streamed, i.e., not until "day?" has been received)

The ideal behavior for us is for both the first and the second AI message to be sent to TTS as soon as possible.

Some short call examples:
- f33e1d61-2b14-458c-ac8f-397563b29e9b
- 367ad1ef-ec1d-4e49-971d-84ea3c4909ca
- 94c38384-d869-489e-8f75-9367143eebd8

Is it possible to make the `<flush />` tag work for both AI messages?
Hi there! 👋 I understand you're experiencing an issue where the `<flush />` tag only works for the first AI message but not for subsequent messages in your agentic framework. While I couldn't find a definitive solution to make the `<flush />` tag work for both AI messages, I can share what I know about this limitation and some potential workarounds.

According to the information I found, the `<flush />` tag is indeed used to "force TTS from the AI Messages, otherwise, the entire AI Message has to stream before TTS begins" ([Vapi Documentation](https://docs.vapi.ai/workflows/tasks/say)). However, there appears to be a limitation when using it with multiple AI messages. Here are some potential approaches to consider:

1. **Try using webhooks instead of the Vapi module**: One user found a workaround by "using the webhook to response then it is correct, rather than using Vapi module" ([Discord conversation](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334661059298066514)). This might help bypass the limitation you're experiencing.
2. **Investigate synchronization issues**: There may be timing issues between message playback and other actions. According to a Discord conversation, "It seems like the timing between the message playback and the transfer action is not synchronized, causing the message to be skipped" ([Discord thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1316539646808686602)). You might need to adjust your call flow logic to ensure proper synchronization.
3. **Check TTS token limits**: There might be token limits affecting how AI messages are processed. Consider "breaking up the LLM output into smaller chunks" ([Discord thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332287510964994078)) if this is the case.
4. **Tailor prompts for Vapi**: Since "Transitioning from one platform to another can sometimes require adjustments to optimize performance," you might need to "tailor prompts specifically for VAPI to take full advantage of its unique features" ([Discord thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336989548357353524)).

Is there anything else you'd like to know about working with the `<flush />` tag or other aspects of Vapi's TTS functionality?

Sources:
- [Vapi Documentation](https://docs.vapi.ai/workflows/tasks/say)
- [Discord conversation](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1334661059298066514)
- [Discord thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1316539646808686602)
- [Discord thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332287510964994078)
- [Discord thread](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1336989548357353524)
looking into it.
Looking at your log data, I can see some inconsistencies in how the flush tags are being processed:

### Working Example

```
🔵 03:24:52:412 [user LOG] Model output: raised in Virginia must mean you are very familiar with hot, <flush /> humid summers and maybe
🔵 03:24:52:413 Voice Input Formatted: "Being raised in Virginia must mean you are very familiar with hot, ", Original: "Being raised in Virginia must mean you are very familiar with hot, "
```

In this case, the flush tag is properly processed: the content before the tag is sent to TTS immediately.

### Non-Working Example

```
🔵 03:24:28:006 Voice Input Formatted: " I'm glad you're doing well. It's always nice to have a good day. Tell me about your hometown. ", Original: "<flush /> I'm glad you're doing well. <flush /> It's always nice to have a good day. <flush /> Tell me about your hometown. <flush />"
```

Here, the flush tags in the original output are not being properly processed in the formatted output; the text is being combined.
## The Problem

Based on the logs, I can identify two key issues:

1. **Flush tags at sentence boundaries**: When flush tags are placed between complete sentences, they're being ignored during formatting.
2. **Token-by-token processing**: Your model is outputting tokens incrementally, but flush tags are only being properly processed when they appear in the middle of a continuous output stream.

## How to Fix It

To ensure flush tags work properly in your model output, you should:

1. **Ensure proper placement of flush tags**:
   - Place each flush tag within the same token/chunk of text that contains the content you want to flush.
   - Avoid placing flush tags at the beginning of a new token/chunk.
2. **Modify your model output format**. Instead of:

   ```
   <flush /> I'm glad you're doing well. <flush /> It's always nice to have a good day.
   ```

   format it as:

   ```
   I'm glad you're doing well.<flush /> It's always nice to have a good day.
   ```

   Note the placement of the flush tag immediately after the content to be flushed, without spaces between the content and the tag.
3. **Batch output properly**: Ensure that each complete thought and its associated flush tag are sent together in the same output chunk, rather than sending tokens individually.
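On the agentic-framework side, the placement rules above can be applied mechanically before streaming output to Vapi. Here is a minimal sketch; `to_vapi_stream` is a hypothetical helper name, and it assumes your framework can yield each complete AI message as a string:

```python
def to_vapi_stream(ai_messages):
    """Yield output chunks with a flush tag appended directly to each
    complete AI message, so the tag always travels in the same chunk
    as the content it should flush (never at the start of a chunk)."""
    for message in ai_messages:
        text = message.strip()
        if text:
            # No space between the content and the tag.
            yield f"{text}<flush /> "

chunks = list(to_vapi_stream([
    "I'm glad you're doing well.",
    "What do you have planned for the day?",
]))
# Each chunk now carries its own trailing flush tag.
```

The key design point is that the flush tag is appended to the content it terminates, rather than emitted as a standalone or leading token.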
## Example Implementation

For your specific case, format model output like:

```
I'm glad you're doing well.<flush /> It's always nice to have a good day.<flush /> Tell me about your hometown.<flush />
```

This ensures each sentence is sent to TTS as soon as it's received, creating a more responsive conversation flow.

## Technical Explanation

The issue relates to how the Vapi `ttsBuffer` function processes text. When a flush tag appears at the beginning of a chunk, there is no content before it to send to TTS (`sendToTTS` would be empty). The system might be ignoring these cases or combining multiple chunks before processing flush tags. By ensuring that flush tags are placed immediately after the content to be flushed (without spaces) and within the same output chunk, you'll get more consistent behavior.
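To illustrate that failure mode, here is a simplified, hypothetical re-implementation of a flush-aware buffer (not Vapi's actual `ttsBuffer` code): it splits an incoming chunk at the first flush tag and sends only the text before the tag to TTS, so a tag at the very start of a chunk produces nothing to speak.

```python
FLUSH_TAG = "<flush />"

def tts_buffer(chunk: str):
    """Split a streamed chunk at the first flush tag.

    Returns (send_to_tts, remainder). If the tag sits at the start of
    the chunk, send_to_tts is empty and nothing is spoken early --
    the failure mode visible in the logs above.
    """
    if FLUSH_TAG not in chunk:
        return "", chunk  # keep buffering; nothing to flush yet
    before, after = chunk.split(FLUSH_TAG, 1)
    return before, after

# Tag placed right after the content: the sentence is flushed to TTS.
send, rest = tts_buffer("I'm glad you're doing well.<flush /> It's")
# send == "I'm glad you're doing well."

# Tag at the start of the chunk: nothing precedes it, so nothing is sent.
send, rest = tts_buffer("<flush /> I'm glad you're doing well.")
# send == ""
```

This is only a sketch of the splitting behavior described above, but it shows why moving the tag to the end of the content it terminates makes flushing reliable.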
Thanks @Shubham Bajaj I'll implement your fix and get back to you.