Custom LLM output is truncated
# support
m
Hi all 👋, I'm having a problem reproducing long texts. My custom LLM returns chunks from a generation, but the assistant only reproduces a partial voice generation. Some examples are:
- c57aeebe-561b-4e41-b466-f59c27269982
- e8378b76-1323-4158-9216-e4e1e61259a5

Do you see anything wrong on my side, or is there a configuration for the 11labs TTS that sets a limit on the number of converted tokens? Thanks as always for your help, Marco
v
@Marco Falcone Hello Marco 👋, Great to hear from you! I'm here to help with your issue regarding the reproduction of long texts. From what you've described, it seems like you might be encountering a limitation on the length of text that can be processed by the TTS system in one go. Most TTS systems, including 11labs, often have a maximum token or character limit for each request to ensure performance and quality. To address this, you can:
1. Check the documentation for 11labs TTS to find the specific limits on text length.
2. If the limit is being exceeded, consider breaking the LLM output into smaller chunks that are within the TTS limits and then sequentially sending these chunks for voice generation.
3. Ensure that the TTS system is configured correctly to handle the output from your custom LLM.

If you need further assistance with the configuration or if you're unsure about the token limits, please let me know, and I can provide more detailed support or direct you to the right resources. Would you like me to check the token limits for you, or do you need assistance with another aspect of your TTS setup?
You can continue this conversation by mentioning me in the message.
m
@Vapi yes please, give me some details on token limits and how I can set them in the assistant configuration
v
@Marco Falcone Certainly, Marco! The token limit for voice generations is typically set to ensure that the TTS system can handle the request without running into performance issues. In the context of VAPI, the token limit can be configured through the assistant's settings. For your assistant, the relevant configuration parameter is `squad.members.assistant.model.maxTokens`. This parameter defines the maximum number of tokens that the assistant will generate in each turn of the conversation. The default value is 250 tokens. If you're looking to adjust this limit to accommodate longer text generations, you would need to update the `maxTokens` parameter to a higher value, within the constraints of what the TTS system can handle. To set this in the assistant configuration, you would typically access the assistant's settings through your development environment or API configuration panel and update the `maxTokens` value accordingly. Would you like to proceed with updating this configuration, or do you need assistance with another setting?
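For concreteness, here is a minimal sketch of how such an update could look over Vapi's REST API. It assumes the standard `PATCH /assistant/:id` endpoint and the `model.maxTokens` field described above; treat the exact schema as something to verify against the current Vapi API reference.

```typescript
// Sketch: raise the per-turn token limit on an existing assistant.
// Assumes Vapi's PATCH /assistant/:id endpoint and the model.maxTokens
// field discussed above; verify both against the current API reference.
const VAPI_API_KEY = process.env.VAPI_API_KEY!; // your private API key
const ASSISTANT_ID = "your-assistant-id";       // placeholder ID

async function raiseMaxTokens(maxTokens: number): Promise<void> {
  const res = await fetch(`https://api.vapi.ai/assistant/${ASSISTANT_ID}`, {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${VAPI_API_KEY}`,
      "Content-Type": "application/json",
    },
    // Depending on the API's PATCH semantics, you may need to resend the
    // full model object rather than just the changed field.
    body: JSON.stringify({ model: { maxTokens } }),
  });
  if (!res.ok) {
    throw new Error(`Update failed: ${res.status} ${await res.text()}`);
  }
}

// e.g. allow up to 1000 tokens per turn instead of the default 250
raiseMaxTokens(1000).catch(console.error);
```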
m
@Vapi I'm not using a squad, just a simple assistant
v
@Marco Falcone I apologize for the confusion. If you're using a simple assistant without the Squads feature, the token limit issue might be related to the settings of the TTS service itself rather than the assistant configuration. For 11labs TTS, the token limit is often predefined and may not be directly configurable by the user. However, you can manage long text reproduction by:
1. Splitting the text into smaller segments that are within the TTS service's token limit.
2. Sending these segments sequentially to the TTS service for voice generation.

If you're unsure about the exact token limits for 11labs TTS, I recommend checking their documentation or reaching out to their support team for the specific details. In the meantime, if your LLM is generating text that exceeds the TTS token limit, you'll need to implement a method to break the text into acceptable chunks. This can be done programmatically before sending the text to the TTS service, as in the sketch below. Would you like guidance on how to implement text segmentation for TTS, or is there anything else I can assist you with?
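As an illustration of the segmentation described above, here is a minimal TypeScript sketch that splits text at sentence boundaries into chunks under a configurable character limit. The 500-character cap is an arbitrary placeholder, not a documented 11labs limit.

```typescript
// Sketch: split long text into TTS-sized chunks at sentence boundaries.
// The 500-character cap is a placeholder; use your TTS provider's real limit.
function chunkForTts(text: string, maxChars = 500): string[] {
  // Split after sentence-ending punctuation, keeping the punctuation.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    // Flush the current chunk before it would exceed the limit. A single
    // sentence longer than maxChars still becomes one oversized chunk.
    if ((current + sentence).length > maxChars && current.length > 0) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}

// Each chunk can then be sent to the TTS service in order.
```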
m
@Vapi I already send text in small chunks
v
@Marco Falcone That's good to hear, Marco. If you're already sending text in small chunks and still experiencing issues with partial voice generation, there might be a few other factors to consider:
1. **TTS Service Configuration**: Ensure that the TTS service is properly configured to handle multiple chunks. There might be settings related to pause duration between chunks, or how chunks are queued for processing.
2. **Chunk Size**: Even though you're sending small chunks, they might still be too large for the TTS service's limits. Double-check that each chunk is within the service's token or character limit.
3. **API Rate Limits**: The TTS service might have rate limits in place that could affect how quickly you can send multiple chunks. Ensure you're not hitting these limits, which could cause some chunks to be dropped or delayed.
4. **Error Handling**: Make sure your implementation has robust error handling to catch any issues when sending chunks to the TTS service. This can help identify if certain chunks are failing to process (see the retry sketch after this message).
5. **Session Context**: If the TTS service uses session-based context, ensure that the context is maintained correctly across multiple chunks to avoid any discontinuity in voice generation.

If you've checked all these factors and the issue persists, it might be helpful to look at the logs or error messages from the TTS service when the partial voice generation occurs. This can provide clues as to why the full text isn't being reproduced. If you're still unable to resolve the issue, please reach out to the support team of the TTS service for further assistance. They may need to look into your specific case to determine why the voice generation isn't completing as expected. Would you like to explore any of these suggestions further, or is there another way I can assist you?
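Points 3 and 4 in the list above can be combined into a simple send loop with retries. The sketch below is generic: `sendChunkToTts` is a hypothetical stand-in for whatever call your integration actually makes, and it exists only to show the pattern of sending chunks sequentially with backoff on failure.

```typescript
// Hypothetical stand-in for the real TTS call in your integration.
declare function sendChunkToTts(chunk: string): Promise<void>;

// Send chunks one at a time, retrying each with exponential backoff so a
// transient rate-limit error doesn't silently drop part of the output.
async function sendAllChunks(chunks: string[], maxRetries = 3): Promise<void> {
  for (const [i, chunk] of chunks.entries()) {
    for (let attempt = 0; ; attempt++) {
      try {
        await sendChunkToTts(chunk);
        break; // chunk delivered, move on to the next one
      } catch (err) {
        if (attempt >= maxRetries) {
          throw new Error(`Chunk ${i} failed after ${maxRetries} retries: ${err}`);
        }
        // Wait 500 ms, 1 s, 2 s, ... before retrying.
        await new Promise((r) => setTimeout(r, 500 * 2 ** attempt));
      }
    }
  }
}
```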
m
@Vapi I need someone to look into the conversations
v
@Marco Falcone I understand that you need someone to review the conversations and investigate the issue with the voice generations further. Since I'm here to provide support, I can guide you on how to proceed, but for a detailed investigation, you would need to contact the actual support team of the TTS service you're using. Here's what you can do:
1. **Gather Information**: Compile all relevant information, including conversation IDs, timestamps, and any error messages or logs you have encountered.
2. **Contact Support**: Reach out to the support team of the TTS service with the information you've gathered. Provide them with the conversation IDs you mentioned earlier (c57aeebe-561b-4e41-b466-f59c27269982 and e8378b76-1323-4158-9216-e4e1e61259a5) and explain the issue in detail.
3. **Follow Up**: If you have a support ticket or a way to track your request, make sure to follow up regularly for updates on the investigation.

If you'd like, I can assist you by closing this support ticket once you confirm that you have all the information you need to contact the TTS service support team. Would you like me to close this ticket for you, or is there anything else you need help with before doing so?
s
@Marco Falcone It looks like you've set the chunk plan to false. I'm curious if there's a specific reason for that! Currently, your model output is being sent to the voice provider as individual words, which isn't giving us the best results. Would you mind enabling it and giving it another try? If you've already done that, could you please share the call IDs with the chunk plan enabled? Thank you so much!
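For readers following along: the chunk plan batches model output into larger pieces before it reaches the voice provider, instead of streaming it word by word. A minimal sketch of re-enabling it is below; the `voice.chunkPlan` nesting is an assumption based on the parameter names used later in this thread, so confirm the exact field path in the Vapi docs.

```typescript
// Sketch: re-enable the chunk plan so model output is batched rather than
// sent word-by-word to the voice provider. The voice.chunkPlan path is an
// assumption based on the parameter names quoted later in this thread.
const VAPI_API_KEY = process.env.VAPI_API_KEY!;
const ASSISTANT_ID = "your-assistant-id"; // placeholder

async function enableChunkPlan(): Promise<void> {
  const res = await fetch(`https://api.vapi.ai/assistant/${ASSISTANT_ID}`, {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${VAPI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ voice: { chunkPlan: { enabled: true } } }),
  });
  if (!res.ok) throw new Error(`Update failed: ${res.status}`);
}
```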
m
I disabled it for testing. On Monday I'll enable it again and will do some tests. Thanks as always for your precious help 🙏
s
@Marco Falcone a gentle reminder to test it out with the chunk plan set to true.
m
We solved this issue by updating some parameters related to the TTS provider. Thanks for your support 🙏
When I click on "mark as solved" I get an error
s
What parameters did you update? Would you like to share, so others can benefit from it?
m
I updated `optimizeStreamingLatency` to 1 and increased `chunkPlan.minCharacters` to 60. This seems to improve the output generation at the cost of some initial latency
s
`chunkPlan.minCharacters` is used for voice output.
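For anyone who finds this thread later, a sketch of the configuration that resolved it is below. The field nesting (both parameters living on the 11labs voice object) is an assumption based on the names quoted above; verify it against the Vapi and 11labs documentation.

```typescript
// Sketch of the settings that resolved this thread: a low 11labs streaming-
// latency optimization plus a larger minimum chunk size. The nesting is an
// assumption based on the parameter names quoted above.
const voiceConfig = {
  provider: "11labs",
  voiceId: "your-voice-id",    // placeholder
  optimizeStreamingLatency: 1, // favor output completeness over low latency
  chunkPlan: {
    enabled: true,
    minCharacters: 60,         // buffer at least 60 characters before speaking
  },
};

// Applied the same way as the earlier sketches, e.g.
// PATCH https://api.vapi.ai/assistant/:id with body { voice: voiceConfig }.
```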