Does not hear short single word responses and 11la...
# support
m
I'm using Deepgram Nova 2 and 11labs Eleven Turbo V2.5, both set to Swedish, and it works great except for two problems. When I respond to the AI with only short Swedish words like "ja" (yes) and "nej" (no), I don't get a response from the AI most of the time, but if I answer with more than one word it always works. How can I fix this? Also, most of the time the voice is perfect, but after a few responses from the AI, or with longer sentences, the AI starts to speak with a very strange accent. How can I fix the voice? Call IDs where it does not hear single words: 04ae7293-5086-4610-9edd-6cd2143fcf92 5de54671-409b-488a-9d03-b4e3bcecfae0 Call IDs with the strange accent: 5de54671-409b-488a-9d03-b4e3bcecfae0 9d8c3cac-dcf2-4fd8-82e4-fecb570054f5
v
@Magnus It sounds like you're experiencing two separate issues with your voice API setup. Let's address them one by one.
For the first issue, where short responses like "ja" and "nej" are not consistently recognized, this could be due to the sensitivity settings or noise filtering parameters of your speech recognition service. Short utterances can sometimes be filtered out as noise or not recognized as valid input. To improve recognition of short words, you might need to adjust the sensitivity settings or look into whether there is a minimum length of speech required for recognition in the services you're using.
Regarding the second issue with the strange accent appearing after a few responses or with longer sentences, this could be related to the text-to-speech (TTS) engine adapting to different contexts or trying to handle complex sentences. To resolve this, you can check if there are any settings that control the consistency of the voice or if there are updates available for the TTS engine that might improve performance.
Since you've provided call IDs where these issues occurred, I would recommend reaching out to the support teams of Deepgram and 11labs with these IDs. They will be able to look into the specific instances and provide more targeted assistance. If you need further assistance or would like me to reach out to support on your behalf, please let me know!
m
@Vapi How do I "adjust the sensitivity settings or look into whether there is a minimum length of speech required for recognition in the services you're using." for deepgram?
v
@Magnus To adjust the sensitivity settings or understand the minimum length of speech required for recognition in Deepgram, you can explore the following options:
1. **Adjust Sensitivity**: Deepgram allows you to adjust the sensitivity of the speech recognition through the `keywords` parameter, where you can specify words that should be recognized with higher sensitivity. For short words like "ja" and "nej," you could add these as keywords to ensure they are picked up more reliably.
2. **Endpointing Parameters**: Deepgram has endpointing parameters that control when the recognizer starts and stops listening. The `vad_turnoff` parameter can be adjusted to be more or less aggressive in determining the end of a speaker's turn, which might help with recognizing short utterances.
3. **Model Selection**: Ensure you are using the most appropriate Deepgram model for your use case. For example, `nova-2-conversationalai` might be more suited for interactive AI applications.
4. **Language Support**: Verify that the model you are using supports Swedish and is optimized for the type of content you are transcribing.
5. **Audio Quality**: Ensure that the audio quality is good and that there is minimal background noise, as this can affect the recognition of short words.
6. **Punctuation and Formatting**: Check if the `punctuate` and `format` options are enabled, as these can affect how the transcript is generated and may impact the recognition of short responses.
If you need to adjust these settings, you would typically do so by including the appropriate parameters in your API request to Deepgram. If you're unsure about how to proceed or need more detailed guidance, please reach out to Deepgram's support with your specific use case and the issues you're encountering. Would you like me to provide an example of how to include these parameters in an API request, or do you need assistance with anything else?
m
@Vapi sure
v
@Magnus Certainly! Here's an example of how you might structure an API request to Deepgram to adjust sensitivity settings and ensure short words like "ja" and "nej" are recognized more reliably:
```json
{
  "model": "general",
  "language": "sv",
  "keywords": ["ja", "nej"],
  "vad_turnoff": 0.5,
  "punctuate": true,
  "format": true
}
```
In this JSON payload:
- `"model"`: Specifies the model to use, which should be appropriate for your content.
- `"language"`: Sets the language to Swedish (`"sv"`).
- `"keywords"`: An array of words that you want the service to recognize with higher sensitivity.
- `"vad_turnoff"`: Adjusts the Voice Activity Detection (VAD) endpointing; a lower value might help in recognizing shorter utterances.
- `"punctuate"`: Enables punctuation in the transcript.
- `"format"`: Enables formatting in the transcript.
You would send this payload as part of the body of your POST request to the Deepgram API endpoint for speech recognition. Please note that the values provided here, especially for `"vad_turnoff"`, are just examples. You may need to experiment with different values to see what works best for your specific use case. If you're not familiar with making API requests or need further assistance with the parameters, I would recommend consulting the Deepgram documentation or reaching out to their support team for more detailed guidance. Would you like to proceed with any other questions or actions?
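A note on the request shape, offered as a hedged sketch rather than a verified recipe: as far as I know, Deepgram's `/v1/listen` endpoint takes recognition options as query-string parameters, with the audio itself in the request body, rather than as a JSON payload. The Python below only builds such a URL and sends nothing. The `nova-2` model name matches what Magnus says he is using; the boost value `2` and the helper name `build_listen_url` are illustrative assumptions. `vad_turnoff` and `format` are left out because I cannot confirm they are current parameter names.

```python
from urllib.parse import urlencode

# Deepgram's speech-to-text endpoint; options go in the query string.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model, language, keywords, punctuate=True):
    """Return the listen URL with recognition options as query parameters.

    `keywords` is a list of (word, boost) pairs; the boost values are
    illustrative and would need tuning against real calls.
    """
    params = [
        ("model", model),
        ("language", language),
        ("punctuate", "true" if punctuate else "false"),
    ]
    # One `keywords` parameter per word, written as "word:boost".
    for word, boost in keywords:
        params.append(("keywords", f"{word}:{boost}"))
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


url = build_listen_url("nova-2", "sv", [("ja", 2), ("nej", 2)])
print(url)
```

When Deepgram is used through Vapi, these options are normally set in the assistant's transcriber configuration rather than by calling Deepgram directly, so treat the URL as a way to see which knobs exist.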
m
@Vapi How do I find out about this: "settings that control the consistency of the voice or if there are updates available for the TTS engine that might improve performance." ?
v
@Magnus To find out about settings that control the consistency of the voice and to check for updates available for the Text-to-Speech (TTS) engine that might improve performance, you should:
1. **Consult the Documentation**: Review the official documentation for the TTS engines you are using (Deepgram Nova 2 and 11labs Eleven Turbo V2.5). The documentation often includes detailed information about various settings and parameters that can be adjusted to control voice consistency.
2. **Check for Updates**: Visit the official websites or developer portals for Deepgram and 11labs to see if there are any recent updates or patches for the TTS engines. Updates may contain improvements or fixes that could address the issues you're experiencing.
3. **Contact Support**: Reach out to the customer support or technical support teams of Deepgram and 11labs. Provide them with the call IDs where you've experienced the strange accent issue, and ask for guidance on settings that could improve voice consistency or information about any available updates.
4. **Community Forums**: Look for community forums, user groups, or discussion boards where other developers may have shared similar experiences and solutions. These can be valuable resources for practical advice and tips.
5. **Experiment with Parameters**: If the documentation provides information about voice settings, try experimenting with different parameters to see if they affect the voice consistency. This could include adjusting speech rate, pitch, volume, or other voice modulation settings.
6. **Check Compatibility**: Ensure that the language and dialect settings are correctly configured for Swedish, as incorrect settings might lead to inconsistent accent performance.
If you need assistance with any specific settings or would like me to help you contact support, please let me know!
m
I meant for 11labs Eleven Turbo V2.5
@Vapi I meant for 11labs Eleven Turbo V2.5
v
@Magnus For specific information about settings that control the consistency of the voice or updates available for the 11labs Eleven Turbo V2.5 TTS engine, you should:
1. **Review the 11labs Documentation**: Check the official 11labs documentation for Eleven Turbo V2.5. It should provide details on configurable parameters that affect voice consistency, such as prosody controls (pitch, rate, volume), voice selection, and language or accent settings.
2. **Check for Software Updates**: Visit the 11labs website or user dashboard to see if there are any firmware or software updates for Eleven Turbo V2.5. Updates often include enhancements or bug fixes that can improve TTS performance.
3. **Contact 11labs Support**: If the documentation does not resolve your issue, or if you need more personalized assistance, reach out to 11labs support directly. Provide them with the call IDs where the strange accent occurs, and ask for advice on how to maintain consistent voice quality.
4. **Community and Forums**: Engage with the 11labs community, if available, such as forums or user groups. Other users may have encountered similar issues and could have insights or solutions to share.
5. **Adjust TTS Parameters**: If the documentation mentions specific parameters for voice consistency, experiment with these settings in your API requests to see if they help. This might involve tweaking the speech synthesis markup language (SSML) used in your requests.
6. **Language and Dialect Settings**: Ensure that the correct language and dialect settings are applied for Swedish, as mismatches here could lead to unexpected accents.
If you're unable to find the necessary information or if the issue persists, please reach out to 11labs support for further assistance. If there's anything else I can help you with, let me know!
m
@Vapi Could you reach out to support on my behalf?
v
@Magnus Please ask support for more info on this.
m
@Vapi How do I do that?
@Vapi Will a human look into my problem?
v
@Magnus I'm here to guide you on how to reach out to support for 11labs Eleven Turbo V2.5. To contact their support team, you will typically need to follow these steps:
1. **Support Portal or Email**: Visit the 11labs website and look for a support or contact section. They may provide a support portal where you can submit a ticket or an email address dedicated to customer support inquiries.
2. **Contact Form**: If there's a contact form available, fill it out with your details, including your name, email address, and a description of the issue you're facing with the TTS engine.
3. **Provide Details**: In your communication, include the call IDs where you've experienced the issues, describe the problems with as much detail as possible, and mention any troubleshooting steps you've already taken.
4. **Request Assistance**: Ask for their assistance in resolving the issues with short word recognition and the strange accent that occurs after a few responses or with longer sentences.
5. **Follow-Up**: After submitting your request, keep an eye on your email for a response from the support team. They may provide further instructions or ask for additional information to help resolve your issue.
A human from the 11labs support team will review your problem and provide assistance. If you need help drafting a message or if you have any other questions, please let me know!
m
@Vapi Will Vapi support look into my problem?
v
@Magnus Yes, for issues related to Vapi's services, the Vapi support team can certainly look into your problem. To get assistance from Vapi support, you can contact them directly using the information provided:
Email: support@vapi.ai
When reaching out, make sure to provide a detailed description of the issues you're encountering, including the call IDs where you've experienced problems, and any specific questions or concerns you have. A human support representative from Vapi will then review your case and work with you to resolve the issues. If you'd like, I can assist you with the process of contacting Vapi support or provide further guidance on what information to include in your support request. Just let me know how I can help!
s
For single words, try this start speaking plan: "startSpeakingPlan": { "transcriptionEndpointingPlan": { "onPunctuationSeconds": 0.4 }, "waitSeconds": 0.4, "smartEndpointingEnabled": true }
For the strange accent, can you share the timestamp and more details on how it was supposed to sound? I cannot understand the language.
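For anyone applying the start speaking plan suggested above from a script, here is a minimal Python sketch that builds the same fragment and prints it as JSON. The field names and the 0.4 values are copied from the message above. The helper name `start_speaking_plan` and the non-negative check are illustrative assumptions, and where the fragment is attached (presumably the assistant configuration) is not stated in the thread.

```python
import json


def start_speaking_plan(wait_seconds=0.4, on_punctuation_seconds=0.4,
                        smart_endpointing=True):
    """Build the startSpeakingPlan fragment suggested in the thread."""
    # Sanity check only; actual limits on these values are not given here.
    if wait_seconds < 0 or on_punctuation_seconds < 0:
        raise ValueError("delays must be non-negative")
    return {
        "startSpeakingPlan": {
            "transcriptionEndpointingPlan": {
                "onPunctuationSeconds": on_punctuation_seconds,
            },
            "waitSeconds": wait_seconds,
            "smartEndpointingEnabled": smart_endpointing,
        }
    }


plan = start_speaking_plan()
print(json.dumps(plan, indent=2))
```

Keeping the delays as arguments makes it easy to try a few values per test call and compare transcripts, which is how a setting like this usually has to be tuned.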
m
Thanks for getting back to me. However, this did not work. In call ID 2c70c5c7-6257-4a8d-a2d8-556a325f481e, in the recording between 00:25 and 00:50, the AI asks if I'm ready and I answer: "Ja" (Yes) ..pause.. "Ja" (Yes) ..pause.. "Ja det är jag" (Yes I am) ..pause.. "Jag är redo" (I am ready). Only then does the AI answer me back. The log only shows multiple "[CHECKPOINT] User speech possibly starting" entries, and only the last "Jag är redo" shows up in the transcript. It's the same problem in most calls. Sometimes it picks up a short "ja" answer from me, but most of the time it does not. Could there be some Deepgram setting that needs to be changed? I have tried setting endpointing to 10, 200, 300, and 400, for both Nova 2 and the enhanced model, but that did not help.
Somehow this problem solved itself today.