Which transcriber is more accurate
# support
w
I have tried all of the providers. I found that Deepgram is just OK; it often makes mistakes. Gladia and Talkscriber are better, but they are sensitive to ambient sound and may stop because of low-volume noise. I want a model that is more accurate and easy to use. Which one should I use? What are the pros and cons of these models?
v
@WadeSong0121 It's great to hear that you've been exploring different transcription providers! Each provider has its own strengths and weaknesses, and the best choice often depends on your specific needs. Here's a brief overview of the pros and cons of the providers you mentioned:

Deepgram
- Pros: Cost-effective, good for general use, offers real-time transcription.
- Cons: May not be as accurate in noisy environments or with low-quality audio.

Gladia
- Pros: Known for better handling of noisy environments, offers a range of voice-related services.
- Cons: Potentially higher cost, may have limitations on customization.

Talkscriber
- Pros: Good accuracy in optimal conditions, user-friendly interfaces.
- Cons: Sensitive to environmental noise, may struggle with accents or dialects.

For a more accurate and easy-to-use model, you might want to consider the following:

1. **Customization**: Look for a service that allows you to train the model on your specific audio types and use cases.
2. **Noise Cancellation**: Choose a provider that offers advanced noise cancellation features if you are dealing with noisy environments.
3. **Language and Accent Support**: Ensure the provider supports the languages and accents you need to transcribe.
4. **Post-Processing**: Some services offer tools to correct and improve the transcription results.

If you're looking for a recommendation, it would be helpful to know more about your specific use case, such as the type of audio you're transcribing, the environment, and your accuracy requirements. With that information, I could provide a more tailored suggestion.
w
I need to transcribe ordinary users speaking on their phones in the office and at home. I have two requirements:
1. Recognition must be accurate.
2. It should reduce environmental noise and avoid recognizing environmental noise as dialogue.

How can this be achieved?
s
Hey @WadeSong0121, the best one to use is Deepgram. If none of the transcribers works as you expect, you can use custom-transcriber. It is a good fit for your use case, since you can plug in any transcriber that works for you (make sure it supports streaming).
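If you control the audio pipeline in front of your own transcriber, one common way to keep low-volume ambient noise from being sent for recognition is a simple energy gate. The sketch below is only an illustration under stated assumptions: audio arrives as frames of 16-bit little-endian mono PCM, and `gate_frames`, `threshold`, and `hangover` are hypothetical names and values to tune on real recordings, not part of any provider's API.

```python
import math
import struct
from typing import Iterable, Iterator


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame of 16-bit little-endian mono PCM."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def gate_frames(
    frames: Iterable[bytes],
    threshold: float = 500.0,  # assumed value; tune on real call audio
    hangover: int = 5,         # quiet frames still passed after speech
) -> Iterator[bytes]:
    """Yield only frames that look like speech.

    A frame passes if its RMS reaches `threshold`, or if it is one of the
    first `hangover` quiet frames after a loud one, so that word endings
    are not clipped.
    """
    quiet_run = hangover  # the gate starts closed
    for frame in frames:
        if rms(frame) >= threshold:
            quiet_run = 0
            yield frame
        elif quiet_run < hangover:
            quiet_run += 1
            yield frame
```

A fixed threshold is crude: it will not separate speech from loud noise, so a proper voice-activity detector or the platform's own denoising is likely a better choice in production.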
w
Thanks for your help
Hi, I found that the current speech-to-text service is using Deepgram nova2 general, but the recognition accuracy for user input is not high enough, and a lot of content is misrecognized. I have two questions: 1. What are some general methods to improve accuracy, and what other models can be used? 2. If I want to integrate the OpenAI Whisper speech model, what specific development work do we need to do?
b
Hey @WadeSong0121, we have Assembly, Gladia, and Talkscriber.
Talkscriber uses Whisper.
Assembly is potentially more accurate, but it has higher latency.
w
Thanks for your help. But with Talkscriber, I've noticed that it is very sensitive to ambient sounds. For example, when I'm having a conversation with the assistant, after I've spoken and while it is replying, it will suddenly stop for 0.5-1 s in the middle of its reply and then start over, repeating what it just said. This happens frequently. How should I deal with this?
b
I believe you can enable denoising to improve that, and we have a better implementation in-progress at the moment
w
By the way, do you mean these two features? Is this config right? "backgroundSound": "off", "backgroundDenoisingEnabled": false. Also, I'm currently using the Whisper English model and I've found a small issue. When I speak in Chinese, some of the Chinese text gets translated to English, while other Chinese text is displayed directly. Is there a way to keep the conversation in Chinese?
s
- Set backgroundDenoisingEnabled to true to handle background noise.
- Talkscriber multilingual isn't working as expected, in case you're using it. If you're not using multilingual, please share the call ID.
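For reference, the change suggested above amounts to flipping one flag in the config quoted earlier. The snippet below is a minimal sketch: the two field names come from this thread, but where they sit inside the full assistant payload is an assumption, and `with_denoising` is a hypothetical helper.

```python
import json


def with_denoising(assistant_config: dict) -> dict:
    """Return a copy of the config with background denoising switched on."""
    updated = dict(assistant_config)  # shallow copy; the input is left untouched
    updated["backgroundDenoisingEnabled"] = True
    return updated


# The two fields quoted in this thread; the rest of the assistant config is omitted.
current = {"backgroundSound": "off", "backgroundDenoisingEnabled": False}

print(json.dumps(with_denoising(current), indent=2))
```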
w
Hi, I found that when using this model for dialogue, some text appears in the transcript that I never said, such as "okay, thank you" and the like. What is the reason for this, and how can it be solved? As shown in the image, the content inside the red frame is something I did not actually say, but the model output it anyway: https://cdn.discordapp.com/attachments/1306919092120322069/1310860737933213746/image.png?ex=6746c1b2&is=67457032&hm=72f980b8d2e98ef3ed92d6719380111f3ee7ad92c7f135e0089df1a06ed9ddde&
s
Hey @WadeSong0121 To help track down this issue, could you share:
- The call ID
- When exactly this happened (the timestamp)
- What response you expected to get
- What response you actually got instead
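While the cause is being tracked down, one possible stopgap is to filter suspicious segments after transcription. Whisper-family models are widely reported to emit short stock phrases on silence or noise, which may or may not be what is happening here; nothing in this thread confirms it. The sketch below assumes the transcriber returns a per-segment confidence score, and `FILLER_PHRASES`, `is_suspect`, and `min_confidence` are hypothetical names and values.

```python
import re

# Hypothetical list of stock phrases to distrust; build it from real call logs.
FILLER_PHRASES = {"okay thank you", "thank you"}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", "", text.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def is_suspect(text: str, confidence: float, min_confidence: float = 0.6) -> bool:
    """Flag a segment that is a known stock phrase AND has low confidence.

    Both conditions are required so that a clearly spoken "thank you"
    is still kept.
    """
    return normalize(text) in FILLER_PHRASES and confidence < min_confidence


segments = [("Okay, thank you.", 0.31), ("I'd like to book a room.", 0.42)]
kept = [text for text, conf in segments if not is_suspect(text, conf)]
```

With the sample `segments`, only the first entry is dropped: the second has low confidence too, but it is not a stock phrase.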