Endpointing VAPI vs Deepgram
# support
d
Is VAPI's endpointing model for determining 'end of turn' just a configuration of Deepgram's endpointing, or does VAPI have its own secret sauce under the hood? On my VAPI app, the end-of-turn recognition works great, but I'm also realizing this is probably because speech-end is based on reaching the end of the text returned from the LLM. I have a "suggestions" feature, built on VAPI's speech-end events, that suggests what a user should say when practicing speaking to an AI agent. But now I'm trying to offer this same feature for real-life conversations that stream real customer call audio instead of conversations with AI customers, and I have to figure out how to do the endpointing as nicely as VAPI does.
v
@danemyers VAPI does indeed have its own specialized approach to endpointing conversational speech, which goes beyond just a configuration of Deepgram's endpointing. While Deepgram provides robust speech-to-text capabilities, VAPI enhances the endpointing process with a combination of silence detection and machine learning models. This combination allows VAPI to more accurately determine the 'end of turn' in a conversation, which is crucial for applications like yours that involve interactive speaking practice.

For your "suggestions" feature in real-life conversations, you would need to implement a similar system that can handle the complexities of live dialogue. This involves not only detecting pauses but also understanding the context and flow of the conversation to make accurate predictions about when a speaker has finished their turn.

If you're looking to replicate VAPI's endpointing quality for real customer call audio, you might need to develop or integrate a solution that can analyze conversational dynamics in real time, whether by training your own models or by leveraging existing conversation-analysis tools built for this purpose. If you need more detailed guidance on how to implement such a feature, or if you have any other questions, please feel free to ask.
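For what it's worth, one way to approximate this on raw call audio is to stream it to Deepgram's live endpoint with its endpointing options enabled and treat `speech_final` / `UtteranceEnd` as the end-of-turn signal. The sketch below is only a rough illustration, not VAPI's actual implementation: it assumes 16 kHz linear16 mono audio, the `websockets` package, and illustrative values for `endpointing` and `utterance_end_ms`.

```python
# Rough sketch (not VAPI's implementation): stream live call audio to Deepgram
# and use its endpointing signals as the "end of turn" event.
import asyncio
import json

import websockets  # note: newer versions name the header kwarg `additional_headers`

DEEPGRAM_URL = (
    "wss://api.deepgram.com/v1/listen"
    "?encoding=linear16&sample_rate=16000&channels=1"
    "&interim_results=true"
    "&endpointing=300"        # ms of silence before a result is marked speech_final
    "&utterance_end_ms=1000"  # emit UtteranceEnd after this long with no new words
)

async def stream_call_audio(audio_chunks, api_key):
    """audio_chunks: async iterator yielding raw PCM bytes from the live call."""
    async with websockets.connect(
        DEEPGRAM_URL,
        extra_headers={"Authorization": f"Token {api_key}"},
    ) as ws:

        async def sender():
            async for chunk in audio_chunks:
                await ws.send(chunk)

        async def receiver():
            async for raw in ws:
                msg = json.loads(raw)
                if msg.get("type") == "UtteranceEnd":
                    print("end of turn (silence)")      # trigger suggestions here
                elif msg.get("type") == "Results" and msg.get("speech_final"):
                    transcript = msg["channel"]["alternatives"][0]["transcript"]
                    print("end of turn:", transcript)   # finalized text for this turn

        await asyncio.gather(sender(), receiver())
```

In practice the `endpointing` and `utterance_end_ms` values need tuning for your use case: shorter values feel snappier but can cut off slow or hesitant speakers.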
d
Is this accurate? And if so, do people ever integrate VAPI into their tech stack without using the text/speech side of it, using it just for this special sauce of recognizing endpoints?
s
Hey, there is no secret sauce. The end-of-text signal is returned by Deepgram, not by the VAPI LLM.
d
For `transcriptType == "final"` AND speech-end?
s
These are provided by the transcriber.
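Concretely, in Deepgram's streaming responses those two signals show up as the `is_final` and `speech_final` flags on `Results` messages (plus a separate `UtteranceEnd` message when `utterance_end_ms` is enabled). Below is a small sketch of how a handler might tell them apart, assuming messages shaped like Deepgram's live API; the event labels themselves are just illustrative.

```python
import json

def classify_transcriber_event(raw_message: str) -> str:
    """Map a Deepgram streaming message to a rough VAPI-style event label."""
    msg = json.loads(raw_message)
    if msg.get("type") == "UtteranceEnd":
        return "speech-end"                         # silence-based end of utterance
    if msg.get("type") == "Results":
        if msg.get("speech_final"):
            return "final transcript + speech-end"  # endpointing fired on this result
        if msg.get("is_final"):
            return "final transcript"               # text finalized, speaker may continue
        return "interim transcript"
    return "other"
```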