Creating a conversational trainer.
# support
fibs
Hi team, I am trying to build a conversational training agent using a Vapi agent for voice. Here is a quick description: the training agent has a knowledge base from which it can answer questions and build its flow. It can also show muted video clips and narrate a voiceover over them, explaining what is happening in the video. Users can interrupt at any point, and the agent can jump to any part of the clips to answer questions if needed.

We have run into a few problems we couldn't find answers for. Can you help resolve them, or suggest workarounds?

1. Can Vapi do a function call directly on the JS client? It currently requires a server URL, which means it can only call a server.
2. We want the voiceover synchronized with the video. We can supply the transcript, but it won't necessarily stay in time, since speaking speed and pauses may vary. Is there any way to manage this? (Would Voiceflow or similar tools be useful here?)
3. We are also trying to dub the video using the same Eleven Labs voice, but the two still sound quite different. Is that because Vapi changes the audio fidelity to optimize it for calls?

Let me know if you need more clarification on this.
v
@fibs Hello! I'm glad to hear you're working on building a conversational training agent with Vapi. Let's address your queries one by one:

1. **Function Calls on JS Client**: Vapi is designed to handle voice interactions through server-side operations. If you need to make function calls from the client side, you would typically use JavaScript to call your own server, which in turn interacts with Vapi. If you're looking to bypass the server, you might consider WebSockets or similar technologies that allow real-time communication between the client and Vapi; this would still require a server component but can enable a more direct interaction pattern.

2. **Voiceover Synchronization**: To synchronize a voiceover with video, you need a mechanism to control the timing of the spoken audio. One approach is to use timestamps or markers within the video to trigger specific sections of the transcript, so the voiceover matches the video content. Tools like Voiceflow can be useful for designing the conversational flow, but for precise synchronization you will likely need custom logic that accounts for the video's timeline and the duration of the spoken phrases.

3. **Voice Fidelity with Eleven Labs**: Differences in voice quality between Vapi and Eleven Labs can come from several factors, such as different text-to-speech engines, encoding formats, or processing techniques. Vapi is optimized for real-time interactions, which may affect fidelity to keep latency low. To maintain consistency, you would need to use the same TTS engine and settings across both platforms. Since Eleven Labs offers an API, you could potentially integrate it with Vapi to use the same voice for both dubbing and real-time interactions.

If you need more detailed assistance or have further questions, please don't hesitate to ask.
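To make the marker-based synchronization idea in point 2 concrete, here is a minimal sketch of the scheduling logic. It assumes you have (or can measure) the speech duration of each narration segment and a video timestamp at which each segment should start; if the narration overruns a marker, the video is held at that marker until the narration catches up. All names here are illustrative, not part of any Vapi or Voiceflow API.

```javascript
// Sketch: align voiceover segments to video markers.
// segments: [{ videoTime, speechDuration }] where
//   videoTime      - marker in the video (seconds) where the segment belongs
//   speechDuration - measured length of the spoken narration (seconds)
// Returns a schedule saying when each segment starts on the wall clock
// and how long the video must pause at that marker, if at all.
function buildSchedule(segments) {
  const schedule = [];
  let clock = 0; // wall-clock time when the previous narration ends
  for (const { videoTime, speechDuration } of segments) {
    // The segment cannot start before the video reaches its marker,
    // nor before the previous narration has finished.
    const start = Math.max(clock, videoTime);
    // If narration is still running past the marker, hold the video here.
    const pauseVideoFor = start - videoTime;
    schedule.push({ videoTime, start, pauseVideoFor });
    clock = start + speechDuration;
  }
  return schedule;
}
```

At playback time you would listen to the video element's `timeupdate` events, call `video.pause()` when a marker with a nonzero `pauseVideoFor` is reached, and resume once the narration segment ends. The same schedule also tells you where to seek when the user interrupts and asks about a different part of the clip.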
You can continue this conversation by mentioning me in the message.
Got it. Tagging @User for additional help.