Latency Reduction through Pre-recorded msgs?
# support
t
Hello VAPI support, I understand the assistant workflow works like this (or correct me if I'm wrong): speech to text → send text to LLM → LLM response to speech → output. However, if we have pre-recorded responses (mp3s) for all the scenarios, and the LLM just needs to decide which recording to pull and play, would that reduce latency somewhat? I am already using the fastest LLM model and everything, but we still want an improvement so it's closer to human reaction time. Or let me know if there is another way to reduce latency, possibly with the setup of workflows or functions? Thanks
v
Hi there! 👋 I can definitely help with your question about using pre-recorded messages to reduce latency, as well as suggest other optimization techniques.

Yes, using pre-recorded messages can indeed reduce latency in your voice AI system! Pre-recorded messages bypass the text-to-speech synthesis step in the workflow, which is one of the factors contributing to overall latency. Since the LLM would only need to decide which recording to play rather than generating text that then has to be synthesized into speech, this approach can reduce the voice-to-voice flow time [source](https://docs.vapi.ai/quickstart).

Here are some additional ways to reduce latency in your Vapi setup:

### Voice Provider Optimization
- Switch to OpenAI as a voice provider, which offers speed multipliers up to 4x
- The RimeAI voice provider has flexible speed settings with no upper limit [source](https://discord.com/channels/1211482211119796234/1339223223409184788)

### Parameter Tuning
- Adjust `startSpeakingPlan` with `waitSeconds` set to around 0.8 seconds to minimize delays:

```json
"startSpeakingPlan": {
  "waitSeconds": 0.8,
  "smartEndpointingEnabled": true
}
```
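As a sketch of how that tuning could be applied programmatically: the snippet below builds the assistant-update payload in Python. The `PATCH /assistant/{id}` call shown in the comment follows Vapi's REST API shape, but verify the endpoint and field names against the current API reference before relying on them.

```python
# Sketch: build the startSpeakingPlan tuning payload for an assistant update.
# The commented-out PATCH call assumes Vapi's REST API; check the current
# API reference for the exact endpoint and field names.
import json

def build_start_speaking_payload(wait_seconds: float = 0.8) -> dict:
    """Build the assistant-update payload for latency tuning."""
    return {
        "startSpeakingPlan": {
            "waitSeconds": wait_seconds,
            "smartEndpointingEnabled": True,
        }
    }

payload = build_start_speaking_payload()
print(json.dumps(payload))

# To apply it (requires the `requests` package, a real assistant ID and API key):
# requests.patch(
#     f"https://api.vapi.ai/assistant/{assistant_id}",
#     headers={"Authorization": f"Bearer {api_key}"},
#     json=payload,
# )
```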
[source](https://discord.com/channels/1211482211119796234/1331035264751112202)

### Endpointing and Voice Activity Detection
- Adjust voice activity detection settings to optimize responsiveness
- Enable smart endpointing to improve the system's ability to determine when to respond [source](https://docs.vapi.ai/customization/speech-configuration)

### Network and Server Optimization
- Ensure the network path between your servers and clients is optimized
- Use Content Delivery Networks (CDNs) with edge locations close to your users
- Configure servers for high performance with optimized settings [source](https://discord.com/channels/1211482211119796234/1336685825227755591)

### Caching Mechanisms
- Use static first messages or implement caching to reduce initial response latency
- This approach can result in voice responses within sub-second timeframes [source](https://discord.com/channels/1211482211119796234/1339303957675245669)

Would you like more specific information about implementing any of these latency reduction techniques in your particular setup?

Sources:
- [Vapi Quickstart Guide](https://docs.vapi.ai/quickstart)
- [Discord Channel: Voice Provider Speed Settings](https://discord.com/channels/1211482211119796234/1339223223409184788)
- [Discord Channel: Start Speaking Plan Configuration](https://discord.com/channels/1211482211119796234/1331035264751112202)
- [Vapi Speech Configuration Documentation](https://docs.vapi.ai/customization/speech-configuration)
- [Discord Channel: Network Optimization](https://discord.com/channels/1211482211119796234/1336685825227755591)
- [Discord Channel: Caching Mechanisms](https://discord.com/channels/1211482211119796234/1339303957675245669)
t
What are the steps for pulling pre-recorded recordings? Where do I store the recordings, and what are the other steps? Could you elaborate please? @User
k
Host your pre-recorded audio files somewhere accessible, then configure your assistant to respond with them using the `audioUrl` field in message responses, mapping user inputs to specific recordings via backend logic. This bypasses TTS and reduces latency.
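A minimal sketch of that mapping step: pick a hosted clip based on the caller's utterance. The scenario keywords and URLs are hypothetical, and the exact response shape your server must return is defined by Vapi's server-message docs, so treat the `audioUrl` payload below as illustrative.

```python
# Sketch: map a caller's utterance to a pre-recorded clip and return a
# response carrying its URL. Scenario keywords and URLs are hypothetical.
RECORDINGS = {
    "pricing": "https://cdn.example.com/audio/pricing.mp3",
    "hours": "https://cdn.example.com/audio/hours.mp3",
    "fallback": "https://cdn.example.com/audio/fallback.mp3",
}

def pick_recording(user_text: str) -> dict:
    """Return a message payload pointing at the matching pre-recorded mp3."""
    text = user_text.lower()
    for keyword, url in RECORDINGS.items():
        if keyword in text:
            return {"audioUrl": url}
    return {"audioUrl": RECORDINGS["fallback"]}

print(pick_recording("What are your hours on Sunday?"))
# → {'audioUrl': 'https://cdn.example.com/audio/hours.mp3'}
```

In practice the keyword match would be replaced by the LLM's routing decision; the point is that the final response carries a URL instead of text to synthesize.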
t
Thanks for the response! I have a couple of follow-up questions:
1. By hosting pre-recorded files, do I host them somewhere on Vapi (like the knowledge base) or anywhere like Dropbox?
2. By "configure your assistant to respond with audioUrl", do you suggest creating workflows on Vapi or including that in the prompt for the LLM?
3. Could you elaborate on how to map user input to recordings via backend logic?
Thank you so much!
k
You can host pre-recorded audio files externally (like Dropbox or S3) and reference them via URLs in Vapi workflows or LLM prompts. To map user input to recordings, use your backend to process the input and return the appropriate `audioUrl` to Vapi for playback.
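To make the backend piece concrete: a minimal sketch of a webhook-style handler using only the standard library. The request and response field names here (`transcript` in, `audioUrl` out) and the S3 URLs are assumptions for illustration; match them to the actual payload Vapi sends and expects from your server.

```python
# Sketch of the backend logic: receive the transcribed user input, look up
# the matching pre-recorded clip, and answer with its URL. Field names
# ("transcript", "audioUrl") and URLs are illustrative assumptions.
import json

CLIP_URLS = {
    "greeting": "https://example-bucket.s3.amazonaws.com/greeting.mp3",
    "goodbye": "https://example-bucket.s3.amazonaws.com/goodbye.mp3",
}

def handle_webhook(raw_body: bytes) -> bytes:
    """Parse the incoming JSON and return a JSON body with an audioUrl."""
    data = json.loads(raw_body)
    transcript = data.get("transcript", "").lower()
    key = "goodbye" if "bye" in transcript else "greeting"
    return json.dumps({"audioUrl": CLIP_URLS[key]}).encode()

# Example round trip:
reply = handle_webhook(b'{"transcript": "Okay, bye now!"}')
print(reply)  # b'{"audioUrl": "https://example-bucket.s3.amazonaws.com/goodbye.mp3"}'
```

This function would sit behind whatever HTTP framework you already run (Flask, FastAPI, a bare `http.server`); the routing decision itself stays framework-agnostic.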