First Custom LLM request is extremely slow
# support
h
Title explains itself mostly
v
Hi there! 👋 I understand you're experiencing slowness with your first Custom LLM request. I don't have specific details about your particular implementation, but I can explain why first Custom LLM requests are often slow and provide some troubleshooting steps.

## Why First Custom LLM Requests Are Often Slow

The first request to a Custom LLM can be slow for several reasons:

1. **Cold Start Effect**: Custom LLMs are more complex and resource-intensive than standard implementations, which can lead to slower initial requests [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1317475563220504596).
2. **Network Latency**: The Custom LLM request/response cycle involves network communication, making it susceptible to network latency. High network latency can contribute to delays in voice applications [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272).
3. **Data Transfer Overhead**: Custom LLMs use Server-Sent Events (SSE) for response streaming, and the size and format of request/response payloads can impact speed [source](https://docs.vapi.ai/documentation/advanced-concepts/customization/custom-llm/custom-llm-tool-calling-integration).
4. **Processing Time Factors**: Several factors can affect processing time, including:
   - Call volume (high volumes can slow down response times)
   - Vapi configuration settings
   - Token limits
   - Tool calling complexity
   [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272)

## Troubleshooting Steps

Here are some steps you can take to address the slowness:

1. **Check Vapi Configuration**: Review settings like `silenceTimeoutSeconds`, `responseDelaySeconds`, and `llmRequestDelaySeconds` [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272).
2. **Verify LLM Key Configuration**: Ensure your LLM key is correctly configured and hasn't exceeded usage limits [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1323288623629471866).
3. **Check Token Limits**: Ensure requests are within token limits [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1323288623629471866).
4. **Verify Streaming Response Format**: Make sure your LLM sends a streaming response in SSE format (see the sketch after this list) [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1328028159949537341).
5. **Check Network Latency**: Verify network latency from your system to the API endpoints [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272).
6. **Review Custom LLM Configuration**: Double-check provider, URL, model, and tools configuration [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1328028159949537341).

Is there any specific aspect of the slowness you'd like me to elaborate on?

Sources:
- [Discord Thread on Custom LLM Complexity](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1317475563220504596)
- [Discord Thread on Network Latency](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272)
- [Vapi Documentation on Custom LLM Tool Calling Integration](https://docs.vapi.ai/documentation/advanced-concepts/customization/custom-llm/custom-llm-tool-calling-integration)
- [Discord Thread on Token Limits](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1323288623629471866)
- [Discord Thread on Custom LLM Configuration](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1328028159949537341)
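Regarding step 4, here is a minimal sketch of an SSE streaming endpoint in the OpenAI chat-completions format, which is what Vapi's custom-llm provider consumes. The Flask framework, route path, and token list are illustrative assumptions; adapt them to your own server:

```python
# Hypothetical sketch of an OpenAI-compatible SSE streaming endpoint.
# Assumes Flask; route path and payload fields are illustrative.
import json
import time

from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/chat/completions", methods=["POST"])
def chat_completions():
    body = request.get_json()

    def stream():
        # Each chunk is one "data: {json}\n\n" SSE frame.
        for token in ["Hello", ", ", "world", "!"]:  # stand-in for real model output
            chunk = {
                "id": "chatcmpl-demo",
                "object": "chat.completion.chunk",
                "created": int(time.time()),
                "model": body.get("model", "custom"),
                "choices": [
                    {"index": 0, "delta": {"content": token}, "finish_reason": None}
                ],
            }
            yield f"data: {json.dumps(chunk)}\n\n"
        # Terminate the stream the way OpenAI-compatible servers do.
        yield "data: [DONE]\n\n"

    return Response(stream(), mimetype="text/event-stream")
```

If your endpoint buffers the whole completion and returns plain JSON instead of frames like these, the voice pipeline has to wait for the full response, which shows up as exactly this kind of model latency.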
h
1st request latency: Turn latency: 3663ms (transcriber: 0ms, endpointing: 1501ms, kb: N/A, model: 1812ms, voice: 331ms)
2nd request latency: Turn latency: 1307ms (transcriber: 0ms, endpointing: 101ms, kb: N/A, model: 640ms, voice: 528ms)
3rd request latency: Turn latency: 2518ms (transcriber: 0ms, endpointing: 101ms, kb: N/A, model: 1485ms, voice: 916ms)
4th request latency: Turn latency: 1607ms (transcriber: 0ms, endpointing: 101ms, kb: N/A, model: 313ms, voice: 313ms)
p
Can you share the call ID?
h
2a3ec87e-cefb-4a67-a6ca-a2c1ea338f84
p
Hey Hugo, we can't help you much with the custom LLM latency itself, since that part comes from your side. But I can help you reduce the endpointing latency. You need to change the Smart Endpointing plan inside your Start Speaking plan to the following. It uses LiveKit for English transcriptions and waits 200ms-400ms for the user to finish speaking.
"smartEndpointingPlan": {
          "provider": "livekit",
          "waitFunction": "200 + 200 * x"
        }
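For reference, here is a hedged sketch of applying that plan to an existing assistant through Vapi's REST API, assuming the standard PATCH assistant endpoint; the assistant ID and API key are placeholders:

```python
# Hypothetical sketch: patch an assistant's startSpeakingPlan via Vapi's API.
# ASSISTANT_ID is a placeholder; VAPI_API_KEY is read from the environment.
import os

import requests

ASSISTANT_ID = "your-assistant-id"  # placeholder

resp = requests.patch(
    f"https://api.vapi.ai/assistant/{ASSISTANT_ID}",
    headers={"Authorization": f"Bearer {os.environ['VAPI_API_KEY']}"},
    json={
        "startSpeakingPlan": {
            "smartEndpointingPlan": {
                "provider": "livekit",
                # x is the endpointing model's estimate that the caller is done,
                # so the wait ranges from 200ms (x=0) up to 400ms (x=1).
                "waitFunction": "200 + 200 * x",
            }
        }
    },
)
resp.raise_for_status()
```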
Let me know how it goes for you.