First Custom LLM request is extremely slow
# support
h
Title explains itself mostly
v
Hi there! 👋 I understand you're experiencing slowness with your first Custom LLM request. I don't have specific details about your particular implementation, but I can explain why first Custom LLM requests are often slow and provide some troubleshooting steps.

## Why First Custom LLM Requests Are Often Slow

The first request to a Custom LLM can be slow for several reasons:

1. **Cold Start Effect**: Custom LLMs are more complex and resource-intensive than standard implementations, which can lead to slower initial requests [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1317475563220504596).
2. **Network Latency**: The Custom LLM request/response cycle involves network communication, making it susceptible to network latency. High network latency can contribute to delays in voice applications [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272).
3. **Data Transfer Overhead**: Custom LLMs use Server-Sent Events (SSE) for response streaming, and the size and format of request/response payloads can impact speed [source](https://docs.vapi.ai/documentation/advanced-concepts/customization/custom-llm/custom-llm-tool-calling-integration).
4. **Processing Time Factors**: Several factors can affect processing time, including:
   - Call volume (high volumes can slow down response times)
   - Vapi configuration settings
   - Token limits
   - Tool calling complexity
   [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272)

## Troubleshooting Steps

Here are some steps you can take to address the slowness:

1. **Check Vapi Configuration**: Review settings like `silenceTimeoutSeconds`, `responseDelaySeconds`, and `llmRequestDelaySeconds` [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272).
2. **Verify LLM Key Configuration**: Ensure your LLM key is correctly configured and hasn't exceeded usage limits [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1323288623629471866).
3. **Check Token Limits**: Ensure requests are within token limits [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1323288623629471866).
4. **Verify Streaming Response Format**: Make sure your LLM sends a streaming response in SSE format (see the sketch after this list) [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1328028159949537341).
5. **Check Network Latency**: Verify network latency from your system to the API endpoints [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272).
6. **Review Custom LLM Configuration**: Double-check provider, URL, model, and tools configuration [source](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1328028159949537341).

Is there any specific aspect of the slowness you'd like me to elaborate on?

Sources:
- [Discord Thread on Custom LLM Complexity](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1317475563220504596)
- [Discord Thread on Network Latency](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1332286268805087272)
- [Vapi Documentation on Custom LLM Tool Calling Integration](https://docs.vapi.ai/documentation/advanced-concepts/customization/custom-llm/custom-llm-tool-calling-integration)
- [Discord Thread on Token Limits](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1323288623629471866)
- [Discord Thread on Custom LLM Configuration](https://discord.com/channels/1211482211119796234/1211483291191083018/threads/1328028159949537341)
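Regarding step 4, here is a minimal sketch of an SSE streaming endpoint in the OpenAI chat-completions format, which is what Vapi's custom-llm provider consumes. The Flask framework, route path, and token list are illustrative assumptions; adapt them to your own server:

```python
# Hypothetical sketch of an OpenAI-compatible SSE streaming endpoint.
# Assumes Flask; route path and payload fields are illustrative.
import json
import time

from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/chat/completions", methods=["POST"])
def chat_completions():
    body = request.get_json()

    def stream():
        # Each chunk is one "data: {json}\n\n" SSE frame.
        for token in ["Hello", ", ", "world", "!"]:  # stand-in for real model output
            chunk = {
                "id": "chatcmpl-demo",
                "object": "chat.completion.chunk",
                "created": int(time.time()),
                "model": body.get("model", "custom"),
                "choices": [
                    {"index": 0, "delta": {"content": token}, "finish_reason": None}
                ],
            }
            yield f"data: {json.dumps(chunk)}\n\n"
        # Terminate the stream the way OpenAI-compatible servers do.
        yield "data: [DONE]\n\n"

    return Response(stream(), mimetype="text/event-stream")
```

If your endpoint buffers the whole completion and returns plain JSON instead of frames like these, the voice pipeline has to wait for the full response, which shows up as exactly this kind of model latency.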
h
1st request latency: Turn latency: 3663ms (transcriber: 0ms, endpointing: 1501ms, kb: N/A, model: 1812ms, voice: 331ms)
2nd request latency: Turn latency: 1307ms (transcriber: 0ms, endpointing: 101ms, kb: N/A, model: 640ms, voice: 528ms)
3rd request latency: Turn latency: 2518ms (transcriber: 0ms, endpointing: 101ms, kb: N/A, model: 1485ms, voice: 916ms)
4th request latency: Turn latency: 1607ms (transcriber: 0ms, endpointing: 101ms, kb: N/A, model: 313ms, voice: 313ms)
p
Can you share the call ID?
h
2a3ec87e-cefb-4a67-a6ca-a2c1ea338f84
p
Hey Hugo, we can't help you much with the custom LLM latency itself, since that part comes from your side. But I can help you reduce the endpointing latency. You need to change the Smart Endpointing plan inside your Start Speaking plan to the following. It uses LiveKit for English transcriptions and waits 200ms-400ms for the user to finish speaking.
"smartEndpointingPlan": {
          "provider": "livekit",
          "waitFunction": "200 + 200 * x"
        }
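For reference, here is a hedged sketch of applying that plan to an existing assistant through Vapi's REST API, assuming the standard PATCH assistant endpoint; the assistant ID and API key are placeholders:

```python
# Hypothetical sketch: patch an assistant's startSpeakingPlan via Vapi's API.
# ASSISTANT_ID is a placeholder; VAPI_API_KEY is read from the environment.
import os

import requests

ASSISTANT_ID = "your-assistant-id"  # placeholder

resp = requests.patch(
    f"https://api.vapi.ai/assistant/{ASSISTANT_ID}",
    headers={"Authorization": f"Bearer {os.environ['VAPI_API_KEY']}"},
    json={
        "startSpeakingPlan": {
            "smartEndpointingPlan": {
                "provider": "livekit",
                # x is the endpointing model's estimate that the caller is done,
                # so the wait ranges from 200ms (x=0) up to 400ms (x=1).
                "waitFunction": "200 + 200 * x",
            }
        }
    },
)
resp.raise_for_status()
```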
Let me know how it goes for you.