My assistant does not say the response of my custo...
# support
u
When using the custom-llm interface, Vapi does not TTS the response sent by my bot. This is what my bot sends:

```js
const response = {
  id: req.body.call.callId,
  object: "chat.completion",
  created: Math.floor(Date.now() / 1000),
  model: "gpt-3.5-turbo-0613",
  system_fingerprint: null,
  choices: [
    {
      index: 0,
      delta: {
        content: messageResponse?.[messageResponse.length - 1]?.content ?? "",
      },
      logprobs: null,
      finish_reason: "stop",
    },
  ],
};
```

In the code we return it with `res.status(201).json(response)`.

callID: d045a553-e98c-49ad-87cd-c121dcfa3c0b

https://cdn.discordapp.com/attachments/1239956548894326885/1239956549288464476/image.png?ex=6644cf06&is=66437d86&hm=1e3aff978374b6fe7c204bc283746a6a00f7ea7187f4ba85854fdbe3206cd1b8&
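A note on the shape above: in the OpenAI chat schema, `delta` only appears in streaming `chat.completion.chunk` objects, while a non-streaming `chat.completion` carries its text under `message`. A minimal comparison of the two shapes (content values are illustrative):

```js
// Non-streaming response: object is "chat.completion" and each choice carries a `message`.
const nonStreamingExample = {
  object: 'chat.completion',
  choices: [
    {
      index: 0,
      message: { role: 'assistant', content: 'Hello!' },
      finish_reason: 'stop',
    },
  ],
};

// Streaming response: each SSE chunk is "chat.completion.chunk" and each choice carries a `delta`.
const streamingChunkExample = {
  object: 'chat.completion.chunk',
  choices: [
    {
      index: 0,
      delta: { content: 'Hel' },
      finish_reason: null,
    },
  ],
};
```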
v
Hi @𝙠𝙝𝙖𝙯𝙙𝙧𝙖𝙜𝙨, I'm an AI assistant for Vapi. I'll analyze available sources and get back to you shortly.
**While I generate an answer, here are some sources that might be helpful**:
* [Connecting Your Custom LLM to Vapi: A Comprehensive Guide]()
* [Inbound Calling]()
* [Dashboard]()
* [Web Calling]()
### Answer

It appears that you are using a custom LLM and encountering issues with the response being processed by Vapi. A few details might need attention:

1. **Response Format**: Ensure that your response format adheres to the OpenAI-compatible schema. The response must include the correct structure. Here is a correct example:
```json
   {
     "choices": [
       {
         "finish_reason": "stop",
         "index": 0,
         "message": {
           "content": "Your response content here...",
           "role": "assistant"
         },
         "logprobs": null
       }
     ],
     "created": 1677664795,
     "id": "chatcmpl-7QyqpwdfhqwajicIEznoc6Q47XAyW",
     "model": "gpt-3.5-turbo-0613",
     "object": "chat.completion",
     "usage": {
       "completion_tokens": 17,
       "prompt_tokens": 57,
       "total_tokens": 74
     }
   }
```

Make sure your response matches this structure [(1)]().

2. **Server and Streaming**: If your custom LLM server does not support streaming completions, it may cause issues. Vapi expects the server to support streaming completions [(2)](). Ensure your server configuration can handle streaming, or adjust your integration accordingly.

3. **Error Handling**: If the assistant stops working after switching to a custom LLM, ensure that you have executed a PATCH request to the `/assistant` endpoint and set `model.urlRequestMetadataEnabled` to false [(3)](), [(4)]().

Please verify these points in your implementation to ensure the responses are correctly formatted and processed by Vapi.
s
Are you using a fine-tuned model?
u
No, my API is a conversational bot that calls ChatGPT along the way and generates a response based on internal states and functions.
s
Can you bring this up in tomorrow's Vapi Office Hour?
u
I'm sorry, I won't be able to do it because I don't speak English.
v
Got it. Tagging @User for additional help.
s
Can you share some network logs showing the payload sent by Vapi and how you handle it?
a
We tried two different ways: one with a status (201) response and another with a streaming response. We also tried sending a pre-designed response just for testing. https://cdn.discordapp.com/attachments/1239956548894326885/1240331840011632640/image.png?ex=66462c8a&is=6644db0a&hm=f35331f3fc1ff7ea1e60003a29a86750c285e44a541fdff695c93ec6b816d0df&
s
@Agus Can you refer to this codebase and use the "/openai-advanced/chat/completions" endpoint? https://github.com/VapiAI/server-side-example-python-flask/blob/main/app/api/custom_llm.py
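For reference, a rough Node sketch of that pattern: an endpoint that forwards the conversation to OpenAI and relays the streamed chunks back to Vapi as server-sent events. It assumes the `openai` npm package (v4+) and an OpenAI-style request body; the route path and model name are illustrative, and this is not the code from the linked repository:

```js
const express = require('express');
const OpenAI = require('openai');

const app = express();
app.use(express.json());

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post('/chat/completions', async (req, res) => {
  // Vapi expects an SSE stream of OpenAI-style chat.completion.chunk objects.
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // Forward the incoming messages to OpenAI with streaming enabled.
  const stream = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: req.body.messages,
    stream: true,
  });

  // Relay each chunk as a server-sent event, then signal the end of the stream.
  for await (const chunk of stream) {
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  res.write('data: [DONE]\n\n');
  res.end();
});

app.listen(3030, () => console.log('Server is running on port 3030'));
```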
a
@User Sure, I'll review it and try to replicate it. I'll keep you posted with any news; thank you for now.
s
Okay.
u
s
call_id please, and did you add `urlRequestMetadataEnabled: false`?
u
call_id: 333a5b05-0a74-488c-b3c7-f96e5d464cb7
Where do I add urlRequestMetadataEnabled?
s
You need to send an API request to disable it. https://docs.vapi.ai/api-reference/assistants/update-assistant
The issue is related to that extra metadata.
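A minimal sketch of that request using Node 18+ global `fetch`, assuming the base URL from the update-assistant reference above and placeholder values for the assistant ID and API key; depending on how the API merges the `model` object, you may need to resend your other model settings alongside this field:

```js
// Hypothetical placeholders: YOUR_ASSISTANT_ID and VAPI_API_KEY must be replaced.
fetch('https://api.vapi.ai/assistant/YOUR_ASSISTANT_ID', {
  method: 'PATCH',
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    'Content-Type': 'application/json',
  },
  // Disable the extra request metadata discussed above.
  body: JSON.stringify({ model: { urlRequestMetadataEnabled: false } }),
})
  .then((res) => res.json())
  .then((assistant) => console.log(assistant));
```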
s
Seems correct to me.
u
That's how I had it when I did this
Do I need to do anything else?
s
Can you send me the request payload and the output response once?
s
```js
const express = require('express');
const app = express();
app.use(express.json());

app.post('/chat/completions', async (req, res) => {
  // Stream OpenAI-style chat.completion.chunk objects back to Vapi as server-sent events.
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const dataChunks = [
    {
      id: 'chatcmpl-8c78110d-a5cf-4585-8619-c1f59b714a70',
      object: 'chat.completion.chunk',
      created: 1713300428,
      model: 'gpt-4-1106-preview',
      system_fingerprint: 'fp_5c95a4634e',
      choices: [
        {
          index: 0,
          delta: { content: 'Let me think. ' },
          logprobs: null,
          finish_reason: null,
        },
      ],
    },
    {
      id: 'chatcmpl-5ac71a11-2092-426d-8da3-9140806668aa',
      object: 'chat.completion.chunk',
      created: 1713300431,
      model: 'gpt-4-1106-preview',
      system_fingerprint: 'fp_5c95a4634e',
      choices: [
        {
          index: 0,
          delta: { content: 'still thinking. ' },
          logprobs: null,
          finish_reason: null,
        },
      ],
    },
    {
      id: 'chatcmpl-753f25e2-539e-44d4-8d96-5996409d51d2',
      object: 'chat.completion.chunk',
      created: 1713300434,
      model: 'gpt-4-1106-preview',
      system_fingerprint: 'fp_5c95a4634e',
      choices: [
        {
          index: 0,
          delta: { content: 'still thinking. ' },
          logprobs: null,
          finish_reason: null,
        },
      ],
    },
  ];

  // Write each chunk as a server-sent event, then signal the end of the stream.
  for (let index = 0; index < dataChunks.length; index++) {
    res.write(`data: ${JSON.stringify(dataChunks[index])}\n\n`);
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
  res.write('data: [DONE]\n\n');
  res.end();

  req.on('close', () => {
    res.end();
  });
});

app.post('/', (req, res) => {
  console.dir(req.body, { depth: null });
  res.json({});
});

app.listen(3030, () => {
  console.log('Server is running on port 3030');
});
```
u
My bad, that code worked. Thanks!
s
Awesome.
a
Thanks @Sahil !