Issue with using custom LLM
# support
s
Hi there, I'm testing the possibility of using the custom-llm option and tried to reproduce the example from your doc: https://docs.vapi.ai/custom-llm-guide. I have a local Flask app:
```python
import time
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/chat/completions', methods=['POST'])
def basic_custom_llm_route():
  request_data = request.get_json()
  response = {
    "id": "chatcmpl-8mcLf78g0quztp4BMtwd3hEj58Uof",
    "object": "chat.completion",
    "created": int(time.time()),
    "model": "gpt-3.5-turbo-0613",
    "system_fingerprint": None,
    "choices": [
      {
        "index": 0,
        "delta": {"content": "This is some test content"},
        "logprobs": None,
        "finish_reason": "stop"
      }
    ]
  }
  return jsonify(response), 201

if __name__ == "__main__":
    app.run(debug=True, port=5000)
```
It is accessible with ngrok (https://cead-37-168-11-222.ngrok-free.app). Testing the endpoint with Postman works (see image attached); I'm getting the expected response:
```json
{
  "choices": [
    {
      "delta": {
        "content": "This is some test content"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "created": 1715851284,
  "id": "chatcmpl-8mcLf78g0quztp4BMtwd3hEj58Uof",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "system_fingerprint": null
}
```
But when experimenting inside Vapi, I get no answer from the agent. Am I doing something wrong? Also, why is the message's content stored in `completion.choices[0].delta.content` in your examples, contrary to OpenAI's API endpoint which stores it in `completion.choices[0].message.content`? Thank you for your answer. https://cdn.discordapp.com/attachments/1240596039006687232/1240596039396888586/image.png?ex=66472298&is=6645d118&hm=e30750e603436253a1f622212a7d67bb1b41d12255dbd759cbe64f0b0620153c&
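For reference, the difference being asked about, sketched from OpenAI's documented response shapes (the content strings are placeholders): a regular chat completion puts the text under `message`, while streaming `chat.completion.chunk` events put incremental text under `delta`.
```python
# Non-streaming chat completion: the full text lives under "message".
non_streaming_choice = {
    "index": 0,
    "message": {"role": "assistant", "content": "This is some test content"},
    "finish_reason": "stop",
}

# Streaming chat.completion.chunk: partial text arrives under "delta".
streaming_choice = {
    "index": 0,
    "delta": {"content": "This is some test"},
    "finish_reason": None,
}
```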
v
Hi @Stakh, I'm an AI assistant for Vapi. I'll analyze available sources and get back to you shortly.
-----------------------------
**While I generate an answer, here are some sources that might be helpful**:
* Connecting Your Custom LLM to Vapi: A Comprehensive Guide
* Create Assistant
* Server Events
* Update Assistant
-----------------------------
### Answer
To address your issue with using a custom LLM with Vapi, let's go through a few points based on the provided information and the cited Vapi documentation:
1. **Response Format**: The response structure you're using in your Flask app sends the content in `choices[0].delta.content`, but Vapi expects it in `choices[0].message.content`. You should adjust your response format in your Flask app to match this expectation.
2. **Flask App Code Adjustment**: Update your `basic_custom_llm_route` function to return the content in the correct format:
```python
    import time
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route('/chat/completions', methods=['POST'])
    def basic_custom_llm_route():
      request_data = request.get_json()
      response = {
        "id": "chatcmpl-8mcLf78g0quztp4BMtwd3hEj58Uof",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": "gpt-3.5-turbo-0613",
        "choices": [
          {
            "index": 0,
            "message": {"content": "This is some test content"},
            "logprobs": None,
            "finish_reason": "stop"
          }
        ]
      }
      return jsonify(response), 201

    if __name__ == "__main__":
        app.run(debug=True, port=5000)
```
3. **Ensure `model.urlRequestMetadataEnabled` is Set**: Make sure to execute a PATCH request to the `/assistant` endpoint and ensure that `model.urlRequestMetadataEnabled` is set to `false` (1):
```json
    {
      "model": {
        "urlRequestMetadataEnabled": false
      }
    }
```
By making these changes, your custom LLM should respond correctly within Vapi. Ensure your Flask app is correctly formatted and check the settings within Vapi's dashboard as well (2), (3), (1).

**SOURCES USED**: (2), (3), (1)
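For concreteness, a minimal sketch of that PATCH using Python's `requests` library, assuming the standard Vapi REST endpoint for updating an assistant (`PATCH https://api.vapi.ai/assistant/{assistantId}`). The assistant ID and API key are placeholders, and the full custom-llm model config is re-sent here in case the PATCH replaces the `model` object rather than merging into it:
```python
import requests

VAPI_API_KEY = "YOUR_VAPI_API_KEY"   # placeholder
ASSISTANT_ID = "YOUR_ASSISTANT_ID"   # placeholder

# Ask Vapi to stop attaching the extra call/metadata fields to the requests
# it forwards to the custom LLM URL.
resp = requests.patch(
    f"https://api.vapi.ai/assistant/{ASSISTANT_ID}",
    headers={
        "Authorization": f"Bearer {VAPI_API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": {
            "provider": "custom-llm",
            "url": "https://cead-37-168-11-222.ngrok-free.app",
            "model": "gpt-3.5-turbo",
            "urlRequestMetadataEnabled": False,
        }
    },
)
print(resp.status_code, resp.json())
```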
s
I changed the "delta" key to "message" in the code, but this does not solve the issue. The suggestion about the PATCH request is not relevant in my case, I believe, since I'm not using a fine-tuned OpenAI model.
v
Got it. Tagging @User for additional help.
s
I can confirm that my local endpoint is receiving a request from Vapi. So I guess the issue might be in receiving my endpoint's response, either because it is incorrectly formatted or for another reason.
Here is the beginning of the request I receive from Vapi:
```
'model': 'gpt-3.5-turbo',
    'messages': [{
        'role': 'system',
        'content': 'This is a test agent.'
    }, {
        'role': 'assistant',
        'content': 'Hi. This is George How can I help?'
    }, {
        'role': 'user',
        'content': 'This is a test message.'
    }],
    'temperature': 0.7,
    'stream': True,
    'max_tokens': 250,
    'call': {
        'type': 'webCall',
        'callId': 'xxxxxxxxxxxxxxxxxx',
        'orgId': 'xxxxxxxxxxxxxxx',
        'transcriber': {
            'provider': 'deepgram',
            'model': 'nova-2',
            'keywords': [],
            'language': 'en',
            'smartFormat': False
        },
        'model': {
            'provider': 'custom-llm',
            'url': 'https://cead-37-168-11-222.ngrok-free.app',
            'urlRequestMetadataEnabled': True,
            'model': 'gpt-3.5-turbo',
            'temperature': 0.7,
```
I see it has `call.model.urlRequestMetadataEnabled=True`.
s
Try the openai-advanced/chat endpoint which was mentioned in the documentation, and make sure to set urlRequestMetadataEnabled to false; otherwise you need to process the content before sending it to your LLM model.
s
Hi Sahil, thanks a lot for your answer 🙂 Do you mean this endpoint from your GitHub? https://github.com/VapiAI/server-side-example-python-flask/blob/64fb07c880bec2844a26361eaad6361fb5decdce/app/api/custom_llm.py#L68
If this is the right endpoint to try, I get the following error on the server side:
`TypeError: Completions.create() got an unexpected keyword argument 'call'`
(line 86, in openai_advanced_custom_llm_route: `chat_completion_stream = client.chat.completions.create(**request_data)`)
s
It is because urlRequestMetadataEnabled=True. Just set it to false and it will not send the extra call parameters.
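A sketch of the alternative being described, i.e. keeping the metadata enabled but stripping the extra keys server-side before forwarding the request to OpenAI. The route name mirrors the openai-advanced example in spirit, and the OpenAI API key is a placeholder:
```python
from flask import Flask, request, Response
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # placeholder

@app.route('/chat/completions', methods=['POST'])
def openai_advanced_custom_llm_route():
    request_data = request.get_json()
    # Vapi adds these keys when urlRequestMetadataEnabled is true; the OpenAI
    # client rejects them as unexpected keyword arguments, so drop them first.
    request_data.pop('call', None)
    request_data.pop('metadata', None)

    # Vapi sends stream=True, so this returns an iterator of chunks.
    chat_completion_stream = client.chat.completions.create(**request_data)

    def generate():
        # Relay OpenAI's streaming chunks back to Vapi as server-sent events.
        for chunk in chat_completion_stream:
            yield f"data: {chunk.model_dump_json()}\n\n"
        yield "data: [DONE]\n\n"

    return Response(generate(), mimetype='text/event-stream')
```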
s
True. Removing them with:
```python
del request_data['call']
del request_data['metadata']
```
works. Thank you.
However, the custom API I would like to interact with does not stream the information. Does this mean I should emulate this streaming process?
This might be a stupid question, but where do I set urlRequestMetadataEnabled=False?
s
Through the API, inside the customLLM part.
s
yeah.
s
So, my takeaway from going through this example is that a single response with the whole answer will not work; I have to stream the response. Could you confirm?
Thanks a lot for your support in any case @Sahil, very helpful and much appreciated ! 🙏
s
yup
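To close the loop, a minimal sketch of emulating streaming when the backing API only returns one complete answer: wrap the single reply in OpenAI-style `chat.completion.chunk` events sent over SSE. The chunk id and answer text are placeholders, and the route matches the one Vapi calls:
```python
import json
import time
from flask import Flask, request, Response

app = Flask(__name__)

@app.route('/chat/completions', methods=['POST'])
def emulated_streaming_route():
    request_data = request.get_json()
    # Drop the extra metadata Vapi attaches when urlRequestMetadataEnabled is true.
    request_data.pop('call', None)
    request_data.pop('metadata', None)

    # Pretend this single string came back from a non-streaming backend API.
    full_answer = "This is some test content"

    def generate():
        def chunk(delta, finish_reason):
            # Minimal OpenAI-style chat.completion.chunk event.
            return {
                "id": "chatcmpl-custom-llm-sketch",  # placeholder id
                "object": "chat.completion.chunk",
                "created": int(time.time()),
                "model": request_data.get("model", "gpt-3.5-turbo"),
                "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
            }

        # One chunk carrying the whole answer, then a final chunk closing the turn.
        yield f"data: {json.dumps(chunk({'role': 'assistant', 'content': full_answer}, None))}\n\n"
        yield f"data: {json.dumps(chunk({}, 'stop'))}\n\n"
        yield "data: [DONE]\n\n"

    return Response(generate(), mimetype='text/event-stream')

if __name__ == "__main__":
    app.run(debug=True, port=5000)
```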