Issue with using custom LLM
# support
s
Hi there, I'm testing the possibility of using the custom-llm option and tried to reproduce the example from your doc: https://docs.vapi.ai/custom-llm-guide. I have a local Flask app:
```python
import time
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/chat/completions', methods=['POST'])
def basic_custom_llm_route():
  request_data = request.get_json()
  response = {
    "id": "chatcmpl-8mcLf78g0quztp4BMtwd3hEj58Uof",
    "object": "chat.completion",
    "created": int(time.time()),
    "model": "gpt-3.5-turbo-0613",
    "system_fingerprint": None,
    "choices": [
      {
        "index": 0,
        "delta": {"content": "This is some test content"},
        "logprobs": None,
        "finish_reason": "stop"
      }
    ]
  }
  return jsonify(response), 201

if __name__ == "__main__":
    app.run(debug=True, port=5000)
```
It is accessible with ngrok (https://cead-37-168-11-222.ngrok-free.app). Testing the endpoint with Postman works (see image attached); I'm getting the expected response:
```json
{
  "choices": [
    {
      "delta": {
        "content": "This is some test content"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "created": 1715851284,
  "id": "chatcmpl-8mcLf78g0quztp4BMtwd3hEj58Uof",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "system_fingerprint": null
}
```
But when experimenting inside Vapi, I get no answer from the agent. Am I doing something wrong? Also, why is the message's content stored in `completion.choices[0].delta.content` in your examples, contrary to OpenAI's API endpoint which stores it in `completion.choices[0].message.content`? Thank you for your answer. https://cdn.discordapp.com/attachments/1240596039006687232/1240596039396888586/image.png?ex=66472298&is=6645d118&hm=e30750e603436253a1f622212a7d67bb1b41d12255dbd759cbe64f0b0620153c&
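For reference, the difference being asked about, sketched from OpenAI's documented response shapes (the content strings are placeholders): a regular chat completion puts the text under `message`, while streaming `chat.completion.chunk` events put incremental text under `delta`.
```python
# Non-streaming chat completion: the full text lives under "message".
non_streaming_choice = {
    "index": 0,
    "message": {"role": "assistant", "content": "This is some test content"},
    "finish_reason": "stop",
}

# Streaming chat.completion.chunk: partial text arrives under "delta".
streaming_choice = {
    "index": 0,
    "delta": {"content": "This is some test"},
    "finish_reason": None,
}
```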
v
Hi @Stakh, I'm an AI assistant for Vapi. I'll analyze available sources and get back to you shortly.
-----------------------------
**While I generate an answer, here are some sources that might be helpful**:
* Connecting Your Custom LLM to Vapi: A Comprehensive Guide
* Create Assistant
* Server Events
* Update Assistant
-----------------------------
### Answer
To address your issue with using a custom LLM with Vapi, let's go through a few points based on the provided information and the cited Vapi documentation:
1. **Response Format**: The response structure you're using in your Flask app sends the content in `choices[0].delta.content`, but Vapi expects it in `choices[0].message.content`. You should adjust your response format in your Flask app to match this expectation.
2. **Flask App Code Adjustment**: Update your `basic_custom_llm_route` function to return the content in the correct format:
```python
    import time
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route('/chat/completions', methods=['POST'])
    def basic_custom_llm_route():
      request_data = request.get_json()
      response = {
        "id": "chatcmpl-8mcLf78g0quztp4BMtwd3hEj58Uof",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": "gpt-3.5-turbo-0613",
        "choices": [
          {
            "index": 0,
            "message": {"content": "This is some test content"},
            "logprobs": None,
            "finish_reason": "stop"
          }
        ]
      }
      return jsonify(response), 201

    if __name__ == "__main__":
        app.run(debug=True, port=5000)
```
3. **Ensure `model.urlRequestMetadataEnabled` is Set**: Make sure to execute a PATCH request to the `/assistant` endpoint and ensure that `model.urlRequestMetadataEnabled` is set to `false` (1):
```json
    {
      "model": {
        "urlRequestMetadataEnabled": false
      }
    }
```
By making these changes, your custom LLM should respond correctly within Vapi. Ensure your Flask app is correctly formatted and check the settings within Vapi's dashboard as well (2), (3), (1).

**SOURCES USED**: (2), (3), (1)
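For concreteness, a minimal sketch of that PATCH using Python's `requests` library, assuming the standard Vapi REST endpoint for updating an assistant (`PATCH https://api.vapi.ai/assistant/{assistantId}`). The assistant ID and API key are placeholders, and the full custom-llm model config is re-sent here in case the PATCH replaces the `model` object rather than merging into it:
```python
import requests

VAPI_API_KEY = "YOUR_VAPI_API_KEY"   # placeholder
ASSISTANT_ID = "YOUR_ASSISTANT_ID"   # placeholder

# Ask Vapi to stop attaching the extra call/metadata fields to the requests
# it forwards to the custom LLM URL.
resp = requests.patch(
    f"https://api.vapi.ai/assistant/{ASSISTANT_ID}",
    headers={
        "Authorization": f"Bearer {VAPI_API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": {
            "provider": "custom-llm",
            "url": "https://cead-37-168-11-222.ngrok-free.app",
            "model": "gpt-3.5-turbo",
            "urlRequestMetadataEnabled": False,
        }
    },
)
print(resp.status_code, resp.json())
```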
s
I changed the "delta" key to "message" in the code, but this does not solve the issue. The suggestion about the PATCH request is not relevant in my case, I believe, since I'm not using a fine-tuned OpenAI model.
v
Got it. Tagging @User for additional help.
s
I can confirm that my local endpoint is receiving a request from Vapi. So I guess the issue might be in receiving my endpoint's response, either because it is incorrectly formatted or for another reason.
Here is the beginning of the request I receive from Vapi:
```
'model': 'gpt-3.5-turbo',
    'messages': [{
        'role': 'system',
        'content': 'This is a test agent.'
    }, {
        'role': 'assistant',
        'content': 'Hi. This is George How can I help?'
    }, {
        'role': 'user',
        'content': 'This is a test message.'
    }],
    'temperature': 0.7,
    'stream': True,
    'max_tokens': 250,
    'call': {
        'type': 'webCall',
        'callId': 'xxxxxxxxxxxxxxxxxx',
        'orgId': 'xxxxxxxxxxxxxxx',
        'transcriber': {
            'provider': 'deepgram',
            'model': 'nova-2',
            'keywords': [],
            'language': 'en',
            'smartFormat': False
        },
        'model': {
            'provider': 'custom-llm',
            'url': 'https://cead-37-168-11-222.ngrok-free.app',
            'urlRequestMetadataEnabled': True,
            'model': 'gpt-3.5-turbo',
            'temperature': 0.7,
```
I see it has `call.model.urlRequestMetadataEnabled=True`.
s
Try the openai-advanced/chat endpoint which was mentioned in the documentation, and make sure to set urlRequestMetadataEnabled to false; otherwise you need to process the content before sending it to your LLM model.
s
Hi Sahil, thanks a lot for your answer 🙂 Do you mean this endpoint from your GitHub? https://github.com/VapiAI/server-side-example-python-flask/blob/64fb07c880bec2844a26361eaad6361fb5decdce/app/api/custom_llm.py#L68
If this is the right endpoint to try, I get the following error on the server side:
`TypeError: Completions.create() got an unexpected keyword argument 'call'`
(line 86, in openai_advanced_custom_llm_route: `chat_completion_stream = client.chat.completions.create(**request_data)`)
s
It is because urlRequestMetadataEnabled=True. Just set it to false and it will not send the extra call parameters.
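A sketch of the alternative being described, i.e. keeping the metadata enabled but stripping the extra keys server-side before forwarding the request to OpenAI. The route name mirrors the openai-advanced example in spirit, and the OpenAI API key is a placeholder:
```python
from flask import Flask, request, Response
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # placeholder

@app.route('/chat/completions', methods=['POST'])
def openai_advanced_custom_llm_route():
    request_data = request.get_json()
    # Vapi adds these keys when urlRequestMetadataEnabled is true; the OpenAI
    # client rejects them as unexpected keyword arguments, so drop them first.
    request_data.pop('call', None)
    request_data.pop('metadata', None)

    # Vapi sends stream=True, so this returns an iterator of chunks.
    chat_completion_stream = client.chat.completions.create(**request_data)

    def generate():
        # Relay OpenAI's streaming chunks back to Vapi as server-sent events.
        for chunk in chat_completion_stream:
            yield f"data: {chunk.model_dump_json()}\n\n"
        yield "data: [DONE]\n\n"

    return Response(generate(), mimetype='text/event-stream')
```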
s
True. Removing them with:
```python
del request_data['call']
del request_data['metadata']
```
works. Thank you.
However, the custom API I would like to interact with does not stream the information. Does this mean I should emulate this streaming process?
This might be a stupid question, but where do I set urlRequestMetadataEnabled=False?
s
Through the API, inside the customLLM part.
s
yeah.
s
So, my takeaway from going through this example is that a single response with the whole answer will not work; I have to stream the response. Could you confirm?
Thanks a lot for your support in any case @Sahil, very helpful and much appreciated ! 🙏
s
yup
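To close the loop, a minimal sketch of emulating streaming when the backing API only returns one complete answer: wrap the single reply in OpenAI-style `chat.completion.chunk` events sent over SSE. The chunk id and answer text are placeholders, and the route matches the one Vapi calls:
```python
import json
import time
from flask import Flask, request, Response

app = Flask(__name__)

@app.route('/chat/completions', methods=['POST'])
def emulated_streaming_route():
    request_data = request.get_json()
    # Drop the extra metadata Vapi attaches when urlRequestMetadataEnabled is true.
    request_data.pop('call', None)
    request_data.pop('metadata', None)

    # Pretend this single string came back from a non-streaming backend API.
    full_answer = "This is some test content"

    def generate():
        def chunk(delta, finish_reason):
            # Minimal OpenAI-style chat.completion.chunk event.
            return {
                "id": "chatcmpl-custom-llm-sketch",  # placeholder id
                "object": "chat.completion.chunk",
                "created": int(time.time()),
                "model": request_data.get("model", "gpt-3.5-turbo"),
                "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
            }

        # One chunk carrying the whole answer, then a final chunk closing the turn.
        yield f"data: {json.dumps(chunk({'role': 'assistant', 'content': full_answer}, None))}\n\n"
        yield f"data: {json.dumps(chunk({}, 'stop'))}\n\n"
        yield "data: [DONE]\n\n"

    return Response(generate(), mimetype='text/event-stream')

if __name__ == "__main__":
    app.run(debug=True, port=5000)
```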