Stakh
05/16/2024, 9:25 AM
```python
import time

from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route('/chat/completions', methods=['POST'])
def basic_custom_llm_route():
    request_data = request.get_json()
    response = {
        "id": "chatcmpl-8mcLf78g0quztp4BMtwd3hEj58Uof",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": "gpt-3.5-turbo-0613",
        "system_fingerprint": None,
        "choices": [
            {
                "index": 0,
                "delta": {"content": "This is some test content"},
                "logprobs": None,
                "finish_reason": "stop"
            }
        ]
    }
    return jsonify(response), 201


if __name__ == "__main__":
    app.run(debug=True, port=5000)
```
Accessible with ngrok (https://cead-37-168-11-222.ngrok-free.app)
Testing the endpoint with Postman works (see image attached); I'm getting the expected response:
```json
{
    "choices": [
        {
            "delta": {
                "content": "This is some test content"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "created": 1715851284,
    "id": "chatcmpl-8mcLf78g0quztp4BMtwd3hEj58Uof",
    "model": "gpt-3.5-turbo-0613",
    "object": "chat.completion",
    "system_fingerprint": null
}
```
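(For reference, that Postman check can be reproduced from Python as well; a minimal smoke-test sketch against the ngrok URL quoted above, with an arbitrary test body since the route ignores the request payload:)

```python
import requests

# Hypothetical smoke test against the tunnelled endpoint; any JSON body
# works because basic_custom_llm_route never reads request_data.
resp = requests.post(
    "https://cead-37-168-11-222.ngrok-free.app/chat/completions",
    json={"messages": [{"role": "user", "content": "ping"}]},
)
print(resp.status_code)  # 201, per the route's return value
print(resp.json())
```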
But when experimenting inside Vapi, I get no answer from the agent.
Am I doing something wrong?
Also, why is the message's content stored in completion.choices[0].delta.content in your examples, when OpenAI's API endpoint stores it in completion.choices[0].message.content?
Thank you for your answer.
https://cdn.discordapp.com/attachments/1240596039006687232/1240596039396888586/image.png?ex=66472298&is=6645d118&hm=e30750e603436253a1f622212a7d67bb1b41d12255dbd759cbe64f0b0620153c
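(For context on the question above: in OpenAI's Chat Completions API, a non-streaming response carries the full reply in choices[0].message.content, while streamed chunks carry incremental text in choices[0].delta.content; the two fields belong to different response shapes. A minimal sketch of both, with illustrative values only:)

```python
# Non-streaming body ("object": "chat.completion"):
non_streaming_choice = {
    "index": 0,
    "message": {"role": "assistant", "content": "Hello!"},
    "finish_reason": "stop",
}

# One streamed chunk ("object": "chat.completion.chunk"):
streamed_choice = {
    "index": 0,
    "delta": {"content": "Hel"},  # fragments are concatenated client-side
    "finish_reason": None,        # null until the final chunk
}
```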
Vapi Support
05/16/2024, 9:25 AM
Vapi Support
05/16/2024, 9:26 AM
1. **Response Format**: Your response currently returns the content in choices[0].delta.content, but Vapi expects it in choices[0].message.content. You should adjust your response format in your Flask app to match this expectation.
2. **Flask App Code Adjustment**:
Update your basic_custom_llm_route function to return the content in the correct format:
```python
import time

from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route('/chat/completions', methods=['POST'])
def basic_custom_llm_route():
    request_data = request.get_json()
    response = {
        "id": "chatcmpl-8mcLf78g0quztp4BMtwd3hEj58Uof",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": "gpt-3.5-turbo-0613",
        "choices": [
            {
                "index": 0,
                "message": {"content": "This is some test content"},
                "logprobs": None,
                "finish_reason": "stop"
            }
        ]
    }
    return jsonify(response), 201


if __name__ == "__main__":
    app.run(debug=True, port=5000)
```
3. **Ensure model.urlRequestMetadataEnabled is Set**:
Make sure to execute a PATCH request to the /assistant endpoint and ensure that model.urlRequestMetadataEnabled is set to false:
```json
{
    "model": {
        "urlRequestMetadataEnabled": false
    }
}
```
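(A minimal sketch of that PATCH call using the requests library, assuming Vapi's public REST base URL https://api.vapi.ai and Bearer-token auth; the assistant id and API key below are placeholders:)

```python
import requests

VAPI_API_KEY = "your-vapi-api-key"   # placeholder
ASSISTANT_ID = "your-assistant-id"   # placeholder

resp = requests.patch(
    f"https://api.vapi.ai/assistant/{ASSISTANT_ID}",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    # Body mirrors the JSON shown above.
    json={"model": {"urlRequestMetadataEnabled": False}},
)
resp.raise_for_status()
print(resp.json())
```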
Stakh
05/16/2024, 9:35 AM
Vapi Support
05/16/2024, 9:35 AM
Stakh
05/16/2024, 9:41 AM
Stakh
05/16/2024, 9:43 AM
```python
'model': 'gpt-3.5-turbo',
'messages': [{
    'role': 'system',
    'content': 'This is a test agent.'
}, {
    'role': 'assistant',
    'content': 'Hi. This is George How can I help?'
}, {
    'role': 'user',
    'content': 'This is a test message.'
}],
'temperature': 0.7,
'stream': True,
'max_tokens': 250,
'call': {
    'type': 'webCall',
    'callId': 'xxxxxxxxxxxxxxxxxx',
    'orgId': 'xxxxxxxxxxxxxxx',
    'transcriber': {
        'provider': 'deepgram',
        'model': 'nova-2',
        'keywords': [],
        'language': 'en',
        'smartFormat': False
    },
    'model': {
        'provider': 'custom-llm',
        'url': 'https://cead-37-168-11-222.ngrok-free.app',
        'urlRequestMetadataEnabled': True,
        'model': 'gpt-3.5-turbo',
        'temperature': 0.7,
```
I see it has call.model.urlRequestMetadataEnabled=True
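(Also worth noting for the "no answer from the agent" issue: the payload above includes 'stream': True, so Vapi is requesting a server-sent-events stream rather than a single JSON body. A minimal sketch of an SSE reply in OpenAI's chunk format; whether Vapi accepts exactly this shape is an assumption here:)

```python
import json
import time

from flask import Flask, Response

app = Flask(__name__)


@app.route('/chat/completions', methods=['POST'])
def streaming_route():
    def generate():
        # A single chunk carrying the whole message; a real LLM backend
        # would yield many small deltas as tokens arrive.
        chunk = {
            "id": "chatcmpl-test",
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": "gpt-3.5-turbo",
            "choices": [{
                "index": 0,
                "delta": {"content": "This is some test content"},
                "finish_reason": "stop"
            }]
        }
        yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"  # sentinel closing an OpenAI-style stream

    return Response(generate(), mimetype="text/event-stream")
```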
Sahil
05/16/2024, 10:04 AM
Stakh
05/16/2024, 10:07 AM
Stakh
05/16/2024, 10:15 AM
TypeError: Completions.create() got an unexpected keyword argument 'call'
Stakh
05/16/2024, 10:16 AM
Sahil
05/16/2024, 10:20 AM
Stakh
05/16/2024, 10:25 AM
```python
del request_data['call']
del request_data['metadata']
```
works. Thank you.
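(For anyone forwarding the Vapi payload straight into the OpenAI Python SDK, a sketch of that fix in context; the proxy route and client setup are illustrative, not code from this thread. Completions.create() raises a TypeError on unknown keyword arguments, hence the stripping:)

```python
from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()  # reads OPENAI_API_KEY from the environment


@app.route('/chat/completions', methods=['POST'])
def proxy_route():
    request_data = request.get_json()
    # Vapi adds 'call' (and 'metadata' when urlRequestMetadataEnabled is
    # true); OpenAI's SDK rejects these as unexpected keyword arguments.
    for key in ('call', 'metadata'):
        request_data.pop(key, None)   # pop() is safe if the key is absent
    request_data.pop('stream', None)  # non-streaming case for simplicity
    completion = client.chat.completions.create(**request_data)
    return jsonify(completion.model_dump()), 200
```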
Stakh
05/16/2024, 10:25 AM
Stakh
05/16/2024, 10:29 AM
Sahil
05/16/2024, 10:31 AM
Stakh
05/16/2024, 10:36 AM
Sahil
05/16/2024, 10:36 AM
Stakh
05/16/2024, 10:41 AM
Stakh
05/16/2024, 10:45 AM
Sahil
05/16/2024, 10:48 AM