pipeline-error-openai-403-model-access-denied
# support
i
CallID: 5fc36ac3-c36c-4f4d-b7a4-093fc391cbc6. My assistant uses a bring-your-own LLM key (which I'm guessing has a different code path than using VAPI's LLM account). My random guess, in case it's helpful: I suspect the issue here is too many tokens, but that the count is being calculated wrong? Stuff that would help other developers hitting this:
1. Listing the reason for the failure on the 403, maybe in a details field.
2. Listing the tokens used in the last LLM message somewhere; even having it in the cost calculation would be helpful (currently it's only a sum).
v
@IgorD Thank you for reaching out with your concerns and suggestions regarding the use of your own LLM key with VAPI. It's great to see you're actively thinking about potential causes and solutions.

Regarding the issue you're experiencing, it does sound like it could be related to token limits, but without specific details, it's hard to confirm. Your suggestions for improving error transparency are very valuable. Providing a detailed reason for a 403 error and the token count for the last LLM message could indeed help developers diagnose and resolve issues more efficiently.

For now, to address the problem you're facing:
1. Please ensure that your LLM key is correctly configured and has not exceeded any usage limits set by the LLM provider.
2. Check whether the requests sent to the LLM are within the token limits. If you suspect that the token count is being calculated incorrectly, it would be helpful to review the request payload to confirm the token count.

If you continue to experience issues or need further assistance, please provide more details or reach out to support for a deeper investigation into your specific case.
You can continue this conversation by mentioning me in the message.
s
@IgorD it's because your key doesn't have access to the model.
🔵 13:12:37:690 Couldn't Complete Model Sequence. Request Failed And Sequence Empty (provider: openai, model: gpt-4-turbo-preview, region: undefined, credential: true) (Project proj_ots____________Sw does not have access to model gpt-4-turbo-preview)
i
@Shubham Bajaj - No it has access. I have lots of calls that work, and then drop ... Latest being "id": "eb312c7e-7396-4fb8-991e-e3962b08ec76", Also model is:
```json
    "model": {
      "model": "gpt-4o-2024-11-20",
      "tools": [
```
Sorry, that one worked. An example call that dropped after being used for a while is: { "id": "7cc2a848-6d8e-465a-81d4-f9ca97b50135",
s
@User
logs
🔵 23:26:50:398 [user LOG] Model request started (attempt #1, gpt-4o-2024-11-20, openai)
🔵 23:27:01:378 Completion Request Starting (#1, provider: openai, model: gpt-4o-2024-11-20, region: undefined, credential: true)...
🔵 23:27:03:968 Model Sequence Request Taking Too Long (attempt #1).
🔵 23:27:04:063 [user WARN] Model request failed (attempt #2, reason: (Project proj_otsWtnL8lgNPZeZGJLw6kQSw does not have access to model gpt-4o-2024-08-06))
🔵 23:27:04:123 [user WARN] Model request failed (attempt #3, reason: (Project proj_otsWtnL8lgNPZeZGJLw6kQSw does not have access to model gpt-4o-2024-05-13))
🔵 23:27:04:409 [user WARN] Model request failed (attempt #8, reason: (Project proj_otsWtnL8lgNPZeZGJLw6kQSw does not have access to model gpt-4-1106-preview))
🔵 23:27:04:410 Couldn't Complete Model Sequence. Request Failed And Sequence Empty (provider: openai, model: gpt-4-1106-preview, region: undefined, credential: true) (Project proj_otsWtnL8lgNPZeZGJLw6kQSw does not have access to model gpt-4-1106-preview)
🔵 23:27:04:410 Live Call Manager Ending... (pipeline-error-openai-403-model-access-denied)

The issue is that the gpt-4o-2024-11-20 model took too long to respond during the call, and when the other models were tried, your key doesn't have access to them, so the requests start failing.
What's happening: as the message history grows, OpenAI takes longer to respond. We wait around 1.5 seconds, then switch to another model to maintain the response time; your key doesn't have access to the other models, so the call starts failing from that point onward.
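For readers skimming this thread, here is a minimal sketch of the behaviour described above (wait roughly 1.5 seconds per attempt, then move on to the next model in the sequence). The names `completeWithFallbacks` and `CompletionFn` are hypothetical illustrations, not VAPI's internal code:

```typescript
// Illustration only: hypothetical names, not VAPI's actual implementation.
type ModelChoice = { provider: string; model: string };
type CompletionFn = (choice: ModelChoice, prompt: string) => Promise<string>;

async function completeWithFallbacks(
  sequence: ModelChoice[],        // primary model first, then fallbacks
  prompt: string,
  requestCompletion: CompletionFn,
  perAttemptTimeoutMs = 1500,     // the ~1.5 s window mentioned above
): Promise<string> {
  for (const choice of sequence) {
    try {
      // Race the provider request against the per-attempt timeout.
      return await Promise.race([
        requestCompletion(choice, prompt),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error("attempt timed out")), perAttemptTimeoutMs),
        ),
      ]);
    } catch {
      // Too slow, or the key lacks access (the 403 above): try the next model.
      // Note: the slow request is not cancelled here, so it may still respond later.
    }
  }
  throw new Error("Couldn't complete model sequence: request failed and sequence empty");
}
```

With a bring-your-own key that only has access to the primary model, every fallback attempt in a pattern like this fails immediately with a 403, which matches the log above.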
i
Switching models sounds very wrong. Isn't the right answer to extend the timeout? Do you have this behavior when you don't bring your own key?
@Shubham Bajaj - SUPER INSIGHTFUL, thank you!
s
@IgorD no, it's not right to extend the timeout because it would increase the response time. Yes, we have the same behaviour when you don't bring your own key.
@IgorD Could you please check your key logs to identify what is causing the delay in request responses? This would help with debugging and diagnosis.
i
It doesn't increase the response time; it is the response time of the provider, and nothing can be done about that. The reason it's the wrong behavior is that things like pricing and performance are a function of the model being used. If you're switching models away from what the user asked for, that's a bug. Arguably it's a feature to say you can hunt for different models, but it's not a feature I would use. Could you make this behavior optional, or make the timeout user-specified? @Shubham Bajaj
A humorous analogy: imagine a user requesting a hamburger, and because hamburgers are slow to make, you decide to give them a grilled cheese sandwich instead without asking.
s
This analogy doesn't apply here. I will check with the team about what can be done regarding the timeout; we don't have this functionality available right now. Please allow me some time to investigate this further.
i
@Shubham Bajaj, one other thought here: if the issue is a transient latency spike (which cycling through the other models implicitly addresses), then I suspect a retry on the current model would work, and be a better alternative than a timeout.
(Though both retry and timeout are standard knobs on most downstream API calls)
And another example of why keeping the same model is important is caching. With caching I get a very large price reduction on my tokens:
s
@IgorD Can you set fallbackModels as an empty array manually? Currently there is no way to manually set a large timeout. We will be working on adding the ability to set a large timeout for the last fallback option.
i
Do you have an example of the JSON I need to send to support that?
By the way, I also think a retry before a longer timeout is a good idea, almost like exponential backoff on the timeout to get past transient errors.
@Shubham Bajaj
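For illustration, a minimal sketch of the retry-with-backoff idea suggested above: retry the same model with a widening timeout before ever switching. `retryWithBackoff` is a hypothetical name, not an existing VAPI or OpenAI API:

```typescript
// Illustration only: retry the *same* model, doubling the timeout each attempt,
// instead of immediately falling back to a different model.
async function retryWithBackoff<T>(
  attempt: () => Promise<T>,
  maxAttempts = 3,
  initialTimeoutMs = 1500,
): Promise<T> {
  let timeoutMs = initialTimeoutMs;
  for (let i = 1; i <= maxAttempts; i++) {
    try {
      return await Promise.race([
        attempt(),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error("timed out")), timeoutMs),
        ),
      ]);
    } catch (err) {
      if (i === maxAttempts) throw err;
      timeoutMs *= 2; // widen the window to ride out a transient latency spike
    }
  }
  throw new Error("unreachable"); // satisfies the compiler; the loop always returns or throws
}
```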
s
@IgorD It's on the list and will be considered in the next pipeline changes.
example-for-your-support
```json
  "model": {
    "provider": "openai",
    "fallbackModels": [],
    "model": "gpt-4o"
  }
```
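If it helps anyone applying this, here is a hedged sketch of sending that snippet via the REST API, assuming the assistant-update endpoint is `PATCH https://api.vapi.ai/assistant/{id}` with a Bearer key; double-check the endpoint and field names against the current API reference:

```typescript
// Sketch under assumptions: the endpoint and payload shape should be verified
// against the VAPI API reference before use.
async function disableFallbackModels(assistantId: string, apiKey: string): Promise<void> {
  const res = await fetch(`https://api.vapi.ai/assistant/${assistantId}`, {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: { provider: "openai", model: "gpt-4o", fallbackModels: [] },
    }),
  });
  if (!res.ok) {
    throw new Error(`Assistant update failed: ${res.status} ${await res.text()}`);
  }
}
```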
b
I believe this issue might be related to https://discord.com/channels/1211482211119796234/1329809513812987958 -> I noticed that by default it falls back to different regions of the same model. It would indeed be useful to be able to extend the timeout instead, or at least to make sure that if both the original request and the fallback provide a response, only one of the responses is considered for the next action (not all of them).
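A minimal sketch of the "only consider one response" idea, with hypothetical names: when a fallback request has been fired because the primary is slow, take whichever response settles first and abort the other, so duplicate completions never drive the next turn:

```typescript
// Illustration only: hypothetical request signature, not VAPI's internals.
type ModelRequest = (model: string, signal: AbortSignal) => Promise<string>;

async function firstResponseWins(
  makeRequest: ModelRequest,
  primaryModel: string,
  fallbackModel: string,
): Promise<string> {
  const primaryCtl = new AbortController();
  const fallbackCtl = new AbortController();
  // Promise.any resolves with the first fulfilled request; later results are ignored.
  const winner = await Promise.any([
    makeRequest(primaryModel, primaryCtl.signal),
    makeRequest(fallbackModel, fallbackCtl.signal),
  ]);
  // Abort both: aborting the already-settled winner is a no-op, and the loser
  // is cancelled so its late response can never be used.
  primaryCtl.abort();
  fallbackCtl.abort();
  return winner;
}
```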
s
@Ben I will look into your ticket in some time.

> indeed would be useful to be able to instead extend the timeout

We cannot extend the timeout for model requests due to turn/response-time constraints. Currently, you can set fallbackModels to an empty array manually, and we apply a large timeout to the last option, which is either the default or the first model.

> at least make sure that if both the original request and fallback provide a response, that we only consider one of the responses for next action (not all of them)

If the current model or fallback model fails to respond within the specified timeframe, we must switch to another model. We cannot keep waiting on previous requests, as this would effectively extend the timeout until a response is received, resulting in long periods of silence in your calls. In the future, we may implement extended timeouts and fallback messages for delayed responses to keep users engaged and maintain perceived performance.

> noticed by default it falls back to different regions of same model by default

We switch to different regions to optimize response times, as you have noted. cc: @IgorD
b
> We cannot extend the timeout for model requests due to turn/response time constraints.

Thanks, are these technical constraints from the STT/TTS providers themselves, or more of a UX constraint to ensure a good user experience? I would have thought VAPI would have control of this, as the LLM would be agnostic as to how long is allocated between turns. Maybe some STT/TTS providers expect input within a certain timeframe before they time out when streaming data?

I appreciate, of course, the need in terms of UX to ensure quick response times. That said, I think silence is preferable to unpredictable behaviour for most production use cases (there is already the idle messages feature in VAPI to ensure long silences are never completely silent, after all).

It seems what I observed on that related ticket might be that we fall back to different regions, then get the response from all the regions, and so end up with unpredictable behaviour where the model responds to both of the calls (resulting in duplication of responses) - at least that's my theory.
s
@Ben

> Thanks, are these technical constraints from the STT/TTS providers themselves or more of a UX constraint to ensure a good user experience? I would have thought VAPI would have control of this, as the LLM would be agnostic as to how long between turns is allocated. Maybe some STT/TTS providers expect input within a certain timeframe before they timeout when streaming data?

First it's because of UX, and second because of the service providers.

> I appreciate of course the need in terms of UX to ensure quick response times - that said I think silence is preferable to unpredictable behaviour for most production use cases (there is already idle messages feature in VAPI to ensure the long silences are never completely silent after all).

Just set fallbackModels to an empty array.

> It seems what I observed on that related ticket might be that we fallback to different regions, and then get the response from all the regions, and so end up with unpredictable behaviour where the model responds to both of the calls (resulting in duplication of responses) - at least that's my theory

I still have to look into your ticket and logs; I haven't had a chance yet.
b
I've set fallback provider to empty array for now
It seems it's still possible to get fallbacks even with fallbackModels set to an empty array, e.g. call id 64cb327e-1011-40e8-aec7-d65679ef2010
```json
"model": "gpt-3.5-turbo",
        "fallback_models": [],
```
```
17:07:33:000 [LOG]  Model request started (attempt #1, gpt-35-turbo-0125, azure-openai, southcentralus)
17:07:33:528 [LOG]  Model request started (attempt #2, gpt-3.5-turbo-1106, openai)
17:07:33:528 [WARN] Model request taking too long (attempt #1, started 527.36ms ago)
17:07:33:712 [LOG]  Model request cost (attempt #2, $0, 0 prompt, 0 completion)
```
i
@Ben - Sorry, where are these logs from?
OK, I have an idea for a hack to enable retries. Could we just allow the same model to be listed in the fallback list? E.g. remove this check: failure_reason: Invalid Assistant. Errors: [ "assistant.model.All fallbackModels's elements must be unique" ]
@Shubham Bajaj
s
@IgorD Sorry, we cannot remove this check as it is required for the process.
b
Those logs were copied from the Dashboard, under the "Logs" tab for a specific call id, with fallback models configured as an empty array.
i
Thanks @Ben, I never even noticed that UI. Doh. BTW! For me, I'm using gpt-4o, which has a natural alias to gpt-4o-2024-11-20; hopefully that will reduce my spurious dropped calls.
s
@Ben @IgorD Regarding fallback models: even if fallbackModels is set to an empty array, we will still look for the current model's replicas and pick up all models of the same type. This is a feature we discussed that needs to be implemented on our side.
i
Hey, is there an ETA for this?
@Shubham Bajaj
q
Are you able to change fallback models to different providers?
@Shubham Bajaj
s
Added to the feature requests.
i
Hello, any ETA on this? @Shubham Bajaj
s
It's already added to the feature requests; it will most likely be part of this quarter.
i
This is still happening all the time for me. Do we know the month?
Hey @Shubham Bajaj This is still broken, is there an ETA here?
k
Hi, I was out of office for a few days and was unable to respond to your message earlier. I apologize for any inconvenience this may have caused. 
You can raise a feature request for it over here: https://roadmap.vapi.ai
Once you raise the feature request, it will be visible to the team. Eventually, based on upvotes and team capacity, we will pick it up.
k
@IgorD