pipeline-error-openai-403-model-access-denied
# support
i
CallID: 5fc36ac3-c36c-4f4d-b7a4-093fc391cbc6. My assistant uses a bring-your-own LLM key (which I'm guessing has a different code path than using VAPI's LLM account). My random guess, in case it's helpful: I suspect the issue here is too many tokens, but that the count is being calculated wrong? Stuff that would help other developers hitting this:
1. Listing the reason for the failure on the 403, maybe in a details field.
2. Listing the tokens used in the last LLM message somewhere; even having it in the cost calculation would be helpful (currently it's only a sum).
v
@IgorD Thank you for reaching out with your concerns and suggestions regarding the use of your own LLM key with VAPI. It's great to see you're actively thinking about potential causes and solutions.

Regarding the issue you're experiencing, it does sound like it could be related to token limits, but without specific details, it's hard to confirm. Your suggestions for improving error transparency are very valuable. Providing a detailed reason for a 403 error and the token count for the last LLM message could indeed help developers diagnose and resolve issues more efficiently.

For now, to address the problem you're facing:
1. Please ensure that your LLM key is correctly configured and has not exceeded any usage limits set by the LLM provider.
2. Check whether the requests sent to the LLM are within the token limits. If you suspect that the token count is being calculated incorrectly, it would be helpful to review the request payload to confirm the token count.

If you continue to experience issues or need further assistance, please provide more details or reach out to support for a deeper investigation into your specific case.
You can continue this conversation by mentioning me in the message.
s
@IgorD it's because your key doesn't have access to the model.
🔵 13:12:37:690 Couldn't Complete Model Sequence. Request Failed And Sequence Empty (provider: openai, model: gpt-4-turbo-preview, region: undefined, credential: true) (Project proj_ots____________Sw does not have access to model gpt-4-turbo-preview)
i
@Shubham Bajaj - No it has access. I have lots of calls that work, and then drop ... Latest being "id": "eb312c7e-7396-4fb8-991e-e3962b08ec76", Also model is:
```json
    "model": {
      "model": "gpt-4o-2024-11-20",
      "tools": [
```
Sorry, that one worked. An example call that dropped after being used for a while is: { "id": "7cc2a848-6d8e-465a-81d4-f9ca97b50135",
s
@User
logs
🔵 23:26:50:398 [user LOG] Model request started (attempt #1, gpt-4o-2024-11-20, openai)
🔵 23:27:01:378 Completion Request Starting (#1, provider: openai, model: gpt-4o-2024-11-20, region: undefined, credential: true)...
🔵 23:27:03:968 Model Sequence Request Taking Too Long (attempt #1).
🔵 23:27:04:063 [user WARN] Model request failed (attempt #2, reason: (Project proj_otsWtnL8lgNPZeZGJLw6kQSw does not have access to model gpt-4o-2024-08-06))
🔵 23:27:04:123 [user WARN] Model request failed (attempt #3, reason: (Project proj_otsWtnL8lgNPZeZGJLw6kQSw does not have access to model gpt-4o-2024-05-13))
🔵 23:27:04:409 [user WARN] Model request failed (attempt #8, reason: (Project proj_otsWtnL8lgNPZeZGJLw6kQSw does not have access to model gpt-4-1106-preview))
🔵 23:27:04:410 Couldn't Complete Model Sequence. Request Failed And Sequence Empty (provider: openai, model: gpt-4-1106-preview, region: undefined, credential: true) (Project proj_otsWtnL8lgNPZeZGJLw6kQSw does not have access to model gpt-4-1106-preview)
🔵 23:27:04:410 Live Call Manager Ending... (pipeline-error-openai-403-model-access-denied)

The issue is that the gpt-4o-2024-11-20 model took too long to respond during the call, and when the other models were tried, your key doesn't have access to them, so the requests start failing.
What's happening: as the message history grows, OpenAI takes longer to respond. We wait around 1.5 seconds, then switch to another model to maintain the response time; your key doesn't have access to the other models, so the call starts failing from that point onward.
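For readers skimming this thread, here is a minimal sketch of the behaviour described above (wait roughly 1.5 seconds per attempt, then move on to the next model in the sequence). The names `completeWithFallbacks` and `CompletionFn` are hypothetical illustrations, not VAPI's internal code:

```typescript
// Illustration only: hypothetical names, not VAPI's actual implementation.
type ModelChoice = { provider: string; model: string };
type CompletionFn = (choice: ModelChoice, prompt: string) => Promise<string>;

async function completeWithFallbacks(
  sequence: ModelChoice[],        // primary model first, then fallbacks
  prompt: string,
  requestCompletion: CompletionFn,
  perAttemptTimeoutMs = 1500,     // the ~1.5 s window mentioned above
): Promise<string> {
  for (const choice of sequence) {
    try {
      // Race the provider request against the per-attempt timeout.
      return await Promise.race([
        requestCompletion(choice, prompt),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error("attempt timed out")), perAttemptTimeoutMs),
        ),
      ]);
    } catch {
      // Too slow, or the key lacks access (the 403 above): try the next model.
      // Note: the slow request is not cancelled here, so it may still respond later.
    }
  }
  throw new Error("Couldn't complete model sequence: request failed and sequence empty");
}
```

With a bring-your-own key that only has access to the primary model, every fallback attempt in a pattern like this fails immediately with a 403, which matches the log above.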
i
Switching models sounds very wrong. Isn't the right answer to extend the timeout? Do you have this behavior when you don't bring your own key?
@Shubham Bajaj - SUPER INSIGHTFUL, thank you!
s
@IgorD no, it's not right to extend the timeout because it would increase the response time. Yes, we have the same behaviour when you don't bring your own key.
@IgorD Could you please check your key logs to identify what is causing the delay in request responses? This would help with debugging and diagnosis.
i
It doesn't increase the response time; it is the response time of the provider, and nothing can be done about that. The reason it's the wrong behavior is that things like pricing and performance are a function of the model being used. If you're switching models away from what the user asked for, that's a bug. Arguably it's a feature to say you can hunt for different models, but it's not a feature I would use. Could you make this behavior optional, or make the timeout user-specified? @Shubham Bajaj
A humorous analogy: imagine a user requesting a hamburger, and because hamburgers are slow to make, you decide to give them a grilled cheese sandwich instead without asking.
s
This analogy doesn't apply here. I will check with the team about what can be done regarding the timeout; we don't have this functionality available right now. Please allow me some time to investigate this further.
i
@Shubham Bajaj, one other thought here: if the issue is a transient latency spike (which cycling through the other models implicitly addresses), then I suspect a retry on the current model would work, and be a better alternative than a timeout.
(Though both retry and timeout are standard knobs on most downstream API calls)
And another example of why keeping the same model is important is caching. With caching I get a very large price reduction on my tokens:
s
@IgorD Can you set fallbackModels as an empty array manually? Currently there is no way to manually set a large timeout. We will be working on adding the ability to set a large timeout for the last fallback option.
i
Do you have an example of the JSON I need to send to support that?
By the way, I also think a retry before a longer timeout is a good idea, almost like exponential backoff on the timeout to get past transient errors.
@Shubham Bajaj
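For illustration, a minimal sketch of the retry-with-backoff idea suggested above: retry the same model with a widening timeout before ever switching. `retryWithBackoff` is a hypothetical name, not an existing VAPI or OpenAI API:

```typescript
// Illustration only: retry the *same* model, doubling the timeout each attempt,
// instead of immediately falling back to a different model.
async function retryWithBackoff<T>(
  attempt: () => Promise<T>,
  maxAttempts = 3,
  initialTimeoutMs = 1500,
): Promise<T> {
  let timeoutMs = initialTimeoutMs;
  for (let i = 1; i <= maxAttempts; i++) {
    try {
      return await Promise.race([
        attempt(),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error("timed out")), timeoutMs),
        ),
      ]);
    } catch (err) {
      if (i === maxAttempts) throw err;
      timeoutMs *= 2; // widen the window to ride out a transient latency spike
    }
  }
  throw new Error("unreachable"); // satisfies the compiler; the loop always returns or throws
}
```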
s
@IgorD It's on the list and will be considered in the next pipeline changes.
example-for-your-support
```json
  "model": {
    "provider": "openai",
    "fallbackModels": [],
    "model": "gpt-4o"
  }
```
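If it helps anyone applying this, here is a hedged sketch of sending that snippet via the REST API, assuming the assistant-update endpoint is `PATCH https://api.vapi.ai/assistant/{id}` with a Bearer key; double-check the endpoint and field names against the current API reference:

```typescript
// Sketch under assumptions: the endpoint and payload shape should be verified
// against the VAPI API reference before use.
async function disableFallbackModels(assistantId: string, apiKey: string): Promise<void> {
  const res = await fetch(`https://api.vapi.ai/assistant/${assistantId}`, {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: { provider: "openai", model: "gpt-4o", fallbackModels: [] },
    }),
  });
  if (!res.ok) {
    throw new Error(`Assistant update failed: ${res.status} ${await res.text()}`);
  }
}
```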
b
I believe this issue might be related to https://discord.com/channels/1211482211119796234/1329809513812987958 -> I noticed that by default it falls back to different regions of the same model. It would indeed be useful to be able to extend the timeout instead, or at least to make sure that if both the original request and the fallback provide a response, only one of the responses is considered for the next action (not all of them).
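A minimal sketch of the "only consider one response" idea, with hypothetical names: when a fallback request has been fired because the primary is slow, take whichever response settles first and abort the other, so duplicate completions never drive the next turn:

```typescript
// Illustration only: hypothetical request signature, not VAPI's internals.
type ModelRequest = (model: string, signal: AbortSignal) => Promise<string>;

async function firstResponseWins(
  makeRequest: ModelRequest,
  primaryModel: string,
  fallbackModel: string,
): Promise<string> {
  const primaryCtl = new AbortController();
  const fallbackCtl = new AbortController();
  // Promise.any resolves with the first fulfilled request; later results are ignored.
  const winner = await Promise.any([
    makeRequest(primaryModel, primaryCtl.signal),
    makeRequest(fallbackModel, fallbackCtl.signal),
  ]);
  // Abort both: aborting the already-settled winner is a no-op, and the loser
  // is cancelled so its late response can never be used.
  primaryCtl.abort();
  fallbackCtl.abort();
  return winner;
}
```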
s
@Ben I will look into your ticket in some time.

> indeed would be useful to be able to instead extend the timeout

We cannot extend the timeout for model requests due to turn/response-time constraints. Currently, you can set fallbackModels to an empty array manually, and we apply a large timeout to the last option, which is either the default or the first model.

> at least make sure that if both the original request and fallback provide a response, that we only consider one of the responses for next action (not all of them)

If the current model or fallback model fails to respond within the specified timeframe, we must switch to another model. We cannot keep waiting on previous requests, as this would effectively extend the timeout until a response is received, resulting in long periods of silence in your calls. In the future, we may implement extended timeouts and fallback messages for delayed responses to keep users engaged and maintain perceived performance.

> noticed by default it falls back to different regions of same model by default

We switch to different regions to optimize response times, as you have noted. cc: @IgorD
b
> We cannot extend the timeout for model requests due to turn/response time constraints.

Thanks, are these technical constraints from the STT/TTS providers themselves, or more of a UX constraint to ensure a good user experience? I would have thought VAPI would have control of this, as the LLM would be agnostic as to how long is allocated between turns. Maybe some STT/TTS providers expect input within a certain timeframe before they time out when streaming data?

I appreciate, of course, the need in terms of UX to ensure quick response times. That said, I think silence is preferable to unpredictable behaviour for most production use cases (there is already the idle messages feature in VAPI to ensure long silences are never completely silent, after all).

It seems what I observed on that related ticket might be that we fall back to different regions, then get the response from all the regions, and so end up with unpredictable behaviour where the model responds to both of the calls (resulting in duplication of responses) - at least that's my theory.
s
@Ben

> Thanks, are these technical constraints from the STT/TTS providers themselves or more of a UX constraint to ensure a good user experience? I would have thought VAPI would have control of this, as the LLM would be agnostic as to how long between turns is allocated. Maybe some STT/TTS providers expect input within a certain timeframe before they timeout when streaming data?

First it's because of UX, and second because of the service providers.

> I appreciate of course the need in terms of UX to ensure quick response times - that said I think silence is preferable to unpredictable behaviour for most production use cases (there is already idle messages feature in VAPI to ensure the long silences are never completely silent after all).

Just set fallbackModels to an empty array.

> It seems what I observed on that related ticket might be that we fallback to different regions, and then get the response from all the regions, and so end up with unpredictable behaviour where the model responds to both of the calls (resulting in duplication of responses) - at least that's my theory

I still have to look into your ticket and logs; I haven't had a chance yet.
b
I've set fallback provider to empty array for now
It seems it's still possible to get fallbacks even with fallbackModels set to an empty array, e.g. call id 64cb327e-1011-40e8-aec7-d65679ef2010
```json
"model": "gpt-3.5-turbo",
        "fallback_models": [],
```
```
17:07:33:000 [LOG]  Model request started (attempt #1, gpt-35-turbo-0125, azure-openai, southcentralus)
17:07:33:528 [LOG]  Model request started (attempt #2, gpt-3.5-turbo-1106, openai)
17:07:33:528 [WARN] Model request taking too long (attempt #1, started 527.36ms ago)
17:07:33:712 [LOG]  Model request cost (attempt #2, $0, 0 prompt, 0 completion)
```
i
@Ben - Sorry, where are these logs from?
OK, I have an idea for a hack to enable retries. Could we just allow the same model to be listed in the fallback list? E.g. remove this check: failure_reason: Invalid Assistant. Errors: [ "assistant.model.All fallbackModels's elements must be unique" ]
@Shubham Bajaj
s
@IgorD Sorry, we cannot remove this check as it is required for the process.
b
Those logs were copied from the Dashboard, under the "Logs" tab for a specific call id, with fallback models configured as an empty array.
i
Thanks @Ben, I never even noticed that UI. Doh. BTW! For me, I'm using gpt-4o, which has a natural alias to gpt-4o-2024-11-20; hopefully that will reduce my spurious dropped calls.
s
@Ben @IgorD Regarding fallback models: even if fallbackModels is set to an empty array, we will still look for the current model's replicas and pick up all models of the same type. This is a feature we discussed that needs to be implemented on our side.
i
Hey, is there an ETA for this?
@Shubham Bajaj
q
Are you able to change fallback models to different providers?
@Shubham Bajaj
s
Added to the feature requests.
i
Hello, any ETA on this? @Shubham Bajaj
s
It's already added to the feature requests; it will most likely be part of this quarter.
i
This is still happening all the time for me. Do we know the month?
Hey @Shubham Bajaj This is still broken, is there an ETA here?
k
Hi, I was out of office for a few days and was unable to respond to your message earlier. I apologize for any inconvenience this may have caused. 
You can raise a feature request for it over here: https://roadmap.vapi.ai
Once you raise the feature request, it will be visible to the team. Eventually, based on upvotes and team capacity, we will pick it up.
k
@IgorD