Bug: Timestamp in future and agent performing oddly
# support
b
call_id: 6c3d83d1-af38-4862-959c-3f01d4bdcc48

I've attached the client messages as a JSON. I use tool calling to perform RAG based on user queries. The user asked a question, the agent said "please wait" (ちょっと待って。) while it made the tool call and got the result. Then, for some reason, it seems to have got stuck in a loop and did this three times before answering the question. The VAPI logs under `messages` show something different from the client messages I received and attached. They show all of these "please wait" messages as a single message:
```json
{
  "role": "bot",
  "time": 1737250366597,
  "source": "",
  "endTime": 1737104336499,
  "message": "ちょっと待って。 ちょっと待って。 ちょっと待って。 ちょっと待って。 ちょっと待って。 経費精算の締め切りは翌月5日の正午です。 ただし、土日祝日の場合は翌営業日となります。 具体的には以下のスケジュールになっています。 対象期間。 当月一日末日。 提出期日翌月5日正午。 支払日翌月10日。 もし他にご質問がありましたら、遠慮なくお知らせくださいね。",
  "duration": 5341100000,
  "secondsFromStart": 146100
}
```
If you look at the `time` value here, it is in the future, and 2 days AFTER the `endTime`, so it seems clear there is a bug here. I can see there are warnings after the tool calls like this:
```
[LOG]
Model request started (attempt #1, gpt-35-turbo-1106, azure-openai, westus)

08:58:22:152
[WARN]
Model request taking too long (attempt #1, started 445.14ms ago)

08:58:22:152
[LOG]
Model request started (attempt #2, gpt-35-turbo-1106, azure-openai, canada)
```
So I suspect the duplicated messages are related to this issue: https://discord.com/channels/1211482211119796234/1323288623629471866/1324798624525451367

The LLM is giving a slow response, so VAPI falls back to other regions, maybe? And maybe it is keeping the output of the slow attempts rather than correctly timing them out and discarding them, resulting in the model repeating itself several times?

In summary, it looks like two issues:
- A bug in the timestamps (reported to be 2 days in the future)
- An issue, maybe, where VAPI retries due to timeout but does not discard late responses from timed-out calls?

https://cdn.discordapp.com/attachments/1329809513812987958/1329809514085482537/call_id_6c3d83d1-af38-4862-959c-3f01d4bdcc48.json?ex=678bb11e&is=678a5f9e&hm=8cb1ba93fa7f4b752a41f267bb0b5fb7d63b9ec4818f513eb026174ad1d6187d&
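For reference, a quick Python check of the `time` and `endTime` values from the JSON above confirms the roughly two-day gap:

```python
from datetime import datetime, timezone

# Epoch-millisecond values copied from the message object above
time_ms = 1737250366597      # reported `time`
end_time_ms = 1737104336499  # reported `endTime`

for label, ms in (("time", time_ms), ("endTime", end_time_ms)):
    print(label, datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat())
# time    2025-01-19T01:32:46.597000+00:00
# endTime 2025-01-17T08:58:56.499000+00:00

# `time` lands about 40.6 hours AFTER `endTime`, which should be impossible
print(f"gap: {(time_ms - end_time_ms) / 3_600_000:.1f} hours")
```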
v
@Ben Thank you for bringing this to our attention. It does seem like there are a couple of issues at play here. Firstly, regarding the timestamp anomaly: it's clear that there is a bug, since the `time` should not be in the future, nor should it be after the `endTime`. This discrepancy definitely needs to be investigated by our technical team. Secondly, the issue with the duplicated messages and the warnings in the logs suggests that there might be a problem with how VAPI is handling slow responses from the LLM. It appears that VAPI is not correctly timing out and discarding the late responses, which could be causing the model to repeat itself. To address these issues, I would recommend reporting them to our technical support team so they can investigate and resolve the bugs. In the meantime, if there's anything else I can assist you with, please let me know.
You can continue this conversation by mentioning me in the message.
Got it. Tagging @User for additional help.
s
@Ben I am looking into it. Please give me some time to complete the investigation.
@Ben
> An issue maybe where VAPI retries due to timeout but does not discard late responses from timed-out calls?

The late responses are always discarded; no doubt about this.

> A bug in the timestamps (reported to be 2 days in the future)

You can see the attached screenshots for the timestamp.
https://cdn.discordapp.com/attachments/1329809513812987958/1330901996064079913/Screenshot_2025-01-20_at_19.26.54.png?ex=678faa92&is=678e5912&hm=53a7a0c62abc18aebf105898e7d997600cd0f90dcc9162b3e0657de2dd8b63cf&
https://cdn.discordapp.com/attachments/1329809513812987958/1330901996340772894/Screenshot_2025-01-20_at_19.35.45.png?ex=678faa92&is=678e5912&hm=911bd253da703dd0230987d2b8bdd80049f2b85b76bb697903f54196ca1aa0bb&
@Ben Can you make another API call using the GPT-4 model with a temperature between 0 and 0.3? Please return only the essential text content in the response, formatted as a single-line string without any line breaks. https://docs.vapi.ai/tools-calling#server-response-format-providing-results-and-context
Expected:
```json
{
    "results": [
        {
            "toolCallId": "X",
            "result": "Y"
        }
    ]
}
```
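As a minimal sketch of producing that shape server-side (assuming a hypothetical `build_tool_response` helper), the key point is collapsing the result to a single-line string:

```python
import json

def build_tool_response(tool_call_id: str, result_text: str) -> str:
    """Hypothetical helper: shape the server response body Vapi expects."""
    # Collapse all line breaks and extra whitespace into single spaces,
    # so the result is a single-line string
    single_line = " ".join(result_text.split())
    return json.dumps({"results": [{"toolCallId": tool_call_id, "result": single_line}]})
```

For example, `build_tool_response("X", "line one\nline two")` returns `{"results": [{"toolCallId": "X", "result": "line one line two"}]}`.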
b
@User It seems the timestamps you have on your backend in watchdog look plausible, so it's not clear why `1737250366597` was reported as `time` (Sunday, 19 January 2025 01:32:46.597).

I believe in the above case it was GPT-3.5 Turbo in use - would you like me to re-run it with GPT-4? I've only seen this particular issue occasionally, so I'm not sure I can reproduce it easily. The temperature in use was the default - so it was already set to 0, to my knowledge.

The response at the time was formatted like this:
```python
response = {
    "results": [
        {
            "toolCallId": parsed_response["tool_call_id"],
            # NOTE: `result` here is an object, not a string
            "result": {
                "status": "success",
                "message": result,
                "tool_call_arguments": arguments,
            },
        }
    ]
}
```
i.e. I had an object as the response, not a string. I saw today [here](https://discord.com/channels/1211482211119796234/1329542939160084624/1329910047081500712) that at least for the Realtime API it must be a pure string. I'm a little unclear what the suggested practice is if we wish to return structured data to the LLM to be put into its context without it being spoken directly, as is often the case in a RAG setting. Would it be something like this?
```json
"results": [
    {
      "toolCallId": "<insert-your-tool-call-id-here>",
      "result": "<JSON stringified response data here>",
      "message": {
        "type": "request-complete",
        "role": "assistant",
        "content": ""
      }
    }
  ]
}
```
If there is some documentation explaining (or a code reference for) what OpenAI API call input the VAPI middleware uses to insert the tool call result into the LLM, that would likely answer my query above. I see the docs you linked say "It could be a string, a number, an object, an array, or any other data structure that is relevant to the tool's purpose.", so it's clear the result can be an object in some cases at least.
s
> I believe in the above case it was GPT-3.5 Turbo in use - would you like me to re-run it with GPT-4? I've only seen this particular issue occasionally, so I'm not sure I can reproduce it easily. The temperature in use was the default - so it was already set to 0, to my knowledge.

Yes, I am aware of the GPT-3.5 Turbo model and temperature settings for the shared call ID. For this, I suggest running another call with the GPT-4 model and a temperature between 0 and 0.3. The expected format is:
```json
{
  "results": [
    {
      "toolCallId": "<insert-your-tool-call-id-here>",
      "result": "<insert-tool-call-result-here>",
      "message": {
        "type": "request-complete",
        "role": "assistant",
        "content": "<insert-the-content-here>"
      }
    }
  ]
}
```
The message content will be voiced or spoken out, while the result will be used in the LLM context. We expect the result to be in string form without any line breaks, as an object might break it and cause "no_results_found" to appear in the message history.

> I see the docs you linked say "It could be a string, a number, an object, an array, or any other data structure that is relevant to the tool's purpose.", so it's clear the result can be an object in some cases at least.

Yes, but it has to be encapsulated inside the string, which you can verify in the attached screenshot.

> If there is some documentation explaining (or a code reference for) what OpenAI API call input the VAPI middleware uses to insert the tool call result into the LLM, that would likely answer my query above.

Yes, it's a tool call result being used. The result will be fed to the LLM, and if there is message content, then the content is voiced out while the result is still used in the LLM messages / completion history.
https://cdn.discordapp.com/attachments/1329809513812987958/1330916291296628736/Screenshot_2025-01-20_at_20.32.21.png?ex=678fb7e3&is=678e6663&hm=727df948c89e203f8acf6c26e94e41f3ba040cf61ff5a58484779822c72e7b7a&
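For context, here is a sketch of the standard OpenAI chat-completions tool-calling message shape (assuming the middleware follows it; this is not Vapi's actual code). The `result` string ends up as the `content` of a `role: "tool"` message tied to the original tool call ID, which is why it must already be a string:

```python
# Sketch of the standard OpenAI tool-calling flow; names and values
# are illustrative, not taken from Vapi's implementation.
messages = [
    {"role": "user", "content": "When is the expense deadline?"},
    {
        "role": "assistant",
        "tool_calls": [{
            "id": "X",
            "type": "function",
            "function": {"name": "rag_search", "arguments": '{"query": "expense deadline"}'},
        }],
    },
    # The tool result is inserted as a plain string in `content`,
    # so structured data must be JSON-stringified first
    {
        "role": "tool",
        "tool_call_id": "X",
        "content": '{"status": "success", "message": "Deadline is noon on the 5th."}',
    },
]
```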
@Ben Please let me know if you have any questions. I will be available today around the time of this event: https://discord.gg/E4TEgnHV?event=1329840374939914240
b
I see, thanks. I'll make sure to JSON-stringify all results and to use `message` when results should be spoken.
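Concretely, I'm assuming that approach looks something like this (hypothetical values, following the format above):

```python
import json

# Hypothetical structured RAG result to return to the model
rag_data = {"status": "success", "deadline": "noon on the 5th of next month"}

response = {
    "results": [
        {
            "toolCallId": "X",
            # JSON-stringified so `result` is a single-line string
            "result": json.dumps(rag_data),
            # Only include `message` when the text should be spoken aloud
            "message": {
                "type": "request-complete",
                "role": "assistant",
                "content": "The deadline is noon on the 5th of next month.",
            },
        }
    ]
}
```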
s
@Ben Can I close this ticket?
b
Shall I rerun with GPT-4 and 0-0.3 temperature and share a call ID? Like I say, I'm not sure I can reproduce the issue easily even with 3.5 Turbo, so I'm not sure it's going to be too valuable with a different model. Is the implication that GPT-3.5 might be buggy versus GPT-4?

Regarding the timestamp 2 days in the future: that's definitely a bug somewhere in VAPI's framework, so if you've logged that with the devs then it's fine to close from my point of view - it does not affect my work much besides making a weird timestamp in the transcript.

Regarding the way the model behaved - i.e. repeating its output several times after it tried different servers - do you suspect this was related to the formatting of the tool call result being non-string? If you're confident that's the reason, then it seems sensible to resolve this, and I can re-open if we see a recurrence after fixing that.
s
> Regarding the way the model behaved - i.e. repeating its output several times after it tried different servers - do you suspect this was related to the formatting of the tool call result being non-string? If you're confident that's the reason, then it seems sensible to resolve this, and I can re-open if we see a recurrence after fixing that.

The completion requests were aborted; because of this, you observed the model calling the tool repeatedly.
https://cdn.discordapp.com/attachments/1329809513812987958/1330919515566903397/Screenshot_2025-01-20_at_19.png?ex=678fbae3&is=678e6963&hm=955de6264bde17f5488739ae90b88064fc25787cb0b93dc10ddcd268b76e5eaa&
b
I see, and the Aborted reason is likely the format being non-string? I'm a little unsure that's the reason, given that it does work most of the time; if it were a formatting issue, I'd expect the requests to be aborted all the time.
s
> Regarding the timestamp 2 days in the future: that's definitely a bug somewhere in VAPI's framework, so if you've logged that with the devs then it's fine to close from my point of view - it does not affect my work much besides making a weird timestamp in the transcript.

You can ignore this for now. It might have been due to a local and UTC timezone conflict, but since it cannot be reproduced, it's not worth forwarding further. It seems to have been a one-time occurrence.
b
🤞 Yes - if you see other reports, it's maybe worth spending more time on that timestamp issue.
s
The aborts are not related to the tool-calling format. If they were, you would have received a "no results found" log. Instead, this is solely related to a timeout issue. I will look into this later as well and keep it open internally.
b
OK, thanks for clarifying. If it's a regular timeout, it does seem odd to me that it should trigger multiple repeated tool calls. At least in a production setting, it would be great to have a way to handle timeouts like this elegantly - e.g. to either just wait longer in silence, or to provide a way we can inform the user what is happening ("I tried to use ${toolName} but I'm having difficulties right now [due to timeouts to my server], so I'm going to try again") or some such.
s
Regarding the timeout: it depends on your key and server. We are looking to support you in the future by either extending the timeout or implementing silence, as this will require a pipeline change. I will let you know about future updates.
@Ben Just checking - is it happening again for you?
b
I haven't spotted it again yet