image upload to assistant
# support
a
Hi. I didn’t find info in the docs about it, but I was wondering if it’s possible to upload images while having a conversation with assistant. If not, how would you recommend doing it?
v
@Avenir Hello! Currently, VAPI is focused on voice interactions and does not support the uploading of images during a conversation. If you need to work with images, I would recommend using a separate service or API that specializes in image uploads and processing. You can then integrate that service with your voice application to handle the image-related aspects of your workflow. If you have any more questions or need further assistance, feel free to ask!
s
You will need to go with the custom llm flow and then you can add it.
a
Could you let me know if my understanding is correct: 1) I use the custom LLM flow, i.e. the call gets sent to my backend and I return JSON in the right format; 2) the response gets sent to the assistant handling the call? My aim is to get the LLM to understand what is on the image and then have that info used within the main conversation, as a continuation of it. I also thought of processing the image on the server and then sending the vapi.send info along with the image description, but I have no idea if that actually makes sense.
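For what it's worth, that two-step flow can be sketched roughly like this. It assumes the custom LLM flow expects your backend to behave like an OpenAI-compatible chat-completions endpoint; the helper below only builds the response JSON in that shape, and all names (`callYourModel`, the route) are illustrative, not Vapi's actual API.

```javascript
// Sketch: a backend helper that wraps model output in the
// OpenAI chat-completion shape a custom-LLM endpoint is expected
// to return. Vapi forwards the conversation to your endpoint and
// speaks the assistant message it gets back.
function buildChatCompletionResponse(assistantText) {
  return {
    id: "chatcmpl-local", // any identifier
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    model: "gpt-4o", // whichever model you actually called
    choices: [
      {
        index: 0,
        message: { role: "assistant", content: assistantText },
        finish_reason: "stop",
      },
    ],
  };
}

// Illustrative Express-style handler (names are hypothetical):
// app.post("/chat/completions", async (req, res) => {
//   const { messages } = req.body;               // conversation from Vapi
//   const reply = await callYourModel(messages); // your GPT-4o call, incl. the image
//   res.json(buildChatCompletionResponse(reply));
// });
```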
s
What's your workflow/plan for using images during the call?
a
A user is having a conversation using Vapi. They can draw stuff on the web, export the blob, and pass it on to the LLM to analyze, and the output should then be used within the conversation as well. For instance: a) we're having a chat with an expert teacher about how to paint; b) the user paints something; c) the LLM, knowing the conversation history, guesses what it is and describes it; d) I send this info back to the main conversation; e) according to the prompt, the LLM reacts.
But I'm not sure how to wrap it up with vapi tbh
s
1. What if you ask the user indirectly to trigger an action that fetches the image description and then injects it into your Vapi conversation memory? I guess this would work. 2. [Hypothetically] Another option is Vapi assistants listening to webhook requests, updating conversation memory, and then responding with respect to the new message. 3. As already mentioned by Sahil, the custom LLM flow.
a
1 makes sense, but how would I do it in terms of API?
s
So you have to ask the user indirectly to say something that triggers the tool or function, which makes a POST request to your server and gets back the image description.
a
Kk. What’s the difference between the tool and the function here? This whole realm is new to me 🙂
s
Semantically they are the same: plain functions. Tools are global utilities available to all of your agents. Functions are assistant-specific tools that use the OpenAI function specification.
i
Can you elaborate a bit on point 3? Should I still use the vapi.send() function, or how do I integrate it with the call flow? Example: User: "Want to see my latest photo?" Assistant: "Sure, show me." Here vapi.send would be the natural choice, but it doesn't allow images. How is it different with custom-llm?
s
You don't have to send images; you have to send the image description to the Vapi bot, using whatever convention applies.
i
Ok but that is totally not as effective as sending the image to the model you are talking with
s
Then you can use vectors.
i
How is that?
I found a workaround using a combination of vapi.say() and vapi.send(). Basically, I send the whole conversation to GPT-4o with the image as the last message; then I use vapi.send with role = system and something like "the user sent an image", and vapi.say with the response from GPT. Of course the Vapi assistant doesn't receive the image itself, so any follow-up question won't work or will hallucinate, but until they implement a way to send images, I think it's the best we can do.
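That workaround can be sketched like this. The `describeImage` backend call is hypothetical (your own endpoint that forwards the blob plus transcript to GPT-4o), and the `add-message` payload shape follows the Vapi Web SDK's `send()` convention as I understand it; treat it as a sketch, not a verified integration.

```javascript
// Pure helper: build the system message injected into the live call
// so the assistant's context records that an image was handled.
function buildImageNotice(description) {
  return {
    type: "add-message",
    message: {
      role: "system",
      content: `The user sent an image. Description: ${description}`,
    },
  };
}

// Illustrative wiring (vapi is a Vapi Web SDK instance;
// describeImage is your own hypothetical backend call):
// async function onImageExported(blob) {
//   const reply = await describeImage(blob);        // GPT-4o's answer
//   vapi.send(buildImageNotice(reply.description)); // inject context
//   vapi.say(reply.text);                           // speak the reaction
// }
```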
s
Even if they implement it, they will store either a description of the image or vectors, which the bot will use to generate responses. So I believe you can still do the same yourself.
i
They just have to store the URL, not the whole file, if it's a storage problem.