Couple questions:
p
1. What's the best practice for passing variables to tools? Let's say I have various calendar UUIDs: should I put the logic inside the tool parameter (e.g. "if called for this person, use this UUID") or rather in the prompt ("this person = this UUID when calling the tool")?
2. Regarding the knowledge base, are there any documents that could let the model format its output so it eases pronunciation in French?
3. What's the best approach for forcing the user to spell their name? (I know LLMs are not good at this in general, any input appreciated.)
v
@paul gaston
1. Best practice for passing variables to tools, such as calendar UUIDs, is to handle the logic outside of the tool parameters. Determine the correct UUID in the prompt or in your application logic before calling the tool. This keeps the tool's functionality clear and focused, and it allows easier maintenance and updates to the logic without modifying the tool itself.
2. Regarding the knowledge base and formatting output for pronunciation in French, I don't have access to specific documents that address this. However, general best practices for improving pronunciation in text-to-speech systems include using phonetic spelling for difficult words, adding punctuation to guide intonation, and potentially using SSML (Speech Synthesis Markup Language) tags if supported by the API. Please ask support if you need documents or guides specific to French pronunciation.
3. To force a user to spell their name, you could prompt them explicitly to do so. For example, you could ask, "Could you please spell your name for me?" This direct approach is often effective. Additionally, you could ask for confirmation of the spelling by repeating it back to the user and asking for verification. If you're using voice input, ensure that the system is optimized to recognize and differentiate between letters. If LLMs have difficulty with this, consider integrating a specialized speech recognition tool designed for capturing spelled input.
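To illustrate point 1, here is a minimal sketch (hypothetical names and UUIDs, not Vapi's actual API) of resolving the calendar UUID in application code so the tool only ever receives a ready-to-use parameter:

```typescript
// Hypothetical mapping kept in application code, not in the tool description.
const CALENDAR_UUIDS: Record<string, string> = {
  "dr-martin": "a1b2c3d4-0000-0000-0000-000000000001",
  "dr-durand": "a1b2c3d4-0000-0000-0000-000000000002",
};

// Resolve the UUID before the tool call so the tool only ever receives
// a ready-to-use calendarId argument.
function resolveCalendarId(person: string): string {
  const id = CALENDAR_UUIDS[person.toLowerCase()];
  if (!id) throw new Error(`No calendar configured for ${person}`);
  return id;
}

// Example: the value you would pass as the tool's calendarId argument.
console.log(resolveCalendarId("Dr-Martin"));
```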
You can continue this conversation by mentioning me in the message.
p
OK, for part 2, here I am, asking support 🙂 For part 3, what do you mean when you refer to a specialized tool? An external API/system in charge of this specific part? Sincerely
@User
s
1. If there are only a few UUIDs you can handle it in the prompt; if there are many, it's best to do it on the server.
2. You can use markdown formatting and it works. Pronunciation depends mostly on the voice, but you can share the call ID where you felt it was incorrect, along with:
- What response you expected to get
- What response you actually got instead
3. You need to ask the user to spell their name character by character and then have the LLM verify it with the user; this way it will be transcribed and voiced out correctly. If the user doesn't follow the script or context, you can ask them to focus on the script.
Since your question is really a prompting question, to help you with it I need your existing prompt and a recent call ID.
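For the "do it on the server" option, here is a rough sketch of how a tool-call webhook could look; Express, the route name, and the `person`/`result` fields are assumptions to adapt to the actual payload Vapi sends to your server URL:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical server-side lookup; with many calendars this could be a DB query.
const CALENDAR_UUIDS: Record<string, string> = {
  "dr-martin": "a1b2c3d4-0000-0000-0000-000000000001",
  "dr-durand": "a1b2c3d4-0000-0000-0000-000000000002",
};

// Sketch of a tool-call webhook: the exact request/response shape depends on
// your setup, so check the docs; `person` and `result` are placeholder names.
app.post("/tool/book-appointment", (req, res) => {
  const person: string = (req.body.person ?? "").toLowerCase();
  const calendarId = CALENDAR_UUIDS[person];
  if (!calendarId) {
    return res.json({ result: `No calendar found for ${person}` });
  }
  // ...call your booking API with calendarId here...
  return res.json({ result: `Booked on calendar ${calendarId}` });
});

app.listen(3000);
```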
p
Thank you so much for taking the time to reply to me. The answers make a lot of sense, especially about markdown in the knowledge base; I was using PDF and it wasn't working so great, but I'll try with MD tomorrow.
I'm already doing this; the problem is that having this part in the same monolithic prompt makes the prompt heavier, more complex, and less cost effective. If my understanding is correct, I could split all my logic between various members of a squad (with the right backend workflow) and silently chain interactions between assistants so the user wouldn't notice they are different assistants. My idea is to have specific models doing specific things, organized as a workflow, backed by knowledge bases, and cost effective. To give you the full picture, we want to sell Vapi white-labeled to specific customers, so we will do whatever it takes, and we are even considering the premium package at the moment.
I have 3 remaining blocks to build for the technology:
1. 98%+ correct spelling accuracy through an assistant
2. Backend implementation to organize 3/4 assistants within a squad with smooth and silent interactions. I feel confident about the backend part, but I'm struggling to make squads work in the Vapi website UI, so I'll try with the API. Can you confirm that the way to make it happen is to reply directly from my API on call init with the squad, in order to delegate the logic to the squad rather than simply the assistant linked to the phone number? It may sound stupid, but I'm struggling to wrap my head around this process.
3. End-to-end testing
s
> 98%+ correct spelling accuracy through an assistant
Can you share your existing prompt?
> Can you confirm that the way to make it happen is to reply directly from my API on call init with the squad, in order to delegate the logic to the squad rather than simply the assistant linked to the phone number?
You're correct here.
> End-to-end testing
What are your expectations?
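For reference, a hedged sketch of what "replying with the squad on call init" could look like on the server; the event type ("assistant-request") and the `squadId` field are assumptions to verify against the current Vapi docs, not confirmed API:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Sketch of an inbound-call handler: instead of letting the phone number's
// default assistant answer, the server replies with the squad so its routing
// logic takes over. Event name and squadId field are assumptions to verify.
app.post("/vapi/server", (req, res) => {
  const message = req.body.message;
  if (message?.type === "assistant-request") {
    return res.json({ squadId: "YOUR-SQUAD-UUID" });
  }
  res.status(200).end();
});

app.listen(3000);
```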
p
Sure, my current approach consists in using a stack of letters to validate/invalidate. I have a corpus of 3 MD documents (a list of first names, specific last-name rules based on country of origin, and the French radio alphabet with correct pronunciation and phonetics). Here is the whole package:
- prompt
- special rules
- phonetics/radio alphabet
I tried like 10 different approaches; my goal is to have a decent tradeoff between handled complexity/user experience and costs. I'm up for extra costs for this assistant if required. Let me know if I can provide anything else to help. Sorry, it's in French of course.
https://cdn.discordapp.com/attachments/1307195940389654538/1308476712845181001/alphabet_phonetique.md?ex=673e1566&is=673cc3e6&hm=60650d647adbdb794c8a0afadc489607008e2fe8ce4f69ed0a68389f163695ef& https://cdn.discordapp.com/attachments/1307195940389654538/1308476713315078154/regles_speciales_pour_validation_des_noms.md?ex=673e1566&is=673cc3e6&hm=9157f9e3247e402a8626e676884b505e9c98d3abce3532a07b455090d5fa9791&
Concerning point 3, end-to-end testing: it will be a pre-production stage where the solution is tested end to end with all the specifications: a Twilio-ported French number to reduce latency and costs for callers, a business-specific knowledge base, etc. For now I'm just trying to build the "shell" of this project with generic data, assuming it would behave the same if I change a brick in the knowledge base. For the record, I tried MD files rather than PDFs today and it seems to work a little better, though there's still a bunch of tests to be done. I don't think it's relevant to talk about testing at this point; I'm still working on the spelling brick, and it's not easy to reach high accuracy. Any input appreciated on this matter. French also needs to handle a few specific characters such as é, è, ê. The pronunciation of letters isn't very convenient either ("y" is pronounced like "i" but with something after: "i" is "hiii" and "y" is "i-grec"). Do you think the explicit stack approach is correct? I can share a repo with 15 other trials if you want to see my research on this matter.
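As an illustration of the stack approach, here is a small sketch (a tiny excerpt of the radio alphabet, hypothetical function and map names) that normalizes spelled-out French code words, including é/è/ê and "i grec", into letters:

```typescript
// Sketch assuming the transcriber returns the spoken code words on their own
// ("anatole", "i grec", "e accent aigu", ...). Excerpt only, not the full alphabet.
const FRENCH_SPELLING: Record<string, string> = {
  "anatole": "a",
  "berthe": "b",
  "célestin": "c",
  "i grec": "y",
  "e accent aigu": "é",
  "e accent grave": "è",
  "e accent circonflexe": "ê",
};

function normaliseSpelledName(tokens: string[]): string {
  // Push each recognized letter onto a stack so a later correction turn
  // can pop and replace it (the validate/invalidate idea).
  const stack: string[] = [];
  for (const token of tokens) {
    const letter = FRENCH_SPELLING[token.toLowerCase().trim()];
    if (letter) stack.push(letter);
  }
  return stack.join("");
}

console.log(normaliseSpelledName(["Anatole", "Berthe", "E accent aigu", "I grec"])); // "abéy"
```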
For the record, I tried various models, but so far GPT-4o (the normal one, not the mini) seems to be the best. Any input appreciated on this matter as well.
not sure this is my best approach
s
> I can share a repo with 15 other trials if you want to see my research on this matter
@paul gaston Sure, please share it. Use gpt-4o; it's the best and outperforms the others.
p
OK, I was on Anthropic Claude 3.5 Sonnet but I will switch to 4o. I'm almost done with spelling, it works pretty well tbh.
I think the transcriber was the core of the issue in my first attempts with 4o. Thank you for your help though, appreciate it.
I updated a few questions in my other ticket (about squads), so I'm closing this one if you don't mind.
(mark as resolved not working for some reason)