Hey guys, I'm starting a thread on performance evaluation and on catching failure modes before going live. Doing this well helps when selling to clients and gives you more confidence in how your agent actually performs. I'm curious what process you all follow when you create a new agent. Most of us (myself included) try it a few times across a few scenarios, but then keep discovering new edge cases in live calls. So, how do you guys evaluate your agents?
Also, I'm building an internal solution for my own app, where I simulate conversations with different personas and analyze the transcripts for failure patterns. I'm looking for 1-2 teams who would like to try this with me. Interested?
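
To make the persona idea concrete, here is a rough Python sketch of what simulating conversations and tallying failure patterns could look like. It is only an illustration, not the actual tool: the persona scripts, the `toy_agent` stub, and the two failure labels (`empty_reply`, `fallback`) are all made-up assumptions, and a real setup would call your own agent and likely use richer checks.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical personas: each is just a scripted list of user turns.
PERSONAS: Dict[str, List[str]] = {
    "impatient": ["I need a refund now", "Why is this taking so long?"],
    "off_topic": ["What's the weather like?", "Can you book me a flight?"],
    "silent": ["", "hello?"],
}

Transcript = List[Tuple[str, str]]  # (user turn, agent reply) pairs


def toy_agent(user_turn: str) -> str:
    """Stand-in for a real agent; swap in a call to your own system."""
    if not user_turn.strip():
        return ""  # deliberately buggy: says nothing on empty input
    if "refund" in user_turn.lower():
        return "I can help with your refund."
    return "Sorry, I didn't understand that."


def simulate(agent: Callable[[str], str], turns: List[str]) -> Transcript:
    """Run one persona's scripted turns through the agent."""
    return [(turn, agent(turn)) for turn in turns]


def find_failures(transcript: Transcript) -> List[str]:
    """Label each turn that matches a simple, rule-based failure pattern."""
    failures = []
    for _user_turn, reply in transcript:
        if not reply.strip():
            failures.append("empty_reply")
        elif "didn't understand" in reply:
            failures.append("fallback")
    return failures


def evaluate(agent: Callable[[str], str],
             personas: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Count failure labels per persona."""
    return {
        name: Counter(find_failures(simulate(agent, turns)))
        for name, turns in personas.items()
    }


report = evaluate(toy_agent, PERSONAS)
for persona, counts in report.items():
    print(persona, dict(counts))
```

Grouping the counts by persona makes it easier to see which kind of caller breaks the agent most often, which is the sort of thing that a few manual test calls tend to miss.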