Speech-to-speech was already possible months ago. Even if your main point is that latency is now low enough to make speech-to-speech feel more realistic, how does that replace or negate the VAPI infrastructure? The infrastructure is also LLM-agnostic, which means you don't get locked in to any one provider. And it lets you use your own local models if privacy is that important to an organization.
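
To make the LLM-agnostic point concrete, here is a rough sketch of what I mean. This is not VAPI's actual API; `ChatModel`, `HostedModel`, `LocalModel`, and `handle_turn` are hypothetical names I made up for illustration, and the models just return canned strings instead of calling anything real. The idea is only that the voice pipeline talks to an interface, so a hosted provider and a local model are interchangeable.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical interface: anything that turns a transcript into a reply."""

    def reply(self, transcript: str) -> str: ...


@dataclass
class HostedModel:
    """Stand-in for a hosted provider; a real one would call its API here."""

    provider: str

    def reply(self, transcript: str) -> str:
        return f"[{self.provider}] reply to: {transcript}"


@dataclass
class LocalModel:
    """Stand-in for a model running on your own hardware."""

    path: str

    def reply(self, transcript: str) -> str:
        return f"[local:{self.path}] reply to: {transcript}"


def handle_turn(model: ChatModel, transcript: str) -> str:
    """The pipeline depends only on the ChatModel interface, not on a vendor."""
    return model.reply(transcript)


if __name__ == "__main__":
    # Swapping providers, or moving to a local model, is a one-line change.
    print(handle_turn(HostedModel("provider-a"), "hi"))
    print(handle_turn(LocalModel("/models/demo"), "hi"))
```

A new speech-to-speech model would just be one more implementation behind that interface, which is why I don't see it replacing the layer around it.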