Everyone told me not to buy servers. "Just use Vapi or OpenAI's Realtime API," they said. "Don't manage infrastructure."
But I hit a wall. I couldn't reliably get latency under 800ms with a wrapper. And the "Compliance Tax" (charging $1k/mo just for a BAA, a HIPAA Business Associate Agreement) felt predatory for the small agencies I work with.
So, I did the "stupid" thing: I built a bare-metal cluster of NVIDIA Blackwell GPUs to run local LLMs plus TTS/ASR.