Running agents in production is getting expensive fast especially when something loops, retries, or a user abuses the system. Curious what others are actually doing:
Relying on provider-side billing alerts?
Hard limits set on the OpenAI/Anthropic dashboard?
Custom solution you built yourself?
Nothing yet and just hoping for the best?
I've been deep in this problem lately actually built something around it (launching tomorrow on PH). Would love to hear real approaches first though, especially from anyone running multi-tenant SaaS where you need per-user cost control.