Robat Das

How are you handling LLM API costs in production? Billing alerts? Hard limits? Nothing?

Running agents in production is getting expensive fast — especially when something loops, retries, or a user abuses the system. Curious what others are actually doing:

  • Relying on provider-side billing alerts?

  • Hard limits set on the OpenAI/Anthropic dashboard?

  • Custom solution you built yourself?

  • Nothing yet and just hoping for the best?

I've been deep in this problem lately and actually built something around it (launching tomorrow on PH). Would love to hear real approaches first though, especially from anyone running multi-tenant SaaS where you need per-user cost control.

Replies

Ashton Blake

The hardest part is not normal traffic. It's edge cases and unexpected loops.

Robat Das

@ashton_blake Loops are exactly what started this for me. I set up OpenClaw with OpenAI, and when I woke up I found a $40 bill from a loop on a $5 task.
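One common way to stop a runaway loop like this is a per-task cap on both cumulative spend and step count. The sketch below is hypothetical (`TaskGuard`, `BudgetExceeded`, and `run_agent` are illustrative names, not part of OpenClaw, OpenAI, or baar-core) and assumes you can read the actual cost of each call after it returns. Costs are kept in integer cents to avoid float drift.

```python
class BudgetExceeded(Exception):
    """Raised when a task goes past its spend or step cap."""


class TaskGuard:
    """Hypothetical per-task guard: caps cumulative spend (cents) and step count."""

    def __init__(self, max_cents: int, max_steps: int) -> None:
        self.max_cents = max_cents
        self.max_steps = max_steps
        self.spent_cents = 0
        self.steps = 0

    def record(self, cost_cents: int) -> None:
        """Call after every model call with the cost it actually incurred."""
        self.steps += 1
        self.spent_cents += cost_cents
        if self.spent_cents > self.max_cents:
            raise BudgetExceeded(f"spent {self.spent_cents}c, cap is {self.max_cents}c")
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"hit step cap of {self.max_steps}")


def run_agent(guard: TaskGuard, cost_per_call_cents: int) -> int:
    """Simulated agent stuck in a loop; returns how many calls ran before the guard stopped it."""
    calls = 0
    try:
        while True:  # simulates a bug where the agent never reaches its stop condition
            calls += 1
            guard.record(cost_per_call_cents)
    except BudgetExceeded:
        pass
    return calls


# A $5 cap with 40-cent calls stops the loop after 13 calls instead of running all night.
guard = TaskGuard(max_cents=500, max_steps=50)
print(run_agent(guard, 40), guard.spent_cents)
```

Because this guard only checks after each call, the last call can overshoot the cap by one call's cost; a pre-call estimate closes that gap.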

Graham Lewis

Per-user tracking became necessary way earlier than expected. Shared limits were impossible to manage in multi-tenant setups.

Robat Das

@graham_lewis Exactly the use case that pushed me to build SQLite-backed per-user quotas into baar-core. Shared limits fall apart the moment one tenant starts hammering the API. Launching it today on PH if you want to check it out.
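A minimal sketch of what SQLite-backed per-user quotas can look like, using only Python's standard `sqlite3` module. The table layout and the function names (`open_quota_db`, `set_limit`, `try_charge`, `remaining_cents`) are assumptions for illustration, not baar-core's actual schema or API. The key idea is that the check and the increment happen in one conditional `UPDATE`, so one tenant's traffic can only ever consume that tenant's quota.

```python
import sqlite3


def open_quota_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS quotas (
               user_id     TEXT PRIMARY KEY,
               limit_cents INTEGER NOT NULL,
               used_cents  INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def set_limit(conn: sqlite3.Connection, user_id: str, limit_cents: int) -> None:
    """Create the user's row if missing, then set the limit without resetting usage."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO quotas (user_id, limit_cents) VALUES (?, ?)",
            (user_id, limit_cents),
        )
        conn.execute(
            "UPDATE quotas SET limit_cents = ? WHERE user_id = ?", (limit_cents, user_id)
        )


def try_charge(conn: sqlite3.Connection, user_id: str, cost_cents: int) -> bool:
    """Atomically add cost_cents if it fits under the user's limit; False otherwise."""
    with conn:
        cur = conn.execute(
            "UPDATE quotas SET used_cents = used_cents + ? "
            "WHERE user_id = ? AND used_cents + ? <= limit_cents",
            (cost_cents, user_id, cost_cents),
        )
    return cur.rowcount == 1  # 0 rows updated: over quota or unknown user


def remaining_cents(conn: sqlite3.Connection, user_id: str) -> int:
    row = conn.execute(
        "SELECT limit_cents - used_cents FROM quotas WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0
```

Unknown users are denied by default, which is usually the safer failure mode for a multi-tenant app.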

Bradley Simon

Hard limits help, but they also create awkward user experiences when something suddenly stops mid-task.

Robat Das

@bradley_simon That's why baar-core does a pre-flight estimate instead of cutting mid-response. It checks before the call fires — if it would exceed the budget, it never starts. No mid-task interruption.
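A rough sketch of a pre-flight check, assuming per-token pricing and a `max_output_tokens` cap on the request. The names (`Pricing`, `estimate_tokens`, `preflight_cost_usd`, `should_call`) and the prices in the example are hypothetical, and the characters-divided-by-four token count is only a crude heuristic; this is not how baar-core estimates cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical USD prices per 1M tokens; substitute your provider's current rates.
    input_per_mtok: float
    output_per_mtok: float


def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token for English text);
    # a real check would use the model's own tokenizer.
    return max(1, len(text) // 4)


def preflight_cost_usd(prompt: str, max_output_tokens: int, price: Pricing) -> float:
    """Approximate worst case: the estimated prompt plus the full output allowance."""
    input_tokens = estimate_tokens(prompt)
    return (
        input_tokens * price.input_per_mtok + max_output_tokens * price.output_per_mtok
    ) / 1_000_000


def should_call(prompt: str, max_output_tokens: int, price: Pricing, remaining_usd: float) -> bool:
    """Only fire the request if even the worst-case output fits in the remaining budget."""
    return preflight_cost_usd(prompt, max_output_tokens, price) <= remaining_usd


price = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
prompt = "x" * 4000  # about 1,000 tokens by the heuristic above
print(preflight_cost_usd(prompt, 2000, price))
```

Capping output tokens is what makes this an upper bound on the output side: the call cannot produce more than it was priced for, so a request that passes the check cannot be cut off by the budget mid-response.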

Aurora Parker

We built internal dashboards because provider billing pages were too delayed for real monitoring.

Robat Das

@aurora_parker I'm building noburn.dev for this. If you're interested, you can reach me at hello@robatdasorvi.com. I'm launching the open-source library today, and I'd love a shout-out.

Farrukh Butt

Hard limits per user or workspace feel necessary once LLM usage becomes part of the core product. Provider alerts help, but they are usually too late if an agent loops or retries aggressively.

Robat Das

@farrukh_butt1 Hey, I'm launching an open-source library for this. If you like it, I'd really appreciate a shout-out and some support :)

Nida E Zahra Zaidi

honestly... once you start running multi-tenant ai products, provider-side billing alerts stop being enough pretty quickly 😅

we’ve seen cases where retries/background tasks quietly push usage way higher than expected... especially when multiple models and async workflows are involved

per-user tracking and internal limits start becoming really important at that point, otherwise it’s very hard to understand where costs are actually coming from in production
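To see where retries and background jobs are actually spending money, one option is to record every attempt against both the user and the code path that triggered it. This is a hypothetical sketch (`UsageLedger`, `call_with_retries`, and the `source` labels are illustrative); it assumes each attempt can report its own cost, including attempts whose output you rejected, since those tokens were still consumed.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Tuple

Key = Tuple[str, str]  # (user_id, source), e.g. ("u1", "background:summarize")


class UsageLedger:
    """Hypothetical in-memory ledger of cost and attempt counts per (user, source)."""

    def __init__(self) -> None:
        self.cents: DefaultDict[Key, int] = defaultdict(int)
        self.attempts: DefaultDict[Key, int] = defaultdict(int)

    def add(self, user_id: str, source: str, cost_cents: int) -> None:
        key = (user_id, source)
        self.cents[key] += cost_cents
        self.attempts[key] += 1


def call_with_retries(
    ledger: UsageLedger,
    user_id: str,
    source: str,
    call: Callable[[], Tuple[bool, int]],
    max_attempts: int = 3,
) -> bool:
    """call() returns (ok, cost_cents). Every attempt's cost is recorded, not just the successful one."""
    for _ in range(max_attempts):
        ok, cost_cents = call()
        ledger.add(user_id, source, cost_cents)
        if ok:
            return True
    return False
```

With this in place, a background job that "succeeded" after two silent retries shows up as three attempts and three times the cost, instead of looking like a single call.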

curious to see what you’re launching tomorrow 👀

Robat Das

@nidaezahraaa I've already launched the library. Hopefully I'll launch the product dashboard within the next week. You're gonna love it. Could you email or message me so I can reach out to you later about testing the product?
https://www.producthunt.com/products/baar-core

Jacob McDaniel

Built an in-app admin AI monitoring dashboard to see which models were being used and by whom, along with the resulting costs and margins.
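The core of a dashboard like that is usually a simple aggregation over usage events. A minimal sketch, assuming you log one event per model call with the user, model, and cost, and that you know what each user pays; `UsageEvent`, the model names, and the revenue figures are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class UsageEvent:
    user_id: str
    model: str
    cost_cents: int


def cost_by_user_and_model(events: Iterable[UsageEvent]) -> Dict[Tuple[str, str], int]:
    """Total LLM cost per (user, model) pair."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for e in events:
        totals[(e.user_id, e.model)] += e.cost_cents
    return dict(totals)


def margin_by_user(events: Iterable[UsageEvent], revenue_cents: Dict[str, int]) -> Dict[str, int]:
    """Revenue minus LLM cost per user; negative values are users costing more than they pay."""
    cost: Dict[str, int] = defaultdict(int)
    for e in events:
        cost[e.user_id] += e.cost_cents
    users = set(cost) | set(revenue_cents)
    return {u: revenue_cents.get(u, 0) - cost.get(u, 0) for u in users}


events = [
    UsageEvent("alice", "model-a", 120),
    UsageEvent("alice", "model-b", 30),
    UsageEvent("bob", "model-a", 500),
]
print(margin_by_user(events, {"alice": 2000, "bob": 400}))
```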

Robat Das
Wow, can I see it?