How are you tracking token usage and costs across AI workflows?

As AI applications become more complex, are people actually tracking token usage and costs at a workflow level?

It's easy enough to see usage for individual model calls, but once a feature spans multiple prompts, models, tools, retries, and background jobs, I've found it much harder to answer questions like:

Which workflow is driving costs?
Where is latency being introduced?
Which step failed?
How much does a single user action actually cost?
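
To make the questions concrete, here is a minimal sketch of the kind of roll-up I mean, assuming every model call is logged with a shared workflow ID. The names (`Step`, `summarize`, `PRICES`), the model name, and the prices are all made up for illustration and do not come from any real provider or library.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical USD prices per 1,000 tokens; real prices vary by provider and model.
PRICES = {"model-a": {"input": 0.001, "output": 0.002}}


@dataclass
class Step:
    """One logged model call, tagged with the workflow it belongs to."""
    workflow_id: str
    name: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_s: float
    ok: bool = True

    @property
    def cost(self) -> float:
        price = PRICES[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1000


def summarize(steps):
    """Roll individual calls up into per-workflow cost, tokens, latency, and failures."""
    summary = defaultdict(
        lambda: {"cost": 0.0, "tokens": 0, "latency_s": 0.0, "failed": []}
    )
    for step in steps:
        workflow = summary[step.workflow_id]
        workflow["cost"] += step.cost
        workflow["tokens"] += step.input_tokens + step.output_tokens
        # Summed latency assumes steps run sequentially; parallel steps would need spans.
        workflow["latency_s"] += step.latency_s
        if not step.ok:
            workflow["failed"].append(step.name)
    return dict(summary)


if __name__ == "__main__":
    steps = [
        Step("wf-1", "retrieve", "model-a", 1000, 0, 0.5),
        Step("wf-1", "answer", "model-a", 2000, 500, 1.5, ok=False),
    ]
    print(summarize(steps))
```

The aggregation itself is simple. In my experience the hard part is getting the same workflow ID onto every call once retries, tools, and background jobs are involved.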

Curious what others are using today.

Custom logging? OpenTelemetry? LangSmith? Something else entirely?

I'm building PromptLayer around this problem, but I'm genuinely interested in how teams are solving it in production.
