Launching today

Tokenwise

Launching today

A smart LLM proxy that shows where you're overpaying

86 followers

A smart LLM proxy that shows where you're overpaying

86 followers

Visit website

AI Infrastructure Tools

•

LLM Developer Tools

Tokenwise is a one-line LLM proxy (OpenAI-compatible baseURL) for makers and small teams. It learns from your real requests, shows exactly where you're overpaying, proven with quality checks on your own traffic, not public benchmark, and lets you apply the fix in one click while it verifies the savings in real dollars.

Free Options

Launch tags:Analytics•Developer Tools•Artificial Intelligence

Launch Team / Built With

FramerLaunch websites with enterprise needs at startup speeds.

Promoted

Tokenwise

Maker

📌

Hey everyone, Theo here. I build a few small SaaS on the side of a full-time data engineering job, and at some point every one of them started leaning on LLMs. My API bills crept up every month and honestly I could never tell you why. Which feature, which prompt I'd changed last week, which model I picked without really thinking about it. I'd just top up credits and move on. The part that really got to me was the spend I couldn't even see. Claude Code running all day while I work, plus Cursor and Codex. None of that shows up anywhere until the invoice lands, and it turned out to be the money I understood the least. I tried the tools that already existed. One felt like it was in maintenance mode, one needed a whole observability setup just to get started, and one only worked if your stack was built around a specific framework. None of them were made for someone like me who just wanted to know where the money went and what to do about it. So I built Tokenwise. You add one line of code, or point your coding agents at it with no production changes, and you see every call: cost, latency, tokens, and what's being wasted. Then it tells you what to cut. A cheaper model here, a cache there, a bloated prompt to trim. Every fix gets checked against your own quality bar first, so you're never trading cost for worse output. The idea shifted a lot while I was building it. I started out thinking it was a dashboard. Then I realised nobody wants another dashboard, they want the answer: here's the $842 a month you're burning, and here's the one click to fix it. The real value was proving the savings on your own traffic, live. It's early and I'd genuinely love your honest feedback. Tell me what's missing, what's confusing, what you'd never use. That's more useful to me right now than anything. Thanks for taking a look.

Report

1d ago

@tofil congrats on the launch Theo, this is very useful (I can never match the advertised input/output costs to my work either). What's the overhead fo r this and how deep does it go reporting wise?

Report

6h ago

Tokenwise

Maker

Thanks @zolani_matebese , really appreciate it

On overhead: the proxy runs on Cloudflare Workers at the edge, so we add ~30-50ms p50 in most regions (the actual provider call dominates latency anyway).

On reporting depth, here's what you get per request:

Exact cost (we re-tokenize and apply current pricing tables, so the "I can never match the bill" problem you mentioned goes away)
Input/output token counts, latency (TTFT + total), status, error type if any
Full prompt/response payload if you opt-in per project (off by default for privacy)
Model + provider + project + custom tags you set

And on top of that, aggregations by prompt template (we cluster semantically), recommendations with quality proof on your own data, and a "saved this month" counter that tracks the impact of applied recos in real $.

Report

4h ago

Observe-only is probably where I’d start, especially for Claude Code spend. The scary part is the “apply” step.

Before swapping a model, does Tokenwise show exactly which traffic it will touch, and is there an easy rollback?

Report

6h ago

Tokenwise

Maker

@novamaker01

Here's how it works:

Before apply, you see exactly:

The prompt template(s) affected (with a sample of recent requests)
The estimated traffic % (e.g. "this rule will route ~12% of your project's requests")
Optional scoping: limit to a tag, a project, exclude certain endpoints

The "apply" doesn't blindly cutover. By default it runs as an A/B split, say 10% of matching traffic on the new model and you watch the quality scores + latency + cost for 24h before deciding to ramp to 100%. You can also choose immediate cutover if you prefer.

Report

4h ago

mailX by mailwarm

Does Tokenwise break down coding agent spend by session or only by model?

Report

7h ago

Tokenwise

Maker

Hey @bengeekly not by session yet, but the workaround works well today.

You can pass a tag (or session ID) on each request via the X-Tokenwise-Tags header, and Tokenwise clusters all requests sharing that tag, so for a coding agent you'd set X-Tokenwise-Tags: session-{conversationId} and see the full cost breakdown per session: total spend, which model, which prompt template (we cluster those semantically too), outliers, etc.

First-class sessions view (auto-grouped, no header needed) is on the short roadmap.

Report

4h ago