Tokenflood allows you to 1) figure out how to slash LLM latency by adjusting prompt parameters, and 2) assess the load curve of LLM providers before going to production with them.
Maker
Hey folks,
I just released a new version of tokenflood featuring an all-new data viz dashboard and observation mode. Observation mode lets you track an endpoint's latency over a longer period of time before sending your prod data there. Basically, you can find out at what time of day everybody starts stealing your LLM latency.
TLDR:
figure out how to slash LLM latency by adjusting prompt parameters
assess the load curve of LLM providers before going to production with them
Why I built tokenflood:
Over the course of the past year, part of my work has been helping my clients meet their latency, throughput, and cost targets for LLMs (PTUs, anyone?). That process involved making numerous choices about cloud providers, hardware, inference software, models, configurations, and prompt changes. During that time I found myself running similar tests over and over with a collection of ad-hoc scripts. I finally had some time on my hands and wanted to put it all together properly in one tool.
Hope this is useful for some people!
Thomas
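To make the observation-mode idea concrete, here is a minimal sketch of what a synthetic latency probe loop could look like. This is not tokenflood's actual API or CLI; the endpoint URL, payload, probe interval, and log file name are all placeholder assumptions for illustration.

```python
# Hypothetical illustration of an "observation mode" style probe loop.
# Not tokenflood's actual API: ENDPOINT, PAYLOAD, PROBE_INTERVAL_S and
# the CSV log file are placeholder assumptions.
import csv
import time
from datetime import datetime, timezone

import requests

ENDPOINT = "https://llm-provider.example.com/v1/chat/completions"  # placeholder
PAYLOAD = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 16,
}
PROBE_INTERVAL_S = 300  # one synthetic probe every five minutes


def probe_once() -> float:
    """Send one synthetic request and return wall-clock latency in seconds."""
    start = time.perf_counter()
    requests.post(ENDPOINT, json=PAYLOAD, timeout=60)
    return time.perf_counter() - start


if __name__ == "__main__":
    with open("latency_log.csv", "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            timestamp = datetime.now(timezone.utc).isoformat()
            latency = probe_once()
            writer.writerow([timestamp, f"{latency:.3f}"])
            f.flush()
            time.sleep(PROBE_INTERVAL_S)
```

Left running over a day or two, a log like this makes it obvious when a provider's latency climbs, which is exactly the "what time of day does everybody start stealing your latency" question.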
This is a much-needed tool for anyone serious about LLM performance. The "observation mode" to track provider load curves before committing prod traffic is brilliant for cost/latency planning.
A key question on the data: How do you collect the latency and throughput metrics for the provider load curves? Is it from your own synthetic probes, aggregated user data, or a combination?
Maker
Hey @olajiggy321, thank you for your assessment! Good question. It is all from the synthetic probes sent by the user of the tool. Technically there would be a lot of potential to share / aggregate some of this data among users.
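To illustrate what a "load curve from the user's own synthetic probes" can look like downstream, here is a rough sketch that groups the probe log from the earlier example by hour of day and reports p50/p95 latency. Again, this is not tokenflood's implementation, just an assumed two-column CSV layout and a plain-Python aggregation.

```python
# Illustrative only: build an hourly latency "load curve" from probe samples.
# Assumes the two-column CSV (ISO timestamp, latency in seconds) from the
# hypothetical probe sketch above; not tokenflood's actual data format.
import csv
from collections import defaultdict
from datetime import datetime
from statistics import median, quantiles


def load_curve(path: str) -> dict[int, tuple[float, float]]:
    """Return {hour_of_day: (p50_seconds, p95_seconds)}."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    with open(path, newline="") as f:
        for timestamp, latency in csv.reader(f):
            hour = datetime.fromisoformat(timestamp).hour
            by_hour[hour].append(float(latency))
    curve = {}
    for hour, samples in sorted(by_hour.items()):
        p95 = quantiles(samples, n=20)[-1] if len(samples) >= 2 else samples[0]
        curve[hour] = (median(samples), p95)
    return curve


if __name__ == "__main__":
    for hour, (p50, p95) in load_curve("latency_log.csv").items():
        print(f"{hour:02d}:00 UTC  p50={p50:.2f}s  p95={p95:.2f}s")
```

A per-hour p50/p95 view like this is one simple way to spot the daily load pattern of a provider before deciding to send production traffic there.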
@twerkmeister Thanks for the clarification. Keeping it to user-specific synthetic data is the right call for accuracy and privacy, and the potential for aggregated insights is interesting.
I have a small, practical idea related to that potential data-sharing model and user incentives that you could consider.
If you're open to a suggestion, what's the best way to share it? (Email, DM, etc.)