Tokenflood

Figure out who or what is stealing your LLM latency


Tokenflood allows you to 1) figure out how to slash LLM latency by adjusting prompt parameters and 2) assess the load curve of LLM providers before going to production with them.

Thomas Werkmeister
Maker

Hey folks,

I just released a new version of tokenflood featuring an all-new data viz dashboard and observation mode. Observation mode allows you to track an endpoint's latency over a longer period of time before sending your prod data there. Basically, you can find out at what time of day everybody starts stealing your LLM latency 😉 (a rough sketch of the probing idea follows the TLDR below).

TLDR:

  • figure out how to slash LLM latency by adjusting prompt parameters

  • assess the load curve of LLM providers before going to production with them
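
If you want a feel for what such a latency probe looks like in principle, here is a rough, generic sketch in Python. To be clear, this is not tokenflood's actual API, config, or output format; the model name, prompt, probe interval, and CSV columns are placeholders I'm making up for illustration, assuming an OpenAI-compatible endpoint.

```python
# Generic sketch of a periodic synthetic latency probe (NOT tokenflood's API).
# Assumptions: an OpenAI-compatible endpoint reachable via the openai package,
# a placeholder model name, and a made-up CSV layout for the log.
import csv
import time
from datetime import datetime, timezone

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def probe(model: str = "gpt-4o-mini", max_tokens: int = 64) -> dict:
    """Send one small streaming request and time it."""
    start = time.perf_counter()
    ttft = None  # time to first token
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Reply with one short sentence."}],
        max_tokens=max_tokens,
        stream=True,
    )
    for chunk in stream:
        if ttft is None and chunk.choices and chunk.choices[0].delta.content:
            ttft = time.perf_counter() - start
    total = time.perf_counter() - start
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ttft_s": round(ttft, 3) if ttft is not None else None,
        "total_s": round(total, 3),
    }


if __name__ == "__main__":
    # Fire a probe every 5 minutes and append the result to a CSV for plotting.
    with open("latency_log.csv", "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["timestamp", "ttft_s", "total_s"])
        if f.tell() == 0:
            writer.writeheader()
        while True:
            writer.writerow(probe())
            f.flush()
            time.sleep(300)
```

Plotting ttft_s and total_s from a day or a week of such probes is roughly the kind of signal observation mode is meant to surface, just with a proper dashboard instead of a CSV.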

Why I built tokenflood:

Over the course of the past year, part of my work has been helping my clients meet their latency, throughput, and cost targets for LLMs (PTUs, anyone? 🔥💸🔥💸🔥). That process involved making numerous choices about cloud providers, hardware, inference software, models, configurations, and prompt changes. During that time I found myself running similar tests over and over with a collection of ad hoc scripts. I finally had some time on my hands and wanted to put it all together properly in one tool.

Hope this is useful for some people!
Thomas

Agbaje Olajide

This is a much-needed tool for anyone serious about LLM performance. The "observation mode" to track provider load curves before committing prod traffic is brilliant for cost/latency planning.

A key question on the data: How do you collect the latency and throughput metrics for the provider load curves? Is it from your own synthetic probes, aggregated user data, or a combination?

Thomas Werkmeister

Hey @olajiggy321, thank you for your assessment! Good question. It is all from the synthetic probes sent by the user of the tool. Technically, there would be a lot of potential to share and aggregate some of this data among users.
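
To make that a bit more concrete: once you have a log of per-probe latencies, turning it into a load curve is mostly a matter of grouping by time of day. A rough sketch, assuming a CSV with timestamp and total_s columns like the hypothetical probe script above (again, not tokenflood's real output format):

```python
# Sketch: aggregate a probe log into an hour-of-day load curve.
# Column names match the hypothetical probe script above, not tokenflood's output.
import pandas as pd

df = pd.read_csv("latency_log.csv", parse_dates=["timestamp"])
df["hour"] = df["timestamp"].dt.hour  # UTC hour of day

# Median and p95 total latency per hour; the peaks show when the endpoint
# is most contended.
curve = df.groupby("hour")["total_s"].agg(
    median="median",
    p95=lambda s: s.quantile(0.95),
)
print(curve)
```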

Agbaje Olajide

@twerkmeister
Thanks for the clarification. Keeping it to user-specific synthetic data is the right call for accuracy and privacy, and the potential for aggregated insights is interesting.

I have a small, practical idea related to that potential data-sharing model and user incentives that you could consider.

If you're open to a suggestion, what's the best way to share it? (Email, DM, etc.)