Tokenflood allows you to 1) figure out how to slash LLM latency by adjusting prompt parameters, and 2) assess the load curve of LLM providers before going to production with them.
Maker
Hey folks,
I just released a new version of tokenflood featuring an all-new data viz dashboard and observation mode. Observation mode lets you track an endpoint's latency over a longer period of time before sending your prod data there. Basically, you can find out at what time of day everybody starts stealing your LLM latency.
TLDR:
figure out how to slash LLM latency by adjusting prompt parameters
assess the load curve of LLM providers before going to production with them
Why I built tokenflood:
Over the course of the past year, part of my work has been helping my clients meet their latency, throughput, and cost targets for LLMs (PTUs, anyone?). That process involved making numerous choices about cloud providers, hardware, inference software, models, configurations, and prompt changes. During that time I found myself running similar tests over and over with a collection of ad-hoc scripts. I finally had some time on my hands and wanted to put it all together properly in one tool.
Hope this is useful for some people!
Thomas
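To make the observation-mode idea concrete, here is a minimal sketch of what a synthetic latency probe loop could look like. This is not tokenflood's actual API or CLI; the endpoint URL, payload, probe interval, and log file name are all placeholder assumptions for illustration.

```python
# Hypothetical illustration of an "observation mode" style probe loop.
# Not tokenflood's actual API: ENDPOINT, PAYLOAD, PROBE_INTERVAL_S and
# the CSV log file are placeholder assumptions.
import csv
import time
from datetime import datetime, timezone

import requests

ENDPOINT = "https://llm-provider.example.com/v1/chat/completions"  # placeholder
PAYLOAD = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 16,
}
PROBE_INTERVAL_S = 300  # one synthetic probe every five minutes


def probe_once() -> float:
    """Send one synthetic request and return wall-clock latency in seconds."""
    start = time.perf_counter()
    requests.post(ENDPOINT, json=PAYLOAD, timeout=60)
    return time.perf_counter() - start


if __name__ == "__main__":
    with open("latency_log.csv", "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            timestamp = datetime.now(timezone.utc).isoformat()
            latency = probe_once()
            writer.writerow([timestamp, f"{latency:.3f}"])
            f.flush()
            time.sleep(PROBE_INTERVAL_S)
```

Left running over a day or two, a log like this makes it obvious when a provider's latency climbs, which is exactly the "what time of day does everybody start stealing your latency" question.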
This is a much-needed tool for anyone serious about LLM performance. The "observation mode" to track provider load curves before committing prod traffic is brilliant for cost/latency planning.
A key question on the data: How do you collect the latency and throughput metrics for the provider load curves? Is it from your own synthetic probes, aggregated user data, or a combination?
Maker
Hey @olajiggy321, thank you for your assessment! Good question. It is all from the synthetic probes sent by the user of the tool. Technically there would be a lot of potential to share / aggregate some of this data among users.
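To illustrate what a "load curve from the user's own synthetic probes" can look like downstream, here is a rough sketch that groups the probe log from the earlier example by hour of day and reports p50/p95 latency. Again, this is not tokenflood's implementation, just an assumed two-column CSV layout and a plain-Python aggregation.

```python
# Illustrative only: build an hourly latency "load curve" from probe samples.
# Assumes the two-column CSV (ISO timestamp, latency in seconds) from the
# hypothetical probe sketch above; not tokenflood's actual data format.
import csv
from collections import defaultdict
from datetime import datetime
from statistics import median, quantiles


def load_curve(path: str) -> dict[int, tuple[float, float]]:
    """Return {hour_of_day: (p50_seconds, p95_seconds)}."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    with open(path, newline="") as f:
        for timestamp, latency in csv.reader(f):
            hour = datetime.fromisoformat(timestamp).hour
            by_hour[hour].append(float(latency))
    curve = {}
    for hour, samples in sorted(by_hour.items()):
        p95 = quantiles(samples, n=20)[-1] if len(samples) >= 2 else samples[0]
        curve[hour] = (median(samples), p95)
    return curve


if __name__ == "__main__":
    for hour, (p50, p95) in load_curve("latency_log.csv").items():
        print(f"{hour:02d}:00 UTC  p50={p50:.2f}s  p95={p95:.2f}s")
```

A per-hour p50/p95 view like this is one simple way to spot the daily load pattern of a provider before deciding to send production traffic there.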
@twerkmeister Thanks for the clarification. Keeping it to user-specific synthetic data is the right call for accuracy and privacy, and the potential for aggregated insights is interesting.
I have a small, practical idea related to that potential data-sharing model and user incentives that you could consider.
If you're open to a suggestion, what's the best way to share it? (Email, DM, etc.)