LLM Beefer Upper

Automate Chain of Thought with multi-agent prompt templates

Simplify automating critique, reflection, and improvement, aka getting the model to 'think before it speaks', for far superior results from generative AI. Choose from pre-built multi-agent templates or create your own with the help of Claude Sonnet 3.5.

Lee Mager
I built this app for myself because I was getting bored of having to hunt down my prompt templates and copy/paste them to take advantage of the chain-of-thought / critique / reflection / improvement boost from LLMs. It automates the multi-agent process and makes it easy to add and refine prompt templates. Each agent is 'dedicated' to a task, e.g. accuracy verification, improvement suggestions, final polishing, and each one's output is displayed so you can see the AI 'showing its working'. I use it constantly for tasks where I want the best results and I personally can't get enough of it. It's expensive ($0.75 for the best-quality 4-agent run; UPDATE following feedback: I've cut pricing by 33%, so the best-quality run is now $0.50) because it absolutely demolishes tokens, but for me it's a no-brainer. A few people saw it and basically demanded I make it available publicly, so that's why it's here!
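Conceptually, the chain described above looks something like the sketch below: a minimal, hypothetical Python outline of a sequential critique/reflection/improvement pass. `call_llm` is a placeholder for whatever model API is used, and the agent prompts are illustrative assumptions, not the app's actual templates.

```python
# Minimal sketch of a sequential critique/reflection/improvement chain.
# `call_llm` is a hypothetical helper standing in for a real LLM API call;
# the agent prompts are illustrative, not the app's actual templates.

def call_llm(prompt: str) -> str:
    """Placeholder for a single LLM call (e.g. Claude 3.5 Sonnet via its API)."""
    raise NotImplementedError

AGENTS = [
    ("Drafter", "Complete the task below as well as you can:\n\n{task}"),
    ("Accuracy checker", "Verify the draft below for factual and logical errors. "
                         "List every problem you find.\n\nTask: {task}\n\nDraft:\n{prev}"),
    ("Improver", "Using the critique, suggest concrete improvements to the draft.\n\n"
                 "Task: {task}\n\nDraft and critique:\n{prev}"),
    ("Polisher", "Produce the final, polished answer, applying the improvements.\n\n"
                 "Task: {task}\n\nMaterial so far:\n{prev}"),
]

def run_chain(task: str) -> list[tuple[str, str]]:
    """Run each agent in turn, feeding it the previous agent's output,
    and keep every intermediate response so the 'working' stays visible."""
    transcript, prev = [], ""
    for name, template in AGENTS:
        response = call_llm(template.format(task=task, prev=prev))
        transcript.append((name, response))
        prev = response
    return transcript
```

The key design point is that each later agent re-reads the task plus everything produced so far, which is what drives both the quality gain and the token cost.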
@leemager Great job! This looks revolutionary.
Kyrylo Silin
Hey Lee, I'm curious about the pre-built templates. What kinds of tasks or industries are they designed for? Given the high token usage, have you considered any options for reducing costs, like caching frequent requests? Congrats on the launch!
Lee Mager
@kyrylosilin Hey Kyrylo and thanks! I haven't yet experimented with potential cost-cutting, but it's definitely something I need to think about. I'm firm on only using the best LLM, though, because otherwise the main value of the app - getting the best possible result without manually prompting for the additional critique/reflection/improvement stages - would deteriorate. I've experimented with GPT-4o mini instead of Claude Sonnet 3.5, and the results are impressive for GPT-4o mini, but not to the standard I'm used to. Here's an experiment I did recently on a really quite complex knowledge-work task that GPT-4o and Claude Sonnet 3.5 don't do well enough at (they miss key points), where adding the 4 agents to think it through carefully made a massive improvement: https://llmbeeferupper.com/artic.... This is the kind of higher-quality task the app is focused on, and I personally wouldn't want to dilute it with a cheaper model for now. That said, the feedback I'm getting in the post-task survey results right now is pretty clear - 100% have said that the final agent's response is better than the first, but <50% say the cost is worth it. So I will need to think about making this cheaper for sure.
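For a rough sense of why a chained run is pricey, here is a back-of-envelope calculation using Claude 3.5 Sonnet's list pricing ($3 per million input tokens, $15 per million output tokens). The per-agent token counts are assumptions for illustration only, not measured figures from the app.

```python
# Rough back-of-envelope for why a chained 4-agent run "demolishes tokens".
# Prices are Claude 3.5 Sonnet list prices; the per-agent token counts below
# are illustrative assumptions, not figures from the app itself.

INPUT_PER_M = 3.00    # USD per 1M input tokens
OUTPUT_PER_M = 15.00  # USD per 1M output tokens

# Each agent re-reads the task plus everything produced so far, so input grows.
agents = [
    {"in": 6_000,  "out": 5_000},   # drafter
    {"in": 12_000, "out": 5_000},   # accuracy checker
    {"in": 18_000, "out": 5_000},   # improver
    {"in": 24_000, "out": 5_000},   # polisher
]

cost = sum(a["in"] / 1e6 * INPUT_PER_M + a["out"] / 1e6 * OUTPUT_PER_M
           for a in agents)
print(f"~${cost:.2f} per run")   # ~$0.48 with these illustrative numbers
```

With these assumed token counts the total lands near the quoted $0.50 per run, and most of it is output-token cost, which is why swapping to a cheaper model is the obvious lever.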
Lee Mager
@kyrylosilin Regarding the prebuilt templates, I'm always expanding them based on requests I get. I work in higher education, so I have some like dissertation planning, exam question drafting, critical feedback on drafts, targeted study notes from an academic paper, marker guides for exam papers, curriculum and lesson plans, etc. But also tasks like drafting blogs, application cover letters, project planning, risk analysis, FAQ generation, etc. All the usual kinds of language tasks that LLMs are great at, and the multi-agent reasoning steps just make them a lot better :)
Lee Mager
I've also added an automation script template for things like Python, PowerShell, and VB. I won't be doing anything for software dev because the token length makes that impossible and it would only disappoint people. But for scripts of 200 lines or less, that one works well. The day Claude (or GPT-5) can handle millions of tokens and not suck, I will absolutely be adding some actual software dev templates!
This is a fantastic initiative, @leemager! The way you've simplified the automation of critique and reflection in generative AI could be a game-changer for anyone looking to enhance their output. Love the idea of having dedicated agents for specific tasks like accuracy verification and improvement suggestions; it really mimics a more interactive and thoughtful creative process. I can see how this would seriously save time for many Makers and help elevate the quality of content generated. Plus, the transparency of seeing the AI 'show its working' is such a clever touch—definitely aligns well with the principles of iterative improvement! While the cost may seem high at first glance, the ROI on quality results and enhanced efficiency might just make it worthwhile for a lot of users. Excited to see more feedback once it's live!
Ruby
While the costs may be high, the value it provides in terms of time savings and higher-quality output seems well worth it for those who rely heavily on prompt engineering. Congrats on the launch!
Ben
So far very impressed: the final result was definitely much better, and it's nice to see each agent's thought process before the final improvement. Will definitely play around with this, nice one
Ludovic Denis
Honestly, while this seems like it might save time, the cost is pretty steep for what it does. I'm not sure everyone will be willing to pay that much just for a more automated prompt process, plus the whole token thing sounds like it could get really expensive real fast, so it feels kinda risky to me
Lee Mager
@bankingbossdauphine Yep, the cost is a problem I'm painfully aware of - as I mention in one of the other comments here, while 100% of app users who provided feedback after a task so far have said the post-reflection results are better, less than half think it was worth the price. For me, when I think about the time saving, 58p is an absolute no-brainer for the more complex knowledge-work tasks I use it for constantly. This has become my no.1 most-used productivity app (it started a couple of months ago as just a local desktop app), and I'm saying this as someone obsessed with automation and better ways of working. It's probably doubled my use of LLMs (and I was already a heavy user for less complex tasks), because there are tasks where even Claude wasn't good enough to be worth the effort before, but with the additional steps and the audit trail of 'reasoning', it becomes worth it. I'll post some more blogs with real, complex task experiments, like the recent post I did on how CoT reduces hallucinations. I think once people see more direct examples of the value, the cost barrier will be less of a concern. That said, even if everybody still thinks it's too expensive, this is an app I'm personally always going to use; it's difficult for me to accept going back to the default 'good enough, most of the time' quality from GPT-4/Claude for certain tasks now.

Edit: actually, on that note, I think the experiment I need to do next is around getting an LLM to write in a specific style. This is something that LLMs are abysmal at out of the box, even if you give them a sample and plead with them - it's just the nature of token prediction that LLMs can't 'change course' properly within the same generation, and they need that extra step. For things like emails, reports, and technical or user documentation, I previously would never use an LLM because the style was so vomit-inducingly generic. Even with the multi-agent approach it's never perfect, but it's good enough that it's become worth adding to my LLM-supported tasks, because the editing needed isn't excessive and the time saving becomes worth it.
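As a rough illustration of that extra style step, here is a hedged variant of the earlier sketch: a compare-against-sample critique pass before the rewrite. The `call_llm` helper is the same placeholder as before, and the prompts are made up for illustration rather than taken from the app.

```python
# Illustrative style-matching variant of the chained approach: draft, critique
# the draft against a writing sample, then rewrite to match the sample.
# Prompts are assumptions, not the app's actual templates.

def call_llm(prompt: str) -> str:
    """Same placeholder single-call helper as in the earlier sketch."""
    raise NotImplementedError

def match_style(task: str, style_sample: str) -> str:
    """Draft, critique against a writing sample, then rewrite to match it."""
    draft = call_llm(task)
    critique = call_llm(
        "Compare the draft to the style sample and list every way the draft's "
        "tone, sentence length, and word choice differ from the sample.\n\n"
        f"Style sample:\n{style_sample}\n\nDraft:\n{draft}"
    )
    return call_llm(
        "Rewrite the draft so it matches the style sample, applying the critique.\n\n"
        f"Style sample:\n{style_sample}\n\nDraft:\n{draft}\n\nCritique:\n{critique}"
    )
```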
Konstantinos Choutos
Hi Lee and the team! Huge congratulations on launching your app! 🎉 Simplifying the automation of critique, reflection, and improvement for generative AI is a brilliant idea. The pre-built multi-agent templates and the ability to create custom ones with Claude Sonnet 3.5 are fantastic features. It's great to see how you've streamlined the process of refining prompt templates and enhancing the output quality. Thank you for making such a valuable tool available to the public. Looking forward to seeing the impact it has on the AI community! Best, Konstantinos Choutos