Ben Lang

Empromptu AI - Train Fine Tuned Models With AI Apps You're Already Building

Most AI apps launch on someone else’s model and stay there forever. Empromptu AI turns live AI features into custom models you own. As your app runs, Empromptu AI captures real-world usage, human corrections, and edge cases from live AI workflows, then uses that signal to train a custom model you own. Improve accuracy, lower inference costs, and stop depending forever on rented intelligence from the same providers moving into your category.

Lakshminath Reddy Dondeti
Does Empromptu work with all frontier models and open source/weights models? What happens to the trained/tuned model in the long term if the frontier model significantly advances? Say from Opus 4.x to 5.y …
Shanea Leven

@lakshminath_dondeti You can only fine-tune OSS models. The frontier model providers are deprecating the ability to fine-tune their models. 😬 That's one of the reasons we're making this available. But Empromptu is totally model agnostic.

Sean Robinson

@lakshminath_dondeti To add some architecture context, the fine tuned model you build on Alchemy is your asset regardless of what happens upstream with frontier models. When a new frontier model drops, you're not starting over. Your domain knowledge, your corrections, your edge cases are portable. We can use a newer base and retrain on your existing data. The expertise your team captured doesn't deprecate with the model version.

Andrew West

@lakshminath_dondeti The dataset you fine-tune on is the real value, and it builds up as your app gets used. Alchemy lets you turn that into a fine-tuned model, and when a new model is released you can quickly fine-tune it on your existing data.

Jordan Hanson

@lakshminath_dondeti  We want to remove complexity and enhance outcomes from building with AI, sort of universally. What we would suggest is that because what you have is a dataset created by the 'shape' of the AI function and its trained outcomes, we can take that dataset and re-train it using the new model at an accelerated rate because the judgment exhibited in training is what creates the signal that matters, not the foundation model provider.

Optimizing our platform in this way helps us to refocus the conversation from "which model" to "what's the best/most optimal outcome" of the AI function, and to engineer the solution/function set accordingly!

Shanea Leven

I’m Shanea, co-founder and CEO of @Empromptu AI


We built Empromptu AI's Alchemy because we believe the next phase of AI is not just building apps faster.
It is building AI that learns how your business works.

Right now, everyone is rushing to learn or add AI whether you are someone trying to figure out how to survive as an employee or take your expertise and monetize it. Everyone is plugging into the same frontier models, shipping the same generic workflows, and calling it a moat. But if everyone is using the same intelligence, no one is differentiated for long.

We're changing that.


With Empromptu AI's Alchemy you can fine-tune a model with no ML expertise, simply by building an AI application. Our platform automatically captures customer usage, your corrections as a subject matter expert, edge cases, and application feedback. Alchemy turns those signals into a fine-tuned model that can keep itself up to date. Yes, a self-learning, self-improving AI that doesn't cost trillions.

The simple version:
You've spent the last 5-10 years at your job learning really valuable skills, whether it's engineering, content, or, more insane still, a highly regulated or specialized field.


Now your AI can learn from you so you can own your expertise.

This matters because the best knowledge usually lives inside people’s heads. The accountant knows the exception. The support lead knows when an escalation is real. The operator knows the edge case. The product team knows what “good” actually looks like.

Alchemy gives everyone a way to turn that expertise into AI that self-improves, with up to 98% accuracy.

Thank you for checking us out today. I’d love your feedback, questions, and brutal honesty.

Zolani Matebese

@shanealeven Congrats on the launch, team. How do you stop bad user feedback, noisy corrections, or outdated domain assumptions from being absorbed into the tuning loop?

Shanea Leven

@zolani_matebese First, the user can set the eval and we automatically optimize towards that goal. As a backup, you can actually go in and correct the training set manually with natural language.

Gaurav Aroraa
💡 Bright idea

Instrumenting production app usage as a fine-tuning data source is genuinely clever. You avoid the cold start problem of manually curating datasets that don't reflect real user behavior. We hit that exact wall building our AI features and ended up with synthetic data that didn't generalize well. What does your quality filtering pipeline look like between raw app interactions and the training checkpoint?

Shanea Leven

@retain_dev Actually that's kinda the best part, if I do say so myself. Dr. Sean Robinson, my cofounder, has a PhD in computational astrophysics, and he invented a way to get up to 98% accurate outputs out of any model! It's built in to happen completely automatically based on the eval that you write, and that last mile is what you label. Thanks for the compliment. We know the problem is that unless you're a founder, the SMEs and AI/ML engineers are usually separate, so you never really get that perfect dataset.

Sean Robinson

@retain_dev To add the engineering layer to what Shanea described, the quality filtering is what makes the self-improving loop actually work in practice rather than in theory.

The eval you write upfront becomes the ground truth signal. Every production interaction gets scored against it automatically. What surfaces for labeling isn't a random sample of outputs, it's specifically the cases that fell outside your accuracy threshold. You're not reviewing everything, you're reviewing the exact delta between what the model did and what your domain requires.

That's why the labeled dataset stays small and high signal over time. The model improves, the eval catches the new edge cases, and you're always training on the real distribution of your own users rather than synthetic approximations.

The part that resonates with what you described about SMEs and ML eng being separate is exactly the failure mode we designed around. The eval layer is built to be owned by the domain expert, not the ML team. The person who knows what correct looks like is the one defining the signal, not translating it through an eng team.
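As a rough illustration of the loop described above (score every interaction against the eval, surface only what falls below the threshold), here is a minimal sketch. It is not Empromptu's implementation: `Interaction`, `triage`, the `mentions_ticket` eval, and the 0.5 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Interaction:
    """One logged production interaction (hypothetical shape)."""
    prompt: str
    output: str


def triage(
    interactions: List[Interaction],
    eval_fn: Callable[[Interaction], float],
    threshold: float,
) -> Tuple[List[Interaction], List[Interaction]]:
    """Split interactions into those that pass the eval and those
    that should be surfaced to a domain expert for labeling."""
    passed: List[Interaction] = []
    needs_label: List[Interaction] = []
    for item in interactions:
        if eval_fn(item) >= threshold:
            passed.append(item)
        else:
            needs_label.append(item)
    return passed, needs_label


def mentions_ticket(item: Interaction) -> float:
    """Hypothetical eval: a support reply must reference a ticket number."""
    return 1.0 if "ticket #" in item.output.lower() else 0.0


logs = [
    Interaction("Where is my refund?", "See ticket #1042 for status."),
    Interaction("Reset my password", "Please try again later."),
]
passed, needs_label = triage(logs, mentions_ticket, threshold=0.5)
print(len(passed), "passed;", len(needs_label), "surfaced for labeling")
```

The point of the split is that the expert only ever sees `needs_label`, which is why the labeled set can stay small.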

Shanea Leven

@retain_dev  @sean_robinson1 Yes. Second what Sean said. We think that this is truly overlooked in the tools out there today.

Jordan Hanson

@retain_dev Appreciate that you're sharing this insight!

We train on raw app interactions, actually. We find that the real-world edge case training often provides the most unique and distinguishing insights that help us to effectively compress training timelines. We often will say "your model is unique because of these edge cases," whereas most people generally find them intrusive or something they should try to mitigate.

The reality is quite the opposite.

Tanjum 🔥 🚀🚀
Love seeing tools that bridge the gap between AI experimentation and real-world deployment. This feels built for teams that are serious about shipping AI products.
Shanea Leven

@tanjum thank you so much yes. We always try to make everything we do accessible.

Sean Robinson

@tanjum Two years of production deployments across healthcare, retail, and financial workflows is what shaped the architecture. The edge cases you only hit in production are exactly what we built around.

Josh Leven

@tanjum I’ve seen this in my work too. It’s been hard with the tools to help non technical folks get excited about labeling. But it really is required to get that high level of accuracy

Shanea Leven

@tanjum  @joshua_leven thanks so much for the support it's been really incredible bringing this to life

Jordan Hanson

@tanjum Exactly! A really common reaction for us is: I feel like this is the perfect conclusion to my Claude Code / Codex projects, a real deployment environment!

Priya K

The 'bring your own expertise' angle is the right way to think about the next wave of AI. We struggle constantly with customer support AI missing the nuance of our specific software updates. If this plugs directly into customer usage signals to self-improve, it solves a massive operational headache. Amazing job @shanea_leven

Boyuan Deng

Congrats on the launch 🎉
Curious, when the dynamic prompt optimization kicks in after 30 runs, does the user get visibility into what changed, or does it just happen silently in the background?

Shanea Leven

@boyuan_deng1 The user absolutely gets visibility into what runs. You can also do this manually as well for all the tinkerers out there.

Sean Robinson

@boyuan_deng1 And the visibility is intentional from an architecture standpoint. A model you can't inspect is one you can't trust in production. You should always know what signal drove a change before it goes into training.

Jordan Hanson

@boyuan_deng1 Our users always know what's happening and why -- we have evaluations and audits built in from the beginning, not added in as an afterthought.

MD Amirul Islam
Impressive vision! Turning real-world AI usage, human feedback, and edge cases into custom models that continuously improve is a compelling approach. The focus on ownership, accuracy, and reducing long-term dependency on external models really stands out. Excited to see how Empromptu AI helps teams build AI products that get smarter over time. Congratulations on the launch! 🚀
Shanea Leven

@1mirul Thanks so much. If you own your asset, you should be able to decide what you do with it, whether you compete or decide to sell that asset. You and everyone else should be able to capitalize on the data you own. Your data is getting scraped and captured anyway. You should at least be compensated.

Sean Robinson

@1mirul The compounding part is what makes it structurally different. Most AI deployments get smarter for the vendor. This one gets smarter for you. That asymmetry is the whole point.

Jordan Hanson

@1mirul Thanks! We simply believe there's a better way to build great, value-additive functions that are governed entirely by AI, and that the discussion about 'AI costs' is actually a discussion about implementation discipline and tightly controlling deployments around known workflows instead of chaotic experimentation everywhere.

Büşra Şeker

I like the part abt capturing corrections and edge cases from real usage. That feels more useful than trying to guess everything upfront. One thing I wonder, how do you keep the model from learning the wrong patterns when user feedback is inconsistent or when diff experts correct the same situation in diff ways?

Shanea Leven

@busra_seker1 that's a great question. There is one ground truth so SMEs can override customers but SMEs have to agree what is ground truth

Sean Robinson

@busra_seker1 Exactly right on the ground truth architecture. On the inconsistency problem specifically, that's where the eval becomes the arbitration layer. Conflicting corrections don't both make it through, the eval scores against a defined expected outcome so noise and contradictions get filtered before they touch training. The model learns from signal that passed a quality bar, not raw feedback volume.
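One way to picture the arbitration step described here, as a sketch only: corrections are kept for training when they agree with the single agreed ground truth for that case, and everything else is dropped. The `Correction` type, the `expected` mapping, and the case IDs are hypothetical, and the real system may arbitrate quite differently.

```python
from typing import Dict, List, NamedTuple


class Correction(NamedTuple):
    """A proposed label for one case (hypothetical shape)."""
    case_id: str
    author: str
    label: str


def filter_corrections(
    corrections: List[Correction],
    expected: Dict[str, str],
) -> List[Correction]:
    """Keep at most one correction per case, and only if it matches
    the agreed ground truth; cases without ground truth are dropped."""
    kept: List[Correction] = []
    seen = set()
    for c in corrections:
        if c.case_id in seen:
            continue  # already have a ground-truth-consistent example
        if expected.get(c.case_id) == c.label:
            kept.append(c)
            seen.add(c.case_id)
    return kept


# Ground truth the SMEs have agreed on (made-up example data).
expected = {"inv-7": "escalate"}

corrections = [
    Correction("inv-7", "expert_a", "escalate"),
    Correction("inv-7", "expert_b", "ignore"),    # contradicts ground truth
    Correction("inv-9", "customer", "escalate"),  # no ground truth yet
]
kept = filter_corrections(corrections, expected)
print([c.author for c in kept])
```

Note that the conflicting pair on `inv-7` does not both reach training, and `inv-9` waits until the experts agree on what correct looks like.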

Shanea Leven

@busra_seker1  @sean_robinson1 Evals are the most important thing, and yet some tools make it really hard for people learning this technology to access evals and take advantage of their true power.

Jordan Hanson

@busra_seker1  I think that what you're talking about is like the '7 out of 10 dentists' sort of conjecturing, and I think it's important to note that if policy is to allow, say, a healthcare provider some flexibility in making recommendations based on particular signals, it would seem only appropriate that the governance works in a similar way, within a tightly banded range of outcome likelihoods.

We're focused on optimizing toward outcomes, and find that helping people focus on results instead of the process itself helps to create alignment in the directionality of training outcomes. I hope that makes sense! If you'd like to discuss more, we're available for meetings to talk about your specific use case or scenario.

Germán Merlo

This is awesome Shanea! Wish you all the best on this impressive launch

Shanea Leven

@german_merlo1 Thank you so much. Excited to get this out to the world.

Sean Robinson

@german_merlo1 Thank you Germán!

Andrew West

@german_merlo1 Thank you!

Jordan Hanson

@german_merlo1 Thanks for the support!

Moh

@shanea_leven Congratulations on the launch.

One thing I’m trying to understand from your positioning. If the underlying model providers keep improving rapidly every few months, how do you measure whether the gains your customers see are actually coming from Empromptu’s learning layer versus improvements in the foundation model itself?

It seems like that’s a pretty important distinction because both could lead to better outputs over time, but only one creates a real competitive advantage for the customer. Are you able to quantify that difference in a meaningful way?

Shanea Leven

@shanea_leven  @moh_codokiai Yes, absolutely. You will actually see the performance and accuracy improvements directly in the product. And frontier model providers have deprecated the ability to fine-tune their models. Also, a model trained on your data for your product and your users is eventually always going to be more accurate than a general model.

Moh

@shanea_leven  @shanealeven That makes sense. I agree a model trained on a company’s own users should eventually outperform a general-purpose model for that specific use case.

What I’m curious about is how you prove that improvement to customers. Do you have any benchmarking or evaluation framework that shows accuracy before and after the learning process, or is the validation mainly based on production outcomes and user feedback?
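A generic way to frame the before/after question, sketched under assumptions: score the old base model, the new base model, and the tuned model on the same held-out eval set, then compare the differences. The predictions below are made-up placeholders, not Empromptu results, and `accuracy` is a plain exact-match metric chosen for simplicity.

```python
from typing import List, Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Exact-match accuracy over a fixed, held-out eval set."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Made-up placeholder data: "x" marks a wrong answer.
gold: List[str] = list("abacbacbab")
old_base: List[str] = list("abacbxxxxx")    # 5 of 10 correct
new_base: List[str] = list("abacbaxxxx")    # 6 of 10 correct
tuned_on_new: List[str] = list("abacbacbax")  # 9 of 10 correct

# Gain that any customer of the provider gets for free.
base_gain = accuracy(new_base, gold) - accuracy(old_base, gold)
# Gain attributable to tuning on your own data, holding the base fixed.
tuning_gain = accuracy(tuned_on_new, gold) - accuracy(new_base, gold)

print(f"base model gain: {base_gain:.2f}")
print(f"tuning gain:     {tuning_gain:.2f}")
```

Holding the eval set and the base model fixed is what makes the second number meaningful; comparing a tuned model on a new base against an untuned old base would mix the two effects.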

Shanea Leven

@shanea_leven  @moh_codokiai yes! It's literally built into the product and you can see in our optimizer the performance. It's actually one of the things that my co-founder invented. Our optimizer is what our entire platform is based on. We have benchmarks that models trained on our platform are 30% more accurate than frontier models

Jordan Hanson

@shanea_leven  @moh_codokiai First, we're model agnostic -- the user can specify whatever they'd like to use, and we'll adapt the 'baseline' accordingly. For our users, the difference often comes from the other efficiencies, such as the training, vertical integrations, and other optimization components. Combined, these make an enormous difference in the quality of life they experience while building (setting up actual databases, auth sequencing, et cetera, with real-world best practices), and in doing all of the AI-focused functionality from a template-based system that lets me say something like: "I want to build a growth function, and that's going to be me following up with leads we haven't talked to in more than 3 weeks, and I want that sort of outreach to look like this."

I haven't met a business leader yet who wants to replace someone on their team with a system that does that at 98% accuracy. And that's the difference they walk away remembering.
