
Agenta - Open-source prompt management & evals for AI teams

by fmerian
Agenta is an open-source LLMOps platform for building reliable AI apps. Manage prompts, run evaluations, and debug traces. We help developers and domain experts collaborate to ship LLM applications faster and with confidence.


Replies

Mahmoud Mabrouk

Hi Product Hunt 👋

I'm Mahmoud, co-founder of Agenta. The team and I are excited to launch Agenta today.


What is Agenta?

Agenta is an open-source platform that helps AI teams ship reliable LLM applications.


The Problem

Building a demo is easy. Building a reliable app is hard.

  • Small prompt changes improve one case but break another

  • Subject matter experts and engineers can't collaborate easily (prompts end up scattered across code and spreadsheets)

  • Teams don't know if their prompts are working in production

How Agenta Solves This

  • Playground for the whole team. Everyone can experiment with prompts and models, not just engineers.

  • Deploy without code changes. Anyone can push a working prompt instantly.

  • Test before you ship. Create test cases and validate prompts against them (no more vibe-based prompting).

  • Monitor in production. Track mistakes, user feedback and costs after deployment.

Who's Using Agenta

Hundreds of teams use Agenta Cloud (generous free tier) or self-host it. They run more experiments, ship AI features faster, and collaborate in one place.

Try It Yourself

⭐ GitHub: https://github.com/agenta-ai/agenta

☁️ Cloud (free, no credit card): https://cloud.agenta.ai

📚 Docs: https://agenta.ai/docs

Looking forward to your feedback!

Savian Boroanca

@mabrouk, congrats on the launch! you and the team are building something important here. we are a fellow Antler company building a cloud platform to optimize the DevOps cycle. feel free to reach out on LinkedIn and let's chat. I think we can have a win-win here. godspeed!

Mahmoud Mabrouk

@savian_boroanca Thanks for your kind words! I will reach out!

Savian Boroanca

@mabrouk, looking forward to it! have a fantastic launch day :-)  

Siful

Nice to see a tool that lets both devs and non-tech team members collaborate. Best wishes to the team. One thing I'm curious about: how does Agenta handle versioning for prompts and evaluations?

Mahmoud Mabrouk

@getsiful Thanks for your comment! For prompt versioning, we use a Git-like system where you can create branches. Each branch has its own prompt history, so team members can work on their versions independently and then deploy to production. The cool thing is that when you deploy to an environment like production, you don't need to change any code. It all happens within Agenta, and the agent fetches the prompts directly from there.
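
To make the "no code changes" deploy concrete, here's a rough sketch of what fetching the deployed prompt at runtime could look like. The endpoint path, parameter names, and response shape below are assumptions for illustration only, not Agenta's documented API; the real SDK is covered in the docs.

```python
# Illustrative sketch only: endpoint path, parameters, and response shape are
# assumptions, not Agenta's documented API (see https://agenta.ai/docs).
import os
import requests

def fetch_prompt_config(app_slug: str, environment: str = "production") -> dict:
    """Fetch the prompt config currently deployed to an environment,
    so shipping a new prompt version requires no code change."""
    resp = requests.get(
        "https://cloud.agenta.ai/api/prompts/fetch",  # hypothetical endpoint
        params={"app_slug": app_slug, "environment_slug": environment},
        headers={"Authorization": f"Bearer {os.environ['AGENTA_API_KEY']}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"prompt": "...", "model": "gpt-4o", "temperature": 0.2}

config = fetch_prompt_config("support-bot")
```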

For evaluation, you create test sets and define evaluators (the metrics you want to measure). When you run evaluations, they connect directly to your prompts so you can see how changes affect performance.
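
As a rough illustration of the evaluator idea, a custom code evaluator can be as small as a function that scores a single test-case output; the function name and signature below are assumptions, not Agenta's exact evaluator interface.

```python
# Illustrative sketch: a simple custom evaluator for regression checks.
# The signature is an assumption, not Agenta's exact evaluator interface.
def contains_answer(app_output: str, correct_answer: str) -> float:
    """Return 1.0 if the expected answer appears in the model output, else 0.0."""
    return 1.0 if correct_answer.strip().lower() in app_output.strip().lower() else 0.0

# Run against each row of a test set to compare prompt versions.
rows = [
    {"output": "The capital of France is Paris.", "expected": "Paris"},
    {"output": "I'm not sure.", "expected": "Paris"},
]
scores = [contains_answer(r["output"], r["expected"]) for r in rows]
print(sum(scores) / len(scores))  # 0.5
```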

henricook

I recently evaluated Agenta vs Langfuse for Prompt Management and tracing. I went with Langfuse this time but all the best for this project. Open Source FTW.

Something that would really set you apart, that no one else seems to have, would be approval workflows for prompt management. Managing prompts in the UI is great, but in a remotely business-y environment I can't let one person push new prompts without checks and balances. We'll probably have to manage this with source control (e.g. GitHub) and write a script to push prompts up to Langfuse once they gain approval.

Mahmoud Mabrouk

@henricook Thanks for the feedback. We like Langfuse too; we know the team and we're both based in Berlin :)

One differentiator for us is the focus on collaboration between subject matter experts (non-technical) and developers. We're building a workflow that's easy to use from the UI and feature-equivalent to what you can do from code.

We've discussed approval workflows on the team. Right now we solve this through role-based access control. You can configure Agenta so part of the team works on prompts outside of production (we have a branching system for this, so they can work on their branches), and only certain members (like team leads) can deploy prompts to production.

Lyndsay H. Roberts

Tried a prompt tracing tool last year and TBH the hardest part wasn't the traces themselves but connecting them to test suites. Agenta's evals + test-case approach sounds promising because we need deterministic tests for regression checks. In our case we only caught a prompt drift after a month, so automated evals would be huge. Would love to know how easy it is to author evaluators for domain-specific metrics. IMO good CI hooks and a lightweight API make the difference between a demo and something you can rely on in production.

Fernando Scharnick

@mabrouk Love seeing more momentum in the LLMOps space, especially with an open-source approach. Most teams trying to ship AI features hit the same wall: lots of prompts, zero visibility, and no reliable way to evaluate or debug what’s actually happening under the hood.

A platform that unifies prompts, evals, and trace debugging feels like a real unlock for both devs and domain experts who don’t want to depend on guesswork.

Curious: what’s been the biggest challenge so far, capturing consistent traces, defining evaluation metrics, or helping teams collaborate around prompt changes?

Mahmoud Mabrouk

@fernando_scharnick That depends on the team. For dev-only teams, the starting point is usually observability. They want to debug their agent and start from there. For cross-functional teams, the biggest pain is usually collaboration on prompts.

Fernando Scharnick

@mabrouk Makes total sense, devs want to see inside the black box first, while cross-functional teams just need a shared place to iterate without stepping on each other.

Always interesting how the same AI stack creates totally different bottlenecks depending on who’s using it.

Karena Patch

TBF this looks promising. Curious how Agenta handles traces when you have async, high-latency LLM calls. We've seen trace sampling drop important edge cases in our infra, and that bit us in prod. Are evaluators configurable to run off real traffic vs. synthetic test sets? Also, where are logs stored when self-hosted? Does it require extra infra, or is it included?

Arda Erzin

Hi everyone! 👋 We built Agenta to give AI teams a way to collaborate on prompts. We offer a complete workflow for building reliable AI apps, from prompt engineering to evaluation and observability. We'd love to hear your thoughts, feedback, or ideas. Thanks for checking us out! 🙌

Juan Pablo Vega

Hi there!

Agenta is a workspace where AI teams collaborate effectively to build reliable AI applications.

Whether you’re building interactive chat apps, single-prompt workflows, or more agentic systems, Agenta keeps everything in one place instead of having prompts, experiments, and evaluations scattered across different tools.

We’d really appreciate your feedback or ideas, and thanks for taking the time to check it out at cloud.agenta.ai (free forever). You can also contact us at agenta-hq.slack.com for a demo!

Chilarai M

Oh wow, this is really amazing. Collaborating with the team on prompts and debugging with evaluations is a really cool idea. It seems like AI tools are really evolving :) Also, I see APIs, and that makes it even more exciting.
Would love to try that out.

Mahmoud Mabrouk

@chilarai Thanks for the kind words! Let me know your feedback!

Chilarai M

@mabrouk Absolutely! I'll share detailed feedback as I try things out.

Also, since I’m building Swytchcode (AI-powered API workflow + testing engine), I'd love to explore if there’s room for open collaboration. Agenta’s evaluation and debugging layer feels super complementary to what we’re doing on the API side.

Happy to sync on LinkedIn, if you’re open to it!

Mahmoud Mabrouk

@chilarai Definitely! We'll reach out!

Cruise Chen

You seldom see an open-source project for LLMOps like Agenta! Great launch and congrats, team!

Mahmoud Mabrouk

@cruise_chen Thanks Cruise!
