
Agenta - Open-source prompt management & evals for AI teams

by fmerian
Agenta is an open-source LLMOps platform for building reliable AI apps. Manage prompts, run evaluations, and debug traces. We help developers and domain experts collaborate to ship LLM applications faster and with confidence.


Replies

Mahmoud Mabrouk

Hi Product Hunt 👋

I'm Mahmoud, co-founder of Agenta. The team and I are excited to launch Agenta today.


What is Agenta?

Agenta is an open-source platform that helps AI teams ship reliable LLM applications.


The Problem

Building a demo is easy. Building a reliable app is hard.

  • Small prompt changes improve one case but break another

  • Subject matter experts and engineers can't collaborate easily (prompts end up scattered across code and spreadsheets)

  • Teams don't know if their prompts are working in production

How Agenta Solves This

  • Playground for the whole team. Everyone can experiment with prompts and models, not just engineers.

  • Deploy without code changes. Anyone can push a working prompt instantly.

  • Test before you ship. Create test cases and validate prompts against them (no more vibe-based prompting).

  • Monitor in production. Track mistakes, user feedback and costs after deployment.

Who's Using Agenta

Hundreds of teams use Agenta Cloud (generous free tier) or self-host it. They run more experiments, ship AI features faster, and collaborate in one place.

Try It Yourself

⭐ GitHub: https://github.com/agenta-ai/agenta

☁️ Cloud (free, no credit card): https://cloud.agenta.ai

📚 Docs: https://agenta.ai/docs

Looking forward to your feedback!

Savian Boroanca

@mabrouk, congrats on the launch! you and the team are building something important here. we are a fellow Antler company building a cloud platform to optimize the DevOps cycle. feel free to reach out on LinkedIn and let's chat. I think we can have a win-win here. godspeed!

Mahmoud Mabrouk

@savian_boroanca Thanks for your kind words! I will reach out!

Savian Boroanca

@mabrouk, looking forward to it! have a fantastic launch day :-)  

Siful

Nice to see a tool that lets both devs and non-tech team members collaborate. Best wishes to the team. One thing I'm curious about: how does Agenta handle versioning for prompts and evaluations?

Mahmoud Mabrouk

@getsiful Thanks for your comment! For prompt versioning, we use a Git-like system where you can create branches. Each branch has its own prompt history, so team members can work on their versions independently and then deploy to production. The cool thing is that when you deploy to an environment like production, you don't need to change any code. It all happens within Agenta, and the agent fetches the prompts directly from there.
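
To make the "no code changes" deploy concrete, here's a rough sketch of what fetching the deployed prompt at runtime could look like. The endpoint path, parameter names, and response shape below are assumptions for illustration only, not Agenta's documented API; the real SDK is covered in the docs.

```python
# Illustrative sketch only: endpoint path, parameters, and response shape are
# assumptions, not Agenta's documented API (see https://agenta.ai/docs).
import os
import requests

def fetch_prompt_config(app_slug: str, environment: str = "production") -> dict:
    """Fetch the prompt config currently deployed to an environment,
    so shipping a new prompt version requires no code change."""
    resp = requests.get(
        "https://cloud.agenta.ai/api/prompts/fetch",  # hypothetical endpoint
        params={"app_slug": app_slug, "environment_slug": environment},
        headers={"Authorization": f"Bearer {os.environ['AGENTA_API_KEY']}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"prompt": "...", "model": "gpt-4o", "temperature": 0.2}

config = fetch_prompt_config("support-bot")
```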

For evaluation, you create test sets and define evaluators (the metrics you want to measure). When you run evaluations, they connect directly to your prompts so you can see how changes affect performance.
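
As a rough illustration of the evaluator idea, a custom code evaluator can be as small as a function that scores a single test-case output; the function name and signature below are assumptions, not Agenta's exact evaluator interface.

```python
# Illustrative sketch: a simple custom evaluator for regression checks.
# The signature is an assumption, not Agenta's exact evaluator interface.
def contains_answer(app_output: str, correct_answer: str) -> float:
    """Return 1.0 if the expected answer appears in the model output, else 0.0."""
    return 1.0 if correct_answer.strip().lower() in app_output.strip().lower() else 0.0

# Run against each row of a test set to compare prompt versions.
rows = [
    {"output": "The capital of France is Paris.", "expected": "Paris"},
    {"output": "I'm not sure.", "expected": "Paris"},
]
scores = [contains_answer(r["output"], r["expected"]) for r in rows]
print(sum(scores) / len(scores))  # 0.5
```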

henricook

I recently evaluated Agenta vs Langfuse for Prompt Management and tracing. I went with Langfuse this time but all the best for this project. Open Source FTW.

Something that would really set you apart, that no one else seems to have, would be approval workflows for prompt management. Managing prompts in the UI is great, but in a remotely business-y environment I can't let one person push new prompts without checks and balances. We'll probably have to manage this with source control (e.g. GitHub) and write a script to push prompts up to Langfuse once they gain approval.

Mahmoud Mabrouk

@henricook Thanks for the feedback. We like Langfuse too; we know the team and we're both based in Berlin :)

One differentiator for us is the focus on collaboration between subject matter experts (non-technical) and developers. We're building a workflow that's easy to use from the UI and feature-equivalent to what you can do from code.

We've discussed approval workflows on the team. Right now we solve this through role-based access control. You can configure Agenta so part of the team works on prompts outside of production (we have a branching system for this, so they can work on their branches), and only certain members (like team leads) can deploy prompts to production.

Lyndsay H. Roberts

Tried a prompt tracing tool last year and TBH the hardest part wasn't the traces themselves but connecting them to test suites. Agenta's evals + test-case approach sounds promising because we need deterministic tests for regression checks. In our case we only caught a prompt drift after a month, so automated evals would be huge. Would love to know how easy it is to author evaluators for domain-specific metrics. IMO good CI hooks and a lightweight API make the difference between a demo and something you can rely on in production.

Fernando Scharnick

@mabrouk Love seeing more momentum in the LLMOps space, especially with an open-source approach. Most teams trying to ship AI features hit the same wall: lots of prompts, zero visibility, and no reliable way to evaluate or debug what’s actually happening under the hood.

A platform that unifies prompts, evals, and trace debugging feels like a real unlock for both devs and domain experts who don’t want to depend on guesswork.

Curious: what’s been the biggest challenge so far, capturing consistent traces, defining evaluation metrics, or helping teams collaborate around prompt changes?

Mahmoud Mabrouk

@fernando_scharnick That depends on the team. For dev-only teams, the starting point is usually observability. They want to debug their agent and start from there. For cross-functional teams, the biggest pain is usually collaboration on prompts.

Fernando Scharnick

@mabrouk Makes total sense, devs want to see inside the black box first, while cross-functional teams just need a shared place to iterate without stepping on each other.

Always interesting how the same AI stack creates totally different bottlenecks depending on who’s using it.

Karena Patch

TBF this looks promising. Curious how Agenta handles traces when you have async, high-latency LLM calls. We've seen trace sampling drop important edge cases in our infra, and that bit us in prod. Are evaluators configurable to run off real traffic vs. synthetic test sets? Also, where are logs stored when self-hosted? Does it require extra infra, or is it included?

Arda Erzin

Hi everyone! 👋 We built Agenta to give AI teams a way to collaborate on prompts. We offer a complete workflow for building reliable AI apps, from prompt engineering to evaluation and observability. We'd love to hear your thoughts, feedback, or ideas. Thanks for checking us out! 🙌

Juan Pablo Vega

Hi there!

Agenta is a workspace where AI teams collaborate effectively to build reliable AI applications.

Whether you’re building interactive chat apps, single-prompt workflows, or more agentic systems, Agenta keeps everything in one place instead of having prompts, experiments, and evaluations scattered across different tools.

We’d really appreciate your feedback or ideas, and thanks for taking the time to check it out at cloud.agenta.ai (free forever). You can also contact us at agenta-hq.slack.com for a demo!

Chilarai M

Oh wow, this is really amazing. Collaborating with the team on prompts and debugging with evaluations is a really cool idea. It seems like AI tools are really evolving :) Also, I see APIs, and that makes it even more exciting.
Would love to try that out.

Mahmoud Mabrouk

@chilarai Thanks for the kind words! Let me know your feedback!

Chilarai M

@mabrouk Absolutely! I'll share detailed feedback as I try things out.

Also, since I’m building Swytchcode (AI-powered API workflow + testing engine), I'd love to explore if there’s room for open collaboration. Agenta’s evaluation and debugging layer feels super complementary to what we’re doing on the API side.

Happy to sync on LinkedIn, if you’re open to it!

Mahmoud Mabrouk

@chilarai Definitely! We'll reach out!

Cruise Chen

You seldom see an open-source project for LLMOps like Agenta! Great launch and congrats, team!

Mahmoud Mabrouk

@cruise_chen Thanks Cruise!
