Hey everyone! 👋 I built EvalDesk because I kept running into the same problem: the people who actually know whether an AI answer is correct (doctors, lawyers, teachers, compliance officers) can't use existing evaluation tools, because they all require writing Python code.

Tools like DeepEval and Langfuse are great for engineers. But when I needed my non-technical team to validate AI outputs, I had nothing to give them. No-code alternatives like Confident AI charge $500+/month and lock you in.

So I built EvalDesk to fill that gap: open source, self-hostable, and genuinely no-code. Your domain experts write test cases in plain English, paste an agent URL, hit Run, and rate each answer as Pass, Fail, or Partial. That's it.

It runs with one Docker command. There's no cloud dependency, and your data stays on your server.

Would love to hear what you think. What features would make this useful for your team?

EvalDesk - Test AI agents without writing code
