Darius Emrani started a discussion
Launched Scorecard today - eval platform for AI agents (hard lessons learned)
Hey everyone, we launched Scorecard today. It's an evaluation platform for AI agents that I built after a close call with a medical AI that confused dosing guidelines during testing. I come from Waymo, where we tested autonomous vehicles extensively before deployment. The AI agent space desperately needs the same rigor; too many teams are shipping without proper testing frameworks...
For teams building AI in high-stakes domains, Scorecard combines LLM evals, human feedback, and product signals to help agents learn and improve automatically, so that you can evaluate, optimize, and ship confidently.

Scorecard: Evaluate, Optimize, and Ship AI Agents
Darius Emrani left a comment
Hey Product Hunt, Darius here, CEO of Scorecard. I almost shipped an AI agent that would've killed people. I built an EMR agent for doctors. During beta testing, it nailed complex cases 95% of the time. The other 5% of the time, it confused pediatric and adult dosing and suggested discontinued medications. And the problem wasn't just my agent. My friend's customer support bot started recommending...


