Alex U

Polarity - The Self-Improvement Stack for Agents

Polarity monitors every agent decision in production, surfaces failure patterns before users hit them, and turns trajectories into evals that compound your agent’s reliability over time!

Alex U
Hey Product Hunt 👋 Alex here, founder of Polarity.

Most agent teams I've talked to have a 95% pass rate on their eval suite and a 60% pass rate in production. That gap is where products die, and most teams find out from a customer ticket hours later.

Polarity closes that gap:
→ Craft agent behaviors in the dashboard
→ Polarity learns from agent behavior and finds new opportunities for tracking
→ Slack alerts the second your agent misbehaves

Wrong tool call, skipped guardrail, latency going past thresholds: it’ll all show up in your team's Slack channel with the trace.

Three SDKs are currently supported:
→ Go
→ Python
→ TypeScript

Leave any feedback in the comments. Thank you, Product Hunt!

- Alex ❤️
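
For readers who want a concrete picture of the three alert conditions mentioned above (wrong tool call, skipped guardrail, latency past a threshold), here is a minimal Python sketch. It is not Polarity's SDK or API. The `Trace` and `Policy` types, their field names, and the example values are all hypothetical, and the alert is only formatted as text rather than sent to Slack.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical record of one agent run (not a Polarity type)."""
    trace_id: str
    tool_calls: list
    guardrails_run: list
    latency_ms: float


@dataclass
class Policy:
    """Hypothetical expectations for a well-behaved run."""
    allowed_tools: set
    required_guardrails: set
    max_latency_ms: float


def find_violations(trace, policy):
    """Return a human-readable list of ways the trace breaks the policy."""
    violations = []
    # Wrong tool call: any tool outside the allow-list.
    for tool in trace.tool_calls:
        if tool not in policy.allowed_tools:
            violations.append(f"wrong tool call: {tool}")
    # Skipped guardrail: a required guardrail that never ran.
    missing = policy.required_guardrails - set(trace.guardrails_run)
    for guard in sorted(missing):
        violations.append(f"skipped guardrail: {guard}")
    # Latency past the threshold.
    if trace.latency_ms > policy.max_latency_ms:
        violations.append(
            f"latency {trace.latency_ms:.0f} ms exceeds "
            f"{policy.max_latency_ms:.0f} ms"
        )
    return violations


def format_alert(trace, violations):
    """Build the text of an alert; sending it anywhere is out of scope."""
    lines = [f"Agent trace {trace.trace_id} has {len(violations)} issue(s):"]
    lines += [f"- {v}" for v in violations]
    return "\n".join(lines)


policy = Policy(
    allowed_tools={"search", "answer"},
    required_guardrails={"pii_filter", "auth_check"},
    max_latency_ms=3000,
)
trace = Trace(
    trace_id="t-001",
    tool_calls=["search", "drop_table"],
    guardrails_run=["pii_filter"],
    latency_ms=4200,
)
violations = find_violations(trace, policy)
print(format_alert(trace, violations))
```

Static rules like these only catch failures someone thought to write down, which is the gap the launch post is describing.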
Jay Chopra

Hi everyone! My name is Jay and I'm glad you're reading this :)

We're super excited to have @polarityco out and ready for devs to start integrating within their Slack channels!
Given the validation from design partners, VCs, and testers, we're excited to release this to the public after many days of ideating and building.

With a full revamp of the site and its core, we'd love to hear how you find the product launch: www.polarity.so
We're accepting as many demos as time allows this week, request here

Don't forget to follow the company page for future releases!🫡

Polarity Team -- I’m in the corner ;p

ᐯenus Bhatia

@polarityco  @jaychopra love it! let's fuckin gooo!

Othman Katim

How much labeling does it need from humans before the evaluations are actually useful?

Jay Chopra

@othman_katim Great question! Depending on the agent’s functions and where it’s incorporated, we’ve found that small- to medium-sized PRs that include the agent are often enough to give teams the metrics they need for evaluations to become useful.

TLDR: not much labeling is required for accurate results, but more always helps :)

If you want more info, check out our docs: https://docs.polarity.so/

Farrukh Butt

The production gap is a real problem. Eval suites can look fine, but once agents hit messy user behavior, traces and fast Slack alerts become much more useful than another dashboard nobody checks.

Ryan Mason

Would love a free trial of this to confirm fit for my 31 agentic/queue 36 skills, TAM authority enforcement system. AI citation readiness and answers interpretation verification infra. Python (11), Node (16), and TS (4)?

mihir
🧐 Good find

The landing page's hero section has an issue. Instead of showing a case study for cal, it navigates to ohm, or the link is in the wrong place.

Akshaypal

The 95% eval / 60% production gap is the most honest stat I've seen on a launch page in a while. That's exactly the failure mode I hit adding LangSmith tracing to my own agentic project. Evals pass because you wrote them against known failure cases; production breaks on the ones you didn't think of. Polarity learning new tracking opportunities from actual agent behavior is the right direction. You can't write evals for failure modes you haven't seen yet.

Curious how the failure pattern detection actually works under the hood. Is it clustering similar failed trajectories, or something more structured like anomaly detection against a baseline of successful runs? That distinction matters for how quickly it catches genuinely novel failure modes vs. just variations of known ones.
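
To make the distinction concrete, here is a rough Python sketch of the two approaches. It says nothing about how Polarity actually works. It assumes a trajectory is just a list of step names, uses Jaccard similarity over step-to-step transitions as a stand-in for a real similarity measure, and all function names and the example data are made up for illustration.

```python
def bigrams(steps):
    """Set of consecutive (step, next_step) transitions in a trajectory."""
    return set(zip(steps, steps[1:]))


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_failures(failed_runs, threshold=0.3):
    """Approach 1: greedily group failed runs that share transitions.

    Each run joins the first cluster whose representative is similar
    enough, otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative_bigrams, member_runs)
    for run in failed_runs:
        grams = bigrams(run)
        for rep, members in clusters:
            if jaccard(rep, grams) >= threshold:
                members.append(run)
                break
        else:
            clusters.append((grams, [run]))
    return [members for _, members in clusters]


def novelty_score(run, baseline_runs):
    """Approach 2: fraction of a run's transitions never seen in a
    baseline of successful runs (0.0 = fully familiar, 1.0 = all new)."""
    seen = set()
    for ok in baseline_runs:
        seen |= bigrams(ok)
    grams = bigrams(run)
    if not grams:
        return 0.0
    return len(grams - seen) / len(grams)


successful = [
    ["plan", "search", "answer"],
    ["plan", "search", "search", "answer"],
]
failed = [
    ["plan", "search", "delete_record", "answer"],
    ["plan", "search", "delete_record", "search", "answer"],
    ["plan", "answer"],
]

clusters = cluster_failures(failed)
scores = [novelty_score(run, successful) for run in failed]
print([len(c) for c in clusters])
print(scores)
```

In this toy setup, clustering only reveals a pattern once several similar failures have accumulated, while the baseline score can flag a single run such as `["plan", "answer"]` the first time it appears. That is the trade-off I'm asking about.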