I can recall a time when I was using Claude to help me build an approach for conducting risk assessments. The example in my prompt was general, but industry-specific enough for me to tell if something was off. The response seemed fine until I noticed it cited information that made no sense. That got me thinking about all the other hallucinations I'd received as LLM outputs. Some I was lucky enough to catch; others slipped through the cracks and left me embarrassed. This is what led to the birth of SkeptAI. LLM outputs sometimes need a "Digital Devil's Advocate," and that's exactly what the CRIT (Challenge, Reveal, Interrogate, Transmit) framework is designed to be.
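To make that concrete, here's a minimal sketch of what a CRIT-style second pass could look like. It assumes the Anthropic Python SDK; the per-stage prompts and the model name are my own illustrations, not the actual SkeptAI implementation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative per-stage instructions -- the real CRIT prompts may differ.
CRIT_STAGES = {
    "Challenge": "Argue against the answer below. Which claims could be wrong?",
    "Reveal": "List every factual claim, citation, or number the answer relies on.",
    "Interrogate": "For each factual claim in the answer, explain how it could be independently verified.",
    "Transmit": "Summarize the remaining risks before anyone acts on this answer.",
}

def crit_review(original_answer: str, model: str = "claude-sonnet-4-20250514") -> dict[str, str]:
    """Run an LLM answer through the four CRIT stages and collect the critiques."""
    findings: dict[str, str] = {}
    for stage, instruction in CRIT_STAGES.items():
        response = client.messages.create(
            model=model,  # assumed model name; swap in whichever you use
            max_tokens=512,
            messages=[{"role": "user", "content": f"{instruction}\n\n---\n{original_answer}"}],
        )
        findings[stage] = response.content[0].text
    return findings
```

The point isn't the specific prompts; it's that the critique runs as a separate call with an adversarial stance, so the original answer never gets to grade itself.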
I'm curious to hear your stories of hallucinations or bad advice. What's the closest call you've had acting on an AI output that turned out to be wrong?