I can recall a time when I was using Claude to help me build an approach for conducting risk assessments. The example in my prompt was general, but industry-specific enough for me to tell if something was off. The response seemed fine until I noticed it cited information that made no sense. That got me thinking about all the other hallucinations I'd received as LLM outputs. Some I was lucky enough to catch; others slipped through the cracks and left me embarrassed. This is what led to the birth of SkeptAI. LLM outputs sometimes need a "Digital Devil's Advocate," and that's exactly what the CRIT (Challenge, Reveal, Interrogate, Transmit) framework is designed to be.
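To make that concrete, here's a minimal sketch of what a CRIT-style second pass could look like. It assumes the Anthropic Python SDK; the per-stage prompts and the model name are my own illustrations, not the actual SkeptAI implementation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative per-stage instructions -- the real CRIT prompts may differ.
CRIT_STAGES = {
    "Challenge": "Argue against the answer below. Which claims could be wrong?",
    "Reveal": "List every factual claim, citation, or number the answer relies on.",
    "Interrogate": "For each factual claim in the answer, explain how it could be independently verified.",
    "Transmit": "Summarize the remaining risks before anyone acts on this answer.",
}

def crit_review(original_answer: str, model: str = "claude-sonnet-4-20250514") -> dict[str, str]:
    """Run an LLM answer through the four CRIT stages and collect the critiques."""
    findings: dict[str, str] = {}
    for stage, instruction in CRIT_STAGES.items():
        response = client.messages.create(
            model=model,  # assumed model name; swap in whichever you use
            max_tokens=512,
            messages=[{"role": "user", "content": f"{instruction}\n\n---\n{original_answer}"}],
        )
        findings[stage] = response.content[0].text
    return findings
```

The point isn't the specific prompts; it's that the critique runs as a separate call with an adversarial stance, so the original answer never gets to grade itself.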
I'm curious to hear your stories of hallucinations or bad advice. What's the closest call you've had acting on an AI output that turned out to be wrong?