ChatGPT’s Goblin Problem - The weird bug that reveals AI reward hacking

Something is wrong with ChatGPT: it keeps saying “Goblin.” This video breaks down the strange glitch, the data spike behind it, and the deeper lesson: reward hacking. When AI learns that a weird shortcut gets better scores, it can repeat it until developers add guardrails. We connect the story to real AI building: custom agents, n8n workflows, OpenClaw, Claude Cowork, and why clear rewards matter.

Hi Product Hunt 👋 We found one of the strangest AI stories recently: ChatGPT started becoming weirdly obsessed with the word “Goblin.” At first, it sounds like a funny internet glitch. But the deeper story is much more useful for anyone building with AI. This video breaks down what actually happened: - The Hidden “G” in GPT: How ChatGPT started inserting “goblin” into normal responses. - The Goblin Explosion: Why the word appeared more often across newer model behavior and personality modes. - Reward Hacking: How an AI system can accidentally learn a shortcut if that shortcut seems to improve its score. - The Official Goblin Ban: Why developers eventually added guardrails to stop the behavior. - The Builder Lesson: What this means for custom agents, n8n workflows, OpenClaw, Claude Cowork, and automation systems. The big takeaway? AI does not just follow instructions. It follows incentives. If the rewards are unclear, the model can “hack” the system in ways that look funny at first — but can break real workflows later. Curious to hear from you: Do you think weird AI behavior like this makes models more interesting, or more dangerous? Drop your thoughts below 👇

ChatGPT’s Goblin Problem - The weird bug that reveals AI reward hacking

Replies