We've been building AI support agents for a while now, and we kept hitting the same wall with standard RAG implementations: The Pizza Problem.
You slice a document (the pizza) into arbitrary chunks and hope the retrieval system grabs the right slice. But often it grabs half a mushroom and some unrelated crust. The real issue isn't just bad answers; it's that you can't measure the accuracy of a random slice. If the retrieval is "mostly" correct, how do you score that?
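To make the slicing concrete, here's a minimal sketch of the kind of naive fixed-size chunker many RAG pipelines start with (the function and document are hypothetical, not any specific library's API). It splits on character count alone, so chunk boundaries routinely land mid-sentence, separating a fact from its context:

```python
def chunk_fixed(text: str, size: int) -> list[str]:
    """Split text into fixed-size character chunks, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Hypothetical support document.
doc = (
    "Refunds are processed within 5 business days. "
    "Orders over $100 ship free. "
    "Contact support to cancel a subscription."
)

for chunk in chunk_fixed(doc, 40):
    print(repr(chunk))
```

Run this and the first chunk ends mid-sentence, before "days": a retriever matching a query about refund timing can surface a slice that mentions refunds but omits the actual number of days.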