Hey Product Hunt! 👋 We built ContextCheck to solve a challenge many of us face: how do we reliably test LLMs and RAG systems? After battling with inconsistent outputs, hidden regressions, and sneaky hallucinations in production, we wanted a systematic way to validate AI systems before deployment. ContextCheck automates the heavy lifting - generating test queries, detecting regressions, and assessing hallucinations. Everything's configurable via YAML and fits right into your CI pipeline. We'd love to hear about your LLM testing challenges and workflows. What aspects of AI testing keep you up at night? How do you ensure reliability in production? If you find this useful, a ⭐️ on GitHub would mean a lot! It helps make the project more visible to other developers who might benefit from it.

ContextCheck

Framework for testing and evaluating LLMs, RAG & chatbots.

Framework for testing and evaluating LLMs, RAG & chatbots.