Hello Product Hunt,
I'm Alex, here with Jean-Marie, Andrey, and the rest of the Giskard team. We're thrilled (and slightly nervous) to present Giskard 2.0. This has been 2 years in the making, involving a group of passionate ML engineers, ethicists, and researchers, in partnership with leading standards organizations such as AFNOR and ISO.
So, why Giskard? Because we understand the dilemma you face. Manually creating test cases, crafting reports, building dashboards, and enduring endless review meetings - testing ML models can take weeks, even months!
With the new wave of Large Language Models (LLMs), testing becomes an even more daunting mission. The questions keep coming: Where to start? Which issues to focus on? How to implement the tests? 🫠
Meanwhile, the pressure to deploy quickly is constant, often pushing models into production with undetected vulnerabilities. The bottleneck? ML Testing.
Between leading ML Engineering at Dataiku and years of research in AI Ethics, we've seen many ML teams struggle with the same problem: inefficient testing slows them down and lets critical errors and biases slip into production.
Current MLOps tools fall short. They lack transparency and don't cover the full range of AI risks: robustness, fairness, security, efficiency, you name it. Add to this compliance with AI regulations, some of which carry punitive fines of up to 6% of your revenue (EU AI Act).
Enter Giskard:
📦 A comprehensive ML Testing framework for Data Scientists, ML Engineers, and Quality specialists. It offers automated vulnerability detection, customizable tests, CI/CD integration, and collaborative dashboards.
🔎 An open-source Python library that automatically detects hidden vulnerabilities in ML models and LLMs, tackling issues from robustness to ethical biases (see the quick sketch after this list).
📊 An enterprise-ready Testing Hub application with dashboards and visual debugging, built to enable collaborative AI Quality Assurance and compliance at scale.
🧩 Compatibility with the Python ML ecosystem, including Hugging Face, MLflow, Weights & Biases, PyTorch, TensorFlow, and LangChain.
↕️ A model-agnostic approach that serves tabular models, NLP, and LLMs. Soon, we'll also support Computer Vision, Recommender Systems, and Time Series.
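To give you a taste, here's a minimal sketch of what the automated scan looks like on a toy tabular classifier. The toy data, model, and column names below are placeholders for illustration, so check our docs for the exact API:

```python
import giskard
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy data and model, purely to make the sketch runnable; swap in your own.
df = pd.DataFrame({
    "age": [22, 35, 58, 44, 29, 61],
    "income": [28_000, 52_000, 75_000, 61_000, 33_000, 80_000],
    "churn": ["no", "yes", "no", "yes", "no", "yes"],
})
clf = LogisticRegression().fit(df[["age", "income"]], df["churn"])

# Wrap the model and data so Giskard can probe them. The wrapper is
# model-agnostic: any callable mapping a DataFrame to predictions works.
giskard_model = giskard.Model(
    model=lambda d: clf.predict_proba(d[["age", "income"]]),
    model_type="classification",
    classification_labels=list(clf.classes_),
    feature_names=["age", "income"],
)
giskard_dataset = giskard.Dataset(df=df, target="churn")

# Automatically scan for vulnerabilities (robustness, performance,
# bias, ...) and export the findings as a shareable report.
results = giskard.scan(giskard_model, giskard_dataset)
results.to_html("scan_report.html")
```

From there, the scan findings can be turned into a reusable test suite (results.generate_test_suite(...)) that plugs straight into your CI.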
Equip yourself with Giskard to defeat your AI Quality issues! 🐢🛡️
We build in the open, so we welcome your feedback, feature requests, and questions.
For further information:
Website: https://www.giskard.ai/
GitHub: https://github.com/Giskard-AI/gi...
Discord Community: https://gisk.ar/discord
Best,
Alex & the Giskard Team
@a1x Great product, and indeed strong potential to improve MLOps across the board!
Exciting! Does Giskard detect data drift (when production data significantly changes from the training set) and how?
Hi @ari_bajo_rouvinen! Yes, data drift tests are part of our open-source library:
https://docs.giskard.ai/en/lates...
It's also possible to combine these tests into test suites and execute them regularly in CI to monitor drift in production.
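Here's a rough sketch of what such a suite could look like. Fair warning: I'm writing the test and parameter names from memory, so double-check them against the docs above; the datasets, column, and threshold are illustrative:

```python
from giskard import Suite, testing

# Compare a fresh production sample against the training data.
# `train_dataset`, `prod_dataset`, and `giskard_model` are assumed to be
# giskard.Dataset / giskard.Model objects you've already wrapped.
suite = (
    Suite(name="Production drift checks")
    .add_test(testing.test_drift_psi(
        actual_dataset=prod_dataset,      # recent production data
        reference_dataset=train_dataset,  # training data
        column_name="age",                # feature to monitor
        threshold=0.2,                    # max acceptable PSI
    ))
    .add_test(testing.test_drift_prediction_psi(
        model=giskard_model,
        actual_dataset=prod_dataset,
        reference_dataset=train_dataset,
        threshold=0.2,
    ))
)

results = suite.run()
# In a scheduled CI job, fail the pipeline when drift is detected
# (verify the exact result attribute name in the docs):
raise SystemExit(0 if results.passed else 1)
```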
@ero1311 Yes, totally! The plan is to cover all the big families of AI models!
Maker
@ero1311 thanks for your comment! Yes, we're currently looking at how we can support generative vision models. You can stay updated by following our newsletter or LinkedIn page! 🙏 We'll definitely keep you posted.
Congratulations on the launch, team! I'm very curious how Giskard detects biases, and what's the LLM behind it?
Maker
Thank you @manoj_11! For each model type (tabular, NLP, and LLM) we detect biases differently. For LLMs, we use several methods:
1. Generate edge-case prompts based on your use case, context, and task, then use GPT-4 as a judge to evaluate your LLM's responses for hallucination and misinformation.
2. Generate adversarial prompts to try and trick your LLM into generating an output that promotes harmful activities or stereotypes.
3. Inject perturbations and carefully crafted prompts designed to make your model break free from instructions (Injection attacks).
4. Push your LLM to disclose sensitive information and detect if it does.
...and more. For further details and references, check our documentation: https://docs.giskard.ai/en/lates...
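To make that concrete, here's a minimal sketch of running the LLM scan. The answer_question wrapper, the names, and my_chain all stand in for your own chain or agent, and the LLM-assisted detectors assume an OpenAI key is configured since GPT-4 acts as the judge:

```python
import giskard
import pandas as pd

# Stand-in for your own chain/agent: takes a DataFrame of inputs and
# returns one generated answer per row. Assumes OPENAI_API_KEY is set
# so the scan's GPT-4 judge can evaluate the outputs.
def answer_question(df: pd.DataFrame) -> list[str]:
    return [my_chain.run(question) for question in df["question"]]

giskard_model = giskard.Model(
    model=answer_question,
    model_type="text_generation",
    name="Product Q&A bot",                                   # illustrative
    description="Answers customer questions about our docs",  # guides prompt generation
    feature_names=["question"],
)

# Probes for hallucination, harmfulness, prompt injection,
# sensitive-information disclosure, and more.
scan_results = giskard.scan(giskard_model)
scan_results.to_html("llm_scan_report.html")
```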
Great job @a1x! I love the direction you've taken with the product. Is the LLM monitoring solution part of the core product or a separate offering?
Maker
@_felx thanks for your support! LLM monitoring is the third pillar of our current product suite. We have a beta version at the moment and look forward to improving it over time with the help of our design partners. Feel free to reach out if you're interested in trying it out!
Artificial intelligence is expanding rapidly, and product teams are under pressure to integrate new AI features into their products quickly. While it may be easy to put together a prototype for a demo, shipping it to production comes with many more considerations.
In particular, LLMs can produce hallucinations (misinformation) and show biases. These errors can harm both product quality and the trust users place in our technology.
@mehdi_rifai Thanks! We do plan to go much deeper on LLM Testing methods in the coming months, while addressing the community's bug reports and feature requests.
We also plan to launch a beta for testing Computer Vision models next year. Stay tuned!