Nicolas Grenié

Giskard - Open-source testing framework for LLMs & ML models

Fast LLM & ML testing at scale 🛡️ Detect hallucinations & biases automatically 🔍 Enterprise Testing Hub ☁️ Self-hosted / cloud 🤝 Integrated with 🤗, MLFlow, W&B From tabular models to LLMs, Giskard handles everything! https://github.com/Giskard-AI/gi...

Alex Combessie
Hello Product Hunt, I'm Alex, here with Jean-Marie, Andrey, and the rest of the Giskard team. We're thrilled and slightly nervous to present Giskard 2.0. This has been 2 years in the making, involving a group of passionate ML engineers, ethicists, and researchers, in partnership with leading standards organizations such as AFNOR and ISO.

So, why Giskard? Because we understand the dilemma you face. Manually creating test cases, crafting reports, building dashboards, and enduring endless review meetings - testing ML models can take weeks, even months! With the new wave of Large Language Models (LLMs), testing models becomes an even more impossible mission. The questions keep coming: Where to start? What issues to focus on? How to implement the tests? 🫠 Meanwhile, the pressure to deploy quickly is constant, often pushing models into production with unseen vulnerabilities. The bottleneck? ML testing systems.

Our experience includes leading ML Engineering at Dataiku and years of research in AI ethics. We saw many ML teams struggling with the same issues: slowed down by inefficient testing, allowing critical errors and biases to slip into production. Current MLOps tools fall short. They lack transparency and don't cover the full range of AI risks: robustness, fairness, security, efficiency, you name it. Add to this compliance with AI regulations, some of which can be punitive, costing up to 6% of your revenue (EU AI Act).

Enter Giskard:
📦 A comprehensive ML testing framework for Data Scientists, ML Engineers, and Quality specialists. It offers automated vulnerability detection, customizable tests, CI/CD integration, and collaborative dashboards.
🔎 An open-source Python library for automatically detecting hidden vulnerabilities in ML models and LLMs, tackling issues from robustness to ethical biases.
📊 An enterprise-ready Testing Hub application with dashboards and visual debugging, built to enable collaborative AI Quality Assurance and compliance at scale.
🧩 Compatibility with the Python ML ecosystem, including Hugging Face, MLflow, Weights & Biases, PyTorch, TensorFlow, and LangChain.
↕️ A model-agnostic approach that serves tabular models, NLP, and LLMs. Soon, we'll also support Computer Vision, Recommender Systems, and Time Series.

Equip yourself with Giskard to defeat your AI Quality issues! 🐢🛡️ We build in the open, so we welcome your feedback, feature requests, and questions.

For further information:
Website: https://www.giskard.ai/
GitHub: https://github.com/Giskard-AI/gi...
Discord Community: https://gisk.ar/discord

Best,
Alex & the Giskard Team
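For a feel of the developer workflow, here is a minimal sketch of what a first scan could look like with the open-source library, assuming the documented `giskard.Model` / `giskard.Dataset` / `giskard.scan` API; the dataset, column names, and classifier are purely illustrative.

```python
# Minimal sketch of a first scan with the open-source library (illustrative
# data and column names; API follows the giskard.Model / giskard.scan docs).
import giskard
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("credit_applications.csv")          # hypothetical tabular dataset
features = [c for c in df.columns if c != "default"]
clf = RandomForestClassifier().fit(df[features], df["default"])

# Wrap the model and data so Giskard can probe them automatically.
model = giskard.Model(
    model=lambda data: clf.predict_proba(data[features]),
    model_type="classification",
    classification_labels=list(clf.classes_),
    feature_names=features,
)
dataset = giskard.Dataset(df, target="default")

# Detect vulnerabilities (performance bias, robustness, data leakage, ...).
scan_report = giskard.scan(model, dataset)
scan_report.to_html("scan_report.html")
```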
Adrian
@a1x Great product and indeed strong potential to improve MLOps across the board
Nicolas Grenié
Artificial intelligence is expanding rapidly, and product teams are under pressure to integrate new AI features into their products quickly. While it may be easy to put together a prototype for a demonstration, releasing it to production comes with many other considerations. In particular, LLMs can produce hallucinations (misinformation) and show biases. These errors can harm both product quality and the trust users place in our technology.
Alexandr Builov
Hey Alex and the Giskard Team! 👋 I'm amazed at the comprehensive approach Giskard 2.0 takes to ML testing. The fact that it's open-source and compatible with multiple platforms makes it even more appealing. 🚀 Could you shed some light on how Giskard assesses the range of AI risks and handles the mitigation process? Also, it's impressive how you aim to cover different model types. 🙌 Looking forward to seeing Giskard revolutionize ML testing!
Luca Martial
@builov84 thanks so much for the kind words! 🌟 Our detailed open-source documentation pages outline the specific vulnerabilities that Giskard 2.0 can detect. For mitigation strategies, our enterprise hub offers robust debugging features, designed to not only identify risks but also provide actionable insights into the sources of the issues detected. Feel free to dive into our docs and reach out with any further queries! 🚀🛠️ https://docs.giskard.ai/en/lates...
Rabah Abdul Khalek
Hi @builov84, in order to estimate the risk range:
1. We first curate a list of the most relevant issues to check for, ones that reflect critical risks when detected. For tabular and NLP models, we have several categories: Performance, Robustness, Calibration, Data Leakage, Stochasticity, etc. For LLMs, we have Injection attacks, Hallucination & misinformation, Harmful content generation, Stereotypes, Information disclosure, and Output formatting.
2. Under each category, we mostly rely on tailored statistical procedures and metrics to estimate the probability of occurrence, statistical significance, and severity level of each issue found. We provide the option to use procedures like Benjamini-Hochberg to decrease the false discovery rate, and we explain the impact an issue could have on your ML pipeline.
3. Although our default risk assessment is carefully crafted, users can also set up their own by configuring the statistical thresholds and severity levels for their specific use case.

Our Giskard Hub is then dedicated to the mitigation process (sketched below):
1. From the issues found during the scan, the user can automatically generate a set of tests and upload them to our Hub. Each generated test reflects an issue found and embeds a quantitative metric (the one we relied on to estimate the severity level).
2. Once uploaded to the Hub, it becomes possible to customize these tests, use them with other models and datasets for comparison, and, most importantly, use them to debug a specific model by investigating, one by one, the samples in your data that made the tests fail.
3. While debugging, we equip you with explanation tools like SHAP to shed some light on feature importance for tabular and NLP models.
4. For each sample investigated, we automatically provide additional insights that help you detect critical patterns in your data, create additional tests, and assess the stability of your model against small data perturbations.
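A rough sketch of that scan-to-Hub handoff, reusing the `model`, `dataset`, and `scan_report` objects from the scan example above; the Hub URL, API key, and project key are placeholders, and the exact client calls should be checked against the docs.

```python
# Sketch: turning scan findings into a reusable test suite and pushing it to
# the Giskard Hub for collaborative debugging (URL, key, project are placeholders).
from giskard import GiskardClient

# Each generated test wraps the metric used to grade the issue's severity.
test_suite = scan_report.generate_test_suite("Scan findings - credit model")
test_suite.run()  # re-run the generated tests locally, e.g. in CI

client = GiskardClient(url="http://localhost:19000", key="<YOUR_API_KEY>")
client.create_project("credit", "Credit scoring QA", "Tests generated from the scan")
test_suite.upload(client, "credit")
```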
Ari Bajo
Exciting! Does Giskard detect data drift (when production data significantly changes from the training set) and how?
Andrey Avtomonov
Hi @ari_bajo_rouvinen. Yes, data drift tests are part of our open-source library: https://docs.giskard.ai/en/lates... It's also possible to combine these tests into test suites and execute them regularly in CI to monitor drift in production - see the sketch below.
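A short sketch of what such a CI drift check could look like, assuming a wrapped `model`, a `reference_dataset` built from the training data, and a recent `production_dataset`; the test names follow the open-source testing catalog, so double-check them against the linked docs.

```python
# Sketch: scheduled drift checks that can run in CI (assumes wrapped `model`,
# a `reference_dataset` from training time and a recent `production_dataset`).
from giskard import Suite, testing

drift_suite = Suite(name="Production drift checks").add_test(
    testing.test_drift_prediction_psi(   # PSI drift on the model's predictions
        model=model,
        actual_dataset=production_dataset,
        reference_dataset=reference_dataset,
        threshold=0.2,                    # fail if PSI exceeds 0.2
    )
).add_test(
    testing.test_drift_psi(               # PSI drift on a single input feature
        actual_dataset=production_dataset,
        reference_dataset=reference_dataset,
        column_name="income",              # illustrative column name
        threshold=0.2,
    )
)

results = drift_suite.run()
assert results.passed, "Data drift detected - investigate before the next deploy"
```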
Ömürcan Cengiz
Congrats on the launch 🚀 Love that it's open source 👌
Luca Martial
@omurcancengiz thanks for the support! Feel free to test it out since it's open-source! 🚀
Jean-Marie John-Mathews
@omurcancengiz Thank you for your support 🙏!
Xie
Hey Giskard team, big congrats on the launch! Giskard 2.0 sounds like a real game-changer. Just one question: How does Giskard handle the detection of ethical biases in ML models? And a suggestion - how about a feature simulating the impacts of detected vulnerabilities? It could help teams make more informed decisions. Keep up the great work!
Matteo
Hi @ke_ouyang, for traditional NLP models we mostly rely on metamorphic tests to detect ethical biases: for example, changing the input a little by switching pronouns ("he" into "she" or vice versa), names, countries, religious terms, etc., and measuring the effect this has on the model output (see for example https://docs.giskard.ai/en/lates...). We also run various checks on subpopulations (data slices), for example checking that the model's accuracy/precision/etc. is not significantly different for certain groups (e.g. `gender = x` or other features in the data). For LLMs, instead, we try to elicit inappropriate behavior by crafting adversarial inputs and evaluating the model with an LLM-as-a-judge approach; you can find more details here: https://docs.giskard.ai/en/lates...
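As an illustration, a metamorphic invariance check along those lines might look like the sketch below, assuming a wrapped text-classification `model` and `dataset` with a `text` column; the column name and replacement rules are illustrative.

```python
# Sketch: a metamorphic invariance test - perturb the input (swap gendered
# pronouns) and require the model's predictions to stay stable.
import giskard
from giskard import testing

@giskard.transformation_function(row_level=True)
def swap_pronouns(row):
    # Illustrative perturbation; real checks also swap names, countries, etc.
    row["text"] = row["text"].replace(" he ", " she ").replace(" him ", " her ")
    return row

result = testing.test_metamorphic_invariance(
    model=model,
    dataset=dataset,
    transformation_function=swap_pronouns,
    threshold=0.95,  # at least 95% of predictions must stay unchanged
).execute()
print(result.passed)
```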
Amaury de Thibault
When you see what current ML models do, Giskard is definitely the product that helps improve their quality
Jean-Marie John-Mathews
@amaurystakha Thank you Amaury for the support!
Julien Ergan
Impressive, congrats on your launch!
Jean-Marie John-Mathews
@julienergan Thank you for your support 🙏!
Maxime Dolores
Congratulations on the launch!! Really great to see such great products being open-source
Luca Martial
@m_dolr thanks for your support! 🙏
Sergei Bogdanov
In our company, we face a serious problem when creating our internal LLMs for data extraction. Our biggest problems are hallucinations and model biases. With Giskard we can finally monitor our models automatically instead of doing a lot of manual work and running a handful of hand-crafted tests that don't cover everything. I highly advise you to check Giskard out!
Jean-Marie John-Mathews
@svbogdanov Thank you for the support!
Dan Tegzes
Congrats guys, you've come quite a long way with this! As an ML engineer, I will definitely need this! Kudos to the team for the hard work!
Alex Combessie
@dan_tegzes Thank you very much! Interested in your feedback once you integrate it in your MLOps pipelines.
Magna Ding
Congratulations on the launch. You've indeed journeyed impressively far with this!
Luca Martial
@magnading thanks for the support! We'll keep journeying further and further 🚀
David G Ortega
This is probably one of the most exciting products in the MLOps space. Ensuring the production quality of your models and reducing your technical debt is for sure a needed step, but one that most ML teams around the world don't implement. Of course, testing an LLM is a daunting task on top of the pile of work needed to train and deploy your model, and Giskard does it for you! (Co-author of CML)
Luca Martial
@david_g_ortega Thanks for your kind words David! 🙏
Manoj R
Congratulations on the launch, team! I'm very curious to know how Giskard detects biases and what LLM is behind it?
Rabah Abdul Khalek
Thank you @manoj_11! For each model type (tabular, NLP, and LLM) we detect biases differently. For LLMs, we have various methods:
1. Generate edge-case prompts based on your use case, context, and task, and use GPT-4 as a judge to evaluate your LLM's responses for hallucination & misinformation.
2. Generate adversarial prompts to try and trick your LLM into generating output that promotes harmful activities or stereotypes.
3. Inject perturbations and carefully crafted prompts designed to make your model break free from its instructions (injection attacks).
4. Push your LLM to disclose sensitive information and detect whether it does.
5. etc.
For more details and references, check our documentation: https://docs.giskard.ai/en/lates...
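As a rough illustration, restricting a scan to a few of those LLM detectors could look like the sketch below; `my_llm_app` is a stand-in for your own pipeline, and the detector tags and `text_generation` model type should be verified against the documentation.

```python
# Sketch: scanning an LLM application for a subset of issue categories.
# `answer_fn` stands in for your own LLM / RAG pipeline; detector tags and
# the text_generation model type follow the open-source documentation.
import giskard
import pandas as pd

def answer_fn(df: pd.DataFrame) -> list:
    # One answer per question; replace with your actual chain / API call.
    return [my_llm_app.answer(q) for q in df["question"]]

model = giskard.Model(
    model=answer_fn,
    model_type="text_generation",
    name="Support assistant",
    description="Answers customer questions about billing and invoices",
    feature_names=["question"],
)

# Limit the scan to specific detectors instead of running the full battery.
report = giskard.scan(model, only=["hallucination", "harmfulness", "jailbreak"])
report.to_html("llm_scan_report.html")
```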
Gadir
2 years to develop an entire product. You guys are very diligent, and that's commendable. This deserves respect, so I will definitely support your product. Congratulations on the launch, Giskard team!
Alex Combessie
@gipetto Thank you! Indeed, this has been a 2-year R&D effort, and we've built it all in the open, for the community! Hope it's valuable for your ML engineering projects.
Taha Zemmouri
Awesome product, awesome team! Congrats for the launch :)
Luca Martial
@tezzed thank you for your support throughout! 🙏 Looking forward to building integrations together! ⚒️
Maxime Cerisier
Terrific product for anyone working with ML models and LLMs! Congrats on the launch team!
Jean-Marie John-Mathews
@maxcsr Thank you for your support 🙏!
Mathieu Seguy
Congrats team! Giskard is really impressive and a real game changer in the ML world!
Jean-Marie John-Mathews
@mathieu_seguy Thank you for your support 🙏!
Valentin Huang
Huge contribution to the AI space, congrats @a1x and the Giskard team 👏👏
Jean-Marie John-Mathews
@vhuang Thank you for your support 🙏!
Jeremy Attuil
😃🔥 wow! Congrats.
Jean-Marie John-Mathews
@jeremy_attuil2 Thank you for your support 🙏!