Forums
ConvoProbe - Automated scenario testing for Dify chatbots
ConvoProbe lets you design multi-turn conversation scenarios and run them against your Dify chatbot automatically to measure response quality.
Existing eval tools (LangSmith, Langfuse, Opik) work great for tracing and single-turn evaluation ā but they don't support designing and executing multi-turn conversation scenarios end-to-end. ConvoProbe fills that gap.
AdaptGauge - Detect when few-shot examples make your LLM worse
AdaptGauge detects when adding few-shot examples degrades LLM performance instead of improving it.
Testing 8 models across 4 tasks revealed three failure patterns:
⢠Peak regression ā 64% at 4-shot, crashed to 33% at 8-shot
⢠Ranking reversal ā best zero-shot model dropped to third with examples
⢠Selection collapse ā TF-IDF examples broke a model from 50%+ to 35%
Tracks learning curves, auto-detects collapse, classifies patterns, and compares example selection methods.
Demo results included.
