AdaptGauge
Detect when few-shot examples make your LLM worse
Shuntaro Okuma

2mo ago

AdaptGauge - Detect when few-shot examples make your LLM worse

AdaptGauge detects when adding few-shot examples degrades LLM performance instead of improving it. Testing 8 models across 4 tasks revealed three failure patterns:
• Peak regression: a model reached 64% at 4-shot, then crashed to 33% at 8-shot
• Ranking reversal: the best zero-shot model dropped to third once examples were added
• Selection collapse: TF-IDF-selected examples dropped a model from 50%+ to 35%
AdaptGauge tracks learning curves, auto-detects collapse, classifies patterns, and compares example selection methods. Demo results are included.
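To make the first two patterns concrete, here is a minimal Python sketch of how peak regression and ranking reversal could be checked from per-shot scores. It is not AdaptGauge's actual API: the function names, the `min_drop` threshold, and every number except the 64% (4-shot) and 33% (8-shot) figures are assumptions made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def detect_peak_regression(
    curve: Dict[int, float], min_drop: float = 0.10
) -> Optional[Tuple[int, int, float]]:
    """Find the largest score drop that happens after the curve's peak.

    curve maps shot count -> score in [0, 1].
    Returns (peak_shots, worst_later_shots, drop), or None when the score
    never falls by at least min_drop (a hypothetical threshold) after the peak.
    """
    shots = sorted(curve)
    peak_shots = max(shots, key=lambda k: curve[k])
    later = [k for k in shots if k > peak_shots]
    if not later:
        return None  # the peak is at the largest shot count, nothing to regress to
    worst = min(later, key=lambda k: curve[k])
    drop = curve[peak_shots] - curve[worst]
    return (peak_shots, worst, drop) if drop >= min_drop else None


def rank_models(scores: Dict[str, float]) -> List[str]:
    """Return model names ordered from best to worst score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Learning curve: the 4-shot and 8-shot values come from the post,
# the 0-shot and 2-shot values are invented placeholders.
curve = {0: 0.50, 2: 0.58, 4: 0.64, 8: 0.33}
regression = detect_peak_regression(curve)

# Ranking reversal: all model names and scores here are invented placeholders.
zero_shot = {"model_a": 0.60, "model_b": 0.55, "model_c": 0.50}
few_shot = {"model_a": 0.58, "model_b": 0.70, "model_c": 0.65}
best_zero_shot = rank_models(zero_shot)[0]
few_shot_position = rank_models(few_shot).index(best_zero_shot) + 1

print(regression)
print(best_zero_shot, "is ranked", few_shot_position, "with examples")
```

A fixed absolute threshold is the simplest possible rule; a real detector would probably also need to account for run-to-run noise before flagging a collapse.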