All activity
rieu le left a comment
Here are a few extra thoughts for anyone curious about what’s happening behind the scenes, or the broader bet we're placing:

Benchmarks vs. taste: Simon Willison’s one-liner, “draw a pelican on a bike”, revealed more about multimodal model quirks than most formal benchmark suites. At the same time, Andrej Karpathy notes that random crowds often can’t reliably pick the better output. Predict is...

Recall Predict
Ungameable, community-powered AI benchmarks
