Something we've noticed is that AI answers aren't usually wrong; they're just uneven. Parts are solid, parts are hand-wavy, and it's not always obvious which is which.
We've been experimenting with a workflow where multiple models answer the same prompt, review and score each other's responses, and then we combine the strongest parts into a single result.
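To make the merge step concrete, here's a rough Python sketch of that loop. It's an illustration under assumptions, not a finished pipeline. It assumes each answer has already been split into named sections (say, one per sub-question). Each model is stubbed as a plain function that returns a numeric score for a section. The names `Response`, `Reviewer`, `peer_scores`, and `combine` are hypothetical. A model never grades its own answer. For each section, the version with the highest mean score from the *other* models is kept.

```python
from statistics import mean
from typing import Callable, Dict

# Hypothetical types: a response maps section name -> section text,
# and a reviewer takes (section name, section text) and returns a score.
Response = Dict[str, str]
Reviewer = Callable[[str, str], float]


def peer_scores(responses: Dict[str, Response],
                reviewers: Dict[str, Reviewer]) -> Dict[str, Dict[str, float]]:
    """Mean score each model's sections receive from every *other* model."""
    scores: Dict[str, Dict[str, float]] = {}
    for author, response in responses.items():
        scores[author] = {}
        for section, text in response.items():
            votes = [review(section, text)
                     for name, review in reviewers.items()
                     if name != author]  # skip self-review
            scores[author][section] = mean(votes)
    return scores


def combine(responses: Dict[str, Response],
            reviewers: Dict[str, Reviewer]) -> Response:
    """For each section, keep the version with the highest mean peer score."""
    if len(reviewers) < 2:
        raise ValueError("peer review needs at least two models")
    scores = peer_scores(responses, reviewers)
    sections = {s for r in responses.values() for s in r}
    merged: Response = {}
    for section in sorted(sections):
        candidates = [a for a in responses if section in responses[a]]
        best = max(candidates, key=lambda a: scores[a][section])
        merged[section] = responses[best][section]
    return merged


if __name__ == "__main__":
    responses = {
        "model_a": {"summary": "Short.", "details": "A long, careful explanation."},
        "model_b": {"summary": "A clear, complete summary.", "details": "Vague."},
    }
    # Stand-in reviewers: score by text length instead of calling a real model.
    reviewers = {
        "model_a": lambda section, text: float(len(text)),
        "model_b": lambda section, text: float(len(text)),
    }
    print(combine(responses, reviewers))
```

With these stubs, the merged result takes `details` from `model_a` and `summary` from `model_b`. In practice the length-based lambdas would be replaced by actual model calls that return a score. Splitting answers into sections is the hard part we're glossing over here.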