I use Gemini by Google almost daily, especially for research and structuring the flow of my papers. It helps me organize ideas, clarify concepts, and get quick explanations without breaking my momentum. Whether I'm outlining sections, exploring unfamiliar topics, or refining my writing, Gemini consistently gives clear and useful responses. It also saves me a lot of time I’d normally spend switching between tabs or searching for references.
The interface is simple and responsive, which makes it easy to work with. At this point, it’s become one of the tools I naturally reach for in my research workflow.
Humans in the Loop
The AI race continues. OpenAI launched GPT-5.3-Codex two weeks ago. Anthropic followed with Sonnet 4.6 this week. And Google? They just announced @Gemini 3.1 Pro, "a smarter, more capable model for complex problem-solving."
Available in products like @Google AI Studio, @Kilo Code, and @Raycast.
Game on!
VYVE
@fmerian I'm now enjoying doing so much of my research work in Gemini, things I used to rely on Deep Research for. It's like I'm swinging between capabilities.
@peter_albert nailed it. I'm running Gemini models in production for Aitinery (an AI travel planner) and this is exactly the gap.
Benchmarks say Gemini is world-class. My production logs say it sometimes hallucinates restaurant names that don't exist and occasionally generates itineraries with 16-hour driving days. Benchmarks don't test "can this model reliably plan a family trip to Puglia without suggesting a 3am dinner reservation?"
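To make that concrete, here's a simplified sketch of the kind of sanity checks we bolt on after generation. The names, thresholds, and data shapes are illustrative placeholders, not our actual production code:

```python
from dataclasses import dataclass

@dataclass
class Stop:
    name: str
    drive_hours: float       # driving time to reach this stop
    reservation_hour: int    # 24h clock, e.g. 21 for a 9pm booking

# Illustrative thresholds, not tuned production values.
MAX_DAILY_DRIVE_HOURS = 8.0
DINNER_WINDOW = range(17, 23)   # bookings outside 5pm-10pm get flagged

def validate_day(stops: list[Stop], known_places: set[str]) -> list[str]:
    """Return human-readable problems with one day of a generated itinerary."""
    problems = []
    total_drive = sum(s.drive_hours for s in stops)
    if total_drive > MAX_DAILY_DRIVE_HOURS:
        problems.append(f"{total_drive:.1f}h of driving in a single day")
    for s in stops:
        # Guard against hallucinated venues: require a hit in a real
        # places database (reduced to a plain set of names here).
        if s.name not in known_places:
            problems.append(f"unknown venue: {s.name}")
        if s.reservation_hour not in DINNER_WINDOW:
            problems.append(f"{s.name} booked at {s.reservation_hour}:00")
    return problems
```

A benchmark never runs a check like this; our users effectively do, on every trip.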
That said — 3.1 Pro feels like Google is finally closing the gap between benchmark performance and real-world agentic reliability. The reasoning improvements matter more for agent builders than the raw intelligence bump.
The uncomfortable truth about the AI model race: for 95% of real applications, the difference between GPT-5.3, Sonnet 4.6, and Gemini 3.1 Pro is negligible. What matters is reliability, cost, and speed — not who wins on ARC-AGI-2.
Curious to see how 3.1 Pro handles multi-step planning tasks. That's where Gemini has historically struggled compared to Claude for agentic workflows.
Gemini is always good at benchmarks, but usually not great at agentic behaviour. The models behave in very weird ways, almost like the Gemini team isn't really testing them themselves.
I can't keep using Antigravity: there's no update available, and I can't use the previous model.
Hey there, congrats on this launch!!
For SaaS use cases involving long-context multimodal inputs (e.g., analyzing full user-uploaded PDFs + screenshots + code snippets to generate UI code, migration scripts, or automated test plans), what's the practical sweet spot you've seen for token efficiency and accuracy at the 200k–1M range?
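For context, here's a rough sketch of how I probe that today with the google-genai Python SDK, counting tokens before committing to a full generation call. The file names and the model id are placeholders:

```python
from google import genai

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# Placeholder inputs: a user-uploaded PDF plus a screenshot.
pdf = client.files.upload(file="user_spec.pdf")
screenshot = client.files.upload(file="dashboard.png")
prompt = "Draft a migration script for the schema described in the PDF."

# Measure the prompt's footprint up front, so you know where you land
# in the 200k-1M window before paying for the generation call.
usage = client.models.count_tokens(
    model="gemini-2.5-pro",  # placeholder id; swap in 3.1 Pro when available
    contents=[pdf, screenshot, prompt],
)
print(f"Prompt would consume {usage.total_tokens} tokens")
```

Counting first also makes it easy to decide when to summarize or chunk the PDF instead of sending it whole.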
Nice benchmark numbers. My concern is always the gap between benchmarks and the actual developer experience. I use Claude primarily for coding because, in my experience, it follows instructions pretty closely (though there's always room for improvement). Gemini has historically been frustrating for me, inserting comments and refactoring code I didn't ask it to touch. Would love to hear from anyone who's tested 3.1 Pro on real coding workflows, not benchmarks, and whether that's actually improved.
vibecoder.date
Does Google read these?
I'll give it a shot in Gemini CLI and see what's up.