What is actually a “complex problem” for LLMs?

by Hassan Jahan

I keep seeing advice like “use this model for the easy stuff and that one for complex problems.” But it makes me wonder — what really counts as a complex problem for an LLM?

For us, complex usually means lots of steps, deep reasoning, or tricky knowledge. But for AI, the definition might be different. Some things that feel easy for us can be surprisingly hard for models, while things that seem tough for us (like scanning huge datasets quickly) might be trivial for them.

So, I’m curious — how do you think about this? What do you consider a “complex problem” when working with LLMs?

Replies

Ahmad

Love this question 🔥. I’ve noticed the mismatch too — for humans, “complex” often means multi-step reasoning or “keep a lot of context in working memory.” For LLMs though, that’s exactly where things start to break down — they lose track, hallucinate, or collapse under ambiguity.

Meanwhile, things we’d call “tedious but simple” (summarizing 200 pages, parsing logs, generating 50 variations of a design brief) are effortless for them.

For me, a “complex problem” for an LLM is one where clarity, context, and structure are fuzzy. Anytime nuance, conflicting goals, or hidden assumptions come in, it becomes less about raw compute and more about whether the model really understands what you mean.

Curious — do you think prompt engineering (framing problems in a structured way) reduces complexity for LLMs? Or is it more about model capability itself?
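To make that concrete, here’s a rough sketch of what I mean by “framing problems in a structured way” - plain Python string templating, with a placeholder ask() standing in for whatever model API you use, so treat it as an illustration rather than a specific integration:

```python
# Sketch: the same request framed two ways. ask() is a placeholder
# for an actual LLM call, not any real API.

def ask(prompt: str) -> str:
    """Placeholder for a real model call."""
    raise NotImplementedError

# Unstructured: the model has to guess scope, priorities, and output format.
vague = "Tell me what's wrong with this design brief and fix it."

# Structured: goal, constraints, and output format are explicit,
# so less of the "complexity" is left for the model to infer.
structured = """
Role: You are reviewing a design brief for internal consistency.

Task:
1. List any conflicting goals (max 5 bullets).
2. List hidden assumptions the brief relies on.
3. Propose one revised objective statement (under 40 words).

Constraints:
- Do not invent requirements that are not in the brief.
- If something is ambiguous, say so instead of guessing.

Brief:
{brief}
""".strip()

brief_text = "..."  # the actual brief would go here
# answer = ask(structured.format(brief=brief_text))
```

My hunch is that structure like this shrinks the fuzzy part of the problem, but it doesn’t add reasoning the model doesn’t have - which is why I’m curious where others draw the line.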

Jason Gelsomino

@ahmad63 I think it largely comes down to both the prompt and the context in which it’s given. At the same time, the model itself has to be designed to handle that context. For instance, I wouldn’t expect Gemini 2.5-Pro to manage the same level of complexity as OpenAI o1-Pro. Gemini is optimized for speed and efficiency, not deep reasoning. So even with the most comprehensive prompt, it will still struggle when the task requires complex reasoning.

Marcello Cultrera

@ahmad63  @jason_gelsomino Exactly. That’s where architectural intent really matters.

A model’s ability to handle complexity rests on how well it encodes structure, adapts to ambiguity, and maintains coherence across steps. You can throw a perfect prompt at a speed-optimized model, but if it lacks semantic scaffolding or reasoning depth, it’ll still misinterpret edge cases or, worse, drift into entropy.

Ahmad

@jason_gelsomino  @marcello_cultrera1 Beautifully said — “semantic scaffolding” is exactly it. You can feel when a model has structure beneath the surface versus when it’s just guessing step by step.

What’s interesting is how some newer models seem to simulate coherence without truly reasoning — they’re great at flow, but not always at logic. Makes you wonder how much of reasoning is structure vs illusion.

Jason Gelsomino

@ahmad63  @marcello_cultrera1 Absolutely agree. When a model is architected with clear intent and design principles for structure and reasoning, it’s better equipped to interpret ambiguity and maintain logic even as it handles complex, multi-step tasks. Semantic scaffolding isn’t just about responding to input - it’s about creating internal connections, sustaining context through the entire chain, and reliably resolving edge cases. Great dialog.

Ahmad

@jason_gelsomino That’s a great point — context and architecture really do set the ceiling for what a model can do, no matter how strong the prompt is.

I’ve noticed the same: Gemini feels lighter and faster but trips up when the reasoning chain gets deep. It’s almost like comparing a sprinter to a marathon runner — both great, just built for completely different terrains.

Jason Gelsomino

@ahmad63 That's a perfect analogy for comparing models...

Sanskar Yadav

A truly complex problem for LLMs is surely anything with multi-step reasoning, ambiguity, or lots of context to manage. They’re great at repetitive tasks, but struggle when nuance, judgment, or deep logic is involved, even with agentic AI in the loop. Structured prompts help, but models can still get tripped up by messy, real-world scenarios.

Andrej Good

this is such a good question 🤔 i’ve actually categorized over 125 AI models for Onada.ai and had to ask that exact same question. for me, complex problems for LLMs usually involve multiple reasoning steps, keeping track of context over a long conversation, or mixing knowledge from different domains.

also things that feel obvious to us, like understanding sarcasm or subtle nuance, can totally trip them up. combining info in the right way is where they struggle. basically, the harder it is to ‘connect the dots’ properly, the more likely it’s complex for an LLM.

Jason Gelsomino

A problem is “complex” for an LLM in vibe coding when it requires long-term coherence, strict constraints, or deep cross-domain integration. These are things humans handle through preparation, careful design, and iteration, while an LLM working generatively can struggle to capture all of those interdependencies in a single pass.
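A rough sketch of what that looks like in code - generate, check against explicit constraints, feed the failures back in - instead of hoping a single pass covers everything. The generate callable and the naive keyword check below are placeholders, not any specific tool:

```python
# Sketch of a generate -> validate -> refine loop instead of a single pass.
# "generate" stands in for any LLM call; the constraint check is deliberately naive.

from typing import Callable, List

def violations(draft: str, constraints: List[str]) -> List[str]:
    """Flag any constraint keyword the draft never mentions.
    A real check would be linters, tests, or schema validators."""
    return [c for c in constraints if c.lower() not in draft.lower()]

def refine(task: str,
           constraints: List[str],
           generate: Callable[[str], str],
           max_passes: int = 3) -> str:
    prompt = f"{task}\nConstraints:\n" + "\n".join(f"- {c}" for c in constraints)
    draft = generate(prompt)
    for _ in range(max_passes):
        missing = violations(draft, constraints)
        if not missing:
            break  # every constraint is at least addressed
        # Feed only the failures back, keeping each pass short and focused.
        prompt = (f"Revise the draft below. It currently ignores: {missing}\n\n"
                  f"Draft:\n{draft}")
        draft = generate(prompt)
    return draft
```

The interdependencies still live in the constraint list and the checker - in the human-designed scaffolding - which is kind of the point.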

Marcello Cultrera

@jason_gelsomino Generative fluency with structural awareness; models that don’t just vibe with the prompt, but scaffold meaning across time, modality and intent. That’s the frontier I’m quite excited about and something we're building at canvaseight.io

Marcello Cultrera

Hi @cyberiaa, complexity in human terms means multi-step reasoning, abstract logic, and contextual variation, but this isn’t always what’s complex for a language model.

LLMs excel at pattern recognition across vast datasets, so tasks like summarising thousands of documents or generating boilerplate code can be trivial. But things we consider simple, like understanding sarcasm, preserving intent across a UI flow, or maintaining accessibility in generated code, can be surprisingly brittle.
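A tiny sketch of the kind of validation scaffolding I mean for that last case - checking model-generated markup for accessibility basics after the fact, rather than trusting the model to preserve them. Python standard library only, and intentionally crude:

```python
# Sketch: post-hoc accessibility check on model-generated HTML.
# Standard library only; a real pipeline would use a proper a11y linter.

from html.parser import HTMLParser

class A11yCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            self.issues.append("img without alt text")
        if tag == "button" and not (attrs.get("aria-label") or attrs.get("title")):
            # crude: a fuller check would also look at the button's text content
            self.issues.append("button without an accessible label")

generated = '<div><img src="hero.png"><button class="cta"></button></div>'
checker = A11yCheck()
checker.feed(generated)
print(checker.issues)  # ['img without alt text', 'button without an accessible label']
```

The same idea applies to intent preservation: encode the intent as a check the output has to pass, instead of hoping it survives generation.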

When LLMs approach complexity at the edge-case level, they require extensive validation, careful scaffolding and deep semantic awareness - parameterised and fine-tuned to preserve intent, handle those edge cases, and maintain structural integrity across dynamic inputs and real-world content.

For me, a “complex problem” in the context of LLMs is one where success depends on semantic fidelity, intent preservation, and multi-modal alignment.

This is a personal challenge I’ve spent years solving through code generators and infrastructure-native design systems.

Models still struggle to reason through ambiguity, adapt to edge cases or maintain structure across changing inputs.

That’s why the newer, more efficient models will focus on smarter scaffolding, better interfaces, and systems that encode meaning, not just syntax.