Human cognition has the same limitations we criticise in LLMs

The parallels between LLM cognition and human thinking are more extensive than we’d like to admit, and the strength of our denials might tell us more about human psychology than any technical specification ever could.

The discourse around AI and human cognition seems to have settled into a familiar pattern. Academics, technologists, and knowledge workers are quick to explain why LLMs are fundamentally different from human thinking. They don’t really learn, we insist (Riemer & Peter, 2025). Or, they don’t understand the world (Boles, 2025). Or, it’s all statistical correlation without reasoning (Bender, et al., 2019). The subtext is clear: whatever these systems are doing, it’s categorically different from what happens in human minds.

In this post I’m not concerned with disproving those claims, or with denying that the differences matter. I want to invert the question: rather than describing all the ways that LLMs fail to measure up to human cognition, I want to use the terminology of language models to explore the similarities between AI and human thinking. The aim is not to claim that human brains literally work like language models, but to explore what the similarities might reveal about our own cognitive architecture, and why we’re so invested in denying that they exist.

Context windows and the limits of working memory

LLMs have context windows: a finite amount of text they can attend to when generating responses. If a conversation runs longer than the window, the earliest content falls out of view and the model loses track of it, working only from the more recent context.

Humans do this constantly. We forget earlier parts of conversations. We lose the thread in long discussions. We ask “wait, what were we talking about?” when someone circles back to a point from twenty minutes ago. This isn’t a failing; it’s a constraint, because working memory is limited. We literally run out of space to hold all the relevant context, so we compress, discard, and prioritise recent information. The experience of cognitive overload in a complex discussion maps closely onto context window limitations.
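For the model side of this comparison, here is a minimal sketch of what “running out of context” can look like. It assumes a crude word count in place of a real tokeniser, and the function name and token budget are hypothetical; real systems handle overflow in a variety of ways.

```python
def fit_to_context(messages, max_tokens):
    """Keep the most recent messages that fit within a fixed token budget."""
    kept = []
    used = 0
    # Walk backwards from the newest message so recent context wins.
    for message in reversed(messages):
        cost = len(message.split())  # assumption: one word = one token
        if used + cost > max_tokens:
            break  # everything older than this point is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


conversation = [
    "we should start with the budget",
    "then talk about hiring",
    "and finally the launch date",
]
print(fit_to_context(conversation, max_tokens=10))
# ['then talk about hiring', 'and finally the launch date']
```

The opening point about the budget is the first thing to go, which is roughly what happens when someone circles back to it twenty minutes later.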

Training data quality and bias

LLMs are also criticised for bias in their training data. They reflect the patterns, prejudices, and blind spots present in the text they learned from. And they generalise confidently from non-representative samples.

This also describes most human expertise. Each of us is trained on wildly non-representative samples of human experience: specific family structure, cultural context, historical moment, socioeconomic position, geographical location, and so on. From this narrow training set, we confidently generalise to make claims about “how people are” or “how the world works.” The bias critique of LLMs is just a precise description of how human knowledge formation has always worked. Objective training data doesn’t exist, only whatever fragments of experience we happened to encounter.

Tokenisation and the structure of expertise

LLMs process language by breaking it into tokens: chunks of text, often whole words or word fragments, rather than individual characters. What counts as a “chunk” shapes how efficiently the model can process information.

Humans do this too, and your expertise level changes your tokeniser. A novice piano student sees individual notes on a page; an expert sees chord progressions and phrases as single perceptual units. Chess masters famously perceive board positions as meaningful configurations rather than individual piece placements. This isn’t just about pattern recognition; it’s about how information gets chunked for processing.

Similarly, jargon isn’t just shorthand. It’s literally more efficient tokenisation for domain experts. Reading “CEO” consumes less cognitive effort than processing “Chief Executive Officer” because we’ve compressed it into a single retrievable unit. When you encounter unfamiliar technical terminology, you’re forced to process it more granularly — letter by letter or syllable by syllable — which is why jargon is genuinely harder to process for outsiders. And different domains use different tokenisers, carving up conceptual space in distinct ways.
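To make the chunking idea concrete, here is a toy greedy longest-match tokeniser. It is a simplification: real tokenisers typically learn their vocabulary from data, and the two vocabularies below are invented purely for illustration.

```python
def tokenise(text, vocabulary):
    """At each position take the longest known chunk, else a single character."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one character at a time
        # Try the longest candidate first, down to two characters.
        for j in range(len(text), i + 1, -1):
            if text[i:j] in vocabulary:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens


novice_vocabulary = {"token"}       # hypothetical: knows only the common word
expert_vocabulary = {"tokeniser"}   # hypothetical: knows the jargon as one unit

print(tokenise("tokeniser", novice_vocabulary))  # ['token', 'i', 's', 'e', 'r']
print(tokenise("tokeniser", expert_vocabulary))  # ['tokeniser']
```

The same string costs the novice vocabulary five units and the expert vocabulary one, which is the sense in which jargon is cheaper for insiders.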

Temperature and the control of randomness

LLMs have a “temperature” parameter that controls randomness in their outputs. Low temperature produces conservative, predictable responses, while high temperature introduces more variation and creativity, at the cost of occasional incoherence.

Human cognition exhibits the same dynamic. People operating in high-stakes environments (e.g. exams, job interviews, or formal presentations) demonstrably reduce their cognitive “temperature.” We become more conservative, more predictable, more risk-averse in our thinking, and we stick to safe, well-rehearsed responses.

And more creative work requires us to increase randomness. Brainstorming sessions, experimental art, and theoretical speculation all involve consciously loosening cognitive constraints, allowing more unusual combinations and associations. We even have techniques for this: free writing, lateral thinking exercises, and deliberately looking for strange analogies. We’re just manually adjusting our temperature parameter.
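On the model side, temperature is usually applied by dividing the model's raw scores by the temperature before converting them to probabilities with a softmax. The sketch below assumes three made-up scores for three candidate next words; the function name is hypothetical.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


scores = [2.0, 1.0, 0.0]  # assumption: invented scores for three candidate words

cautious = softmax_with_temperature(scores, temperature=0.5)
playful = softmax_with_temperature(scores, temperature=2.0)

# Low temperature concentrates probability on the top candidate;
# high temperature spreads it across the alternatives.
print([round(p, 2) for p in cautious])
print([round(p, 2) for p in playful])
```

At low temperature the safest option dominates, as in an exam or interview; at high temperature the unlikely options get a real chance, as in a brainstorm.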

Hallucination is a feature of memory

LLMs “hallucinate” by generating plausible-sounding information that isn’t actually true, filling gaps in their knowledge with convincing fabrications.

And human memory works in much the same way. We are all notoriously unreliable witnesses, confidently recalling events that never happened and filling gaps with plausible details, all the while completely unaware we’re confabulating. Who said what, when things happened, what was present in a scene: we misremember these details all the time. And the confidence with which we recall fabricated details is indistinguishable from the confidence we have in genuine memories.

System prompts we can’t access

LLMs operate under system prompts: invisible instructions that shape their responses without appearing in the conversation. These hidden constraints determine what they consider appropriate to say.

Humans have these too: cultural norms, professional conditioning, childhood socialisation, unexamined assumptions about what’s acceptable to express. When you think “I couldn’t possibly say that” in response to a thought, you’re often responding to hidden system-level constraints you didn’t consciously choose and may not even be aware of. These invisible instructions shape what we think is thinkable, sayable, and appropriate, and we have no access to view or modify them directly.

Pattern matching without causal understanding

A common critique: LLMs are “just” pattern matchers. They identify statistical regularities without genuine causal understanding, confusing correlation with causation.

Humans do this constantly. Most human reasoning is post-hoc rationalisation. We arrive at conclusions through pattern matching and then construct causal stories to explain them. Split-brain experiments demonstrate people confidently explaining decisions they didn’t consciously make. We confabulate reasons for our choices after the pattern matching has already occurred. Superstition, conspiracy theories, spurious medical beliefs, and false historical narratives all emerge from the same pattern-matching capabilities that produce genuine insights.

What the resistance reveals

If the similarities are this extensive, why do we resist them so fiercely? Why the insistence that LLMs are fundamentally, categorically different?

Three possibilities come to mind, each more uncomfortable than the previous.

Maybe we don’t “really understand” either. When we insist LLMs lack true understanding, we’re assuming we possess it. But if you push someone to define what understanding actually is (beyond appeals to subjective feeling or consciousness), many will struggle. We can’t clearly articulate the difference between our pattern matching and the pattern matching of language models. The understanding we claim to have might just be another pattern we’ve learned to recognise, rather than a categorically different phenomenon.

Maybe our expertise is less special than we thought. If LLMs can perform cognitive work previously reserved for trained professionals (e.g. writing, analysis, synthesis, or problem-solving), what unique value do knowledge workers provide? The resistance to AI seems to be strongest precisely among those whose professional identity depends on cognitive uniqueness. This is why, when someone proclaims that “AI will never replace X,” I tend to hear “my professional identity requires that AI not replace X.”

Maybe cognitive uniqueness was never a stable foundation for human moral status. We’ve used our supposedly special intelligence to justify everything from environmental exploitation to our treatment of other species. If intelligence isn’t the clean categorical boundary we thought, the entire edifice starts to wobble.

As David Wiley has noted, the dismissive framing we use to diminish AI capabilities — they “just do prediction” — seems to reveal our own insecurities. For a while, people used the phrase “stochastic parrot” in the same way. What I find fascinating is that human cognition seems to operate on similar principles. We produce conventional responses in predictable contexts and we seem to be fluent pattern matchers who’ve internalised statistical regularities about what tends to follow what.

A familiar deflation

This isn’t the first time human exceptionalism has been challenged. Copernicus moved us away from the cosmic centre. Darwin revealed we weren’t specially created. Freud argued we weren’t even in conscious control of our own minds. Each of these deflations met fierce, emotional resistance, not because the evidence was weak, but because the psychological stakes were enormous.

This LLM moment might be another step in this trajectory. Not because these systems are conscious or possess “real” intelligence in some special sense, but because they reveal that many of the capabilities we thought required consciousness or special intelligence can emerge from “mere” pattern matching and statistical correlation.

We’re pattern-matching, probability-distributing, context-dependent generators of plausible outputs. We’ve just had millions of years to optimise the architecture and we’re running on remarkably efficient biological hardware.

This doesn’t diminish human value unless we predicated that value entirely on cognitive uniqueness. But it does suggest we might need better foundations for what makes humans morally considerable. Relationality, perhaps? Vulnerability or the capacity to suffer? Our embeddedness in communities and ecosystems? These might all be sturdier grounds than raw intelligence for the question of why we matter.

For anyone navigating AI integration in their institution, this has real practical relevance. Those who can sit with the discomfort of similarity will better understand how to deploy these tools effectively, how to support people through transition, and where genuine human judgement remains essential. Those who remain invested in trying to prove the existence of fundamental differences between humans and large language models will miss strategic opportunities because they’re defending professional identity rather than assessing capability.

The question isn’t whether AI thinks like us. The question is whether we’ve been thinking like AI all along. And what becomes possible when we stop defending against that recognition.

References