In 2017, a team at Google published a paper that would reshape artificial intelligence. "Attention Is All You Need" introduced the Transformer architecture, and with it, a bold epistemological claim masquerading as an engineering title. Eight years and several trillion parameters later, we're living in the world that paper built—a world of large language models that can write poetry, debug code, and pass the bar exam.

And yet.

Something essential is missing. Not from the engineering; that has been spectacularly successful. The gap is philosophical: the unexamined assumption that attention mechanisms, scaled far enough, will eventually yield understanding. They won't. And the reason they won't is something philosophers have articulated for centuries, even if AI researchers have been too busy scaling to notice.

The Attention Illusion

Let's be precise about what attention mechanisms actually do. In a Transformer, attention is a learned weighting function that determines how much each token in a sequence should influence every other token. It's correlation discovery at scale. The model learns which words tend to co-occur, which syntactic patterns predict which semantic relationships, which contextual cues suggest which completions.
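
For concreteness, here is that weighting function in miniature: a NumPy sketch of the scaled dot-product attention the paper defines. Variable names and shapes are illustrative, not lifted from any particular implementation.

```python
# Scaled dot-product attention, sketched with NumPy.
import numpy as np

def attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v)."""
    d_k = Q.shape[-1]
    # How strongly each token's query matches every other token's key.
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_len, seq_len)
    # Softmax turns those scores into weights over the sequence.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mixture of the value vectors.
    return weights @ V                            # (seq_len, d_v)
```

Every output is a re-weighted mixture of other tokens' representations. That is the entire operation.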

Here's the sleight of hand: the paper's title suggests that attention is all you need for understanding. But what Transformers actually demonstrate is that attention is all you need for prediction. These are not the same thing. A sufficiently sophisticated autocomplete system can simulate understanding with such fidelity that we mistake the simulation for the real thing. But simulation and instantiation remain categorically distinct.

Consider: a Transformer trained on every physics textbook ever written can produce polished, textbook-fluent explanations of quantum mechanics. But does it understand quantum mechanics? Does it grasp why the measurement problem is philosophically disturbing? Does it experience the vertigo of confronting non-locality? Or is it simply doing very sophisticated interpolation across its training distribution?

The honest answer is: we don't know. And we don't know because we haven't bothered to define what understanding would even mean in this context. We've been so dazzled by the outputs that we've neglected to ask what's actually happening—or not happening—inside.

The Hard Problem Resurfaces

Philosophers of mind will recognize this territory. David Chalmers famously distinguished between the "easy problems" of consciousness—explaining cognitive functions, behaviors, reportability—and the "hard problem": explaining why there is subjective experience at all. Why does information processing feel like anything from the inside?

AI research has been spectacularly successful at the easy problems. Language models demonstrate sophisticated functional capabilities. But the hard problem lurks beneath every benchmark: is there any "inside" at all? And if not, can systems without subjective experience ever genuinely understand, or only simulate understanding?

This isn't mysticism; it's philosophy of mind 101. John Searle's Chinese Room argument made the point decades ago: syntactic manipulation, however sophisticated, doesn't yield semantic understanding. A system that perfectly manipulates symbols according to rules needn't understand what those symbols mean.

The AI community's response has largely been to ignore the argument or declare it irrelevant to engineering. But the question of whether understanding requires consciousness isn't merely academic—it determines whether our current approach can ever succeed at what we claim to want.

The Epistemology We Forgot

Western AI research proceeds as if epistemology were a solved problem. Feed the model enough data, tune the loss function, scale the parameters, and knowledge will emerge. This is naive empiricism dressed in mathematical finery.

Consider what genuine knowledge acquisition requires:

Perception: Direct sensory experience. Note that this requires a perceiver—an experiencing subject, not just a sensor array. A camera captures images; a mind sees.

Inference: Logical deduction from observed facts. The classic example: seeing smoke and inferring fire. This isn't pattern matching; it's understanding causal relationships. It requires grasping why smoke implies fire, not just that they correlate.

Testimony: Knowledge obtained through language, particularly authoritative transmission. This is most relevant for language models—it's what they purport to do. But genuine testimony requires a speaker with intention and a listener with comprehension. It's not just information transfer; it's meaning transfer.

A Transformer processes testimony statistically. It learns the distribution of linguistic forms without access to the intentions behind them or the comprehension that would verify them. It's like a brilliant alien cryptographer who has decoded the patterns of human language without ever understanding that language refers to anything at all.

The Meaning Problem

Language philosophers have long recognized multiple layers of meaning:

Literal meaning: What words denote according to convention. "The cat is on the mat" refers to a particular spatial relationship.

Figurative meaning: What's conveyed when literal interpretation fails. "He has a heart of stone" doesn't describe cardiology.

Implied meaning: What's suggested but not said—the spaces between words, the resonances and implications that depend on shared understanding.

A language model can approximate literal meaning reasonably well—that's what embeddings capture. It can sometimes stumble into figurative meaning through pattern recognition. But implied meaning—the dimension of suggestion, of what's meant but not said—requires what we might call a "sympathetic reader": a receiver who can resonate with the sender's intention.

Attention heads, however numerous, don't resonate; they calculate. They detect statistical regularities, not intentional meaning.
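
A toy illustration makes the split concrete. The vectors below are invented stand-ins for learned embeddings (a real model would supply its own), but the logic is the same: literal similarity is geometry, and geometry is silent about what a speaker means.

```python
# Invented word vectors standing in for learned embeddings;
# only the geometry matters for this illustration.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = {
    "heart":   np.array([0.9, 0.2, 0.1]),
    "cardiac": np.array([0.8, 0.3, 0.2]),
    "stone":   np.array([0.1, 0.9, 0.4]),
}

print(cosine(emb["heart"], emb["cardiac"]))  # high: close in literal, denotational space
print(cosine(emb["heart"], emb["stone"]))    # low: literally unrelated
# Nothing in these numbers says that "a heart of stone" is a claim about
# character rather than cardiology; that reading lives with a reader who
# shares the speaker's intention.
```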

From Pattern to Meaning: The Case for Semantic Architecture

This isn't merely philosophical musing. I've spent three decades in information architecture, and the last several years focused specifically on domain-specific languages and what I've come to call Semantic AI Agents.

When you design a domain-specific language, you're not just creating syntax. You're defining what exists in the domain, how entities relate, what operations are meaningful. You're encoding semantics directly, not hoping they'll emerge from sufficient data. The grammar itself carries meaning because it was designed by minds that understand the domain.

This is why DSLs can do with hundreds of rules what LLMs struggle to do with billions of parameters: they encode human understanding in executable form. They're not simulating comprehension; they're instantiating it.

Consider the difference between asking GPT to validate a legal contract and running that contract through a DSL built on deontic logic—on the formal semantics of obligation, permission, and prohibition. The LLM can tell you if the contract sounds right. The DSL can tell you if it is right, because it operates on the actual structure of legal meaning.
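
To make the contrast tangible, here is a deliberately tiny sketch of what a deontic check can look like. The names (Clause, Modality, check_conflicts) are hypothetical, invented for this example rather than drawn from any real system, and a production DSL would rest on a much richer formal semantics. The point is where the check operates: on declared obligations, permissions, and prohibitions, not on how a clause happens to be worded.

```python
# A toy deontic-logic check over contract clauses.
from dataclasses import dataclass
from enum import Enum, auto

class Modality(Enum):
    OBLIGATION = auto()   # the party must perform the action
    PERMISSION = auto()   # the party may perform the action
    PROHIBITION = auto()  # the party must not perform the action

@dataclass(frozen=True)
class Clause:
    party: str
    action: str
    modality: Modality

def check_conflicts(clauses):
    """Return pairs of clauses that are deontically inconsistent."""
    conflicts = []
    for i, a in enumerate(clauses):
        for b in clauses[i + 1:]:
            if a.party == b.party and a.action == b.action:
                modalities = {a.modality, b.modality}
                # Obligation or permission to do X cannot coexist with
                # a prohibition on X for the same party.
                if Modality.PROHIBITION in modalities and len(modalities) > 1:
                    conflicts.append((a, b))
    return conflicts

contract = [
    Clause("Supplier", "deliver goods by June 1", Modality.OBLIGATION),
    Clause("Supplier", "deliver goods by June 1", Modality.PROHIBITION),
    Clause("Buyer", "assign the contract", Modality.PERMISSION),
]

for a, b in check_conflicts(contract):
    print(f"Conflict for {a.party}, {a.action!r}: "
          f"{a.modality.name} vs {b.modality.name}")
```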

This is the future of AI that actually matters: not bigger attention mechanisms but smarter semantic architecture. Not more parameters but deeper understanding. Not correlation at scale but meaning by design.

The Consciousness Question

Some will object that I'm sneaking metaphysics into engineering. Fair enough. But consider: physicists have grappled with the role of the observer in quantum mechanics for a century, and the measurement problem remains genuinely unsolved.

Before measurement, a quantum system exists in superposition: a weighted combination of possible states, whose amplitudes fix the probability of each outcome. Upon measurement, this superposition "collapses" into a definite outcome. But what constitutes a measurement? The equations don't tell us.
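
In textbook notation (a generic two-state example, not tied to any particular system), the pre-measurement state is

$$
\lvert \psi \rangle = \alpha\,\lvert 0 \rangle + \beta\,\lvert 1 \rangle,
\qquad \lvert \alpha \rvert^{2} + \lvert \beta \rvert^{2} = 1,
$$

and a measurement returns 0 with probability |α|² or 1 with probability |β|², leaving the system in the corresponding definite state. The formalism fixes those probabilities exactly and says nothing about which physical interactions count as the measurement that forces the choice.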

John Wheeler, one of the twentieth century's greatest physicists, proposed the "participatory universe"—the idea that observers are not passive witnesses to reality but active participants in its creation. "No phenomenon is a real phenomenon," he wrote, "until it is an observed phenomenon."

I'm not claiming this proves anything. Quantum mechanics is notoriously resistant to philosophical interpretation, and plenty of serious physicists reject consciousness-based explanations. But the parallel is suggestive: perhaps the reason we can't get from attention to understanding is that we're missing the one ingredient that might actually matter.

Understanding isn't passive pattern detection. It's active participation. It requires engagement, being implicated in what's understood.

What We Actually Need

So if attention is not all we need, what else is required?

Intention: Semantic AI agents need goals, not just training objectives. They need to want something, in some meaningful sense. Whether artificial systems can genuinely have intentions is an open question, but it's the right question—far more important than whether they can pass benchmarks.

Grounding: Meaning doesn't float free of reality. It requires connection to the world, to action, to consequence. A language model that has never seen a tree, touched water, or felt cold has at best a theoretical relationship to the words it processes. Embodiment matters.

Ontological commitment: The promiscuous pattern-matching of LLMs treats all correlations as potentially meaningful. Semantic systems need to commit to what exists, what matters, what's possible. This constraint isn't a limitation; it's a requirement for genuine understanding.

Participation: Wheeler was right. Observation isn't passive. Understanding requires engagement, requires being implicated in what's understood.


The title of the original Transformer paper was elegant marketing. It was also philosophically careless. Attention is a mechanism, and mechanisms don't yield meaning. You can't get semantics from syntax alone, no matter how much compute you throw at the problem.

What we actually need is a science of mind sophisticated enough to build minds—or humble enough to admit we can't. The philosophy of consciousness offers resources that AI research has ignored. Quantum mechanics offers puzzles that point in the same direction. And the craft of language engineering offers practical tools for encoding meaning directly rather than hoping it emerges from scale.


Attention gets you to the door.
Intention opens it.
Consciousness walks through.

Comments welcome.
