The verification trap

"Always verify AI" trains students to trust it more, not less

The instruction to check every AI output is meant to instil healthy scepticism. But as models improve, the errors that survive checking are precisely the ones students can’t catch — so the lesson they actually learn is that the tool is reliable. The mandate doesn’t build AI literacy; it runs an extinction schedule on the very disposition it claims to develop.

Use these tools, but always verify what they produce. Almost every institution now gives some version of that advice. Universities tell it to students, employers to staff, professional bodies to their members. And lately the message has only hardened: verification has become the reflexive answer to every anxiety about AI — whatever else you do, check the output.

The advice has nagged at me for a while, because it doesn’t square with my own experience. Verifying AI output is hard, slow, and often inconclusive, even in domains where I’m confident I can judge the work. If checking is that demanding for someone with expertise, what are we really asking of a student we tell to verify everything?

So I wanted to take the advice at its word and push it to its limit: follow “always verify” all the way, watch what happens as the models improve, and see what breaks. This post is that exploration.

The advice is doing two jobs at once, and the second is where it breaks. The stated job is to build a skill — checking. The unstated job, the one nobody writes into the policy because writing it down would expose how thin it is, is a hope: that through repeated verification, students will discover for themselves that AI can’t be trusted. The checking is meant to teach the distrust. That hope is self-defeating, and seeing why changes how we should think about AI literacy altogether.

The hope depends on getting caught

For verification to teach distrust, the errors have to be catchable. The student checks, finds the mistake, and learns the lesson: the tool is fallible, stay sceptical. This worked when the tools were obviously unreliable, because the errors were frequent and crude enough to surface on a careful read.

That condition is disappearing. As models improve, the errors that survive are the rare, well-formed ones — plausible, fluent, sitting in exactly the domains where independent checking was always hardest. A confident, wrong, beautifully written answer doesn’t announce itself. It looks like all the right ones.

Better models break the mechanism

Here is the bind. The better the model, the rarer the error a student can actually catch. So the occasions on which the “learn that it can’t be trusted” lesson can land become fewer and fewer.

Worse, every check that comes back clean teaches the opposite lesson. The student checks and finds nothing wrong, either because the model was right or because the error was one they couldn’t catch, and quietly updates towards trust. We set out to build calibrated scepticism and instead ran a training regime that, as the tool improves, increasingly teaches misplaced confidence, while we go on believing it teaches the reverse.

Why the disposition dies, not just the skill

The deeper failure isn’t about ability; it’s about appetite. To see it, consider why you trust your satnav without checking. Not because it’s accurate on average — because the feedback structure rewards trust: the outcome is observable, attributable, fast; a mistake costs you ten minutes. Trust is earned there because checking would so obviously be a waste.

The verification exercises we mandate have the inverse structure. The student checks, almost always finds nothing (the model’s good), gets no signal that checking mattered, and pays a real cost in effort. That is an extinction schedule. You are not strengthening a habit; you are punishing it until it fades. The disposition to verify withers precisely because the activity you mandated stops rewarding it.
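To make the extinction-schedule point concrete, here is a rough back-of-envelope sketch in Python. It is only an illustration: the function names and every number in it (error rates, catch probabilities, the value of a catch, the cost of a check) are assumptions of mine, not measurements. It computes how much a single check pays the student on average, and how many checks go by between rewarded ones.

```python
def expected_payoff_per_check(error_rate, catch_prob, value_of_catch, cost_of_check):
    """Average net reward a student experiences from one check.

    error_rate     -- share of outputs containing an error (hypothetical)
    catch_prob     -- chance the student spots an error that is there (hypothetical)
    value_of_catch -- felt benefit of catching one, in arbitrary units (hypothetical)
    cost_of_check  -- effort paid on every check, in the same units (hypothetical)
    """
    p_rewarded = error_rate * catch_prob  # only a caught error rewards the check
    return p_rewarded * value_of_catch - cost_of_check


def checks_between_rewards(error_rate, catch_prob):
    """Average number of checks per rewarded check (infinite if none ever pays off)."""
    p_rewarded = error_rate * catch_prob
    return float("inf") if p_rewarded == 0 else 1.0 / p_rewarded


if __name__ == "__main__":
    # Two made-up scenarios: a crude, error-prone tool and a much better one.
    scenarios = {
        "crude tool": {"error_rate": 0.20, "catch_prob": 0.80},
        "good tool": {"error_rate": 0.02, "catch_prob": 0.20},
    }
    for name, s in scenarios.items():
        payoff = expected_payoff_per_check(
            s["error_rate"], s["catch_prob"], value_of_catch=10, cost_of_check=1
        )
        gap = checks_between_rewards(s["error_rate"], s["catch_prob"])
        print(f"{name}: payoff per check {payoff:+.2f}, one reward every {gap:.0f} checks")
```

Under these invented numbers, checking the crude tool pays for itself on average, while checking the good tool is a steady net loss with a reward only once in a few hundred attempts. The exact figures don’t matter; the shape does. As both the error rate and the catchability fall, the same mandated activity slides from reinforced to punished.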

What verification is actually for

None of this means verification shouldn’t be taught. It means we’ve mistaken a ritual for a competence. Checking every output in the hope that disappointment breeds wisdom is not a skill — it’s a wish.

The skill worth teaching is narrower and harder: knowing which claims are independently checkable and which aren’t, and running real checks where the feedback structure actually rewards them. A student who can tell the difference between a claim they can verify and one they can only trust on provenance has something durable. A student trained to perform checking as a reflex, against a tool that rarely rewards it, has been taught a habit designed to die.

So the question for anyone writing AI guidance isn’t “should people verify?” It’s sharper, and more uncomfortable: does your verification exercise have a feedback structure that rewards checking — or are you quietly teaching the people who follow it that the machine is right?
