Fluency is not evidence of quality

AI-generated text is fluent: grammatically correct, well-structured, and coherent. And this is a problem, though not for the reason usually given.

The concern isn’t that fluent text signals AI authorship. It’s that readers take fluency as a signal of quality, as evidence that the underlying idea is sound.

Fluency says nothing about whether a claim is right or wrong, regardless of who or what generated it. A language model produces well-formed prose whether or not its content is accurate, well-reasoned, or even meaningful. The words cohere. The argument moves forward. Nothing on the surface signals what lies beneath.

The practical implication is that fluency needs to be filtered out of the evaluation entirely, treated as carrying no information about quality. Confident delivery is irrelevant to whether a speaker’s facts are correct; fluency is irrelevant to whether a claim is sound. The test isn’t “does this read well?” but “is this true?”

The response that’s emerged in AI literacy frameworks is to teach people to evaluate AI outputs: spotting hallucinations, checking citations, identifying logical gaps. I’ve made this argument myself, and it was reasonable at the time.

But it assumes the quality gap between AI and expert output is stable. It is not. The models available now are materially better than those available two years ago, and the trajectory points one way. What happens when AI output is PhD-level across every artefact we might care to measure: citations accurate, reasoning sound, evidence appropriately qualified?

Evaluating model output doesn’t work in that world. It’s a temporary fix that relies on a gap that is closing. Today an AI might hallucinate a reference and a user who doesn’t know the field will miss it; tomorrow there will be no hallucination to catch. The question isn’t how to identify current failure modes. It’s what you’ll do when those failure modes are gone.

Output quality is not a stable criterion. Treating fluency as a signal, at any level of quality, is a dead end.

Reading fluent prose without being swayed by its fluency runs against deep cognitive habits. The brain finds coherent text convincing. And that instinct isn’t irrational; in student writing especially, fluency was a reasonable trace of genuine engagement with ideas. A student who wrote clearly had usually thought clearly. That relationship no longer holds, and the instinct it trained is now a liability.

Fluency and output quality are no longer signals. They’re noise, and unusually convincing noise at that.