Designing AI out of assessment should be an academic offence

When AI is integral to authentic professional practice, assessment that excludes it is not rigorous — it tests performance in a professional context that no longer exists. The problem is not students who use AI, but educators who have designed assessments without accounting for the world their graduates are entering. Academic offences committees are investigating the wrong party.

Academic offences committees are busy. Across higher education, academics are reviewing submitted work for signs of AI involvement, triangulating output against detection software, and deliberating over whether a student used a language model to write their essay. The driving question is: did the student use AI to produce this artefact? But it’s the wrong question, and the fact that we’re asking it says something important about where we’ve located the problem.

We could argue that in contexts where AI is part of authentic professional practice, we have a moral obligation to require AI use, and that excluding it from assessment is a dereliction of our responsibility. In this version, the offence (if we want to use that word) belongs to the assessment designer. I’m deliberately taking the strongest form of this argument: the same logic that frames student AI use as academic misconduct, applied consistently, can point to the assessment designer instead.

The validity argument

A valid assessment tests what students will actually need to do once they graduate. This isn’t controversial; it is foundational to assessment theory, and health professions educators have long accepted it in other contexts. We design clinical placements, simulation labs, and case-based learning precisely because we accept that authentic preparation requires authentic conditions.

The question, then, is whether AI now belongs in the category of authentic professional conditions. I think, for most graduates entering health professions practice from now on, it does. AI tools are being integrated into clinical decision support, diagnostic reasoning, documentation, and knowledge synthesis at an accelerating pace. Designing assessments that prohibit AI is not protecting rigour. It is testing performance in a professional context that no longer exists. An invalid assessment is not rigorous, no matter which traffic light system you subscribe to.

If we accept this, then the educator who excludes AI from their assessment design without justification has produced an instrument that doesn’t measure what it claims to measure. And that’s a professional failure, not a principled position.

We’ve been here before

The inevitable objection is that using AI prevents students from developing foundational knowledge and threshold concepts. But this is precisely the argument made about calculators in mathematics education, about word processors in writing instruction, and about search engines in research practice. The idea that Google was making students intellectually lazy was, for a period, a genuinely held position in higher education.

What each of these transitions revealed was that the technology didn’t remove the need for foundational understanding; it changed the form that understanding needed to take. A student who can’t evaluate search results isn’t well served by being taught to use a card catalogue. A student who can’t think critically with AI isn’t well served by being taught to perform tasks that AI will do for them throughout their career. The premise that using AI and developing genuine understanding are mutually exclusive isn’t supported by evidence. It’s an assumption masquerading as a principle.

Foundational knowledge matters. Threshold concepts matter. The question is whether students can develop them through engagement with AI. The answer, in my experience, is that they can, provided the assessment is designed to make that engagement visible.

The artefact was always a poor proxy

Which brings us to the question that academic offences committees are structured to avoid: not what was produced, but how.

When I write an essay (here are some of them), the process takes at least ten hours, often spread across days or weeks. It involves sustained back-and-forth with Claude: exploring arguments, hitting dead ends, questioning assumptions, reading further, producing drafts and then more drafts. Notes accumulate and get edited. Relevant papers get pulled in. Positions get tested and sometimes abandoned when they can’t be defended. When the conversation has converged — framework sound, argument clear, small decisions settled — I ask Claude to write the essay. Then I read it in full, editing, correcting, and clarifying before asking Claude to write it again. By that point I have something I can stand behind, and I publish it.

That process is cognitively demanding. It is creative. Above all, it is critical. It requires me to defend positions under pressure and to concede when I cannot. The output is almost entirely AI-generated in a technical sense. But the thinking is distributed across Claude and me, and that engagement produces something sharper than I could have created independently.

Now consider someone who spends five minutes writing a single prompt and accepts whatever comes back. An offences committee looking at both outputs sees the same thing: an essay that AI produced. Our existing framework offers no way to distinguish between them, because the framework is looking at the wrong thing. AI hasn’t created this problem. It has made visible a limitation that was always present: the artefact was never a reliable window into the process of learning, and good assessment design has always needed to account for this.

Equity is a design condition, not a structural objection

The access argument deserves acknowledgement. Not every student can afford premium model subscriptions, and assessment frameworks that assume high-capability AI access risk compounding existing disadvantage. But this concern points to an institutional design responsibility, not an argument against AI-integrated assessment. Institutions already provision the infrastructure students need to complete assessed work: library access, software licences, and computing facilities. AI access belongs in the same category. Where institutions provide a sufficient baseline tool, the equity concern is addressed.

The goal is not to assess whether a student can extract the best possible output from the most capable model available. You don’t need to practise in a Porsche to pass your driving test; you can learn in a twenty-year-old hatchback and the test measures the same thing regardless. What matters is whether the student can think well with AI, and that capability can be developed and assessed with any reasonably capable tool. The quality of the process doesn’t depend on the quality of the model.

The accountability relationship is inverted

When an offences committee investigates whether a student used AI, it positions the student as the source of the problem. But if the assessment doesn’t account for AI, then the committee is, in effect, penalising students for operating in the world we have failed to prepare them for. The accountability relationship is inverted.

A more productive institutional question is not whether students are using AI appropriately, but whether educators are designing assessments that make such use visible, and whether institutions are building the conditions that make this possible. Until we take that question seriously, academic offences committees will remain busy with the wrong problem.