I want to start with a brief orientation — not to alarm, but because the argument I'm going to make only lands if we're honest about what we're dealing with. Generative AI can now pass medical licensing examinations at the 90th percentile. It writes care plans that are structurally and clinically coherent. It produces reflective portfolios that meet standard assessment criteria. It gives feedback on clinical reasoning — at any hour, without fatigue, without the variability that characterises human assessors. None of this is speculative. These are current capabilities, available to any student with an internet connection. If you cannot get these tools to produce MSc-level outputs, that is now more a failure of imagination than of model capability. The usual response to this list is to note what AI cannot do — it cannot be present in the clinical encounter, it cannot build therapeutic relationships, it cannot hold a patient's hand. All of that is true. But it sidesteps the question, because none of those things are what most of our assessment instruments measure. What our assessment instruments measure, AI can now produce.
When people encounter the capabilities I just described, the responses tend to cluster into four types. I want to name them because they're all understandable, and they're all insufficient. Denial holds that AI cannot replicate the essentially human qualities that define professional practice. This is probably true in some domains. But it doesn't help us right now, because the problem is not that AI might replicate clinical judgement — it's that AI can replicate the artifacts we're currently using as evidence of things like clinical reasoning. Retreat focuses on what humans uniquely offer and tries to rebuild assessment around those things. The problem is that this boundary keeps shrinking. Each new model capability forces a smaller claim. "At least AI can't produce accurate citations." Then it can. "At least AI can't write genuine reflections." Then it can. "At least AI can't reason clinically about complex cases." Then it can. Defending a shrinking perimeter is exhausting and ultimately unwinnable. Restriction tries to prohibit AI use through policy and detection. The data consistently shows this doesn't work. What the high AI-use figures communicate is not that students are fundamentally dishonest — it's that they've made a rational calculation about what the system rewards. Resignation is the idea that we render unto AI the things that belong to AI, which feels unsatisfying. The question underneath all four of these responses is one we've been avoiding: what are we actually trying to develop in nursing students — and are we designing for that?
These four premises describe the project of nursing education, and they hold regardless of whether AI is in the room. Premise 1 is neurological and constructivist: no one can learn on behalf of the student. Formation requires the student to be the one doing the cognitive work. Premise 2 contains a challenge. If learning can happen without a teacher — and it can — then the teacher's presence is not automatically valuable. Teaching must add something the student could not access otherwise. AI is now moving into this space too: it can scaffold, provide frameworks, cover curriculum. The question of what the teacher uniquely contributes becomes more pressing, not less. Premise 3 is the epistemological problem: we cannot observe learning directly. We can only observe behaviour and infer understanding from it. Premise 4 is the regulatory claim: NMC registration, degree certification — forward-looking claims about fitness to practise, based on backward-looking observations. The reason AI has disrupted this framework is not that it changed any of these premises. It changed what we were using as evidence for premise 3. We were observing products (artifacts) rather than behaviours (doing and thinking), and AI can produce those products without the student doing the thinking.
The goal is not that students know about nursing. It is that they become nurses. The claim that the system is designed around artifact production is contentious, and it is one of the weaker claims I make, but I'm going to do my best to defend it.
This was not a failure of intelligence. It was a rational solution to a real problem. We cannot observe learning directly. All we can do is observe the behaviour of students and infer understanding from it. Artifacts were meant to be evidence of the engagement that produced them — a window into the learner's developing understanding. To write a good essay, a student had to read widely, engage with the material, organise their thinking, and commit to a position. The engagement was bundled into the economics of production. Nobody needed to ask what the work was, because the work and the artifact were reliably connected. The science of learning is unambiguous here. Memory is not built through exposure to content. It is built through effortful retrieval, genuine uncertainty, and the struggle to make sense of something that does not yet make sense. At the neurological level, learning requires synaptic change: long-term potentiation, the strengthening of neural connections through repeated, effortful activation. Robert Bjork calls these conditions "desirable difficulties" — spacing, retrieval practice, interleaving, the productive struggle. They feel harder. They produce substantially better long-term learning. Conditions that feel easier — re-reading, worked examples, fluency — tend to produce the illusion of learning, not the thing itself.
Take a moment to think about your own professional development. Not what you studied — what changed you. There will be a moment — probably more than one — where your clinical understanding genuinely shifted. A patient you didn't know how to help. A decision you got wrong and had to sit with. A colleague who challenged your reasoning in a way you couldn't dismiss. Not essays with word counts and formatting requirements: difficult situations, challenged thinking, decisions under uncertainty, living with the consequences. Those moments were probably uncomfortable. They were almost certainly not assessable. Now think about what a formal assessed task looks like: a submission brief, assessment criteria, a word count, a deadline. The texture is completely different. That gap is not evidence that educators have failed their students. It is evidence that assessment has been asked to do something it was never well-designed to do — to stand in for experiences that are genuinely hard to structure, observe, and measure. The model for what we are actually trying to produce has always been available. We just haven't built the system around it.
AI has not broken assessment. It has made it impossible to ignore what was already broken. The 40% pass mark, inter-rater variability in marking, grade aggregation across qualitatively different tasks: these were known, documented architectural flaws, accepted as manageable imprecision because the alternative seemed worse. Students were already optimising for grades rather than for learning. They were already producing artifacts that met the criteria without reliably doing the intellectual work those criteria were designed to evidence. AI has not introduced that behaviour. It has made it more efficient, more visible, and impossible to manage with existing tools. 90% of students report using AI for submitted work. That is not a compliance problem to be solved. No policy, no detection tool, no updated academic misconduct procedure will change that figure in any meaningful way. What it communicates is that a large majority of students have concluded — rationally — that AI serves their purposes better than the system designed for them. The concept of "cheating" is not a fixed standard. It is relative to a set of expectations we defined — expectations written for a world where producing an artifact required the engagement it was meant to evidence. When the model changes, what we ask students to do themselves can change with it.
Over time, we stopped thinking of the artifact as a proxy and started treating it as the thing itself. The drift is visible everywhere. Students spend considerable time — often the majority of their questions before submission — asking about line spacing, word count, reference formatting, and so on. We tend to treat this as a communication problem: "Provide clearer instructions!" It is more accurately a signal about what we have taught them to care about. Cognitive science distinguishes two kinds of difficulty. Extraneous load — navigation friction, ambiguous instructions, unclear deadlines — adds effort without contributing to learning. Removing it is good pedagogy. Germane load — the effort required to determine what matters, what good looks like, what position to defend — is the learning. When we organised Blackboard sites clearly and made deadlines unambiguous, we reduced extraneous load. When we specified word counts for each assignment section, released model answers, and broke rubrics into micro-criteria, we reduced germane load. We thought we were being helpful. We were removing the mechanism. The student no longer needed to determine what a good answer looks like. We told them. And in doing so, we also provided a complete specification that an AI can now follow without any student learning involved. In The Reflective Practitioner, Donald Schön describes the work that takes place on a high, hard ground, where we can make use of research-based theory and technique, and the work in a swampy lowland, where situations are confusing "messes" incapable of technical solution, and experience, trial and error, and intuition are how we muddle through. We have spent decades perfecting the map — and now find that the territory has been bypassed entirely, because the map is so accurate a machine can follow it.
Generative AI has not created a cheating problem. It has severed the inferential chain between the document and the person. A student can now produce a polished, well-referenced, structurally coherent output — one that satisfies every criterion on a rubric — without having read a single source, grappled with a single idea, or developed any understanding of the subject. The proxy has collapsed. The assessment system built on that proxy is now exposed as measuring something other than what it claimed to measure. This is not a claim about the prevalence of academic dishonesty. It is a structural observation: the mechanism that made the document valid evidence has been disrupted, not because students have changed, but because the conditions that made the proxy work no longer hold. The assessment instrument has lost the construct validity it was depending on. The correlation between artifact quality and intellectual engagement — the assumption the entire system was built on — no longer holds. And fluency, which was once a reasonable signal of genuine thinking, is now noise. As models improve, output quality converges on expert-level across every artifact we care to measure.
Most of the responses currently being produced — policies, frameworks, AI use declarations — share a common purpose: to restore faith in the artifact as valid evidence, to repair the inferential link that has been severed. Phillip Dawson and colleagues have distinguished between two kinds of attempt. Discursive responses change the language: new policies, updated principles statements, AI use declarations, traffic light frameworks specifying what students may and may not delegate. Structural responses change the underlying conditions that determine what students do and why. Most of what institutions are currently producing is discursive. These documents represent real effort from people responding thoughtfully to a genuinely difficult situation. But they share a common limitation: they leave the foundational assumption intact. The artifact is still what is being protected. Tighter lines are being drawn around it. If the goal is to preserve the integrity of the artifact, and AI continues to improve at producing artifacts, the trajectory is escalating. More restriction. More detection attempts. More sophisticated circumvention. More costly enforcement. The dominant question in the educator-student encounter becomes: did you use AI for this? The relationship becomes progressively adversarial. At the end of that trajectory is an arms race that no one wins: mounting costs on both sides, and a relationship between educators and students structured around suspicion rather than formation. The trust that nursing education depends on — a trust especially vital in a discipline premised on preparing people for human care — begins to erode.
This reframes the question fundamentally. The concern that students are "using AI to do the work" is right if the work is the artifact. It is wrong if the work is the cognitive engagement. AI can generate the conditions for genuine struggle — harder problems, faster feedback, access to complexity previously beyond the student's reach. The artifact they produce may look similar. The formation is not. The frame shifts from "did you use AI?" — a question about tool use — to "were you genuinely grappling?" — a question about process. That is a harder question to answer, but it is the right one. And it is the question that changes what assessment needs to look like. Students who use AI as an answer-machine are not doing the work, regardless of how polished their output is. Students who use AI as a struggle-generator — to get into harder territory, to be challenged, to be wrong in productive ways — are doing the work. The same tool, two completely different relationships to learning.
What AI systems lack is not capability — it is the conditions under which capability becomes meaningful. AI is stateless: no memory of prior encounters, no accumulated understanding of how a patient's condition has evolved, no professional judgement refined by consequence. It is static: knowledge frozen at training, unable to update from what is happening in the room. Most importantly, AI has no professional context. It does not know what it cost to break bad news that morning, or what the team dynamic is, or what clinical intuition tells you when a patient's story doesn't quite cohere. Context is the bottleneck — not because AI lacks information, but because meaning is created in the situated encounter, and AI cannot be situated. This reframes the question: rather than asking what only humans can do, we ask what humans contribute within a system where AI is also participating. The answer is context — and the teacher is the one who holds it.
The shift from AI as tool to AI as agent is significant. A chatbot answers questions. An agent takes actions — it can search, synthesise, draft, evaluate, iterate, and persist across a student's learning over time. Students who are already using agents are not just getting answers faster; they are working with a collaborator that has context, memory, and the ability to do substantial cognitive work. This will accelerate. The relevant question is not how to prevent it but how to design for it well. There is something worth naming here about our stance toward agents. They are not conscious, they do not have interests in any philosophically meaningful sense — but they are increasingly capable participants in the learning encounter. An open, curious stance — one that asks what role agents might play in formation, rather than simply how to restrict them — is more likely to produce good pedagogy than a defensive one. The challenge agents create is the same challenge AI has always created: how do we ensure the student is the one doing the cognitive work that produces formation? The conditions that answer that question are the same conditions that have always supported genuine learning.
Nobody would pay someone to go to the gym on their behalf. The reason that analogy is obvious is that we understand, intuitively, that the point of the gym is the change it produces in you — not the certificate of attendance. We have somehow lost that clarity when it comes to learning.
If formation is the development of practical wisdom, what structures make that possible? Problem-based learning was designed around the answer before AI arrived. Its structural features — problem-driven inquiry, collaborative knowledge construction, facilitation over instruction, and metacognitive reflection — are the same conditions under which AI integration becomes educationally productive rather than substitutive. This alignment is structural, not retrospective. Problem-driven inquiry puts students in the swampy lowland from the start. Collaborative construction makes engagement visible and challengeable. Facilitation holds the process without providing the answers. Metacognitive reflection makes students aware of their own developing judgement. And AI raises the ceiling on what students can engage with — giving access to wicked problems that were previously beyond their reach, because AI can surface complexity, suggest connections, and scaffold inquiry that a facilitator alone could not manage.
The questions we need to ask are not how to update our AI policy, or how to detect AI use more reliably. The work has always been the cognitive struggle. Writing a good essay was work because of the grappling it required, not because of the document it produced. Clinical placement was work because it put students in situations they had to navigate, not because they completed a competency log. The artifact was always downstream. We mistook it for the thing itself. Formation is not content accumulation. It is transformation. The student who has genuinely grappled with complex clinical situations, who has been challenged and been wrong and had to revise their thinking — that student is a different person. The artifact they produced along the way was a trace of that process. It was never the process itself. This talk began by asking what colleagues mean when they say they do not want students using AI to do "the work." The answer they gave — the essay, the care plan, the portfolio — was always wrong. The work was never the document. The work was the development that used to produce the document. We need a more honest account of what development actually involves. And we need to put the friction back in the right places — not to make things harder, but to make the struggle genuine.