About this essay
- Author: Michael Rowe (ORCID)
- Affiliation: University of Lincoln (mrowe@lincoln.ac.uk)
- Created: March 28, 2026
- Version: 0.8 (last updated: March 31, 2026)
- Keywords: AI integration, assessment, cognitive offloading, health professions education, problem-based learning, professional judgement, wicked problems
- DOI: https://doi.org/10.35542/osf.io/haet3_v2
- License: Creative Commons Attribution 4.0 International
Abstract
Higher education’s response to AI has focused on the artefact: detecting it, restricting it, and restoring confidence in what students produce. This essay argues for a different focus. The structural features of problem-based learning — problem-driven inquiry, collaborative knowledge construction, facilitation over instruction, and metacognitive reflection — are the same conditions under which AI integration becomes educationally productive rather than substitutive. This alignment is structural, not retrospective: PBL was designed around these conditions before AI existed, rooted in the recognition that professional competence requires adaptive judgement rather than routine execution. The argument extends beyond compatibility. AI shifts what category of problem PBL can engage with, expanding access to complex, wicked problems that were previously beyond students’ reach. Investing in PBL’s structural conditions is simultaneously investing in AI readiness, whether or not institutions recognise it that way.
Introduction
A common question in discussions about student use of AI in higher education is, “How will we know if students did the work?” A different way of thinking about this question is to ask instead, “When AI can produce the artefact, what is the work?” A student submits an essay that is well-structured, clearly argued, and appropriately referenced. It demonstrates familiarity with the relevant literature, engages with competing perspectives, and arrives at a defensible conclusion. By every conventional criterion, it is good work. It is also, in its entirety, produced by a language model. The student who submitted it may or may not understand the subject. In today’s context, the essay no longer provides meaningful information about what the student knows.
This is not a new problem. But it is a newly visible one. The system that treated artefacts (essays, reports, case analyses, presentations) as evidence of learning was always inferential: we observed the product and inferred the process behind it. For as long as producing a quality artefact required genuine intellectual engagement, the inference was reasonable. AI has severed that connection. A polished output no longer requires the engagement it was designed to evidence, and the assessment systems built around that assumption are exposed as measuring something other than what they claimed to measure.
Most institutional responses have addressed the symptom: AI detection software, transparency declarations, revised assessment rubrics, increasingly detailed guidelines specifying what students may and may not delegate to AI. These are understandable responses to a genuine disruption, and they share a common orientation: they are aimed at restoring the reliability of the artefact as a proxy for learning. The question of what valid evidence of learning actually looks like when the artefact can no longer be trusted to provide it receives less attention, in part because it is harder and in part because it implicates assessment structures that institutions have invested heavily in building.
This essay proposes a different response. Rather than asking what AI threatens, it asks what AI makes newly visible: that the structural features of a learning environment determine whether AI integration supports learner development or substitutes for it. The claim is that these features are identifiable. We can specify, with theoretical warrant, what conditions a learning environment must create in order for AI to function as a partner in professional development rather than a shortcut around it.
The argument is developed in three steps. First, AI breaks the artefact-as-proxy for learning but does not break learning itself; engagement remains observable and evidential, even as the products of engagement become unreliable. Second, the structural features of problem-based learning (problem-driven inquiry, collaborative knowledge construction, facilitation over instruction, metacognitive reflection) are the same conditions under which AI integration is educationally productive rather than substitutive. These features were PBL’s pedagogical rationale long before AI arrived; the alignment is structural, not retrospective. Third, AI does more than fit within existing PBL practice. It shifts what category of problem PBL can engage with, raising the ceiling on complexity and making accessible problems that were previously beyond students’ reach. PBL is the worked example because it was designed around these conditions as its explicit rationale, making the alignment traceable. But the structural argument is not exclusive to it: any pedagogical approach that shares the relevant features, including certain forms of case-based learning, inquiry-based learning, team-based learning, and collaborative clinical reasoning, would support the same analysis. The principle is the contribution, not the endorsement of a single pedagogy.
The method is conceptual synthesis: convergences across theoretical traditions are traced against PBL’s founding design logic, and the resulting structural alignment is examined at a granular level rather than asserted as a general resemblance. This produces theoretical warrant, not empirical proof.
An honest essay must also name what it does not resolve. The argument is about PBL’s founding structural commitments, specifically what the approach was designed to create, and not about what every programme empirically delivers. Assessment at scale remains an open problem. And the structural claim, while theoretically grounded, requires the kind of empirical investigation that a conceptual essay cannot provide. These qualifications are distributed throughout the essay rather than gathered in a closing limitations section, because they are part of the argument, not caveats appended to it.
One further qualification belongs here. This argument is developed from within a prior commitment to PBL-compatible pedagogy and to the productive integration of AI in professional education. These commitments have shaped the author’s agenda and inform the theoretical framework applied here. The structural alignment identified in this essay is genuine, but it was found by someone inclined to look for it. That said, the alignment is traceable in PBL’s design documents and founding rationale, not only in the author’s interpretation of them. Readers should weigh that accordingly.
What AI breaks, and what it doesn’t
How the artefact became the target
Learning does not leave direct traces. We cannot observe understanding forming, or judgement developing, or a student’s grasp of a concept shifting from fragile to robust. What we observe is behaviour: answers given, problems attempted, artefacts produced. From that behaviour we infer that learning has occurred. This is not a measurement problem awaiting a better instrument. Learning is emergent, contextual, and resistant to specification: what constitutes meaningful learning for one student at one point in their development may be irrelevant for another, or for the same student a year later (Rowe, 2025b). The phenomenon itself resists the kind of direct observation that would make proxy measures unnecessary.
For most of the history of formal education, artefacts served as the dominant proxy. Essays, reports, case analyses, presentations, examination scripts: these products of student work were treated as evidence of the engagement that produced them. The logic was reasonable: producing a good essay or case study required reading widely, synthesising across sources, organising information, and articulating a position with clarity. The artefact was not the learning, but the correlation between artefact quality and cognitive engagement was reliable enough that the proxy held. Assessing the essay was, in practice, a defensible way of assessing the thinking behind it.
This system was not wrong. It made sense given what was observable, and it served education adequately for as long as the correlation between producing an artefact and engaging with the material remained intact. The problem was not the proxy itself but what happened to it over time. Assessment structures gradually shifted, imperceptibly and without anyone making a deliberate decision, from asking “is this student developing their understanding?” to asking “how good is this product?” (Carless, 2007; Fischer et al., 2023). The artefact, which had been a window onto the learner’s developing understanding, became the target. Students, as rational agents operating within an incentive system, optimised for what was measured and rewarded.
AI did not create this problem. It exposed it. When a student can produce a polished, well-referenced, structurally sound essay without having read the sources, wrestled with the argument, or developed any understanding of the subject matter, the correlation between artefact quality and cognitive engagement is severed (Dawson et al., 2024; Corbin et al., 2025; Walton et al., 2025). While the essay still looks like evidence of learning, it is no longer evidence of learning. Walton et al. (2025), in a qualitative study tracking students’ actual AI interactions during assessment tasks, found that the most common failure was not outright fabrication but a subtler collapse of evaluative judgement: students frequently adopted AI-generated ideas with low levels of criticality, misattributed AI contributions as their own thinking, and in some cases submitted content without exercising any judgement about it at all. What AI has done is make cheap, fast, and visible the proxy optimisation that was already possible and that some students were practising through strategic surface engagement. The artefact-as-proxy did not break because AI is powerful. It broke because the proxy was already vulnerable, and AI removed the friction that had been masking that vulnerability.
AI breaks the artefact-as-proxy for learning.
The correlation between artefact quality and intellectual engagement has collapsed, not because AI is powerful, but because the proxy was already vulnerable and AI removed the friction that was masking it.
This is a Goodhart dynamic: when a measure becomes a target, it ceases to be a good measure. Institutions set up systems where the artefact was the target (submit the essay, receive the grade), and students responded rationally to the incentive structure they were given. AI changed the cost of that response, not its logic. The institutional responses described above follow the same logic in reverse: they attempt to restore friction to the proxy rather than questioning whether the proxy should remain the primary evidence of learning. The underlying condition is that artefact-based assessment was already measuring the difficulty of production rather than the depth of learning behind it. And AI has made that misalignment impossible to ignore.
Engagement as the surviving proxy
If the artefact is no longer a reliable proxy, what is? The answer is not that we have no proxies left. The proxy we should have been prioritising is engagement, which remains observable and has not been broken by AI.
Engagement, in the sense used here, refers to the observable cognitive and social processes through which students interact with problems, with each other, and with the material they are investigating (Kuh, 2009). Students can still grapple with questions they cannot immediately answer. They can still test their reasoning against a peer’s challenge and find it wanting. They can still revise their understanding when new evidence disrupts what they thought they knew. And they can still sit with uncertainty when a problem resists the clean resolution they were hoping for. These processes are visible in tutorial discussions, in the quality of questions students ask, in the way they respond to challenge, in the arc of their reasoning over time, and they are genuinely indicative of learning in a way that a polished final product, considered in isolation, no longer is.
The shift AI demands is not from assessment to no assessment but from assessing products to making engagement visible and taking it seriously as evidence. This requires a more deliberate account of what we are looking for when we look for learning, an account that was always needed but that artefact-based assessment allowed us to defer. When the artefact was a reliable enough proxy, the question of what engagement actually looks like and how to observe it systematically could be treated as a philosophical nicety rather than a practical urgency. AI has removed that deferral, and the question is now operational.
An honest account must acknowledge, however, that engagement is a better proxy than artefacts but not an invulnerable one. Engagement has its own Goodhart dynamic. Students can also perform engagement: participating visibly in group discussion, producing fluent reflective accounts, asking questions that signal effort without reflecting genuine cognitive investment, all without the depth of intellectual commitment those behaviours are meant to evidence. Assessed participation in PBL tutorials, for instance, can become a performance of engagement rather than engagement itself, particularly when students understand what the facilitator is looking for. The argument is therefore not that engagement solves the proxy problem. It is that engagement is harder to decouple from underlying learning than artefact production is. The gap between performing engagement convincingly and actually engaging is narrower than the gap between producing a polished essay and actually understanding the subject. Consider the difference concretely: a student can submit an AI-generated essay on antibiotic resistance without knowing what a plasmid is. But a student who is asked, in a tutorial, to explain why their proposed intervention would work differently in a community with high rates of self-medication must either reason through the question in real time or visibly fail to do so. The performance cannot be pre-produced and submitted; it must be sustained under conditions that are responsive, social, and unpredictable. AI has made artefact proxies trivially gameable. Engagement proxies retain meaningful resistance — not immunity, but resistance. The direction of travel is from a proxy whose correlation with learning has collapsed to one where the correlation, while imperfect, still holds.
The focus, in other words, should be on who students are becoming: what capabilities they are developing, what professional judgement they are forming, how their reasoning changes over time. That is a different question from what they are producing at any given moment. AI forces this reorientation by removing the option of treating production as a reliable window onto development. The reorientation was always warranted. It is now unavoidable.
Why scale remains an open problem
None of this makes the practical problem disappear. Assessing engagement rather than artefacts at scale is genuinely difficult, and the difficulty should not be understated or papered over.
The alternatives most commonly proposed are each partial answers with real limitations (Masters et al., 2025; Zhan et al., 2025): portfolios, oral examinations, observed clinical practice, reflective accounts, and programmatic assessment approaches that triangulate across multiple evidence sources. Portfolios are resource-intensive to design and assess consistently. Oral examinations introduce variability that raises equity concerns. Observed practice is expensive, context-dependent, and limited to settings where observation is feasible. Reflective accounts can be performed as strategically as any other artefact, and AI can generate convincing reflective writing as readily as it generates convincing essays. Programmatic assessment, which distributes judgement across multiple data points over time, addresses some of these concerns but requires institutional infrastructure that many programmes do not have and that takes years to build (Bearman et al., 2024).
The direction of travel is clearer than the destination. Assessment must move toward evidence of engagement and away from reliance on artefact quality as the primary indicator of learning. The specific forms this will take, and whether they can be made scalable, equitable, and resistant to strategic performance, remain open questions. This essay does not resolve them. What it does is argue that certain pedagogical structures are better positioned than others to support this transition, because they were already designed around the conditions that make engagement visible and valued. That argument is developed in the sections that follow.
The structural conditions for productive AI integration
Why PBL’s core features are the right features
The question of what makes a learning environment productive for AI integration is, at its foundation, a question about what makes a learning environment productive for learning. If multiple independent theoretical traditions converge on identifiable conditions under which professional learning is effective (conditions that hold regardless of whether AI is involved), then those conditions also describe the environments in which AI integration supports rather than undermines development. Rowe (2025a) conducted a structured analysis of social constructivism, critical pedagogy, complexity theory, and connectivism through a common analytical framework and identified six such convergences: dialogic knowledge construction, critical consciousness, adaptive expertise, contextual authenticity, metacognitive development, and networked knowledge building. These are not aspirational ideals. They are conditions that each theoretical tradition, from its own starting point and for its own reasons, identifies as necessary for learning that transfers to practice and develops professional judgement. The theoretical apparatus behind these convergences, and the analytical method used to derive them, is developed fully in that work. What matters here is the conclusion: effective professional learning requires environments that foster dialogue over transmission, critical evaluation over uncritical consumption, adaptive response over routine reproduction, authentic complexity over decontextualised simplicity, reflective awareness over unreflective performance, and knowledge building across boundaries over learning within silos.
Problem-based learning was designed around exactly these conditions. This is not a retrospective reinterpretation. From its origins in medical education at McMaster University, PBL was developed as a deliberate response to the well-documented failures of transmission-based curricula, programmes that produced graduates who could recall information but struggled to apply it in the complex, uncertain conditions of clinical practice (Barrows, 1986; Norman & Schmidt, 1992). The response was structural, not cosmetic. Rather than adjusting the delivery of content, PBL reorganised the relationship between the learner, the problem, and the process of inquiry itself.
The defining features of PBL are familiar but worth stating precisely, because the argument that follows depends on their specificity. The problem serves as the starting point for learning, not as an application exercise following content delivery. Students direct their own inquiry rather than following predetermined pathways through a syllabus. Knowledge is constructed collaboratively through group deliberation, not transmitted from instructor to individual. The facilitator holds the process rather than the answers, pressing on reasoning, surfacing assumptions, and managing the group’s direction of travel. And metacognitive reflection is an explicit, structured expectation, not an optional add-on (Savery, 2006; Mubuuke et al., 2017).
These are not incidental features of PBL. They are its pedagogical rationale, the reason the approach exists. PBL does not happen to involve group discussion and self-directed learning as stylistic preferences. It was designed around the recognition that professional judgement develops through grappling with authentic problems under conditions of genuine uncertainty, and that this grappling must be collaborative, reflective, and learner-directed if it is to produce practitioners capable of adaptive response rather than routine reproduction (Schmidt et al., 2006). The structural features are the pedagogy. Remove them and what remains is not a simpler version of PBL but a different kind of education entirely.
PBL’s structural features are the conditions for productive AI integration.
Problem-driven inquiry, collaborative knowledge construction, facilitation over instruction, and metacognitive reflection are what make AI integration educationally productive rather than substitutive.
This distinction matters because the argument is not that PBL is a good teaching method that also happens to accommodate AI. The argument is that the specific structural conditions PBL was built around, for reasons that predate AI by decades, are the same conditions under which AI integration becomes educationally productive rather than substitutive. The next sections develop that claim.
Not all cognitive demand is equally formative
A persistent concern in discussions of AI in education is cognitive offloading, the worry that when students delegate difficult cognitive work to AI, they are impoverished by the delegation. This is a legitimate concern (Gerlich, 2025; Lodge & Loble, 2026), but it is imprecisely aimed, and the imprecision leads to responses that protect the wrong things.
The problem is not cognitive demand in itself. Not all difficulty develops something that matters. The question is whether the demand being removed was building a capability the learner needs as a practitioner: whether, in other words, the cognitive struggle was load-bearing. Some cognitive work develops professional judgement: the capacity to reason through competing evidence, to weigh contextual factors against general principles, to make decisions under uncertainty. Other cognitive work is in service of execution mechanics: the logistics of finding, formatting, retrieving, and organising that must be completed before the professionally formative work can begin. Both are genuinely demanding. Only one is load-bearing for professional formation.
An example makes the distinction concrete. Consider a nursing student investigating the evidence base for a wound care intervention. The cognitive demands of that inquiry are layered. Searching databases effectively, retrieving relevant papers, and summarising what individual studies report are real intellectual tasks: they take time, require skill, and can be done well or badly. They are also offloadable without developmental cost, because the formation of a nurse does not depend on mastering Boolean search syntax or the mechanics of literature retrieval. What cannot be offloaded is the work that follows: the judgement about whether the evidence applies to this patient, in this ward, with these resource constraints; the reasoning through conflicting findings where one trial supports the intervention and another does not; the decision about what the evidence, taken together, means for the care this patient should receive. Offloading the retrieval and summarisation may in fact concentrate the student’s time and cognitive energy on the second layer, where the professionally formative struggle actually lives.
This distinction finds theoretical grounding in the framework Rowe and Lynch (2025) developed around context sovereignty. What they term information context (content that anyone could retrieve or provide, the kind of material that populates a literature search or a factual summary) corresponds to the offloadable layer. What they term personal context (the values, professional commitments, and meaning-making frameworks through which information becomes professionally significant) corresponds to the layer that cannot be offloaded without developmental loss. The reason some cognitive demand is load-bearing is not simply that it is more difficult. It is that it operates at the layer where professional identity is formed: where a student is not just processing information but deciding what it means, whose interests it serves, and what should be done in light of it.
One nuance deserves attention. Writing is often treated as a single category in this debate, as if it were uniformly offloadable or not. But writing-as-thinking (forcing half-formed intuitions into explicit propositions that can be examined and challenged) and writing-as-production (formatting, referencing, structural mechanics) are different activities with different developmental functions. The first is load-bearing; the struggle to articulate is part of the learning. The second is largely offloadable. An essay that outsources the articulation of reasoning to AI has offloaded load-bearing work. An essay that uses AI to handle formatting while the student wrestles with what they are trying to say has not.
The reframe dissolves rather than rebuts the cognitive offloading concern. The question shifts from “is AI removing cognitive demand?” to “is AI removing the cognitive demand that was developing the right things?” This is a diagnostic question, not a permissive one. It does not license indiscriminate offloading but instead requires a more precise account of what each layer of difficulty is actually building. And PBL’s structure already provides an implicit answer. Because PBL was designed to place cognitive demand on problem framing, collaborative reasoning, evaluation of competing approaches, and judgement about what matters, rather than on information retrieval and content reproduction, its structural features concentrate learning time on exactly the layer that cannot be offloaded. The offloading reframe is not just compatible with PBL; it describes the logic PBL was already operating on.
Not all cognitive demand is equally formative.
The distinction between load-bearing demand (judgement, reasoning, professional identity) and offloadable demand (retrieval, formatting, production) determines whether AI integration develops competence or substitutes for it.
Critical faculty, contextual judgement, metacognition: the same capabilities
The preceding sections have established two claims: that independent theoretical traditions converge on identifiable structural conditions for effective professional learning, and that not all cognitive demand is equally formative. The demand that matters is the demand that develops judgement, reasoning, and professional identity. These claims set up the central move of the essay: demonstrating, at a granular level rather than as a general resemblance, that the capabilities required for productive AI use are structurally identical to what PBL was designed to develop.
Rowe (2025c) identifies three capabilities constitutive of what he terms taste in AI collaboration: critical faculty (the capacity to evaluate outputs and processes rather than accepting them at face value), social awareness (the capacity to recognise contextual appropriateness, to understand that what works in one setting may mislead in another), and metacognitive sensitivity (the capacity to maintain awareness of one’s own values, assumptions, and purposes throughout a collaborative process). These three capabilities describe what separates someone who uses AI to extend their thinking from someone who uses AI to replace it. They also describe what PBL demands of students in every session.
Consider critical faculty first. When a PBL group weighs competing explanations for a clinical presentation, debating whether a set of symptoms points toward one diagnosis or another, evaluating the strength of evidence for different intervention approaches, interrogating whether a proposed solution actually addresses the problem or merely sounds plausible, they are exercising the same evaluative judgement that productive AI use requires. The student who receives an AI-generated differential diagnosis and accepts it because it is fluent and well-structured has offloaded the load-bearing work. The student who reads that same output and asks whether the reasoning holds, whether an alternative explanation has been excluded too quickly, whether the evidence cited actually supports the conclusion drawn: that student is doing what PBL has always asked, treating available information as a starting point for critical evaluation rather than an endpoint (Messeri & Crockett, 2024; Vendrell & Johnston, 2026). The cognitive move is the same whether the information comes from a textbook, a peer, or a language model.
Social awareness maps onto PBL’s collaborative knowledge construction with similar precision. PBL’s group process requires students to navigate different perspectives, to recognise that a peer’s framing of a problem may illuminate aspects their own framing has missed, to adapt their reasoning in response to challenge rather than defending a position for its own sake. This is the same capability that productive AI use demands. A fluent AI response may be entirely appropriate in one context and misleading in another, not because the information is wrong, but because contextual factors determine what matters, what is relevant, and what should be weighted most heavily. The student who recognises that an AI’s analysis of a public health intervention fails to account for the specific community context in which it would be implemented is exercising the same contextual judgement they develop when a peer in a PBL group challenges their reasoning on similar grounds (Yan et al., 2025). The capability is not AI-specific. It is the capacity to evaluate whether a given response, regardless of the source, is appropriate to the situation at hand.
Metacognitive sensitivity is perhaps the most consequential alignment. PBL builds structured reflection into its process as an explicit expectation, not an afterthought. Crucially, this reflection is social as well as individual: Akyol and Garrison (2011) show that metacognition is not a purely private internal activity but is also socially situated, developing through the kind of collaborative inquiry that PBL’s group structure provides. Students are asked to examine their own reasoning: what they understood, what they assumed without evidence, what they would investigate further, where their confidence outran their knowledge. This reflective practice is what prevents AI use from becoming uncritical consumption. Without metacognitive awareness, a student cannot distinguish between recognising a correct output and understanding the reasoning behind it, and that distinction is precisely the one that determines whether AI interaction produces learning or the comfortable illusion of learning. The student who reads an AI-generated explanation of a pathophysiological mechanism and thinks “yes, that’s right” has recognised correctness. The student who asks “do I actually understand why that is right, and could I explain it in a different context?” has engaged metacognitively. PBL’s insistence on structured reflection cultivates the second response as a habit of practice rather than an occasional achievement.
The structural identity claim can now be stated precisely. The capabilities that distinguish productive AI use from substitutive AI use (evaluating rather than accepting, contextualising rather than generalising, reflecting rather than consuming) are not merely compatible with what PBL develops. They are what PBL was designed to develop, for reasons that had nothing to do with AI and everything to do with what professional practice demands. PBL places cognitive demand on problem framing, collaborative reasoning, evaluation of competing approaches, and judgement about what matters. Productive AI use requires the same. The overlap is not coincidental. Both respond to the same underlying challenge, that professional competence requires adaptive judgement rather than routine execution, and both respond with the same structural solution: placing the learner at the centre of a process that demands precisely the cognitive work that cannot be delegated.
This structural alignment also explains why AI integration is structurally disposed to go wrong in pedagogies that do not share these features. In a transmission-based environment, where the product is the point, AI offers an efficient route to a polished output that the assessment system will reward. In PBL, the process is the point. The group deliberation, the reasoning under uncertainty, the evaluation of competing perspectives: these are visible, social, and resistant to substitution precisely because they happen between people in real time.
But the more interesting case is not when AI is excluded from the tutorial but when it is invited into it, as a participant rather than a private tool. A group that presents an AI-generated analysis to the tutorial, challenging its assumptions collectively, testing its recommendations against the specific case, pressing it with follow-up questions, using its responses as material for deliberation, is not circumventing PBL’s structural conditions. It may be extending them. The AI becomes another voice in the epistemic community: one with particular strengths (rapid synthesis across a large evidence base, access across disciplines) and particular limitations (no stake in the patient, no memory of the prior discussion, no access to what the group already understands and values). Learning to use that voice appropriately, to weigh it against other sources, to recognise what it cannot see, to keep the group’s own reasoning primary rather than reactive, is a form of professional judgement that did not exist as a curriculum aim before AI arrived. This is not about removing AI from the tutorial conversation but about adding it in ways that require students to remain the authors of the inquiry. How facilitation works when AI is a visible participant, how to prevent deliberation from collapsing into a sequence of prompts with light commentary, and what epistemic agency looks like when AI is in the room are largely uninvestigated questions. They are also more generative questions than whether AI should be in the room at all.
The tutorial process resists substitution through its visibility, sociality, and unfolding in real time. That resistance is what distinguishes AI used within a collective inquiry from AI that substitutes for one. And it raises a further question: if AI absorbs the execution layer, what should the freed cognitive space be used for?
When execution is cheap, judgement becomes everything
The offloading debate focuses on what is lost when difficulty is removed. It rarely asks the more consequential question: what becomes possible when difficulty is redistributed? When execution becomes cheap, judgement becomes everything. The scarce capability is no longer the ability to produce (to find information, synthesise sources, and generate polished outputs); it is the ability to judge: which problems deserve investigation, which solutions serve the right ends, what knowledge is worth constructing, and whose interests are advanced or neglected by the choices made along the way.
Rowe and Olivier (forthcoming) term this capability taste, distinguishing it from technique. Technique is the execution layer of knowledge work, the skills of searching, drafting, coding, and formatting that AI is rapidly absorbing. Taste is the judgement layer, the capacity to determine what is worth doing, to recognise quality, to distinguish the substantive from the merely competent. Technique can be offloaded. Taste cannot, because it requires the kind of contextual, values-laden, identity-informed reasoning that operates at the personal context layer described earlier. A language model can produce a well-structured argument for a given position. It cannot determine whether that position is worth arguing, whether the framing serves the right purposes, or whether the question being addressed is the one that matters most. These are acts of taste, and they belong irreducibly to the learner.
The taste framework does important work here, though it carries its own complications. What counts as relevant judgement is not culturally neutral: which contextual factors matter, whose values should anchor professional formation, and what good clinical reasoning looks like are all contested questions that vary across healthcare systems, professional traditions, and communities of practice. This means that PBL’s development of taste is itself culturally situated; what gets cultivated as professional judgement in one context may not transfer straightforwardly to another. The framework names the layer at which formation occurs without settling what that formation should produce, and claims about the structural alignment between PBL and productive AI use should be read with that scope limitation in view. The structural argument holds regardless — what cannot be offloaded is the layer where professional identity and values are formed, whatever specific content those values carry in a given context — but the universality of the claim is about the layer, not about what fills it.
The distinction is easier to see in practice than to define in the abstract. Consider two groups of nursing students investigating the same problem: the integration of early discharge planning for older adults with complex needs. The first group asks an AI to generate a care coordination framework, reviews what returns, and concludes it looks comprehensive. The second group does the same, then asks: what assumptions does this framework make about family availability? What does it say about patients without adequate home support? Is the evidence behind these recommendations drawn from comparable healthcare systems? What would a patient advocate say about how this model positions the patient? The first group has a product. The second has used the product to generate a process, one that produces something the AI could not generate alone: a framework evaluated against the specific values, constraints, and stakeholder perspectives that this particular problem requires. That evaluative work is taste in action. It is not a more efficient version of what AI produced. It is a different kind of intellectual act entirely.
The connection to PBL is direct. Taste is not developed through instruction or transmitted through content delivery. It is developed through sustained engagement with consequential problems under conditions of genuine uncertainty, through the repeated exercise of judgement in situations where the right answer is not predetermined and the stakes are real enough to matter. This is a precise description of what well-designed PBL creates. Students confronting an authentic clinical or community problem must decide what aspect of the problem to investigate, how to evaluate the approaches available, what trade-offs to accept, and what counts as a good enough response given the constraints they face. These are exercises in taste, whether or not the term is used. PBL does not merely accommodate the development of taste; it is one of the few pedagogical structures that systematically requires it.
The taste argument also reframes what it means for AI to be educationally productive. The value of AI in a PBL environment is not that it makes the existing process more efficient. It is that by absorbing the execution layer, AI creates space for students to spend more of their time on the judgement layer, where professional formation actually occurs. This is not a minor efficiency gain. It is a structural reallocation of what students spend their cognitive energy on, and it favours exactly the capabilities that matter most for professional practice and that are hardest to develop under time pressure. The question is not whether AI saves time. The question is what that time is redirected toward. In a PBL environment, the answer is judgement, and the structure ensures it.
Remaining the author of the inquiry
An anticipated objection deserves a direct answer. If AI is handling retrieval, summarisation, modelling, and preliminary analysis, if the execution layer is substantially offloaded, where precisely is the student’s thinking? What prevents this from being a sophisticated form of delegation dressed up as collaboration?
The answer is that learner agency is preserved not by limiting what AI does but by the learner maintaining authorship over the meaning-making environment that determines what AI does and why. The student who directs an AI to summarise the evidence on a clinical intervention has not delegated their thinking if they are the one who identified the clinical question, determined what kind of evidence would be relevant, evaluated what came back against their developing understanding of the patient’s context, and decided what the evidence means for the care decision at hand. Agency lives in the direction and evaluation of the inquiry, not in the manual execution of each step within it. What Rowe and Lynch (2025) term context sovereignty captures this precisely: the learner’s authorship of the values, commitments, and interpretive frameworks that shape the interaction is what makes it a learning experience rather than a transaction. When that authorship is present, AI extends the learner’s reach. When it is absent, when the student accepts the output without interrogation because they have no framework against which to interrogate it, AI substitutes for learning regardless of how the interaction is labelled.
PBL’s existing structure creates the conditions for this kind of authorship. Learner-directed inquiry means the group determines what questions to pursue, not the facilitator or the AI. Collaborative problem framing means the interpretive framework is negotiated among peers, not received from an authority. Self-directed learning means students take responsibility for identifying what they need to know and evaluating whether they have learned it. These are not features that need to be added to PBL in order to accommodate AI. They are the features that make PBL what it is, and they ensure the learner remains the author of the inquiry even when AI is a participant in it.
What the structural argument does and does not claim
The argument developed across this section has been about what PBL was designed to create, rather than about what every PBL programme empirically delivers. This distinction must be made explicitly, because the distance between design logic and institutional practice is often considerable.
This essay does not argue that PBL is the solution, only that it is solution-shaped. Its founding structural commitments (problem-driven inquiry, collaborative knowledge construction, facilitation over instruction, and metacognitive reflection) create the necessary conditions for productive AI integration. But the drift toward artefact measurement that affects conventional curricula affects PBL programmes too. When case analyses, group presentations, and problem reports become the target of assessment rather than the indicator of engagement, the structural misalignment recurs within PBL itself. A PBL programme that assesses primarily on the quality of a written case analysis is vulnerable to exactly the same proxy problem described in Section 1, regardless of how its tutorial process is designed.
The empirical record reinforces this caution. Norman and Schmidt (1992), in their review of PBL’s evidence base, found equivocal results for its effects on knowledge acquisition and clinical reasoning, outcomes that depend heavily on how faithfully a given programme realises PBL’s structural features. Norman and Schmidt (2000) later argued that such mixed findings reflect failures of curriculum implementation rather than flaws in PBL’s design, noting that interventions at the curriculum level are liable to fail when programmes do not faithfully realise the structural principles on which PBL’s effectiveness depends. The structural argument does not require PBL to uniformly deliver superior outcomes across all implementations. It requires that PBL’s design creates the conditions under which productive AI integration is possible, while transmission-based approaches do not structurally create those conditions. The distinction is between what PBL’s architecture makes possible and what any given implementation empirically delivers. Whether a particular programme realises these structural commitments in practice is an empirical question this essay cannot answer, and one that becomes more pressing, not less, as AI raises the stakes of getting the structural conditions right.
The argument is also not exclusive to PBL. Any pedagogical approach that shares these structural features, including certain forms of case-based learning, inquiry-based learning, team-based learning, and collaborative clinical reasoning, would support the same analysis. PBL is the worked example developed here because it was designed around these conditions as its explicit rationale, making the alignment visible and traceable. But the structural principle is the contribution, not the endorsement of a single pedagogy. The reader whose context is not PBL should ask whether their own pedagogical approach creates the conditions described, and, if it does not, what structural changes would be needed to create them.
Beyond compatibility: what AI makes newly possible
The load-bearing/offloadable distinction developed in Section 2 has a further consequence that the compatibility argument alone does not capture. If AI absorbs the execution layer — the retrieval, summarisation, and preliminary analysis that previously consumed much of students’ cognitive bandwidth — and PBL’s structure concentrates learning on the judgement layer, the result is not simply that PBL continues to work as it did before. The freed cognitive space opens up problems that previously required too much execution-layer work to be tractable. AI does not simply fit within existing PBL practice. It shifts what category of problem PBL can engage with, and in doing so changes the curriculum question from what problems are manageable to what problems become possible.
Raising the ceiling on problem complexity
PBL has always faced a design constraint that is rarely named as such. Problems must be calibrated to what students can access unaided, or more precisely to what a small group can realistically investigate within the limits of their current knowledge, the resources available to them, and the time a module allows. This calibration is not a failure of ambition. It is a practical necessity. A first-year physiotherapy group cannot meaningfully investigate the epidemiology of falls in older adults if doing so requires statistical modelling capabilities they do not possess, access to population-level data they cannot easily obtain, and familiarity with policy frameworks they have not yet studied. The problem must be tractable, and tractability has historically been bounded by the group’s existing reach.
AI changes the boundary. Not by removing the cognitive demand that makes a problem educationally valuable (the grappling, the reasoning under uncertainty, the evaluation of competing approaches), but by lowering the access barriers that previously kept certain problems out of reach. A group that could not have engaged with epidemiological modelling can now surface and interrogate relevant models. A group that could not have accessed policy frameworks across multiple jurisdictions can now retrieve, compare, and evaluate them. A group that could not have synthesised evidence across disciplines, combining clinical research with health economics and community perspectives, can now hold all three in view simultaneously. The productive struggle remains. What changes is the ceiling: how far into a problem students can go before the complexity outruns their grasp.
This is not the same as making problems easier. Lowering access barriers and reducing cognitive demand are different operations. The distinction matters because it maps directly onto the load-bearing framework developed in Section 2. What AI removes is the execution-layer difficulty, the logistical and technical barriers to reaching the problem’s substantive core. What it leaves intact is the judgement-layer difficulty, the work of determining what the evidence means, whose interests are at stake, what trade-offs are acceptable, and what should actually be done. If anything, expanding the range of accessible material increases the demand on judgement, because students must now evaluate and synthesise across a broader field than they could previously reach (Bjork & Bjork, 2009; Lodge et al., 2023).
Orchestrating inquiry rather than delegating it
Consider what this looks like in practice. A group of health sciences students investigating health inequalities in a specific community can now, with AI support, engage with epidemiological data on disease prevalence in the region, policy frameworks governing resource allocation, published evaluations of interventions trialled in comparable communities, and stakeholder perspectives from public health officials, community organisations, and affected populations. Previously, marshalling this range of material would have required a level of research capacity, disciplinary breadth, and institutional access that an undergraduate group simply did not have. The problem would have been simplified, narrowed to a single dimension the students could manage, and the simplification would have removed exactly the interdisciplinary complexity that makes health inequalities a problem worth investigating.
An obvious objection: surely the reason problems are simplified for novices is precisely that they are novices. They lack the background knowledge and disciplinary expertise to engage meaningfully with the full complexity. This is true, but it conflates two different kinds of complexity. The content complexity of a problem — the depth of disciplinary reasoning required to interpret epidemiological models or evaluate health economics — does need calibrating for novices, and AI does not change that. What AI primarily reduces is access complexity: the logistical and technical barriers to reaching material that students could engage with productively if they could get to it. A first-year group cannot build an epidemiological model, but they can interrogate one — asking what assumptions it makes, what populations it was derived from, whether its conclusions transfer to the community under investigation. This is precisely what Lave and Wenger (1991) described as legitimate peripheral participation: learners engaging with the practices and artefacts of a professional community at a level appropriate to their developing competence, without needing to have mastered the full disciplinary apparatus that produced those artefacts. AI expands the periphery that students can legitimately participate in.
With AI lowering the access barriers, the group can engage with more dimensions of the problem simultaneously than was previously possible. But engaging with more dimensions is not the same as learning from them — the engagement is partial, mediated through AI, and dependent on the facilitation structure doing the work of converting access into understanding. The orchestration of that inquiry remains irreducibly theirs: deciding what questions to ask, determining which sources are credible, evaluating what comes back against what they already understand, identifying what is missing from the picture, reconciling competing perspectives, and arriving at a position they can defend. This is the student-as-orchestrator framing: directing multiple lines of inquiry toward different aspects of a complex problem is a higher-order cognitive act, not a delegation of intellectual work (Alfaro et al., n.d.). Consider the student who asks AI to summarise the epidemiological evidence, then asks it to identify the strongest counterarguments to a proposed intervention, then asks it to take the perspective of a community member affected by the policy. That student is not outsourcing their thinking. They are conducting an inquiry whose scope exceeds what any individual could manage alone, and they are doing so through acts of judgement at every step: what to ask, how to evaluate what returns, what to do with the tensions between different responses.
A sceptical reader will note, however, that expanded access is not the same as learning, and the essay’s own framework demands this objection be addressed directly. When a student group “engages with epidemiological modelling” via AI, is that engagement load-bearing or surface-level? The distinction between physical access and epistemological access, drawn from Rowe (2025a, building on Morrow, 2009, and Warschauer, 2004), applies with full force here. Physical access, the ability to reach a resource, retrieve information, or interact with a tool, does not automatically produce the epistemological access necessary for learning: the knowledge, practices, and dispositions required to make that resource intellectually productive. A student who receives an AI-generated summary of epidemiological findings has physical access to the material. Whether they have epistemological access, whether they can interpret the findings critically, recognise their limitations, connect them to the specific community context under investigation, and integrate them with other forms of evidence, depends entirely on what the learning environment provides beyond the AI output itself.
This is where PBL’s facilitation structure does essential work. The facilitator’s role in PBL is not to deliver answers but to hold the process: to press on reasoning that has not been examined, to ask what the group has not yet considered, to surface assumptions that are being treated as settled when they are not (Azer, 2005). In the context of AI-expanded problems, this facilitation is what converts physical access into epistemological access. The group that retrieves epidemiological data via AI and moves straight to a conclusion has consumed an output. The group whose facilitator asks “what assumptions does that model make about the population it studied, and do those assumptions hold for the community you are investigating?” has been pushed from retrieval into critical evaluation, from the information layer into the personal context layer where learning occurs. The category-of-problem argument depends on this scaffolding being present. Without facilitation that presses students beyond surface engagement with AI-provided material, expanded access produces the illusion of deeper inquiry rather than the reality of it.
The essay should survive its own diagnostic. Applying the load-bearing framework reflexively: the epidemiological modelling itself sits at the information layer. It is offloadable: a student does not need to build the model from scratch to learn from engaging with it. The judgement about what the model means for this community, whether its assumptions hold, what it leaves out, what other forms of evidence should be weighed alongside it, and what should be done in light of the full picture: that is at the personal context layer. It is where the productive struggle lives in these expanded problems, and it is precisely what PBL’s structure (the group deliberation, the facilitated inquiry, the structured reflection) is designed to support.
From manageable problems to problems worth engaging with
If AI expands the range of problems students can meaningfully engage with, curriculum design faces a question it has not previously had to answer. The traditional approach to PBL problem design involves calibrating difficulty to student capability, identifying problems that are complex enough to generate productive inquiry but tractable enough that the group can make meaningful progress with the resources available to them. This calibration has always involved simplification: reducing the number of variables, narrowing the disciplinary scope, removing stakeholder complexity, controlling for the messiness that characterises real-world practice. These simplifications were not pedagogical choices in any deep sense. They were logistical accommodations to the limits of what students could access.
When those access limits change, the rationale for simplification weakens. This is not an argument for removing scaffolding; students still need structured support, and facilitation becomes more important, not less, as problems become more complex. It is an argument for reconsidering which constraints on problem design were genuinely pedagogical and which were merely logistical. A problem narrowed to a single disciplinary perspective because students could not access the others is a problem that was simplified for access reasons. If AI removes that access barrier, the simplification may no longer serve the learning and may in fact impoverish it by stripping away the interdisciplinary complexity that makes the problem authentic.
The implications are particularly significant for problems that Rittel and Webber (1973) characterised as wicked: defined by incomplete knowledge, stakeholder disagreement, and interconnection with other problems such that any intervention produces unanticipated consequences. Wicked problems resist the simplification that traditional curriculum design requires. They cannot be meaningfully narrowed to a single discipline or a single set of stakeholders without losing the features that make them wicked and worth engaging with. These are the problems that healthcare systems most need graduates equipped to think about: health inequalities, the management of complex multimorbidity across fragmented services, the health consequences of climate change, the integration of care across institutional boundaries (Fraser & Greenhalgh, 2001; Head & Alford, 2015). They are also the problems that PBL curricula have historically been forced to simplify to the point where the wickedness is removed.
AI does not solve wicked problems. But it changes the terms on which students can engage with them. A group that can access epidemiological evidence, policy analysis, economic modelling, community perspectives, and ethical frameworks simultaneously, even imperfectly and with the limitations of AI-mediated access, is engaging with a problem closer to its actual complexity than a group whose problem has been pre-simplified to fit within the boundaries of what they could reach unaided.
AI raises the ceiling on problem complexity.
By lowering access barriers, AI makes wicked problems — defined by incomplete knowledge, stakeholder disagreement, and systemic interconnection — accessible to student inquiry in ways that were not previously possible.
The curriculum question shifts accordingly: not what problems are manageable for students at this level, but what problems become possible when access barriers shrink, and which of those possible problems are the ones most worth engaging with, given the challenges graduates will face in practice.
Pedagogy and AI readiness are the same investment
This is where the argument carries weight beyond the tutorial room. Investing in PBL's structural conditions (problem-driven inquiry, facilitation capacity, collaborative learning infrastructure, and assessment practices that value process over product) is not solely a pedagogical choice. It is, whether or not institutions recognise it as such, a decision about AI readiness.
Institutions that have built their curricula around transmission and artefact-based assessment face a structural problem when AI arrives: the artefacts that anchored their assessment systems are no longer reliable evidence of learning, and the pedagogical infrastructure to support process-based alternatives does not exist. Institutions that have invested in PBL’s structural conditions find themselves with an environment already configured for productive AI integration, not because they anticipated AI, but because the conditions that make PBL effective and the conditions that make AI integration productive are, as this essay has argued, the same conditions.
The choice is not between investing in pedagogy and investing in AI readiness. It is a single investment. Institutions choosing pedagogical infrastructure are simultaneously making decisions about what kind of graduates they produce and what kind of AI integration those graduates are prepared for. The structural argument developed in this essay does not require institutions to adopt PBL specifically. It requires them to ask whether their pedagogical infrastructure creates the conditions under which AI integration supports professional development, and, if it does not, to recognise that the pedagogical deficit and the AI readiness deficit are the same deficit.
The pedagogical and AI readiness deficits are the same deficit.
Institutions investing in PBL’s structural conditions are simultaneously deciding what kind of AI integration their graduates are prepared for, whether or not they recognise it that way.
Conclusion
This essay opened by asking what the work is when AI can produce the artefact. The argument developed since suggests that the work is the judgement: the reasoning through competing evidence, the evaluation of what matters and what does not, the direction of inquiry toward problems that deserve investigation, the orchestration of multiple perspectives into a position that can be defended and revised. That was always the work. The artefact was never the point; it was the trace left behind by a process that was. AI has made this visible by removing the execution layer that was obscuring it: the retrieval, the summarisation, the first-draft production that consumed so much of students' cognitive energy that the judgement layer was often reached, if at all, only at the end.
The structural claim is that PBL’s founding commitments — dialogic inquiry, authentic problems, distributed authority, metacognitive reflection — are the same conditions under which AI integration is educationally productive rather than substitutive. The alignment is structural, rooted in a shared response to the same underlying challenge: that professional competence is constituted by adaptive judgement, not routine execution. PBL was not designed for an AI age. It was designed around the recognition that professional formation requires exactly the capabilities that productive AI use also demands. The overlap is not retrospective. It is traceable in PBL’s design logic.
The argument extends beyond compatibility. When AI absorbs the execution layer, the ceiling rises. Problems that were previously simplified to fit within the limits of what students could reach unaided can now be engaged with closer to their actual complexity. This matters because the simplifications that curriculum design has historically imposed were not pedagogically neutral: they stripped away precisely the interdisciplinary, multi-stakeholder, systemically interconnected features that characterise the problems graduates will actually face. A curriculum built on simplified problems was always, in this sense, producing a gap between what students practised and what practice would demand of them. AI does not close that gap automatically, but it removes some of the access constraints that forced the gap open in the first place. The curriculum question shifts from what problems are manageable to what problems are worth engaging with — and whether the answer to that question has been artificially constrained by limitations that no longer apply.
What remains open should be named clearly. Assessment at scale, making engagement visible and evidential across large cohorts without relying on artefact quality as the primary indicator, is genuinely unsolved, and the alternatives proposed so far are partial answers with real limitations. Whether any given PBL programme realises the structural commitments this essay describes is an empirical question the essay cannot answer, and the mixed evidence base on PBL outcomes suggests that the distance between design logic and institutional practice is often considerable. What the expanded category of problem demands from facilitators, in terms of their own capacity to scaffold student engagement with AI-mediated complexity, and their own developing judgement about where productive struggle lives in these new problems, requires investigation that has barely begun. And the structural claim itself, while theoretically grounded, awaits the empirical work that would test whether the alignment described here translates into measurable differences in how students develop professional competence. Whether the structural conditions this essay identifies will be built, and whether institutions will recognise the building of them as the most consequential pedagogical decision the AI era has placed before them, is genuinely open. What is not open is the cost of not building them — or whose development bears it.
References
Akyol, Z., & Garrison, D. R. (2011). Assessing metacognition in an online community of inquiry. The Internet and Higher Education, 14(3), 183–190. https://doi.org/10.1016/j.iheduc.2011.01.005
Alfaro, G. D., Fiore, S. M., & Oden, K. (n.d.). Externalized and extended cognition: Cognitive offloading for human-machine teaming.
Azer, S. A. (2005). Facilitation of students’ discussion in problem-based learning tutorials to create mechanisms: The use of five key questions. [Journal details not confirmed in library record — verify before submission.]
Barrows, H. S. (1986). A taxonomy of problem-based learning methods. Medical Education, 20(6), 481–486.
Bearman, M., Tai, J., & Dawson, P. (2024). Developing evaluative judgement for a time of generative artificial intelligence. Assessment & Evaluation in Higher Education.
Bjork, E. L., & Bjork, R. A. (2009). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In Psychology and the Real World.
Carless, D. (2007). Learning-oriented assessment: Conceptual bases and practical implications. Innovations in Education and Teaching International. [Volume and page details not confirmed — verify before submission.]
Corbin, T., Dawson, P., & Liu, D. (2025). Talk is cheap: Why structural assessment changes are needed for a time of GenAI. Assessment & Evaluation in Higher Education.
Dawson, P., Bearman, M., & Dollinger, M. (2024). Validity matters more than cheating. Assessment & Evaluation in Higher Education.
Fischer, J., Bearman, M., & Boud, D. (2023). How does assessment drive learning? A focus on students’ development of evaluative judgement. Assessment & Evaluation in Higher Education. [Volume and page details not confirmed — verify before submission.]
Fraser, S. W., & Greenhalgh, T. (2001). Coping with complexity: Educating for capability. BMJ, 323(7316), 799–803.
Gerlich, M. (2025). AI tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies, 15(1), 6. https://doi.org/10.3390/soc15010006
Head, B. W., & Alford, J. (2015). Wicked problems: Implications for public policy and management. Administration & Society, 47(6), 711–739.
Kuh, G. D. (2009). The national survey of student engagement: Conceptual and empirical foundations. New Directions for Institutional Research.
Lodge, J. M., & Loble, L. (2026). Artificial intelligence, cognitive offloading and implications for education. University of Technology Sydney. https://doi.org/10.71741/4PYXMBNJAQ.31302475
Lodge, J. M., Yang, S., & Furze, L. (2023). It’s not like a calculator, so what is the relationship between learners and generative artificial intelligence? Learning: Research and Practice.
Masters, K., MacNeil, H., & Benjamin, J. (2025). Artificial intelligence in health professions education assessment: AMEE Guide No. 178. Medical Teacher.
Messeri, L., & Crockett, M. J. (2024). Artificial intelligence and illusions of understanding in scientific research. Nature.
Morrow, W. (2009). Bounds of democracy: Epistemological access in higher education. HSRC Press.
Mubuuke, A. G., Louw, A. J. N., & Van Schalkwyk, S. (2017). Self-regulated learning: A key learning effect of feedback in a problem-based learning context. African Journal of Health Professions Education, 9(1). https://doi.org/10.7196/AJHPE.2017.v9i1.715
Norman, G. R., & Schmidt, H. G. (1992). The psychological basis of problem-based learning: A review of the evidence. Academic Medicine.
Norman, G. R., & Schmidt, H. G. (2000). Effectiveness of problem-based learning curricula: Theory, practice and paper darts. Medical Education, 34(9), 721–728. https://doi.org/10.1046/j.1365-2923.2000.00749.x
Rittel, H. W. J., & Webber, M. M. (1973). Dilemmas in a general theory of planning. Policy Sciences, 4(2), 155–169.
Rotgans, J. I., & Schmidt, H. G. (2011). Cognitive engagement in the problem-based learning classroom. Advances in Health Sciences Education.
Rowe, M. (2025a). A theoretical framework for integrating AI into health professions education. Preprint. https://doi.org/10.31219/osf.io/c764f_v1
Rowe, M. (2025b). The learning alignment problem: AI and the loss of control in higher education. Essay.
Rowe, M. (2025c). Taste and judgement in human-AI systems. Essay.
Rowe, M., & Lynch, C. (2025). Context sovereignty for AI-supported learning: A human-centred approach. Preprint. https://doi.org/10.31219/osf.io/8czva_v1
Rowe, M., & Olivier, B. (forthcoming). AI and your PhD.
Savery, J. R. (2006). Overview of problem-based learning: Definitions and distinctions. Interdisciplinary Journal of Problem-Based Learning.
Schmidt, H. G., Vermeulen, L., & van der Molen, H. T. (2006). Long-term effects of problem-based learning: A comparison of competencies acquired by graduates of a problem-based and a conventional medical school. Medical Education.
Vendrell, M., & Johnston, S.-K. (2026). Scaffolding critical thinking with generative AI: Design principles for integrating large language models in higher education. Computers and Education: Artificial Intelligence. https://doi.org/10.1016/j.caeai.2026.100572
Walton, J., Bearman, M., Crawford, N., Tai, J., & Boud, D. (2025). How university students work on assessment tasks with generative artificial intelligence: Matters of judgement. Assessment & Evaluation in Higher Education. https://doi.org/10.1080/02602938.2025.2570328
Warschauer, M. (2004). Technology and social inclusion: Rethinking the digital divide. MIT Press.
Yan, L., Pammer-Schindler, V., & Mills, C. (2025). Beyond efficiency: Empirical insights on generative AI’s impact on cognition, metacognition and epistemic agency in learning. British Journal of Educational Technology.
Zhan, Y., Boud, D., & Du, Z. (2025). Designing for authentic assessment: A scoping review. Higher Education. https://doi.org/10.1007/s10734-025-01588-9