I wanted to use this as an opportunity to think carefully about, and to take seriously, the question of what is happening, and going to happen, with AI in research. I think there's a tendency to think of this, at best, as an inconvenience, or at worst, a threat to be neutralised. We tend to accept the existing system as inherently good or correct, and to treat change as something to be resisted. I want to explore what that change might look like, and whether we can see opportunities for hope.
I want to make one simple point here: AI is not just inside one part of one workflow. These are not speculative tools. They are being used right now by students, clinicians, supervisors, journal editors, and funders. Elicit already outperforms a novice researcher on structured literature search. NotebookLM will take a folder of PDFs and produce a walkthrough podcast in minutes. Perplexity and OpenEvidence are being used in clinics. Journal screening systems are already filtering submissions before a human sees them. If we talk about AI in research as a future question, we are already a year or two behind what is actually happening.
This is the slide that most academic audiences need and rarely get. The AI conversation in universities is usually calibrated on an experience of ChatGPT from 2023, when GPT-3.5 was the default model. That is not what frontier systems are today. A frontier model can now read a 300-page thesis and discuss it with you. It can act as an agent: taking a task, making plans, executing them, checking its own work, and returning with a structured output hours later. It can use tools: search engines, databases, code interpreters, the web. It can run deep research: autonomously working through dozens of sources to produce a synthesis. It can reason at graduate level across most disciplines. It can see figures, read scans, interpret speech, watch video. The gap between what these systems can do and what the average researcher assumes they can do is what people are calling the capabilities overhang. The consequence is that your students are using tools whose capabilities you may not have calibrated on. The policies being written often describe a system that no longer exists. I want to be careful here: this is not a sales pitch. It is a diagnostic. If we are going to talk sensibly about AI in research, we need a shared picture of what these systems can actually do.
For this audience that means something concrete. Research matters because it has the potential to change what happens in the world. The outputs of research are the evidence of the work that's been done, but they're not the point of the work in themselves. We've confused the purpose of research with the products of research. That is the first distinction I want to hold on to: purpose versus infrastructure. Journals, publishers, funders, doctoral programmes, peer review, ethics review, assessment systems. These all exist to support the work of understanding. Sometimes they do. Sometimes they become ends in themselves. AI has arrived inside that tension.
This is the slide where I want to name the mechanism. The infrastructure I described a moment ago is not neutral. Publishers, universities, funders, metrics, and rankings now form what I think of as the research-industrial complex: a tightly coupled system whose survival depends on continuous output, regardless of whether understanding is actually moving. It is self-perpetuating. The incentives reward quantity, prestige, and speed. Careers, revenues, and league-table positions depend on throughput. And you can already see the strain. Submission volumes are at record highs. Editorial boards are drowning. Hallucinated citations are now a routine reviewer problem. Paper mills industrialise the production of publishable-looking work. Journals are triaging with AI because humans cannot keep up with what AI is producing. There is an implicit assumption running through a lot of this: that more research activity means more understanding. It does not. The research-industrial complex has made activity and understanding come apart. That matters here because AI will not arrive in a neutral system. It will be pulled toward what the system already rewards. That is why "just use AI to do research faster" is the wrong frame. The system does not need research faster. It needs research that moves understanding, and it is not currently set up to reward that.
This institutional reflex is understandable. If AI is now touching the whole research ecosystem, the first response is usually to try to control the visible outputs. Universities write policy. Doctoral programmes worry about the integrity of the thesis and the viva. Publishers and funders worry about originality, trust, disclosure, and screening. That is not irrational. Outputs are what the system can see, count, archive, inspect, and regulate. But that instinct also keeps us fixed on the wrong object. It encourages us to treat the paper, the thesis, the assessment, or the review as if that were the thing we most cared about.
This is the central move. Research systems have always relied on artifacts. We use papers, theses, grants, reviews, and other outputs as evidence that certain kinds of thinking and judgement happened behind them. That was never perfect, but it was workable. The artifact stood in for a process we could not directly observe. AI does not eliminate the artifact. It weakens the inference. The paper can exist, the thesis can exist, the review can exist, while telling us less than before about how much understanding, judgement, struggle, or development sits behind it. So the problem is not just AI touching outputs. The problem is that outputs are weaker proxies for the processes we actually care about.
Research is not reducible to its outputs. An impact metric is not the same thing as a researcher. A publication list is not the same thing as intellectual development. And a polished output is not the same thing as the process that gave rise to it. That matters here because AI pushes us toward the measurable artifact even more strongly. If we are not careful, we end up defending the proxy rather than the purpose.
This is an important intermediate step in the argument. For a long time, part of the cognitive struggle of research lived in the labour of producing words. Writing a paragraph forced you to discover whether the thought was actually there. The friction was not always pleasant, but it often sharpened the thinking. Now that friction can be delegated. An agent can produce acceptable wording quickly. That is not automatically a problem. But if the struggle disappears from the writing, it has to reappear somewhere else. It can move upward into the direction layer: choosing the question, setting the frame, defining the constraints, deciding what standard the output must clear, and judging whether it does. If that shift does not happen, then all we have done is accelerate output. We have removed the resistance that used to reveal weak thinking without replacing it with stronger thinking elsewhere. That is where slop comes from: fluent words with no real substance behind them.
This is where I want to distinguish between tedious friction and formative struggle. Formatting references, fixing grammar, reducing search friction, getting past the blank page: there is no moral reason to preserve these as hardships. But making decisions under uncertainty, refining a question, recognising what a field actually needs, responding to challenge from someone who knows your project deeply: those are part of the developmental mechanism. This is also part of how research taste develops: by making choices, seeing what follows from them, and slowly refining your sense of what is worth pursuing. And this is where the supervision literature matters. Chatbot feedback can be useful, but supervision is relational, contextual, developmental, and cumulative. The AI is not wrong to be what it is. It is just not the whole of what development requires.
This is the diagnostic in its clearest form. Picture two researchers using AI and producing similar-looking work. In the first case, AI is helping the researcher get into harder territory. It is provoking judgement, clarification, and real decisions. In the second, it is helping the researcher submit something acceptable while insulating them from the engagement that was supposed to produce development. That second case matters because it is ordinary, not sensational. Fluency can now be produced without understanding. The artifact can exist without the development being visible inside it.
This is why I want to move the conversation. Once AI is present across the ecosystem, once our instinct has been to police outputs, once we recognise that outputs are only proxies, and once we see that development can happen or not depending on how AI is used, the old question stops helping. The useful question is not whether AI is being used. Of course it is. The useful question is what each use is for. Is it helping us better understand the world? Is it helping us think, judge, refine, and interpret? Or is it mainly helping us satisfy the infrastructure around research more quickly? That is the framing shift I want for the rest of the talk.
Once we accept that the question is about what AI is for, we have to be honest about what AI actually is. Most AI policy in universities is written as if these systems were tools — in the same category as email, Word, or SPSS. That framing misses what is actually happening. A tool is passive. You pick it up, you use it, you put it down. It doesn't hold context. It doesn't change depending on who is using it. It doesn't remember what you said last week. It doesn't push back. SPSS does not become a different piece of software when a different researcher sits in front of it. Agents are not like that. They hold context across a conversation, and increasingly across sessions. They take on roles — thinking partner, research assistant, writing collaborator, mediator, project manager — and they behave differently in each. They make local decisions about how to move through a problem. They push back. They remember. They can produce different versions of themselves depending on who is working with them and how. That makes the interaction relational, not instrumental. And relational interactions have two-way effects: they shape the work, and they shape the person doing the work. A researcher who uses an agent as a thinking partner ends up a different kind of researcher from one who uses it as a reassurance machine, even if the papers look similar. And it affects every point of the research process, not just the drafting stage.
If agents are relational counterparties rather than tools, the work itself changes shape. These systems can now hold context, take a role, do work across multiple steps, and come back with something. The important change is not only that more work can be delegated. It is that the human contribution shifts upward. Less execution, more direction and judgement. Less doing every step yourself, more deciding what should be done, how the work should be sequenced, what constraints matter, what standard it must clear, and whether it actually does. And that changes the bottleneck. The limiting factor is often no longer what the model can do. It is how much coherent direction I can maintain across multiple parallel workstreams.
I want to give this some shape before the closing diagnostic. It is easy to describe what bad AI use looks like. It is harder to describe what a disciplined, developmental use of AI in research looks like. Here are a handful of concrete examples. A supervisor can use an agent to sharpen their own feedback: brief it with the student's draft and their own initial notes, and compare what comes back. A clinician can read a paper first and then use an agent to challenge their interpretation, rather than letting the agent tell them what to think. A doctoral candidate can treat their agent interactions as part of the record of their thinking: not as something to hide, but as evidence of how their judgement developed. A journal can use AI to make scholarship more connected, not just to triage submissions. A research team can treat the brief (the standards, the constraints, the exemplars) as the serious intellectual work, with drafting handled downstream. In every case, the human contribution is still the scarce, valuable thing. The agent is pointed at understanding, not at output.
This is the keynote's main diagnostic and it travels beyond the PhD. The same distinction applies to papers, peer review, evidence summaries, grant applications, and the way clinicians engage with research. AI is good at whatever it is pointed at. The question is whether we are pointing it at better understanding or merely at faster throughput. This is why the framing shift matters. "Are you using AI?" is too blunt. "What are you using AI for?" gives us a way to judge the use in relation to the purpose of the work.
This is the constructive bridge from the blog posts. The first requirement is planning before handoff. A half-formed task produces generic work. The brief is not a formality. It is the mechanism by which you stay accountable for what the agent produces. The second is documentation treated as infrastructure. Notes, metadata, tagged references, project briefs, exemplars, and explicit connections are no longer just records for your future self. They become working context that agents can use. The third is expertise. Delegation only works if you can evaluate what comes back. You need enough domain knowledge to tell plausible from real, to see when a draft is thin, and to redirect the next iteration precisely. The fourth is protecting the capacity to evaluate in the first place. If agents increasingly do the reading, summarising, comparing, and drafting, then the human capacity to judge those outputs can itself start to weaken. That is the self-undermining risk: the better the delegation works, the easier it is to stop building the expertise needed to evaluate the delegation. So disciplined work with agents is not casual prompting. It is planning, infrastructure, and evaluative judgement deliberately kept alive.
Up to this point I have mainly been talking about one researcher and their agents. But that is only the beginning. Very soon the research environment is likely to contain multiple people, each bringing their own agents into the interaction. The student may have an agent helping them draft, plan, and interpret. The supervisor may have an agent helping compare versions or generate feedback. The institution may have agents involved in compliance, support, monitoring, or audit. Even participants or publics may arrive with agents that mediate consent, understanding, or representation. And once that happens, the challenge is no longer only how one person uses AI. The challenge becomes relational, institutional, and orchestrational. How do we negotiate student-agent, supervisor-agent, institution-agent, and participant-agent interactions? Who coordinates the overlapping workstreams? What counts as fair use, appropriate disclosure, or legitimate delegation? Where does confidentiality sit? How is intellectual property handled? What professional and social norms govern these interactions? We do not yet have settled language for this, let alone settled norms. And the bottleneck may not be what the agents can do, but how much coherent direction and coordination the human system around them can sustain.
This is the complementarity move and the wider systemic picture together. If we are serious about AI capabilities, we have to be honest: words are now a commodity. Anyone can produce fluent, publishable-looking text on any topic. That is the new baseline. Pretending otherwise is not a strategy. Execution is becoming delegable in the same way. Searching, summarising, drafting, comparison, routine analysis — these used to be the bulk of the work and they are now the easiest parts to hand off. But formats are also becoming fluid. The same understanding can be rendered as a paper, a clinical brief, a podcast, a set of slides, a plain-language summary, a piece of interactive code, or a visualisation. That is not just stylistic — it means research can reach audiences and formats that the journal-paper system was never going to serve well. So what becomes scarce? The things that decide which words matter: taste, judgement, trust, responsibility, and the capacity to recognise which questions are worth asking in the first place. The system-level risk is a flood of plausible work with thin judgement behind it. The system-level opportunity is that understanding can now travel into formats and contexts that were previously closed to it. Which of these we get depends on what we point AI at.
This is where I want to land. Research exists to help us better understand the world. AI will be pointed at whatever we point it at. The research-industrial complex will pull it toward throughput, because that is what the system already rewards. The opportunity — the one worth protecting — is to point it at understanding. That means using AI to sharpen questions, pressure-test interpretations, and make thinking more visible. It means treating the artifact as evidence of a process, not a substitute for it. It means holding onto the human capacities — taste, judgement, trust, responsibility — that give words significance. Every person here occupies more than one position in the research ecosystem: researcher, supervisor, reviewer, clinician, teacher, user of evidence. In each of those positions, the choice is the same. Point AI at understanding. The infrastructure can take care of itself.