AI across the ecosystem

  • Doing: Producing research questions, literature maps, methods, analysis, and drafts (Elicit, Consensus, SciSpace, Research Rabbit)
  • Learning: Shaping how people learn research through explanation, rehearsal, feedback, and planning (NotebookLM, ChatGPT, Claude)
  • Using: Mediating how clinicians, students, and patients interpret and use research (Perplexity, OpenEvidence, Scholar GPTs)
  • Platform: Influencing policy, triage, screening, ranking, and visibility across the wider ecosystem (journal AI screening, preprint triage, funder scoring)

The future is already here; it's just not evenly distributed (William Gibson)

Frontier AI capabilities

  • Long-context windows (1M tokens): Ingest and reason across a full thesis, systematic review, or decades of literature in one pass
  • Multimodal: Interpret figures, scans, audio, and video in real time
  • Deep research: Conduct multi-source evidence synthesis unattended
  • Tool use: Search databases, run code, read PDFs, navigate the web, build prototypes
  • Domain reasoning: Postgraduate level across most disciplines (expertise on demand)
  • Agentic workflows: Plan, execute, critique, and revise across hours of autonomous work; task horizon doubling every ~7 months (Kwa et al., 2025)

Research exists to help us better understand the world.

Everything else is infrastructure.

The research-industrial complex

  • Publishers, universities, funders, and metrics form a tightly-coupled system (Buckley, 2024; Ezell, 2024)
  • Self-perpetuating: its survival depends on continuous output, not on whether understanding is moving (Chu & Evans, 2021; Edwards & Roy, 2017)
  • Incentives reward quantity, prestige, and speed; not depth, judgement, or impact (Büttner et al., 2021)
  • The impact of AI on this system: record submission volumes, fake results in the scholarly record, editorial boards drowning (Eaton & Soulière, 2023)

The system rewards output.

Control the outputs

  • Universities roll out AI-detection tools, oral vivas for proposals, and prescribed disclosure statements
  • Doctoral programmes worry about what a thesis, viva, or supervision process can still certify
  • Publishers deploy AI detectors, mandatory disclosure, and desk-rejection of suspected AI work
  • Funders add attestations, audit requirements, and compliance checks to applications
  • Systems domesticate technology; see the internet (Reich, 2020)

Outputs are proxies

  • Papers, theses, grants, and reviews are the things the system can see
  • They were always proxies for things we can't see: thinking, judgement, development, understanding
  • The inference was workable when producing the artifact required doing the thinking
  • AI breaks that inference: a fluent artifact no longer evidences a thinking process

Fluency used to be signal. Now it is noise. And the infrastructure can't adapt.

I am not my h-index.

Writing as a proxy for thinking

  • If writing friction disappears, the cognitive struggle (the thinking) has to move somewhere else
  • You can keep the struggle where it is: form, test, and refine the words yourself
  • You can use AI to produce fluent output faster, with less thinking
  • Or you can delegate the writing and move the struggle "upward" into framing, direction, standards, and judgement

Remove the friction without relocating the thinking, and you get AI slop.

Cognitive struggle worth keeping

Not all difficulty is productive. But some is:

  • Locating your thinking in a field
  • Choosing which question is worth asking
  • Defending a decision you have actually made
  • Returning to a draft and seeing how your understanding changed
  • Working with a supervisor in a productive collaboration where your thinking is challenged

These are not arbitrary obstacles. They are how taste and judgement develop.

Same output, different process

Serves development (understanding)                                 | Serves the artifact (throughput)
You choose a method, then use AI to pressure-test the decision     | AI produces the methods section from a design you don't understand
You read and annotate the sources; AI helps sharpen your synthesis | AI writes the synthesis from a set of titles and abstracts
You draft an interpretation; AI challenges it; you revise          | AI writes the discussion from your results table
You defend, revise, and own the position                           | The position is well-articulated but you cannot defend it

Not: "Are you using AI?"

"What are you using AI for?"

Agents are not tools

  • Email, Word, and SPSS are tools; they don't change depending on who is using them
  • Agents hold context, take on roles, push back, remember across sessions, and make local decisions
  • They are relational, shaping the output and the person
  • These relationships run through every point of the research process

Tools don't change who you become. Relationships do.

From execution to direction

  • A lot of research execution can now be delegated; agents hold context, take a role, and work across multiple steps
  • But research direction means someone still sets goals, constraints, sequencing, delegation, and coordination (AI doesn't bring meaning to the work)
  • The limiting factor used to be how quickly you could produce words
  • A new limiting factor may be how well you coordinate agents and keep multiple workstreams coherent

Not just faster outputs but a shift from doing work to coordinating work.

What this (might) look like in practice

  • A supervisor briefs an agent with a candidate's draft, writes feedback alongside it, and compares the two before the next meeting
  • A clinician reads a trial, then uses an agent to pressure-test their intention to apply it in their local context
  • A doctoral candidate uses AI to push their thinking into more challenging territory than they could reach without that support
  • A journal uses AI to surface connections between papers and communities, not just to screen for compliance
  • A research team briefs agents with explicit standards, exemplars, and constraints before any drafting begins

When AI accelerates understanding, it serves the purpose.

When it accelerates output, it serves the infrastructure.

Working with agents

  • You need a real plan before handoff to agents, with goals, constraints, exemplars, and standards made explicit
  • You need documentation that works as infrastructure, not just private record-keeping
  • You need the expertise to evaluate the output
  • Note: Protect that evaluative judgement, because if agents do too much, judgement fades

Good agent work depends less on the intelligence of the model than on the conditions around it.

A wider agent ecosystem

  • Every stakeholder in the process will soon have their own agents: students, supervisors, institutions, even patients and participants
  • Every interaction becomes multi-agent
  • The bottleneck shifts from what agents or humans can do independently to how well human-AI coalitions can coordinate activity
  • We have no idea how to do this (PhD project, anyone?)

Research becomes a coordination problem, not a usage problem.

What is scarce is valuable

  • Words are now a commodity (abundant and cheap): Anyone can produce fluent text on any topic
  • Formats are fluid: AI translates understanding across formats: paper, brief, podcast, slides, data visualisations
  • What becomes valuable when output is commoditised? Taste, judgement, trust, responsibility, and the capacity to choose problems worth pursuing

Point AI at helping to better understand the world,

not at feeding the system around it.

I wanted to use this as an opportunity to think carefully about, and to take seriously, the question of what is happening, and what is going to happen, with AI in research. I think there's a tendency to see this, at best, as an inconvenience, or at worst, as a threat to be neutralised. We tend to accept the existing system as inherently good or correct, and to treat change as something to be resisted. I want to explore what that change might look like, and whether we can see opportunities for hope.

I want to make one simple point here: AI is not just inside one part of one workflow. These are not speculative tools. They are being used right now by students, clinicians, supervisors, journal editors, and funders. Elicit already outperforms a novice researcher on structured literature search. NotebookLM will take a folder of PDFs and produce a walkthrough podcast in minutes. Perplexity and OpenEvidence are being used in clinics. Journal screening systems are already filtering submissions before a human sees them. If we talk about AI in research as a future question, we are already a year or two behind what is actually happening.

This is the slide that most academic audiences need and rarely get. The AI conversation in universities is usually calibrated on an experience of GPT-3.5-era ChatGPT from 2023. That is not what frontier systems are today. A frontier model can now read a 300-page thesis and discuss it with you. It can act as an agent: taking a task, making plans, executing them, checking its own work, and returning with a structured output hours later. It can use tools: search engines, databases, code interpreters, the web. It can run deep research: autonomously working through dozens of sources to produce a synthesis. It can reason at graduate level across most disciplines. It can see figures, read scans, interpret speech, watch video. The gap between what these systems can do and what the average researcher assumes they can do is what people are calling the capabilities overhang. The consequence is that your students are using tools whose capabilities you may not have calibrated on. The policies being written often describe a system that no longer exists. I want to be careful here: this is not a sales pitch. It is a diagnostic. If we are going to talk sensibly about AI in research, we need a shared picture of what these systems can actually do.

For this audience that means something concrete. Research matters because it has the potential to change what happens in the world. The outputs of research are evidence of the work that's been done, but they're not the point of the work in themselves. We've confused the purpose of research with the products of research. That is the first distinction I want to hold on to: purpose versus infrastructure. Journals, publishers, funders, doctoral programmes, peer review, ethics review, assessment systems. These all exist to support the work of understanding. Sometimes they do. Sometimes they become ends in themselves. AI has arrived inside that tension.

This is the slide where I want to name the mechanism. The infrastructure I described a moment ago is not neutral. Publishers, universities, funders, metrics, and rankings now form what I think of as the research-industrial complex: a tightly-coupled system whose survival depends on continuous output, regardless of whether understanding is actually moving. It is self-perpetuating. The incentives reward quantity, prestige, and speed. The careers, revenues, and league-table positions depend on throughput. And you can already see the strain. Submission volumes are at record highs. Editorial boards are drowning. Hallucinated citations are now a routine reviewer problem. Paper mills industrialise the production of publishable-looking work. Journals are triaging with AI because humans cannot keep up with what AI is producing. There is an implicit assumption running through a lot of this — that more research activity means more understanding. It does not. The research-industrial complex has made activity and understanding come apart. That matters here because AI will not arrive into a neutral system. It will be pulled toward what the system already rewards. That is why "just use AI to do research faster" is the wrong frame. The system does not need research faster. It needs research that moves understanding — and it is not currently set up to reward that.

This institutional reflex is understandable. If AI is now touching the whole research ecosystem, the first response is usually to try to control the visible outputs. Universities write policy. Doctoral programmes worry about the integrity of the thesis and the viva. Publishers and funders worry about originality, trust, disclosure, and screening. That is not irrational. Outputs are what the system can see, count, archive, inspect, and regulate. But that instinct also keeps us fixed on the wrong object. It encourages us to treat the paper, the thesis, the assessment, or the review as if that were the thing we most cared about.

This is the central move. Research systems have always relied on artifacts. We use papers, theses, grants, reviews, and other outputs as evidence that certain kinds of thinking and judgement happened behind them. That was never perfect, but it was workable. The artifact stood in for a process we could not directly observe. AI does not eliminate the artifact. It weakens the inference. The paper can exist, the thesis can exist, the review can exist, while telling us less than before about how much understanding, judgement, struggle, or development sits behind it. So the problem is not just AI touching outputs. The problem is that outputs are weaker proxies for the processes we actually care about.

Research is not reducible to its outputs. An impact metric is not the same thing as a researcher. A publication list is not the same thing as intellectual development. And a polished output is not the same thing as the process that gave rise to it. That matters here because AI pushes us toward the measurable artifact even more strongly. If we are not careful, we end up defending the proxy rather than the purpose.

This is an important intermediate step in the argument. For a long time, part of the cognitive struggle of research lived in the labour of producing words. Writing a paragraph forced you to discover whether the thought was actually there. The friction was not always pleasant, but it often sharpened the thinking. Now that friction can be delegated. An agent can produce acceptable wording quickly. That is not automatically a problem. But if the struggle disappears from the writing, it has to reappear somewhere else. It can move upward into the direction layer: choosing the question, setting the frame, defining the constraints, deciding what standard the output must clear, and judging whether it does. If that shift does not happen, then all we have done is accelerate output. We have removed the resistance that used to reveal weak thinking without replacing it with stronger thinking elsewhere. That is where slop comes from: fluent words with no real substance behind them.

This is where I want to distinguish between tedious friction and formative struggle. Formatting references, fixing grammar, reducing search friction, getting past the blank page: there is no moral reason to preserve these as hardships. But making decisions under uncertainty, refining a question, recognising what a field actually needs, responding to challenge from someone who knows your project deeply: those are part of the developmental mechanism. This is also part of how research taste develops: by making choices, seeing what follows from them, and slowly refining your sense of what is worth pursuing. And this is where the supervision literature matters. Chatbot feedback can be useful, but supervision is relational, contextual, developmental, and cumulative. The AI is not wrong to be what it is. It is just not the whole of what development requires.

This is the diagnostic in its clearest form. In the first case, AI is helping the researcher get into harder territory. It is provoking judgement, clarification, and real decisions. In the second, it is helping the researcher submit something acceptable while insulating them from the engagement that was supposed to produce development. That second case matters because it is ordinary, not sensational. Fluency can now be produced without understanding. The artifact can exist without the development being visible inside it.

This is why I want to move the conversation. Once AI is present across the ecosystem, once our instinct has been to police outputs, once we recognise that outputs are only proxies, and once we see that development can happen or not depending on how AI is used, the old question stops helping. The useful question is not whether AI is being used. Of course it is. The useful question is what each use is for. Is it helping us better understand the world? Is it helping us think, judge, refine, and interpret? Or is it mainly helping us satisfy the infrastructure around research more quickly? That is the framing shift I want for the rest of the talk.

Once we accept that the question is about what AI is for, we have to be honest about what AI actually is. Most AI policy in universities is written as if these systems were tools — in the same category as email, Word, or SPSS. That framing misses what is actually happening. A tool is passive. You pick it up, you use it, you put it down. It doesn't hold context. It doesn't change depending on who is using it. It doesn't remember what you said last week. It doesn't push back. SPSS does not become a different piece of software when a different researcher sits in front of it. Agents are not like that. They hold context across a conversation, and increasingly across sessions. They take on roles — thinking partner, research assistant, writing collaborator, mediator, project manager — and they behave differently in each. They make local decisions about how to move through a problem. They push back. They remember. They can produce different versions of themselves depending on who is working with them and how. That makes the interaction relational, not instrumental. And relational interactions have two-way effects: they shape the work, and they shape the person doing the work. A researcher who uses an agent as a thinking partner ends up a different kind of researcher from one who uses it as a reassurance machine, even if the papers look similar. And it affects every point of the research process, not just the drafting stage.

If agents are relational counterparties rather than tools, the work itself changes shape. These systems can now hold context, take a role, do work across multiple steps, and come back with something. The important change is not only that more work can be delegated. It is that the human contribution shifts upward. Less execution, more direction and judgement. Less doing every step yourself, more deciding what should be done, how the work should be sequenced, what constraints matter, what standard it must clear, and whether it actually does. And that changes the bottleneck. The limiting factor is often no longer what the model can do. It is how much coherent direction I can maintain across multiple parallel workstreams.

I want to give this some shape before the closing diagnostic. It is easy to describe what bad AI use looks like. It is harder to describe what a disciplined, developmental use of AI in research looks like. These are a handful of concrete examples. A supervisor can use an agent to sharpen their own feedback: brief it with the student's draft and their own initial notes, and compare what comes back. A clinician can read a paper first and then use an agent to challenge their interpretation, rather than letting the agent tell them what to think. A doctoral candidate can treat their agent interactions as part of the record of their thinking — not as something to hide, but as evidence of how their judgement developed. A journal can use AI to make scholarship more connected, not just to triage submissions. A research team can treat the brief — the standards, the constraints, the exemplars — as the serious intellectual work, with drafting handled downstream. In every case, the human contribution is still the scarce, valuable thing. The agent is pointed at understanding, not at output.

This is the keynote's main diagnostic and it travels beyond the PhD. The same distinction applies to papers, peer review, evidence summaries, grant applications, and the way clinicians engage with research. AI is good at whatever it is pointed at. The question is whether we are pointing it at better understanding or merely at faster throughput. This is why the framing shift matters. "Are you using AI?" is too blunt. "What are you using AI for?" gives us a way to judge the use in relation to the purpose of the work.

This is the constructive bridge from the blog posts. The first requirement is planning before handoff. A half-formed task produces generic work. The brief is not a formality. It is the mechanism by which you stay accountable for what the agent produces. The second is documentation treated as infrastructure. Notes, metadata, tagged references, project briefs, exemplars, and explicit connections are no longer just records for your future self. They become working context that agents can use. The third is expertise. Delegation only works if you can evaluate what comes back. You need enough domain knowledge to tell plausible from real, to see when a draft is thin, and to redirect the next iteration precisely. The fourth is protecting the capacity to evaluate in the first place. If agents increasingly do the reading, summarising, comparing, and drafting, then the human capacity to judge those outputs can itself start to weaken. That is the self-undermining risk: the better the delegation works, the easier it is to stop building the expertise needed to evaluate the delegation. So disciplined work with agents is not casual prompting. It is planning, infrastructure, and evaluative judgement deliberately kept alive.

Up to this point I have mainly been talking about one researcher and their agents. But that is only the beginning. Very soon the research environment is likely to contain multiple people, each bringing their own agents into the interaction. The student may have an agent helping them draft, plan, and interpret. The supervisor may have an agent helping compare versions or generate feedback. The institution may have agents involved in compliance, support, monitoring, or audit. Even participants or publics may arrive with agents that mediate consent, understanding, or representation. And once that happens, the challenge is no longer only how one person uses AI. The challenge becomes relational, institutional, and orchestrational. How do we negotiate student-agent, supervisor-agent, institution-agent, and participant-agent interactions? Who coordinates the overlapping workstreams? What counts as fair use, appropriate disclosure, or legitimate delegation? Where does confidentiality sit? How is intellectual property handled? What professional and social norms govern these interactions? We do not yet have settled language for this, let alone settled norms. And the bottleneck may not be what the agents can do, but how much coherent direction and coordination the human system around them can sustain.

This is the complementarity move and the wider systemic picture together. If we are serious about AI capabilities, we have to be honest: words are now a commodity. Anyone can produce fluent, publishable-looking text on any topic. That is the new baseline. Pretending otherwise is not a strategy. Execution is becoming delegable in the same way. Searching, summarising, drafting, comparison, routine analysis — these used to be the bulk of the work and they are now the easiest parts to hand off. But formats are also becoming fluid. The same understanding can be rendered as a paper, a clinical brief, a podcast, a set of slides, a plain-language summary, a piece of interactive code, or a visualisation. That is not just stylistic — it means research can reach audiences and formats that the journal-paper system was never going to serve well. So what becomes scarce? The things that decide which words matter: taste, judgement, trust, responsibility, and the capacity to recognise which questions are worth asking in the first place. The system-level risk is a flood of plausible work with thin judgement behind it. The system-level opportunity is that understanding can now travel into formats and contexts that were previously closed to it. Which of these we get depends on what we point AI at.

This is where I want to land. Research exists to help us better understand the world. AI will be pointed at whatever we point it at. The research-industrial complex will pull it toward throughput, because that is what the system already rewards. The opportunity — the one worth protecting — is to point it at understanding. That means using AI to sharpen questions, pressure-test interpretations, and make thinking more visible. It means treating the artifact as evidence of a process, not a substitute for it. It means holding onto the human capacities — taste, judgement, trust, responsibility — that give words significance. Every person here occupies more than one position in the research ecosystem: researcher, supervisor, reviewer, clinician, teacher, user of evidence. In each of those positions, the choice is the same. Point AI at understanding. The infrastructure can take care of itself.