Definition

Large language model (noun): A deep learning model with billions of parameters or more, trained on large quantities of unlabelled text using self-supervised learning, and capable of general-purpose language understanding and generation tasks.

The term has no formal definition but typically refers to models with parameter counts in the billions or trillions. Unlike earlier NLP approaches that required specialised models for specific tasks, LLMs are general-purpose systems whose capabilities emerge from scale: the amount of training data, the number of parameters, and the computing power devoted to training.

Key characteristics:

  • Trained on vast text corpora using next-token prediction (see the code sketch after this list)
  • Billions to trillions of parameters
  • General-purpose rather than task-specific
  • Emergent capabilities not explicitly programmed
  • Require significant computational infrastructure
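
To make the next-token objective concrete, here is a minimal sketch of the self-supervised training loss, assuming PyTorch; a toy embedding-plus-linear model stands in for a full Transformer stack, and random integers stand in for tokenised text.

```python
# Minimal sketch of next-token prediction (self-supervised learning).
# Assumes PyTorch; a toy embedding + linear layer stands in for a full
# Transformer, and random integers stand in for tokenised text.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 1000, 64

model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),   # token ids -> vectors
    nn.Linear(d_model, vocab_size),      # vectors -> logits over the vocabulary
)

tokens = torch.randint(0, vocab_size, (8, 128))   # (batch, sequence) of token ids

inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict token t+1 from token t
logits = model(inputs)                            # (batch, seq-1, vocab)

loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                   # gradients for one optimisation step
print(loss.item())
```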

Context and background

LLMs emerged from the neural network renaissance of the 2010s, building on earlier statistical language modelling. The modern era began with Google’s introduction of the Transformer architecture in 2017 (“Attention Is All You Need”), which replaced recurrent processing with attention and allowed training to be parallelised across entire sequences.
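
A rough sketch of the scaled dot-product attention at the heart of that architecture, written as a single unmasked head in NumPy (real Transformers add learned projections, multiple heads, causal masking, and residual layers):

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
# Each output row is a context-aware mixture of the value vectors,
# weighted by how strongly its query matches every key.
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
    return weights @ V                                   # weighted mix of values

seq_len, d_model = 5, 16
x = np.random.randn(seq_len, d_model)
print(attention(x, x, x).shape)   # (5, 16): one updated vector per token
```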

OpenAI’s GPT series, beginning in 2018, demonstrated that scaling model size and training data could yield dramatic improvements. The 2020 release of GPT-3 marked a pivotal moment, revealing emergent abilities arising from scale rather than explicit programming. The field has accelerated sharply since 2020, building on earlier Transformer models such as BERT (2018) and T5 (2019), with systems like PaLM, Claude, and GPT-4 pushing the boundaries of size, training methods, and capabilities.

LLMs developed in response to the limitations of earlier NLP approaches, which struggled with long-range context, nuance, and open-ended generation. The shift represented a move from supervised learning on task-specific datasets to self-supervised pretraining on general corpora, followed by fine-tuning or instruction-tuning.

Distinctive characteristics

Core architecture:

  • Transformer models (encoder, decoder, or encoder-decoder configurations)
  • Attention mechanisms (self-attention, multi-head attention)
  • Tokenisation approaches (BPE, WordPiece, SentencePiece; example after this list)
  • Massive parameter scales
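
As a concrete illustration of BPE tokenisation, the snippet below uses the tiktoken library (an assumption; WordPiece and SentencePiece tokenisers expose the same basic text-to-ids mapping):

```python
# Byte-pair encoding in practice, using the tiktoken library as one example.
# The key idea is the same across BPE, WordPiece, and SentencePiece:
# text is split into subword pieces, each mapped to an integer id.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Large language models tokenise text into subwords."
ids = enc.encode(text)

print(ids)                              # a list of integer token ids
print([enc.decode([i]) for i in ids])   # the individual subword pieces
print(enc.decode(ids) == text)          # decoding round-trips the original string
```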

Training methodology:

  • Self-supervised learning through next-token prediction
  • Reinforcement learning from human feedback (RLHF; sketched after this list)
  • Constitutional AI approaches
  • Fine-tuning for specific tasks or alignment goals
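
RLHF first trains a reward model on human preference pairs, then optimises the LLM against it. A minimal sketch of the pairwise (Bradley-Terry style) preference loss, assuming PyTorch, with toy scalars standing in for a reward model’s outputs:

```python
# Sketch of the reward-model objective behind RLHF: given a human-preferred
# and a rejected response to the same prompt, push the reward of the chosen
# response above the rejected one. In practice the rewards come from an LLM
# with a scalar head; toy tensors stand in for those outputs here.
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry style pairwise loss: -log sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

r_chosen = torch.tensor([1.2, 0.3, 0.9], requires_grad=True)
r_rejected = torch.tensor([0.4, 0.5, -0.1])

loss = preference_loss(r_chosen, r_rejected)
loss.backward()
print(loss.item())
```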

What distinguishes LLMs from earlier approaches:

  • Scale as the primary driver of capability
  • Emergent abilities not present in smaller models
  • General-purpose rather than task-specific design
  • Ability to perform tasks through prompting rather than retraining

Applications

Operational processes:

  • Pretraining: learning language patterns from vast corpora
  • Fine-tuning: adapting to specific tasks or alignment goals
  • Inference: generating responses to prompts (see the sketch after this list)
  • Context handling: fitting prompts and conversation history within a bounded context window
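
For the inference step, a minimal sketch using the Hugging Face transformers library and the small gpt2 checkpoint (assumptions on my part; any causal language model with the same API behaves similarly):

```python
# Inference sketch: load a small causal language model and generate a
# completion. Assumes the Hugging Face transformers package and the small
# "gpt2" checkpoint; the prompt -> token ids -> generate -> decode flow
# is the same for much larger models.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "A large language model is"
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```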

Interaction patterns:

  • Prompting techniques and prompt engineering
  • Chain-of-thought reasoning
  • Few-shot and zero-shot learning (illustrated after this list)
  • Tool use and function calling
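
As an illustration of few-shot prompting, the sketch below assembles a prompt from worked examples; complete() is a hypothetical stand-in for whatever model API is available, and a zero-shot prompt would simply omit the examples:

```python
# Few-shot prompting: prepend worked examples so the model can infer the
# task from the prompt alone, with no retraining. `complete` is a
# hypothetical stand-in for whichever LLM API is available.
def build_few_shot_prompt(examples, query):
    shots = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
    return f"{shots}\nReview: {query}\nSentiment:"

examples = [
    ("The plot dragged and the acting was wooden.", "negative"),
    ("A warm, funny film I would happily watch again.", "positive"),
]

prompt = build_few_shot_prompt(examples, "Beautifully shot but instantly forgettable.")
print(prompt)

# response = complete(prompt)   # hypothetical LLM call; a zero-shot prompt would include no examples
```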

Capability spectrum:

  • Factual recall and question answering
  • Reasoning and analysis
  • Creative generation
  • Instruction following
  • Code generation and debugging
  • Translation and summarisation

Implications

For knowledge work and education:

  • Transformation of writing, research, and analysis tasks
  • New approaches to learning and assessment
  • Questions about academic integrity and AI-assisted work
  • Potential for personalised learning support

For society:

  • Impact on labour markets and knowledge work
  • Content creation and media landscape transformation
  • Misinformation and synthetic content concerns
  • Accessibility of expertise and information

For technical development:

  • Hardware architecture co-evolution
  • Significant energy and computational resource requirements
  • New evaluation and red-teaming methodologies
  • Data curation and quality control challenges

Questions and tensions

Known limitations:

  • Hallucinations: generating plausible but false information
  • Context window constraints (see the trimming sketch after this list)
  • Temporal knowledge cutoffs
  • Difficulty with precise reasoning and mathematics
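
One practical consequence of the context window constraint noted above is that long conversations must be trimmed to a token budget before inference. A minimal (and deliberately lossy) sketch, assuming tiktoken for counting and dropping the oldest turns first:

```python
# Context windows are finite, so long histories must fit a token budget.
# A simple lossy strategy: drop the oldest turns until the rest fits.
# Assumes tiktoken for token counting; real systems often summarise or
# retrieve instead of discarding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def trim_to_budget(turns, max_tokens):
    kept = list(turns)
    while kept and sum(len(enc.encode(t)) for t in kept) > max_tokens:
        kept.pop(0)   # discard the oldest turn first
    return kept

history = [
    "You are a helpful assistant.",
    "Summarise the Transformer paper for me.",
    "Now compare it with recurrent networks.",
]
print(trim_to_budget(history, max_tokens=25))
```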

Open questions:

  • How do emergent capabilities arise from scale and architecture?
  • How will LLMs evolve to balance capabilities with safety, transparency, and alignment?
  • What are the parallels and differences with human cognition?
  • How should governance frameworks develop alongside capabilities?

Safety and alignment concerns:

  • Interpretability and transparency
  • Value alignment techniques
  • Bias mitigation
  • Deployment safeguards

My thinking

LLMs represent a significant shift in how we think about AI capabilities: from carefully engineered systems to emergent properties arising from scale. This has profound implications for education, where we’ve historically assumed that producing certain artefacts (essays, analyses, code) required and therefore demonstrated understanding.

The question isn’t whether LLMs will transform knowledge work but how we adapt our practices, particularly in education, to a world where generation is computationally trivial. This connects to the bitter lesson: we may have been measuring artefact production rather than learning itself.

What remains unclear is how capabilities will continue to scale and whether current approaches have fundamental limits. The field moves faster than our ability to develop appropriate governance and educational responses.


Sources

  • Vaswani, A. et al. (2017). Attention Is All You Need. NeurIPS.
  • Brown, T. et al. (2020). Language Models are Few-Shot Learners. NeurIPS.
  • Wikipedia contributors. (2023). Large language model.

Notes

This note synthesises foundational concepts about LLMs. As the field evolves rapidly, specific model capabilities and limitations may shift. The conceptual framework—scale driving emergent capabilities, self-supervised pretraining, alignment through RLHF—provides more durable understanding than specific model comparisons.