6 min read·Lesson 2 of 10

Supervised, Unsupervised, and Reinforcement Learning

Learn the three main paradigms of machine learning, when to use each, and the canonical problems they solve — with concrete examples from production systems.

Machine learning is not a single technique but a family of approaches. The three classical paradigms — supervised, unsupervised, and reinforcement learning — differ in what data they need and what kind of problem they solve.

Supervised Learning

You have: A dataset of inputs paired with the correct output (labels).

Goal: Learn a function that maps new inputs to correct outputs.

Two flavours:

Type            Output                 Examples
Classification  A category (discrete)  Spam / not spam, cat / dog, fraud / legitimate
Regression      A number (continuous)  House price, temperature tomorrow, expected revenue

Supervised learning is the most common form of ML in industry today. The dominant practical bottleneck is usually getting enough labelled data — labelling 100,000 examples by hand is expensive.
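Here is a minimal sketch of both flavours in plain Python on tiny made-up datasets. Classification uses a k-nearest-neighbours classifier, which is one simple choice among many. Regression uses a one-feature least-squares line. The function names, features, and numbers are hypothetical and only illustrate learning from labelled pairs.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classification: majority label among the k labelled examples closest to query.
    train is a list of (features, label) pairs; distance is Euclidean."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def fit_line(xs, ys):
    """Regression: (slope, intercept) of the line minimising squared error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical labelled emails: (number of links, number of "!") -> label
emails = [((0, 0), "not spam"), ((1, 0), "not spam"), ((0, 1), "not spam"),
          ((8, 5), "spam"), ((9, 7), "spam"), ((7, 6), "spam")]
print(knn_classify(emails, (8, 6)))  # spam

# Hypothetical labelled houses: floor area (m^2) -> price (thousands)
areas = [50, 70, 90, 110]
prices = [150, 210, 270, 330]
slope, intercept = fit_line(areas, prices)
print(round(slope * 100 + intercept))  # 300
```

In both cases the "learning" is fitting something (stored neighbours, or a slope and intercept) to labelled examples. Predictions for new inputs come only from that fit.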

Unsupervised Learning

You have: A dataset of inputs with no labels.

Goal: Discover structure or patterns in the data.

Common tasks:

  • Clustering: Group similar items together. Example: segment customers into 5 personas based on purchase behaviour. K-means is the canonical algorithm.
  • Dimensionality reduction: Compress high-dimensional data into fewer dimensions while preserving as much of its structure as possible. Used for visualisation (t-SNE, UMAP) and as a preprocessing step (PCA is the classic choice).
  • Anomaly detection: Flag items that do not fit the normal pattern. Used in fraud detection and system monitoring.
  • Association: Find items that frequently co-occur. The classic "people who bought X also bought Y" recommendation.
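To make clustering concrete, here is a minimal K-means sketch (Lloyd's algorithm) in plain Python. It alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The customer features (visits per month, average spend) and values are hypothetical. A real segmentation would scale features first and would usually rely on a library implementation.

```python
import math
import random

def assign(points, centroids):
    """Assignment step: index of the nearest centroid (Euclidean) for each point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: repeat assignment and update steps until centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k randomly chosen data points
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # update step: move the centroid to its cluster's mean
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:        # empty cluster: leave its centroid where it was
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # no centroid moved, so assignments are final
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Hypothetical customers: (visits per month, average spend)
customers = [(1, 10), (2, 12), (1, 11), (20, 200), (22, 210), (21, 190)]
centroids, labels = kmeans(customers, k=2)
print(sorted((round(x, 2), round(y, 2)) for x, y in centroids))
# [(1.33, 11.0), (21.0, 200.0)]
```

Note that no labels were supplied. The two groups emerge purely from distances between points, and k itself is a choice the practitioner must make.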

Reinforcement Learning (RL)

You have: An environment in which an agent can take actions and receive rewards.

Goal: Train the agent to maximise cumulative reward over time.

The classic conceptual model:

  ┌─────────┐   action   ┌─────────────┐
  │  Agent  │ ─────────▶ │ Environment │
  │         │            │             │
  │         │ ◀───────── │             │
  └─────────┘  state +   └─────────────┘
                reward
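The agent/environment loop above can be sketched with tabular Q-learning, one classic RL algorithm chosen here purely for illustration. The sketch uses a hypothetical five-cell corridor. The environment, reward scheme, and hyperparameters (alpha, gamma, epsilon) are all assumptions for this toy example, not settings from any production system.

```python
import random

# Hypothetical toy environment: a corridor of cells 0..4. The agent starts in
# cell 0; reaching cell 4 ends the episode with reward 1.0, every other step gives 0.0.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, 1)  # move left, move right

def step(state, action):
    """Environment: return (next_state, reward, done); walls clamp the agent inside."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    """Highest-valued action in this state, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Agent: explore with probability epsilon, otherwise exploit current estimates
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, state, rng)
            # Environment: returns the new state and a reward
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) towards reward + discounted best next value
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]: move right in every non-goal cell
```

The agent is never told that "right" is correct. It only sees rewards, and the discount factor gamma makes shorter paths to the goal worth more.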

Notable examples:

  • AlphaGo defeating Lee Sedol, one of the world's top Go players (DeepMind, 2016)
  • Robotics: learning to manipulate objects, walk, or fly
  • Game playing: OpenAI Five for Dota 2; AlphaStar for StarCraft II
  • RLHF (Reinforcement Learning from Human Feedback): a key technique used to align assistants such as ChatGPT and Claude with helpful, harmless behaviour

RL is powerful but data-hungry: agents often need millions of trials, usually run in simulation. It is a major research area but is used far less than supervised learning in everyday business applications.

Self-Supervised Learning

A relatively new fourth paradigm — the foundation of modern LLMs. The idea: take unlabelled data and create artificial labels from the data itself.

For example, given the sentence "The cat sat on the ___", the label is the word that was actually there ("mat"). The model learns to predict masked or next words across billions of sentences. No human labelling required — the internet is the dataset.
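The labelling trick can be sketched in a few lines: both functions below derive (input, label) training pairs from raw text alone. This is a simplification. It splits on whitespace, whereas real models use subword tokens. BERT-style training also masks a random subset of tokens rather than one word at a time. The function names are hypothetical.

```python
def next_word_examples(text, context_size=3):
    """GPT-style: each label is simply the word that actually follows the context."""
    words = text.split()
    return [(tuple(words[i - context_size:i]), words[i])
            for i in range(context_size, len(words))]

def masked_examples(text, mask="[MASK]"):
    """BERT-style: hide one word at a time and use the hidden word as the label."""
    words = text.split()
    return [(" ".join(words[:i] + [mask] + words[i + 1:]), words[i])
            for i in range(len(words))]

corpus = "the cat sat on the mat"
for context, target in next_word_examples(corpus):
    print(context, "->", target)
# ('the', 'cat', 'sat') -> on
# ('cat', 'sat', 'on') -> the
# ('sat', 'on', 'the') -> mat
print(masked_examples(corpus)[-1])  # ('the cat sat on the [MASK]', 'mat')
```

Every pair comes from the text itself, so adding more raw text adds more training examples at no labelling cost.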

This trick is what made GPT, BERT, and all modern LLMs possible.

Semi-Supervised Learning

A practical hybrid: a small amount of labelled data plus a large amount of unlabelled data. Common in production where you have, say, 1,000 hand-labelled support tickets and 1,000,000 unlabelled ones.
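One common semi-supervised technique is self-training, also called pseudo-labelling. You fit a model on the labelled data, label the unlabelled points the model is confident about, add them to the training set, and refit. The sketch below makes several assumptions: a nearest-centroid classifier, a hypothetical distance-margin confidence rule, and made-up 2-D features standing in for support tickets. A real system would use text embeddings and a stronger classifier.

```python
import math

def centroids_by_label(labelled):
    """Mean feature vector per label, from a list of (features, label) pairs."""
    groups = {}
    for features, label in labelled:
        groups.setdefault(label, []).append(features)
    return {label: tuple(sum(axis) / len(pts) for axis in zip(*pts))
            for label, pts in groups.items()}

def self_train(labelled, unlabelled, margin=2.0, rounds=5):
    """Repeatedly pseudo-label confident unlabelled points, add them, and refit.
    Returns the final centroids and the points that stayed unlabelled."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        centroids = centroids_by_label(labelled)
        confident, remaining = [], []
        for x in pool:
            dists = sorted((math.dist(x, c), label) for label, c in centroids.items())
            # Hypothetical confidence rule: nearest centroid beats the runner-up by >= margin
            if len(dists) > 1 and dists[1][0] - dists[0][0] >= margin:
                confident.append((x, dists[0][1]))
            else:
                remaining.append(x)
        if not confident:  # nothing new to learn from; stop early
            break
        labelled += confident
        pool = remaining
    return centroids_by_label(labelled), pool

# Hypothetical data: 2 hand-labelled tickets, 5 unlabelled ones
labelled = [((0.0, 0.0), "billing"), ((10.0, 10.0), "technical")]
unlabelled = [(1.0, 0.5), (0.5, 1.5), (9.0, 9.5), (10.5, 8.0), (5.0, 5.0)]
model, still_unlabelled = self_train(labelled, unlabelled)
print(still_unlabelled)  # [(5.0, 5.0)]: too ambiguous to pseudo-label
```

The confidence threshold is the key design choice. If it is set too loose, the model reinforces its own mistakes. If it is set too strict, the unlabelled data adds little.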

How to Choose

If your data is...                     And you want to...                   Use
Labelled                               Predict a category or number         Supervised learning
Unlabelled                             Find groups, anomalies, or patterns  Unsupervised learning
From an environment with feedback      Optimise sequential decisions        Reinforcement learning
Massive and unlabelled (text, images)  Build a foundation model             Self-supervised learning
Mostly unlabelled with a few labels    Maximise the value of labels         Semi-supervised learning

Key Takeaways

  • Supervised learning trains on labelled examples to predict labels for new inputs (classification or regression).
  • Unsupervised learning finds structure in unlabelled data — clustering, dimensionality reduction, anomaly detection.
  • Reinforcement learning trains an agent to take actions in an environment to maximise a reward signal.
  • Self-supervised learning is the foundation of modern LLMs: the data labels itself (predict the next token).
  • Choosing the right paradigm depends on what data you have, not just what problem you want to solve.
