Machine learning is not a single technique but a family of approaches. The three classical paradigms — supervised, unsupervised, and reinforcement learning — differ in what data they need and what kind of problem they solve.
Supervised Learning
You have: A dataset of inputs paired with the correct output (labels).
Goal: Learn a function that maps new inputs to correct outputs.
Two flavours:
| Type | Output | Examples |
|---|---|---|
| Classification | A category (discrete) | Spam / not spam, cat / dog, fraud / legitimate |
| Regression | A number (continuous) | House price, temperature tomorrow, expected revenue |
Supervised learning is the most common form of ML in industry today. The dominant practical bottleneck is usually getting enough labelled data — labelling 100,000 examples by hand is expensive.
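To make the two flavours concrete, here is a minimal sketch in plain Python. The data, feature choices, and function names (`fit_line`, `nearest_neighbour`) are hypothetical and chosen only for illustration; a real project would normally use an ML library and far more data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def nearest_neighbour(train, query):
    """Classification: return the label of the closest labelled input."""
    closest = min(train, key=lambda pair: abs(pair[0] - query))
    return closest[1]


# Regression on made-up data: floor area (m^2) -> price (thousands).
areas = [50, 70, 90, 110]
prices = [150, 210, 270, 330]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 100 + intercept  # prediction for an unseen 100 m^2 home

# Classification on made-up data: count of suspicious words -> label.
emails = [(0, "not spam"), (1, "not spam"), (7, "spam"), (9, "spam")]
predicted_label = nearest_neighbour(emails, 8)
```

In both cases the labels (`prices`, and the strings in `emails`) are what make the problem supervised: the model is fitted to known answers and then asked about an input it has not seen.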
Unsupervised Learning
You have: A dataset of inputs with no labels.
Goal: Discover structure or patterns in the data.
Common tasks:
- Clustering: Group similar items together. Example: segment customers into 5 personas based on purchase behaviour. K-means is the canonical algorithm.
- Dimensionality reduction: Compress high-dimensional data into fewer dimensions while preserving as much of its structure as possible. Used for visualisation (t-SNE, UMAP) and as a preprocessing step (for example, PCA).
- Anomaly detection: Flag items that do not fit the normal pattern. Used in fraud detection and system monitoring.
- Association: Find items that frequently co-occur. This is the basis of the classic "people who bought X also bought Y" recommendation.
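As an illustration of the clustering task above, the following is a rough sketch of k-means (Lloyd's algorithm) in plain Python. The customer data and the naive "first k points" initialisation are assumptions made for the sake of a short, deterministic example; library implementations use better initialisation and usually expect scaled features.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=20):
    """Lloyd's algorithm; returns (centroids, assignments)."""
    # Naive initialisation (an assumption for this sketch): the first k points.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


# Made-up customers: (orders per month, average basket value).
customers = [(1, 20), (2, 22), (1, 25), (10, 200), (11, 210), (12, 190)]
centroids, assignments = k_means(customers, k=2)
```

No labels are supplied at any point: the two groups (occasional low spenders and frequent high spenders) emerge from the distances between the points alone.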
Reinforcement Learning (RL)
You have: An environment in which an agent can take actions and receive rewards.
Goal: Train the agent to maximise cumulative reward over time.
The classic conceptual model:
┌─────────┐   action   ┌─────────────┐
│  Agent  │ ─────────▶ │ Environment │
│         │            │             │
│         │ ◀───────── │             │
└─────────┘   state +  └─────────────┘
              reward
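The agent-environment loop can be sketched with tabular Q-learning on a toy problem. Everything here is an assumption for illustration: the five-cell corridor environment, the reward of 1 at the right-hand end, and the hyperparameter values. It is one simple RL algorithm among many, not a description of how the systems listed in this section were trained.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes (and on ties), else exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update towards reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal cell after training.
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
```

The agent is never told that moving right is correct. It only observes states and rewards, and the learned table ends up preferring `+1` in every cell because that choice leads to the highest cumulative reward.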
Notable examples:
- AlphaGo defeating Lee Sedol, an 18-time world Go champion (DeepMind, 2016)
- Robotics: learning to manipulate objects, walk, or fly
- Game playing: OpenAI Five for Dota 2; AlphaStar for StarCraft
- RLHF (Reinforcement Learning from Human Feedback): a technique used to align assistants such as ChatGPT and Claude with helpful, harmless behaviour
RL is powerful but data-hungry: agents often need millions of trials before they perform well. It is prominent in research but used less often than supervised learning in everyday business applications.
Self-Supervised Learning
A more recent fourth paradigm, and the foundation of modern LLMs. The idea: take unlabelled data and derive the labels from the data itself.
For example, given the sentence "The cat sat on the ___", the label is the word that was actually there ("mat"). The model learns to predict masked or next words across billions of sentences. No human labelling is required: text collected from the internet serves as the dataset.
This trick is what made GPT, BERT, and other modern LLMs possible. It turns web-scale text into a labelled dataset with no manual annotation cost.
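The label-creation step can be sketched in a few lines. The helper names `next_word_pairs` and `masked_pairs` are hypothetical, and splitting on whitespace is a simplifying assumption: real systems work on subword tokens and, for masked prediction, hide only a random subset of positions.

```python
def next_word_pairs(text, context_size=3):
    """Build (context, target) pairs for next-word prediction."""
    words = text.split()
    return [
        (tuple(words[i - context_size:i]), words[i])
        for i in range(context_size, len(words))
    ]


def masked_pairs(text, mask_token="___"):
    """Build (masked sentence, hidden word) pairs, one per word position."""
    words = text.split()
    return [
        (" ".join(words[:i] + [mask_token] + words[i + 1:]), words[i])
        for i in range(len(words))
    ]


sentence = "The cat sat on the mat"
lm_examples = next_word_pairs(sentence)   # next-word style, as in GPT
mlm_examples = masked_pairs(sentence)     # masked-word style, as in BERT
```

The last masked example is exactly the one from the text: the input `"The cat sat on the ___"` paired with the label `"mat"`. Every label comes from the sentence itself, so the same functions could be run over any amount of raw text.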
Semi-Supervised Learning
A practical hybrid: a small amount of labelled data plus a large amount of unlabelled data. This is common in production, where you might have, say, 1,000 hand-labelled support tickets and 1,000,000 unlabelled ones.
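One common way to exploit the unlabelled portion is self-training (pseudo-labelling): fit a model on the labelled data, label the unlabelled points it is confident about, and refit. The sketch below assumes a single made-up numeric feature per ticket, a nearest-centroid classifier, and a hypothetical `margin` confidence threshold; it is one possible approach, not the only semi-supervised method.

```python
from statistics import mean


def class_centroids(labelled):
    """Mean feature value per class."""
    by_label = {}
    for x, y in labelled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def self_train(labelled, unlabelled, margin=2.0, rounds=5):
    """Repeatedly pseudo-label confident points (needs at least two classes)."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        cents = class_centroids(labelled)
        confident, remaining = [], []
        for x in pool:
            ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
            best, runner_up = ranked[0], ranked[1]
            # Confident only if clearly closer to one class than the next.
            if abs(x - cents[runner_up]) - abs(x - cents[best]) >= margin:
                confident.append((x, best))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        labelled += confident
        pool = remaining
    return class_centroids(labelled), pool


# Two hand-labelled tickets and seven unlabelled ones (made-up feature values).
seed_labels = [(1.0, "billing"), (9.0, "outage")]
unlabelled = [0.5, 1.5, 2.0, 8.0, 8.5, 10.0, 5.1]
final_centroids, still_unlabelled = self_train(seed_labels, unlabelled)
```

Six of the seven unlabelled points are absorbed into the training set, while the ambiguous value `5.1` is left unlabelled rather than guessed. That caution matters, because a wrong pseudo-label is fed back into the model as if it were true.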
How to Choose
| If your data is... | And you want to... | Use |
|---|---|---|
| Labelled | Predict a category or number | Supervised learning |
| Unlabelled | Find groups, anomalies, or patterns | Unsupervised learning |
| From an environment with feedback | Optimise sequential decisions | Reinforcement learning |
| Massive and unlabelled (text, images) | Build a foundation model | Self-supervised learning |
| Mostly unlabelled with a few labels | Maximise the value of labels | Semi-supervised learning |