Supervised, Unsupervised, and Reinforcement Learning — AI and ML Fundamentals | CertQnA

Machine learning is not a single technique but a family of approaches. The three classical paradigms — supervised, unsupervised, and reinforcement learning — differ in what data they need and what kind of problem they solve.

Supervised Learning

You have: A dataset of inputs paired with the correct output (labels).

Goal: Learn a function that maps new inputs to correct outputs.

Two flavours:

Type	Output	Examples
Classification	A category (discrete)	Spam / not spam, cat / dog, fraud / legitimate
Regression	A number (continuous)	House price, temperature tomorrow, expected revenue

Supervised learning is the most common form of ML in industry today. The dominant practical bottleneck is usually getting enough labelled data — labelling 100,000 examples by hand is expensive.
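Both flavours can be sketched in a few lines of plain Python. This is a minimal illustration, not a production approach: the datasets are made-up toy values, the function names `fit_line` and `predict_1nn` are hypothetical, and the models are deliberately simple (a least-squares straight line for regression, a 1-nearest-neighbour rule for classification). In practice a library such as scikit-learn would normally be used.

```python
# Minimal supervised-learning sketch: every training input comes with its label.
# The datasets below are made-up toy values, for illustration only.

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_1nn(train, point):
    """Classification: return the label of the closest labelled example."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], point))
    return label

# Regression: hypothetical house size (m^2) -> price (thousands).
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]
slope, intercept = fit_line(sizes, prices)
print(round(slope * 100 + intercept))  # → 300 (predicted price for 100 m^2)

# Classification: hypothetical (number of links, ALL-CAPS words) -> label.
emails = [((0, 0), "not spam"), ((1, 0), "not spam"),
          ((8, 5), "spam"), ((10, 7), "spam")]
print(predict_1nn(emails, (9, 5)))  # → spam
```

The labels (`prices`, and the "spam" / "not spam" tags) are exactly what makes this supervised: without them neither function has anything to learn from.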

Unsupervised Learning

You have: A dataset of inputs with no labels.

Goal: Discover structure or patterns in the data.

Common tasks:

Clustering: Group similar items together. Example: segment customers into 5 personas based on purchase behaviour. K-means is the canonical algorithm.
Dimensionality reduction: Compress high-dimensional data into fewer dimensions while preserving as much of its structure as possible. Used for visualisation (t-SNE, UMAP) and, with methods such as PCA, as a preprocessing step.
Anomaly detection: Flag items that do not fit the normal pattern. Used in fraud detection and system monitoring.
Association: Find items that frequently co-occur. The classic "people who bought X also bought Y" recommendation.
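K-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The sketch below is a minimal plain-Python version under stated assumptions: 2-D points, centroids naively initialised from the first k points, and a fixed number of iterations instead of a convergence test. The customer data is invented. Library implementations such as scikit-learn's `KMeans` use smarter seeding (k-means++ by default) and stop when the centroids settle.

```python
# Minimal k-means sketch (unsupervised): no labels, only points.
# Assumptions: 2-D points, naive initialisation from the first k points,
# and a fixed iteration count instead of a convergence test.

def kmeans(points, k, iterations=10):
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)]
    return centroids, clusters

# Made-up customer data: (orders per month, average basket size).
customers = [(1, 20), (30, 5), (2, 22), (28, 6), (1, 19), (31, 4)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)  # → [[(1, 20), (2, 22), (1, 19)], [(30, 5), (28, 6), (31, 4)]]
```

Nobody told the algorithm which customers belong together; the two personas (occasional big-basket shoppers and frequent small-basket shoppers) emerge from the data alone.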

Reinforcement Learning (RL)

You have: An environment in which an agent can take actions and receive rewards.

Goal: Train the agent to maximise cumulative reward over time.
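In the standard formalisation (an assumption here, since the text does not fix one), "cumulative reward" is the return from time step t, usually weighted by a discount factor γ so that sooner rewards count more than later ones:

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad 0 \le \gamma \le 1
```

With γ = 1 (workable when episodes end) this is the plain sum of rewards. For example, rewards 0, 0, 1 with γ = 0.9 give G_t = 0.9² × 1 = 0.81.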

The classic conceptual model:

  ┌─────────┐   action   ┌─────────────┐
  │  Agent  │ ─────────▶ │ Environment │
  │         │            │             │
  │         │ ◀───────── │             │
  └─────────┘  state +   └─────────────┘
                reward
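The agent-environment loop in the diagram can be sketched in Python. Everything here is a hypothetical toy: a five-cell corridor (`N_STATES`, `step`) where the agent earns a reward of 1 for reaching the right-hand end, and an agent trained with tabular Q-learning, which is one classic RL algorithm among many. The hyperparameters (`alpha`, `gamma`, `epsilon`, `episodes`) are illustrative assumptions, not recommended values.

```python
import random

# Hypothetical toy environment: a corridor of cells 0..4.
# The agent starts in cell 0; reaching cell 4 gives reward 1 and ends the episode.
N_STATES = 5
ACTIONS = [-1, +1]  # index 0 = move left, index 1 = move right

def step(state, action):
    """Environment: apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Agent: tabular Q-learning (one classic RL algorithm among many)."""
    rng = random.Random(seed)
    # q[s][a] estimates the discounted future reward of taking action a in state s.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: try a random action
            else:
                best = max(q[state])  # exploit, breaking ties at random
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = train()
policy = ["right" if row[1] > row[0] else "left" for row in q[:-1]]
print(policy)  # the learned greedy action in cells 0..3
```

No one ever tells the agent "go right"; it discovers that policy purely from the reward signal, which is also why RL needs many episodes even on a problem this small.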

Notable examples:

AlphaGo defeating Lee Sedol, one of the world's strongest Go players (DeepMind, 2016)
Robotics: learning to manipulate objects, walk, or fly
Game playing: OpenAI Five for Dota 2; AlphaStar (DeepMind) for StarCraft II
RLHF (Reinforcement Learning from Human Feedback): a technique used to align assistants such as ChatGPT and Claude with helpful, harmless behaviour

RL is powerful but data-hungry: agents often need millions of trials, which is why training frequently happens in simulation. It is a major research area but is used far less than supervised learning in everyday business applications.

Self-Supervised Learning

A relatively recent addition, often described as a fourth paradigm (some texts class it as a form of unsupervised learning), and the foundation of modern LLMs. The idea: take unlabelled data and create artificial labels from the data itself.

For example, given the sentence "The cat sat on the ___", the label is the word that was actually there ("mat"). The model learns to predict masked or next words across billions of sentences. No human labelling is required: the text itself, largely collected from the web, is the dataset.

This trick underpins GPT, BERT, and other modern language models: it turns vast amounts of web text into a labelled training set without any manual labelling.
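The "labels for free" idea can be shown on the cat-and-mat sentence. This toy sketch assumes naive whitespace tokenisation (real LLMs use subword tokenisers), and the function names are hypothetical. It builds next-word pairs in the GPT style and masked-word pairs in a simplified BERT style; real BERT pre-training masks a random ~15% of tokens rather than one word at a time.

```python
# Toy self-supervised labelling: the labels come from the text itself.
# Assumption: naive whitespace tokenisation instead of a real subword tokeniser.

def next_word_pairs(sentence):
    """GPT-style: each prefix is an input, the word that follows is its label."""
    words = sentence.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

def masked_word_pairs(sentence, mask="___"):
    """Simplified BERT-style: hide one word at a time; the hidden word is the label."""
    words = sentence.split()
    pairs = []
    for i, word in enumerate(words):
        masked = words[:i] + [mask] + words[i + 1:]
        pairs.append((" ".join(masked), word))
    return pairs

text = "The cat sat on the mat"
print(next_word_pairs(text)[-1])    # → ('The cat sat on the', 'mat')
print(masked_word_pairs(text)[-1])  # → ('The cat sat on the ___', 'mat')
```

One six-word sentence yields eleven (input, label) training pairs with no human involvement; repeat that over billions of sentences and the scale of the dataset becomes clear.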

Semi-Supervised Learning

A practical hybrid: a small amount of labelled data plus a large amount of unlabelled data. Common in production where you have, say, 1,000 hand-labelled support tickets and 1,000,000 unlabelled ones.
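One common semi-supervised technique is self-training, also called pseudo-labelling: fit a model on the few labels, let it label the unlabelled examples it is most confident about, then refit. The sketch below is a hypothetical 1-D toy, assuming a nearest-centroid classifier, at least two classes, and an invented confidence rule (`margin`); it is one approach among several, not the only way to do semi-supervised learning.

```python
# Self-training (pseudo-labelling) sketch: a few labels plus a larger unlabelled pool.
# Assumptions: 1-D features, at least two classes, a nearest-centroid classifier,
# and a hypothetical confidence rule (nearest centroid must beat the runner-up by `margin`).

def centroids(labelled):
    """Mean feature value for each class label."""
    sums, counts = {}, {}
    for x, y in labelled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def self_train(labelled, unlabelled, margin=2.0, rounds=5):
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        cents = centroids(labelled)
        confident, unsure = [], []
        for x in pool:
            # (distance, label) pairs, nearest class first.
            ranked = sorted((abs(x - c), y) for y, c in cents.items())
            if ranked[1][0] - ranked[0][0] >= margin:
                confident.append((x, ranked[0][1]))  # accept as a pseudo-label
            else:
                unsure.append(x)  # too ambiguous; leave unlabelled for now
        if not confident:
            break  # nothing new to learn from
        labelled += confident
        pool = unsure
    return centroids(labelled), pool

labelled = [(1.0, "low"), (9.0, "high")]           # the scarce hand labels
unlabelled = [0.5, 2.0, 3.0, 5.2, 7.0, 8.5, 10.0]  # the cheap unlabelled pool
cents, still_unlabelled = self_train(labelled, unlabelled)
print(cents, still_unlabelled)  # → {'low': 1.625, 'high': 8.625} [5.2]
```

Two hand labels end up supporting six pseudo-labelled points, while the genuinely ambiguous value 5.2 is left alone. The confidence rule is the key design choice: set it too loose and the model starts training on its own mistakes.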

How to Choose

If your data is...	And you want to...	Use
Labelled	Predict a category or number	Supervised learning
Unlabelled	Find groups, anomalies, or patterns	Unsupervised learning
From an environment with feedback	Optimise sequential decisions	Reinforcement learning
Massive and unlabelled (text, images)	Build a foundation model	Self-supervised learning
Mostly unlabelled with a few labels	Maximise the value of labels	Semi-supervised learning