7 min read·Lesson 5 of 10

Classical ML Algorithms You Should Know

Tour the most important algorithms in classical machine learning — linear regression, decision trees, random forests, gradient boosting, k-means — and learn when each is the right tool.

Before deep learning, machine learning was dominated by a set of classical algorithms. These algorithms are still essential — for tabular data and small-to-medium datasets they often outperform neural networks while being faster, more interpretable, and easier to deploy.

Linear Regression

One of the simplest ML algorithms. It finds the line (or hyperplane in higher dimensions) that best fits the data:

y = w_1 x_1 + w_2 x_2 + ... + w_n x_n + b

Training finds the weights w_i and bias b that minimise the squared error between predictions and actual values. Linear regression is:

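  • Fast: a closed-form solution exists, so small and medium-sized problems train in milliseconds.
  • Interpretable: each weight tells you how much the prediction changes per unit of that feature, with the other features held fixed.
  • Limited: it only captures relationships that are linear in the features you give it.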
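
The closed-form solution mentioned above can be sketched for the one-feature case, where the slope is the covariance of x and y divided by the variance of x. This is a minimal illustration, not a production implementation; the function name `fit_line` and the toy data are assumptions made for the example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Closed-form least squares for a single feature: returns (w, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w = sxy / sxx
    # The fitted line always passes through the point of means
    b = y_bar - w * x_bar
    return w, b

# Hypothetical toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(w, b)  # the weight and bias recovered from the data
```

With several features the same idea is solved with linear algebra rather than by hand, which is what library implementations do.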

Logistic Regression

Despite the name, this is a classification algorithm. It applies the sigmoid function to a linear combination of the inputs to produce a probability between 0 and 1, then compares that probability with a threshold (0.5 by default) to make a binary decision.
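
The prediction step described above can be sketched in a few lines. The weights and bias here are hypothetical values chosen for illustration (in practice they are learned from data), and the function names are assumptions, not part of any library.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    # Linear combination of the inputs, then the sigmoid
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict(weights, bias, features, threshold=0.5):
    # Binary decision: 1 if the probability reaches the threshold, else 0
    return int(predict_proba(weights, bias, features) >= threshold)

# Hypothetical, hand-picked parameters (not learned from data)
weights, bias = [0.8, -1.2], 0.1
sample = [2.0, 0.5]
print(predict_proba(weights, bias, sample))
print(predict(weights, bias, sample))
print(predict(weights, bias, sample, threshold=0.8))
```

Raising the threshold makes the model more conservative about predicting the positive class, which is a common adjustment when false positives are costly.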

Simple as it is, logistic regression remains a common baseline in production systems, particularly where interpretability matters (credit scoring, medical diagnosis, fraud detection).

Decision Trees

A decision tree splits the data based on feature values. At each node, it picks the feature and threshold that best separates the classes. Example for predicting loan default:

                  income > $50k?
                  /            \
                YES             NO
                 |              |
        credit_score > 700?   default
              /     \
            YES     NO
             |       |
           approve  default

Trees are highly interpretable and handle mixed numerical/categorical data naturally. But a single tree overfits easily and is unstable: small changes in the training data can produce a very different tree.
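
To make "picks the feature and threshold that best separates the classes" concrete, here is a sketch of the threshold search for a single numeric feature. It scores each candidate split by weighted Gini impurity, which is one common criterion and an assumption here, since the text does not name one. The income figures are made-up toy data.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value > threshold'."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical incomes in $k and loan outcomes
incomes = [20, 35, 45, 55, 70, 90]
outcomes = ["default", "default", "default", "approve", "approve", "approve"]
print(best_threshold(incomes, outcomes))
```

A full tree builder repeats this search over every feature at every node and recurses into the two resulting subsets.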

Random Forests

The fix for overfitting trees: train many trees, each on a bootstrap sample of the data and a random subset of the features, then combine their predictions (averaging for regression, majority vote for classification). The ensemble is usually much more accurate and stable than any single tree. Random forests are a go-to choice when you need a strong baseline with minimal tuning.
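
The idea can be sketched with very small trees. In this simplified illustration each "tree" is a one-split stump that predicts 1 when a feature exceeds a threshold, which is an assumption made to keep the code short; real random forests grow deeper trees and draw a fresh feature subset at every split. All names and data below are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_ids):
    """Depth-1 tree: best (feature, threshold) among the allowed features.
    The stump predicts 1 when row[feature] > threshold, else 0."""
    best = None  # (training_errors, feature, threshold)
    for f in feature_ids:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            errors = sum(int(r[f] > t) != y for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    if best is None:
        # No split possible in this sample: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        threshold = float("-inf") if majority == 1 else float("inf")
        return feature_ids[0], threshold
    return best[1], best[2]

def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n row indices with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        # Random subset of the features for this tree
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    # Majority vote across all trees
    votes = Counter(int(row[f] > t) for f, t in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: class 1 has larger values on both features
rows = [[2.0, 3.0], [3.0, 2.0], [2.5, 3.5], [3.5, 2.5],
        [7.0, 8.0], [8.0, 7.0], [7.5, 6.5], [6.5, 7.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(rows, labels)
print(predict(forest, [1.0, 1.0]), predict(forest, [9.0, 9.0]))
```

Because every tree sees a different sample and a different feature, their individual mistakes tend not to line up, and the vote smooths them out.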

Gradient Boosting

Often more accurate than random forests: train trees sequentially, where each new tree corrects the errors of the ones before it. The most popular implementations:

Library     Strengths
XGBoost     Widely adopted and battle-tested; many language bindings
LightGBM    Often faster than XGBoost on large datasets; histogram-based binning
CatBoost    Native handling of categorical features without manual encoding

Gradient boosting models have won a large share of Kaggle competitions on tabular data, and they often match or outperform neural networks there unless the dataset is very large.
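
The "each new tree corrects the errors of the previous ones" loop can be sketched for regression with squared error, where the errors are simply the residuals. This is a bare-bones illustration using one-split trees on a single feature. It is not how XGBoost, LightGBM, or CatBoost are implemented, and the function names, learning rate, and data are assumptions.

```python
def fit_stump(xs, residuals):
    """One-split regression tree: returns (threshold, left_value, right_value)
    minimising squared error against the residuals."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Errors of the current ensemble become the next tree's target
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        # Add a shrunken version of the new tree to the running prediction
        preds = [p + learning_rate * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lv if x <= t else rv)
                      for t, lv, rv in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical non-linear toy data (y = x squared)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
model = boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

The learning rate shrinks each tree's contribution; smaller values need more rounds but usually generalise better, which is why it is one of the first parameters people tune.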

Support Vector Machines (SVM)

SVMs find the hyperplane that separates the classes with the largest margin; kernels (RBF, polynomial) allow non-linear decision boundaries. They are powerful on medium-sized datasets but slow to train on large ones, and gradient boosting and neural networks have largely superseded them for most practical problems.
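
The two kernels named above are just similarity functions between pairs of points. The sketch below shows only the kernel computations, not SVM training; the parameter names `gamma`, `degree`, and `coef0` follow common convention, but the default values are assumptions picked for the example.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: exp(-gamma * squared distance); 1.0 for identical points."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def polynomial_kernel(a, b, degree=2, coef0=1.0):
    """Polynomial kernel: (a . b + coef0) ** degree."""
    dot = sum(x * y for x, y in zip(a, b))
    return (dot + coef0) ** degree

print(rbf_kernel((1.0, 2.0), (1.0, 2.0)))   # identical points
print(rbf_kernel((1.0, 2.0), (4.0, 6.0)))   # distant points score near zero
print(polynomial_kernel((1.0, 2.0), (3.0, 4.0)))
```

An SVM with a kernel evaluates this similarity between training points instead of a plain dot product, which is what lets a linear method draw a curved boundary.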

K-Nearest Neighbours (k-NN)

One of the simplest classifiers: to predict the label of a new point, find the k closest training points and take the majority vote. There is no training step beyond storing the data; all the work happens at prediction time. k-NN is:

  • Easy to understand and implement
  • Slow at prediction time on large datasets (a brute-force search compares the query to every training point)
  • Sensitive to feature scaling and the choice of k
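
The whole algorithm fits in one function. This is a brute-force sketch using Euclidean distance, which is an assumption (other distance metrics are possible); the function name and toy points are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Brute force: distance from the query to every training point
    distances = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6.5, 6.5)))
```

Because the vote depends on raw distances, a feature measured in thousands will swamp one measured in fractions, so features are normally scaled to comparable ranges first.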

K-Means Clustering

An unsupervised algorithm that groups data into k clusters:

  1. Pick k random initial centroids
  2. Assign each point to the nearest centroid
  3. Move each centroid to the mean of its assigned points
  4. Repeat until convergence

Choosing k is a judgement call. The "elbow method" plots within-cluster variance against k and looks for the bend where adding more clusters stops reducing the variance much. K-means is sensitive to initialisation, so always run it several times with different initial centroids and keep the run with the lowest within-cluster variance.
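
The four steps and the multiple-restart advice can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the names `kmeans` and `best_of`, the restart count, and the toy points are assumptions, and "inertia" here means the within-cluster sum of squared distances.

```python
import math
import random

def kmeans(points, k, rng, max_iter=100):
    centroids = rng.sample(points, k)  # step 1: random initial centroids
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its old centroid)
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # step 4: stop once nothing moves
            break
        centroids = new_centroids
    inertia = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, inertia

def best_of(points, k, n_init=10, seed=0):
    """Run k-means n_init times and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])

# Hypothetical toy data: two square blobs
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, inertia = best_of(points, k=2)
print(sorted(centroids), inertia)
```

Calling `best_of` for a range of k values and plotting the returned inertia against k gives the curve used by the elbow method.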

Choosing an Algorithm

Problem                   Try first
Tabular regression        Linear regression baseline → XGBoost / LightGBM
Tabular classification    Logistic regression baseline → XGBoost / LightGBM
Image / audio / text      Pre-trained deep learning model
Clustering                K-means, then DBSCAN or hierarchical clustering
Anomaly detection         Isolation Forest, autoencoder
Recommendations           Matrix factorisation, neural collaborative filtering

Always start with a simple baseline. If logistic regression gets you 85% accuracy and your goal is 87%, a complex deep learning model is rarely worth the engineering cost.

Key Takeaways

  • Linear regression and logistic regression are simple, interpretable, and the right starting point for many problems.
  • Decision trees split the feature space recursively; ensembles of trees (random forest, gradient boosting) are state-of-the-art for tabular data.
  • Gradient boosting (XGBoost, LightGBM, CatBoost) is a frequent winner of Kaggle competitions on tabular data, where it often outperforms deep learning.
  • K-nearest neighbours (k-NN) classifies by majority vote of nearby training examples — no training step.
  • K-means clusters by iteratively assigning points to centroids and updating centroids — pick k carefully.
