Classical ML Algorithms You Should Know — AI and ML Fundamentals | CertQnA

Before deep learning, machine learning was dominated by a set of classical algorithms. These algorithms are still essential — for tabular data and small-to-medium datasets they often outperform neural networks while being faster, more interpretable, and easier to deploy.

Linear Regression

The simplest ML algorithm. Find the line (or hyperplane in higher dimensions) that best fits the data:

y = w₁x₁ + w₂x₂ + ... + wₙxₙ + b

Training finds the weights wᵢ and bias b that minimise the sum of squared errors between predictions and actual values. Linear regression is:

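Fast: a closed-form solution exists (the normal equations), so small and medium datasets train in milliseconds.
Interpretable: each weight tells you how much the prediction changes per unit increase in that feature, with the other features held fixed.
Limited: it only captures relationships that are linear in the features you supply.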
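
As a minimal sketch of the closed-form fit, here is the single-feature case, where the best weight is the covariance of x and y divided by the variance of x. The function name fit_line and the toy data are made up for illustration; with several features you would solve the normal equations with a linear algebra library instead.

```python
def fit_line(xs, ys):
    """Closed-form least squares for one feature: y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # w = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y).
    b = mean_y - w * mean_x
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w = {w:.2f}, b = {b:.2f}")
```

Because the toy data lies exactly on a line, the fit recovers the slope and intercept used to generate it.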

Logistic Regression

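Despite the name, this is a classification algorithm. It applies the sigmoid function to a linear combination of the inputs to produce a probability between 0 and 1, then compares that probability with a threshold (0.5 by default) to make a binary decision. The threshold can be moved when false positives and false negatives have different costs.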

For all its simplicity, logistic regression remains a common baseline in production systems, particularly where interpretability matters (credit scoring, medical diagnosis, fraud detection).
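
The prediction step can be sketched in a few lines. This shows only how an already-trained model turns inputs into a probability and a decision; the weights and bias below are hypothetical values chosen for illustration, not the result of any real training run.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    # Linear combination of the inputs, then the sigmoid.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict(weights, bias, features, threshold=0.5):
    # Binary decision: class 1 if the probability reaches the threshold.
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical weights and bias (assumed, not learned from data).
weights = [0.8, -1.2]
bias = 0.1
p = predict_proba(weights, bias, [2.0, 1.0])
print(f"probability = {p:.3f}, decision = {predict(weights, bias, [2.0, 1.0])}")
```

Raising the threshold argument makes the model more conservative about predicting class 1 without retraining anything.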

Decision Trees

A decision tree splits the data based on feature values. At each node, it picks the feature and threshold that best separate the classes, usually measured by an impurity criterion such as Gini impurity or entropy. Example for predicting loan default:

                  income > $50k?
                  /            \
                YES             NO
                 |              |
        credit_score > 700?   default
              /     \
            YES     NO
             |       |
        no default  default

Trees are highly interpretable and handle mixed numerical/categorical data naturally. But a single deep tree overfits easily: small changes in the training data can produce a very different tree.
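
To make the split selection concrete, here is a rough sketch of how one node could choose its feature and threshold. It assumes Gini impurity as the criterion (one common choice) and uses made-up loan data; the function names are illustrative, and a real tree would apply the same search recursively to each child node.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Try every feature and observed value; return (feature, threshold, score)."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            # Weighted average impurity of the two children (lower is better).
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Made-up loan data: [income in $k, credit_score]
rows = [[30, 720], [40, 650], [45, 700], [60, 710], [80, 640], [90, 750]]
labels = ["default", "default", "default", "repaid", "repaid", "repaid"]
feature, threshold, score = best_split(rows, labels)
print(f"split on feature {feature} at > {threshold} (impurity {score:.2f})")
```

On this toy data the search picks income (feature 0), because splitting there leaves both children pure.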

Random Forests

The fix for overfitting trees: train many trees, each on a bootstrap sample of the rows and with a random subset of the features considered at each split, then combine their predictions (majority vote for classification, average for regression). This ensemble is usually more accurate and far more stable than any single tree. Random forests are a go-to choice when you need a strong baseline with minimal tuning.
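
The idea can be sketched with the smallest possible trees. This is a simplified illustration, not a full random forest: each "tree" is a one-split stump, the data is invented, and all function names are assumptions made for the example. What it does show faithfully is the two sources of randomness (bootstrap rows, random features) and the majority vote.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_indices):
    """Fit a one-split tree using only the given features."""
    best = None
    for f in feature_indices:
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            threshold = (a + b) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No usable split in this sample: always predict the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_indices[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Random subset of features available to this stump's single split.
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump(sample_rows, sample_labels, features))
    return forest


def predict_forest(forest, row):
    # Classification: every tree votes, the majority wins.
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Invented two-feature data with two well-separated classes.
rows = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 9], [9, 8], [8, 8], [9, 9]]
labels = ["low", "low", "low", "low", "high", "high", "high", "high"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [1.5, 1.5]), predict_forest(forest, [8.5, 8.5]))
```

An individual stump trained on an unlucky bootstrap sample can be wrong, but the vote across 25 of them smooths that out, which is the stabilising effect of the ensemble.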

Gradient Boosting

Often more accurate than random forests on tabular data: train trees sequentially, where each new tree is fitted to the errors (residuals) of the ensemble built so far. The most popular implementations:

Library	Strengths
XGBoost	The most established of the three; battle-tested; many language bindings
LightGBM	Often faster than XGBoost on large datasets; histogram-based binning
CatBoost	Native handling of categorical features with little preprocessing

Gradient boosting models have won a large share of Kaggle competitions on tabular data, and they often match or outperform neural networks on such data unless the dataset is very large.
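
The sequential idea is easier to see in code than in a library call. The following is a bare-bones sketch for regression with squared error, one feature, and one-split stumps; it is not how XGBoost, LightGBM, or CatBoost are implemented, and the function names, learning rate, and toy data are all assumptions for illustration.

```python
def fit_stump(xs, targets):
    """One-split regression tree on a single feature, minimising squared error."""
    best = None
    values = sorted(set(xs))
    for a, b in zip(values, values[1:]):
        threshold = (a + b) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosting(xs, ys, n_trees=50, learning_rate=0.3):
    base = sum(ys) / len(ys)  # start from the mean prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # Each new stump is trained on what the ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, learning_rate, stumps


def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Toy non-linear target: y = x squared.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 2 for x in xs]
model = fit_boosting(xs, ys)

mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - predict_boosting(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(f"baseline MSE = {baseline_mse:.1f}, boosted MSE = {boosted_mse:.3f}")
```

The learning rate shrinks each stump's contribution, so the ensemble approaches the target in many small corrective steps rather than a few large ones.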

Support Vector Machines (SVM)

SVMs find the hyperplane that separates the classes with the largest margin, with kernels (RBF, polynomial) allowing non-linear decision boundaries. They are powerful on small and medium-sized datasets, but kernel SVMs are slow to train on large ones, and for most practical problems they have largely been superseded by gradient boosting and neural networks.
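
A kernel is just a similarity function between two points, and a trained kernel SVM classifies by taking a weighted sum of similarities to its support vectors. The sketch below shows the two kernels named above and that decision rule. Training is omitted entirely: the support vectors, coefficients, and bias are hypothetical values picked by hand for illustration.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: 1.0 for identical points, decaying towards 0 with distance."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(a, b, degree=2, coef0=1.0):
    """Polynomial kernel: (a . b + coef0) ** degree."""
    return (sum(ai * bi for ai, bi in zip(a, b)) + coef0) ** degree


def decision_function(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    # Weighted sum of similarities to the support vectors; the sign is the class.
    return sum(
        alpha * y * kernel(sv, x)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    ) + bias


# Hypothetical "trained" model: one support vector per class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]
bias = 0.0

for point in ([0.2, 0.1], [1.9, 2.0]):
    score = decision_function(point, support_vectors, labels, alphas, bias)
    print(point, "->", 1 if score >= 0 else -1)
```

Each point is assigned the class of the support vector it is most similar to, which is how the RBF kernel produces curved boundaries from a linear rule in kernel space.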

K-Nearest Neighbours (k-NN)

The simplest classifier: to predict the label of a new point, find the k closest training points and take the majority vote. There is no training step — all the work happens at prediction time. k-NN is:

Easy to understand and implement
Slow at prediction time on large datasets (a brute-force search compares the query to every training point)
Sensitive to feature scaling and the choice of k
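
The whole algorithm fits in one short function. This is a brute-force sketch using Euclidean distance (an assumption; other distance measures are possible), with invented points and labels.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    # "Training" is just storing the data; all the work happens here.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among the k nearest neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]


# Invented 2D points in two groups.
train_points = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(train_points, train_labels, [2, 2]))
print(knn_predict(train_points, train_labels, [6, 5]))
```

Because the prediction depends on raw distances, a feature measured in thousands would dominate one measured in fractions, which is why scaling matters so much here.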

K-Means Clustering

An unsupervised algorithm that groups data into k clusters:

1. Pick k random initial centroids
2. Assign each point to the nearest centroid
3. Move each centroid to the mean of its assigned points
4. Repeat steps 2 and 3 until the assignments stop changing

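Choosing k is a judgement call. The "elbow method" plots the within-cluster sum of squares against k and looks for the point where the curve bends and adding more clusters gives diminishing returns. Always run k-means several times with different initial centroids and keep the run with the lowest within-cluster sum of squares, because the result is sensitive to initialisation.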
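
The steps and the restart advice can be sketched together. This is a minimal illustration, assuming squared Euclidean distance and initial centroids drawn from the data points themselves; the function names, the n_init default, and the toy points are made up for the example.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    centroids = rng.sample(points, k)  # pick k data points as initial centroids
    assignments = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: nothing moved
        assignments = new_assignments
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    # Within-cluster sum of squares (often called inertia).
    inertia = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, inertia


def kmeans(points, k, n_init=10, seed=0):
    """Run k-means n_init times and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


# Two invented, well-separated blobs.
points = [[1, 1], [1, 2], [2, 1], [2, 2], [8, 8], [8, 9], [9, 8], [9, 9]]
centroids, assignments, inertia = kmeans(points, k=2)
print(sorted(centroids), inertia)
```

For an elbow plot, you would call kmeans for a range of k values and plot the returned inertia against k.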

Choosing an Algorithm

Problem	Try first
Tabular regression	Linear regression baseline → XGBoost / LightGBM
Tabular classification	Logistic regression baseline → XGBoost / LightGBM
Image / audio / text	Pre-trained deep learning model
Clustering	K-means, then DBSCAN or hierarchical clustering
Anomaly detection	Isolation Forest, autoencoder
Recommendations	Matrix factorisation, neural collaborative filtering

Always start with a simple baseline. If logistic regression gets you 85% accuracy and your goal is 87%, a complex deep learning model is rarely worth the engineering cost.