The End-to-End ML Lifecycle — AI and ML Fundamentals | CertQnA

Building an ML model in a notebook is the easy part. Turning that model into a reliable production system that creates business value involves a lifecycle of distinct stages, each with its own pitfalls.

The Seven Stages

Problem framing
Data collection
Exploratory data analysis (EDA)
Model training
Evaluation
Deployment
Monitoring and feedback

1. Problem Framing

The most important stage — and the most often skipped. Before writing any code, answer:

What business outcome are we trying to improve?
Is ML even the right tool? (Often a SQL query or a rule-based system is sufficient.)
What would success look like, and how will we measure it?
What is the cost of being wrong? (False positive vs false negative trade-off.)
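To see why the false positive vs false negative trade-off belongs in problem framing, here is a minimal worked sketch. The per-error costs and both models' error counts are hypothetical numbers chosen only for illustration.

```python
# Hypothetical per-error costs, for illustration only: a missed fraud (false
# negative) is assumed to cost far more than a wrongly blocked payment (false positive).
COST_FALSE_POSITIVE = 5.0    # e.g. support effort after a wrongly declined payment
COST_FALSE_NEGATIVE = 200.0  # e.g. average loss from an undetected fraud

def expected_error_cost(false_positives: int, false_negatives: int) -> float:
    """Total cost of a model's mistakes under the assumed per-error costs."""
    return false_positives * COST_FALSE_POSITIVE + false_negatives * COST_FALSE_NEGATIVE

# Two hypothetical models evaluated on the same set of transactions.
cautious = expected_error_cost(false_positives=300, false_negatives=10)
lenient = expected_error_cost(false_positives=40, false_negatives=30)

print(f"cautious model: {cautious:.0f}")  # cautious model: 3500
print(f"lenient model:  {lenient:.0f}")   # lenient model:  6200
```

The cautious model makes 310 mistakes against the lenient model's 70, yet under these assumed costs it is far cheaper. Counting errors alone would pick the wrong model.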

Anti-pattern: "We have a lot of data, what can ML do with it?" This produces solutions in search of problems.

2. Data Collection

Where does training data come from? Options:

Existing operational data: Logs, transactions, user activity
Public datasets: Hugging Face Datasets, Kaggle, ImageNet, Common Crawl
Manual labelling: Use human labellers via platforms and services such as Amazon Mechanical Turk, Scale AI, or Labelbox
Synthetic data: Generate examples using simulation or other models

Data quality often matters more than data quantity. As the saying goes, "garbage in, garbage out": a model trained on biased or noisy data will produce biased and noisy predictions.
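Two cheap checks catch a lot of "garbage" before it reaches training: exact duplicate rows, and identical inputs that carry different labels (a common symptom of noisy labelling). The sketch below assumes records are dicts with hashable values and a hypothetical `label` key; the toy rows are made up.

```python
from collections import defaultdict

def find_quality_issues(rows, label_key="label"):
    """Return (duplicate_count, conflicting_inputs) for a list of dict records.

    duplicate_count: rows that exactly repeat an earlier row.
    conflicting_inputs: feature values that appear with more than one label.
    Assumes all values are hashable.
    """
    seen_rows = set()
    duplicates = 0
    labels_by_input = defaultdict(set)
    for row in rows:
        row_key = tuple(sorted(row.items()))
        if row_key in seen_rows:
            duplicates += 1
        seen_rows.add(row_key)
        # Key on the features only, so identical inputs group together.
        features = tuple(sorted((k, v) for k, v in row.items() if k != label_key))
        labels_by_input[features].add(row[label_key])
    conflicts = [dict(f) for f, labels in labels_by_input.items() if len(labels) > 1]
    return duplicates, conflicts

# Hypothetical toy records.
rows = [
    {"amount": 12.5, "country": "DE", "label": "legit"},
    {"amount": 12.5, "country": "DE", "label": "legit"},   # exact duplicate
    {"amount": 980.0, "country": "US", "label": "fraud"},
    {"amount": 980.0, "country": "US", "label": "legit"},  # same input, different label
]
dups, conflicts = find_quality_issues(rows)
print(dups)       # 1
print(conflicts)  # [{'amount': 980.0, 'country': 'US'}]
```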

3. Exploratory Data Analysis (EDA)

Before training anything, understand your data:

Distribution of each feature (mean, median, range, missing values)
Distribution of the target variable (is it balanced?)
Correlations between features
Outliers and data quality issues

Tools: pandas DataFrame.describe(), matplotlib, seaborn, plotly. For automated reports, use ydata-profiling (the successor to pandas-profiling).
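pandas computes most of the feature summary above in a single call; the stdlib-only sketch below spells out the same checks (count, missing values, mean, median, range, class balance) so the logic is visible. The column values are hypothetical, and `None` is assumed to mark a missing value.

```python
import statistics
from collections import Counter

def summarize_feature(values):
    """Count, missing, mean, median and range for one numeric column (None = missing)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }

def class_balance(labels):
    """Share of each target class, to spot imbalance early."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()}

# Hypothetical toy data.
amounts = [12.5, 40.0, None, 7.25, 980.0, 15.0]
labels = ["legit", "legit", "legit", "legit", "fraud", "legit"]

print(summarize_feature(amounts))
print(class_balance(labels))
```

A mean (about 211) far above the median (15.0) is a quick hint that an outlier, here 980.0, is pulling the average; the class balance shows the target is roughly 5:1.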

4. Model Training

Split your data into three sets:

Set	Typical size	Purpose
Training	60–80%	Fit the model parameters
Validation	10–20%	Tune hyperparameters, choose between models
Test	10–20%	Final evaluation — touched only once at the end
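A minimal sketch of a random three-way split for non-temporal data, assuming a 70/15/15 ratio (within the ranges in the table) and a fixed seed so the split is reproducible. Libraries such as scikit-learn offer equivalent helpers; this version uses only the standard library.

```python
import random

def train_val_test_split(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle once with a fixed seed, then cut into train / validation / test.

    Whatever remains after the train and validation slices becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))  # 700 150 150
```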

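Time-based data (stock prices, server metrics) requires chronological splits: train on earlier data and evaluate on later data. Never split a time series randomly, or the training set will contain observations from after the test period, in effect training on the future to predict the past, and the evaluation will overstate real-world performance.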
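A chronological split can be sketched as "sort by time, then cut at a date". The readings and cutoff below are hypothetical; real pipelines often use several rolling cutoffs (for example scikit-learn's `TimeSeriesSplit`), but the principle is the same.

```python
from datetime import date

def chronological_split(records, cutoff):
    """Split (timestamp, value) records so everything before `cutoff` is used for
    training and everything on or after it is held out: no future data leaks in."""
    ordered = sorted(records, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test

# Hypothetical daily server-latency readings, deliberately out of order.
readings = [
    (date(2024, 1, 3), 120),
    (date(2024, 1, 1), 110),
    (date(2024, 1, 5), 180),
    (date(2024, 1, 2), 115),
    (date(2024, 1, 4), 130),
]
train, test = chronological_split(readings, cutoff=date(2024, 1, 4))
assert max(d for d, _ in train) < min(d for d, _ in test)
print([d.day for d, _ in train], [d.day for d, _ in test])  # [1, 2, 3] [4, 5]
```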

5. Evaluation

Choose metrics that match the business outcome:

Classification: accuracy, precision, recall, F1, AUC-ROC
Regression: mean absolute error (MAE), root mean squared error (RMSE), R²
Ranking: NDCG, mean reciprocal rank
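For intuition, here are stdlib-only sketches of three regression metrics and one ranking metric from the list above. The predictions and rankings are made-up toy values; in practice a library such as scikit-learn provides tested implementations of these metrics.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but penalises large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Share of the target's variance explained by the model (1.0 = perfect).
    Undefined (division by zero) if every true value is identical."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def mean_reciprocal_rank(ranked_lists, relevant):
    """For each query, 1 / (position of the first relevant item), averaged over queries.
    A query with no relevant item in its list contributes 0."""
    total = 0.0
    for items, rel in zip(ranked_lists, relevant):
        for position, item in enumerate(items, start=1):
            if item in rel:
                total += 1 / position
                break
    return total / len(ranked_lists)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
print(mae(y_true, y_pred))        # 0.625
print(rmse(y_true, y_pred))       # 0.75
print(r_squared(y_true, y_pred))  # about 0.847
print(mean_reciprocal_rank([["a", "b", "c"], ["x", "y", "z"]], [{"b"}, {"x"}]))  # 0.75
```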

For imbalanced classes, accuracy is misleading. If 99% of transactions are legitimate, a model that predicts "legitimate" for everything scores 99% accuracy yet never catches a single fraud. Use precision, recall, and the confusion matrix instead.
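The 99% example can be checked directly. This sketch builds a confusion matrix with "fraud" as the positive class and scores a do-nothing model on a hypothetical set of 1,000 transactions. When precision or recall is undefined (zero denominator), it is treated as 0.0 here by convention.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive="fraud"):
    """Count TP, FP, FN, TN with `positive` as the class we want to catch."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["TP" if t == positive else "FP"] += 1
        else:
            counts["FN" if t == positive else "TN"] += 1
    return counts

def report(y_true, y_pred, positive="fraud"):
    """Accuracy, precision, recall, plus the raw confusion counts."""
    c = confusion_counts(y_true, y_pred, positive)
    accuracy = (c["TP"] + c["TN"]) / len(y_true)
    # Zero denominators mean "never predicted / never saw a positive": report 0.0.
    precision = c["TP"] / (c["TP"] + c["FP"]) if c["TP"] + c["FP"] else 0.0
    recall = c["TP"] / (c["TP"] + c["FN"]) if c["TP"] + c["FN"] else 0.0
    return accuracy, precision, recall, dict(c)

# 1,000 transactions, 1% fraudulent; the "model" predicts legitimate every time.
y_true = ["fraud"] * 10 + ["legit"] * 990
y_pred = ["legit"] * 1000
print(report(y_true, y_pred))  # (0.99, 0.0, 0.0, {'FN': 10, 'TN': 990})
```

Accuracy looks excellent, but recall of 0.0 and ten false negatives expose that the model catches nothing.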

6. Deployment

Common deployment patterns:

Batch prediction: Run the model on a schedule, store predictions in a database
Real-time API: Wrap the model in an HTTP service (Flask, FastAPI, TorchServe)
Streaming: Score events from a Kafka or Kinesis stream
On-device: Run the model in a mobile app or browser (TensorFlow Lite, ONNX Runtime)
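The batch pattern is the simplest to sketch: read inputs, score them, write predictions alongside the model version that produced them. Below, SQLite stands in for the database and `score` is a hypothetical placeholder for a trained model; the table names and schema are assumptions for illustration. In production, `run_batch` would be triggered on a schedule (for example by cron).

```python
import sqlite3
from datetime import datetime, timezone

def score(amount: float) -> float:
    """Hypothetical stand-in for a trained model: returns a fraud score in [0, 1]."""
    return min(amount / 1000.0, 1.0)

def run_batch(conn: sqlite3.Connection, model_version: str) -> int:
    """Score every transaction and upsert the results; returns the number of rows scored."""
    run_at = datetime.now(timezone.utc).isoformat()
    rows = conn.execute("SELECT id, amount FROM transactions").fetchall()
    conn.executemany(
        "INSERT OR REPLACE INTO predictions (txn_id, score, model_version, scored_at) "
        "VALUES (?, ?, ?, ?)",
        [(txn_id, score(amount), model_version, run_at) for txn_id, amount in rows],
    )
    conn.commit()
    return len(rows)

# In-memory database with hypothetical tables, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute(
    "CREATE TABLE predictions (txn_id INTEGER PRIMARY KEY, score REAL, "
    "model_version TEXT, scored_at TEXT)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?)", [(1, 12.5), (2, 980.0), (3, 1500.0)])
print(run_batch(conn, model_version="v1"))  # 3
print(conn.execute("SELECT txn_id, score FROM predictions ORDER BY txn_id").fetchall())
```

Storing `model_version` with each prediction makes it possible to trace any stored score back to the model that produced it, which matters once several versions have run.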

Deployment requires more than the model itself: you need the same preprocessing that was applied during training, the right software dependencies, the right hardware, and the ability to roll back to a previous version. This is where MLOps tools (MLflow, Weights & Biases, SageMaker, Vertex AI, Azure ML) earn their keep.
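One way to guarantee that serving uses the same preprocessing as training is to save the fitted preprocessing parameters and the model in a single versioned artifact. The sketch below is a hypothetical, deliberately tiny bundle (standardisation plus a linear model, serialised to JSON); MLOps tools do this with richer formats, but the idea is the same: keeping old artifacts makes rollback a matter of loading a previous version.

```python
import json

class ScalerAndModel:
    """Hypothetical bundle: standardisation statistics and linear weights saved together,
    so serving applies exactly the preprocessing that was fitted in training."""

    def __init__(self, means, stds, weights, bias, version):
        self.means, self.stds = means, stds
        self.weights, self.bias = weights, bias
        self.version = version

    def preprocess(self, x):
        # Same standardisation as in training: (value - mean) / std per feature.
        return [(v - m) / s for v, m, s in zip(x, self.means, self.stds)]

    def predict(self, x):
        z = self.preprocess(x)
        return sum(w * v for w, v in zip(self.weights, z)) + self.bias

    def to_json(self) -> str:
        return json.dumps(self.__dict__)

    @classmethod
    def from_json(cls, text: str) -> "ScalerAndModel":
        return cls(**json.loads(text))

# "Training" side: one artifact per version (here kept in a dict instead of files).
artifacts = {
    "v1": ScalerAndModel([100.0], [50.0], [2.0], 0.5, "v1").to_json(),
    "v2": ScalerAndModel([120.0], [40.0], [2.5], 0.0, "v2").to_json(),
}
# Serving side: load the active version; switching back to "v1" is a rollback.
model = ScalerAndModel.from_json(artifacts["v1"])
print(model.version, model.predict([150.0]))  # v1 2.5
```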

7. Monitoring and Feedback

The world changes after you deploy. Two failure modes to monitor:

Data drift: The distribution of inputs changes (e.g., during a pandemic, shopping patterns change).
Concept drift: The relationship between inputs and outputs changes (e.g., what counts as fraud evolves as fraudsters adapt).

Continuous monitoring tracks input distributions, prediction distributions, and (when ground truth becomes available) model accuracy. When drift is detected, retrain on fresh data.
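Tracking input distributions needs a number to alert on. One common choice among several is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with a recent sample. The sketch below assumes hand-picked cut points, a small `eps` for empty bins, and hypothetical feature values.

```python
import bisect
import math

def bin_fractions(values, cut_points, eps=1e-6):
    """Fraction of `values` in each bin defined by sorted interior `cut_points`.

    Bins are (-inf, c0), [c0, c1), ..., [c_last, +inf), so out-of-range values
    still land somewhere. `eps` replaces empty-bin fractions to avoid log(0).
    """
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect.bisect_right(cut_points, v)] += 1
    return [max(c / len(values), eps) for c in counts]

def population_stability_index(reference, recent, cut_points):
    """PSI = sum over bins of (recent - reference) * ln(recent / reference)."""
    ref = bin_fractions(reference, cut_points)
    cur = bin_fractions(recent, cut_points)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

# Hypothetical feature values: training-time inputs vs. last week's inputs.
reference = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
recent = [15, 25, 35, 45, 55, 65, 75, 85, 95, 105]
cut_points = [25, 50, 75]

psi = population_stability_index(reference, recent, cut_points)
print(round(psi, 3))  # 0.179
```

A commonly quoted rule of thumb (a convention, not a standard) reads PSI below 0.1 as little shift, 0.1 to 0.25 as moderate shift, and above 0.25 as significant. PSI only covers data drift; detecting concept drift still requires ground-truth labels to measure accuracy over time.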

The Iterative Reality

The lifecycle is not a waterfall; it is a loop. Insights from monitoring inform the next round of data collection, and better evaluation may reveal that the original problem framing was wrong. MLOps is the discipline of making this loop fast, repeatable, and reliable.