Building an ML model in a notebook is the easy part. Turning that model into a reliable production system that creates business value involves a lifecycle of distinct stages, each with its own pitfalls.
The Seven Stages
1. Problem framing
2. Data collection
3. Exploratory data analysis (EDA)
4. Model training
5. Evaluation
6. Deployment
7. Monitoring and feedback
1. Problem Framing
The most important stage — and the most often skipped. Before writing any code, answer:
- What business outcome are we trying to improve?
- Is ML even the right tool? (Often a SQL query or a rule-based system is sufficient.)
- What would success look like, and how will we measure it?
- What is the cost of being wrong? (False positive vs false negative trade-off.)
Anti-pattern: "We have a lot of data, what can ML do with it?" This produces solutions in search of problems.
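The cost-of-being-wrong question becomes concrete once each kind of error has a price. The sketch below is purely illustrative: the `expected_cost` helper, the cost figures, and the error counts are hypothetical numbers invented for the example.

```python
def expected_cost(false_positives, false_negatives, cost_fp, cost_fn):
    """Total cost of a model's mistakes, given a per-error cost for each kind."""
    return false_positives * cost_fp + false_negatives * cost_fn

# Hypothetical fraud setting: a false alarm costs a 5-unit manual review,
# a missed fraud costs a 500-unit chargeback.
COST_FP, COST_FN = 5, 500

# Two hypothetical models scored on the same batch of transactions.
cautious = expected_cost(false_positives=300, false_negatives=5,
                         cost_fp=COST_FP, cost_fn=COST_FN)
lenient = expected_cost(false_positives=20, false_negatives=40,
                        cost_fp=COST_FP, cost_fn=COST_FN)

print(cautious)  # 300*5 + 5*500 = 4000
print(lenient)   # 20*5 + 40*500 = 20100
```

Under these assumed costs, the "lenient" model makes far fewer errors in total (60 vs 305) yet is about five times more expensive, which is why the trade-off needs to be settled before choosing a metric.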
2. Data Collection
Where does training data come from? Options:
- Existing operational data: Logs, transactions, user activity
- Public datasets: Hugging Face Datasets, Kaggle, ImageNet, Common Crawl
- Manual labelling: Hire labellers via Amazon Mechanical Turk, Scale AI, or Labelbox
- Synthetic data: Generate examples using simulation or other models
Data quality matters more than data quantity. As the saying goes, "garbage in, garbage out": a model trained on biased or noisy data will produce biased or noisy predictions.
3. Exploratory Data Analysis (EDA)
Before training anything, understand your data:
- Distribution of each feature (mean, median, range, missing values)
- Distribution of the target variable (is it balanced?)
- Correlations between features
- Outliers and data quality issues
Tools: pandas .describe(), matplotlib, seaborn, plotly. For automated reports: ydata-profiling (the current name of the project formerly released as pandas-profiling).
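A first pass over the checklist above might look like the following sketch. The toy DataFrame and its column names (`amount`, `n_items`, `is_fraud`) are hypothetical, and the 1.5 × IQR outlier rule is one common heuristic, not the only option.

```python
import pandas as pd

# Hypothetical toy dataset; in practice this would come from a file or database.
df = pd.DataFrame({
    "amount": [12.5, 80.0, 7.2, None, 4300.0, 15.0],
    "n_items": [1, 3, 1, 2, 1, 2],
    "is_fraud": [0, 0, 0, 0, 1, 0],
})

summary = df.describe()        # count, mean, std, min, quartiles, max per column
missing = df.isna().sum()      # number of missing values per column
class_balance = df["is_fraud"].value_counts(normalize=True)  # share of each class
correlations = df.corr()       # pairwise Pearson correlations

# Simple outlier flag: values more than 1.5 IQRs above the third quartile.
q1, q3 = df["amount"].quantile([0.25, 0.75])
outliers = df[df["amount"] > q3 + 1.5 * (q3 - q1)]

print(summary, missing, class_balance, correlations, outliers, sep="\n\n")
```

Even on six rows this surfaces the three things worth knowing before training: one missing `amount`, a heavily imbalanced target, and a single extreme value.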
4. Model Training
Split your data into three sets:
| Set | Typical size | Purpose |
|---|---|---|
| Training | 60–80% | Fit the model parameters |
| Validation | 10–20% | Tune hyperparameters, choose between models |
| Test | 10–20% | Final evaluation — touched only once at the end |
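Time-based data (stock prices, server metrics) requires chronological splits: train on the earliest period, validate on the next, and test on the most recent. Never split a time series randomly, or you train on the future to predict the past, and the offline score will overstate how the model performs on genuinely unseen data.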
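The difference between the two kinds of split fits in a few lines of standard-library Python. This is a minimal sketch: the helper names, the 70/15/15 fractions, and the synthetic `series` are assumptions made for the example.

```python
import random

def _cut(rows, train_frac, val_frac):
    """Slice an already-ordered list into train / validation / test."""
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def random_split(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle first: fine for independent rows, wrong for time series."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return _cut(rows, train_frac, val_frac)

def chronological_split(rows, key, train_frac=0.7, val_frac=0.15):
    """Sort by time first, so validation and test rows are later than training rows."""
    return _cut(sorted(rows, key=key), train_frac, val_frac)

# Hypothetical daily metric as (day_index, value) pairs.
series = [(day, 100 + day) for day in range(100)]

train, val, test = chronological_split(series, key=lambda row: row[0])
print(len(train), len(val), len(test))  # 70 15 15

# Every training day precedes every validation day, which precedes every test day.
print(train[-1][0] < val[0][0] < test[0][0])  # True
```

With `random_split` the same check would generally fail, because shuffled training rows include days that come after some test rows.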
5. Evaluation
Choose metrics that match the business outcome:
- Classification: accuracy, precision, recall, F1, AUC-ROC
- Regression: mean absolute error (MAE), root mean squared error (RMSE), R²
- Ranking: NDCG, mean reciprocal rank
For imbalanced classes, accuracy is misleading. If 99% of transactions are legitimate, a model that predicts "legitimate" for everything has 99% accuracy and is useless. Use precision, recall, and the confusion matrix instead.
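The 99% example can be reproduced directly. The sketch below uses plain Python and hypothetical labels (1 = fraud, 0 = legitimate). Reporting precision as 0.0 when the model makes no positive predictions is a convention chosen here, since the ratio is otherwise undefined.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def precision_recall(tp, fp, fn):
    """Precision and recall, using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# 1,000 transactions, 1% fraudulent.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # the "always legitimate" model

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall = precision_recall(tp, fp, fn)

print(accuracy, precision, recall)  # 0.99 0.0 0.0
```

The confusion counts show what accuracy hides: all 10 fraud cases land in the false-negative cell.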
6. Deployment
Common deployment patterns:
- Batch prediction: Run the model on a schedule, store predictions in a database
- Real-time API: Wrap the model in an HTTP service (Flask, FastAPI, TorchServe)
- Streaming: Score events from a Kafka or Kinesis stream
- On-device: Run the model in a mobile app or browser (TensorFlow Lite, ONNX Runtime)
Deployment requires more than the model itself — you need the same preprocessing, the right software dependencies, the right hardware, and the ability to roll back. This is where MLOps tools (MLflow, Weights & Biases, SageMaker, Vertex AI, Azure ML) earn their keep.
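One way to guarantee "the same preprocessing" is to ship the preprocessing parameters and the model parameters as a single versioned artifact. The sketch below is a toy illustration under stated assumptions: a one-feature linear model, a JSON artifact, and made-up names (`fit_scaler`, `predict`, the `version` string). Real projects typically rely on the packaging features of an MLOps tool instead.

```python
import json

def fit_scaler(values):
    """Learn standardisation parameters from the training data only."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return {"mean": mean, "std": variance ** 0.5}

def predict(bundle, raw_value):
    """Apply the stored preprocessing, then the stored linear model."""
    scaler = bundle["scaler"]
    scaled = (raw_value - scaler["mean"]) / scaler["std"]
    return bundle["weight"] * scaled + bundle["bias"]

# Training time: hypothetical data and hand-picked model parameters.
train_values = [10.0, 20.0, 30.0]
bundle = {
    "version": "example-v1",  # hypothetical tag, used to identify a rollback target
    "scaler": fit_scaler(train_values),
    "weight": 2.0,
    "bias": 0.5,
}
artifact = json.dumps(bundle)  # this string is what gets deployed

# Serving time: only the artifact is needed, never the training data.
loaded = json.loads(artifact)
print(predict(loaded, 20.0))  # 0.5, since the input equals the training mean
```

Because the scaler travels with the weights, rolling back to an earlier version restores both at once, and the service cannot end up applying new preprocessing to an old model.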
7. Monitoring and Feedback
The world changes after you deploy. Two failure modes to monitor:
- Data drift: The distribution of inputs changes (e.g., during a pandemic, shopping patterns change).
- Concept drift: The relationship between inputs and outputs changes (e.g., what counts as fraud evolves as fraudsters adapt).
Continuous monitoring tracks input distributions, prediction distributions, and (when ground truth becomes available) model accuracy. When drift is detected, retrain on fresh data.
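Tracking an input distribution can be as simple as comparing a recent window against a reference sample. The sketch below uses the population stability index (PSI), one common choice among several. The equal-width binning, the `1e-6` floor, and the synthetic Gaussian data are all assumptions made for the example, and it assumes the reference sample is not constant.

```python
import math
import random

def psi(reference, current, n_bins=10):
    """Population stability index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins  # equal-width bins over the reference range

    def bin_shares(sample):
        counts = [0] * n_bins
        for x in sample:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clip out-of-range values to edge bins
            counts[idx] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(sample), 1e-6) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

rng = random.Random(0)
reference = [rng.gauss(50, 10) for _ in range(5000)]  # training-time inputs
same = [rng.gauss(50, 10) for _ in range(5000)]       # new data, same distribution
shifted = [rng.gauss(65, 10) for _ in range(5000)]    # new data, mean has moved

print(psi(reference, same))     # close to zero
print(psi(reference, shifted))  # much larger
```

A widely quoted rule of thumb treats PSI below roughly 0.1 as stable and above roughly 0.25 as a shift worth investigating. Treat those cut-offs as a starting point to calibrate per feature. Note that PSI on inputs detects data drift only; concept drift shows up in accuracy once ground truth arrives.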
The Iterative Reality
The lifecycle is not a waterfall; it is a loop. Insights from monitoring inform the next round of data collection. Better evaluation can reveal that the original problem framing was wrong. The discipline of MLOps is making this loop fast, repeatable, and reliable.