In April 2020, something peculiar happened on Reddit.
The machine learning subreddit is usually filled with good discussions, links to papers, interesting videos, some drama in academia and industry, and the occasional post that You Know Who actually invented something first.
When the pandemic hit, questions kept popping up, casually asking
Hey guyys, sooo, how do you adapt your models to the changes due to the pandemic?
Machine learning engineers in retail, travel, finance and possibly every other industry were having a bad ol' time. Machine learning models depend on data staying consistent. Reddit was quick to tell these posters to earn their 6-figure salaries proper, because, no one knew what to do.
Machine Learning really depends on precedented times.
In These Unprecedented Times... - A ML Horror Story
When reality changes, machine learning models lose their power.
The pandemic was clearly a catastrophic and abrupt change. It was easy and quick to observe how models that would predict something like customer behaviour were completely useless all of a sudden. However, this often happens slowly over time, it's a phenomenon called "drift".
Oftentimes, they have different words before, namely concept drift, model drift, and data drift.
The F1 and the Furious: Concept Drift
Model & Concept Drift is the change in your target variable. This could be a business changing its definition of "success". This could be laws and regulations changing. This could be new research adjusting our understanding of the world.
When categories in an online store change, this equates to the underlying distribution of our
y shifting. This results in degrading performance of our machine learning models.
Data Drift is the change of our underlying data. This is the customer base of a business changing. This is global travel patterns changing. The target variable is still the exact same, but the behaviour that would lead to that outcome has suddenly changed.
These issues can be addressed by tracking model performance, tracking data distributions, and automatic retraining when degradation thresholds are met.