How to deal with data changing and machine learning models doing worse after training

Jesper Dramsch · drift, machine learning, production · read in 2 minutes

Machine Learning in Production 101

I just finished some writing for the UN (ITU) about machine learning models in production.

Wondering how to deal with data changing and models doing worse once they are deployed?

This is for you.

📌 We're talking about Drift

Our training data is static. The real world our deployed model meets is non-stationary.

This drift can happen in three ways:

The input data changes
The labels for the data change
The inherent relationship changes

⚗️ Input data changes!

One way to monitor this is to compare the distribution of the new data against the training data.

We can use these tests:

Continuous: Kolmogorov-Smirnov test
Categorical: Chi-squared test
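As a minimal sketch of the two tests above, assuming SciPy and NumPy are available: `continuous_drift` and `categorical_drift` are hypothetical helper names, the synthetic features are made up for illustration, and the 0.05 significance level is an assumption you would tune for your own data.

```python
from collections import Counter

import numpy as np
from scipy.stats import chi2_contingency, ks_2samp


def continuous_drift(train, live, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test on one continuous feature."""
    result = ks_2samp(train, live)
    return bool(result.pvalue < alpha), float(result.pvalue)


def categorical_drift(train, live, alpha=0.05):
    """Chi-squared test on the category counts of one categorical feature."""
    train_counts, live_counts = Counter(train), Counter(live)
    categories = sorted(set(train_counts) | set(live_counts))
    # Contingency table: one row per data set, one column per category.
    table = np.array([
        [train_counts[c] for c in categories],
        [live_counts[c] for c in categories],
    ])
    _, p_value, _, _ = chi2_contingency(table)
    return bool(p_value < alpha), float(p_value)


# Synthetic example data (an assumption, not real production data).
rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=2000)
live_feature = rng.normal(loc=0.5, scale=1.0, size=2000)  # mean has shifted

train_category = ["a"] * 700 + ["b"] * 300
live_category = ["a"] * 400 + ["b"] * 600  # class mix has shifted

numeric_drifted, numeric_p = continuous_drift(train_feature, live_feature)
category_drifted, category_p = categorical_drift(train_category, live_category)
print("continuous feature drifted:", numeric_drifted)
print("categorical feature drifted:", category_drifted)
```

With very large samples, even tiny shifts come out as statistically significant, so it can help to look at the size of the shift as well as the p-value before triggering anything.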

Solution: Retrain the models regularly.

🧩 Target / Label changes

These can be natural changes similar to the input changes.

In that case, you can use the same approach.

But sometimes our categories themselves change, because we make discoveries or management makes decisions.

Solution: These updates are best reflected in automated pipelines.
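One way such a pipeline step could look, as a rough sketch: old labels are mapped onto the current label set, and anything that still does not fit is reported instead of silently passed to training. The label names, `LABEL_MIGRATIONS`, `CURRENT_LABELS`, and `migrate_labels` are all hypothetical and only serve the illustration.

```python
# Hypothetical mapping from retired category names to their replacements.
LABEL_MIGRATIONS = {"bike": "bicycle", "moped": "motorcycle"}

# Hypothetical set of categories the current model is allowed to predict.
CURRENT_LABELS = {"bicycle", "motorcycle", "car"}


def migrate_labels(labels):
    """Apply known renames and report labels missing from the current set."""
    migrated = [LABEL_MIGRATIONS.get(label, label) for label in labels]
    unknown = sorted(set(migrated) - CURRENT_LABELS)
    return migrated, unknown


migrated, unknown = migrate_labels(["bike", "car", "scooter"])
print(migrated)
print(unknown)
```

Keeping the mapping in version control next to the training code means a category change is one reviewed edit, and the unknown list is a natural place to hook in a pipeline failure or an alert.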

🔀 Concept shift

This one sucks.

ML models learn the relationship between input and label (ideally).

When that relationship changes, our entire historical data set is obsolete.

This is essentially what happened in early 2020.

Solution: Collect new data. Setting up automated alerts is essential to notice the shift in the first place.
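A minimal sketch of such an alert, assuming ground-truth labels eventually arrive for deployed predictions: it tracks accuracy over a rolling window and flags when it falls clearly below the accuracy measured at training time. `AccuracyMonitor`, the baseline of 0.9, the tolerance of 0.1, and the window of 50 are hypothetical choices for illustration.

```python
from collections import deque


class AccuracyMonitor:
    """Rolling accuracy check against a baseline from validation."""

    def __init__(self, baseline, tolerance=0.1, window=100):
        self.baseline = baseline
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # True where prediction was right

    def update(self, prediction, label):
        """Record one labelled prediction; return True if an alert should fire."""
        self.outcomes.append(prediction == label)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline=0.9, tolerance=0.1, window=50)

# 50 correct predictions: the window fills up without an alert.
healthy = [monitor.update(prediction=1, label=1) for _ in range(50)]

# 20 wrong predictions in a row: rolling accuracy drops to 0.6.
degraded = [monitor.update(prediction=1, label=0) for _ in range(20)]

print("alert while healthy:", any(healthy))
print("alert after degradation:", degraded[-1])
```

A drop in accuracy alone does not prove that the relationship between input and label changed, but it is a cheap signal that it is time to look at the data and collect new labels.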

📖 More info?

I wrote an ebook about machine learning validation.

I give it away to my newsletter subscribers.

I have just made the biggest update to the ebook, including production models and machine learning drift.

Subscribe to receive insights from Late to the Party on machine learning, data science, and Python every Friday.

Conclusion

In machine learning, we hope that the training data represents real-world data, but it doesn't always.

Set up MLOps automation
Retrain for input data changes
Care for label changes
Hope it's not concept drift, where the relationship between input and label changes


© 2024 Jesper Dramsch All rights reserved.

