A Gentle Introduction to Concept Drift in Machine Learning

Data can change over time. This can result in poor and degrading performance in predictive models that assume a static relationship between input and output variables.

This problem of the changing underlying relationships in the data is called concept drift in the field of machine learning.

In this post, you will discover the problem of concept drift and ways you may be able to address it in your own predictive modeling problems.

After completing this post, you will know:

  • The problem of data changing over time.
  • What is concept drift and how it is defined.
  • How to handle concept drift in your own predictive modeling problems.

Kick-start your project with my new book Master Machine Learning Algorithms, including step-by-step tutorials and the Excel Spreadsheet files for all examples.

Let’s get started.

A Gentle Introduction to Concept Drift in Machine Learning
Photo by Joe Cleere, some rights reserved.


This post is divided into 3 parts; they are:

  1. Changes to Data Over Time
  2. What is Concept Drift?
  3. How to Address Concept Drift

Changes to Data Over Time

Predictive modeling is the problem of learning a model from historical data and using the model to make predictions on new data where we do not know the answer.

Technically, predictive modeling is the problem of approximating a mapping function (f) given input data (X) to predict an output value (y).

Often, this mapping is assumed to be static, meaning that the mapping learned from historical data is just as valid in the future on new data and that the relationships between input and output data do not change.

This is true for many problems, but not all problems.

In some cases, the relationships between input and output data can change over time, meaning, in turn, that the unknown underlying mapping function changes.

The changes may be consequential, such that predictions made by a model trained on older historical data are no longer correct, or not as correct as they could be if the model were trained on more recent historical data.

These changes, in turn, may be detectable, and if detected, it may be possible to update the learned model to reflect them.

… many data mining methods assume that discovered patterns are static. However, in practice patterns in the database evolve over time. This poses two important challenges. The first challenge is to detect when concept drift occurs. The second challenge is to keep the patterns up-to-date without inducing the patterns from scratch.

— Page 10, Data Mining and Knowledge Discovery Handbook, 2010.

What is Concept Drift?

Concept drift in machine learning and data mining refers to the change in the relationships between input and output data in the underlying problem over time.

In other domains, this change may be called “covariate shift,” “dataset shift,” or “nonstationarity.”

In most challenging data analysis applications, data evolve over time and must be analyzed in near real time. Patterns and relations in such data often evolve over time, thus, models built for analyzing such data quickly become obsolete over time. In machine learning and data mining this phenomenon is referred to as concept drift.

— An overview of concept drift applications, 2016.

A concept in “concept drift” refers to the unknown and hidden relationship between inputs and output variables.

For example, one concept in weather data may be the season, which is not explicitly specified in the temperature data but influences it. Another example may be customer purchasing behavior over time, which may be influenced by the strength of the economy, where the strength of the economy is not explicitly specified in the data. These elements are also called a “hidden context”.

A difficult problem with learning in many real-world domains is that the concept of interest may depend on some hidden context, not given explicitly in the form of predictive features. A typical example is weather prediction rules that may vary radically with the season. […] Often the cause of change is hidden, not known a priori, making the learning task more complicated.

— The problem of concept drift: definitions and related work, 2004.

The change to the data could take any form. It is conceptually easier to consider the case where there is some temporal consistency to the change such that data collected within a specific time period show the same relationship and that this relationship changes smoothly over time.

Note that this is not always the case and this assumption should be challenged. Some other types of changes may include:

  • A gradual change over time.
  • A recurring or cyclical change.
  • A sudden or abrupt change.

Different concept drift detection and handling schemes may be required for each situation. Often, recurring change and long-term trends are considered systematic and can be explicitly identified and handled.

Concept drift may be present on supervised learning problems where predictions are made and data is collected over time. These are traditionally called online learning problems, given the change expected in the data over time.

There are domains where predictions are ordered by time, such as time series forecasting and predictions on streaming data, where the problem of concept drift is more likely and should be explicitly tested for and addressed.

A common challenge when mining data streams is that the data streams are not always strictly stationary, i.e., the concept of data (underlying distribution of incoming data) unpredictably drifts over time. This has encouraged the need to detect these concept drifts in the data streams in a timely manner.

— Concept Drift Detection for Streaming Data, 2015.

Indre Zliobaite in the 2010 paper titled “Learning under Concept Drift: An Overview” provides a framework for thinking about concept drift and the decisions required by the machine learning practitioner, as follows:

  • Future assumption: a designer needs to make an assumption about the future data source.
  • Change type: a designer needs to identify possible change patterns.
  • Learner adaptivity: based on the change type and the future assumption, a designer chooses the mechanisms which make the learner adaptive.
  • Model selection: a designer needs a criterion to choose a particular parametrization of the selected learner at every time step (e.g. the weights for ensemble members, the window size for variable window method).

This framework may help in thinking about the decision points available to you when addressing concept drift on your own predictive modeling problems.

How to Address Concept Drift

There are many ways to address concept drift; let’s take a look at a few.

1. Do Nothing (Static Model)

The most common way is to not handle it at all and assume that the data does not change.

This allows you to develop a single “best” model once and use it on all future data.

This should be your starting point and baseline for comparison to other methods. If you believe your dataset may suffer from concept drift, you can use a static model in two ways:

  1. Concept Drift Detection. Monitor skill of the static model over time and if skill drops, perhaps concept drift is occurring and some intervention is required.
  2. Baseline Performance. Use the skill of the static model as a baseline to compare to any intervention you make.
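As a minimal sketch of the first use, the snippet below flags the first evaluation period where a static model's skill drops more than a tolerance below its baseline. The accuracy values, the 0.91 baseline, and the 0.05 tolerance are invented for illustration:

```python
def detect_skill_drop(accuracies, baseline, tolerance=0.05):
    """Return the index of the first evaluation period whose skill falls
    more than `tolerance` below the baseline, or None if skill holds up."""
    for i, acc in enumerate(accuracies):
        if acc < baseline - tolerance:
            return i
    return None

# Simulated monthly accuracy of a static model: stable at first, then degrading.
monthly_accuracy = [0.91, 0.90, 0.92, 0.89, 0.84, 0.78]
drift_at = detect_skill_drop(monthly_accuracy, baseline=0.91)
print(drift_at)  # 4 -- skill first falls below 0.86 in month index 4
```

In practice, the baseline and tolerance would come from back-testing, and a statistical test over a window of scores would be more robust than a single threshold.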

2. Periodically Re-Fit

A good first-level intervention is to periodically update your static model with more recent historical data.

For example, perhaps you can update the model each month or each year with the data collected from the prior period.

This may also involve back-testing the model in order to select a suitable amount of historical data to include when re-fitting the static model.

In some cases, it may be appropriate to only include a small portion of the most recent historical data to best capture the new relationships between inputs and outputs (e.g. the use of a sliding window).
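To make the sliding-window idea concrete, here is a small sketch using a simple least-squares line as the "model". The synthetic data, where the relationship drifts from y = x to y = 2x, is an assumption made up for this example:

```python
import numpy as np

def refit_on_window(X, y, window):
    """Re-fit a simple linear model using only the most recent
    `window` observations (a sliding window over the history)."""
    slope, intercept = np.polyfit(X[-window:], y[-window:], deg=1)
    return slope, intercept

# The relationship drifts: y = x for the older data, y = 2x more recently.
X = np.arange(20, dtype=float)
y = np.where(X < 10, X, 2 * X)

slope_all, _ = refit_on_window(X, y, window=len(X))  # fit on all history
slope_recent, _ = refit_on_window(X, y, window=10)   # fit on recent data only
print(round(slope_recent, 1))  # 2.0 -- the window recovers the new relationship
```

Back-testing over several candidate window sizes would tell you how much history to keep.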

3. Periodically Update

Some machine learning models can be updated.

This is more efficient than the previous approach (periodic re-fitting): instead of discarding the static model completely, the existing state is used as the starting point for a fit process that updates the model using a sample of the most recent historical data.

For example, this approach is suitable for most machine learning algorithms that use weights or coefficients such as regression algorithms and neural networks.
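As an illustrative sketch of updating rather than re-fitting, the snippet below nudges the coefficients of an already-fit linear model with stochastic gradient steps on recent data only. The drift from y = x to y = 2x, the learning rate, and the epoch count are all assumptions made for the example:

```python
import numpy as np

def sgd_update(w, b, X, y, lr=0.1, epochs=300):
    """Update existing linear-model parameters (w, b) with stochastic
    gradient steps on a batch of recent data, instead of re-fitting
    from scratch."""
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = (w * xi + b) - yi  # prediction error on one sample
            w -= lr * err * xi
            b -= lr * err
    return w, b

# Model originally fit on y = 1*x; the concept has drifted to y = 2*x.
w, b = 1.0, 0.0
X_recent = np.linspace(0.0, 1.0, 20)
y_recent = 2.0 * X_recent
w, b = sgd_update(w, b, X_recent, y_recent)
print(round(w, 2), round(b, 2))  # roughly 2.0 and 0.0
```

Libraries expose the same idea directly, e.g. warm-started or `partial_fit`-style estimators, so you rarely need to hand-roll the update loop.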

4. Weight Data

Some algorithms allow you to weigh the importance of input data.

In this case, you can use a weighting that is inversely proportional to the age of the data such that more attention is paid to the most recent data (higher weight) and less attention is paid to the least recent data (smaller weight).
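A sketch of age-based weighting: NumPy's `polyfit` accepts per-observation weights, so weight can decay exponentially with age. The decay rate of 0.7 and the synthetic drift from y = x to y = 2x are invented for illustration:

```python
import numpy as np

# The older half follows y = x; the newer half follows y = 2x (the concept drifted).
X = np.arange(20, dtype=float)
y = np.where(X < 10, X, 2 * X)

# Weight inversely with age: weight = decay ** age, so recent points dominate.
ages = (len(X) - 1) - np.arange(len(X))  # 19 for the oldest point, 0 for the newest
weights = 0.7 ** ages

slope, intercept = np.polyfit(X, y, deg=1, w=weights)
print(round(slope, 1))  # close to 2.0, the more recent relationship
```

A linear decay, or weights chosen by back-testing, would work the same way; many estimators accept an equivalent `sample_weight` argument at fit time.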

5. Learn The Change

An ensemble approach can be used where the static model is left untouched, but a new model learns to correct the predictions from the static model based on the relationships in more recent data.

This may be thought of as a boosting type ensemble (in spirit only) where subsequent models correct the predictions from prior models. The key difference here is that subsequent models are fit on different and more recent data, as opposed to a weighted form of the same dataset, as in the case of AdaBoost and gradient boosting.
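The correction idea can be sketched as follows: leave the static model untouched and fit a second model on the residuals of recent data. Here the static model, the drift (a constant offset of 5), and the constant corrector are all toy assumptions for the example:

```python
import numpy as np

def static_predict(x):
    """The original model, fit long ago on data where y = x."""
    return x

# The concept drifted to y = x + 5; learn a corrector from recent residuals.
X_recent = np.arange(10, dtype=float)
y_recent = X_recent + 5.0
residuals = y_recent - static_predict(X_recent)
correction = residuals.mean()  # the simplest possible corrector: a constant

def corrected_predict(x):
    """Static model plus the learned correction, boosting-style."""
    return static_predict(x) + correction

print(corrected_predict(100.0))  # 105.0
```

In a real system the corrector would itself be a learned model of the residuals, not a single constant.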

6. Detect and Choose Model

For some problem domains it may be possible to design systems to detect changes and choose a specific and different model to make predictions.

This may be appropriate for domains that expect abrupt changes that may have occurred in the past and can be checked for in the future. It also assumes that it is possible to develop skillful models to handle each of the detectable changes to the data.

For example, the abrupt change may be a specific observation or observations in a range, or the change in the distribution of one or more input variables.
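One crude way to detect such a change in the distribution of an input variable is to compare the recent mean against the reference mean in standard-error units. The threshold of 3 and the simulated data below are illustrative assumptions; a proper two-sample test (e.g. Kolmogorov-Smirnov) would be more rigorous:

```python
import numpy as np

def distribution_shifted(reference, recent, z_threshold=3.0):
    """Flag a shift if the recent mean sits more than `z_threshold`
    standard errors away from the reference mean (a crude check)."""
    se = reference.std(ddof=1) / np.sqrt(len(recent))
    z = abs(recent.mean() - reference.mean()) / se
    return bool(z > z_threshold)

rng = np.random.default_rng(42)
reference = rng.normal(0.0, 1.0, 1000)   # inputs seen during training
same = rng.normal(0.0, 1.0, 100)         # new inputs, same distribution
shifted = rng.normal(1.5, 1.0, 100)      # new inputs after an abrupt change

print(distribution_shifted(reference, same), distribution_shifted(reference, shifted))
```

Once a shift is flagged, the system can route predictions to the model that was developed for that regime.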

7. Data Preparation

In some domains, such as time series problems, the data may be expected to change over time.

In these types of problems, it is common to prepare the data in such a way as to remove the systematic changes to the data over time, such as trends and seasonality, by differencing.

This is so common that it is built into classical linear methods like the ARIMA model.
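As a quick sketch, first-order differencing removes a linear trend; the five-point series is made up for the example:

```python
import numpy as np

# A series with a steady upward trend; differencing removes it.
series = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
differenced = np.diff(series)  # value[t] - value[t-1]
print(differenced)  # [2. 2. 2. 2.] -- the trend is gone
```

This is the "d" in ARIMA's (p, d, q) order: the number of times the series is differenced before fitting.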

Typically, we do not consider systematic change to the data as a problem of concept drift because it can be dealt with directly. Rather, these examples may be a useful way of thinking about your problem and may help you anticipate change and prepare data in a specific way using standardization, scaling, projections, and more to mitigate or at least reduce the effects of change to input variables in the future.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary


In this post, you discovered the problem of concept drift in changing data for applied machine learning.

Specifically, you learned:

  • The problem of data changing over time.
  • What is concept drift and how it is defined.
  • How to handle concept drift in your own predictive modeling problems.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

56 Responses to A Gentle Introduction to Concept Drift in Machine Learning

  1. Cédric December 15, 2017 at 6:56 am

    Thanks a lot for your blog; I always read it with a lot of attention. Something which is a bit linked, for which I did not find an answer: how should we adapt a classification model when new outputs are required? Typically, classification for face recognition, where you can add new people regularly (so new output classes) without regenerating the model each time?

  2. Tejas December 16, 2017 at 12:07 am

    Hi Jason,

    Great post, as always!

    Can you post some literature on Point 5: Learn The Change? The way I think of it is, you have a global baseline model which acts like a static model. Another model which is trained on more subsequent data, which essentially captures the recent trend.


    • Jason Brownlee December 16, 2017 at 5:32 am

      Sorry, I don’t have refs.

      • Sean August 21, 2019 at 1:23 am

        Hey, thanks for your blog, it was very interesting! How would you refer to the appearance of novel classes (which are usually addressed with novelty detection algorithms)? Is that also concept drift in your perspective?

        • Jason Brownlee August 21, 2019 at 6:49 am

          If you can anticipate them, you can leave room in the encoding.

          If you can afford to refit, then refit a model once the problem has changed.

          If you cannot, perhaps fit a new model that chooses to use the old model for the old classes and a new model for the new classes, with a one-vs-rest type structure. This sounds terrible actually…

  3. sathouel January 19, 2018 at 12:27 am

    Hi, I am looking for a regression dataset containing drift.
    Do you have an idea of where I can find such a dataset, or maybe how to generate them (in the three types: gradual, abrupt, ...)?

    Thanks for your job and your help !

    • Jason Brownlee January 19, 2018 at 6:32 am

      No sorry.

    • arpieb April 13, 2018 at 12:44 am

      @sathouel you might consider financial data from around 2001 and 2008 where the global economy (and esp the US economy) had abrupt changes as well as some unusual leading indicator fluctuations leading up to the changes (2008 recession comes to mind) that break most traditional financial forecasting models.

    • Pabitra January 9, 2019 at 1:51 am

      The stock market data has a hidden factor, “market sentiment”, which is dependent on many unknown parameters.

  4. Sayak February 24, 2018 at 3:17 pm

    Any code example covering the “How to Address Concept Drift?” section would be great.

  5. Lili October 2, 2018 at 9:41 pm

    Hi! Thanks for this post. I would love to find “algorithms that allow you to weigh the importance of input data” but have difficulties finding some. I found this paper ( https://research.uni-sofia.bg/bitstream/10506/57/1/ECAI2000_WSTR.pdf) but the implementation is not available, do you have examples of these algorithms and especially with some code available? Thanks a lot

    • Jason Brownlee October 3, 2018 at 6:18 am

      A simple linear or exponential weighting that pays more attention to more recent data might be a good place to start.

  6. Dhruv Agarwal October 30, 2018 at 9:16 pm


    Great article!

    I wanted to know, though, if non-parametric approaches (like clustering) help prevent the problem of Concept Drift from arising in the first place?

    • Jason Brownlee October 31, 2018 at 6:28 am

      How so?

      • Dhruv Agarwal November 28, 2018 at 6:57 am

        By implicitly adjusting cluster compositions based on the (new) observed patterns in the users’ behaviours.

        I’m not claiming that it does. Just wanted to discuss if it’s a possibility.

  7. Vihanga Bandara January 1, 2019 at 1:57 am

    I am working on a domain that belongs to the concept drift problem, namely spam detection on Twitter, where the problem that is still yet to be solved properly is ‘spam drift’. Do you have any insights or ideas on how to proceed, or any algorithm that might be suitable?

    Appreciate your help

    • Jason Brownlee January 1, 2019 at 6:27 am

      Measure the drift first, then propose and test methods to address it.

      • varsha April 30, 2020 at 3:15 am

        How do we measure drift? Can you please elaborate with an example?

        • Jason Brownlee April 30, 2020 at 6:52 am

          A change in distribution would be a good start. Perhaps cross-entropy or KL divergence.

  8. cara January 18, 2019 at 1:59 am

    I’m trying to quantify the expense of concept drift. Can you please place a value in terms of time saved, processing power saved, accuracy improvement of the elimination of this problem?

    • Jason Brownlee January 18, 2019 at 5:44 am

      I don’t follow sorry, what do you mean exactly?

  9. syed January 22, 2019 at 11:08 pm

    How can we justify concept drift in image classification?

    • Jason Brownlee January 23, 2019 at 8:48 am

      What do you mean justify? Do you mean detect?

      A good sign of drift is a decay in performance over time with new data.

  10. Manas antani May 10, 2019 at 2:50 pm


  11. Rebwar Mala Nabi August 30, 2019 at 5:08 am

    Thanks for the lovely article. I am working on historical data from Nasdaq for doing stock prediction using ensemble methods. Does handling concept drift improve the model accuracy if I apply it? Or is it practical to use concept drift in stock prediction?
    Thanks for your kind cooperation.

  12. Varsha April 20, 2020 at 2:34 am

    Can you please give simple real life example of concept drift with small dataset?

    • Jason Brownlee April 20, 2020 at 5:28 am

      Sure: The level in a time series changes each day.

      • Varsha Khandekar April 30, 2020 at 3:11 am

        Sir, I need a mathematical example which shows a change in the probability distribution, i.e. P(Y|X), that shows concept drift. Can you please demonstrate concept drift on a small and simple dataset?

  13. mahshad April 22, 2020 at 12:32 am

    Dear Jason, thanks a lot for your extremely useful article. Currently I am working on an employee churn prediction model. Since our churn is limited to a maximum of 200 people a year, we have decided to use 5 years of churn data. In order to train our model, we need to include retained employees in the data set as well.

    Now there is a conflict between me and my colleague. I think we need to use the retained employees of each year and add all of their records, so someone who has stayed in the company for the last 5 years will have 5 different records with 5 different values, like different age, tenure, performance score and so on. But my colleague thinks counting 1 retained person 5 times in the data set is wrong and only their last status should be considered.

    May I use this concept and tell him that my idea is correct?

    • Jason Brownlee April 22, 2020 at 6:00 am

      You’re welcome.

      I don’t know off the cuff sorry, you will need to dig into your data to discover what is true.

  14. prasad January 10, 2021 at 2:41 am

    Dear Jason Brownlee,
    In your blog you have given good information about drift detection. I have started my Ph.D. research and have chosen concept drift detection using ML as my problem. I need help choosing a methodology for this problem. Can you suggest a new methodology?

    • Jason Brownlee January 10, 2021 at 5:45 am

      Defining a methodology sounds like something you should do as part of your research project.

  15. prasad January 10, 2021 at 2:43 am

    Can anyone help me with a novel methodology for concept drift detection using ML?

    • Jason Brownlee January 10, 2021 at 5:46 am

      Yes, but it would probably invalidate the idea of doing the phd in the first place. I recommend checking the literature and discussing with your research advisor.

  16. Yogendra April 9, 2021 at 4:43 pm

    Hi Mahshad,

    In my view you are correct; each year's priorities are different, so a good approach is to use yearly data instead of the complete 5 years of data.

  17. Varsha June 6, 2021 at 6:51 pm

    Hello, what is the difference between concept drift and noise in a dataset?

    • Jason Brownlee June 7, 2021 at 5:20 am

      Noise is random, drift is in a specific direction, e.g. concepts like classes and decision boundaries change.

      Drift could be a bias, it could be a type of noise.

  18. Benny August 11, 2021 at 6:16 am

    Can concept drift happen in streaming data of students’ performance? And how?

    • Jason Brownlee August 11, 2021 at 7:44 am

      Sure, any change in data over time is an example of drift.

      The specific cause for a domain may depend on many factors, perhaps discuss with a domain expert.

  19. Aidan October 8, 2021 at 4:04 am

    Hi Jason,

    Thanks a lot for all of your blogs. I find them very interesting and helpful.

    I have built a model which predicts a dependent variable based on a number of independent variables. In practice, new combinations of independent variables are supplied and decisions are based on the resulting prediction. I have received a lot of pushback from my supervisor about what would happen if the new combinations are far removed from the population used in the training dataset. Naturally a prediction will still be made, but its validity is dubious. My question is: how can one assess the similarity of out-of-sample independent variables to those used during training? If this is possible, I could caveat predictions with warnings when an example is far removed from the distribution in the training dataset.
    Another idea is to test where each new variable fits within the distribution of the same variable during training. I have a large dataset and this is doable. However, there is pushback about only using univariate distributions, and about the case when a combination is very unlikely or impossible. Any thoughts on these two methods? Are these methods named in the literature?

    Thanks in advance and apologies if this does not make sense.


    • Adrian Tam October 13, 2021 at 5:12 am

      I think confidence can be provided with cross validation in your case. Please see this post: https://machinelearningmastery.com/training-validation-test-split-and-cross-validation-done-right/
      At the end, you compare test set score with cross validation score. In this way, you can tell.

      • Aidan October 28, 2021 at 3:39 am

        Thanks Adrian. I can certainly see the benefits of cross validation and it makes sense in general cases, but if the new combination of independent variables is far removed from the population this would not help. For instance, negative interest rates are a popular theme in economics and if these were supplied to my model, it would result in many independent variables changing their distribution from what was trained on.

        I know this is a bad example, as the user should be aware of such a large change, but what if a user unknowingly supplies a combination of variables which is far removed from the training populations: how would we alert him or her that the model may not be accurate? Is there a technique for such things, e.g. the similarity of out-of-sample independent variables to those used during training?

        • Adrian Tam October 28, 2021 at 2:07 pm

          The model will not know it is inaccurate, because it will not see the correct prediction. But your description suggests to me that you are looking for outlier detection. Try thinking along that line.

  20. Krishna November 3, 2021 at 8:04 am

    Thanks for the article, Adrian. We have different tests which can measure data drift, like the population stability index (PSI), chi-square for categorical variables, and K-S for continuous variables, but how can we quantify that the data in the training set drifted by a certain value relative to the test set?

    • Adrian Tam November 7, 2021 at 7:24 am

      Do you think the sample mean and variance within test and training set can do the job?

  21. Bean November 7, 2021 at 12:41 pm

    Greeting Dr. Jason,
    How to inject concept drift into the dataset for the purpose of a study?

    • Adrian Tam November 14, 2021 at 11:15 am

      It is best not to inject, but to collect data at different times and see how a model trained on one dataset applies to another. For example, in NLP problems, if you collect text from the 2000s for training and apply the model to text from the 1900s for testing, you can tell how the drift may affect the model. If you really want to inject drift, you have to have a model of how the data drifts before you can inject it.
