A Gentle Introduction to Concept Drift in Machine Learning

Data can change over time. This can result in poor and degrading predictive performance in predictive models that assume a static relationship between input and output variables.

This problem of the changing underlying relationships in the data is called concept drift in the field of machine learning.

In this post, you will discover the problem of concept drift and ways to you may be able to address it in your own predictive modeling problems.

After completing this post, you will know:

  • The problem of data changing over time.
  • What is concept drift and how it is defined.
  • How to handle concept drift in your own predictive modeling problems.

Let’s get started.

A Gentle Introduction to Concept Drift in Machine Learning

A Gentle Introduction to Concept Drift in Machine Learning
Photo by Joe Cleere, some rights reserved.

Overview

This post is divided into 3 parts; they are:

  1. Changes to Data Over Time
  2. What is Concept Drift?
  3. How to Address Concept Drift

Changes to Data Over Time

Predictive modeling is the problem of learning a model from historical data and using the model to make predictions on new data where we do not know the answer.

Technically, predictive modeling is the problem of approximating a mapping function (f) given input data (X) to predict an output value (y).

Often, this mapping is assumed to be static, meaning that the mapping learned from historical data is just as valid in the future on new data and that the relationships between input and output data do not change.

This is true for many problems, but not all problems.

In some cases, the relationships between input and output data can change over time, meaning that in turn there are changes to the unknown underlying mapping function.

The changes may be consequential, such as that the predictions made by a model trained on older historical data are no longer correct or as correct as they could be if the model was trained on more recent historical data.

These changes, in turn, may be able to be detected, and if detected, it may be possible to update the learned model to reflect these changes.

… many data mining methods assume that discovered patterns are static. However, in practice patterns in the database evolve over time. This poses two important challenges. The first challenge is to detect when concept drift occurs. The second challenge is to keep the patterns up-to-date without inducing the patterns from scratch.

— Page 10, Data Mining and Knowledge Discovery Handbook, 2010.

What is Concept Drift?

Concept drift in machine learning and data mining refers to the change in the relationships between input and output data in the underlying problem over time.

In other domains, this change maybe called “covariate shift,” “dataset shift,” or “nonstationarity.”

In most challenging data analysis applications, data evolve over time and must be analyzed in near real time. Patterns and relations in such data often evolve over time, thus, models built for analyzing such data quickly become obsolete over time. In machine learning and data mining this phenomenon is referred to as concept drift.

An overview of concept drift applications, 2016.

A concept in “concept drift” refers to the unknown and hidden relationship between inputs and output variables.

For example, one concept in weather data may be the season that is not explicitly specified in temperature data, but may influence temperature data. Another example may be customer purchasing behavior over time that may be influenced by the strength of the economy, where the strength of the economy is not explicitly specified in the data. These elements are also called a “hidden context”.

A difficult problem with learning in many real-world domains is that the concept of interest may depend on some hidden context, not given explicitly in the form of predictive features. A typical example is weather prediction rules that may vary radically with the season. […] Often the cause of change is hidden, not known a priori, making the learning task more complicated.

The problem of concept drift: definitions and related work, 2004.

The change to the data could take any form. It is conceptually easier to consider the case where there is some temporal consistency to the change such that data collected within a specific time period show the same relationship and that this relationship changes smoothly over time.

Note that this is not always the case and this assumption should be challenged. Some other types of changes may include:

  • A gradual change over time.
  • A recurring or cyclical change.
  • A sudden or abrupt change.

Different concept drift detection and handling schemes may be required for each situation. Often, recurring change and long-term trends are considered systematic and can be explicitly identified and handled.

Concept drift may be present on supervised learning problems where predictions are made and data is collected over time. These are traditionally called online learning problems, given the change expected in the data over time.

There are domains where predictions are ordered by time, such as time series forecasting and predictions on streaming data where the problem of concept drift is more likely and should be explicitly tested for and addressed.

A common challenge when mining data streams is that the data streams are not always strictly stationary, i.e., the concept of data (underlying distribution of incoming data) unpredictably drifts over time. This has encouraged the need to detect these concept drifts in the data streams in a timely manner

Concept Drift Detection for Streaming Data, 2015.

Indre Zliobaite in the 2010 paper titled “Learning under Concept Drift: An Overview” provides a framework for thinking about concept drift and the decisions required by the machine learning practitioner, as follows:

  • Future assumption: a designer needs to make an assumption about the future data source.
  • Change type: a designer needs to identify possible change patterns.
  • Learner adaptivity: based on the change type and the future assumption, a designer chooses the mechanisms which make the learner adaptive.
  • Model selection: a designer needs a criterion to choose a particular parametrization of the selected learner at every time step (e.g. the weights for ensemble members, the window size for variable window method).

This framework may help in thinking about the decision points available to you when addressing concept drift on your own predictive modeling problems.

How to Address Concept Drift?

There are many ways to address concept drift; let’s take a look at a few.

1. Do Nothing (Static Model)

The most common way is to not handle it at all and assume that the data does not change.

This allows you to develop a single “best” model once and use it on all future data.

This should be your starting point and baseline for comparison to other methods. If you believe your dataset may suffer concept drift, you can use a static model in two ways:

  1. Concept Drift Detection. Monitor skill of the static model over time and if skill drops, perhaps concept drift is occurring and some intervention is required.
  2. Baseline Performance. Use the skill of the static model as a baseline to compare to any intervention you make.

2. Periodically Re-Fit

A good first-level intervention is to periodically update your static model with more recent historical data.

For example, perhaps you can update the model each month or each year with the data collected from the prior period.

This may also involve back-testing the model in order to select a suitable amount of historical data to include when re-fitting the static model.

In some cases, it may be appropriate to only include a small portion of the most recent historical data to best capture the new relationships between inputs and outputs (e.g. the use of a sliding window).

3. Periodically Update

Some machine learning models can be updated.

This is an efficiency over the previous approach (periodically re-fit) where instead of discarding the static model completely, the existing state is used as the starting point for a fit process that updates the model fit using a sample of the most recent historical data.

For example, this approach is suitable for most machine learning algorithms that use weights or coefficients such as regression algorithms and neural networks.

4. Weight Data

Some algorithms allow you to weigh the importance of input data.

In this case, you can use a weighting that is inversely proportional to the age of the data such that more attention is paid to the most recent data (higher weight) and less attention is paid to the least recent data (smaller weight).

5. Learn The Change

An ensemble approach can be used where the static model is left untouched, but a new model learns to correct the predictions from the static model based on the relationships in more recent data.

This may be thought of as a boosting type ensemble (in spirit only) where subsequent models correct the predictions from prior models. The key difference here is that subsequent models are fit on different and more recent data, as opposed to a weighted form of the same dataset, as in the case of AdaBoost and gradient boosting.

6. Detect and Choose Model

For some problem domains it may be possible to design systems to detect changes and choose a specific and different model to make predictions.

This may be appropriate for domains that expect abrupt changes that may have occurred in the past and can be checked for in the future. It also assumes that it is possible to develop skillful models to handle each of the detectable changes to the data.

For example, the abrupt change may be a specific observation or observations in a range, or the change in the distribution of one or more input variables.

7. Data Preparation

In some domains, such as time series problems, the data may be expected to change over time.

In these types of problems, it is common to prepare the data in such a way as to remove the systematic changes to the data over time, such as trends and seasonality by differencing.

This is so common that it is built into classical linear methods like the ARIMA model.

Typically, we do not consider systematic change to the data as a problem of concept drift because it can be dealt with directly. Rather, these examples may be a useful way of thinking about your problem and may help you anticipate change and prepare data in a specific way using standardization, scaling, projections, and more to mitigate or at least reduce the effects of change to input variables in the future.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

Articles

Summary

In this post, you discovered the problem of concept drift in changing data for applied machine learning.

Specifically, you learned:

  • The problem of data changing over time.
  • What is concept drift and how it is defined.
  • How to handle concept drift in your own predictive modeling problems.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


Frustrated With Machine Learning Math?

Mater Machine Learning Algorithms

See How Algorithms Work in Minutes

…with just arithmetic and simple examples

Discover how in my new Ebook: Master Machine Learning Algorithms

It covers explanations and examples of 10 top algorithms, like:
Linear Regression, k-Nearest Neighbors, Support Vector Machines and much more…

Finally, Pull Back the Curtain on
Machine Learning Algorithms

Skip the Academics. Just Results.

Click to learn more.


14 Responses to A Gentle Introduction to Concept Drift in Machine Learning

  1. Cédric December 15, 2017 at 6:56 am #

    Thanks a lot for your blog, I always read with a lot of attention. Something which is a bit linked for which I did not find an answer: how should we adapt a classification model when new outputs are required ? Typically, classification for face recognition, for which you can add new people regularly (So new output classes), without regenerating a model each time?

    • Jason Brownlee December 15, 2017 at 3:31 pm #

      Great question!

      You may need to train a new model, but use the existing model as a starting point.

  2. Tejas December 16, 2017 at 12:07 am #

    Hi Jason,

    Great post, as always!

    Can you post some literature on Point 5: Learn The Change? The way I think of it is, you have a global baseline model which acts like a static model. Another model which is trained on more subsequent data, which essentially captures the recent trend.

    Thanks

  3. sathouel January 19, 2018 at 12:27 am #

    Hi, i am looking for dataset of regression problem containing drift.
    Do you have an idea of where can i find such dataSet or maybe generate them (the three types (gradual, abrupt ..)

    Thanks for your job and your help !

    • Jason Brownlee January 19, 2018 at 6:32 am #

      No sorry.

    • arpieb April 13, 2018 at 12:44 am #

      @sathouel you might consider financial data from around 2001 and 2008 where the global economy (and esp the US economy) had abrupt changes as well as some unusual leading indicator fluctuations leading up to the changes (2008 recession comes to mind) that break most traditional financial forecasting models.

  4. Sayak February 24, 2018 at 3:17 pm #

    Any code example covering the “How to Address Concept Drift?” section would be great.

  5. Lili October 2, 2018 at 9:41 pm #

    Hi! Thanks for this post. I would love to find “algorithms that allow you to weigh the importance of input data” but have difficulties finding some. I found this paper ( https://research.uni-sofia.bg/bitstream/10506/57/1/ECAI2000_WSTR.pdf) but the implementation is not available, do you have examples of these algorithms and especially with some code available? Thanks a lot

    • Jason Brownlee October 3, 2018 at 6:18 am #

      A simple linear or exponential weighting that pays more attention to more recent data might be a good place to start.

  6. Dhruv Agarwal October 30, 2018 at 9:16 pm #

    Hi,

    Great article!

    I wanted to know, though, if non-parametric approaches (like clustering) help prevent the problem of Concept Drift from arising in the first place?

    • Jason Brownlee October 31, 2018 at 6:28 am #

      How so?

      • Dhruv Agarwal November 28, 2018 at 6:57 am #

        By implicitly adjusting cluster compositions based on the (new) observed patterns in the users’ behaviours.

        I’m not claiming that it does. Just wanted to discuss if it’s a possibility.

Leave a Reply