How to Address Concept Drift? - 1

There are many ways to address concept drift; let’s take a look at a few.

1. Do Nothing (Static Model)

The most common way is to not handle it at all and assume that the data does not change.

This allows you to develop a single “best” model once and use it on all future data.

This should be your starting point and baseline for comparison to other methods. If you believe your dataset may suffer concept drift, you can use a static model in two ways:

  1. Concept Drift Detection . Monitor skill of the static model over time and if skill drops, perhaps concept drift is occurring and some intervention is required.
  2. Baseline Performance . Use the skill of the static model as a baseline to compare to any intervention you make.

2. Periodically Re-Fit

A good first-level intervention is to periodically update your static model with more recent historical data.

For example, perhaps you can update the model each month or each year with the data collected from the prior period.

This may also involve back-testing the model in order to select a suitable amount of historical data to include when re-fitting the static model.

In some cases, it may be appropriate to only include a small portion of the most recent historical data to best capture the new relationships between inputs and outputs (e.g. the use of a sliding window).

3. Periodically Update

Some machine learning models can be updated.

This is an efficiency over the previous approach (periodically re-fit) where instead of discarding the static model completely, the existing state is used as the starting point for a fit process that updates the model fit using a sample of the most recent historical data.

For example, this approach is suitable for most machine learning algorithms that use weights or coefficients such as regression algorithms and neural networks.

4. Weight Data

Some algorithms allow you to weigh the importance of input data.

In this case, you can use a weighting that is inversely proportional to the age of the data such that more attention is paid to the most recent data (higher weight) and less attention is paid to the least recent data (smaller weight).