Adaptive Random Forest

Adaptive Random Forest (ARF for short) [1] is an adaptation of the original Random Forest algorithm [2], which has been successfully applied to a multitude of machine learning tasks. In layman’s terms, the original Random Forest algorithm is an ensemble of decision trees trained using bagging, where each node split is limited to a random subset of the original set of features. The “Adaptive” part of ARF comes from its mechanisms to adapt to different kinds of concept drift, given the same hyper-parameters.

Specifically, the 3 most important aspects of the ARF algorithm are:

  1. It adds diversity through resampling (“bagging”);
  2. It adds diversity through randomly selecting subsets of features for node splits (See moa.classifiers.trees.ARFHoeffdingTree.java);
  3. It has one drift and warning detector per base tree, which cause selective resets in response to drifts. It also allows training background trees, which start training if a warning is detected and replace the active tree if the warning escalates to a drift.

ARF was designed to be “embarrassingly” parallel; in other words, there are no dependencies between trees. This allows a straightforward parallel implementation of the algorithm in MOA (see the numberOfJobs parameter).
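
To make these mechanisms concrete, below is a minimal, simplified sketch of the per-instance logic of a single ensemble member. It is not MOA’s actual implementation: all class and method names (BaseTree, ChangeDetector, etc.) are placeholders, and the random feature subsets of aspect 2 are assumed to live inside the base tree.

// Simplified sketch of one ARF ensemble member; names are placeholders, not MOA's API.
import java.util.Random;

class ArfMemberSketch {
    BaseTree tree;                  // active ARFHoeffdingTree
    BaseTree backgroundTree;        // only trained after a warning; null otherwise
    ChangeDetector warningDetector; // e.g. ADWIN with a larger delta (triggers earlier)
    ChangeDetector driftDetector;   // e.g. ADWIN with a smaller delta
    final double lambda = 6.0;      // same default as leveraging bagging
    final Random random = new Random(1);

    void trainOnInstance(Instance instance) {
        // Test-then-train: feed the detectors with this tree's error before training.
        boolean correct = tree.predict(instance) == instance.classValue();
        warningDetector.add(correct ? 0.0 : 1.0);
        driftDetector.add(correct ? 0.0 : 1.0);

        // Warning: start a background tree. Drift: replace the active tree and reset.
        if (backgroundTree == null && warningDetector.detectedChange()) {
            backgroundTree = tree.newEmptyTree();
        }
        if (driftDetector.detectedChange()) {
            tree = (backgroundTree != null) ? backgroundTree : tree.newEmptyTree();
            backgroundTree = null;
            warningDetector.reset();
            driftDetector.reset();
        }

        // Diversity through resampling: weight the instance by Poisson(lambda).
        int weight = poisson(lambda, random);
        if (weight > 0) {
            tree.train(instance, weight);
            if (backgroundTree != null) backgroundTree.train(instance, weight);
        }
    }

    // Knuth's method for sampling from a Poisson(lambda) distribution.
    static int poisson(double lambda, Random r) {
        double limit = Math.exp(-lambda), p = 1.0;
        int k = 0;
        do { k++; p *= r.nextDouble(); } while (p > limit);
        return k - 1;
    }
}

interface BaseTree { double predict(Instance i); void train(Instance i, int weight); BaseTree newEmptyTree(); }
interface ChangeDetector { void add(double error); boolean detectedChange(); void reset(); }
interface Instance { double classValue(); }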

Parameters

Currently, ARF is configurable in MOA using the following parameters (an example configuration string combining several of them is shown after the list):

  • treeLearner (-l). ARFHoeffdingTree is the base tree learner; the most important non-standard default is the grace period = 50, which causes trees to start splitting earlier.
  • ensembleSize (-s). The ensemble size, from 10 to 100 (the default is 10, but the model tends to improve as more trees are added).
  • mFeaturesMode (-o). The mode used to determine the number of features per subset. The default is sqrt(M)+1, where M is the total number of features, which is the usual default for Random Forest algorithms; still, it is a hyper-parameter and can be tuned for better results.
  • mFeaturesPerTreeSize (-m). Controls the size of the feature subsets depending on mFeaturesMode. For example, if mFeaturesMode = sqrt(M)+1 then this parameter is ignored, but if mFeaturesMode = “Specified m” then the value defined for -m is used.
  • lambda (-a). The parameter of the Poisson distribution used for resampling; defaults to the same value as leveraging bagging, i.e., lambda = 6.
  • numberOfJobs (-j). Defines the number of threads used to train the ensemble, default is 1.
  • driftDetectionMethod (-x). The drift detector used per tree. Best results tend to be obtained with ADWINChangeDetector; the default deltaAdwin (ADWIN parameter) is 0.00001. Other drift detection methods, such as PageHinkley, DDM and EDDM, can easily be configured and used instead.
  • warningDetectionMethod (-p). Controls the warning detector: whenever it triggers, a background tree starts training alongside the tree where the warning was triggered. Whatever parameters are used here must be consistent with the driftDetectionMethod, i.e., the warning should always trigger before the drift to guarantee the overall algorithm’s expected behavior. The default is ADWINChangeDetector with deltaAdwin = 0.0001; notice that this is 10x greater than the default deltaAdwin for the driftDetectionMethod, so warnings are detected earlier.
  • disableWeightedVote (-w). If this is set, then predictions will be based on majority vote instead of weighted votes. The default is to use weighted votes, thus this is not set.
  • disableDriftDetection (-u). Controls whether drift detection is used. When set, there is no drift or warning detection and no background learners are created, independently of any other parameters. The default is to use drift detection, thus this is not set.
  • disableBackgroundLearner (-q). When set, the background learner is not used and the warning detector is ignored (there are no checks for warnings). The default is to use background learners, thus this is not set.
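
As an illustration of how these flags combine, the following learner configuration string (an untested sketch) spells out a 100-tree ensemble trained with 4 threads, keeping the default lambda and the default ADWIN deltas listed above:

meta.AdaptiveRandomForest -s 100 -j 4 -a 6 -x (ADWINChangeDetector -a 0.00001) -p (ADWINChangeDetector -a 0.0001)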

Comparing against the Adaptive Random Forest

A practical and simple way to compare against Adaptive Random Forest is to compare classification performance in terms of Accuracy, Kappa M and Kappa T, using either the traditional immediate setting, where the true label is presented right after the instance has been used for testing, or the delayed setting, where there is a delay between the time an instance is presented and the time its true label becomes available.

In this post we present some practical and simple ways to compare against ARF; for a thorough comparison against other state-of-the-art ensemble classifiers and a discussion of other aspects of ARF (e.g., the weighted vote), please refer to publication [1].

Evaluator task Adaptive Random Forest

In [1] the evaluator tasks used were EvaluatePrequential, EvaluatePrequentialCV, EvaluatePrequentialDelayed and EvaluatePrequentialDelayedCV. CV stands for Cross-Validation, and more details about it can be found in [3]. In this example we only use EvaluatePrequential, as the CV versions take longer to run; however, one can easily change from EvaluatePrequential to EvaluatePrequentialCV to obtain the CV results, or to EvaluatePrequentialDelayed (and EvaluatePrequentialDelayedCV) to obtain delayed labeling results.
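
As an illustrative (untested) sketch, swapping the task name in the commands of the next section is all that is needed; the CV and delayed tasks accept the same -l and -s options (plus extra options not shown here):

EvaluatePrequentialCV -l (meta.AdaptiveRandomForest) -s (ArffFileStream -f airlines.arff)
EvaluatePrequentialDelayed -l (meta.AdaptiveRandomForest) -s (ArffFileStream -f airlines.arff)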

ARF fast and moderate

In the original paper there are 2 main configurations that should be useful to compare against future algorithms, identified as ARF_{moderate} and ARF_{fast}. Basically, ARF_{moderate} is the standard configuration (described by the default parameter values above), while ARF_{fast} increases the deltaAdwin of the default ADWINChangeDetector for both warning (-p) and drift (-x) detection, causing more (and earlier) detections.

To test AdaptiveRandomForest in either the delayed or the immediate setting, execute the latest MOA jar.
You can copy and paste the following commands in the interface (right click the configuration text edit and select “Enter configuration”).

Airlines dataset, ARF_{moderate} (default configuration) with 10 trees.
EvaluatePrequential -l (meta.AdaptiveRandomForest) -s (ArffFileStream -f airlines.arff) -e BasicClassificationPerformanceEvaluator -f 54000

Airlines dataset, ARF_{fast} (detects drifts earlier, at the cost of potentially more false positives) with 10 trees.
EvaluatePrequential -l (meta.AdaptiveRandomForest -x (ADWINChangeDetector -a 0.001) -p (ADWINChangeDetector -a 0.01)) -s (ArffFileStream -f airlines.arff) -e BasicClassificationPerformanceEvaluator -f 54000

Airlines dataset, LeveragingBagging with 10 trees
EvaluatePrequential -l meta.LeveragingBag -s (ArffFileStream -f airlines.arff) -e BasicClassificationPerformanceEvaluator -f 54000

Explanation: All these commands execute EvaluatePrequential with the BasicClassificationPerformanceEvaluator (which makes it equivalent to InterleavedTestThenTrain) on the airlines dataset (-f airlines.arff). The three commands use 10 trees (the ensemble size -s 10 is the default in all three learners). ARF also uses m = sqrt(total features) + 1, and the difference between ARF_{moderate} and ARF_{fast} lies in the detector parameters. Make sure to extract the airlines.arff dataset, and to set -f to its location, before executing the commands. The airlines and other datasets are available here: https://github.com/hmgomes/AdaptiveRandomForest.
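
The same configuration strings can also be run without the GUI, through MOA’s command-line task runner. A sketch, assuming moa.jar and airlines.arff are in the current directory:

java -cp moa.jar moa.DoTask "EvaluatePrequential -l (meta.AdaptiveRandomForest) -s (ArffFileStream -f airlines.arff) -e BasicClassificationPerformanceEvaluator -f 54000"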

[1] Heitor Murilo Gomes, Albert Bifet, Jesse Read, Jean Paul Barddal, Fabricio Enembreck, Bernhard Pfahringer, Geoff Holmes, Talel Abdessalem. Adaptive random forests for evolving data stream classification. Machine Learning, DOI: 10.1007/s10994-017-5642-8, Springer, 2017.

This can be downloaded from Springer: https://rdcu.be/tsdi
The pre-print is also available here: Adaptive Random Forest pre-print

[2] Breiman, Leo. Random forests. Machine Learning 45.1, 5-32, 2001.

[3] Albert Bifet, Gianmarco De Francisci Morales, Jesse Read, Geoff Holmes, Bernhard Pfahringer: Efficient Online Evaluation of Big Data Stream Classifiers. KDD 2015: 59-68