IBLStreams (Instance Based Learner on Streams for Regression and Classification)

IBLStreams (Instance Based Learner on Streams) is an instance-based learning algorithm for classification and regression problems on data streams, developed by Ammar Shaker, Eyke Hüllermeier and Jürgen Beringer. The method is able to handle large streams with low requirements in terms of memory and computational power. Moreover, it provides mechanisms for adapting to concept drift and concept shift.

In instance-based learning, a prediction for the query instance is obtained by combining, in one way or another, the outputs of the neighbors of this instance in the training data. The type of aggregation depends on the type of problem to be solved. We offer four different prediction schemes, namely the WeightedMode for classification, the WeightedMedian for ordinal classification, and the WeightedMean and LocalLinearRegression for regression problems.

Regression

In regression, the target attribute is numerical, and loss is typically measured in terms of the absolute or squared difference between predicted and true output. Corresponding prediction problems can be solved in two ways. First, the target value can be estimated by the weighted mean of the target values of the k neighbor instances; this prediction is obtained by using the option “-s WMeanReg”, which sets the PredictionStrategy parameter to WeightedMean(Regression). Second, a prediction can be derived by means of locally weighted linear regression. In this case, a (local) linear regression model is fitted to the k nearest neighbors, and this model is used to make a prediction for the query instance. For this approach, the PredictionStrategy parameter must be set to LocalLinearRegression by using “-s LocLinReg”.
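The two regression strategies can be illustrated with a minimal Python sketch. This is not the IBLStreams implementation: the function names are hypothetical, the neighbor weights are assumed to be given (the weighting scheme is not specified here), and the local linear model is restricted to a single numerical input attribute to keep the example short.

```python
def weighted_mean(targets, weights):
    """Weighted mean of the neighbors' target values (illustrative helper)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * y for w, y in zip(weights, targets)) / total


def local_linear_prediction(xs, ys, weights, query):
    """Fit y = a + b*x to the neighbors by weighted least squares
    and evaluate the fitted line at the query point.

    Assumption: one input attribute only; xs, ys and weights describe
    the k nearest neighbors of the query.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        # All neighbors share the same x value: no slope can be
        # estimated, so fall back to the weighted mean.
        return y_bar
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return y_bar + slope * (query - x_bar)
```

For neighbors lying on the line y = 2x, the weighted mean returns a value inside the range of observed targets, whereas the local linear model can follow the trend to a query outside the neighborhood.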

Classification

In conventional classification, the target attribute has a nominal scale, i.e., the set of classes is simply a finite set. In ordinal (aka ordered) classification, the set of classes is finite, too, but equipped with a total order relation; that is, the class labels can be put in a natural order (e.g., hotel categories *, **, ***, ****, *****). Ordered classification can be enabled by using the option “-s WMedianOClass”; in this case, the WeightedMedian prediction is used, which is suitable for minimizing the absolute error loss function (predicting the i-th class although the j-th class is correct yields an error of abs(i-j)). Omitting this option is equivalent to using the default value “-s WModeClass”, in which case the WeightedMode is returned; this prediction is a proper risk minimizer for the standard 0/1 loss (i.e., the loss is 0 if the predicted class is correct and 1 otherwise).
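The difference between the two classification strategies can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions, not the IBLStreams code: the function names are hypothetical, the neighbor weights are taken as given, and ordinal classes are encoded as integer ranks (1 for the lowest class, 2 for the next, and so on).

```python
from collections import defaultdict


def weighted_mode(labels, weights):
    """Return the class label with the largest total neighbor weight."""
    if not labels:
        raise ValueError("no neighbors given")
    scores = defaultdict(float)
    for label, w in zip(labels, weights):
        scores[label] += w
    return max(scores, key=scores.get)


def weighted_median(ranks, weights):
    """Return the smallest class rank whose cumulative weight reaches
    half of the total weight.

    Assumption: ranks are integers reflecting the order of the classes.
    """
    total = sum(weights)
    cumulative = 0.0
    for rank, w in sorted(zip(ranks, weights)):
        cumulative += w
        if cumulative >= total / 2:
            return rank
    raise ValueError("no neighbors given")


# Three equally weighted neighbors with hotel categories 1, 5 and 5 stars.
stars = [1, 5, 5]
unit_weights = [1.0, 1.0, 1.0]
mode_prediction = weighted_mode(stars, unit_weights)
median_prediction = weighted_median(stars, unit_weights)
```

The weighted mode ignores the order of the classes and only counts weight per label, while the weighted median uses the order, which is what makes it suitable for the absolute error abs(i-j).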

Website