MOA (Massive On-line Analysis) is a framework for data stream mining. It includes tools for evaluation and a collection of machine learning algorithms. It is related to the WEKA project and is also written in Java, but it is designed to scale to more demanding problems. MOA aims to be a benchmark framework for running experiments in the data stream mining context by providing:
- storable settings for data streams (real and synthetic) for repeatable experiments
- a set of existing algorithms and measures from the literature for comparison, and
- an easily extendable framework for new streams, algorithms and evaluation methods.
Using MOA
The workflow in MOA follows a simple schema: first, a data stream (feed or generator) is chosen and configured; second, an algorithm (e.g. a classifier) is chosen and its parameters are set; third, the evaluation method or measure is chosen; finally, the results are obtained after running the task.
To run an experiment using MOA, the user can choose between a graphical user interface (GUI) and command line execution. New users should start by watching the demo video (see downloads) or by downloading the software and trying it on an example. Developers can extend all three parts of the above architecture (streams, algorithms, and evaluation methods) to include and test new methods.
MOA currently supports stream classification, stream clustering, outlier detection, change detection and concept drift, and recommender systems. We are working on extending MOA to support other mining tasks on data streams. If you have any suggestions, wishes, contributions or ideas, do not hesitate to contact us!
Stream Classification
- Data Sources or Streams: ARFF Reader, Random Tree Generator, SEA Concepts Generator, STAGGER Concepts Generator, Rotating Hyperplane, Random RBF Generator, LED Generator, Waveform Generator, and Function Generator.
- Classifiers: Naive Bayes, Hoeffding Tree, Hoeffding Option Tree, Hoeffding Adaptive Tree, Bagging, Boosting, Bagging using ADWIN, Leveraging Bagging, SGD, Perceptron, SPegasos.
- Evaluation procedures for Data Streams: Holdout, and Interleaved Test-Then-Train (also called Prequential).
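The interleaved test-then-train (prequential) procedure named in the list above can be illustrated with a minimal sketch. It does not use the MOA API: the class `PrequentialSketch`, the stand-in `MajorityClassLearner`, and the synthetic label stream are all hypothetical names introduced only for illustration. The point is the ordering: each arriving example is first used to test the current model and only then used to train it.

```java
import java.util.Random;

/** Minimal sketch of interleaved test-then-train (prequential) evaluation. Not MOA code. */
class PrequentialSketch {

    /** Hypothetical stand-in learner: predicts the majority class seen so far. */
    static class MajorityClassLearner {
        private final int[] counts;

        MajorityClassLearner(int numClasses) {
            counts = new int[numClasses];
        }

        /** Returns the most frequent class so far (class 0 before any training). */
        int predict() {
            int best = 0;
            for (int c = 1; c < counts.length; c++) {
                if (counts[c] > counts[best]) {
                    best = c;
                }
            }
            return best;
        }

        void train(int label) {
            counts[label]++;
        }
    }

    /** Tests on each label first, then trains on it; returns the fraction predicted correctly. */
    static double prequentialAccuracy(int[] labels, int numClasses) {
        MajorityClassLearner learner = new MajorityClassLearner(numClasses);
        int correct = 0;
        for (int label : labels) {
            if (learner.predict() == label) {
                correct++;            // test first, on a model that has not seen this example
            }
            learner.train(label);     // then train on the same example
        }
        return labels.length == 0 ? 0.0 : (double) correct / labels.length;
    }

    public static void main(String[] args) {
        // Synthetic, seeded label stream (an assumption for illustration): class 1 about 70% of the time.
        Random rng = new Random(42);
        int[] labels = new int[1000];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = rng.nextDouble() < 0.7 ? 1 : 0;
        }
        System.out.println("Prequential accuracy: " + prequentialAccuracy(labels, 2));
    }
}
```

Because every example is tested before it is learned, no separate holdout set is needed and the full stream contributes to the accuracy estimate.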
All details can be found here.
Stream Clustering
All details can be found here.
Outlier Detection
All details can be found here.
Recommender Systems
All details can be found here.
Extending MOA
Here we give a short example of how to extend MOA with a new learning algorithm. New methods are added to the framework via reflection at startup.
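As a rough illustration of what loading a method by reflection means in Java, the sketch below instantiates a class from its name at run time. It is not MOA's actual discovery code, which is not described here: the `Learner` interface, the `DemoLearner` class, and the `instantiate` helper are hypothetical names used only for this example.

```java
/** Minimal sketch of reflection-based loading. Hypothetical names, not MOA's mechanism. */
class ReflectionSketch {

    /** Hypothetical plug-in contract standing in for a framework interface. */
    interface Learner {
        String name();
    }

    /** A hypothetical learner that the framework code never references by type. */
    static class DemoLearner implements Learner {
        public DemoLearner() {
        }

        @Override
        public String name() {
            return "demo";
        }
    }

    /** Loads a class by name, checks that it is a Learner, and calls its no-argument constructor. */
    static Learner instantiate(String className) {
        try {
            Class<? extends Learner> cls = Class.forName(className).asSubclass(Learner.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load learner: " + className, e);
        }
    }

    public static void main(String[] args) {
        // Only the class name (a string) is needed to create the learner.
        Learner learner = instantiate(DemoLearner.class.getName());
        System.out.println("Loaded learner: " + learner.name());
    }
}
```

The practical consequence is that a new algorithm only has to be on the classpath and implement the expected interface; the framework code itself does not need to be edited.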
To add a new stream classifier algorithm, implement the Classifier.java interface with the following three main methods:
- void resetLearningImpl(): a method for initializing a classifier learner
- void trainOnInstanceImpl(Instance): a method to train on a new instance
- double[] getVotesForInstance(Instance): a method to obtain the prediction result
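To make the roles of the three methods concrete, here is a self-contained sketch of a nearest-mean classifier that uses the same method names. It does not implement MOA's Classifier interface: the nested `Instance` class is a hypothetical stand-in for MOA's instance type, and the vote formula is an arbitrary choice made for illustration.

```java
import java.util.Arrays;

/** Sketch of a stream classifier with the three main methods. Not tied to the MOA API. */
class NearestMeanClassifierSketch {

    /** Hypothetical stand-in for an instance: numeric features plus a class index. */
    static class Instance {
        final double[] features;
        final int classIndex;

        Instance(double[] features, int classIndex) {
            this.features = features;
            this.classIndex = classIndex;
        }
    }

    private final int numClasses;
    private final int numFeatures;
    private double[][] sums;  // per-class sums of each feature
    private int[] counts;     // per-class number of training instances

    NearestMeanClassifierSketch(int numClasses, int numFeatures) {
        this.numClasses = numClasses;
        this.numFeatures = numFeatures;
        resetLearningImpl();
    }

    /** Initializes (or clears) the learner's state. */
    void resetLearningImpl() {
        sums = new double[numClasses][numFeatures];
        counts = new int[numClasses];
    }

    /** Updates the running statistics of the instance's class; one pass, constant memory. */
    void trainOnInstanceImpl(Instance inst) {
        for (int j = 0; j < numFeatures; j++) {
            sums[inst.classIndex][j] += inst.features[j];
        }
        counts[inst.classIndex]++;
    }

    /** Returns one vote per class; classes whose mean is closer get a larger vote. */
    double[] getVotesForInstance(Instance inst) {
        double[] votes = new double[numClasses];
        for (int c = 0; c < numClasses; c++) {
            if (counts[c] == 0) {
                continue; // unseen classes keep a vote of zero
            }
            double squaredDistance = 0.0;
            for (int j = 0; j < numFeatures; j++) {
                double diff = inst.features[j] - sums[c][j] / counts[c];
                squaredDistance += diff * diff;
            }
            votes[c] = 1.0 / (1.0 + squaredDistance);
        }
        return votes;
    }

    public static void main(String[] args) {
        NearestMeanClassifierSketch clf = new NearestMeanClassifierSketch(2, 1);
        clf.trainOnInstanceImpl(new Instance(new double[]{0.0}, 0));
        clf.trainOnInstanceImpl(new Instance(new double[]{10.0}, 1));
        // The class index of a query instance is ignored by getVotesForInstance.
        double[] votes = clf.getVotesForInstance(new Instance(new double[]{1.0}, 0));
        System.out.println(Arrays.toString(votes));
    }
}
```

The predicted class is the index of the largest vote. The learner keeps only per-class sums and counts, so it never needs to store the stream.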
To add a new stream clustering algorithm, implement the Clusterer.java interface with the following three main methods:
- void resetLearningImpl(): a method for initializing a clusterer learner
- void trainOnInstanceImpl(Instance): a method to train on a new instance
- Clustering getClusteringResult(): a method to obtain the current clustering result for evaluation or visualization
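The same pattern can be sketched for clustering. The example below is a simple threshold-based ("leader") clusterer that uses the three method names; it does not implement MOA's Clusterer interface. The nested `Instance` and `Clustering` classes are hypothetical stand-ins for MOA's types, and the `radius` parameter is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a stream clusterer with the three main methods. Not tied to the MOA API. */
class LeaderClustererSketch {

    /** Hypothetical stand-in for an instance: numeric features only. */
    static class Instance {
        final double[] features;

        Instance(double[] features) {
            this.features = features;
        }
    }

    /** Hypothetical stand-in for a clustering result: a list of cluster centers. */
    static class Clustering {
        final List<double[]> centers;

        Clustering(List<double[]> centers) {
            this.centers = centers;
        }
    }

    private final double radius;   // maximum distance at which a point joins an existing cluster
    private List<double[]> centers;
    private List<Integer> sizes;

    LeaderClustererSketch(double radius) {
        this.radius = radius;
        resetLearningImpl();
    }

    /** Initializes (or clears) the clusterer's state. */
    void resetLearningImpl() {
        centers = new ArrayList<>();
        sizes = new ArrayList<>();
    }

    /** Assigns the instance to the nearest center within the radius, or opens a new cluster. */
    void trainOnInstanceImpl(Instance inst) {
        int nearest = -1;
        double best = Double.MAX_VALUE;
        for (int i = 0; i < centers.size(); i++) {
            double d = distance(centers.get(i), inst.features);
            if (d < best) {
                best = d;
                nearest = i;
            }
        }
        if (nearest >= 0 && best <= radius) {
            int n = sizes.get(nearest) + 1;
            double[] center = centers.get(nearest);
            for (int j = 0; j < center.length; j++) {
                center[j] += (inst.features[j] - center[j]) / n; // incremental mean update
            }
            sizes.set(nearest, n);
        } else {
            centers.add(inst.features.clone());
            sizes.add(1);
        }
    }

    /** Returns a copy of the current centers for evaluation or visualization. */
    Clustering getClusteringResult() {
        List<double[]> copy = new ArrayList<>();
        for (double[] center : centers) {
            copy.add(center.clone());
        }
        return new Clustering(copy);
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        LeaderClustererSketch clusterer = new LeaderClustererSketch(1.0);
        double[][] stream = {{0.0}, {0.5}, {10.0}};
        for (double[] point : stream) {
            clusterer.trainOnInstanceImpl(new Instance(point));
        }
        System.out.println("Clusters found: " + clusterer.getClusteringResult().centers.size());
    }
}
```

Returning a copy from getClusteringResult() lets an evaluator or visualizer inspect a snapshot while training continues on the stream.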
Bi-directional interaction of MOA with WEKA
WEKA classifiers can easily be used from MOA, and MOA classifiers and streams can be used from WEKA.