  • New Release of MOA 16.04

    We’ve made a new release of MOA 16.04.

    The new features of this release are:

    • BICO: BIRCH Meets Coresets for k-Means Clustering.
      • Hendrik Fichtenberger, Marc Gillé, Melanie Schmidt,
        Chris Schwiegelshohn, Christian Sohler: ESA 2013: 481-492 (2013)
    • Updates:
      • MultiLabel and MultiTarget methods

    The following important changes have been made since the MOA 15.11 release:

    • Use Examples instead of Instances to be able to deal easily with unstructured data
    • Use Apache Samoa instances instead of WEKA instances
    • Use the javacliparser library

    You can find the download link for this release on the MOA homepage:

    MOA Machine Learning for Data Streams


    The MOA Team

  • New Release of MOA 15.11

    We’ve made a new release of MOA 15.11.

    The new features of this release are:

    • iSOUPTree.
      • Aljaz Osojnik, Pance Panov, Saso Dzeroski: Multi-label Classification via Multi-target Regression on Data Streams. Discovery Science 2015: 170-185
    • SEEDChangeDetector.
      • David Tse Jung Huang, Yun Sing Koh, Gillian Dobbie, Russel Pears: Detecting Volatility Shift in Data Streams. ICDM 2014: 863-868
    • Paired Learners for Concept Drift, by Paulo Gonçalves
      • Stephen H. Bach, Marcus A. Maloof: Paired Learners for Concept Drift. ICDM 2008: 23-32
    • Updates:
      • MultiLabel and MultiTarget methods, FIMT-DD and ORTO

    This release includes the following important changes:

    • Use Examples instead of Instances to be able to deal easily with unstructured data
    • Use Apache Samoa instances instead of WEKA instances
    • Use the javacliparser library

    You can find the download link for this release on the MOA homepage:

    MOA Machine Learning for Data Streams


    The MOA Team

  • New Release of MOA 14.11

    We’ve made a new release of MOA 14.11.

    The new features of this release are:

    • Lazy kNN methods.
      • Albert Bifet, Bernhard Pfahringer, Jesse Read, Geoff Holmes: Efficient data stream classification via probabilistic adaptive windows. SAC 2013: 801-806
    • SGDMultiClass for multi-class SGD learning.
    • OnlineSmoothBoost
      • Shang-Tse Chen, Hsuan-Tien Lin, Chi-Jen Lu: An Online Boosting Algorithm with Theoretical Justifications. ICML 2012
    • ReplacingMissingValuesFilter: a filter to replace missing values by Manuel Martin Salvador.
    • HDDM Concept Drift detector
      • I. Frias-Blanco, J. del Campo-Avila, G. Ramos-Jimenez, R. Morales-Bueno, A. Ortiz-Diaz, and Y. Caballero-Mota, Online and non-parametric drift detection methods based on Hoeffding’s bound, IEEE Transactions on Knowledge and Data Engineering, 2014.
    • SeqDriftChangeDetector Concept Drift detector
      • Pears, R., Sakthithasan, S., & Koh, Y. (2014). Detecting concept change in dynamic data streams. Machine Learning, 97(3), 259-293.
    • Updates:
      • SGD, HoeffdingOptionTree, HAT, FIMTDD, Change Detectors, and DACC

    You can find the download link for this release on the MOA homepage:

    MOA Machine Learning for Streams


    The MOA Team

  • Using MOA from ADAMS workflow engine

    MOA and WEKA are powerful tools for data mining analysis tasks. In real applications and professional settings, data mining processes are usually complex and consist of several steps, which can be seen as a workflow. Instead of implementing a program in Java, a professional data miner can build the solution as a workflow, which is much easier for non-programmers to maintain.

    The Advanced Data mining And Machine learning System (ADAMS) is a novel, flexible workflow engine aimed at quickly building and maintaining real-world, complex knowledge workflows.

    The core of ADAMS is the workflow engine, which follows the philosophy of less is more. Instead of letting the user place operators (or actors in ADAMS terms) on a canvas and then manually connect inputs and outputs, ADAMS uses a tree-like structure. This structure and the control actors define how data flows through the workflow, with no explicit connections necessary. The tree-like structure stems from the internal object representation and the nesting of sub-actors within actor-handlers.

    This figure shows ADAMS Flow editor and the adams-moa-classifier-evaluation flow.

    For more information, take a look at the following tutorial: Tutorial 4.

  • Using MOA’s API with Scala

    As Scala runs in the Java Virtual Machine, it is very easy to use MOA objects from Scala.

    Let’s see an example: the Java code of the first example in Tutorial 2.

    In Scala, the same code will be as follows:

    As you can see, it is very easy to use MOA objects from Scala.

  • Using MOA with Scala and its Interactive Shell

    Scala is a powerful language that has functional programming capabilities. As it runs in the Java Virtual Machine, it is very easy to use MOA objects inside Scala.

    Let’s see an example, using the Scala Interactive Interpreter. First we need to start it, telling where the MOA library is:

    scala -cp moa.jar
    Welcome to Scala version 2.9.2.
    Type in expressions to have them evaluated.
    Type :help for more information.

    Let’s run a very simple experiment: using a decision tree (Hoeffding Tree) with data generated from an artificial stream generator (RandomRBFGenerator).

    We start by importing the classes that we need and defining the stream and the learner.

    scala> import moa.classifiers.trees.HoeffdingTree
    import moa.classifiers.trees.HoeffdingTree
    scala> import moa.streams.generators.RandomRBFGenerator
    import moa.streams.generators.RandomRBFGenerator
    scala> val learner = new HoeffdingTree();
    learner: moa.classifiers.trees.HoeffdingTree =
    Model type: moa.classifiers.trees.HoeffdingTree
    model training instances = 0
    model serialized size (bytes) = -1
    tree size (nodes) = 0
    tree size (leaves) = 0
    active learning leaves = 0
    tree depth = 0
    active leaf byte size estimate = 0
    inactive leaf byte size estimate = 0
    byte size estimate overhead = 0
    Model description:
    Model has not been trained.
    scala> val stream = new RandomRBFGenerator();
    stream: moa.streams.generators.RandomRBFGenerator =

    Now, we need to initialize the stream and the classifier:

    scala> stream.prepareForUse()
    scala> learner.setModelContext(stream.getHeader())
    scala> learner.prepareForUse()

    Now, let’s load an instance from the stream, and use it to train the decision tree:

    scala> import
    scala> val instance = stream.nextInstance().getData()
    instance: = 0.210372,1.009586,0.0919,0.272071,
    scala> learner.trainOnInstance(instance)

    And finally, let’s use it to do a prediction.

    scala> learner.getVotesForInstance(instance)
    res9: Array[Double] = Array(0.0, 0.0)
    scala> learner.correctlyClassifies(instance)
    res7: Boolean = false

    As shown in this example, it is very easy to use the Scala interpreter to run MOA interactively.

  • OpenML: exploring machine learning better, together.

    Now you can use MOA classifiers inside OpenML. OpenML is a website where researchers can share their datasets, implementations and experiments in such a way that they can easily be found and reused by others.

    OpenML engenders a novel, collaborative approach to experimentation with important benefits. First, many questions about machine learning algorithms won’t require the laborious setup of new experiments: they can be answered on the fly by querying the combined results of thousands of studies on all available datasets. OpenML also keeps track of experimentation details, ensuring that we can easily reproduce experiments later on, and confidently build upon earlier work. Reusing experiments also allows us to run large-scale machine learning studies, yielding more generalizable results with less effort. Finally, beyond the traditional publication of algorithms in journals, often in a highly summarized form, OpenML allows researchers to share all code and results that are possibly of interest to others, which may boost their visibility, speed up further research and applications, and engender new collaborations.

  • SAMOA: Scalable Advanced Massive Online Analysis

    SAMOA is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. It is a project started at Yahoo Labs Barcelona.

    SAMOA enables the development of new ML algorithms without dealing with the complexity of the underlying stream processing engines (SPEs, such as Apache Storm and Apache S4). SAMOA users can code a distributed streaming ML algorithm once and execute it on multiple SPEs.

    To use MOA methods inside SAMOA take a look at

  • RMOA: Massive online data stream classifications with R & MOA

    For R users who work with a lot of data or encounter RAM issues when building models on large datasets, MOA, and data stream methods in general, have some nice features:
    1. It uses a limited amount of memory, so there are no RAM issues when building models.
    2. It processes one example at a time and looks at each example only once.
    3. It works incrementally, so a model is ready to be used for prediction at any point.

    Unfortunately, MOA is written in Java and is not easily accessible to R users. For users mostly interested in clustering, the stream package already facilitates this (this blog item gave an example of using ff alongside the stream package). In our day-to-day use cases, however, classification is a more common request, and the stream package only supports clustering. Hence the decision to make the classification algorithms of MOA easily available to R users as well. For this the RMOA package was created and is available on GitHub (

  • The streams Framework

    The streams framework is a Java implementation of a simple stream processing
    environment by Christian Bockermann and Hendrik Blom at TU Dortmund University. It aims at providing a clean and easy-to-use Java-based platform to process streaming data.

    The core module of the streams library is a thin API layer of interfaces and
    classes that reflect a high-level view of streaming processes. This API serves
    as a basis for implementing custom processors and providing services with the
    streams library.

    Figure 1: Components of the streams library.

    Figure 1 shows the components of the streams library. The binding glue element
    is a thin API layer that attaches to a runtime provided as a separate module or
    can be embedded into existing code.

    Process Design with JavaBeans

    The streams library promotes simple software design patterns such as JavaBean
    conventions and dependency injection to allow for a quick setup of streaming
    processes using simple XML files.

    As shown in Figure 2, the idea of the streams library is to provide a simple
    runtime environment that lets users define streaming processes in XML files,
    with a close relation to the implementing Java classes.

    Figure 2: XML process definitions mapped to a runtime environment, using
    stream-api components and other libraries.

    Based on the conventions and patterns used, components of the
    streams library are simple Java classes. Following the basic design
    patterns of the streams library allows for quickly adding custom
    classes to the streaming processes without much trouble.

  • New Release of MOA 14.04

    We’ve made a new release of MOA 14.04.

    The new features of this release are:

    • Change detection Tab
      • Albert Bifet, Jesse Read, Bernhard Pfahringer, Geoff Holmes, Indre Zliobaite: CD-MOA: Change Detection Framework for Massive Online Analysis. IDA 2013: 92-103
    • New Tutorial on Clustering by Frederic Stahl.
    • New version of Adaptive Model Rules for regression
      • Ezilda Almeida, Carlos Abreu Ferreira, João Gama: Adaptive Model Rules from Data Streams. ECML/PKDD (1) 2013: 480-492
    • AnyOut Outlier Detector
      • Ira Assent, Philipp Kranen, Corinna Baldauf, Thomas Seidl: AnyOut: Anytime Outlier Detection on Streaming Data. DASFAA (1) 2012: 228-242
    • ORTO Regression Tree with Options
      • Elena Ikonomovska, João Gama, Bernard Zenko, Saso Dzeroski: Speeding-Up Hoeffding-Based Regression Trees With Options. ICML 2011: 537-544
    • Online Accuracy Updated Ensemble
      • Dariusz Brzezinski, Jerzy Stefanowski: Combining block-based and online methods in learning ensembles from concept drifting data streams. Inf. Sci. 265: 50-67 (2014)
    • Anticipative and Dynamic Adaptation to Concept Changes Ensemble
      • Ghazal Jaber, Antoine Cornuéjols, Philippe Tarroux: A New On-Line Learning Method for Coping with Recurring Concepts: The ADACC System. ICONIP (2) 2013: 595-604

    You can find the download link for this release on the MOA homepage:

    MOA Machine Learning for Data Streams


    The MOA Team

  • New release of MOA 13.11

    We’ve made a new release of MOA 13.11.

    The new feature of this release is:

    • Temporal dependency evaluation
      • Albert Bifet, Jesse Read, Indre Zliobaite, Bernhard Pfahringer, Geoff Holmes: Pitfalls in Benchmarking Data Stream Classification and How to Avoid Them. ECML/PKDD (1) 2013: 465-479

    You can find the download link for this release on the MOA homepage:

    MOA Machine Learning for Data Streams


    The MOA Team

  • Temporal Dependency in Classification

    The paper “Pitfalls in benchmarking data stream classification and how to avoid them”, presented at ECML-PKDD 2013, showed that classifying data streams has an important temporal component, which is currently not considered in the evaluation of data-stream classifiers. A very simple classifier that exploits this temporal component, the no-change classifier that always predicts the last class it has seen, can outperform current state-of-the-art classifiers on some real-world datasets. MOA can now evaluate data streams considering this temporal component using:

    • NoChange classifier
    • TemporallyAugmentedClassifier classifier
    • new evaluation measure Kappa+ or Kappa Temp

    which provides a more accurate gauge of classifier performance.
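
    To make the idea concrete, here is a minimal sketch of how such a measure can be computed, assuming the definition from the paper cited above: Kappa Temporal = (p0 - pe) / (1 - pe), where p0 is the accuracy of the classifier under evaluation and pe is the accuracy of the no-change classifier on the same stream. The class and method names are hypothetical and are not MOA's API; the `firstGuess` argument is an assumption needed to give the no-change baseline a prediction for the very first instance.

    ```java
    // Hypothetical helper, not part of MOA's API: computes Kappa Temporal
    // from arrays of true and predicted class labels.
    final class KappaTemporalSketch {

        /** Fraction of positions where the prediction matches the true label. */
        static double accuracy(int[] actual, int[] predicted) {
            int correct = 0;
            for (int i = 0; i < actual.length; i++) {
                if (actual[i] == predicted[i]) {
                    correct++;
                }
            }
            return (double) correct / actual.length;
        }

        /**
         * No-change baseline: predict the last true label seen.
         * firstGuess is an assumed prediction for the very first instance.
         */
        static int[] noChangePredictions(int[] actual, int firstGuess) {
            int[] predicted = new int[actual.length];
            int lastSeen = firstGuess;
            for (int i = 0; i < actual.length; i++) {
                predicted[i] = lastSeen;
                lastSeen = actual[i];
            }
            return predicted;
        }

        /** (p0 - pe) / (1 - pe); not finite when the baseline is always right (pe = 1). */
        static double kappaTemporal(int[] actual, int[] predicted, int firstGuess) {
            double p0 = accuracy(actual, predicted);
            double pe = accuracy(actual, noChangePredictions(actual, firstGuess));
            return (p0 - pe) / (1.0 - pe);
        }

        public static void main(String[] args) {
            int[] actual = {0, 0, 0, 1, 1, 1, 0, 0};
            // A classifier that always predicts class 0.
            int[] alwaysZero = new int[actual.length];
            // Its accuracy is 0.625, but the no-change baseline reaches 0.75.
            System.out.println(kappaTemporal(actual, alwaysZero, 0)); // prints -0.5
        }
    }
    ```

    A negative value, as in this toy stream, means the classifier does worse than simply repeating the previous label, which plain accuracy alone would not reveal.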

  • New recommender algorithms and evaluation

    MOA has been extended in order to provide an interface to develop and visualize online recommender algorithms.

    This simple example shows the functionality of the EvaluateOnlineRecommender task in MOA.

    This task takes a rating predictor and a dataset (each training instance being a [user, item, rating] triplet) and evaluates how well the model predicts the ratings, given the user and item, as more and more instances are processed. This is similar to an online scenario of a recommender system, where new ratings from users to items arrive constantly, and the system has to make predictions of unrated items for the user in order to know which ones to recommend.

    Let’s start by opening the MOA user interface. In the Classification tab, click on Configure task, and select from the list the ‘class moa.tasks.EvaluateOnlineRecommender’.

    Now we need to select which dataset we want to process, so we click the corresponding button to edit that option.

    On the list, we can choose different publicly available datasets. For this example, we will be using the Movielens 1M dataset, which has to be downloaded first. Finally, we select the file where the input data is located.

    Once the dataset is configured, the next step is to choose which ratingPredictor to evaluate.

    For the moment, there are just two available: BaselinePredictor and BRISMFPredictor. The first is a very simple rating predictor, and the second is an implementation of a factorization algorithm described in Scalable Collaborative Filtering Approaches for Large Recommender Systems (Gábor Takács, István Pilászy, Bottyán Németh, and Domonkos Tikk). We choose the latter,

    and find the following parameters:

    • features – the number of features to be trained for each user and item
    • learning rate – the learning rate of the gradient descent algorithm
    • regularization ratio – the regularization ratio to be used in the Tikhonov regularization
    • iterations – the number of iterations to be used when retraining user and item features (online training).

    We can leave the default parameters for this dataset.
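
    To illustrate what these four parameters control, here is a minimal sketch of a regularized matrix factorization trained online by stochastic gradient descent. It is not MOA's BRISMFPredictor: the class name, constructor, and initialization scheme are assumptions made for illustration, and the biased variant described in the paper is left out.

    ```java
    import java.util.Random;

    // Hypothetical sketch of a regularized matrix factorization trained by
    // stochastic gradient descent. It is NOT MOA's BRISMFPredictor.
    final class OnlineFactorizationSketch {
        private final int features;          // latent features per user and item
        private final double learningRate;   // gradient descent step size
        private final double regularization; // Tikhonov (L2) penalty weight
        private final int iterations;        // gradient steps per incoming rating
        private final double[][] userFactors;
        private final double[][] itemFactors;

        OnlineFactorizationSketch(int users, int items, int features,
                                  double learningRate, double regularization,
                                  int iterations, long seed) {
            this.features = features;
            this.learningRate = learningRate;
            this.regularization = regularization;
            this.iterations = iterations;
            Random random = new Random(seed);
            this.userFactors = randomMatrix(users, features, random);
            this.itemFactors = randomMatrix(items, features, random);
        }

        // Small positive random values (an assumed initialization scheme).
        private static double[][] randomMatrix(int rows, int cols, Random random) {
            double[][] matrix = new double[rows][cols];
            for (double[] row : matrix) {
                for (int c = 0; c < cols; c++) {
                    row[c] = 0.1 * random.nextDouble();
                }
            }
            return matrix;
        }

        /** Predicted rating: dot product of the user and item feature vectors. */
        double predict(int user, int item) {
            double sum = 0.0;
            for (int f = 0; f < features; f++) {
                sum += userFactors[user][f] * itemFactors[item][f];
            }
            return sum;
        }

        /** Retrains the features of one user and one item on a new rating. */
        void train(int user, int item, double rating) {
            for (int it = 0; it < iterations; it++) {
                double error = rating - predict(user, item);
                for (int f = 0; f < features; f++) {
                    double pu = userFactors[user][f];
                    double qi = itemFactors[item][f];
                    userFactors[user][f] += learningRate * (error * qi - regularization * pu);
                    itemFactors[item][f] += learningRate * (error * pu - regularization * qi);
                }
            }
        }

        public static void main(String[] args) {
            OnlineFactorizationSketch model =
                    new OnlineFactorizationSketch(1, 1, 2, 0.05, 0.01, 50, 42L);
            System.out.println("before: " + model.predict(0, 0));
            model.train(0, 0, 4.0); // one [user, item, rating] triplet
            System.out.println("after:  " + model.predict(0, 0));
        }
    }
    ```

    In this sketch a larger learning rate moves the prediction towards the observed rating faster at the risk of overshooting, while the regularization term pulls the feature values back towards zero.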

    Going back to the configuration of the task,

    we have the sampleFrequency parameter, which defines how often the precision measures are taken, and finally the taskResultFile parameter, which allows us to save the output of the task to a file. We can leave the default values for both.

    Now the task is configured, and we only have to run it:

    As the task progresses, we can see in the preview box the RMSE of the predictor, computed over all instances from the first one up to the latest one processed so far.

    When the task finishes, we can see the final results: the RMSE of the predictor at each measured point.
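
    The kind of curve described above can be sketched as follows. This is a hypothetical helper, not MOA's evaluation code: it assumes each prediction was made before the model saw the true rating, accumulates the squared error from the first instance onwards, and records the RMSE once every `sampleFrequency` instances.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical helper, not MOA's evaluation code.
    final class RunningRmseSketch {

        /** Cumulative RMSE, sampled once every sampleFrequency instances. */
        static List<Double> rmseCurve(double[] predicted, double[] actual, int sampleFrequency) {
            List<Double> curve = new ArrayList<>();
            double sumSquaredError = 0.0;
            for (int i = 0; i < predicted.length; i++) {
                double error = predicted[i] - actual[i];
                sumSquaredError += error * error;
                int seen = i + 1; // instances processed so far
                if (seen % sampleFrequency == 0) {
                    curve.add(Math.sqrt(sumSquaredError / seen));
                }
            }
            return curve;
        }

        public static void main(String[] args) {
            double[] predicted = {3.0, 4.0, 2.0, 2.0};
            double[] actual = {5.0, 4.0, 2.0, 2.0};
            // Two measured points: about 1.414 after 2 instances, then 1.0 after 4.
            System.out.println(rmseCurve(predicted, actual, 2));
        }
    }
    ```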


  • New release of MOA 13.08

    We’ve made a new release of MOA 13.08.

    The new features of this release are:

    • new outlier detection tab
      • Dimitrios Georgiadis, Maria Kontaki, Anastasios Gounaris, Apostolos N. Papadopoulos, Kostas Tsichlas, Yannis Manolopoulos: Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms. SIGMOD Conference 2013: 1061-1064
    • new regression tab
    • FIMT-DD regression tree
      • Elena Ikonomovska, João Gama, Saso Dzeroski: Learning model trees from evolving data streams. Data Min. Knowl. Discov. 23(1): 128-168 (2011)
    • Adaptive Model Rules for regression
      • Ezilda Almeida, Carlos Abreu Ferreira, João Gama: Adaptive Model Rules from Data Streams. ECML/PKDD (1) 2013: 480-492
    • a recommender system based on BRISMFPredictor
      • Gábor Takács, István Pilászy, Bottyán Németh, Domonkos Tikk: Scalable Collaborative Filtering Approaches for Large Recommender Systems. Journal of Machine Learning Research 10: 623-656 (2009)
    • clustering updates

    You can find the download link for this release on the MOA homepage:

    MOA Machine Learning for Streams


    The MOA Team

  • Pre-release of MOA 13.08

    We are preparing a new release of MOA 13.08.

    The new release of MOA will contain the FIMT-DD regression tree, the Adaptive Model Rules, a recommender system based on BRISMFPredictor, some more clustering features, and a new outlier detection tab.

    You will find the source code at the repository:


    The MOA Team

  • ADAMS – a different take on workflows

    A fascinating new workflow for MOA and Weka is available. The Advanced
    Data mining And Machine learning System (ADAMS) is a novel, flexible
    workflow engine aimed at quickly building and maintaining real-world,
    complex knowledge workflows. It is written in Java and uses Maven as
    its build system. The framework was open-sourced in September 2012,
    released under GPLv3.

    The core of ADAMS is the workflow engine, which follows the philosophy
    of less is more. Instead of letting the user place operators (or
    actors in ADAMS terms) on a canvas and then manually connect inputs
    and outputs, ADAMS uses a tree-like structure. This structure and the
    control actors define how data flows through the workflow, with no
    explicit connections necessary. The tree-like structure stems from the
    internal object representation and the nesting of sub-actors within
    actor-handlers.

    The MOA team recommends ADAMS as the best workflow tool for MOA.

  • New release of MOA 12.08

    We’ve made a new release of MOA 12.08.

    The new features of this release are:

    • new rule classification methods: VFDR rules, from J. Gama, P. Kosina: Learning Decision Rules from Data Streams. IJCAI 2011
    • migrated to a proper Maven project
    • NaiveBayesMultinomial and SGD updated with adaptive DoubleVector for weights
    • new multilabel classifiers: Scalable and efficient multi-label classification for evolving data streams. Jesse Read, Albert Bifet, Geoff Holmes, Bernhard Pfahringer: Machine Learning 88(1-2): 243-272 (2012)
    • updated DDM with an option for the minimum number of instances needed to detect change

    You can find the download link for this release on the MOA homepage:


    The MOA Team