The first example instructs MOA to train the HoeffdingTree
classifier and create a model:
java -cp moa.jar -javaagent:sizeofag.jar moa.DoTask \
  LearnModel -l trees.HoeffdingTree \
  -s generators.WaveformGenerator -m 1000000 -O model1.moa
The moa.DoTask class is the main class
for running tasks on the command line. It will accept the name of a task followed by
any appropriate parameters. The first task used is the LearnModel task. The
parameters used are the following:
- The -l parameter specifies the learner, in this case the HoeffdingTree class.
- The -s parameter specifies the stream to learn from, in this case generators.WaveformGenerator, a data stream generator that produces a three-class learning problem of identifying three types of waveform.
- The -m option specifies the maximum number of examples to train the learner with, in this case one million examples.
- The -O option specifies a file to output the resulting model to.
This will create a file named model1.moa that contains the Hoeffding tree
model that was induced during training.
The next example will evaluate the model to see how accurate it is on
a set of examples that are generated using a different random seed. The
EvaluateModel task is given the parameters needed to load the model produced
in the previous step, generate a new waveform stream with a random
seed of 2, and test on one million examples:
java -cp moa.jar -javaagent:sizeofag.jar moa.DoTask \
  "EvaluateModel -m file:model1.moa \
  -s (generators.WaveformGenerator -i 2) -i 1000000"
This is the first example of nesting parameters using parentheses. Quotes
have been added around the description of the task, otherwise the command
shell may misinterpret the parentheses.
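To illustrate what the quotes achieve, the following sketch uses Python's standard shlex module, which splits a string according to POSIX shell quoting rules. It only tokenizes the command line and does not run MOA; it shows that the quoted task description, nested parentheses included, arrives as a single argument. It does not model how a shell would treat unquoted parentheses.

```python
import shlex

command = (
    'java -cp moa.jar -javaagent:sizeofag.jar moa.DoTask '
    '"EvaluateModel -m file:model1.moa '
    '-s (generators.WaveformGenerator -i 2) -i 1000000"'
)

# shlex.split applies POSIX shell quoting rules: everything inside the
# double quotes stays together as one argument.
argv = shlex.split(command)

for position, argument in enumerate(argv):
    print(position, argument)

# The whole task description is the last argument passed to moa.DoTask.
task = argv[-1]
```

The same list form can be useful when launching MOA from a script, since passing arguments as a list avoids shell interpretation of the parentheses altogether.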
After evaluation the following statistics are output:
classified instances = 1,000,000
classifications correct (percent) = 84.474
Kappa Statistic (percent) = 76.711
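The relationship between the accuracy and the Kappa statistic can be checked with a short calculation. The sketch below uses the standard definition of Cohen's kappa, which rescales accuracy by the agreement expected from chance. The chance agreement of 1/3 is an assumption (three equally likely classes), not a value reported by MOA; under that assumption the formula reproduces the reported 76.711.

```python
def kappa(p_observed: float, p_chance: float) -> float:
    """Cohen's kappa: agreement beyond chance, rescaled so that 1.0 is perfect."""
    return (p_observed - p_chance) / (1.0 - p_chance)


# "classifications correct (percent)" taken from the statistics above.
accuracy = 84.474 / 100.0

# ASSUMPTION: three balanced classes, so chance agreement is 1/3.
p_chance = 1.0 / 3.0

print(f"Kappa Statistic (percent) = {100.0 * kappa(accuracy, p_chance):.3f}")
```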
Note that the two steps above can be rolled into one,
avoiding the need to create an external file, as follows:
java -cp moa.jar -javaagent:sizeofag.jar moa.DoTask \
  "EvaluateModel -m (LearnModel -l trees.HoeffdingTree \
  -s generators.WaveformGenerator -m 1000000) \
  -s (generators.WaveformGenerator -i 2) -i 1000000"
The task EvaluatePeriodicHeldOutTest will train a model while taking
snapshots of performance using a held-out test set at periodic intervals.
The following command creates a comma-separated values file, training the
HoeffdingTree classifier on the WaveformGenerator data, using
the first 100 thousand examples for testing, training on a total of 10 million
examples, and testing every one million examples:
java -cp moa.jar -javaagent:sizeofag.jar moa.DoTask \
  "EvaluatePeriodicHeldOutTest -l trees.HoeffdingTree \
  -s generators.WaveformGenerator \
  -n 100000 -i 10000000 -f 1000000" > dsresult.csv
For the purposes of comparison, a bagging learner using ten decision trees can be trained on
the same problem:
java -cp moa.jar -javaagent:sizeofag.jar moa.DoTask \
  "EvaluatePeriodicHeldOutTest -l (OzaBag -l trees.HoeffdingTree -s 10) \
  -s generators.WaveformGenerator \
  -n 100000 -i 10000000 -f 1000000" > htresult.csv
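The periodic held-out procedure used by these commands can be sketched in a few lines of Python. This is a minimal illustration and not MOA's implementation: make_stream and TableLearner are hypothetical stand-ins for a stream generator and an incremental classifier, and the sizes are scaled down so the sketch runs quickly.

```python
import random
from collections import Counter


def make_stream(seed):
    """Hypothetical stand-in for a stream generator: endless (feature, label) pairs."""
    rng = random.Random(seed)
    while True:
        label = rng.randrange(3)
        # The feature is a noisy copy of the label.
        feature = label if rng.random() < 0.8 else rng.randrange(3)
        yield feature, label


class TableLearner:
    """Toy incremental learner: predicts the label seen most often with a feature."""

    def __init__(self):
        self.counts = {}

    def predict(self, feature):
        seen = self.counts.get(feature)
        return seen.most_common(1)[0][0] if seen else 0

    def train(self, feature, label):
        self.counts.setdefault(feature, Counter())[label] += 1


def periodic_held_out(stream, learner, test_size, train_size, frequency):
    # The first test_size examples are set aside and never used for training.
    test_set = [next(stream) for _ in range(test_size)]
    rows = []
    for trained in range(1, train_size + 1):
        learner.train(*next(stream))
        if trained % frequency == 0:
            # Snapshot: accuracy of the current model on the fixed test set.
            correct = sum(learner.predict(x) == y for x, y in test_set)
            rows.append((trained, 100.0 * correct / test_size))
    return rows


rows = periodic_held_out(make_stream(1), TableLearner(),
                         test_size=1000, train_size=10000, frequency=1000)
print("examples trained,accuracy (percent)")
for trained, accuracy in rows:
    print(f"{trained},{accuracy:.2f}")
```

The three keyword arguments play the roles of the -n, -i and -f options in the commands above.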
Another evaluation method implemented in MOA is Interleaved Test-Then-Train or Prequential:
each individual example is used to test the model
before it is used for training, and from this the accuracy is incrementally
updated. When intentionally performed in this order, the model is always
being tested on examples it has not seen. This scheme has the advantage that
no holdout set is needed for testing, making maximum use of the available
data. It also ensures a smooth plot of accuracy over time, as each individual
example will become increasingly less significant to the overall average.
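The test-then-train order described above can be sketched as follows. This is a minimal illustration and not MOA's implementation: make_stream and TableLearner are hypothetical stand-ins for a data stream and an incremental classifier, and the stream is kept short so the sketch runs quickly.

```python
import random
from collections import Counter


def make_stream(seed, length):
    """Hypothetical stand-in for a data stream: (feature, label) pairs."""
    rng = random.Random(seed)
    for _ in range(length):
        label = rng.randrange(3)
        # The feature is a noisy copy of the label.
        feature = label if rng.random() < 0.8 else rng.randrange(3)
        yield feature, label


class TableLearner:
    """Toy incremental learner: predicts the label seen most often with a feature."""

    def __init__(self):
        self.counts = {}

    def predict(self, feature):
        seen = self.counts.get(feature)
        return seen.most_common(1)[0][0] if seen else 0

    def train(self, feature, label):
        self.counts.setdefault(feature, Counter())[label] += 1


def prequential(stream, learner, frequency):
    correct = 0
    snapshots = []
    for seen, (feature, label) in enumerate(stream, start=1):
        # Test first: the learner has not yet trained on this example.
        if learner.predict(feature) == label:
            correct += 1
        # Then train on the same example.
        learner.train(feature, label)
        if seen % frequency == 0:
            # Accuracy is the running average over all examples seen so far.
            snapshots.append((seen, 100.0 * correct / seen))
    return snapshots


snapshots = prequential(make_stream(1, 10000), TableLearner(), frequency=1000)
for seen, accuracy in snapshots:
    print(f"{seen},{accuracy:.2f}")
```

Because the accuracy is divided by the total number of examples seen, each new example moves the average less than the one before it, which is why the resulting curve is smooth.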
The following EvaluateInterleavedTestThenTrain task
creates a comma-separated values file, training the
HoeffdingTree classifier on the WaveformGenerator data for a total of 10 million
examples and reporting results every one million examples:
java -cp moa.jar -javaagent:sizeofag.jar moa.DoTask \
  "EvaluateInterleavedTestThenTrain -l trees.HoeffdingTree \
  -s generators.WaveformGenerator \
  -i 10000000 -f 1000000" > htresult.csv