Streams

The classes available to generate streams are the following:

ArffFileStream: A stream read from an ARFF file

Parameters:
-f : ARFF file to load
-c : Class index of the data: 0 for none, or -1 for the last attribute in the file
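To make the -c convention concrete, here is a minimal Python sketch (not MOA's Java API) that reads the @data section of a simple dense ARFF file one row at a time and splits off the class attribute. It assumes that positive -c values are 1-based attribute positions, and it ignores quoting, sparse rows, and attribute types; read_arff and resolve_class_index are hypothetical names.

```python
import io
from typing import Iterator, List, Optional, Tuple


def resolve_class_index(option: int, num_attributes: int) -> Optional[int]:
    """Map a -c style option to a 0-based attribute index (assumed semantics)."""
    if option == 0:
        return None                      # 0: no class attribute
    if option < 0:
        return num_attributes - 1        # -1: last attribute in the file
    return option - 1                    # assumption: positive values are 1-based


def read_arff(text_stream, class_option: int = -1) -> Iterator[Tuple[List[str], Optional[str]]]:
    """Lazily yield (attribute_values, class_value) pairs from a dense ARFF file."""
    attributes: List[str] = []
    in_data = False
    class_idx: Optional[int] = None
    for raw in text_stream:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue                     # skip blank lines and comments
        lower = line.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                attributes.append(line.split()[1])
            elif lower.startswith("@data"):
                in_data = True
                class_idx = resolve_class_index(class_option, len(attributes))
            continue
        values = [v.strip() for v in line.split(",")]
        if class_idx is None:
            yield values, None
        else:
            yield values[:class_idx] + values[class_idx + 1:], values[class_idx]


if __name__ == "__main__":
    sample = "@relation demo\n@attribute x numeric\n@attribute y {a,b}\n@data\n1.5,a\n"
    for values, label in read_arff(io.StringIO(sample), class_option=-1):
        print(values, label)
```

Reading row by row keeps memory constant, which is the point of treating a file as a stream.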


ConceptDriftStream: Generator that adds concept drift to examples in a stream 

Example:

ConceptDriftStream -s (generators.AgrawalGenerator -f 7)
  -d (generators.AgrawalGenerator -f 2) -w 1000000 -p 900000

Parameters:
-s : Stream
-d : Concept drift Stream
-p : Central position of concept drift change
-w : Width of concept drift change
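The -p and -w parameters can be read as describing a smooth transition between the two streams. A minimal Python sketch, assuming that instance t is drawn from the drift stream (-d) with probability 1 / (1 + e^(-4(t - p)/w)) and from -s otherwise; under that assumption the probability is 0.5 at t = p and rises from about 12% to about 88% across a window of width w centred on p. The function names are hypothetical.

```python
import math
import random
from typing import Iterator, TypeVar

T = TypeVar("T")


def drift_probability(t: int, p: int, w: int) -> float:
    """Assumed sigmoid probability that instance t comes from the drift stream."""
    x = -4.0 * (t - p) / w
    # Numerically stable form of 1 / (1 + e^x): avoid exp() overflow for large |x|.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def concept_drift_stream(s: Iterator[T], d: Iterator[T], p: int, w: int,
                         seed: int = 1) -> Iterator[T]:
    """Mix two streams, shifting from s to d around position p."""
    rng = random.Random(seed)
    t = 0
    while True:
        t += 1
        source = d if rng.random() < drift_probability(t, p, w) else s
        try:
            yield next(source)
        except StopIteration:
            return                       # stop when the chosen source is exhausted
```

With the example above (-p 900000 -w 1000000), this model draws roughly 12% of instances from function 2 at position 400000 and roughly 88% at position 1400000.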



ConceptDriftRealStream: Generator that adds concept drift to examples in a stream with different classes and attributes, for example real datasets.

Example:

ConceptDriftRealStream -s (ArffFileStream -f covtype.arff) \
   -d (ConceptDriftRealStream -s (ArffFileStream -f PokerOrig.arff) \
   -d (ArffFileStream -f elec.arff) -w 5000 -p 1000000 ) -w 5000 -p 581012

Parameters:
-s : Stream
-d : Concept drift Stream
-p : Central position of concept drift change
-w : Width of concept drift change



FilteredStream: A stream that is filtered.

Parameters:
-s : Stream to filter
-f : Filters to apply (for example, AddNoiseFilter)



AddNoiseFilter: Adds random noise to examples in a stream. Only for use with FilteredStream.

Parameters:
-r : Seed for random noise
-a : The fraction of attribute values to disturb
-c : The fraction of class labels to disturb
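A FilteredStream passes each instance through its filters before emitting it. The Python sketch below illustrates that wrapping for nominal data only: each attribute value is replaced by a random value from its domain with probability -a, and the class label with probability -c. This is an assumed noise model, not MOA's code; how MOA perturbs numeric attributes is not covered, and add_noise_filter is a hypothetical name.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Instance = Tuple[List[str], str]  # (nominal attribute values, class label)


def add_noise_filter(stream: Iterator[Instance],
                     attribute_domains: Sequence[Sequence[str]],
                     class_domain: Sequence[str],
                     att_fraction: float, class_fraction: float,
                     seed: int = 1) -> Iterator[Instance]:
    """Wrap a stream, replacing values at random with the given fractions."""
    rng = random.Random(seed)
    for values, label in stream:
        noisy = list(values)
        for i, domain in enumerate(attribute_domains):
            if rng.random() < att_fraction:
                # The replacement may coincide with the original value.
                noisy[i] = rng.choice(domain)
        if rng.random() < class_fraction:
            label = rng.choice(class_domain)
        yield noisy, label
```

Because a random replacement can equal the original value, the fraction of values that actually change is somewhat lower than -a or -c in this model.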


generators.AgrawalGenerator: Generates one of ten different pre-defined loan functions 

Generator described in paper:

  Rakesh Agrawal, Tomasz Imielinski, and Arun Swami,
  “Database Mining: A Performance Perspective”,
  IEEE Transactions on Knowledge and Data Engineering,
  5(6), December 1993.

Public C source code available at:
  http://www.almaden.ibm.com/cs/projects/iis/hdb/Projects/data_mining/datasets/syndata.html

Notes:
The built-in functions are based on the paper (page 924) and correspond to functions pred20 through pred29 in the public C implementation.
The perturbation function follows the C implementation rather than the description in the paper.

Parameters:
-f : Classification function used, as defined in the original paper.
-i : Seed for random generation of instances.
-p : The amount of perturbation (noise) introduced to numeric values
-b : Balance the number of instances of each class.
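A minimal Python sketch of two of the ten functions, restricted to the age and salary attributes (the full generator produces more attributes). Functions 1 and 2 follow the paper's definitions; everything else is an assumption: age is drawn uniformly from [20, 80] and salary from [20000, 150000], label 0 means group A, the label is computed before perturbation, and perturbation shifts a value by up to +/- p times its range and then clips it. All names are hypothetical.

```python
import random

AGE_RANGE = (20.0, 80.0)             # assumed attribute ranges
SALARY_RANGE = (20000.0, 150000.0)


def function1(age: float, salary: float) -> int:
    """Group A (0) if age < 40 or age >= 60, else group B (1)."""
    return 0 if age < 40 or age >= 60 else 1


def function2(age: float, salary: float) -> int:
    """Group A (0) if salary lies in an age-dependent band, else group B (1)."""
    if age < 40:
        return 0 if 50000 <= salary <= 100000 else 1
    if age < 60:
        return 0 if 75000 <= salary <= 125000 else 1
    return 0 if 25000 <= salary <= 75000 else 1


def perturb(value: float, lo: float, hi: float, fraction: float,
            rng: random.Random) -> float:
    """Shift by up to +/- fraction * (hi - lo), then clip to [lo, hi]."""
    value += (hi - lo) * (2.0 * rng.random() - 1.0) * fraction
    return min(max(value, lo), hi)


def generate(n: int, function=function1, perturbation: float = 0.05, seed: int = 1):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(*AGE_RANGE)
        salary = rng.uniform(*SALARY_RANGE)
        label = function(age, salary)            # label from the clean values
        age = perturb(age, *AGE_RANGE, perturbation, rng)
        salary = perturb(salary, *SALARY_RANGE, perturbation, rng)
        rows.append(((age, salary), label))
    return rows
```

Since the label is fixed before perturbation in this sketch, a non-zero -p makes some instances near the decision boundaries appear mislabelled, which is how perturbation acts as noise.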



generators.HyperplaneGenerator: Generates a problem of predicting class of a rotating hyperplane.

Parameters:
-i : Seed for random generation of instances.
-c : The number of classes to generate
-a : The number of attributes to generate.
-k : The number of attributes with drift
-t : Magnitude of the change for every example
-n : Percentage of noise to add to the data.
-s : Probability (as a percentage) that the direction of change is reversed
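In a rotating-hyperplane problem, an instance x in [0, 1]^d is labelled by which side of the hyperplane sum(w_i x_i) = 0.5 sum(w_i) it falls on, and drift moves the weights over time. A minimal Python sketch, assuming two classes, fractions rather than percentages for -n and -s, and that each of the first -k weights moves by -t per example in a direction that reverses with probability -s. RotatingHyperplane is a hypothetical name.

```python
import random


class RotatingHyperplane:
    def __init__(self, num_atts=10, num_drift_atts=2, mag_change=0.001,
                 noise=0.05, sigma=0.1, seed=1):
        self.rng = random.Random(seed)
        self.weights = [self.rng.random() for _ in range(num_atts)]
        # +1/-1 direction of change for each drifting weight (the first k).
        self.directions = [1 if self.rng.random() < 0.5 else -1
                           for _ in range(num_drift_atts)]
        self.mag_change = mag_change   # -t
        self.noise = noise             # -n, as a fraction
        self.sigma = sigma             # -s, as a fraction

    def next_instance(self):
        x = [self.rng.random() for _ in self.weights]
        total = sum(w * v for w, v in zip(self.weights, x))
        label = 1 if total >= 0.5 * sum(self.weights) else 0
        if self.rng.random() < self.noise:
            label = 1 - label          # class noise: flip the label
        # Drift: nudge each drifting weight, occasionally reversing direction.
        for i, direction in enumerate(self.directions):
            self.weights[i] += direction * self.mag_change
            if self.rng.random() < self.sigma:
                self.directions[i] = -direction
        return x, label
```

Comparing against half the weight sum keeps the two classes roughly balanced no matter how the weights drift.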



generators.LEDGenerator: Generates a problem of predicting the digit displayed on a 7-segment LED display.

Parameters:
-i : Seed for random generation of instances.
-n : Percentage of noise to add to the data
-s : Reduce the data to only contain 7 relevant binary attributes
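A minimal Python sketch of how an instance can be produced: pick a digit, light its segments, flip each segment with probability -n (here a fraction), and, unless -s is set, append irrelevant random binary attributes. The segment order and the count of 17 irrelevant attributes (24 in total) are assumptions of this sketch; led_instance is a hypothetical name.

```python
import random

# Assumed segment order: top, upper-left, upper-right, middle,
# lower-left, lower-right, bottom.
SEGMENTS = [
    [1, 1, 1, 0, 1, 1, 1],  # 0
    [0, 0, 1, 0, 0, 1, 0],  # 1
    [1, 0, 1, 1, 1, 0, 1],  # 2
    [1, 0, 1, 1, 0, 1, 1],  # 3
    [0, 1, 1, 1, 0, 1, 0],  # 4
    [1, 1, 0, 1, 0, 1, 1],  # 5
    [1, 1, 0, 1, 1, 1, 1],  # 6
    [1, 0, 1, 0, 0, 1, 0],  # 7
    [1, 1, 1, 1, 1, 1, 1],  # 8
    [1, 1, 1, 1, 0, 1, 1],  # 9
]
NUM_IRRELEVANT = 17  # assumption: 7 relevant + 17 irrelevant = 24 attributes


def led_instance(rng: random.Random, noise: float = 0.1, reduce: bool = False):
    digit = rng.randrange(10)
    # Each relevant segment is flipped independently with probability `noise`.
    atts = [bit ^ 1 if rng.random() < noise else bit for bit in SEGMENTS[digit]]
    if not reduce:
        atts += [rng.randrange(2) for _ in range(NUM_IRRELEVANT)]
    return atts, digit
```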


generators.LEDGeneratorDrift: Generates a problem of predicting the digit displayed on a 7-segment LED display with drift. 

Parameters:
-i : Seed for random generation of instances.
-n : Percentage of noise to add to the data
-s : Reduce the data to only contain 7 relevant binary attributes
-d : Number of attributes with drift



generators.RandomRBFGenerator: Generates a random radial basis function stream.

Parameters:
-r : Seed for random generation of model
-i : Seed for random generation of instances
-c : The number of classes to generate
-a : The number of attributes to generate
-n : The number of centroids in the model
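The two seeds separate the model from the data. A minimal Python sketch of the idea: the model seed (-r) fixes -n centroids, each with a random centre, class label, standard deviation, and weight; the instance seed (-i) then picks a centroid in proportion to its weight and places a point at a Gaussian-distributed distance from its centre in a random direction. The uniform [0, 1] ranges for centres, deviations, and weights are assumptions, and all names are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Centroid:
    centre: List[float]
    label: int
    std_dev: float
    weight: float


def make_model(num_centroids: int, num_atts: int, num_classes: int,
               model_seed: int) -> List[Centroid]:
    rng = random.Random(model_seed)  # plays the role of -r
    return [Centroid([rng.random() for _ in range(num_atts)],
                     rng.randrange(num_classes), rng.random(), rng.random())
            for _ in range(num_centroids)]


def rbf_instance(model: List[Centroid], rng: random.Random):
    # Choose a centroid with probability proportional to its weight.
    c = rng.choices(model, weights=[m.weight for m in model])[0]
    # Random direction, scaled to a Gaussian-distributed length.
    direction = [rng.uniform(-1.0, 1.0) for _ in c.centre]
    norm = math.sqrt(sum(v * v for v in direction)) or 1.0
    scale = rng.gauss(0.0, c.std_dev) / norm
    return [ci + di * scale for ci, di in zip(c.centre, direction)], c.label
```

Usage: `rbf_instance(make_model(50, 10, 2, model_seed=1), random.Random(1))` returns one 10-attribute instance and its label; reusing the same model with a different instance RNG yields a new sample of the same concept.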


generators.RandomRBFGeneratorDrift: Generates a random radial basis function stream with drift. 

Parameters:
-r : Seed for random generation of model
-i : Seed for random generation of instances
-c : The number of classes to generate
-a : The number of attributes to generate
-n : The number of centroids in the model
-s : Speed of change of centroids in the model.
-k : The number of centroids with drift
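One way to realise -s and -k, sketched in Python under stated assumptions: each of the first -k centroids gets a random unit direction and moves by -s along it after every instance, reflecting off the boundaries of the unit hypercube. This is an illustration of centroid drift, not MOA's code; DriftingCentres is a hypothetical name.

```python
import math
import random
from typing import List


def random_unit_vector(dim: int, rng: random.Random) -> List[float]:
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


class DriftingCentres:
    def __init__(self, centres, num_drift: int, speed: float, seed: int = 1):
        self.rng = random.Random(seed)
        self.centres = [list(c) for c in centres]
        self.speed = speed                       # -s
        # Only the first num_drift centres (-k) receive a velocity.
        self.velocity = [random_unit_vector(len(c), self.rng)
                         for c in self.centres[:num_drift]]

    def step(self) -> None:
        # zip stops at the shorter list, so non-drifting centres stay put.
        for c, v in zip(self.centres, self.velocity):
            for j in range(len(c)):
                c[j] += v[j] * self.speed
                if c[j] > 1.0:                   # reflect off the upper bound
                    c[j] = 2.0 - c[j]
                    v[j] = -v[j]
                elif c[j] < 0.0:                 # reflect off the lower bound
                    c[j] = -c[j]
                    v[j] = -v[j]
```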



generators.RandomTreeGenerator: Generates a stream based on a randomly generated tree.

Parameters:
-r : Seed for random generation of tree
-i : Seed for random generation of instances
-c : The number of classes to generate
-o : The number of nominal attributes to generate
-u : The number of numeric attributes to generate
-v : The number of values to generate per nominal attribute
-d : The maximum depth of the tree concept
-l : The first level of the tree above maxTreeDepth (-d) that can have leaves
-f : The fraction of leaves per level from firstLeafLevel (-l) onwards
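A simplified Python sketch of the tree concept, restricted to numeric attributes in [0, 1] with binary splits: nodes at depth maxTreeDepth (-d) are always leaves, nodes from firstLeafLevel (-l) onwards become leaves with probability -f, and each leaf gets a random class. Instances are labelled by sorting random attribute vectors down the tree. Nominal attributes (-o, -v) and MOA's exact split rules are not modelled; the names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    label: Optional[int] = None        # class label, set only on leaves
    att: int = -1                      # attribute tested at an internal node
    threshold: float = 0.0             # go left if x[att] < threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def random_tree(max_depth: int, first_leaf_level: int, leaf_fraction: float,
                num_atts: int, num_classes: int, seed: int = 1) -> Node:
    rng = random.Random(seed)          # plays the role of -r

    def grow(depth: int, lo: List[float], hi: List[float]) -> Node:
        # Leaves are forced at max_depth and allowed from first_leaf_level on.
        if depth >= max_depth or (depth >= first_leaf_level
                                  and rng.random() < leaf_fraction):
            return Node(label=rng.randrange(num_classes))
        att = rng.randrange(num_atts)
        t = rng.uniform(lo[att], hi[att])  # split inside the current range
        left_hi = list(hi)
        left_hi[att] = t
        right_lo = list(lo)
        right_lo[att] = t
        return Node(att=att, threshold=t,
                    left=grow(depth + 1, lo, left_hi),
                    right=grow(depth + 1, right_lo, hi))

    return grow(0, [0.0] * num_atts, [1.0] * num_atts)


def classify(node: Node, x: List[float]) -> int:
    while node.label is None:
        node = node.left if x[node.att] < node.threshold else node.right
    return node.label
```

Splitting only within the range inherited from ancestors ensures every branch of this sketch can actually be reached by some instance.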


generators.SEAGenerator: Generates SEA concept functions

Generator described in paper:
  W. Nick Street and YongSeog Kim,
  “A streaming ensemble algorithm (SEA) for large-scale classification”,
  KDD ’01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining,
  377-382, 2001.

Parameters:
-f : Classification function used, as defined in the original paper
-i : Seed for random generation of instances
-b : Balance the number of instances of each class
-n : Percentage of noise to add to the data
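In the SEA concepts of the original paper, instances have three numeric attributes in [0, 10], of which only the first two are relevant; the class depends on whether x1 + x2 is at most a threshold, with thresholds 8, 9, 7, and 9.5 for functions 1 to 4. A minimal Python sketch; the mapping of that test to labels 0 and 1, the uniform attribute distribution, and treating -n as a fraction are assumptions, and sea_instance is a hypothetical name.

```python
import random

THRESHOLDS = {1: 8.0, 2: 9.0, 3: 7.0, 4: 9.5}  # one threshold per function (-f)


def sea_instance(function: int, rng: random.Random, noise: float = 0.1):
    x = [rng.uniform(0.0, 10.0) for _ in range(3)]  # x[2] is irrelevant
    label = 0 if x[0] + x[1] <= THRESHOLDS[function] else 1
    if rng.random() < noise:
        label = 1 - label                           # class noise
    return x, label
```

Switching -f between runs (or wrapping two generators in ConceptDriftStream) moves the threshold, which is how SEA is typically used to simulate abrupt or gradual drift.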


generators.STAGGERGenerator: Generates STAGGER concept functions.

Generator described in paper:
  Jeffrey C. Schlimmer and Richard H. Granger Jr.,
  “Incremental Learning from Noisy Data”,
  Machine Learning 1: 317-354, 1986.

Parameters:
-i : Seed for random generation of instances
-f : Classification function used, as defined in the original paper
-b : Balance the number of instances of each class
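STAGGER instances have three nominal attributes (size, color, shape), and the three concepts are usually given as: (1) size = small and color = red; (2) color = green or shape = circle; (3) size = medium or large. A minimal Python sketch, assuming label 1 means the concept holds; FUNCTIONS and stagger_instance are hypothetical names.

```python
import random

SIZES = ("small", "medium", "large")
COLORS = ("red", "blue", "green")
SHAPES = ("circle", "square", "triangle")

FUNCTIONS = {
    1: lambda size, color, shape: size == "small" and color == "red",
    2: lambda size, color, shape: color == "green" or shape == "circle",
    3: lambda size, color, shape: size in ("medium", "large"),
}


def stagger_instance(function: int, rng: random.Random):
    size, color, shape = rng.choice(SIZES), rng.choice(COLORS), rng.choice(SHAPES)
    return (size, color, shape), int(FUNCTIONS[function](size, color, shape))
```

Under uniform sampling, concept 1 holds for only 1 in 9 instances, which is one reason a class-balancing option such as -b is useful.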



generators.WaveformGenerator: Generates a problem of predicting one of three waveform types.

Parameters:
-i : Seed for random generation of instances
-n : Add noise attributes, for a total of 40 attributes
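The classic waveform problem of Breiman et al. builds each of 21 attributes as a random convex combination of two of three triangular base waves plus standard Gaussian noise; the noise option appends 19 pure-noise attributes for a total of 40. A minimal Python sketch, where the base wave positions and the class-to-pair mapping are assumptions of this sketch and waveform_instance is a hypothetical name.

```python
import random

# Three triangular base waves over 21 positions, peaking (value 6) at 6, 14, 10.
H = [
    [max(6 - abs(i - 6), 0) for i in range(21)],
    [max(6 - abs(i - 14), 0) for i in range(21)],
    [max(6 - abs(i - 10), 0) for i in range(21)],
]
PAIRS = [(0, 1), (0, 2), (1, 2)]  # assumed base-wave pair for each class


def waveform_instance(rng: random.Random, add_noise_atts: bool = False):
    label = rng.randrange(3)
    a, b = PAIRS[label]
    u = rng.random()              # mixing weight of the convex combination
    atts = [u * H[a][i] + (1 - u) * H[b][i] + rng.gauss(0.0, 1.0)
            for i in range(21)]
    if add_noise_atts:
        atts += [rng.gauss(0.0, 1.0) for _ in range(19)]  # irrelevant attributes
    return atts, label
```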


generators.WaveformGeneratorDrift: Generates a problem of predicting one of three waveform types with drift.

Parameters:
-i : Seed for random generation of instances
-n : Add noise attributes, for a total of 40 attributes
-d : Number of attributes with drift