The classes available to generate streams are the following:
ArffFileStream: A stream read from an ARFF file
Parameters:
-f : ARFF file to load
-c : Class index of data. 0 for none or -1 for last attribute in file
ConceptDriftStream: Generator that adds concept drift to examples in a stream
Example:
ConceptDriftStream -s (generators.AgrawalGenerator -f 7) -d (generators.AgrawalGenerator -f 2) -w 1000000 -p 900000
Parameters:
-s : Stream
-d : Concept drift Stream
-p : Central position of concept drift change
-w : Width of concept drift change
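The roles of -p and -w can be illustrated with a short sketch. It assumes the sigmoid weighting commonly described for this kind of drift stream, where the probability of drawing the next example from the drift stream (-d) is 1 / (1 + e^(-4(t - p)/w)). The function names are hypothetical and this is not the MOA source code.

```python
import math
import random


def drift_probability(t, position, width):
    """Probability that instance number t comes from the drift stream (-d).

    Assumed model: a sigmoid centred at `position` (-p) whose transition
    spans roughly `width` (-w) instances.
    """
    x = 4.0 * (t - position) / width
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pick_stream(t, position, width, rng):
    """Return 'd' (drift stream) or 's' (base stream) for instance t."""
    return "d" if rng.random() < drift_probability(t, position, width) else "s"


if __name__ == "__main__":
    rng = random.Random(1)
    for t in (0, 400_000, 900_000, 1_400_000, 2_000_000):
        p = drift_probability(t, position=900_000, width=1_000_000)
        print(t, round(p, 4), pick_stream(t, 900_000, 1_000_000, rng))
```

Under this assumption, with -p 900000 and -w 1000000 the two streams are equally likely at instance 900000, and a smaller -w makes the change more abrupt.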
ConceptDriftRealStream: Generator that adds concept drift to examples in a stream whose
concepts have different classes and attributes, for example real datasets
Example:
ConceptDriftRealStream -s (ArffFileStream -f covtype.arff) -d (ConceptDriftRealStream -s (ArffFileStream -f PokerOrig.arff) -d (ArffFileStream -f elec.arff) -w 5000 -p 1000000) -w 5000 -p 581012
Parameters:
-s : Stream
-d : Concept drift Stream
-p : Central position of concept drift change
-w : Width of concept drift change
FilteredStream: A stream that is filtered.
Parameters:
-s : Stream to filter
-f : Filters to apply, for example AddNoiseFilter
AddNoiseFilter: Adds random noise to examples in a stream. Use only with FilteredStream
Parameters:
-r : Seed for random noise
-a : The fraction of attribute values to disturb
-c : The fraction of class labels to disturb
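As an illustration of what the -c fraction means, the following sketch disturbs a class label with the given probability by replacing it with a different label chosen uniformly at random. This is an assumed reading of "disturb", the function name is hypothetical, and attribute noise (-a) is not modelled here.

```python
import random


def disturb_label(label, num_classes, fraction, rng):
    """With probability `fraction`, replace `label` by a different class index.

    Assumption: labels are integers 0..num_classes-1 and a disturbed label
    is drawn uniformly from the other classes.
    """
    if num_classes > 1 and rng.random() < fraction:
        other = rng.randrange(num_classes - 1)
        # Skip over the original label so the result always differs from it.
        return other if other < label else other + 1
    return label


if __name__ == "__main__":
    rng = random.Random(42)  # plays the role of the -r seed
    labels = [0, 1, 2, 1, 0, 2, 2, 1]
    print([disturb_label(y, 3, 0.25, rng) for y in labels])
```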
generators.AgrawalGenerator: Generates one of ten different pre-defined loan functions
Generator described in paper:
Rakesh Agrawal, Tomasz Imielinski, and Arun Swami,
“Database Mining: A Performance Perspective”,
IEEE Transactions on Knowledge and Data Engineering,
5(6), December 1993.
Public C source code available at:
https://www.almaden.ibm.com/cs/projects/iis/hdb/Projects/data_mining/datasets/syndata.html
Notes:
The built-in functions are based on the paper (page 924)
and correspond to functions pred20 through pred29 in the public C implementation.
The perturbation function works as in the C implementation rather than as described in the paper.
Parameters:
-f : Classification function used, as defined in the original paper.
-i : Seed for random generation of instances.
-p : The amount of perturbation (noise) introduced to numeric values
-b : Balance the number of instances of each class.
generators.HyperplaneGenerator: Generates a problem of predicting class of a rotating hyperplane.
Parameters:
-i : Seed for random generation of instances.
-c : The number of classes to generate
-a : The number of attributes to generate.
-k : The number of attributes with drift.
-t : Magnitude of the change for every example
-n : Percentage of noise to add to the data.
-s : Probability, as a percentage, that the direction of change is reversed
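The interaction of -k, -t and -s can be sketched as follows. This is a simplified two-class illustration under stated assumptions, not the generator's actual code: an example is labelled 1 when its weighted attribute sum reaches half the total weight, and after each example the first k weights move by t in their current direction, which flips with probability s percent. All names are hypothetical.

```python
import random


def hyperplane_label(x, weights):
    """Label 1 if the point lies on or above the hyperplane, else 0 (assumed rule)."""
    threshold = 0.5 * sum(weights)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0


def drift_weights(weights, directions, k, t, s, rng):
    """Move the first k weights by t (in place) and maybe reverse their direction.

    k: number of attributes with drift (-k)
    t: magnitude of change per example (-t)
    s: percentage chance that a direction is reversed (-s)
    """
    for i in range(k):
        weights[i] += directions[i] * t
        if rng.random() * 100.0 < s:
            directions[i] = -directions[i]


if __name__ == "__main__":
    rng = random.Random(1)  # plays the role of the -i seed
    weights = [rng.random() for _ in range(5)]
    directions = [1] * 5
    for _ in range(3):
        x = [rng.random() for _ in range(5)]
        print(hyperplane_label(x, weights))
        drift_weights(weights, directions, k=2, t=0.01, s=10, rng=rng)
```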
generators.LEDGenerator: Generates a problem of predicting the digit displayed on a 7-segment LED display.
Parameters:
-i : Seed for random generation of instances.
-n : Percentage of noise to add to the data
-s : Reduce the data to only contain 7 relevant binary attributes
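To make the LED problem concrete, the sketch below encodes each digit as seven binary segment attributes and inverts each segment independently with the probability given by -n. The segment order and the function name are assumptions made for illustration, and the irrelevant attributes that -s removes are simply not generated here.

```python
import random

# Assumed segment order: top, top-left, top-right, middle,
# bottom-left, bottom-right, bottom.
SEGMENTS = {
    0: (1, 1, 1, 0, 1, 1, 1),
    1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1),
    3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0),
    5: (1, 1, 0, 1, 0, 1, 1),
    6: (1, 1, 0, 1, 1, 1, 1),
    7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def led_instance(digit, noise_percent, rng):
    """Return the 7 segment values for `digit`, each inverted with
    probability noise_percent / 100."""
    segments = []
    for bit in SEGMENTS[digit]:
        if rng.random() * 100.0 < noise_percent:
            bit = 1 - bit
        segments.append(bit)
    return segments


if __name__ == "__main__":
    rng = random.Random(1)  # plays the role of the -i seed
    digit = rng.randrange(10)
    print(digit, led_instance(digit, noise_percent=10, rng=rng))
```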
generators.LEDGeneratorDrift: Generates a problem of predicting the digit displayed on a 7-segment LED display with drift.
Parameters:
-i : Seed for random generation of instances.
-n : Percentage of noise to add to the data
-s : Reduce the data to only contain 7 relevant binary attributes
-d : Number of attributes with drift
generators.RandomRBFGenerator: Generates a random radial basis function stream.
Parameters:
-r : Seed for random generation of model
-i : Seed for random generation of instances
-c : The number of classes to generate
-a : The number of attributes to generate
-n : The number of centroids in the model
generators.RandomRBFGeneratorDrift: Generates a random radial basis function stream with drift.
Parameters:
-r : Seed for random generation of model
-i : Seed for random generation of instances
-c : The number of classes to generate
-a : The number of attributes to generate
-n : The number of centroids in the model
-s : Speed of change of centroids in the model.
-k : The number of centroids with drift
generators.RandomTreeGenerator: Generates a stream based on a randomly generated tree.
Parameters:
-r: Seed for random generation of tree
-i: Seed for random generation of instances
-c: The number of classes to generate
-o: The number of nominal attributes to generate
-u: The number of numeric attributes to generate
-v: The number of values to generate per nominal attribute
-d: The maximum depth of the tree concept
-l: The first level of the tree above maxTreeDepth that can have leaves
-f: The fraction of leaves per level from firstLeafLevel onwards
generators.SEAGenerator: Generates SEA concept functions
Generator described in paper:
W. Nick Street and YongSeog Kim,
“A streaming ensemble algorithm (SEA) for large-scale classification”,
KDD ’01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining,
377-382, 2001.
Parameters:
-f: Classification function used, as defined in the original paper
-i: Seed for random generation of instances
-b: Balance the number of instances of each class
-n: Percentage of noise to add to the data
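For readers without the paper at hand, the four SEA concepts can be sketched as below. The sketch follows the description in Street and Kim: three attributes drawn uniformly from [0, 10], of which only the first two are relevant, and a label that depends on whether their sum is at most a threshold of 8, 9, 7 or 9.5. The function names and the boolean label encoding are choices made for this illustration.

```python
import random

# Thresholds for classification functions 1..4, as given in the SEA paper.
SEA_THRESHOLDS = {1: 8.0, 2: 9.0, 3: 7.0, 4: 9.5}


def sea_label(function, attributes):
    """True if the sum of the first two attributes is within the threshold.

    The third attribute is irrelevant to the concept.
    """
    return attributes[0] + attributes[1] <= SEA_THRESHOLDS[function]


def sea_instance(function, rng):
    """Draw three attributes uniformly from [0, 10] and label them."""
    attributes = [rng.uniform(0.0, 10.0) for _ in range(3)]
    return attributes, sea_label(function, attributes)


if __name__ == "__main__":
    rng = random.Random(1)  # plays the role of the -i seed
    for f in (1, 2, 3, 4):
        print(f, sea_instance(f, rng))
```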
generators.STAGGERGenerator: Generates STAGGER Concept functions.
Generator described in paper:
Jeffrey C. Schlimmer and Richard H. Granger Jr.,
“Incremental Learning from Noisy Data”,
Machine Learning, 1: 317-354, 1986.
Parameters:
-i: Seed for random generation of instances
-f: Classification function used, as defined in the original paper
-b: Balance the number of instances of each class
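The three STAGGER concepts are small enough to write out. The sketch below follows the usual statement of the concepts from Schlimmer and Granger over three nominal attributes (size, color, shape). The attribute value spellings and the function name are assumptions for illustration and need not match the generator's internal encoding.

```python
SIZES = ("small", "medium", "large")
COLORS = ("red", "blue", "green")
SHAPES = ("circle", "square", "triangle")


def stagger_label(function, size, color, shape):
    """Evaluate STAGGER concept 1, 2 or 3 on one example."""
    if function == 1:
        return size == "small" and color == "red"
    if function == 2:
        return color == "green" or shape == "circle"
    if function == 3:
        return size in ("medium", "large")
    raise ValueError("function must be 1, 2 or 3")


if __name__ == "__main__":
    example = ("small", "red", "square")
    for f in (1, 2, 3):
        print(f, stagger_label(f, *example))
```

Because only a minority of the 27 attribute combinations satisfy concept 1, an option such as -b is useful when equal class frequencies are wanted.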
generators.WaveformGenerator: Generates a problem of predicting one of three waveform types.
Parameters:
-i: Seed for random generation of instances
-n: Adds noise, for a total of 40 attributes
generators.WaveformGeneratorDrift: Generates a problem of predicting one of three waveform types with drift.
Parameters:
-i: Seed for random generation of instances
-n: Adds noise, for a total of 40 attributes
-d: Number of attributes with drift