Datasets

AWS Prices: On its AWS platform, Amazon allows users to bid on spare server capacity, known as spot instances. This lets people buy server time at prices that can be much cheaper than the usual on-demand rates. The downside is that the server may be terminated without notice if too many other users bid higher for the same spare capacity. This dataset contains 27,410,309 measurements of spot instance prices gathered over several months. Each measurement has six associated attributes describing the server specifications, the geographical region, and the time of purchase. OpenML Dataset. Kaggle Repository.
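To make the termination rule concrete, here is a minimal Python sketch. It assumes the classic bidding model this description refers to: an instance keeps running while the user's maximum bid is at least the current spot price, the user is charged the spot price rather than the bid, and the instance is terminated at the first observation where the spot price exceeds the bid. The functions `first_interruption` and `cost_until_interruption` and the price series are hypothetical illustrations, not part of the dataset or of any AWS API.

```python
from typing import List, Optional


def first_interruption(bid: float, spot_prices: List[float]) -> Optional[int]:
    """Return the index of the first observation where the spot price
    exceeds the bid (the instance would be terminated), or None if the
    bid stays competitive for the whole series."""
    for i, price in enumerate(spot_prices):
        if price > bid:
            return i
    return None


def cost_until_interruption(
    bid: float, spot_prices: List[float], hours_per_step: float = 1.0
) -> float:
    """Total paid while the instance runs, assuming (hypothetically) the
    user is charged the current spot price, not the bid, for each step."""
    stop = first_interruption(bid, spot_prices)
    running = spot_prices if stop is None else spot_prices[:stop]
    return sum(price * hours_per_step for price in running)


if __name__ == "__main__":
    # Hypothetical hourly spot prices in USD, not taken from the dataset.
    prices = [0.031, 0.029, 0.035, 0.052, 0.030]
    print(first_interruption(0.04, prices))  # 3: the 0.052 observation outbids us
    print(cost_until_interruption(0.04, prices))
```

With a bid of 0.04, the instance runs for the first three observations and is terminated when the price reaches 0.052, so only those three steps are paid for.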

Forest Covertype: Contains the forest cover type for 30 x 30 meter cells, obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. The dataset has 581,012 instances and 54 attributes and has been used in several papers on data stream classification. UCI Machine Learning Repository. Normalized Dataset.

Poker-Hand: Consists of 1,000,000 instances and 11 attributes. Each record is an example of a hand of five playing cards drawn from a standard 52-card deck. Each card is described by two attributes (suit and rank), giving 10 predictive attributes, plus one class attribute that describes the “Poker Hand”. UCI Machine Learning Repository. Normalized Dataset.

Electricity is another widely used dataset, described by M. Harries and analysed by Gama. The data was collected from the Australian New South Wales Electricity Market. In this market, prices are not fixed; they are affected by supply and demand and are set every five minutes. The ELEC dataset contains 45,312 instances. The class label identifies the change of the price relative to a moving average of the last 24 hours. Original Dataset. Normalized Dataset.
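One plausible reading of the ELEC labeling rule is sketched below in Python: each price is labeled `UP` if it is above the mean of the preceding window of prices, and `DOWN` otherwise. The function name `label_price_changes`, the `UP`/`DOWN` strings, and the choice to leave the first window unlabeled are assumptions for illustration. The `window` parameter must be set to the number of observations spanning 24 hours, which depends on the sampling interval (for example, 288 observations at five-minute intervals); the published labels may have been computed differently.

```python
from typing import List


def label_price_changes(prices: List[float], window: int) -> List[str]:
    """Label each price 'UP' if it is above the mean of the preceding
    `window` prices, else 'DOWN'. The first `window` prices lack a full
    history and receive no label."""
    if window <= 0:
        raise ValueError("window must be a positive number of observations")
    labels: List[str] = []
    running_sum = sum(prices[:window])
    for t in range(window, len(prices)):
        moving_avg = running_sum / window
        labels.append("UP" if prices[t] > moving_avg else "DOWN")
        # Slide the window: add the current price, drop the oldest one.
        running_sum += prices[t] - prices[t - window]
    return labels


if __name__ == "__main__":
    # Toy series with a 3-observation window, for illustration only.
    print(label_price_changes([1.0, 2.0, 3.0, 10.0, 1.0], window=3))  # ['UP', 'DOWN']
```

Keeping a running sum makes each step constant-time, which matters when the window is hundreds of observations long.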

The normalized versions of these datasets rescale all numerical values to lie between 0 and 1. In the Poker-Hand dataset the cards are not ordered, i.e. a hand can be represented by any permutation of its five cards, which makes it very hard for propositional learners, especially linear ones. The version provided here is therefore modified: the cards are sorted by rank and suit, and duplicates have been removed.
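The two preprocessing steps described above can be sketched in Python. Min-max normalization maps each value x in a column to (x - min) / (max - min). The Poker-Hand canonicalization sorts the five cards of each hand by rank and then suit, and then drops duplicate rows. The function names, the choice to map constant columns to 0.0, and the row layout `S1, C1, ..., S5, C5, CLASS` (suit then rank for each card, followed by the class) are assumptions for illustration, not the exact procedure used to build the published files.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1] via (x - min) / (max - min).
    Constant columns are mapped to 0.0 to avoid dividing by zero."""
    n_cols = len(rows[0])
    lows = [min(row[j] for row in rows) for j in range(n_cols)]
    highs = [max(row[j] for row in rows) for j in range(n_cols)]
    return [
        [
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] != lows[j] else 0.0
            for j in range(n_cols)
        ]
        for row in rows
    ]


def canonicalize_hand(row: Sequence[int]) -> Tuple[int, ...]:
    """Assumed layout: S1, C1, ..., S5, C5, CLASS (suit, rank per card).
    Sort the five cards by rank, then suit, so every permutation of the
    same hand yields the same row."""
    cards = [(row[i], row[i + 1]) for i in range(0, 10, 2)]
    cards.sort(key=lambda card: (card[1], card[0]))  # order by (rank, suit)
    flat = tuple(value for card in cards for value in card)
    return flat + (row[10],)


def canonicalize_dataset(rows: List[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Canonicalize every hand and keep only the first copy of each."""
    seen = set()
    unique: List[Tuple[int, ...]] = []
    for row in rows:
        canon = canonicalize_hand(row)
        if canon not in seen:
            seen.add(canon)
            unique.append(canon)
    return unique


if __name__ == "__main__":
    # Two orderings of the same hypothetical hand (a pair of tens, class 1).
    hand_a = [1, 10, 2, 3, 3, 10, 4, 1, 1, 5, 1]
    hand_b = [4, 1, 1, 5, 3, 10, 2, 3, 1, 10, 1]
    print(len(canonicalize_dataset([hand_a, hand_b])))  # 1
    print(min_max_normalize([[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]]))
```

Sorting by rank before suit is one choice; any fixed total order over cards has the same effect of mapping every permutation of a hand to a single row.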

Airlines Dataset: Inspired by the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given information about its scheduled departure. Dataset.