
MEKA Software: A Multi-label Extension to the WEKA Framework

This software provides an open source implementation of the “pruned sets” and “classifier chains” methods for multi-label classification. These methods were developed as part of Jesse Read's PhD thesis in the Machine Learning Group at the University of Waikato. See these publications:

Jesse Read. Scalable Multi-label Classification. PhD Thesis, University of Waikato, Hamilton, New Zealand. (2010)

Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank. Classifier Chains for Multi-label Classification. In Proc. of the 20th European Conference on Machine Learning (ECML 2009). Bled, Slovenia, September 2009.

Jesse Read, Bernhard Pfahringer, Geoff Holmes. Multi-label Classification using Ensembles of Pruned Sets. In Proc. of the IEEE International Conference on Data Mining (ICDM 2008). Pisa, Italy, December 2008.

Website

Book “Knowledge Discovery from Data Streams” by João Gama

This book covers the fundamentals of data stream mining and describes important applications, such as TCP/IP traffic, GPS data, sensor networks, and customer click streams. It also addresses several challenges that data mining will face in the future, when stream mining will be at the core of many applications. These challenges involve designing useful and efficient data mining solutions applicable to real-world problems. In the appendix, the author includes examples of publicly available software and online data sets. This practical, up-to-date book focuses on the new requirements of the next generation of data mining. Although the concepts presented in the text are mainly about data streams, they are also valid for other areas of machine learning and data mining.

https://www.crcpress.com/product/isbn/9781439826119

PAKDD 2011 Tutorial: Handling Concept Drift: Importance, Challenges and Solutions

A tutorial at PAKDD 2011 on concept drift, covering MOA as open source software for handling it.

Abstract: In the real world, data often arrives in streams and evolves over time. Concept drift in supervised learning means that the underlying distribution of the data is changing. As a result, predictions might become less accurate as time passes, or opportunities to improve accuracy might be missed. Therefore, learning models need to adapt to changes quickly and accurately. The proposed tutorial aims to provide a unifying view of basic and applied concept drift research in data mining and related areas. In the first part, we will introduce the problem of concept drift and discuss why changes appear in supervised learning and the motivation for handling them. We will give an overview of the types of application tasks that exist. In the second part, we will present available approaches and techniques for handling concept drift, and discuss evaluation issues and open source software. In the third part, we will reflect on the past, present and future of concept drift research and outline future research directions. We will focus on the link between research scenarios and application needs.

Presenters:

  • Albert Bifet, University of Waikato, New Zealand
  • João Gama, University of Porto, Portugal
  • Mykola Pechenizkiy, Eindhoven University of Technology, the Netherlands
  • Indrė Žliobaitė, Eindhoven University of Technology, the Netherlands

Tutorial website
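
The abstract's central point, that a model becomes less accurate once the data distribution changes unless it adapts, can be illustrated with a small toy sketch. This is not code from the tutorial and it does not use the MOA API: the stream generator, the `BinMajority` learner, the window size of 50 and the drift point at instance 500 are all assumptions made up for illustration. It compares a learner that keeps its full history with one that only remembers a sliding window of recent examples, using test-then-train evaluation.

```python
import random
from collections import deque


def make_stream(n=1000, drift_at=500, seed=0):
    """Yield (x, y) pairs; the labelling rule flips at `drift_at` (hypothetical setup)."""
    rng = random.Random(seed)
    for t in range(n):
        x = rng.random()
        # Before the drift, y = 1 when x > 0.5; afterwards the concept is inverted.
        y = int(x > 0.5) if t < drift_at else int(x <= 0.5)
        yield x, y


class BinMajority:
    """Toy learner: predicts the majority label seen in each half of [0, 1).

    window=None keeps the full history; otherwise only the most recent
    `window` examples are remembered (a simple forgetting mechanism).
    """

    def __init__(self, window=None):
        self.buffer = deque(maxlen=window)

    def predict(self, x):
        in_upper_half = x > 0.5
        votes = [y for (b, y) in self.buffer if b == in_upper_half]
        if not votes:
            return 0  # no evidence yet for this half: default to class 0
        return int(2 * sum(votes) > len(votes))

    def learn(self, x, y):
        self.buffer.append((x > 0.5, y))


def prequential_accuracy(model, stream, start=0):
    """Test-then-train: predict each instance first, then learn from it.

    Accuracy is only counted for instances with index >= start.
    """
    correct = total = 0
    for t, (x, y) in enumerate(stream):
        if t >= start:
            correct += model.predict(x) == y
            total += 1
        model.learn(x, y)
    return correct / total


# Accuracy measured only on the instances that arrive after the drift.
full = prequential_accuracy(BinMajority(window=None), make_stream(), start=500)
recent = prequential_accuracy(BinMajority(window=50), make_stream(), start=500)
print(f"full history: {full:.2f}, sliding window: {recent:.2f}")
```

In this toy setting the full-history learner keeps voting with the outdated concept for most of the post-drift period, while the windowed learner recovers once its buffer has turned over. Real drift-handling methods, such as those discussed in the tutorial, are considerably more refined than a fixed-size window.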