pHMM4weka: Profile Hidden Markov Models (PHMMs) for binary protein classification for WEKA

This Java software implements Profile Hidden Markov Models (PHMMs) for binary protein classification in the WEKA workbench. It supports both standard PHMMs and newly introduced binary PHMMs. In addition, the software allows propositionalisation of PHMMs.
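A generic sketch can show how using HMMs as binary classifiers differs from using them as one-class classifiers. In the one-class case, a sequence is scored only by its likelihood under a model of the positive family. In the binary case, it is also scored against a model of the negative class. The Java code below is not the pHMM4weka API and not the binary PHMM introduced by this software. It assumes small, fully connected discrete HMMs rather than full profile HMMs with match, insert and delete states, and all names (`BinaryHmmScorer`, `Hmm`, `logLikelihood`, `logOdds`) are hypothetical. It computes each log-likelihood with the scaled forward algorithm and uses the log-odds between the two models as the decision score.

```java
/** Hypothetical sketch (not pHMM4weka): binary, two-model scoring of a sequence with discrete HMMs. */
public class BinaryHmmScorer {

    /** Discrete HMM with start[i], trans[i][j] and emit[i][symbol] probabilities. */
    static final class Hmm {
        final double[] start;
        final double[][] trans;
        final double[][] emit;

        Hmm(double[] start, double[][] trans, double[][] emit) {
            this.start = start;
            this.trans = trans;
            this.emit = emit;
        }

        /** log P(seq | model) via the forward algorithm, normalising alpha at every step. */
        double logLikelihood(int[] seq) {
            int n = start.length;
            double[] alpha = new double[n];
            double logP = 0.0;
            for (int t = 0; t < seq.length; t++) {
                double[] next = new double[n];
                double scale = 0.0;
                for (int j = 0; j < n; j++) {
                    double sum = 0.0;
                    if (t == 0) {
                        sum = start[j];
                    } else {
                        for (int i = 0; i < n; i++) sum += alpha[i] * trans[i][j];
                    }
                    next[j] = sum * emit[j][seq[t]];
                    scale += next[j];
                }
                // The scaling factors multiply up to P(seq | model), so their logs add up.
                for (int j = 0; j < n; j++) next[j] /= scale;
                logP += Math.log(scale);
                alpha = next;
            }
            return logP;
        }
    }

    /** Positive values favour the positive-class model over the negative-class model. */
    static double logOdds(Hmm positive, Hmm negative, int[] seq) {
        return positive.logLikelihood(seq) - negative.logLikelihood(seq);
    }

    public static void main(String[] args) {
        // Hypothetical single-state models over the 2-symbol alphabet {0, 1}.
        Hmm positive = new Hmm(new double[] {1.0}, new double[][] {{1.0}}, new double[][] {{0.9, 0.1}});
        Hmm negative = new Hmm(new double[] {1.0}, new double[][] {{1.0}}, new double[][] {{0.1, 0.9}});
        int[] seq = {0, 0, 1};
        double score = logOdds(positive, negative, seq);
        System.out.printf("log-odds = %.4f -> %s%n", score, score > 0 ? "positive" : "negative");
    }
}
```

For the sequence {0, 0, 1}, the two toy models differ only in their emission probabilities, so the log-odds works out to ln 0.9 - ln 0.1 = ln 9 (about 2.197). The sequence is therefore assigned to the positive class.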

This software was developed by Stefan Mutter during his PhD at the Machine Learning Group at the University of Waikato. His thesis investigated similarity amongst proteins. This area of research has two important and closely related classification tasks: detecting similar proteins and discriminating amongst them. Hidden Markov Models (HMMs) have been applied successfully to the detection task because they model sequence similarity very well. From a machine learning point of view, however, these HMMs are essentially one-class classifiers. They are trained solely on a small number of similar proteins and neglect the vast number of dissimilar ones.

His basic assumption was that integrating this neglected information would be highly beneficial to the classification task. He therefore transformed the problem representation from a one-class to a binary one. He also proposed a new way to significantly improve both discriminative power and runtime. This approach terminates the time-intensive training of HMMs early, then applies propositionalisation and classifies with a discriminative binary learner.
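Propositionalisation turns each variable-length sequence into a fixed-length attribute vector that an ordinary discriminative learner can consume. This description does not say which attributes pHMM4weka derives from a PHMM. The sketch below therefore assumes one common choice: the expected number of visits to each HMM state (posterior state occupancy), computed with the scaled forward-backward algorithm on a small discrete HMM. The names `HmmPropositionaliser` and `stateOccupancy` are hypothetical. A partially trained model, for example one taken after stopping Baum-Welch training early, would simply be passed in as the parameter arrays.

```java
import java.util.Arrays;

/** Hypothetical sketch (not pHMM4weka): propositionalise sequences via expected HMM state occupancy. */
public class HmmPropositionaliser {

    /**
     * Returns, for each state i, the sum over t of P(state_t = i | seq, model),
     * computed with the scaled forward-backward algorithm. Assumes seq is
     * non-empty and has non-zero probability under the model.
     */
    static double[] stateOccupancy(double[] start, double[][] trans, double[][] emit, int[] seq) {
        int n = start.length;
        int len = seq.length;
        double[][] alpha = new double[len][n];
        double[] scale = new double[len];

        // Forward pass, normalising alpha at each step and remembering the scale factors.
        for (int t = 0; t < len; t++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                if (t == 0) {
                    sum = start[j];
                } else {
                    for (int i = 0; i < n; i++) sum += alpha[t - 1][i] * trans[i][j];
                }
                alpha[t][j] = sum * emit[j][seq[t]];
                scale[t] += alpha[t][j];
            }
            for (int j = 0; j < n; j++) alpha[t][j] /= scale[t];
        }

        // Backward pass, reusing the forward scale factors.
        double[][] beta = new double[len][n];
        Arrays.fill(beta[len - 1], 1.0);
        for (int t = len - 2; t >= 0; t--) {
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int j = 0; j < n; j++) {
                    sum += trans[i][j] * emit[j][seq[t + 1]] * beta[t + 1][j];
                }
                beta[t][i] = sum / scale[t + 1];
            }
        }

        // With this scaling, the posterior is gamma_t(i) = alpha_t(i) * beta_t(i); sum it over t.
        double[] occupancy = new double[n];
        for (int t = 0; t < len; t++) {
            for (int i = 0; i < n; i++) occupancy[i] += alpha[t][i] * beta[t][i];
        }
        return occupancy;
    }

    public static void main(String[] args) {
        // Hypothetical 2-state model over a 2-symbol alphabet.
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.9, 0.1}, {0.2, 0.8}};
        int[][] sequences = {{0, 0, 1, 0}, {1, 1, 1}};
        for (int[] seq : sequences) {
            // One attribute vector of length n per sequence, whatever the sequence length.
            System.out.println(Arrays.toString(stateOccupancy(start, trans, emit, seq)));
        }
    }
}
```

Each vector sums to the length of its sequence, and every sequence yields the same number of attributes. Together with class labels for similar and dissimilar proteins, these rows could be turned into WEKA instances and given to any binary learner.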