Mar 18, 2015
Handling missing features with boosting algorithms for protein-protein interaction prediction
Combining information from multiple heterogeneous data sources can aid the prediction of protein-protein interactions. This information is typically arranged into a feature vector for classification; however, missing values in the data can reduce prediction accuracy. Boosting has emerged as a powerful tool for feature selection and classification.
Bayesian methods have traditionally been used to cope with missing data, with boosting applied to the output of Bayesian classifiers. In this talk I will describe a variant of AdaBoost that deals with missing values at the level of the boosting algorithm itself, without the need for any preliminary density estimation step. Experiments on a publicly available PPI dataset suggest that this simpler and mathematically coherent approach may also be more accurate.
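To make the idea of handling missing values inside the boosting loop concrete, here is a minimal sketch. It is not necessarily the variant presented in the talk: it assumes decision stumps that abstain (output 0) when their feature is missing, with the selection criterion and vote weight taken from confidence-rated boosting for weak learners with outputs in {-1, 0, +1}. The function names, the `eps` smoothing constant and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_abstaining_adaboost(X, y, n_rounds=10, eps=1e-9):
    """Sketch of AdaBoost with stumps that abstain on missing features.

    X: (n, d) float array, np.nan marks a missing value (assumption).
    y: array of labels in {-1, +1}.
    Assumes at least one feature has at least one observed value.
    Returns a list of (feature, threshold, alpha) triples.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # uniform initial example weights
    model = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            present = ~np.isnan(X[:, j])
            for thr in np.unique(X[present, j]):
                # Stump output: +1 / -1 where the feature is observed, 0 where missing.
                h = np.zeros(n)
                h[present] = np.where(X[present, j] > thr, 1.0, -1.0)
                w_pos = w[h * y > 0].sum()   # weight of correct votes
                w_neg = w[h * y < 0].sum()   # weight of wrong votes
                w_abs = w[h == 0].sum()      # weight of abstentions
                # Normaliser at the optimal alpha; smaller is better.
                z = w_abs + 2.0 * np.sqrt(w_pos * w_neg)
                if best is None or z < best[0]:
                    # alpha may be negative, which flips the stump's polarity.
                    alpha = 0.5 * np.log((w_pos + eps) / (w_neg + eps))
                    best = (z, j, thr, alpha, h)
        _, j, thr, alpha, h = best
        model.append((j, thr, alpha))
        # Examples on which the stump abstained keep their relative weight.
        w = w * np.exp(-alpha * y * h)
        w /= w.sum()
    return model

def predict(model, X):
    """Sum the votes of all stumps whose feature is observed for each row."""
    score = np.zeros(X.shape[0])
    for j, thr, alpha in model:
        present = ~np.isnan(X[:, j])
        score[present] += alpha * np.where(X[present, j] > thr, 1.0, -1.0)
    return np.where(score >= 0, 1, -1)

# Toy data (made up): each row is missing at most one of its two features.
X_toy = np.array([[1.0, np.nan], [2.0, 1.0], [np.nan, 1.0],
                  [3.0, np.nan], [np.nan, 0.0], [0.5, 0.0]])
y_toy = np.array([-1, 1, 1, 1, -1, -1])
toy_model = fit_abstaining_adaboost(X_toy, y_toy, n_rounds=3)
toy_pred = predict(toy_model, X_toy)
```

In this sketch no value is ever imputed and no density is estimated: a stump simply casts no vote on examples where its feature is unobserved, and the abstained weight penalises features that are frequently missing.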
This is joint work with Dr Mansoor Saqi, now at the European Institute for Systems Bioinformatics (Lyon, France), and Dr Michael Defoin-Platel.