ゲノム情報科学研究教育機構 (Genome Information Science Research and Education Organization): Abstract
Date Jan 7, 2015
Speaker Makoto Yamada
Title Minimum Redundancy Maximum Relevance Feature Selection for Large and High-dimensional Data
Abstract Feature selection is an important machine learning problem, widely used in applications such as gene selection from microarray data, document categorization, and prosthesis control, to name a few. Because feature selection is a traditional and popular problem, many methods exist, including the least absolute shrinkage and selection operator (Lasso) and spectral feature selection (SPEC). Recently, a wrapper-based large-scale feature selection method called the feature generation machine (FGM) was proposed (Tan et al., 2014). However, to the best of our knowledge, there are few filter-based methods for the large and high-dimensional setting, in particular for the nonlinear and dense setting. Moreover, existing filter-type methods employ a maximum relevance (MR) based approach, which selects the m features with the largest relevance to the output. MR-based methods are simple yet efficient and can easily be applied to high-dimensional, large-sample problems. However, since MR-based approaches use only input-output relevance and ignore input-input relevance, they tend to select redundant features. In this talk, we first propose a nonlinear extension of non-negative least-angle regression (N3LARS). An advantage of N3LARS is that it can easily be incorporated into map-reduce frameworks such as Hadoop and Spark. Thus, with the help of distributed computing, a set of features can be efficiently selected from large and high-dimensional data. Finally, we show that N3LARS can solve a large and high-dimensional feature selection problem in a few hours.
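The redundancy problem of MR-based filters can be illustrated with a small sketch. This is not N3LARS: it contrasts a plain maximum relevance (MR) filter, which keeps the m features most relevant to the output, with the classic greedy minimum-redundancy maximum-relevance (mRMR) heuristic, which scores each candidate as relevance minus its mean redundancy with the features already chosen. As an assumption for illustration, absolute Pearson correlation stands in for both relevance and redundancy (mRMR is usually stated with mutual information), and the toy data and the function names `pearson`, `max_relevance`, `mrmr`, and `make_toy_data` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_relevance(features, y, m):
    """MR filter: rank features by |corr(feature, output)| and keep the top m.

    Only input-output relevance is used, so near-duplicate features
    can all be selected together.
    """
    scores = [abs(pearson(f, y)) for f in features]
    order = sorted(range(len(features)), key=lambda j: scores[j], reverse=True)
    return order[:m]


def mrmr(features, y, m):
    """Greedy mRMR (difference form): relevance minus mean redundancy.

    Redundancy is the mean |corr| between a candidate and the features
    already selected (a stand-in for input-input relevance).
    """
    relevance = [abs(pearson(f, y)) for f in features]
    selected = [max(range(len(features)), key=lambda j: relevance[j])]
    while len(selected) < m:
        best, best_score = None, -math.inf
        for j in range(len(features)):
            if j in selected:
                continue
            redundancy = sum(
                abs(pearson(features[j], features[k])) for k in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


def make_toy_data(n=2000, seed=0):
    """Hypothetical toy data: y depends on x0 (strongly) and x1 (weakly)."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    features = [
        x0,                                       # f0: informative
        [a + 0.1 * rng.gauss(0, 1) for a in x0],  # f1: near-duplicate of f0
        x1,                                       # f2: informative, independent of f0
        [rng.gauss(0, 1) for _ in range(n)],      # f3: pure noise
    ]
    y = [2 * a + b + 0.1 * rng.gauss(0, 1) for a, b in zip(x0, x1)]
    return features, y


if __name__ == "__main__":
    features, y = make_toy_data()
    # MR keeps both copies of x0, i.e. features 0 and 1.
    print("MR  :", sorted(max_relevance(features, y, 2)))
    # mRMR replaces the duplicate with the independent feature 2.
    print("mRMR:", sorted(mrmr(features, y, 2)))
```

On this toy data, MR spends its second slot on the near-copy of x0, while the redundancy penalty steers mRMR toward x1. Pearson correlation only captures linear dependence, so it is a simplification relative to the nonlinear setting the abstract targets.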