Date |
Jan 7, 2015 |
Speaker |
Makoto Yamada
|
Title |
Minimum Redundancy Maximum Relevance Feature Selection for
Large and High-dimensional Data
|
Abstract |
Feature selection is an important machine learning problem, widely
used in applications such as gene selection from microarray data,
document categorization, and prosthesis control, to name a few.
Because it is a traditional and popular problem, many methods exist,
including the least absolute shrinkage and selection operator (Lasso)
and spectral feature selection (SPEC). Recently, a wrapper-based
large-scale feature selection method called the feature generation
machine (FGM) was proposed (Tan et al., 2014).
However, to the best of our knowledge, there are few filter-based
methods for the large and high-dimensional setting, in particular for
the nonlinear and dense setting. Moreover, existing filter-type
methods employ a maximum relevance (MR) based approach, which selects
the m features with the largest relevance to the output. MR-based
methods are simple yet efficient and are easily applicable to
high-dimensional, large-sample problems. However, since MR-based
approaches use only input-output relevance and ignore input-input
relevance, they tend to select redundant features. In this talk, we
first propose a nonlinear
extension of the non-negative least-angle regression (N3LARS). An
advantage of N3LARS is that it can be easily incorporated into
map-reduce frameworks such as Hadoop and Spark. Thus, with the help of
distributed computing, a set of features can be efficiently selected
from large and high-dimensional data. Finally, we show that N3LARS can
solve a large and high-dimensional feature selection problem in a few
hours.
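
The redundancy issue raised above can be seen in a minimal sketch (a
hypothetical illustration, not code from the talk: plain absolute
correlation stands in for the relevance measure, and the greedy
redundancy penalty only mimics the minimum-redundancy idea, with a
synthetic dataset containing a near-duplicate feature):

```python
import numpy as np

# Synthetic data: f2 is a near-duplicate of f1, so it is redundant.
rng = np.random.default_rng(0)
n = 500
f1 = rng.normal(size=n)                  # informative feature
f2 = f1 + 0.01 * rng.normal(size=n)      # near-duplicate of f1
f3 = rng.normal(size=n)                  # second informative feature
y = f1 + 0.5 * f3 + 0.1 * rng.normal(size=n)
X = np.column_stack([f1, f2, f3])

def abs_corr(a, b):
    """Absolute Pearson correlation as a simple relevance proxy."""
    return abs(np.corrcoef(a, b)[0, 1])

relevance = np.array([abs_corr(X[:, j], y) for j in range(X.shape[1])])
m = 2

# Maximum relevance (MR): take the m features most relevant to y.
# It picks both f1 and its near-duplicate f2, ignoring redundancy.
mr_selected = sorted(np.argsort(relevance)[::-1][:m])

# Greedy minimum-redundancy variant: penalize each candidate by its
# mean correlation with the features already selected.
selected = [int(np.argmax(relevance))]
while len(selected) < m:
    best, best_score = None, -np.inf
    for j in range(X.shape[1]):
        if j in selected:
            continue
        redundancy = np.mean([abs_corr(X[:, j], X[:, k]) for k in selected])
        score = relevance[j] - redundancy
        if score > best_score:
            best, best_score = j, score
    selected.append(best)
```

Here the MR rule selects the two copies of the same signal, while the
redundancy-penalized rule selects one copy plus the distinct second
feature, which is the behavior gap the talk addresses.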
|