Feb 24, 2017
Prof. Einoshin Suzuki, Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan
Recovering a Partial Decision List from Noisy Data and
an Approximate Theory Based on Minimum Encoding
A partial decision list is defined as a partial classifier
which consists of ordered classification rules.
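
The definition above can be sketched in a few lines of Python. This is only an illustration, not the representation used in the talk: it assumes that "partial" means an example matched by no rule receives no prediction, and the attribute names, rules, and the `classify` helper are all hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Example = Dict[str, str]                        # attribute name -> value (illustrative)
Rule = Tuple[Callable[[Example], bool], str]    # (condition, class label)


def classify(rules: Sequence[Rule], example: Example) -> Optional[str]:
    """Return the label of the first rule whose condition holds, else None."""
    for condition, label in rules:              # order matters: first match wins
        if condition(example):
            return label
    return None                                 # assumed meaning of "partial": no default class


# Hypothetical two-rule list; the second rule is only reached
# when the first one does not fire.
rules = [
    (lambda e: e["outlook"] == "sunny" and e["humidity"] == "high", "no"),
    (lambda e: e["outlook"] == "sunny", "yes"),
]
```

Because the rules are ordered, the meaning of the second rule depends on the first, which is one way to read "mutually related rules".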
The problem of recovering a partial decision list from noisy data
and an approximate theory represents a realistic setting of knowledge
discovery from data, especially discovery of mutually related rules.
To measure how well a method solves a problem for which the
ground truth is known, we define discovery accuracy as the average
ratio of successful recoveries over data sets.
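
One possible reading of this measure is sketched below. The abstract does not say what counts as a successful recovery, so the sketch assumes exact equality between the recovered list and the ground-truth list; the function name and the toy rule representation are hypothetical.

```python
from typing import Hashable, Sequence


def discovery_accuracy(true_lists: Sequence[Sequence[Hashable]],
                       recovered_lists: Sequence[Sequence[Hashable]]) -> float:
    """Fraction of data sets whose recovered list equals the ground truth.

    Assumption: a recovery is successful only if the rules match
    exactly and in the same order.
    """
    if not true_lists or len(true_lists) != len(recovered_lists):
        raise ValueError("need one recovered list per non-empty ground-truth set")
    successes = sum(
        list(truth) == list(found)
        for truth, found in zip(true_lists, recovered_lists)
    )
    return successes / len(true_lists)


# Toy data: rules are written as plain strings for illustration only.
truth = [["r1", "r2"], ["r1", "r2"], ["r1", "r2"], ["r1", "r2"]]
found = [["r1", "r2"], ["r1", "r2"], ["r2", "r1"], ["r1", "r2"]]
accuracy = discovery_accuracy(truth, found)
```

In this toy run, three of the four recoveries match, including the rule order.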
The extended MDL principle proposed by Tangkitvanich and Shimura
has proven useful for a similar problem, classification
from noisy data and an approximate theory, but has several flaws
that prevent it from being extended to our problem.
In this talk, I will explain our solution based on minimum
encoding, which corresponds to the maximum a posteriori hypothesis
when the initial theory and the data are statistically independent
given the output hypothesis.
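
The correspondence stated above can be written out as follows. The symbols are assumptions introduced here for illustration: H is the output hypothesis, T the initial theory, and D the data. The actual encoding scheme used in the talk is not given in this abstract.

```latex
\begin{align*}
H^{*} &= \arg\max_{H} P(H \mid T, D) \\
      &= \arg\max_{H} P(H)\, P(T, D \mid H)
         && \text{(Bayes' rule; } P(T, D) \text{ does not depend on } H\text{)} \\
      &= \arg\max_{H} P(H)\, P(T \mid H)\, P(D \mid H)
         && \text{(} T \text{ and } D \text{ independent given } H\text{)} \\
      &= \arg\min_{H} \bigl[ -\log_2 P(H) - \log_2 P(T \mid H) - \log_2 P(D \mid H) \bigr]
\end{align*}
```

If each negative log-probability is read as a code length in bits, the last line minimizes the total length of encoding the hypothesis, the theory given the hypothesis, and the data given the hypothesis.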
Experiments with synthetic data show that our method, CLARDEM,
far outperforms both its simplification, which neglects the initial
theory, and another method based on information compression in terms
of discovery accuracy, especially in the presence of class noise
of 10 to 20%.
Experiments using the UCI ML Repository data and C4.5Rules show
that CLARDEM almost always outperforms or ties the other
two methods in recall and precision, and often runs much
faster.