Date: Aug 29, 2016
Speaker: Shanfeng Zhu, Fudan University
Title: DeepMeSH and MeSHLabeler: Recent progress in large-scale MeSH indexing

Abstract
Motivation: Medical Subject Headings (MeSH) are used by the National
Library of Medicine (NLM) to index almost all citations in MEDLINE,
which greatly facilitates biomedical information retrieval and text
mining. To reduce the time and financial cost of manual annotation, NLM
has developed the Medical Text Indexer (MTI), a software package that
assists MeSH annotation using k-nearest neighbors (KNN), pattern
matching and indexing rules. Large-scale MeSH indexing poses challenges
on two sides: the MeSH side and the citation side. On the MeSH side,
there are more than 27,000 distinct MeSH headings, and their frequency
distribution is skewed; moreover, the number of MeSH headings per
citation varies widely. On the citation side, existing methods mainly
represent text as bag-of-words, which cannot capture semantic and
context-dependent information well.
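To make the bag-of-words limitation concrete, here is a minimal sketch with made-up example phrases (the helper names bag_of_words and cosine are illustrative, not from the talk). Counting tokens throws away word order, so two sentences stating opposite relations get identical vectors, and a near-synonym pair scores no higher than a pair with opposite meaning.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lower-case, split on whitespace, and count tokens; word order is discarded."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (missing terms count as 0)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values()) * sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Opposite relations, identical bag-of-words vectors.
a = bag_of_words("protein A inhibits protein B")
b = bag_of_words("protein B inhibits protein A")
print(a == b)  # True

# A near-synonym pair scores no higher than a pair with opposite meaning.
print(cosine(bag_of_words("tumor growth"), bag_of_words("neoplasm growth")))   # 0.5
print(cosine(bag_of_words("tumor growth"), bag_of_words("tumor regression")))  # 0.5
```

Dense representations such as doc2vec embeddings are one way to capture the similarity that token overlap misses.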
Method: To address the challenge on the MeSH side, we propose a novel
framework, MeSHLabeler, which integrates multiple types of evidence for
accurate MeSH annotation using learning to rank. The evidence includes
predictions from MeSH classifiers, KNN, pattern matching and MTI, as
well as the correlations between different MeSH terms, among others. We
further propose DeepMeSH, which incorporates deep semantic information
into large-scale MeSH indexing. DeepMeSH addresses the citation-side
challenge with a new deep semantic representation, D2V-TFIDF, which
concatenates sparse and dense semantic representations.
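The two ideas in the Method can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation. The dense vector is a made-up stand-in for a trained doc2vec (D2V) embedding. The TF-IDF weighting (raw term frequency times smoothed IDF) and the L2 normalization of each half before concatenation are our own choices. For the MeSH side, a simple pointwise ranker (logistic regression trained by gradient descent) stands in for whatever learning-to-rank model MeSHLabeler actually uses. The evidence features, headings and labels are hypothetical. In a full pipeline, the D2V-TFIDF vectors would presumably feed the classifiers and KNN that produce such evidence scores.

```python
import math
from collections import Counter

# --- Citation side: a D2V-TFIDF-style vector (toy sketch, not the authors' code) ---

def tfidf_vector(tokens, corpus, vocab):
    """Sparse part: raw term frequency times smoothed IDF over a fixed vocabulary."""
    n = len(corpus)
    tf = Counter(tokens)
    vec = []
    for term in vocab:
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n) / (1 + df)) + 1.0
        vec.append(tf[term] * idf)
    return vec

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def d2v_tfidf(sparse_vec, dense_vec):
    """Concatenate the normalized sparse (TF-IDF) and dense (doc2vec-like) parts."""
    return l2_normalize(sparse_vec) + l2_normalize(dense_vec)

# --- MeSH side: pointwise learning to rank over evidence features (toy sketch) ---

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_pointwise_ranker(examples, lr=0.5, epochs=500):
    """Batch gradient descent on logistic loss; label 1 means the heading is relevant."""
    dim = len(examples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in examples:
            err = sigmoid(linear_score(w, b, x)) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(examples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(examples)
    return w, b

def rank_candidates(candidates, w, b):
    """Return candidate MeSH headings sorted by predicted relevance, best first."""
    return sorted(candidates, key=lambda name: linear_score(w, b, candidates[name]),
                  reverse=True)

# Usage with made-up data.
corpus = [["mesh", "indexing", "citation"], ["deep", "semantic", "citation"]]
vocab = sorted({t for doc in corpus for t in doc})
dense = [0.1, -0.4, 0.2]  # hypothetical stand-in for a trained doc2vec embedding
features = d2v_tfidf(tfidf_vector(corpus[0], corpus, vocab), dense)
print(len(features))  # 5 vocabulary terms + 3 dense dimensions = 8

# Hypothetical evidence per (citation, heading): [classifier score, KNN score, pattern match]
train = [([0.9, 0.8, 1.0], 1), ([0.7, 0.9, 1.0], 1),
         ([0.2, 0.1, 0.0], 0), ([0.3, 0.2, 0.0], 0)]
w, b = train_pointwise_ranker(train)
cands = {"Humans": [0.95, 0.9, 1.0], "Mice": [0.1, 0.2, 0.0], "Neoplasms": [0.6, 0.5, 1.0]}
print(rank_candidates(cands, w, b))  # ['Humans', 'Neoplasms', 'Mice']
```

Normalizing each half separately keeps the sparse TF-IDF part from dominating the dense part simply because it has more dimensions. A pointwise ranker is the simplest learning-to-rank variant; pairwise or listwise objectives are common alternatives.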
Result: On the BioASQ3 challenge data (6,000 citations), DeepMeSH
achieved a micro F-measure of 0.6323, about 2% (relative) higher than
MeSHLabeler's 0.6218 and about 12% higher than MTI's 0.5637. In
addition, DeepMeSH took first place in the BioASQ4 challenge, and
MeSHLabeler took first place in both the BioASQ2 and BioASQ3 challenges.
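For readers unfamiliar with the metric, here is a minimal sketch of the standard micro-averaged F-measure. True positives, false positives and false negatives are pooled over all citations before a single F1 is computed. The gold and predicted label sets below are made up for illustration, not BioASQ data. The last lines recompute the relative gains implied by the reported scores.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1: pool TP/FP/FN over all citations, then compute F1 once."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Made-up MeSH sets for two citations.
gold = [{"Humans", "Neoplasms"}, {"Mice"}]
pred = [{"Humans"}, {"Mice", "Rats"}]
print(round(micro_f1(gold, pred), 4))  # 0.6667  (TP=2, FP=1, FN=1)

# Relative improvements implied by the reported scores.
for name, base in [("MeSHLabeler", 0.6218), ("MTI", 0.5637)]:
    print(f"vs {name}: {100 * (0.6323 / base - 1):.1f}% relative")
# vs MeSHLabeler: 1.7% relative
# vs MTI: 12.2% relative
```

The absolute differences are about 1.05 and 6.86 points of F-measure, so the "2%" and "12%" figures are relative gains.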