ゲノム情報科学研究教育機構 (Genome Information Science Research and Education Organization): Abstract
Date: Aug 29, 2016
Speaker: Shanfeng Zhu, Fudan University
Title: DeepMeSH and MeSHLabeler: Recent progress in large-scale MeSH indexing
Abstract:
Motivation: Medical Subject Headings (MeSH) are used by the National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, the Medical Text Indexer (MTI), to assist MeSH annotation. MTI combines k-nearest neighbors (KNN), pattern matching and indexing rules. Large-scale MeSH indexing is challenging on two sides: the MeSH side and the citation side. On the MeSH side, there are more than 27,000 distinct MeSH headings, and their frequency distribution is highly imbalanced. In addition, the number of MeSH headings assigned to each citation varies widely. On the citation side, existing methods mainly represent text as bag-of-words, which cannot capture semantic and context-dependent information well.
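To make the KNN and bag-of-words ideas above concrete, here is a minimal sketch of KNN-based MeSH suggestion. It assumes a toy "index" of already-annotated citations, a crude whitespace tokenizer, raw term counts and cosine similarity. The functions `bag_of_words`, `cosine` and `knn_mesh_scores` and the sample data are hypothetical illustrations, not MTI's actual implementation. Each of the k most similar citations votes for its MeSH headings, and each vote is weighted by that citation's similarity to the query.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a text to token counts (crude whitespace tokenizer, illustration only)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_mesh_scores(query, indexed, k=2):
    """Score MeSH headings by similarity-weighted votes of the k most similar citations.

    indexed: list of (citation_text, set_of_mesh_headings) pairs (hypothetical data).
    """
    q = bag_of_words(query)
    ranked = sorted(
        ((cosine(q, bag_of_words(text)), headings) for text, headings in indexed),
        key=lambda pair: pair[0],
        reverse=True,
    )
    scores = Counter()
    for similarity, headings in ranked[:k]:
        for heading in headings:
            scores[heading] += similarity
    return scores


# Hypothetical toy index of already-annotated citations.
indexed = [
    ("insulin resistance in type 2 diabetes", {"Diabetes Mellitus, Type 2", "Insulin Resistance"}),
    ("insulin therapy for diabetes patients", {"Diabetes Mellitus", "Insulin"}),
    ("tumor suppressor genes in breast cancer", {"Breast Neoplasms", "Genes, Tumor Suppressor"}),
]
scores = knn_mesh_scores("insulin resistance and diabetes", indexed, k=2)
print(scores.most_common())
```

The sketch also shows the limitation raised above. Similarity depends only on shared surface tokens, so a citation that uses synonyms or paraphrases gets no credit.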

Method: To address the MeSH-side challenge, we propose a novel framework, MeSHLabeler, which integrates multiple types of evidence for accurate MeSH annotation using learning to rank. The evidence includes predictions from many MeSH classifiers, KNN, pattern matching and MTI, as well as the correlations between different MeSH terms. We further propose DeepMeSH, which incorporates deep semantic information into large-scale MeSH indexing. The citation-side challenge is addressed by a new deep semantic representation, D2V-TFIDF, which concatenates sparse and dense semantic representations.
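The two components named in this paragraph can be sketched with toy data, under stated assumptions.

The first function, `d2v_tfidf`, concatenates a sparse TF-IDF vector with a dense document embedding, such as a precomputed doc2vec vector. L2-normalizing each part before concatenation is an assumption of this sketch.

The ranking part combines per-candidate evidence features into one score. The features are hypothetical: a classifier score, a KNN score, a pattern-match flag and an MTI-suggestion flag. To learn the weights, the sketch swaps in a simple pairwise ranking perceptron. This is a stand-in for illustration, not necessarily the learning-to-rank algorithm MeSHLabeler uses. All function names and data here are hypothetical.

```python
import math


def d2v_tfidf(tfidf, vocab_size, embedding):
    """Concatenate a sparse TF-IDF vector with a dense document embedding.

    tfidf: {term_index: weight} over a vocabulary of size vocab_size (sparse part).
    embedding: list of floats, e.g. a precomputed doc2vec vector (dense part).
    Each part is L2-normalized first (an assumption of this sketch); the dense
    dimensions are stored after the vocabulary indices.
    """
    t_norm = math.sqrt(sum(w * w for w in tfidf.values())) or 1.0
    e_norm = math.sqrt(sum(x * x for x in embedding)) or 1.0
    combined = {i: w / t_norm for i, w in tfidf.items()}
    for j, x in enumerate(embedding):
        combined[vocab_size + j] = x / e_norm
    return combined


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(queries, epochs=10, lr=0.1):
    """Ranking perceptron: update weights whenever an incorrect MeSH candidate
    scores at least as high as a correct one for the same citation.

    queries: one list per citation of (evidence_features, is_correct_mesh) pairs.
    """
    w = [0.0] * len(queries[0][0][0])
    for _ in range(epochs):
        for candidates in queries:
            positives = [f for f, relevant in candidates if relevant]
            negatives = [f for f, relevant in candidates if not relevant]
            for p in positives:
                for n in negatives:
                    if dot(w, p) <= dot(w, n):  # mis-ordered pair
                        w = [wi + lr * (pi - ni) for wi, pi, ni in zip(w, p, n)]
    return w


def rank_candidates(w, candidates):
    """Return MeSH candidate names sorted by learned score, best first."""
    return sorted(candidates, key=lambda name: dot(w, candidates[name]), reverse=True)


# Hypothetical features: [classifier, knn, pattern_match, suggested_by_mti]
training = [[
    ([0.9, 0.8, 1.0, 1.0], True),
    ([0.2, 0.1, 0.0, 0.0], False),
    ([0.6, 0.7, 0.0, 1.0], True),
    ([0.5, 0.2, 1.0, 0.0], False),
]]
w = train_pairwise_ranker(training)

candidates = {
    "Insulin Resistance": [0.8, 0.9, 1.0, 1.0],
    "Breast Neoplasms": [0.1, 0.0, 0.0, 0.0],
    "Diabetes Mellitus": [0.7, 0.6, 0.0, 1.0],
}
print(rank_candidates(w, candidates))

vec = d2v_tfidf({0: 3.0, 5: 4.0}, vocab_size=10, embedding=[0.6, 0.8])
print(vec)
```

Normalizing each part separately is one way to keep the high-dimensional sparse part and the low-dimensional dense part on a comparable scale. Other weightings are equally plausible.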

Result: On the BioASQ3 challenge data (6,000 citations), DeepMeSH achieved a micro F-measure of 0.6323. This is a relative improvement of about 2% over MeSHLabeler (0.6218) and about 12% over MTI (0.5637). In addition, DeepMeSH took first place in the BioASQ4 challenge, and MeSHLabeler took first place in both the BioASQ2 and BioASQ3 challenges.
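For readers unfamiliar with the metric, here is a minimal sketch of a micro-averaged F-measure over multi-label MeSH annotations. It also checks the relative improvements implied by the reported scores. The micro-averaging pools true positives, false positives and false negatives over all citations before computing precision and recall.

The helper names and toy labels are hypothetical. The official BioASQ evaluation tooling may differ in details.

```python
def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure: pool TP/FP/FN over all citations, then compute P and R."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # headings predicted and correct
        fp += len(p - g)  # headings predicted but absent from the gold annotation
        fn += len(g - p)  # gold headings that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_gain(new, old):
    """Relative improvement of `new` over `old`."""
    return (new - old) / old


# Hypothetical toy annotations for two citations.
gold = [{"A", "B", "C"}, {"D"}]
predicted = [{"A", "B", "E"}, {"D", "F"}]
print(micro_f_measure(gold, predicted))  # P = 3/5, R = 3/4, so F = 2/3 (printed as a float)

# Reported scores: DeepMeSH 0.6323, MeSHLabeler 0.6218, MTI 0.5637
print(f"{relative_gain(0.6323, 0.6218):.1%}")  # 1.7%
print(f"{relative_gain(0.6323, 0.5637):.1%}")  # 12.2%
```

The quoted "2%" and "12%" therefore match relative gains of about 1.7% and 12.2%. In absolute terms, the differences are about 0.011 and 0.069 in F-measure.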
