Education and Research Organization for Genome Information Science  Abstract
Date March 8, 2011
Speaker Dr. Shanfeng Zhu, Associate Professor, Fudan University, China
Title Efficient Semi-Supervised MEDLINE Document Clustering
Abstract Combining multiple information sources for biomedical document clustering is a subject of intense research. For example, the performance of document clustering was recently enhanced by using both content and MeSH (Medical Subject Headings) semantic information, although the two were combined only linearly. A simple linear combination can be ineffective, because its representation space is too limited to combine multiple information sources while accounting for differences in their reliability. To address this problem, we propose a new semi-supervised clustering method, SSNCut, which incorporates positive (must-link) and negative (cannot-link) constraints into the cost function of spectral learning with normalized cut. We apply SSNCut to MEDLINE document clustering, reasonably assuming that document pairs with high semantic similarity carry positive constraints (they should be in the same cluster) and that pairs with low semantic similarity carry negative constraints (they should be in different clusters). Experimental results on 100 different datasets of MEDLINE records show that SSNCut outperformed a linear combination method, and the difference was statistically significant.
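
The idea of deriving must-link and cannot-link constraints from semantic similarity and feeding them into a normalized-cut clustering can be illustrated with a minimal Python sketch. This is not the SSNCut algorithm itself, whose cost function the abstract does not spell out. It assumes the simpler affinity-override scheme, in which must-link pairs are set to similarity 1 and cannot-link pairs to 0 before a two-way spectral split. The function names, the thresholds 0.8 and 0.1, and the toy matrices are all hypothetical.

```python
import numpy as np

def constraints_from_semantics(S, high=0.8, low=0.1):
    """Derive (must_link, cannot_link) pairs from a semantic similarity matrix.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    S = np.asarray(S, dtype=float)
    must, cannot = [], []
    n = S.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if S[i, j] >= high:
                must.append((i, j))      # likely the same cluster
            elif S[i, j] <= low:
                cannot.append((i, j))    # likely different clusters
    return must, cannot

def constrained_ncut_bipartition(W, must_link=(), cannot_link=()):
    """Two-way normalized-cut split with constraints written into the affinities.

    Assumption: constraints are applied by overriding affinities (1 for
    must-link, 0 for cannot-link); SSNCut instead modifies the cost function.
    Every document must keep a positive total affinity (degree).
    """
    W = np.array(W, dtype=float)         # copy so the caller's matrix is untouched
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0
    d = W.sum(axis=1)                    # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    # Second-smallest eigenvector, mapped back to the relaxed cut indicator
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)

# Toy data: six documents, content similarity favours {0,1,2} versus {3,4,5}.
content = np.full((6, 6), 0.1)
content[:3, :3] = 0.8
content[3:, 3:] = 0.8
np.fill_diagonal(content, 1.0)

# Hypothetical semantic similarities: one confident pair in each direction.
semantic = np.full((6, 6), 0.5)
semantic[0, 1] = semantic[1, 0] = 0.9
semantic[2, 3] = semantic[3, 2] = 0.02

must, cannot = constraints_from_semantics(semantic)
labels = constrained_ncut_bipartition(content, must, cannot)
print(must, cannot, labels)
```

Which group receives label 0 and which receives label 1 depends on the sign of the eigenvector returned by the solver, so only the grouping itself is meaningful. Unlike a linear combination of the two similarity matrices, the semantic information here touches only the pairs on which it is confident and leaves the remaining content similarities unchanged.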