Education and Research Organization for Genome Information Science (ゲノム情報科学研究教育機構) / Abstract (October 31, 2003)

In biological data, an object is often described by two or more representations. For example, a bacterium can be represented by several marker sequences. In most cases, informative representations are expensive to obtain, whereas cheaper representations are less useful for classification. When a kernel matrix is derived from an expensive representation, the entries involving samples for which that representation is unavailable must be left missing. To complete these missing entries, we exploit an auxiliary kernel matrix derived from a cheaper representation. A parametric model of kernel matrices is constructed as the set of spectral variants of the auxiliary kernel matrix, and the missing entries are estimated by fitting this model to the existing entries. For model fitting, we adopt the em algorithm based on the information geometry of positive definite matrices. I will report promising results on several bioinformatics tasks, e.g., bacteria clustering with two marker sequences: 16S and gyrB.
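
The fitting procedure described above can be illustrated with a minimal sketch. It is not the speaker's implementation, and it rests on several assumptions: "spectral variants" is read here as matrices that keep the eigenvectors of the auxiliary kernel and free its eigenvalues; the fit is measured by the KL divergence between zero-mean Gaussians whose covariances are the two matrices; and the samples are ordered so that those with the expensive representation come first. The names `complete_kernel`, `kl_divergence`, `n_iter`, and `ridge` are hypothetical. Under these assumptions both steps have closed forms: the e-step fills the missing blocks from the current model by Gaussian conditioning, and the m-step resets each eigenvalue to the quadratic form of the completed matrix along the corresponding eigenvector.

```python
import numpy as np


def kl_divergence(D, M):
    """KL divergence between zero-mean Gaussians with covariances D and M."""
    n = D.shape[0]
    A = np.linalg.solve(M, D)                  # A = M^{-1} D
    _, logdet = np.linalg.slogdet(A)
    return 0.5 * (np.trace(A) - logdet - n)


def complete_kernel(K_obs, K_aux, n_iter=50, ridge=1e-8):
    """Complete a partially observed kernel matrix (illustrative sketch).

    K_obs : (v, v) positive definite kernel on the first v samples.
    K_aux : (n, n) auxiliary kernel on all n samples, observed samples first.
    Returns the completed (n, n) matrix and the KL value after each iteration.
    """
    v, n = K_obs.shape[0], K_aux.shape[0]
    # Assumed model: M(b) = V diag(b) V^T, with V the eigenvectors of K_aux.
    b, V = np.linalg.eigh(K_aux + ridge * np.eye(n))
    b = np.maximum(b, ridge)                   # keep the model positive definite
    history = []
    for _ in range(n_iter):
        M = (V * b) @ V.T
        M_vv, M_vh, M_hh = M[:v, :v], M[:v, v:], M[v:, v:]

        # e-step: keep the observed block, fill the rest from the model.
        W = np.linalg.solve(M_vv, M_vh)        # W = M_vv^{-1} M_vh
        D = np.empty_like(M)
        D[:v, :v] = K_obs
        D[:v, v:] = K_obs @ W
        D[v:, :v] = D[:v, v:].T
        D[v:, v:] = M_hh - M_vh.T @ W + W.T @ K_obs @ W

        # m-step: b_j = v_j^T D v_j minimises KL(D || M(b)) over b.
        b = np.einsum("ij,ik,kj->j", V, D, V)
        b = np.maximum(b, ridge)
        history.append(kl_divergence(D, (V * b) @ V.T))
    return D, history
```

Calling `complete_kernel(K_obs, K_aux)` returns a full matrix whose top-left block equals `K_obs` exactly; the recorded KL values should not increase from one iteration to the next, which is a convenient sanity check when experimenting with the sketch.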