ゲノム情報科学研究教育機構  Abstract
Date Jul 1, 2016
Speaker Prof. Yutaka Akiyama, School of Computing, Tokyo Institute of Technology
Title Massively parallel bioinformatics computing on K-computer, TSUBAME, and Azure: Metagenome analysis and exhaustive protein-protein interaction prediction
Abstract We have developed two parallelized bioinformatics software tools: GHOST-MP for metagenome sequence analysis, and MEGADOCK for protein-protein 3-D docking.

For functional annotation of metagenome sequences, we developed an algorithm and software called GHOSTX, which is more than 160 times faster than BLASTX while retaining sufficient search sensitivity. GHOSTX is now used in GhostKOALA, the KEGG metagenome analysis service. We have also developed a massively parallel version, GHOST-MP, with hybrid OpenMP/MPI parallelization. GHOST-MP showed excellent scalability up to 200,000 CPU cores on K-computer (85% weak scaling efficiency). GHOST-MP enables sensitive homology searches of billions of DNA reads against amino-acid databases such as nr-aa or KEGG GENES. We have been using GHOST-MP for human oral microbiome analysis, in particular in a clinical study of periodontitis.

MEGADOCK, on the other hand, enables millions of protein-protein docking calculations using tertiary structure information. The calculation is based on a rigid-body model, and the evaluation function (a kind of convolution operation) is computed very efficiently using FFT. MEGADOCK showed excellent scalability up to 700,000 CPU cores on K-computer (91% strong scaling efficiency). Using MEGADOCK, we analyzed 2 million protein pairs in the EGFR pathway and found some promising PPI candidates with SPR experiments.
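
As a rough illustration of why FFT makes a convolution-type evaluation function cheap, the sketch below scores every cyclic translation of one 3-D grid against another in a single pass. It is a minimal toy under stated assumptions, not MEGADOCK's actual scoring function: the grids are plain occupancy blocks, the score is simple overlap, rotations are ignored, and the names fft_correlation_scores and direct_score are hypothetical.

```python
import numpy as np


def fft_correlation_scores(receptor, ligand):
    """Return scores[t] = sum_x receptor[x] * ligand[x - t] for all cyclic shifts t."""
    receptor_hat = np.fft.fftn(receptor)
    ligand_hat = np.fft.fftn(ligand)
    # Correlation theorem: a correlation in real space is a product in Fourier space.
    return np.real(np.fft.ifftn(receptor_hat * np.conj(ligand_hat)))


def direct_score(receptor, ligand, shift):
    """Brute-force score for a single cyclic shift, used only as a cross-check."""
    return float(np.sum(receptor * np.roll(ligand, shift, axis=(0, 1, 2))))


# Toy occupancy grids (hypothetical data): one 2x2x2 block in each 8x8x8 grid.
receptor = np.zeros((8, 8, 8))
receptor[2:4, 3:5, 4:6] = 1.0
ligand = np.zeros((8, 8, 8))
ligand[0:2, 0:2, 0:2] = 1.0

scores = fft_correlation_scores(receptor, ligand)
# The highest-scoring translation is the one that moves the ligand block onto the receptor block.
best_shift = tuple(int(i) for i in np.unravel_index(np.argmax(scores), scores.shape))
print(best_shift, round(float(scores[best_shift]), 6))
```

For a grid of N points per axis, the direct approach costs on the order of N^6 operations over all translations, while the FFT route costs on the order of N^3 log N, which is what makes exhaustive translational scans practical.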

GHOST-MP and MEGADOCK are also available on Microsoft Azure cloud services. For example, MEGADOCK showed good scalability up to 1000 cores even on the public cloud.
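
The weak and strong scaling efficiencies quoted above follow the usual definitions, which can be sketched as below. The function names and the timings in the usage lines are hypothetical and chosen only for arithmetic clarity; they are not measurements from GHOST-MP or MEGADOCK.

```python
def weak_scaling_efficiency(t_base, t_scaled):
    """Weak scaling: the problem size grows in proportion to the core count,
    so ideal run time stays constant. Efficiency = T_base / T_scaled."""
    return t_base / t_scaled


def strong_scaling_efficiency(t_base, cores_base, t_scaled, cores_scaled):
    """Strong scaling: the problem size is fixed, so ideal run time shrinks as
    cores_base / cores_scaled. Efficiency = (T_base * P_base) / (T_scaled * P_scaled)."""
    return (t_base * cores_base) / (t_scaled * cores_scaled)


# Made-up timings in seconds, for illustration only.
weak = weak_scaling_efficiency(t_base=100.0, t_scaled=125.0)
strong = strong_scaling_efficiency(t_base=80.0, cores_base=1000,
                                   t_scaled=25.0, cores_scaled=4000)
print(f"weak: {weak:.0%}, strong: {strong:.0%}")
```

An efficiency of 100% means perfect scaling; values below that reflect communication, load imbalance, and other parallel overheads.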

[1] Suzuki S, Kakuta M, Ishida T, Akiyama Y. GHOSTX: an improved sequence homology search algorithm using a query suffix array and a database suffix array, PLOS ONE, 9(8): e103833, 2014.

[2] Ohue M, Shimoda T, Suzuki S, Matsuzaki Y, Ishida T, Akiyama Y. MEGADOCK 4.0: an ultra-high-performance protein-protein docking software for heterogeneous supercomputers, Bioinformatics, 30(22): 3281-3283, 2014.
