Date |
Jul 1, 2016 |
Speaker |
Prof. Yutaka Akiyama, School of Computing, Tokyo Institute of Technology
|
Title |
Massively parallel bioinformatics computing on K-computer, TSUBAME, and Azure: Metagenome analysis and exhaustive protein-protein interaction prediction
|
Abstract |
We have developed two parallelized bioinformatics software tools:
GHOST-MP for metagenome sequence analysis, and MEGADOCK for
protein-protein 3-D docking.
For functional annotation of metagenome sequences, we developed a
dedicated algorithm and software called GHOSTX, which is more than 160
times faster than BLASTX while retaining sufficient search sensitivity.
GHOSTX is now used in GhostKOALA, the KEGG metagenome analysis service.
We have also developed a massively parallel version of GHOSTX,
GHOST-MP, which combines OpenMP and MPI parallelization. GHOST-MP
showed excellent scalability up to 200,000 CPU cores on the K-computer
(85% weak scaling efficiency). GHOST-MP enables sensitive homology
search of billions of DNA reads against amino-acid databases such as
nr-aa or KEGG GENES. We have been using GHOST-MP for human oral
microbiome analysis, in particular in a clinical study of periodontitis.
MEGADOCK, in turn, enables millions of protein-protein docking
calculations using tertiary structure information. The calculation is
based on a rigid-body model, and the evaluation function (a kind of
convolution operation) is computed very efficiently with FFT.
MEGADOCK showed excellent scalability up to 700,000 CPU
cores on the K-computer (91% strong scaling efficiency). Using MEGADOCK,
we have analyzed 2 million protein pairs in the EGFR pathway, and found
some promising PPI candidates with SPR experiments.
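To illustrate why an FFT makes a convolution-type evaluation function cheap, here is a minimal sketch in Python with NumPy. It is not MEGADOCK's actual scoring function: the grids, the function names, and the plain product score are assumptions made for illustration, rotations of the ligand are left out, and translations wrap around the grid cyclically. The point is only that one forward/inverse FFT pass scores every translation at once, instead of one summation per translation.

```python
import numpy as np

def fft_translation_scores(receptor, ligand):
    """Score all cyclic translations t of `ligand` against `receptor`.

    score[t] = sum_x receptor[x] * ligand[x - t]
    Both inputs are real-valued 3-D grids of the same shape (hypothetical
    voxelized molecules; the real scoring terms are not modeled here).
    """
    r_hat = np.fft.fftn(receptor)
    l_hat = np.fft.fftn(ligand)
    # Correlation theorem: FFT(score) = FFT(receptor) * conj(FFT(ligand))
    return np.fft.ifftn(r_hat * np.conj(l_hat)).real

def direct_translation_score(receptor, ligand, shift):
    """Same score for a single translation, computed by direct summation."""
    moved = np.roll(ligand, shift, axis=(0, 1, 2))  # moved[x] = ligand[x - shift]
    return float(np.sum(receptor * moved))

# Toy example: one occupied voxel in each grid.
receptor = np.zeros((4, 4, 4))
receptor[2, 1, 3] = 1.0
ligand = np.zeros((4, 4, 4))
ligand[0, 0, 0] = 1.0

scores = fft_translation_scores(receptor, ligand)
best = tuple(int(i) for i in np.unravel_index(np.argmax(scores), scores.shape))
print("best translation:", best)
```

For a grid with N voxels, the direct approach costs on the order of N operations per translation and N translations, while the FFT route costs on the order of N log N for all translations together. In a rigid-body search this pass would be repeated for each sampled ligand rotation.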
GHOST-MP and MEGADOCK are also available on Microsoft Azure cloud
services. For example, MEGADOCK showed good scalability up to 1000
cores even on the public cloud.
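For readers unfamiliar with the two efficiency figures quoted above, the sketch below uses the usual textbook definitions. Strong scaling keeps the total problem size fixed, so the ideal runtime shrinks in proportion to the core count. Weak scaling keeps the work per core fixed, so the ideal runtime stays constant. The function names and the timings are hypothetical and are not measurements from the talk.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Fixed total problem size: ideal runtime falls as 1 / cores.

    t_base: runtime on the baseline run with n_base cores
    t_n:    runtime on the larger run with n cores
    """
    return (t_base * n_base) / (t_n * n)

def weak_scaling_efficiency(t_base, t_n):
    """Fixed work per core: ideal runtime is unchanged as cores grow."""
    return t_base / t_n

# Hypothetical timings in seconds, chosen only to make the arithmetic easy.
strong = strong_scaling_efficiency(t_base=100.0, n_base=1000, t_n=12.5, n=10000)
weak = weak_scaling_efficiency(t_base=100.0, t_n=125.0)
print(f"strong: {strong:.0%}, weak: {weak:.0%}")
```

An efficiency of 1.0 means the run matched the ideal, and values below 1.0 show how much is lost to communication, load imbalance, and serial sections.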
[1] Suzuki S, Kakuta M, Ishida T, Akiyama Y. GHOSTX: an improved
sequence homology search algorithm using a query suffix array and a
database suffix array, PLOS ONE, 9(8): e103833, 2014.
[2] Ohue M, Shimoda T, Suzuki S, Matsuzaki Y, Ishida T, Akiyama Y.
MEGADOCK 4.0: an ultra-high-performance protein-protein docking
software for heterogeneous supercomputers, Bioinformatics, 30(22):
3281-3283, 2014.
|
|