The large and growing number of fully sequenced genomes is opening new opportunities for large-scale evolutionary analysis and posing major computational challenges. A multispecies, genome-scale analysis using thousands of precise molecular markers has been developed and applied to the Drosophila phylogeny. It is a simple and fast computational approach for inferring rearrangement and/or breakpoint phylogeny and probable ancestral gene order. It exploited a unique dataset of 12 closely related eukaryotic species within the order Diptera of the class Insecta, with the mosquito Anopheles gambiae as an outgroup. The analysis employed discrete elements consisting of all neighboring pairs of homologous genetic elements across a range of species. These homologous genetic elements can include orthologous protein-coding genes with chromosome arm indexing, as used here, as well as transposable elements, RNA genes, and microRNAs, or any genetic sequence elements that can be identified as homologous with some level of certainty. The approach accommodates varying levels of incompleteness in draft genome assemblies and different degrees of homoplasy, and it can suggest potential assembly errors. In particular, when applied to the very large genomic data sets now available, the method avoids the sensitivity to sample size and to a level of paralogue-homologue misidentification that has limited the reliability of most past molecular phylogenetic reconstruction approaches.
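The idea of treating neighboring pairs of homologous elements as discrete characters can be illustrated with a minimal Python sketch. This is not the implementation used in the study: the function names (`neighbor_pairs`, `pair_distance`), the toy gene identifiers, and the choice of a symmetric-difference count as the distance are assumptions made for illustration only. Missing elements, as in an incomplete draft assembly, are handled here simply by ignoring pairs that involve an element absent from either genome.

```python
from itertools import combinations


def neighbor_pairs(arms):
    """Return the set of unordered neighboring pairs in one genome.

    `arms` maps a chromosome-arm label to the ordered list of
    homologous element identifiers found on that arm (hypothetical format).
    """
    pairs = set()
    for order in arms.values():
        # Adjacent elements on the same arm form one neighboring pair.
        for left, right in zip(order, order[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def pair_distance(arms_a, arms_b):
    """Count neighboring pairs present in exactly one of two genomes.

    Only pairs whose two elements occur in both genomes are compared,
    a simple (assumed) way to tolerate incomplete assemblies.
    """
    elems_a = {g for order in arms_a.values() for g in order}
    elems_b = {g for order in arms_b.values() for g in order}
    common = elems_a & elems_b
    pairs_a = {p for p in neighbor_pairs(arms_a) if p <= common}
    pairs_b = {p for p in neighbor_pairs(arms_b) if p <= common}
    return len(pairs_a ^ pairs_b)


# Toy gene orders on a single arm; identifiers are made up.
genomes = {
    "sp1": {"2L": ["g1", "g2", "g3", "g4"]},
    "sp2": {"2L": ["g1", "g2", "g4", "g3"]},
    "sp3": {"2L": ["g1", "g3", "g2", "g4"]},
}

# Pairwise distances between all species.
distances = {
    (a, b): pair_distance(genomes[a], genomes[b])
    for a, b in combinations(sorted(genomes), 2)
}
```

In this toy example `sp1` and `sp2` differ by two neighboring pairs, while `sp3` differs from each of them by four. A distance table of this kind could then be passed to any standard distance-based tree-building method.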
This study is an outstanding example of the graduate research carried out collaboratively within Boston University's Bioinformatics program. The program encourages cross-disciplinary and inter-institutional collaborations and includes students from a very wide range of backgrounds. Students from computer science, molecular biology, physics, and engineering work alongside senior researchers, both theoretical and experimental.