Date: Oct 11, 2016
Speaker: Kin Fai Au, University of Iowa
Title: Transcriptome analysis at the gene isoform level using hybrid sequencing
Abstract:
New-generation sequencing techniques can provide highly
informative insights into the transcriptome. However, currently
available transcriptome analysis tools are designed for Second Generation
Sequencing (SGS) short reads, whose limited length can introduce bias
and even errors in downstream analysis. Although the recent application
of Third Generation Sequencing (TGS) long reads, such as PacBio and
Oxford Nanopore Technologies data, to human transcriptome analysis has
greatly advanced the field, key bioinformatics analysis platforms are
still missing. Furthermore, hybrid sequencing (Hybrid-Seq), which
integrates SGS short-read data into the analysis of TGS data, can
improve the overall performance and resolution of the output data.
Indeed, a handful of existing publications demonstrate the potential
power of Hybrid-Seq for genome data analysis. Here I present a series
of bioinformatics methods to
analyze the transcriptome at the gene isoform level. These methods
include 1) LSC to correct sequencing errors in PacBio data; 2) IDP
to identify and quantify gene isoforms; 3) IDP-fusion to annotate
fusion genes and identify fusion gene isoforms; 4) IDP-ASE to phase
genotypes and quantify allele-specific expression (ASE) at the gene
isoform level; and 5) IDP-denovo to assemble and annotate transcripts
de novo for non-model organisms without relying on a reference genome.
Proof-of-concept applications to breast cancer cells and human
embryonic stem cells reveal the isoform-level complexity of fusion
gene expression and allele-specific expression. They also uncover
novel genes involved in pluripotency regulation, novel
tumorigenesis-relevant gene fusions, and ASE bias in oncogenes and
pluripotency markers.