Next-generation sequencing technology has had a phenomenal impact on biology, arguably as significant as that of PCR and cloning. Enamored by the rapid advances in this technology, we began to apply it to an interesting problem: deciphering the transcriptional networks involved in the differentiation of embryonic stem cells into cardiomyocytes (funded by an NHLBI grant). Our analysis of the data quickly revealed two limitations. First, the random-primer methodology used in sequencing by synthesis limits the coverage and uniqueness of the RNA reads. Second, large amounts of RNA are needed to detect transcripts across a reasonable dynamic range. This is a major challenge when one is painstakingly isolating only hundreds of cells at intermediate stages of a stem cell's differentiation into a specified lineage. I came up with a simple idea: use designed primers that together cover most of the transcriptome. The question is how many hexamer primer sequences, pseudo-unique across the transcriptome and flanked by unique sequence regions in each gene, are needed for sequencing by synthesis to cover a large part of the transcriptome. Answering it involved two steps. The first was computational: using bioinformatics, we had to identify a small but unique set of primers for the transcriptome. A postdoctoral associate in my laboratory developed a suffix-tree-based method and obtained 44 primers that would give over 90% coverage of the mouse transcriptome. The second was experimental: an exceptional graduate student in my laboratory then worked for two years (!) to develop the proper experimental protocol. We now have proof of concept for this exciting technology (DP-seq), which has the potential to change the landscape of gene expression measurement. Our first paper, describing this technology and its application to the transcriptomes of germ layers during stem cell differentiation, was published recently.
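The primer-selection step can be pictured as a set-cover problem: choose as few hexamers as possible such that most transcripts contain at least one of them. The sketch below is a minimal greedy set-cover heuristic, swapped in purely for illustration; it is not the suffix-tree-based method described above. It assumes a primer "covers" a transcript whenever the hexamer occurs in the transcript's sense strand, and it ignores the flanking-uniqueness requirement, strandedness, and primer binding properties. The function names and the toy sequences in `TOY_TRANSCRIPTS` are hypothetical.

```python
from collections import defaultdict


def hexamer_index(transcripts, k=6):
    """Map each k-mer to the set of transcript IDs whose sequence contains it."""
    index = defaultdict(set)
    for tid, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(tid)
    return index


def greedy_primer_set(transcripts, target=0.9, k=6):
    """Greedily pick k-mers until at least `target` of the transcripts contain
    one of the chosen k-mers, or until no k-mer adds new coverage.

    Simplified illustration only: a transcript counts as covered if any chosen
    k-mer occurs anywhere in its (sense-strand) sequence.
    """
    if not transcripts:
        return [], 0.0
    index = hexamer_index(transcripts, k)
    uncovered = set(transcripts)
    needed = target * len(transcripts)
    chosen = []
    while len(transcripts) - len(uncovered) < needed:
        # Pick the k-mer that covers the most still-uncovered transcripts
        # (ties go to the k-mer seen first).
        best = max(index, key=lambda kmer: len(index[kmer] & uncovered), default=None)
        if best is None or not index[best] & uncovered:
            break  # no remaining k-mer improves coverage
        chosen.append(best)
        uncovered -= index[best]
    coverage = 1 - len(uncovered) / len(transcripts)
    return chosen, coverage


# Hypothetical toy "transcriptome" for illustration only.
TOY_TRANSCRIPTS = {
    "t1": "ACGTACGTAA",
    "t2": "TTACGTACCC",
    "t3": "GGGCCCAAAT",
    "t4": "GGGCCCTTTA",
}

if __name__ == "__main__":
    print(greedy_primer_set(TOY_TRANSCRIPTS, target=1.0))
    # prints (['ACGTAC', 'GGGCCC'], 1.0)
```

On a real transcriptome the same coverage-versus-primer-count trade-off applies, but the candidate filtering (uniqueness of flanking regions, avoiding repetitive sites) is where most of the actual design effort lies, and a greedy heuristic only approximates the minimal primer set.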
We have also filed a disclosure with the UCSD Technology Transfer Office, and a patent application is pending. A highlight of our technology is its ability to detect transcripts across a large dynamic range from a very low RNA input (tens of picograms). We compared our method with recent low-input sequencing methods, namely Smart-Seq and CEL-Seq, and demonstrated both the power and the limitations of DP-seq relative to them. Most importantly, we characterized the stochastic noise arising from technical variation in all of these amplification protocols when starting from low amounts of RNA.
We use next-generation sequencing extensively across many projects in my laboratory. Our objectives are to a) analyze the data to reconstruct networks and predict phenotypes, and b) integrate it with other -omics data to understand the systemic behavior of cells and tissues.