Simon A. Hardwick, Ira W. Deveson and Tim R. Mercer.
Nature Reviews Genetics (2017) June 19th
Next-generation sequencing (NGS) provides a broad investigation of the genome, and it is being readily applied for the diagnosis of disease-associated genetic features. However, the interpretation of NGS data remains challenging owing to the size and complexity of the genome and the technical errors that are introduced during sample preparation, sequencing and analysis. These errors can be understood and mitigated through the use of reference standards — well-characterized genetic materials or synthetic spike-in controls that help to calibrate NGS measurements and to evaluate diagnostic performance. The informed use of reference standards, and associated statistical principles, ensures rigorous analysis of NGS data and is essential for its future clinical use.
Ted Wong, Simon A. Hardwick, Ira W. Deveson and Tim R. Mercer.
Bioinformatics (2017) Jan 27th
Spike-in controls are synthetic nucleic-acid sequences that are added to a user’s sample and constitute internal standards for subsequent steps in the next-generation sequencing workflow. The Anaquin software toolkit can be used to analyze the performance of spike-in controls at multiple steps during RNA sequencing or genome sequencing analysis, providing useful diagnostic statistics, data visualization and sample normalization. The software is implemented in C++/R and is freely available under a BSD license. The source code is available from github.com/student-t/Anaquin, binaries and a user manual from www.enantiome.com/software, and an R package from bioconductor.org/packages/Anaquin.
Ira W. Deveson*, Wendy Y. Chen*, Ted Wong, Simon A. Hardwick, Stacey B. Andersen, Lars K. Nielsen, John S. Mattick and Tim R. Mercer.
Nature Methods (2016) Aug 8th
The identification of genetic variation with next-generation sequencing is confounded by the complexity of the human genome sequence and biases that arise during library preparation, sequencing and analysis. We have developed a set of synthetic DNA standards, termed sequins, that emulate human genetic features and constitute qualitative and quantitative spike-in controls for genome sequencing. Sequencing reads derived from sequins align exclusively to an artificial in silico reference chromosome, rather than the human reference genome, allowing them to be partitioned for parallel analysis. Here we use this approach to represent common and clinically relevant genetic variation, ranging from single nucleotide variants to large structural rearrangements and copy number variation. We validate the design and performance of sequin standards by comparison to examples in the NA12878 reference genome and demonstrate their utility during the detection and quantification of variants. We provide sequins as a standardized, quantitative resource against which human genetic variation can be measured and diagnostic performance assessed.
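The partitioning step described in the abstract above can be illustrated with a minimal sketch. It assumes alignments are available as SAM-format text lines (where the third tab-separated field is the reference name) and that the in silico chromosome is named `chrIS`; that name and the function `partition_sam_lines` are illustrative assumptions, not taken from the paper or from Anaquin.

```python
# Hypothetical name for the artificial in silico chromosome (an assumption).
SEQUIN_CHROM = "chrIS"


def partition_sam_lines(lines, sequin_chrom=SEQUIN_CHROM):
    """Split SAM alignment lines into (sequin_reads, sample_reads).

    A read is treated as sequin-derived when its reference name (RNAME,
    the third SAM field) equals the in silico chromosome. Header lines
    are skipped; unmapped reads (RNAME "*") fall into the sample group.
    """
    sequin_reads, sample_reads = [], []
    for line in lines:
        if line.startswith("@"):
            continue  # SAM header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # malformed record, ignore in this sketch
        if fields[2] == sequin_chrom:
            sequin_reads.append(line)
        else:
            sample_reads.append(line)
    return sequin_reads, sample_reads
```

The two groups can then be analyzed in parallel: the sequin reads against the known synthetic variants, and the sample reads against the human reference genome.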
Simon A. Hardwick*, Wendy Y. Chen*, Ted Wong, Ira W. Deveson, James Blackburn, Stacey B. Andersen, Lars K. Nielsen, John S. Mattick and Tim R. Mercer.
Nature Methods (2016) Aug 8th
RNA sequencing (RNAseq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNAseq analysis. We have developed a set of spike-in RNA standards, termed ‘sequins’ (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNAseq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.
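As a rough illustration of how spike-ins can yield scaling factors for normalization between samples, the sketch below takes the median ratio of spike-in counts between a reference sample and another sample. This is one simple approach chosen for illustration, not necessarily the method used in the paper or in Anaquin; the function names and the sequin identifiers are hypothetical.

```python
from statistics import median


def spike_in_scaling_factor(reference_counts, sample_counts):
    """Return the median reference/sample count ratio over shared spike-ins.

    Both arguments map spike-in names to observed counts. Spike-ins with a
    zero count in either sample are skipped to avoid unstable ratios.
    """
    ratios = [
        reference_counts[name] / sample_counts[name]
        for name in reference_counts
        if sample_counts.get(name, 0) > 0 and reference_counts[name] > 0
    ]
    if not ratios:
        raise ValueError("no shared spike-ins with non-zero counts")
    return median(ratios)


def normalize(counts, factor):
    """Scale every count in a sample by the spike-in derived factor."""
    return {name: value * factor for name, value in counts.items()}


# Usage with made-up counts: the second sample was sequenced more shallowly.
reference = {"sequin_A": 100, "sequin_B": 200, "sequin_C": 400}
sample = {"sequin_A": 50, "sequin_B": 100, "sequin_C": 200}
factor = spike_in_scaling_factor(reference, sample)
scaled_genes = normalize({"geneX": 10, "geneY": 30}, factor)
```

Because the same amount of spike-in is added to each sample, a consistent shift in spike-in counts reflects a technical difference rather than a biological one, which is why the factor can be applied to the accompanying endogenous genes.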
Garvan Institute of Medical Research © 2016. All rights reserved.