Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Development of strategies for SNP detection in RNA-seq data: application to lymphoblastoid cell lines and evaluation using 1000 genomes data

Quinn, Emma M., Cormican, Paul, Kenny, Elaine M., Hill, Matthew, Anney, Richard ORCID: https://orcid.org/0000-0002-6083-407X, Gill, Michael, Corvin, Aiden P. and Morris, Derek W. 2013. Development of strategies for SNP detection in RNA-seq data: application to lymphoblastoid cell lines and evaluation using 1000 genomes data. PLoS ONE 8 (3) , e58815. 10.1371/journal.pone.0058815

PDF - Published Version
Available under License Creative Commons Attribution.


Abstract

Next-generation RNA sequencing (RNA-seq) maps and analyzes transcriptomes and generates data on sequence variation in expressed genes. There are few reported studies on analysis strategies to maximize the yield of quality RNA-seq SNP data. We evaluated the performance of different SNP-calling methods following alignment to both genome and transcriptome by applying them to RNA-seq data from a HapMap lymphoblastoid cell line sample and comparing results with sequence variation data from 1000 Genomes. We determined that the best method to achieve high specificity and sensitivity, and the greatest number of SNP calls, is to remove duplicate sequence reads after alignment to the genome and to call SNPs using SAMtools. The accuracy of SNP calls depends on the sequence coverage available. In terms of specificity, 89% of RNA-seq SNP calls were true variants where coverage was >10X. In terms of sensitivity, at >10X coverage 92% of all expected SNPs in expressed exons could be detected. Overall, the results indicate that RNA-seq SNP data are a very useful by-product of sequence-based transcriptome analysis. If RNA-seq is applied to disease tissue samples, and assuming that genes carrying mutations relevant to disease biology are being expressed, a very high proportion of these mutations can be detected.
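The two evaluation measures in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `evaluate_calls`, the data layout, and the toy sites below are hypothetical and chosen only for illustration. Following the abstract's usage, "specificity" here means the share of RNA-seq SNP calls above the coverage threshold that appear in the reference set (for example, 1000 Genomes), and "sensitivity" means the share of reference SNPs at sufficiently covered positions that were called.

```python
from typing import Dict, Tuple

# A site is (chromosome, position, alternate allele). Hypothetical layout.
Site = Tuple[str, int, str]


def evaluate_calls(
    calls: Dict[Site, int],
    truth_depth: Dict[Site, int],
    min_depth: int = 10,
) -> Tuple[float, float]:
    """Return (specificity, sensitivity) as the abstract uses the terms.

    calls:       RNA-seq SNP calls mapped to read depth at the site.
    truth_depth: reference SNPs mapped to RNA-seq read depth at the site,
                 whether or not a call was made there.
    min_depth:   coverage threshold; sites must exceed it (>10X by default).
    """
    # Keep only calls and reference SNPs with coverage above the threshold.
    kept_calls = {site for site, depth in calls.items() if depth > min_depth}
    expected = {site for site, depth in truth_depth.items() if depth > min_depth}

    # Calls confirmed by the reference set.
    true_calls = kept_calls & truth_depth.keys()
    # Reference SNPs that were recovered by RNA-seq.
    detected = expected & kept_calls

    specificity = len(true_calls) / len(kept_calls) if kept_calls else float("nan")
    sensitivity = len(detected) / len(expected) if expected else float("nan")
    return specificity, sensitivity


# Toy data, invented for illustration only (not results from the paper).
calls = {
    ("chr1", 100, "A"): 25,
    ("chr1", 200, "G"): 12,
    ("chr2", 50, "T"): 30,   # not in the reference set
    ("chr2", 75, "C"): 4,    # below the coverage threshold
}
truth_depth = {
    ("chr1", 100, "A"): 25,
    ("chr1", 200, "G"): 12,
    ("chr1", 300, "C"): 18,  # covered, but not called
    ("chr2", 75, "C"): 4,    # below the coverage threshold
}

specificity, sensitivity = evaluate_calls(calls, truth_depth)
print(f"specificity={specificity:.2f} sensitivity={sensitivity:.2f}")
```

In a real analysis the two dictionaries would be populated from variant call files and per-site depth for the expressed exons; the sketch only shows how the coverage threshold enters both measures.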

Item Type: Article
Date Type: Publication
Status: Published
Schools: MRC Centre for Neuropsychiatric Genetics and Genomics (CNGG); Medicine
Subjects: R Medicine > R Medicine (General)
Publisher: Public Library of Science
ISSN: 1932-6203
Date of First Compliant Deposit: 14 February 2019
Last Modified: 12 May 2023 07:54
URI: https://orca.cardiff.ac.uk/id/eprint/85222

Citation Data

Cited 96 times in Scopus.
