Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 
WelshClear Cookie - decide language by browser settings

DDIG-in: detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels

Folkman, L., Yang, Y., Li, Z., Stantic, B., Sattar, A., Mort, Matthew, Cooper, David Neil, Liu, Y. and Zhou, Y. 2015. DDIG-in: detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels. Bioinformatics 31 (10) , pp. 1599-1606. 10.1093/bioinformatics/btu862

Full text not available from this repository.

Abstract

Motivation: Frameshifting (FS) indels and nonsense (NS) variants disrupt the protein-coding sequence downstream of the mutation site by changing the reading frame or introducing a premature termination codon, respectively. Despite such drastic changes to the protein sequence, FS indels and NS variants have been discovered in healthy individuals. How to discriminate disease-causing from neutral FS indels and NS variants is an understudied problem. Results: We have built a machine learning method called DDIG-in (FS) based on real human genetic variations from the Human Gene Mutation Database (inherited disease-causing) and the 1000 Genomes Project (GP) (putatively neutral). The method incorporates both sequence and predicted structural features and yields a robust performance by 10-fold cross-validation and independent tests on both FS indels and NS variants. We showed that human-derived NS variants and FS indels derived from animal orthologs can be effectively employed for independent testing of our method trained on human-derived FS indels. DDIG-in (FS) achieves a Matthews correlation coefficient (MCC) of 0.59, a sensitivity of 86%, and a specificity of 72% for FS indels. Application of DDIG-in (FS) to NS variants yields essentially the same performance (MCC of 0.43) as a method that was specifically trained for NS variants. DDIG-in (FS) was shown to make a significant improvement over existing techniques. Availability and implementation: The DDIG-in web-server for predicting NS variants, FS indels, and non-frameshifting (NFS) indels is available at http://sparks-lab.org/ddig.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Medicine
Subjects: Q Science > QH Natural history > QH426 Genetics
R Medicine > R Medicine (General)
Publisher: Oxford University Press
ISSN: 1367-4803
Date of Acceptance: 23 December 2014
Last Modified: 21 Jun 2019 20:56
URI: http://orca.cf.ac.uk/id/eprint/84065

Citation Data

Cited 11 times in Google Scholar. View in Google Scholar

Cited 25 times in Scopus. View in Scopus. Powered By Scopus® Data

Actions (repository staff only)

Edit Item Edit Item