Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Utilising machine-learning algorithms to uncover complex genetic interactions in schizophrenia [Conference Abstract]

Vivian-Griffiths, Timothy, Escott-Price, Valentina, Walters, James Tynan Rhys, Moran, J., McCarroll, S., O'Donovan, Michael Conlon, Owen, Michael John and Pocklington, Andrew 2015. Utilising machine-learning algorithms to uncover complex genetic interactions in schizophrenia [Conference Abstract]. Human Heredity 79 (1) , p. 48. 10.1159/000381109

Full text not available from this repository.


Studies have shown that potentially thousands of common genetic variants are implicated in schizophrenia, each contributing a small effect to the disorder (Sullivan et al., 2003; Ripke et al., 2013). These variants have been used in predictive models by calculating a polygenic score (PS) for each individual. The PS is a weighted average of the minor allele counts at the genetic loci, weighted by the log-odds-ratio of the respective loci and alleles, calculated from a Genome-Wide Association Study (GWAS). This score can then be used in a logistic regression to predict case/control status in an independent sample. The PS is a linear combination of the genetic variants and does not take into account complex interactions between them. Including these interactions explicitly in a regression model is not feasible because of the enormous number of possible n-way combinations and the concomitant problem of correcting the results for multiple comparisons. Here we investigate the ability of Support Vector Machine (SVM) algorithms to discriminate between schizophrenia cases and controls using GWAS data. SVMs can account for interactions in the data via kernel functions, which implicitly map the predictors into a higher-dimensional space. We investigated the polynomial kernel, which captures interactions up to n-way for a polynomial of degree n, and the radial basis function (RBF) kernel, which corresponds to an infinite-dimensional feature space and thus includes interactions of all orders. We used datasets drawn from the CLOZUK study (Hamshere et al., 2014), which was also included in the Psychiatric Genomics Consortium GWAS (Ripke et al., 2013). The first dataset consisted of 125 weighted allele counts defined by the index genome-wide significant SNPs. The second dataset used all relatively independent (r² < 0.2) variants that were also significant at the p < 0.05 level (n = 31,166).
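As an illustration of the quantities described above, the following is a minimal Python sketch, not the authors' implementation. It computes a polygenic score as the weighted allele counts averaged over loci (dividing by the number of loci is an assumption; some tools report the plain weighted sum), and shows that a degree-2 polynomial kernel equals an ordinary inner product in a feature space that contains every pairwise interaction term. The function names, the `coef0` and `gamma` defaults, and the example values are all hypothetical.

```python
import math


def polygenic_score(allele_counts, log_odds_ratios):
    """Weighted allele counts averaged over loci (assumed normalisation)."""
    if len(allele_counts) != len(log_odds_ratios):
        raise ValueError("one weight is needed per locus")
    weighted = sum(c * w for c, w in zip(allele_counts, log_odds_ratios))
    return weighted / len(allele_counts)


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """(x . z + coef0) ** degree: interactions up to `degree`-way."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """exp(-gamma * ||x - z||^2): an infinite-dimensional feature space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def degree2_features(x):
    """Explicit feature map of the degree-2 polynomial kernel with coef0 = 1.

    Contains a constant, the scaled main effects, the squared terms and
    every scaled pairwise product x_i * x_j (the 2-way interactions).
    """
    n = len(x)
    root2 = math.sqrt(2.0)
    features = [1.0]
    features += [root2 * a for a in x]
    features += [a * a for a in x]
    features += [root2 * x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return features


# Hypothetical minor allele counts (0, 1 or 2) for two individuals at 3 loci.
person_a = [0, 1, 2]
person_b = [1, 1, 0]
weights = [0.1, -0.2, 0.3]  # hypothetical GWAS log-odds-ratios

score_a = polygenic_score(person_a, weights)
implicit = polynomial_kernel(person_a, person_b)
explicit = sum(
    p * q for p, q in zip(degree2_features(person_a), degree2_features(person_b))
)
```

The kernel evaluates the interaction-expanded inner product without ever building the expanded feature vector, which is why an SVM can work with interaction terms even when there are tens of thousands of variants.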
We tested whether a predictive model based upon these individual scores predicts case/control status better than the PS, and compared its performance with that of logistic regression. K-fold cross-validation was employed to validate the predictive models built with SVMs. Initial findings show that the SVM algorithms did not improve prediction compared with logistic regression. Future work will include the use of decision tree and random forest algorithms and a performance assessment of all these methods.
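The K-fold cross-validation procedure mentioned above can be sketched as follows. This is a minimal, hedged Python illustration rather than the study's pipeline: the helper names are hypothetical, the number of folds and the seed are arbitrary, and a simple threshold on a single score stands in for the SVM and logistic regression models so that the example stays self-contained.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once.

    Assumes n_samples >= k so that no fold is empty.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(fit, predict, X, y, k=5, seed=0):
    """Mean held-out accuracy of a model given by fit/predict callables."""
    accuracies = []
    for train_idx, test_idx in k_fold_splits(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def fit_threshold(scores, labels):
    """Stand-in model: midpoint between mean case and mean control score."""
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    return (sum(cases) / len(cases) + sum(controls) / len(controls)) / 2


def predict_threshold(threshold, score):
    """Predict case (1) when the score exceeds the fitted threshold."""
    return 1 if score > threshold else 0


# Hypothetical, well-separated scores: 10 controls (0) and 10 cases (1).
scores = [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)]
labels = [0] * 10 + [1] * 10

accuracy = cross_validated_accuracy(fit_threshold, predict_threshold, scores, labels)
```

Because `cross_validated_accuracy` takes the model as a pair of `fit` and `predict` callables, competing models can be scored on identical folds by reusing the same seed, which keeps a comparison between classifiers like for like.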

Item Type: Article
Date Type: Publication
Status: Published
Schools: Advanced Research Computing @ Cardiff (ARCCA); MRC Centre for Neuropsychiatric Genetics and Genomics (CNGG); Neuroscience and Mental Health Research Institute (NMHRI)
Subjects: R Medicine > R Medicine (General); R Medicine > RC Internal medicine > RC0321 Neuroscience. Biological psychiatry. Neuropsychiatry
Publisher: Karger
ISSN: 0001-5652
Last Modified: 24 Apr 2018 22:26
