Maximizing autism gene discovery by combining machine learning and single-cell expression data analyses

  • Awarded: 2018
  • Award Type: Targeted: Genomic Analysis for Autism Risk Variants in SPARK
  • Award #: 606450

A thorough understanding of the genetic causes of autism spectrum disorder (ASD) is critical to improving clinical care and advancing biomedical science. A prevailing view of ASD genetics is that common variants account for most of the phenotypic variance, while rare or de novo variants contribute to disease risk with large effect sizes.

Recent studies estimated that there are about 1,000 ASD risk genes with large effect. However, only about 100 of these risk genes are known. Closing this gap is critical to advancing ASD research. SPARK provides an unprecedented opportunity to address this problem. Most risk genes are expected to have a few rare or de novo variants observed in the cohort. However, many genes that are not risk genes will also carry such variants by chance, so variant counts alone cannot separate true risk genes from background. In this study, Yufeng Shen proposes to develop new methods and apply them, together with new data, to identify new candidate genes.
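
The chance-occurrence problem described above can be made concrete with a small calculation. The following is a minimal sketch, not the method proposed in this award: it assumes that de novo variant counts in a gene follow a Poisson distribution under the null, and the expected count, the observed count and the figure of 20,000 genes tested are hypothetical values chosen only for illustration.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # P(X >= observed) = 1 - P(X <= observed - 1)
    cdf = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - cdf)


# Hypothetical gene: 0.05 de novo LGD variants expected by chance in the
# cohort, 3 observed. Both numbers are made up for illustration.
p_value = poisson_upper_tail(observed=3, expected=0.05)

# Assumed multiple-testing correction: Bonferroni over 20,000 genes.
n_genes_tested = 20_000
threshold = 0.05 / n_genes_tested

print(f"p-value: {p_value:.2e}, exome-wide threshold: {threshold:.2e}")
print("significant" if p_value < threshold else "not significant")
```

Under these assumptions, a gene with three observed variants looks striking on its own yet does not pass an exome-wide threshold, which is why larger cohorts and better variant prioritization both matter for discovery.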

Shen will utilize new genomics data and computational methods to maximize gene discovery. Specifically, his laboratory will combine SPARK whole-exome sequencing data with data from other ASD cohorts to improve statistical power. They will use machine learning and functional genomics data to improve the prioritization of missense and likely gene-disrupting (LGD) variants.

These studies will use a new machine-learning method developed in the lab that improves prediction of the deleteriousness of missense variants over existing methods [1]. Additionally, the lab recently published ‘D-score,’ a method that prioritizes candidate ASD risk genes based on cell-type-specific gene expression [2]. Compared to the pLI (probability of being loss-of-function intolerant) score developed by the Exome Aggregation Consortium (ExAC), D-score achieves similar and, importantly, complementary performance in prioritizing LGD variants. Shen’s team will further improve the predictive value of D-score by using single-cell RNA-seq data. By combining single-cell expression data from brain regions across a broad range of developmental stages with the large genetic data set of SPARK, Shen’s group will be able to further identify the specific cell types and time points critical for ASD pathogenesis.
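
To illustrate what "complementary" means for two gene-level scores, here is a minimal sketch under stated assumptions. It is not how D-score or pLI are computed or combined in the cited work: the gene names and score values are invented, and taking the higher of two percentile ranks is just one simple, hypothetical way to merge scores that flag different genes.

```python
def percentile_ranks(scores: dict[str, float]) -> dict[str, float]:
    """Map each gene to the fraction of genes scoring at or below it."""
    values = list(scores.values())
    n = len(values)
    return {gene: sum(v <= s for v in values) / n for gene, s in scores.items()}


def combined_priority(
    score_a: dict[str, float], score_b: dict[str, float]
) -> dict[str, float]:
    """Combine two scores by taking the higher percentile rank per gene.

    A gene ranks highly overall if either score ranks it highly, so the
    two scores can rescue genes that the other one misses.
    """
    rank_a = percentile_ranks(score_a)
    rank_b = percentile_ranks(score_b)
    shared = score_a.keys() & score_b.keys()
    return {gene: max(rank_a[gene], rank_b[gene]) for gene in shared}


# Made-up toy values: an expression-based score and a constraint-based score.
expression_score = {"GENE_A": 0.9, "GENE_B": 0.2, "GENE_C": 0.5, "GENE_D": 0.1}
constraint_score = {"GENE_A": 0.3, "GENE_B": 0.95, "GENE_C": 0.5, "GENE_D": 0.05}

combined = combined_priority(expression_score, constraint_score)
for gene in sorted(combined, key=combined.get, reverse=True):
    print(gene, combined[gene])
```

In this toy data, GENE_A is ranked highly only by the expression-based score and GENE_B only by the constraint-based score, yet both reach the top of the combined ranking.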


1. Qi H. et al. bioRxiv (2018)
2. Zhang C. and Shen Y. Hum. Mutat. 38, 204-215 (2017)