Statistical analysis of autism phenotypic and genotypic data

  • Awarded: 2020
  • Award Type: Director
  • Award #: 653364

Since the fall of 2008, Abba Krieger and colleagues (most prominently Andreas Buja) at the Wharton School of the University of Pennsylvania have served as statistical consultants to the Simons Foundation. In this role, they analyze phenotypic and genetic data generated by SFARI.

They received previous funding from SFARI (2008 and 2013 awards) to analyze data from the Simons Simplex Collection (SSC). While they are continuing to mine the SSC data, a large part of their focus has now shifted to analyzing data from SPARK.

They are also continuing to provide statistical support to other SFARI researchers. Krieger and Buja have an ongoing collaboration with SFARI Investigators Ivan Iossifov and Michael Wigler to support the statistical analysis of autism genetic data. Furthermore, they will develop new methodologies and software as needed by SFARI and will present their findings to the Scientific Board of SFARI on a regular basis.

The specific projects that Krieger and colleagues plan to work on over the next several years include:

  1. De novo disruption of introns and autism: This project involves investigating various locations on the genes of individuals with autism and measuring the extent to which there is disruption. The problem is to cluster both the individuals and, more critically, the locations to understand where these disruptions are most common.
  2. Heritability of autism: The literature measures heritability with a statistical model that expresses the occurrence of a disease or disorder as a function of pre-specified locations on the genes plus an error term. The fraction of the variability due to rare de novo mutations was reported to be 2.6 percent for autism (ref. 1). But there are a number of problems with this finding. First, a model that measures variance explained for a latent trait is perhaps not the right way to define heritability due to these rare variants (Krieger and his team are working on an alternative model). Second, variance accounted for is a squared measure, and if the square root is used instead, which is equally defensible, the reported 2.6 percent becomes about 16 percent.
  3. Genotypic-phenotypic analyses: Krieger and his team have performed extensive analyses on phenotypic and genotypic data from the SSC. Specifically, they have considered the various phenotypic assessment tools, such as the Autism Diagnostic Observation Schedule (ADOS) and the Autism Diagnostic Interview (ADI), and answered questions such as: (i) How many dimensions are there in the data? This helps to understand whether autism is one condition or whether there are various forms of autism that fall under the same umbrella (e.g., communication skills; insistence on sameness and anxiety); (ii) To what extent do the different instruments tell the same story about the individual with autism? and (iii) What is the relationship between the phenotypic and genotypic data? In particular, Krieger, Iossifov and Wigler’s teams found a significant correlation between damaging de novo mutations and impaired motor skills in the SSC cohort (ref. 3). They did not find any relationship with other dimensions such as social communication. In the process, they observed that it is common for researchers to match subjects on IQ. In a paper in preparation, they show that conditioning on outcome variables, such as IQ scores, distorts causality and should therefore be avoided. They plan to extend these genotypic-phenotypic studies to the much larger datasets that are now available from SPARK.
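
The square-root arithmetic in the heritability project above can be checked in a few lines. The following is only a minimal sketch: the variable names are illustrative, and the only input is the 2.6 percent figure quoted in the text. It is not the alternative model that Krieger and his team are developing.

```python
import math

# Fraction of variability attributed to rare de novo mutations,
# as quoted in the text (2.6 percent). This is a variance-scale figure.
variance_fraction = 0.026

# Taking the square root moves from the variance (squared) scale
# to the standard-deviation scale.
sqrt_scale_fraction = math.sqrt(variance_fraction)

print(f"variance scale:    {variance_fraction:.1%}")    # 2.6%
print(f"square-root scale: {sqrt_scale_fraction:.1%}")  # 16.1%
```

The same quantity therefore reads as either 2.6 percent or roughly 16 percent, depending only on the scale chosen to report it.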


  1. Gaugler T. et al. Nature Genetics 46, 881-885 (2014)
  2. Bishop S.L. et al. J. Autism Dev. Disord. 43, 1287-1297 (2013)
  3. Buja A. et al. Proc. Natl. Acad. Sci. USA 115, E1859-E1866 (2018)