It is now widely appreciated that de novo mutations account for a large fraction of autism spectrum disorder (ASD) risk. Initial studies of dozens to hundreds of families identified only a handful of high-confidence candidate mutations and lacked robust methods for assessing the risk conferred by individual mutations. More recently, an analysis of more than 35,000 individuals that employed statistical models of mutational constraint identified nearly 100 risk genes and suggested that more remain to be found1. Still, large efforts to identify de novo mutations have focused almost exclusively on point mutations (single nucleotide polymorphisms [SNPs] or small indels), and the contribution of more complex variant types remains largely unexplored.
To address this issue, Melissa Gymrek and colleagues have been studying the role of genetic variation at repetitive regions of the genome in ASD. They specifically consider short tandem repeats (STRs), which consist of repeated motifs of one to six base pairs, and variable number tandem repeats (VNTRs), which have motifs of more than six base pairs. The two classes are collectively referred to as tandem repeats (TRs). Because TRs encompass more than one million loci, make up over three percent of the genome and exhibit mutation rates that are orders of magnitude higher than those of SNPs, they likely contribute a larger number of de novo mutations per generation than all other types of variants combined.
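The motif-length definitions above can be illustrated with a minimal Python sketch. It is not part of the tools used in this project; the function names `classify_tandem_repeat` and `max_copy_number` and the example sequence are hypothetical, chosen only to show how a repeat is classified by motif length and how the number of consecutive motif copies might be counted.

```python
def classify_tandem_repeat(motif: str) -> str:
    """Classify a repeat by motif length: 1-6 bp is an STR, longer is a VNTR."""
    if not motif:
        raise ValueError("motif must be non-empty")
    return "STR" if len(motif) <= 6 else "VNTR"


def max_copy_number(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(sequence)):
        copies = 0
        pos = start
        # Walk forward one motif length at a time while the motif keeps matching.
        while sequence.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        best = max(best, copies)
    return best


# Hypothetical toy sequence: three CGG copies flanked by unrelated bases.
example_sequence = "TTCGGCGGCGGAA"
repeat_class = classify_tandem_repeat("CGG")
copies = max_copy_number(example_sequence, "CGG")
```

In this toy setting, a de novo TR mutation would appear as a child copy number that matches neither parent's alleles. Real genotyping from sequencing reads is considerably more involved than exact string matching.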
Multiple lines of evidence support the hypothesis that mutations in TRs contribute substantially to risk for neurodevelopmental disorders, including ASD. STRs are involved in dozens of Mendelian diseases, most of which exhibit neurological or psychiatric phenotypes. For instance, expansion of a CGG repeat upstream of FMR1 results in fragile X syndrome, the most prevalent known genetic cause of ASD. Additionally, Gymrek and her lab recently showed that STRs play important roles in gene expression and contribute to complex traits such as schizophrenia2. Finally, analysis of population-wide STR variation suggests that STRs expressed in brain-related tissues are under the strongest mutational constraint3.
In a pilot study funded by a SFARI Explorer Award, Gymrek and collaborators performed initial analyses to assess the role of de novo STR mutations in ASD, using whole-genome sequencing (WGS) data from the Simons Simplex Collection (SSC). More recently, they have developed novel tools for genotyping expanded STRs (GangSTR)4 and VNTRs (adVNTR)5 that greatly expand the repertoire of repetitive variation that can be profiled.
In the current project, they plan to utilize these new tools and leverage bioinformatics advances in TR analysis to perform a more comprehensive evaluation of the role of TR variants in ASD. WGS data from both the SSC and SPARK (once available) will be analyzed.