Systematic analysis of the complete mutational spectrum of structural variation associated with autism and a continuum of developmental disorders from short-read and long-read genome sequencing

Awarded: 2025
Award Type: Data Analysis
Award #: SFI-AN-AR-Data Analysis-00019405

Michael Talkowski, Ph.D.
Massachusetts General Hospital

Autism is a phenotypically complex cluster of symptoms with genetic underpinnings that are governed by many common variants of subtle effect and a small number of rare variants of major effect. Due to technological and methodological limitations, most autism studies have focused on variants that impact single DNA bases, or on large copy number variants (CNVs) that encompass many genes. Most of these studies have used short-read exome or genome sequencing (srGS), which has significant blind spots in the ascertainment of structural variants (SVs) and of highly repetitive sequences that are recalcitrant to srGS alignment.

Michael Talkowski and colleagues have developed and distributed to the field scalable cloud-based tools to capture, annotate and characterize SVs from srGS data. These tools have produced population SV maps in gnomAD and autism trait associations in the Simons Simplex Collection, SPARK and ASC, as well as recent analyses of long-read genome sequencing (lrGS) data. They have also aggregated large-scale datasets across human developmental disorders to define the genetic causes of autism that overlap with other neurodevelopmental disorders (NDDs), and those that are distinct from other disorders (the ‘CrossDEV’ study). In this project, Talkowski’s team will perform a comprehensive rare variant and SV association study using their already aggregated cohorts of 72,335 autism cases, 105,751 familial controls, and 273,462 population controls. They will compare these findings to other NDDs and neuropsychiatric conditions through joint analyses with CrossDEV (107,396 additional cases) to identify genes and biological pathways that distinctly contribute to autism, and those with pleiotropic impacts across phenotypes. Finally, the group will close the gap on SV discovery by applying methods to genotype SVs discovered from over 15,000 lrGS samples into srGS datasets, identifying risk variants that were inaccessible to prior studies. Overall, this project will explore genetic data across the developmental continuum to discover the genes that uniquely influence clinical dimensions of autism.

SFARI