Integrating deep learning with splicing-QTL and splicing-TWAS to uncover hidden causal variants in autism spectrum disorders
- Awarded: 2025
- Award Type: Data Analysis
- Award #: SFI-AN-AR-Data Analysis-00019354
Autism spectrum disorder (ASD) remains largely unexplained at the molecular level, with whole-exome and whole-genome sequencing (WES/WGS) offering relatively low diagnostic yield. Many unresolved cases are suspected to involve non-coding variants that disrupt gene regulation, particularly alternative splicing, a context-dependent and tissue-specific process governing neuronal development. Despite growing evidence implicating splicing dysregulation in ASD, current methodologies lack the resolution and functional annotation needed to systematically identify causal non-coding variants, especially rare ones.
This project aims to redefine the splicing-centered regulatory landscape of ASD by integrating novel statistical and deep learning frameworks. The investigators will first identify brain-specific splicing quantitative trait loci (sQTLs) and the genes whose splicing events they affect (sGenes) using MAJIQTL¹, an advanced sQTL discovery tool, applied to reference brain transcriptome datasets (GTEx, PsychENCODE). These sQTLs will be co-localized with ASD-associated regions from genome-wide association studies (GWAS) and transmission-based risk loci (TADA).
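As a rough illustration of what an sQTL association means, the sketch below regresses a splicing measure on genotype. It is not MAJIQTL's method, which the summary does not describe. It assumes each sample has an alternate-allele dosage (0, 1 or 2) at one variant and a percent-spliced-in (PSI) value for one splicing event; the function name `sqtl_association` and the example values are hypothetical.

```python
from math import sqrt


def sqtl_association(dosages, psi):
    """Return (slope, pearson_r) for PSI regressed on allele dosage.

    dosages: alternate-allele counts per sample (0, 1 or 2)
    psi:     percent-spliced-in per sample, as fractions in [0, 1]
    Toy single-variant, single-event test; no covariates or multiple-testing control.
    """
    n = len(dosages)
    if n != len(psi) or n < 3:
        raise ValueError("need matched vectors with at least 3 samples")
    mean_g = sum(dosages) / n
    mean_p = sum(psi) / n
    # Sums of squares and cross-products around the means
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((p - mean_p) ** 2 for p in psi)
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, psi))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or PSI has no variation")
    slope = sxy / sxx          # change in PSI per extra alternate allele
    r = sxy / sqrt(sxx * syy)  # strength of the linear association
    return slope, r


# Hypothetical data: PSI rises with each copy of the alternate allele
slope, r = sqtl_association([0, 0, 1, 1, 2, 2], [0.2, 0.2, 0.5, 0.5, 0.8, 0.8])
```

A real analysis would also adjust for covariates such as ancestry and RNA quality, and would correct for the many variant-event pairs tested.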
Next, the team will implement MAJIQ-TWAS, a splicing-aware transcriptome-wide association study (TWAS) framework, to assess the ASD relevance of sQTLs using genotyping data from large ASD family cohorts (SPARK, Simons Simplex Collection). By exploiting the trio and quad family structure of these Simons Foundation datasets to model inheritance, and by incorporating rare variant aggregation strategies, this approach is designed to improve power and specificity in detecting ASD-associated regulatory variation, building upon and extending prior frameworks such as SpliTWAS².
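To make the value of trio-based inheritance information concrete, here is a minimal sketch of the classical transmission disequilibrium test (TDT). It is a standard single-variant family test, shown only as an illustration; it is not the MAJIQ-TWAS procedure, which the summary does not detail. Trios are assumed to be given as (father, mother, affected child) alternate-allele dosages, and the function names are hypothetical.

```python
from math import erfc, sqrt


def count_transmissions(trios):
    """Count alternate alleles transmitted / not transmitted by heterozygous parents.

    trios: iterable of (father, mother, child) dosages, each 0, 1 or 2.
    Trios without a heterozygous parent, or with a Mendelian inconsistency, are skipped.
    """
    transmitted = untransmitted = 0
    for father, mother, child in trios:
        n_het = (father == 1) + (mother == 1)
        if n_het == 0:
            continue  # uninformative trio
        # A homozygous-alternate parent always passes on one alternate allele
        from_hom = (father == 2) + (mother == 2)
        t = child - from_hom  # alternate alleles that came from heterozygous parents
        if t < 0 or t > n_het:
            continue  # genotypes inconsistent with Mendelian inheritance
        transmitted += t
        untransmitted += n_het - t
    return transmitted, untransmitted


def tdt(transmitted, untransmitted):
    """McNemar-style TDT statistic (1 degree of freedom) and its p-value."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical trios: (father, mother, child)
trios = [(1, 0, 1)] * 3 + [(1, 0, 0), (1, 1, 2), (0, 0, 0)]
b, c = count_transmissions(trios)
chi2, p_value = tdt(b, c)
```

Because the test compares transmitted with untransmitted parental alleles within families, it is robust to population stratification, which is one reason family cohorts are attractive for this kind of analysis.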
Finally, candidate variants will be prioritized using TrASPr³, a large language model (LLM) trained to predict tissue-specific splicing outcomes directly from genomic sequence. Predicted splicing disruptions will be cross-referenced with experimentally validated RNA-binding protein (RBP) binding sites and known regulatory elements, providing mechanistic insights into variant function.
Together, this project seeks to uncover a novel class of non-coding ASD risk variants centered on splicing dysregulation and to establish a foundation for improved molecular diagnostics and future splicing-modulatory therapeutics.