From SPARK to insight: visual data portals and AI multimodal foundation model integration of autism sequencing datasets

  • Awarded: 2025
  • Award Type: Director
  • Award #: AN-Director-00025394

Autism genomics research is hindered by fragmented data resources that are hard to combine in statistical models. SFARI’s SPARK and Simons Simplex Collection (SSC) cohorts provide extensive whole-exome and whole-genome sequencing (WES/WGS) data, but these data are not integrated into widely used genomic portals. In addition, SFARI-funded transcriptomic datasets, including single-cell RNA sequencing (scRNA-seq), spatial scRNA-seq and long-read transcriptomics, offer insights into cell-type-specific gene expression but are dispersed across publications and archives with no centralized access. This fragmentation impedes both experimental and computational analyses, and it remains unclear how much combined analyses can improve the accuracy of predicting the impact of autism genes and variants.

To address these challenges, this project aims to: (1) integrate SPARK/SSC WES/WGS variants into the UCSC Genome Browser [1], harmonized with sample metadata and linked to SFARI Base, to make them more accessible to researchers; (2) add long-read transcriptomic datasets to the UCSC Genome Browser, merge them into a single brain long-read summary, and share that summary with the GENCODE team; and (3) centralize SFARI-funded scRNA-seq datasets in the UCSC Cell Browser, with quality control, metadata annotations, and integration into a unified reference atlas using methods such as Harmony to enable cross-study analyses.

Leveraging established infrastructure and expertise, this project will deliver accessible, integrated, and enhanced autism genomic resources. By bridging genetic variation, cell-type-specific expression, and autism phenotypes, it aims to help identify key genes, variants, and cell types underlying autism risk.

Reference

  1. Speir M.L. et al. Bioinformatics 37, 457-458 (2021)