Whole-exome sequencing of SPARK: New data release

SPARK (Simons Foundation Powering Autism Research for Knowledge) is pleased to announce that whole-exome sequencing (WES) and genotyping data are available for 1,369 and 1,398 individuals, respectively.

“This is the first step toward the genetic characterization of thousands of SPARK participants,” says Pam Feliciano, scientific director of SPARK. “We expect these genomic data to rapidly increase and are excited to provide such a useful resource to the scientific community.”

Phenotypic data for these individuals have been previously released and are also available for use by all approved researchers.

Data availability

SPARK: Whole-exome sequencing (WES) data

Whole-exome sequencing was completed for 461 ‘trios’ (mother, father and at least one affected child), including 421 simplex and 26 multiplex families.
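For readers who want to reproduce this kind of family grouping on their own pedigree tables, a minimal sketch follows. It assumes the common usage in which a simplex family has exactly one affected child and a multiplex family has two or more; the announcement itself does not define the terms. The `Member` record and its field names are hypothetical and do not reflect the SFARI Base file layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Member(NamedTuple):
    """Hypothetical pedigree record; not the SFARI Base schema."""
    family_id: str
    role: str        # "mother", "father" or "child"
    affected: bool


def classify_families(members: Iterable[Member]) -> Dict[str, str]:
    """Label each family as 'simplex', 'multiplex' or 'incomplete'.

    Assumption: simplex = exactly one affected child,
    multiplex = two or more affected children.
    """
    by_family = defaultdict(list)
    for member in members:
        by_family[member.family_id].append(member)

    labels = {}
    for family_id, family in by_family.items():
        roles = {m.role for m in family}
        affected_children = sum(
            1 for m in family if m.role == "child" and m.affected
        )
        # A trio needs both parents and at least one affected child.
        if not {"mother", "father"} <= roles or affected_children == 0:
            labels[family_id] = "incomplete"
        elif affected_children == 1:
            labels[family_id] = "simplex"
        else:
            labels[family_id] = "multiplex"
    return labels


example = [
    Member("F1", "mother", False),
    Member("F1", "father", False),
    Member("F1", "child", True),
    Member("F2", "mother", False),
    Member("F2", "father", False),
    Member("F2", "child", True),
    Member("F2", "child", True),
]
print(classify_families(example))
```

This simplification counts affected children only; a stricter definition would also take the parents' and other relatives' status into account.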

Exome capture and sequencing technology. Exome capture was performed using VCRome+PKv2, a commercially available, published DNA-capture hybridization reagent from Roche/NimbleGen. The reagent targets the coding and near-intronic regions of the Vertebrate Genome Annotation (Vega) database, Consensus CDS (CCDS) project and RefSeq gene models, as well as more than 1,200 microRNA (miRNA) genes. Overall, approximately 34 Mbp of genomic DNA is targeted, including all the coding exons of currently known disease genes listed in Online Mendelian Inheritance in Man (OMIM) and GeneTests.

PKv2, a spike-in probe set designed to enhance coverage of clinically relevant, disease-related genes, was also used. Designed by the Human Genome Sequencing Center (HGSC) at Baylor College of Medicine (BCM), this novel ‘spike-in’ provides 2.5 Mbp of additional probes that were used in conjunction with the VCRome exome design to improve read coverage across inadequately covered (<20X) exonic regions in 3,643 targeted clinical genes. With this spike-in design, approximately 3,200 genes reach complete coverage (every base of the transcript at ≥20X). This represents a 34 percent increase in complete gene coverage of the targeted clinical genes.
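The "complete coverage" criterion above (every base of the transcript at ≥20X) can be illustrated with a short sketch. The gene names and depth values below are made up for illustration, and the per-base depth lists are assumed to have been extracted beforehand from the BAM files with a tool of the researcher's choice; this is not the HGSC pipeline.

```python
from typing import Dict, List, Tuple

MIN_DEPTH = 20  # the 20X threshold quoted in the text


def is_completely_covered(depths: List[int], min_depth: int = MIN_DEPTH) -> bool:
    """Return True if every base has read depth >= min_depth.

    An empty depth list is treated as not covered.
    """
    return len(depths) > 0 and all(d >= min_depth for d in depths)


def summarize(
    gene_depths: Dict[str, List[int]], min_depth: int = MIN_DEPTH
) -> Tuple[List[str], float]:
    """Return the completely covered genes and their fraction of all genes."""
    complete = sorted(
        gene for gene, depths in gene_depths.items()
        if is_completely_covered(depths, min_depth)
    )
    fraction = len(complete) / len(gene_depths) if gene_depths else 0.0
    return complete, fraction


# Hypothetical per-base depths for three placeholder genes.
example = {
    "GENE_A": [35, 42, 20, 28],
    "GENE_B": [50, 19, 61],   # one base below 20X, so not complete
    "GENE_C": [22, 25],
}
complete, fraction = summarize(example)
print(complete, round(fraction, 2))  # ['GENE_A', 'GENE_C'] 0.67
```

Note that a single base below the threshold is enough to exclude a gene, which is why a targeted spike-in can raise the count of completely covered genes substantially.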

Detailed coverage information for each exon for any specific gene can be found here.

SPARK: Single nucleotide polymorphism (SNP) data

In addition to WES data, SNP data are also available. These samples were genotyped on an SNP array (Infinium HumanExome cSNP array), which contains approximately 245,000 genotyping probes located within coding regions of genes across the genome.

Which researchers can use the genomic data?

Whole-exome sequencing and genotyping data are available for use by all approved researchers, regardless of SFARI funding. The proposed research is not restricted to autism and/or other neurodevelopmental disorders.

Researchers must agree to abide by a publication embargo that applies until April 15, 2018, or until the SPARK Genomics Consortium publishes an analysis of these data, whichever comes first.

Accessing the data

Researchers can access the data by logging into SFARI Base and completing an application. Once the application has been reviewed and approved, the data can be accessed through any of the following three options:

• Cloud-based access (via Amazon Web Services)
• Fermilab access (data can be transferred to research institute servers via GridFTP)
• Simons Foundation server (data can be transferred to research institute servers via globus.org)

Data are released as BAM files, VCF files, genotyping Final Reports and IDAT files. Additional details about how to download or access the data (estimated at roughly 20 TB) will be provided to researchers after their SFARI Base application has been approved.

Additional information

For more information, please contact [email protected]
