Like many complexly inherited conditions, autism shows high heritability, yet the inventory of identified genomic variants appears to explain only a modest portion of the risk. Could the highly repetitive and largely unexplored heterochromatic centromere-proximal regions (CPRs) of the human genome harbor variants that increase the risk of autism?
Human genes are intimately surrounded by repetitive sequences, and these sequences are targets of epigenomic repression via the methylation of histone H3 and the binding of heterochromatin protein 1 alpha (HP1α). Differences in the amount of the main HP1α-binding target, the heterochromatin of the CPRs, are expected to influence the dynamics of the liquid-liquid phase separation that organizes and regulates chromatin in the nucleus, including gene-proximal repetitive DNA. Diversity among individuals in the sizes and sequences of the CPRs of each chromosome is hypothesized to affect the expression of genes underlying complexly inherited traits such as autism.
Until recently, the fundamental limitations of genome sequencing and genotyping in highly repetitive regions such as CPRs stood as a daunting barrier to the consideration of this hypothesis. The discovery of large-scale, centromere-spanning haplotypes, or cenhaps [1,2], revealed great diversity in the CPRs of the human genome, including the largest Neanderthal segments and putatively even older ‘archaic’ segments, as well as strong associations between cenhaps and common differences in the sizes of the highly repetitive satellite DNA arrays. Three resources now provide a powerful new basis upon which to address the role of CPRs in gene regulation and the risk of autism: the tracking of cenhap transmission in Simons Simplex Collection (SSC) families, the ongoing advances in human genome sequencing emerging from the Telomere-to-Telomere (T2T) and Human Pangenome Reference consortia, and the promising new SSC expression profiling data.