Australian geneticists have developed a new tool, the Single Nucleotide Association Test for CNVs (SNATCNV), with which to analyze copy number variations (CNVs) and their associations in genetic neurodevelopmental disorders.
Using SNATCNV in a large cohort of autism spectrum disorder (ASD) patients, the researchers generated a map of autism-associated CNVs and provided full lists of brain-enriched coding and long non-coding RNA (lncRNA) genes.
This analysis showed that SNATCNV was an effective, freely available open-access tool for defining genomic loci and causative genes for CNV-associated conditions, the authors reported in the October 27, 2020, edition of Cell Reports.
"While there are other tools with which to identify disease-associated CNVs, SNATCNV is the only one that can identify biological relevant disease-associated CNVs in a high confidence manner," said first author Hamid Alinejad-Rokny, head of the Systems Biology and Health Data Analytics Laboratory at the University of New South Wales in Sydney.
ASDs are a group of neurodevelopmental disorders with varied phenotypes, including impaired social interaction, repetitive behaviors, and learning and speech impediments, and their diagnosis and treatment can be problematic.
"When an individual is clinically diagnosed with autism, genetic testing is increasingly being offered as a means to accurately diagnose and deliver personalized care," Alinejad-Rokny told BioWorld Science.
However, "the genetic contribution to autism is approximately 80%, but existing diagnostic tools identify only around 25% of individuals with autism of a genetic origin," he said.
In this regard, "SNATCNV, which took almost 3 years to develop, represents a significant advancement in identifying ASD-associated CNVs."
Genetic lesions leading to ASD are heterogeneous, ranging from highly penetrant single-gene mutations and CNVs, to weakly penetrant risk alleles identified by genome-wide association studies.
"Unlike single gene mutations, CNVs often contain multiple genes, so pinpointing the responsible gene within a CNV [that is] causative of a given phenotype is difficult," said Alinejad-Rokny. "Therefore, the identification of the responsible genes and their causative variants is a challenging task, which requires more robust tools."
Recent studies have identified 98 recurrently mutated ASD candidate encoding genes, with an estimated 15-25% of cases being attributable to de novo mutations, most of which correspond to CNVs.
Moreover, at least 14 independent CNVs have previously been associated with ASD, several of which are associated with distinct phenotypes.
Such studies have important treatment implications, said study coleader Alistair Forrest, a professor and head of Systems Biology and Genomics in the Harry Perkins Institute of Medical Research at the University of Western Australia in Perth, where Alinejad-Rokny was previously a postdoctoral research fellow.
"Knowing which genetic lesions are involved is the key to a precise diagnosis and deciding on what treatment, if any, should be pursued as, with potentially over 100 different genetic causes, a one-size-fits-all approach to ASD doesn't make sense," Forrest told BioWorld Science.
"The finding that some CNVs are associated with additional phenotypes is also important in terms of management, as some patients may have additional susceptibilities to problems such as heart defects," he noted.
However, despite the known importance of CNVs in ASD, there remains a need to define what fraction of ASD is actually due to CNVs, as estimates range from 10% to 20%.
Additionally, to understand the causative genes within these CNV regions, improved methods are needed to focus on the critical regions most strongly associated with ASD and to prioritize the genes contained within each.
CNV hunting in SFARI
In the new Cell Reports study, SNATCNV was used to investigate the Simons Foundation Autism Research Initiative (SFARI) CNV database, which comprises 19,663 cases of ASD and 6,479 controls, in which 47 recurrent ASD CNV regions were identified.
Compared to the PLINK open-source whole-genome association analysis toolset, SNATCNV was shown to be able to identify smaller critical regions that better discriminate ASD cases from controls.
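The idea of a region that discriminates cases from controls can be illustrated with a simple enrichment test. The sketch below is not SNATCNV's published algorithm: it assumes a one-sided Fisher's exact test on carrier counts, and the two regions and their carrier counts are hypothetical. Only the cohort sizes are taken from the SFARI figures quoted above, reading 6,479 as the number of controls.

```python
from math import comb

# Cohort sizes from the SFARI figures quoted in the article.
N_CASES = 19_663
N_CONTROLS = 6_479


def fisher_one_sided(case_hits, n_cases, control_hits, n_controls):
    """One-sided Fisher's exact test (assumed here for illustration).

    Returns the probability, under a hypergeometric null, of seeing at
    least `case_hits` CNV carriers among the cases, given the total
    number of carriers across cases and controls.
    """
    total = n_cases + n_controls
    hits = case_hits + control_hits
    denom = comb(total, hits)
    upper = min(hits, n_cases)
    numer = sum(
        comb(n_cases, k) * comb(n_controls, hits - k)
        for k in range(case_hits, upper + 1)
    )
    return numer / denom


# Hypothetical carrier counts: (cases with a CNV, controls with a CNV).
regions = {
    "narrow_region": (40, 2),   # few controls overlap the region
    "broad_region": (60, 15),   # more patients, but many more controls
}

p_values = {
    name: fisher_one_sided(cases, N_CASES, controls, N_CONTROLS)
    for name, (cases, controls) in regions.items()
}

for name, p in p_values.items():
    print(f"{name}: one-sided p = {p:.3g}")
```

In this toy comparison the narrower region overlaps far fewer controls relative to its cases, so it gives stronger evidence of association even though the broader region covers more patients in absolute terms.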
"At all tested thresholds, SNATCNV identified smaller regions that explained more patients and overlapped with fewer controls than those identified by PLINK," Forrest said.
"CNVs are large regions with many genes, so by narrowing down these regions there are fewer genes to sift through, meaning we have a better chance of understanding what is causing the ASD phenotypes and potentially developing treatments."
Analysis of ASD CNV gene content using Functional Annotation of the Mammalian Genome 5 (FANTOM5) revealed that constituent coding genes and lncRNAs had brain-enriched patterns of expression.
Importantly, such enrichment is not observed for regions identified by using other tools such as PLINK.
"SNATCNV identified regions that are sufficiently small and specific to patients and not controls, to show that genes implicated in ASD are nervous system enriched," said Forrest.
"Previous studies have either ignored this or had no significant enrichment, but we are finally able to confirm a major hypothesis that ASD-associated genes are more likely to be nervous system enriched than not."
The study also found evidence of sexual dimorphism, one locus uniquely comprising a single lncRNA gene, and correlation of CNVs to distinct clinical and behavioral traits.
Finally, analysis of a large database for schizophrenia further demonstrated that SNATCNV is an effective free tool with which to define genomic loci and causative genes for CNV-associated conditions.
"Although we have shown an analysis for schizophrenia, in principle SNATCNV could be applied to any disease where CNV data have been collected," Forrest said.
Looking forward, "the next obvious application is to apply SNATCNV to large whole genome-based sequencing cohorts such as the UK Biobank database." (Alinejad-Rokny, H. et al. Cell Rep 2020, 33(4): 108307).