Researchers at Stanford University have connected risk-associated single nucleotide polymorphisms (SNPs) for both Alzheimer's disease (AD) and Parkinson's disease (PD) to their possible causal genes via single-cell investigations into epigenomic states.

The findings, which were published in the October 26, 2020, online issue of Nature Genetics, offer new clues about which genes might be involved in AD and PD, which together affect 60 million people globally and are rapidly becoming more prevalent as the population ages.

More generally, they provide a roadmap for getting from risk-associated variants to causal genes.

At last week's virtual annual meeting of the American Society of Human Genetics, Soumya Kundu presented key aspects of the work. Kundu is a graduate student in bioinformatics at Stanford University and a co-author of the paper.

The road from SNPs to genes can be surprisingly complex. Variation in SNPs is not usually the direct cause of altered disease risk -- in fact, the majority of SNPs are in noncoding regions of the genome.

A genome-wide association study (GWAS) is "great for finding genomic regions where genetic variation is associated with a disease," Kundu told the audience.

But beyond identifying the region, the method cannot pinpoint either the specific genes that are at the root of the SNP signal or the specific cell types that are important.

Finally, he said, in linkage disequilibrium blocks where a number of SNPs are inherited together because they are near each other, "GWAS cannot pinpoint which specific SNPs actually have a causal effect.... it is difficult to find the specific variants that are causal, when all nearby SNPs are co-inherited."
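The co-inheritance problem Kundu described can be made concrete with a small calculation. The following is a minimal sketch, not taken from the paper, of the standard r-squared measure of linkage disequilibrium between two biallelic SNPs. The haplotypes are invented for illustration, and the function name ld_r2 is an assumption.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per haplotype.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n                       # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                       # frequency of allele 1 at SNP B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n  # both alleles together
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")

# Hypothetical haplotypes (illustration only, not real data).
snp1 = [1, 1, 1, 0, 0, 0, 0, 0]
snp2 = [1, 1, 1, 0, 0, 0, 0, 0]   # always inherited together with snp1
snp3 = [1, 0, 1, 0, 1, 0, 0, 1]   # only loosely associated with snp1

print(ld_r2(snp1, snp2))
print(ld_r2(snp1, snp3))
```

In this toy data, snp1 and snp2 have an r-squared of 1, so any association signal carried by one is carried identically by the other, and association statistics alone cannot say which of the two is causal.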

Some noncoding SNPs are linked to disease risk because they are located near a disease-causing gene. But others directly affect disease risk by altering how transcription factors bind to the DNA around them. And those transcription factor-bound enhancer regions sometimes affect genes that are at a distance, rather than their direct chromosomal neighbors.

In their work, the researchers used a multistep process to sleuth out noncoding SNPs with a direct role in affecting gene expression, their target genes, and the specific cell types in which those interactions were taking place.

They studied more than 70,000 individual cells from six brain regions of previously healthy donors. On the basis of surface marker expression, those cells could be divided into six types: excitatory and inhibitory neurons, microglia, astroglia, oligodendrocytes and oligodendrocyte precursors.

The team first used a technique called ATAC-seq, which finds open chromatin and thus "marks accessible enhancer regions," Kundu explained. SNPs located in such open chromatin regions, even if those regions are noncoding, would be able to exert a direct effect on transcription by changing how transcription factors bind there.
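The basic filtering idea, keeping only the SNPs that sit inside open chromatin in a given cell type, amounts to an interval lookup. The sketch below assumes peaks are non-overlapping half-open intervals. The SNP IDs, coordinates, and function names are all hypothetical and are not drawn from the study's data or software.

```python
from bisect import bisect_right

def build_index(peaks):
    """Group half-open peaks (chrom, start, end) by chromosome, sorted by start."""
    index = {}
    for chrom, start, end in sorted(peaks):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    return index

def snps_in_peaks(snps, peaks):
    """Return IDs of SNPs inside any peak (peaks assumed non-overlapping)."""
    index = build_index(peaks)
    hits = []
    for snp_id, (chrom, pos) in snps.items():
        starts, ends = index.get(chrom, ([], []))
        i = bisect_right(starts, pos) - 1   # last peak starting at or before pos
        if i >= 0 and pos < ends[i]:
            hits.append(snp_id)
    return hits

# Hypothetical SNPs and per-cell-type peaks (illustration only).
snps = {"snp_1": ("chr1", 1_050), "snp_2": ("chr1", 5_000), "snp_3": ("chr2", 320)}
peaks_by_cell_type = {
    "microglia": [("chr1", 1_000, 1_200), ("chr2", 900, 1_100)],
    "oligodendrocytes": [("chr1", 4_900, 5_100), ("chr2", 300, 400)],
}
accessible = {cell: snps_in_peaks(snps, peaks)
              for cell, peaks in peaks_by_cell_type.items()}
print(accessible)
```

With these invented coordinates, snp_1 is accessible only in microglia, while snp_2 and snp_3 are accessible only in oligodendrocytes, which is the kind of cell-type-specific assignment the article describes.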

Using machine learning, they analyzed the DNA sequence around the SNPs themselves to identify the specific causal SNPs within blocks of co-inherited SNPs.

They also determined the long-range physical interactions of those open regions, which enabled them to identify interaction partners for the SNPs. Identifying long-range interaction partners increased the list of candidate causal genes from 51 to 428 for AD-linked SNPs, and from 109 to 528 for PD-linked SNPs.
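Why interaction data expands the candidate list can be seen by comparing a nearest-gene assignment with a loop-based one. This is a simplified sketch on a single chromosome. The gene names, coordinates, loop anchors, and function names are invented, and the logic is not the paper's actual pipeline.

```python
def in_interval(pos, interval):
    """True if pos lies in the half-open interval (start, end)."""
    start, end = interval
    return start <= pos < end

def nearest_gene(snp_pos, promoters):
    """Gene whose promoter is closest to the SNP along the chromosome."""
    return min(promoters, key=lambda gene: abs(promoters[gene] - snp_pos))

def looped_genes(snp_pos, loops, promoters):
    """Genes whose promoter sits in the loop anchor opposite the SNP's anchor."""
    genes = set()
    for anchor_a, anchor_b in loops:
        # A loop is symmetric, so check the SNP against both anchors.
        for near, far in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            if in_interval(snp_pos, near):
                genes.update(g for g, p in promoters.items() if in_interval(p, far))
    return sorted(genes)

# Hypothetical promoter positions and one chromatin loop (illustration only).
promoters = {"GENE_A": 100_500, "GENE_B": 480_000, "GENE_C": 900_000}
loops = [((115_000, 125_000), (475_000, 485_000))]
snp_pos = 120_000

print(nearest_gene(snp_pos, promoters))
print(looped_genes(snp_pos, loops, promoters))
```

In this toy example the SNP's nearest neighbor is GENE_A, but the loop connects it to GENE_B, some 360 kilobases away. A proximity-only approach would never nominate GENE_B as a candidate.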

In their paper, the team presented new insights into both AD- and PD-linked SNPs. One AD SNP linked to the gene coding for phosphatidylinositol-binding clathrin assembly protein (PICALM), for example, was accessible specifically in oligodendrocytes. The machine learning analysis suggested that this SNP disrupted a binding site for the transcription factor FOS/AP1, providing a possible mechanism of action.
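What it means for a single-base change to "disrupt a binding site" can be illustrated with a position weight matrix, a much simpler stand-in for the machine learning models used in the study. The motif below is a made-up 7-base pattern, not a curated FOS/AP1 matrix. The sequences and function names are hypothetical, and only the forward strand is scanned.

```python
import math

BACKGROUND = 0.25  # assume all four bases are equally likely by chance

def make_matrix(consensus, strong=0.85):
    """Toy probability matrix: the consensus base gets `strong`, others share the rest."""
    weak = (1 - strong) / 3
    return [{b: (strong if b == c else weak) for b in "ACGT"} for c in consensus]

MOTIF = make_matrix("TGACTCA")  # hypothetical motif, for illustration only

def window_score(seq, start):
    """Log-odds score of the motif placed at seq[start:start + len(MOTIF)]."""
    return sum(math.log2(MOTIF[i][seq[start + i]] / BACKGROUND)
               for i in range(len(MOTIF)))

def best_score(seq):
    """Best forward-strand motif score over all windows in seq."""
    return max(window_score(seq, s) for s in range(len(seq) - len(MOTIF) + 1))

def allele_delta(flank5, ref, alt, flank3):
    """Change in best motif score when the ref allele is swapped for alt."""
    return best_score(flank5 + alt + flank3) - best_score(flank5 + ref + flank3)

# Hypothetical SNP: G (reference) to A (alternate) inside the motif.
delta = allele_delta("CCT", "G", "A", "ACTCAGG")
print(round(delta, 2))
```

A negative value means the alternate allele matches the motif less well than the reference allele does, which is the sense in which a variant can weaken a predicted transcription factor binding site.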

Kundu said that "the true promise in studying these SNPs is the identification of new genes," a function that is "especially important in PD, where gene identification is less mature."

The team reported success in that aspect, too, demonstrating that a PD-linked SNP pointed to stabilin-1 (STAB1), which had not previously been implicated in PD. The SNP was accessible specifically in microglia, and disrupted a binding site for the transcription factor KLF4, again providing a cell type and a plausible mechanism along with the target gene itself (Corces, M.R. et al. Nat Genet 2020, 52(11): 1158).