Statisticians are stuck between a rock and a hard place.
They worry a lot about false positives. When analyzing data, they rely on "p" values, or probability values, to gauge whether a reported effect could plausibly be due to chance alone. Strictly speaking, a "p" value of 0.05 means that if there were no real effect, a result at least as strong as the one reported would still turn up about 1 time in 20. A "p" value of 0.05 or less satisfies most statisticians.
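A small simulation makes the 1-in-20 figure concrete. It is a sketch, not anything drawn from the study. It repeatedly compares two groups sampled from the same distribution, so any "significant" difference is a false positive by construction, and counts how often a simple z test reports a "p" value of 0.05 or less. The group size, number of trials and random seed are arbitrary assumptions.

```python
import random
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def null_false_positive_rate(trials: int = 2000, n: int = 50,
                             alpha: float = 0.05, seed: int = 1) -> float:
    """Fraction of tests that come out 'significant' when there is no real effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups come from the same N(0, 1) distribution: no real effect exists.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # z test for a difference in means, assuming a known variance of 1 per observation.
        z = (sum(a) / n - sum(b) / n) / (2 / n) ** 0.5
        if two_sided_p(z) <= alpha:
            hits += 1
    return hits / trials


print(null_false_positive_rate())  # prints a value near 0.05
```

Roughly 5 percent of these no-effect comparisons clear the 0.05 bar, which is exactly the false-positive rate the threshold allows.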
However, they also worry about false negatives. In a false negative, a statistical analysis misses a cause-and-effect relationship that does exist, because the underlying association, though real, is not strong enough to stand out from the noise. In multicausal diseases, where environmental factors as well as multiple genetic loci help determine whether an illness develops, the statistical association between exposure to any one factor and eventual illness can be hard to demonstrate.
Breast cancer is a case in point. Apart from skin cancer, it is the most common type of cancer in women in the U.S. But while a variety of genetic mutations are thought to affect an individual woman's risk of developing breast cancer, so far the only genes clearly implicated in such risk are BRCA1 and BRCA2. Inherited mutations in those genes have a strong effect on a woman's chance of developing breast cancer: a BRCA1 mutation raises her lifetime risk from about 13 percent to 60 percent.
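Put as simple arithmetic, and treating the two quoted percentages as lifetime risks without and with a BRCA1 mutation, the size of that effect can be expressed in relative or absolute terms:

```latex
\text{relative risk} = \frac{0.60}{0.13} \approx 4.6,
\qquad
\text{absolute increase} = 0.60 - 0.13 = 0.47
```

In other words, a carrier's lifetime risk is roughly 4.6 times the baseline, or about 47 more cases of breast cancer per 100 women.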
A study published in the Dec. 15, 2004, issue of Cancer Research reported on another genetic culprit in breast cancer, as well as in prostate cancer, the most common cancer in men. Scientists from Sequenom Inc.; the University and Technical University of Munich, Germany; the Planegg Urology Clinic and Genefinder Technologies Ltd., both also of Munich; Griffith University Gold Coast in Queensland, Australia; and the University Hospital of Tuebingen, Germany, used a large-scale association study to identify variants in the genes coding for several intercellular adhesion molecules (ICAMs) as risk factors for both breast and prostate cancer.
Association studies are an extension of linkage analysis, which uses familial genetic samples.
"Linkage analysis was extraordinarily successful in identifying monogenic diseases," said Andreas Braun, chief medical officer of San Diego-based Sequenom and corresponding author of the study. However, given the small size of most families and the complexity of polygenic diseases, attempts to find the genes contributing to such diseases via linkage analysis have mostly failed. For those diseases, association studies, which examine larger groups of unrelated patients and controls for single nucleotide polymorphisms (SNPs) that occur at different frequencies in the two populations, are what Braun called "a much better strategy for identifying common variants that lead to common diseases."
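The basic comparison in such a study can be sketched as a standard allele-count test. This is an illustration only; the article does not say which test the researchers used. For a single SNP, the code builds a 2-by-2 table of allele counts in patients and controls and applies a Pearson chi-square test with one degree of freedom. The counts are made up.

```python
from math import erfc, sqrt


def allele_chi_square(case_a: int, case_b: int,
                      ctrl_a: int, ctrl_b: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the two alleles of one SNP.
    Returns (chi-square statistic, p value).
    """
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    total = case_a + case_b + ctrl_a + ctrl_b
    row_sums = [sum(row) for row in table]
    col_sums = [case_a + ctrl_a, case_b + ctrl_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele frequency did not depend on disease status.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 250 patients and 250 controls, two alleles each (500 alleles per group).
chi2, p = allele_chi_square(case_a=300, case_b=200, ctrl_a=250, ctrl_b=250)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

A genome-scale scan repeats this kind of test once per SNP, which is where the multiple-testing problem below comes from.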
Keeping The Baby, Tossing The Bathwater
Association studies, however, force researchers to guard against both false positives and false negatives.
"We test for tens of thousands of SNPs," Braun told BioWorld Today. "By chance alone, you will get a number of significant associations." Those chance associations between a given SNP and cancer risk are false positives.
Statistical procedures exist to correct for running a large number of tests, typically by demanding a much smaller "p" value from each one; the problem is that such corrections are likely to lead straight to false negatives.
"You'd need a dramatic 'p' value to survive such corrections. It is unlikely that the genetic underpinnings of a complex trait will have a strong enough association to survive," Braun said.
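The arithmetic behind Braun's point is simple. The sketch below assumes a hypothetical count of 20,000 SNPs (the article says only "tens of thousands") and uses the Bonferroni correction, one common correction procedure; the article does not say which procedure Braun had in mind.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Expected number of 'significant' results if none of the tested SNPs has a real effect."""
    return num_tests * alpha


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test p value cutoff that keeps the chance of any false positive at or below alpha."""
    return alpha / num_tests


num_snps = 20_000  # hypothetical stand-in for "tens of thousands of SNPs"
print(expected_false_positives(num_snps, 0.05))  # false positives expected by chance alone
print(bonferroni_threshold(num_snps))            # the "dramatic" p value each SNP would need
```

Under these assumptions, about 1,000 SNPs would look significant at 0.05 by chance, while a corrected cutoff of 0.0000025 is far below the discovery-sample "p" value of 0.001 reported for the ICAM5 SNP.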
Instead, the scientists took a staged approach. They first compared SNP allele frequencies between a discovery sample of about 250 patients and controls, then tried to replicate the strongest associations between gene variants and cancer risk in two somewhat smaller independent populations. The reasoning: the same chance finding is unlikely to turn up in three separate samples.
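If the three samples are independent, the chance that a purely random hit clears the bar in all of them is the product of the per-stage chances. This is a simplified sketch: the uniform 0.05 cutoffs are an assumption, not the study's actual thresholds, which the article does not give.

```python
import math


def chance_survival(alphas: list[float]) -> float:
    """Probability that an association with no real effect passes every stage.

    Assumes each stage uses an independent sample, so the per-stage
    false-positive probabilities multiply.
    """
    return math.prod(alphas)


# Hypothetical: a 0.05 cutoff in the discovery sample and in each of two replication samples.
print(chance_survival([0.05, 0.05, 0.05]))
```

With these assumptions the result is 0.05 cubed, or about 1 in 8,000, compared with 1 in 20 for a single test.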
Using that approach, the scientists first identified about 1,600 SNPs with statistically significant differences in allele frequencies between breast cancer patients and controls. Repeat measurements and further genotyping winnowed that group down to about 50 SNPs, which were then genotyped in the replication samples.
The most significant association involved a SNP in an exon of ICAM5. One of its alleles was linked to increased breast cancer risk, with "p" values of 0.001 in the discovery sample and of 0.03 and 0.07, respectively, in the two replication samples. The "p" value of the combined replication samples was 0.01. Finer-grained mapping of the region surrounding that SNP showed a strong association between certain alleles and breast cancer risk across a chromosomal region spanning parts of the ICAM1, ICAM4 and ICAM5 genes.
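The article does not say how the combined replication figure was computed. As a hedged illustration of one standard way to merge independent results, Stouffer's Z method converts each "p" value to a normal z score, sums the scores and rescales. It assumes one-sided "p" values from independent samples, which may not match the study's own analysis.

```python
from statistics import NormalDist


def stouffer_combined_p(p_values: list[float]) -> float:
    """Combine independent one-sided p values with Stouffer's (unweighted) Z method."""
    std_normal = NormalDist()
    # Convert each p value to the z score that would produce it.
    z_scores = [std_normal.inv_cdf(1 - p) for p in p_values]
    # Sum the z scores and rescale by sqrt(k) so the result is again standard normal.
    combined_z = sum(z_scores) / len(z_scores) ** 0.5
    return 1 - std_normal.cdf(combined_z)


print(round(stouffer_combined_p([0.03, 0.07]), 4))
```

Applied to 0.03 and 0.07, this gives roughly 0.009, showing how two individually modest results can reinforce each other when pooled.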
Because breast and prostate cancers share certain similarities, the researchers also tested a population of prostate cancer patients and controls for the same SNP. Again, allele frequencies at the ICAM genes differed significantly between patients and controls, with the strongest association in the ICAM5 region.
Previous studies had already shown that ICAMs play a role in cancer progression and tumor metastasis. However, by linking specific gene variants to an increased risk of breast or prostate cancer, the paper could open up diagnostic, prognostic and therapeutic applications. Sequenom itself plans to apply the data to developing new diagnostics, but the researchers note in their paper that ICAMs are "suitable targets for antibodies and small molecules," suggesting the findings could apply to new therapeutics as well.
Braun pointed out that statistics are only one part of a more comprehensive approach to identifying genetic risk factors in disease. Although the numbers point elsewhere, his group believes that "ICAM1 is the main suspect in breast cancer, even if the significance is higher for ICAM5. But from everything else we know about ICAM5, it is barely expressed in breast cancer cells."