In one sense, they came in under the wire; but then again, they did do three times as much as they'd promised.

On Oct. 29, 2002, an international consortium consisting of researchers from academic centers, non-profit biomedical research groups and private companies in the U.S., Canada, China, Japan, Nigeria and the UK announced an enterprising goal: to develop, within three years, a haplotype map of the human genome for roughly 1 million single nucleotide polymorphisms, or SNPs, in six populations. (See BioWorld Today, Oct. 30, 2002.)

On Oct. 26, 2005, they held a press conference to announce that they had mapped not just 1 million SNPs but, in a Phase II project, almost 3 million. The Phase I data are published in the Oct. 27, 2005, issue of Nature, with associated studies being published in Nature Genetics, PLoS Biology and Genome Research. The SNP data also have been deposited in public databases.

Kelly Frazer, vice president of genomics at Mountain View, Calif.-based Perlegen Sciences, told BioWorld Today that the plan originally had been to map 4.6 million SNPs in Phase II. However, many of the easiest SNPs had been characterized during Phase I of the project, making the Phase II SNPs more difficult on average. "We took on almost all comers in Phase II and ended up with a success rate of about 50 percent," she said.

Despite falling short of the original 4.6 million SNP goal, the map will be a formidable aid to biomedical research. Francis Collins, director of the National Human Genome Research Institute, said at the press conference that for disease association studies, the HapMap "may ultimately prove more powerful" than the map of the human genome that was published in 2003.

The Human Genome Project published a consensus DNA sequence, that is, the bases everyone has in common. But Collins noted that "variety is the spice of life," and it is in fact the bases that differ between populations that hold the most promise for determining genetic contributions to disease.

The ultimate goal of the HapMap consortium is to enable researchers to do more with less: to analyze larger populations with fewer SNPs. Although technological advances mean it now costs a fraction of a penny to genotype a SNP, and the number of SNPs that can be genotyped in a day has gone from hundreds to millions, large-scale association studies between SNPs and disease remain brute-force exercises. But after mapping several million SNPs, the researchers and makers of gene analysis tools such as Perlegen and San Diego-based Illumina Inc., which also participated in the HapMap consortium, will attempt to reduce that map to one-hundredth of its size without significant information loss.

That is feasible because some SNPs, like any other pieces of DNA, are inherited together more frequently than would be predicted from their distance alone, presumably because the gene combination they are part of confers some selective advantage on its owner. For the purposes of association analysis, SNPs that are inherited in blocks, or haplotypes, provide redundant information and do not need to be genotyped separately.

"Testing a subset is adequate to capture the information of the whole," David Altshuler, director of the program in Medical and Population Genetics at the Broad Institute of Harvard and MIT, said at the press conference. By determining which SNPs are inherited en bloc, the scientists can select "tag" SNPs out of the blocks and conduct association studies using those tags.
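The idea of picking tags can be illustrated with a small sketch. What follows is not the consortium's method, which the article does not describe; it is a minimal greedy illustration that assumes haplotypes are coded as 0/1 alleles, measures how tightly two SNPs travel together with the squared correlation r², and treats any pair at or above a hypothetical cutoff of 0.8 as redundant. The function names, the cutoff and the toy data are all assumptions made for the example.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared correlation between two biallelic SNPs coded 0/1 per chromosome."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a SNP with no variation carries no correlation signal
    d = pab - pa * pb
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Pick tag SNPs so every SNP is a tag or correlated with one.

    haplotypes: dict mapping a SNP name to a list of 0/1 alleles.
    threshold: assumed r-squared cutoff for calling two SNPs redundant.
    """
    names = list(haplotypes)
    # For each SNP, the set of SNPs it could stand in for (itself included).
    covers = {s: {s} for s in names}
    for s, t in combinations(names, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            covers[s].add(t)
            covers[t].add(s)

    untagged = set(names)
    tags = []
    while untagged:
        # Take the SNP that accounts for the most still-untagged SNPs.
        best = max(names, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy data (invented): eight chromosomes, five SNPs forming two blocks.
EXAMPLE = {
    "snp1": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp2": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp3": [1, 1, 1, 1, 0, 0, 0, 0],
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],
    "snp5": [0, 1, 0, 1, 0, 1, 0, 1],
}

TAGS = greedy_tags(EXAMPLE)
```

In this toy case the first three SNPs rise and fall together, as do the last two, so two tags stand in for all five. Real tag selection works on far larger and noisier data, and the choice of cutoff trades the number of tags against the information retained.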

Perlegen's Frazer said that the ultimate goal is to find a set of 200,000 to 300,000 SNPs that will be as useful as the entire set. Having a tag set of "only" a few hundred thousand SNPs then will allow scientists to test several thousand individuals in large-scale association studies, rather than 3 million SNPs in a few hundred individuals, as was done for the raw map.
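The trade-off can be put in rough numbers. The short calculation below is only a back-of-the-envelope sketch: it takes 300,000 tag SNPs from the range Frazer gave, and it assumes 300 and 3,000 individuals as stand-ins for "a few hundred" and "several thousand," figures that are illustrative rather than drawn from the project.

```python
def genotype_calls(num_snps, num_individuals):
    """Total individual genotype measurements a study requires."""
    return num_snps * num_individuals


# Assumed, illustrative study sizes (not figures from the HapMap project).
full_map_calls = genotype_calls(3_000_000, 300)   # every SNP, few people
tag_set_calls = genotype_calls(300_000, 3_000)    # tag SNPs, many people
```

Under those assumptions both designs require the same 900 million genotype measurements, which is the sense in which a tag set lets the same effort stretch across ten times as many people.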

Such large follow-on studies can determine genetic contributions not just to the obvious suspects, but also to diseases that are not genetic in a simple sense. One example is the susceptibility to infectious diseases such as malaria and tuberculosis. Frazer said that the HapMap "paves the way for an explosion of follow-on studies."

In the meantime, the HapMap already is proving to be more than an academic exercise. Concurrently with Nature's publication describing the HapMap, the journal Genome Research published a special issue devoted to studies using HapMap data to provide insight into human biology and disease.

The published research includes work on predicting pregnancy success based on SNP variations in a gene cluster on chromosome 19, and work showing that predisposition to prostate cancer can be determined by the specific architecture of a gene cluster on the X chromosome.