Tuesday, September 9, 2008

Aggregated Genotype Data and "Privacy"

There's been much news lately regarding a paper recently published in PLoS Genetics:

N. Homer, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density genotyping microarrays. PLoS Genetics. 2008; 4(8): e1000167 doi:10.1371/journal.pgen.1000167.

Basically, this work shows the following. Imagine a biomedical research study publishes summary statistics for many positions in the genome, broken down by case and control populations (e.g., how often DNA position X has the value "A" among people with a particular disease versus people without it). Now suppose someone manages to get hold of a known individual's genotype. Using only the published summary statistics, that person can determine with high certainty whether the individual is in the case population, the control population, or neither.
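To make the idea concrete, here is a minimal sketch in Python of the kind of test involved, as I understand the distance statistic in the paper: at each position, ask whether the person's genotype sits closer to the published study frequency or to a reference population frequency, then run a one-sample t-test over all positions. Everything here is an assumption for illustration: the data are simulated, the positions are treated as independent, there is a single study group rather than separate cases and controls, and the function names (draw_genotype, mean_by_snp, membership_t, run_demo) are made up, not taken from the authors' code.

```python
import math
import random


def draw_genotype(rng, freq):
    """Simulated genotype at one position: 0.0, 0.5 or 1.0 (fraction of 'A' alleles)."""
    return ((rng.random() < freq) + (rng.random() < freq)) / 2


def mean_by_snp(group):
    """Per-position allele frequency of a group (a list of per-person genotype lists)."""
    return [sum(column) / len(group) for column in zip(*group)]


def membership_t(person, reference_freqs, study_freqs):
    """One-sample t-statistic over D_j = |Y_j - Ref_j| - |Y_j - Study_j|.

    Large positive values mean the person sits closer to the published
    study frequencies than to the reference frequencies.
    """
    d = [abs(y - ref) - abs(y - pub)
         for y, ref, pub in zip(person, reference_freqs, study_freqs)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n)


def run_demo(seed=0, n_snps=10_000, group_size=25):
    """Simulate a study group, a reference group and one outsider (all hypothetical)."""
    rng = random.Random(seed)
    true_freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]

    def new_person():
        return [draw_genotype(rng, f) for f in true_freqs]

    study = [new_person() for _ in range(group_size)]
    reference = [new_person() for _ in range(group_size)]
    outsider = new_person()

    # Only these aggregate frequencies would be "published".
    study_freqs = mean_by_snp(study)
    reference_freqs = mean_by_snp(reference)

    member_t = membership_t(study[0], reference_freqs, study_freqs)
    outsider_t = membership_t(outsider, reference_freqs, study_freqs)
    return member_t, outsider_t


if __name__ == "__main__":
    member_t, outsider_t = run_demo()
    print(f"study member: t = {member_t:.2f}")
    print(f"outsider:     t = {outsider_t:.2f}")
```

In this toy setup the study member's statistic should come out well above zero while the outsider's stays near zero, even though no individual genotypes were published. Real data have correlated positions and mismatched reference populations, so treat this as an illustration of the principle rather than a reproduction of the paper's results.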

The response to this paper has been quite swift. The NIH in the US and the Wellcome Trust in the UK have pulled aggregated genome-wide association study data from their public websites. (See the Zerhouni and Nabel letter to Science.)

I guess the first question we need to ask is: how often will someone have the ability to perform high-density genotyping on an individual without also having access to that individual's clinical information?
