Everyone's DNA sequence is unique. But for those who wish to preserve their genetic privacy, it may not be unique enough.
A new study says that more than half of Americans could be identified by name if all that was needed was a sample of their DNA and some basic facts, such as where they live and how old they are.
It would not be easy and it would not be cheap. But the fact that this has become feasible will force us all to rethink the meaning of privacy in the era of DNA, experts said.
There is little time to lose. The researchers behind the new study say that once three million Americans have uploaded their genomes to public genealogy websites, almost everyone in the United States would be identifiable by their DNA and a few additional clues.
More than a million Americans have already published their genetic information, and dozens more do so every day.
"People were wondering how long it would take before they could use DNA to detect almost everyone," said Ruth Dickover, director of the forensic science program at the University of California, Davis, who did not participate in the study. "The authors say it's not going to take that long."
This new reality represents the convergence of two long-standing trends.
One of them is the rise of direct-to-consumer genetic testing. Companies such as Ancestry.com and 23andMe can sequence anyone's DNA for around $100. All you have to do is provide a saliva sample and mail it in.
The other essential element is the proliferation of publicly searchable genealogy databases, such as GEDmatch. Anyone can upload a complete genome to these sites, and powerful computers will go to work on it, looking for matching stretches of DNA that can be used to build a family tree.
To test the growing power of these sites, researchers led by Columbia University computer scientist Yaniv Erlich set out to see whether they could find a person's name, and thus that person's identity, if all they had to go on was a piece of the person's DNA and a small amount of biographical information.
They started with the complete DNA sequence of a person whose genetic information had been published anonymously as part of an unrelated scientific study. (They had actually identified this woman in a previous study, but for the purposes of this work, they proceeded as though they did not know who she was.)
Erlich and his collaborators uploaded her genetic code to GEDmatch and ran a search to see if she had any relatives on the site. They found two: one in North Dakota and one in Wyoming.
The researchers could tell that the three were related because they shared a number of single nucleotide polymorphisms, or SNPs. These are single-letter variations at specific locations among the 3 billion A's, C's, T's and G's that make up the human genome.
The more SNPs people share, the more closely they are related.
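The idea that shared SNPs signal relatedness can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the SNP identifiers and letters are invented, and real genealogy services compare long shared DNA segments rather than a simple per-site match rate, so nothing here reflects how GEDmatch or the study's authors actually score a match.

```python
# Toy illustration only: real genealogy services compare long shared DNA
# segments, not a simple per-site match rate.

def shared_snp_fraction(person_a, person_b):
    """Fraction of SNP sites, typed in both people, where the letters match."""
    common_sites = person_a.keys() & person_b.keys()
    if not common_sites:
        return 0.0
    matches = sum(1 for site in common_sites if person_a[site] == person_b[site])
    return matches / len(common_sites)

# Hypothetical SNP identifiers and letters, invented for this example.
target = {"snp1": "A", "snp2": "G", "snp3": "T", "snp4": "C"}
close_relative = {"snp1": "A", "snp2": "G", "snp3": "T", "snp4": "T"}
distant_relative = {"snp1": "A", "snp2": "C", "snp3": "G", "snp4": "T"}

print(shared_snp_fraction(target, close_relative))    # 0.75
print(shared_snp_fraction(target, distant_relative))  # 0.25
```

In this made-up data, the closer relative matches the target at three of four sites and the more distant one at only one of four.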
By comparing the DNA of the three relatives, Erlich's team was able to find a common ancestral couple who were the great-grandparents of the Utah woman whose DNA they were tracing.
Next, the researchers combed through genealogy websites and other sources to look for other descendants of this long-ago couple. They found 10 children and hundreds of grandchildren and great-grandchildren.
Then they began winnowing their massive list of descendants. They eliminated all the men from the sample, and then anyone who was not alive when the Utah woman's DNA was sequenced. The authors also knew that their subject was married and how many children she had, which helped them zero in on their target.
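The narrowing step described above amounts to filtering a list of descendants by a handful of known facts. The following is a minimal sketch of that kind of filter, not the researchers' actual procedure: the `Descendant` record, the `narrow_candidates` function, the sequencing date and every person in the list are hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Descendant:
    name: str
    sex: str                  # "F" or "M"
    born: date
    died: Optional[date]      # None if still living
    married: bool
    children: int

def narrow_candidates(people, sequenced_on, married, children):
    """Keep women alive on the sequencing date who match the known facts."""
    return [
        p for p in people
        if p.sex == "F"
        and p.born <= sequenced_on
        and (p.died is None or p.died >= sequenced_on)
        and p.married == married
        and p.children == children
    ]

# Invented records; none of these values come from the study.
family_tree = [
    Descendant("Person A", "M", date(1950, 1, 1), None, True, 2),
    Descendant("Person B", "F", date(1920, 5, 5), date(1990, 3, 3), True, 3),
    Descendant("Person C", "F", date(1955, 7, 7), None, True, 3),
    Descendant("Person D", "F", date(1960, 9, 9), None, False, 0),
]

matches = narrow_candidates(family_tree, date(2008, 6, 1), married=True, children=3)
print([p.name for p in matches])  # ['Person C']
```

Each extra fact shrinks the pool, which is why a few ordinary biographical details can be enough to single out one person among hundreds of relatives.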
After a long day of painstaking work, the researchers were able to correctly name the owner of the DNA sample.
The same process would work for about 60% of Americans of European descent, who are the people most likely to use genealogy sites, Erlich said. Although the odds of success are lower for people of other backgrounds, the approach would still be expected to work for more than half of all Americans, the authors said.
To reach this conclusion, the researchers analyzed a different database consisting of 1.28 million anonymous individuals whose DNA was sequenced by MyHeritage, a DNA testing and family history company where Erlich is chief science officer.
If you can find a person's third cousin in a genealogy database, then you should be able to identify that person with a reasonable amount of sleuthing, Erlich said. So the team checked how many relatives at the level of third cousin or closer they could find for each individual in their data set.
They found many: 60% of the 1.28 million people were matched with a relative at least as close as a third cousin, and 15% with a relative at least as close as a second cousin.
The results were published Thursday in the journal Science.
So far, the most famous person to have been identified this way is Joseph James DeAngelo, 72. You may know him better as the alleged Golden State Killer, accused of 13 counts of murder and 13 counts of attempted kidnapping.
When law enforcement officials used a publicly accessible DNA database to catch DeAngelo in April, it was only the second time in the history of crime-solving that the strategy had been used successfully.
Since then, at least 13 other suspected criminals have been identified in the same way.
"The Golden State Killer case resolution has opened this method as a possibility and other crime labs are taking advantage of it," Dickover said. "Clearly a trend has begun."
Individuals benefit from the technology as well. Adoptees have found biological parents and siblings, while others have found distant cousins who can shed new light on a family's origins and heritage.
But as more of us upload our DNA to publicly searchable databases, the implications can be frightening.
"When the police arrested the Golden State Killer, it was a very good day for humanity," Erlich said. "The problem is that the same strategy can be misused."
Think of foreign governments using this technique to hunt down American citizens, he said. Or of protesters and activists being tracked down in this way.
Erlich and his co-authors proposed a mitigation strategy that would make it more difficult to upload an unknown DNA sequence to a genealogy database and search for a match.
They suggest that direct-to-consumer DNA testing companies put a special code on the raw data files they send to their customers. Genealogy sites could then agree to let people upload DNA sequences only if they carry a valid code. This would ensure that people can search only with their own DNA.
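One way to picture such a code is as a keyed digest of the raw data file. The sketch below is an assumption-heavy stand-in, not the authors' design: it uses an HMAC with a secret shared between the testing company and the genealogy site, whereas a real scheme would more plausibly rely on public-key signatures so that sites never hold the company's secret. The function names, the key and the file contents are all hypothetical.

```python
import hashlib
import hmac

def sign_raw_data(raw_data: bytes, company_key: bytes) -> str:
    """Testing company side: compute a code to ship with the raw data file."""
    return hmac.new(company_key, raw_data, hashlib.sha256).hexdigest()

def upload_allowed(raw_data: bytes, code: str, company_key: bytes) -> bool:
    """Genealogy site side: accept the file only if the code matches it."""
    expected = sign_raw_data(raw_data, company_key)
    return hmac.compare_digest(expected, code)

# Hypothetical key and file contents, invented for this example.
company_key = b"hypothetical-secret-key"
genome_file = b"snp1\tA\nsnp2\tG\n"
code = sign_raw_data(genome_file, company_key)

print(upload_allowed(genome_file, code, company_key))            # True
print(upload_allowed(b"snp1\tC\nsnp2\tG\n", code, company_key))  # False
```

Because the code is tied to the exact contents of the file, a sequence that did not come from a participating testing company, or one that was altered afterward, would fail the check.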
Such a system would not prevent law enforcement from using genealogical databases to search for suspects, Erlich said.
The ultimate goal is to allow people to use their DNA to learn more about their own family without sacrificing their privacy, Erlich said.
Just this year, his adopted cousin found a biological sister who lives on the other side of the world, he said.
"That's why we have this technique," he said.
More information:
Y. Erlich et al., "Identity inference of genomic data using long-range familial searches," Science (2018). science.sciencemag.org/lookup/… 1126 / science.aau4832