Most White Americans' DNA Can Be Identified Through Genealogy Databases


The genetic genealogy industry is booming. In recent years, more than 15 million people have offered up their DNA (a cheek swab, some saliva in a test tube) to services such as 23andMe and Ancestry in pursuit of answers about their heritage. In exchange for a genetic fingerprint, they may find a parent, long-lost cousins, perhaps even a link to Oprah or Alexander the Great.

But as these registries of genetic identity grow, it is becoming harder for individuals to retain any anonymity. Already, 60 percent of Americans of Northern European descent, the primary group using these sites, can be identified through such databases whether or not they have joined one themselves, according to a study published today in the journal Science.

Within two or three years, 90 percent of Americans of European descent will be identifiable from their DNA, researchers found. The science-fiction future, in which everyone is known whether or not they want to be, is nigh.

"It's not the distant future, it's the near future," said Yaniv Erlich, the lead author of the study. Dr. Erlich, formerly a genetic-privacy researcher at Columbia University, is the chief science officer of MyHeritage, a genetic ancestry website.


The science involves a search for third cousins. To identify a person through a DNA sample, an investigator uploads a previously analyzed genetic profile to a database. The goal is to find someone who shares a set of great-great-grandparents with the unknown person, which makes the two of them third cousins. Most people have many relatives who fall into this category. So long as one of these people is in a database, a skilled investigator may be able to build a family tree that leads to the person who left the sample.

This technique has been used to identify more than 15 suspects in murder and sexual assault cases. The breakthroughs began in April with an arrest in the case of the Golden State Killer, who terrorized California with rapes and murders in the '70s and '80s. Other successes soon followed. A truck driver in Washington State was identified as a suspect in the 1987 murder of a Canadian couple; a DJ in Pennsylvania was identified as a suspect in the 1992 murder of a teacher.

Watching these developments, Dr. Erlich wondered about the odds of identifying people through these DNA databases.

His analysis is not based on the largest sites, 23andMe and Ancestry, but on two smaller ones: GEDmatch, which has around one million profiles, and MyHeritage, which has around 1.5 million. That's because, for legal and logistical reasons, the larger sites cannot easily be used for this kind of search.

But the smaller sites, set up to help genealogists, are more flexible. GEDmatch allows law-enforcement officials to search its database in murder and sexual assault cases. MyHeritage allows users to upload DNA data generated by outside labs. With both, it's hard to be sure what's being uploaded: grandma's saliva, crime-scene blood, a sample from a medical study or something else entirely.

Dr. Erlich and his colleagues, from Columbia University, the Hebrew University of Jerusalem and the New York Genome Center, analyzed 30 DNA kits from the GEDmatch database.

Their results were eye-opening. The team found that a DNA sample from an American of European descent could be matched to a relative at the distance of a third cousin or closer in 60 percent of cases. A comparable analysis on the MyHeritage site had similar results. (The analysis focused on Americans of Northern European background; 75 percent of the users on GEDmatch and other genealogy sites belong to that demographic.)

Some experts have raised questions about the study's methodology. Its sample size was small, and it did not account for the considerable work often required to identify a suspect.

CeCe Moore, a genetic genealogist with Parabon, a forensic consulting firm, also expressed concern that the Science paper may obscure the difficulty involved in puzzling out someone's identity; it takes a highly skilled expert to build a family tree from the initial genetic clues.

Still, she said, the takeaway of the study "is not news to us." In recent months Ms. Moore has been involved in a dozen murder and sexual assault cases that used GEDmatch to identify suspects. Of the 100 crime-scene profiles that she had uploaded to GEDmatch by May, some were clearly solvable, she said, and 20 were "promising."

"I think it's a strong and convincing paper," said Graham Coop, a population genetics researcher at the University of California, Davis. In a blog post in May, Dr. Coop calculated just how lucky investigators had been in the Golden State Killer case. He reached a statistical conclusion similar to Dr. Erlich's: Society is not far from being able to identify 90 percent of people through the DNA of their cousins in genealogical databases.
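The arithmetic behind estimates like these can be illustrated with a deliberately simple model. This is not Dr. Coop's or Dr. Erlich's method; it assumes, hypothetically, that each of a person's detectable relatives joins a database independently with the same probability (the database's "coverage"), so the chance that at least one of n relatives is present is 1 - (1 - coverage)^n. It ignores real complications such as the uneven ancestry of database users and distant cousins who share too little DNA to be detected. The function names and input values below are illustrative only, not figures from the study.

```python
def match_probability(n_relatives: int, coverage: float) -> float:
    """Chance that at least one of `n_relatives` is in the database.

    Hypothetical model: each relative joins independently with probability
    `coverage`, and any relative who joins is detectable as a match.
    """
    if n_relatives < 0:
        raise ValueError("n_relatives must be non-negative")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie in [0, 1]")
    # P(no relative in the database) = (1 - coverage) ** n_relatives
    return 1.0 - (1.0 - coverage) ** n_relatives


def coverage_needed(n_relatives: int, target: float) -> float:
    """Smallest coverage at which match_probability reaches `target`."""
    if n_relatives <= 0:
        raise ValueError("n_relatives must be positive")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must lie in [0, 1)")
    # Solve 1 - (1 - c) ** n = target for c.
    return 1.0 - (1.0 - target) ** (1.0 / n_relatives)


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration; not data from the study.
    relatives = 200
    for coverage in (0.001, 0.005, 0.01, 0.02):
        p = match_probability(relatives, coverage)
        print(f"coverage {coverage:.1%}: P(at least one match) = {p:.2f}")
    print(f"coverage needed for 90%: {coverage_needed(relatives, 0.9):.2%}")
```

Because the chance of missing every relative shrinks geometrically as coverage grows, even a database holding a small fraction of a population can match most people who have many relatives, which is why modest growth in database size can sharply raise the odds of identification.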

"This is a moment of, wow, this opens up a lot of possibilities," he said.

In an alarming result, the study found that a supposedly "anonymous" genetic profile taken from a medical data set could be uploaded to GEDmatch and positively identified. This suggests that an individual's private health data might not be so private after all.

Dr. Erlich and his colleagues have proposed a safeguard: a cryptographic signature on genetic data files. This would help ensure that whoever uploads a genetic profile is who they claim to be.

Daniel MacArthur, a genomics researcher at Massachusetts General Hospital, said he endorses the cryptographic signature but that it does not go far enough. "We live in a world where people are very much interested in getting their genetic data to learn more about themselves," he said. "It's a natural human instinct. Legislation is required to ensure that it is not used for nefarious purposes."
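The cryptographic signature Dr. MacArthur mentions can be illustrated with a short sketch. Under one plausible reading (an assumption, not the study's specification), a testing company attaches an authentication tag to each raw DNA file it issues, and a genealogy database accepts an upload only if the tag matches the file, so a profile produced elsewhere, such as from crime-scene evidence or a research data set, would be rejected. To stay within Python's standard library, the sketch swaps in HMAC-SHA256, a shared-secret authentication code, for a true public-key signature; a real system would more likely use public-key signatures so that databases could verify files without holding the provider's secret. The key, function names and toy file format are hypothetical.

```python
import hashlib
import hmac


def sign_genotype_file(data: bytes, provider_key: bytes) -> str:
    """Return a hex tag a testing provider would attach to a raw data file.

    Stand-in technique: HMAC-SHA256 with a shared secret, not a
    public-key signature. `provider_key` is a hypothetical secret.
    """
    return hmac.new(provider_key, data, hashlib.sha256).hexdigest()


def accept_upload(data: bytes, tag: str, provider_key: bytes) -> bool:
    """Accept an upload only if its tag matches the file contents."""
    expected = sign_genotype_file(data, provider_key)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"hypothetical-provider-secret"
    # Toy, made-up genotype file; real raw-data formats differ by provider.
    genotype = b"rsid\tchromosome\tposition\tgenotype\nrs0000001\t1\t12345\tAG\n"
    tag = sign_genotype_file(genotype, key)
    print(accept_upload(genotype, tag, key))  # True
    tampered = genotype.replace(b"AG", b"AA")
    print(accept_upload(tampered, tag, key))  # False
```

Any change to the file, even a single altered genotype, makes the check fail, and a file that never passed through the provider has no valid tag at all.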
