You may not be anonymous, thanks to genealogy databases





(CNN) – In the early 2000s, genetic testing became a consumer product that no longer required a physician's involvement, and the consequences of that shift could affect everyone.

More than 60% of people of European descent can be identified from an anonymous DNA sample using only data from consumer DNA databases, according to new research. That percentage includes people who have never taken a DNA test themselves, according to the study, published Thursday in the journal Science.

"Usually we think about paternity testing, you can find father, siblings, but with the advancement of more powerful techniques in genomics, you can now identify [more distant] cousins in some cases," said Yaniv Erlich, lead author of the study and an associate professor of computer science at Columbia University. According to the study, once distant relatives have been found, an unidentified, "anonymous" DNA sample can be traced to a specific person.

Fishing expedition?

According to Erlich and his co-authors, more than 15 million people had taken direct-to-consumer genetic tests as of April. Because most genetic testing companies let customers download their raw genetic data, third-party services have emerged, such as GEDmatch, which lets users upload that raw data for additional analyses such as ancestry searches.

These services appeal to many people, including adoptees looking for their biological parents, because they let users extend their search beyond the company where they were originally tested. In Erlich's words, "you are fishing in more ponds, you might find something."

In fact, GEDmatch, which is intended only for genealogical research, was used to find the suspected Golden State Killer. This year, nearly 32 years after the killer's rampage ended, investigators captured a suspect by using DNA from a crime scene to conduct what is called a long-range familial search. The search helped law enforcement identify a third cousin of the serial rapist and killer, and further investigation led to a suspect whose identity was then confirmed with a standard DNA test.

Given this success, long-range familial searching is poised to become a standard investigative tool, Erlich and his co-authors suggest. So they set out to measure how powerful it is. They began by analyzing more than 1 million anonymous genetic profiles from MyHeritage, a consumer genetic testing company where Erlich is chief science officer.

"We have a database of more than 1.75 million individuals, and we offer a test that lets you learn about your past and find relatives," said Erlich, explaining that he and his colleagues "wanted to know what the profile of matches looks like for an individual."

The study highlights results for people of European descent because "this group happens to be the largest in our database," he said.

For about 60% of the anonymous individuals whose genomes were analyzed, all of European ancestry, the researchers were able to find a relative at the level of third cousin or closer. For about 15% of these people, the closest relative found was a second cousin or closer, the study showed.

Next, the team repeated the process on GEDmatch, an independent website whose privacy policy allows users to compare anonymous DNA files that their owners have designated "public." CNN has reached out to GEDmatch for comment.

Erlich's team found that the two databases, which use different strategies to identify biological relatives, gave very similar results. This shows that the method can be replicated across different databases, he said.

Once relatives are found, an anonymous person can be re-identified by building a family tree, looking for other family members and then triangulating from there. The study illustrated this when the team re-identified an "anonymous" woman using publicly available DNA information.

Genetic databases currently hold data on only a small fraction of the total US population, Erlich noted. But once a database covers about 2% of a population, nearly anyone could be matched to at least a third cousin and identified from their DNA, he and his co-authors say.
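The intuition behind the 2% figure can be illustrated with simple probability: a person has many third cousins or closer relatives, so even a small database has a good chance of holding at least one of them. The Python sketch below uses a toy model that is an assumption, not the study's actual method: relatives are treated as independent, and the number of relatives (`n_relatives`) and the chance that a present relative is detected (`p_detect`) are hypothetical inputs, not figures from the research.

```python
def prob_at_least_one_match(coverage: float, n_relatives: int, p_detect: float = 1.0) -> float:
    """Chance that at least one of n_relatives is in the database and detected.

    Toy model (hypothetical): each relative is in the database independently
    with probability `coverage`, and a relative who is present is detected as
    a match with probability `p_detect`.
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= p_detect <= 1.0):
        raise ValueError("coverage and p_detect must be probabilities in [0, 1]")
    if n_relatives < 0:
        raise ValueError("n_relatives must be non-negative")
    p_single = coverage * p_detect  # one relative is both present and detected
    # Complement of "none of the relatives produces a match".
    return 1.0 - (1.0 - p_single) ** n_relatives


if __name__ == "__main__":
    # n_relatives=200 is an illustrative guess, not a figure from the study.
    for cov in (0.005, 0.01, 0.02):
        print(f"coverage {cov:.1%}: {prob_at_least_one_match(cov, 200):.2f}")
```

With a hypothetical 200 relatives and perfect detection, this toy model gives roughly a 98% chance of at least one match at 2% coverage. Real populations are less tidy, since relatives cluster by family and region, which is why the researchers' own estimates matter more than this sketch.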

Given the rapid growth of consumer genomics, that level of coverage is likely to be reached in the near future, they conclude.

Combining databases

Noah Rosenberg, a professor of biology at Stanford University, said that the Erlich study "shows that the Golden State Killer case is not an anomaly".

"In quite a lot of cases, it would be possible to use the technique used in this case to identify the contributor of a DNA sample," said Rosenberg, who was not involved in the Erlich study but was the lead author of a separate study published Thursday in the journal Cell.

In their study, Rosenberg and his co-authors wanted to know "whether databases commonly used in forensic genetics can communicate with databases commonly used in biomedical, genealogical and personal genomics research."

DNA evidence has been admissible in US courts since the late 1980s, and law enforcement has been collecting DNA ever since. Because each type of database uses "different parts of the genome," Rosenberg said, the technique in his study differs from the one used in the Golden State Killer case.

One pathologist had preserved a sample of the killer's DNA, which allowed the cold-case investigators to go back, retest the sample and identify more genetic markers than appear in typical forensic reports, Rosenberg explained. "If the DNA sample is not available for that, then what might be generally available would be the forensic genetic markers."

With less data available to law enforcement, what happens when investigators cannot retest a DNA sample?

"It is scientifically possible to establish links between different types of databases," Rosenberg said. "We were able to match samples across databases with non-overlapping sets of genetic markers more than 90% of the time when the samples came from the same individual, and about 30% of the time when they came from close relatives."

"Different databases built for different purposes might not independently provide enough information to reveal the identity of a person, but by combining information from multiple databases, identifications can be made," he said.

His study and Erlich's "both relate to the principle that connecting several databases reveals information that is not contained in any one of them and may not be wanted by those who created them," he said.

Rosenberg hopes his study "will help catalyze the conversation among many stakeholders in the fields of forensic genetics, genetic data privacy and genealogical testing."

Erlich and his co-authors conclude their study with ideas for reducing the misuse of genetic databases. First, they call for changing the rules that allow leftover clinical samples to undergo genetic testing without consent. Requiring individuals' consent before testing would "give better protection to human subjects," Erlich said. They also suggest that consumer genetics companies adopt a cryptographic strategy so there is a "technical means to differentiate legitimate searches from illegitimate searches."

Rosenberg said, "These ideas deserve discussion. Another idea might be to consider what kind of information is admissible in court."
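One way a "technical means to differentiate legitimate searches from illegitimate searches" could work, offered here as an assumption rather than the authors' specified design, is for a testing company to attach an authentication tag to each raw data file it issues, so a third-party service accepts only files carrying a valid tag. A profile assembled elsewhere, such as from a crime-scene sample, would lack one. The Python sketch below uses a hypothetical shared key and HMAC from the standard library purely for illustration.

```python
import hashlib
import hmac

# HYPOTHETICAL key shared by a testing company and a third-party service.
# A real system would more likely use public-key signatures, so that the
# verifying service could check tags without being able to create them.
SHARED_KEY = b"example-key-not-for-production"


def sign_raw_file(data: bytes, key: bytes = SHARED_KEY) -> str:
    """Testing company side: compute a tag over the exact raw file contents."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def accept_upload(data: bytes, tag: str, key: bytes = SHARED_KEY) -> bool:
    """Third-party service side: accept only files whose tag verifies."""
    expected = sign_raw_file(data, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    raw = b"rsid\tgenotype\nrs0000001\tAG\n"  # made-up example file contents
    tag = sign_raw_file(raw)
    print(accept_upload(raw, tag))                    # file issued by the company
    print(accept_upload(raw.replace(b"AG", b"AA"), tag))  # altered file
```

HMAC is used only because it ships with Python; its limitation is that anyone holding the shared key can also produce valid tags, which is why a deployed scheme would likely rely on public-key signatures instead.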

Copyright 2018 by CNN NewSource. All rights reserved. This material may not be published, disseminated, rewritten or redistributed.
