Genealogical databases could reveal the identity of most Americans




Protecting the anonymity of publicly available genetic data, including DNA given to research projects, may be impossible.

A new study reveals that nearly 60% of people of European descent who search genetic genealogy databases will find a match with a relative who is a third cousin or closer. The result suggests that once such a database grows to about 3 million people, police or anyone else with access to DNA data could work out the identity of virtually any American of European descent, Yaniv Erlich and his colleagues report online October 11 in Science.

Erlich, chief science officer of the consumer genetic testing company MyHeritage, and his colleagues examined his company's database and the public genealogy site GEDMatch, each of which holds data for about 1.2 million people. Using DNA matches with relatives, family tree information and some basic demographic details, the scientists estimate that they could narrow the identity of the owner of an anonymous DNA sample to just one or two people.

Recent cases in which suspects in violent crimes were identified through GEDMatch DNA searches, such as the Golden State Killer case (SN Online: 4/29/18), have raised privacy concerns (SN Online: 6/7/18). And the same process used to find rape and murder suspects can also identify people who donated anonymous DNA to genetic and medical research studies, the scientists say.

Government officials have said that genetic data used in research, stripped of information such as names, ages and addresses, can't be used to identify people. But "that's just not true," as Erlich and his colleagues show, says Rori Rohlfs, a statistical geneticist at San Francisco State University who was not involved in the study.

Using genetic genealogy techniques that mirror those used to identify the Golden State Killer suspect and suspects in at least 15 other criminal cases, Erlich's team identified a woman who had participated anonymously in the 1000 Genomes Project. That project has cataloged genetic variants in about 2,500 people worldwide.

Erlich's team pulled the woman's anonymized data from the publicly available 1000 Genomes database. The researchers then built a DNA profile similar to those generated by consumer genetic testing companies such as 23andMe and AncestryDNA (SN: 6/23/18, p. 14) and uploaded that profile to GEDMatch.

A search turned up matches with two distant cousins, one from North Dakota and one from Wyoming. The cousins also shared DNA with each other, indicating that they had a common set of ancestors four to six generations back. Drawing on family tree information these cousins had already compiled, the researchers identified the ancestral couple and filled in hundreds of their descendants, searching for a woman who matched the age and other publicly available demographic details of the 1000 Genomes participant.

It took a day to find the right person.

This example suggests that scientists need to reconsider whether they can guarantee research participants' anonymity if genetic data are shared publicly, Rohlfs says.

In practice, though, identifying a person from a DNA match with a distant relative is much harder than it sounds and takes a great deal of expertise and hard work, says Ellen Greytak, director of bioinformatics at Parabon NanoLabs, a company based in Reston, Va., that has helped solve at least a dozen criminal cases since May through genetic genealogy. "The gap between a match and an identification is absolutely huge," she says.

The company has also found that people of European descent often have DNA matches with relatives in GEDMatch. But narrowing those matches down to a single suspect is often complicated, says CeCe Moore, a genealogist who leads Parabon's genetic genealogy service.

"The study demonstrates the power of genetic genealogy in theory," Moore says, "but does not fully capture the challenges of doing this work in practice." For example, Erlich and his colleagues already had family tree information from the 1000 Genomes woman's relatives, "so they had a significant head start."

Erlich's example may be an oversimplification, Rohlfs says. The researchers made rough estimates and imperfect assumptions, but the conclusion is solid, she says. "Their work is rough, but totally reasonable." And the finding that almost anyone can be identified from DNA should spark public discussion about how DNA data should be used in law enforcement and research, she says.
