The DNA technique that caught the Golden State Killer is more powerful than we thought




In April, police solved a decades-old mystery — the identity of the Golden State Killer — with a previously unused DNA technique. Searching existing law enforcement databases for a matching sample yielded no results, but a search of a public DNA database turned up between 10 and 15 possible close relatives, allowing police to narrow the list of suspects and giving them the lead they needed.

It was a new technique at the time, but after that resounding success, it has proven to be one of the most powerful new tools in forensic science. In the months that followed, groups such as Parabon Labs and the DNA Doe Project identified at least 19 different cold-case samples using the method, known as familial DNA searching of public databases, providing critical new leads in previously unsolvable cases.

Now, a pair of new findings could make the technique even more powerful. A paper published today in the journal Science finds that the same technique could extend far beyond current cases, covering nearly the entire population from a relatively small sample base. At the same time, researchers publishing in Cell have devised a way to extrapolate from incomplete samples, building a fuller picture of a genome than was originally tested. Taken together, these techniques could allow researchers to identify almost anyone using only existing samples, a frighteningly powerful new tool for DNA forensics.

Familial DNA searches are a break from conventional DNA testing, which looks for positive matches, such as matching the DNA from a bloody glove to the DNA of a specific suspect. The catch is that a match is only possible if the suspect's DNA can be collected, making the approach impractical for most cold cases. Familial DNA searches instead look for partial matches, which can indicate that a sample comes from a sibling or other relative rather than from the same person. That is not enough to conclusively identify a person, but it can give police a decisive lead that later testing can confirm.

To find these partial matches, labs have relied heavily on public genetic databases such as GEDmatch and DNA.Land. These searches do not require court approval because the data is already public, but their scope is more limited. The largest such database, GEDmatch, contains just under one million genetic profiles, which greatly limits the reach of many searches. The FBI's National DNA Index, by contrast, contains more than 17 million profiles, but is accessible only under certain legal circumstances. Mainstream DNA services such as 23andMe and MyHeritage hold many more samples, but their policies generally prohibit such searches by law enforcement agencies.

The result is a scramble for data, and new uncertainty about how far public databases actually reach. "The main limitation is coverage," says Yaniv Erlich, a computer science professor at Columbia University and chief science officer at MyHeritage. "And even if you find an individual, it requires a complex analysis from there."

Now, Erlich has teamed up with other researchers from Columbia University and the Hebrew University to examine exactly how far that coverage extends. For the Science paper, the team examined a dataset of 1.28 million individuals (drawn largely from the MyHeritage database) and produced a statistical analysis of the likelihood that a given person's data could be matched to a relative whose DNA is in the database. The researchers found that more than 60 percent of searches would turn up a third cousin or closer match (the same proximity as the Golden State Killer suspect), giving a reasonable chance of identifying the target. As a result, the researchers estimate that a database need only cover 2 percent of a target population to provide a third-cousin-or-better match for nearly any person in it. "With the exponential growth of consumer genomics," the researchers write, "we posit that such a database scale is foreseeable for some third-party websites in the near future."
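The intuition behind that 2 percent figure can be illustrated with a toy back-of-the-envelope model (my own simplification, not the paper's actual method): if a person has roughly R relatives at the third-cousin-or-closer level, and a database enrolls a random fraction f of the population, the chance that at least one relative is in the database is about 1 − (1 − f)^R. The relative count used below is a rough genealogical ballpark, not a figure from the study.

```python
# Toy model (a hedged simplification, NOT the Science paper's method):
# probability that a database covering a random fraction `coverage` of the
# population contains at least one of a person's `num_relatives` relatives,
# assuming relatives enroll independently and uniformly at random.

def match_probability(coverage: float, num_relatives: int) -> float:
    """P(at least one relative in database) = 1 - (1 - f)^R."""
    return 1.0 - (1.0 - coverage) ** num_relatives

# Illustrative only: ~800 third-cousin-or-closer relatives is a rough
# genealogical estimate, invented here for demonstration.
for coverage in (0.001, 0.005, 0.02):
    p = match_probability(coverage, num_relatives=800)
    print(f"coverage {coverage:.1%}: match probability ~{p:.2f}")
```

Even under this crude model, 2 percent coverage pushes the match probability close to certainty, which is why a database far smaller than the population can still implicate nearly everyone in it.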

Notably, that forecast is based on a homogeneous population, but most collections of genetic data show significant racial disparities. The most pronounced are in law enforcement databases, which draw from arrested or convicted populations and thus skew toward Black and Latino populations. Consumer and public databases show the opposite bias, toward people of European descent, who are therefore more likely to be identified through familial searches, Erlich explains.

At the same time, another group of scientists is expanding the reach of these techniques. Consumer genetic tests sample different parts of the genome than law enforcement tests do, which poses a persistent comparison problem when a complete sample cannot be obtained. But a group of researchers from Stanford University, the University of California, Davis, and the University of Michigan has developed a method for comparing results even when the tested parts of the genome do not overlap, relying on known correlations between different parts of the genetic code. The method is not fully developed, but it could give forensic analysts much more flexibility in the kinds of data they can use.
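The correlation idea can be sketched with a toy example (a drastic simplification of the actual statistical genetics, which models correlations across many markers jointly). If two nearby markers tend to be inherited together, the allele observed at one tested marker predicts the likely allele at an untested one. All marker names and frequencies below are invented for illustration.

```python
# Toy sketch of correlation-based prediction between two genetic markers
# (a hedged simplification; real methods work across many markers at once).
# Hypothetical joint frequencies P(marker-1 allele, marker-2 allele),
# as might be estimated from a reference panel. Values are invented.
JOINT_FREQ = {
    ("A", "G"): 0.55,  # alleles A and G usually travel together
    ("A", "T"): 0.05,
    ("C", "G"): 0.10,
    ("C", "T"): 0.30,
}

def predict_marker_2(allele_1: str) -> tuple[str, float]:
    """Return the most likely marker-2 allele given the observed
    marker-1 allele, with its conditional probability P(b | a)."""
    # P(b | a) = P(a, b) / sum over b' of P(a, b')
    total = sum(p for (a, _), p in JOINT_FREQ.items() if a == allele_1)
    return max(
        ((b, p / total) for (a, b), p in JOINT_FREQ.items() if a == allele_1),
        key=lambda pair: pair[1],
    )

allele, prob = predict_marker_2("A")
print(f"observed A at marker 1 -> predict {allele} at marker 2 (p={prob:.2f})")
```

The same conditional-probability logic, scaled up to thousands of correlated markers, is what lets an analyst reason about regions of the genome a test never directly measured.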

According to Michael Edge of UC Davis, who worked on the Cell paper, the new research "suggests a framework that law enforcement could use to start thinking about the backward compatibility of existing STR databases with SNP data, but more work would be needed to see how practical it would be."
