“The game has changed.” AI triumphs at solving protein structures | Science




A protein’s structure as predicted by artificial intelligence (in blue) and as determined experimentally (in green) match almost perfectly.

DeepMind

By Robert F. Service

Artificial intelligence (AI) has solved one of biology’s grand challenges: predicting how proteins fold from a linear chain of amino acids into the 3D shapes that allow them to carry out the tasks of life. Today, leading structural biologists and organizers of a biennial protein folding competition announced the achievement by researchers at DeepMind, a UK-based AI company. They say the DeepMind method will have far-reaching effects, among them significantly speeding up the creation of new drugs.

“What the DeepMind team has managed to achieve is fantastic and will change the future of structural biology and protein research,” says Janet Thornton, director emeritus of the European Bioinformatics Institute. “This is a 50-year-old problem,” adds John Moult, a structural biologist at the University of Maryland, Shady Grove, and cofounder of the competition, the Critical Assessment of Protein Structure Prediction (CASP). “I never thought I’d see this in my lifetime.”

The human body uses tens of thousands of different proteins, each a chain of dozens to several hundred amino acids. The order of these amino acids dictates how the myriad pushes and pulls between them give rise to proteins’ complex 3D shapes, which, in turn, determine how they function. Knowing those shapes helps researchers design drugs that can lodge in proteins’ pockets and crevices. And being able to synthesize proteins with a desired structure could speed the development of enzymes that make biofuels and break down plastic waste.

For decades, researchers have deciphered the 3D structures of proteins using experimental techniques such as x-ray crystallography or cryo-electron microscopy (cryo-EM). But such methods can take months or years, and they don’t always work. Structures have been solved for only about 170,000 of the more than 200 million proteins discovered across all life forms.

In the 1960s, researchers realized that if they could work out all of the individual interactions within a protein’s sequence, they could predict its 3D shape. But with hundreds of amino acids per protein and so many ways each pair of amino acids can interact, the number of possible structures per sequence was astronomical. Computer scientists jumped on the problem, but progress was slow.
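How astronomical? A classic back-of-the-envelope count, often credited to Cyrus Levinthal, makes it concrete. Assume, purely for illustration, that each amino acid can adopt only three backbone conformations, chosen independently. The minimal Python sketch below uses hypothetical constants and a hypothetical function name; it is a crude model, not how any prediction method in this story works.

```python
# Hypothetical, deliberately crude model: every residue independently adopts
# one of STATES_PER_RESIDUE backbone conformations.
STATES_PER_RESIDUE = 3
CHAIN_LENGTH = 100  # a modest protein of 100 amino acids


def conformation_count(states_per_residue: int, chain_length: int) -> int:
    """Count chain conformations when each residue picks its state independently."""
    return states_per_residue ** chain_length


count = conformation_count(STATES_PER_RESIDUE, CHAIN_LENGTH)
print(f"{count:.2e} possible conformations")  # prints 5.15e+47 possible conformations
```

Even this toy model yields roughly 5 × 10^47 conformations for a 100-residue chain, far too many to enumerate one by one.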

In 1994, Moult and his colleagues launched CASP, which takes place every 2 years. Participants receive amino acid sequences for about 100 proteins whose structures are not yet known. Some groups compute a structure for each sequence, while others determine it experimentally. The organizers then compare the computed predictions with the lab results and give each prediction a global distance test (GDT) score. Scores above 90 on the 0-to-100 scale are considered on par with experimental methods, Moult says.
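Roughly speaking, the GDT score is the percentage of a model’s amino acids that land within a distance cutoff of their experimentally determined positions, averaged over several cutoffs. The minimal Python sketch below computes the common GDT_TS variant (cutoffs of 1, 2, 4, and 8 ångströms). It assumes the two structures are already superimposed and matched residue by residue, one point per residue; the real CASP evaluation also searches over many superpositions to maximize each fraction. The function name and sample coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # one (x, y, z) point per residue, in ångströms


def gdt_ts(predicted: Sequence[Coord], experimental: Sequence[Coord],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT_TS on a 0-100 scale for pre-superimposed, residue-matched structures."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    # Per-residue error between predicted and experimental positions.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # For each cutoff, the fraction of residues whose error is within it.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    # Average over cutoffs and scale to 0-100.
    return 100.0 * sum(fractions) / len(fractions)


# Five residues along the y-axis; the "prediction" shifts each one along x.
offsets = [0.5, 1.5, 3.0, 6.0, 10.0]
experimental = [(0.0, float(i), 0.0) for i in range(len(offsets))]
predicted = [(off, float(i), 0.0) for i, off in enumerate(offsets)]
print(round(gdt_ts(predicted, experimental), 1))  # prints 50.0
```

Here 20, 40, 60, and 80 percent of residues fall within 1, 2, 4, and 8 ångströms, respectively, so the averaged score is 50. A perfect prediction scores 100.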

Even in 1994, predicted structures of small, simple proteins could match experimental results. But for larger, more challenging proteins, the computed structures’ GDT scores were around 20, “a complete disaster,” says Andrei Lupas, a CASP judge and an evolutionary biologist at the Max Planck Institute for Developmental Biology. By 2016, competing groups had reached scores of about 40 for the hardest proteins, largely by drawing on known structures of proteins closely related to the CASP targets.

Fold

In the biennial Critical Assessment of Protein Structure Prediction (CASP), groups compete to predict the 3D structures of proteins. This year, AlphaFold beat all other groups and matched experimental results on an accuracy measure, the global distance test.



[Chart: global distance test score (0 to 100) versus protein structure prediction difficulty (easy to difficult) at CASP1 (1994), CASP5 (2002), CASP12 (2016), CASP13 (2018), and CASP14 (2020), with CASP14 split into other competitors and AlphaFold.]

C. BICKEL /SCIENCE

In DeepMind’s first CASP, in 2018, its algorithm, called AlphaFold, relied on this comparative strategy. But AlphaFold also incorporated a computational approach called deep learning, in which software is trained on vast data troves (in this case, the sequences and structures of known proteins) and learns to spot patterns. DeepMind won handily, beating the competition by an average of 15% on each structure and earning GDT scores of up to about 60 for the hardest targets.

But the predictions were still too coarse to be useful, says John Jumper, who heads AlphaFold development at DeepMind. “We knew how far we were from biological relevance.” To do better, Jumper and his colleagues combined deep learning with an “attention algorithm” that mimics the way a person might assemble a jigsaw puzzle: first connecting pieces in small clumps (in this case, clusters of amino acids), then searching for ways to join the clumps into a larger whole. Working with a modest computer network of 128 processors, they trained the algorithm on about 170,000 known protein structures.

And it worked. Across this year’s CASP target proteins, AlphaFold achieved a median GDT score of 92.4. For the most challenging proteins, AlphaFold scored a median of 87, 25 points above the next best predictions. It even excelled at solving the structures of proteins embedded in cell membranes, which are central to many human diseases but notoriously difficult to solve with x-ray crystallography. Venki Ramakrishnan, a structural biologist at the Medical Research Council Laboratory of Molecular Biology, calls the result “a stunning advance on the protein folding problem.”

All groups in this year’s competition improved, Moult says. But with AlphaFold, Lupas says, “The game has changed.” The organizers even worried that DeepMind might have cheated somehow. So Lupas posed a special challenge: a membrane protein from a species of archaea, an ancient group of microbes. For 10 years, his research team had tried every trick in the book to obtain an x-ray crystal structure of the protein. “We couldn’t solve it.”

But AlphaFold had no trouble. It returned a detailed image of a three-part protein with two long helical arms in the middle. The model enabled Lupas and his colleagues to make sense of their x-ray data; within half an hour, they had fit their experimental results to AlphaFold’s predicted structure. “It’s almost perfect,” Lupas says. “They could not possibly have cheated on this. I don’t know how they do it.”

As a condition of entering CASP, DeepMind, like all groups, agreed to reveal enough details about its method for other groups to re-create it. That will be a boon for experimentalists, who will be able to use accurate structure predictions to make sense of opaque x-ray and cryo-EM data. It could also enable drug designers to quickly work out the structure of every protein in dangerous new pathogens such as SARS-CoV-2, a key step in the hunt for molecules to block them, Moult says.

Still, AlphaFold doesn’t get everything right yet. In the competition, it faltered noticeably on one protein, an amalgam of 52 small repeating segments that distort one another’s positions as they assemble. Jumper says the team now wants to train AlphaFold to solve such structures, as well as those of complexes of proteins that work together to carry out key functions in the cell.

Although one grand challenge has fallen, others will undoubtedly emerge. “This isn’t the end of something,” Thornton says. “It’s the start of many new things.”
