“The game has changed.” AI triumphs over protein folding




The structures of a protein predicted by artificial intelligence (in blue) and experimentally determined (in green) match almost perfectly.

IMAGE: DEEPMIND

Artificial intelligence (AI) has solved one of the great challenges of biology: predicting how proteins fold from a chain of amino acids into the 3D shapes that carry out life’s tasks. This week, the organizers of a protein folding competition announced the achievement by researchers at DeepMind, a UK-based AI company. They say the DeepMind method will have far-reaching effects, among them significantly speeding up the creation of new drugs.

“What the DeepMind team has been able to achieve is fantastic and will change the future of structural biology and protein research,” says Janet Thornton, director emeritus of the European Bioinformatics Institute. “It’s a 50-year-old problem,” adds John Moult, a structural biologist at the University of Maryland, Shady Grove, and co-founder of the competition, Critical Assessment of Protein Structure Prediction (CASP). “I never thought I would see this in my lifetime.”

The body uses tens of thousands of different proteins, each a chain of tens to hundreds of amino acids. The order of the amino acids dictates how the myriad pushes and pulls between them give rise to proteins’ complex 3D shapes, which in turn determine how they function. Knowing these shapes helps researchers design drugs that can lodge in proteins’ crevices. And being able to synthesize proteins with a desired structure could speed the development of enzymes that make biofuels and degrade plastic waste.

CREDITS: (GRAPHIC) C. BICKEL /SCIENCE; (DATA) CASP

For decades, researchers have deciphered proteins’ 3D structures using experimental techniques such as x-ray crystallography or cryo-electron microscopy (cryo-EM). But such methods can take years and don’t always work. Structures have been solved for only about 170,000 of the more than 200 million proteins found across all life forms.

In the 1960s, researchers realized that if they could work out all the interactions within a protein’s sequence, they could predict its 3D shape. But the amino acids in any given sequence could interact in so many different ways that the number of possible structures was astronomical. Computer scientists jumped on the problem, but progress was slow.
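
To get a feel for "astronomical," here is a rough back-of-the-envelope sketch in Python. The numbers are illustrative assumptions, not figures from the researchers quoted here: it supposes each amino acid can adopt just three local conformations and that a computer could test 10 trillion candidate structures per second. The function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues, states_per_residue=3):
    """Number of candidate structures if every residue independently
    picks one of `states_per_residue` local conformations (an assumption)."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues, states_per_residue=3,
                            samples_per_second=1e13):
    """Years needed to test every candidate once at the assumed rate."""
    total = conformation_count(n_residues, states_per_residue)
    return total / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    n = 100  # a modest protein of 100 amino acids
    print(f"candidates: {conformation_count(n):.2e}")
    print(f"years to enumerate: {exhaustive_search_years(n):.2e}")
```

Even under these generous assumptions, a 100-residue chain has on the order of 10^47 candidate shapes, far too many to check one by one, which is why brute-force enumeration was never a realistic option.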

In 1994, Moult and his colleagues started CASP, which takes place every 2 years. Entrants receive amino acid sequences for about 100 proteins whose structures are not known. Some groups calculate a structure for each sequence, while others determine it experimentally. The organizers then compare the computer predictions with the lab results and give the predictions a global distance test (GDT) score. Scores above 90 on the 100-point scale are considered on par with experimental methods, Moult says.
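
The idea behind a GDT-style score can be sketched in a few lines of Python. This is a simplified illustration, not CASP's scoring software: it assumes the predicted and experimental structures are already superimposed and are given as matching lists of one 3D coordinate per residue, and it averages the share of residues that land within 1, 2, 4, and 8 angstroms of their experimental positions (the cutoffs conventionally used for the GDT_TS variant). The real procedure also searches over many superpositions to find the best fit at each cutoff. The function name and toy coordinates are hypothetical.

```python
import math


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score on a 0-100 scale.

    Assumes both structures are already superimposed and that the two
    lists hold matching (x, y, z) coordinates in angstroms.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and the same length")
    # Per-residue distance between predicted and experimental positions.
    distances = [math.dist(p, q) for p, q in zip(predicted, experimental)]
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances)
                 for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy four-residue example: prediction errors of 0.5, 1.5, 3.0 and 10.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

if __name__ == "__main__":
    print(gdt_ts(predicted, experimental))    # 56.25
    print(gdt_ts(experimental, experimental)) # 100.0
```

Because the score rewards residues at several tolerance levels, a prediction that is roughly right everywhere still earns partial credit, while a score above 90 requires nearly every residue to sit within an angstrom or two of the experimental position.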

Even in 1994, the structures predicted for small, simple proteins could match the experimental results. But for larger, harder proteins, the GDT scores for the calculations were around 20, “a complete disaster,” says Andrei Lupas, a CASP judge and evolutionary biologist at the Max Planck Institute for Developmental Biology. By 2016, competing groups had reached scores of around 40 for the hardest proteins, mostly by drawing on the known structures of proteins closely related to the CASP targets.

In DeepMind’s first competition, in 2018, its algorithm, called AlphaFold, relied on this comparative strategy. But AlphaFold also incorporated a computational approach called deep learning, in which the software is trained on vast data repositories – in this case, known protein sequences and structures – and learns to spot patterns. DeepMind won easily, beating the competition by an average of 15% on each structure and earning GDT scores of up to around 60 for the toughest targets.

But the predictions were still too rough, says John Jumper, who heads AlphaFold’s development at DeepMind. “We knew how far we were from biological relevance.” So the team combined deep learning with an “attention algorithm” that mimics how a person might put a puzzle together: connect pieces into groups – in this case clusters of amino acids – then search for ways to join the clusters into a larger whole. Working with a computer network built around 128 machine learning processors, they trained the algorithm on the approximately 170,000 known protein structures.
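
The details of AlphaFold's attention algorithm are not given here, so the Python sketch below shows only the generic technique that usually goes by that name in machine learning, scaled dot-product attention. It is an illustration of the general idea, not DeepMind's method: each item (think of one amino acid, represented by a short list of numbers) scores every other item for relevance, turns the scores into weights that sum to 1, and takes a weighted blend of the others' information. All names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Generic scaled dot-product attention over lists of vectors.

    For each query, score every key by dot product (scaled by the square
    root of the key length), then return the weighted average of the values.
    """
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        blended = [sum(w * v[j] for w, v in zip(weights, values))
                   for j in range(len(values[0]))]
        outputs.append(blended)
    return outputs


if __name__ == "__main__":
    # Three toy "residues", each described by a 2-number vector.
    residues = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    for row in attend(residues, residues, residues):
        print([round(x, 3) for x in row])
```

In this toy run, the first two vectors are similar, so each gives the other more weight than it gives the third: a loose analogue of grouping puzzle pieces that seem to belong together.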

And it worked. In this year’s CASP, AlphaFold achieved a median GDT score of 92.4. For the most difficult proteins, AlphaFold scored a median of 87, 25 points above the next best predictions. It even excelled at solving the structures of proteins that sit wedged in cell membranes, which are central to many human diseases but notoriously difficult to solve with x-ray crystallography. Venki Ramakrishnan, a structural biologist at the Medical Research Council Laboratory of Molecular Biology, calls the result “an astonishing advance on the problem of protein folding.”

All groups in this year’s competition improved, Moult says. But with AlphaFold, Lupas says, “The game has changed.” The organizers even worried that DeepMind might somehow have cheated. So Lupas set a special challenge: a membrane protein from a species of archaea, an ancient group of microbes. For 10 years, his team had been trying to obtain its x-ray crystal structure. “We couldn’t solve it.”

But AlphaFold had no trouble. It returned a detailed image of a three-part protein with two helical arms in the middle. The model allowed Lupas and his team to make sense of their x-ray data; within half an hour, they had fit their experimental results to AlphaFold’s predicted structure. “It’s almost perfect,” Lupas says. “They couldn’t have cheated on it. I don’t know how they do it.”

As a condition for entering CASP, DeepMind – like all groups – agreed to reveal enough details about its method for other groups to recreate it. This will be a boon for experimenters, who will be able to use structure predictions to make sense of opaque x-ray and cryo-EM data. It could also allow drug designers to work out the structure of each protein in new and dangerous pathogens like SARS-CoV-2, a key step in finding molecules to block them, Moult says.

Still, AlphaFold doesn’t get everything right. In CASP, it faltered on one protein, an amalgam of 52 small repeating segments that distort each other’s positions as they assemble. Jumper says the team now wants to train AlphaFold to solve such structures, as well as those of protein complexes that work together to perform key functions in the cell.

Even though one great challenge has fallen, others will undoubtedly arise. “It’s not the end of something,” Thornton says. “This is the start of many new things.”
