DeepMind Releases Massive Database of 3D Protein Structures



With the advent of cheap genetic sequencing, the world of biology has been inundated with 2D data. Now, artificial intelligence is pushing the field into three dimensions.

On Thursday, Alphabet-owned artificial intelligence company DeepMind announced that it had used its highly accurate deep learning model AlphaFold2 to predict the 3D structures of 350,000 proteins, including almost all proteins expressed in the human body, from their amino acid sequences. These predictions, reported in Nature and made public in the AlphaFold Protein Structure Database, are a powerful tool for unraveling the molecular mechanisms of the human body and applying them to medical innovations.

“This resource that we are making available, starting at about twice as many predictions as there are structures in the Protein Data Bank, is just the start,” said John Jumper, principal investigator for AlphaFold at DeepMind, during a press call. The company intends to continue adding predicted structures to the database.

“When we get to the scale of 100 million predictions that covers all proteins, we really start talking about transformative uses,” he said.

One of these transformations may come from applying the database to drug discovery. In a rare move, DeepMind has chosen to make the database, published in partnership with the European Molecular Biology Laboratory, completely open for any use.

“So we’re actually hoping that drug discovery and pharma will use it,” DeepMind CEO Demis Hassabis said on the call.

DeepMind’s predictions could be of interest to AI-driven pharmaceutical companies looking to refine their models, biotech startups hoping to expand their list of target proteins, and even companies developing new designer enzymes.

“Anytime there’s a breakthrough, I think rising tides lift all boats. And that opens a very exciting era in structure-based drug design,” said Abraham Heifets, CEO of the AI-based drug discovery company Atomwise, which uses its own library of computationally inferred protein structures to find molecules that selectively bind to proteins involved in disease. “Having better information about the shape of a protein is how you design a molecule that fits very well into that protein, to halt or stop this disease process.”

DeepMind had pledged in November to open up its work, after AlphaFold2 took first place in the CASP protein folding prediction competition, in what has been hailed as a solution to the long-standing protein folding problem. But for the next seven months, structural biologists grew frustrated as they waited for the groundbreaking work to be made public. As STAT reported last week, DeepMind rushed to publish its open source code and methods in Nature, just as a group at the University of Washington published its own attempt to replicate AlphaFold’s approach in Science.

With the database adding so many new structural predictions, researchers, drug developers, and basic scientists will have a lot of new material to work with. “We’re going to go through it very quickly to see if there are any proteins that we’re interested in that are suddenly enabled by this new data set,” Heifets said.

Jumper believes the new tool will remove a tough choice that plagues some biologists: If a protein structure isn’t available, they can spend a lot of time and money on physical experiments to determine it (which still might not work), or they can do without and focus on functional studies. “Suddenly, access to structures will increase dramatically,” he told STAT. “I think this will really change the way scientists approach these biological questions.”

Yet, these are not plug-and-play structures: they are predictions, and they come with caveats that scientists will need to heed.

“As a biochemist, I would want to understand whether it is a good model or not. Is the algorithm confident about it or not?” said Frank von Delft, who directs protein crystallography at the Center for Medicines Discovery at the University of Oxford. “I think that will be the key. Can you tell me, ‘Yeah, I kind of succeeded on this one, and this one I had a hard time with, but this one was easy to do’?”

To answer this question, DeepMind incorporated confidence metrics into its predictions to help researchers determine whether they should rely on the structures for their work. “Preparing the predictions was actually only a small part of that work,” said Kathryn Tunyasuvunakool of DeepMind, lead author of the paper, during the call. “Perhaps even more effort has been put into providing both local and global confidence measures.”

Overall, AlphaFold2 predicted 58% of the amino acids in the human proteome (all proteins expressed by the human body) with confidence, and 35.7% with very high confidence. At that highest level, the model could nail not only the backbone of the protein but also the orientation of its side chains. The degree of confidence required will depend on how scientists use the prediction. “If you were to look at, say, the active site of an enzyme, you would want the residues involved to be in that highest confidence range,” said Tunyasuvunakool, “but in reality there is tremendous utility even in the second-highest confidence bracket.”

“It’s a little overwhelming what they can do,” said Arne Elofsson, a bioinformatician at Stockholm University.

The AlphaFold database does not spell the end for experimental biologists, those who painstakingly determine the physical structure of proteins using methods such as X-ray crystallography, cryo-electron microscopy, and nuclear magnetic resonance spectroscopy. For many applications, it will still be necessary to validate the structures offered by these models, Elofsson said.

But as predicted structures become more accepted, the AlphaFold database could change the way structural biology prioritizes its work – and even what it considers its gold standard.

“Normally, in CASP, we assume that the experiment is the gold standard, and if you don’t agree, you’re wrong,” said John Moult, a computational biologist at the University of Maryland who founded the competition. “And with DeepMind, sometimes that’s true, but most of the time it’s not true.” In other words, there is room for error in the physical experiments used to determine the structure of proteins, and with a very precise prediction model, a computer could in some cases do the job better. “So I think there are a lot of things to work out: when is a detail really better computationally than the corresponding experimental result?”

This will be a philosophical question that the field will have to grapple with over time, especially as AlphaFold’s approach continues to develop. DeepMind made huge gains between its first CASP entry in 2018, with AlphaFold1, and AlphaFold2 in 2020. “It’s sort of a 2.1 version, and we expect there will be more improvements over time,” said Hassabis, adding that DeepMind may update the database as more experimental protein structures are solved or as the computer model continues to be developed.

As the database expands, the set of structures that might be applied to drug discovery might change as well. “One thing people don’t really know or think about is that there are 20,000 human genes, but only 4% of them have ever had an FDA-approved drug,” Heifets said. “So we have a lot more protein targets that we could hit than we’ve ever had in medicine.” DeepMind has partnered with the Drugs for Neglected Diseases Initiative to develop approaches for Chagas disease and leishmaniasis.

But there are also uses of the database that are still unknown. “AlphaFold is a paradigm shift in the level of precision biologists can now expect, which will unlock other applications,” Pushmeet Kohli, head of AI for science at DeepMind, told STAT. “That’s why we wanted to make AlphaFold widely accessible, so the community wouldn’t just exploit it for existing applications, like in drug discovery, but also for other applications that they might not even have thought of so far.”


