DeepMind AI handles protein folding, which humbled previous software




Proteins quickly form complicated structures that have proven difficult to predict.

Today, DeepMind announced that it has apparently solved one of biology’s major outstanding problems: how the chain of amino acids in a protein folds into the three-dimensional shape that enables its complex functions. It’s a computational challenge that has resisted the efforts of many very intelligent biologists for decades, despite the application of supercomputer-level hardware to the calculations. DeepMind instead trained its system using 128 specialized processors for a couple of weeks; it now returns potential structures within days.

The limits of the system are not yet clear – DeepMind says it plans to publish a peer-reviewed paper but has so far only made available a blog post and some press releases. But the system clearly performs better than anything that came before it, more than doubling the performance of the best system of just four years ago. While it won’t be useful in all circumstances, the advance likely means that the structures of many proteins can now be predicted from nothing more than the DNA sequence of the gene that encodes them, which would mark a major change for biology.

Between the folds

To make proteins, our cells (and those of all other organisms) chemically bind amino acids to form a chain. It works because each amino acid shares a backbone that can be chemically connected to form a polymer. But each of the 20 amino acids used by life has a distinct set of atoms attached to this backbone. These can be charged or neutral, acidic or basic, etc., and these properties determine how each amino acid interacts with its neighbors and the environment.

The interactions of these amino acids determine the three-dimensional structure that the chain adopts after it is produced. Hydrophobic amino acids end up tucked inside the structure, away from the aqueous environment. Positively and negatively charged amino acids attract each other. Hydrogen bonds drive the formation of regular spirals or parallel sheets. Collectively, these interactions cause what might otherwise be a messy chain to fold into an orderly structure. And this ordered structure in turn defines the behavior of the protein, allowing it to act as a catalyst, to bind to DNA, or to drive muscle contraction.

Determining the order of amino acids in a protein chain is relatively easy – it’s defined by the order of the DNA bases in the gene that codes for the protein. And since we have become very good at whole-genome sequencing, we have an overabundance of gene sequences, and therefore a huge surplus of protein sequences, at our disposal. For many of them, however, we have no idea what the folded protein looks like, which makes it difficult to determine how they function.
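
As a quick illustration of how direct that first step is, here is a minimal sketch (assuming the Biopython package is installed; the DNA string is an arbitrary example, not a real gene) that reads the amino acid order straight out of a coding sequence:

```python
# Minimal sketch: translating a DNA coding sequence into its amino acid chain.
# Assumes Biopython is installed (pip install biopython); the DNA string below is
# an arbitrary example, not a real gene.
from Bio.Seq import Seq

coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
protein = coding_dna.translate(to_stop=True)  # stop translating at the first stop codon
print(protein)  # MAIVMGR -- the linear order of amino acids, but nothing about the 3D fold
```

Getting from that one-dimensional string to the folded shape is the part that has resisted computation.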

Since a protein’s backbone is very flexible, almost any two amino acids in a protein could potentially interact with each other. So determining which ones actually interact in the folded protein, and how that interaction minimizes the free energy of the final configuration, becomes an intractable computational challenge once the number of amino acids gets too large. Essentially, when any amino acid can occupy almost any set of coordinates in 3D space, it becomes very hard to figure out what goes where.
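
A rough back-of-the-envelope sketch (my own illustration, using a deliberately crude assumption of three conformational states per amino acid, in the spirit of Levinthal-style estimates) shows how quickly the search space outruns brute force:

```python
# Back-of-the-envelope look at why brute-force folding is intractable.
# The "3 states per amino acid" figure is a deliberately crude illustrative
# assumption, not a real physical model.

def candidate_pairs(n_residues: int) -> int:
    """Number of amino acid pairs that could, in principle, interact."""
    return n_residues * (n_residues - 1) // 2

def crude_conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Conformations if each amino acid independently picks one of a few states."""
    return states_per_residue ** n_residues

for n in (50, 150, 300):
    print(f"{n} residues: {candidate_pairs(n)} candidate pairs, "
          f"~{crude_conformation_count(n):.1e} crude conformations")
```

Even a modest protein of a few hundred amino acids produces a number of candidate conformations that no amount of hardware can enumerate.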

Despite the difficulties, there has been progress, notably thanks to distributed computing and the gamification of folding. But an ongoing biennial event called the Critical Assessment of Protein Structure Prediction (CASP) has seen fairly uneven progress throughout its existence. And in the absence of an efficient algorithm, people are left with the daunting task of purifying the protein, and then using X-ray diffraction or cryo-electron microscopy to understand the structure of the purified form, efforts that can often take years.

DeepMind enters the fray

DeepMind is an AI company that was acquired by Google in 2014. Since then, it has made a number of splashes, developing systems that have successfully taken on humans at Go, chess, and even StarCraft. In several of its notable successes, the system was trained simply by providing it with the rules of a game and then letting it play against itself.

The system is incredibly powerful, but it wasn’t clear that it would work for protein folding. For one thing, there is no obvious external standard for a “win” – if you get a structure with a very low free energy, that doesn’t guarantee there isn’t something slightly lower out there. There aren’t many rules, either. Yes, amino acids with opposite charges will reduce the free energy if they are side by side. But that won’t happen if it comes at the cost of breaking dozens of hydrogen bonds and pushing hydrophobic amino acids out into the water.

So how do you adapt an AI to work under these conditions? For their new algorithm, called AlphaFold, the DeepMind team treated the protein as a spatial graph, with each amino acid as a node and the connections between nodes mediated by how close the amino acids sit in the folded protein. The AI itself was then trained to determine the configuration and strength of these connections by feeding it the previously determined structures of more than 170,000 proteins obtained from a public database.
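
The “spatial graph” picture is easy to sketch (this is a generic illustration of the representation using random stand-in coordinates, not DeepMind’s code): each amino acid is a node, and an edge connects any two whose positions in the folded structure fall within a distance cutoff.

```python
# Illustrative residue-level "spatial graph": nodes are amino acids, edges connect
# any two whose C-alpha atoms sit within a distance cutoff. The coordinates here
# are random stand-ins, not a real protein structure.
import numpy as np

rng = np.random.default_rng(0)
ca_coords = rng.uniform(0.0, 30.0, size=(100, 3))  # fake C-alpha positions (angstroms)

# All-against-all distance matrix between residues.
dist = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)

# Adjacency: two residues are "in contact" if closer than 8 angstroms (a common cutoff).
contact = (dist < 8.0) & ~np.eye(len(ca_coords), dtype=bool)
print("nodes:", len(ca_coords), "edges:", int(contact.sum()) // 2)
```

Predicting the structure then amounts to predicting this pattern of contacts and distances from the sequence alone.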

When given a new protein, AlphaFold searches for proteins with related sequences and aligns the related portions of those sequences. It also looks for proteins with known structures that have regions of similarity. Typically, these approaches are great at optimizing local structural features but not so good at predicting overall protein structure – stitching together a bunch of highly optimized chunks doesn’t necessarily produce an optimal whole. And this is where an attention-based deep learning portion of the algorithm was used to make sure the overall structure was consistent.
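
The core operation behind “attention-based” networks can be sketched in a few lines (a generic scaled dot-product self-attention, shown only to convey the idea; AlphaFold’s actual architecture is far more elaborate): every position’s representation is updated using a weighted combination of every other position’s, which is what lets the model reason about the structure globally rather than chunk by chunk.

```python
# Generic scaled dot-product self-attention: each residue's feature vector is
# refreshed as a weighted blend of all residues' feature vectors, so distant parts
# of the chain can influence every position. Shapes and data are arbitrary; this
# illustrates the mechanism, not AlphaFold itself.
import numpy as np

def self_attention(x):
    scores = x @ x.T / np.sqrt(x.shape[-1])                 # pairwise relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax across residues
    return weights @ x                                      # blended representation per residue

rng = np.random.default_rng(1)
features = rng.normal(size=(100, 64))   # 100 residues, 64-dimensional features each
updated = self_attention(features)
print(updated.shape)                    # (100, 64)
```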

Clear success, but with limits

For this year’s CASP, AlphaFold and the algorithms of the other entrants were unleashed on a series of proteins whose structures either had not yet been solved (and were solved as the challenge progressed) or had been solved but not yet published. So the algorithms’ makers had no way to prime their systems with real-world information, and the algorithms’ output could be compared to the best real-world data as part of the challenge.

AlphaFold did very well – much better, in fact, than any other entry. For about two-thirds of the proteins it predicted a structure for, the prediction fell within the experimental error you would get if you tried to replicate the structural studies in a lab. Overall, on an accuracy scale of zero to 100, it averaged a score of 92 – again, the kind of variation you would see if you determined the same structure twice under two different conditions.
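
That zero-to-100 scale corresponds to CASP’s “global distance test” (GDT) family of scores; a simplified sketch of that kind of metric (which skips the optimal superposition step the real metric performs, and uses random stand-in coordinates) gives a sense of what a score like 92 is measuring:

```python
# Simplified GDT-style accuracy score (0-100): for several distance cutoffs, take
# the fraction of residues whose predicted position falls within that cutoff of the
# experimental position, then average the fractions. The real CASP metric also
# searches for the best superposition of the two structures, which is skipped here.
import numpy as np

def gdt_like(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    errors = np.linalg.norm(predicted - experimental, axis=-1)  # per-residue distance error
    return 100.0 * float(np.mean([(errors <= c).mean() for c in cutoffs]))

rng = np.random.default_rng(2)
truth = rng.uniform(0.0, 30.0, size=(120, 3))               # stand-in experimental coordinates
guess = truth + rng.normal(scale=0.5, size=truth.shape)     # a prediction with ~0.5 angstrom error
print(round(gdt_like(guess, truth), 1))                     # high score: most residues within the cutoffs
```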

By any reasonable standard, the computational challenge of determining the structure of a protein has been solved.

Unfortunately, there are a lot of unreasonable proteins. Some get stuck in the membrane immediately; others quickly pick up chemical modifications. Still others require extensive interactions with specialized enzymes that burn energy in order to force other proteins to refold. In all likelihood, AlphaFold won’t be able to handle all of these edge cases, and without an academic paper describing the system, it will take some time – and some real-world use – to understand its limitations. This isn’t to take away from an incredible achievement, just to warn against unreasonable expectations.

The key question now is how quickly the system will be made available to the biological research community, so that its limits can be defined and we can begin to use it in cases where it is likely to work well and to have significant value, such as determining the structures of pathogen proteins or of the mutated forms found in cancer cells.
