A new deep learning approach predicts the structure of proteins from the amino acid sequence




Selenocysteine amino acid, 3D ball model. Credit: YassineMrabet / CC BY 3.0 / Wikipedia

Nearly all of the fundamental biological processes necessary for life are carried out by proteins. They create and maintain the shapes of cells and tissues; make up the enzymes that catalyze life-sustaining chemical reactions; act as molecular factories, transporters and motors; serve as both signal and receiver for cellular communications; and much more.

Composed of long chains of amino acids, proteins perform these myriad tasks by folding into precise 3D structures that govern how they interact with other molecules. Because a protein's shape determines its function and the extent of its dysfunction in disease, efforts to illuminate protein structures are central to all of molecular biology and, in particular, to therapeutic science and the development of lifesaving and life-altering medicines.

In recent years, computational methods have made significant strides in predicting how proteins fold based on knowledge of their amino acid sequence. If fully realized, these methods could transform virtually every facet of biomedical research. Current approaches, however, are limited in the scale and scope of the proteins whose structures they can determine.

Now, a scientist at Harvard Medical School has used a form of artificial intelligence known as deep learning to effectively predict the 3D structure of any protein based on its amino acid sequence.

Reporting online in Cell Systems on April 17, systems biologist Mohammed AlQuraishi details a new approach for computationally determining protein structure, achieving accuracy comparable to current state-of-the-art methods but at speeds up to a million times faster.

"Protein folding has been one of the most important problems for biochemists over the last 50 years, and this approach represents a fundamentally new way of tackling that challenge," said AlQuraishi, instructor in systems biology in the Blavatnik Institute at HMS and a fellow in the Laboratory of Systems Pharmacology. "We now have a whole new vantage point from which to explore protein folding, and I think we've just begun to scratch the surface."

Easy to state

Although highly successful, processes that use physical tools to identify protein structures are expensive and time consuming, even with modern techniques such as cryo-electron microscopy. As a result, the vast majority of protein structures, and the effects of disease-causing mutations on these structures, are still largely unknown.

Computational methods that calculate how proteins fold have the potential to dramatically reduce the cost and time needed to determine structure. But the problem is difficult and remains unsolved after nearly four decades of intense effort.

Proteins are built from a library of 20 different amino acids. These act like letters in an alphabet, combining into words, sentences and paragraphs to produce an astronomical number of possible texts. Unlike letters, however, amino acids are physical objects positioned in 3D space. Often, sections of a protein are physically close to each other but separated by large distances in terms of sequence, because the amino acid chain forms loops, spirals, sheets and twists.
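The alphabet analogy can be made concrete with a little arithmetic. The sketch below simply counts how many distinct chains of a given length can be written with 20 letters; the function names are illustrative and are not taken from the published work.

```python
import math

NUM_AMINO_ACIDS = 20  # size of the amino acid "alphabet" described in the article


def count_sequences(length: int) -> int:
    """Number of distinct chains of the given length: 20 ** length."""
    return NUM_AMINO_ACIDS ** length


def orders_of_magnitude(length: int) -> float:
    """log10 of the sequence count, i.e. the power of ten it corresponds to."""
    return length * math.log10(NUM_AMINO_ACIDS)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"length {n}: about 10^{orders_of_magnitude(n):.0f} possible sequences")
```

Even a short chain of 100 amino acids already allows roughly 10^130 different sequences, which is why exhaustively enumerating them is not an option.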

"The problem is that the problem is easy enough to state: take a sequence and determine the form," AlQuraishi said. "A protein starts out as an unstructured chain that has to take on a three-dimensional shape, and the sets of possible forms in which a chain can bend are huge.Many proteins contain thousands of amino acids, and their complexity exceeds quickly the capacity of human intuition or even the most powerful computers ".

Difficult to solve

To address this challenge, scientists leverage the fact that amino acids interact with one another based on the laws of physics, seeking energetically favorable states much like an object rolling downhill to settle at the bottom of a valley.
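The downhill picture corresponds to a simple numerical idea: keep taking small steps against the slope of an energy function until the bottom is reached. The sketch below uses a made-up one-dimensional energy with a single valley; it is only an illustration of the principle, not a real molecular force field, and every name and value in it is an assumption.

```python
def energy(x: float) -> float:
    """Toy one-dimensional 'energy landscape' with a single valley at x = 2."""
    return (x - 2.0) ** 2 + 1.0


def energy_slope(x: float) -> float:
    """Derivative of the toy energy with respect to x."""
    return 2.0 * (x - 2.0)


def roll_downhill(x: float, step: float = 0.1, iterations: int = 200) -> float:
    """Repeatedly take a small step against the slope (plain gradient descent)."""
    for _ in range(iterations):
        x -= step * energy_slope(x)
    return x


if __name__ == "__main__":
    start = -5.0
    end = roll_downhill(start)
    print(f"started at x = {start}, settled near x = {end:.4f}")
```

A real protein has thousands of coordinates and a rugged landscape with many valleys, which is what makes the physical simulation so costly.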

The most advanced algorithms calculate protein structure by running on supercomputers (or on distributed computing power, in the case of projects such as Rosetta@home and Folding@home) to simulate the complex physics of amino acid interactions through brute force. To reduce the massive computational requirements, these projects rely on mapping new sequences onto predefined templates, which are protein structures previously determined through experiments.

Other projects, such as Google's AlphaFold, have generated considerable recent excitement by using advances in artificial intelligence to predict a protein's structure. To do so, these approaches analyze enormous volumes of genomic data, which contain the blueprint for protein sequences. They look for sequences across many species that have likely evolved together, using such sequences as indicators of close physical proximity to guide structure assembly.

These AI approaches, however, do not predict structures based solely on a protein's amino acid sequence. Thus, they have limited efficacy for proteins for which there is no prior knowledge, such as evolutionarily unique proteins or novel proteins designed by humans.

Training deeply

To develop a new approach, AlQuraishi applied so-called end-to-end differentiable deep learning. This branch of artificial intelligence has dramatically reduced the computational power and time needed to solve problems such as image and speech recognition, enabling applications such as Apple's Siri and Google Translate.

Essentially, differentiable learning involves a single, enormous mathematical function (a much more sophisticated version of a high school calculus equation) arranged as a neural network, with each component of the network feeding information forward and backward.

This function can tune and adjust itself, over and over again, at unimaginable levels of complexity, in order to "learn" precisely how a protein sequence relates mathematically to its structure.

AlQuraishi developed a deep learning model, called a recurrent geometric network, that focuses on key features of protein folding. But before it can make new predictions, it must be trained using previously determined sequences and structures.

For each amino acid, the model predicts the most likely angle of the chemical bonds that connect the amino acid with its neighbors. It also predicts the angle of rotation around these bonds, which affects how any local section of a protein is geometrically related to the entire structure.
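To see how a list of angles can turn into a 3D shape, consider the standard geometric conversion from internal coordinates (bond length, bond angle, rotation angle) to Cartesian positions. The sketch below places each new atom relative to the three atoms before it. It is a generic illustration and not the author's code; the function names, the fixed bond length of 1.5 and the seed positions are all assumptions chosen for the example.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _unit(u):
    length = math.sqrt(u[0] ** 2 + u[1] ** 2 + u[2] ** 2)
    return (u[0] / length, u[1] / length, u[2] / length)


def place_next_atom(a, b, c, bond_length, bond_angle, torsion):
    """Position of a new atom d bonded to c, given the previous atoms a, b, c.

    bond_angle is the b-c-d angle and torsion is the rotation around the
    b-c bond (the a-b-c-d dihedral), both in radians. The atoms a, b, c
    must not lie on one straight line, or the local frame is undefined.
    """
    bc = _unit(_sub(c, b))                # axis along the last bond
    n = _unit(_cross(_sub(b, a), bc))     # normal to the a-b-c plane
    m = _cross(n, bc)                     # completes the local frame
    x = -bond_length * math.cos(bond_angle)
    y = bond_length * math.sin(bond_angle) * math.cos(torsion)
    z = bond_length * math.sin(bond_angle) * math.sin(torsion)
    return (c[0] + x * bc[0] + y * m[0] + z * n[0],
            c[1] + x * bc[1] + y * m[1] + z * n[1],
            c[2] + x * bc[2] + y * m[2] + z * n[2])


def build_chain(seed_atoms, angle_pairs, bond_length=1.5):
    """Extend three seed atoms by one atom per (bond_angle, torsion) pair."""
    atoms = list(seed_atoms)
    for bond_angle, torsion in angle_pairs:
        atoms.append(place_next_atom(atoms[-3], atoms[-2], atoms[-1],
                                     bond_length, bond_angle, torsion))
    return atoms


if __name__ == "__main__":
    seed = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
    for atom in build_chain(seed, [(2.0, 1.0), (1.9, -2.5)]):
        print(tuple(round(v, 3) for v in atom))
```

Because every atom is placed relative to its predecessors, a small change in one rotation angle early in the chain swings everything that follows, which is how local angles come to determine the global structure.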

This is done repeatedly, with each calculation informed and refined by the relative positions of every other amino acid. Once the entire structure is complete, the model checks the accuracy of its prediction by comparing it with the "ground truth" structure of the protein.
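One common way to compare a predicted structure with a known one is to compare all pairwise distances between atoms, which avoids having to rotate or shift the two structures onto each other first. The sketch below computes such a distance-based root-mean-square deviation. Treating this as the comparison used in training is an assumption for illustration, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Distances between every pair of points, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(coords, 2)]


def drmsd(predicted, reference):
    """Root-mean-square difference between the two sets of pairwise distances."""
    if len(predicted) != len(reference):
        raise ValueError("structures must contain the same number of atoms")
    d_pred = pairwise_distances(predicted)
    d_ref = pairwise_distances(reference)
    if not d_pred:
        return 0.0  # fewer than two atoms: nothing to compare
    squared = sum((x - y) ** 2 for x, y in zip(d_pred, d_ref))
    return math.sqrt(squared / len(d_pred))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
    shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in reference]
    print(drmsd(shifted, reference))  # moving the whole structure leaves the score unchanged
```

A score of zero means every inter-atom distance matches; larger values, measured in the same units as the coordinates, mean the predicted shape deviates more from the reference.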

This entire process is repeated for thousands of known proteins, with the model learning and improving its accuracy with every iteration.

New view

Once his model was trained, AlQuraishi tested its predictive power. He compared its performance against other methods from recent years of the Critical Assessment of protein Structure Prediction (CASP), an experiment, held every two years, that tests computational methods on their ability to make predictions using protein structures that have been determined but not publicly released.

He found that the new model outperformed all other methods at predicting protein structures for which there are no pre-existing templates, including methods that use co-evolutionary data. It also outperformed all but the best methods when pre-existing templates were available to make predictions.

Although these gains in accuracy are relatively small, AlQuraishi notes that any improvement at the top end of these tests is difficult to achieve. And because this method represents an entirely new approach to protein folding, it can complement existing methods, both computational and physical, to determine a much wider range of structures than was previously possible.

Remarkably, the new model makes its predictions around six to seven orders of magnitude faster than existing computational methods. Training the model can take months, but once trained, it can make a prediction in milliseconds, compared with the hours to days that other approaches require. This dramatic improvement is due in part to the single mathematical function on which the model is based, which requires only a few thousand lines of computer code to run instead of millions.
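To put milliseconds versus hours or days in perspective, the ratio can be worked out directly. The timings below are round illustrative values (one millisecond, one hour, one day), not measurements from the study.

```python
import math


def speedup_orders_of_magnitude(slow_seconds: float, fast_seconds: float) -> float:
    """log10 of how many times faster the fast method is."""
    return math.log10(slow_seconds / fast_seconds)


if __name__ == "__main__":
    one_millisecond = 0.001
    one_hour = 3600.0
    one_day = 24 * 3600.0
    print(f"hour vs millisecond: {speedup_orders_of_magnitude(one_hour, one_millisecond):.1f} orders of magnitude")
    print(f"day vs millisecond:  {speedup_orders_of_magnitude(one_day, one_millisecond):.1f} orders of magnitude")
```

An hour is 3.6 million milliseconds, so a method that answers in about a millisecond is already millions of times faster than one that needs an hour.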

The speed of the model's predictions enables new applications that were previously slow or difficult to achieve, AlQuraishi said, such as predicting how proteins change shape as they interact with other molecules.

"Deep learning approaches, not just mine, will continue to grow in predictive power and popularity, because they represent a minimal, simple paradigm that can integrate new ideas more easily than current complex models," he added.

The new model is not immediately ready for use in, for example, drug discovery or design, AlQuraishi said, because its accuracy is currently around 6 angstroms, still some distance from the 1 to 2 angstroms needed to resolve the complete atomic structure of a protein. But there are many opportunities to optimize the approach, he said, including further integration of rules drawn from chemistry and physics.

"Accurately and efficiently predicting protein folding has been a holy grail for the field, and it is my hope and expectation that this approach, combined with all the other remarkable methods that have been developed, will be able to do so in the near future," AlQuraishi said. "We may solve this problem soon, and I think no one would have said that five years ago. It's very exciting and also somewhat shocking at the same time."

To help others participate in method development, AlQuraishi has made his software and results freely available via the GitHub software sharing platform.

"One noteworthy feature of AlQuraishi's work is that a single researcher, embedded in the rich research ecosystem of Harvard Medical School and the Boston biomedical community, can compete with companies such as Google in one of the most advanced areas of computer science," said Peter Sorger, the HMS Otto Krayer Professor of Systems Pharmacology in the Blavatnik Institute at HMS, director of the Laboratory of Systems Pharmacology at HMS and AlQuraishi's academic mentor.

"It is unwise to underestimate the disruptive impact of brilliant fellows such as AlQuraishi working with open-source software in the public domain," Sorger said.




More information:
Cell Systems (2019). DOI: 10.1016/j.cels.2019.03.006

Provided by
Harvard Medical School


Citation:
A new deep learning approach predicts the structure of proteins from the amino acid sequence (April 17, 2019)
retrieved on April 17, 2019
from https://phys.org/news/2019-04-deep-learning-approach-protein-amino-acid.html

This document is subject to copyright. Apart from any fair use for study or private research purposes, no
part may be reproduced without written permission. Content is provided for information only.
