Highly accurate prediction of protein structure for the human proteome




Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we dramatically expand the structural coverage by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) has very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide case studies illustrating how high-quality predictions can be used to generate biological hypotheses. We make our predictions freely available to the community via a public database (hosted by the European Bioinformatics Institute at https://alphafold.ebi.ac.uk/). We anticipate that routine large-scale, high-accuracy structure prediction will become an important tool, allowing new questions to be addressed from a structural perspective.
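The coverage figures quoted above are fractions of residues pooled across proteins, so long proteins count for more than short ones. As a minimal sketch of that arithmetic, assuming each residue carries a confidence score on a 0 to 100 scale and that "confident" and "very high confidence" correspond to cutoffs of 70 and 90 (the cutoffs, the function name and the toy scores are illustrative assumptions, not values taken from the text):

```python
from typing import Sequence

# Assumed cutoffs on a 0-100 per-residue confidence score.
# These are illustrative; the text above does not state them.
CONFIDENT_CUTOFF = 70.0
VERY_HIGH_CUTOFF = 90.0


def coverage_fraction(scores_by_protein: Sequence[Sequence[float]], cutoff: float) -> float:
    """Return the fraction of all residues, pooled across proteins, scoring above cutoff."""
    total = sum(len(scores) for scores in scores_by_protein)
    if total == 0:
        raise ValueError("no residues supplied")
    above = sum(1 for scores in scores_by_protein for s in scores if s > cutoff)
    return above / total


# Made-up scores for two hypothetical proteins of five residues each.
toy_scores = [
    [95.0, 92.5, 88.0, 71.0, 45.0],
    [91.0, 65.0, 30.0, 75.5, 99.0],
]

confident = coverage_fraction(toy_scores, CONFIDENT_CUTOFF)
very_high = coverage_fraction(toy_scores, VERY_HIGH_CUTOFF)
print(f"confident: {confident:.0%}, very high confidence: {very_high:.0%}")
```

Because the very-high-confidence residues are a subset of the confident ones, the second fraction can never exceed the first, which matches the way the two percentages are reported.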

