By Beatrix Watson-Smyth
This week it was announced that DeepMind, an AI company in the UK, has been able to predict the folding of a protein from just its primary structure. This has been one of the biggest challenges in biology and this breakthrough ‘will change everything’ according to the headline of an article in Nature. The program is known as AlphaFold 2 and predicts the 3D structure of any protein.
Proteins are incredibly useful molecules, and each one relies heavily on its structure, and on how that structure interacts with itself and with other molecules, for its function. For example, an enzyme's structure gives it the ability to catalyse a reaction. Keratin, the protein in hair, contains a large percentage of cysteine, an amino acid that can form S-S (disulphide) bonds with other cysteine residues, which allows hair to be curly.
Determining a protein's structure in the lab is difficult, time-consuming and expensive. Several methods are used, such as Nuclear Magnetic Resonance (NMR) spectroscopy, X-ray diffraction and cryo-electron microscopy (cryo-EM). Some structures still cannot be solved by these methods, and for many others there simply isn't the time or money to investigate them. Since the Protein Data Bank (PDB) was created in 1971, only around 120,000 protein structures have been added, whereas there are around 200 million known proteins, with another 30 million found every year.
DeepMind’s AI program requires only the primary sequence, that is, the order of the amino acids that make up the peptide chains, to predict a 3D structure. This information is much easier to gather. From it, the program predicts how the chain will fold, which amino acids interact where, where the hydrophilic groups will end up, and so on.
To give you an idea of how many different structures can be formed from a protein’s primary structure: for a peptide chain 100 amino acids long, it would take 10^23 years to test every possible conformation (shape), even assuming each test takes just 0.1 picoseconds, or 0.0000000000001 seconds. This is far longer than the universe has existed. This is known as Levinthal’s Paradox, and it is evidence that proteins don’t just fold randomly until they find their preferred shape. It would simply take too long.
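The arithmetic behind this estimate can be sketched in a few lines of Python. This is only an illustration built on the figures quoted above (100 amino acids, 0.1 picoseconds per test, 10^23 years). The function names are made up for this example, and the age of the universe (about 13.8 billion years) is an outside figure added for comparison. The article does not say how many shapes each amino acid is assumed to take, so the sketch works backwards to find what number the quoted figures imply.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # length of a Julian year in seconds
AGE_OF_UNIVERSE_YEARS = 1.38e10         # approximate, added for comparison
SECONDS_PER_TEST = 1e-13                # 0.1 picoseconds, as quoted in the text


def search_time_years(n_conformations, seconds_per_test=SECONDS_PER_TEST):
    """Years needed to test every conformation one after another."""
    return n_conformations * seconds_per_test / SECONDS_PER_YEAR


def implied_conformations(years, seconds_per_test=SECONDS_PER_TEST):
    """Number of conformations that would take `years` to test one by one."""
    return years * SECONDS_PER_YEAR / seconds_per_test


# Work backwards from the quoted 10^23 years for a 100-residue chain.
total_conformations = implied_conformations(1e23)
per_residue = total_conformations ** (1 / 100)   # shapes per amino acid
universe_lifetimes = 1e23 / AGE_OF_UNIVERSE_YEARS

print(f"Conformations to test: {total_conformations:.2e}")
print(f"Implied shapes per amino acid: {per_residue:.2f}")
print(f"Search time in universe lifetimes: {universe_lifetimes:.1e}")
```

Under these assumptions the quoted figures correspond to fewer than three possible shapes per amino acid, which shows how quickly the number of combinations grows: even a very small number of options per residue makes a random search hopeless.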
Every two years there is a competition, CASP (Critical Assessment of protein Structure Prediction), to judge the performance of these computer programs. The programs must predict structures which have not yet been released on data-sharing sites. These unseen structures are determined primarily through cryo-EM in the lab, then compared to the programs' predictions. Two examples of AlphaFold's predictions vs the lab data are seen in the picture above. AlphaFold scored over 90 out of 100 on two-thirds of all proteins this year. Although this isn’t perfect, it is enough for Mohammed AlQuraishi, a computational biologist at Columbia University in New York City and a CASP participant, to state “I suspect many will leave the field [of protein-structure-prediction] as the core problem has arguably been solved”.
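To make the idea of "scoring" a prediction against lab data more concrete, here is a minimal sketch of a score in the style of the Global Distance Test (GDT), the 0 to 100 measure used at CASP. It is a simplification, not CASP's actual procedure: it assumes the predicted and experimental structures have already been laid on top of each other, uses one (x, y, z) coordinate per amino acid, and the function name `gdt_ts` and the example coordinates are invented for illustration. The real calculation also searches for the best way to superimpose the two structures.

```python
import math


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score (0 to 100) for two pre-aligned structures.

    Each structure is a list of (x, y, z) positions in angstroms, one per
    amino acid. For each distance cutoff, we find the fraction of amino
    acids whose predicted position lies within that distance of the
    experimental position, then average those fractions.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and of equal length")

    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Made-up four-residue example: the prediction drifts further from the
# lab structure with each residue (0.5, 1.5, 3.0 and 10.0 angstroms off).
experimental = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
predicted = [(0, 0, 0.5), (1, 0, 1.5), (2, 0, 3.0), (3, 0, 10.0)]

print(gdt_ts(predicted, experimental))
print(gdt_ts(experimental, experimental))  # a perfect match scores 100
```

A score above 90 therefore means that nearly every amino acid in the prediction sits within a very small distance of where the experiment found it.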
In the future, if AlphaFold can also predict how proteins change shape when interacting with other proteins, labs could run combinations on a computer before testing them in a lab, removing months of work and dramatically cutting costs. In the example of COVID-19, AlphaFold could not only predict the shape of the virus's spike protein or of the binding proteins on the cell surface, but could potentially predict which proteins can bind to these structures. This would tell us whether a potential drug would be effective at preventing the virus from multiplying in the body, because if a protein binds to these structures, the virus cannot bind to body cells and infect them. The program could test hundreds of possibilities to determine which would have a real therapeutic effect, avoiding many expensive, time-consuming experiments with proteins that wouldn’t bind.
Bibliography
Callaway, E. (2020). ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures. Nature.com. Retrieved 3 December 2020, from https://www.nature.com/articles/d41586-020-03348-4.
Current Release Statistics < Uniprot < EMBL-EBI. Ebi.ac.uk. (2020). Retrieved 4 December 2020, from https://www.ebi.ac.uk/uniprot/TrEMBLstats.
AlphaFold. Deepmind. (2020). Retrieved 4 December 2020, from https://deepmind.com/research/case-studies/alphafold.
Le Page, M. (2020). DeepMind's AI biologist can decipher secrets of the machinery of life. New Scientist. Retrieved 3 December 2020, from https://institutions.newscientist.com/article/2261156-deepminds-ai-biologist-can-decipher-secrets-of-the-machinery-of-life/?utm_campaign=onesignal&utm_medium=alert&utm_source=editorial
Atkins, P. (2003). Atkins' molecules (2nd ed., pp. 104-107). Cambridge University Press.
Helliwell, J. R. (2017). New developments in crystallography: exploring its technology, methods and scope in the molecular biosciences. Bioscience Reports, 37(4), BSR20170204. doi:10.1042/BSR20170204
Image 1
Callaway, E. (2020). ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures. Nature.com. Retrieved 3 December 2020, from https://www.nature.com/articles/d41586-020-03348-4.
Image 2
AlphaFold: a solution to a 50-year-old grand challenge in biology. Deepmind. (2020). Retrieved 4 December 2020, from https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology.