Encoded in DNA, a protein can be pictured as a chain of hundreds of individual molecules, called amino acids, linked together.
Depending on its particular combination of amino acids, a protein folds in a particular way, resulting in a functional 3D shape. The shape determines the function, and with 20 different amino acids available, the number of possible combinations is astronomically large.
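To give a sense of that scale, here is a minimal sketch in Python. It assumes that every position in the chain can hold any of the 20 amino acids, and the function name `count_sequences` is purely illustrative. It counts sequences only and says nothing about how many of them would fold into a working protein.

```python
# Illustrative sketch: count the distinct amino acid sequences of a given length,
# assuming each position can independently hold any of the 20 amino acids.
NUM_AMINO_ACIDS = 20


def count_sequences(length: int) -> int:
    """Return the number of possible sequences with `length` amino acids."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return NUM_AMINO_ACIDS ** length


if __name__ == "__main__":
    for length in (10, 100, 300):
        total = count_sequences(length)
        # The digit count minus one gives the order of magnitude of the total.
        magnitude = len(str(total)) - 1
        print(f"length {length}: roughly 10^{magnitude} possible sequences")
```

For a chain of 300 amino acids, the result is a number with 391 digits.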
Current genomic technologies make it easy to determine the amino acid sequence of a protein, but determining its 3D shape calls for expensive and time-consuming experimental procedures, which are not always successful.
For decades, researchers have tried to understand what makes a protein fold into a particular shape, so that the shape can be predicted from the amino acid sequence alone.
About DeepMind’s AlphaFold 2
AlphaFold 2 is a neural network developed by DeepMind, a Google-owned artificial intelligence (AI) company, specifically trained to predict the 3D structure of a protein from its amino acid sequence. Its accuracy impressed the scientific community a few years ago with its success at CASP, the biennial international contest on protein structure modeling, and its team has since presented predicted structures for the full proteome of 11 different species, including humans.
To put all the data released by AlphaFold 2 into context (more than 300,000 models and growing), a community of independent researchers, including Eduard Porta, head of the Cancer Immunogenetics group at the Josep Carreras Leukaemia Research Institute, compared the newly released structures with those already available. They concluded that AlphaFold 2 contributed an extra 25% of high-quality protein structures for any given species. The analysis has been published in the journal Nature Structural & Molecular Biology.
The role that many proteins play in diseases such as cancer is already known, but the lack of detailed knowledge of how they work at the molecular level prevents the development of specific strategies against them. Structural information will help scientists understand these proteins much better, identify the other molecules they may interact with inside the cell, and design new drugs capable of interfering with their function when they are altered.
There are limitations to the capabilities of AlphaFold 2. The community team found that the algorithm has problems when trying to recreate protein complexes. Most proteins work together with other proteins to perform a biological function, so predicting how different proteins stick together would be highly desirable. Another limitation identified is its inability to show the structure of mutated proteins, those with altered amino acids in their sequence. Mutations often result in abnormal protein function and are the cause of many diseases, such as cancer.
Despite the limitations, the team recognizes the contribution of AlphaFold 2 to the community, which will greatly impact basic and biomedical research in the coming years. That impact comes not only from its direct contribution (thousands of new reliable 3D protein models), but also from starting a new era of AI-based computational tools able to yield results that no one can anticipate.
Meta enters the scene
Recently, a team at Meta (formerly Facebook) used a modified version of its natural language predictor to “autocomplete” proteins. The AI tool, called ESMFold, seems to be less accurate than its Google counterpart, but it is 60 times faster and can overcome some of the limitations identified in AlphaFold 2, such as handling mutated sequences.
All in all, as the authors of the publication admit, “the application of AlphaFold2 [and the coming tools] will have a transformative impact in life sciences.”