What AlphaFold Means for Structural Bioinformatics

WHAT IS STRUCTURAL BIOINFORMATICS?

Structural bioinformatics is concerned with the analysis, prediction, and visualization of the 3D structure of biological macromolecules: large biomolecules with high molecular weights (typically thousands of daltons or more) and complex structures. Structural biology focuses on the molecular structure of proteins, nucleic acids, and membranes. These biomacromolecules perform most of the functions in cells, so it’s important to understand the structures that enable them to do so.

INTRODUCING THE PROTEIN FOLDING PROBLEM

INTRODUCING CASP

Critical Assessment of protein Structure Prediction (CASP) is a biennial competition that presents predictor groups from industry and academia with about 100 protein sequences whose structures have been determined experimentally but not yet publicly released. Entrants (like AlphaFold) compute a structure for each sequence, and the predictions are then compared with the experimentally determined structures. The first CASP competition took place in 1994 and the most recent one was in 2020. Predictions are assessed in several categories, including:

  • The Topology category is concerned with the ability of methods to predict contacts and inter-residue distances.
  • The Refinement category is concerned with how successfully methods improve the accuracy of initially submitted models.
  • The Assembly category is concerned with how well methods can predict interactions between domains, subunits, and proteins.
  • The Accuracy Estimation category is concerned with the ability to provide useful estimates of the overall accuracy of models at the domain and residue level.
  • The Data Assisted category is concerned with how much the accuracy of models improves when sparse experimental data is added.
  • The Biological Relevance category is concerned with how well the models answer biological questions.
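
To make the "contacts and inter-residue distances" of the Topology category concrete, here is a minimal Python sketch that turns a set of residue coordinates into a distance matrix and a contact map. The 8 angstrom cutoff, the minimum sequence separation, the function names, and the toy coordinates are all assumptions chosen for illustration; this is not CASP's official evaluation code.

```python
import math

def pairwise_distances(coords):
    """Return the matrix of Euclidean distances between all residue positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_map(coords, threshold=8.0, min_separation=3):
    """Mark residue pairs that are closer than `threshold` (angstroms)
    and at least `min_separation` positions apart in the sequence."""
    dist = pairwise_distances(coords)
    n = len(coords)
    return [
        [abs(i - j) >= min_separation and dist[i][j] < threshold for j in range(n)]
        for i in range(n)
    ]

# Hypothetical coordinates (in angstroms) for a 5-residue chain that bends back on itself.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]
toy_contacts = contact_map(toy_coords)

# List the contacting pairs (i < j) found in the toy chain.
for i in range(len(toy_coords)):
    for j in range(i + 1, len(toy_coords)):
        if toy_contacts[i][j]:
            print(f"residues {i} and {j} are in contact")
```

In this toy chain, residues that are neighbours in the sequence are ignored, so the only contacts reported come from the chain folding back on itself. A contact predictor tries to guess this map from the sequence alone.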

WHAT IS ALPHAFOLD?

Like most modern prediction algorithms, AlphaFold relies on multiple sequence alignment (MSA) for feature engineering. In short, an MSA lines up related amino acid sequences to show how similar they are. The input sequence often has an evolutionary relationship with sequences already in databases such as the Protein Data Bank, having descended from a common ancestor. It’s then possible to infer sequence homology (similarity due to shared ancestry) and conduct phylogenetic analysis (the study of the evolutionary development and diversification of species). MSA is also used to assess sequence conservation (sequences that stay similar in nucleic acids and proteins across species), which reflects secondary and tertiary structure. The idea is that if two amino acids are in close contact in 3D space, a mutation in one tends to be followed by a compensating mutation in the other. Amino acids that are far apart in the folded structure generally don’t have much of an effect on each other, so pairs of positions that mutate together in the MSA provide valuable hints about the shape of a protein. The input data for AlphaFold and AlphaFold2 is this MSA-derived information about pairs of amino acids that are likely to end up close together in folded structures.
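
The co-evolution idea can be illustrated with a classic, much simpler statistic than anything AlphaFold uses: the mutual information between two columns of an alignment. The sketch below is only an illustration under stated assumptions. The toy alignment and the function names are made up, and real pipelines additionally correct for phylogenetic bias and alignment gaps.

```python
import math
from collections import Counter

def column(msa, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in msa]

def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.
    High values mean the two positions tend to mutate together."""
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi

# Hypothetical toy alignment: columns 1 and 3 always change together
# (K pairs with E, R pairs with D), while columns 0 and 2 never change.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
]

coupled = mutual_information(toy_msa, 1, 3)      # positions that co-vary
uncoupled = mutual_information(toy_msa, 0, 1)    # one position is fully conserved
print(f"MI(1, 3) = {coupled:.2f} bits, MI(0, 1) = {uncoupled:.2f} bits")
```

Columns that co-vary score high, while a fully conserved column carries no co-variation signal at all. That is why a deep, diverse alignment matters: without observed mutations there is nothing to correlate.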

HOW DOES ALPHAFOLD WORK?

[Figure: The architecture of the AlphaFold system]

HOW DOES ALPHAFOLD2 WORK?

Instead of the two-step approach used by the original AlphaFold, DeepMind took an end-to-end approach with AlphaFold2, taking the MSA as input and producing the full structure as output. For CASP in 2020, the program was built on an attention-based neural network, a relatively new deep learning architecture called a Transformer. Also popular in natural language processing and computer vision, attention mechanisms enable a neural network to focus on the most relevant subset of its inputs or features. The Transformer builds up an interpretation of the folded protein’s structure, iteratively refining it using the MSA and a representation of amino acid residue pairs. From this, the system predicts the protein’s 3D structure with high accuracy.

[Figure: The attention-based neural network approach can be compared to assembling a puzzle: local chunks are pieced together first, then fitted together to form a whole.]
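
For readers who have not met attention before, the following is a minimal sketch of generic scaled dot-product attention, the basic building block of Transformers, written in plain Python. It is not AlphaFold2's actual network: the function names and the tiny two-dimensional "residue embeddings" are hypothetical and exist only to show how each position weights the others.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention.
    Each query is compared with every key; the resulting weights
    decide how much of each value flows into that query's output."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical embeddings for three residues: the first two are similar,
# the third points in a different direction.
residues = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = attention(residues, residues, residues)

for i, row in enumerate(weights):
    print(f"residue {i} attends with weights {[round(w, 2) for w in row]}")
```

Here the first residue places more weight on the second residue, which has a similar embedding, than on the third. In a trained network the queries, keys, and values come from learned projections, so the model itself learns which positions are worth attending to.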

WHAT ABOUT STRUCTURAL BIOINFORMATICS?

Thanks to AlphaFold2, structural bioinformaticians can now focus on problems other than structure prediction. The program doesn’t reveal how an amino acid chain assembles into its structure within milliseconds; it only predicts the final folded structure, like the one seen in a crystal. Because neural networks can be difficult to interpret and may poorly represent the dynamic folding process, studying how AlphaFold2 infers the folded structure could yield either a great deal of insight or very little.

“Structural bioinformatics main objectives are the creation of new methods to deal with biological macromolecules data to solve problems in biology and generate new knowledge.”

There is no doubt that AlphaFold2 has met these goals and now the question is: How will the scientific community build upon this breakthrough?
