Week 9 - Protein Structure Prediction Using AlphaFold

Computational prediction and evaluation of protein structures using AlphaFold and ColabFold.


Learning Objectives

By the end of this lab, you should be able to:

  1. Retrieve predicted protein structures from the AlphaFold Protein Structure Database.
  2. Interpret pLDDT confidence scores.
  3. Analyze Predicted Aligned Error (PAE) plots.
  4. Compare predicted models with experimental structures.
  5. Identify cases where AlphaFold performs well and where it struggles.
  6. Understand how ColabFold can be used to generate structure predictions.

Background

AlphaFold Protein Structure Database

AlphaFold is a deep learning system developed by DeepMind that predicts a protein's three-dimensional structure from its amino acid sequence.

The AlphaFold Protein Structure Database provides predicted structures for a large fraction of known protein sequences.

  • Database URL: https://alphafold.ebi.ac.uk
  • Structures are linked to UniProt entries
  • Models are provided in PDB (.pdb) or mmCIF (.cif) format
  • Each model includes per-residue confidence values (pLDDT)

Selected Proteins for This Lab

Protein 1: High expected accuracy

Xenorhodopsin

  • UniProt primary accession: A0AA82WPB4

Protein 2: Lower expected accuracy

Cellular tumor antigen p53

  • UniProt primary accession: P04637

Key Concepts

Predicted Local Distance Difference Test (pLDDT)

pLDDT is a per-residue confidence score ranging from 0 to 100.

  pLDDT score    Interpretation
  > 90           Very high confidence
  70–90          Reliable backbone prediction
  50–70          Low confidence
  < 50           Likely disordered or flexible

Low pLDDT values often correspond to loops, termini, and intrinsically disordered regions.
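
As a rough illustration of the bands listed above, the following Python sketch reads per-residue pLDDT values from the B-factor column of an AlphaFold .pdb file and assigns each value to a band. The function names, the handling of boundary values (exactly 90, 70, or 50), and the choice to read one value per CA atom are assumptions made for this example.

```python
def plddt_band(score):
    """Map a pLDDT score (0-100) to the confidence bands used in this lab."""
    if score > 90:
        return "very high"
    if score >= 70:
        return "reliable backbone"
    if score >= 50:
        return "low"
    return "likely disordered or flexible"


def read_plddt(pdb_text):
    """Return {residue_number: pLDDT} using the CA atom of each residue.

    Assumes fixed-column PDB format: atom name in columns 13-16,
    residue number in columns 23-26, B-factor (pLDDT) in columns 61-66.
    Assumes a single chain, as in AlphaFold database models.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def band_counts(scores):
    """Count how many residues fall into each confidence band."""
    counts = {}
    for value in scores.values():
        band = plddt_band(value)
        counts[band] = counts.get(band, 0) + 1
    return counts
```

Calling band_counts(read_plddt(open("model.pdb").read())) on a downloaded model (the file name is a placeholder) gives a quick summary of how much of the chain is predicted with confidence.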

Predicted Aligned Error (PAE)

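PAE estimates, in ångströms, the expected position error at one residue when the predicted and true structures are aligned on another residue. It is reported for every pair of residues.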

PAE is useful for:

  • Assessing domain-domain orientation confidence
  • Evaluating multi-domain proteins
  • Identifying flexible linkers

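PAE is typically visualized as a heatmap. In the AlphaFold database color scheme:

  • Dark regions (low PAE) indicate high confidence in the relative positions of the two residues
  • Light regions (high PAE) indicate higher uncertainty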
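
To make the domain-orientation idea concrete, the sketch below averages PAE values over a block of residue pairs, which is one simple way to summarize confidence in the relative placement of two regions. It assumes the PAE matrix has already been loaded as a square list of lists (AlphaFold database JSON downloads are assumed here to store it under a "predicted_aligned_error" key). The function name and the 1-based inclusive residue ranges are choices made for this example.

```python
def mean_block_pae(pae, region_a, region_b):
    """Mean PAE (in Å) for residue pairs with one residue in each region.

    pae      : square matrix as a list of lists
    region_a : (start, end) residue numbers, 1-based and inclusive
    region_b : (start, end) residue numbers, 1-based and inclusive
    Both orderings (a, b) and (b, a) are averaged because PAE is not symmetric.
    """
    a = range(region_a[0] - 1, region_a[1])
    b = range(region_b[0] - 1, region_b[1])
    values = [pae[i][j] for i in a for j in b]
    values += [pae[j][i] for i in a for j in b]
    return sum(values) / len(values)


# Toy 4-residue matrix (made-up numbers): residues 1-2 and 3-4 behave as two
# well-predicted units whose relative orientation is uncertain.
toy_pae = [
    [0.5, 1.0, 20.0, 22.0],
    [1.0, 0.5, 21.0, 23.0],
    [19.0, 20.0, 0.5, 1.0],
    [21.0, 22.0, 1.0, 0.5],
]
within = mean_block_pae(toy_pae, (1, 2), (1, 2))    # low: confident unit
between = mean_block_pae(toy_pae, (1, 2), (3, 4))   # high: uncertain orientation
```

A low within-region mean combined with a high between-region mean is the pattern expected for well-folded domains joined by a flexible linker.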

Part 1: Retrieval of AlphaFold Models

  1. Access the AlphaFold database: https://alphafold.ebi.ac.uk
  2. Search using the UniProt accession:
       • A0AA82WPB4 (Xenorhodopsin)
       • P04637 (p53)

AlphaFold entry example

  3. Download the predicted structure file (.pdb or .cif).
  4. Download the associated PAE plot and confidence data if available.
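
The retrieval steps can also be scripted. The sketch below assumes the AlphaFold database exposes a JSON endpoint at /api/prediction/<accession> whose entries include pdbUrl, cifUrl, and paeDocUrl fields; check the database's API documentation before relying on these names. The helper functions are hypothetical and exist only for this example.

```python
import json
from urllib.request import urlopen

API_ROOT = "https://alphafold.ebi.ac.uk/api/prediction/"  # assumed endpoint


def prediction_url(accession):
    """Build the (assumed) API URL for one UniProt accession."""
    return API_ROOT + accession.strip().upper()


def file_links(api_response_text):
    """Pick out download links from the API's JSON text.

    Assumes the response is a list of entries; the first entry is used and
    any missing field comes back as None.
    """
    entries = json.loads(api_response_text)
    first = entries[0] if entries else {}
    return {key: first.get(key) for key in ("pdbUrl", "cifUrl", "paeDocUrl")}


def fetch_links(accession):
    """Query the database over the network and return the download links."""
    with urlopen(prediction_url(accession), timeout=30) as response:
        return file_links(response.read().decode("utf-8"))
```

For example, fetch_links("P04637") would return a dictionary of links for p53 if the endpoint and field names match the assumptions above.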

Part 2: Visualization and Structural Analysis

Software options:

  • UCSF Chimera
  • ChimeraX
  • PyMOL
  • Jmol

Procedure

  1. Load the predicted structure file.
  2. Color the structure by pLDDT (stored in the B-factor column).
  3. Identify regions of high and low confidence.
  4. Examine secondary structure elements.
  5. Measure relevant distances and structural features.
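
Step 3 can be done by eye in the viewer, but a short script helps when reporting exact residue ranges. This sketch groups consecutive residues whose pLDDT falls below a cutoff into segments. The cutoff of 70, the function name, and the input format (a dictionary mapping residue number to pLDDT) are assumptions for illustration.

```python
def low_confidence_segments(plddt_by_residue, cutoff=70.0):
    """Return [(start, end), ...] for runs of consecutive residues below cutoff."""
    segments = []
    start = previous = None
    for resnum in sorted(plddt_by_residue):
        is_low = plddt_by_residue[resnum] < cutoff
        if is_low and start is not None and resnum == previous + 1:
            previous = resnum                      # extend the current run
            continue
        if start is not None:
            segments.append((start, previous))     # close the finished run
            start = previous = None
        if is_low:
            start = previous = resnum              # open a new run
    if start is not None:
        segments.append((start, previous))         # close a run at the chain end
    return segments


# Made-up scores for an 8-residue toy chain.
toy_scores = {1: 40.0, 2: 45.0, 3: 92.0, 4: 95.0, 5: 60.0, 6: 88.0, 7: 30.0, 8: 35.0}
toy_segments = low_confidence_segments(toy_scores)
```

For p53, the segments reported this way should fall mostly in the N- and C-terminal regions described below.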

Expected observations

Xenorhodopsin

  • Compact fold
  • Predominantly high pLDDT values
  • Clear secondary structure elements (typically alpha-helical)

p53

  • High confidence in the DNA-binding core domain
  • Low pLDDT in N- and C-terminal regions (often intrinsically disordered)
  • Uncertain inter-domain orientation (may vary depending on construct and context)

Part 3: Comparison with Experimental Structures

Xenorhodopsin

  1. Download an experimental structure, if one is available, from the Protein Data Bank (PDB).
  2. Superimpose experimental and AlphaFold structures.
  3. Calculate RMSD.

Expected result: RMSD < 2 Å, indicating strong agreement (when a close experimental structure exists).

p53

  1. Download relevant experimental structure data from the PDB (often individual domains or fragments).
  2. Superimpose the experimental structure with the AlphaFold model.
  3. Compare domain alignment and disordered regions.

Expected result:

  • Good agreement in structured core domain
  • Large discrepancies and/or missing structure in disordered regions

Optional Advanced Task: RMSD Calculation

Using Chimera or PyMOL:

  1. Align structures.
  2. Compute RMSD.
  3. Record and interpret structural deviation.
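
For reference, the RMSD over N matched atoms is the square root of the mean squared distance between corresponding positions. The sketch below computes it for two coordinate lists that have already been superimposed (for example, Cα coordinates exported after alignment in Chimera or PyMOL); it does not perform the superposition itself. The function name and input format are assumptions for this example.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units (Å for PDB files) between matched atom positions.

    coords_a, coords_b : equal-length sequences of (x, y, z) tuples, already
    superimposed and listed in the same residue order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Because disordered regions inflate the value, it is usually more informative to compute RMSD over the structured core only and to report which residues were included.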

Part 4: Using ColabFold

What is ColabFold?

ColabFold is a Google Colab-based implementation of AlphaFold that:

  • Runs using cloud computing resources
  • Requires only amino acid sequence input
  • Uses MMseqs2 to build multiple sequence alignments rapidly
  • Produces ranked models with pLDDT and PAE outputs

ColabFold overview

Procedure

  1. Open the ColabFold notebook in Google Colab.
  2. Paste the amino acid sequence in FASTA format:

       >Protein_name
       MTEYKLVVVGAGGVGKSALTIQLIQNHFV...

  3. Select monomer or multimer mode.
  4. Run prediction.
  5. Download predicted models and confidence data.
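
Before pasting a sequence, it can help to check that it contains only the 20 standard amino acid letters and to wrap it as a FASTA record. The sketch below does both. The function name, the 60-character line width, and the decision to reject non-standard letters (such as X or U) are assumptions for this example, not ColabFold requirements.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(name, sequence, width=60):
    """Return a FASTA record after removing whitespace and checking the letters."""
    cleaned = "".join(sequence.split()).upper()
    unexpected = sorted(set(cleaned) - STANDARD_AA)
    if not cleaned or unexpected:
        raise ValueError(f"invalid sequence; unexpected characters: {unexpected}")
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

Catching a stray digit or gap character at this stage is quicker than waiting for a notebook run to fail.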

Applications in this lab

  • Predict point mutants of Xenorhodopsin.
  • Predict truncated p53 domains.
  • Predict p53 tetramer using multimer mode.
  • Compare wild-type and mutant structures.
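
The applications above all start from an edited input sequence. The following sketch shows one way to prepare such inputs: applying a point mutation written as, for example, "A2G" (wild-type letter, 1-based position, new letter), truncating to a residue range, and building a homo-oligomer query. Joining chains with ":" reflects the ColabFold notebook's multi-chain input convention as understood here; confirm it against the notebook you use. The function names and the toy sequence are hypothetical.

```python
import re


def point_mutant(sequence, mutation):
    """Apply a substitution such as 'A2G' (wild type, 1-based position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild:
        raise ValueError(
            f"expected {wild} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + new + sequence[position:]


def truncate(sequence, start, end):
    """Keep residues start..end (1-based, inclusive)."""
    return sequence[start - 1:end]


def homo_oligomer(sequence, copies):
    """Repeat one chain, separated by ':' (assumed multi-chain separator)."""
    return ":".join([sequence] * copies)


toy = "MAKLVT"  # made-up sequence, not a real protein
mutant = point_mutant(toy, "A2G")
fragment = truncate(toy, 2, 4)
tetramer = homo_oligomer(toy, 4)
```

Checking the wild-type letter before substituting guards against off-by-one numbering errors, which are common when a construct's numbering differs from the UniProt entry.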

Discussion Questions

  1. Why does AlphaFold perform better on Xenorhodopsin than p53?
  2. How do intrinsically disordered regions affect pLDDT?
  3. Why is PAE useful for multi-domain proteins?
  4. Why can AlphaFold predict structure but not necessarily conformational dynamics?
  5. How might protein-protein interactions influence prediction accuracy?

Conclusion

AlphaFold performs exceptionally well for:

  • Single-domain globular proteins
  • Structurally rigid systems
  • Proteins with abundant homologous sequences

AlphaFold struggles with:

  • Intrinsically disordered proteins
  • Flexible multi-domain arrangements
  • Oligomerization-dependent conformations