Week 9 - Protein Structure Prediction Using AlphaFold
Computational prediction and evaluation of protein structures using AlphaFold and ColabFold.
Learning Objectives
By the end of this lab, you should be able to:
- Retrieve predicted protein structures from the AlphaFold Protein Structure Database.
- Interpret pLDDT confidence scores.
- Analyze Predicted Aligned Error (PAE) plots.
- Compare predicted models with experimental structures.
- Identify cases where AlphaFold performs well and where it struggles.
- Understand how ColabFold can be used to generate structure predictions.
Background
AlphaFold Protein Structure Database
AlphaFold is a deep learning system developed by DeepMind that predicts a protein's three-dimensional structure from its amino acid sequence.
The AlphaFold Protein Structure Database, maintained by EMBL-EBI in partnership with DeepMind, provides predicted structures for more than 200 million UniProt entries.
- Database URL: https://alphafold.ebi.ac.uk
- Structures are linked to UniProt entries
- Models are provided in .pdb or .cif (mmCIF) format
- Each model includes per-residue confidence values (pLDDT)
Selected Proteins for This Lab
Protein 1: High expected accuracy
Xenorhodopsin
- UniProt accession: A0AA82WPB4
Protein 2: Lower expected accuracy
Cellular tumor antigen p53 (human)
- UniProt accession: P04637
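Key Concepts
Predicted Local Distance Difference Test (pLDDT)
pLDDT is a per-residue confidence score ranging from 0 to 100. It estimates how well the predicted local environment of each residue would agree with an experimental structure, as measured by the lDDT-Cα metric.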
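| pLDDT score | Confidence level | Interpretation |
|---|---|---|
| > 90 | Very high | Expected to be modelled with high accuracy |
| 70–90 | Confident | Backbone generally reliable |
| 50–70 | Low | Treat with caution |
| < 50 | Very low | Likely disordered or flexible; do not interpret as a fixed structure |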
Low pLDDT values often correspond to loops, termini, and intrinsically disordered regions.
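The confidence bands in the table can be applied in code to summarize a whole model rather than reading residues one by one. Below is a minimal Python sketch. The function names are hypothetical. The table leaves boundary scores (exactly 50, 70 or 90) ambiguous, so how this sketch assigns them is an assumption.

```python
from collections import Counter

# Hypothetical helper names; band labels follow the pLDDT table above.
BANDS = ("very high", "confident", "low", "very low")

def plddt_band(score: float) -> str:
    """Return the confidence band for one pLDDT value (0-100).

    Assumption: exactly 90 counts as 'confident', and exactly 70 or 50
    falls into the higher of the two neighbouring bands.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"pLDDT must be between 0 and 100, got {score}")
    if score > 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"

def band_fractions(plddts) -> dict:
    """Fraction of residues falling in each band."""
    plddts = list(plddts)
    if not plddts:
        raise ValueError("need at least one pLDDT value")
    counts = Counter(plddt_band(s) for s in plddts)
    return {band: counts[band] / len(plddts) for band in BANDS}

# Invented values for illustration, not taken from either lab protein.
example = [95.2, 88.0, 72.5, 61.3, 44.8, 30.1]
print(band_fractions(example))
```

Comparing these fractions for Xenorhodopsin and p53 turns the qualitative "mostly high" versus "mixed" contrast into numbers.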
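Predicted Aligned Error (PAE)
PAE(x, y) is the expected error, in ångströms, in the position of residue x when the predicted and true structures are aligned on residue y. Because the residue used for alignment changes, the PAE matrix is generally not symmetric.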
PAE is useful for:
- Assessing domain-domain orientation confidence
- Evaluating multi-domain proteins
- Identifying flexible linkers
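PAE is typically visualized as an N × N heatmap, where N is the number of residues:
- Dark regions indicate low expected error (high confidence)
- Light regions indicate higher expected error (greater uncertainty)
Color schemes differ between tools, so always check the plot's color bar.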
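Domain-domain confidence can be quantified by averaging PAE over the block of residue pairs that links two domains. The sketch below uses hypothetical helper names. It assumes the key names used by recent AlphaFold DB PAE files ("predicted_aligned_error", inside a one-element list) and by ColabFold score files ("pae"). Older AlphaFold DB files used a different layout, so check your own file first.

```python
import json

# Hypothetical helper names; JSON key names are assumptions (see above).
def pae_matrix(data):
    """Extract an N x N PAE matrix (list of lists, in Å) from parsed JSON."""
    if isinstance(data, list):          # AlphaFold DB style: [ {...} ]
        data = data[0]
    for key in ("predicted_aligned_error", "pae"):
        if key in data:
            return data[key]
    raise KeyError("no PAE matrix found under the expected keys")

def mean_interdomain_pae(pae, dom_a, dom_b):
    """Mean PAE between two residue ranges (1-based, inclusive).

    PAE is not symmetric, so both off-diagonal blocks are averaged;
    this also makes the row/column convention irrelevant here.
    """
    if dom_a[0] < 1 or dom_b[0] < 1:
        raise ValueError("residue numbers are 1-based")
    a = range(dom_a[0] - 1, dom_a[1])
    b = range(dom_b[0] - 1, dom_b[1])
    values = [pae[i][j] for i in a for j in b] + [pae[j][i] for i in a for j in b]
    return sum(values) / len(values)

# Toy 4-residue matrix with invented numbers: residues 1-2 and 3-4 behave
# like two rigid "domains" whose relative placement is uncertain.
toy_json = '[{"predicted_aligned_error": [[0, 1, 20, 22], [1, 0, 21, 20], [25, 24, 0, 2], [23, 26, 2, 0]]}]'
toy_pae = pae_matrix(json.loads(toy_json))
print(mean_interdomain_pae(toy_pae, (1, 2), (3, 4)))  # prints 22.625
```

With a real file, load it with json.load and pass in domain boundaries taken from UniProt. For p53, for example, you could compare the DNA-binding core with another annotated domain. A high inter-domain mean next to low intra-domain values is the signature of an uncertain relative orientation.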
Part 1: Retrieval of AlphaFold Models
- Access the AlphaFold database at https://alphafold.ebi.ac.uk.
- Search using the UniProt accession A0AA82WPB4 (Xenorhodopsin) or P04637 (p53).
- Download the predicted structure file (.pdb or .cif).
- Download the associated PAE plot and confidence data, if available.
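The same downloads can be scripted. The AlphaFold DB exposes a REST endpoint (https://alphafold.ebi.ac.uk/api/prediction/ followed by the accession) that returns JSON metadata, including file links. This sketch uses only the standard library. The field names pdbUrl, cifUrl, paeDocUrl and paeImageUrl are assumptions, so confirm them against the API documentation before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint and field names; verify against the AlphaFold DB API docs.
# Helper names are hypothetical.
API_URL = "https://alphafold.ebi.ac.uk/api/prediction/{accession}"
FILE_KEYS = ("pdbUrl", "cifUrl", "paeDocUrl", "paeImageUrl")

def prediction_url(accession: str) -> str:
    """Build the metadata URL for one UniProt accession."""
    return API_URL.format(accession=accession.strip().upper())

def fetch_entries(accession: str, timeout: float = 30.0) -> list:
    """Download the JSON metadata (a list of model entries) for an accession."""
    with urllib.request.urlopen(prediction_url(accession), timeout=timeout) as resp:
        return json.load(resp)

def file_urls(entry: dict) -> dict:
    """Keep only the download links present in one metadata entry."""
    return {key: entry[key] for key in FILE_KEYS if key in entry}

def download(url: str, dest: str, timeout: float = 60.0) -> None:
    """Save a remote file to disk."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        out.write(resp.read())

# Example usage (requires network access; not executed here):
# for acc in ("A0AA82WPB4", "P04637"):
#     urls = file_urls(fetch_entries(acc)[0])
#     download(urls["pdbUrl"], f"{acc}.pdb")
```

Separating URL construction and field selection from the network calls makes the parsing easy to check offline.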
Part 2: Visualization and Structural Analysis
Software options:
- UCSF Chimera
- ChimeraX
- PyMOL
- Jmol
Procedure
- Load the predicted structure file.
- Color the structure by pLDDT (stored in the B-factor column).
- Identify regions of high and low confidence.
- Examine secondary structure elements.
- Measure relevant distances and structural features.
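AlphaFold writes pLDDT into the B-factor column, so the values can also be read directly from the .pdb file. This sketch parses the fixed-width PDB ATOM records for Cα atoms, where the B-factor sits in columns 61-66. It then lists contiguous stretches below a cutoff. The helper names, the default cutoff of 70 and the single-chain assumption are illustrative choices, not part of the lab protocol.

```python
# Hypothetical helper names; assumes a single-chain AlphaFold .pdb file.
def ca_plddt(pdb_text: str, chain: str = "A") -> dict:
    """Map residue number -> pLDDT, read from the B-factor field of CA atoms."""
    values = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based): atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66.
        if (line.startswith("ATOM") and len(line) >= 66
                and line[12:16].strip() == "CA" and line[21] == chain):
            values[int(line[22:26])] = float(line[60:66])
    return values

def low_confidence_segments(plddt: dict, cutoff: float = 70.0, min_len: int = 1) -> list:
    """Contiguous (start, end) residue ranges with pLDDT below the cutoff."""
    segments = []
    start = prev = None
    for resnum in sorted(plddt):
        if plddt[resnum] < cutoff:
            if start is not None and resnum == prev + 1:
                prev = resnum                      # extend the open segment
            else:
                if start is not None:
                    segments.append((start, prev))  # close a segment broken by a gap
                start = prev = resnum
        elif start is not None:
            segments.append((start, prev))          # close the segment at a confident residue
            start = None
    if start is not None:
        segments.append((start, prev))
    return [(s, e) for s, e in segments if e - s + 1 >= min_len]

# Example usage with a downloaded model (hypothetical local filename):
# with open("model.pdb") as fh:
#     print(low_confidence_segments(ca_plddt(fh.read()), cutoff=70.0, min_len=5))
```

Running this on both models gives a residue-level list to check against the expected observations.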
Expected observations
Xenorhodopsin
- Compact fold
- Predominantly high pLDDT values
- Clear secondary structure elements (typically alpha-helical)
p53
- High confidence in the DNA-binding core domain
- Low pLDDT in N- and C-terminal regions (often intrinsically disordered)
- Uncertain inter-domain orientation (may vary depending on construct and context)
Part 3: Comparison with Experimental Structures
Xenorhodopsin
- Download an experimental structure from the PDB, if one is available.
- Superimpose experimental and AlphaFold structures.
- Calculate RMSD.
Expected result: RMSD < 2 Å, indicating strong agreement (when a close experimental structure exists).
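For reference, the RMSD reported after superposition can be written as follows. Here there are N matched atom pairs (usually Cα atoms), x_i are model coordinates, y_i are experimental coordinates, and the minimum is taken over rotations R and translations t. The second line is a toy calculation with invented deviations of 1, 2 and 2 Å after superposition.

```latex
\mathrm{RMSD} = \min_{R,\,\mathbf{t}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}

\sqrt{\tfrac{1^2 + 2^2 + 2^2}{3}} = \sqrt{3} \approx 1.73\ \text{\AA}
```

Because the deviations are squared, a few badly placed residues (for example, a flexible loop) can raise the RMSD well above the typical per-atom deviation.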
p53
- Download relevant experimental structures from the PDB; these usually cover individual domains or fragments rather than the full-length protein.
- Superimpose the experimental structure with the AlphaFold model.
- Compare domain alignment and disordered regions.
Expected result:
- Good agreement in structured core domain
- Large discrepancies and/or missing structure in disordered regions
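You can make the "missing structure" comparison concrete by listing the model residues that have no counterpart in the experimental file. This sketch assumes that both structures use the same (UniProt) residue numbering. That is often, but not always, true for PDB entries, so check it for each entry. The helper names are hypothetical.

```python
# Hypothetical helper names. Assumes model and experimental structure share
# the same residue numbering; verify this for every PDB entry you use.
def to_ranges(residues) -> list:
    """Collapse residue numbers into sorted (start, end) ranges."""
    ranges = []
    for r in sorted(set(residues)):
        if ranges and r == ranges[-1][1] + 1:
            ranges[-1][1] = r          # extend the current range
        else:
            ranges.append([r, r])      # start a new range
    return [tuple(r) for r in ranges]

def unresolved_in_experiment(model_residues, exp_residues) -> list:
    """Residue ranges present in the model but absent from the experiment."""
    return to_ranges(set(model_residues) - set(exp_residues))

# Illustration only: a 393-residue model (the length of human p53) against a
# hypothetical experimental construct covering residues 94-312.
print(unresolved_in_experiment(range(1, 394), range(94, 313)))  # prints [(1, 93), (313, 393)]
```

Cross-referencing these ranges with the low-pLDDT segments of the model shows whether the regions absent from experiments are also the ones AlphaFold is unsure about.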
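Optional Advanced Task: RMSD Calculation
Using Chimera, ChimeraX, or PyMOL:
- Align the structures (e.g., MatchMaker in Chimera or ChimeraX; the align or super command in PyMOL).
- Compute the RMSD. PyMOL's align command rejects outlier atoms over several refinement cycles by default, so record both the reported RMSD and the number of atoms it was computed over.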
- Record and interpret structural deviation.
Part 4: Using ColabFold
What is ColabFold?
ColabFold is a Google Colab-based implementation of AlphaFold that:
- Runs using cloud computing resources
- Requires only amino acid sequence input
- Uses MMseqs2 for rapid multiple sequence alignment
- Produces ranked models with pLDDT and PAE outputs

Procedure
- Open the ColabFold notebook in Google Colab.
- Paste the amino acid sequence in FASTA format:
>Protein_name
MTEYKLVVVGAGGVGKSALTIQLIQNHFV...
- Select monomer or multimer mode.
- Run prediction.
- Download predicted models and confidence data.
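Before pasting, it can help to normalize the sequence so that stray whitespace, lowercase letters or copied symbols do not end up in the query. This is a minimal sketch with hypothetical helper names. It accepts only the 20 standard one-letter amino acid codes, which is stricter than some tools require, so relax STANDARD_AA if you need X or other codes.

```python
# Hypothetical helper names; the accepted alphabet is an assumption.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def clean_sequence(seq: str) -> str:
    """Uppercase, drop all whitespace, and reject unexpected residue letters."""
    s = "".join(seq.split()).upper()
    bad = sorted(set(s) - STANDARD_AA)
    if bad:
        raise ValueError(f"unexpected characters in sequence: {''.join(bad)}")
    return s

def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one FASTA record with the sequence wrapped at `width` letters."""
    s = clean_sequence(seq)
    body = "\n".join(s[i:i + width] for i in range(0, len(s), width))
    return f">{name}\n{body}\n"

print(to_fasta("Protein_name", "mteyk lvvvg\nagg"))
```

Rejecting bad characters early is cheaper than discovering them after a long prediction run.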
Applications in this lab
- Predict point mutants of Xenorhodopsin.
- Predict truncated p53 domains.
- Predict p53 tetramer using multimer mode.
- Compare wild-type and mutant structures.
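Most of these tasks come down to editing the input sequence before prediction. This minimal sketch uses hypothetical helper names and covers three cases:
- a point mutation in the usual wild-type/position/mutant notation (e.g., T2A);
- a 1-based, inclusive truncation;
- a homo-oligomer query built by joining identical chains with ":", which is how the ColabFold notebook separates the chains of a complex.

Check the notebook's own instructions, since conventions can change between versions. The sequences and mutations shown are toy values, not recommendations for Xenorhodopsin or p53.

```python
import re

# Hypothetical helper names for preparing ColabFold inputs.
def apply_point_mutation(seq: str, mutation: str) -> str:
    """Apply a mutation written as <wt><position><mut>, e.g. 'T2A' (1-based)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wt, pos, mut = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != wt:
        # Guards against numbering mismatches between sequence and mutation name.
        raise ValueError(f"expected {wt} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + mut + seq[pos:]

def truncate(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive), e.g. one domain."""
    return seq[start - 1:end]

def homo_oligomer_query(seq: str, copies: int) -> str:
    """Join identical chains with ':' to request a complex (e.g. 4 for a tetramer)."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return ":".join([seq] * copies)

print(apply_point_mutation("MTEYK", "T2A"))
print(homo_oligomer_query("MTE", 4))
```

Checking the wild-type residue before mutating catches the common mistake of mixing residue numbering schemes.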
Discussion Questions
- Why does AlphaFold perform better on Xenorhodopsin than p53?
- How do intrinsically disordered regions affect pLDDT?
- Why is PAE useful for multi-domain proteins?
- Why can AlphaFold predict structure but not necessarily conformational dynamics?
- How might protein-protein interactions influence prediction accuracy?
Conclusion
AlphaFold performs exceptionally well for:
- Single-domain globular proteins
- Structurally rigid systems
- Proteins with abundant homologous sequences
AlphaFold struggles with:
- Intrinsically disordered proteins
- Flexible multi-domain arrangements
- Oligomerization-dependent conformations