Week 9 - Protein Structure Prediction Using AlphaFold¶

Computational prediction and evaluation of protein structures using AlphaFold and ColabFold.

Learning Objectives¶

By the end of this lab, you should be able to:

Retrieve predicted protein structures from the AlphaFold Protein Structure Database.
Interpret pLDDT confidence scores.
Analyze Predicted Aligned Error (PAE) plots.
Compare predicted models with experimental structures.
Identify cases where AlphaFold performs well and where it struggles.
Understand how ColabFold can be used to generate structure predictions.

Background¶

AlphaFold Protein Structure Database¶

AlphaFold is a deep-learning system developed by DeepMind that predicts a protein's three-dimensional structure from its amino acid sequence.

The AlphaFold Protein Structure Database, maintained by DeepMind and EMBL-EBI, provides more than 200 million predicted structures, covering a large fraction of known protein sequences.

Database URL: https://alphafold.ebi.ac.uk
Structures are linked to UniProt entries
Models are provided in PDB (.pdb) and mmCIF (.cif) formats
Each model includes per-residue confidence values (pLDDT)

Selected Proteins for This Lab¶

Protein 1: High expected accuracy¶

Xenorhodopsin

UniProt primary accession: A0AA82WPB4

Protein 2: Lower expected accuracy¶

Cellular tumor antigen p53

UniProt primary accession: P04637

Key Concepts¶

Predicted Local Distance Difference Test (pLDDT)¶

pLDDT is a per-residue confidence score ranging from 0 to 100.

pLDDT score	Interpretation
> 90	Very high confidence
70–90	High confidence; backbone generally reliable
50–70	Low confidence; interpret with caution
< 50	Very low confidence; often disordered or flexible

Low pLDDT values often correspond to loops, termini, and intrinsically disordered regions.
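
The bands in the table above can be applied to a list of per-residue scores to summarize a model at a glance. The following is a minimal sketch, not part of any AlphaFold tool: the function names and band labels are made up for this lab, and assigning a score of exactly 70 to the upper band is an assumption, since the table leaves that boundary open.

```python
from collections import Counter

# Short labels for the four pLDDT bands (hypothetical names for this lab).
BANDS = ("very_high", "confident", "low", "very_low")


def plddt_band(score):
    """Return the confidence band for a single pLDDT score (0 to 100)."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be between 0 and 100, got {score}")
    if score > 90:
        return "very_high"
    if score >= 70:  # assumption: exactly 70 counts as the upper band
        return "confident"
    if score >= 50:
        return "low"
    return "very_low"


def summarize_plddt(scores):
    """Return the fraction of residues that fall in each band."""
    if not scores:
        raise ValueError("no pLDDT scores given")
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts.get(band, 0) / len(scores) for band in BANDS}
```

A model dominated by the first two bands would be expected for a compact, well-folded protein, while a large very_low fraction suggests long disordered regions.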

Predicted Aligned Error (PAE)¶

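PAE estimates, for each pair of residues, the expected positional error (in Å) at one residue when the predicted and true structures are aligned on the other. A low PAE between two residues means their relative position is predicted with confidence.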

PAE is useful for:

Assessing domain-domain orientation confidence
Evaluating multi-domain proteins
Identifying flexible linkers

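PAE is typically visualized as a heatmap with residue indices on both axes. Color schemes differ between tools; in the AlphaFold database viewer:

Dark regions indicate low expected error (high confidence)
Light regions indicate higher expected error (greater uncertainty)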
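
The same information can be read numerically by averaging blocks of the PAE matrix. The sketch below is a hypothetical helper, assuming the matrix has already been loaded as a list of rows (how it is loaded depends on the file you downloaded) and that domain boundaries are given as 1-based inclusive residue ranges chosen by you. Because PAE is not symmetric, the two off-diagonal blocks are averaged.

```python
def mean_pae(pae, rows, cols):
    """Mean of pae[i][j] over the given 0-based row and column indices."""
    values = [pae[i][j] for i in rows for j in cols]
    if not values:
        raise ValueError("empty residue range")
    return sum(values) / len(values)


def domain_pae_summary(pae, domain_a, domain_b):
    """Compare PAE within and between two domains.

    domain_a and domain_b are (start, end) residue numbers, 1-based inclusive.
    """
    a = range(domain_a[0] - 1, domain_a[1])
    b = range(domain_b[0] - 1, domain_b[1])
    return {
        "within_a": mean_pae(pae, a, a),
        "within_b": mean_pae(pae, b, b),
        # PAE is asymmetric, so average both off-diagonal blocks.
        "between": (mean_pae(pae, a, b) + mean_pae(pae, b, a)) / 2,
    }


# Toy 4-residue matrix: two confident "domains" with an uncertain orientation.
toy_pae = [
    [0.5, 1.0, 20.0, 22.0],
    [1.0, 0.5, 21.0, 23.0],
    [19.0, 20.0, 0.5, 1.5],
    [22.0, 21.0, 1.5, 0.5],
]
toy_summary = domain_pae_summary(toy_pae, (1, 2), (3, 4))
```

Low within-domain values combined with a high between-domain value is the pattern expected when each domain is predicted well but their relative orientation is not.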

Part 1: Retrieval of AlphaFold Models¶

Access the AlphaFold database:
https://alphafold.ebi.ac.uk
Search using the UniProt accession:
A0AA82WPB4 (Xenorhodopsin)
P04637 (p53)

Figure: AlphaFold entry example

Download the predicted structure file (.pdb or .cif).
Download the associated PAE plot and confidence data if available.
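
The retrieval steps above can also be scripted. This is a sketch under stated assumptions rather than an official client: it assumes the database exposes a JSON endpoint at /api/prediction/<accession> and that each returned record carries the fields pdbUrl, cifUrl, and paeDocUrl. Check both against the current database documentation before relying on them. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint pattern; verify against the AlphaFold database API docs.
API_TEMPLATE = "https://alphafold.ebi.ac.uk/api/prediction/{accession}"

# Assumed field names for the model files and the PAE data.
WANTED_FIELDS = ("pdbUrl", "cifUrl", "paeDocUrl")


def prediction_api_url(accession):
    """Build the metadata URL for a UniProt accession."""
    accession = accession.strip().upper()
    if not accession.isalnum():
        raise ValueError(f"unexpected characters in accession: {accession!r}")
    return API_TEMPLATE.format(accession=accession)


def fetch_prediction_records(accession, timeout=30.0):
    """Query the database (needs network access) and return the JSON records."""
    url = prediction_api_url(accession)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def pick_download_urls(record):
    """Keep only the download links that are present in a record."""
    return {key: record[key] for key in WANTED_FIELDS if key in record}
```

Calling fetch_prediction_records("P04637") and passing the first record to pick_download_urls would, if the assumptions hold, give the links for the p53 model and its PAE data.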

Part 2: Visualization and Structural Analysis¶

Software options:

UCSF Chimera
ChimeraX
PyMOL
Jmol

Procedure¶

Load the predicted structure file.
Color the structure by pLDDT (stored in the B-factor column).
Identify regions of high and low confidence.
Examine secondary structure elements.
Measure relevant distances and structural features.
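
Because pLDDT is stored in the B-factor column, the confidence profile can also be extracted without a viewer. The following sketch assumes a standard fixed-column .pdb file (B-factor in columns 61 to 66) and reads one value per residue from the CA atoms. The function names and the default threshold of 70 are choices made for this example.

```python
def read_plddt_from_pdb(lines):
    """Return {(chain, residue_number): pLDDT} using CA atoms only."""
    plddt = {}
    for line in lines:
        # Fixed PDB columns: atom name 13-16, chain 22, residue 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            residue_number = int(line[22:26])
            plddt[(chain, residue_number)] = float(line[60:66])
    return plddt


def low_confidence_segments(plddt, threshold=70.0):
    """Group consecutive residues below the threshold into (chain, start, end)."""
    segments = []
    for chain, number in sorted(plddt):
        if plddt[(chain, number)] >= threshold:
            continue
        last = segments[-1] if segments else None
        if last and last[0] == chain and last[2] == number - 1:
            segments[-1] = (chain, last[1], number)  # extend the open segment
        else:
            segments.append((chain, number, number))  # start a new segment
    return segments
```

With a downloaded model, the lines can be supplied by opening the file, for example with open("model.pdb") as handle, and passing handle to read_plddt_from_pdb; the file name here is a placeholder.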

Expected observations¶

Xenorhodopsin

Compact fold
Predominantly high pLDDT values
Clear secondary structure elements (typically alpha-helical)

p53

High confidence in the DNA-binding core domain
Low pLDDT in N- and C-terminal regions (often intrinsically disordered)
Uncertain inter-domain orientation (may vary depending on construct and context)

Part 3: Comparison with Experimental Structures¶

Xenorhodopsin¶

Download an experimental structure, if one is available, from the Protein Data Bank (PDB).
Superimpose experimental and AlphaFold structures.
Calculate RMSD.

Expected result: RMSD < 2 Å, indicating strong agreement (when a close experimental structure exists).

p53¶

Download relevant experimental structure data from the PDB (often individual domains or fragments).
Superimpose the experimental structure with the AlphaFold model.
Compare domain alignment and disordered regions.

Expected result:

Good agreement in structured core domain
Large discrepancies and/or missing structure in disordered regions

Optional Advanced Task: RMSD Calculation¶

Using Chimera or PyMOL:

Align structures.
Compute RMSD.
Record and interpret structural deviation.
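
To check the value reported by the viewer, RMSD can be recomputed from exported coordinates. This sketch does not perform the superposition itself: it assumes the two structures were already aligned in Chimera or PyMOL, that CA coordinates are available as dictionaries keyed by residue number, and that both structures use the same residue numbering. All names are hypothetical.

```python
import math


def matched_ca_pairs(ca_model, ca_experiment):
    """Pair CA coordinates for residues present in both structures."""
    shared = sorted(set(ca_model) & set(ca_experiment))
    return [ca_model[r] for r in shared], [ca_experiment[r] for r in shared]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists differ in length")
    if not coords_a:
        raise ValueError("no atoms to compare")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Restricting the comparison to residues present in both structures matters for p53, where experimental structures often cover only individual domains or fragments.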

Part 4: Using ColabFold¶

What is ColabFold?¶

ColabFold is an implementation of AlphaFold that runs in Google Colab notebooks and:

Runs using cloud computing resources
Requires only amino acid sequence input
Uses MMseqs2 for rapid multiple sequence alignment
Produces ranked models with pLDDT and PAE outputs

Figure: ColabFold overview

Procedure¶

Open the ColabFold notebook in Google Colab.
Paste the amino acid sequence in FASTA format:

>Protein_name
MTEYKLVVVGAGGVGKSALTIQLIQNHFV...

Select monomer or multimer mode.
Run prediction.
Download predicted models and confidence data.
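
Before pasting a sequence, it helps to strip whitespace and confirm that it contains only the 20 standard amino acid letters. This is a minimal sketch with hypothetical function names; rejecting every non-standard letter is a simplifying assumption, not a ColabFold requirement.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Remove whitespace, uppercase, and reject non-standard letters."""
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("empty sequence")
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"non-standard residues: {''.join(invalid)}")
    return sequence


def to_fasta(name, raw, width=60):
    """Format a single FASTA record with lines of at most `width` letters."""
    sequence = clean_sequence(raw)
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(chunks) + "\n"
```

For example, to_fasta("Protein_name", sequence_text) returns a record in the same layout as the one shown above, where sequence_text stands for your full sequence.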

Applications in this lab¶

Predict point mutants of Xenorhodopsin.
Predict truncated p53 domains.
Predict the p53 tetramer using multimer mode.
Compare wild-type and mutant structures.
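
The input sequences for these tasks can be prepared programmatically. The sketch below uses hypothetical helper names and 1-based residue numbering. Joining chains with a colon follows the query convention used by the ColabFold notebook for complexes; treat it as an assumption and confirm it in the notebook you open.

```python
import re


def apply_point_mutation(sequence, mutation):
    """Apply a mutation written like 'A4G': residue 4 changes from A to G."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at {position}, found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + mutant + sequence[position:]


def truncate(sequence, start, end):
    """Return residues start..end (1-based, inclusive), e.g. a single domain."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("invalid residue range")
    return sequence[start - 1:end]


def homo_oligomer_query(sequence, copies):
    """Join identical chains with ':' (assumed ColabFold chain separator)."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return ":".join([sequence] * copies)
```

Checking the wild-type residue before substituting guards against off-by-one errors when a mutation is numbered relative to a different construct.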

Discussion Questions¶

Why does AlphaFold perform better on Xenorhodopsin than on p53?
How do intrinsically disordered regions affect pLDDT?
Why is PAE useful for multi-domain proteins?
Why can AlphaFold predict structure but not necessarily conformational dynamics?
How might protein-protein interactions influence prediction accuracy?

Conclusion¶

AlphaFold performs exceptionally well for:

Single-domain globular proteins
Structurally rigid systems
Proteins with abundant homologous sequences

AlphaFold struggles with:

Intrinsically disordered proteins
Flexible multi-domain arrangements
Oligomerization-dependent conformations