Week 3 - Protein Data Bank (PDB) and PDBsum

Protein Data Bank (PDB)

The Protein Data Bank (PDB) is a repository of atomic coordinates and related information describing the 3D structures of proteins and other important biological macromolecules, such as nucleic acids.

Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the position of each atom in the molecule relative to the others. They then deposit this information, which the wwPDB annotates and releases publicly into the archive.

The constantly growing PDB reflects the research happening in laboratories across the world. Structures are available for many proteins and nucleic acids involved in central processes of life (ribosomes, oncogenes, drug targets, even whole viruses). However, because the archive holds so many structures, it can be challenging to find the one you need.

You will often find multiple structures for a given molecule, partial structures, or structures that have been modified or inactivated from their native form.

What is there in a PDB file?

The primary information stored in the PDB archive consists of coordinate files for biological molecules. These files list:

Atoms in the molecule
Their 3D location in space (coordinates)
Connectivity for selected atoms (for example, ligands and disulfide bonds)

Files are available in several formats (PDB, mmCIF, XML). A typical PDB-formatted file includes:

A large header section summarizing the protein, citations, and details of structure solution
The sequence
A long list of atoms and their coordinates

The archive also contains experimental observations used to determine the coordinates.
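The sections of a PDB-formatted file described above can be told apart by the record name at the start of each line. The sketch below tallies record names in a short, hypothetical excerpt; the sample text is invented for illustration and is not a real PDB entry.

```python
from collections import Counter

# Hypothetical, heavily truncated PDB-format excerpt (not a real entry).
SAMPLE_PDB = """\
HEADER    HYDROLASE                               01-JAN-00   XXXX
TITLE     HYPOTHETICAL EXAMPLE ENTRY
REMARK   2 RESOLUTION.    1.80 ANGSTROMS.
SEQRES   1 A    3  GLY ALA SER
ATOM      1  N   GLY A   1      11.104   6.134  -6.504  1.00 20.00
ATOM      2  CA  GLY A   1      11.639   6.071  -5.147  1.00 20.00
HETATM    3  O   HOH A 101      15.000   2.000   1.000  1.00 30.00
END
"""


def count_record_types(pdb_text):
    """Count lines by record name, which occupies columns 1-6 of each line."""
    return Counter(
        line[:6].strip() for line in pdb_text.splitlines() if line.strip()
    )


counts = count_record_types(SAMPLE_PDB)
for name, n in counts.items():
    print(f"{name:<6} {n}")
```

In a real file, HEADER, TITLE, and REMARK lines make up the header section, SEQRES lines hold the sequence, and ATOM/HETATM lines hold the coordinates.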

Notes:

Typical entries include proteins, small molecules, ions, and water.
Crystal structures report, for each atom, a temperature factor (B-factor, reflecting atomic displacement) and an occupancy (used to describe alternate conformations).
NMR structures often include multiple models.
Crystal structures typically omit hydrogen atoms, whereas NMR structures usually include them; missing hydrogens can be added later using protein structure software.
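To make the notes above concrete, here is a minimal sketch of reading coordinates, occupancy, and temperature factor from ATOM and HETATM lines. It assumes the fixed-column layout of the legacy PDB format; the `Atom` class, the function names, and the sample lines are hypothetical and invented for illustration. For real work, an established parser (for example, Biopython's `Bio.PDB`) is a better choice.

```python
from dataclasses import dataclass

# Hypothetical coordinate lines laid out in the fixed-width PDB columns.
# Serine CB is given two alternate conformations (A and B).
SAMPLE_ATOMS = """\
ATOM      1  N   GLY A   1      11.104   6.134  -6.504  1.00 20.00
ATOM      2  CA  GLY A   1      11.639   6.071  -5.147  1.00 20.00
ATOM      3  CB ASER A   2      12.500   7.200  -4.000  0.60 25.00
ATOM      4  CB BSER A   2      12.900   7.600  -3.800  0.40 28.00
HETATM    5  O   HOH A 101      15.000   2.000   1.000  1.00 30.00
"""


@dataclass
class Atom:
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line):
    """Slice one ATOM/HETATM line by column position (0-based slices)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        alt_loc=line[16].strip(),      # blank when there is one conformation
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


def parse_atoms(pdb_text):
    """Parse every ATOM and HETATM line in the text."""
    return [
        parse_atom_line(line)
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]


atoms = parse_atoms(SAMPLE_ATOMS)
for atom in atoms:
    if atom.alt_loc:
        print(atom.res_name, atom.name, atom.alt_loc, atom.occupancy, atom.b_factor)
```

In this sample, the occupancies of the two alternate conformations sum to 1, and the water molecule appears as a HETATM record with residue name HOH.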

Example 1: Search a protein in PDB with its name

Go to https://www.rcsb.org/
Type eIF (eukaryotic translation initiation factor) into the search box.

Click Search.
Note that there are PDB IDs on the upper left side of each protein listed (e.g., 3L6A, 2OGH, 2OQK).
Open 2OGH (Solution structure of yeast eIF1) by clicking its name.

Figure: 2OGH entry

On the left-hand side, you will see the structure image.
Click 3D View: Structure to visualize the 3D structure.
Rotate the structure by moving the mouse while pressing the left mouse button.
Go back to the results window and choose 2OQK.
Note the following:
- Which method was used to determine the structure
- The resolution of the structure
- That the protein was studied in the presence of a sulfate ion, which can be visualized in the structure viewer
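The same keyword search can also be scripted. The sketch below builds a full-text query for the RCSB Search API. The endpoint URL and the JSON layout reflect my understanding of version 2 of that API and should be checked against the current RCSB documentation; the function names are hypothetical. The network call is defined but not executed here.

```python
import json
import urllib.request

# Assumed endpoint for the RCSB Search API (v2); verify before relying on it.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_full_text_query(text, max_rows=10):
    """Build a JSON-serializable full-text query that returns entry IDs."""
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": text},
        },
        "return_type": "entry",
        "request_options": {"paginate": {"start": 0, "rows": max_rows}},
    }


def search_pdb_ids(text, max_rows=10):
    """Send the query and return matching PDB IDs (needs network access)."""
    payload = json.dumps(build_full_text_query(text, max_rows)).encode("utf-8")
    request = urllib.request.Request(
        SEARCH_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    if not body:  # an empty body is treated here as "no hits"
        return []
    result = json.loads(body)
    return [hit["identifier"] for hit in result.get("result_set", [])]


# Show the query that would be sent for the search term used in this example.
print(json.dumps(build_full_text_query("eIF"), indent=2))
```

Calling `search_pdb_ids("eIF")` on a machine with internet access should return a list of PDB IDs comparable to those shown on the results page, although ranking and counts may differ from the website.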

Example 2: PDB structures of fragmented or complexed proteins

Search for the PDB IDs below. These structures were determined either as complexes with other molecules or as fragments of the full-length protein.

3MZG - a protein crystallized with an antagonist
3AFC - structure of an extracellular domain of a protein
3NWT - N-terminus of a protein
2VBJ - complex with DNA
2VHM - complex with ribosome
1K4C - complex with an antibody

More than one structure of a particular protein can often be found in the PDB because:

The protein can be crystallized with different antibodies or ligands
Different studies reveal different portions of the same protein structure

Example 3: Different PDB structures of prolactin receptor

Type prolactin receptor into the search box.
Look at the result list and identify what distinguishes each structure (for example, a bound ligand or antibody, or the portion of the protein that was solved).

Example 4: Display PDB file of `1KSC` (Endoglucanase from termite)

Type 1KSC into the search box.
Click Display Files on the upper left side of the page.

Figure: 1KSC display files

Display the FASTA sequence of the protein.
The PDB is another source of protein sequences in FASTA format.
Display the PDB file and analyze the content.
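The link between the PDB file and the FASTA sequence can be illustrated by converting SEQRES records into a one-letter sequence. This is a minimal sketch: the SEQRES lines are hypothetical (not taken from 1KSC), the function name and the `XXXX` entry ID are placeholders, and only the 20 standard amino acids are mapped, with anything else written as `X`.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Hypothetical SEQRES records for a 16-residue chain A.
SAMPLE_SEQRES = """\
SEQRES   1 A   16  MET LYS THR ALA TYR ILE ALA LYS GLN ARG GLN ILE SER
SEQRES   2 A   16  PHE VAL LYS
"""


def seqres_to_fasta(pdb_text, entry_id="XXXX"):
    """Collect SEQRES residues per chain and format them as FASTA records."""
    chains = {}
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]           # chain identifier is in column 12
        residues = line[19:].split()  # residue names start at column 20
        chains.setdefault(chain_id, []).extend(residues)

    records = []
    for chain_id, residues in chains.items():
        sequence = "".join(THREE_TO_ONE.get(res, "X") for res in residues)
        records.append(f">{entry_id}_{chain_id}\n{sequence}")
    return "\n".join(records)


fasta = seqres_to_fasta(SAMPLE_SEQRES)
print(fasta)
```

To try this on a real entry, the text of a downloaded PDB-format file can be passed in place of `SAMPLE_SEQRES`. Note that SEQRES describes the full sequence of the sample studied, which can be longer than the set of residues that actually have coordinates.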

Example 5: Find a structure similar to a given protein sequence

Assume you have the primary structure (amino acid sequence) of the human JUN transcription factor and want to find similar PDB entries.

Go to https://pmc.ncbi.nlm.nih.gov/articles/PMC3123456/
Read the history of the site.
Go to the SWISS-MODEL Repository (successor to the Protein Model Portal):
https://swissmodel.expasy.org/repository
Search for the protein you are interested in.
In this exercise, search for JUN (1S9K).
From the new window, you can download PDB coordinates, view the molecule, and find related PDB IDs.

PDBsum

Website:
http://www.ebi.ac.uk/pdbsum/

PDBsum is a database maintained at the European Bioinformatics Institute (EBI). For each PDB entry it provides a structural summary covering the protein chains, the ligands that bind to them, secondary structure elements, residue conservation analyses, and more. It is useful for getting a first look at a PDB structure.

Example 6: Explore PDBsum features using `2DR6` (a multidrug transporter)

Go to http://www.ebi.ac.uk/pdbsum/
Search PDB code: 2DR6

Figure: 2DR6 in PDBsum

On the right-hand side, find the PROCHECK link, which opens a Ramachandran plot.
Use the plot to comment on which secondary structures are likely present in the protein.
From the contents menu at the left-hand side, choose Protein chains.

Figure: PDBsum contents menu

Note the CATH domains and the secondary structure diagram of the protein.
Click on A / B / C on the Protein Chain page (this will take you to CATH).
Find the ProMotif menu on the left side.

Figure: ProMotif menu

You can analyze secondary structure elements by:
- Clicking on the secondary structure of interest, or
- Clicking ProMotif to see a secondary structure element summary
Click Residue Conservation and analyze conservation of each residue.
PDBsum also has information on ligand binding.
- Click Ligands in the top menu bar to view residues interacting with doxorubicin.
- Click 3Dmol on the left to visualize doxorubicin and interacting residues.
Click Prot-Prot in the top menu bar to see how chains interact.
- 3IYN is a good example for analyzing interactions.
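The Ramachandran plot used earlier in this example places each residue at its backbone dihedral angles: phi (defined by the atoms C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue). The sketch below shows one common way to compute a dihedral angle from four atomic positions. The function name and the coordinates are hypothetical test geometry, not atoms from 2DR6.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle (degrees, -180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central bond.
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)

    # Components of b0 and b2 perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical geometry: central bond along z, outer atoms at a right angle.
angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(f"dihedral = {angle:.1f} degrees")
```

As a rough guide when reading the plot, residues in alpha helices cluster near phi = -60, psi = -45 degrees, and residues in beta strands near phi = -120, psi = +130 degrees.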