Week 3 - Protein Data Bank (PDB) and PDBsum
Protein Data Bank (PDB)
The Protein Data Bank (PDB) is a repository of atomic coordinates and related information describing the 3D structures of proteins and other important biological macromolecules.
Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the position of each atom in the molecule relative to the others. They then deposit this information, which the wwPDB annotates and publicly releases into the archive.
The constantly growing PDB reflects the research happening in laboratories across the world. Structures are available for many proteins and nucleic acids involved in central processes of life (ribosomes, oncogenes, drug targets, even whole viruses). However, because the PDB archives so many different structures, it can be challenging to find the information you need.
You will often find multiple structures for a given molecule, partial structures, or structures that have been modified or inactivated from their native form.
What is in a PDB file?
The primary information stored in the PDB archive consists of coordinate files for biological molecules. These files list:
- Atoms in the molecule
- Their 3D location in space (coordinates)
- Connectivity for ligands and other non-standard groups (CONECT records in PDB-format files)
Files are available in several formats (PDB, mmCIF, XML). A typical PDB-formatted file includes:
- A large header section summarizing the protein, citations, and details of structure solution
- The sequence
- A long list of atoms and their coordinates
The archive also contains experimental observations used to determine the coordinates.
Notes:
- Typical entries include proteins, small molecules, ions, and water.
- Crystallography structures annotate temperature factors (atomic vibration) and occupancies (alternate conformations).
- NMR structures often include multiple models.
- PDB files typically exclude hydrogen atoms; they can be added later using protein structure software.
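The atom list described above can be read with a few lines of code. The following is a minimal sketch, assuming the fixed-column layout of ATOM and HETATM records in PDB-formatted files. The `Atom` class, the `parse_atoms` function, and the sample records (with made-up coordinates) are illustrative and not part of the original exercise.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str       # "ATOM" (polymer) or "HETATM" (ligand, ion, water)
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float   # temperature factor


def parse_atoms(pdb_text):
    """Return the ATOM/HETATM records of a PDB-formatted string."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip header, sequence, and other record types
        atoms.append(Atom(
            record=record,
            serial=int(line[6:11]),
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain_id=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
            occupancy=float(line[54:60]),
            b_factor=float(line[60:66]),
        ))
    return atoms


# Made-up sample records, for illustration only (not from a real entry).
SAMPLE = """\
HEADER    EXAMPLE RECORDS
ATOM      1  N   MET A   1      12.345  23.456   3.210  1.00 15.00           N
ATOM      2  CA  MET A   1      13.500  24.000   4.125  1.00 14.25           C
HETATM  100  O   HOH A 201      10.000  12.500   8.250  0.50 30.12           O
"""

atoms = parse_atoms(SAMPLE)
for atom in atoms:
    print(atom.record, atom.name, atom.res_name, atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters here because adjacent fields in PDB-formatted files can run together without a separating space.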
Example 1: Search for a protein in the PDB by name
- Go to https://www.rcsb.org/
- Type eIF (eukaryotic translation initiation factor) into the search box.

- Click Search.
- Note the PDB IDs at the upper left of each listed entry (e.g., 3L6A, 2OGH, 2OQK).
- Open 2OGH (Solution structure of yeast eIF1) by clicking its name.

- On the left-hand side, you will see the structure image.
- Click 3D View: Structure to visualize the 3D structure.
- Rotate the structure by moving the mouse while pressing the left mouse button.
- Go back to the results window and choose 2OQK.
- Note:
  - Which method was used to determine the structure
  - The resolution of the structure
- This protein was studied in the presence of a sulfate ion, which can be visualized in the structure viewer.
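The method and resolution noted in this example are also recorded in the header of a PDB-formatted file. The sketch below assumes the usual `EXPDTA` record and the `REMARK   2 RESOLUTION.` line. The function name `experiment_summary` and both sample headers are hypothetical, and the resolution value is made up rather than taken from 2OQK.

```python
import re


def experiment_summary(pdb_text):
    """Return (method, resolution in angstroms or None) from a PDB header."""
    method = None
    resolution = None
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            # The technique name starts at column 11 of the record.
            method = line[10:].strip()
        elif line.startswith("REMARK   2 RESOLUTION."):
            match = re.search(r"(\d+\.\d+)\s+ANGSTROMS", line)
            if match:
                resolution = float(match.group(1))
    return method, resolution


# Hypothetical headers for illustration; the values are not from real entries.
XRAY_HEADER = """\
EXPDTA    X-RAY DIFFRACTION
REMARK   2
REMARK   2 RESOLUTION.    1.80 ANGSTROMS.
"""

NMR_HEADER = """\
EXPDTA    SOLUTION NMR
REMARK   2
REMARK   2 RESOLUTION. NOT APPLICABLE.
"""

xray_summary = experiment_summary(XRAY_HEADER)
nmr_summary = experiment_summary(NMR_HEADER)
print(xray_summary)
print(nmr_summary)
```

NMR entries such as 2OGH have no resolution in the crystallographic sense, which is why the function returns `None` when no numeric value is present.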
Example 2: PDB structures of fragmented or complexed proteins
Search the PDB IDs below. These structures are obtained either as complexes with other macromolecules or in fragmented form.
- 3MZG: a protein crystallized with an antagonist
- 3AFC: structure of an extracellular domain of a protein
- 3NWT: N-terminus of a protein
- 2VBJ: complex with DNA
- 2VHM: complex with ribosome
- 1K4C: complex with an antibody
Sometimes, more than one structure of a particular protein can be found in PDB because:
- The protein can be crystallized with different antibodies or ligands
- Different studies reveal different portions of the same protein structure
Example 3: Different PDB structures of prolactin receptor
- Type prolactin receptor into the search box.
- Look at the result list and identify what makes each structure different.
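The same keyword search can be scripted. The sketch below assumes the RCSB Search API v2 endpoint and its `full_text` service as commonly documented; check the current API documentation before relying on the field names. The helper names `build_full_text_query` and `search_entries` are hypothetical, and the network call is only defined here, not executed.

```python
import json
import urllib.request

# Assumed endpoint of the RCSB Search API (version 2).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_full_text_query(text, rows=10):
    """Build a JSON-serializable full-text query that returns entry IDs."""
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": text},
        },
        "return_type": "entry",
        "request_options": {"paginate": {"start": 0, "rows": rows}},
    }


def search_entries(text, rows=10):
    """Send the query and return the matching PDB IDs (requires network)."""
    payload = json.dumps(build_full_text_query(text, rows)).encode("utf-8")
    request = urllib.request.Request(
        SEARCH_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        raw = response.read()
    if not raw:
        return []  # an empty body is treated as "no hits"
    body = json.loads(raw)
    return [hit["identifier"] for hit in body.get("result_set", [])]


query = build_full_text_query("prolactin receptor", rows=5)
print(json.dumps(query, indent=2))
```

Calling `search_entries("prolactin receptor")` would return a list of PDB IDs that can then be compared by ligand, bound partner, or the fragment of the receptor that was solved.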
Example 4: Display PDB file of 1KSC (Endoglucanase from termite)
- Type 1KSC into the search box.
- Click Display Files on the upper left side of the page.

- Display the FASTA sequence of the protein.
- The PDB is another source of FASTA sequences.
- Display the PDB file and analyze the content.
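The files displayed in this example can also be fetched and parsed in a script. The sketch below assumes the RCSB download URL patterns shown in `entry_urls` (verify them against the site before use). The function names and the sample FASTA text, including its header and sequence, are made up for illustration.

```python
def entry_urls(pdb_id):
    """Return assumed RCSB download URLs for an entry's PDB and FASTA files."""
    pdb_id = pdb_id.strip().upper()
    return {
        "pdb": f"https://files.rcsb.org/download/{pdb_id}.pdb",
        "fasta": f"https://www.rcsb.org/fasta/entry/{pdb_id}",
    }


def parse_fasta(text):
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up FASTA text for illustration; not the real 1KSC sequence.
SAMPLE_FASTA = """\
>EXAMPLE_1|Chain A|hypothetical protein
MKTAYIAKQR
QISFVKSHFS
"""

urls = entry_urls("1ksc")
sequences = parse_fasta(SAMPLE_FASTA)
print(urls["pdb"])
for name, sequence in sequences.items():
    print(name, len(sequence))
```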
Example 5: Find a structure similar to a given protein sequence
Assume you have the primary structure (amino acid sequence) of the human JUN transcription factor and want to find similar PDB entries.
- Go to https://pmc.ncbi.nlm.nih.gov/articles/PMC3123456/
- Read the history of the site.
- Go to Protein Model Portal: https://swissmodel.expasy.org/repository
- Search for the protein you are interested in.
- In this exercise, search for JUN: 1S9K
- From the new window, you can download PDB coordinates, view the molecule, and find related PDB IDs.
PDBsum
- Website: http://www.ebi.ac.uk/pdbsum/
PDBsum is a database designed by the European Bioinformatics Institute (EBI). It provides structural summaries of proteins, ligands that bind to them, secondary structural elements, residue conservation analyses, and more. It is useful for getting a first look at a PDB structure.
Example 6: Explore PDBsum features using 2DR6 (a multidrug transporter)
- Go to http://www.ebi.ac.uk/pdbsum/
- Search PDB code: 2DR6

- On the right-hand side, find the Procheck link with a Ramachandran diagram.
- Analyze and comment on possible secondary structures in the protein.
- From the contents menu at the left-hand side, choose Protein chains.

- Note the CATH domains and the secondary structure diagram of the protein.
- Click on A / B / C on the Protein Chain page (this will take you to CATH).
- Find the ProMotif menu on the left side.

- You can analyze secondary structure elements by:
  - Clicking on the secondary structure of interest, or
  - Clicking ProMotif to see a secondary structure element summary
- Click Residue Conservation and analyze conservation of each residue.
- PDBsum also has information on ligand binding.
- Click Ligands in the top menu bar to view residues interacting with doxorubicin.
- Click 3Dmol on the left to visualize doxorubicin and interacting residues.
- Click Prot-Prot in the top menu bar to see how chains interact. 3IYN is a good example for analyzing interactions.
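The Ramachandran diagram from the Procheck step plots two backbone dihedral angles per residue: phi, defined by the atoms C(i-1), N(i), CA(i), C(i), and psi, defined by N(i), CA(i), C(i), N(i+1). The sketch below shows one way to compute a dihedral angle from four 3D points. The function name `dihedral_degrees` and the test points are illustrative, and this is not how Procheck itself is implemented.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Simple geometric checks with hand-picked points (not real atom positions).
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, quarter, trans)
```

Feeding the function backbone coordinates from a PDB file gives the phi and psi values for each residue, which can be compared against the clusters seen in the PDBsum plot.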