Pre-Lab 1 - Understanding Biological Data and Molecular Geometry
This pre-lab has two parts:
- Understanding the types of biological data you will encounter in bioinformatics and structural biology.
- Reviewing the basic geometry of molecules (coordinates and degrees of freedom) that underpins molecular visualization, docking, and structure refinement.
Part A - Understanding (Biological) Data
Why do we care about “data”?
To work effectively in computational biology, you must understand what “data” means in practice:
- What the data represents (sequence, structure, annotations, measurements)
- How it is stored (text vs binary)
- How software tools interpret and transform it
In this course you will mostly work with text files and a smaller number of binary files.
- Text files are human-readable and typically encoded in ASCII or UTF-8.
- Binary files are not directly human-readable and are optimized for speed and compact storage. You still work with them through software (for example a graphical interface or a command-line tool).
The most common bioinformatics formats we will use (FASTA, PDB, mmCIF) are text-based.
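One rough way to see the text/binary distinction in practice is to check whether a file's bytes decode as UTF-8. The sketch below is only a heuristic; the function name `looks_like_text` and the sample byte strings are illustrative assumptions, not part of any course tool.

```python
def looks_like_text(data: bytes) -> bool:
    """Heuristic: treat data as text if it has no NUL bytes and decodes as UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A FASTA-style record is plain text.
fasta_bytes = b">example_sequence\nACGTACGT\n"
# Arbitrary bytes that are not valid UTF-8 (a stand-in for a binary file).
binary_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

print(looks_like_text(fasta_bytes))
print(looks_like_text(binary_bytes))
```

A check like this can misclassify unusual files, so treat it as a quick diagnostic rather than a definition of "text".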
What is biological data?
In bioinformatics, data may represent:
- Nucleotide sequences
- Protein sequences
- Three-dimensional atomic coordinates
- Functional annotations
- Experimental measurements

File Types and Extensions
A file type is defined by the structure of the content. A file extension is a naming convention that helps humans and software guess how to handle the file.
Common text-based formats you will use in this course:
- FASTA: .fasta, .fa, .faa (plain text)
- PDB: .pdb (plain text, fixed-column format)
- mmCIF: .cif (plain text, structured key/value format)
- MOL2: .mol2 (plain text)
- CSV: .csv (plain text)

Important: renaming a file does not change its contents.
For example, renaming protein.pdb to protein.txt does not convert the file into a different format; it only changes the filename.
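This can be checked programmatically by comparing a checksum of the file before and after renaming. The following is a minimal sketch using a temporary directory; the helper names and the single illustrative ATOM line are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 digest of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def digests_before_and_after_rename(directory):
    """Write a small file, rename it, and return both digests."""
    original = directory / "protein.pdb"
    # Illustrative content only; any bytes would behave the same way.
    original.write_text(
        "ATOM      1  N   MET A   1      27.340  24.430   2.614\n"
    )
    before = sha256_of(original)

    renamed = directory / "protein.txt"
    original.rename(renamed)  # changes the name, not the bytes
    after = sha256_of(renamed)
    return before, after


with tempfile.TemporaryDirectory() as tmp:
    before, after = digests_before_and_after_rename(Path(tmp))
    print(before == after)
```

Identical digests mean the bytes on disk are unchanged; only the filename differs.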
How software tools use these files
Most bioinformatics software does three things with your files:
- Parse the file according to a known format (e.g., “PDB lines are fixed columns”).
- Build an internal representation (atoms, residues, chains, coordinates, metadata).
- Apply transformations (delete a chain, mutate a residue, align structures) and optionally write output back to disk.
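The first two steps can be sketched in a few lines of Python. This is a simplified illustration, not how any particular viewer is implemented: it reads only a handful of fields from ATOM/HETATM lines, using the column positions of the PDB fixed-column layout, and the `Atom` class and sample lines are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    residue: str
    chain: str
    x: float
    y: float
    z: float


def parse_atom_line(line):
    """Parse one fixed-column ATOM/HETATM line into an Atom."""
    return Atom(
        name=line[12:16].strip(),     # columns 13-16: atom name
        residue=line[17:20].strip(),  # columns 18-20: residue name
        chain=line[21],               # column 22: chain identifier
        x=float(line[30:38]),         # columns 31-38: x
        y=float(line[38:46]),         # columns 39-46: y
        z=float(line[46:54]),         # columns 47-54: z
    )


def parse_pdb_text(text):
    """Build a list of Atom objects from PDB-formatted text."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


# Illustrative coordinates, not taken from a specific PDB entry.
SAMPLE_PDB = (
    "ATOM      1  N   MET A   1      27.340  24.430   2.614\n"
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842\n"
    "ATOM      3  N   GLY B   1      10.000  11.500  -3.250\n"
)

atoms = parse_pdb_text(SAMPLE_PDB)
chains = sorted({atom.chain for atom in atoms})
print(len(atoms), chains)
```

Once the atoms live in an internal representation like this, transformations become ordinary operations on a list of objects.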
Hands-on A1 - CSV is just text
- Create a new file named toy_table.csv.
- Paste the following content and save:

    name,score
    Alice,90
    Bob,75
    Charlie,88

- Open it:
    - In a spreadsheet program (LibreOffice / Excel)
    - In a text editor
Observe that the file is still the same file; only the viewer changes.
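The same idea can be shown in code: the identical characters can be read as raw lines (the text editor's view) or parsed into rows and columns (the spreadsheet's view). This sketch uses Python's standard `csv` module on the table above; the helper names are illustrative.

```python
import csv
import io

RAW_TEXT = "name,score\nAlice,90\nBob,75\nCharlie,88\n"


def as_text_lines(raw):
    """View the data the way a text editor does: as lines of characters."""
    return raw.splitlines()


def as_table(raw):
    """View the same data the way a spreadsheet does: as rows with named columns."""
    return list(csv.DictReader(io.StringIO(raw)))


print(as_text_lines(RAW_TEXT))
for row in as_table(RAW_TEXT):
    print(row["name"], row["score"])
```

Note that the parsed scores are still strings; interpreting them as numbers is a further decision made by the software.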
Hands-on A2 - Changing the extension does not change the data
- Take any small text file (for example your toy_table.csv).
- Rename it to toy_table.txt.
- Open it again in a text editor.
You should see identical contents.
Optional: rename it back to .csv and open it in a spreadsheet program again.
Hands-on A3 - What happens when you “delete a chain” in a structure viewer?
- Download any structure in PDB format (.pdb) from the Protein Data Bank (PDB).
- Open it in Chimera/ChimeraX.
- Delete one chain.
- Save the structure in PDB format.
Then open the saved file in a text editor.
- You should see that many lines were removed (corresponding to the atoms that belonged to the deleted chain).
- The software did not “magically” change the nature of the data; it edited the text representation according to the file format rules.
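A rough sketch of what "deleting a chain" amounts to at the text level is shown below. It assumes the chain identifier sits in column 22 of coordinate records, as in the PDB fixed-column layout; real viewers may also rewrite headers or renumber atoms when saving, which this example does not attempt. The function name and sample text are illustrative.

```python
# Record types that carry a chain identifier in column 22.
CHAIN_RECORDS = ("ATOM", "HETATM", "ANISOU", "TER")


def drop_chain(pdb_text, chain_id):
    """Return PDB text without the coordinate records of the given chain."""
    kept = []
    for line in pdb_text.splitlines():
        has_chain_field = line.startswith(CHAIN_RECORDS) and len(line) > 21
        if has_chain_field and line[21] == chain_id:
            continue  # skip records that belong to the deleted chain
        kept.append(line)
    return "\n".join(kept) + "\n"


# Illustrative two-chain file, not taken from a specific PDB entry.
SAMPLE_PDB = (
    "HEADER    TOY EXAMPLE\n"
    "ATOM      1  N   MET A   1      27.340  24.430   2.614\n"
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842\n"
    "ATOM      3  N   GLY B   1      10.000  11.500  -3.250\n"
    "END\n"
)

without_b = drop_chain(SAMPLE_PDB, "B")
print(without_b)
```

Comparing `SAMPLE_PDB` with `without_b` shows the same effect you see in the text editor: the lines for chain B are gone and everything else is untouched.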
Part B - Geometry of molecules
Cartesian coordinates in structural biology
Each atom in a protein structure is represented by three Cartesian coordinates, \((x, y, z)\), which PDB files store in ångströms.
So a protein structure is a collection of points in 3D space (atoms), connected by chemical bonds.
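Because atoms are just points, ordinary geometry applies to them. As a small illustration, the distance between two atoms is the Euclidean distance between their coordinates. The two coordinate triples below are made-up values for a backbone N and CA atom, chosen for the example.

```python
import math

# Illustrative (x, y, z) coordinates in ångströms.
n_atom = (27.340, 24.430, 2.614)
ca_atom = (26.266, 25.413, 2.842)


def distance(a, b):
    """Euclidean distance between two points in 3D."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def centroid(points):
    """Average position of a collection of 3D points."""
    count = len(points)
    return tuple(sum(p[i] for p in points) / count for i in range(3))


print(round(distance(n_atom, ca_atom), 2))  # roughly 1.47
print(centroid([n_atom, ca_atom]))
```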

Degrees of freedom (DoF)
Degrees of freedom describe how a system can move.
A daily-life analogy:
- A door hinge mainly rotates about one axis.
- Your finger has limited rotation.
- Your shoulder has a wider range of motion.
In molecular systems:
- Translation: movement along the x, y, z axes
- Rotation: rotation about the x, y, z axes
- Torsion: rotation around chemical bonds

A useful way to think about degrees of freedom:
| System | Typical degrees of freedom |
|---|---|
| Rigid body | 6 (3 translation + 3 rotation) |
| Protein backbone | torsions: \(\phi\) and \(\psi\) angles |
| Side chains | torsions: \(\chi\) angles |
These concepts underpin molecular visualization, docking, and structure refinement.
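The rigid-body row of the table can be made concrete with a short sketch: rotating and translating a set of points moves the whole object but leaves every internal distance unchanged. For brevity this example rotates about the z axis only (a full rigid-body move would allow rotation about all three axes); the function names and the three points are illustrative assumptions.

```python
import math


def rotate_z(point, angle_deg):
    """Rotate a point about the z axis by the given angle in degrees."""
    a = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def translate(point, shift):
    """Shift a point by (dx, dy, dz)."""
    return tuple(c + s for c, s in zip(point, shift))


def rigid_move(points, angle_deg, shift):
    """Apply the same rotation and translation to every point."""
    return [translate(rotate_z(p, angle_deg), shift) for p in points]


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Three made-up atom positions.
molecule = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.3)]
moved = rigid_move(molecule, angle_deg=40.0, shift=(5.0, -2.0, 1.0))

# Internal distances are the same before and after the move.
print(round(distance(molecule[0], molecule[2]), 6))
print(round(distance(moved[0], moved[2]), 6))
```

Torsions are different: changing one alters some internal distances, which is why they count as internal degrees of freedom.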
Hands-on B1 - Observe translation and rotation in a viewer
Warning
For now, you are not expected to have any of the programs mentioned below installed on your computer. This is just to let you know what we will demonstrate in the lab.
- Open any structure in Chimera/ChimeraX/Jmol.
- Use the mouse controls to:
    - Translate the model
    - Rotate the model
- Reset the view.
Write down which actions correspond to translation vs rotation.
Hands-on B2 - Observe torsion angles (conceptual)
- In a molecular viewer, choose a residue with a visible side chain.
- Find the tool/menu that displays torsions/dihedrals (software-dependent).
- Identify:
    - Backbone \(\phi\) and \(\psi\)
    - One side-chain \(\chi\) angle
You do not need to calculate anything in this pre-lab; the goal is to recognize that proteins have internal rotational degrees of freedom.
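For the curious (this is optional and not required for the pre-lab), a torsion angle is fully determined by the coordinates of four consecutive atoms. For example, \(\phi\) is defined by C of the previous residue, N, CA, and C, and \(\psi\) by N, CA, C, and N of the next residue. The sketch below uses a common atan2-based formula; the function names and the test points are illustrative assumptions, and a viewer's sign convention should be checked against its own documentation.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees) about the p1-p2 bond for four 3D points."""
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points: the central bond runs along the z axis.
p0 = (1.0, 0.0, 0.0)
p1 = (0.0, 0.0, 0.0)
p2 = (0.0, 0.0, 1.0)

cis = dihedral_degrees(p0, p1, p2, (1.0, 0.0, 1.0))     # same side
trans = dihedral_degrees(p0, p1, p2, (-1.0, 0.0, 1.0))  # opposite side
print(abs(cis), abs(trans))
```

Rotating the fourth point around the central bond changes only this angle, which is exactly the internal rotational freedom the hands-on asks you to recognize.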