Week 1 - Using websites and web-based tools
NCBI - National Center for Biotechnology Information
- Website:
- http://www.ncbi.nlm.nih.gov/
The NCBI website hosts the genome sequences of the organisms sequenced so far, together with gene and protein sequences and reference information about them. The primary structure of a protein, i.e. its amino acid sequence, can therefore be retrieved from NCBI.
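Many websites present an amino acid sequence in one of two formats:
- Standard format (NCBI default)
  - Single-letter amino acid codes
  - A number at the start of each line giving the position of its first residue (1, 61, 121 and so on), so each line holds 60 residues
  - Spaces separating blocks of 10 consecutive residues
- FASTA format
  - A header line starting with ">", followed by single-letter amino acid codes only (no numbers or spaces)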
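Because the two formats differ only in layout, one can be converted into the other mechanically. The Python sketch below illustrates this; `standard_to_fasta` and `fasta_to_standard` are hypothetical helpers written for this exercise (not part of any NCBI tool), and the right-aligned 9-character position column is an assumption based on the usual NCBI/GenBank layout.

```python
import re


def standard_to_fasta(record: str, header: str, width: int = 60) -> str:
    """Numbered, space-separated sequence (NCBI standard view) -> FASTA text."""
    # Drop position numbers and all whitespace, keep the one-letter codes.
    seq = re.sub(r"[\d\s]", "", record).upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header] + lines)


def fasta_to_standard(fasta: str) -> str:
    """FASTA text -> numbered view: 60 residues per line, in blocks of 10."""
    # Ignore the '>' header line and join the sequence lines.
    seq = "".join(ln.strip() for ln in fasta.splitlines() if not ln.startswith(">")).lower()
    out = []
    for start in range(0, len(seq), 60):
        chunk = seq[start:start + 60]
        blocks = " ".join(chunk[i:i + 10] for i in range(0, len(chunk), 10))
        # Position of the first residue on the line, right-aligned (assumed 9 columns).
        out.append(f"{start + 1:>9} {blocks}")
    return "\n".join(out)


print(standard_to_fasta("1 mlsalarpag aalrrsfsts", "malate dehydrogenase (example)"))
```

For example, `standard_to_fasta("1 mlsalarpag aalrrsfsts", "MDH")` returns `">MDH\nMLSALARPAGAALRRSFSTS"`.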
Example 1: Get the primary structure of malate dehydrogenase (Mus musculus) from NCBI
- Go to http://www.ncbi.nlm.nih.gov/
- Set the search database to Protein (the default is All Databases)
- Type malate dehydrogenase into the search box
- Click Search

- NCBI lists proteins from different organisms (the organism name is given in square brackets after each protein name).
- Select the protein from the organism of interest.
- In this case, select Mus musculus.
- Tip: you can also search directly for malate dehydrogenase Mus musculus and click the first result.
- At the very end of the page, the primary sequence of malate dehydrogenase is shown in standard format:
1 mlsalarpag aalrrsfsts aqnnakvavl gasggigqpl slllknsplv srltlydiah
61 tpgvaadlsh ietrakvkgy lgpeqlpdcl kgcdvvvipa gvprkpgmtr ddlfntnati
121 vatltaacaq hcpeamvcii anpvnstipi taevfkkhgv ynpnkifgvt tldivrantf
181 vaelkgldpa rvnvpviggh agktiiplis qctpkvdfpq dqlatltgri qeagtevvka
241 kagagsatls mayagarfvf slvdamngle gvvecsfvqs ketectyfst plllgkkgle
301 knlgigkitp feekmiaeai pelkasikkg edfvknmk
- To view the sequence in FASTA format, click the FASTA link just below the protein name at the top of the page.
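The same search can also be run without a browser through NCBI's E-utilities web service. The sketch below only builds the request URLs with the Python standard library; the query term, the `[Organism]` field tag and the helper names are illustrative assumptions, and `ACCESSION` is a placeholder for the accession number of the record you picked (it is not given here).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, db: str = "protein") -> str:
    """URL that searches an Entrez database and returns the IDs of matching records."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term})


def efetch_fasta_url(record_id: str, db: str = "protein") -> str:
    """URL that downloads one record as plain-text FASTA."""
    params = {"db": db, "id": record_id, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


def fetch_text(url: str) -> str:
    """Download a URL (needs network access). Without an API key,
    NCBI allows at most 3 requests per second."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Example 1 as a query: the [Organism] field tag restricts hits to mouse.
search = esearch_url("malate dehydrogenase AND Mus musculus[Organism]")
# ACCESSION is a placeholder: replace it with the accession of the chosen record.
fasta = efetch_fasta_url("ACCESSION")
print(search)
print(fasta)
```

Passing either URL to `fetch_text` performs the actual request; the URL builders themselves do not touch the network.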

Example 2: Get the coding sequence (CDS) of the malate dehydrogenase gene
- Go back from the FASTA view to the protein record.
- In the features list just above the sequence, find CDS (coding sequence) and click it.

- At the very end of the new page, the coding sequence of the malate dehydrogenase gene is displayed:
1 atgctgtccg ctctcgcccg tcctgccggc gccgctctcc gccgcagctt cagcacttcg
61 gcccagaaca atgctaaagt ggctgtcctg ggagcttctg ggggcattgg gcaacccctt
121 tcactcctgc tgaagaacag ccccctagtg agccgcctga ccctctacga tatcgctcac
181 acacctggtg tggcagcaga tctgagtcac attgagacca gagcaaaggt gaaaggctac
241 cttggaccgg agcagttgcc agattgcctc aaaggttgtg atgtggtggt catcccagcc
301 ggagtgccca ggaaaccagg aatgacacgg gatgacctgt tcaacaccaa cgctaccatt
361 gtggccaccc tgacggctgc ctgtgcccag cactgtcctg aagccatggt ttgcatcatt
421 gccaacccag tgaactccac catccccatc acagcagaag ttttcaagaa gcacggtgtg
481 tacaacccta acaagatctt cggtgtgaca acccttgaca tcgtcagagc gaacacgttt
541 gtggcagagc taaagggttt ggatccagct cgagtcaacg tgcctgtcat tggcggccac
601 gccgggaaga cgatcatccc cctgatctct cagtgtaccc cgaaggttga ctttccccaa
661 gaccagctgg ccacactcac cgggaggatc caggaggctg gcacagaagt cgtgaaggcc
721 aaggctggag caggttctgc cactctgtcc atggcttatg ctggagcccg ctttgtcttc
781 tccctcgtgg acgccatgaa cgggttggaa ggagtcgttg agtgttcttt tgttcagtcc
841 aaagagacgg aatgcactta cttctctacg cccttgctct tggggaaaaa gggcctggag
901 aagaacctgg gcattggcaa gatcactcct tttgaggaaa aaatgattgc cgaggctatc
961 cctgagctga aagcctccat caagaaaggc gaggactttg tcaagaacat gaagtga
- If you need the gene sequence in FASTA format, click the FASTA link as in Example 1.
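A quick way to confirm that a CDS was copied completely is to check that its length is a multiple of three, that it begins with the start codon ATG and that it ends with a stop codon (TAA, TAG or TGA). The helper below is a hypothetical sketch of these checks, not an NCBI feature.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq_text: str) -> dict:
    """Basic sanity checks on a coding sequence pasted in standard or FASTA format."""
    # Drop FASTA header lines, then everything that is not a letter (numbers, spaces).
    body = "\n".join(ln for ln in seq_text.splitlines() if not ln.lstrip().startswith(">"))
    seq = re.sub(r"[^A-Za-z]", "", body).upper()
    has_stop = seq[-3:] in STOP_CODONS
    return {
        "length_nt": len(seq),
        "whole_codons": len(seq) % 3 == 0,
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": has_stop,
        # Number of amino acids encoded, not counting the terminal stop codon.
        "protein_length": len(seq) // 3 - (1 if has_stop else 0),
    }


print(check_cds("1 atgaaatga"))
```

For the sequence above (16 full lines of 60 nt plus 57 nt on the last line, i.e. 1017 nt), these checks give 339 codons ending in TGA, so 338 amino acids, which matches the last position of the protein in Example 1 (301 + 37 = 338).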
UniProt
- Website:
- http://www.uniprot.org/
UniProt can also be used to retrieve the primary structure of a protein. It presents additional information about each protein, including:
- Subcellular location
- Function
- Post-translational modifications
- Catalytic activity
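The same information can be requested programmatically from UniProt's REST API. The sketch below only builds a search URL. The return-field names (`cc_function`, `cc_catalytic_activity`, `cc_subcellular_location`, `ft_mod_res`) are assumptions taken from UniProt's return-field list and should be checked against the current documentation, and the query with `organism_id:10090` (the NCBI taxonomy ID of Mus musculus) is only an example.

```python
from urllib.parse import urlencode

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

# Assumed UniProt return-field names for the items listed above;
# verify them in UniProt's "return fields" documentation.
INFO_FIELDS = {
    "function": "cc_function",
    "catalytic activity": "cc_catalytic_activity",
    "subcellular location": "cc_subcellular_location",
    "post-translational modifications": "ft_mod_res",
}


def uniprot_search_url(query: str, size: int = 5) -> str:
    """Build a UniProtKB search URL that returns a tab-separated table."""
    fields = ["accession", "protein_name", "length", *INFO_FIELDS.values()]
    params = {"query": query, "fields": ",".join(fields), "format": "tsv", "size": size}
    return UNIPROT_SEARCH + "?" + urlencode(params)


# 10090 is the NCBI taxonomy ID of Mus musculus.
print(uniprot_search_url("malate dehydrogenase AND organism_id:10090"))
```

Opening the printed URL in a browser (or downloading it) should return one row per matching entry with the requested columns.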
Example 3: Search for the same protein in UniProt
For this, you can use the accession number you obtained from NCBI.

- Go to http://www.uniprot.org/
- Type the accession number (or the name of the protein) into the search box
- Click the relevant result link

Using the information on the entry page, answer the questions below:
- What is the length of the protein in amino acids?
- What metabolic reaction does malate dehydrogenase catalyze?
- How is the enzyme regulated?
- Where in the cell is this enzyme found?
- What are the post-translational modifications of this protein?
The primary sequence is given in the Sequence section; clicking the FASTA link there displays it in FASTA format.
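UniProt FASTA headers follow a fixed layout (`>db|accession|entry_name description OS=organism OX=taxon_id ...`), so the protein length asked for above can also be read off a downloaded FASTA record. A minimal sketch, assuming a single well-formed record; `parse_uniprot_fasta` is a hypothetical helper and the header used in the usage note is made up.

```python
import re


def parse_uniprot_fasta(text: str) -> dict:
    """Split one UniProt FASTA record into its header fields and sequence."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header, seq_lines = lines[0], lines[1:]
    if not header.startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    # '>sp|ACCESSION|ENTRY_NAME description OS=... OX=...'
    db, accession, rest = header[1:].split("|", 2)
    entry_name, _, description = rest.partition(" ")
    name = re.match(r"(.+?) OS=", description)
    organism = re.search(r"OS=(.+?) OX=", description)
    sequence = "".join(seq_lines)
    return {
        "database": db,  # sp = Swiss-Prot (reviewed), tr = TrEMBL (unreviewed)
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": name.group(1) if name else description,
        "organism": organism.group(1) if organism else None,
        "length": len(sequence),
        "sequence": sequence,
    }
```

For a made-up record with the header `>sp|Q00000|EXMP_MOUSE Example protein OS=Mus musculus OX=10090 GN=Exmp PE=1 SV=1` followed by the 20 residues MLSALARPAGAALRRSFSTS, the function reports accession `Q00000`, organism `Mus musculus` and length 20.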

ExPASy (Expert Protein Analysis System)
- Website:
- http://expasy.org/
This site brings many resources together in one place: UniProt, PROSITE and many other protein databases can be reached from ExPASy, and it also provides useful analysis tools.
Example 4: Translate a gene sequence into a protein sequence
- Go to:
- https://www.expasy.org/resources/translate
- or https://web.expasy.org/translate/
- Copy the malate dehydrogenase gene sequence (CDS) and paste it into the input box.
- The tool accepts the sequence in either format because it ignores spaces and numbers.
- Click TRANSLATE SEQUENCE
- The tool returns more than one result, one for each possible reading frame.
- Comment on the alternative results and decide which one is the sequence you are looking for.
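The several results come from reading the DNA in all six reading frames: three starting at positions 1, 2 and 3 of the pasted strand (5'3') and three on its reverse complement (3'5'). The Python sketch below reproduces this with the standard genetic code (NCBI translation table 1); the function names are hypothetical, and the frame labels only mimic the ones shown by the web tool.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq_text: str) -> str:
    """Drop FASTA header lines, numbers and spaces; U is read as T.
    Any other character (e.g. ambiguity code N) is also dropped, an assumption of this sketch."""
    body = "".join(ln for ln in seq_text.splitlines() if not ln.lstrip().startswith(">"))
    return "".join(ch for ch in body.upper() if ch in "ACGTU").replace("U", "T")


def translate(dna: str) -> str:
    """Translate codon by codon; an incomplete trailing codon is ignored; '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(seq_text: str) -> dict:
    dna = clean(seq_text)
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement, read 5' to 3'
    frames = {}
    for offset in range(3):
        frames[f"5'3' Frame {offset + 1}"] = translate(dna[offset:])
        frames[f"3'5' Frame {offset + 1}"] = translate(rev[offset:])
    return frames


first_line = "1 atgctgtccg ctctcgcccg tcctgccggc gccgctctcc gccgcagctt cagcacttcg"
for label, protein in six_frames(first_line).items():
    print(label, protein)
```

Applied to the first line of the CDS from Example 2, the "5'3' Frame 1" translation is MLSALARPAGAALRRSFSTS, the first 20 residues of the protein from Example 1. In general, the frame of interest starts with M, contains no internal stop codons (`*`) and ends at a stop codon.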