Difference between revisions of "Lecture 02"
Line 12: | Line 12: | ||
==The Sequence Abstraction== | ==The Sequence Abstraction== | ||
− | |||
− | |||
− | |||
− | |||
+ | ;What you should take home from this part of the course: | ||
+ | *Know the one-letter code and key properties of all 20 proteinogenic amino acids; | ||
+ | *Understand the benefits and limitations of the sequence abstraction; | ||
+ | *Recognize common sequence identifiers and use them confidently; | ||
+ | *Know about the contents of key sequence databases; | ||
+ | *Be able to retrieve sequence data; | ||
+ | *Know about and confidently use the fields in GenBank and GenPept records. | ||
+ | | ||
+ | |||
+ | ;Links summary: | ||
+ | *[http://en.wikipedia.org/wiki/IUPAC IUPAC] (Wikipedia) | ||
+ | *[http://en.wikipedia.org/wiki/Adenine '''a'''denine], [http://en.wikipedia.org/wiki/Cytosine '''c'''ytosine], [http://en.wikipedia.org/wiki/Guanine '''g'''uanine], [http://en.wikipedia.org/wiki/Thymine '''t'''hymine] (Wikipedia) | ||
+ | *[http://en.wikipedia.org/wiki/Simplified_Molecular_Input_Line_Entry_Specification SMILES] (Wikipedia) | ||
+ | *[http://speedy.embl-heidelberg.de/aas/ Rob Russell's Amino Acid Pages] | ||
+ | *[http://www.geneontology.org/ GO (Gene Ontology)] | ||
+ | *[http://obofoundry.org/ Open Biology Ontologies] | ||
+ | *[[Glossary#FASTA_format|FASTA format]] | ||
+ | *[http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html GenBank Overview] | ||
+ | *[http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=4594 A GenBank] record example | ||
+ | *[http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=6320957 A GenPept] record example | ||
+ | *[http://www.pir.uniprot.org/cgi-bin/upEntry?id=SWI4_YEAST A UniProt] record example | ||
*[http://www.ncbi.nlm.nih.gov/entrez/query/static/help/entrez_tutorial_BIB.pdf Entrez tutorial (pdf)] | *[http://www.ncbi.nlm.nih.gov/entrez/query/static/help/entrez_tutorial_BIB.pdf Entrez tutorial (pdf)] | ||
+ | |||
+ | ;Exercises | ||
+ | |||
+ | *Find a protein that contains a '''selenocysteine''' residue (e.g. human glutathione peroxidase). Check the GenBank record to see how this residue is represented in the sequence and in the record. Find and compare the corresponding Swiss-Prot record. | ||
+ | *Find a secreted protein such as ''E. coli'' '''beta-lactamase'''. Examine the GenBank record to see whether you can identify the signal peptide that is post-translationally removed. Find the corresponding Swiss-Prot entry and look for the annotation. | ||
+ | *Human mitochondrial proteins are translated according to a different genetic code from human nuclear proteins. Looking at the CDS of mitochondrial Cytochrome B ('''NC_001807'''), how would you know? | ||
==Lecture Slides== | ==Lecture Slides== | ||
Line 45: | Line 68: | ||
======Slide 006====== | ======Slide 006====== | ||
[[Image:L02_s006.jpg|frame|none|Lecture 02, Slide 006<br> | [[Image:L02_s006.jpg|frame|none|Lecture 02, Slide 006<br> | ||
− | Working with abstractions implies we are no longer manipulating the '''biological entity''', but its representation. This distinction becomes crucial when we start computing with representations to infer facts about the original entities! Inferences must be related back to biology! Common problems include that the abstraction may not be rich enough to capture the property we are investigating (e.g. one-letter sequence codes cannot represent amino acid modifications or sequence numbers), or the abstraction may be ambiguous (e.g. one protein may have more than one homologue in a related organism, thus the relationship between gene IDs is ambiguous) or the abstraction may not be unique (e.g. one protein may have more than one function, the same protein name may refer to unrelated proteins in different species). | + | <b>Working with abstractions implies we are no longer manipulating the '''biological entity''', but its representation. </b><b>This distinction becomes crucial when we start computing with representations to infer facts about the original entities! Inferences must be related back to biology! Common problems include that the abstraction may not be rich enough to capture the property we are investigating (e.g. one-letter sequence codes cannot represent amino acid modifications or sequence numbers), or the abstraction may be ambiguous (e.g. one protein may have more than one homologue in a related organism, thus the relationship between gene IDs is ambiguous) or the abstraction may not be unique (e.g. one protein may have more than one function, the same protein name may refer to unrelated proteins in different species).</b> |
]] | ]] | ||
− | |||
======Slide 007====== | ======Slide 007====== | ||
[[Image:L02_s007.jpg|frame|none|Lecture 02, Slide 007<br> | [[Image:L02_s007.jpg|frame|none|Lecture 02, Slide 007<br> | ||
Line 119: | Line 141: | ||
======Slide 020====== | ======Slide 020====== | ||
[[Image:L02_s020.jpg|frame|none|Lecture 02, Slide 020<br> | [[Image:L02_s020.jpg|frame|none|Lecture 02, Slide 020<br> | ||
− | Sequence is the most important abstraction in biology; you need to know your amino acids in order to relate a sequence back to the biopolymer. Required knowledge is: the '''structural formula''', the '''one-''' and '''three-letter codes''' and key properties (such as charge, relative size, polarity) for all 20 proteinogenic amino acids. | + | Sequence is the most important abstraction in biology; you need to know your amino acids in order to relate a sequence back to the biopolymer. Required knowledge is: the '''structural formula''', the '''one-''' and '''three-letter codes''' and key properties (such as charge, relative size, polarity) for all 20 proteinogenic amino acids. |
]] | ]] | ||
− | |||
======Slide 021====== | ======Slide 021====== | ||
[[Image:L02_s021.jpg|frame|none|Lecture 02, Slide 021<br> | [[Image:L02_s021.jpg|frame|none|Lecture 02, Slide 021<br> | ||
Line 217: | Line 238: | ||
======Slide 044====== | ======Slide 044====== | ||
[[Image:L02_s044.jpg|frame|none|Lecture 02, Slide 044<br> | [[Image:L02_s044.jpg|frame|none|Lecture 02, Slide 044<br> | ||
− | + | ||
]] | ]] | ||
− | |||
======Slide 045====== | ======Slide 045====== | ||
[[Image:L02_s045.jpg|frame|none|Lecture 02, Slide 045<br> | [[Image:L02_s045.jpg|frame|none|Lecture 02, Slide 045<br> | ||
Line 234: | Line 254: | ||
======Slide 048====== | ======Slide 048====== | ||
[[Image:L02_s048.jpg|frame|none|Lecture 02, Slide 048<br> | [[Image:L02_s048.jpg|frame|none|Lecture 02, Slide 048<br> | ||
− | + | ||
]] | ]] | ||
− | |||
======Slide 049====== | ======Slide 049====== | ||
[[Image:L02_s049.jpg|frame|none|Lecture 02, Slide 049<br> | [[Image:L02_s049.jpg|frame|none|Lecture 02, Slide 049<br> | ||
Line 255: | Line 274: | ||
======Slide 053====== | ======Slide 053====== | ||
[[Image:L02_s053.jpg|frame|none|Lecture 02, Slide 053<br> | [[Image:L02_s053.jpg|frame|none|Lecture 02, Slide 053<br> | ||
− | + | ||
]] | ]] | ||
− | |||
======Slide 054====== | ======Slide 054====== | ||
[[Image:L02_s054.jpg|frame|none|Lecture 02, Slide 054<br> | [[Image:L02_s054.jpg|frame|none|Lecture 02, Slide 054<br> |
Revision as of 14:52, 19 September 2007
Update Warning! This page has not been revised yet for the 2007 Fall term. Some of the slides may be reused, but please consider the page as a whole out of date as long as this warning appears here.
The Sequence Abstraction
- What you should take home from this part of the course
- Know the one-letter code and key properties of all 20 proteinogenic amino acids;
- Understand the benefits and limitations of the sequence abstraction;
- Recognize common sequence identifiers and use them confidently;
- Know about the contents of key sequence databases;
- Be able to retrieve sequence data;
- Know about and confidently use the fields in GenBank and GenPept records.
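As a study aid for the first goal, the one-letter and three-letter codes of the 20 proteinogenic amino acids can be written down as a lookup table. The following Python sketch is illustrative only: the function names `to_three_letter` and `unrepresented` are made up for this example, and `Xaa` is used as a placeholder for any letter outside the 20-letter alphabet.

```python
# One-letter -> three-letter codes for the 20 proteinogenic amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence):
    """Spell out a one-letter sequence; unknown letters become 'Xaa'."""
    return "-".join(ONE_TO_THREE.get(res, "Xaa") for res in sequence.upper())


def unrepresented(sequence):
    """Return the letters that fall outside the 20-letter alphabet."""
    return sorted(set(sequence.upper()) - set(ONE_TO_THREE))


if __name__ == "__main__":
    print(to_three_letter("MKW"))
    # 'U' (selenocysteine) is not one of the 20 letters in the table.
    print(unrepresented("MKUW"))
```

The second function hints at a limitation of the abstraction: a residue such as selenocysteine needs a letter that the basic 20-letter alphabet does not provide.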
- Links summary
- IUPAC (Wikipedia)
- adenine, cytosine, guanine, thymine (Wikipedia)
- SMILES (Wikipedia)
- Rob Russell's Amino Acid Pages
- GO (Gene Ontology)
- Open Biology Ontologies
- FASTA format
- GenBank Overview
- A GenBank record example
- A GenPept record example
- A UniProt record example
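The GenBank and GenPept example records linked above can also be requested programmatically. The sketch below assumes the NCBI E-utilities EFetch endpoint and reuses the two record IDs from the example links; whether these older numeric IDs still resolve is not verified here. The helper names (`efetch_url`, `fetch_record`, `parse_fasta`) are hypothetical, and `parse_fasta` is a minimal reader for FASTA-formatted text, not a full validator.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: NCBI E-utilities EFetch.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, record_id, rettype):
    """Build an EFetch URL that returns one record as plain text."""
    query = urlencode(
        {"db": db, "id": record_id, "rettype": rettype, "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def fetch_record(db, record_id, rettype):
    """Download the record text (requires network access)."""
    with urlopen(efetch_url(db, record_id, rettype)) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # IDs taken from the example links; "gb" = GenBank, "gp" = GenPept.
    print(efetch_url("nuccore", "4594", "gb"))
    print(efetch_url("protein", "6320957", "gp"))
```

Requesting `rettype="fasta"` instead returns only the header line and sequence, which `parse_fasta` can then split into records.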
- Exercises
- Find a protein that contains a selenocysteine residue (e.g. human glutathione peroxidase). Check the GenBank record to see how this residue is represented in the sequence and in the record. Find and compare the corresponding Swiss-Prot record.
- Find a secreted protein such as E. coli beta-lactamase. Examine the GenBank record to see whether you can identify the signal peptide that is post-translationally removed. Find the corresponding Swiss-Prot entry and look for the annotation.
- Human mitochondrial proteins are translated according to a different genetic code from human nuclear proteins. Looking at the CDS of mitochondrial Cytochrome B (NC_001807), how would you know?
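For the last exercise, it can help to see how the two genetic codes differ when applied to the same nucleotide sequence. The following Python sketch compares the standard code with the vertebrate mitochondrial code, using the compact 64-letter form of each table with codons ordered T, C, A, G. The function names and the short test sequence are invented for illustration; the sequence is not taken from the Cytochrome B record.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for codons TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def codon_table(amino_acids):
    """Map each of the 64 codons to its one-letter amino acid."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, amino_acids))


def translate(dna, table):
    """Translate complete codons in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def differing_codons(table_a, table_b):
    """List the codons that the two tables read differently."""
    return sorted(c for c in table_a if table_a[c] != table_b[c])


if __name__ == "__main__":
    std = codon_table(STANDARD)
    mito = codon_table(VERT_MITO)
    print(differing_codons(std, mito))
    demo = "ATGTGAATAAGA"  # hypothetical sequence, for illustration only
    print(translate(demo, std))
    print(translate(demo, mito))
```

A coding sequence that shows apparent stop codons in the middle of the protein under one table, but reads through cleanly under the other, is a strong hint about which code applies.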
Lecture Slides
Slide 001
Slide 002
Slide 003
Slide 004
Slide 005
Slide 006
Slide 007
Slide 008
Slide 009
Slide 010
Slide 011
Slide 012
Slide 013
Slide 014
Slide 015
Slide 016
Slide 017
Slide 018
Slide 019
Slide 020
Slide 021
Slide 022
Slide 023
Slide 024
Slide 025
Slide 026
Slide 027
Slide 028
Slide 029
Slide 030
Slide 031
(deleted)