Difference between revisions of "Lecture 02"
(14 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | <div style="padding: 5px; background: #FF4560; border:solid 2px #000000;"> | + | <!-- div style="padding: 5px; background: #FF4560; border:solid 2px #000000;"> |
'''Update Warning!''' | '''Update Warning!''' | ||
− | This page has not been revised yet for the | + | This page has not been revised yet for the current term. Some of the slides may be reused, but please consider the page as a whole out of date as long as this warning appears here. |
− | </div> | + | </div --> |
| | ||
Line 12: | Line 12: | ||
==The Sequence Abstraction== | ==The Sequence Abstraction== | ||
− | |||
− | + | ;What you should take home from this part of the course: | |
− | * | + | *Know the one-letter code and key properties of all 20 proteinogenic amino acids; |
− | * | + | *Understand the benefits and limitations of the sequence abstraction; |
+ | *Recognize common sequence identifiers and use them confidently; | ||
+ | *Know about the contents of key sequence databases; | ||
+ | *Be able to retrieve sequence data; | ||
+ | *Know about and confidently use the fields in GenBank and GenPept records. | ||
+ | | ||
+ | ;Links summary: | ||
+ | *[http://en.wikipedia.org/wiki/IUPAC IUPAC] (Wikipedia) | ||
+ | *[http://en.wikipedia.org/wiki/Adenine '''a'''denine], [http://en.wikipedia.org/wiki/Cytosine '''c'''ytosine], [http://en.wikipedia.org/wiki/Guanine '''g'''uanine], [http://en.wikipedia.org/wiki/Thymine '''t'''hymine] (Wikipedia) | ||
+ | *[http://en.wikipedia.org/wiki/Simplified_Molecular_Input_Line_Entry_Specification SMILES] (Wikipedia) | ||
+ | *[http://speedy.embl-heidelberg.de/aas/ Rob Russell's Amino Acid Pages] | ||
+ | *[http://www.geneontology.org/ GO (Gene Ontology)] | ||
+ | *[http://obofoundry.org/ Open Biology Ontologies] | ||
+ | *[[Glossary#FASTA_format|FASTA format]] | ||
+ | *[http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html GenBank Overview] | ||
+ | *[http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=4594 A GenBank] record example | ||
+ | *[http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=6320957 A GenPept] record example | ||
+ | *[http://www.pir.uniprot.org/cgi-bin/upEntry?id=SWI4_YEAST A UniProt] record example | ||
*[http://www.ncbi.nlm.nih.gov/entrez/query/static/help/entrez_tutorial_BIB.pdf Entrez tutorial (pdf)] | *[http://www.ncbi.nlm.nih.gov/entrez/query/static/help/entrez_tutorial_BIB.pdf Entrez tutorial (pdf)] | ||
+ | |||
+ | ;Exercises | ||
+ | |||
+ | *Find a protein that contains a '''selenocysteine''' residue (e.g. human glutathione peroxidase). Check the GenBank record to see how this residue is represented in the sequence and in the record. Find and compare the corresponding SwissProt record. | ||
+ | *Find a secreted protein such as ''E. coli'' '''beta-lactamase'''. Check the GenBank record to see whether you can identify the signal peptide that is post-translationally removed. Find the corresponding SwissProt entry and look for the matching annotation. | ||
+ | *Human mitochondrial proteins are translated according to a different genetic code from human nuclear proteins. Looking at the CDS of mitochondrial Cytochrome B ('''NC_001807'''), how would you know? | ||
+ | |||
+ | | ||
+ | | ||
==Lecture Slides== | ==Lecture Slides== | ||
Line 45: | Line 70: | ||
======Slide 006====== | ======Slide 006====== | ||
[[Image:L02_s006.jpg|frame|none|Lecture 02, Slide 006<br> | [[Image:L02_s006.jpg|frame|none|Lecture 02, Slide 006<br> | ||
− | + | Working with abstractions implies we are no longer manipulating the '''biological entity''', but its representation. This distinction becomes crucial when we start computing with representations to infer facts about the original entities! Inferences must be related back to biology! Common problems include that the abstraction may not be rich enough to capture the property we are investigating (e.g. one-letter sequence codes cannot represent amino acid modifications or sequence numbers), or the abstraction may be ambiguous (e.g. one protein may have more than one homologue in a related organism, thus the relationship between gene IDs is ambiguous) or the abstraction may not be unique (e.g. one protein may have more than one function, the same protein name may refer to unrelated proteins in different species). |
]] | ]] | ||
+ | |||
======Slide 007====== | ======Slide 007====== | ||
[[Image:L02_s007.jpg|frame|none|Lecture 02, Slide 007<br> | [[Image:L02_s007.jpg|frame|none|Lecture 02, Slide 007<br> | ||
Line 118: | Line 144: | ||
======Slide 020====== | ======Slide 020====== | ||
[[Image:L02_s020.jpg|frame|none|Lecture 02, Slide 020<br> | [[Image:L02_s020.jpg|frame|none|Lecture 02, Slide 020<br> | ||
− | Sequence is the most important abstraction in biology; you need to know your amino acids in order to relate a sequence back to the biopolymer. Required knowledge is: the '''structural formula''', the '''one-''' and '''three- letter codes''' and key properties (such as charge, relative size, polarity) for all 20 proteinogenic amino acids. | + | Sequence is the most important abstraction in biology; you need to know your amino acids in order to relate a sequence back to the biopolymer. Required knowledge is: the '''structural formula''', the '''one-''' and '''three-letter codes''' and key properties (such as charge, relative size, polarity) for all 20 proteinogenic amino acids. A resource that summarizes amino acid properties is at http://speedy.embl-heidelberg.de/aas/ |
]] | ]] | ||
+ | |||
======Slide 021====== | ======Slide 021====== | ||
[[Image:L02_s021.jpg|frame|none|Lecture 02, Slide 021<br> | [[Image:L02_s021.jpg|frame|none|Lecture 02, Slide 021<br> | ||
− | In [http://en.wikipedia.org/wiki/Romeo_and_Juliet Shakespeare's classic tragedy] of romantic love and family allegiance, Juliet encapsulates the play's central struggle in this phrase by claiming that Romeo's family name is an artificial and meaningless convention. Just like in the world of the sequence abstraction, this is only partially true: the problems are not just based in the fact that Romeo is '''called''' a Montague, but that he '''is''' in fact a member of that family. | + | In [http://en.wikipedia.org/wiki/Romeo_and_Juliet Shakespeare's classic tragedy] of romantic love and family allegiance, Juliet encapsulates the play's central struggle in this phrase by claiming that Romeo's family name is an artificial and meaningless convention. Just like in the world of the sequence abstraction, this is only partially true: the problems are not just based in the fact that Romeo is '''called''' a Montague, but that he '''is''' in fact a member of that family. Even if a ''Thing'' does not change if its abstract label changes, such labels rarely exist in isolation: other ''Things'' might be referred to by the same label and changing one changes the composition of the entire set. (Or, to remain with our example, as soon as Romeo renounces his name and thus his family, the family would no longer be the same.) Even worse - and this is something we encounter every day in bioinformatics - if identifiers are not stable over time, cross-references to that identifier fail. If you decided to call a rose a skunk, people would become very confused. |
]] | ]] | ||
+ | |||
======Slide 022====== | ======Slide 022====== | ||
[[Image:L02_s022.jpg|frame|none|Lecture 02, Slide 022<br> | [[Image:L02_s022.jpg|frame|none|Lecture 02, Slide 022<br> | ||
Line 161: | Line 189: | ||
]] | ]] | ||
======Slide 031====== | ======Slide 031====== | ||
− | + | <small>''(deleted)''</small> | |
− | + | ||
− | |||
======Slide 032====== | ======Slide 032====== | ||
[[Image:L02_s032.jpg|frame|none|Lecture 02, Slide 032<br> | [[Image:L02_s032.jpg|frame|none|Lecture 02, Slide 032<br> | ||
Line 178: | Line 205: | ||
======Slide 035====== | ======Slide 035====== | ||
[[Image:L02_s035.jpg|frame|none|Lecture 02, Slide 035<br> | [[Image:L02_s035.jpg|frame|none|Lecture 02, Slide 035<br> | ||
− | Just like in human language, rigourous syntactical rules enforce that you can | + | Just like in human language, rigorous syntactical rules enforce that you can't use bad grammar and get away with it. |
]] | ]] | ||
+ | |||
======Slide 036====== | ======Slide 036====== | ||
[[Image:L02_s036.jpg|frame|none|Lecture 02, Slide 036<br> | [[Image:L02_s036.jpg|frame|none|Lecture 02, Slide 036<br> | ||
Line 214: | Line 242: | ||
======Slide 044====== | ======Slide 044====== | ||
[[Image:L02_s044.jpg|frame|none|Lecture 02, Slide 044<br> | [[Image:L02_s044.jpg|frame|none|Lecture 02, Slide 044<br> | ||
+ | see: http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html | ||
+ | ]] | ||
− | |||
======Slide 045====== | ======Slide 045====== | ||
[[Image:L02_s045.jpg|frame|none|Lecture 02, Slide 045<br> | [[Image:L02_s045.jpg|frame|none|Lecture 02, Slide 045<br> | ||
Line 230: | Line 259: | ||
======Slide 048====== | ======Slide 048====== | ||
[[Image:L02_s048.jpg|frame|none|Lecture 02, Slide 048<br> | [[Image:L02_s048.jpg|frame|none|Lecture 02, Slide 048<br> | ||
+ | Go [http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=4594 '''here for a GenBank'''] record example; go [http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=6320957 '''here for a GenPept'''] record example. | ||
+ | ]] | ||
− | |||
======Slide 049====== | ======Slide 049====== | ||
[[Image:L02_s049.jpg|frame|none|Lecture 02, Slide 049<br> | [[Image:L02_s049.jpg|frame|none|Lecture 02, Slide 049<br> | ||
Line 250: | Line 280: | ||
======Slide 053====== | ======Slide 053====== | ||
[[Image:L02_s053.jpg|frame|none|Lecture 02, Slide 053<br> | [[Image:L02_s053.jpg|frame|none|Lecture 02, Slide 053<br> | ||
+ | see: http://www.ncbi.nlm.nih.gov/RefSeq/ | ||
+ | ]] | ||
− | |||
======Slide 054====== | ======Slide 054====== | ||
[[Image:L02_s054.jpg|frame|none|Lecture 02, Slide 054<br> | [[Image:L02_s054.jpg|frame|none|Lecture 02, Slide 054<br> | ||
Line 292: | Line 323: | ||
]] | ]] | ||
+ | |||
+ | |||
+ | | ||
+ | ---- | ||
+ | <small>[[Lecture_01|(Previous lecture)]] ... [[Lecture_03|(Next lecture)]]</small> |
Latest revision as of 14:57, 19 September 2007
(Previous lecture) ... (Next lecture)
The Sequence Abstraction
- What you should take home from this part of the course
- Know the one-letter code and key properties of all 20 proteinogenic amino acids;
- Understand the benefits and limitations of the sequence abstraction;
- Recognize common sequence identifiers and use them confidently;
- Know about the contents of key sequence databases;
- Be able to retrieve sequence data;
- Know about and confidently use the fields in GenBank and GenPept records.
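As a small illustration of the sequence abstraction, the following Python sketch checks a protein sequence against the one-letter codes of the 20 standard amino acids and reports any letter outside that set. The function name `nonstandard_residues` and the example sequence are hypothetical and are not taken from the course material.

```python
# One-letter codes of the 20 standard proteinogenic amino acids.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def nonstandard_residues(sequence):
    """Return (position, letter) pairs for letters outside the 20 standard codes.

    Positions are 1-based, as is usual for sequence numbering.
    """
    return [
        (position, letter)
        for position, letter in enumerate(sequence.upper(), start=1)
        if letter not in STANDARD_AA
    ]


# Invented example sequence: 'U' is the one-letter code for selenocysteine,
# which is not one of the 20 standard amino acids.
example = "MCAARLUAAA"
print(nonstandard_residues(example))
```

A check like this only inspects the letters of the abstraction; it says nothing about modifications that the one-letter code cannot represent.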
- Links summary
- IUPAC (Wikipedia)
- adenine, cytosine, guanine, thymine (Wikipedia)
- SMILES (Wikipedia)
- Rob Russell's Amino Acid Pages
- GO (Gene Ontology)
- Open Biology Ontologies
- FASTA format
- GenBank Overview
- A GenBank record example
- A GenPept record example
- A UniProt record example
- Exercises
- Find a protein that contains a selenocysteine residue (e.g. human glutathione peroxidase). Check the GenBank record to see how this residue is represented in the sequence and in the record. Find and compare the corresponding SwissProt record.
- Find a secreted protein such as E. coli beta-lactamase. Check the GenBank record to see whether you can identify the signal peptide that is post-translationally removed. Find the corresponding SwissProt entry and look for the matching annotation.
- Human mitochondrial proteins are translated according to a different genetic code from human nuclear proteins. Looking at the CDS of mitochondrial Cytochrome B (NC_001807), how would you know?
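To make the last exercise more concrete, here is a minimal Python sketch of how the same coding sequence reads under two genetic codes. It lists only the stop codons and the codons that differ between the standard code (NCBI translation table 1) and the vertebrate mitochondrial code (NCBI translation table 2). The helper `internal_stops` and the toy sequence are invented for illustration; the sequence is not taken from NC_001807.

```python
# Stop codons plus the codons whose meaning differs between the two codes.
# '*' marks a stop codon.
STANDARD = {
    "TAA": "*", "TAG": "*",
    "TGA": "*", "ATA": "I", "AGA": "R", "AGG": "R",
}
VERTEBRATE_MITO = {
    "TAA": "*", "TAG": "*",
    "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*",
}


def internal_stops(cds, table):
    """Return 0-based codon indices of stop codons before the final codon.

    Only codons listed in `table` are interpreted; all others are ignored.
    """
    usable = len(cds) - len(cds) % 3  # drop any incomplete trailing codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if table.get(codon) == "*"]


# Invented toy CDS: ATG TGA ATA GCC TAA
toy_cds = "ATGTGAATAGCCTAA"
print("standard code:", internal_stops(toy_cds, STANDARD))
print("vertebrate mitochondrial code:", internal_stops(toy_cds, VERTEBRATE_MITO))
```

Under the standard code the toy reading frame is interrupted by TGA, while under the vertebrate mitochondrial code the same codon is read as tryptophan and the frame stays open.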
Lecture Slides
Slide 001
Slide 002
Slide 003
Slide 004
Slide 005
Slide 006
Slide 007
Slide 008
Slide 009
Slide 010
Slide 011
Slide 012
Slide 013
Slide 014
Slide 015
Slide 016
Slide 017
Slide 018
Slide 019
Slide 020
Slide 021
Slide 022
Slide 023
Slide 024
Slide 025
Slide 026
Slide 027
Slide 028
Slide 029
Slide 030
Slide 031
(deleted)
Slide 032
Slide 033
Slide 034
Slide 035
Slide 036
Slide 037
Slide 038
Slide 039
Slide 040
Slide 041
Slide 042
Slide 043
Slide 044
Slide 045
Slide 046
Slide 047
Slide 048
Slide 049
Slide 050
Slide 051
Slide 052
Slide 053
Slide 054
Slide 055
Slide 056
Slide 057
Slide 058
Slide 059
Slide 060
Slide 061
Slide 062
Slide 063