Lecture 03
Jump to navigation
Jump to search
(Previous lecture) ... (Next lecture)
Sequence Properties
- What you should take home from this part of the course
- Understand the ideas of analysis by composition and analysis by signal;
- Know what deterministic pattern matching is;
- Recognize and understand the term regular expression;
- Be familiar with common sequence signals in DNA,RNA and proteins;
- Be familiar with the Prosite database and the Prosite scan server;
- Kow where to find EMBOSS tools and how to use them;
- Know about the offerings on the ExPASy tools collection page;
- Work on an understanding how biological facts can be translated into hypotheses and how hypotheses can be translated into computational procedures for analysis.
- Links summary
- Exercises
- Retrieve and read the Prosite documentation entry for the Leucine Zipper.
- Download entry 1NWQ from the PDB, visualize the Leucine Zipper with VMD and study its architecture (stereo vision!).
Lecture Slides
Slide 001
![](/abc/images/4/4e/L03_s001.jpg)
Lecture 03, Slide 001
This finding made the news. You should be aware of important new developments: subscribe to read at least the news items from Nature and Science, preferably subscribe to and browse their tables of contents too. In this particular new finding, researchers challenge our current concept of "genome": what is a genome, if the same physical DNA molecule can contain coding information for more than one species? Also, this finding further emphasizes the importance of horizontal gene transfer in evolution.
This finding made the news. You should be aware of important new developments: subscribe to read at least the news items from Nature and Science, preferably subscribe to and browse their tables of contents too. In this particular new finding, researchers challenge our current concept of "genome": what is a genome, if the same physical DNA molecule can contain coding information for more than one species? Also, this finding further emphasizes the importance of horizontal gene transfer in evolution.
Slide 002
Slide 003
Slide 004
Slide 005
Slide 006
Slide 007
![](/abc/images/5/55/L03_s007.jpg)
Lecture 03, Slide 007
(#: Number of ...) A protein's isoelectric point depends on the pK values of the amino acids; the pK values characterize the propensity fo an amino acid sidechain to dissociate, which in turn depends on how energetically favourable dissociation is. For example: since a negatively charged amino acid will be stabilized in a positive electrostatic field, such a field will shift a pK value down. This means the pH value at which the side chain will be 50% ionized is lower, or in other words, in a positive electrostatic field the concentration of protons must be higher to keep a proton associated to the sidechain.
Compositional properties of nucleic acids include hybridization temperature and helix structure.
(#: Number of ...) A protein's isoelectric point depends on the pK values of the amino acids; the pK values characterize the propensity fo an amino acid sidechain to dissociate, which in turn depends on how energetically favourable dissociation is. For example: since a negatively charged amino acid will be stabilized in a positive electrostatic field, such a field will shift a pK value down. This means the pH value at which the side chain will be 50% ionized is lower, or in other words, in a positive electrostatic field the concentration of protons must be higher to keep a proton associated to the sidechain.
Compositional properties of nucleic acids include hybridization temperature and helix structure.
Slide 008
![](/abc/images/c/cc/L03_s008.jpg)
Lecture 03, Slide 008
Simple tools exist to conveniently calculate compositional properties for peptide sequences. For example the EMBOSS GUI serves EMBOSS tools on the Web on many freely accessible servers in the world. One of these tools ist the pepstats routine that was used to create the output above.
Simple tools exist to conveniently calculate compositional properties for peptide sequences. For example the EMBOSS GUI serves EMBOSS tools on the Web on many freely accessible servers in the world. One of these tools ist the pepstats routine that was used to create the output above.
Slide 009
Slide 010
![](/abc/images/c/c1/L03_s010.jpg)
Lecture 03, Slide 010
Comparison of our unknown text (the German translation of "What's in a name ...") with letter frequencies of German, English and French shows a tendency towards the German origins but we can't immediately say that this is statistically significant. It is, after all, a very short sample. It is interesting to consider the outliers W (for reasons of alliteration) and I and S (for assonance). Both poetic devices can be regarded to create constraints on the choice of words that will lead to deviations from expected distributions.
Comparison of our unknown text (the German translation of "What's in a name ...") with letter frequencies of German, English and French shows a tendency towards the German origins but we can't immediately say that this is statistically significant. It is, after all, a very short sample. It is interesting to consider the outliers W (for reasons of alliteration) and I and S (for assonance). Both poetic devices can be regarded to create constraints on the choice of words that will lead to deviations from expected distributions.
Slide 011
![](/abc/images/a/aa/L03_s011.jpg)
Lecture 03, Slide 011
This has a corollary in the non-random distribution of amino acids across species. Some of this may be due to physicochemical properties of amino acids in a particular ecological niche, but such effects may also be due to chance characteristics of the biochemical machinery of replication and translation.
This has a corollary in the non-random distribution of amino acids across species. Some of this may be due to physicochemical properties of amino acids in a particular ecological niche, but such effects may also be due to chance characteristics of the biochemical machinery of replication and translation.
Slide 012
![](/abc/images/4/4b/L03_s012.jpg)
Lecture 03, Slide 012
Sometimes the excess or depletion of a component may carry important information. In the example above case, the choice of words has been dictated by their letter composition. The poem Eunoia is an example of a univocalic lipogram, a form of concrete poetry, in which the author has constrained himself to use only a single vowel in each of the poem's chapters.
Sometimes the excess or depletion of a component may carry important information. In the example above case, the choice of words has been dictated by their letter composition. The poem Eunoia is an example of a univocalic lipogram, a form of concrete poetry, in which the author has constrained himself to use only a single vowel in each of the poem's chapters.
Slide 013
Slide 014
![](/abc/images/5/58/L03_s014.jpg)
Lecture 03, Slide 014
In this graph, amino acids have been ordered according to their standard frequencies in proteins (blue) to emphasize deviations in a particular protein (white). The sequence of the single stranded RNA binding protein Nab3p is remarkable, possessing long tracts of Glutamine and Glutamic acid and an excess of proline. Just looking at these anomalies allows us to infer that large tracts of sequence are likely not structured (and thus fulfill their function as an adaptable, flexible polypeptide) and that large segments are highly negatively charged (and thus presumably have a highly positively charged ligand, such as the (+)charges of exposed, single stranded nucleotide bases).
In this graph, amino acids have been ordered according to their standard frequencies in proteins (blue) to emphasize deviations in a particular protein (white). The sequence of the single stranded RNA binding protein Nab3p is remarkable, possessing long tracts of Glutamine and Glutamic acid and an excess of proline. Just looking at these anomalies allows us to infer that large tracts of sequence are likely not structured (and thus fulfill their function as an adaptable, flexible polypeptide) and that large segments are highly negatively charged (and thus presumably have a highly positively charged ligand, such as the (+)charges of exposed, single stranded nucleotide bases).
Slide 015
Slide 016
Slide 017
Slide 018
Slide 019
Slide 020
![](/abc/images/4/4b/L03_s020.jpg)
Lecture 03, Slide 020
Restriction endonucleases are the quintessential pattern recognition molecules. They bind strongly the specific conformation of DNA that is associated with a particular DNA sequence. Even though the structural differences between DNA strands of similar sequence is small, evolutionary pressure has resulted in enzymes that are highly specific for their cognate sequence. An excellent site for endonuclease information is Rebase.
Restriction endonucleases are the quintessential pattern recognition molecules. They bind strongly the specific conformation of DNA that is associated with a particular DNA sequence. Even though the structural differences between DNA strands of similar sequence is small, evolutionary pressure has resulted in enzymes that are highly specific for their cognate sequence. An excellent site for endonuclease information is Rebase.
Slide 021
Slide 022
![](/abc/images/d/d4/L03_s022.jpg)
Lecture 03, Slide 022
... or as a circular map, e.g. from PlasMapper.
... or as a circular map, e.g. from PlasMapper.
Slide 023
Slide 024
![](/abc/images/f/f7/L03_s024.jpg)
Lecture 03, Slide 024
Pattern search (or pattern matching) means inspecting an entity and stating whether that entity is an example of a given pattern. Usually the entity is a substring of a sequence, but patterns in protein structure, biological networks or morphogenesis can also be computationally defined. Pattern discovery means finding patterns that have not been defined a priori.
Pattern search (or pattern matching) means inspecting an entity and stating whether that entity is an example of a given pattern. Usually the entity is a substring of a sequence, but patterns in protein structure, biological networks or morphogenesis can also be computationally defined. Pattern discovery means finding patterns that have not been defined a priori.
Slide 025
Slide 026
![](/abc/images/e/ec/L03_s026.jpg)
Lecture 03, Slide 026
... such as the Boyer-Moore algorithm. For a step-by-step version see here. Defining optimal algorithms and analyzing their resource requirements is the domain of computer science.
... such as the Boyer-Moore algorithm. For a step-by-step version see here. Defining optimal algorithms and analyzing their resource requirements is the domain of computer science.
Slide 027
Slide 028
Slide 029
![](/abc/images/9/94/L03_s029.jpg)
Lecture 03, Slide 029
To be able search for patterns we need a convention to define them. In particular, we would like to be able to find degenerate patterns: patterns in which we allow a number of alternative choices for particular positions. Such patterns are commonly written as Regular Expressions (even though some sites, such as the ProSite database use a custom variant of the concept).
To be able search for patterns we need a convention to define them. In particular, we would like to be able to find degenerate patterns: patterns in which we allow a number of alternative choices for particular positions. Such patterns are commonly written as Regular Expressions (even though some sites, such as the ProSite database use a custom variant of the concept).
Slide 030
Slide 031
![](/abc/images/f/ff/L03_s031.jpg)
Lecture 03, Slide 031
A crude Perl program to find the Leucine Zipper pattern uses a regular expression at its core. L.{6}){3,}L means: a string matching an "L", followed by 6 occurrences of any character ("."), repeated three or more times, and terminated by a final "L". (The arcane-looking print statement is just there to capture the sequence number of the pattern.)
A crude Perl program to find the Leucine Zipper pattern uses a regular expression at its core. L.{6}){3,}L means: a string matching an "L", followed by 6 occurrences of any character ("."), repeated three or more times, and terminated by a final "L". (The arcane-looking print statement is just there to capture the sequence number of the pattern.)
Slide 032
Slide 033
Slide 034
![](/abc/images/9/9b/L03_s034.jpg)
Lecture 03, Slide 034
see http://www.expasy.org/tools
see http://www.expasy.org/tools
Slide 035
Slide 036
Slide 037
Slide 038
Slide 039
Slide 040
Slide 041
Slide 042
Slide 043
Slide 044
Slide 045
Slide 046