Difference between revisions of "Assignment 6"

From "A B C"
 
(30 intermediate revisions by the same user not shown)
Line 1: Line 1:
__TOC__
+
<!-- {{Template:Active}} -->
 +
{{Template:Inactive}}
 +
 
 +
 
 +
__TOC__
 
&nbsp;
 
&nbsp;
 
&nbsp;
 
&nbsp;
  
 
<div style="padding: 5px; background: #A6AFD0;  border:solid 1px #AAAAAA; font-size:200%;font-weight:bold;">
 
<div style="padding: 5px; background: #A6AFD0;  border:solid 1px #AAAAAA; font-size:200%;font-weight:bold;">
Assignment 6 - Patterns, Regulons and Systems
+
Assignment 6 - <!-- Patterns, Regulons and -->Systems
 
</div>
 
</div>
  
Please note: This assignment is currently inactive. Unannounced changes may be made at any time.
 
&nbsp;
 
  
<!-- '''Please note: This assignment is currently active. All changes will be announced on the course mailing list.''' -->
+
<!-- '''Please note: This assignment is currently active. All significant changes will be announced on the course mailing list.'''
&nbsp;
 
  
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
+
&nbsp;-->
Introduction
 
; A theory can be proved by an experiment; but no path leads from experiment to the birth of a theory.
 
:''<small>(Variously attributed to Albert Einstein and Manfred Eigen)</small>''
 
  
 +
<div style="padding: 15px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
 +
;A theory can be proved by an experiment; but no path leads from experiment to the birth of a theory.
 +
::''<small>(Variously attributed to Albert Einstein and Manfred Eigen)</small>''
 
</div>
 
</div>
 +
&nbsp;
 +
&nbsp;
  
Systems biology is about '''systems''' ... but what is a system, anyway? Definitions abound, the recurring theme is that of connected "components" forming a complex "whole". Complexity describes the phenomenon that properties of components can depend on the "context" of a component; the context is the entire system. That fact that it can be meaningful to treat such a set of components in isolation, dissociated from their other environment, tells us that not all components of biology are connected to the same degree. Some have many, strong, constant interactions (often these are what we refer to "systems"), others have few, weak, sporadic interactions and thus can often be dissociated in analysis. Given the fact that complex biological components can be perturbed by any number of generic environmental influences as well as specific modulating interactions, it is non-trivial to observe that we '''can''' in many cases isolate some components or sets of components and study them in a meaningful way. A useful mental image is that of clustering in datasets: even if we can clearly define a cluster as a number of elements that are strongly connected to each other, that usually still means some of these elements also have some connections with elements from other clusters. Moreover, our concepts of systems is often hierarchical, discussing biological phenomena in terms of entities or components, subsystems, systems, supersystems ..., as well it often focusses on particular dimensions of connectedness, such as physical contact, in the study of complexes, material transformations, in the study of metabolic systems, or information flow, in the study of signalling systems and their higher-order assemblies in control and development.
+
Systems biology is about '''systems''' ... but what is a system, anyway? Definitions abound; the recurring theme is that of connected "components" forming a complex "whole". Complexity describes the phenomenon that properties of components can depend on the "context" of a component; the context is the entire system. The fact that it can be meaningful to treat such a set of components in isolation, dissociated from their other environment, tells us that not all components of biology are connected to the same degree. Some have many, strong, constant interactions (often these are what we refer to as "systems"), others have few, weak, sporadic interactions and thus can often be dissociated in analysis. Given the fact that complex biological components can be perturbed by any number of generic environmental influences as well as specific modulating interactions, it is non-trivial to observe that we '''can''' in many cases isolate some components or sets of components and study them in a meaningful way. A useful mental image is that of clustering in datasets: even if we can clearly define a cluster as a number of elements that are strongly connected to each other, that usually still means some of these elements also have some connections with elements from other clusters. Moreover, our concept of systems is often hierarchical, discussing biological phenomena in terms of entities or components, subsystems, systems, supersystems ...; as well, it often focusses on particular dimensions of connectedness, such as physical contact in the study of complexes, material transformations in the study of metabolic systems, or information flow in the study of signalling systems and their higher-order assemblies in control and development.
  
 
At the end of the day, a biological system is a '''conceptual''' construct, a model we use to make sense of nature; nature however, ever pragmatical, knows nothing of systems.
 
At the end of the day, a biological system is a '''conceptual''' construct, a model we use to make sense of nature; nature however, ever pragmatical, knows nothing of systems.
  
In this sense systems biology could be dismissed as an artificial academic exercise, semantics, even molecular Mysticism, if you will (and some experimentalists do take this position), '''IF''' systems biology were not curiously successful in its predictions. For example, we all know that metabolism is a network, crosslinked at every opportunity; still, the concept of pathways appears to correlate with real, observable properties of substrate flow in living cellls. Nature appears to prefer constructing components that interact locally, complexes, modeules and systems, in a way that encapsulates their complex behaviour, rather than leaving them free to interact randomly with any other number of components in a large, disordered bag.
+
In this sense systems biology could be dismissed as an artificial academic exercise, semantics, even molecular mysticism, if you will (and some experimentalists do take this position), '''IF''' systems biology were not curiously successful in its predictions. For example, we all know that metabolism is a network, crosslinked at every opportunity; still, the concept of pathways appears to correlate with real, observable properties of metabolite flux in living cells. Nature appears to prefer constructing components that interact locally, complexes, modules and systems, in a way that encapsulates their complex behaviour, rather than leaving them free to interact randomly with any other number of components in a large, disordered bag.
 +
 
 +
The mental construct of a "system" thus provides a framework for concepts that describe the '''functional''' organisation of biological components.
 +
 
 +
In this assignment we will briefly explore pathways and interactions, from the perspective of tools that are currently in common use to map molecular observations to integrated systems concepts.
 +
 
  
The mental construct of a "system" thus provides a theory of the functional organisation of biological components.
+
{{Template:Preparation|
 +
care=Be sure you have understood all parts of the assignment and understand what you are expected to do! Sadly, we see too many assignments in which students have overlooked directive verbs such as explain, enumerate, list, name, compare, contrast, describe, summarize, outline, apply, justify, establish, defend, account for, sketch, clarify, state, illustrate or discuss. If any of these verbs don't catch your attention, you need to get more coffee before you start.|
 +
num=6|
 +
ord=sixth|
 +
due = Friday, December 7 at 24:00}}
  
In this assignment we will briefly explore two of the tools that are currently in common use to map molecular observations to integrated systems concepts and consider how complete that mapping currently is.
 
  
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
+
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
Preparation, submission and due date
+
==(1) Pathways==
 
</div>
 
</div>
 +
&nbsp;
 +
&nbsp;
  
Read carefully....
+
Pathways are perhaps the earliest biochemical representation of cellular systems. They are a particularly active area of bioinformatics research, stimulated by the vision of automatically grouping biological entities into meaningful systems, based on their known properties, or the properties of homologues. For instance, if '''Enzyme A''' produces ''metabolite 1'' and '''Enzyme B''' consumes ''metabolite 1'', even a computer can figure out that '''A''' and '''B''' can be functionally connected. If we stitch all such connections together, we arrive at a description of material flow inside the cell, or of regulatory connections, or of signaling events. Of course, '''A''' and '''B''' have to be in the same compartment, and ''metabolite 1'' shouldn't be ATP. Or H<sub>2</sub>O. And if we infer relationships from well-studied model organisms, the components we compare really have to be orthologues. We are not sure yet to what degree automation will be possible; remember the issues we had determining orthologues to the Mbp1 protein and its domains. Careful, manual curation of pathway data is going to be with us for some time to come.
 +
 
 +
&nbsp;
 +
 
 +
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 +
=== (1.1) KEGG (5 marks)===
 +
</div>
 +
&nbsp;<br>
  
Prepare a Microsoft Word document with a title page that contains:
 
*your full name
 
*your Student ID
 
*your e-mail address
 
*the organism name you have been assigned (see below)
 
  
Follow the steps outlined below. You are encouraged to  write your answers in short answer form or point form, '''like you would document an analysis in a laboratory notebook'''. However, you must
 
*document what you have done,
 
*note what Web sites and tools you have used,
 
*paste important data sequences, alignments, information etc.
 
  
'''If you do not document the process of your work, we will deduct marks.''' Try to be concise, not wordy! Use your judgement: are you giving us enough information so we could exactly reproduce what you have done? If not, we will deduct marks. Avoid RTF and unnecessary formating. Do not paste screendumps. Keep the size of your submission below 1.5 MB.
+
The [http://www.genome.ad.jp/kegg/ '''Kyoto Encyclopedia of Genes and Genomes'''] is one of the oldest and best curated databases of metabolic and functional pathways. It stores hand-curated pathways for a number of model organisms and supports computational inference for other organisms by determining orthologues.  
  
Write your answers into separate paragraphs and give each its title. Save your document with a filename of:
+
&nbsp;<br><div style="padding: 5px; background: #DDDDEE;">
<code>A3_family name.given name.doc</code>
+
Access the [http://www.genome.ad.jp/kegg/ KEGG Web site].
<small>(for example my first assignment would be named: A3_steipe.boris.doc - and don't switch the order of your given name and familyname please!)</small>
+
</div>
 +
&nbsp;<br>
  
Finally e-mail the document to [boris.steipe@utoronto.ca] before the due date.
 
  
Your document must not contain macros. Please turn off and/or remove all macros from your Word document; we will disable macros, since they pose a security risk.
+
KEGG identifies organisms in the database according to three or four letter codes, e.g. ''Homo sapiens'' &rarr; <code>hsa</code>, ''Saccharomyces cerevisiae'' &rarr; <code>sce</code>. '''Most''' of our fungi have manually curated genes annotated in KEGG, however not all. Thus ...
  
With the number of students in the course, we have to economize on processing the assignments. '''Thus we will not accept assignments that are not prepared as described above.''' If you have technical difficulties, contact me.
+
<table border="0">
 +
<tr><td>... if your assigned organism is:</td><td>&nbsp;</td><td>use this alternative instead:</td></tr>
 +
<tr><td>''Gibberella zeae''</td><td>&rarr;</td><td>''Magnaporthe grisea''</td></tr>
 +
<tr><td>''Neurospora crassa''</td><td>&rarr;</td><td>''Magnaporthe grisea''</td></tr>
 +
<tr><td>''Aspergillus terreus''</td><td>&rarr;</td><td>''Aspergillus nidulans''</td></tr>
 +
<tr><td>''Coprinopsis cinerea''</td><td>&rarr;</td><td>''Cryptococcus neoformans''</td></tr>
 +
</table>
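
The substitution table above can be written down as a simple lookup, which may be handy if you script any part of your analysis. This is only a minimal sketch: the dictionary restates the table, and the function name <code>kegg_search_name</code> is a hypothetical helper, not part of any KEGG tool.

```python
# Alternatives taken from the table above: assigned organism -> organism to
# use in KEGG. The helper name is hypothetical and only for illustration.
ALTERNATIVES = {
    "Gibberella zeae": "Magnaporthe grisea",
    "Neurospora crassa": "Magnaporthe grisea",
    "Aspergillus terreus": "Aspergillus nidulans",
    "Coprinopsis cinerea": "Cryptococcus neoformans",
}


def kegg_search_name(assigned_organism: str) -> str:
    """Return the organism to look up in KEGG for an assigned organism.

    Organisms that are not in the table are assumed to be curated in KEGG
    and are returned unchanged.
    """
    return ALTERNATIVES.get(assigned_organism, assigned_organism)


if __name__ == "__main__":
    print(kegg_search_name("Neurospora crassa"))
    print(kegg_search_name("Saccharomyces cerevisiae"))
```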
  
'''The due date for the assignment is XXXXX at 10:00 in the morning.'''
 
  
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
+
&nbsp;<br><div style="padding: 5px; background: #DDDDEE;">
Grading
+
Click on the '''KEGG Organism''' link to access the list of organism abbreviations. Find the abbreviation for your organism (or your alternative). Record the code.
 
</div>
 
</div>
 +
&nbsp;<br>
  
Don't wait until the last day to find out there are problems! Assignments that are received past the due date will have one mark deducted at the first minute of every twelve hour period past the due date. Assignments received more than 5 days past the due date will not be assessed.
+
Navigate to the [http://www.genome.ad.jp/kegg/kegg2.html '''KEGG2''' entry page], which contains a number of options to search the database contents. I have instantiated the links with searches relevant to yeast Mbp1 and the cell cycle in the list below; try the links and make sure you understand what they contain.
  
Marks are noted below in the section headings for of the tasks. A total of 10 marks will be awarded, if your assignment answers all of the questions. A total of 2 bonus marks (up to a maximum of 10 overall) can be awarded for particularily interesting findings, or insightful comments. A total of 2 marks can be subtracted for lack of form or for glaring errors. The marks you receive will
 
* count directly towards your final marks at the end of term, for BCH441 (undergraduates), or
 
* be divided by two for BCH1441 (graduates).
 
  
&nbsp;
+
*The simplest option is to search for a [http://www.genome.ad.jp/dbget-bin/www_bfind_sub?mode=bfind&max_hit=1000&dbkey=kegg&keywords=mbp1 '''gene name''']. KEGG will return all matches to that name in record titles and annotations.
&nbsp;
+
*You can execute a [http://blast.genome.jp/ '''BLAST'''] search against the database and thus search with domain sequences, such as the APSES domain, rather than with entire genes;
 +
*define a ligand or [http://www.genome.ad.jp/dbget-bin/www_bfind_sub?dbkey=enzyme&keywords=%22cyclin-dependent+kinase%22&mode=bfind&max_hit=1000 '''enzyme'''];
 +
*use the [http://www.genome.ad.jp/dbget-bin/www_bfind_sub?dbkey=pathway&keywords=%22cell+cycle%22&mode=bfind&max_hit=1000 '''PATHWAY'''] search tool to retrieve information on a particular system.
 +
 
 +
 
 +
The gene search returns a list of genes; one of them should be [http://www.genome.ad.jp/dbget-bin/www_bget?sce:YDL056W '''sce:YDL056W'''], the KEGG code for Mbp1's systematic name. Access that record and click on the '''Help''' button at the top of the record to find information about what the returned results contain.
 +
 
 +
'''Not all of KEGG's curated genes contain a link to a pathway record.''' However Mbp1 does. There is a line labeled Pathway with a link to the protein's curated pathway information: '''sce04111'''. (This pathway code '''<code>04111</code>''' also should have come up as one of the pathways returned via the search for "cyclin-dependent kinase" as an enzyme, or as one of the pathways returned for the pathway search for "cell-cycle".)
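
The two identifiers seen so far follow a simple pattern: a gene record is the organism code, a colon and the systematic name (<code>sce:YDL056W</code>), and an organism-specific pathway is the organism code followed directly by the five-digit pathway number (<code>sce04111</code>). The sketch below only assembles such strings; the function names are hypothetical, and whether a given pathway exists for your organism still has to be checked on the KEGG site.

```python
import re


def gene_id(org_code: str, systematic_name: str) -> str:
    """Compose a KEGG gene identifier such as 'sce:YDL056W'."""
    _check_org_code(org_code)
    return f"{org_code}:{systematic_name}"


def pathway_id(org_code: str, pathway_number: str) -> str:
    """Compose an organism-specific pathway identifier such as 'sce04111'."""
    _check_org_code(org_code)
    if not re.fullmatch(r"\d{5}", pathway_number):
        raise ValueError("pathway number must be five digits, e.g. '04111'")
    return org_code + pathway_number


def _check_org_code(org_code: str) -> None:
    # KEGG organism codes are three or four lower-case letters.
    if not re.fullmatch(r"[a-z]{3,4}", org_code):
        raise ValueError("organism code must be three or four lower-case letters")


if __name__ == "__main__":
    print(gene_id("sce", "YDL056W"))
    print(pathway_id("sce", "04111"))
```

Substituting the code you recorded for your own organism gives the identifier of the corresponding comparative map, provided that KEGG has one.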
 +
 
  
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
+
&nbsp;<br><div style="padding: 5px; background: #DDDDEE;">
==SECTION Heading==
+
Follow the link to the yeast cell-cycle pathway.
 
</div>
 
</div>
&nbsp;
+
&nbsp;<br>
&nbsp;
+
 
 +
 
 +
 
 +
The position of Mbp1 is emphasized with a red box. All boxes in this reference pathway are green; this indicates that a gene for that component of the pathway has been curated and stored in the database. The boxes are linked to the respective KEGG gene pages. The phases of the cell cycle ''G1 - S - G2 - M'' are indicated at the bottom of the chart.
 +
 
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
&nbsp;<br><div style="padding: 5px; background: #DDDDEE;">
=== SUB section Heading (X marks)===
+
Use the drop-down menu to switch to the comparative pathway map curated '''for your organism'''.
 
</div>
 
</div>
 
&nbsp;<br>
 
&nbsp;<br>
  
Instruction
+
 
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
Note that this displays the same map, but now some of the boxes are white (the KEGG curators have not annotated an orthologue for these genes in the KO (KEGG Orthology) database) and the green boxes are now linked to your organism's genes instead of the yeast genes. This is a very convenient way to check which components of the well-described yeast pathways have been curated as conserved in your organism.
*Task
+
 
 +
&nbsp;<br><div style="padding: 5px; background: #DDDDEE;">
 +
Briefly (!) compare this pathway with the cell-cycle diagram contained in [http://biochemistry.utoronto.ca/undergraduates/courses/BCH441H/restricted/Young_2002_SC_RegulatoryNetwork.pdf Figure 4 of Richard Young's conceptual analysis of the yeast cell cycle], based on transcriptional regulatory networks:
 +
 
 +
* Are the same genes annotated for the ''G<sub>1</sub>/S'' transition in both representations?
 +
* Do the two representations of the ''G<sub>1</sub>/S'' transition have the same connections shown?
 +
* Young's Figure 4 focusses on a cyclical progression (in time). Is time represented in a similar way in the KEGG map?
 +
 
 +
* Not all of the boxes in your organism's version of the yeast cell cycle are shown in green. What can you conclude from the presence or absence of annotated genes in KEGG about how the ''G<sub>1</sub>/S'' transition is regulated in your organism?
 +
 
 +
(5 marks)
 
</div>
 
</div>
 
&nbsp;<br>
 
&nbsp;<br>
  
Instruction
+
If you explore the various organisms for which this map has been transferred by homology (options menu at the top), you will notice that most organisms have far fewer genes mapped to this yeast pathway by the KEGG curators (the situation is somewhat better with metabolic pathways, by the way, since computational inference through orthology is less ambiguous). But we have already determined in the second assignment that we can find Mbp1 orthologues in all of these organisms! And they are quite easy to find in KEGG too. Click on the Mbp1 link on the yeast pathway map to take you to the KEGG gene record for the Mbp1 protein. In the '''SSDB''' row, click the '''Ortholog''' button. Orthologs for all of the organisms (or at least their close relatives in the KEGG database) have been precomputed. It should take only a moment to check that the orthologue in your organism is listed too - even if the box was "white" on the pathway page. This is not an error - it just reflects different levels of annotation, curation and inference.
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
 
*Task.
+
Once again, we are back at a familiar problem: much and increasingly more of our annotations are based on analogy and inference. We study one system experimentally in a model organism, then we attempt to map the components to another organism. But pursuing the idea of orthology in order to map function is tricky. Even orthologues may have diverged in evolution to distinct and dissimilar functional '''systems'''. Note for example that in yeast Mbp1 binds to Swi6 (the MBF complex) and Swi6 can also bind to [http://www.ncbi.nlm.nih.gov/sutils/blink.cgi?pid=6320957 Swi4], an Mbp1 homologue (the SBF complex). In many CRMs (cis-regulatory modules) the respective binding sites of Mbp1 and Swi4 are closely juxtaposed. However, only the ''Saccharomycotina'' seem to possess orthologues to Swi4, at least as far as they are more similar to Swi4 than to Mbp1. However, in our phylogenetic analysis we noted an Mbp1 paralogue in the fungal cenancestor, which then was at the root of the Swi4/MbpA genes of our tree. We have called its descendant Swi4 in some cases, MbpA in others, since we have annotated it from the perspective of similarity to yeast. Is Mbp1 a gene that has taken on functions that are distinct from Swi4? MBF and SBF appear to be two complementary systems, presumably each having taken over some part of the space of functions from the other and probably acquired a few novel functions along the way. But the situation in the other fungi cannot be unambiguously inferred from the evidence we have considered.
</div>
 
  
&nbsp;
+
I hope that this short discussion has illuminated the problems associated with mapping functions between organisms, based on gene similarity. To paraphrase the issue one more time: we are mapping concepts to biology, but "concepts" and "biology" exist in two different worlds. It is helpful, indeed crucial, to explain biology in terms of higher-order concepts. This is what we ultimately mean by "understanding" and indeed, if we did not try this, we would be merely "butterfly-collecting". But never, never fall into the trap of basing your biological conclusions - e.g. '''functional equivalence of biological objects''' - mechanically on a computed '''similarity of concepts''' (such as gene similarity, pathway position, GO annotation ''etc.''). The mapping of concept to object may be arbitrarily imprecise and as a consequence, so is the equivalence, once we apply it to the "real" world.
 +
&nbsp;<br>
 
&nbsp;
 
&nbsp;
  
 
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
==SECTION Heading==
+
 
 +
==(2) Interactions==
 
</div>
 
</div>
&nbsp;
+
&nbsp;<br>
 
&nbsp;
 
&nbsp;
  
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
=== SUB section Heading (X marks)===
+
===(2.1) Interaction databases (5 marks)===
 
</div>
 
</div>
 
&nbsp;<br>
 
&nbsp;<br>
 +
In high-throughput biology, the genome was the beginning. As [http://en.wikipedia.org/wiki/Sydney_Brenner Sydney Brenner] has phrased it: we have now written the "white-pages" of the cell, fulfilling the "CAP-criterion" (Comprehensive, Accurate and Permanent). The next level is figuring out the way the parts work - if you will, the "Yellow Pages" - and many of us expect that substantial progress can be made by mapping their interactions. After all,  physiological function can be described to a large part as the result of physical interaction.
 +
 +
Please note that there are different types of '''physical interactions'''. We most often think of '''complexes''', either stable or transient homo- or heterooligomers when we speak of physical interactions. But there are also interactions between '''substrates and products''' and not all of them correspond to classical enzymatic pathways. Phosphorylation and dephosphorylation are processes of key importance in  signal transduction and acetylation/deacetylation plays a critical role in regulatory pathways. Here, the substrates are proteins and the interaction with the modifying enzyme is of course a physical interaction.
 +
 +
'''Genetic interactions''' on the other hand are another story. Here the word ''interaction'' is used in an entirely different sense: it is not synonymous with ''contact''; it is synonymous with ''influence''. In fact, most proteins that display genetic interactions would '''not''' be expected to interact physically as well. (Why? Think.) It is important not to mix up the two. To understand what genetic interactions imply, think of the following analogy. If I were to break the wrist of my right arm, my survival would probably not be affected. My left arm would provide sufficient redundancy for most tasks. What about breaking the right index finger, or spraining the elbow as well? Painful, but functionally not much worse than breaking the wrist alone. What about indigestion, or Alzheimer's in addition to the fracture? Annoying, but really not significantly more so with or without a broken wrist. What about a broken left wrist? That would be bad. Losing the function of both hands is much, much worse than losing the function of only one hand. This is the kind of functional ordering that genetic interactions achieve: if two genes are active in the '''same''' system (like the right wrist and index finger) they will '''not''' display genetic interaction. The pathway is blocked and it matters little whether it is blocked at one or two points. If two genes work in completely '''different''' systems, they will also '''not''' show genetic interactions (like a fracture, combined with indigestion). Only if two genes affect parallel, mutually redundant pathways (left and right arm) will their joint deletion cause a critical situation for the affected organism. If the organism dies, we call this a ''synthetic lethal'' effect.
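
The logic of the arm analogy can be made explicit with a toy model. The sketch below is an illustration only, not a description of how genetic interaction screens are analysed: the gene names are hypothetical, and viability is reduced to a yes/no rule (the organism survives as long as at least one of two redundant pathways is fully intact).

```python
from itertools import combinations

# Hypothetical toy system: two mutually redundant pathways ("left arm" and
# "right arm") serve one essential function; C1 belongs to an unrelated,
# non-essential system.
REDUNDANT_PATHWAYS = [("A1", "A2"), ("B1", "B2")]
UNRELATED_GENES = ("C1",)


def viable(deleted):
    """Viable if at least one redundant pathway has all of its genes intact."""
    return any(
        all(gene not in deleted for gene in pathway)
        for pathway in REDUNDANT_PATHWAYS
    )


def synthetic_lethal_pairs():
    """Return gene pairs whose single deletions are viable but whose joint
    deletion is not."""
    genes = [g for pathway in REDUNDANT_PATHWAYS for g in pathway]
    genes += list(UNRELATED_GENES)
    pairs = []
    for a, b in combinations(genes, 2):
        if viable({a}) and viable({b}) and not viable({a, b}):
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    for pair in synthetic_lethal_pairs():
        print(pair)
```

In this model only pairs that span the two redundant pathways are reported; two genes in the same pathway, or a gene paired with the unrelated C1, never are. That is the ordering described above.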
 +
 +
 +
 +
Let us briefly explore the BioGRID and IntAct interaction databases, to retrieve interactions for yeast Mbp1. IntAct stores only physical interactions. BioGRID stores physical and genetic interactions.
 +
 +
&nbsp;<br>
 +
<div style="padding: 5px; background: #DDDDEE;">
 +
* Access the [http://www.thebiogrid.org/ '''BioGRID'''] database at the Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto.
 +
* Access the [http://www.ebi.ac.uk/intact/site/index.jsf '''IntAct'''] database at the EBI.
  
Instruction
+
Search for interactions of the Mbp1 gene by entering the gene name into the form field.
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
 
*Task
+
*Follow the correct link in BioGRID for ''Saccharomyces cerevisiae'' Mbp1 (YDL056W). All genes listed in that table have demonstrated interactions with Mbp1.
 +
*The EBI search directly returns a table of pairwise interactions; both partners are listed as a pair and, in each pair one of the partners should be YDL056W.
 +
 
 +
*How many different physical interaction detection methods do the IntAct records list? Follow the links and read their definitions. <small>('''Bravo''' to the IntAct developers, for '''defining''' their terms. In a better world, all the semantics of our databases should be similarly defined to be meaningful.)</small>
 +
* List what general experimental type(s) the BioGRID interactors come from. (In particular note the difference between <span style="background-color:#FFCC00;">yellow</span> and <span style="background-color:#00FF44;">green</span> boxes).
 +
 
 +
You will note that some, but not all physical interactions listed by BioGRID and IntAct are the '''same''' according to Francis Ouellette's definition: same organism, same proteins, same experiment, same publication.
 +
 
 +
* Which of the IntAct Mbp1 interactions are the same in BioGRID?
 +
* Which of the IntAct interactions appear in the KEGG map?
 +
* Check whether all of the interactions between the regulators of the ''G<sub>1</sub>/S'' phase suggested by Figure 4 of the Young group's publication are present in BioGRID interactions.
 +
 
 +
(5 marks)
 
</div>
 
</div>
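
The criterion for "same" used above (same organism, same proteins, same experiment, same publication) can be written as a comparison key. This is a minimal sketch under stated assumptions: the record layout and the function names are hypothetical, and it presupposes that you have already translated the two databases' experiment names into a common vocabulary by hand, since they will not necessarily use identical terms.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical, simplified interaction record."""
    organism: str
    protein_a: str
    protein_b: str
    experiment: str
    publication: str


def identity_key(rec: Interaction):
    """Key under which two records count as the 'same' interaction.

    The two partners are stored as a frozenset so that A-B and B-A compare
    as equal.
    """
    partners = frozenset((rec.protein_a, rec.protein_b))
    return (rec.organism, partners, rec.experiment, rec.publication)


def shared_interactions(records_1, records_2):
    """Return the records of the first list that also occur in the second."""
    keys_2 = {identity_key(r) for r in records_2}
    return [r for r in records_1 if identity_key(r) in keys_2]
```

Two records that list the same pair from the same publication but with different experiment types would not count as the same under this key, which is the point of the definition.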
 +
 +
&nbsp;<br>
 +
Now, what about your organism? Could you infer interactions between proteins whose orthologs interact in another organism? Such predictions are called ''interologs'' (''inter''acting homo''logs''). Unfortunately, that does not appear to be the case. Confident prediction of interologs can only be achieved in cases of >80% joint sequence identity of both pairs [http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0020079], a level of similarity that (I believe) none of our Mbp1 proteins achieves.
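
As a rough illustration of that threshold: the check below treats "joint" identity as the requirement that '''both''' homologous pairs exceed the cutoff. That reading is an assumption made for this sketch; consult the paper for the exact definition. The function name and the example values are hypothetical.

```python
def interolog_supported(identity_a: float, identity_b: float,
                        threshold: float = 80.0) -> bool:
    """Decide whether an interaction A-B may be transferred to A'-B'.

    identity_a: percent sequence identity between A and its homologue A'
    identity_b: percent sequence identity between B and its homologue B'

    Assumption: 'joint' identity is taken as the lower of the two values,
    so both pairs have to exceed the threshold.
    """
    return min(identity_a, identity_b) > threshold


if __name__ == "__main__":
    print(interolog_supported(85.0, 90.0))
    print(interolog_supported(85.0, 60.0))
```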
 +
 
&nbsp;<br>
 
&nbsp;<br>
 +
&nbsp;
  
Instruction
+
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
 
*Task.
+
==(3) Summary of Resources==
 
</div>
 
</div>
 +
&nbsp;<br>
 +
 +
;Literature
 +
 +
:* [http://biochemistry.utoronto.ca/undergraduates/courses/BCH441H/restricted/Young_2002_SC_RegulatoryNetwork.pdf (PDF, restricted) ''Al. et'' Richard Young (2002) Transcriptional Regulatory Networks in ''Saccharomyces cerevisiae'']
 +
:* [http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0020079 Mika & Rost on the Conservation of Protein-Protein Interactions]
 +
 +
 +
;Sites
 +
 +
:* [http://www.genome.ad.jp/kegg/ the '''KEGG''' Web site]
 +
:* [http://www.thebiogrid.org/ The '''BioGRID''']
 +
:* [http://www.ebi.ac.uk/intact/ The '''IntAct''' database]
  
 
&nbsp;
 
&nbsp;
Line 128: Line 209:
 
</div>
 
</div>
  
If you have any questions at all, don't hesitate to mail me at [mailto:boris.steipe@utoronto.ca boris.steipe@utoronto.ca] or post your question to the [mailto:bch441_2006@googlegroups.com Course Mailing List]
+
If you have any questions at all, don't hesitate to mail me at [mailto:boris.steipe@utoronto.ca boris.steipe@utoronto.ca] or post your question to the [mailto:bch441_2007@googlegroups.com Course Mailing List]

Latest revision as of 05:13, 27 March 2008

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

 
 


   

Assignment 6 - Systems


A theory can be proved by an experiment; but no path leads from experiment to the birth of a theory.
(Variously attributed to Albert Einstein and Manfred Eigen)

   

Systems biology is about systems ... but what is a system, anyway? Definitions abound; the recurring theme is that of connected "components" forming a complex "whole". Complexity describes the phenomenon that properties of components can depend on the "context" of a component; the context is the entire system. The fact that it can be meaningful to treat such a set of components in isolation, dissociated from their other environment, tells us that not all components of biology are connected to the same degree. Some have many, strong, constant interactions (often these are what we refer to as "systems"), others have few, weak, sporadic interactions and thus can often be dissociated in analysis. Given the fact that complex biological components can be perturbed by any number of generic environmental influences as well as specific modulating interactions, it is non-trivial to observe that we can in many cases isolate some components or sets of components and study them in a meaningful way. A useful mental image is that of clustering in datasets: even if we can clearly define a cluster as a number of elements that are strongly connected to each other, that usually still means some of these elements also have some connections with elements from other clusters. Moreover, our concept of systems is often hierarchical, discussing biological phenomena in terms of entities or components, subsystems, systems, supersystems ...; as well, it often focusses on particular dimensions of connectedness, such as physical contact in the study of complexes, material transformations in the study of metabolic systems, or information flow in the study of signalling systems and their higher-order assemblies in control and development.

At the end of the day, a biological system is a conceptual construct, a model we use to make sense of nature; nature however, ever pragmatical, knows nothing of systems.

In this sense systems biology could be dismissed as an artificial academic exercise, semantics, even molecular mysticism, if you will (and some experimentalists do take this position), IF systems biology were not curiously successful in its predictions. For example, we all know that metabolism is a network, crosslinked at every opportunity; still, the concept of pathways appears to correlate with real, observable properties of metabolite flux in living cells. Nature appears to prefer constructing components that interact locally, complexes, modules and systems, in a way that encapsulates their complex behaviour, rather than leaving them free to interact randomly with any other number of components in a large, disordered bag.

The mental construct of a "system" thus provides a framework for concepts that describe the functional organisation of biological components.

In this assignment we will briefly explore pathways and interactions, from the perspective of tools that are currently in common use to map molecular observations to integrated systems concepts.


Preparation, submission and due date

Read carefully.
Be sure you have understood all parts of the assignment and understand what you are expected to do! Sadly, we see too many assignments in which students have overlooked directive verbs such as explain, enumerate, list, name, compare, contrast, describe, summarize, outline, apply, justify, establish, defend, account for, sketch, clarify, state, illustrate or discuss. If any of these verbs don't catch your attention, you need to get more coffee before you start.

Review the guidelines for preparation and submission of BCH441 assignments.

The due date for the assignment is Friday, December 7 at 24:00.

   


(1) Pathways

   

Pathways are perhaps the earliest biochemical representation of cellular systems. They are a particularly active area of bioinformatics research, stimulated by the vision of automatically grouping biological entities into meaningful systems, based on their known properties, or the properties of homologues. For instance, if Enzyme A produces metabolite 1 and Enzyme B consumes metabolite 1, even a computer can figure out that A and B can be functionally connected. If we stitch all such connections together, we arrive at a description of material flow inside the cell, or of regulatory connections, or of signalling events. Of course, A and B have to be in the same compartment, and metabolite 1 shouldn't be ATP. Or H2O. And if we infer relationships from well-studied model organisms, the components we compare really have to be orthologues. We are not yet sure to what degree automation will be possible; remember the issues we had determining orthologues of the Mbp1 protein and its domains. Careful, manual curation of pathway data is going to be with us for some time to come.
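The stitching idea described above can be made concrete with a minimal sketch. Everything in it is hypothetical toy data: the enzyme names, metabolites, compartments and the list of "currency" metabolites are assumptions chosen for illustration, not taken from any database. Two enzymes are linked only if one produces what the other consumes, both sit in the same compartment, and the shared metabolite is not a ubiquitous one such as ATP or H2O.

```python
# Hypothetical toy data: each enzyme has a compartment, substrates and products.
REACTIONS = {
    "EnzymeA": {"compartment": "cytosol",
                "consumes": {"m0", "ATP"}, "produces": {"m1", "ADP"}},
    "EnzymeB": {"compartment": "cytosol",
                "consumes": {"m1", "H2O"}, "produces": {"m2"}},
    "EnzymeC": {"compartment": "mitochondrion",
                "consumes": {"m1"}, "produces": {"m3"}},
    "EnzymeD": {"compartment": "cytosol",
                "consumes": {"m9", "ADP"}, "produces": {"m10", "ATP"}},
}

# Ubiquitous "currency" metabolites that should not create links (assumed list).
CURRENCY = {"ATP", "ADP", "H2O"}


def infer_links(reactions, currency=CURRENCY):
    """Return sorted (producer, consumer, metabolite) triples.

    A link requires the same compartment and a shared metabolite
    that is not in the currency set.
    """
    links = []
    for producer, p in reactions.items():
        for consumer, c in reactions.items():
            if producer == consumer or p["compartment"] != c["compartment"]:
                continue
            shared = (p["produces"] & c["consumes"]) - currency
            for metabolite in shared:
                links.append((producer, consumer, metabolite))
    return sorted(links)


links = infer_links(REACTIONS)
print(links)  # [('EnzymeA', 'EnzymeB', 'm1')]

# Without the currency filter, ATP and ADP create spurious connections.
print(len(infer_links(REACTIONS, currency=set())))  # 3
```

Note that EnzymeC is never linked, although it consumes metabolite m1: it is in a different compartment. The sketch says nothing about orthology, which is the harder part of the problem.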

 

(1.1) KEGG (5 marks)

 


The Kyoto Encyclopedia of Genes and Genomes is one of the oldest and best curated databases of metabolic and functional pathways. It stores hand-curated pathways for a number of model organisms and supports computational inference for other organisms by determining orthologues.

 

Access the KEGG Web site.

 


KEGG identifies organisms in the database by three- or four-letter codes, e.g. Homo sapiens: hsa, Saccharomyces cerevisiae: sce. Most of our fungi have manually curated genes annotated in KEGG, but not all. Thus ...

... if your assigned organism is:    use this alternative instead:
Gibberella zeae                      Magnaporthe griseae
Neurospora crassa                    Magnaporthe griseae
Aspergillus terreus                  Aspergillus nidulans
Coprinopsis cinereae                 Cryptococcus neoformans


 

Click on the KEGG Organism link to access the list of organism abbreviations. Find the abbreviation for your organism (or your alternative). Record the code.

 

Navigate to the KEGG2 entry page, which contains a number of options to search the database contents. I have instantiated the links with searches relevant to yeast Mbp1 and the cell cycle in the list below; try the links and make sure you understand what they contain.


  • The simplest option is to search for a gene name. KEGG will return all matches to that name in record titles and annotations.
  • You can execute a BLAST search against the database and thus search with domain sequences, such as the APSES domain, rather than with entire genes.
  • You can define a ligand or enzyme.
  • You can use the PATHWAY search tool to retrieve information on a particular system.


The gene search returns a list of genes; one of them should be sce:YDL056W, the KEGG code for Mbp1's systematic name. Access that record and click on the Help button at the top of the record to find information about what the returned results contain.

Not all of KEGG's curated genes contain a link to a pathway record. However, Mbp1 does. There is a line labeled Pathway with a link to the protein's curated pathway information: sce04111. (This pathway code 04111 also should have come up as one of the pathways returned via the search for "cyclin-dependent kinase" as an enzyme, or as one of the pathways returned by the pathway search for "cell-cycle".)
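The identifiers used above (organism code plus systematic gene name, and organism code plus pathway number) can also be assembled programmatically. The following is only a sketch: it assumes that KEGG offers a REST-style interface under the base URL shown, with "get" and "link" operations, which is not part of this assignment and may change. The helper names are hypothetical, and the code merely builds URLs; it does not fetch anything.

```python
from urllib.parse import quote

# Assumed base URL of KEGG's REST-style interface (not part of the assignment).
KEGG_REST = "https://rest.kegg.jp"


def kegg_gene_id(org_code, locus):
    """Combine an organism code and a systematic gene name, e.g. 'sce' + 'YDL056W'."""
    return f"{org_code.lower()}:{locus.upper()}"


def kegg_url(operation, *args):
    """Build a URL of the form <base>/<operation>/<arg1>/<arg2>..."""
    return "/".join([KEGG_REST, operation] + [quote(a, safe=":+") for a in args])


gene = kegg_gene_id("sce", "YDL056W")
print(kegg_url("get", gene))              # URL for the gene record
print(kegg_url("link", "pathway", gene))  # URL for pathways linked to the gene
print(kegg_url("get", "sce04111"))        # URL for the yeast cell-cycle pathway record
```

Replacing "sce" with the code you recorded for your organism gives the corresponding identifiers for your own searches, provided the gene has been annotated there.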


 

Follow the link to the yeast cell-cycle pathway.

 


The position of Mbp1 is emphasized with a red box. All boxes in this reference pathway are green; this indicates that a gene for that component of the pathway has been curated and stored in the database. The boxes are linked to the respective KEGG gene pages. The phases of the cell cycle G1 - S - G2 - M are indicated at the bottom of the chart.


 

Use the drop-down menu to switch to the comparative pathway map curated for your organism.

 


Note that this displays the same map, but now some of the boxes are white (the KEGG curators have not annotated an orthologue for these genes in the KO (KEGG Orthology) database) and the green boxes are now linked to your organism's genes instead of the yeast genes. This is a very convenient way to check which components of the well-described yeast pathways have been curated as conserved in your organism.

 

Briefly (!) compare this pathway with the cell-cycle diagram contained in Figure 4 of Richard Young's conceptual analysis of the yeast cell cycle, based on transcriptional regulatory networks:

  • Are the same genes annotated for the G1/S transition in both representations?
  • Do the two representations of the G1/S transition have the same connections shown?
  • Young's Figure 4 focusses on a cyclical progression (in time). Is time represented in a similar way in the KEGG map?
  • Not all of the boxes in your organism's version of the yeast cell cycle are shown in green. What can you conclude from the presence or absence of annotated genes in KEGG about how the G1/S transition is regulated in your organism?

(5 marks)

 

If you explore the various organisms for which this map has been transferred by homology (options menu at the top), you will notice that for most organisms the KEGG curators have mapped far fewer genes to this yeast pathway (the situation is somewhat better with metabolic pathways, by the way, since computational inference through orthology is less ambiguous). But we have already determined in the second assignment that we can find Mbp1 orthologues in all of these organisms! And they are quite easy to find in KEGG too. Click on the Mbp1 link on the yeast pathway map to go to the KEGG gene record for the Mbp1 protein. In the SSDB row, click the Ortholog button. Orthologs for all of the organisms (or at least their close relatives in the KEGG database) have been precomputed. It should take only a moment to check that the orthologue in your organism is listed too, even if the box was "white" on the pathway page. This is not an error; it just reflects different levels of annotation, curation and inference.

Once again, we are back at a familiar problem: much, and increasingly more, of our annotation is based on analogy and inference. We study one system experimentally in a model organism, then we attempt to map the components to another organism. But pursuing the idea of orthology in order to map function is tricky. Even orthologues may have diverged in evolution to distinct and dissimilar functional systems. Note for example that in yeast Mbp1 binds to Swi6 (the MBF complex) and Swi6 can also bind to Swi4, an Mbp1 homologue (the SBF complex). In many CRMs (cis-regulatory modules) the respective binding sites of Mbp1 and Swi4 are closely juxtaposed. However, only the Saccharomycotina seem to possess orthologues of Swi4, at least in the sense of genes that are more similar to Swi4 than to Mbp1. In our phylogenetic analysis, though, we noted an Mbp1 paralogue in the fungal cenancestor, which then was at the root of the Swi4/MbpA genes of our tree. We have called its descendant Swi4 in some cases and MbpA in others, since we have annotated it from the perspective of similarity to yeast. Is Mbp1 a gene that has taken on functions that are distinct from Swi4? MBF and SBF appear to be two complementary systems, presumably each having taken over some part of the space of functions from the other and probably acquired a few novel functions along the way. But the situation in the other fungi cannot be unambiguously inferred from the evidence we have considered.

I hope that this short discussion has illuminated the problems associated with mapping functions between organisms, based on gene similarity. To paraphrase the issue one more time: we are mapping concepts to biology, but "concepts" and "biology" exist in two different worlds. It is helpful, indeed crucial, to explain biology in terms of higher-order concepts. This is what we ultimately mean by "understanding" and indeed, if we did not try this, we would be merely "butterfly-collecting". But never, never fall into the trap of basing your biological conclusions - e.g. functional equivalence of biological objects - mechanically on a computed similarity of concepts (such as gene similarity, pathway position, GO annotation etc.). The mapping of concept to object may be arbitrarily imprecise and as a consequence, so is the equivalence, once we apply it to the "real" world.
 

(2) Interactions

 
 

(2.1) Interaction databases (5 marks)

 
In high-throughput biology, the genome was the beginning. As Sydney Brenner has phrased it: we have now written the "white-pages" of the cell, fulfilling the "CAP-criterion" (Comprehensive, Accurate and Permanent). The next level is figuring out the way the parts work - if you will, the "Yellow Pages" - and many of us expect that substantial progress can be made by mapping their interactions. After all, physiological function can be described to a large part as the result of physical interaction.

Please note that there are different types of physical interactions. We most often think of complexes, either stable or transient homo- or heterooligomers when we speak of physical interactions. But there are also interactions between substrates and products and not all of them correspond to classical enzymatic pathways. Phosphorylation and dephosphorylation are processes of key importance in signal transduction and acetylation/deacetylation plays a critical role in regulatory pathways. Here, the substrates are proteins and the interaction with the modifying enzyme is of course a physical interaction.

Genetic interactions, on the other hand, are another story. Here the word interaction is used in an entirely different sense: it is not synonymous with contact; it is synonymous with influence. In fact, most proteins that display genetic interactions would not be expected to interact physically as well. (Why? Think.) It is important not to mix up the two. To understand what genetic interactions imply, think of the following analogy. If I were to break the wrist of my right arm, my survival would probably not be affected. My left arm would provide sufficient redundancy for most tasks. What about breaking the right index finger, or spraining the elbow as well? Painful, but functionally not much worse than breaking the wrist alone. What about indigestion, or Alzheimer's, in addition to the fracture? Annoying, but really not significantly more so with or without a broken wrist. What about a broken left wrist? That would be bad. Losing the function of both hands is much, much worse than losing the function of only one hand. This is the kind of functional ordering that genetic interactions achieve: if two genes are active in the same system (like the right wrist and index finger) they will not display genetic interaction. The pathway is blocked and it matters little whether it is blocked at one or two points. If two genes work in completely different systems, they will also not show genetic interactions (like a fracture, combined with indigestion). Only if two genes affect parallel, mutually redundant pathways (left and right arm) will their joint deletion cause a critical situation for the affected organism. If the organism dies, we call this a synthetic lethal effect.
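The logic of the analogy can be written down as a small toy model. This is a sketch under stated assumptions, not a description of how genetic interaction screens are analysed: the "genes" are the hypothetical body parts of the analogy, and the organism is assumed to survive as long as at least one of two mutually redundant pathways has no deleted member.

```python
from itertools import combinations

# Hypothetical toy system: two parallel, mutually redundant pathways (the two
# arms) plus one unrelated, non-essential system (digestion in the analogy).
REDUNDANT_PATHWAYS = {
    "right_arm": {"right_wrist", "right_index_finger"},
    "left_arm": {"left_wrist"},
}
UNRELATED = {"digestion"}


def viable(deleted):
    """The organism survives if at least one redundant pathway is fully intact."""
    return any(not (genes & deleted) for genes in REDUNDANT_PATHWAYS.values())


def synthetic_lethal_pairs():
    """Return pairs whose single deletions are viable but whose joint deletion is not."""
    genes = sorted(set().union(*REDUNDANT_PATHWAYS.values()) | UNRELATED)
    pairs = []
    for a, b in combinations(genes, 2):
        singles_ok = viable({a}) and viable({b})
        if singles_ok and not viable({a, b}):
            pairs.append((a, b))
    return pairs


print(synthetic_lethal_pairs())
# [('left_wrist', 'right_index_finger'), ('left_wrist', 'right_wrist')]
```

Only pairs that span the two redundant pathways are reported. Two members of the same pathway, or any pairing with the unrelated system, show no interaction in this model, just as in the analogy.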


Let us briefly explore the BioGRID and IntAct interaction databases to retrieve interactions for yeast Mbp1. IntAct stores only physical interactions. BioGRID stores physical and genetic interactions.

 

  • Access the BioGRID database at the Samuel-Lunenfeld Research Institute, Mount Sinai Hospital, Toronto.
  • Access the IntAct database at the EBI.

Search for interactions of the Mbp1 gene by entering the gene name into the form field.

  • Follow the correct link in BioGRID for Saccharomyces cerevisiae Mbp1 (YDL056W). All genes listed in that table have demonstrated interactions with Mbp1.
  • The EBI search directly returns a table of pairwise interactions; both partners are listed as a pair and, in each pair one of the partners should be YDL056W.
  • How many different physical interaction detection methods do the IntAct records list? Follow the links and read their definitions. (Bravo to the IntAct developers, for defining their terms. In a better world, all the semantics of our databases should be similarly defined to be meaningful.)
  • List what general experimental type(s) the BioGRID interactors come from. (In particular, note the difference between yellow and green boxes.)

You will note that some, but not all, of the physical interactions listed by BioGRID and IntAct are the same according to Francis Ouellette's definition: same organism, same proteins, same experiment, same publication.

  • Which of the IntAct Mbp1 interactions are the same in BioGRID?
  • Which of the IntAct interactions appear in the KEGG map?
  • Check whether all of the interactions between the regulators of the G1/S phase suggested by Figure 4 of the Young group's publication are present in BioGRID interactions.

(5 marks)
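The identity criterion quoted above (same organism, same proteins, same experiment, same publication) can be expressed as a comparison key. The sketch below is an illustration only: the record fields, the method label and the publication identifiers are made-up placeholders, not data from BioGRID or IntAct, and in practice the two databases may name experiment types differently, so the labels would first have to be mapped onto a common vocabulary.

```python
def interaction_key(record):
    """Build a key from organism, unordered protein pair, experiment and publication."""
    pair = frozenset((record["a"].upper(), record["b"].upper()))
    return (record["organism"], pair, record["method"].lower(), record["publication"])


def shared_interactions(db1, db2):
    """Return the records of db1 that have a counterpart with the same key in db2."""
    keys2 = {interaction_key(r) for r in db2}
    return [r for r in db1 if interaction_key(r) in keys2]


# Made-up placeholder records (hypothetical methods and publication IDs).
db_one = [
    {"organism": "Saccharomyces cerevisiae", "a": "MBP1", "b": "SWI6",
     "method": "method X", "publication": "PUB-0001"},
    {"organism": "Saccharomyces cerevisiae", "a": "MBP1", "b": "PROT_Y",
     "method": "method X", "publication": "PUB-0002"},
]
db_two = [
    # Same interaction as the first record above, with the partners swapped.
    {"organism": "Saccharomyces cerevisiae", "a": "Swi6", "b": "Mbp1",
     "method": "Method X", "publication": "PUB-0001"},
    # Same proteins, but a different publication: not "the same" interaction.
    {"organism": "Saccharomyces cerevisiae", "a": "MBP1", "b": "PROT_Y",
     "method": "method X", "publication": "PUB-0003"},
]

common = shared_interactions(db_one, db_two)
print([(r["a"], r["b"]) for r in common])  # [('MBP1', 'SWI6')]
```

Using an unordered pair for the two proteins means that it does not matter which partner a database lists first.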

 
Now, what about your organism? Could you infer interactions between proteins whose orthologs interact in another organism? Such predicted interactions are called interologs (interacting homologs). Unfortunately, this does not appear to be possible here: confident prediction of interologs can only be achieved in cases of >80% joint sequence identity of both pairs [1], a level of similarity that (I believe) none of our Mbp1 proteins achieves.
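To see how such a threshold would be applied, here is a minimal sketch. It assumes that "joint sequence identity" is computed as the geometric mean of the two individual percent identities (protein A with its homologue, protein B with its homologue); please check this assumption against the cited reference [1] before relying on it. The function names and the example values are hypothetical.

```python
import math


def joint_identity(identity_a, identity_b):
    """Geometric mean of two percent identities (assumed definition, see lead-in)."""
    return math.sqrt(identity_a * identity_b)


def confident_interolog(identity_a, identity_b, threshold=80.0):
    """True if the joint identity exceeds the threshold (>80% by default)."""
    return joint_identity(identity_a, identity_b) > threshold


# Hypothetical percent identities for two pairs of homologues.
print(confident_interolog(90.0, 80.0))  # True: joint identity is about 84.9
print(confident_interolog(40.0, 95.0))  # False: joint identity is about 61.6
```

With a geometric mean, one weakly conserved partner pulls the joint value down sharply, even if the other partner is almost identical.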

 
 

(3) Summary of Resources

 

Literature


Sites

   

[End of assignment]

If you have any questions at all, don't hesitate to mail me at boris.steipe@utoronto.ca or post your question to the Course Mailing List.