Difference between revisions of "Assignment 6"

Revision as of 04:25, 7 December 2006

Systems biology is about systems ... but what is a system, anyway? Definitions abound; the recurring theme is that of connected "components" forming a complex "whole". Complexity describes the phenomenon that properties of components can depend on the "context" of a component; the context is the entire system. The fact that it can be meaningful to treat such a set of components in isolation, dissociated from the rest of their environment, tells us that not all components of biology are connected to the same degree. Some have many, strong, constant interactions (often these are what we refer to as "systems"); others have few, weak, sporadic interactions and thus can often be dissociated in analysis. Given that complex biological components can be perturbed by any number of generic environmental influences as well as specific modulating interactions, it is non-trivial to observe that we can in many cases isolate some components or sets of components and study them in a meaningful way. A useful mental image is that of clustering in datasets: even if we can clearly define a cluster as a number of elements that are strongly connected to each other, that usually still means some of these elements also have some connections with elements from other clusters. Moreover, our concept of systems is often hierarchical, discussing biological phenomena in terms of entities or components, subsystems, systems, supersystems ... As well, it often focusses on particular dimensions of connectedness: physical contact in the study of complexes, material transformations in the study of metabolic systems, or information flow in the study of signalling systems and their higher-order assemblies in control and development.

At the end of the day, a biological system is a conceptual construct, a model we use to make sense of nature; nature, however, ever pragmatic, knows nothing of systems.

In this sense systems biology could be dismissed as an artificial academic exercise, semantics, even molecular mysticism, if you will (and some experimentalists do take this position), IF systems biology were not curiously successful in its predictions. For example, we all know that metabolism is a network, crosslinked at every opportunity; still, the concept of pathways appears to correlate with real, observable properties of metabolite flux in living cells. Nature appears to prefer constructing components that interact locally (complexes, modules and systems) in a way that encapsulates their complex behaviour, rather than leaving them free to interact randomly with any number of other components in a large, disordered bag.

The mental construct of a "system" thus provides a theory of the functional organisation of biological components.

In this assignment we will briefly explore two of the tools that are currently in common use to map molecular observations to integrated systems concepts and consider what we are learning from the mapping.

Preparation, submission and due date

Read carefully. Be sure you have understood all parts of the assignment and understand what you are expected to do! Sadly, we see too many assignments in which students have overlooked directive verbs such as explain, enumerate, list, name, compare, contrast, describe, summarize, outline, apply, justify, establish, defend, account for, sketch, clarify, state, illustrate or discuss. If any of these verbs don't catch your attention, you need to get more coffee before you start. Prepare a Microsoft Word document with a title page that contains:

your full name
your Student ID
your e-mail address
the organism name you have been assigned (see below)

Follow the steps outlined below. You are encouraged to write your answers in short answer form or point form, like you would document an analysis in a laboratory notebook. However, you must

document what you have done,
note what Web sites and tools you have used,
paste important data sequences, alignments, information etc.

If you do not document the process of your work, we will deduct marks. Try to be concise, not wordy! Use your judgement: are you giving us enough information so we could exactly reproduce what you have done? If not, we will deduct marks. Avoid RTF and unnecessary formatting. Do not paste screendumps. Keep the size of your submission below 1.5 MB.

Write your answers into separate paragraphs and give each its title. Save your document with a filename of: A6_family name.given name.doc (for example my sixth assignment would be named: A6_steipe.boris.doc - and don't switch the order of your given name and family name please!)

Finally e-mail the document to boris.steipe@utoronto.ca before the due date.

Your document must not contain macros. Please turn off and/or remove all macros from your Word document; we will disable macros, since they pose a security risk.

With the number of students in the course, we have to economize on processing the assignments. Thus we will not accept assignments that are not prepared as described above. If you have technical difficulties, contact me.

The due date for the assignment is Thursday, December 7, at 24:00 (last day of class). In case you need more time since the assignment was posted late, an extension is automatically granted to Tuesday, December 19, at 10:00 in the morning.

Grading

Don't wait until the last day to find out there are problems! Assignments that are received past the due date will have one mark deducted at the first minute of every twelve-hour period past the due date. Assignments received more than 5 days past the due date will not be assessed.
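The late-penalty rule can be worked through as a short calculation. The following Python sketch is only an illustration of the arithmetic, under stated assumptions: the due date is taken to be the extended deadline with the year of this revision (2006), a submission that is exactly twelve hours late is assumed to still fall in the first period, and the names `DUE`, `PERIOD`, `CUTOFF` and `late_deduction` are made up for this example.

```python
from datetime import datetime, timedelta

# Assumption: the extended deadline, using the year of this revision.
DUE = datetime(2006, 12, 19, 10, 0)
PERIOD = timedelta(hours=12)   # one mark is deducted per started period
CUTOFF = timedelta(days=5)     # later submissions are not assessed


def late_deduction(due, received):
    """Return the marks deducted for lateness, or None if not assessed."""
    late_by = received - due
    if late_by <= timedelta(0):
        return 0
    if late_by > CUTOFF:
        return None
    # Ceiling division: every started twelve-hour period costs one mark.
    # Assumption: exactly 12 h late still counts as the first period.
    return -(-late_by // PERIOD)


for hours in (0, 1, 23, 121):
    received = DUE + timedelta(hours=hours)
    print(hours, "h late ->", late_deduction(DUE, received))
```

Under these assumptions a submission that arrives one minute late already loses one mark, and one that arrives 23 hours late loses two.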

Marks are noted below in the section headings for each of the tasks. A total of 10 marks will be awarded if your assignment answers all of the questions. A total of 2 bonus marks (up to a maximum of 10 overall) can be awarded for particularly interesting findings or insightful comments. A total of 2 marks can be subtracted for lack of form or for glaring errors. The marks you receive will

count directly towards your final marks at the end of term, for BCH441 (undergraduates), or
be divided by two for BCH1441 (graduates).

(1) Pathways

Pathways are perhaps the earliest biochemical representation of cellular systems. They are a particularly active area of bioinformatics research since it should be possible to construct pathway descriptions essentially in an automatic fashion from databases of known properties of components. For instance if Enzyme A produces metabolite 1 and Enzyme B consumes metabolite 1, even a computer can figure out that A and B could be on the same pathway. However, what if metabolite 1 is ATP? Or H₂O? Careful, manual curation of pathway data is going to be with us for some time to come.
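The inference described above, and the way it breaks down for ubiquitous metabolites, can be sketched in a few lines of Python. This is a toy illustration, not the way KEGG or any other database constructs pathways: the enzyme and metabolite names, the `REACTIONS` table, the `CURRENCY` set and the function `candidate_links` are all hypothetical, and simply ignoring a fixed list of "currency" metabolites is an assumed stand-in for what a human curator would decide case by case.

```python
from itertools import product

# Hypothetical toy data: enzyme -> (consumed metabolites, produced metabolites).
REACTIONS = {
    "EnzymeA": ({"metabolite0", "ATP"}, {"metabolite1", "ADP"}),
    "EnzymeB": ({"metabolite1", "H2O"}, {"metabolite2"}),
    "EnzymeC": ({"metabolite9", "ATP"}, {"metabolite10", "ADP"}),
    "EnzymeD": ({"ADP"}, {"ATP"}),
}

# Assumed list of ubiquitous metabolites that should not link two enzymes.
CURRENCY = {"ATP", "ADP", "H2O"}


def candidate_links(reactions, ignore=frozenset()):
    """Return sorted (producer, consumer) pairs that share a metabolite."""
    links = set()
    for (a, (_, produced)), (b, (consumed, _)) in product(reactions.items(), repeat=2):
        # Link a -> b if something a produces is consumed by b,
        # not counting the metabolites we were told to ignore.
        if a != b and (produced & consumed) - ignore:
            links.add((a, b))
    return sorted(links)


naive = candidate_links(REACTIONS)
filtered = candidate_links(REACTIONS, ignore=CURRENCY)
print("naive:   ", naive)
print("filtered:", filtered)
```

In this toy example the naive rule connects unrelated enzymes through ATP and ADP, whereas the filtered rule keeps only the link from EnzymeA to EnzymeB through metabolite 1. Deciding what belongs in the ignore list is exactly the kind of judgement that manual curation supplies.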

(1.1) KEGG (5 marks)

The Kyoto Encyclopedia of Genes and Genomes is one of the oldest and best curated databases of metabolic and functional pathways. Access the KEGG Web site.

KEGG stores its curated organisms according to three- or four-letter codes: Homo sapiens → hsa, Saccharomyces cerevisiae → sce. Most of our fungi have been annotated in KEGG.

Click on the Organism button to open the interface to find organism abbreviations. Find the abbreviation for your organism (about half of them are there). If you think your organism is not in KEGG or you can't find it, check the Fallback Data page. Record the code.

Navigate to the KEGG2 page and find the link to BLAST search against genes in the database. Upload the APSES domain sequence for your organism, and click on the "Compute" button to execute the search. Yeast Mbp1 should appear in the list of hits; click on its link to access the KEGG gene record.

Click on the Help button on top of the record to find information about what the returned results contain.

In this gene record, there is a link to the protein's curated pathway information. Use the Fallback Data page if you can't find it. Access the pathway. The position of Mbp1 is emphasized with a red box. All boxes in this reference pathway are green and linked to the respective KEGG gene pages.

Use the drop-down menu to switch to the comparative pathway map curated for your organism. Note that this is the same map, but now some of the boxes are white (the KEGG curators have not annotated an orthologue for these genes in the KO (KEGG Orthology) database) and the green boxes are now linked to your organism's genes. This is a very convenient way to check which parts of the well-described yeast pathways have been curated as conserved in your organism.

Briefly (!) compare this pathway with the cell-cycle diagram contained in Figure 4 of Richard Young's conceptual analysis of the yeast cell cycle, based on transcriptional regulatory networks (see PDF link in the resources section):

Are the same genes involved?
Are the same connections shown?
Is the same concept of cyclical progression (time) that is the focus of Young's image represented in the KEGG pathways? (5 marks)

If you play around with the various versions of the annotation, you will notice quickly that most organisms have very few genes cross-annotated with yeast by the KEGG curators (the situation is somewhat better with metabolic pathways, by the way, since computational inference through orthology is less ambiguous). But we have already determined in the second assignment that we can find Mbp1 orthologues in all of these organisms! And they are quite easy to find in KEGG too. Click on the Mbp1 link on the yeast pathway map to take you to the KEGG gene record for the Mbp1 protein. In the SSDB row, click the Ortholog button. The resulting list contains orthologs for all of the organisms (or at least their close relatives in the KEGG database).

Once again, we are back at the same problem: much, and increasingly more, of our annotation is based on analogy and inference. We study one system experimentally in a model organism, then we attempt to map the components to another organism. But pursuing the idea of orthology in order to map function is tricky. Even orthologues may have diverged in evolution to distinct and dissimilar functional systems. Note for example that in yeast Mbp1 binds to Swi6 (the MBF complex) and Swi6 can also bind to Swi4, an Mbp1 homologue (the SBF complex). In many CRMs (cis-regulatory modules) their respective binding sites are closely juxtaposed. However, only the Saccharomycotina seem to possess orthologues to Swi4. MBF and SBF appear to be two complementary systems derived from the same progenitor gene, presumably each having taken over some part of the space of functions from the other and probably having acquired a few novel functions along the way.

I hope that this short discussion has illuminated the problems associated with mapping functions between organisms based on gene similarity. To paraphrase the issue once again: we are mapping concepts to biology, but "concepts" and "biology" exist in two different worlds. It is helpful, indeed crucial, to explain biology in terms of higher-order concepts. This is what we ultimately mean by "understanding" and indeed, if we did not try this, we would be merely "butterfly-collecting". But never, never fall into the trap of basing your biological conclusions - equivalence of objects - mechanically on the equivalence of concepts (such as gene similarity, pathway position, GO annotation etc.). The mapping of concept to object may be arbitrarily imprecise and, as a consequence, so is the equivalence once we apply it to biology.

(2) Interactions

(2.1) BioGRID (5 marks)

In high-throughput biology, the genome was the beginning. As Sydney Brenner has phrased it: we have now written the "white-pages" of the cell, fulfilling the "CAP-criterion" (Comprehensive, Accurate and Permanent). The next level is figuring out the way the parts work, and many hope substantial progress can be made by mapping their interactions. After all, physiological function can be described in large part as the result of physical interaction. Let us briefly explore the BioGRID interaction database to retrieve interactions for yeast Mbp1.

Access the BioGRID.

Search for interactions of the Mbp1 gene. If your search fails, follow the link on the Fallback Data page.

List what experiment type(s) the interactors come from.
Check which of these appear in the KEGG map.
Check which of these appear in Figure 4 of the Young group's publication.
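If you prefer to keep this comparison in a small script rather than in a hand-written table, the three checks above amount to simple set membership tests. The sketch below uses placeholder identifiers only (`GeneA` to `GeneD`) and example experiment labels; none of these are actual results for Mbp1, and the function name `compare` is made up for this illustration. Replace the placeholders with what you actually retrieve from BioGRID, the KEGG map and Figure 4.

```python
# Placeholder data only: replace with the names and experiment types
# that you actually retrieve. None of these are real results.
biogrid_interactors = {
    "GeneA": {"Two-hybrid"},
    "GeneB": {"Affinity Capture-MS", "Two-hybrid"},
    "GeneC": {"Synthetic Lethality"},
}
kegg_map_genes = {"GeneA", "GeneD"}      # genes drawn on the KEGG map
young_fig4_genes = {"GeneA", "GeneB"}    # genes drawn in Figure 4


def compare(interactors, kegg, figure):
    """Return one row per interactor: name, experiment types, two membership flags."""
    rows = []
    for gene in sorted(interactors):
        rows.append((gene, sorted(interactors[gene]), gene in kegg, gene in figure))
    return rows


rows = compare(biogrid_interactors, kegg_map_genes, young_fig4_genes)
for gene, experiments, in_kegg, in_fig4 in rows:
    print(f"{gene}\t{', '.join(experiments)}\tKEGG: {in_kegg}\tFigure 4: {in_fig4}")
```

The printed rows can be pasted into your document as a record of what you compared.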

(3) Summary of Resources

Links

Sequences

All APSES domains

[End of assignment]

If you have any questions at all, don't hesitate to mail me at boris.steipe@utoronto.ca or post your question to the Course Mailing List

@@ Line 19: / Line 19: @@
 &nbsp;
-Systems biology is about '''systems''' ... but what is a system, anyway? Definitions abound, the recurring theme is that of connected "components" forming a complex "whole". Complexity describes the phenomenon that properties of components can depend on the "context" of a component; the context is the entire system. That fact that it can be meaningful to treat such a set of components in isolation, dissociated from their other environment, tells us that not all components of biology are connected to the same degree. Some have many, strong, constant interactions (often these are what we refer to "systems"), others have few, weak, sporadic interactions and thus can often be dissociated in analysis. Given the fact that complex biological components can be perturbed by any number of generic environmental influences as well as specific modulating interactions, it is non-trivial to observe that we '''can''' in many cases isolate some components or sets of components and study them in a meaningful way. A useful mental image is that of clustering in datasets: even if we can clearly define a cluster as a number of elements that are strongly connected to each other, that usually still means some of these elements also have some connections with elements from other clusters. Moreover, our concepts of systems is often hierarchical, discussing biological phenomena in terms of entities or components, subsystems, systems, supersystems ..., as well it often focusses on particular dimensions of connectedness, such as physical contact, in the study of complexes, material transformations, in the study of metabolic systems, or information flow, in the study of signalling systems and their higher-order assemblies in control and development.
+Systems biology is about '''systems''' ... but what is a system, anyway? Definitions abound, the recurring theme is that of connected "components" forming a complex "whole". Complexity describes the phenomenon that properties of components can depend on the "context" of a component; the context is the entire system. That fact that it can be meaningful to treat such a set of components in isolation, dissociated from their other environment, tells us that not all components of biology are connected to the same degree. Some have many, strong, constant interactions (often these are what we refer to "systems"), others have few, weak, sporadic interactions and thus can often be dissociated in analysis. Given the fact that complex biological components can be perturbed by any number of generic environmental influences as well as specific modulating interactions, it is non-trivial to observe that we '''can''' in many cases isolate some components or sets of components and study them in a meaningful way. A useful mental image is that of clustering in datasets: even if we can clearly define a cluster as a number of elements that are strongly connected to each other, that usually still means some of these elements also have some connections with elements from other clusters. Moreover, our concepts of systems is often hierarchical, discussing biological phenomena in terms of entities or components, subsystems, systems, supersystems ... as well, it often focusses on particular dimensions of connectedness, such as physical contact, in the study of complexes, material transformations, in the study of metabolic systems, or information flow, in the study of signalling systems and their higher-order assemblies in control and development.
 At the end of the day, a biological system is a '''conceptual''' construct, a model we use to make sense of nature; nature however, ever pragmatical, knows nothing of systems.
-In this sense systems biology could be dismissed as an artificial academic exercise, semantics, even molecular Mysticism, if you will (and some experimentalists do take this position), '''IF''' systems biology were not curiously successful in its predictions. For example, we all know that metabolism is a network, crosslinked at every opportunity; still, the concept of pathways appears to correlate with real, observable properties of substrate flow in living cellls. Nature appears to prefer constructing components that interact locally, complexes, modeules and systems, in a way that encapsulates their complex behaviour, rather than leaving them free to interact randomly with any other number of components in a large, disordered bag.
+In this sense systems biology could be dismissed as an artificial academic exercise, semantics, even molecular mysticism, if you will (and some experimentalists do take this position), '''IF''' systems biology were not curiously successful in its predictions. For example, we all know that metabolism is a network, crosslinked at every opportunity; still, the concept of pathways appears to correlate with real, observable properties of metabolite flux in living cellls. Nature appears to prefer constructing components that interact locally, complexes, modules and systems, in a way that encapsulates their complex behaviour, rather than leaving them free to interact randomly with any other number of components in a large, disordered bag.
 The mental construct of a "system" thus provides a theory of the functional organisation of biological components.
-In this assignment we will briefly explore two of the tools that are currently in common use to map molecular observations to integrated systems concepts and consider how complete that mapping currently is.
+In this assignment we will briefly explore two of the tools that are currently in common use to map molecular observations to integrated systems concepts and consider what we are learning from the mapping.
 <div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
@@ Line 48: / Line 48: @@
 Write your answers into separate paragraphs and give each its title. Save your document with a filename of:
-<code>A3_family name.given name.doc</code>
+<code>A6_family name.given name.doc</code>
-<small>(for example my first assignment would be named: A3_steipe.boris.doc - and don't switch the order of your given name and familyname please!)</small>
+<small>(for example my sixth assignment would be named: A6_steipe.boris.doc - and don't switch the order of your given name and familyname please!)</small>
 Finally e-mail the document to [mailto:boris.steipe@utoronto.ca boris.steipe@utoronto.ca] before the due date.
@@ Line 78: / Line 78: @@
 &nbsp;
-Pathways are perhaps the earliest biochemical representation of cellular systems. They are a particularly active area of bioinformatics research, because many have a sense that it should be possible to construct patway descriptions essentially automatically from databases of known properties of components. For instance if Enzyme A produces metabolite 1 and Enzyme B consumes metabolite 1, even a computer can figure out that A and B could be on the same pathway. However, what if metabolite one is ATP? Or H<sub>2</sub>O? Careful, manual curation of pathway data is going to be with us for a long time.
+Pathways are perhaps the earliest biochemical representation of cellular systems. They are a particularly active area of bioinformatics research since it should be possible to construct pathway descriptions essentially in an automatic fashion from databases of known properties of components. For instance if Enzyme A produces metabolite 1 and Enzyme B consumes metabolite 1, even a computer can figure out that A and B could be on the same pathway. However, what if metabolite 1 is ATP? Or H<sub>2</sub>O? Careful, manual curation of pathway data is going to be with us for some time to come.
 <div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
-=== (1.1)KEGG (5 marks)===
+=== (1.1) KEGG (5 marks)===
 </div>
 &nbsp;<br>
@@ Line 87: / Line 87: @@
 The [http://www.genome.ad.jp/kegg/ '''Kyoto Encyclopedia of Genes and Genomes'''] is one of the oldest and best curated databases of metabolic and functional pathways. Access the [http://www.genome.ad.jp/kegg/ KEGG Web site].
-record three letter code
+Kegg stores it's curated organisms according to three or four letter codes: ''homo sapiens'' &rarr; <code>hsa</code>, ''saccharomyces cerevisiae'' &rarr; <code>sce</code>. Most of our fungi have been annotated in KEGG.
-Navigate to the [http://www.genome.ad.jp/kegg/kegg2.html KEGG2 page] and find the link to BLAST search against genes in the database. Upload the APSES domain sequence for your organism, and click on the "Compute" button to execute the search. In the list of hits, click on the link yeast Mbp1 (if there are more than one, use the one associated with Mbp1's systematic name).
+&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+* Click on the '''Organism''' button to open the interface to find organism abbreviations. Find the abbreviation for your organism (about half of them are there). If you think your organism is not in KEGG or you can't find it, check the [[Assignment_6_fallback_data|'''Fallback Data page''']]. Record the code.
-Click on the Help button on top of the record to find information about what the returned results contain.
+</div>
+&nbsp;<br>
-Access the Pathway associated with this gene. The position of Mbp1 is emphasized with a red box.
-Study the cell cycle control system: note that mbp1 binds to swi6 and swi 6 can also bind to swi4.
-Briefly (!) compare this pathway with the cell-cycle diagram contained in Figure 4 of Richard Young's conceptual analysis of the yeast cell cycle, based on transcriptional regulatory networks (see resources, below; for the purpose of this assignment, let's consider this expert curated schematic to correspond to the standard of truth for the current thinking on this particular system.):
-(i) Are the same genes involved?
-(ii) Are the same connections shown?
-(iii) Is the same concept of time/progression being represented?
-Does your organism have a swi4 orthologue?
-click on swi4.
+Navigate to the [http://www.genome.ad.jp/kegg/kegg2.html KEGG2 page] and find the link to BLAST search against genes in the database. Upload the APSES domain sequence for your organism, and click on the "Compute" button to execute the search. In the list of hits, yeast Mbp1 should appear, click on its link to access the KEGG gene record.
-click on ortholgues.
+Click on the '''Help''' button on top of the record to find information about what the returned results contain.
-Find the three-letter code in that table. What do you conclude?
+In this gene record, there is a link to the protein's curated pathway information.Use the  [[Assignment_6_fallback_data|'''Fallback Data page''']] if you can't find it. Access the pathway. The position of Mbp1 is emphasized with a red box. All boxes in this reference pathway are green and linked to the respective KEGG gene pages.
-Falllback file: three letter codes, Swi 4 orthologues.
+Use the drop-down menu to switch to the comparative pathway map curated for your organism. Note that this is the same map, but now some of the boxes are white (the KEGG curators have not annotated an orthologue for these genes in the KO (KEGG Ontology) database) and the green boxes are now linked to your organism's gene. This is a very convenient way to check which parts of the well described yeast pathways have been curated to be conserved in your organism.
+Briefly (!) compare this pathway with the cell-cycle diagram contained in Figure 4 of Richard Young's conceptual analysis of the yeast cell cycle, based on transcriptional regulatory networks (see PDF link in the resources section):
 &nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
-*Task
+* Are the same genes involved?
+* Are the same connections shown?
+* Is the same concept of cyclical progression (time) that is the focus of Young's image represented in the KEGG pathways? (5 marks)
 </div>
 &nbsp;<br>
-Instruction
+If you play around with the various versions of the annotation, you will notice quickly that most organisms have very few genes cross-annotated with yeast by the KEGG curators (the situation is somewhat better with metabolic pathways, by the way, since computational inference through orthology is less ambiguous). But we have already determined in the second assignment that we can find Mbp1 orthologues in all of these organisms! And they are quite easy to find in KEGG too. Click on the Mbp1 link on the yeast pathway map to take you to the KEGG gene record for the Mbp1 protein. In the '''SSDB''' row, click the '''Ortholog''' button. Orthologs for all of the organisms (or at least their close relatives in the KEGG database).
-&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
-*Task.
+Once again, we are back at the same problem: much and increasingly more of our annotations are based on analogy and inference. We study one system experimentallly in a model organism, then we attempt to map the components to another organism. But pursuing the idea of orthology in order to map function is tricky. Even orthologues may have diverged in evolution to distinct and dissimilar functional '''systems'''. Note for example that in yeast Mbp1 binds to Swi6 (the MBF complex) and Swi6 can also bind to Swi4, an Mbp1 homologue (the SCF complex). In many CRMs (cis-regulatory modules) their respective binding sites are closely juxtaposed. However only the ''Saccharomycotina'' seem to posess orthologues to [http://www.ncbi.nlm.nih.gov/sutils/blink.cgi?pid=6320957 Swi4]. MBF and SCF appear to be two complementary systems derived from the same progenitor gene, presumably each having taken over some part of the space of functions from the other and probably acquired a few novel functions  along the way.
-</div>
+I hope that this short discussion has illuminated the problems associated with mapping functions between organisms, based on gene similarity. To paraphrase the issue once again: we are mapping concepts to biology, but "concepts" and "biology" exist in two different worlds. It is helpful, indeed crucial to explain biology in terms of higher-order concepts. This is what we ultimately mean by "understanding" and indeed, if we would not try this, we would be merely "butterfly-collecting". But never, never fall into the trap to base your biological conclusions - '''equivalence of objects''' - mechanically on the '''equivalence of concepts''' (such as gene similarity, pathway position, GO annotation ''etc.''). The mapping of concept to object may be arbitrarily imprecise and as a consequence, so is the equivalence, once we apply it to biology.
 &nbsp;
 &nbsp;
@@ Line 143: / Line 130: @@
 </div>
 &nbsp;<br>
+In high-throughput biology, the genome was the beginning. As Sydney Brenner has phrased it: we have now written the "white-pages" of the cell, fulfilling the "CAP-criterion" (Comprehensive, Accurate and Permanent). The next level is figuring out the way the parts work and many hope substantial progress can be made by mapping their interactions. After all,  physiological function can be described to a large part as the result of physical interaction. Let us briefly explore the BioGRID interaction database, to retrieve interactions for yeast Mbp1.
-We will thus briefly explore BioGRID, to retrieve interactions for Mbp1, the yeast gene for which we have been studying orthologues in other organisms.
 Access the [http://www.thebiogrid.org/ the BioGRID].
-Search for interactions of the Mbp1 gene.
+Search for interactions of the Mbp1 gene. If your search fails, follow the link on the ( [[Assignment_6_fallback_data|'''Fallback Data page''']] ).
-Briefly (!) comment on the two interactors that are retrieved:
-(i) What experiment type(s) do they come from?
-(ii) Do they appear in the KEGG map?
-(iii) Do they appear in Figure 4 of the Young group's publication?
 &nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
-*Task
+* List what experiment type(s) do the interactors come from.
+* Check which of these appear in the KEGG map.
+* Check which of these appear in Figure 4 of the Young group's publication.
 </div>
 &nbsp;<br>
-Instruction
-&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
-*Task.
-</div>
-&nbsp;
 &nbsp;
@@ Line 184: / Line 151: @@
 :* [http://biochemistry.utoronto.ca/undergraduates/courses/BCH441H/restricted/Young_2002_SC_RegulatoryNetwork.pdf '''(PDF, restricted)''' ''Al. et'' Richard Young (2002) Transcriptional Regulatory Networks in ''Saccharomyces cerevisiae'']
 :* [[Organism_list_2006|Assigned Organisms]]
+:* [http://www.thebiogrid.org/ the BioGRID]
 :* [[Assignment_6_fallback_data|'''Fallback Data page''']]
