Assignment 6

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

Systems biology is about systems ... but what is a system, anyway? Definitions abound; the recurring theme is that of connected "components" forming a complex "whole". Complexity describes the phenomenon that properties of components can depend on the "context" of a component; the context is the entire system. The fact that it can be meaningful to treat such a set of components in isolation, dissociated from the rest of their environment, tells us that not all components of biology are connected to the same degree. Some have many, strong, constant interactions (often these are what we refer to as "systems"); others have few, weak, sporadic interactions and thus can often be dissociated in analysis. Given that complex biological components can be perturbed by any number of generic environmental influences as well as specific modulating interactions, it is a non-trivial observation that we can in many cases isolate some components or sets of components and study them in a meaningful way.

A useful mental image is that of clustering in datasets: even if we can clearly define a cluster as a number of elements that are strongly connected to each other, some of these elements usually still have connections with elements from other clusters. Moreover, our concept of systems is often hierarchical, discussing biological phenomena in terms of entities or components, subsystems, systems, supersystems ... As well, it often focusses on particular dimensions of connectedness, such as physical contact in the study of complexes, material transformations in the study of metabolic systems, or information flow in the study of signalling systems and their higher-order assemblies in control and development.

At the end of the day, a biological system is a conceptual construct, a model we use to make sense of nature; nature, however, ever pragmatic, knows nothing of systems.

In this sense systems biology could be dismissed as an artificial academic exercise, semantics, even molecular mysticism, if you will (and some experimentalists do take this position), IF systems biology were not curiously successful in its predictions. For example, we all know that metabolism is a network, crosslinked at every opportunity; still, the concept of pathways appears to correlate with real, observable properties of metabolite flux in living cells. Nature appears to prefer constructing components that interact locally, as complexes, modules and systems, in a way that encapsulates their complex behaviour, rather than leaving them free to interact randomly with any number of other components in a large, disordered bag.

The mental construct of a "system" thus provides a framework to build concepts of the functional organisation of biological components.

In this assignment we will briefly explore two of the tools that are currently in common use to map molecular observations to integrated systems concepts and consider what we are learning from the mapping.

Preparation, submission and due date

Read carefully: Be sure you have understood all parts of the assignment and understand what you are expected to do! Sadly, we see too many assignments in which students have overlooked directive verbs such as explain, enumerate, list, name, compare, contrast, describe, summarize, outline, apply, justify, establish, defend, account for, sketch, clarify, state, illustrate or discuss. If any of these verbs don't catch your attention, you need to get more coffee before you start.

Review the guidelines for preparation and submission of BCH441 assignments.

The due date for the assignment is Friday, December 7 at 15:00.

(1) Pathways

Pathways are perhaps the earliest biochemical representation of cellular systems. They are a particularly active area of bioinformatics research since it should be possible to construct pathway descriptions essentially in an automatic fashion from databases of known properties of components. For instance if Enzyme A produces metabolite 1 and Enzyme B consumes metabolite 1, even a computer can figure out that A and B could be on the same pathway. However, what if metabolite 1 is ATP? Or H₂O? Careful, manual curation of pathway data is going to be with us for some time to come.
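The inference described above, and the way it breaks down for ubiquitous metabolites, can be sketched in a few lines of Python. This is only a toy illustration: the enzyme and metabolite names, the `REACTIONS` table and the `CURRENCY` exclusion list are all hypothetical and do not come from any database.

```python
# Toy reaction table: hypothetical enzymes and metabolites, NOT real data.
REACTIONS = {
    "EnzymeA": {"consumes": {"metabolite0", "ATP"}, "produces": {"metabolite1", "ADP"}},
    "EnzymeB": {"consumes": {"metabolite1", "H2O"}, "produces": {"metabolite2"}},
    "EnzymeC": {"consumes": {"metabolite9", "ATP"}, "produces": {"metabolite10", "ADP"}},
    "EnzymeD": {"consumes": {"ADP"}, "produces": {"ATP"}},
}

# Assumed list of "currency" metabolites that take part in too many
# reactions to be informative; a curator would have to choose this list.
CURRENCY = frozenset({"ATP", "ADP", "H2O"})


def candidate_links(reactions, ignore=frozenset()):
    """Return (producer, consumer, metabolite) triples.

    Two enzymes are linked if one produces a metabolite that the other
    consumes, unless that metabolite is listed in `ignore`.
    """
    links = set()
    for producer, p in reactions.items():
        for consumer, c in reactions.items():
            if producer == consumer:
                continue
            for metabolite in (p["produces"] & c["consumes"]) - ignore:
                links.add((producer, consumer, metabolite))
    return links


naive = candidate_links(REACTIONS)
filtered = candidate_links(REACTIONS, ignore=CURRENCY)

print(len(naive), "links without filtering")
print(sorted(filtered))
```

Without the exclusion list, ATP and ADP connect enzymes that have nothing else in common; with it, only the link through metabolite1 remains. Deciding what belongs on such a list is exactly the kind of judgement that keeps manual curation necessary.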

(1.1) KEGG (5 marks)

The Kyoto Encyclopedia of Genes and Genomes is one of the oldest and best curated databases of metabolic and functional pathways. It stores hand-curated pathways for a number of model organisms and supports computational inference for other organisms by determining orthologues. Access the KEGG Web site.

KEGG identifies organisms in the database according to three- or four-letter codes, e.g. Homo sapiens → hsa, Saccharomyces cerevisiae → sce. Most of our fungi have manually curated genes annotated in KEGG, but not all. Thus ...

... if your organism is:		use this alternative instead:
Gibberella zeae	→	Magnaporthe grisea
Aspergillus terreus	→	Aspergillus nidulans
Coprinopsis cinerea	→	Cryptococcus neoformans
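If you keep notes in a script, the substitution table above can be recorded as a simple lookup. The sketch below is a minimal example with hypothetical function names; it contains only the two organism codes quoted in the text, and the remaining codes are deliberately left for you to fill in from the KEGG Organism list.

```python
# Substitutions from the table above.
ALTERNATIVES = {
    "Gibberella zeae": "Magnaporthe grisea",
    "Aspergillus terreus": "Aspergillus nidulans",
    "Coprinopsis cinerea": "Cryptococcus neoformans",
}

# Only the two codes quoted in the text; add the code you record
# from the KEGG Organism list for your own organism.
KEGG_CODES = {
    "Homo sapiens": "hsa",
    "Saccharomyces cerevisiae": "sce",
}


def kegg_organism(name):
    """Return the organism to search for in KEGG, applying the substitutions."""
    return ALTERNATIVES.get(name, name)


def kegg_code(name):
    """Return the recorded KEGG code for an organism, or None if not yet recorded."""
    return KEGG_CODES.get(kegg_organism(name))


print(kegg_organism("Gibberella zeae"))
print(kegg_code("Saccharomyces cerevisiae"))
```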

Click on the KEGG Organism link to access the list of organism abbreviations. Find the abbreviation for your organism (or your alternative). Record the code.

Navigate to the KEGG2 entry page, which contains a number of options to search the database contents. I have instantiated the links in the list below with searches relevant to yeast Mbp1 and the cell cycle; try the links.

The simplest option is to search for a gene name. KEGG will return all matches to that name it finds in its records, including comments.
You can execute a BLAST search against the database and thus search with domain sequences, such as the APSES domain, rather than with entire genes;
define a ligand or enzymatic reaction;
or use the PATHWAY search tool to retrieve information on a particular system.

Upload the Mbp1 APSES domain sequence for your organism, and click on the "Compute" button to execute the search. In the list of hits, yeast Mbp1 should appear; click on its link to access the KEGG gene record. (Or use the Fallback Data page if this fails.)

Click on the Help button on top of the record to find information about what the returned results contain.

In this gene record, there is a link to the protein's curated pathway information. Use the Fallback Data page if you can't find it. Access the pathway. The position of Mbp1 is emphasized with a red box. All boxes in this reference pathway are green and linked to the respective KEGG gene pages.

Use the drop-down menu to switch to the comparative pathway map curated for your organism. Note that this is the same map, but now some of the boxes are white (the KEGG curators have not annotated an orthologue for these genes in the KO (KEGG Orthology) database) and the green boxes are now linked to your organism's genes instead of yeast's. This is a very convenient way to check which parts of the well-described yeast pathways have been curated as conserved in your organism.

Briefly (!) compare this pathway with the cell-cycle diagram contained in Figure 4 of Richard Young's conceptual analysis of the yeast cell cycle, based on transcriptional regulatory networks (see PDF link in the resources section):

Are the same genes involved?
Are the same connections shown?
Is the same concept of cyclical progression (time) that is the focus of Young's Figure 4 represented in the KEGG pathways? (5 marks)

If you play around with the various versions of the annotation, you will notice quickly that most organisms have very few genes cross-annotated with yeast by the KEGG curators (the situation is somewhat better with metabolic pathways, by the way, since computational inference through orthology is less ambiguous). But we have already determined in the second assignment that we can find Mbp1 orthologues in all of these organisms! And they are quite easy to find in KEGG too. Click on the Mbp1 link on the yeast pathway map to take you to the KEGG gene record for the Mbp1 protein. In the SSDB row, click the Ortholog button. Orthologs for all of the organisms (or at least their close relatives in the KEGG database) have been precomputed. It should take only a moment to check that the orthologue in your organism is listed too - even if the box was "white" on the pathway page. This is not an error - it just reflects different levels of annotation, curation and inference.

Once again, we are back at a familiar problem: much, and increasingly more, of our annotation is based on analogy and inference. We study one system experimentally in a model organism, then we attempt to map the components to another organism. But pursuing the idea of orthology in order to map function is tricky. Even orthologues may have diverged in evolution to distinct and dissimilar functional systems. Note for example that in yeast Mbp1 binds to Swi6 (the MBF complex) and Swi6 can also bind to Swi4, an Mbp1 homologue (the SBF complex). In many CRMs (cis-regulatory modules) their respective binding sites are closely juxtaposed. However, only the Saccharomycotina seem to possess orthologues to Swi4. MBF and SBF appear to be two complementary systems derived from the same progenitor gene, presumably each having taken over some part of the space of functions from the other and probably acquired a few novel functions along the way.

I hope that this short discussion has illuminated the problems associated with mapping functions between organisms, based on gene similarity. To paraphrase the issue one more time: we are mapping concepts to biology, but "concepts" and "biology" exist in two different worlds. It is helpful, indeed crucial, to explain biology in terms of higher-order concepts. This is what we ultimately mean by "understanding" and indeed, if we did not try this, we would be merely "butterfly-collecting". But never, never fall into the trap of basing your biological conclusions - equivalence of objects - mechanically on the equivalence of concepts (such as gene similarity, pathway position, GO annotation etc.). The mapping of concept to object may be arbitrarily imprecise and as a consequence, so is the equivalence, once we apply it to biology.

(2) Interactions

(2.1) BioGRID (5 marks)

In high-throughput biology, the genome was the beginning. As Sydney Brenner has phrased it: we have now written the "white-pages" of the cell, fulfilling the "CAP-criterion" (Comprehensive, Accurate and Permanent). The next level is figuring out the way the parts work and many of us expect that substantial progress can be made by mapping their interactions. After all, physiological function can be described to a large part as the result of physical interaction. Let us briefly explore the BioGRID interaction database, to retrieve interactions for yeast Mbp1.

Access the BioGRID.

Search for interactions of the Mbp1 gene by entering the gene name into the form field and selecting Saccharomyces cerevisiae from the drop-down menu. If your search fails, follow the link on the Fallback Data page.

List what general experiment type(s) the interactors come from. (In particular note the difference between yellow and green boxes).
Check which of these interactions appear in the KEGG map.
Check whether all of the interactions suggested by Figure 4 of the Young group's publication are reflected in the BioGRID interactions.
Comment briefly on the apparent relationship between high-throughput experimental data (BioGRID) and curated expert knowledge (KEGG, PubMed). (5 marks)
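One way to organise the comparison asked for above is to treat each source as a set of undirected gene pairs and tabulate which sources support each pair. The following is a minimal sketch only: `GeneA` to `GeneD` are placeholder names, the three edge sets are invented for illustration, and you would replace them with what you actually find in the KEGG map, in Young's Figure 4 and in BioGRID.

```python
def edge(a, b):
    """An undirected interaction: (a, b) and (b, a) are the same edge."""
    return frozenset((a, b))


def support_table(sources):
    """Map each edge to the sorted list of source names that report it."""
    support = {}
    for name, edges in sources.items():
        for e in edges:
            support.setdefault(e, set()).add(name)
    return {e: sorted(names) for e, names in support.items()}


# Placeholder data, NOT the real contents of any of the three sources.
sources = {
    "KEGG": {edge("Mbp1", "GeneA"), edge("Mbp1", "GeneB")},
    "Young": {edge("Mbp1", "GeneA"), edge("Mbp1", "GeneC")},
    "BioGRID": {edge("GeneA", "Mbp1"), edge("Mbp1", "GeneC"), edge("Mbp1", "GeneD")},
}

support = support_table(sources)
for e, names in sorted(support.items(), key=lambda item: sorted(item[0])):
    print(" - ".join(sorted(e)), "reported by", ", ".join(names))
```

Pairs supported by only one source are the interesting ones to comment on: they show where high-throughput data and curated knowledge disagree.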

A final point of importance: there are two fundamentally different types of experiments that are being stored in interaction databases.

One type are direct, or physical, interactions. These interactions can be found in interaction-trap experiments, TAP-tag and other co-purification schemes, or in the elucidation of biochemical activity. A typical scenario would be genes whose products form a physical complex.
The other type are genetic, or functional, interactions. These stem from experiments like "synthetic lethality" or "synthetic rescue" screens. These interactions become apparent when the combination of two genetic alterations results in a phenotype. The typical scenario would be the knock-out of two parallel, redundant pathways. Either knock-out alone would not be lethal, but losing the redundancy kills the cell. This demonstrates that the pathways are complementary.

The catch is that physical interactions usually find genes that functionally are found on the same pathway, while genetic interactions usually find genes that are on parallel pathways. That is a huge difference that is totally underappreciated in the published databases. At least BioGRID colors them differently. But they should really be in separate lists.

(Except where both experiment types find the same gene, as with Skn7. Well, it would have been too easy otherwise, wouldn't it?)
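Keeping the two kinds of evidence in separate lists is easy to do once each interaction record carries its experiment type. The sketch below is a hedged illustration: the classification uses only the experiment types named in the text above, the gene names are placeholders, and the records are invented, not BioGRID data. `GeneC` stands in for a gene reported by both experiment types, as is the case for Skn7.

```python
# Experiment types as named in the text; a real database uses its own vocabulary.
PHYSICAL = {"interaction trap", "co-purification", "biochemical activity"}
GENETIC = {"synthetic lethality", "synthetic rescue"}


def split_by_type(records):
    """Split (gene, experiment) records into physical and genetic interactor sets."""
    physical, genetic = set(), set()
    for gene, experiment in records:
        kind = experiment.lower()
        if kind in PHYSICAL:
            physical.add(gene)
        elif kind in GENETIC:
            genetic.add(gene)
        else:
            raise ValueError(f"unclassified experiment type: {experiment!r}")
    return physical, genetic


# Invented placeholder records, NOT real interaction data.
records = [
    ("GeneA", "co-purification"),
    ("GeneB", "synthetic lethality"),
    ("GeneC", "interaction trap"),
    ("GeneC", "synthetic rescue"),
]

physical, genetic = split_by_type(records)
print("physical only:", sorted(physical - genetic))
print("genetic only: ", sorted(genetic - physical))
print("both:         ", sorted(physical & genetic))
```

Raising an error for an unknown experiment type is a deliberate choice here: silently dropping or guessing a class would blur exactly the distinction this section is about.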

(3) Summary of Resources

Links

Sequences

All APSES domains

[End of assignment]

If you have any questions at all, don't hesitate to mail me at boris.steipe@utoronto.ca or post your question to the Course Mailing List
