BIO systems project
Bioinformatics Project: Defining a System
This course gives you a broad overview of bioinformatics principles, but you should also strive to apply those principles towards a biological question.
For your term project I would like you to define a biological "system": a set of genes that collaborate towards a shared purpose. We start from a biological process, represented in the Gene Ontology (GO). From there we can use methods of function annotation to identify related processes, functions and cellular components, and the genes that are associated with them. Once such a list of genes is defined, the goal of the project is to define the "system" in which those genes collaborate, and to sketch its architecture. This requires both "bottom up" procedures of gene discovery and "top down" reasoning about ancillary functions such as substrate import, biosynthesis of cofactors, signalling, regulation, constructing scaffolds etc. that may not (yet) be represented in the list of genes. It also requires you to clearly identify concepts such as the system's purpose; its boundaries (i.e. which genes from the list are actually part of the system, and which ones are associated with the process but should be considered to be outside of its boundaries, in a supporting role, in a shared role, or simply as part of a distinct but collaborating system; membrane transporters might be an obvious example); its interfaces; and its input and output.
It is your task to manage this from the perspective of a biological expert and try to define inclusion/exclusion criteria as best as you can. While your "list of genes" is going to be interesting, compiling such lists can be automated. Thus the most valuable outcome of your project is how you address the task of defining the conceptual aspects of the system and attempt to organize them into an architectural sketch.
In practice you should
- choose a biological process you are interested in (I have provided a candidate list);
- collect all contributing genes[1] as best you can, using bioinformatics tools and literature annotations;
- develop unambiguous criteria for including or not including such genes in your system and annotating them;
- list the conceptual roles in your system;
- associate whatever genes you can with those roles and identify genes you were not able to associate with roles, and roles for which the associated genes are unknown;
- carefully document your efforts and results: the data sources and literature, what procedures have been applied, how the results have been accessed, validated and interpreted...
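The bookkeeping implied by the steps above (candidate genes, conceptual roles, genes without roles, roles without genes) can be kept in a very small data structure. The following is only a minimal sketch: the class name `SystemDraft`, its fields, and the gene and role names in the example are hypothetical placeholders, not part of the assignment.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SystemDraft:
    """Working notes for one system (hypothetical structure, for illustration)."""

    name: str
    candidates: set[str] = field(default_factory=set)          # all collected genes
    roles: dict[str, set[str]] = field(default_factory=dict)   # role -> assigned genes

    def assign(self, gene: str, role: str) -> None:
        """Record that a gene fills a conceptual role; also adds it to the candidates."""
        self.candidates.add(gene)
        self.roles.setdefault(role, set()).add(gene)

    def unassigned_genes(self) -> set[str]:
        """Candidate genes that have not been associated with any role."""
        assigned = set().union(*self.roles.values())
        return self.candidates - assigned

    def empty_roles(self) -> list[str]:
        """Roles for which no associated gene is known yet."""
        return sorted(role for role, genes in self.roles.items() if not genes)


# Example with placeholder names only.
draft = SystemDraft(name="placeholder system")
draft.candidates.update({"GENE_A", "GENE_B", "GENE_C"})
draft.roles["regulating the process"] = set()   # a role identified top down
draft.assign("GENE_A", "transmitting input signals")
print(sorted(draft.unassigned_genes()))
print(draft.empty_roles())
```

Both lists are worth reporting: unassigned genes point at boundary decisions still to be made, and empty roles point at components that top-down reasoning predicts but gene discovery has not yet found.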
Ideally, your system would be defined at a level where it comprises some 20 components or so, not more, to keep things manageable.
First stage: process, genes and concepts (11 marks max.)
To define a system, we will start from a biological process in the GO biological process ontology. I have excerpted a table of processes to get you started, and explained the procedure in detail. You can find the documentation and the table through the links below.
- Notes on the table creation and recommendations on how to use it.
- Table of GO terms - use this to choose and "adopt" a process to define a system
Note that you are not constrained to start from a process in that table. If you are determined to work on a different human system because you have particular knowledge about it, you may suggest this to me and perhaps we can add it to the table.
More notes ...
Bottom Up ...
Keep your systems simple. I would avoid choosing systems/processes that integrate sensory, nervous, hormonal and cellular components; this may become too complex. Narrowing it down to a manageable "subsystem" is a valuable exercise in itself. Such a system may implement roles such as
- integrating input,
- transmitting input signals to their effectors,
- regulating the process,
- providing resources,
- defining setpoints,
- assembling or disassembling the system,
- mediating interactions with other systems,
- or similar...
You'll need to collect genes that contribute to those roles. All tools of bioinformatics are fair game for this: finding homologs, looking for information in PubMed, looking for similarity in GO, querying pathway databases, assessing protein-protein interactions etc. etc. – the genes in the table are a useful starting point. It does become important, however, to draw the line: which genes are at the centre of your system, and which genes should really be part of something else.
Keep your systems manageable. When considering how many genes are associated with a system, check the taxon section of the relevant GO terms' statistics on QuickGO. The number of genes involved in the process in humans is likely as large as the largest number for ANY species, although many of the human genes may not have been annotated for that process (yet). For example, if the mouse (Mus musculus) has 20 annotated genes and humans have only two, that probably does not mean that humans can achieve with two genes what the mouse needs twenty for. In this situation, looking for orthologues of the mouse genes should lead you to human candidate genes. As a corollary, if the mouse has many, many genes annotated, that particular process might not be so suitable for this project after all.
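The mouse-to-human reasoning above can be sketched as a simple set comparison. This is only an illustration under stated assumptions: the function `human_candidates`, the orthologue mapping, and all gene symbols below are hypothetical placeholders. In practice the annotated gene sets and the orthologue table would have to be exported from the annotation and orthology resources you choose to use.

```python
def human_candidates(mouse_annotated, human_annotated, orthologues):
    """Find human orthologues of annotated mouse genes that lack the annotation.

    mouse_annotated: set of mouse gene symbols annotated to the GO term
    human_annotated: set of human gene symbols annotated to the same term
    orthologues:     dict mapping a mouse gene to a list of human orthologues
    Returns a dict: human candidate gene -> list of supporting mouse genes.
    """
    candidates = {}
    for mouse_gene in sorted(mouse_annotated):
        # A mouse gene may have zero, one or several human orthologues.
        for human_gene in orthologues.get(mouse_gene, ()):
            if human_gene not in human_annotated:
                candidates.setdefault(human_gene, []).append(mouse_gene)
    return candidates


# Placeholder data: three annotated mouse genes, one annotated human gene.
mouse = {"Gene1", "Gene2", "Gene3"}
human = {"GENE1"}
mapping = {"Gene1": ["GENE1"], "Gene2": ["GENE2"], "Gene3": ["GENE2", "GENE3"]}

for gene, support in human_candidates(mouse, human, mapping).items():
    print(gene, "supported by", ", ".join(support))
```

Each candidate found this way is a hypothesis, not an annotation: whether the human gene actually belongs inside the system's boundaries still has to be decided with your inclusion criteria.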
... and Top Down.
Spend some thought on naming your "system" well. Focus on the purpose for which the system exists; this will help you organize the components – ultimately, the components and architecture need to support that purpose. All of them. Be mindful: the GO terms are usually not suitable as they are to define the purpose of the system. Consider them a starting point.
Think about auxiliary roles that are part of your system: how it comes into existence, how it accepts substrates and/or information, how it transforms this input and how its output is generated. Consider that whatever is switched on needs to be switched off again.
Second stage: Sketch the system architecture (11 marks max.)
The second stage of the project is for you to define an architecture that integrates the system components and the concepts you have defined. Refer to the "Function" script for an example (the yeast G1/S transition switch) of how to go about this task. Draw out this architecture in a sketch and post it on your Student Wiki project page.
Final stage: Documentation (4 marks max.)
The documentation must satisfy two requirements.
- First, your documentation must make your data and results reproducible. You need to specify the premises you started from and how you came up with them, and you need to specify the procedure through which you arrived at your conclusions. Put yourselves into the mind of a reviewer: are you providing enough information so that your (computational) steps can be reproduced? Are your source IDs specified? Your resources and programs?
- Second, your documentation must explain the rationale behind your procedure and conclusions. This is not so much about what you did as about why you did it: what was the logic behind a certain procedure or decision?
- Form is important:
  - structure your project clearly, include a brief introduction and definitely include a meaningful conclusion;
  - avoid jargon;
  - make it easy to copy data for further analysis (no screenshots unless you are illustrating a Web-site or GUI);
  - write complete sentences;
  - do not plagiarize, but reference judiciously;
  - make sure your references are complete and take advantage of the <ref> ... </ref> tags and the {{#pmid:1234567}} template.
Ask(!) if you are not sure about Wiki markup or formatting to achieve a particular layout.
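One possible way to keep the reproducibility information (source IDs, resources, decisions and their rationale) together, and to make it easy to copy for further analysis, is to record one row per decision and export the rows as tab-separated text. The sketch below is an assumption about how this could be organized, not a required format: the class `EvidenceRecord`, its field names and the example values are all hypothetical placeholders.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class EvidenceRecord:
    """One documented inclusion/exclusion decision (hypothetical layout)."""

    gene: str        # gene symbol or other component name
    source_id: str   # database accession or literature ID, as text
    resource: str    # database or program the evidence came from
    accessed: str    # access date
    decision: str    # "include" or "exclude"
    rationale: str   # why, in a complete sentence


def to_tsv(records):
    """Render records as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(EvidenceRecord)],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values only; replace with your own documented evidence.
example = EvidenceRecord(
    gene="GENE_A",
    source_id="ID_PLACEHOLDER",
    resource="resource placeholder",
    accessed="YYYY-MM-DD",
    decision="include",
    rationale="Placeholder rationale.",
)
print(to_tsv([example]))
```

Text produced this way can be pasted into a Wiki page and copied out again by a reader, which a screenshot of a table does not allow.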
Due dates
The project (like all class work) is due by the end of classes, December 6, 2016. If you need an extension you must contact me at least a day before the deadline. Please state briefly the requested duration of the extension. The requested extension should not extend past the final exam date.
Late submissions
The time of submission is recorded with your edits on the Wiki and can be identified in the View history tab of a page: I will consider the last edit before the submission deadline for marking. There will be no other "late deductions" applied.
Resources
- [1] I speak of genes here in a very informal sense: the system components may include genes, their encoded proteins, structural and regulatory RNA, metabolites, and even environmental signals.