Difference between revisions of "BIO systems project"

From "A B C"
m (Boris moved page BIO project to BIO systems project without leaving a redirect)
 
(30 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
<div id="BIO">
 
<div id="BIO">
 
<div class="b1">
 
<div class="b1">
Bioinformatics Project
+
Bioinformatics Project: Defining a System
 
</div>
 
</div>
  
&nbsp;
+
{{Vspace}}
&nbsp;
 
  
This course gives you a broad overview of bioinformatics principles, but you should also strive to explore one aspect of the field more deeply.
+
This course gives you a broad overview of bioinformatics principles; with this project we strive to apply those principles to a biological question.
  
'''For your term project I would like you to identify a defined biological "function".''' Then you should collect all genes that collaborate towards that function. This would correspond to a "system". The problem is that there are more aspects to a system than just the actual function: genes that are responsible for substrate import, biosynthesis of cofactors, signalling, regulation, constructing scaffolds ''etc.'' may also be part of the system. This means you should
+
{{Vspace}}
* define the function you are interested in;
 
* collect all contributing genes as best you can, using a broad spectrum of literature comments and bioinformatics tools that we may have or have not covered in the course;
 
* develop unambiguous criteria for including or not including such genes in your list;
 
* provide an annotated list of included genes, and ones that you have excluded; and
 
* carefully document your efforts and results: the datasources, what procedures have been applied, how the results been accessed, validated and interpreted...
 
  
Ideally, your function would be defined at a level where it is realized with some 20, 30 genes or so, not much more.
+
'''For your term project I would like you to define a biological "system" - a set of genes that collaborate towards a shared purpose - and to sketch its architecture.''' We start from a biological process, represented in the '''Gene Ontology''' (GO). From there we can use methods of function annotation to identify related processes, functions and cellular components, and the genes that are associated with them. Defining the "system" in which those genes collaborate, and sketching its architecture, requires both "bottom up" procedures of gene discovery and "top down" reasoning about ancillary functions and roles such as substrate import, biosynthesis of cofactors, signalling, regulation, constructing scaffolds ''etc.'' that may not (yet) be represented in the list of genes. It also requires you to clearly identify concepts such as the system's ''purpose''; its ''boundaries'' (i.e. which genes from the list are actually part of the system, and which ones are associated with the process but should be considered to be outside of its boundaries: in a supporting role, a shared role, or simply part of a distinct but collaborating system); its ''interfaces''; and its ''input'' and ''output''.
+
 
 +
It is your task to manage this from the perspective of a biological expert and try to define inclusion/exclusion criteria as best as you can. While your "list of genes" is going to be interesting, compiling such lists can be automated. Thus the most valuable outcome of your project is how you will address the task of defining the '''conceptual aspects''' of the system and attempting to organize this into an architectural sketch.
 +
 
 +
In practice you should
 +
* choose a biological process you are interested in (I have provided a candidate list);
 +
* collect all contributing genes<ref>I speak of ''genes'' here in a very informal sense, the system components may include genes, their encoded proteins, structural and regulatory RNA, metabolites, and even environmental signals.</ref> as best you can, using bioinformatics tools and literature annotations;
 +
* list the conceptual roles in your system;
 +
* associate whatever genes you can with those roles and identify genes you were not able to associate with roles, and roles for which the associated genes are unknown;
 +
* integrate these concepts in a sketch; and
 +
* carefully document your efforts and results: the data sources and literature, what procedures have been applied, how the results have been accessed, validated and interpreted...
  
  
Line 24: Line 27:
  
  
===Open topic===
+
===First stage: process, genes and concepts (11 marks max.)===
The function you choose is open. I'll probably provide a list of suggestions. However, you should ensure you don't choose the same function as someone else in class.
+
 
 +
{{Vspace}}
  
 +
To define a system, we will start from a biological process in the '''GO''' biological process ontology. I have excerpted a table of processes to get you started, and explained the procedure in detail. You can find the documentation and the table through the links below.
  
===First stage: Choosing a suitable function (5 marks max.)===
+
*[[BIO_project_GO-term_table|Notes on the table creation and recommendations how to choose a process.]]
Details to be anounced ...
+
*[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/BCH441_2016_project_GO_term_table '''Table of GO terms''' - use this to choose and "adopt" a biological process to seed your system]
  
*[[BIO_project_GO-term_table|Table of GO terms]]
+
Note that you are not constrained to start from a process in that table. If you are determined to work on a different human system because you have particular knowledge about it, you may suggest this to me and perhaps we can add it to the table. But you need to coordinate this with me.
  
 +
Once you have chosen which system to work on:
  
<!--
+
* Create a subpage on your student wiki for your project;
#Choose an article and post its PubMed ID on the student Wiki. Make sure the article is not older than one year, and that no one else has chosen the article.  
+
* Copy the wikisource from the [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/BCH441_2016_project_page_template template page I have provided '''here'''] and paste it into your project page. Save it;
#Start a subpage in your Student Wiki user space where you link to the article. Use the following syntax: <code><nowiki>{{#pmid: 16011803}}</nowiki></code>.  
+
* Start filling in the details and expanding.
#Write a one-sentence abstract of the process you are working on. It does not have to be the full process described in the article, but can be limited in scope or depth. However it must be independently useful and non-trivial. A BLAST search won't be enough.
+
 
#Write a sentence on how the results are useful.
+
{{Vspace}}
#Write a bulleted list of procedures that your process uses.  
+
 
#If you think you can improve on the published method, by all means do so. Explain your plan in one or two sentences.  
+
====Bottom up...====
#Add a category tag of [http://biochemistry.utoronto.ca/steipe/abc/students/index.php/Category:BCH441_2014_Bioinformatics_Project <code><nowiki>[[Category:BCH441 2014 Bioinformatics Project]]</nowiki></code>] to your page so it can be easily found.
+
 
#<small>If you would like to deviate from this template, coordinate with me.</small>
+
{{Vspace}}
-->
+
 
 +
* At first, you will complete your picture of the system's components, while at the same time refining your concept of the system's ''purpose''. In general, '''the purpose of the system is probably not going to be the same as the name of the GO term you started from'''. In order to find additional genes, you should consider the common bioinformatics strategies for function annotation. There are four main sources of information to consider:
 +
 
 +
# '''GO''' and '''GOA''' (via [http://www.ebi.ac.uk/QuickGO/ '''QuickGO''']), and the associated GONUTS pages which give you information on related terms and annotated proteins;
 +
# The [http://www.uniprot.org/uniprot/ '''UniProt'''] page for your genes, with its wealth of detailed annotations and cross-references;
 +
# Obviously, (recent) literature that you find via crossreferences and [http://www.ncbi.nlm.nih.gov/pubmed/ '''PubMed'''] searches; and
 +
# the [http://string-db.org/ '''STRING database'''] for functional annotation, to which you can upload a set of genes all at once to discover functionally related genes.
 +
 
 +
In particular:
 +
* your proteins may have ''domains'' that mediate functional interactions with other proteins;
 +
* your proteins' ''structures'' may indicate requirements for ligands and/or co-factors that need to be synthesized, activated and incorporated;
 +
* your GO terms may have protein annotations for other species that are not present for humans, so you can use BLAST to find their ''reciprocal-best-matches'' in humans;
 +
* ''domain composition'' of your proteins may indicate related proteins;
 +
* ''coexpression analysis'' at [http://coxpresdb.jp/ '''COXPRESdb'''] and [http://www.genefriends.org/ '''GeneFriends'''] may discover co-regulated genes;
 +
* ''protein-protein interactions'' (via UniProt) may point to the participation in functional complexes;
 +
* presence in ''annotated pathways'' (via UniProt) may point to collaborations.
 +
 
 +
{{Vspace}}
 +
 
 +
====... and Top Down====
 +
 
 +
{{Vspace}}
 +
 
 +
* '''Spend some thought on naming your "system" well.''' Focus on the purpose, i.e. '''why''' the system exists; this will help you organize the components – ultimately, the components and architecture need to support that purpose. All of them. Be mindful: the GO terms are usually '''not''' suitable as they are to define the purpose of the system. Consider them a starting point.
 +
* Write down your definition of the "purpose". Remember: we consider the purpose to be the benefit of the system for the fitness of the organism.
 +
* Think about auxiliary roles that are part of your system: how it comes into existence, how it accepts substrates and/or information, how it transforms this input and how its output is generated. Consider that whatever is switched on, needs to be switched off again.  
 +
* Think about abstract roles like interfaces, set-points, feedback-control elements, signal integration, transmission, effectors ...  
 +
 
 +
{{Vspace}}
 +
 
 +
===Second stage: Sketch the system architecture (11 marks max.)===
 +
 
 +
{{Vspace}}
 +
 
 +
 
 +
The second stage of the project is for you to define an architecture that integrates the system components and the concepts you have defined. Refer to the <span class="PDFlink">[http://steipe.biochemistry.utoronto.ca/abc/CourseMaterials/BCH441/06-Function_LectureNotes.pdf "Function" script]</span> for an example (the yeast G1/S transition switch) of how to go about this task. Draw out this architecture in a sketch and include it on your Student Wiki project page.
 +
 
 +
{{Vspace}}
 +
 
 +
===Final stage: Documentation (4 marks max.)===
 +
 
 +
{{Vspace}}
 +
 
 +
The documentation must fulfill '''two''' aspects.
 +
 
 +
* First, your documentation must make your data and results '''reproducible'''. You need to specify the premises you started from and how you came up with them, and you need to specify the procedure through which you arrived at your conclusions. Put yourselves into the mind of a reviewer: are you providing enough information so that your (computational) steps can be reproduced? Are your source IDs specified? Your resources and programs?
 +
 
 +
* Second, your documentation must explain the rationale behind your procedure and conclusions. This is not so much ''what'' you did but ''why'' you did this, what was the logic behind a certain process or decision.
 +
 
 +
* Form is important:
 +
:*structure your project clearly, include a brief introduction and definitely include a meaningful conclusion;
 +
:*avoid jargon;
 +
:*make it easy to copy data for further analysis (no screenshots unless you are illustrating a Web-site or GUI);
 +
:*write complete sentences;
 +
:*do not plagiarize, but reference judiciously;
 +
:*make sure your references are complete and take advantage of the <code>&lt;ref&gt; ... &lt;/ref&gt;</code> tags and the <code>&#123;&#123;#pmid:1234567&#125;&#125;</code> template.
 +
 
 +
Ask(!) if you are not sure about Wiki markup or formatting to achieve a particular layout.
 +
 
 +
{{Vspace}}
 +
 
 +
==Organizational details==
 +
 
 +
{{Vspace}}
 +
 
 +
===Evaluation===
 +
 
 +
{{Vspace}}
 +
 
 +
Marking will consider suitability and usefulness of the process for this project, how well you were able to abstract the procedures, and how well you succeeded in integrating the data into a meaningful description. For general [[Eval_Sessions#Grading|'''Marking rubrics, follow this link''']].
 +
 
 +
{{Vspace}}
  
<!--
+
===Due dates===
<div class="mw-collapsible mw-collapsed exercise-box" data-expandtext="Expand" data-collapsetext="Collapse">
 
As for scope and contents, you may want to consider the following ...
 
  
<div class="mw-collapsible-content exercise-box">
+
{{Vspace}}
* Don't write more than a paragraph, but make it clear '''what''' you want to do and '''why''' this is interesting and/or useful.
 
* Don't provide an extensive review of literature, but make it clear that you understand how your vision is '''neither too narrow nor too broad'''. Addressing a solved problem (e.g. sequence alignment) would be too narrow (but a good tutorial for a complex workflow may be oK). Addressing an (as yet) unsolvable problem (e.g. cure cancer) would be too broad (but an exploration of a well defined step, or what is missing in the field would be oK). '''Add one citation that gives you confidence that what you are planning is doable!''' and make sure to explain in your vision why you think so.
 
* Don't attempt to do too much, but keep in mind how much '''available time''' you have this term. Many students have found their projects inspiring and  greatly enjoyed devoting significant time to it, that's great. But  if you get stressed out because the implementation turns out to be harder than you thought, that's bad. This should be fun. That said, a solid analysis of a problem is useful even without implementation. If it's a cool idea, you can come back to it over the winter break and perhaps form it into something publishable.
 
* Don't burden your concept with details of algorithms and databases &ndash; these may change anyway as you refine the idea &ndash; but by all means, '''add links for clarity''' to appropriate resources and references.
 
* Don't be too invested in any single strategy, but keep in mind that things might not work as expected and give some thought to '''fallback approaches'''.
 
* Above all, don't focus too much on the process and methodology, but be very clear about your '''objectives'''. Software development starts from a "requirements analysis".
 
  
 +
<div class="alert">
 +
The project (like all class work) is due by the end of classes, December 6, 2016. If you need an extension you '''must''' contact me at least a day before the deadline. Please state briefly the requested duration of the extension. The extension request should not extend past the final exam date.
 
</div>
 
</div>
</div>
 
  
-->
+
{{Vspace}}
 +
 
 +
====Late submissions====
 +
 
 +
{{Vspace}}
 +
 
 +
The time of submission is recorded with your edits on the Wiki and can be identified in the '''View history''' tab of a page: I will consider the last edit before the submission deadline for marking. There will be no "late deductions" applied - the deductions are implicit in the status of the project at the due date.
 +
 
 +
{{Vspace}}
 +
 
 +
==Resources==
 +
 
 +
{{Vspace}}
 +
 
 +
;Links
 +
* [[BIO_project_GO-term_table|Documentation of the process table and instructions on '''how to choose a process''' to adopt]]
 +
* [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/BCH441_2016_project_GO_term_table '''Table of GO terms'''] <small>(on the Student Wiki)</small>
 +
* [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/BCH441_2016_project_page_template wiki-code for a '''project template page'''] <small>(on the Student Wiki)</small>
 +
* [[Eval_Sessions#Grading|'''Marking rubrics''']]
 +
 
 +
{{Vspace}}
 +
 
 +
<references />
 +
 
 +
{{Vspace}}
 +
 
 +
----
 +
 
 +
[[Category:Bioinformatics]]
 +
 
 +
 
 +
<!-- OLD MATERIAL -->
 +
 
 +
 
 +
<!--
 +
 
 +
:We actually have an interesting situation. It is common for science to ask '''how''' questions, not '''why''' questions, because the '''why''' questions are thought usually not to have a scientific answer, ''i.e.'' they are not well posed in the sense that an answer might not exist, might not be unique, or might not be verifiable as being an answer. But we have discussed that evolution works by selecting from (neutral) variation according to an organism's fitness function. This allows us to formulate an answer to a '''why''' question: a system exists '''because''' it improves the organism's fitness function<ref>Of course this is a simplification - a system might also exist because it is a vestige of evolutionary history. The textbook example we often consider for this case is the existence of whales' pelvic bones. Matters are not so simple however: as has been recently shown these may play a role in copulation ([http://www.ncbi.nlm.nih.gov/pubmed/25186496 PubMed]).</ref>. In general we have no way of quantifying the fitness function - it represents a very high-dimensional multi-parameter optimization problem. But what we '''can''' observe is the existence of purifying selection. This gives us a rigorous, testable, scientific perspective: a system exists '''because''' it does something which results in traces of selection.
 +
 
 +
 
  
<!--
 
  
Marking will consider suitability and usefulness of the process for this project, how well you were able to abstract the procedures, and how well you succeeded to implement the process in your example.
 
  
 
When you are done, hopefully before the deadline, please change your category tag to [http://biochemistry.utoronto.ca/steipe/abc/students/index.php/Category:BCH441_2013_Open_Project_I_(submitted) <code><nowiki>[[Category:BCH441 2013 Open Project I (submitted)]]</nowiki></code>]
 
When you are done, hopefully before the deadline, please change your category tag to [http://biochemistry.utoronto.ca/steipe/abc/students/index.php/Category:BCH441_2013_Open_Project_I_(submitted) <code><nowiki>[[Category:BCH441 2013 Open Project I (submitted)]]</nowiki></code>]
Line 73: Line 179:
 
* [http://biochemistry.utoronto.ca/steipe/abc/students/index.php/Category:BCH441_2013_Open_Project_I_(review_requested) <code><nowiki>[[Category:BCH441 2013 Open Project I (review requested)]]</nowiki></code>] means I propose significant changes to scope or focus of your project. You may discuss with me, change your project, or simply change the category and move on ... this is your project after all.
 
* [http://biochemistry.utoronto.ca/steipe/abc/students/index.php/Category:BCH441_2013_Open_Project_I_(review_requested) <code><nowiki>[[Category:BCH441 2013 Open Project I (review requested)]]</nowiki></code>] means I propose significant changes to scope or focus of your project. You may discuss with me, change your project, or simply change the category and move on ... this is your project after all.
 
* [http://biochemistry.utoronto.ca/steipe/abc/students/index.php/Category:BCH441_2013_Open_Project_II_(in_progress) <code><nowiki>[[Category:BCH441 2013 Open Project II (in progress)]]</nowiki></code>] means you should move on and develop your outline.
 
* [http://biochemistry.utoronto.ca/steipe/abc/students/index.php/Category:BCH441_2013_Open_Project_II_(in_progress) <code><nowiki>[[Category:BCH441 2013 Open Project II (in progress)]]</nowiki></code>] means you should move on and develop your outline.
-->
 
  
  
&nbsp;
+
<div class="mw-collapsible mw-collapsed exercise-box" data-expandtext="Expand" data-collapsetext="Collapse">
 +
I suggest you leave the code for the CC license in place. Why? ...
 +
 
 +
<div class="mw-collapsible-content exercise-box">
  
===Second stage: Compiling a list of genes (12 marks max.)===
+
I put code for a [https://creativecommons.org/licenses/by/4.0/ Creative Commons Attribution 4.0 International] (CC) license on the template for your project pages, anticipating that the material might become useful beyond the course at some point - suitably validated - to contribute to a training set for algorithms that automate systems discovery. The CC license is one of the standards for open-sourcing work in our domain; this variant of the license allows reproduction but requires those who produce derivative work to keep the original author names associated with the material.
Details to be announced ...
 
  
<!-- The second stage of the project is your detailed analysis of the process. Describe the steps of the process in enough detail that someone new to the field could execute it following your instructions. Make sure you define input data, algorithms, output, how to present the results, what controls to run, how the results should be interpreted. Don't forget to execute the process in an example and documenting this.  
+
This license does not mean I am CC-licensing the template so you can copy '''my''' work ... no: the license is there to make it easier for others to continue based on '''your''' ideas. The copyright status of course work and whether and how it can feed into further research is complex: on one hand the author is the author. On the other hand, such work is produced under a lecturer's direction, feedback and instructions. Moreover there are fair-use and educational exceptions, non-commercial exceptions and the question of what part of such a collation could be copyrighted in the first place, much of it being, as it were, derivative of public databases. In this context, attaching a CC license provides some clarity: any future use and reference should attribute it properly and move on.
  
Develop this on the same page of the Student Wiki as your concept.
+
If you like the idea that your work could perhaps become useful, and have your name associated with such use, then leave the license code on the page.
-->
 
  
&nbsp;
+
Some references:
 +
*<span class="PDFlink">[https://onesearch.library.utoronto.ca/sites/default/files/copyright/Copyright%20FAQ.pdf UofT Library: Copyright FAQ]</span>
 +
*<span class="PDFlink">[http://www.provost.utoronto.ca/Assets/Provost+Digital+Assets/26.pdf UofT Provost: Copyright Fair Dealing Guidelines]</span>
  
===Final stage: Documentation (9 marks max.)===
 
  
Details to be announced...
+
</div>
 +
</div>
  
<!-- Finally, provide some review feedback. I will assign each student three projects for review. Access the project page and write approximately a paragraph of critique on the ''Discussion page''. Discuss whether the process was easy to follow, completely described, and, in your view useful (for what purpose). As a metric for completeness, imagine you were a project student in the lab and had just been handed the project text as instruction to perform an analysis on new data. Would it contain all you need to know to proceed?
 
  
  
&nbsp;
+
<div class="mw-collapsible mw-collapsed exercise-box" data-expandtext="Expand" data-collapsetext="Collapse">
 +
As for scope and contents, you may want to consider the following ...
  
===Evaluation===
+
<div class="mw-collapsible-content exercise-box">
# Evaluation will be done with contributions from your peers; details will be announced at a later time.
+
* Don't write more than a paragraph, but make it clear '''what''' you want to do and '''why''' this is interesting and/or useful.
#Marking will consider:
+
* Don't provide an extensive review of literature, but make it clear that you understand how your vision is '''neither too narrow nor too broad'''. Addressing a solved problem (e.g. sequence alignment) would be too narrow (but a good tutorial for a complex workflow may be oK). Addressing an (as yet) unsolvable problem (e.g. cure cancer) would be too broad (but an exploration of a well defined step, or what is missing in the field would be oK). '''Add one citation that gives you confidence that what you are planning is doable!''' and make sure to explain in your vision why you think so.
##Quality, usefulness, creativity and originality of the contribution in the general field of bioinformatics or computational biology;
+
* Don't attempt to do too much, but keep in mind how much '''available time''' you have this term. Many students have found their projects inspiring and  greatly enjoyed devoting significant time to it, that's great. But  if you get stressed out because the implementation turns out to be harder than you thought, that's bad. This should be fun. That said, a solid analysis of a problem is useful even without implementation. If it's a cool idea, you can come back to it over the winter break and perhaps form it into something publishable.
##Execution and form;
+
* Don't burden your concept with details of algorithms and databases &ndash; these may change anyway as you refine the idea &ndash; but by all means, '''add links for clarity''' to appropriate resources and references.
##Timely submission.  
+
* Don't be too invested in any single strategy, but keep in mind that things might not work as expected and give some thought to '''fallback approaches'''.
# Time management is up to you. However there are three stages of the project and three deadlines.
+
* Above all, don't focus too much on the process and methodology, but be very clear about your '''objectives'''. Software development starts from a "requirements analysis".
  
-->
+
</div>
 +
</div>
  
===Due dates===
 
  
 +
Finally, provide some review feedback. I will assign each student three projects for review. Access the project page and write approximately a paragraph of critique on the ''Discussion page''. Discuss whether the process was easy to follow, completely described, and, in your view useful (for what purpose). As a metric for completeness, imagine you were a project student in the lab and had just been handed the project text as instruction to perform an analysis on new data. Would it contain all you need to know to proceed?
  
&nbsp;
 
<div class="alert">
 
The '''function choice''' is due by the end of '''week 6'''.<br />
 
The '''compilation of the list of genes''' is due by the end of '''week 10'''.<br />
 
The '''documentation''' are due by the end of '''week 12'''.<br />
 
</div>
 
  
  
&nbsp;
 
  
===Late submissions===
+
However, if you want me to consider a later edit instead (i.e. "late submission" with the appropriate penalties), send me an eMail to that effect. If you don't email me, your mark from an incomplete submission will stand.
The time of submission is recorded with your edits on the Wiki and can be identified in the '''View history''' tab of a page: I will consider the last edit before the submission deadline for marking. However, if you want me to consider a later edit instead (i.e. "late submission" with the appropriate penalties), send me an eMail to that effect.
 
  
 
Please get your deliverables done early, I will be quite resistant to grant extensions for reasons that have to do with your normal, expected workload. If you want to, you can submit all phases of your project at any earlier date you choose - and get it done with. Be especially mindful of your other courses, and their midterm tests.  
 
Please get your deliverables done early, I will be quite resistant to grant extensions for reasons that have to do with your normal, expected workload. If you want to, you can submit all phases of your project at any earlier date you choose - and get it done with. Be especially mindful of your other courses, and their midterm tests.  
Line 127: Line 228:
 
Just to clarify: "by the end of ..." means Tuesday at midnight. And yes, there will be penalties. Your final mark for the stage will be multiplied by the following factor for each day after the deadline on which it is submitted:
 
Just to clarify: "by the end of ..." means Tuesday at midnight. And yes, there will be penalties. Your final mark for the stage will be multiplied by the following factor for each day after the deadline on which it is submitted:
  
Received on the ...
+
Marked on the ...
 
* first day after the deadline: marks times 0.9
 
* first day after the deadline: marks times 0.9
 
* second day: 0.7
 
* second day: 0.7
Line 134: Line 235:
 
* fifth day and later: 0  
 
* fifth day and later: 0  
  
 
+
-->
 
 
[[Category:Bioinformatics]]
 
</div>
 

Latest revision as of 00:53, 24 February 2018

Bioinformatics Project: Defining a System


 

This course gives you a broad overview of bioinformatics principles; with this project we strive to apply those principles to a biological question.


 

For your term project I would like you to define a biological "system" - a set of genes that collaborate towards a shared purpose - and to sketch its architecture. We start from a biological process, represented in the Gene Ontology (GO). From there we can use methods of function annotation to identify related processes, functions and cellular components, and the genes that are associated with them. Defining the "system" in which those genes collaborate, and sketching its architecture, requires both "bottom up" procedures of gene discovery and "top down" reasoning about ancillary functions and roles such as substrate import, biosynthesis of cofactors, signalling, regulation, constructing scaffolds etc. that may not (yet) be represented in the list of genes. It also requires you to clearly identify concepts such as the system's purpose; its boundaries (i.e. which genes from the list are actually part of the system, and which ones are associated with the process but should be considered to be outside of its boundaries: in a supporting role, a shared role, or simply part of a distinct but collaborating system); its interfaces; and its input and output.

It is your task to manage this from the perspective of a biological expert and try to define inclusion/exclusion criteria as best as you can. While your "list of genes" is going to be interesting, compiling such lists can be automated. Thus the most valuable outcome of your project is how you will address the task of defining the conceptual aspects of the system and attempting to organize this into an architectural sketch.

In practice you should

  • choose a biological process you are interested in (I have provided a candidate list);
  • collect all contributing genes[1] as best you can, using bioinformatics tools and literature annotations;
  • list the conceptual roles in your system;
  • associate whatever genes you can with those roles and identify genes you were not able to associate with roles, and roles for which the associated genes are unknown;
  • integrate these concepts in a sketch; and
  • carefully document your efforts and results: the data sources and literature, what procedures have been applied, how the results have been accessed, validated and interpreted...
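
The bookkeeping behind the role-related steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the role names and gene symbols are hypothetical placeholders, and `summarize` is a name made up for this sketch, not part of any course material.

```python
from collections import defaultdict

# Hypothetical example data: role names and gene symbols are placeholders.
roles = ["substrate import", "regulation", "signalling", "cofactor biosynthesis"]
assignments = {
    "GENE_A": ["substrate import"],
    "GENE_B": ["regulation", "signalling"],
    "GENE_C": [],  # collected, but no role identified yet
}


def summarize(roles, assignments):
    """Return (genes per role, genes without a role, roles without a gene)."""
    genes_by_role = defaultdict(list)
    for gene, gene_roles in assignments.items():
        for role in gene_roles:
            genes_by_role[role].append(gene)
    unassigned_genes = sorted(g for g, r in assignments.items() if not r)
    roles_without_genes = sorted(r for r in roles if r not in genes_by_role)
    return dict(genes_by_role), unassigned_genes, roles_without_genes


genes_by_role, unassigned_genes, roles_without_genes = summarize(roles, assignments)
```

Both leftover lists are results in their own right: a gene without a role may sit outside the system's boundary, and a role without a gene marks something the "bottom up" search has not yet found.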



First stage: process, genes and concepts (11 marks max.)

 

To define a system, we will start from a biological process in the GO biological process ontology. I have excerpted a table of processes to get you started, and explained the procedure in detail. You can find the documentation and the table through the links below.

Note that you are not constrained to start from a process in that table. If you are determined to work on a different human system because you have particular knowledge about it, you may suggest this to me and perhaps we can add it to the table. But you need to coordinate this with me.

Once you have chosen which system to work on:

  • Create a subpage on your student wiki for your project;
  • Copy the wikisource from the template page I have provided here and paste it into your project page. Save it;
  • Start filling in the details and expanding.


 

Bottom up...

 
  • At first, you will complete your picture of the system's components, while at the same time refining your concept of the system's purpose. In general, the purpose of the system is probably not going to be the same as the name of the GO term you started from. In order to find additional genes, you should consider the common bioinformatics strategies for function annotation. There are four main sources of information to consider:
  1. GO and GOA (via QuickGO), and the associated GONUTS pages which give you information on related terms and annotated proteins;
  2. The UniProt page for your genes, with its wealth of detailed annotations and cross-references;
  3. Obviously, (recent) literature that you find via cross-references and PubMed searches; and
  4. The STRING database for functional annotation, to which you can upload a set of genes all at once to discover functionally related genes.
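
One way to keep track of where each candidate gene came from is to merge the per-source lists into a single evidence table. The following is a minimal Python sketch under the assumption that you have already exported a list of gene symbols from each of the four sources; the gene names are placeholders and the source labels are just dictionary keys, not API calls.

```python
from collections import defaultdict

# Hypothetical candidate lists per source; gene names are placeholders.
candidates = {
    "GO/GOA": ["GENE_A", "GENE_B"],
    "UniProt": ["GENE_B", "GENE_C"],
    "literature": ["GENE_B"],
    "STRING": ["GENE_A", "GENE_C", "GENE_D"],
}


def evidence_table(candidates):
    """Map each gene to the sorted list of sources that suggested it."""
    support = defaultdict(set)
    for source, genes in candidates.items():
        for gene in genes:
            support[gene].add(source)
    return {gene: sorted(sources) for gene, sources in sorted(support.items())}


table = evidence_table(candidates)
for gene, sources in table.items():
    # gene, number of supporting sources, source names
    print(f"{gene}\t{len(sources)}\t{', '.join(sources)}")
```

A table like this also serves the documentation stage, since it records the source of every gene you include or exclude.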

In particular:

  • your proteins may have domains that mediate functional interactions with other proteins;
  • your proteins' structures may indicate requirements for ligands and/or co-factors that need to be synthesized, activated and incorporated;
  • your GO terms may have protein annotations for other species that are not present for humans, so you can use BLAST to find their reciprocal-best-matches in humans;
  • domain composition of your proteins may indicate related proteins;
  • coexpression analysis at COXPRESdb and GeneFriends may discover co-regulated genes;
  • protein-protein interactions (via UniProt) may point to the participation in functional complexes;
  • presence in annotated pathways (via UniProt) may point to collaborations.
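
The reciprocal-best-match idea mentioned in the list above can be sketched in a few lines of Python. The sketch assumes that you have run BLAST in both directions (other species against human, and human against the other species) and saved the results in tabular format (`-outfmt 6`), where column 1 is the query ID, column 2 the subject ID, and column 12 the bit score. The sequence IDs and scores below are placeholders, and `row()` is a hypothetical helper that only builds example input.

```python
def best_hits(tabular_lines):
    """Return {query: subject} keeping the highest-bit-score hit per query.

    Expects BLAST tabular rows (-outfmt 6): column 1 is the query ID,
    column 2 the subject ID, and column 12 the bit score.
    On ties, the hit that appears first is kept.
    """
    best = {}
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed rows
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_lines, reverse_lines):
    """Return sorted (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(forward_lines)
    reverse = best_hits(reverse_lines)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def row(query, subject, bitscore):
    """Build a 12-column example row; unused columns are filled with dummies."""
    return "\t".join([query, subject] + ["0"] * 9 + [str(bitscore)])


# Placeholder IDs: "other_*" from the annotated species, "human_*" from human.
forward = [row("other_1", "human_1", 250.0), row("other_1", "human_2", 90.0),
           row("other_2", "human_2", 120.0)]
reverse = [row("human_1", "other_1", 245.0), row("human_2", "other_1", 95.0)]

pairs = reciprocal_best_hits(forward, reverse)
print(pairs)
```

Here `other_2` finds `human_2` as its best match, but `human_2` prefers `other_1`, so only the pair `other_1` / `human_1` is reported as reciprocal.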


 

... and Top Down

 
  • Spend some thought on naming your "system" well. Focus on the purpose for which the system exists; this will help you organize the components – ultimately, the components and architecture need to support that purpose. All of them. Be mindful: the GO terms are usually not suitable as they are to define the purpose of the system. Consider them a starting point.
  • Write down your definition of the "purpose". Remember: we consider the purpose to be the benefit of the system for the fitness of the organism.
  • Think about auxiliary roles that are part of your system: how it comes into existence, how it accepts substrates and/or information, how it transforms this input and how its output is generated. Consider that whatever is switched on, needs to be switched off again.
  • Think about abstract roles like interfaces, set-points, feedback-control elements, signal integration, transmission, effectors ...


 

Second stage: Sketch the system architecture (11 marks max.)

 


The second stage of the project is for you to define an architecture that integrates the system components and the concepts you have defined. Refer to the "Function" script for an example (the yeast G1/S transition switch) of how to go about this task. Draw out this architecture in a sketch and include it on your Student Wiki project page.
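
If you prefer to draft the sketch from data rather than by hand, one option is to record components and their directed interactions as a list and render it as Graphviz DOT text. This is only an illustration under assumptions: the component names and interaction labels are placeholders, and the use of DOT is a suggestion, not a requirement of the project.

```python
# Hypothetical components and directed interactions; names are placeholders.
edges = [
    ("input signal", "sensor", "activates"),
    ("sensor", "effector", "activates"),
    ("effector", "output", "produces"),
    ("output", "sensor", "inhibits"),  # feedback that switches the system off
]


def to_dot(edges, name="system"):
    """Render (source, target, label) triples as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for source, target, label in edges:
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(edges)
print(dot)
```

The printed text can be pasted into any tool that reads DOT; the edge list itself doubles as a plain record of which interactions your sketch asserts.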


 

Final stage: Documentation (4 marks max.)

 

The documentation must fulfill two aspects.

  • First, your documentation must make your data and results reproducible. You need to specify the premises you started from and how you came up with them, and you need to specify the procedure through which you arrived at your conclusions. Put yourselves into the mind of a reviewer: are you providing enough information so that your (computational) steps can be reproduced? Are your source IDs specified? Your resources and programs?
  • Second, your documentation must explain the rationale behind your procedure and conclusions. This is not so much what you did but why you did it: what was the logic behind a certain process or decision?
  • Form is important:
      • structure your project clearly, include a brief introduction and definitely include a meaningful conclusion;
      • avoid jargon;
      • make it easy to copy data for further analysis (no screenshots unless you are illustrating a Web-site or GUI);
      • write complete sentences;
      • do not plagiarize, but reference judiciously;
      • make sure your references are complete and take advantage of the <ref> ... </ref> tags and the {{#pmid:1234567}} template.

Ask(!) if you are not sure about Wiki markup or formatting to achieve a particular layout.
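
If you collect PubMed IDs while you work, a small helper can turn them into reference markup for pasting into your page. The sketch below assumes that the {{#pmid:...}} template may be placed inside <ref> ... </ref> tags; check this on your own page before relying on it. The function name `pmid_refs` is hypothetical, and 1234567 is the placeholder ID from the text above, not a real citation.

```python
def pmid_refs(pmids):
    """Return one <ref>{{#pmid:...}}</ref> string per unique PMID, in input order."""
    seen = set()
    refs = []
    for pmid in pmids:
        pmid = str(pmid).strip()
        if not pmid.isdigit():
            raise ValueError(f"not a PubMed ID: {pmid!r}")
        if pmid not in seen:  # drop duplicates so each source is cited once
            seen.add(pmid)
            refs.append("<ref>{{#pmid:" + pmid + "}}</ref>")
    return refs


# 1234567 is a placeholder ID, given once as text and once as a number.
refs = pmid_refs(["1234567", 1234567])
print("\n".join(refs))
```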


 

Organizational details

 

Evaluation

 

Marking will consider the suitability and usefulness of the process for this project, how well you were able to abstract the procedures, and how well you succeeded in integrating the data into a meaningful description. For general marking rubrics, follow this link.


 

Due dates

 

The project (like all class work) is due by the end of classes, December 6, 2016. If you need an extension you must contact me at least a day before the deadline. Please state briefly the requested duration of the extension. The extension request should not extend past the final exam date.


 

Late submissions

 

The time of submission is recorded with your edits on the Wiki and can be identified in the View history tab of a page: I will consider the last edit before the submission deadline for marking. There will be no "late deductions" applied - the deductions are implicit in the status of the project at the due date.


 

Resources

 
Links


 
  1. I speak of genes here in a very informal sense: the system components may include genes, their encoded proteins, structural and regulatory RNA, metabolites, and even environmental signals.