Difference between revisions of "Computational Systems Biology Main Page"

From "A B C"
Line 94: Line 94:
 
:For JTB2020 see the [http://biochemistry.utoronto.ca/courses/jtb-2020h/ JTB2020 Course Web page] for general information.
 
:For JTB2020 see the [http://biochemistry.utoronto.ca/courses/jtb-2020h/ JTB2020 Course Web page] for general information.
  
 +
<section end=CSB_main_organization />
  
 +
{{Vspace}}
  
 
====Prerequisites and Preparation====
 
====Prerequisites and Preparation====
Line 114: Line 116:
 
{{Smallvspace}}
 
{{Smallvspace}}
  
A minimal subset of bioinformatics knowledge for BCB420 is linked from the BCB420-specific map below. The <span style="background-color: #b3dbce;">&nbsp;&nbsp;Live&nbsp;units&nbsp;&nbsp;</span> on that map will be the subject of our first ''Quiz'' in the third week of class. We will hold a mock-quiz on the material in the second week (our first class meeting).
+
A minimal subset of bioinformatics knowledge you need to begin with work in BCB420 is linked from the BCB420-specific map below. To ensure everyone is adequately prepared, we will hold a ''Quiz'' on the <span style="background-color: #b3dbce;">&nbsp;&nbsp;Live&nbsp;units&nbsp;&nbsp;</span> on that map in the third week of class. We will hold a mock-quiz on the material in the second week (our first class meeting) so everyone knows what to expect.
 
 
 
 
<section end=CSB_main_organization />
 
 
 
===The "Knowledge Network"===
 
 
 
Supporting learning units for this course are organized in a "Knowledge Network" of self-contained units that can be worked on according to students' individual needs and timing. Here is the '''detailed map'''. It contains links to all of the units.
 
  
 
+
* <command>-Click to open the BCB420 Preparation Learning Units Map in a new tab, scale for detail.
* <command>-Click to open the Learning Units Map in a new tab, scale for detail.
+
[[File:BCB420-Units.svg|thumb|500px|none|link=http://steipe.biochemistry.utoronto.ca/abc/assets/BCB420-Units.svg|'''A map of preparatory BCB420 learning units.''']]
[[File:BCB420-Units.svg|thumb|500px|none|link=http://steipe.biochemistry.utoronto.ca/abc/assets/BCB420-Units.svg|'''A map of the BCB420 learning units.''']]
 
 
* Hover over a learning unit to see its keywords.
 
* Hover over a learning unit to see its keywords.
 
* Click on a learning unit to open the associated page.
 
* Click on a learning unit to open the associated page.
Line 133: Line 127:
 
**<span style="background-color: #f2fafa;">&nbsp;&nbsp;Stubs&nbsp;&nbsp;</span> (placeholders) are pale. These still need basic contents.
 
**<span style="background-color: #f2fafa;">&nbsp;&nbsp;Stubs&nbsp;&nbsp;</span> (placeholders) are pale. These still need basic contents.
 
**<span style="background-color: #97bed5;">&nbsp;&nbsp;Milestone&nbsp;units&nbsp;&nbsp;</span> are blue. These collect a number of prerequisites to simplify the network.
 
**<span style="background-color: #97bed5;">&nbsp;&nbsp;Milestone&nbsp;units&nbsp;&nbsp;</span> are blue. These collect a number of prerequisites to simplify the network.
**<span style="background-color: #e19fa7;">&nbsp;&nbsp;Integrator&nbsp;units&nbsp;&nbsp;</span> are red. These embody the main goals of the course.
+
**<span style="background-color: #e19fa7;">&nbsp;&nbsp;Integrator&nbsp;units&nbsp;&nbsp;</span> are red. These embody the main goals of the course. These units are '''not''' for evaluation in BCB420.
**<span style="background-color: #f4d7b7;">&nbsp;&nbsp;Units&nbsp;that&nbsp;require&nbsp;revision&nbsp;</span> are pale orange.
 
*Units that have a <span style="background-color: #eeeeee; border:solid 2px #000000;">&nbsp;&nbsp;black border&nbsp;&nbsp;</span> have deliverables that can be submitted for credit. Visit the node for details.
 
 
*Arrows point from a prerequisite unit to a unit that builds on its contents.
 
*Arrows point from a prerequisite unit to a unit that builds on its contents.
  
 
{{Vspace}}
 
 
====Navigating the course====
 
 
Everything starts with the following three units:
 
*[[FND-Wiki_editing|Introduction to editing Wiki pages]]
 
:{{#lst:FND-Wiki_editing|abstract}}
 
 
*[[FND-Journal|Your Course Journal]]
 
:{{#lst:FND-Journal|abstract}}
 
 
*[[ABC-Insights|The "insights!" page]]
 
:{{#lst:ABC-Insights|abstract}}
 
 
* Once you have completed these three units, get started '''immediately''' on the Introduction-to-R units. You need time and practice, practice, practice<ref>[https://tapas.io/episode/923459 It's practice!]</ref> to acquire the programming skills you will need for the course.
 
 
* Whenever you want to take a break from studying R, get done with the other preparatory units.
 
 
At the end of our preparatory phase (after week 2) we will hold a comprehensive, non-trivial quiz on the preparatory units and on R basics.
 
 
 
<!--
 
Everything leads to the ''Integrator Units''. These cover four large areas of bioinformatics that make up the explicit goals of the course:
 
 
(i) algorithms and statistics;<br />
 
(ii) structural modelling and interpretation;<br />
 
iii) gene annotation; and;<br />
 
(iv) phylogenetic analysis.
 
 
The knowledge and skills you need to work on these ''Integrator Units'' can be obtained from the other learning units that are shown on the [http://steipe.biochemistry.utoronto.ca/abc/assets/ABC-units_map.svg learning units map] as prerequisites. Note that "prerequisites" in this context does not mean you '''must''' do one thing before you can do another, the arrows simply point out which units assume what prior knowledge. You can acquire that knowledge in whatever sequence makes sense to '''you''', and you don't have to learn from the learning units of this course at all. Just make sure that you submit enough general learning units for evaluation along the way. And document what you are doing in your Course Journal. Also, remember that '''all''' the material is cumulative - my evaluation of your work implicitly includes all of the prerequisite material.
 
-->
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 178: Line 138:
  
 
{{Vspace}}
 
{{Vspace}}
 +
For details of the deliverables, see below.
 +
{{Smallvspace}}
  
 
<table cellpadding="5">
 
<table cellpadding="5">
Line 237: Line 199:
 
{{Vspace}}
 
{{Vspace}}
  
====PartI: Foundations====
+
====Getting started====
 +
 
 +
Everything starts with the following four units:
 +
*[[FND-Wiki_editing|Introduction to editing Wiki pages]] (Optional if you have taken BCH441 or BCB410.)
 +
:{{#lst:FND-Wiki_editing|abstract}}
 +
 
 +
*[[FND-Journal|Your Course Journal]] (Mandatory - your Journals will be assessed. Note that the "rules" have changed - study the unit carefully and read the [[ABC-Rubrics#Course_Journal|evaluation rubrics]].)
 +
:{{#lst:FND-Journal|abstract}}
 +
 
 +
*[[ABC-Plagiarism|The "Plagiarism Unit"]] (Mandatory - must be the first entry in your Journal.)
 +
:{{#lst:FND-Journal|abstract}}
 +
 
 +
*[[ABC-Insights|The "insights!" page]] (Mandatory - your "insights!" pages will be assessed.)
 +
:{{#lst:ABC-Insights|abstract}}
 +
 
 +
* Once you have completed these four units, get started '''immediately''' on the Introduction-to-R units. You need time and practice, practice, practice<ref>[https://tapas.io/episode/923459 It's practice!]</ref> to acquire the programming skills you need for the course.
  
...
+
* Whenever you want to take a break from studying R, continue with the other preparatory units.  
  
 
{{Vspace}}
 
{{Vspace}}
  
  
====Part II: Biocuration====
 
  
...
+
====PartI: Foundations====
  
 
{{Vspace}}
 
{{Vspace}}
  
 +
<small>Don't forget to document your work in your Journal!</small>
  
====Part III: Exploration====
+
{{Smallvspace}}
  
...
+
Your level of preparedness will be assessed in a "mock quiz" in week two, after which you have one more week to fill in gaps before our Quiz in week three. With that out of the way, we will look at different data sources that are useful in systems biology, including gene-level annotations and collections of experimental data, relationship data like physical and epistatic interactions, and systems-level data like metabolic or regulatory pathways. Each of you will select one data-source in our first open-ended session and then work on the following deliverables:
 +
* a brief summary page on the Student Wiki
 +
* an R package derived from [https://github.com/hyginn/rpt '''rpt'''],
 +
** hosted on GitHub,
 +
** containing an R markdown page that describes and annotates code for importing the chosen data,
 +
** and normalizing its identifiers to HuGO gene symbols,
 +
** and containing a sample dataset for a small number of genes.
  
 
{{Vspace}}
 
{{Vspace}}
  
  
<!--
+
====Part II: Biocuration====
====Oral Test====
 
  
Contents and reflection of participation ...
+
"Systems" are concepts and working with systems requires expert knowledge. To explore principles of expert curation of molecular systems, each of you will select one system in our second open-ended session and then work on the following deliverables:
 +
* a project page on the Student Wiki
 +
** that contains information about the system objectives;
 +
** an annotated list of genes that are system members;
 +
** an annotated list of related genes that are '''not''' system members;
 +
** annotations according to a "System Roles Ontology";
 +
** an attached JSON file that can be used to load your system data into a systems database that implements a relational data model for molecular systems;
  
 +
Your data import script and your system model will be assessed in the Oral Test.
  
 
{{Vspace}}
 
{{Vspace}}
  
====Journals====
 
  
Start forming a habit and even get marks for it too ...
+
====Part III: Exploration====
  
 +
At the end of Parts I and II we will have data available and annotated systems that induce relations on the data. Using this information, we can formulate tools for exploratory data analysis (EDA): isolating and evaluating features, looking at correlations, identifying patterns in networks,
 +
clustering data etc. Each of you will select one EDA workflow in our third open-ended session for which to build a tool in a jointly authored R package.  Your deliverables are:
 +
* a project page on the student Wiki that contains a specification of your tool;
 +
* an implementation of your tool as part of a jointly authored R package under continuous integration;
 +
* a Vignette in the package that describes the tool and includes sample code for which the data is also provided in the package.
 +
 +
Your deliverables will be jointly evaluated, together with your participation in constructing the package.
  
 
{{Vspace}}
 
{{Vspace}}
 
-->
 
  
 
===Extensions for term work===
 
===Extensions for term work===
Line 285: Line 278:
  
 
* '''Signing up for the oral tests.'''
 
* '''Signing up for the oral tests.'''
::The dates for the '''{{Oral-Test}}''' have been announced at the beginning of the term on this syllabus. If you fail to sign up for a slot, or if you fail to show up at the scheduled time, this is equivalent to a missed midterm exam, we apply the Faculty policy for a missed Midterm Test: "if the reasons for missing your test are ''acceptable'' to the instructor, a make-up opportunity should be offered to the student where ''practicable''. '''"Acceptable"''' reasons will be considered if they are justified, if the consideration is "fair, equitable and reasonable", and if the reason is documented through one of the four types of "official" documentation: UofT Verification of Illness or Injury Form, Student Health or Disability Related Certificate, a College Registrar’s Letter, and an Accessibility Services Letter. Scope for a '''"practicable"''' make-up opportunity for the Oral Test will be limited.
+
::The dates for the '''{{Oral-Test}}''' have been announced at the beginning of the term on this syllabus. If you fail to sign up for a slot, or if you fail to show up at the scheduled time, we apply the Faculty policy for a missed Midterm Test: "if the reasons for missing your test are ''acceptable'' to the instructor, a make-up opportunity should be offered to the student where ''practicable''. '''"Acceptable"''' reasons will be considered  
 +
::** if they are justified,  
 +
::** if the consideration is "fair, equitable and reasonable", and  
 +
::**if the reason is documented through one of the four types of "official" documentation: UofT Verification of Illness or Injury Form, Student Health or Disability Related Certificate, a College Registrar’s Letter, and an Accessibility Services Letter.  
 +
::Scope for a '''"practicable"''' make-up opportunity for the Oral Test will be limited.
  
 
* '''Submissions due on the {{lastdate}}.'''
 
* '''Submissions due on the {{lastdate}}.'''
Line 299: Line 296:
  
 
{{Vspace}}
 
{{Vspace}}
 +
 
===Copyright and Licensing===
 
===Copyright and Licensing===
 
{{Smallvspace}}
 
{{Smallvspace}}
Line 307: Line 305:
  
 
{{Vspace}}
 
{{Vspace}}
 
  
 
====Academic integrity====
 
====Academic integrity====
Line 321: Line 318:
 
{{Vspace}}
 
{{Vspace}}
  
== Timetable and syllabus ==
+
== Timetable and contents details ==
  
  
 
<div class="alert">
 
<div class="alert">
Warning: Syllabus and activities are currently being edited for the 2019 Winter Term. Return to this page soon.  
+
Warning: The activity details are currently being edited for the 2019 Winter Term. Return to this page soon.  
 
</div>
 
</div>
  

Revision as of 11:13, 6 January 2019

Computational Systems Biology

Course Wiki for BCB420 (Computational Systems Biology) and JTB2020 (Applied Bioinformatics).


 

This is our main tool to coordinate information, activities and projects in University of Toronto's computational systems biology course BCB420. If you are not one of our students, this site is unlikely to be useful. If you are here because you are interested in general aspects of bioinformatics or computational biology, you may want to review the Wikipedia article on bioinformatics, or visit Wikiomics. Contact boris.steipe(at)utoronto.ca with any questions you may have.


 

Note: This page is currently being edited for the 2019 Winter Term. Return soon.

Please note: There will be no class meeting on Tuesday, January 8. I will sign you up to the course mailing list using your information from BCH441 or BCB420 if I have that, otherwise I will use your official UofT eMail address.

JTB2020 students: I will not get a class list before Monday, January 7. Please contact me by eMail so I can update you with course details as soon as possible.

Note: If you are not enrolled in this course by Friday, January 4, it is unlikely that you will be able to catch up with preparations.


 


 


 

BCB420 / JTB2020

These are the course pages for BCB420H (Computational Systems Biology). Welcome, you're in the right place.

These are also the course pages for JTB2020H (Applied Bioinformatics). How come? Why is JTB2020 not the graduate equivalent of BCB410 (Applied Bioinformatics)? Let me explain. When this course was conceived as a required part of the (then so called) Collaborative PhD Program in Proteomics and Bioinformatics in 2003, there was an urgent need to bring graduate students to a minimal level of computer skills and programming; prior experience was virtually nonexistent. Fortunately, the field has changed and our current graduate students are usually quite competent at least in some practical aspects of computational biology. In this course we profit from the rich and diverse knowledge of the problem-domain our graduate students have, while bringing everyone up to a level of competence in the practical, computational aspects.


The 2019 course...

In this course we explore the systems biology of human genes with computational means in a project-oriented format. This will proceed in three phases:

  • Foundations first: we will review basic computational skills and bioinformatics knowledge to bring everyone to the same level. In all likelihood you will need to start with these tasks well in advance of the actual lectures. This phase will include a comprehensive quiz on prerequisite material in week 3. We will explore data-sources and you will choose one data-source for which you will develop import code and document it in an R markdown document within an R package;
  • Next we'll focus on Biocuration: the expertise-informed collection, integration and annotation of biological data. We will each choose a molecular "system" to work on, and define an ontology and data-model in which to annotate our system's components, their roles, and their relationships. The outcome of your curation task (together with your data script) will define the scope of this course's Oral Test;
  • Finally, we will develop tools for Exploratory Data Analysis in computational systems biology. We will jointly develop code for a team-authored R package where everyone contributes one mini workflow for data preparation, exploration and interpretation. Your code contributions to the package will be assessed;
  • There are several meta-skills that you will pick up "on the side". These include time management; working according to best practice of reproducible research in a collaborative environment on GitHub; report writing; and keeping a scientific lab journal.



Organization

Dates
BCB420/JTB2020 is a Winter Term course.
Lectures: Tuesdays, 16:00 to 18:00. (Classes start at 10 minutes past the hour.)
Note: there will be three open-ended collaborative planning sessions that may go well into the night. Attendance and participation are mandatory.
Final Exam: None for this course.
Events
  • Tuesday, January 8 2019: Course officially begins. No class meeting. Get started on preparatory material (well in advance actually).
  • Tuesday, January 15: First class meeting. Mock-quiz for preparatory material.
  • Tuesday, January 22: First live quiz on preparatory material. Later: open ended session on data import
  • Tuesday, February 5: Open ended session on system curation
  • Tuesday, March 12: Open ended session on exploratory data analysis


Location
MS 3278 (Medical Sciences Building).


Departmental information
For BCB420 see the BCB420 Biochemistry Department Course Web page.
For JTB2020 see the JTB2020 Course Web page for general information.



 

Prerequisites and Preparation

This course has formal prerequisites of BCH441H1 (Bioinformatics) or CSB472H1 (Computational Genomics and Bioinformatics). I have no way of knowing what is being taught in CSB472, and no way of confirming how much you remember from any of your previous courses, like BCH441 or BCB410. Moreover there are many alternative ways to become familiar with important course contents. Thus I generally enforce course-prerequisites only very weakly and you should not assume at all that having taken any particular combination of courses will have prepared you sufficiently. Instead I make the contents of the course very explicit. If your preparation is lacking, you will have to expend a very significant amount of effort. This is certainly possible, but whether you will succeed will depend on your motivation and aptitude.

The course requires (i) a solid understanding of molecular biology, (ii) solid, introductory-level knowledge of bioinformatics, and (iii) a good working knowledge of the R programming language.


 

The prerequisite material for this course includes the contents of the 2018 BCH441 course:

  • <command>-Click to open the Bioinformatics Learning Units Map in a new tab, scale for detail.
A knowledge network map of the bioinformatics learning units.
  • Open the Bioinformatics Knowledge Network Map and get an overview of the material. You should confidently be able to execute the tasks in the four Integrator Units.
  • If you have taken BCH441 before, please note that many of the units have undergone significant revisions and material has been added. You will need to review the material and familiarize yourself more with the R programming aspects.
  • If you have not taken BCH441, you will need to work through the material rather carefully. Estimate at least three weeks of time and get started immediately.


 

A minimal subset of the bioinformatics knowledge you need to begin work in BCB420 is linked from the BCB420-specific map below. To ensure everyone is adequately prepared, we will hold a Quiz on the Live units on that map in the third week of class. We will hold a mock-quiz on the material in the second week (our first class meeting) so everyone knows what to expect.

  • <command>-Click to open the BCB420 Preparation Learning Units Map in a new tab, scale for detail.
A map of preparatory BCB420 learning units.
  • Hover over a learning unit to see its keywords.
  • Click on a learning unit to open the associated page.
  • The nodes of the learning unit network are colour-coded:
    • Live units are green.
    • Units under development are light green. These are still in progress.
    • Stubs (placeholders) are pale. These still need basic contents.
    • Milestone units are blue. These collect a number of prerequisites to simplify the network.
    • Integrator units are red. These embody the main goals of the course. These units are not for evaluation in BCB420.
  • Arrows point from a prerequisite unit to a unit that builds on its contents.


 


Grading, Activities, Deliverables

 

For details of the deliverables, see below.

 
Activity                                                                   | Weight: BCB410 - (Undergraduates) | Weight: JTB2020 - (Graduates)
Self-evaluation and Feedback session on preparatory material ("Quiz"[1])   | 20 marks                          | 15 marks
Oral Test (March 7/8)                                                      | 30 marks                          | 30 marks
Collaborative software task and participation                              | 20 marks                          | 15 marks
Journal                                                                    | 25 marks                          | 25 marks
Insights                                                                   | 5 marks                           | 5 marks
Pull request reviews                                                       |                                   | 10 marks
Total                                                                      | 100 marks                         | 100 marks


 

Getting started

Everything starts with the following four units:

  • Introduction to editing Wiki pages (Optional if you have taken BCH441 or BCB410.)
This should be the first learning unit you work with, since your Course Journal will be kept on a Wiki, as well as all other deliverables. This unit includes an introduction to authoring Wikitext and the structure of Wikis, in particular how different pages live in separate "Namespaces". The unit also covers the standard markup conventions ("Wikitext markup", the same conventions that are used on Wikipedia) as well as some extensions that are specific to our Course- and Student Wiki. We also discuss page categories that help keep a Wiki organized, licensing under a Creative Commons Attribution license, and how to add licenses and other page components through template codes.

  • Your Course Journal (Mandatory - your Journals will be assessed. Note that the "rules" have changed - study the unit carefully and read the evaluation rubrics.)
Keeping a journal is an essential task in a laboratory. To practice keeping a technical journal, you will document your activities as you are working through the material of the course. A significant part of your term grade will be given for this Course Journal. This unit introduces components and best practice for lab- and course journals and includes a wiki-source template to begin your own journal on the Student Wiki.

  • The "Plagiarism Unit" (Mandatory - must be the first entry in your Journal.)

  • The "insights!" page (Mandatory - your "insights!" pages will be assessed.)
In parallel with your other work, you will maintain an insights! page on which you collect valuable insights and learning experiences of the course. Through this you ask yourself: what does this material mean, for the field and for myself?


  • Once you have completed these four units, get started immediately on the Introduction-to-R units. You need time and practice, practice, practice[2] to acquire the programming skills you need for the course.
  • Whenever you want to take a break from studying R, continue with the other preparatory units.


 


Part I: Foundations

 

Don't forget to document your work in your Journal!


 

Your level of preparedness will be assessed in a "mock quiz" in week two, after which you have one more week to fill in gaps before our Quiz in week three. With that out of the way, we will look at different data sources that are useful in systems biology, including gene-level annotations and collections of experimental data, relationship data like physical and epistatic interactions, and systems-level data like metabolic or regulatory pathways. Each of you will select one data-source in our first open-ended session and then work on the following deliverables:

  • a brief summary page on the Student Wiki
  • an R package derived from rpt,
    • hosted on GitHub,
    • containing an R markdown page that describes and annotates code for importing the chosen data,
    • and normalizing its identifiers to HUGO (HGNC) gene symbols,
    • and containing a sample dataset for a small number of genes.
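
The identifier normalization step in the list above can be sketched roughly as follows. This is only an illustration of the idea, written in Python for brevity even though the course deliverable is an R package. The function name `normalize_to_symbols` and the tiny hand-written alias table are hypothetical; an actual import script would presumably draw on the mapping tables published by the HGNC rather than a literal dictionary.

```python
def normalize_to_symbols(identifiers, alias_to_symbol):
    """Map source identifiers to gene symbols.

    Returns a dict of {original identifier: symbol} plus a list of
    identifiers that could not be mapped and need manual attention.
    """
    normalized = {}
    unmapped = []
    for raw in identifiers:
        # Assumption: aliases are matched case-insensitively after trimming.
        key = raw.strip().upper()
        symbol = alias_to_symbol.get(key)
        if symbol is None:
            unmapped.append(raw)
        else:
            normalized[raw] = symbol
    return normalized, unmapped


# Illustrative alias table only (hypothetical, hand-written for this example).
ALIASES = {"TP53": "TP53", "P53": "TP53", "HER2": "ERBB2", "ERBB2": "ERBB2"}

mapped, missing = normalize_to_symbols(["p53", "HER2 ", "XYZ123"], ALIASES)
```

Keeping the unmapped identifiers in a separate list, rather than silently dropping them, makes it easier to document unresolved cases in the accompanying R markdown page.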


 


Part II: Biocuration

"Systems" are concepts and working with systems requires expert knowledge. To explore principles of expert curation of molecular systems, each of you will select one system in our second open-ended session and then work on the following deliverables:

  • a project page on the Student Wiki
    • that contains information about the system objectives;
    • an annotated list of genes that are system members;
    • an annotated list of related genes that are not system members;
    • annotations according to a "System Roles Ontology";
    • an attached JSON file that can be used to load your system data into a systems database that implements a relational data model for molecular systems.

Your data import script and your system model will be assessed in the Oral Test.


 


Part III: Exploration

At the end of Parts I and II we will have data available and annotated systems that induce relations on the data. Using this information, we can formulate tools for exploratory data analysis (EDA): isolating and evaluating features, looking at correlations, identifying patterns in networks, clustering data etc. Each of you will select one EDA workflow in our third open-ended session for which to build a tool in a jointly authored R package. Your deliverables are:

  • a project page on the Student Wiki that contains a specification of your tool;
  • an implementation of your tool as part of a jointly authored R package under continuous integration;
  • a Vignette in the package that describes the tool and includes sample code for which the data is also provided in the package.

Your deliverables will be jointly evaluated, together with your participation in constructing the package.


 

Extensions for term work

 

Extensions for term work in this course are subject to Faculty regulations and will only be considered within the framework determined by the Faculty policies.


  • Regular Submissions
It is Faculty policy to require assessments to be "fair, equitable and reasonable". In order to be equitable, granting extensions requires the student to demonstrate that the need for the extension is due to unavoidable circumstances that go significantly beyond what was expected of the rest of the class. In general "official" documentation will be required: UofT Verification of Illness or Injury Form, Student Health or Disability Related Certificate, a College Registrar’s Letter, and an Accessibility Services Letter.
  • Signing up for the oral tests.
The dates for the Oral Test have been announced at the beginning of the term on this syllabus. If you fail to sign up for a slot, or if you fail to show up at the scheduled time, we apply the Faculty policy for a missed Midterm Test: "if the reasons for missing your test are acceptable to the instructor, a make-up opportunity should be offered to the student where practicable." "Acceptable" reasons will be considered
    • if they are justified,
    • if the consideration is "fair, equitable and reasonable", and
    • if the reason is documented through one of the four types of "official" documentation: UofT Verification of Illness or Injury Form, Student Health or Disability Related Certificate, a College Registrar’s Letter, or an Accessibility Services Letter.
Scope for a "practicable" make-up opportunity for the Oral Test will be limited.
Since the course does not have a final exam, the Faculty requires grades to be marked, collated and submitted a few days after the Template:Lastdate. Therefore I cannot normally grant extensions beyond this date. The Faculty allows so-called informal extensions to be granted "in extraordinary circumstances"; in those cases too, the requirement to be "fair, equitable and reasonable" will apply, i.e. you would need to demonstrate that the need for the extension was due to unavoidable circumstances that go significantly beyond what was expected of the rest of the class, and submit "official" documentation to me. In that case, (i) we would determine an adjusted submission date, (ii) I will initially submit a mark of 0 for the missing submissions, and (iii) I will submit an amended mark after that date, if appropriate. Note that the Faculty requires that such extensions don't go beyond a few days after the end of the Final Examination Period. If you require an extension beyond that date, you need to submit a formal petition through your College Registrar.


 

Late penalties

 

Late penalties will be applied according to the following formula: (marks achieved) * 0.5^(fractional days late). However, material submitted more than 3.0 days late (72 hours or more) will be marked zero. Note: this does not apply to material due before the Oral Test (see there).
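
As a worked illustration of the formula above, here is a minimal sketch. It is not part of the official policy. The helper name `late_adjusted_mark` is hypothetical, and treating exactly 72 hours as the point at which the mark becomes zero is an assumption based on the "72 hours or more" wording. Python is used here only to show the arithmetic.

```python
def late_adjusted_mark(marks_achieved: float, days_late: float) -> float:
    """Apply the late penalty: marks * 0.5 ** (fractional days late)."""
    if days_late <= 0:
        # Submitted on time: no penalty.
        return marks_achieved
    if days_late >= 3.0:
        # Assumption: the cutoff applies at exactly 72 hours and beyond.
        return 0.0
    return marks_achieved * 0.5 ** days_late


one_day = late_adjusted_mark(80, 1.0)       # 80 * 0.5 = 40.0
twelve_hours = late_adjusted_mark(80, 0.5)  # 80 * 0.5 ** 0.5, about 56.6
```

In other words, the mark halves with every full day of lateness: a submission worth 80 marks earns 40 if it is one day late and about 56.6 if it is twelve hours late.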


 

Copyright and Licensing

 

We follow FOSS (free and open-source software) principles in this course. You automatically own copyright to all material you prepare. All material must be licensed for free re-use, under the condition of fair attribution. In practice:

All pages that you place on the Student Wiki must include a {{CC-BY}} tag. All documentation within GitHub pages that you prepare for this course must include a Creative Commons License - Attribution (CC-BY), v. 4.0 or later. All code submitted for this course must be licensed under the MIT software license. Unlicensed submissions will have marks deducted and may be removed from the Wiki.


 

Academic integrity

Our rules on Plagiarism and Academic Misconduct are clearly spelled out in this learning unit. This unit is part of our course prerequisites, and everyone documents in their course journal that they have worked through the unit and understood it. Consequences of having to report to the Office of Student Academic Integrity (OSAI) for plagiarism, misrepresentation or falsification include an indelible failing mark on the transcript, a delay in graduation, or not being able to complete your POSt. Please take extra time to clearly understand the requirements, and define for yourself what they mean for every aspect of your work.


 

Marks adjustments

I do not adjust marks towards a target mean and variance (i.e. there will be no "belling" of grades). I feel strongly that such "normalization" detracts from a collaborative and mutually supportive learning environment. If your classmate gets a great mark because you helped them with a difficult concept, this should never bring down your mark through class average adjustments. Collaborate as much as possible; it is a great way to learn. But do keep it honest and carefully consider our rules on Plagiarism and Academic Misconduct.


 

Timetable and contents details

Warning: The activity details are currently being edited for the 2019 Winter Term. Return to this page soon.


 

Note: Click on the "▽" - symbol to see details for each week's activities.


 

Part I: Foundations

 
Week 1 | In class: Tuesday, January 8 2019 | This week's activities
  • No class meeting this day!
  • Follow up from class meeting ...
  • ...
  • ...

  • To prepare before next meeting ...
  • ...
  • ...

 

Details ...  ▽△

  • TBD
  • ...
  • ...



 


Week In class: Tuesday, January 15 2019 This week's activities
2
  • First class meeting
  • Review of preparatory materials (you should have worked through all of the materials in preparation for class).
  • Practice quiz on preparations (not for credit)
  • Follow up from class meeting ...
  • ...
  • ...

  • To prepare before next meeting ...
  • ...
  • ...

 

Details ...  ▽△

  • TBD
  • ...
  • ...



 


Week In class: Tuesday, January 22 2019 This week's activities
3
  • First Quiz

  • Data import

  • Choosing a dataset to define an import workflow (open ended session!)
  • Follow up from class meeting ...
  • work through
  • ...

  • To prepare before next meeting ...
  • create a package based on rpt
  • create a project page ...
  • draft a script ...

 

Details ...  ▽△

  • TBD
  • ...
  • ...



 


Week In class: Tuesday, January 29 2019 This week's activities
4
  • Normalizing gene names
  • Follow up from class meeting ...
  • solve any normalization issues your dataset may have

  • To prepare before next meeting ...
  • work through literate programming
  • create test dataset
  • finalize your package
  • "Release" your package before Tuesday, February 5 2019 at 16:00[3].

 

Details ...  ▽△

  • TBD
  • ...
  • ...




 

Part II: Curation

 


Week In class: Tuesday, February 5 2019 This week's activities
5
  • Systems concepts
  • A systems ontology
  • A systems data model
  • Biocuration

  • Choosing your system for a systems curation project (open ended session!)
  • Follow up from class meeting ...
  • ...
  • ...

  • To prepare before next meeting ...
  • ...
  • ...

 

Details ...  ▽△

  • TBD
  • ...
  • ...



 


Week In class: Tuesday, February 12 2019 This week's activities
6
  • ...
  • ...
  • Follow up from class meeting ...
  • ...
  • ...

  • To prepare before next meeting ...
  • ...
  • ...

 

Details ...  ▽△

  • TBD
  • ...
  • ...



 


Week In class: Tuesday, February 19 2019 This week's activities
  • No class meeting - Reading Week
  • To prepare during reading week ...
  • ...
  • ...


 

Details ...  ▽△

  • TBD
  • ...
  • ...



 


Week In class: Tuesday, February 26 2019 This week's activities
7
  • ...
  • Follow up from class meeting ...
  • ...
  • ...

  • To prepare before next meeting ...
  • ...
  • ...

 

Details ...  ▽△

  • TBD
  • ...
  • ...



 


Week In class: Tuesday, March 5 2019 This week's activities
8
  • ...
  • Follow up from class meeting ...
  • ...
  • ...

  • To prepare before next meeting ...
  • Curation project deadline
  • Oral Tests: March 7/8
  • Intro to EDA

 

Details ...  ▽△

  • TBD
  • ...
  • ...



 

Part III: Exploration

 


Week In class: Tuesday, March 12 2019 This week's activities
9
  • Exploratory Data Analysis of Systems data

  • Choosing your workflow for a team-authored systems EDA package (open ended session!)
  • Follow up from class meeting ...
  • ...
  • ...

  • To prepare before next meeting ...
  • ...
  • ...

 

Details ...  ▽△

  • TBD
  • ...
  • ...



 


Week In class: Tuesday, March 19 2019 This week's activities
10
  • Vignettes
  • ...
  • Follow up from class meeting ...
  • ...
  • ...

  • To prepare before next meeting ...
  • ...
  • ...

 

Details ...  ▽△

  • TBD
  • ...
  • ...



 


Week In class: Tuesday, March 26 2019 This week's activities
11
  • ...
  • Follow up from class meeting ...
  • ...
  • ...

  • To prepare before next meeting ...
  • ...
  • ...

 

Details ...  ▽△

  • TBD
  • ...
  • ...



 


Week In class: Tuesday, April 2 2019 This week's activities
12
  • No class meeting this day
  • Deadline for computational tasks to be documented in journal
  • Deadline for all remaining course deliverables

NA

 

Details ...  ▽△

  • TBD
  • ...
  • ...



 




Resources

Course related


 
Miller et al. (2011) Strategies for aggregating gene expression data: the collapseRows R function. BMC Bioinformatics 12:322. (pmid: 21816037)

BACKGROUND: Genomic and other high dimensional analyses often require one to summarize multiple related variables by a single representative. This task is also variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to de-convolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternative methods to find representatives. Currently few collapsing functions are developed and widely applied. RESULTS: We introduce the R function collapseRows that implements several collapsing methods and evaluate its performance in three applications. First, we study a crucial step of the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. Toward this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data ("expression deconvolution"). We apply different collapsing methods to predict cell type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected "hub" marker gene.
Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell type markers, and for other published biological pathways. CONCLUSIONS: The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications we provide evidence that both types of methods are robust and biologically relevant tools.

Chang et al. (2013) Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline. BMC Bioinformatics 14:368. (pmid: 24359104)

BACKGROUND: As high-throughput genomic technologies become accurate and affordable, an increasing number of data sets have been accumulated in the public domain and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined in order to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have only been minimally investigated. There is currently no clear conclusion or guideline as to the proper choice of a meta-analysis method given an application; the decision essentially requires both statistical and biological considerations. RESULTS: We performed 12 microarray meta-analysis methods for combining multiple simulated expression profiles, and such methods can be categorized for different hypothesis setting purposes: (1) HS(A): DE genes with non-zero effect sizes in all studies, (2) HS(B): DE genes with non-zero effect sizes in one or more studies and (3) HS(r): DE gene with non-zero effect in "majority" of studies. We then performed a comprehensive comparative analysis through six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated hypothesis settings behind the methods and further apply multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and data structure, respectively. CONCLUSIONS: The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HS(A), HS(B), and HS(r)). Evaluation in real data and results from MDS and entropy analyses provided an insightful and practical guideline to the choice of the most suitable method in a given application.
All source files for simulation and real data are available on the author's publication website.

Thompson et al. (2016) Cross-platform normalization of microarray and RNA-seq data for machine learning applications. PeerJ 4:e1621. (pmid: 26844019)

Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language.


 


 

Notes

  1. I call these activities Quiz sessions for brevity; however, they are not quizzes in the usual sense, since they rely on self-evaluation and immediate feedback.
  2. It's practice!
  3. Note: late-penalties apply.