Difference between revisions of "BCB410"

From "A B C"
 
(13 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
<div id="APB">
 
<div id="APB">
 
<div class="b1">
 
<div class="b1">
BCB410H1F - 2017
+
BCB410H1F - 2018
 
</div>
 
</div>
  
Line 11: Line 11:
 
==Objectives and Participants==
 
==Objectives and Participants==
  
 +
{{Smallvspace}}
  
 
The "Applied Bioinformatics" course is offered as a part of the BCB Program curriculum to ensure that our students know enough about application issues in the field to be able to put their knowledge into practice in a research lab setting. This is to support the Specialist Program goal: to prepare students for graduate studies in the discipline.
 
The "Applied Bioinformatics" course is offered as a part of the BCB Program curriculum to ensure that our students know enough about application issues in the field to be able to put their knowledge into practice in a research lab setting. This is to support the Specialist Program goal: to prepare students for graduate studies in the discipline.
Line 18: Line 19:
 
{{Vspace}}
 
{{Vspace}}
  
In this course we will build contents for a knowledge network in applied bioinformatics.
 
  
  
===Knowledge Network===
+
==Organization==
  
;This year's course will focus on "Data Science" in bioinformatics.
+
{{Smallvspace}}
  
* <command>-Click to open the Knowledge Network in a new tab, scale for detail.
+
<div class="alert">
[[File:BCB410-units.svg|thumb|700px|none|link=http://steipe.biochemistry.utoronto.ca/abc/assets/BCB410-units.svg|'''Learning units in the "Applied Bioinformatics - Data Science" knowledge network.''']]
+
First review session: Wednesday, October 10
* Hover over a learning unit to see its keywords.
+
</div>
* Click on a learning unit to open the associated page.
 
* The nodes of the learning unit network are colour-coded:
 
**<span style="background-color: #b3dbce;">&nbsp;&nbsp;Live&nbsp;units&nbsp;&nbsp;</span> are green
 
**<span style="background-color: #d9ead5;">&nbsp;&nbsp;Units&nbsp;under&nbsp;development&nbsp;&nbsp;</span> are light green. These are still in progress.
 
**<span style="background-color: #f2fafa;">&nbsp;&nbsp;Stubs&nbsp;&nbsp;</span> (placeholders) are pale. These still need basic contents.
 
**<span style="background-color: #97bed5;">&nbsp;&nbsp;Milestone&nbsp;units&nbsp;&nbsp;</span> are blue. These collect a number of prerequisites to simplify the network.
 
**<span style="background-color: #e19fa7;">&nbsp;&nbsp;Integrator&nbsp;units&nbsp;&nbsp;</span> are red. These embody the main goals of the course.
 
**<span style="background-color: #ffff99;">&nbsp;&nbsp;Units&nbsp;that&nbsp;will&nbsp;be&nbsp;developed&nbsp;by&nbsp;students&nbsp;for&nbsp;this&nbsp;course</span> are yellow.
 
**<span style="background-color: #f4d7b7;">&nbsp;&nbsp;Units&nbsp;that&nbsp;require&nbsp;revision&nbsp;</span> are pale orange.
 
*Units that have a <span style="background-color: #eeeeee; border:solid 2px #000000;">&nbsp;&nbsp;black border&nbsp;&nbsp;</span> have deliverables that are designed to be  submitted for credit (not relevant for this course).
 
*Arrows point from a prerequisite unit to a unit that requires it.
 
  
 
{{Vspace}}
 
{{Vspace}}
For reference, and for links to other information sources, consider the Knowledge Network for the BCH441 - Bioinformatics course.
 
  
[[File:ABC-units_map.svg|thumb|250px|none|link=http://steipe.biochemistry.utoronto.ca/abc/assets/ABC-units_map.svg|'''Learning units in the General Bioinformatics knowledge network.''']]
 
  
{{Vspace}}
+
=== Dates and Location ===
 
 
===Sources===
 
  
;Consider the following sources before deciding whether a unit is suitable for you - or find additional sources.
+
{{Smallvspace}}
  
*Two general introductions to the field are here:
+
Classes meet Wednesdays between 10:00 and 12:00 in [http://map.utoronto.ca/utsg/building/033 SS1080 (Sidney Smith Hall)] throughout the Fall Term. Classes start at 10 minutes past the hour.
{{#pmid: 25948244}}
 
{{#pmid: 23765498}}
 
  
 
{{Vspace}}
 
{{Vspace}}
  
Two courses that I have taught this year contain code that can be adapted for many of the units we are constructing. Both courses are on github and can be accessed there, or by downloading them as RStudio projects.
+
{{#lst:User:Boris|Coordinator}}
 +
{{#lst:User:Boris|Office_hours}}
  
;[https://github.com/hyginn/BCH2024 '''BCH2024 (2017)'''] - Biological Data Analysis with R
+
{{Vspace}}
:A graduate "Focussed Topics" course. In this course we analyzed a high-resolution time series of yeast cell-cycle expression profiles. This data is well suited for the kind of tasks we are focussing on. I think we should additionally aim to annotate the data with [http://www.ebi.ac.uk/GOA '''GOA''']. The course proceeded through six units. 1-Data; 2-Features; 3-Modelling; 4-Graphs; 5-Clustering; 6-MachineLearning. The cell-cycle expression data is described here.
 
{{#pmid: 16912276 }}
 
  
;Exploratory Data Analysis with R - A Canadian Bioinformatics Workshop (2017)
+
=== Contact ===
:This two-day workshop was targeted to graduate students, postdocs and PIs who have little experience with programming but significant data analysis needs. The workshop had an online "Introduction to R" as a prerequisite, equivalent to the R introduction units in our knowledge network. Here are the individual RStudio projects (on GitHub).
 
:*https://github.com/hyginn/R_EDA-Introduction
 
:*https://github.com/hyginn/R_EDA-Regression
 
:*https://github.com/hyginn/R_EDA-DimensionReduction
 
:*https://github.com/hyginn/R_EDA-Clustering
 
:*https://github.com/hyginn/R_EDA-HypothesisTesting
 
  
* The Wikipedia article on {{WP|Cluster analysis}} is a decent first introduction.
+
Contact within the class is easiest via the [https://groups.google.com/forum/#!forum/bcb410_2018 '''Google Group'''] that you will subscribe to at the beginning of class.  
* We could spend an entire course just working through [https://web.stanford.edu/~hastie/Papers/ESLII.pdf '''The Elements of Statistical Learning''']. And we would have a lot of fun. But this is not a statistics course. Read this anyway.
 
* Apparently [http://socviz.co/ Healy's '''Data Visualization for Social Science'''] contains many good ideas. Online. Have a look.
 
* Grolemund & Wickham have a new book [http://r4ds.had.co.nz/ '''R for Data Science''']. We need to have the talk about the [https://www.tidyverse.org/ '''tidyverse'''] and whether it's actually a good idea<ref>cf. [http://r4stats.com/2017/03/23/the-tidyverse-curse/ The tidyverse curse] for some thoughts on problems with the tidyverse philosophy, and [http://www.fromthebottomoftheheap.net/2015/06/03/my-aversion-to-pipes/ "My aversion to pipes"] on why the <code>%>%</code> operator may not be as good an idea as everybody seems to think.</ref>.
 
* What's everybody talking about in the field? [https://www.reddit.com/r/datascience/ This.]
 
  
==Organization==
+
{{Vspace}}
  
<div class="alert">
+
After you have been subscribed, you will receive an e-mail from Google indicating that you have been added to the mailing list. Please note: this is a restricted list that can only be viewed by subscribed users, and only subscribed users can post to the list. There are two consequences:
Details for the 2017 course will be discussed in our first class session, Wednesday, September 13, at 10:00 in Bahen BA025.
 
  
It is imperative that you attend the first class session in person. Do not enrol in this course if you can't attend the first class session.
+
* If you try to post from a different account than the one that you are subscribed with, your mail will be rejected. Remedy: post from the right account.
</div>
 
 
 
{{Vspace}}
 
  
=== Dates and Location ===
+
* If you try to view the group on the Web, and you are not logged into a Google account associated with the e-mail address that you are subscribed with, you will not be able to access the group and you will probably see an error message like <code>You must be a member of this group to view and participate in it.</code> This is misleading: the problem is not that you are not a member, but that the Webpage doesn't '''know''' you are a member. Remedy: it depends. The easiest solution is simply not to access the group via the Web; since you receive all mails anyway, there's usually no real need to visit the Webpage.
  
Classes meet Wednesdays between 10:00 and 12:00 in [http://map.utoronto.ca/utsg/building/080 BA025 (Bahen Centre)] throughout the Fall Term. Classes start at 10 minutes past the hour.
+
<small>If you '''really''' need access to the group on the Web, you need a Google account and it needs to be associated with the address you are subscribed with. If you have a Gmail account, you already have a Google account, but that won't help you unless you are subscribed to the group with your Gmail address. If you are subscribed with your UofT address, you will need to create a Google account. That's possible; see e.g. https://www.wikihow.com/Make-a-Google-Account-Without-Gmail . </small>
  
 
{{Vspace}}
 
{{Vspace}}
  
{{#lst:User:Boris|Coordinator}}
+
== Contents ==
{{#lst:User:Boris|Office_hours}}
 
  
=== Contact ===
+
{{Smallvspace}}
  
Contact within the class is easiest via the [https://groups.google.com/forum/#!forum/bcb410_2017 '''Google Group'''] that you will subscribe to at the beginning of class.
+
In this year's course you will define a useful tool for the analysis of biological data, write an R package to support it, review and critique other packages, and improve and document your work.
  
 
{{Vspace}}
 
{{Vspace}}
Line 104: Line 71:
 
=== Phases ===
 
=== Phases ===
  
We will work in four phases:
+
We will work in five phases:
  
* You will design a learning unit and draft its contents;
+
* You will define a tool for data analysis and pitch it to the class for feedback in a one-minute presentation;
* The class will work through the unit;
+
* You will develop an R package for the analysis;
* We will go through "Code reviews" of the material;
+
* The class will work through your package and we will review your code;
* You will respond to the review and improve the material.
+
* You will respond to the review, improve the material and add code to support an interactive webpage for data exploration with your tool based on the '''''shiny''''' package;
 +
* You will finalize your package with a vignette with examples, and documentation.
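
To give an idea of what the interactive webpage in the fourth phase might look like, here is a minimal sketch of a '''''shiny''''' app. It assumes that the <code>shiny</code> package is installed. The input name <code>nBins</code>, the built-in <code>faithful</code> data set and the histogram are placeholders for illustration only; they stand in for whatever your own tool computes.

<source lang="R">
library(shiny)

# User interface: one slider and one plot (all names are placeholders).
ui <- fluidPage(
  titlePanel("Data exploration (sketch)"),
  sliderInput("nBins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("histPlot")
)

# Server logic: redraw the histogram whenever the slider changes.
server <- function(input, output) {
  output$histPlot <- renderPlot({
    x <- faithful$waiting
    breaks <- seq(min(x), max(x), length.out = input$nBins + 1)
    hist(x, breaks = breaks, col = "#b3dbce",
         main = "", xlab = "Waiting time (min)")
  })
}

app <- shinyApp(ui = ui, server = server)

# Launch the app only in an interactive session.
if (interactive()) {
  runApp(app)
}
</source>

In a package, such an app would typically wrap the package's own plotting function inside <code>renderPlot()</code> instead of calling <code>hist()</code> directly.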
  
 
{{Vspace}}
 
{{Vspace}}
  
  
 +
<table>
 +
 +
<tr class="sh">
 +
<td><b>Week</b></td>
 +
<td><b>Date</b></td>
 +
<td><b>Topic</b></td>
 +
</tr>
  
 +
<tr><td colspan="3" class="sp">&nbsp;</td></tr>
  
 +
<tr class="s1">
 +
<td>1</td>
 +
<td>September 12</td>
 +
<td>Introduction, organization</td>
 +
</tr>
  
=== Marking ===
+
<tr class="s2">
 +
<td>2</td>
 +
<td>September 19</td>
 +
<td>Initial idea, one-minute pitch</td>
 +
</tr>
  
&nbsp;
+
<tr class="s1">
 +
<td>3</td>
 +
<td>September 26</td>
 +
<td>R package principles</td>
 +
</tr>
  
<table>
+
<tr class="s2">
 +
<td>4</td>
 +
<td>October 3</td>
 +
<td>Tests and performance</td>
 +
</tr>
  
<tr class="sh">
+
<tr class="s1">
<td><b>Activity</b></td>
+
<td>5</td>
<td><b>Weight</b></td>
+
<td>October 8</td>
 +
<td>All packages to be completed before Monday, October 8.</td>
 
</tr>
 
</tr>
  
<tr><td colspan="3" class="sp"></td></tr>
+
<tr class="s2">
 +
<td>5</td>
 +
<td>October 10</td>
 +
<td>Code Review I</td>
 +
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
<td>Initial design of your unit</td>
+
<td>6</td>
<td>20 marks</td>
+
<td>October 17</td>
 +
<td>Code Review II</td>
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
<td>Participation in Review panels</td>
+
<td>7</td>
<td>4 x 10 marks</td>
+
<td>October 24</td>
 +
<td>Code Review III</td>
 
</tr>
 
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
<td>Final version of unit</td>
+
<td>8</td>
<td>20 marks</td>
+
<td>October 31</td>
 +
<td>Code Review IV</td>
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
<td>[[FND-Journal|Journals]]</td>
+
<td>-</td>
<td>15 marks</td>
+
<td>November 7</td>
 +
<td>No class meeting, Fall Reading Week</td>
 
</tr>
 
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
<td>[[ABC-Insights|Insights!]]</td>
+
<td>9</td>
<td>5 marks</td>
+
<td>November 14</td>
 +
<td>R Shiny</td>
 +
</tr>
 +
 
 +
<tr class="s2">
 +
<td>10</td>
 +
<td>November 21</td>
 +
<td>Best practice, reproducible research</td>
 
</tr>
 
</tr>
  
<tr><td colspan="3" class="sp"></td></tr>
+
<tr class="s1">
 +
<td>11</td>
 +
<td>November 28</td>
 +
<td>Vignettes, examples, documentation</td>
 +
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
<td>'''Total'''</td>
+
<td>12</td>
<td>100 marks</td>
+
<td>December 5</td>
 +
<td>No class meeting, all material due.</td>
 
</tr>
 
</tr>
 +
 
</table>
 
</table>
 +
  
 
{{Vspace}}
 
{{Vspace}}
  
What makes an ''excellent'' grade? [[ABC-Rubrics|'''See here.''']]
+
=== Details ===
 +
 
 +
{{Smallvspace}}
  
{{Vspace}}
 
  
 +
==== 1. Define your tool ====
  
<!--
+
{{Smallvspace}}
== Contents ==
 
  
{{Vspace}}
+
;Requirements
 +
:The scope of your R package is to add to or improve a current workflow in bioinformatics or computational biology. At least some of the functionality must produce compelling graphical output, ideally in support of exploratory analysis. Your package must not merely reproduce existing tools<ref>It is your responsibility to search the literature and available packages to define in what way your contribution is new.</ref>, and it must be distinct from the work of your classmates.
  
-->
+
;Ideas
 +
: You can draw on many sources for ideas:
 +
:* [https://www.nature.com/collections/vzsqzylnvx '''Current literature'''];
 +
:* [http://bioconductor.org/packages/release/BiocViews.html#___Workflow '''Bioconductor workflows'''];
 +
:* [https://cran.r-project.org/web/views/ '''CRAN task views'''];
 +
:* Tools collections at [https://links.bioinformatics.ca/ '''Bioinformatics.ca'''], [https://www.scripps.edu/research/cbb/tools.html '''Scripps'''] etc.;
 +
:* Online examples (e.g. the [https://www.reddit.com/r/dataisbeautiful/ "Data is beautiful"] subreddit);
 +
:* Best practice for information design you have come across, e.g. [https://www.edwardtufte.com/tufte/ Edward Tufte's work];
 +
:* ...
  
<!--
 
===A syllabus of learning units===
 
  
Working from a general collection of topics in the field, we identify learning units that are of the greatest interest and greatest relevance for the students in the class. We jointly select the most suitable topics. '''Every student in class will take responsibility for development and delivery of one of the learning units.'''
+
;Pitch
 +
: Your '''one-minute pitch''' is a presentation on Wednesday, September 19, based on a single slide which you upload as a jpg image to your "Project Page", a subpage of your User Page on the Student Wiki. You will be timed, and a hard cutoff of 60 seconds applies. We will solicit brief feedback from the class regarding ways to '''improve creativity, utility, and visual appeal'''.
  
 
{{Vspace}}
 
{{Vspace}}
  
-->
+
==== 2. Develop your package ====
 +
 
 +
{{Smallvspace}}
 +
 
 +
You will develop your tool as an R package following principles outlined in Hadley Wickham's [http://r-pkgs.had.co.nz/ '''R packages''' book]. Your package is to be posted on github. It must be complete (i.e. it must pass build checks without errors, warnings or notes) by Sunday, October 7, 23:59:59<ref>You must document that your package passes by posting the build output in your journal on the Student Wiki; your submission is only considered complete once all checks pass. Late penalties will be applied according to the following formula: <code>(marks achieved) * 0.5^(fractional days late)</code>. However, material submitted more than three days late, or less than 24 hours before code review, will be marked zero.</ref>.
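
As a worked example of the late-penalty formula given in the footnote, the following R sketch computes the adjusted mark. The function name <code>latePenalty()</code> and its arguments are hypothetical and chosen for illustration only. The three-day cutoff is taken from the footnote; the rule about submissions less than 24 hours before code review is not modelled here.

<source lang="R">
# Hypothetical helper: apply the late penalty
#   (marks achieved) * 0.5^(fractional days late)
# Submissions more than three days late are marked zero.
latePenalty <- function(marks, daysLate) {
  stopifnot(is.numeric(marks), is.numeric(daysLate), all(daysLate >= 0))
  adjusted <- marks * 0.5^daysLate
  adjusted[daysLate > 3] <- 0
  return(adjusted)
}

latePenalty(18, 0)     # on time: full marks
latePenalty(18, 0.5)   # twelve hours late
latePenalty(18, 1)     # one day late: half the marks
latePenalty(18, 3.5)   # more than three days late: zero
</source>

Under this formula each full day of delay halves the mark, so even a few hours make a measurable difference.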
  
<!--
+
In addition to the package contents, you post a brief synopsis on your Project Page and a link to your github repository.
===Unit contents and delivery===
 
  
The detailed contents of each unit are to be discussed with the coordinator. Each student will lead a one-hour session on their topic.
+
{{Smallvspace}}
  
'''Presenter's responsibilities''' include<ref>Details may vary as required, by mutual agreement.</ref>:
+
;Package/project requirement details
* Write an outline of your unit contents, and e-mail it to me at least '''two weeks in advance'''; This needs to include:
+
* Your project / package must be posted on github.
** a detailed lecture outline that includes an introduction, discussion of algorithms, presentation of examples, exposition of practical- and implementation issues and an outlook on future developments in the field;
+
* It must result in a new, working project in RStudio when we check it out from github.
** suitable pre-reading material;
+
* When building the package within the project locally, it must pass build checks without errors, warnings, or notes.
** an outline of practical and relevant class-exercises;
+
* It must be installable from github using the following code:
* Iterate this contents with me, to be completed at least '''ten days in advance'''.
+
<source lang="R">
* Develop a set of exercises at least '''one week in advance'''. The exercises must be posted on the Student Wiki, [https://creativecommons.org/licenses/by/4.0/ '''CC-By'''] licensed for future classes, and work for Mac OS X, Windows and Linux platforms. Include a section for feedback as the last item of your page;
+
library(devtools)
* Communicate pre-reading materials to your classmates, at least '''on the Friday before your presentation''';
+
install_github("<user name>/<package name>")
* Deliver your lecture at a sufficiently technical level to be appropriate for an advanced fourth-year course and engaging the class in discussion. Don't go over time. You must make sure '''ahead of your presentation''' that you can connect your computer to the projector, that your wireless network connection works, and that you have all required assets and resources installed on your computer;
+
library(<package name>)
* Post your exercise materials to the Student Wiki and announce this to the class on the same day as the lecture;
+
</source>
* Draft a final-exam question that tests the successful completion of the exercises, at the latest '''one week after the lecture''' and send it to me.
+
* All dependencies must be available on CRAN or Bioconductor.
 +
* All functions must have roxygen generated <code>man</code> pages that include meaningful examples.
 +
* Example data must be small, less than 100kb or so.
 +
* Your package must conform to [https://cran.r-project.org/web/packages/policies.html '''CRAN policy'''] on source packages, and [https://www.bioconductor.org/developers/package-guidelines/ '''Bioconductor package guidelines''']<ref>We may allow deviations from these policies after discussion in class.</ref>.
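
The requirement for roxygen-generated <code>man</code> pages with meaningful examples could be met along the following lines. This is only a sketch: the function <code>gcContent()</code> is a hypothetical example and not part of any course material.

<source lang="R">
#' GC content of a nucleotide sequence
#'
#' Computes the fraction of G and C characters in a single sequence string.
#'
#' @param s A character string of length one containing nucleotides.
#' @return A number between 0 and 1.
#' @examples
#' gcContent("ATGCGC")
#' gcContent("aattgg")
#' @export
gcContent <- function(s) {
  stopifnot(is.character(s), length(s) == 1)
  chars <- strsplit(toupper(s), "")[[1]]
  return(sum(chars %in% c("G", "C")) / length(chars))
}
</source>

Running <code>devtools::document()</code> in the package project generates the <code>man</code> page from these comments, and <code>devtools::check()</code> then reports any errors, warnings or notes.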
  
 +
{{Smallvspace}}
  
Note that your marks will suffer if your items are late.
+
;Code requirements
 +
* You must adhere to the [[RPR-Coding_style|'''R coding style rules''']] for this course.
 +
* Your functions must not have side effects, except for invoking the <code>plot()</code> function (which in turn must have no further side effects). In particular, never change global options permanently, and '''never''' assign into the global namespace with the <code><<-</code> operator. Temporary files and directories must be created using <code>tempfile()</code> and <code>tempdir()</code>, respectively.
 +
* Your code must be fully covered with unit and integration tests, as appropriate, and tests must run via the <code>testthat</code> functions.
 +
* See the [[ABC-Rubrics#Code|'''evaluation rubrics''']] for further suggestions.
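
A minimal sketch of a function that respects the side-effect rules in this list, together with a <code>testthat</code> test, might look as follows. The function name <code>writeRounded()</code> is hypothetical. The pattern shown (restoring options with <code>on.exit()</code> and writing only to a <code>tempfile()</code>) is one common way to meet these requirements, not the only one.

<source lang="R">
# Hypothetical example: format numbers with a temporary change to a global
# option, write them to a temporary file, and leave the session unchanged.
writeRounded <- function(x, digits = 3) {
  oldOpt <- options(digits = digits)   # options() returns the previous values
  on.exit(options(oldOpt))             # restore them when the function exits
  outFile <- tempfile(fileext = ".txt")
  writeLines(format(x), outFile)
  return(outFile)
}

# A unit test, run only if testthat is installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("writeRounded() leaves global options untouched", {
    before <- getOption("digits")
    f <- writeRounded(pi)
    testthat::expect_true(file.exists(f))
    testthat::expect_equal(readLines(f), "3.14")
    testthat::expect_equal(getOption("digits"), before)
    unlink(f)
  })
}
</source>

In a package, the test would live in <code>tests/testthat/</code> rather than next to the function definition.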
  
  
'''Audience responsibilities''' include:
 
* Pre-reading before class;
 
* Active participation in the discussion;
 
* Brief, written feedback on the exercises in the Student Wiki within two weeks of the lecture.
 
  
 
{{Vspace}}
 
{{Vspace}}
  
-->
+
==== 3. Code reviews ====
  
<!--
+
{{Smallvspace}}
===First Class===
 
{{Vspace}}
 
  
# Overview of how this course will work.
+
Your code will be examined by the entire class and will be [[APB-Code_review|'''reviewed in class''']].
# Overview of presenter and audience responsibilities and marking scheme.
 
# Define a first list of topics.
 
# Assign topics and dates.
 
# Subscribe everyone to [https://groups.google.com/forum/#!forum/bcb410_2016 the mailing list].
 
# Create a [http://steipe.biochemistry.utoronto.ca/abc/students Student Wiki] account for everyone.
 
  
 
{{Vspace}}
 
{{Vspace}}
  
-->
+
==== 4. Improvements and extensions ====
  
<!--
+
{{Smallvspace}}
===A selection of topics for consideration===
+
 
 +
Based on the reviews and feedback from the instructor, you develop improvements and extensions.
  
 
{{Vspace}}
 
{{Vspace}}
  
;Confirmed
+
==== 5. Examples and documentation ====
  
* Naina: ''de novo'' assembly from HTS data
+
{{Smallvspace}}
* Fupan: Bioinformatics in the '''Cloud''': use cases, platforms (AWS?, Azure?), code deployment (Docker?), example, practice - with an emphasis on CUDA
 
* Lindsay: Comparison of Variant Callers with an emphasis on detecting antibiotic resistance
 
* Won June: Adam
 
* James: Deep learning with CUDA architectures
 
* Charles: [https://www.rosettacommons.org/ '''Rosetta'''] protein structure prediction and/or design
 
* Moeen: TF binding site discovery: [http://opossum.cisreg.ca/oPOSSUM3/ '''oPOSSUM-3''']
 
* Allana: Current Cheminformatics: [http://www.eyesopen.com/ '''OpenEye'''] and other tools. Use cases. Workflows...
 
* Tom: [http://robpatro.com/blog/?p=260 RapMap] from reads to transcriptome mapping
 
* Chloe: ''Big Data'' in bioinformatics: sources, challenges, strategies, practice
 
* Hari: Function prediction with the [http://rast.nmpdr.org/ '''RAST server''']
 
* Pruthvi: Differential expression analysis with RNAseq (vs. microarrays): what are the statistics issues? How does this relate to GEO2R?
 
* Bhawan: ''Deep Learning'' how and why does it work? Bioinformatics use cases. '''R''' packages <code>darch</code>, <code>H<sub>2</sub>O</code> and <code>deepnet</code> (or newer ?). Practical example.
 
* Bro: Bioconductor - Navigator workflow
 
* Sabrina: [http://www.genemania.org/ '''GeneMANIA''']
 
* Ish: [http://www.yandell-lab.org/software/maker.html '''MAKER 2'''] genome annotation pipeline or more recent alternatives
 
  
 +
Once your code is complete, you develop comprehensive examples, a user guide ("vignette") and documentation.
  
 +
{{Vspace}}
  
;Proposed
 
* The github workflow for open source collaborative development: Commits? Master and branches? Pull? Merge? Issues? Blame?
 
* Test driven Development: principles, and practice with the <code>RUnit</code> package.
 
* The Bioconductor data model: ranges, annotations etc. and how they relate to the architecture of Bioconductor S4 objects...
 
* Any of the [http://www.bioconductor.org/help/workflows/ '''Bioconductor workflows''']
 
* Maximal Information Content (MIC): The '''R''' Minerva package: Pearson vs. MIC for co-Expression analysis.
 
* [http://www.broadinstitute.org/gatk/guide/best-practices Broad Institute '''GATK'''] (Genome Analysis ToolKit)
 
* Calculating volumetric data and displaying it with [https://www.cgl.ucsf.edu/chimera/ '''Chimera''']
 
* Structural genomics: [http://genome3d.eu/ Genome 3D] and [http://www.sbg.bio.ic.ac.uk/phyre2 '''Phyre2''']
 
* [https://salilab.org/modeller/ '''Modeller''']
 
* ... or other topics that you have encountered in a BCB330 or BCB430 project, for which you have particular expertise, or in which you are especially interested.
 
  
 +
===Supporting Knowledge Network===
  
Boris: screenscraping, regular expressions, Chimera python programming, dynamic programming ...
+
{{Smallvspace}}
* Advanced clustering and cluster-quality metrics with '''R'''
 
* [http://qiime.org/ Qiime] - demultiplexing community sequencing data
 
* other topics on the [[Applied Bioinformatics Main Page|Applied Bioinformatics Page]] of this Wiki ...
 
  
 +
For reference, consider the Knowledge Network for the BCH441 - Bioinformatics course.
  
{{Vspace}}
+
[[File:ABC-units_map.svg|thumb|250px|none|link=http://steipe.biochemistry.utoronto.ca/abc/assets/ABC-units_map.svg|'''Learning units in the General Bioinformatics knowledge network.''']]
 +
 
 +
In particular you must work through the "Journal" and "Plagiarism" units, and review the introduction to R units.
  
-->
 
  
<!--
+
{{Vspace}}
===Schedule===
 
  
;TBD
 
  
Students may swap their presentation dates among themselves but the coordinator '''must''' be informed of swaps.
+
=== Marking ===
  
{{Vspace}}
+
{{Smallvspace}}
  
 
<table>
 
<table>
  
 
<tr class="sh">
 
<tr class="sh">
<td class="wp"><b>Week</b></td>
+
<td><b>Activity</b></td>
<td class="wp"><b>Date</b></td>
+
<td><b>Weight</b></td>
<td class="wp"><b>Presenter 1</b></td>
 
<td class="wp"><b>Presenter 2</b></td>
 
 
</tr>
 
</tr>
 
  
 
<tr><td colspan="3" class="sp"></td></tr>
 
<tr><td colspan="3" class="sp"></td></tr>
  
 
<tr class="s1">
 
<tr class="s1">
<td class="wp">2</td>
+
<td>One minute pitch</td>
<td class="wp">Sept. 21</td>
+
<td>8 marks</td>
<td class="wp">Boris<br />Screenscraping – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Boris/BCB410_Screenscraping '''Link''']</span></td>
 
<td class="wp">&nbsp;</td>
 
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
<td class="wp">3</td>
+
<td>Initial submission of your package</td>
<td class="wp">Sept. 28</td>
+
<td>18 marks</td>
<td class="wp">Bro<br />Bioconductor - NAViGaTOR – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Bronnil.hawill/NAViGaTOR_Workflow '''Link''']</span></td>
 
<td class="wp">&nbsp;</td>
 
 
</tr>
 
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
<td class="wp" >4</td>
+
<td>[[APB-Code_review|Participation in Review panels]]</td>
<td class="wp" >Oct. 5</td>
+
<td>4 x 8 marks</td>
<td class="wp">Won June<br />Adam – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Wonjunetai/ADAM '''Link''']</span></td>
 
<td class="wp">&nbsp;</td>
 
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
<td class="wp" >5</td>
+
<td>General contributions to discussion and reviews</td>
<td class="wp" >Oct. 12</td>
+
<td>6 marks</td>
<td class="wp" colspan="2" style="color:#777777;">No class meeting this week.</td>
 
 
</tr>
 
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
<td class="wp" >6</td>
+
<td>Final submission: improvements over the first submission and documentation</td>
<td class="wp" >Oct. 19</td>
+
<td>16 marks</td>
<td class="wp">James<br />Deep Learning using CUDA – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:James.yuan/DeepLearningWithCUDA '''Link''']</span></td>
 
<td class="wp">&nbsp;</td>
 
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
<td class="wp" >7</td>
+
<td>[[FND-Journal|Journals]]</td>
<td class="wp" >Oct. 26</td>
+
<td>15 marks</td>
<td class="wp">Pruthvi<br />RNAseq vs. microarrays – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Pruthvi.desai/Presentation '''Link''']</span></td>
 
<td class="wp">&nbsp;</td>
 
 
</tr>
 
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
<td class="wp" >8</td>
+
<td>[[ABC-Insights|Insights!]]</td>
<td class="wp" >Nov. 2</td>
+
<td>5 marks</td>
<td class="wp">Lindsay<br />Variant callers – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Lindsay.liang/VarianceCallers '''Link''']</span></td>
 
<td class="wp">Charles<br />Autodock – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Charles.Ding/Autodock_Vina '''Link''']</span></td>
 
 
</tr>
 
</tr>
  
<tr class="s2">
+
<tr><td colspan="3" class="sp"></td></tr>
<td class="wp" >9</td>
 
<td class="wp" >Nov. 9</td>
 
<td class="wp">Bhawan<br />Deep learning for computer vision – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Bhawan_Panesar/DeepLearning '''Link''']</span></td>
 
<td class="wp">&nbsp;</td>
 
</tr>
 
 
 
<tr class="s1">
 
<td class="wp" >10</td>
 
<td class="wp" >Nov. 16</td>
 
<td class="wp">Sabrina<br />Horizontal Gene Transfer – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Sabrina.ge/BCB410-project '''Link''']</span></td>
 
<td class="wp">Fupan<br />The Cloud (with emphasis on code reproducibility) – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Gosuzombie/BCB410 '''Link''']</span></td>
 
</tr>
 
  
 
<tr class="s2">
 
<tr class="s2">
<td class="wp" >11</td>
+
<td>'''Total'''</td>
<td class="wp" >Nov. 23</td>
+
<td>100 marks</td>
<td class="wp">Allana<br />Cheminformatics – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Allanapereira/Cheminformatics_Presentation '''Link''']</span></td>
 
<td class="wp">Moeen<br />TF binding site discovery – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Bagherig/oPOSSUM_3.0_material '''Link''']</span></td>
 
 
</tr>
 
</tr>
 
<tr class="s1">
 
<td class="wp" >12</td>
 
<td class="wp" >Nov. 30</td>
 
<td class="wp">Naina<br />''de novo'' assembly from HTS data – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Naina.Singh/Assembly '''Link''']</span></td>
 
<td class="wp">Tom<br />GATK – <span style="font-size:80%;">[http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Yulong.piao/GATK '''Link''']</span></td>
 
</tr>
 
 
 
 
</table>
 
</table>
  
 
{{Vspace}}
 
{{Vspace}}
  
<sup>*</sup> Older material and/or previous lectures on these topics are available. Coordinate with me ...
+
What makes an ''excellent'' grade? [[ABC-Rubrics|'''See here.''']]
-->
 
  
<!--
+
{{Vspace}}
==OLDER MATERIAL==
 
  
&nbsp;
 
===Selection of topics for consideration===
 
  
 +
===First Class===
  
* '''Basics'''
+
{{Smallvspace}}
**UNIX
 
***UNIX commands
 
***The UNIX pipe ("|")
 
***Installation of programs
 
***shellscripts
 
**IDE (Integrated Development Environment)
 
**Screenscraping
 
**wget
 
**Regular expressions
 
**HTML
 
**CGI
 
* '''Perl'''
 
**CPAN
 
**Perl programming
 
**Perl one-liners
 
* '''PHP'''
 
* '''MySQL'''
 
**MySQL installation
 
  
 +
# Overview of how this course will work.
 +
# Overview of presenter and audience responsibilities and marking scheme.
 +
# Define a first list of topics.
 +
# Assign topics and dates.
 +
# Subscribe everyone to [https://groups.google.com/forum/#!forum/bcb410_2018 the mailing list].
 +
# Create a [http://steipe.biochemistry.utoronto.ca/abc/students Student Wiki] account for everyone.
  
 +
{{Vspace}}
  
-->
 
  
 
==Notes==
 
==Notes==

Latest revision as of 14:37, 9 October 2018

BCB410H1F - 2018



Objectives and Participants

 

The "Applied Bioinformatics" course is offered as a part of the BCB Program curriculum to ensure that our students know enough about application issues in the field to be able to put their knowledge into practice in a research lab setting. This is to support the Specialist Program goal: to prepare students for graduate studies in the discipline.

As a required course in the BCB curriculum, BCB410 assumes the prerequisites and goals of fourth-year students in the BCB Specialist Program. Other students may be permitted to enrol on a case-by-case basis, but they may need to catch up on prerequisites in computer science or life-science courses that BCB students have taken at this point. Generally speaking, this is an advanced course that presupposes familiarity with programming principles, algorithm analysis, and methods of modern systems biology, together with introductory knowledge of linear algebra, graph theory, information theory, and statistics, as well as molecular, structural and cellular biology. The varying topics will be discussed at a highly technical level that is likely only useful for students who plan to integrate much of this material into their actual practice.


 


Organization

 

First review session: Wednesday, October 10


 


Dates and Location

 

Classes meet Wednesdays between 10:00 and 12:00 in SS1080 (Sidney Smith Hall) throughout the Fall Term. Classes start at 10 minutes past the hour.


 


Coordinator

Boris Steipe


 



Office hours

(Virtual) face-to-face meetings are by appointment, if required. However, we will be able to resolve almost all issues by e-mail. You will find that discussions by e-mail are both more efficient and more effective than meetings. Moreover, e-mail discussions leave you with a document trail of what was discussed, can contain links to information sources, and let us share points of general interest more easily with the class.


 



 

Contact

Contact within the class is easiest via the Google Group that you will subscribe to at the beginning of class.


 

After you have been subscribed, you will receive an e-mail from Google indicating that you have been added to the mailing list. Please note: this is a restricted list that can only be viewed by subscribed users, and only subscribed users can post to the list. There are two consequences:

  • If you try to post from a different account than the one that you are subscribed with, your mail will be rejected. Remedy: post from the right account.
  • If you try to view the group on the Web, and you are not logged into a Google account associated with the e-mail address that you are subscribed with, you will not be able to access the group and you will probably see an error message like "You must be a member of this group to view and participate in it." This is misleading: the problem is not that you are not a member, the problem is that the Webpage doesn't know you are a member. Remedy: it depends, but the easiest solution is simply not to access the group via the Web. Since you receive all mails anyway, there's usually no real need to visit the Webpage.

If you really need access to the group on the Web, you need a Google account and it needs to be associated with the address you are subscribed with. If you have a Gmail account, you already have a Google account, but that won't help you unless you are subscribed to the group with your Gmail address. If you are subscribed with your UofT address, you will need to create a Google account for that address. This is possible; see, for example, https://www.wikihow.com/Make-a-Google-Account-Without-Gmail .


 

Contents

 

In this year's course you will define a useful tool for the analysis of biological data, write an R package to support it, review and critique other packages, and improve and document your work.


 

Phases

We will work in five phases:

  • You will define a tool for data analysis and pitch it to the class for feedback in a one-minute presentation;
  • You will develop an R package for the analysis;
  • The class will work through your package and we will review your code;
  • You will respond to the review, improve the material and add code to support an interactive webpage for data exploration with your tool based on the shiny package;
  • You will finalize your package with a vignette with examples, and documentation.


 


Week Date Topic
 
1 September 12 Introduction, organization
2 September 19 Initial idea, one-minute pitch
3 September 26 R package principles
4 October 3 Tests and performance
5 October 8 All packages to be completed before Monday, October 8.
5 October 10 Code Review I
6 October 17 Code Review II
7 October 24 Code Review III
8 October 31 Code Review IV
- November 7 No class meeting, Fall Reading Week
9 November 14 R Shiny
10 November 21 Best practice, reproducible research
11 November 28 Vignettes, examples, documentation
12 December 5 No class meeting, all material due.


 

Details

 


1. Define your tool

 
Requirements
The scope of your R package is to add to or improve a current workflow in bioinformatics or computational biology. At least some of the functionality must produce compelling graphical output, ideally to support exploratory analysis. Your package must not merely reproduce existing tools[1], and it must be distinct from the work of your classmates.
Ideas
You can draw on many sources for ideas:


Pitch
Your one-minute pitch is a presentation on Wednesday, September 19, that is based on a single slide which you upload as a jpg image to your "Project Page", a subpage of your User Page on the Student Wiki. You will be timed and a hard cutoff of 60 seconds applies. We will solicit brief feedback from the class regarding ways to improve creativity, utility, and visual appeal.


 

2. Develop your package

 

You will develop your tool as an R package following principles outlined in Hadley Wickham's R packages book. Your package is to be posted on github. It must be complete (i.e. it must pass build checks without errors, warnings or notes) by Sunday, October 7, 23:59:59[2].

In addition to the package contents, you post a brief synopsis on your Project Page and a link to your github repository.


 
Package/project requirement details
  • Your project / package must be posted on github.
  • It must result in a new, working project in RStudio when we check it out from github.
  • When building the package within the project locally, it must pass build checks without errors, warnings, or notes.
  • It must be installable from github using the following code:
library(devtools)
install_github("<user name>/<package name>")
library(<package name>)
  • All dependencies must be available on CRAN or Bioconductor.
  • All functions must have roxygen generated man pages that include meaningful examples.
  • Example data must be small: less than 100 kB or so.
  • Your package must conform to CRAN policy on source packages, and Bioconductor package guidelines[3].
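
As an illustration of the man-page requirement above, a function documented with roxygen comments might look like the following sketch. The function gcContent() and everything in it are hypothetical and not part of the course material; it only shows the general shape of a roxygen header with a meaningful example. Running devtools::document() in the package project would turn such a header into a man page, and devtools::check() runs the build checks.

```r
#' Compute the GC content of a nucleotide sequence
#'
#' Hypothetical example function, used here only to illustrate a roxygen
#' header with parameter, return value, and example sections.
#'
#' @param s A single, non-empty character string of nucleotides.
#' @return The fraction of G and C characters in \code{s}, a numeric
#'   value between 0 and 1.
#' @examples
#' gcContent("ATGCGC")
#' @export
gcContent <- function(s) {
  # Validate input early so that errors are informative.
  stopifnot(is.character(s), length(s) == 1, nchar(s) > 0)
  # Split the string into single upper-case characters.
  nuc <- strsplit(toupper(s), "")[[1]]
  # Fraction of characters that are G or C.
  sum(nuc %in% c("G", "C")) / length(nuc)
}
```

Keeping the example in the header small and self-contained means it can run during the build checks without needing large example data.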


 
Code requirements
  • You must adhere to the R coding style rules for this course.
  • Your functions must not have side effects except for invoking the plot() function (without side effects). In particular, never change global options permanently, and never assign into the global namespace with the <<- operator. Temporary files or directories must be created using tempfile() or tempdir(), respectively.
  • Your code must be fully covered with unit and integration tests, as appropriate, and tests must run via the testthat functions.
  • See the evaluation rubrics for further suggestions.
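
The following sketch shows one way to meet the side-effect rules above: an option is changed only temporarily and restored with on.exit(), and scratch output goes to a file created with tempfile(). The helper names formatShort() and writeScratch() are hypothetical, chosen for illustration only. The final part shows how such behaviour could be checked with testthat; it is guarded so that the code still runs if testthat is not installed.

```r
# Hypothetical helper: formats a number with a temporarily changed
# "digits" option and restores the previous value when the function exits.
formatShort <- function(x, digits = 3) {
  oldOpt <- options(digits = digits)    # options() returns the old values
  on.exit(options(oldOpt), add = TRUE)  # restore them, even after an error
  format(x)
}

# Hypothetical helper: writes lines to a temporary file rather than into
# the user's working directory, and returns the path of that file.
writeScratch <- function(lines) {
  path <- tempfile(fileext = ".txt")    # unique file name inside tempdir()
  writeLines(lines, path)
  path
}

# A testthat unit test confirming that the global option is left unchanged.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("formatShort() leaves global options unchanged", {
    before <- getOption("digits")
    formatShort(pi)
    testthat::expect_identical(getOption("digits"), before)
  })
}
```

Registering the restore step with on.exit() right after changing the option keeps the function safe even if a later line fails.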



 

3. Code reviews

 

Your code will be examined by the entire class and will be reviewed in class.


 

4. Improvements and extensions

 

Based on the reviews and feedback from the instructor, you develop improvements and extensions.


 

5. Examples and documentation

 

Once your code is complete, you develop comprehensive examples, a user guide ("vignette") and documentation.


 


Supporting Knowledge Network

 

For reference, consider the Knowledge Network for the BCH441 - Bioinformatics course.

Learning units in the General Bioinformatics knowledge network.

In particular you must work through the "Journal" and "Plagiarism" units, and review the introduction to R units.


 


Marking

 
Activity Weight
One minute pitch 8 marks
Initial submission of your package 18 marks
Participation in Review panels 4 x 8 marks
General contributions to discussion and reviews 6 marks
Final submission: improvements over the first submission and documentation 16 marks
Journals 15 marks
Insights! 5 marks
Total 100 marks


 

What makes an excellent grade? See here.


 


First Class

 
  1. Overview of how this course will work.
  2. Overview of presenter and audience responsibilities and marking scheme.
  3. Define a first list of topics.
  4. Assign topics and dates.
  5. Subscribe everyone to the mailing list.
  6. Create a Student Wiki account for everyone.


 


Notes

  1. It is your responsibility to search the literature and available packages to define in what way your contribution is new.
  2. You must document that your package passes by posting the build output in your journal on the Student Wiki; your submission is only considered complete once all checks pass. Late penalties will be applied according to the following formula: (marks achieved) * 0.5^(fractional days late). However, material submitted more than three days late, or less than 24 hours before code review, will be marked zero.
  3. We may allow deviations from these policies after discussion in class.
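
To make the late-penalty formula in note 2 concrete, here is a small sketch in R. The function latePenalty() and its argument names are hypothetical and this is not an official calculator. It assumes that a submission that is not late keeps its full marks, and it treats the "less than 24 hours before code review" rule as an optional argument.

```r
# Hypothetical sketch of the late-penalty rule in note 2.
#   marks:             marks achieved before any penalty
#   daysLate:          fractional days late (0 or less means on time)
#   hoursBeforeReview: hours between submission and code review
#                      (assumed Inf if not relevant)
latePenalty <- function(marks, daysLate, hoursBeforeReview = Inf) {
  stopifnot(is.numeric(marks), is.numeric(daysLate),
            is.numeric(hoursBeforeReview))
  # More than three days late, or less than 24 hours before review: zero.
  if (daysLate > 3 || hoursBeforeReview < 24) {
    return(0)
  }
  # Assumption: on-time submissions are not changed by the formula.
  marks * 0.5^max(daysLate, 0)
}

latePenalty(18, 1)    # one full day late halves the marks
latePenalty(18, 0.5)  # half a day late: 18 * 0.5^0.5
latePenalty(18, 3.5)  # more than three days late
```

Under this reading, each full day late halves the marks, so 18 marks become 9 after one day and 2.25 after exactly three days.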