BCB410


BCB410H1F - 2018



Objectives and Participants

 

The "Applied Bioinformatics" course is offered as a part of the BCB Program curriculum to ensure that our students know enough about application issues in the field to be able to put their knowledge into practice in a research lab setting. This is to support the Specialist Program goal: to prepare students for graduate studies in the discipline.

As a required course in the BCB curriculum, BCB410 assumes the prerequisites and goals of fourth-year students in the BCB Specialist Program. Other students may be permitted to enrol on a case-by-case basis, but they may need to catch up on prerequisites in computer science or life-science courses that BCB students have taken at this point. Generally speaking, this is an advanced course that presupposes familiarity with programming principles, algorithm analysis, and methods of modern systems biology, as well as introductory knowledge of linear algebra, graph theory, information theory, statistics, and molecular, structural, and cellular biology. The varying topics will be discussed at a highly technical level that is likely only useful for students who plan to integrate much of this material into their actual practice.


 


Organization

 

First review session: Wednesday, October 10


 


Dates and Location

 

Classes meet Wednesdays between 10:00 and 12:00 in SS1080 (Sidney Smith Hall) throughout the Fall Term. Classes start at 10 minutes past the hour.


 


Coordinator

Boris Steipe


 



Office hours

(Virtual) face-to-face meetings are by appointment, if required. However, we will be able to resolve almost all issues by e-mail. You will find that discussions by e-mail are both more efficient and more effective than meetings. Moreover, e-mail discussions leave you with a document trail of what was discussed, they can contain links to information sources, and we can share points of general interest more easily with the class.


 



 

Contact

Contact within the class is easiest via the Google Group that you will subscribe to at the beginning of class.


 

After you have been subscribed, you will receive an e-mail from Google indicating that you have been added to the mailing list. Please note: this is a restricted list that can only be viewed by subscribed users, and only subscribed users can post to the list. There are two consequences:

  • If you try to post from a different account than the one that you are subscribed with, your mail will be rejected. Remedy: post from the right account.
  • If you try to view the group on the Web while you are not logged into a Google account associated with the e-mail address that you are subscribed with, you will not be able to access the group and you will probably see an error message like "You must be a member of this group to view and participate in it". This is misleading: the problem is not that you are not a member, but that the Webpage doesn't know that you are a member. Remedy: it depends. The easiest solution is simply not to access the group via the Web; since you receive all mails anyway, there is usually no real need to visit the Webpage.

If you really need access to the group on the Web, you need a Google account and it needs to be associated with the address you are subscribed with. If you have a Gmail account, you already have a Google account, but that won't help you unless you are subscribed to the group with your Gmail address. If you are subscribed with your UofT address, you will need to create a Google account for that address. This is possible; see e.g. https://www.wikihow.com/Make-a-Google-Account-Without-Gmail .


 

Contents

 

In this year's course you will define a useful tool for the analysis of biological data, write an R package to support it, review and critique other packages, and improve and document your work.


 

Phases

We will work in five phases:

  • You will define a tool for data analysis and pitch it to the class for feedback in a one-minute presentation;
  • You will develop an R package for the analysis;
  • The class will work through your package and we will review your code;
  • You will respond to the review, improve the material, and use the shiny package to add code that supports an interactive webpage for data exploration with your tool;
  • You will finalize your package with a vignette, examples, and documentation.


 


Week  Date          Topic

1     September 12  Introduction, organization
2     September 19  Initial idea, one-minute pitch
3     September 26  R package principles
4     October 3     Tests and performance
5     October 8     All packages to be completed before Monday, October 8.
5     October 10    Code Review I
6     October 17    Code Review II
7     October 24    Code Review III
8     October 31    Code Review IV
-     November 7    No class meeting, Fall Reading Week
9     November 14   R Shiny
10    November 21   Best practice, reproducible research
11    November 28   Vignettes, examples, documentation
12    December 5    No class meeting, all material due.


 

Details

 


1. Define your tool

 
Requirements
The scope of your R package is to add to or improve a current workflow in bioinformatics or computational biology. At least some of the functionality must produce compelling graphical output, ideally to support exploratory analysis. Your package must not merely reproduce existing tools[1], and it must be distinct from the work of your classmates.
Ideas
You can draw on many sources for ideas:


Pitch
Your one-minute pitch is a presentation on Wednesday, September 19. It is based on a single slide, which you upload as a jpg image to your "Project Page", a subpage of your User Page on the Student Wiki. You will be timed, and a hard cutoff of 60 seconds applies. We will solicit brief feedback from the class regarding ways to improve creativity, utility, and visual appeal.


 

2. Develop your package

 

You will develop your tool as an R package following principles outlined in Hadley Wickham's R packages book. Your package is to be posted on GitHub. It must be complete (i.e. it must pass all build checks without errors, warnings, or notes) by Sunday, October 7, 23:59:59[2].

In addition to the package contents, you post a brief synopsis and a link to your GitHub repository on your Project Page.


 
Package/project requirement details
  • Your project / package must be posted on GitHub.
  • It must result in a new, working project in RStudio when we check it out from GitHub.
  • When building the package within the project locally, it must pass build checks without errors, warnings, or notes.
  • It must be installable from GitHub using the following code:
        library(devtools)
        install_github("<user name>/<package name>")
        library(<package name>)
  • All dependencies must be available on CRAN or Bioconductor.
  • All functions must have roxygen generated man pages that include meaningful examples.
  • Example data must be small: less than about 100 kB.
  • Your package must conform to CRAN policy on source packages, and Bioconductor package guidelines[3].
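
To illustrate the man-page requirement above, here is a minimal sketch of a function documented with roxygen comments. The function name gcFraction and its behaviour are hypothetical and chosen only for this example; they are not part of any course material.

```r
#' Compute the GC fraction of a DNA sequence
#'
#' (Hypothetical example function, for illustration only.)
#'
#' @param s A single, non-empty character string of nucleotides
#'   (case-insensitive).
#' @return A numeric value between 0 and 1: the fraction of G and C
#'   characters in \code{s}.
#' @examples
#' gcFraction("ATGC")
#' @export
gcFraction <- function(s) {
  stopifnot(is.character(s), length(s) == 1, nchar(s) > 0)
  nuc <- strsplit(toupper(s), "")[[1]]      # split into single characters
  sum(nuc %in% c("G", "C")) / length(nuc)   # fraction of G or C
}
```

Running devtools::document() in the package project should then generate the corresponding man page from these comments.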


 
Code requirements
  • You must adhere to the R coding style rules for this course.
  • Your functions must not have side effects, except for invoking the plot() function, and plotting itself must not cause further side effects. In particular, never change global options permanently, and never assign into the global environment with the <<- operator. Temporary files or directories must be created using tempfile() or tempdir(), respectively.
  • Your code must be fully covered with unit and integration tests, as appropriate, and tests must run via the testthat functions.
  • See the evaluation rubrics for further suggestions.
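
One way to meet the side-effect rules above might look like the following sketch. Both function names (plotLengths, writeLengths) are hypothetical; the point is only the pattern: restore any graphical parameter you change, and write files only to locations obtained from tempfile().

```r
# Hypothetical example: a plotting function that changes graphical
# parameters only temporarily and restores them on exit.
plotLengths <- function(x, main = "Sequence lengths") {
  stopifnot(is.numeric(x))
  oldPar <- par(mar = c(4, 4, 2, 1))  # par() returns the previous values
  on.exit(par(oldPar), add = TRUE)    # restore them, even after an error
  hist(x, main = main, xlab = "Length")
  invisible(NULL)
}

# Hypothetical example: write intermediate output to a temporary file
# instead of the user's working directory.
writeLengths <- function(x) {
  outFile <- tempfile(pattern = "lengths_", fileext = ".txt")
  writeLines(as.character(x), con = outFile)
  outFile  # return the path so the caller can read or remove the file
}
```

In a package, a check such as "par("mar") is unchanged after calling plotLengths()" would typically be placed inside a testthat::test_that() block in the tests/testthat directory.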



 

3. Code reviews

 

Your code will be examined by the entire class and will be reviewed in class.


 

4. Improvements and extensions

 

Based on the reviews and feedback from the instructor, you develop improvements and extensions.


 

5. Examples and documentation

 

Once your code is complete, you develop comprehensive examples, a user guide ("vignette") and documentation.


 


Supporting Knowledge Network

 

For reference, consider the Knowledge Network for the BCH441 - Bioinformatics course.

Learning units in the General Bioinformatics knowledge network.

In particular, you must work through the "Journal" and "Plagiarism" units, and review the introduction to R units.


 


Marking

 
Activity                                                                    Weight
One minute pitch                                                            8 marks
Initial submission of your package                                          18 marks
Participation in Review panels                                              4 x 8 marks
General contributions to discussion and reviews                             6 marks
Final submission: improvements over the first submission and documentation  16 marks
Journals                                                                    15 marks
Insights!                                                                   5 marks
Total                                                                       100 marks


 

What makes an excellent grade? See here.


 


First Class

 
  1. Overview of how this course will work.
  2. Overview of presenter and audience responsibilities and marking scheme.
  3. Define a first list of topics.
  4. Assign topics and dates.
  5. Subscribe everyone to the mailing list.
  6. Create a Student Wiki account for everyone.


 


Notes

  1. It is your responsibility to search the literature and available packages to define in what way your contribution is new.
  2. You must document that your package passes by posting the build output in your journal on the Student Wiki; your submission is only considered complete once all checks pass. Late penalties will be applied according to the following formula: (marks achieved) * 0.5^(fractional days late). However, material submitted more than three days late, or less than 24 hours before code review, will be marked zero.
  3. We may allow deviations from these policies after discussion in class.
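
The late-penalty formula in note 2 can be sketched in R as follows. The function name latePenalty is hypothetical, and the sketch covers only the formula and the three-day cutoff; the "less than 24 hours before code review" rule is not modelled here.

```r
# Hypothetical helper implementing the formula stated in note 2:
#   (marks achieved) * 0.5^(fractional days late)
# Material more than three days late is marked zero.
latePenalty <- function(marks, daysLate) {
  stopifnot(is.numeric(marks), is.numeric(daysLate), all(daysLate >= 0))
  ifelse(daysLate > 3, 0, marks * 0.5^daysLate)
}

# Example: an initial submission that achieved the full 18 marks
latePenalty(18, 0)    # on time: 18
latePenalty(18, 1)    # one day late: 9
latePenalty(18, 2)    # two days late: 4.5
latePenalty(18, 3.5)  # more than three days late: 0
```

Under this formula, every full day of delay halves the marks achieved.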