BCB410
BCB410H1F - 2018
Objectives and Participants
The "Applied Bioinformatics" course is offered as a part of the BCB Program curriculum to ensure that our students know enough about application issues in the field to be able to put their knowledge into practice in a research lab setting. This is to support the Specialist Program goal: to prepare students for graduate studies in the discipline.
As a required course in the BCB curriculum, BCB410 assumes the prerequisites and goals of fourth-year students in the BCB Specialist Program. Other students may be permitted to enrol on a case-by-case basis, but they may need to catch up on prerequisites in computer science or life-science courses that BCB students have taken at this point. Generally speaking, this is an advanced course that presupposes familiarity with programming principles, algorithm analysis, and methods of modern systems biology, as well as introductory knowledge of linear algebra, graph theory, information theory, statistics, and molecular, structural and cellular biology. The varying topics will be discussed at a highly technical level that is likely only useful for students who plan to integrate much of this material into their actual practice.
Organization
First review session: Wednesday, October 10
Dates and Location
Classes meet Wednesdays between 10:00 and 12:00 in SS1080 (Sidney Smith Hall) throughout the Fall Term. Classes start at 10 minutes past the hour.
Coordinator
Office hours
(Virtual) face-to-face meetings are by appointment, if required. However, we will be able to resolve almost all issues by e-mail. You will find that discussions by e-mail are both more efficient and more effective than meetings. Moreover, e-mail discussions leave you with a document trail of what was discussed, can contain links to information sources, and let us share points of general interest more easily with the class.
Contact
Contact within the class is easiest via the Google Group that you will subscribe to at the beginning of class.
After you have been subscribed, you will receive an e-mail from Google indicating that you have been added to the mailing list. Please note: this is a restricted list that can only be viewed by subscribed users, and only subscribed users can post to the list. There are two consequences:
- If you try to post from a different account than the one that you are subscribed with, your mail will be rejected. Remedy: post from the right account.
- If you try to view the group on the Web, and you are not logged into a Google account associated with the email address that you are subscribed with, you will not be able to access the group and you will probably see an error message like
You must be a member of this group to view and participate in it.
This is misleading: the problem is not that you are not a member, but that the Webpage doesn't know you are a member. Remedy: it depends. The easiest solution is simply not to access the group via the Web; since you receive all mails anyway, there is usually no real need to visit the Webpage.
If you really need access to the group on the Web, you need a Google account and it needs to be associated with the address you are subscribed with. If you have a Gmail account, you already have a Google account, but that won't help you unless you are subscribed to the group with your Gmail address. If you are subscribed with your UofT address, you will need to create a Google account. That is possible without Gmail; see e.g. https://www.wikihow.com/Make-a-Google-Account-Without-Gmail .
Contents
In this year's course you will define a useful tool for the analysis of biological data, write an R package to support it, review and critique other packages, and improve and document your work.
Phases
We will work in five phases:
- You will define a tool for data analysis and pitch it to the class for feedback in a one-minute presentation;
- You will develop an R package for the analysis;
- The class will work through your package and we will review your code;
- You will respond to the review, improve the material and add code to support an interactive webpage for data exploration with your tool based on the shiny package;
- You will finalize your package with a vignette with examples, and documentation.
Week | Date | Topic |
1 | September 12 | Introduction, organization |
2 | September 19 | Initial idea, one-minute pitch |
3 | September 26 | R package principles |
4 | October 3 | Tests and performance |
5 | October 8 | All packages to be completed before Monday, October 8. |
5 | October 10 | Code Review I |
6 | October 17 | Code Review II |
7 | October 24 | Code Review III |
8 | October 31 | Code Review IV |
- | November 7 | No class meeting, Fall Reading Week |
9 | November 14 | R Shiny |
10 | November 21 | Best practice, reproducible research |
11 | November 28 | Vignettes, examples, documentation |
12 | December 5 | No class meeting, all material due. |
Details
1. Define your tool
- Requirements
- The scope of your R package is to add to or improve a current workflow in bioinformatics or computational biology. At least some of the functionality must produce compelling graphical output, ideally to support exploratory analysis. Your package must not merely reproduce existing tools[1], and it must be distinct from the work of your classmates.
- Ideas
- You can draw on many sources for ideas:
- Current literature;
- Bioconductor workflows;
- CRAN task views;
- Tools collections at Bioinformatics.ca, Scripps etc.;
- Online examples (e.g. the "Data is beautiful" subreddit);
- Best practice for information design you have come across, e.g. Edward Tufte's work;
- ...
- Pitch
- Your one-minute pitch is a presentation on Wednesday, September 19, based on a single slide that you upload as a jpg image to your "Project Page", a subpage of your User Page on the Student Wiki. You will be timed, and a hard cutoff of 60 seconds applies. We will solicit brief feedback from the class regarding ways to improve creativity, utility, and visual appeal.
2. Develop your package
You will develop your tool as an R package following the principles outlined in Hadley Wickham's R packages book. Your package is to be posted on github. It must be complete (i.e. it must pass build checks without errors, warnings or notes) by Sunday, October 7, 23:59:59[2].
In addition to the package contents, you will post a brief synopsis and a link to your github repository on your Project Page.
- Package/project requirement details
- Your project / package must be posted on github.
- It must result in a new, working project in RStudio when we check it out from github.
- When building the package within the project locally, it must pass build checks without errors, warnings, or notes.
- It must be installable from github using the following code:
library(devtools)
install_github("<user name>/<package name>")
library(<package name>)
- All dependencies must be available on CRAN or Bioconductor.
- All functions must have roxygen generated man pages that include meaningful examples.
- Example data must be small, less than 100kb or so.
- Your package must conform to CRAN policy on source packages, and Bioconductor package guidelines[3].
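As an illustration of the roxygen requirement above, here is a minimal sketch of a documented function with a small, runnable example. The function name `gcContent` and its parameter `dnaSeq` are hypothetical and chosen only for this illustration; the course's own style rules may call for a different layout.

```r
#' Compute the GC content of a DNA sequence
#'
#' @param dnaSeq A single character string of nucleotides (case-insensitive).
#' @return The fraction of G and C characters, a numeric value between 0 and 1.
#' @examples
#' gcContent("ATGCGC")
#' @export
gcContent <- function(dnaSeq) {
  # Validate input early so that errors are informative
  stopifnot(is.character(dnaSeq), length(dnaSeq) == 1, nchar(dnaSeq) > 0)
  # Split the string into single upper-case characters
  nucleotides <- strsplit(toupper(dnaSeq), "")[[1]]
  # Fraction of characters that are G or C
  sum(nucleotides %in% c("G", "C")) / length(nucleotides)
}
```

When roxygen2 processes a package containing this file, the comment block becomes the man page, and the `@examples` section is run as part of the build checks, which is why the example should be small and fast.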
- Code requirements
- You must adhere to the R coding style rules for this course.
- Your functions must not have side effects except for invoking the plot() function (w.o. side effects). In particular, never change global options permanently, and never assign into the global namespace with the <<- operator. Temporary files or directories must be created using tempfile() or tempdir(), respectively.
- Your code must be fully covered with unit and integration tests, as appropriate, and tests must run via the testthat functions.
- See the evaluation rubrics for further suggestions.
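The code requirements above can be sketched as follows. This is one possible pattern, not a prescribed one: the helper names `formatMean` and `writeScratch` are hypothetical, and the testthat block is guarded so that the snippet also runs where testthat is not installed.

```r
# Hypothetical helper: format a mean with a temporarily changed "digits"
# option. options() returns the previous values, and on.exit() restores
# them even if the function fails, so no global state is changed permanently.
formatMean <- function(x) {
  oldOptions <- options(digits = 3)
  on.exit(options(oldOptions), add = TRUE)
  format(mean(x))
}

# Hypothetical helper: write intermediate results to a file inside the
# session's temporary directory instead of the user's working directory.
writeScratch <- function(x) {
  path <- tempfile(pattern = "scratch_", fileext = ".csv")
  write.csv(data.frame(value = x), path, row.names = FALSE)
  path  # return the path so the caller can read or delete the file
}

# A unit test for the "no side effects" requirement, assuming testthat
# is available. In a package, this would live in tests/testthat/.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("formatMean() leaves global options untouched", {
    before <- getOption("digits")
    formatMean(c(1.23456, 2.34567))
    testthat::expect_identical(getOption("digits"), before)
  })
}
```

Restoring state with `on.exit()` directly after changing it keeps the change and its undo next to each other, which makes the absence of side effects easy to verify during code review.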
3. Code reviews
Your code will be examined by the entire class and will be reviewed in class.
4. Improvements and extensions
Based on the reviews and feedback from the instructor, you develop improvements and extensions.
5. Examples and documentation
Once your code is complete, you develop comprehensive examples, a user guide ("vignette") and documentation.
Supporting Knowledge Network
For reference, consider the Knowledge Network for the BCH441 - Bioinformatics course.
In particular you must work through the "Journal" and "Plagiarism" units, and review the introduction to R units.
Marking
Activity | Weight | |
One minute pitch | 8 marks | |
Initial submission of your package | 18 marks | |
Participation in Review panels | 4 x 8 marks | |
General contributions to discussion and reviews | 6 marks | |
Final submission: improvements over the first submission and documentation | 16 marks | |
Journals | 15 marks | |
Insights! | 5 marks | |
Total | 100 marks |
What makes an excellent grade? See here.
First Class
- Overview of how this course will work.
- Overview of presenter and audience responsibilities and marking scheme.
- Define a first list of topics.
- Assign topics and dates.
- Subscribe everyone to the mailing list.
- Create a Student Wiki account for everyone.
Notes
- ↑ It is your responsibility to search the literature and available packages to define in what way your contribution is new.
- ↑ You must document that your package passes by posting the build-output in your journal on the student Wiki and your submission is only considered complete once all checks pass. Late penalties will be applied according to the following formula: (marks achieved) * 0.5^(fractional days late). However, material submitted more than three days late, or less than 24 hours before code review, will be marked zero.
- ↑ We may allow deviations from these policies after discussion in class.
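For orientation, the late-penalty formula from the notes can be worked through in a few lines of R. This is only a sketch: the function name `latePenalty` is hypothetical, it models the formula and the three-day cutoff, and it does not model the separate rule about submissions less than 24 hours before code review.

```r
# Hypothetical helper: marks remaining after the late penalty.
#   marks    - marks achieved before the penalty
#   daysLate - fractional days past the deadline (<= 0 means on time)
latePenalty <- function(marks, daysLate) {
  stopifnot(is.numeric(marks), is.numeric(daysLate))
  daysLate <- pmax(daysLate, 0)  # on-time submissions are not penalised
  # More than three days late is marked zero; otherwise the marks are
  # halved for each full day late, with partial days counted fractionally.
  ifelse(daysLate > 3, 0, marks * 0.5^daysLate)
}

# An initial submission worth 18 marks, handed in on time, 12 hours late,
# 1 day late, 3 days late, and 3.5 days late:
round(latePenalty(18, c(0, 0.5, 1, 3, 3.5)), 2)
# 18.00 12.73  9.00  2.25  0.00
```

Because the penalty is exponential, the first hours cost the most in absolute terms: twelve hours late already removes more than five of the 18 marks.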