Difference between revisions of "BCB410"

From "A B C"
 
(31 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
<div id="APB">
 
<div id="APB">
 
<div class="b1">
 
<div class="b1">
BCB410 2012
+
BCB410H1F - 2018
 
</div>
 
</div>
  
Line 11: Line 11:
 
==Objectives and Participants==
 
==Objectives and Participants==
  
 +
{{Smallvspace}}
  
The "Applied Bioinformatics" course is offered as a part of the BCB curriculum to ensure that our students know enough about application issues in the field to be able to put their knowledge into practice in a research lab setting. This is to support the Specialist Program goal: to prepare students for graduate studies in the discipline.
+
The "Applied Bioinformatics" course is offered as a part of the BCB Program curriculum to ensure that our students know enough about application issues in the field to be able to put their knowledge into practice in a research lab setting. This is to support the Specialist Program goal: to prepare students for graduate studies in the discipline.
  
As a required course in the BCB curriculum, BCB410 assumes the prerequisites and goals of fourth-year students in the BCB Specialist Program. Other students may participate but they may need to catch up on prerequisites in computer science or life-science courses that BCB students have taken at this point. They may also need to consider whether their objectives match the course objectives well. Generally speaking, this is an advanced course that presupposes familiarity with programming principles, algorithm analysis, and methods of modern systems biology, as well as introductory knowledge of linear algebra, graph theory, information theory, statistics as well as molecular&ndash;, structural&ndash; and cellular biology.
+
As a required course in the BCB curriculum, BCB410 assumes the prerequisites and goals of fourth-year students in the BCB Specialist Program. Other students may be permitted to enrol on a case-by-case basis, but they may need to catch up on prerequisites in computer science or life-science courses that BCB students have taken at this point. Generally speaking, this is an advanced course that presupposes familiarity with programming principles, algorithm analysis, and methods of modern systems biology, as well as introductory knowledge of linear algebra, graph theory, information theory, statistics, and molecular, structural, and cellular biology. The varying topics will be discussed at a '''highly technical level''' that is likely only useful for students who plan to integrate much of this material into their actual practice.
  
 +
{{Vspace}}
  
==Organization==
 
  
&nbsp;
 
  
=== Dates and Location ===
+
==Organization==
  
&nbsp;
+
{{Smallvspace}}
  
Classes meet Wednesdays between 10:00 and 12:00 in MS 2394 throughout the Fall Term.
+
<div class="alert">
 +
First review session: Wednesday, October 10
 +
</div>
  
&nbsp;
+
{{Vspace}}
  
=== Contact ===
 
  
&nbsp;
+
=== Dates and Location ===
  
Contact within the class is easiest via the [mailto:bcb410_2012@groups.google.com Google Group] that you have been subscribed to.
+
{{Smallvspace}}
  
&nbsp;
+
Classes meet Wednesdays between 10:00 and 12:00 in [http://map.utoronto.ca/utsg/building/033 SS1080 (Sidney Smith Hall)] throughout the Fall Term. Classes start at 10 minutes past the hour.
  
=== Marking ===
+
{{Vspace}}
  
&nbsp;
+
{{#lst:User:Boris|Coordinator}}
 +
{{#lst:User:Boris|Office_hours}}
  
<table>
+
{{Vspace}}
  
<tr class="sh">
+
=== Contact ===
<td><b>Activity</b></td>
 
<td><b>Weight</b></td>
 
</tr>
 
  
<tr><td colspan="3" class="sp"></td></tr>
+
Contact within the class is easiest via the [https://groups.google.com/forum/#!forum/bcb410_2018 '''Google Group'''] that you will subscribe to at the beginning of class.
  
<tr class="s1">
+
{{Vspace}}
<td>Design and coordination of your unit</td>
 
<td>20 marks</td>
 
</tr>
 
  
<tr class="s2">
+
After you have been subscribed, you will receive an email from Google indicating that you have been added to the mailing list. Please note: this is a restricted list that can only be viewed by subscribed users, and only subscribed users can post to the list. There are two consequences:
<td>Delivery and contents of presentation</td>
 
<td>20 marks</td>
 
</tr>
 
 
 
<tr class="s1">
 
<td>Quality of exercises/assignments</td>
 
<td>30 marks</td>
 
</tr>
 
 
 
<tr class="s2">
 
<td>Participation</td>
 
<td>10 marks</td>
 
</tr>
 
  
<tr class="s1">
+
* If you try to post from a different account than the one that you are subscribed with, your mail will be rejected. Remedy: post from the right account.
<td>Final exam</td>
 
<td>20 marks</td>
 
</tr>
 
  
<tr><td colspan="3" class="sp"></td></tr>
+
* If you try to view the group on the Web, and you are not logged into a Google account associated with the email address that you are subscribed with, you will not be able to access the group and you will probably see an error message like <code>You must be a member of this group to view and participate in it.</code> This is misleading: the problem is not that you are not a member, but that the Webpage doesn't '''know''' you are a member. Remedy: the easiest solution is simply not to access the group via the Web. Since you receive all mails anyway, there's usually no real need to visit the Webpage.
  
<tr class="s2">
+
<small>If you '''really''' need access to the group on the Web, you need a Google account that is associated with the address you are subscribed with. If you have a Gmail account, you already have a Google account, but that won't help you unless you are subscribed to the group with your Gmail address. If you are subscribed with your UofT address, you will need to create a Google account. That is possible; see e.g. https://www.wikihow.com/Make-a-Google-Account-Without-Gmail . </small>
<td>'''Total'''</td>
 
<td>100 marks</td>
 
</tr>
 
</table>
 
  
&nbsp;
+
{{Vspace}}
  
 
== Contents ==
 
== Contents ==
  
&nbsp;
+
{{Smallvspace}}
 
 
===A syllabus of learning units===
 
 
 
&nbsp;
 
  
Working from a general collection of topics in the field, we identify learning units that are of the greatest interest and greatest relevance for the students in the class. We jointly select the most suitable topics. '''Every student in class will take responsibility for development and delivery of one of the learning units.'''
+
In this year's course you will define a useful tool for the analysis of biological data, write an R package to support it, review and critique other packages, and improve and document your work.
  
&nbsp;
+
{{Vspace}}
  
===Unit contents and delivery===
+
=== Phases ===
  
The detailed contents for each unit is to be be discussed with the coordinator. Each student will to lead a two hour session on their topic.
+
We will work in five phases:
  
'''Presenter's responsibilities''' include<ref>Details may vary as required, by mutual agreement.</ref>:
+
* You will define a tool for data analysis and pitch it to the class for feedback in a one-minute presentation;
* Outline of the unit contents, to be completed at least '''three weeks in advance'''; This is to include:
+
* You will develop an R package for the analysis;
** a detailed lecture outline that includes an introduction, discussion of algorithms, presentation of examples, exposition of practical- and implementation issues and an outlook on future developments in the field;
+
* The class will work through your package and we will review your code;
** suitable pre-reading material;
+
* You will respond to the review, improve the material and add code to support an interactive webpage for data exploration with your tool based on the '''''shiny''''' package;
** an outline of exercises for the class;
+
* You will finalize your package with a vignette with examples, and documentation.
* Iteration of the unit contents with the coordinator, to be completed at least '''two weeks in advance'''.
 
* Developing a set of exercises (iterated with the coordinator) around the implementation of the topics , at least '''one week in advance''';
 
* Communication of pre-reading materials to your classmates, at least '''one week in advance''';
 
* Delivery of your lecture at a sufficiently technical level to be appropriate for an advanced fourth-year course and engaging the class in discussion;
 
* Communication of exercise materials to the class, at or directly after the lecture;
 
* Drafting a final-exam question that tests the successful completion of the exercises, at the latest '''one week after the lecture'''.
 
  
'''Audience responsibilities''' include:
+
{{Vspace}}
* Pre-reading before class;
 
* Active participation in the discussion;
 
* Feedback on the exercises and completion in due time.
 
  
 
===Schedule===
 
 
This schedule exceeds the available dates of the term and additional time slots will be agreed on, as required, during the second half of the term. Students may swap their presentation dates among themselves but the coordinator '''must''' be informed of swaps.
 
 
&nbsp;
 
  
 
<table>
 
<table>
Line 129: Line 86:
 
<tr class="sh">
 
<tr class="sh">
 
<td><b>Week</b></td>
 
<td><b>Week</b></td>
<td><b>Presenter</b></td>
+
<td><b>Date</b></td>
 
<td><b>Topic</b></td>
 
<td><b>Topic</b></td>
 
</tr>
 
</tr>
  
<tr><td colspan="3" class="sp"></td></tr>
+
<tr><td colspan="3" class="sp">&nbsp;</td></tr>
  
 
<tr class="s1">
 
<tr class="s1">
 
<td>1</td>
 
<td>1</td>
<td>Neda Raji</td>
+
<td>September 12</td>
<td>High-throughput sequencing</td>
+
<td>Introduction, organization</td>
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
 
<td>2</td>
 
<td>2</td>
<td>Andrei Soltan</td>
+
<td>September 19</td>
<td>unix tools<sup>*</sup></td>
+
<td>Initial idea, one-minute pitch</td>
 
</tr>
 
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
 
<td>3</td>
 
<td>3</td>
<td>Fahd Ananta</td>
+
<td>September 26</td>
<td>PHP<sup>*</sup></td>
+
<td>R package principles</td>
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
 
<td>4</td>
 
<td>4</td>
<td>Inna Dimenshtein</td>
+
<td>October 3</td>
<td>'''R''' programming<sup>*</sup></td>
+
<td>Tests and performance</td>
 
</tr>
 
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
 
<td>5</td>
 
<td>5</td>
<td>Andrew Lugowski</td>
+
<td>October 8</td>
<td>Text mining</td>
+
<td>All packages to be completed before Monday, October 8.</td>
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
 +
<td>5</td>
 +
<td>October 10</td>
 +
<td>Code Review I</td>
 +
</tr>
 +
 +
<tr class="s1">
 
<td>6</td>
 
<td>6</td>
<td>Lorenz Breu</td>
+
<td>October 17</td>
<td>Network metrics</td>
+
<td>Code Review II</td>
 +
</tr>
 +
 
 +
<tr class="s2">
 +
<td>7</td>
 +
<td>October 24</td>
 +
<td>Code Review III</td>
 
</tr>
 
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
<td>7</td>
+
<td>8</td>
<td>Samuel Law</td>
+
<td>October 31</td>
<td>BioPython</td>
+
<td>Code Review IV</td>
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
<td>8</td>
+
<td>-</td>
<td>Kyle Kim</td>
+
<td>November 7</td>
<td>Correlation discovery in large datasets<sup>*</sup></td>
+
<td>No class meeting, Fall Reading Week</td>
 
</tr>
 
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
 
<td>9</td>
 
<td>9</td>
<td>Dylan Bethune-Waddell</td>
+
<td>November 14</td>
<td>Pattern discovery<sup>*</sup></td>
+
<td>R Shiny</td>
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
 
<td>10</td>
 
<td>10</td>
<td>Taras Gordiyenko</td>
+
<td>November 21</td>
<td>High performance computing</td>
+
<td>Best practice, reproducible research</td>
 
</tr>
 
</tr>
  
 
<tr class="s1">
 
<tr class="s1">
 
<td>11</td>
 
<td>11</td>
<td>Chiho Kwon</td>
+
<td>November 28</td>
<td>Clustering<sup>*</sup></td>
+
<td>Vignettes, examples, documentation</td>
 
</tr>
 
</tr>
  
 
<tr class="s2">
 
<tr class="s2">
 
<td>12</td>
 
<td>12</td>
<td>Harun Mustafa</td>
+
<td>December 5</td>
<td>Cluster quality metrics</td>
+
<td>No class meeting, all material due.</td>
 
</tr>
 
</tr>
  
Line 210: Line 179:
  
  
<sup>*</sup> Older material and/or previous lectures on these topics are available. Coordinate with me ...
+
{{Vspace}}
  
 +
=== Details ===
  
<!--
+
{{Smallvspace}}
  
&nbsp;
 
==Topics==
 
  
 +
==== 1. Define your tool ====
  
* '''Basics'''
+
{{Smallvspace}}
**UNIX
 
***UNIX commands
 
***The UNIX pipe ("|")
 
***Installation of programs
 
***shellscripts
 
**IDE (Integrated Development Environment)
 
**Screenscraping
 
**wget
 
**Regular expressions
 
**HTML
 
**CGI
 
* '''Perl'''
 
**CPAN
 
**Perl programming
 
**Perl one-liners
 
* '''PHP'''
 
* '''MySQL'''
 
**MySQL installation
 
  
 +
;Requirements
 +
:The scope of your R package is to add to or improve a current workflow in bioinformatics or computational biology. At least some of the functionality must produce compelling graphical output, ideally to support exploratory analysis. Your package must not merely reproduce existing tools<ref>It is your responsibility to search the literature and available packages to define in what way your contribution is new.</ref>, and it must be distinct from the work of your classmates.
  
 +
;Ideas
 +
: You can draw on many sources for ideas:
 +
:* [https://www.nature.com/collections/vzsqzylnvx '''Current literature'''];
 +
:* [http://bioconductor.org/packages/release/BiocViews.html#___Workflow '''Bioconductor workflows'''];
 +
:* [https://cran.r-project.org/web/views/ '''CRAN task views'''];
 +
:* Tools collections at [https://links.bioinformatics.ca/ '''Bioinformatics.ca'''], [https://www.scripps.edu/research/cbb/tools.html '''Scripps'''] etc.;
 +
:* Online examples (e.g. the [https://www.reddit.com/r/dataisbeautiful/ "Data is beautiful"] subreddit);
 +
:* Best practice for information design you have come across, e.g. [https://www.edwardtufte.com/tufte/ Edward Tufte's work];
 +
:* ...
  
&nbsp;
 
==Contents==
 
  
 +
;Pitch
 +
: Your '''one-minute pitch''' is a presentation on Wednesday, September 19, based on a single slide that you upload as a JPG image to your "Project Page", a subpage of your User Page on the Student Wiki. You will be timed and a hard cutoff of 60 seconds applies. We will solicit brief feedback from the class regarding ways to '''improve creativity, utility, and visual appeal'''.
  
&nbsp;
+
{{Vspace}}
  
==Exercises==
+
==== 2. Develop your package ====
<section begin=exercises />
+
 
<section end=exercises />
+
{{Smallvspace}}
-->
+
 
 +
You will develop your tool as an R package following principles outlined in Hadley Wickham's [http://r-pkgs.had.co.nz/ '''R packages''' book]. Your package is to be posted on github. It must be complete (i.e. it must pass build checks without errors, warnings or notes) by Sunday, October 7, 23:59:59<ref>You must document that your package passes by posting the build output in your journal on the Student Wiki; your submission is only considered complete once all checks pass. Late penalties will be applied according to the following formula: <code>(marks achieved) * 0.5^(fractional days late)</code>. However, material submitted more than three days late, or less than 24 hours before code review, will be marked zero.</ref>.
 +
 
 +
In addition to the package contents, you post a brief synopsis on your Project Page and a link to your github repository.
 +
 
 +
{{Smallvspace}}
 +
 
 +
;Package/project requirement details
 +
* Your project / package must be posted on github.
 +
* It must result in a new, working project in RStudio when we check it out from github.
 +
* When building the package within the project locally, it must pass build checks without errors, warnings, or notes.
 +
* It must be installable from github using the following code:
 +
<source lang="R">
 +
library(devtools)
 +
install_github("<user name>/<package name>")
 +
library(<package name>)
 +
</source>
 +
* All dependencies must be available on CRAN or Bioconductor.
 +
* All functions must have roxygen generated <code>man</code> pages that include meaningful examples.
 +
* Example data must be small, less than 100kb or so.
 +
* Your package must conform to [https://cran.r-project.org/web/packages/policies.html '''CRAN policy'''] on source packages, and  [https://www.bioconductor.org/developers/package-guidelines/ '''Bioconductor package guidelines''']<ref>We may allow deviations from these policies after discussion in class.</ref>.
 +
 
 +
{{Smallvspace}}
 +
 
 +
;Code requirements
 +
* You must adhere to the [[RPR-Coding_style|'''R coding style rules''']] for this course.
 +
* Your functions must not have side effects except for invoking the <code>plot()</code> function (without further side effects). In particular, never change global options permanently, and '''never''' assign into the global namespace with the <code><<-</code> operator. Temporary files or directories must be created using <code>tempfile()</code> or <code>tempdir()</code>, respectively.
 +
* Your code must be fully covered with unit and integration tests, as appropriate, and tests must run via the <code>testthat</code> functions.
 +
* See the [[ABC-Rubrics#Code|'''evaluation rubrics''']] for further suggestions.
 +
 
 +
 
 +
 
 +
{{Vspace}}
 +
 
 +
==== 3. Code reviews ====
 +
 
 +
{{Smallvspace}}
 +
 
 +
Your code will be examined by the entire class and will be [[APB-Code_review|'''reviewed in class''']].
 +
 
 +
{{Vspace}}
 +
 
 +
==== 4. Improvements and extensions ====
 +
 
 +
{{Smallvspace}}
 +
 
 +
Based on the reviews and feedback from the instructor, you develop improvements and extensions.
 +
 
 +
{{Vspace}}
 +
 
 +
==== 5. Examples and documentation ====
 +
 
 +
{{Smallvspace}}
 +
 
 +
Once your code is complete, you develop comprehensive examples, a user guide ("vignette") and documentation.
 +
 
 +
{{Vspace}}
 +
 
 +
 
 +
===Supporting Knowledge Network===
 +
 
 +
{{Smallvspace}}
 +
 
 +
For reference, consider the Knowledge Network for the BCH441 - Bioinformatics course.
 +
 
 +
[[File:ABC-units_map.svg|thumb|250px|none|link=http://steipe.biochemistry.utoronto.ca/abc/assets/ABC-units_map.svg|'''Learning units in the General Bioinformatics knowledge network.''']]
 +
 
 +
In particular you must work through the "Journal" and "Plagiarism" units, and review the introduction to R units.
 +
 
 +
 
 +
{{Vspace}}
 +
 
 +
 
 +
=== Marking ===
 +
 
 +
{{Smallvspace}}
 +
 
 +
<table>
 +
 
 +
<tr class="sh">
 +
<td><b>Activity</b></td>
 +
<td><b>Weight</b></td>
 +
</tr>
 +
 
 +
<tr><td colspan="3" class="sp"></td></tr>
 +
 
 +
<tr class="s1">
 +
<td>One minute pitch</td>
 +
<td>8 marks</td>
 +
</tr>
 +
 
 +
<tr class="s2">
 +
<td>Initial submission of your package</td>
 +
<td>18 marks</td>
 +
</tr>
 +
 
 +
<tr class="s1">
 +
<td>[[APB-Code_review|Participation in Review panels]]</td>
 +
<td>4 x 8 marks</td>
 +
</tr>
 +
 
 +
<tr class="s2">
 +
<td>General contributions to discussion and reviews</td>
 +
<td>6 marks</td>
 +
</tr>
 +
 
 +
<tr class="s1">
 +
<td>Final submission: improvements over the first submission and documentation</td>
 +
<td>16 marks</td>
 +
</tr>
 +
 
 +
<tr class="s2">
 +
<td>[[FND-Journal|Journals]]</td>
 +
<td>15 marks</td>
 +
</tr>
 +
 
 +
<tr class="s1">
 +
<td>[[ABC-Insights|Insights!]]</td>
 +
<td>5 marks</td>
 +
</tr>
 +
 
 +
<tr><td colspan="3" class="sp"></td></tr>
 +
 
 +
<tr class="s2">
 +
<td>'''Total'''</td>
 +
<td>100 marks</td>
 +
</tr>
 +
</table>
 +
 
 +
{{Vspace}}
 +
 
 +
What makes an ''excellent'' grade? [[ABC-Rubrics|'''See here.''']]
 +
 
 +
{{Vspace}}
 +
 
 +
 
 +
===First Class===
 +
 
 +
{{Smallvspace}}
 +
 
 +
# Overview of how this course will work.
 +
# Overview of presenter and audience responsibilities and marking scheme.
 +
# Define a first list of topics.
 +
# Assign topics and dates.
 +
# Subscribe everyone to [https://groups.google.com/forum/#!forum/bcb410_2018 the mailing list].
 +
# Create a [http://steipe.biochemistry.utoronto.ca/abc/students Student Wiki] account for everyone.
 +
 
 +
{{Vspace}}
  
&nbsp;
 
  
 
==Notes==
 
==Notes==
 
<references />
 
<references />
  
<!--
 
&nbsp;
 
==Further reading and resources==
 
 
-->
 
 
<!-- {{#pmid:21627854}} -->
 
<!-- {{WWW|WWW_UniProt}} -->
 
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 
  
 +
{{Vspace}}
  
&nbsp;
 
 
[[Category:Applied_Bioinformatics]]
 
[[Category:Applied_Bioinformatics]]
 
</div>
 
</div>

Latest revision as of 14:37, 9 October 2018

BCB410H1F - 2018



Objectives and Participants

 

The "Applied Bioinformatics" course is offered as a part of the BCB Program curriculum to ensure that our students know enough about application issues in the field to be able to put their knowledge into practice in a research lab setting. This is to support the Specialist Program goal: to prepare students for graduate studies in the discipline.

As a required course in the BCB curriculum, BCB410 assumes the prerequisites and goals of fourth-year students in the BCB Specialist Program. Other students may be permitted to enrol on a case-by-case basis, but they may need to catch up on prerequisites in computer science or life-science courses that BCB students have taken at this point. Generally speaking, this is an advanced course that presupposes familiarity with programming principles, algorithm analysis, and methods of modern systems biology, as well as introductory knowledge of linear algebra, graph theory, information theory, statistics, and molecular, structural, and cellular biology. The varying topics will be discussed at a highly technical level that is likely only useful for students who plan to integrate much of this material into their actual practice.


 


Organization

 

First review session: Wednesday, October 10


 


Dates and Location

 

Classes meet Wednesdays between 10:00 and 12:00 in SS1080 (Sidney Smith Hall) throughout the Fall Term. Classes start at 10 minutes past the hour.


 


Coordinator

Boris Steipe


 



Office hours

(Virtual) face-to-face meetings are by appointment, if required. However, we will be able to resolve almost all issues by e-mail. You will find that discussions by e-mail are both more efficient and more effective than meetings. Moreover, e-mail discussions leave you with a document trail of what was discussed, can contain links to information sources, and let us share points of general interest more easily with the class.


 



 

Contact

Contact within the class is easiest via the Google Group that you will subscribe to at the beginning of class.


 

After you have been subscribed, you will receive an email from Google indicating that you have been added to the mailing list. Please note: this is a restricted list that can only be viewed by subscribed users, and only subscribed users can post to the list. There are two consequences:

  • If you try to post from a different account than the one that you are subscribed with, your mail will be rejected. Remedy: post from the right account.
  • If you try to view the group on the Web, and you are not logged into a Google account associated with the email address that you are subscribed with, you will not be able to access the group and you will probably see an error message like "You must be a member of this group to view and participate in it." This is misleading: the problem is not that you are not a member, but that the Webpage doesn't know you are a member. Remedy: the easiest solution is simply not to access the group via the Web. Since you receive all mails anyway, there's usually no real need to visit the Webpage.

If you really need access to the group on the Web, you need a Google account that is associated with the address you are subscribed with. If you have a Gmail account, you already have a Google account, but that won't help you unless you are subscribed to the group with your Gmail address. If you are subscribed with your UofT address, you will need to create a Google account. That is possible; see e.g. https://www.wikihow.com/Make-a-Google-Account-Without-Gmail .


 

Contents

 

In this year's course you will define a useful tool for the analysis of biological data, write an R package to support it, review and critique other packages, and improve and document your work.


 

Phases

We will work in five phases:

  • You will define a tool for data analysis and pitch it to the class for feedback in a one-minute presentation;
  • You will develop an R package for the analysis;
  • The class will work through your package and we will review your code;
  • You will respond to the review, improve the material and add code to support an interactive webpage for data exploration with your tool based on the shiny package;
  • You will finalize your package with a vignette with examples, and documentation.
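
To illustrate the fourth phase, here is a minimal sketch of an interactive page built with the shiny package. It is not a template prescribed by the course: the helper histBreaks(), the input ID nBins, and the use of the faithful example dataset are assumptions made for illustration only. The analysis logic is kept in a plain function so that it can be tested without shiny.

```r
# Hypothetical helper holding the analysis logic: equally spaced
# histogram breakpoints spanning the range of x.
histBreaks <- function(x, nBins) {
    return(seq(min(x), max(x), length.out = nBins + 1))
}

# The app is only defined if shiny is installed.
if (requireNamespace("shiny", quietly = TRUE)) {
    ui <- shiny::fluidPage(
        shiny::sliderInput("nBins", "Number of bins:",
                           min = 5, max = 50, value = 20),
        shiny::plotOutput("hist")
    )

    server <- function(input, output) {
        output$hist <- shiny::renderPlot({
            x <- faithful$waiting   # example data from the datasets package
            hist(x,
                 breaks = histBreaks(x, input$nBins),
                 main = "Waiting times")
        })
    }

    app <- shiny::shinyApp(ui, server)
    # shiny::runApp(app)   # uncomment to launch in an interactive session
}
```

Separating histBreaks() from the user interface means the same function can be exported from the package, documented, and covered by tests.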


 


Week  Date          Topic

1     September 12  Introduction, organization
2     September 19  Initial idea, one-minute pitch
3     September 26  R package principles
4     October 3     Tests and performance
5     October 8     All packages to be completed before Monday, October 8.
5     October 10    Code Review I
6     October 17    Code Review II
7     October 24    Code Review III
8     October 31    Code Review IV
-     November 7    No class meeting, Fall Reading Week
9     November 14   R Shiny
10    November 21   Best practice, reproducible research
11    November 28   Vignettes, examples, documentation
12    December 5    No class meeting, all material due.


 

Details

 


1. Define your tool

 
Requirements
The scope of your R package is to add to or improve a current workflow in bioinformatics or computational biology. At least some of the functionality must produce compelling graphical output, ideally to support exploratory analysis. Your package must not merely reproduce existing tools[1], and it must be distinct from the work of your classmates.
Ideas
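You can draw on many sources for ideas:
  • Current literature;
  • Bioconductor workflows;
  • CRAN task views;
  • Tools collections at Bioinformatics.ca, Scripps etc.;
  • Online examples (e.g. the "Data is beautiful" subreddit);
  • Best practice for information design you have come across, e.g. Edward Tufte's work;
  • ...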


Pitch
Your one-minute pitch is a presentation on Wednesday, September 19, based on a single slide that you upload as a JPG image to your "Project Page", a subpage of your User Page on the Student Wiki. You will be timed and a hard cutoff of 60 seconds applies. We will solicit brief feedback from the class regarding ways to improve creativity, utility, and visual appeal.


 

2. Develop your package

 

You will develop your tool as an R package following principles outlined in Hadley Wickham's R packages book. Your package is to be posted on github. It must be complete (i.e. it must pass build checks without errors, warnings or notes) by Sunday, October 7, 23:59:59[2].

In addition to the package contents, you post a brief synopsis on your Project Page and a link to your github repository.


 
Package/project requirement details
  • Your project / package must be posted on github.
  • It must result in a new, working project in RStudio when we check it out from github.
  • When building the package within the project locally, it must pass build checks without errors, warnings, or notes.
  • It must be installable from github using the following code:
library(devtools)
install_github("<user name>/<package name>")
library(<package name>)
  • All dependencies must be available on CRAN or Bioconductor.
  • All functions must have roxygen generated man pages that include meaningful examples.
  • Example data must be small, less than 100kb or so.
  • Your package must conform to CRAN policy on source packages, and Bioconductor package guidelines[3].
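
As an illustration of the documentation requirement, the following sketch shows what a roxygen header with a meaningful example might look like. The function gcContent() is hypothetical and not part of any course material; in a package, the man page would be generated from these comments, for example with devtools::document().

```r
#' Compute the GC content of a nucleotide sequence.
#'
#' @param s A single character string of nucleotides (upper or lower case).
#' @return The fraction of characters in \code{s} that are G or C, a numeric
#'   value between 0 and 1, or \code{NA_real_} for an empty string.
#' @examples
#' gcContent("ATGCGC")
#' @export
gcContent <- function(s) {
    stopifnot(is.character(s), length(s) == 1)
    if (nchar(s) == 0) {
        return(NA_real_)              # no characters: fraction is undefined
    }
    chars <- strsplit(toupper(s), "")[[1]]   # split into single characters
    return(sum(chars %in% c("G", "C")) / length(chars))
}
```

The example under @examples is run during the build checks, so it needs to be fast and must work with data that ships with the package.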


 
Code requirements
  • You must adhere to the R coding style rules for this course.
  • Your functions must not have side effects except for invoking the plot() function (without further side effects). In particular, never change global options permanently, and never assign into the global namespace with the <<- operator. Temporary files or directories must be created using tempfile() or tempdir(), respectively.
  • Your code must be fully covered with unit and integration tests, as appropriate, and tests must run via the testthat functions.
  • See the evaluation rubrics for further suggestions.
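
The code requirements above can be sketched as follows. This is one possible approach under stated assumptions, not the course's reference solution: formatFixed() and writeTemp() are hypothetical helpers, and the test is guarded so that the script also runs where testthat is not installed. In a package, the test would live in a file under tests/testthat/.

```r
# Hypothetical helper: format numbers in fixed notation without
# permanently changing the global "scipen" option.
formatFixed <- function(x) {
    oldOpt <- options(scipen = 999)   # options() returns the previous values
    on.exit(options(oldOpt))          # restore them when the function exits
    return(format(x))
}

# Hypothetical helper: write a data frame to a temporary file and
# return its path. Nothing is written to the working directory.
writeTemp <- function(df) {
    path <- tempfile(fileext = ".csv")
    write.csv(df, path, row.names = FALSE)
    return(path)
}

# A testthat unit test confirming that the option is restored.
if (requireNamespace("testthat", quietly = TRUE)) {
    testthat::test_that("formatFixed() leaves global options unchanged", {
        before <- getOption("scipen")
        formatFixed(1e-10)
        testthat::expect_identical(getOption("scipen"), before)
    })
}
```

Using on.exit() restores the option even if the function stops with an error, which is why it is preferable to resetting the option on the last line of the function.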



 

3. Code reviews

 

Your code will be examined by the entire class and will be reviewed in class.


 

4. Improvements and extensions

 

Based on the reviews and feedback from the instructor, you develop improvements and extensions.


 

5. Examples and documentation

 

Once your code is complete, you develop comprehensive examples, a user guide ("vignette") and documentation.


 


Supporting Knowledge Network

 

For reference, consider the Knowledge Network for the BCH441 - Bioinformatics course.

Learning units in the General Bioinformatics knowledge network.

In particular you must work through the "Journal" and "Plagiarism" units, and review the introduction to R units.


 


Marking

 
Activity                                                                    Weight
One minute pitch                                                            8 marks
Initial submission of your package                                          18 marks
Participation in Review panels                                              4 x 8 marks
General contributions to discussion and reviews                             6 marks
Final submission: improvements over the first submission and documentation  16 marks
Journals                                                                    15 marks
Insights!                                                                   5 marks
Total                                                                       100 marks


 

What makes an excellent grade? See here.


 


First Class

 
  1. Overview of how this course will work.
  2. Overview of presenter and audience responsibilities and marking scheme.
  3. Define a first list of topics.
  4. Assign topics and dates.
  5. Subscribe everyone to the mailing list.
  6. Create a Student Wiki account for everyone.


 


Notes

  1. It is your responsibility to search the literature and available packages to define in what way your contribution is new.
  2. You must document that your package passes by posting the build output in your journal on the Student Wiki; your submission is only considered complete once all checks pass. Late penalties will be applied according to the following formula: (marks achieved) * 0.5^(fractional days late). However, material submitted more than three days late, or less than 24 hours before code review, will be marked zero.
  3. We may allow deviations from these policies after discussion in class.
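
The late-penalty formula in note 2 can be sketched in R as follows. The function name latePenalty() and its arguments are assumptions made for illustration. The sketch covers the halving formula and the three-day cutoff only; the rule about submissions less than 24 hours before code review is not modelled.

```r
# Sketch of the late-penalty formula from note 2.
# marks:    marks achieved before any penalty
# daysLate: fractional days past the deadline (0 or negative means on time)
latePenalty <- function(marks, daysLate) {
    stopifnot(is.numeric(marks), is.numeric(daysLate), length(daysLate) == 1)
    if (daysLate <= 0) {
        return(marks)             # on time: no penalty
    }
    if (daysLate > 3) {
        return(0 * marks)         # more than three days late: zero
    }
    return(marks * 0.5^daysLate)  # marks halve with every day late
}
```

For example, an initial submission that would have earned 18 marks but arrives exactly one day late would receive 9 marks under this formula.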