Bioinformatics Main Page

From "A B C"

BCH441 - Bioinformatics

Welcome to the "Course Wiki" for BCH441 / BCH1441.

These wiki pages provide information and materials, and coordinate activities and projects, for the introductory bioinformatics courses and other workshops taught by Boris Steipe at the University of Toronto. If you are not one of my students, you are welcome to browse this site; however, I can't provide support for your explorations. The material may be useful if you invest some effort in studying it systematically.

BCH441 will reconvene in the fall term of 2020. It will be delivered in an online-only mode. Visit this page in early September for details.


The Course


BCH441 (BCH1441) is an introduction to current bioinformatics for life science students and the specialists in the BCB Program. The course provides an overview of the sources of biomolecular data, data annotation and integration, and the interpretation of results through evidence-based reasoning. This includes the components (sequence, structure, and function), the relationships in phylogeny and in the networks of interaction and regulation, and the “systems” through which we conceptually organize our knowledge.

Specific contents include:

  • large, public biomolecular data resources,
  • DNA and protein sequences and sequence analysis,
  • pairwise and multiple sequence alignment,
  • fast database searches to discover homologues,
  • protein structure interpretation and homology modeling,
  • phylogenetic analysis - tree building and interpretation,
  • work with genome-scale data,
  • functional annotation with Gene Ontology and other resources,
  • relationships discovered through co-expression and protein-protein interactions, and
  • introduction to systems-level concepts.

Practical, hands-on tasks and assignments will introduce public data resources and analysis tools. Along with improving your general computer literacy, you will learn to use the programming language and statistical workbench R, with special emphasis on the everyday tasks of data preparation and analysis that have become indispensable in any life-science laboratory. (Yes, you will learn to program.)

The course is complemented by BCB420 / JTB2020 (offered in the Winter Term) which consolidates aspects of computational systems biology in a project context.

BCH441H1F is the undergraduate course code.
BCH1441H1F is the cross-listed course code for graduate students.


Boris Steipe



BCH441/BCH1441 is a Fall Term course; contact times are Tuesdays, 17:00 to 20:00. These are listed nominally as Tutorials T5 and Lectures T6-8, but we will use the time in variable configurations.

First day of class – Tuesday, September 10. You must attend the first class.
We need this time to go over the course delivery and organizational details, and to get you an account on the Course Wiki and on the course mailing list. Your personal presence is a requirement of the course. Please do not enrol in the course if your travel or other plans prevent you from attending the first class session.


This course will be taught in a new format: the domain of bioinformatics has been decomposed into small "learning units", each focussed more or less on a single concept. You work through the units independently, in any order that makes sense to you and at your own pace, with the goal of acquiring the knowledge and skills you need for four main "Integrator Units" that bring the contents together.

Through this, the course accommodates different levels of preparation more flexibly, probably makes your work more efficient, and implicitly teaches a number of meta-skills such as reporting and time-management. Be mindful though: this format requires a high level of self-motivation and responsibility to do well. In terms of aiming for the highest level of understanding and competence, you will frequently be on your own - just like you are in "real life". On the other hand, you certainly are the best judge of how well prepared you are. Thus there should be no surprises when your deliverables are evaluated.


Grading and Activities


This course comprises four key, integrative activities and preparatory "learning units" that lead up to them. Learning units can be completed in any sequence that makes sense to you, at any time until the deadline to submit material. But note that some learning units require evaluation on our "Evaluation Days" (see below), and/or scheduling a test, and that has to be done well in advance.

There are also a few restrictions on which units you must complete within this course:

  • You must complete all four Integrator Units and submit them for marking. These are worth at most 40 marks ((3 x 8) + (1 x 16)), where 16 marks are available for the unit that you choose for your oral test.
  • You can submit a mix of other learning units worth up to an additional 30 marks. These are typically worth 6 marks each. A number of units are available; which ones you choose is up to you.
  • You must ensure that you have submitted units for evaluation worth at least 10% of your final grade by October 31, so they can be marked before the last day to drop a course in the Fall term (Monday, November 4 2019).
  • 25% of your mark will be given for your Course Journal at the end of the course.
  • 5% of your mark will be given for your insights! page at the end of the course.

Please carefully read the evaluation rubrics for each category of deliverables.

For graduate students (BCH1441), the marks you receive for learning units and Integrator Units will be scaled by 0.8, and 14 marks are available for your own design of a learning unit covering an aspect of your thesis project. Coordinate this with the instructor well in advance of the last day to submit course work for BCH441 in the Fall term (Tuesday, December 3 2019 23:59).

Activity | Weight: BCH441 (Undergraduates) | Weight: BCH1441 (Graduates)
Integrator Units | 40 marks | 32 marks
Your selection of other learning units | 30 marks | 24 marks
Course Journal | 25 marks | 25 marks
insights! page | 5 marks | 5 marks
Graduate "Learning Unit Design" | n/a | 14 marks
Total | 100 marks | 100 marks
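The weights above can be checked with a little arithmetic: without scaling, 40 + 30 + 25 + 5 = 100, and with the 0.8 graduate scaling, 32 + 24 + 25 + 5 + 14 = 100. The Python sketch below makes this explicit. It is only an illustration under stated assumptions: the function and key names are hypothetical, raw marks are assumed to be recorded on the undergraduate scale, and the 0.8 factor is applied only to Integrator Units and other learning units, as described above.

```python
# A minimal sketch of the mark arithmetic in the table above.
# The names MAX_MARKS and final_mark are hypothetical; only the numbers
# (maxima, 0.8 scaling, 14-mark design unit) come from this page.

MAX_MARKS = {
    "integrator_units": 3 * 8 + 1 * 16,  # three 8-mark evaluations + one 16-mark oral test = 40
    "learning_units": 30,                # e.g. five units at a typical 6 marks each
    "course_journal": 25,
    "insights_page": 5,
}
GRAD_SCALE = 0.8      # graduate scaling for Integrator Units and other learning units
GRAD_DESIGN_MAX = 14  # graduate "Learning Unit Design"
SCALED = ("integrator_units", "learning_units")


def final_mark(raw: dict, graduate: bool = False) -> float:
    """Combine raw marks (recorded on the undergraduate scale) into a mark out of 100."""
    for key, value in raw.items():
        if key == "learning_unit_design":
            if not graduate:
                raise ValueError("learning_unit_design applies to BCH1441 only")
            limit = GRAD_DESIGN_MAX
        else:
            limit = MAX_MARKS[key]  # unknown keys raise KeyError
        if not 0 <= value <= limit:
            raise ValueError(f"{key}: {value} is outside 0..{limit}")

    total = 0.0
    for key in MAX_MARKS:
        value = raw.get(key, 0)
        if graduate and key in SCALED:
            value *= GRAD_SCALE
        total += value
    if graduate:
        total += raw.get("learning_unit_design", 0)
    return total


full = {"integrator_units": 40, "learning_units": 30, "course_journal": 25, "insights_page": 5}
print(final_mark(full))                                                 # 100.0
print(final_mark({**full, "learning_unit_design": 14}, graduate=True))  # 100.0
```

Recording raw marks on one scale and applying the graduate factor afterwards keeps a single rubric for both course codes; this is a design choice of the sketch, not a statement about how marks are actually recorded.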

A mix of evaluation methods
Learning units will be evaluated with a mix of approaches including technical reports, documentation of results in your journal, delivery of R code, quizzes, ... Details will be described in the individual units. You will be required to submit evaluations in different categories.
Quizzes and in-class discussion of the quiz results will be scheduled, for the units for which quiz evaluation is an option, on the dates listed under "Timing and important dates".

Integrator units will be evaluated with a mix of approaches including technical reports, delivery of R code, and oral tests. Technical reports, R code and other options will be worth 8 marks each, the oral test will be worth 16 marks. You can submit at most one of each evaluation category (and one of your evaluations must be an oral test). Also, your oral test can't be your first evaluation, and the topics will be cumulative. Details will be described in the individual units.
Oral tests for Integrator Units will be scheduled on November 14 and 15, and November 21 and 22 (Thursdays and Fridays). Please refer to the sign-up page (linked from the respective Integrator Units) for details.

A final note on marking policy...

I do not adjust marks towards a target mean and variance (i.e. there will be no "belling" of grades), but follow the principles laid out in the marking rubrics. I feel strongly that "normalization" of grades interferes with a collaborative and mutually supportive learning environment. If your classmate gets a great mark because you helped them with a difficult concept, this should never bring down your own mark because the instructor "bells down" the class average. Collaborate as much as possible; it is a great way to learn.

But take *utmost* care to follow the instructions on avoiding plagiarism and academic misconduct to the letter; they will be rigorously enforced.


The learning-units map

Here is a thematic overview of the topical areas of this course's learning units:

The bioinformatics learning unit landscape.

And here is the detailed map. It contains links to all of the units.

  • <command>-Click to open the Learning Units Map in a new tab, scale for detail.
A clickable map of the bioinformatics learning units.
  • Hover over a learning unit to see its keywords.
  • Click on a learning unit to open the associated page.
  • The nodes of the learning unit network are colour-coded:
    • Live units are green.
    • Units under development are light green. These are still in progress.
    • Stubs (placeholders) are pale. These still need basic contents.
    • Milestone units are blue. These collect a number of prerequisites to simplify the network.
    • Integrator units are red. These embody the main goals of the course.
    • Units that require revision are pale orange.
  • Units that have a black border have deliverables that can be submitted for credit. Choose any you want to submit for credit, up to a maximum of 30 marks.
  • Arrows point from a prerequisite unit to a unit that requires it.

(Unit status will be updated as in-progress units are completed.)


Navigating the course

Everything starts with the following three units:

This will likely be the first learning unit you work with, since your Course Journal, as well as all other deliverables, will be kept on a Wiki. This unit includes an introduction to authoring Wikitext and to the structure of Wikis, in particular how different pages live in separate "Namespaces". The unit also covers the standard markup conventions ("Wikitext markup"), the same conventions that are used on Wikipedia, as well as some extensions that are specific to our Course Wiki and Student Wiki. We also discuss page categories that help keep a Wiki organized, licensing under a Creative Commons Attribution license, and how to add licenses and other page components through template codes.

Keeping a journal is an essential task in a laboratory. To practice keeping a technical journal, you will document your activities as you are working through the material of the course. A significant part of your term grade will be given for this Course Journal. This unit introduces components and best practice for lab- and course journals and includes a wiki-source template to begin your own journal on the Student Wiki.

In parallel with your other work, you will maintain an insights! page on which you collect valuable insights and learning experiences from the course. Through this you ask yourself: what does this material mean, for the field and for myself?

Everything leads to the Integrator Units. These cover four large areas of bioinformatics that make up the explicit goals of the course:

(i) algorithms and statistics;
(ii) structural modelling and interpretation;
(iii) gene annotation; and
(iv) phylogenetic analysis.

The knowledge and skills you need to work on these Integrator Units can be obtained from the other learning units that are shown on the learning units map as prerequisites. Note that "prerequisites" in this context does not mean you must do one thing before you can do another, the arrows simply point out which units assume what prior knowledge. You can acquire that knowledge in whatever sequence makes sense to you, and you don't have to learn from the learning units of this course at all. Just make sure that you submit enough general learning units for evaluation along the way. And document what you are doing in your Course Journal. Also, remember that all the material is cumulative - my evaluation of your work implicitly includes all of the prerequisite material.


Where to begin: possible scenarios for working through the units...

(These scenarios are for illustration; you don't have to follow these sequences. There is no implied claim that any of these sequences is better for learning the material than any other. Make your own choices!)

Yvette might have done a project about protein structure in a previous course...
She decides to tackle the Homology modelling Integrator Unit first, because she is most confident about that material.
Obviously, all paths through the learning units begin with the units leading up to the Course Preparation milestone, as well as the Introduction to R milestone. This is where she starts.
The homology modelling unit requires the cluster of protein structure units (BIN-SX-...), as well as the sequence alignment units (BIN-ALI-...).
From the database units, she only needs BIN-PDB for now.
For the sequence alignment cluster she needs BIN-Sequence and FND-Homology, and both of these need BIN-Abstractions.
Next she goes for the Phylogenetic Analysis Integrator Unit, since it requires only a small number of additional prerequisites and she's a bit busy at that time with midterms in other courses. Some introductory statistics (FND-STA-Probability and FND-STA-Bayes_theorem) gets her on the way to the BIN-PHYLO-... cluster.
She finds she enjoys working with R, and figuring out algorithms and workflows is like solving puzzles. So she does the R programming Integrator Unit next, in which she writes code that estimates what mutations in a gene can tell us about that gene's involvement in a disease phenotype. Adding a small number of units focussed on software development, and a bit more statistics, gets her on the way.
Finally, as a kind of capstone, she completes the genome annotation Integrator Unit. For this she fills in the rest of the database units, the units on statistics and the analysis of differential gene expression, the units on networks and protein-protein interactions, and the concepts of protein function, to complete the BIN-FUNC-Annotation milestone, ...
... followed by the genomics units (BIN-Genome-...).
And that's it. Units she hasn't covered were optional.


Nigel wants to sample different areas of the material broadly before he tackles any of the Integrator Units...

  • Again: all paths through the learning units begin with the Course Preparation milestone, and the Introduction to R milestone. This is where he starts.
  • He targets the information theory unit first. To prepare for it, he works through the data models unit and chooses a sample organism, then proceeds to the bioinformatics abstractions unit and the macromolecular sequence unit.
  • This makes him curious about the relationship between information theory and molecular forcefields, so he learns about database principles and 3D structure concepts next, to tackle the PDB database unit and the tutorial on the 3D structure viewer "UCSF Chimera".
  • Apparently, another use of information theory is quantifying the relatedness of functional annotations. Nigel completes the rest of the database cluster and proceeds to learn about the Gene Ontology project, arriving at the "semantic similarity" unit.
  • It interests Nigel how such fields of mathematics can be employed to study biology.
  • This prompts him to complete the missing units from the functional annotation cluster (BIN-SEQA-... and BIN-FUNC-...) and so reach the BIN-FUNC-Annotation milestone.
  • The greatest importance of functional annotation lies in annotating whole genomes. Nigel learns more about this next with the genomics cluster (BIN-Genome-...).
  • He finds it intriguing how so much knowledge of function does not actually rely on the detailed, structural analysis of mechanism. To understand this better, Nigel works through the remainder of the structure units (BIN-SX-...) (for which he incidentally needs the BIN-ALI-Alignment unit.)
  • Next, he realizes that much of what he has worked with so far has implicitly relied on sequence-alignment tools. So he completes the alignment cluster next (BIN-ALI-...).
  • And realizing that multiple sequence alignments (BIN-ALI-MSA) are fundamental to phylogenetic analysis leads him to tackle the phylogenetic analysis cluster next (BIN-PHYLO-...).
  • That's it for the learning units. Nigel then completes the four Integrator Units one after another, starting with the genome annotation unit because that's the biggest one and he wants it safely submitted before the end-of-term rush in his other courses gets to him.




Timing and important dates

You can work through the material entirely at your own pace. There are only two restrictions: a minimum number of evaluations have to be submitted and marked before the last day to drop a course in the Fall term (Monday, November 4 2019), and everything needs to be done by the last day to submit course work for BCH441 in the Fall term (Tuesday, December 3 2019 23:59):

  • Work from learning units worth at least 10% of your final grade must have been submitted for evaluation by October 31, so that I can mark it before the last day to drop a course in the Fall term (Monday, November 4 2019). If you have not submitted enough work for evaluation by October 31, I will randomly choose an appropriate number of learning units and record a 0 for these.
  • All remaining course work must have been submitted by the last day to submit course work for BCH441 in the Fall term (Tuesday, December 3 2019 23:59). At least one of the Integrator Units will be evaluated in an oral test and there is a limited number of test-dates.

  • Tuesday, September 10, 2019 - First class - mandatory attendance.
  • Tuesday, October 8, 2019 - Quiz option: BIN-NCBI, BIN-EBI
  • Tuesday, October 22, 2019 - Quiz option: FND-MAT-Graphs_and_networks
  • Wednesday, October 30, 2019 - at least 10 marks worth of material has to have been submitted for evaluation
  • last day to drop a course in the Fall term (Monday, November 4 2019)
  • November 4-8 - Fall Reading Week
  • Tuesday, November 12, 2019 - Quiz option: BIN-PPI-Analysis
  • Tuesday, November 26, 2019 - Quiz option: BIN-ALI-MSA
  • Tuesday, December 3, 2019 - No class meeting
  • last day to submit course work for BCH441 in the Fall term (Tuesday, December 3 2019 23:59)


Submission of items for marking

Details are listed with each evaluation unit, but in principle you create a separate sub-page of your user page and post your material there. You add an appropriate category tag to the page when it is ready to be evaluated and I can then easily find and mark your page.


Class time

Since most of the learning units include hands-on, practical components that you do on your own, we won't need to use class time for textbook-like delivery of contents. Class meetings will include the following main activities:

  • We will always take time for open discussion of topics as they arise. This will be driven by student input and feedback.
  • We will jointly work on a designated integrator task: the GO term categories unit.
  • I will discuss marking of some submissions I have received, to make the process more transparent.
  • Some evaluations will include quizzes or presentations. These will happen in class on our dedicated "evaluation days".
  • We will organize conversations with scientists in the field – "Bioinformatics Investigator Chats" – who will talk about aspects of their own work that are related to the course contents, e.g. how a particular database or algorithm helped solve a real problem in the lab. This will give you a sense of how the material is meaningful in the real world.


Organizational Details



BA 1190 (Bahen Building)

Student Wiki

Many of the class activities will take place interactively on a separate Wiki site (the "Student Wiki"). You will create a personalized user page there, and use it to submit materials as required.

This Wiki is not accessible to the general public; you need an account, which we will register in the first class session.


Mailing list

All course announcements and all course discussion (outside of class) will take place on a mailing list. We use Google Groups for this purpose. You will be subscribed to the list in the first class. You will not be able to participate fully in the course if you are not subscribed. Make sure you are subscribed with the email address you use most frequently, and set your preferences to immediate delivery; "Digest" or "Web only" delivery won't allow you to participate effectively.


After you have been subscribed, you will receive an email from Google indicating that you have been added to the mailing list. Please note: this is a restricted list that can only be viewed by subscribed users, and only subscribed users can post to it. There are two consequences:

  • If you try to post from a different account than the one that you are subscribed with, your mail will be rejected. Remedy: post from the right account.
  • If you try to view the group on the Web and you are not logged into a Google account associated with the email address that you are subscribed with, you will not be able to access the group, and you will probably see an error message like "You must be a member of this group to view and participate in it." This is misleading: the problem is not that you are not a member, but that the Webpage doesn't know you are a member. Remedy: it depends. The easiest solution is simply not to access the group via the Web; since you receive all mails anyway, there's usually no real need to visit the Webpage.

If you really need access to the group on the Web, you need a Google account, and it needs to be associated with the address you are subscribed with. If you have a Gmail account, you already have a Google account, but that won't help you unless you are subscribed to the group with your Gmail address. If you are subscribed with your UofT address, you will need to create a Google account for that address; that is possible, e.g. see here.


Office hours

Face-to-face meetings are by appointment, if required. However, we will be able to resolve almost all issues by email. You will find that discussions by email are both more efficient and more effective than simply dropping in for a chat. Moreover, email discussions leave you with a document trail of what was discussed, can contain links to information sources, and let us share points of general interest more easily with the class.



Introductory courses in biochemistry and molecular biology provide the content background for the course. This might be obtained through the listed prerequisite courses: BCH210H1/BCH242Y1; BCH311H1/MGY311Y1/PSL350H1[1]. However, I have no way to assess your success in these courses, nor do I know what material was actually covered. Thus, I generally waive prerequisites. In all cases it is your responsibility to be sufficiently prepared and to make up for material that you have not covered previously.

A breakdown of knowledge that I expect you to acquire outside our course, or bring with you from previous courses, is listed here.

You must have access to the Internet via your own computer. From time to time it may be necessary to bring your computer to class. If you do not have a laptop computer that is set up to work in the University's wireless network, contact me so we can figure out how to work around any issues.


Extensions for term work


Extensions for term work in this course are subject to Faculty regulations and will only be considered within the framework determined by the Faculty policies.

  • Submissions due before the last day to drop a course in the Fall term (Monday, November 4 2019).
No extensions will be considered for these submissions[2]. Such an extension would not be "fair, equitable and reasonable"[3], i.e. granting it to individual students would violate the requirement to give all students an equal opportunity to succeed in the course. In fact, if you find yourself so far behind in the course that you are unable to submit meaningful work for assessment by that date, I strongly recommend that you drop the course.
  • Signing up for the oral tests.
The dates for the oral tests were announced at the beginning of the term. If you fail to sign up for a slot, or if you fail to show up at the scheduled time, this is equivalent to a missed midterm exam in terms of applicable Faculty policy: "if the reasons for missing your test are acceptable to the instructor, a make-up opportunity should be offered to the student where practicable." "Acceptable" reasons will be considered if they are justified, if the consideration is "fair, equitable and reasonable", and if the reason is documented through one of the four types of "official" documentation: UofT Verification of Illness or Injury Form, Student Health or Disability Related Certificate, a College Registrar’s Letter, or an Accessibility Services Letter. However, the scope for a "practicable" make-up opportunity for the oral test will be extremely limited.
  • Submissions due on the last day to submit course work for BCH441 in the Fall term (Tuesday, December 3 2019 23:59).
Since the course does not have a final exam, the Faculty requires grades to be marked, collated and submitted a few days after the last day to submit course work for BCH441 in the Fall term (Tuesday, December 3 2019 23:59). Therefore I cannot normally grant extensions beyond this date. The Faculty allows so-called informal extensions to be granted "in extraordinary circumstances"; in those cases too, the requirement to be "fair, equitable and reasonable" applies, i.e. you would need to demonstrate that the need for the extension was due to unavoidable circumstances that go significantly beyond what was expected of the rest of the class, and submit "official" documentation to me. In that case, (i) we would determine an adjusted submission date, (ii) I will initially submit a mark of 0 for the missing submissions, and (iii) I will submit an amended mark after that date, if appropriate. Note that the Faculty requires that such extensions not go beyond a few days after the end of the Final Examination Period. If you require an extension beyond that date, you need to submit a formal petition through your College Registrar.


Exclusions & Enrolment controls



Printed material

This is an electronic-submission-only course, and printing of course material is expressly discouraged since the material is updated frequently. If you must print material, consider printing double-sided; learn how at the Print-Double-Sided Student Initiative.






  1. Please check the official Calendar for the academic year to confirm the "official" prerequisites.
  2. Since these requirements are so minimal, an exception due to personal, medical, or accessibility reasons would require you to demonstrate (and document) that you have not been able to pursue academic work for the amount of time that you would normally have needed to fulfil the requirements. Given the volume of material, I would consider this time to be at least four weeks.
  3. It is Faculty policy to require assessments to be "fair, equitable and reasonable".

