Bioinformatics Main Page

From "A B C"
Revision as of 16:10, 27 August 2021 by Boris

BCH441 - Bioinformatics

Welcome to BCH441.

These wiki pages provide information and materials, and coordinate activities and projects in the introductory bioinformatics courses and other workshops taught by Boris Steipe at the University of Toronto. If you are not one of my students, you can still browse this site; however, I can't provide support for your explorations. The material may be useful if you invest some effort in studying it systematically.


It is 2021.
This course will reconvene in the fall term.
It will be taught online only.

I have been told that ACORN lists this as an in-person course. This is wrong information. I am working to have this corrected.

 

Updates to syllabus, marking scheme, and course materials are now in progress. Last year was the first online-only delivery of the course and it actually went really well. But this year will be even better as we'll try a few things that work much better online than in the classroom. Drop me a line if you have any questions, or concerns, or want to share experiences with online classes that went beyond your expectations.

See you in class!

 



The Course

 

BCH441 is an introduction to current bioinformatics for life science students and the specialists in the BCB Program. The course provides an overview of the sources of biomolecular data, data annotation and integration, and the interpretation of results through evidence-based reasoning. This includes the components (sequence, structure, and function), the relationships in phylogeny and in the networks of interactions and regulation, and the “systems” through which we conceptually organize our knowledge.


Specific contents include:

  • large, public biomolecular data resources,
  • DNA and protein sequences and sequence analysis,
  • pairwise and multiple sequence alignment,
  • fast database searches to discover homologues,
  • protein structure interpretation and homology modeling,
  • phylogenetic analysis - tree building and interpretation,
  • work with genome-scale data,
  • functional annotation with Gene Ontology and other resources,
  • relationships discovered through co-expression and protein-protein interactions, and
  • introduction to systems-level concepts.

The emphasis is on practical, hands-on exploration of resources in tasks and assignments. You will improve your general computer literacy and you will learn to use the programming language and statistical workbench R. Writing your own code has become indispensable in any life-science laboratory. (Yes, you will learn to program.)



Instructor

Boris Steipe


 


Dates

 


For generic dates in the Fall term, see the Faculty "Academic Dates" calendar. For specific due dates, see the Assignments page on Quercus.

Contents

 

This course will be taught in a new format: I have split up the large domain of Bioinformatics into small "learning units" that each focus more or less on a single concept. You work through the units independently, in any order that makes sense to you, at your own pace, all with the goal of acquiring the knowledge and skills to work on four main "Integrator Units" that bring the contents together.

Through this, the course accommodates different levels of preparation more flexibly, probably makes your work more efficient, and implicitly teaches a number of meta-skills such as reporting and time-management. Be mindful though: this format requires a high level of self-motivation and responsibility to do well. You will frequently be on your own - just like you are in "real life". But since you are the best judge of what is easy or hard for you, you can use your time very efficiently, gloss over familiar things and play and explore with contents that you enjoy.


 


Grading and Activities

 

This course comprises three key, integrative activities and preparatory "learning units" that lead up to them. Learning units can be completed in any sequence that makes sense to you, at any time until the deadline to submit material. You can find the detailed due dates on Quercus. Note that scheduling your oral test needs to be done in advance.

There are also a few restrictions on which units you must complete within this course:

  • you must complete three of the four Integrator Units and submit them for marking. These will be worth at most 50 marks ((2 x 13) + (1 x 24)); 24 marks are available for the unit that you choose for your oral test;
  • you can submit up to four other learning units for marking, worth 5 marks each, for up to an additional 20 marks. A number of units are available; which ones you choose is up to you;
  • 25% of your mark will be earned with your Course Journal at the end of class;
  • 5% of your mark will be earned with your insights! page at the end of class.

Please carefully read the evaluation rubrics for each category of deliverables.


 
Activity                     Weight
Three Integrator Units       50 marks (2*13 + 1*24)
Four other learning units    20 marks (4*5)
Course Journal               25 marks
insights! page               5 marks
Total                        100 marks
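
The arithmetic of the marking scheme can be checked with a few lines of code. The following is a minimal sketch in Python (the course itself uses R); the names MAX_MARKS and final_mark are hypothetical and serve only to illustrate that the four categories sum to 100 marks and that each category is capped at its maximum.

```python
# Maximum marks per category, as stated in the marking scheme above.
# The dictionary keys and the function name are illustrative assumptions.
MAX_MARKS = {
    "integrator_units": 2 * 13 + 1 * 24,  # two units at 13, oral-test unit at 24
    "learning_units": 4 * 5,              # up to four units at 5 marks each
    "course_journal": 25,
    "insights_page": 5,
}


def final_mark(earned):
    """Sum the earned marks, capping each category at its maximum."""
    return sum(min(earned.get(category, 0), cap)
               for category, cap in MAX_MARKS.items())


if __name__ == "__main__":
    print(sum(MAX_MARKS.values()))  # total marks available
    print(final_mark({"integrator_units": 40, "learning_units": 25,
                      "course_journal": 20, "insights_page": 5}))
```

In the second call, the 25 marks entered for learning units are capped at 20, which mirrors the rule that at most four learning units count towards the term mark.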


A mix of evaluation methods
Learning units will be evaluated with a mix of approaches including technical reports, documentation of results in your journal, delivery of R code, ... Details will be described in the individual units. You will be required to submit evaluations in different categories.


 
Integrator units will be evaluated with a mix of approaches including technical reports, delivery of R code, and oral tests. Technical reports, R code and other options will be worth 13 marks each; the oral test will be worth 24 marks. You can submit at most one of each evaluation category (and one of your evaluations must be an oral test). Also, your oral test cannot be your first evaluation, and the topics will be cumulative. Details will be described in the individual units.
Oral tests for Integrator Units will be scheduled from mid-November to early December in the afternoon. Details can be found on the sign-up page that is linked from the respective integrator units.


I do not adjust marks towards a target mean and variance (i.e. there will be no "belling" of grades), but I follow the principles laid out in the marking rubrics. "Normalization" of grades interferes with a collaborative and mutually supportive learning environment. If your classmate gets a great mark because you helped them with a difficult concept, this should never bring down your own mark. Collaborate as much as possible; it is a great way to learn.

But take *utmost* care to follow the instructions on avoiding plagiarism and academic misconduct to the letter; they will be rigorously enforced.


 

The learning-units map

Here is a thematic overview of the topical areas of this course's learning units:

The bioinformatics learning unit landscape.


And here is the detailed map. It contains links to all of the units.


  • <command>-Click to open the Learning Units Map in a new tab, scale for detail.
A clickable map of the bioinformatics learning units.


  • Hover over a learning unit to see its keywords.
  • Click on a learning unit to open the associated page.
  • The nodes of the learning unit network are colour-coded:
    • Live units are green.
    • Units under development are light green. These are still in progress.
    • Stubs (placeholders) are pale. These still need basic contents.
    • Milestone units are blue. These collect a number of prerequisites to simplify the network.
    • Integrator units are red. These embody the main goals of the course.
    • Units that require revision are pale orange.
  • Units that have a black border have deliverables that can be submitted for credit. Choose any you want to submit for credit, up to a maximum worth of 20 marks.
  • Arrows point from a prerequisite unit to a unit that requires it.

(Unit status will be updated as in-progress units are completed.)


 

Navigating the course

Everything starts with the following three units:

This should be the first learning unit you work with, since your Course Journal, as well as all other deliverables, will be kept on a Wiki. This unit includes an introduction to authoring Wikitext and the structure of Wikis, in particular how different pages live in separate "Namespaces". The unit also covers the standard markup conventions ("Wikitext markup", the same conventions that are used on Wikipedia) as well as some extensions that are specific to our Course and Student Wikis. We also discuss page categories that help keep a Wiki organized, licensing under a Creative Commons Attribution license, and how to add licenses and other page components through template codes.


Keeping a journal is an essential task in a laboratory. To practice keeping a technical journal, you will document your activities as you are working through the material of the course. A significant part of your term grade will be given for this Course Journal. This unit introduces components and best practice for lab- and course journals and includes a wiki-source template to begin your own journal on the Student Wiki.


In parallel with your other work, you will maintain an insights! page on which you collect valuable insights and learning experiences of the course. Through this you ask yourself: what does this material mean, for the field and for myself?


Everything leads to the Integrator Units. These cover four large areas of bioinformatics that make up the explicit goals of the course:

(i) algorithms and statistics;
(ii) structural modelling and interpretation;
(iii) gene annotation; and
(iv) phylogenetic analysis.

The knowledge and skills you need to work on these Integrator Units can be obtained from the other learning units that are shown on the learning units map as prerequisites. Note that "prerequisites" in this context does not mean you must do one thing before you can do another; the arrows simply point out which units assume what prior knowledge. You can acquire that knowledge in whatever sequence makes sense to you, and you don't have to learn from the learning units of this course at all. Just make sure that you submit enough general learning units for evaluation along the way, and document what you are doing in your Course Journal. Also, remember that all the material is cumulative: my evaluation of your work implicitly includes all of the prerequisite material.
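
One way to think about the prerequisite arrows is as a directed graph in which following the arrows backwards from a unit collects everything that unit assumes. The following is a minimal sketch in Python (the course itself uses R). The unit names are taken from the scenarios on this page, with cluster names such as "BIN-ALI-..." kept as placeholders for whole clusters; the edges are only an illustrative subset and not the actual learning units map, and the names REQUIRES and all_prerequisites are hypothetical.

```python
# Illustrative subset of the prerequisite graph: each unit maps to the
# units it directly assumes. This is NOT the complete learning units map.
REQUIRES = {
    "Homology modelling": ["BIN-SX-...", "BIN-ALI-...", "BIN-PDB"],
    "BIN-ALI-...": ["BIN-Sequence", "FND-Homology"],
    "BIN-Sequence": ["BIN-Abstractions"],
    "FND-Homology": ["BIN-Abstractions"],
}


def all_prerequisites(unit, requires=REQUIRES):
    """Return the set of all direct and indirect prerequisites of a unit."""
    seen = set()
    stack = list(requires.get(unit, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Units without an entry have no further prerequisites here.
            stack.extend(requires.get(current, []))
    return seen


if __name__ == "__main__":
    for name in sorted(all_prerequisites("Homology modelling")):
        print(name)
```

Because the function returns a set rather than an ordered list, it reflects the point made above: the arrows record which knowledge is assumed, not the order in which it has to be acquired.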


 
Scenarios

Where to begin: possible scenarios for working through the units...

(These scenarios are for illustration; you don't have to follow these sequences. There is no implied claim that any of these sequences is better for learning the material than any other. Make your own choices!)


Yvette might have done a project about protein structure in a previous course...
She decides to tackle the Homology modelling Integrator Unit first, because she is most confident about that material.
ABC-Scenario-1-Step-1.jpg
Obviously, all paths through the learning units begin with the units leading up to the Course Preparation milestone, as well as the Introduction to R milestone. This is where she starts.
ABC-Scenario-1-Step-2.jpg
The homology modelling unit requires the cluster of protein structure units (BIN-SX-...), as well as the sequence alignment units (BIN-ALI-...).
From the database units, she only needs BIN-PDB for now.
ABC-Scenario-1-Step-3.jpg
For the sequence alignment cluster she needs BIN-Sequence and FND-Homology, and both of these need BIN-Abstractions.
ABC-Scenario-1-Step-4.jpg
Next she goes for the Phylogenetic Analysis Integrator Unit, since it requires only a small number of additional prerequisites and she's a bit busy at that time with midterms in other courses. Some introductory statistics (FND-STA-Probability and FND-STA-Bayes_theorem) gets her on the way to the BIN-PHYLO-... cluster.
ABC-Scenario-1-Step-5.jpg
She finds she enjoys working with R, and figuring out algorithms and workflows is like solving puzzles. So she does the R programming Integrator Unit next, in which she writes code that estimates what mutations in a gene can tell us about that gene's involvement in a disease phenotype. Adding a small number of software-development focussed units and a bit more statistics gets her on the way.
ABC-Scenario-1-Step-6.jpg
She is done with her deliverables in mid-November, so she completes the genome annotation Integrator Unit to learn on her own. She fills in the rest of the database units, the units on statistics and analysis of differential expression in genes, the units on networks and protein-protein interactions, and the concepts of protein functions, to complete the BIN-FUNC-Annotation milestone, ...
ABC-Scenario-1-Step-7.jpg
... followed by the genomics units (BIN-Genome-...).
ABC-Scenario-1-Step-8.jpg
And that's it.
ABC-Scenario-1-Step-final.jpg



 

 
Nigel wants to sample different areas of the material broadly before he tackles any of the Integrator Units...


  • Again: all paths through the learning units begin with the Course Preparation milestone, and the Introduction to R milestone. This is where he starts.
  • He targets the information theory unit first, for which he prepares the data models unit and chooses a sample organism, to proceed to the bioinformatics abstractions unit and the macromolecular sequence unit.
  • This makes him curious about the relationship between information theory and molecular forcefields so he learns about database principles and 3D structure concepts next, to tackle the PDB database unit and the tutorial on the 3D structure viewer "UCSF Chimera".
  • Apparently another use of information theory is for quantifying the relatedness of functional annotations. Nigel completes the rest of the database cluster to proceed to learn about the Gene Ontology project to arrive at the "semantic similarity" unit.
  • Nigel is interested in how such fields of mathematics can be employed to study biology. This prompts him to work through the missing units from the functional annotation cluster (BIN-SEQA-... and BIN-FUNC-...) to complete the BIN-FUNC-Annotation milestone.
  • The greatest importance of functional annotation lies in annotating whole genomes. Nigel learns more about this next with the genomics cluster (BIN-Genome-...).
  • He finds it intriguing how so much knowledge of function does not actually rely on the detailed, structural analysis of mechanism. To understand this better, Nigel works through the remainder of the structure units (BIN-SX-...), for which he incidentally needs the BIN-ALI-Alignment unit.
  • Next, he realizes that much of what he has worked with so far has implicitly relied on sequence-alignment tools. So he completes the alignment cluster next (BIN-ALI-...).
  • And realizing that multiple sequence alignments (BIN-ALI-MSA) are fundamental to phylogenetic analysis leads him to tackle the phylogenetic analysis cluster next (BIN-PHYLO-...).
  • That's it for the learning units. Nigel then completes all Integrator Units one after another except for the Homology modelling one; he starts with the genome annotation unit because that's the biggest one and he wants it safely submitted before the end-of-term rush in his other courses gets to him.



 


 


 

Timing and important dates

Having students work through the material entirely at their own pace has had limited success. A minority did very well, but the majority found themselves quite stressed out towards the end of the term. Thus we will proceed with a structured sequence of due dates that spreads the deliverables evenly. However: a due date is the last date on which assignments are due, and you can submit your assignments much, much sooner and relax towards the end of the term.

You can work through the material at your own pace. Non-negotiable restrictions arise from the last day to drop a course in the Fall term (Monday, November 9 2020), and that everything needs to be done by the last day to submit course work for BCH441 in the Fall term (Tuesday, December 8 2020 23:59):


 
  • Tuesday, September 29, 2020 - First of four learning units due. (Extended to 2020-10-06)
  • Tuesday, October 6, 2020 - Second of four learning units due.
  • Tuesday, October 13, 2020 - Third of four learning units due.
  • Tuesday, October 20, 2020 - Last of four learning units due.
  • November 9-13 - Fall Reading Week
  • Monday, November 9, 2020 - Drop Date, and first Integrator unit due.
  • Monday, November 16 to Friday, November 27, 2020 - Oral Tests. Second Integrator Unit due on the day before the test.
  • Tuesday, December 8, 2020 - All remaining submissions due


 

Submission of items for marking

Details are listed with each evaluation unit, but in principle you create a separate sub-page of your user page and post your material there. Then you submit a link to your page on Quercus and I can find the page and mark it.


 

Class time

Since most of the learning units include hands-on, practical components that you do on your own, we won't need to use class time for textbook-like delivery of contents. In class meetings ...

  • ... we will always take time for open discussion of topics as they arise. This will be driven by student input and feedback.
  • ... I will go over some details of the contents.
  • ... I may discuss marking of some submissions I have received, to make the process more transparent.

Class sessions will be recorded and posted.


 

Organizational Details

 



Student Wiki

Many of the class activities will take place interactively on a separate Wiki site (the "Student Wiki"). You will create a personalized user page there, and use it to submit materials as required.

This Wiki is not accessible to the general public; you need an account, which will be registered after the first class session.


 



Contact

Course communication will take place on the Quercus discussion section. We'll see how this goes. If it's not suitable for our needs we'll find an alternative.


 



Office hours

(Virtual) face-to-face meetings are by appointment, if required. However, we will be able to resolve almost all issues by e-mail. You will find that discussions by e-mail are both more efficient and more effective than meetings. Moreover, e-mail discussions leave you with a document trail of what was discussed, can contain links to information sources, and let us share points of general interest more easily with the class.


 



Prerequisites

Introductory courses in biochemistry and molecular biology provide the background for the contents of this course. This background might be obtained through the listed prerequisite courses: BCH210H1/BCH242Y1; BCH311H1/MGY311Y1/PSL350H1[1]. However, I have no way to assess your success in these courses, nor do I know what material was actually covered. Thus, I generally waive prerequisites. In all cases it is your responsibility to be sufficiently prepared and to make up for material that you have not covered previously.

A breakdown of knowledge that I expect you to acquire outside our course, or bring with you from previous courses, is listed here.


 

Extensions for term work

 

Extensions for term work in this course are subject to Faculty regulations and will only be considered within the framework determined by the Faculty policies.


  • Submissions due before the last day to drop a course in the Fall term (Monday, November 9 2020).
No extensions will be considered for these submissions[2]. Such an extension would not be "fair, equitable and reasonable"[3], i.e. granting this to individual students would violate the requirement to give all students equal opportunity to succeed in the course. In fact, if you find yourself so far behind in the course that you are unable to submit meaningful work for assessment by that date, I strongly recommend that you drop the course.
  • Signing up for the oral tests.
The dates for the oral tests are in the two weeks following the Drop Date. If you fail to connect to the Zoom meeting at the scheduled time, this is equivalent to a missed midterm exam in terms of applicable Faculty policy: "if the reasons for missing your test are acceptable to the instructor, a make-up opportunity should be offered to the student where practicable". "Acceptable" reasons will be considered if they are justified, if the consideration is "fair, equitable and reasonable", and if the reason is documented through one of the four types of "official" documentation: UofT Verification of Illness or Injury Form, Student Health or Disability Related Certificate, a College Registrar’s Letter, and an Accessibility Services Letter. Scope for a "practicable" make-up opportunity for the oral test will, however, be extremely limited.
  • Submissions due on the last day to submit course work for BCH441 in the Fall term (Tuesday, December 8 2020 23:59).
Since the course does not have a final exam, the Faculty requires grades to be marked, collated and submitted a few days after the last day to submit course work for BCH441 in the Fall term (Tuesday, December 8 2020 23:59). Therefore I cannot normally grant extensions beyond this date. The Faculty allows so-called informal extensions to be granted "in extraordinary circumstances"; in those cases too, the requirement to be "fair, equitable and reasonable" will apply, i.e. you would need to demonstrate that the need for the extension was due to unavoidable circumstances that go significantly beyond what was expected of the rest of the class, and submit "official" documentation to me. In that case, (i) we would determine an adjusted submission date, (ii) I would initially submit a mark of 0 for the missing submissions, and (iii) I would submit an amended mark after that date, if appropriate. Note that the Faculty requires that such extensions don't go beyond a few days after the end of the Final Examination Period. If you require an extension beyond that date, you need to submit a formal petition through your College Registrar.


 

Exclusions & Enrolment controls

none


 

Printed material

This is an electronic-submission-only course, but if you must print material, consider printing double-sided. Learn how at the Print-Double-Sided Student Initiative. Printing of course material is expressly discouraged since the material is updated frequently.


 


Resources

Course framework
Course related


 

Notes

  1. Please check the official Calendar for the academic year to confirm the "official" prerequisites.
  2. Since these requirements are so minimal, exceptions due to personal, medical, or accessibility reasons would require you to demonstrate (and document) that you have not been able to pursue academic work for that amount of time that you would normally have needed to fulfil the requirements. Given the volume of material, I would consider this time to be at least four weeks.
  3. It is Faculty policy to require assessments to be "fair, equitable and reasonable".