Bioinformatics Main Page


BCH441 - Bioinformatics

Welcome to the BCH441 Course Wiki.

These wiki pages are provided to coordinate information, activities and projects in the introductory bioinformatics course taught by Boris Steipe at the University of Toronto. If you are not one of my students, you can still browse this site; however, only users with a login account can edit or contribute material. If you are here because you are interested in general aspects of bioinformatics or computational biology, you may want to review the Wikipedia article on bioinformatics, or visit Wikiomics. Contact boris.steipe(at)utoronto.ca with any questions you may have.




All materials on this site are currently undergoing revisions.




The Course

BCH441 (BCH1441) is an introduction to current bioinformatics for life science students and the specialists in the BCB Program. The course provides an overview of the sources of biomolecular data, data annotation and integration, and the interpretation of results through evidence-based reasoning. This includes the components (sequence, structure, and function), the relationships in phylogeny and in the networks of interactions and regulation, and the “systems” through which we conceptually organize our knowledge.


Specific contents include:

  • large, public biomolecular data resources,
  • DNA and protein sequences and sequence analysis,
  • pairwise and multiple sequence alignment,
  • fast database searches to discover homologues,
  • protein structure interpretation and homology modeling,
  • phylogenetic analysis - tree building and interpretation,
  • work with genome-scale data,
  • functional annotation with Gene Ontology and other resources,
  • relationships discovered through co-expression and protein-protein interactions, and
  • introduction to systems-level concepts.

Practical, weekly, hands-on assignments will introduce public data resources and analysis tools. Along with improving general computer literacy, you will learn to use the programming language and statistical workbench R, with a special emphasis on the everyday tasks of data preparation and analysis that have become indispensable for any life-science laboratory. (Yes, you will learn programming.) Application of the material in a systems-biology oriented project will round off the course.

The course is complemented by BCB420 / JTB2020 (offered in the Winter Term) which consolidates aspects of cutting-edge computational systems biology in a project context.

BCH441H1F is the undergraduate course code.
BCH1441H1F is the cross-listed course code for graduate students.


General

We will make an attempt to teach BCH441H following an inverted teaching model. Concepts will be introduced through background reading and extensive, hands-on assignments. We will use the classroom time

  • to assess contents-milestones in a weekly quiz;
  • to discuss fine points, perspectives, and to resolve uncertainties; and
  • to introduce concepts for the upcoming week.


Coordinator

Boris Steipe


 


Dates

BCH441/BCH1441 is a Fall Term course; contact times are Tuesdays, 17:00 to 20:00.

Tutorial sessions: Tuesday, 17:00 to 18:00 for open discussion of lecture material, in-class quizzes, quiz debriefings, and other activities. Class begins at 10 minutes past the hour; don't be late. That's rude.

Lectures: right after the tutorials: Tuesday, 18:00 to 20:00.


Location

WI 1016 (Wilson Hall, New College). Note: this is a room change; the original room was MS4279.



Student Wiki

Many of the class activities will take place interactively on a separate Wiki site (the "Student Wiki"). You will create a personalized user page there, and use it to submit materials as required.

This Wiki is not accessible to the general public; you need an account, which will be registered after the first class session.


 



Contact

Course communication will take place on the Quercus discussion section. We'll see how this goes. If it's not suitable for our needs we'll find an alternative.


 



Office hours

(Virtual) face-to-face meetings are by appointment, if required. However, we will be able to resolve almost all issues by e-mail. You will find that discussions by e-mail are both more efficient and more effective than meetings. Moreover, e-mail discussions leave you with a document trail of what was discussed, can contain links to information sources, and let us share points of general interest more easily with the class.


 



Prerequisites

Introductory courses in biochemistry and molecular biology provide the background for the contents of this course. This background can be obtained through the listed prerequisites: BCH210H1/BCH242Y1; BCH311H1/MGY311Y1/PSL350H1[1]. Alternatively, special permission of the course coordinator can be granted.

You must have access to the Internet via your own computer, preferably set up to work through a wireless connection.


Exclusions & Enrolment controls

none


Printed material

This is an electronic-submission-only course, but if you must print material, you might consider printing double-sided. Learn how at the Print-Double-Sided Student Initiative. Printing of course material is expressly discouraged since the material is updated frequently.


Recommended textbooks

Depending on your background, various levels of textbooks may be suitable. I will bring my evaluation copies to class so you can have a look.
Understanding Bioinformatics (Zvelebil & Baum) is a decent general introduction to many aspects of bioinformatics. It was published in 2007, and an updated version is urgently needed. Still, some of the basics (like the algorithm for optimal sequence alignment) don't change. (Amazon) (Indigo) (ABE books)
Practical Bioinformatics (Agostino) covers some of the material of the BCH441 exercises. Expect a no-nonsense introduction to the most basic material. I have my pet peeves about this book (as I have for many others, e.g. why in the world do they still teach CLUSTAL when all available studies demonstrate it to be the least accurate MSA algorithm by a margin?), but if you haven't taken BCH441, this may serve you well. And if you did take BCH441, it may consolidate some ideas that I wasn't clear about. (Amazon) (Indigo) (ABE books)
If you are aware of more recent good textbooks, or have your own opinions about these or other books, let me know.


 

Grading and Activities

 

Activity                                   Weight BCH441 (Undergraduates)   Weight BCH1441 (Graduates)
11 Self-assessment and Feedback sessions   44 marks (11 x 4)                22 marks (11 x 2)
Bioinformatics project                     26 marks (5 + 12 + 9)            26 marks
"Classroom" participation                  10 marks (2 + 8)                 10 marks
Thesis Project                             -                                22 marks
Final exam                                 20 marks                         20 marks
Total                                      100 marks                        100 marks
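
The weights in the grading table can be checked with a few lines of code. The following is a minimal sketch in Python (the course itself uses R); the dictionary names and the helper `total_marks` are hypothetical and chosen only for illustration, and the values are copied from the table above.

```python
# Weights copied from the grading table; all names here are illustrative only.
UNDERGRAD_BCH441 = {
    "Self-assessment and Feedback sessions": 11 * 4,
    "Bioinformatics project": 5 + 12 + 9,
    '"Classroom" participation': 2 + 8,
    "Final exam": 20,
}

GRAD_BCH1441 = {
    "Self-assessment and Feedback sessions": 11 * 2,
    "Bioinformatics project": 26,
    '"Classroom" participation': 10,
    "Thesis Project": 22,
    "Final exam": 20,
}


def total_marks(weights):
    """Return the sum of all activity weights in a grading scheme."""
    return sum(weights.values())


if __name__ == "__main__":
    for name, scheme in [("BCH441", UNDERGRAD_BCH441), ("BCH1441", GRAD_BCH1441)]:
        print(name, total_marks(scheme))
```

Both schemes should add up to the 100 marks stated in the Total row; the graduate scheme reaches it by halving the weight of the self-assessment sessions and adding the Thesis Project.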


A note on marking

It is not my policy to adjust marks towards a target mean and variance (i.e. there will be no "belling" of grades). I feel strongly that such "normalization" detracts from a collaborative and mutually supportive learning environment. If your classmate gets a great mark because you helped them with a difficult concept, this should never bring down your own mark through class average adjustments. Collaborate as much as possible; it is a great way to learn. However, I may adjust marks if we phrase questions ambiguously on quizzes, or if I decide that the final exam was too long.

 

Timetable and syllabus

 



Syllabus and assignments will still be in flux for a few weeks.


 

PREPARATION

 


Week 1
In class (Tuesday, Sept. 13):
  • Organization
  • Syllabus
  • Important dates
  • First assignment
  • Projects
  • Grading
  • Signup to mailing list and Student Wiki.
  • Introduction to bioinformatics and computational biology
Readings: R Tutorial
Assignment: Assignment 1
In class (Tuesday, Sept. 20): Quiz 1. Remember to bring your red pen!

Perspectives:

Customizing R and RStudio. Subsetting and filtering of vectors, arrays and lists.


 

DATA

 


Week 2
In class (Tuesday, Sept. 20):
  • Abstractions
  • Data modelling
  • Key Public Databases (NCBI, EBI)
Readings: Lecture 02: Annotated Notes
Assignment: Assignment 2
In class (Tuesday, Sept. 27): Quiz 2

Perspectives ... data modelling.


 

SEQUENCE ANALYSIS

 


Week 3
In class (Tuesday, Sept. 27):
  • Introduction to the sequence abstraction
  • EMBOSS and other sequence analysis tools
Readings: TBD
Assignment: Assignment 3
In class (Tuesday, Oct. 4): Quiz 3

Perspectives ... machine learning.


 

SEQUENCE ALIGNMENT

 


Week 4
In class (Tuesday, Oct. 4):
  • Introduction to homology
  • Optimal sequence alignment
  • Sequence database searches: BLAST, PSI-BLAST et al.
  • Multiple sequence alignment.
Readings:
  • Lecture 04: Annotated Notes (Part 1)
  • Lecture 04: Annotated Notes (Part 2)
Assignment: Assignment 4
In class (Tuesday, Oct. 11): TBD

Perspectives ... TBD


 

3D STRUCTURE

 


Week 5
In class (Tuesday, Oct. 11):
  • 3D structures
  • The PDB
  • Structure interpretation
  • Structural domains
Readings: Week 05: Annotated Notes (PDF 55.5 MB)
Assignment: Assignment 5
In class (Tuesday, Oct. 18): Quiz 4

Perspectives ... TBD


 

FUNCTION

 


Week 6
In class (Tuesday, Oct. 18):
  • The concept of function
  • Function annotation
  • Function databases
  • GO: the gene ontology
  • Function prediction strategies
Readings: Week 06: Annotated Notes (PDF 23.1 MB)
Assignment: Assignment 6
In class (Tuesday, Oct. 25): Quiz 5 and 6

Perspectives ... computing semantic similarity


 

PHYLOGENETIC ANALYSIS

 


Week 7
In class (Tuesday, Oct. 25):
  • Phylogenetic analysis principles
  • Building trees
  • Tree interpretation
  • Inference from phylogenies
  • Signals of selective pressure and recent change
Readings: Week 06: Annotated Notes (PDF 15.7 MB)
Assignment: Assignment 7
In class (Tuesday, Nov. 1): Quiz 7

Perspectives ... Traces of selective pressure


At midnight: Project stage 1 is due.


 

STRUCTURE PREDICTION

 


Week 8
Note: Nov. 8 - no class due to Fall Break.
In class (Tuesday, Nov. 1):
  • Homology modelling of protein structure
  • Protein structure forcefields
  • Molecular dynamics
  • de novo prediction
Readings: TBD
Assignment: Assignment 8
In class (Tuesday, Nov. 15): Quiz 8

Perspectives ... Using Rosetta


 

GENOME ANALYSIS

 


Week 9
In class (Tuesday, Nov. 15):
  • Genome sequencing
  • Genome annotation
  • Genome databases and browsers
  • Human genomics
Readings: TBD
Assignment: Assignment 9
In class (Tuesday, Nov. 22): Quiz 9

Perspectives ... Popular pipelines


 

EXPRESSION ANALYSIS

 


Week 10
In class (Tuesday, Nov. 22):
  • Measuring gene expression levels: microarrays vs. NGS
  • GEO - Microarrays and RNAseq
  • GEO2R and RNAseq alternatives
  • Discovering differentially expressed genes
Readings: TBD
Assignment: Assignment 10
In class (Tuesday, Nov. 29): Quiz 10

Perspectives ...


 

PROTEIN-PROTEIN INTERACTIONS

 


Week 11
In class (Tuesday, Nov. 29):
  • Concepts of protein-protein interactions
  • Interaction databases
  • Graph theory
  • Interactome
  • other -omes
Readings: TBD
Assignment: Assignment 11
In class (Tuesday, Dec. 6): Quiz 11

Perspectives ... Computing on graphs


 

EXPLORATIONS

 


Week 12
In class (Tuesday, Dec. 6):
  • Automation of queries
  • Integration of data
  • Principles of Exploratory Data Analysis (EDA)
    • Plotting
    • "Features"
    • Clustering
Readings: TBD


 


Resources

Course related


 

Contents related

 


 
Forums
BioStar: General bioinformatics, computational-, and systems biology questions (timesink warning!)
Reddit: the bioinformatics "subreddit" (timesink warning!)
R-help: The R programming language
Stack Overflow: R-related questions
BioConductor Support: for all questions about the BioConductor Project
Cross Validated: statistics-related questions on Stack Exchange



Notes

  1. Please check the official Calendar for the academic year to confirm.