Lecture 01

Latest revision as of 14:58, 19 September 2007

Organisation and Orientation

What you should take home from this lecture

Bioinformatics is a science between the two poles of data management and biological modeling.
It is a young, rapidly changing field.
To succeed in this course: attend the lectures and digest the material in your class notes, participate actively in discussions in class and online, explore resources on your own and make sure you understand the assignments!
And most importantly: If there is anything you don't understand, don't let it pass!

Links summary

NCBI (National Center for Biotechnology Information)
PDB (Protein Data Bank)
KEGG (the Kyoto Encyclopedia of Genes and Genomes)
Fink and Mao, 1999, Nature, 398:31-32 (pdf)
Cargo Cult Science on Wikipedia
Special issue on 2007 databases, published by NAR
Special issue on 2007 Web services, published by NAR
The Bioinformatics Organization
Genome Canada Bioinformatics Help Desk
bioinformatics.ca, host of the Canadian Bioinformatics Workshop series
ISCB The International Society for Computational Biology

Lecture Slides

Slide 001

Lecture 01, Slide 001
Bioinformatics is not only required to master the quantitative aspects of the post-genomic era in molecular biology, it is a qualitative change in our approach to biology as well.

Slide 002

Lecture 01, Slide 002
Bioinformatics can be viewed as the science that develops between the two poles of data management and computational modeling of life.

Slide 003

Lecture 01, Slide 003
From its beginning, it was recognized that molecular biology is an information science, just as much as a molecular science. The abstractions and models that focus on the essence of this information, rather than on the details of its representation, have proven to be remarkably powerful in explaining the basic features of life, such as inheritance, self-organization and the process of evolution.

Slide 004

Lecture 01, Slide 004
The promises of genome analysis include harnessing the power of self-assembly towards a bio-nanotechnological revolution: growth, rather than manufacturing. This includes the vision of regenerative molecular medicine, essentially relegating disease to the dark past ages of ignorance. But while the information for a complete specification of life is undoubtedly present in the genome, life realizes itself in complex interactions between context-dependent components. This makes life essentially unpredictable, at least to our current approaches. The sheer volume of data is a comparatively minor obstacle.

Slide 005

Lecture 01, Slide 005
The current emphasis on -omic sciences creates novel challenges in both the quantity and the quality of scientific enquiry. The scale has become larger; molecular components are analyzed not in isolation but in their associations; comparison between genes within and across species is a major source of new insight; and the absence of particular components and features is just as informative as their presence. However, the availability of technology should not lead to a purely methods-driven agenda.

Slide 006

Lecture 01, Slide 006
The US NCBI (National Center for Biotechnology Information) is one of the world's major centres for molecular data.

Slide 007

Lecture 01, Slide 007
The PDB (Protein Data Bank) is the world's central repository for 3D structural data of proteins and nucleic acids.

Slide 008

Lecture 01, Slide 008
KEGG (the Kyoto Encyclopedia of Genes and Genomes) is one of a group of data resources that focus on the functional relationships of the components of biological systems. Note that sequences, structures and functions are complementary aspects of the same molecular entities. Cross-referencing between databases and ensuring consistency is a major challenge and task of biological data management.

Slide 009

Lecture 01, Slide 009
On one hand, we can conclude that biological data management is what bioinformatics is all about. On the other hand, bioinformatics as a science is a way to study biology. And this aspect - which I like to refer to as "Computational Biology" - is not well described by data management. It has a lot more to do with modeling, and the question of understanding biology.

Slide 010

Lecture 01, Slide 010

Slide 011

Lecture 01, Slide 011
Tying a tie may at first seem an intimidatingly imprecise task, and indeed an irrelevant one. (Half of North Americans are not eligible to wear a tie, even to formal occasions, and those of the other half who do not work in a bank may wear a tie on only two occasions, and have it tied for them on the second one. Tying ties is, alas, a cultural technique that appears to be on the decline.) But it is a nice example of abstracting a complicated process down to its essential principles, and of reasoning formally about these principles to obtain rigorous results about the process.

Here is an example of a systematic, albeit informal, description of the process of how to tie a tie. But why is the process divided into exactly these steps? Are all of them necessary? How do we describe this process so that we can remember it? Or do we need to refer to the sequence of images every time we would like to tie this knot? Is this a simple, or a rather complicated way to tie a tie; are there others? Are there better ways to tie a tie, and what could "better" even mean?

Slide 012

Lecture 01, Slide 012
The triangular lattice walk transposes the problem from your neck into the domain of mathematics!

Slide 013

Lecture 01, Slide 013

Slide 014

Lecture 01, Slide 014
How many alternatives do we have to consider if we allow at most nine moves and require at least three, for the finishing moves? We noted previously that there are two possibilities for the three finishing moves (LRC or RLC). Since consecutive moves cannot go into the same sector, each of these possibilities can only have been preceded by two alternatives, i.e. (CLRC or RLRC) and (CRLC or LRLC). Each of these four possibilities can again have been preceded by two alternatives, and so on. Since we treat knots with different numbers of moves as distinct, the total number of knots up to length L is the sum of all powers of 2 from 2^1 up to 2^(L-2) (the -2 is because of the finishing moves!). You should be able to figure out reasoning like this on your own!
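
The counting argument can be written out as a short worked sum. This is only a sketch of the reasoning in the text; the symbols N(n) (number of move sequences with exactly n moves) and T(L) (total number of sequences with at most L moves) are labels introduced here for illustration and do not appear on the slide.

```latex
% N(n): sequences with exactly n moves (hypothetical label).
% Two finishing triples, and each earlier move doubles the count:
N(n) = 2 \cdot 2^{\,n-3} = 2^{\,n-2}, \qquad n \ge 3 .

% T(L): all sequences with 3 to L moves (hypothetical label).
T(L) = \sum_{n=3}^{L} 2^{\,n-2} = \sum_{k=1}^{L-2} 2^{k} = 2^{\,L-1} - 2 .

% With at most nine moves:
T(9) = 2^{8} - 2 = 254 .
```

For L = 9 this gives 254 sequences in total.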

Slide 015

Lecture 01, Slide 015
It is not uncommon for models to be bounded by constraints. Here we define metrics for symmetry and balance and we can then use parameters to judge whether certain of the 254 possible walks are acceptable or not. (Details omitted, refer to the original paper of Fink and Mao, 1999, Nature, 398:31-32 (pdf)).

Slide 016

Lecture 01, Slide 016
The algorithm to create all knots, finally, requires no more than exhaustive enumeration. What is important is to note the word "exhaustive". The result is complete, in the sense that every way to tie a tie has been captured. What is not in this list cannot exist (under the assumptions the model makes). If this reminds you of the concept of "complete information" in the sequence of a genome, this is intended.
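
To make "exhaustive" concrete, here is a minimal Python sketch of such an enumeration. It encodes only the two rules stated in these notes (a sequence must end in LRC or RLC, and consecutive moves must not go into the same sector); the symmetry and balance filters, and any further constraints of the original paper, are not modelled. The function name `enumerate_knots` and its parameter `max_moves` are hypothetical names chosen for this illustration.

```python
from itertools import product

SECTORS = "LRC"            # left, right, centre
ENDINGS = ("LRC", "RLC")   # the two finishing triples named in the notes


def enumerate_knots(max_moves=9):
    """List every move sequence with 3..max_moves moves that ends in a
    finishing triple and never makes two consecutive moves into the same
    sector. This is a brute-force filter over all candidate strings."""
    knots = []
    for n in range(3, max_moves + 1):
        for candidate in product(SECTORS, repeat=n):
            seq = "".join(candidate)
            if not seq.endswith(ENDINGS):
                continue  # does not finish with LRC or RLC
            if any(a == b for a, b in zip(seq, seq[1:])):
                continue  # two consecutive moves into the same sector
            knots.append(seq)
    return knots


if __name__ == "__main__":
    knots = enumerate_knots(9)
    print(len(knots))
```

Because every candidate string is generated and tested, a sequence that is absent from the returned list violates at least one of the two rules. Under these assumptions the list for at most nine moves has 254 entries.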

Slide 017

Lecture 01, Slide 017

Slide 018

Lecture 01, Slide 018

Slide 019

Lecture 01, Slide 019
While data technologies' goals and endpoints are obvious and straightforward to define, the same does not hold true for the modeling aspect of bioinformatics. Models cannot be derived directly from an observation of the data! They require insight, judgment and a sense of perspective and direction. If the modeling exercise does not lead to valuable and testable conclusions, it is pointless.

Slide 020

Lecture 01, Slide 020
From Feynman's 1974 Caltech commencement address (e.g. see Cargo Cult Science on Wikipedia).

But what are the airplanes of bioinformatics in the first place? And how do we construct strategies of scientific enquiry that not only include the airplanes but that also make them land?

Slide 021

Lecture 01, Slide 021
Examples of possible value derived from bioinformatic analysis. Note that value is to be understood not necessarily as a purely economic term - ethical advances and purely scientific insight are certainly valuable!

Slide 022

Lecture 01, Slide 022

Slide 023

Lecture 01, Slide 023
Taken from the 2007 special issue on databases, published by NAR.

In the time it takes you to study the existing databases, the majority will have seen significant updates and upgrades, gone out of existence or been superseded by more appropriate resources.

Slide 024

Lecture 01, Slide 024
Taken from the 2007 special issue on Web services, published by NAR.

The same holds for Web services: while it may be possible to find a service to do a particular task, it is virtually impossible to determine whether the suggested procedure can be considered "state-of-the-art" at the time you need to do your analysis.

Slide 025

Lecture 01, Slide 025

Slide 026

Lecture 01, Slide 026

Slide 027

Lecture 01, Slide 027

Slide 028

Lecture 01, Slide 028

Slide 029

Lecture 01, Slide 029

Slide 030

Lecture 01, Slide 030

Slide 031

Lecture 01, Slide 031

Slide 032

Lecture 01, Slide 032

Slide 033

Lecture 01, Slide 033

Slide 034

Lecture 01, Slide 034

Slide 035

Lecture 01, Slide 035
The amount of information that can be found by a Google search on a given topic is quite impressive. Some of the material is actually also very good.

Slide 036

Lecture 01, Slide 036
bioinformatics.ca - host of the Canadian Bioinformatics Workshop series - all lecture material online under a Creative Commons license.

Slide 037

Lecture 01, Slide 037
The Bioinformatics Organization

Browse the archives of the BioBB mailing list - it may be quite useful to subscribe to get a better idea of what's going on in the field.

Slide 038

Lecture 01, Slide 038
Genome Canada Bioinformatics Help Desk

Slide 039

Lecture 01, Slide 039
The International Society for Computational Biology (among other activities) hosts ISMB - the world's largest bioinformatics conference: the next one will be July 19-23, 2008 in Toronto.

Slide 040

Lecture 01, Slide 040

Slide 041

Lecture 01, Slide 041