Assignment 4 - Phylogenetic Analysis

Nothing in Biology makes sense except in the light of evolution.
Theodosius Dobzhansky

... but does evolution make sense in the light of biology? As we have seen in the previous assignments, the Mbp1 transcription factor has homologues in all other fungi, yet, looking at orthologues, related genes do not always map onto each other one-to-one. It appears that various systems of APSES domain transcription factors have evolved independently. Of course this bears directly on our notion of function: what it means to say that two genes in different organisms have the "same" function. If two organisms each have an orthologous gene for the same, distinct function, this may be warranted. But what if that gene has duplicated in one of them, and the two paralogues now perform different, related functions in that organism? To even ask such questions, we need to understand how to make the evolutionary history of gene families explicit. This is the domain of phylogenetic analysis. We can ask questions like: how many paralogues did the cenancestor of a group possess? Which of these underwent further duplications in the phylogenesis of the organism I am studying? Were any genes lost? And, adding further biological insight to the picture: did the observed duplications lead to the "invention" of new biological systems? When did that happen? And how did the species benefit from this event?

We will develop some of this kind of analysis in this assignment. In the previous assignment you established which genes in the other fungi are the reciprocally most closely related orthologues of yeast Mbp1. In this assignment, we will analyse their evolutionary relationships and compare them to the evolutionary relationships of all fungal APSES domains. The goal is to define families of related transcription factors and their evolutionary history.

A number of good tools for phylogenetic analysis exist; general-purpose packages include the (free) PHYLIP package and the (commercial) PAUP package. Specialized tools for tree building include TREE-PUZZLE and MrBayes. This assignment is constructed around programs that are available in PHYLIP; however, you are welcome to use other tools that fulfil a similar purpose if you wish. In this field, researchers consider trees built with ML (maximum likelihood) methods to be more reliable than trees built with parsimony methods or with distance methods such as NJ (Neighbor Joining). However, ML methods are also much more compute-intensive. Just as with multiple sequence alignments, some algorithms will come closer to the truth than others, and it is usually hard to tell which of two diverging results is more trustworthy. The prudent researcher tries out alternatives and forms her own opinion. Specifically, we can usually consider results that converge independently of the algorithm to be more reliable than those that depend strongly on a particular algorithm or on details of the input data.
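To make the idea of a distance method concrete, here is a minimal sketch of Neighbor Joining in plain Python. It assumes a table of pairwise distances has already been computed (for example from the multiple sequence alignment); the function name `neighbor_joining` and the frozenset-keyed input format are illustrative choices, not PHYLIP's interface. At each step the sketch joins the pair of nodes that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from node i to all other active nodes, and then recomputes distances from the new internal node to the remaining ones.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted Neighbor Joining tree and return it as a Newick string.

    names: list of at least three OTU labels.
    dist:  dict mapping frozenset({x, y}) -> distance between OTUs x and y.
    """
    d = dict(dist)  # working copy; distances to new internal nodes are added here

    def D(x, y):
        return d[frozenset((x, y))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r(i): total distance from node i to every other active node
        r = {i: sum(D(i, k) for k in nodes if k != i) for i in nodes}
        # join the pair with the smallest Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        dij = D(i, j)
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch length i -> new node
        lj = dij - li                                   # branch length j -> new node
        u = f"({i}:{li:g},{j}:{lj:g})"  # new internal node, labelled by its subtree
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (D(i, k) + D(j, k) - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # three nodes left: connect them to one central node
    x, y, z = nodes
    lx = (D(x, y) + D(x, z) - D(y, z)) / 2
    ly = (D(x, y) + D(y, z) - D(x, z)) / 2
    lz = (D(x, z) + D(y, z) - D(x, y)) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


# Toy distance matrix (made-up, additive) for five OTUs a..e
toy = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
       ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
       ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(pair): v for pair, v in toy.items()}
print(neighbor_joining(["a", "b", "c", "d", "e"], dist))
# prints (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

For a distance matrix that fits a tree exactly (an additive matrix, as in this toy example), NJ recovers that tree; with real, noisy distances it is a fast heuristic, which is one reason it is ranked below ML methods.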

But regarding algorithms and resources: we will take two shortcuts in this assignment (and both are things you should not do in real life):

One: we will use an efficient tree-building algorithm, not the best available one. This algorithm is available on the Web, so you do not need to install software on your own machine. In real life you would of course use the most accurate algorithm you can lay your hands on, regardless of the resources it requires, since it makes no sense to waste your time on a careful analysis of inaccurate trees. Your supervisor would insist on this as well, and if not she, then the reviewers of your manuscript.

Two: we will assume that the tree the algorithm constructs is correct. In real life you would establish its reliability with a bootstrap procedure: repeat the tree building a hundred times with resampled versions of the data and see which branches and groupings are robust and which depend on the details of the data. But we should still acknowledge that bifurcations that are very close to each other have not been "resolved". Any conscientious reviewer would flag such leniency and send your results back to you for a bootstrapping exercise at the computer. In phylogenetic analysis, not all lines that the program draws are equally trustworthy. Don't take the trees as given facts just because a program suggests them. Look at the evidence, use your reasoning, and analyse them critically.
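The bootstrap procedure mentioned above can be sketched in a few lines. In the standard nonparametric bootstrap, each replicate is built by sampling alignment columns with replacement, so it has the same length as the original but some columns are repeated and others left out; a tree is then built from every replicate, and the support for a grouping is the fraction of replicate trees that contain it (in PHYLIP, the SEQBOOT and CONSENSE programs cover these steps). The sketch below shows only the resampling and the counting. The toy alignment, the function names and the clade-set input format are hypothetical, and building the replicate trees is left to whichever tree-building program you use.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-alignment.

    alignment: dict mapping sequence name -> aligned sequence (all equal length).
    Columns are drawn with replacement, so the result has the same length.
    """
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def clade_support(replicate_clades, clade):
    """Fraction of replicate trees that contain `clade`.

    replicate_clades: one set per replicate tree, each holding that tree's clades
    as frozensets of OTU names (hypothetical format; these would have to be
    extracted from the trees your tree-building program returns).
    """
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return hits / len(replicate_clades)


# Toy alignment (made-up sequences, for illustration only)
msa = {"seqA": "MSNQIYSARY", "seqB": "MSNQVYSARY", "seqC": "MPEQIYSAKY"}
rng = random.Random(42)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(msa, rng) for _ in range(100)]
```

Each of the 100 pseudo-alignments would then go through the same tree-building step as the original data; a clade found in, say, 95 of 100 replicate trees has a support of 0.95.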

In case you want to review the concepts of trees, clades, LCAs, OTUs and the like, I have linked two excellent and very understandable introductory articles on phylogenetic analysis in the resource section at the bottom of this page.


