BIN-PHYLO-Concepts

From "A B C"
Revision as of 02:47, 1 November 2017 by Boris (talk | contribs)
Jump to navigation Jump to search

Concepts of Phylogenetic Analysis


 

Keywords:  Phylogenetic trees, orthologues and paralogues, horizontal gene transfer (HGT)


 



 


 


Abstract

Introduction to the analysis of phyylogenies.


 


This unit ...

Prerequisites

You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:

  • Evolution: Theory of evolution; variation, neutral drift and selection.

You need to complete the following units before beginning this one:


 


Objectives

This unit will ...

  • ... introduce concepts and terminology of phylogenetic analysis.


 


Outcomes

After working through this unit you ...

  • ... are familar with the main ideas behind evolutionary relationships.


 


Deliverables

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


 


Evaluation

Evaluation: NA

This unit is not evaluated for course marks.


 


Contents

 
Nothing in Biology makes sense except in the light of evolution.
Theodosius Dobzhansky

... but does evolution make sense in the light of biology?


 

 

Task:


 

As we have seen in the previous units, the Mbp1 transcription factor has homologues in all other fungi, yet there is not always a clear one-to-one mapping between members of a family in distantly related species. It appears that various systems of APSES domain transcription factors have evolved independently. Of course this bears directly on our notion of function - what it means to say that two genes in different organisms have the "same" function. In case two organisms both have an orthologous gene for the same, distinct function, calling these functions "the same" may be warranted. But what if that gene has duplicated in one species, and the two paralogues now perform different, related functions in one organism? Theses two are still orthologues to both their homologues in the other species, but now we expect functionally significant residues to have adapted to the new - and possibly distinct - roles of each paralogue. In order to be able to even ask such questions, we need to make the evolutionary history of gene families explicit. This is the domain of phylogenetic analysis. We can ask questions like: how many paralogues did the cenancestor of a clade possess? Which of these underwent additional duplications in the phylogenesis of the organism I am studying? Did any genes get lost? And - adding additional biological insight to the picture - did the observed duplications lead to the "invention" of new biological systems? When was that? And perhaps even: how did the species benefit from this event?

We will develop this kind of analysis in this unit and the following. You have established which gene in MYSPE is the reciprocally most closely related orthologue to yeast Mbp1 (with reciprocal best match) and you have identified the full complement of APSES domain genes in your assigned organism (as a result of your PSI-BLAST search). Here we will analyse these genes' evolutionary relationship and compare it to the evolutionary relationship of other fungal APSES domains. The goal is to define families of related transcription factors and their evolutionary history. All APSES domain annotations are now available in your protein "database". Now we will attempt to compute the phylogram for these proteins. The goal is to identify orthologues and paralogues.

A number of excellent tools for phylogenetic analysis exist; general purpose packages include the (free) PHYLIP package, the MEGA package and the (commercial) PAUP* package. Of these, only MEGA is still under active development, although PHYLIP still functions perfectly (except for problems with graphical windows under Mac OS 10.6). Specialized tools for tree-building include Treepuzzle or Mr. Bayes. This unit and the following is constructed around programs that are available in PHYLIP and R, however you are welcome to use other tools that fulfill a similar purpose if you wish. In this field, researchers consider trees that have been built with ML (maximum likelihood) methods to be more reliable than trees that are built with parsimony methods, or distance methods such as NJ (Neighbor Joining). However ML methods are also much more compute-intensive. Just like with multiple sequence alignments, some algorithms will come closer to guessing the truth and others will not and usually it is hard to tell which is the more trustworthy of two diverging results. The prudent researcher tries out alternatives and forms her own opinion. Specifically, we may usually assume results that converge when computed with different algorithms to be more reliable than those that depend strongly on a particular algorithm, parameters, or details of input data.

In phylogenetic analysis, not all lines a program draws are equally trustworthy. Don't take the trees as a given fact just because a program suggests this. Look at the evidence, include independent information where available, use your reasoning, and analyse the results critically. As you will see, there are some facts that we know for certain: we know which species the genes come from, and we can (usually) make good assumptions about the relationship of the species themselves - the history of speciation events that underlies all evolution of genes. This is extremely helpful information for our work.


If you would like to review concepts of trees, clades, LCAs, OTUs and the like, I have linked an excellent and very understandable introduction-level article on phylogenetic analysis here and to the resource section at the bottom of this page.

Baldauf (2003) Phylogeny for the faint of heart: a tutorial. Trends Genet 19:345-51. (pmid: 12801728)

PubMed ] [ DOI ] Phylogenetic trees seem to be finding ever broader applications, and researchers from very different backgrounds are becoming interested in what they might have to say. This tutorial aims to introduce the basics of building and interpreting phylogenetic trees. It is intended for those wanting to understand better what they are looking at when they look at someone else's trees or to begin learning how to build their own. Topics covered include: how to read a tree, assembling a dataset, multiple sequence alignment (how it works and when it does not), phylogenetic methods, bootstrap analysis and long-branch artefacts, and software and resources.


 

R packages that may be useful include the following:

  • R task view Phylogenetics - this task-view gives an excellent, curated overview of the important R-packages in the domain.
  • package ape - general purpose phylogenetic analysis, but (as far as I can tell ape only supports analysis with DNA sequences).
  • package ips - wrapper for MrBayes, Beast, RAxML "heavy-duty" phylogenetic analysis packages.
  • package Rphylip - Wrapper for Phylip, the most versatile set of phylogenetic inference tools.


 

Tidbit: the number of possible trees

Here is the formula to calculate the number of trees one can create from n OTUs, as an R function. It's the number of unrooted binary trees with n labeled leaves, and unlabeled internal nodes. Copy, paste into an R script and try it out. Then figure out: 10 reference species and your MYSPE: how many possible trees? Could you create them all and select the best one by complete enumeration?

nTrees <- function(nOTU) {
    if (nOTU < 3)  { return(1) }
    if (nOTU > 87) { return(Inf) }
    return(factorial((2 * nOTU) - 4) / ((2 ^ (nOTU - 2)) * factorial(nOTU - 2)))
}
nTrees(5)  # 15
nTrees(22) # approximately Loschmidt's number


 


 


Further reading, links and resources

Baldauf (2003) Phylogeny for the faint of heart: a tutorial. Trends Genet 19:345-51. (pmid: 12801728)

PubMed ] [ DOI ] Phylogenetic trees seem to be finding ever broader applications, and researchers from very different backgrounds are becoming interested in what they might have to say. This tutorial aims to introduce the basics of building and interpreting phylogenetic trees. It is intended for those wanting to understand better what they are looking at when they look at someone else's trees or to begin learning how to build their own. Topics covered include: how to read a tree, assembling a dataset, multiple sequence alignment (how it works and when it does not), phylogenetic methods, bootstrap analysis and long-branch artefacts, and software and resources.

Blair & Murphy (2011) Recent trends in molecular phylogenetic analysis: where to next?. J Hered 102:130-8. (pmid: 20696667)

PubMed ] [ DOI ] The acquisition of large multilocus sequence data is providing researchers with an unprecedented amount of information to resolve difficult phylogenetic problems. With these large quantities of data comes the increasing challenge regarding the best methods of analysis. We review the current trends in molecular phylogenetic analysis, focusing specifically on the topics of multiple sequence alignment and methods of tree reconstruction. We suggest that traditional methods are inadequate for these highly heterogeneous data sets and that researchers employ newer more sophisticated search algorithms in their analyses. If we are to best extract the information present in these data sets, a sound understanding of basic phylogenetic principles combined with modern methodological techniques are necessary.


 


Notes


 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-10-31

Version:

1.0

Version history:

  • 1.0 First live version.
  • 0.1 First stub

CreativeCommonsBy.png This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.