Latest revision as of 05:52, 26 September 2020
Concepts of Phylogenetic Analysis
(Phylogenetic trees, orthologues and paralogues, horizontal gene transfer (HGT))
Abstract:
Introduction to the analysis of phylogenies.
Objectives:
Outcomes:
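Deliverables:
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.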
Prerequisites:
You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:
- Evolution: Theory of evolution; variation, neutral drift and selection.
This unit builds on material covered in the following prerequisite units:
Evaluation
Evaluation: NA
This unit is not evaluated for course marks.
Contents
- Nothing in Biology makes sense except in the light of evolution.
- Theodosius Dobzhansky
... but does evolution make sense in the light of biology?
Task:
- Read the introductory notes on concepts of phylogenetic analysis.
As we have seen in the previous units, the Mbp1 transcription factor has homologues in all other fungi, yet there is not always a clear one-to-one mapping between members of a family in distantly related species. It appears that various systems of APSES domain transcription factors have evolved independently. Of course this bears directly on our notion of function: what it means to say that two genes in different organisms have the "same" function. If two organisms both have an orthologous gene for the same, distinct function, calling these functions "the same" may be warranted. But what if that gene has duplicated in one species, and the two paralogues now perform different, related functions in that organism? Both paralogues are still orthologues of their homologue in the other species, but now we expect functionally significant residues to have adapted to the new, and possibly distinct, roles of each paralogue.
In order to be able to even ask such questions, we need to make the evolutionary history of gene families explicit. This is the domain of phylogenetic analysis. We can ask questions like: how many paralogues did the cenancestor of a clade possess? Which of these underwent additional duplications in the phylogenesis of the organism I am studying? Did any genes get lost? And, adding further biological insight to the picture: did the observed duplications lead to the "invention" of new biological systems? When was that? And perhaps even: how did the species benefit from this event?
We will develop this kind of analysis in this unit and the following one. You have established which gene in MYSPE is the orthologue most closely related to yeast Mbp1 (by reciprocal best match), and you have identified the full complement of APSES domain genes in your assigned organism (as a result of your PSI-BLAST search). Here we will analyse the evolutionary relationships of these genes and compare them to the evolutionary relationships of other fungal APSES domains. The goal is to define families of related transcription factors and their evolutionary history. All APSES domain annotations are now available in your protein "database". Now we will attempt to compute the phylogram for these proteins, in order to identify orthologues and paralogues.
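The idea behind a reciprocal best match can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the score matrix, the gene names (x1 ... y3) and the function name reciprocalBestMatches() are hypothetical, and the actual procedure used in the prerequisite units may differ.

```r
# Hypothetical similarity scores between genes of species X (rows) and
# species Y (columns); higher means more similar. Illustration only.
scores <- matrix(
  c(250,  90,  40,
     85, 180, 175,
     30,  60,  20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("x1", "x2", "x3"), c("y1", "y2", "y3"))
)

# Hypothetical helper: a pair (x, y) is a reciprocal best match if y is the
# best-scoring partner of x AND x is the best-scoring partner of y.
reciprocalBestMatches <- function(s) {
  bestForRow <- apply(s, 1, which.max)  # best column for each row gene
  bestForCol <- apply(s, 2, which.max)  # best row for each column gene
  isRBM <- bestForCol[bestForRow] == seq_len(nrow(s))
  data.frame(a = rownames(s)[isRBM],
             b = colnames(s)[bestForRow[isRBM]],
             stringsAsFactors = FALSE)
}

rbm <- reciprocalBestMatches(scores)
print(rbm)
```

In this toy matrix, y3 scores best against x2, but x2 scores best against y2, so y3 has no reciprocal partner. This is the kind of pattern one might see if a gene has duplicated in one lineage, and it is the reason a tree is needed to sort out orthologues and paralogues.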
A number of excellent tools for phylogenetic analysis exist; general purpose packages include the (free) PHYLIP package, the MEGA package and the (commercial) PAUP* package. Of these, only MEGA is still under active development, although PHYLIP still functions perfectly (except for problems with graphical windows under Mac OS 10.6). Specialized tools for tree-building include TREE-PUZZLE and MrBayes. This unit and the following one are built around programs that are available in PHYLIP and R; however, you are welcome to use other tools that fulfill a similar purpose if you wish. In this field, researchers consider trees that have been built with ML (maximum likelihood) methods to be more reliable than trees built with parsimony methods or with distance methods such as NJ (Neighbor Joining). However, ML methods are also much more compute-intensive. Just as with multiple sequence alignments, some algorithms will come closer to the truth than others, and it is usually hard to tell which of two diverging results is the more trustworthy. The prudent researcher tries out alternatives and forms her own opinion. Specifically, we may usually assume that results that converge when computed with different algorithms are more reliable than those that depend strongly on a particular algorithm, parameters, or details of the input data.
In phylogenetic analysis, not all lines a program draws are equally trustworthy. Don't take a tree as given fact just because a program has produced it. Look at the evidence, include independent information where available, use your reasoning, and analyse the results critically. As you will see, there are some facts that we know for certain: we know which species the genes come from, and we can (usually) make good assumptions about the relationship of the species themselves - the history of speciation events that underlies all evolution of genes. This is extremely helpful information for our work.
If you would like to review concepts of trees, clades, LCAs, OTUs and the like, I have linked an excellent and very understandable introductory article on phylogenetic analysis here and in the resource section at the bottom of this page.
Baldauf (2003) Phylogeny for the faint of heart: a tutorial. Trends Genet 19:345-51. (pmid: 12801728)
Phylogenetic trees seem to be finding ever broader applications, and researchers from very different backgrounds are becoming interested in what they might have to say. This tutorial aims to introduce the basics of building and interpreting phylogenetic trees. It is intended for those wanting to understand better what they are looking at when they look at someone else's trees or to begin learning how to build their own. Topics covered include: how to read a tree, assembling a dataset, multiple sequence alignment (how it works and when it does not), phylogenetic methods, bootstrap analysis and long-branch artefacts, and software and resources.
R packages that may be useful include the following:
- R task view Phylogenetics - this task-view gives an excellent, curated overview of the important R-packages in the domain.
- package ape - general purpose phylogenetic analysis, but (as far as I can tell) ape only supports analysis with DNA sequences.
- package ips - wrapper for the "heavy-duty" phylogenetic analysis packages MrBayes, BEAST and RAxML.
- package Rphylip - wrapper for PHYLIP, the most versatile set of phylogenetic inference tools.
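To give a feel for what working with one of these packages looks like, here is a minimal sketch of a distance-based (Neighbor Joining) tree built with ape. It assumes that the ape package is installed; the four short DNA sequences are made up for illustration and have nothing to do with the APSES domain data of this course. The functions used (as.DNAbin(), dist.dna(), nj(), write.tree()) are part of ape.

```r
# Assumption: the ape package is installed, e.g. via install.packages("ape").

# Toy data: four short, already aligned DNA sequences (hypothetical).
seqs <- rbind(
  A = strsplit("acgtacgtac", "")[[1]],
  B = strsplit("acgtacgttc", "")[[1]],
  C = strsplit("acgaaccttc", "")[[1]],
  D = strsplit("tcgaaccttg", "")[[1]]
)

dna <- ape::as.DNAbin(seqs)              # character matrix -> DNAbin object
d <- ape::dist.dna(dna, model = "raw")   # proportion of differing sites
tree <- ape::nj(d)                       # Neighbor Joining tree (unrooted)

cat(ape::write.tree(tree), "\n")         # tree in Newick format
# plot(tree, type = "unrooted")          # uncomment to draw the tree
```

A tree from four toy sequences proves nothing, of course. The point is only the workflow: alignment, then distance matrix, then tree. Whether the result can be trusted is a separate question.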
Tidbit: the number of possible trees
Here is the formula to calculate the number of trees one can create from n OTUs, as an R function. It's the number of unrooted binary trees with n labeled leaves, and unlabeled internal nodes. Copy, paste into an R script and try it out. Then figure out: 10 reference species and your MYSPE: how many possible trees? Could you create them all and select the best one by complete enumeration?
    nTrees <- function(nOTU) {
      if (nOTU < 3) {
        return(1)
      }
      if (nOTU > 87) {
        return(Inf)
      }
      return(factorial((2 * nOTU) - 4) / ((2 ^ (nOTU - 2)) * factorial(nOTU - 2)))
    }

    nTrees(5)   # 15
    nTrees(22)  # approximately Loschmidt's number
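The function above gives up beyond 87 OTUs because the intermediate factorials no longer fit into a double-precision number. One way around this is to work on a logarithmic scale. The following is a sketch, not part of the original unit: the name log10nTrees() is made up here, and lfactorial() is the base R function that returns the natural logarithm of a factorial.

```r
# Sketch: log10 of the number of unrooted binary trees with nOTU labeled
# leaves, i.e. log10( (2n - 4)! / (2^(n - 2) * (n - 2)!) ), computed with
# log-factorials so that no intermediate value overflows.
log10nTrees <- function(nOTU) {
  if (nOTU < 3) {
    return(0)  # log10(1): there is only a single tree
  }
  logN <- lfactorial((2 * nOTU) - 4) -
          (nOTU - 2) * log(2) -
          lfactorial(nOTU - 2)
  return(logN / log(10))  # convert natural log to log10
}

10^log10nTrees(5)   # back-transformed count for five OTUs
log10nTrees(100)    # finite, although factorial(196) would overflow
```

The return value is the order of magnitude of the number of trees, which is usually all one needs in order to decide whether complete enumeration is feasible.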
Further reading, links and resources
Baldauf (2003) Phylogeny for the faint of heart: a tutorial. Trends Genet 19:345-51. (pmid: 12801728)
Phylogenetic trees seem to be finding ever broader applications, and researchers from very different backgrounds are becoming interested in what they might have to say. This tutorial aims to introduce the basics of building and interpreting phylogenetic trees. It is intended for those wanting to understand better what they are looking at when they look at someone else's trees or to begin learning how to build their own. Topics covered include: how to read a tree, assembling a dataset, multiple sequence alignment (how it works and when it does not), phylogenetic methods, bootstrap analysis and long-branch artefacts, and software and resources.
Blair & Murphy (2011) Recent trends in molecular phylogenetic analysis: where to next?. J Hered 102:130-8. (pmid: 20696667)
The acquisition of large multilocus sequence data is providing researchers with an unprecedented amount of information to resolve difficult phylogenetic problems. With these large quantities of data comes the increasing challenge regarding the best methods of analysis. We review the current trends in molecular phylogenetic analysis, focusing specifically on the topics of multiple sequence alignment and methods of tree reconstruction. We suggest that traditional methods are inadequate for these highly heterogeneous data sets and that researchers employ newer more sophisticated search algorithms in their analyses. If we are to best extract the information present in these data sets, a sound understanding of basic phylogenetic principles combined with modern methodological techniques are necessary.
Notes
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2020-09-25
Version:
- 1.1
Version history:
- 1.1 2020 Maintenance
- 1.0 First live version.
- 0.1 First stub
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.