Difference between revisions of "BIN-PHYLO-Tree building"

From "A B C"
Line 28: Line 28:
 
== Abstract ==
 
== Abstract ==
 
<section begin=abstract />
 
<section begin=abstract />
<!-- included from "../components/BIN-PHYLO-Tree_building.components.wtxt", section: "abstract" -->
+
<!-- included from "./components/BIN-PHYLO-Tree_building.components.txt", section: "abstract" -->
 
Building phylogenetic trees in theory - and with phylip in R.
 
Building phylogenetic trees in theory - and with phylip in R.
 
<section end=abstract />
 
<section end=abstract />
Line 37: Line 37:
 
== This unit ... ==
 
== This unit ... ==
 
=== Prerequisites ===
 
=== Prerequisites ===
<!-- included from "../components/BIN-PHYLO-Tree_building.components.wtxt", section: "prerequisites" -->
+
<!-- included from "./components/BIN-PHYLO-Tree_building.components.txt", section: "prerequisites" -->
<!-- included from "ABC-unit_components.wtxt", section: "notes-external_prerequisites" -->
+
<!-- included from "./data/ABC-unit_components.txt", section: "notes-external_prerequisites" -->
 
You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:
 
You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:
<!-- included from "FND-prerequisites.wtxt", section: "evolution" -->
+
<!-- included from "./data/ABC-unit_prerequisites.txt", section: "evolution" -->
 
*<b>Evolution</b>: Theory of evolution; variation, neutral drift and selection.
 
*<b>Evolution</b>: Theory of evolution; variation, neutral drift and selection.
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
+
<!-- included from "./data/ABC-unit_components.txt", section: "notes-prerequisites" -->
 
You need to complete the following units before beginning this one:
 
You need to complete the following units before beginning this one:
 
*[[BIN-PHYLO-Data_preparation|BIN-PHYLO-Data_preparation (Preparing Data for Phylogenetic Analysis)]]
 
*[[BIN-PHYLO-Data_preparation|BIN-PHYLO-Data_preparation (Preparing Data for Phylogenetic Analysis)]]
Line 50: Line 50:
  
 
=== Objectives ===
 
=== Objectives ===
<!-- included from "../components/BIN-PHYLO-Tree_building.components.wtxt", section: "objectives" -->
+
<!-- included from "./components/BIN-PHYLO-Tree_building.components.txt", section: "objectives" -->
 
This unit will ...
 
This unit will ...
 
* ... introduce the concepts and algorithms used to build phylogenetic trees;
 
* ... introduce the concepts and algorithms used to build phylogenetic trees;
Line 59: Line 59:
  
 
=== Outcomes ===
 
=== Outcomes ===
<!-- included from "../components/BIN-PHYLO-Tree_building.components.wtxt", section: "outcomes" -->
+
<!-- included from "./components/BIN-PHYLO-Tree_building.components.txt", section: "outcomes" -->
 
After working through this unit you ...
 
After working through this unit you ...
 
* ... are familiar with concepts and algorithms used to build phylogenetic trees;
 
* ... are familiar with concepts and algorithms used to build phylogenetic trees;
Line 68: Line 68:
  
 
=== Deliverables ===
 
=== Deliverables ===
<!-- included from "../components/BIN-PHYLO-Tree_building.components.wtxt", section: "deliverables" -->
+
<!-- included from "./components/BIN-PHYLO-Tree_building.components.txt", section: "deliverables" -->
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
+
<!-- included from "./data/ABC-unit_components.txt", section: "deliverables-time_management" -->
 
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
 
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
+
<!-- included from "./data/ABC-unit_components.txt", section: "deliverables-journal" -->
 
*<b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.
 
*<b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
+
<!-- included from "./data/ABC-unit_components.txt", section: "deliverables-insights" -->
 
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].
 
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].
 
{{Vspace}}
 
 
 
=== Evaluation ===
 
<!-- included from "../components/BIN-PHYLO-Tree_building.components.wtxt", section: "evaluation" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 
<b>Evaluation: NA</b><br />
 
:This unit is not evaluated for course marks.
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 91: Line 82:
 
<div id="BIO">
 
<div id="BIO">
 
== Contents ==
 
== Contents ==
<!-- included from "../components/BIN-PHYLO-Tree_building.components.wtxt", section: "contents" -->
+
<!-- included from "./components/BIN-PHYLO-Tree_building.components.txt", section: "contents" -->
  
  
Line 198: Line 189:
  
 
== Notes ==
 
== Notes ==
<!-- included from "../components/BIN-PHYLO-Tree_building.components.wtxt", section: "notes" -->
+
<!-- included from "./components/BIN-PHYLO-Tree_building.components.txt", section: "notes" -->
<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
+
<!-- included from "./data/ABC-unit_components.txt", section: "notes" -->
 
<references />
 
<references />
  
Line 208: Line 199:
 
<div id="ABC-unit-framework">
 
<div id="ABC-unit-framework">
 
== Self-evaluation ==
 
== Self-evaluation ==
<!-- included from "../components/BIN-PHYLO-Tree_building.components.wtxt", section: "self-evaluation" -->
+
<!-- included from "./components/BIN-PHYLO-Tree_building.components.txt", section: "self-evaluation" -->
 
<!--
 
<!--
 
=== Question 1===
 
=== Question 1===
Line 233: Line 224:
  
  
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
+
<!-- included from "./data/ABC-unit_components.txt", section: "ABC-unit_ask" -->
  
 
----
 
----
Line 261: Line 252:
 
</div>
 
</div>
 
[[Category:ABC-units]]
 
[[Category:ABC-units]]
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
+
<!-- included from "./data/ABC-unit_components.txt", section: "ABC-unit_footer" -->
  
 
{{CC-BY}}
 
{{CC-BY}}

Revision as of 01:25, 6 January 2018

Building Phylogenetic Trees


 

Keywords:  Calculating phylogenetic trees; tree visualization


 



 


 


Abstract

Building phylogenetic trees in theory - and with phylip in R.


 


This unit ...

Prerequisites

You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:

  • Evolution: Theory of evolution; variation, neutral drift and selection.

You need to complete the following units before beginning this one:

  • BIN-PHYLO-Data_preparation (Preparing Data for Phylogenetic Analysis)


 


Objectives

This unit will ...

  • ... introduce the concepts and algorithms used to build phylogenetic trees;
  • ... teach how to compute a maximum likelihood tree with the PHYLIP proml program in R;


 


Outcomes

After working through this unit you ...

  • ... are familiar with concepts and algorithms used to build phylogenetic trees;
  • ... have computed a phylogenetic tree of Mbp1 orthologue APSES domains with the PHYLIP proml program via the RPhylip package.


 


Deliverables

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


 


Contents

Task:


 

The result of the tree construction is a decision about the most likely evolutionary relationships. Fundamentally, tree-construction programs decide which sequences had common ancestors.

"Distance based" and "Parsimony based" methods are fast, but less accurate.

Distance based phylogeny programs start by using sequence comparisons to estimate evolutionary distances:

  • they apply a model of evolution such as a mutation data matrix, to calculate a score for each pair of sequences,
  • this score is stored in a "distance matrix" ...
  • ... and used to estimate a tree that groups sequences with close relationships together (e.g. by using an NJ, Neighbor Joining, algorithm).

They are fast, can work on large numbers of sequences, but are less accurate if genes evolve at different rates.
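
The distance-based steps above can be sketched in a few lines of Python. This is only an illustration, not what PHYLIP does internally: the distance here is the plain fraction of differing columns (a real program would apply an evolutionary model instead), and the function names and the toy alignment are made up for this example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of gap-free aligned columns at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Map each unordered pair of sequence names to its distance."""
    return {frozenset((a, b)): p_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}


def neighbor_joining(names, dist):
    """Join nodes pairwise; return the list of joins and the final nested tree."""
    nodes = list(names)
    dist = dict(dist)  # work on a copy

    def d(a, b):
        return dist[frozenset((a, b))]

    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all others.
        r = {a: sum(d(a, b) for b in nodes if b != a) for a in nodes}
        # Pick the pair with the smallest Q value.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d(*p) - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to the new internal node.
        len_i = d(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d(i, j) - len_i
        new_node = (i, j)
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                dist[frozenset((new_node, k))] = (d(i, k) + d(j, k) - d(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new_node]
        joins.append((i, j, len_i, len_j))
    return joins, (nodes[0], nodes[1])


if __name__ == "__main__":
    # Toy nucleotide alignment, invented for illustration only.
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGTTCGA", "D": "TCGATCGA"}
    joins, tree = neighbor_joining(list(toy), distance_matrix(toy))
    print(tree)
```

Note that the whole alignment is reduced to one number per sequence pair before the tree is built; this is what makes the approach fast, and also what discards column-level information.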


Parsimony based phylogeny programs build a tree that minimizes the number of mutation events that are required to get from a common ancestral sequence to all observed sequences. They take all columns into account, not just a single number per sequence pair, as the Distance Methods do. For closely related sequences they work very well, but they construct inaccurate trees when they can't make good estimates for the required number of sequence changes.
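
As a rough sketch of how "number of mutation events" can be counted for a given tree, the following Python code applies Fitch's small-parsimony algorithm column by column. It assumes a rooted binary tree written as nested tuples of sequence names; the names and the tiny alignment are hypothetical. A real parsimony program additionally searches over many candidate trees, which is not shown here.

```python
def fitch_column(tree, states):
    """Return (possible ancestral states, minimum changes) for one column.

    tree   -- a leaf name (str) or a pair (left_subtree, right_subtree)
    states -- dict mapping each leaf name to its character in this column
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_column(tree[0], states)
    right_set, right_cost = fitch_column(tree[1], states)
    common = left_set & right_set
    if common:
        # Both subtrees can agree on a state: no extra change needed.
        return common, left_cost + right_cost
    # No shared state: one change is required on this node's branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for col in range(length):
        column = {name: seq[col] for name, seq in alignment.items()}
        total += fitch_column(tree, column)[1]
    return total


if __name__ == "__main__":
    # Hypothetical four-sequence alignment with three columns.
    aln = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CAT"}
    print(parsimony_score((("A", "B"), ("C", "D")), aln))
    print(parsimony_score((("A", "C"), ("B", "D")), aln))
```

The tree that groups A with B and C with D needs fewer changes than the alternative grouping, so a parsimony method would prefer it.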


"Maximum Likelihood" and "Bayesian" methods are accurate, but can take up very significant computational resources.

ML, or Maximum Likelihood, methods attempt to find the tree for which the observed sequences would be the most likely under a particular evolutionary model. They are based on a rigorous statistical framework and yield the most robust results. But they are also quite compute-intensive, and a tree of the size that we are building in this assignment is a challenge for the resources of a common workstation (it runs for about an hour on my computer). If the problem is too large, one may split it into smaller, obvious subtrees (e.g. analysing orthologues as a group, only including a few paralogues for comparison) and then merge the smaller trees; this way even very large problems can become tractable.
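
To make "most likely under a particular evolutionary model" concrete, here is a deliberately minimal Python sketch for the smallest possible case: two aligned nucleotide sequences and a single parameter, the distance between them, under the Jukes-Cantor (JC69) model. This is an assumption chosen for simplicity; proml works on protein sequences with a different model and optimizes full trees. The function names and the counts in the example are invented.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of seeing n_diff differing sites out of n_sites
    when the two sequences are separated by d expected substitutions per site."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * decay)  # one specific identical pair, e.g. A/A
    p_diff = 0.25 * (0.25 - 0.25 * decay)  # one specific differing pair, e.g. A/C
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


def ml_distance_grid(n_sites, n_diff, step=0.001, max_d=2.0):
    """Try every distance on a grid and keep the one with the highest likelihood."""
    grid = [i * step for i in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))


def jc69_distance_closed_form(n_sites, n_diff):
    """Known analytical maximum of the same likelihood, for comparison."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Hypothetical counts: 10 differences in 100 aligned sites.
    print(ml_distance_grid(100, 10))
    print(jc69_distance_closed_form(100, 10))
```

For a whole tree the same idea applies, but the likelihood has to be maximized over every branch length and over the tree topologies themselves, which is where the large computational cost comes from.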

ML methods suffer less from "long-branch attraction" - the phenomenon that weakly similar sequences can be grouped inappropriately close together in a tree due to spuriously shared differences.


Bayesian methods don't estimate the tree that gives the highest likelihood for the observed data, but find the most probable tree, given the data that has been observed. If you think this sounds conceptually similar to ML methods, then you are not wrong. However, the approaches employ very different algorithms. And Bayesian methods need a "prior" on trees before observation.
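
The role of the prior can be illustrated with a toy calculation in Python. The sketch assumes that only three candidate trees exist and that their log-likelihoods are already known; all numbers and names are hypothetical. Real Bayesian programs cannot enumerate all trees and instead sample tree space, which this example does not attempt.

```python
import math


def posterior_probabilities(log_likelihoods, priors):
    """Combine log-likelihoods with prior probabilities via Bayes' rule.

    posterior(tree) is proportional to likelihood(data | tree) * prior(tree),
    normalized over the candidate trees supplied.
    """
    log_joint = {t: log_likelihoods[t] + math.log(priors[t])
                 for t in log_likelihoods}
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    shift = max(log_joint.values())
    weights = {t: math.exp(v - shift) for t, v in log_joint.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical log-likelihoods for three candidate trees.
    loglik = {"T1": -100.0, "T2": -101.0, "T3": -103.0}
    flat = {"T1": 1 / 3, "T2": 1 / 3, "T3": 1 / 3}
    informative = {"T1": 0.05, "T2": 0.90, "T3": 0.05}
    print(posterior_probabilities(loglik, flat))
    print(posterior_probabilities(loglik, informative))
```

With a flat prior the most probable tree is also the maximum likelihood tree; with a strongly informative prior a different tree can come out on top, which is the conceptual difference between the two approaches.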


Calculating trees

 

In this section we perform the actual phylogenetic calculation.


 

Task:

  • Download the PHYLIP suite of programs from the Phylip homepage and install it on your computer.
 

Task:

 
  • Open RStudio and load the ABC-units R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
  • Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
  • Type init() if requested.
  • Open the file BIN-PHYLO-Tree_building.R and follow the instructions.


 

Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.


 


 



 


Further reading, links and resources

Tuimala, Jarno (2006) A primer to phylogenetic analysis using the PHYLIP package.  
The purpose of this tutorial is to demonstrate how to use PHYLIP, a collection of phylogenetic analysis software, and some of the options that are available. This tutorial is not intended to be a course in phylogenetics, although some phylogenetic concepts will be discussed briefly. There are other books available which cover the theoretical sides of the phylogenetic analysis, but the actual data analysis work is less well covered. Here we will mostly deal with molecular sequence data analysis in the current PHYLIP version 3.66.


 


Notes


 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-10-31

Version:

1.0

Version history:

  • 1.0 First live version.
  • 0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.