BIN-PHYLO-Tree analysis

Revision as of 01:25, 6 January 2018 by Boris

Analysing Phylogenetic Trees


 

Keywords:  Species trees, gene trees and the importance of naming, Speciation and duplication signatures


 



 


 


Abstract

The analysis of mixed gene trees.


 


This unit ...

Prerequisites

You need to complete the following units before beginning this one:


 


Objectives

This unit will ...

  • ... introduce ;
  • ... demonstrate ;
  • ... teach how to fetch a species tree from the NCBI taxonomy page;


 


Outcomes

After working through this unit you ...

  • ... can ;
  • ... are familiar with ;
  • ... have begun to.


 


Deliverables

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


 


Contents

Task:


Analysis

Analysing your tree

 

To analyse your tree, you need a species tree as a reference. A reference is essential: it makes your expectations about the observed tree explicit. Fortunately, all of our species are documented in our database.


 

The reference species tree

 

Task:
To get a species tree, we make use of the smart and useful PhyloT service:

  • Execute the following R command to print a comma-separated list of the taxonomy IDs for the species in your database (plus E. coli):
cat(paste(c(myDB$taxonomy$ID, "83333"), collapse=", "))


  • Copy this list and paste it into the search field of the PhyloT page. Select: Scientific names; Internal Nodes collapsed; polytomy no; Format newick; Filename fungiTree.txt. Click on Generate tree. The file fungiTree.txt will be downloaded into your default download directory; move it to your project directory. Then click on Visualize in iTOL and confirm the tree: it should list twelve species names, i.e. ten "reference" fungi, E. coli (as the outgroup), and MYSPE. Make sure MYSPE is included! If it is missing, something went wrong in the previous steps and needs to be fixed before you continue.
  • Open fungiTree.txt in RStudio. This is a tree, specified in the so-called "Newick format". The topology of the tree is defined by the nesting of the parentheses, and the branch lengths are all the same: this is a cladogram, not a phylogram.
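The two steps above can be sketched in base R. This is only an illustration, not the code of the course script: the toy myDB, the three fungal taxonomy IDs, the toy Newick string, and the helper functions newickTips() and parensBalanced() are all assumptions made for this example. The helpers assume plain labels without quoted commas or parentheses.

```r
# Hypothetical stand-in for the course database; only the ID column is used.
# The three IDs are illustrative and are NOT the course's reference set.
myDB <- list(taxonomy = data.frame(ID = c(4932L, 5141L, 4896L)))

# Same expression as in the task: append 83333 (E. coli) and
# collapse everything into one comma-separated string.
idString <- paste(c(myDB$taxonomy$ID, "83333"), collapse = ", ")
cat(idString, "\n")

# Tip labels in a Newick string follow "(" or "," and run up to the
# next ",", ")", ":" or ";". Labels after ")" are internal node labels
# and are deliberately not matched.
newickTips <- function(nwk) {
  m <- regmatches(nwk, gregexpr("(?<=[(,])[^(),:;]+", nwk, perl = TRUE))[[1]]
  trimws(m)
}

# A well-formed Newick string never closes more parentheses than it has
# opened, and ends with all of them closed.
parensBalanced <- function(nwk) {
  ch <- strsplit(nwk, "")[[1]]
  depth <- cumsum((ch == "(") - (ch == ")"))
  all(depth >= 0) && depth[length(depth)] == 0
}

# Toy tree (hypothetical topology, for illustration only).
toyTree <- "((Saccharomyces_cerevisiae,Neurospora_crassa),Schizosaccharomyces_pombe,Escherichia_coli);"
tips <- newickTips(toyTree)
print(tips)
print(parensBalanced(toyTree))

# With the downloaded file, the same checks might look like this:
# nwk <- paste(readLines("fungiTree.txt"), collapse = "")
# length(newickTips(nwk))    # the task expects twelve names
# parensBalanced(nwk)
```

Counting the tip labels this way is a quick sanity check that no species was dropped between the ID list and the downloaded tree.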

Let's continue the analysis in R.

Task:

 
  • Open RStudio and load the ABC-units R project. If you have loaded it before, choose File → Recent Projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
  • Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
  • Type init() if requested.
  • Open the file BIN-PHYLO-Tree_analysis.R and follow the instructions.


 

Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.


 


 

I have constructed a cladogram for many of the species we are analysing, based on data published for 1551 fungal ribosomal sequences. The six reference species are included. Such reference trees from rRNA data are a standard method of phylogenetic analysis, supported by the assumption that rRNA sequences are monophyletic and have evolved under comparable selective pressure in all species.


 
FungiCladogram.jpg


Cladogram of the "reference" fungi studied in the assignments. This cladogram is based on a tree returned by the NCBI Common Tree. It is thus a digest of cladistic relationships, not a representation of a specific molecular phylogeny.

Alternatively, you can look up your species in the latest version of the species tree for the fungi and add it to the tree by hand while resolving the trifurcations. See:

Ebersberger et al. (2012) A consistent phylogenetic backbone for the fungi. Mol Biol Evol 29:1319-34. (pmid: 22114356)

The kingdom of fungi provides model organisms for biotechnology, cell biology, genetics, and life sciences in general. Only when their phylogenetic relationships are stably resolved, can individual results from fungal research be integrated into a holistic picture of biology. However, and despite recent progress, many deep relationships within the fungi remain unclear. Here, we present the first phylogenomic study of an entire eukaryotic kingdom that uses a consistency criterion to strengthen phylogenetic conclusions. We reason that branches (splits) recovered with independent data and different tree reconstruction methods are likely to reflect true evolutionary relationships. Two complementary phylogenomic data sets based on 99 fungal genomes and 109 fungal expressed sequence tag (EST) sets analyzed with four different tree reconstruction methods shed light from different angles on the fungal tree of life. Eleven additional data sets address specifically the phylogenetic position of Blastocladiomycota, Ustilaginomycotina, and Dothideomycetes, respectively. The combined evidence from the resulting trees supports the deep-level stability of the fungal groups toward a comprehensive natural system of the fungi. In addition, our analysis reveals methodologically interesting aspects. Enrichment for EST encoded data, a common practice in phylogenomic analyses, introduces a strong bias toward slowly evolving and functionally correlated genes. Consequently, the generalization of phylogenomic data sets as collections of randomly selected genes cannot be taken for granted. A thorough characterization of the data to assess possible influences on the tree reconstruction should therefore become a standard in phylogenomic analyses.
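Resolving a trifurcation by hand, as suggested above, amounts to adding one pair of parentheses to the Newick string so that the node has two children instead of three. The following base R sketch shows how such a node can be detected and what a resolved version looks like. The function childCounts() and the labels OUT, A and B are hypothetical; where MYSPE is placed here is purely illustrative, and the correct placement has to come from the published species tree. The sketch assumes labels without quoted commas or parentheses.

```r
# Count the direct children of every internal node in a Newick string.
# Each "(" opens a node, each "," at that level separates two children,
# and each ")" closes the node: children = commas + 1.
childCounts <- function(nwk) {
  ch <- strsplit(nwk, "")[[1]]
  counts <- integer(0)   # child counts of nodes that have been closed
  stack  <- integer(0)   # commas seen so far in each currently open node
  for (x in ch) {
    if (x == "(") {
      stack <- c(stack, 0L)
    } else if (x == "," && length(stack) > 0) {
      stack[length(stack)] <- stack[length(stack)] + 1L
    } else if (x == ")") {
      counts <- c(counts, stack[length(stack)] + 1L)
      stack  <- stack[-length(stack)]
    }
  }
  counts
}

# Toy cladogram with one unresolved node (three children).
unresolved <- "(OUT,(A,B,MYSPE));"

# The same tree after grouping B and MYSPE by hand (illustrative choice).
resolved <- "(OUT,(A,(B,MYSPE)));"

print(childCounts(unresolved))
print(any(childCounts(unresolved) > 2))   # TRUE: a trifurcation is present
print(any(childCounts(resolved) > 2))     # FALSE: every node is bifurcating
```

Running childCounts() on the edited tree before plotting it is a cheap way to confirm that no multifurcation was left behind.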


 


Task:

  • Return to the RStudio project and continue with the script to its end. Note the deliverable at the end: to print out your trees and bring them to class.




 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-10-31

Version:

1.0

Version history:

  • 1.0 First live version.
  • 0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.