Difference between revisions of "BIO bootstrapping with PHYLIP"

Revision as of 14:06, 27 November 2013

Bootstrapping PHYLIP trees

A brief overview of how to produce bootstrapping results for PHYLIP trees.



 

Principle

  1. Create multiple bootstrapped copies (e.g. 100) of your input data using seqboot.
  2. Run your tree estimation program of choice using the M input option (analyze multiple data sets).
  3. Use the program consense to calculate your consensus tree.
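
The three steps are chained through fixed file names: each program writes outfile (and possibly outtree), and the next program expects infile (or intree). A minimal sketch of that hand-over in Python is given here; the helper name hand_over and the ".bak" backup suffix are my own assumptions, not part of PHYLIP.

```python
from pathlib import Path


def hand_over(workdir, produced, expected):
    """Rename one program's output file to the input name the next one expects.

    Hypothetical helper: PHYLIP itself only reads and writes the fixed
    file names; the renaming is normally done by hand.
    """
    src = Path(workdir) / produced
    dst = Path(workdir) / expected
    if not src.exists():
        raise FileNotFoundError(f"{src} not found; did the previous step run?")
    if dst.exists():
        # Keep the previous input as a backup instead of silently losing it.
        dst.replace(dst.with_name(dst.name + ".bak"))
    src.replace(dst)
    return dst


# Typical use between the steps described above:
#   hand_over(".", "outfile", "infile")   # after seqboot, before proml
#   hand_over(".", "outtree", "intree")   # after proml, before consense
```

Keeping a backup of the original alignment is a deliberate choice: the first hand-over would otherwise replace your original infile with the bootstrapped data.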


 

Input data

Create a PHYLIP input file with the usual infile filename. Something like this:

 7 77
KilA_ESCCO   ---------R AKDGYINATS MCRTAGKLLS DYTRLLSRDM GIPISEIQSF
Mbp1_SACCE   IHSTGSIMKR KKDDWVNATH ILKAANFAKA KRTRILEKEV LKE--THEKV
Mbp1_NEUCR   -VNNVAVMRR RHDDWVNATH ILKAAGFDKP ARTRILEREV QKD--THEKI
Mbp1_CANAL   VTSEGPIMRR KKDSWINATH ILKIAKFPKA KRTRILEKDV QTG--IHEKV
Mbp1_USTMA   IINNVAVMRR RSDDWLNATQ ILKVVGLDKP QRTRVLEREI QKG--IHEKV
Mbp1_ASPNI   -----SVMRR RSDDWINATH ILKVAGFDKP ARTRILEREV QKG--VHEKV
Mbp1_SCHPO   -IKGVSVMRR RRDSWLNATQ ILKVADFDKP QRTRVLERQV QIG--AHEKV

             KGGRPENQGT WVHPDIAINL AQ-----
             QGGFGKYQGT WVPLNIAKQL AEKFSVY
             QGGYGRYQGT WIPLEQAEAL ARRNNIY
             QGGYGKYQGT YVPLDLGAAI ARNFGVY
             QGGYGKYQGT WIPLDVAIEL AERYNI-
             QGGYGKYQGT WIPLQEGRQL AERNNI-
             QGGYGKYQGT WVPFQRGVDL ATKYKV-
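
To check such a file before running anything, the interleaved layout shown above can be read back with a few lines of Python. This is only a sketch under the assumptions visible in the example: a single data set, a header line with the number of sequences and sites, names in the first 10 characters of the first block, and continuation blocks without names. The function name read_interleaved_phylip is hypothetical.

```python
def read_interleaved_phylip(text):
    """Parse one interleaved PHYLIP data set into {name: sequence}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seqs, n_sites = (int(x) for x in lines[0].split()[:2])
    names, seqs = [], []
    for i, line in enumerate(lines[1:]):
        if i < n_seqs:
            # First block: the name occupies the first 10 characters.
            names.append(line[:10].strip())
            seqs.append("".join(line[10:].split()))
        else:
            # Later blocks: sequences only, in the same order as the names.
            seqs[i % n_seqs] += "".join(line.split())
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, found {len(seq)}")
    return dict(zip(names, seqs))
```

The length check at the end catches the most common problem with hand-edited input: a row that has lost or gained a character.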


 

seqboot

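  1. Read the documentation for the seqboot program (http://evolution.genetics.washington.edu/phylip/doc/seqboot.html).
  2. Run seqboot on your infile.
  3. Set your parameters. I have used the defaults for this example. The random seed should be of the form 4n+1, i.e. it should leave a remainder of 1 when divided by 4 (for example 4321).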
  4. The usual outfile is created. Here is the first bootstrap replicate from the run.
    7    77
KilA_ESCCO ---------- -RKKGGGYIA TTMMCCRRRL SIISSEIQQQ GGRRRNQQQQ GTWVPIIIAI
Mbp1_SACCE HHSSTGSIMK KRKKDDDWVA TTIILLKRRL E----THEEE GGFFFYQQQQ GTWVLIIIAK
Mbp1_NEUCR VVNNNVAVMR RRHHDDDWVA TTIILLKRRL E----THEEE GGYYYYQQQQ GTWILQQQAE
Mbp1_CANAL TTSSEGPIMR RRKKSSSWIA TTIILLKRRL E----IHEEE GGYYYYQQQQ GTYVLLLLGA
Mbp1_USTMA IINNNVAVMR RRSSDDDWLA TTIILLKRRL E----IHEEE GGYYYYQQQQ GTWILVVVAI
Mbp1_ASPNI ------SVMR RRSSDDDWIA TTIILLKRRL E----VHEEE GGYYYYQQQQ GTWILEEEGR
Mbp1_SCHPO IIKKGVSVMR RRRRSSSWLA TTIILLKRRL E----AHEEE GGYYYYQQQQ GTWVFRRRGV

           INNLLAAAQQ Q------
           KQQLLAAAEE EKKSSVY
           EAALLAAARR RRRNNIY
           AAAIIAAARR RNNGGVY
           IEELLAAAEE ERRNNI-
           RQQLLAAAEE ERRNNI-
           VDDLLAAATT TKKKKV-

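Note how approximately 1/3 of the columns are duplicates of other columns: because columns are sampled with replacement, a corresponding share of the original columns does not appear in the replicate at all.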
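
The resampling itself is easy to illustrate. The following Python sketch is not seqboot's code; it only mimics what the example output suggests: columns are drawn with replacement and kept in their original order, so that copies of a column sit next to each other. The function names are hypothetical.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate of {name: sequence} and the drawn column indices."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    # Draw n_sites column indices with replacement; sorting keeps column order.
    picks = sorted(rng.randrange(n_sites) for _ in range(n_sites))
    replicate = {
        name: "".join(alignment[name][i] for i in picks) for name in names
    }
    return replicate, picks


def fraction_left_out(picks, n_sites):
    """Fraction of the original columns that never appear in the replicate."""
    return 1 - len(set(picks)) / n_sites
```

For an alignment of 77 columns the expected value of fraction_left_out is (1 - 1/77)^77, or about 0.37, which matches the "approximately 1/3" seen in the replicate above.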


 

proml

The output of seqboot works for most of the tree estimation programs. Be aware that running time will increase by a factor of 100 for 100 bootstrap replicates.

  1. Read the documentation for the proml program.
  2. Rename the previous outfile as the new infile.
  3. Run proml on your infile.
  4. Set your parameters. I have used the defaults for this example, except for the S option (turning off the speedier but rougher analysis) and the M option for multiple data sets; when prompted, I chose D for data sets (not weights), entered the number of replicates (100) and a random seed, and "jumbled" only once. (While this is running, which took 5 minutes or so for my example, you can read about common input options, such as what "jumble" means, at http://evolution.genetics.washington.edu/phylip/doc/main.html#options.)
  5. The usual outfile and outtree are created. Have a look. Here are the first two trees from my outfile:
Data set # 1:
           +-Mbp1_USTMA
  +--------2  
  |        |  +-Mbp1_ASPNI
  |        +--3  
  |           |      +-Mbp1_NEUCR
  |           +------1  
  |                  |      +--------Mbp1_SCHPO
  |                  +------5  
  |                         +----------Mbp1_CANAL
  |  
  4---Mbp1_SACCE
  |  
  +-----------------------KilA_ESCCO
Ln Likelihood =  -818.27365

Data set # 2:
                  +----Mbp1_USTMA
               +--3  
               |  |   +------Mbp1_NEUCR
      +--------1  +---4  
      |        |      +Mbp1_ASPNI
  +---5        |  
  |   |        +-----Mbp1_SCHPO
  |   |  
  |   +---Mbp1_SACCE
  |  
  2-------Mbp1_CANAL
  |  
  +-----------------------------------------------KilA_ESCCO
Ln Likelihood =  -825.36962
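
With 100 data sets in one outfile, it can be handy to pull out just the log likelihoods. Here is a small Python sketch that relies only on the two line patterns visible in the excerpt above ("Data set # N:" and "Ln Likelihood = ..."); the function name is hypothetical, and the rest of the outfile is simply ignored.

```python
import re

# Matches either a data set header or a log-likelihood line, one per line.
LNL_PATTERN = re.compile(
    r"^[ \t]*Data set #\s*(\d+):|^[ \t]*Ln Likelihood =\s*(-?\d+(?:\.\d+)?)",
    re.MULTILINE,
)


def log_likelihoods(outfile_text):
    """Return {data set number: log likelihood} found in proml outfile text."""
    results = {}
    current = None
    for match in LNL_PATTERN.finditer(outfile_text):
        if match.group(1) is not None:
            current = int(match.group(1))
        elif current is not None:
            results[current] = float(match.group(2))
    return results
```

A large spread between the values is a first hint that the replicates support rather different trees.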


 

consense

You can use consense to calculate a consensus tree.

  1. Read the documentation for the consense program (http://evolution.genetics.washington.edu/phylip/doc/consense.html).
  2. Rename the previous outtree as the new intree.
  3. Run consense on your intree.
  4. Set your parameters. I have used the defaults for this example.
  5. The usual outfile is created, along with the consensus tree (outtree). Have a look.
          +-------------------------------Mbp1 SCHPO
          |
          |                       +-------Mbp1 SACCE
  +-------|               +--61.0-|
  |       |       +--52.0-|       +-------Mbp1 CANAL
  |       |       |       |
  |       +--26.0-|       +---------------KilA ESCCO
  |               |
  |               |               +-------Mbp1 NEUCR
  |               +----------69.0-|
  |                               +-------Mbp1 ASPNI
  |
  +---------------------------------------Mbp1 USTMA

The bootstrap values are poor overall. The reason is that the sequences are short to begin with, and losing about 1/3 of the information through resampling makes the estimation process quite brittle. The topology of the tree is not quite right either. This is what it looks like when I use retree to redraw it with KilA-N as the outgroup. Note that the bootstrap values (shown here as fractions, e.g. 61.0 out of 100 replicates becomes 0.61) had to be put in by hand from the data in outfile; PHYLIP can't do that for you :-(

  ┌─────────|KilA ESCCO
  |  
  |                                 ┌───────────────────|Mbp1 NEUCR
  |                   ┌────────0.69─|  
──|                   |             └───────────────────|Mbp1 ASPNI
  |         ┌──0.52───|  
  |         |         |    ┌───────────────────────────────────────|Mbp1 USTMA
  |         |         └0.26|  
  └─────────|              └───────────────────|Mbp1 SCHPO
            |  
            |           ┌───────────────────|Mbp1 SACCE
            └──────0.61─|
                        └───────────────────|Mbp1 CANAL
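
Where do the support values come from? In essence they say in how many replicate trees a given grouping of taxa occurs. The Python sketch below illustrates the counting for unrooted trees by tallying splits (bipartitions of the taxa). It is not consense's implementation: trees are assumed to be given as nested tuples of taxon names rather than parsed from outtree, and all function names are hypothetical.

```python
from collections import Counter


def leaf_set(tree):
    """All taxon names below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree, all_taxa, reference):
    """Non-trivial splits of a tree, each named by the side without `reference`."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        if reference in side:
            side = all_taxa - side
        # Skip trivial splits that separate nothing or only a single taxon.
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def split_support(trees):
    """Fraction of trees in which each split occurs."""
    all_taxa = leaf_set(trees[0])
    reference = min(all_taxa)
    counts = Counter()
    for tree in trees:
        counts.update(splits(tree, all_taxa, reference))
    return {split: n / len(trees) for split, n in counts.items()}
```

Naming each split by the side that does not contain a fixed reference taxon makes the count independent of where a tree happens to be rooted, which is why redrawing the tree with a different outgroup does not change the support values.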



 

Notes



Further reading and resources