BIO bootstrapping with PHYLIP

From "A B C"
Revision as of 13:19, 27 November 2013 by Boris.

Bootstrapping PHYLIP trees

A very brief overview of how to produce bootstrap results for PHYLIP trees using PROML.



 

Principle

  1. Create multiple bootstrapped copies (e.g. 100) of your input data using seqboot.
  2. Run your tree estimation program of choice using the M input option (analyze multiple data sets).
  3. Use the program consense to calculate your consensus tree.

Input data

Create a PHYLIP input file with the usual filename, infile. Something like this:

 7 77
KilA_ESCCO   ---------R AKDGYINATS MCRTAGKLLS DYTRLLSRDM GIPISEIQSF
Mbp1_SACCE   IHSTGSIMKR KKDDWVNATH ILKAANFAKA KRTRILEKEV LKE--THEKV
Mbp1_NEUCR   -VNNVAVMRR RHDDWVNATH ILKAAGFDKP ARTRILEREV QKD--THEKI
Mbp1_CANAL   VTSEGPIMRR KKDSWINATH ILKIAKFPKA KRTRILEKDV QTG--IHEKV
Mbp1_USTMA   IINNVAVMRR RSDDWLNATQ ILKVVGLDKP QRTRVLEREI QKG--IHEKV
Mbp1_ASPNI   -----SVMRR RSDDWINATH ILKVAGFDKP ARTRILEREV QKG--VHEKV
Mbp1_SCHPO   -IKGVSVMRR RRDSWLNATQ ILKVADFDKP QRTRVLERQV QIG--AHEKV

             KGGRPENQGT WVHPDIAINL AQ-----
             QGGFGKYQGT WVPLNIAKQL AEKFSVY
             QGGYGRYQGT WIPLEQAEAL ARRNNIY
             QGGYGKYQGT YVPLDLGAAI ARNFGVY
             QGGYGKYQGT WIPLDVAIEL AERYNI-
             QGGYGKYQGT WIPLQEGRQL AERNNI-
             QGGYGKYQGT WVPFQRGVDL ATKYKV-
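
Before bootstrapping, it can help to confirm that the header line (number of sequences, number of sites) matches the data. The following is a minimal Python sketch, not part of PHYLIP: the function name read_interleaved_phylip is hypothetical, and it assumes an interleaved file in which names occupy the first 10 characters of each line in the first block, as in the example above.

```python
def read_interleaved_phylip(text):
    """Parse interleaved PHYLIP text into {name: sequence} and check the header.

    Assumption: names fill the first 10 characters of each line in the
    first block; later blocks hold sequence data only, in the same order.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa, n_sites = (int(x) for x in lines[0].split()[:2])
    body = lines[1:]
    if len(body) % n_taxa != 0:
        raise ValueError("number of data lines is not a multiple of the taxon count")
    names, seqs = [], []
    for i, line in enumerate(body):
        if i < n_taxa:
            # First block: name, then the first stretch of sequence.
            names.append(line[:10].strip())
            seqs.append("".join(line[10:].split()))
        else:
            # Later blocks: append to the taxon in the same position.
            seqs[i % n_taxa] += "".join(line.split())
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, found {len(seq)}")
    return dict(zip(names, seqs))


# Toy alignment (made up for illustration): 3 sequences, 12 sites.
example = """ 3 12
Alpha_AAAA   MKR-AT HILK
Beta_BBBBB   MRRWAT HILR
Gamma_CCCC   MKRWAS QILK

             AG
             AG
             PG
"""
alignment = read_interleaved_phylip(example)
for name, seq in alignment.items():
    print(name, seq)
```

To check a real file, pass the contents of infile to the function; a ValueError points to a mismatch between the header and the data.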

seqboot

  1. Read the documentation for the seqboot program.
  2. Run seqboot on your infile.
  3. Set your parameters. I have used the defaults for this example. The random seed should be of the form 4n+1.
  4. The usual outfile is created. Here is the first bootstrap replicate from the run.
    7    77
KilA_ESCCO ---------- -RKKGGGYIA TTMMCCRRRL SIISSEIQQQ GGRRRNQQQQ GTWVPIIIAI
Mbp1_SACCE HHSSTGSIMK KRKKDDDWVA TTIILLKRRL E----THEEE GGFFFYQQQQ GTWVLIIIAK
Mbp1_NEUCR VVNNNVAVMR RRHHDDDWVA TTIILLKRRL E----THEEE GGYYYYQQQQ GTWILQQQAE
Mbp1_CANAL TTSSEGPIMR RRKKSSSWIA TTIILLKRRL E----IHEEE GGYYYYQQQQ GTYVLLLLGA
Mbp1_USTMA IINNNVAVMR RRSSDDDWLA TTIILLKRRL E----IHEEE GGYYYYQQQQ GTWILVVVAI
Mbp1_ASPNI ------SVMR RRSSDDDWIA TTIILLKRRL E----VHEEE GGYYYYQQQQ GTWILEEEGR
Mbp1_SCHPO IIKKGVSVMR RRRRSSSWLA TTIILLKRRL E----AHEEE GGYYYYQQQQ GTWVFRRRGV

           INNLLAAAQQ Q------
           KQQLLAAAEE EKKSSVY
           EAALLAAARR RRRNNIY
           AAAIIAAARR RNNGGVY
           IEELLAAAEE ERRNNI-
           RQQLLAAAEE ERRNNI-
           VDDLLAAATT TKKKKV-

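Note how many columns appear more than once. Because columns are sampled with replacement, roughly one third of the original columns are missing from any given replicate, and their places are taken by duplicates of other columns.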
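
The resampling idea can be sketched in a few lines of Python. This is an illustration of column bootstrapping in general, not seqboot's actual code; the function names and the toy alignment are made up. The picks are sorted only so that duplicated columns end up next to each other, which is how the replicate above appears to be laid out. For n columns the expected fraction of original columns missing from a replicate is (1 - 1/n)^n, which approaches 1/e (about 0.37) as n grows.

```python
import random


def bootstrap_columns(seqs, rng):
    """Return one bootstrap replicate: alignment columns drawn with replacement."""
    n_sites = len(next(iter(seqs.values())))
    # Draw n_sites column indices with replacement, then sort them so that
    # repeated columns sit side by side.
    picks = sorted(rng.randrange(n_sites) for _ in range(n_sites))
    replicate = {name: "".join(seq[i] for i in picks) for name, seq in seqs.items()}
    return replicate, picks


def missing_fraction(n_sites, n_replicates, rng):
    """Average fraction of original columns that are absent from a replicate."""
    total = 0.0
    for _ in range(n_replicates):
        seen = {rng.randrange(n_sites) for _ in range(n_sites)}
        total += 1 - len(seen) / n_sites
    return total / n_replicates


rng = random.Random(4 * 3 + 1)  # arbitrary seed of the form 4n+1
toy = {"A": "MKRWAT", "B": "MRRWAS"}  # made-up two-sequence alignment
rep, picks = bootstrap_columns(toy, rng)
print(picks, rep)
print(missing_fraction(77, 1000, rng))  # simulated estimate for 77 columns
```

Every sequence is resampled with the same column picks, so the columns of the alignment stay intact; only their multiplicity changes.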

proml

The output of seqboot works for most of the tree estimation programs. Be aware that running time will increase by a factor of 100 for 100 bootstrap replicates.

  1. Read the documentation for the proml program.
  2. Rename the outfile produced by seqboot to infile.
  3. Run proml on your infile.
  4. Set your parameters. I have used the defaults for this example, except for choosing the M option for multiple datasets and, as prompted, D for data (not weights), the number of replicates (100), a random seed, and "jumbling" only once. (While this is running, you can read about common input options, such as what "jumble" means.)
  5. The usual outfile and outtree are created. Have a look.
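
One quick way to "have a look" is to check that outtree holds as many trees as there were replicates. The sketch below is an assumption-laden helper, not a PHYLIP tool: split_newick_trees is a hypothetical name, and it assumes that outtree contains one Newick tree per replicate, that each tree ends with a semicolon, and that no label contains a semicolon or significant whitespace.

```python
def split_newick_trees(text):
    """Split a multi-tree Newick string into a list with one string per tree."""
    # Long trees may be wrapped over several lines, so drop all whitespace first.
    flat = "".join(text.split())
    return [tree + ";" for tree in flat.split(";") if tree]


# Two made-up trees, the first wrapped across two lines.
example = "(A:0.1,(B:0.2,\nC:0.3):0.05);\n((A:0.1,B:0.2):0.05,C:0.3);\n"
trees = split_newick_trees(example)
print(len(trees))
for tree in trees:
    print(tree)
```

For a real run, read the contents of outtree and pass them to the function; with 100 bootstrap replicates the list would be expected to hold 100 trees.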


 

Notes



Further reading and resources