Latest revision as of 14:36, 27 November 2013
Bootstrapping PHYLIP trees
A brief overview of how to produce bootstrap results for PHYLIP trees.
Principle
- Create multiple bootstrapped copies (e.g. 100) of your input data using seqboot.
- Run your tree estimation program of choice using the M input option (analyze multiple data sets).
- Use the program consense to calculate your consensus tree.
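Between these steps, each program's output has to be renamed to the input name the next program expects (outfile becomes infile, outtree becomes intree). A minimal Python sketch of that hand-off follows. The function name hand_off and the ".previous" backup suffix are made up for this illustration and are not part of PHYLIP.

```python
from pathlib import Path


def hand_off(workdir, produced, consumed):
    """Rename one program's output so the next program finds it as its input.

    Hypothetical helper, not part of PHYLIP. If a file with the target name
    already exists (for example the original infile), it is kept under a
    ".previous" suffix instead of being overwritten.
    """
    src = Path(workdir) / produced
    dst = Path(workdir) / consumed
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; did the previous program finish?")
    if dst.exists():
        # Move the displaced file aside; the suffix is an arbitrary choice.
        dst.replace(dst.with_name(dst.name + ".previous"))
    src.rename(dst)
    return dst
```

For example, call hand_off(".", "outfile", "infile") after seqboot has finished and hand_off(".", "outtree", "intree") after the tree program has finished.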
Input data
Create a PHYLIP input file with the usual infile filename. Something like this:
7 77
KilA_ESCCO ---------R AKDGYINATS MCRTAGKLLS DYTRLLSRDM GIPISEIQSF
Mbp1_SACCE IHSTGSIMKR KKDDWVNATH ILKAANFAKA KRTRILEKEV LKE--THEKV
Mbp1_NEUCR -VNNVAVMRR RHDDWVNATH ILKAAGFDKP ARTRILEREV QKD--THEKI
Mbp1_CANAL VTSEGPIMRR KKDSWINATH ILKIAKFPKA KRTRILEKDV QTG--IHEKV
Mbp1_USTMA IINNVAVMRR RSDDWLNATQ ILKVVGLDKP QRTRVLEREI QKG--IHEKV
Mbp1_ASPNI -----SVMRR RSDDWINATH ILKVAGFDKP ARTRILEREV QKG--VHEKV
Mbp1_SCHPO -IKGVSVMRR RRDSWLNATQ ILKVADFDKP QRTRVLERQV QIG--AHEKV
KGGRPENQGT WVHPDIAINL AQ-----
QGGFGKYQGT WVPLNIAKQL AEKFSVY
QGGYGRYQGT WIPLEQAEAL ARRNNIY
QGGYGKYQGT YVPLDLGAAI ARNFGVY
QGGYGKYQGT WIPLDVAIEL AERYNI-
QGGYGKYQGT WIPLQEGRQL AERNNI-
QGGYGKYQGT WVPFQRGVDL ATKYKV-
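Before running anything, it can help to check that the header line (7 sequences, 77 columns) matches the data. The following is a minimal Python sketch, assuming the interleaved layout shown above, with each name occupying the first 10 columns of its line in the first block. The function name read_interleaved_phylip is invented for this example.

```python
def read_interleaved_phylip(text):
    """Parse interleaved PHYLIP text into {name: sequence} and verify the header.

    Assumes names fill the first 10 columns of each line in the first block
    and that later blocks contain sequence data only.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seq, n_col = (int(x) for x in lines[0].split()[:2])
    body = lines[1:]
    if n_seq < 1 or len(body) % n_seq != 0:
        raise ValueError("number of data lines is not a multiple of the sequence count")

    names, seqs = [], []
    for i, line in enumerate(body):
        if i < n_seq:
            # First block: split off the 10-character name field.
            names.append(line[:10].strip())
            seqs.append("")
            line = line[10:]
        # Remove the spaces that group residues into blocks of ten.
        seqs[i % n_seq] += "".join(line.split())

    for name, seq in zip(names, seqs):
        if len(seq) != n_col:
            raise ValueError(f"{name}: expected {n_col} columns, found {len(seq)}")
    return dict(zip(names, seqs))
```

Calling it on the contents of infile either returns the seven sequences keyed by name or raises a ValueError naming the first sequence whose length disagrees with the header.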
seqboot
- Read the documentation for the seqboot program.
- Run seqboot on your infile.
- Set your parameters. I have used the defaults for this example. The random seed should be of the form 4n+1.
- The usual outfile is created. Here is the first bootstrap replicate from the run.
7 77
KilA_ESCCO ---------- -RKKGGGYIA TTMMCCRRRL SIISSEIQQQ GGRRRNQQQQ GTWVPIIIAI
Mbp1_SACCE HHSSTGSIMK KRKKDDDWVA TTIILLKRRL E----THEEE GGFFFYQQQQ GTWVLIIIAK
Mbp1_NEUCR VVNNNVAVMR RRHHDDDWVA TTIILLKRRL E----THEEE GGYYYYQQQQ GTWILQQQAE
Mbp1_CANAL TTSSEGPIMR RRKKSSSWIA TTIILLKRRL E----IHEEE GGYYYYQQQQ GTYVLLLLGA
Mbp1_USTMA IINNNVAVMR RRSSDDDWLA TTIILLKRRL E----IHEEE GGYYYYQQQQ GTWILVVVAI
Mbp1_ASPNI ------SVMR RRSSDDDWIA TTIILLKRRL E----VHEEE GGYYYYQQQQ GTWILEEEGR
Mbp1_SCHPO IIKKGVSVMR RRRRSSSWLA TTIILLKRRL E----AHEEE GGYYYYQQQQ GTWVFRRRGV
INNLLAAAQQ Q------
KQQLLAAAEE EKKSSVY
EAALLAAARR RRRNNIY
AAAIIAAARR RNNGGVY
IEELLAAAEE ERRNNI-
RQQLLAAAEE ERRNNI-
VDDLLAAATT TKKKKV-
Note how approximately 1/3 of the columns are repeated copies of other columns.
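The resampling idea behind the replicate above can be sketched in a few lines of Python. This illustrates column resampling with replacement and is not seqboot's actual code. Returning the columns in sorted order is an assumption made to mirror the replicate shown, where repeated columns sit next to each other. All function names are invented for this example.

```python
import random


def bootstrap_columns(n_columns, rng):
    """Draw n_columns column indices with replacement, in sorted order."""
    return sorted(rng.randrange(n_columns) for _ in range(n_columns))


def resample_alignment(seqs, rng):
    """Apply one shared set of resampled columns to every sequence."""
    n_columns = len(next(iter(seqs.values())))
    cols = bootstrap_columns(n_columns, rng)
    return {name: "".join(seq[c] for c in cols) for name, seq in seqs.items()}


def expected_repeat_fraction(n_columns):
    """Expected fraction of replicate positions that repeat an earlier draw.

    The expected number of distinct columns drawn is n * (1 - (1 - 1/n)**n),
    so the remaining fraction, (1 - 1/n)**n, consists of repeats.
    """
    return (1 - 1 / n_columns) ** n_columns


if __name__ == "__main__":
    rng = random.Random(5)  # 5 is of the form 4n+1, as seqboot asks for its own seed
    print(resample_alignment({"seq1": "ABCDEFGHIJ", "seq2": "abcdefghij"}, rng))
    print(round(expected_repeat_fraction(77), 2))  # about 0.37 for 77 columns
```

The same column indices are used for every sequence, so each replicate is still a valid alignment. The expected repeat fraction approaches 1/e for long alignments, which is where the "approximately 1/3" comes from.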
proml
The output of seqboot works for most of the tree estimation programs. Be aware that running time will increase by a factor of 100 for 100 bootstrap replicates.
- Read the documentation for the proml program.
- Rename the previous outfile as the new infile.
- Run proml on your infile.
- Set your parameters. I have used the defaults for this example, except for choosing the option S (not speedy and rough), the M option for multiple datasets and, as prompted, D for data (not weights), the number of replicates (100), a random seed, and "jumbling" only once. (While this is running, 5 minutes or so for my example, you can read up on common input options, such as what "jumble" means.)
- The usual outfile and outtree are created. Have a look. Here are the first two trees from my outfile:
Data set # 1:
+-Mbp1_USTMA
+--------2
| | +-Mbp1_ASPNI
| +--3
| | +-Mbp1_NEUCR
| +------1
| | +--------Mbp1_SCHPO
| +------5
| +----------Mbp1_CANAL
|
4---Mbp1_SACCE
|
+-----------------------KilA_ESCCO
Ln Likelihood = -818.27365
Data set # 2:
+----Mbp1_USTMA
+--3
| | +------Mbp1_NEUCR
+--------1 +---4
| | +Mbp1_ASPNI
+---5 |
| | +-----Mbp1_SCHPO
| |
| +---Mbp1_SACCE
|
2-------Mbp1_CANAL
|
+-----------------------------------------------KilA_ESCCO
Ln Likelihood = -825.36962
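With 100 data sets, scanning outfile by eye for the likelihoods gets tedious. The following Python sketch collects them. It assumes only the two line formats visible in the excerpt above ("Data set # N:" and "Ln Likelihood = value"). The function name is invented for this example.

```python
import re

DATASET = re.compile(r"Data set #\s*(\d+)")
LIKELIHOOD = re.compile(r"Ln Likelihood\s*=\s*(-?\d+(?:\.\d+)?)")


def likelihoods_by_dataset(outfile_text):
    """Return {data set number: ln likelihood} from the text of a proml outfile."""
    result = {}
    current = None
    for line in outfile_text.splitlines():
        match = DATASET.search(line)
        if match:
            current = int(match.group(1))
            continue
        match = LIKELIHOOD.search(line)
        if match and current is not None:
            # Record the likelihood under the most recent "Data set #" heading.
            result[current] = float(match.group(1))
    return result
```

Applied to the two trees shown above, this would give -818.27365 for data set 1 and -825.36962 for data set 2.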
consense
You can use consense to calculate a consensus tree.
- Read the documentation for the consense program.
- Rename the previous outtree as the new intree.
- Run consense on your intree.
- Set your parameters. I have used the defaults for this example.
- The usual outfile is created, along with the consensus tree (outtree). Have a look.
+-------------------------------Mbp1 SCHPO
|
| +-------Mbp1 SACCE
+-------| +--61.0-|
| | +--52.0-| +-------Mbp1 CANAL
| | | |
| +--26.0-| +---------------KilA ESCCO
| |
| | +-------Mbp1 NEUCR
| +----------69.0-|
| +-------Mbp1 ASPNI
|
+---------------------------------------Mbp1 USTMA
The bootstrap values are poor overall. The reason is that the sequences are short to begin with, and eliminating 1/3 of the information by resampling makes the estimation process quite brittle. The topology of the tree is not quite right either: to get the correct species tree, the (SCHPO/USTMA) clade branchpoint would need to be moved up one level in the tree.
This is what the tree looks like when I use retree to redraw it with KilA-N as the outgroup. However, the bootstrap values had to be entered by hand from the data in outfile; PHYLIP can't do that for you :-(
┌─────────│KilA ESCCO
│
│ ┌───────────────────│Mbp1 NEUCR
│ ┌────────0.69─│
──│ │ └───────────────────│Mbp1 ASPNI
│ ┌──0.52───│
│ │ │ ┌───────────────────────────────────────│Mbp1 USTMA
│ │ └0.26│
└─────────│ └───────────────────│Mbp1 SCHPO
│
│ ┌───────────────────│Mbp1 SACCE
└──────0.61─│
└───────────────────│Mbp1 CANAL
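To make concrete what a bootstrap value such as 0.61 means, the following Python sketch counts how often each grouping of taxa occurs across a set of Newick trees, such as the one-tree-per-replicate file that is fed to consense. It is an illustration only and not consense's implementation. It treats trees as unrooted, assumes every tree contains the same taxa, and handles only plain Newick with optional branch lengths. All function names are invented for this example.

```python
import re
from collections import Counter


def leaf_groups(newick):
    """Return (all leaf names, leaf set under each parenthesised group)."""
    stack, groups, prev = [set()], [], None
    for tok in re.findall(r"[(),]|[^(),]+", newick):
        tok = tok.strip()
        if not tok:
            continue
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            groups.append(frozenset(done))
            stack[-1] |= done
        elif tok != "," and prev != ")":
            # A leaf: keep the name, drop any ":length" suffix.
            stack[-1].add(tok.split(":")[0])
        # Text directly after ")" is an internal label or length and is skipped.
        prev = tok
    return frozenset(stack[0]), groups


def split_support(trees_text):
    """Return {split: fraction of trees that contain it} for non-trivial splits."""
    trees = [t for t in trees_text.split(";") if t.strip()]
    counts = Counter()
    for tree in trees:
        leaves, groups = leaf_groups(tree)
        anchor = min(leaves)  # fixed taxon, so each split has one canonical side
        splits = set()
        for group in groups:
            side = leaves - group if anchor in group else group
            if 2 <= len(side) <= len(leaves) - 2:
                splits.add(side)
        counts.update(splits)
    return {split: n / len(trees) for split, n in counts.items()}
```

Each key of the result is the set of taxa on the side of a split that does not contain the alphabetically first taxon, and each value is the fraction of trees in which that split occurs.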
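Further reading and resources
- PHYLIP seqboot documentation: http://evolution.genetics.washington.edu/phylip/doc/seqboot.html
- PHYLIP proml documentation: http://evolution.genetics.washington.edu/phylip/doc/proml.html
- PHYLIP consense documentation: http://evolution.genetics.washington.edu/phylip/doc/consense.html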