Latest revision as of 14:36, 27 November 2013
Bootstrapping PHYLIP trees
A brief overview of how to produce bootstrapping results for PHYLIP trees.
Principle
- Create multiple bootstrapped copies (e.g. 100) of your input data using seqboot.
- Run your tree estimation program of choice using the M input option (analyze multiple data sets).
- Use the program consense to calculate your consensus tree.
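Between the three steps, each program's output has to be renamed so that the next program finds it under its expected input name (outfile becomes infile, outtree becomes intree). A minimal sketch of that handoff, assuming all files live in one working directory; `hand_off` is a hypothetical helper written for this page, not part of PHYLIP:

```python
from pathlib import Path
import tempfile


def hand_off(workdir, produced, consumed):
    """Rename one program's output so the next program finds it as input."""
    src = Path(workdir) / produced
    dst = Path(workdir) / consumed
    if not src.exists():
        raise FileNotFoundError(f"{src} not found; did the previous step run?")
    src.replace(dst)  # overwrites an older file of the same name
    return dst


# Demonstration in a scratch directory with a stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "outfile").write_text("stand-in for seqboot output\n")
    new_input = hand_off(tmp, "outfile", "infile")
    handoff_ok = new_input.name == "infile" and not (Path(tmp) / "outfile").exists()

print(handoff_ok)
```

In a real run you would call `hand_off(".", "outfile", "infile")` after seqboot and `hand_off(".", "outtree", "intree")` after the tree estimation step.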
Input data
Create a PHYLIP input file with the usual infile filename. Something like this:
7 77
KilA_ESCCO ---------R AKDGYINATS MCRTAGKLLS DYTRLLSRDM GIPISEIQSF
Mbp1_SACCE IHSTGSIMKR KKDDWVNATH ILKAANFAKA KRTRILEKEV LKE--THEKV
Mbp1_NEUCR -VNNVAVMRR RHDDWVNATH ILKAAGFDKP ARTRILEREV QKD--THEKI
Mbp1_CANAL VTSEGPIMRR KKDSWINATH ILKIAKFPKA KRTRILEKDV QTG--IHEKV
Mbp1_USTMA IINNVAVMRR RSDDWLNATQ ILKVVGLDKP QRTRVLEREI QKG--IHEKV
Mbp1_ASPNI -----SVMRR RSDDWINATH ILKVAGFDKP ARTRILEREV QKG--VHEKV
Mbp1_SCHPO -IKGVSVMRR RRDSWLNATQ ILKVADFDKP QRTRVLERQV QIG--AHEKV
KGGRPENQGT WVHPDIAINL AQ-----
QGGFGKYQGT WVPLNIAKQL AEKFSVY
QGGYGRYQGT WIPLEQAEAL ARRNNIY
QGGYGKYQGT YVPLDLGAAI ARNFGVY
QGGYGKYQGT WIPLDVAIEL AERYNI-
QGGYGKYQGT WIPLQEGRQL AERNNI-
QGGYGKYQGT WVPFQRGVDL ATKYKV-
seqboot
- Read the documentation for the seqboot program.
- Run seqboot on your infile.
- Set your parameters. I have used the defaults for this example. The random seed should be of the form 4n+1.
- The usual outfile is created. Here is the first bootstrap replicate from the run.
7 77
KilA_ESCCO ---------- -RKKGGGYIA TTMMCCRRRL SIISSEIQQQ GGRRRNQQQQ GTWVPIIIAI
Mbp1_SACCE HHSSTGSIMK KRKKDDDWVA TTIILLKRRL E----THEEE GGFFFYQQQQ GTWVLIIIAK
Mbp1_NEUCR VVNNNVAVMR RRHHDDDWVA TTIILLKRRL E----THEEE GGYYYYQQQQ GTWILQQQAE
Mbp1_CANAL TTSSEGPIMR RRKKSSSWIA TTIILLKRRL E----IHEEE GGYYYYQQQQ GTYVLLLLGA
Mbp1_USTMA IINNNVAVMR RRSSDDDWLA TTIILLKRRL E----IHEEE GGYYYYQQQQ GTWILVVVAI
Mbp1_ASPNI ------SVMR RRSSDDDWIA TTIILLKRRL E----VHEEE GGYYYYQQQQ GTWILEEEGR
Mbp1_SCHPO IIKKGVSVMR RRRRSSSWLA TTIILLKRRL E----AHEEE GGYYYYQQQQ GTWVFRRRGV
INNLLAAAQQ Q------
KQQLLAAAEE EKKSSVY
EAALLAAARR RRRNNIY
AAAIIAAARR RNNGGVY
IEELLAAAEE ERRNNI-
RQQLLAAAEE ERRNNI-
VDDLLAAATT TKKKKV-
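Note how approximately 1/3 of the columns are duplicates: because columns are sampled with replacement, some are drawn more than once and others are left out.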
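The resampling idea can be sketched in a few lines of Python. This is an illustration of sampling alignment columns with replacement, not seqboot's actual implementation; the function names are made up for this example, the toy alignment is the first ten columns of three sequences from the input above, and the sampled columns are sorted only so that the replicate reads like the one shown.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build one replicate by sampling alignment columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment has columns.
    picks = sorted(rng.randrange(n_cols) for _ in range(n_cols))
    replicate = {name: "".join(seq[i] for i in picks)
                 for name, seq in alignment.items()}
    return replicate, picks


def expected_fraction_left_out(n_cols):
    """Chance that a given column is never drawn in n_cols draws."""
    return (1.0 - 1.0 / n_cols) ** n_cols


toy = {
    "KilA_ESCCO": "---------R",
    "Mbp1_SACCE": "IHSTGSIMKR",
    "Mbp1_NEUCR": "-VNNVAVMRR",
}
replicate, picks = bootstrap_columns(toy, random.Random(5))
left_out = expected_fraction_left_out(77)  # 77 columns, as in the infile

for name, seq in replicate.items():
    print(name, seq)
print(round(left_out, 3))
```

For the 77-column alignment used here, the expected fraction of original columns that never make it into a replicate is a little over one third, which matches the duplication visible in the replicate above.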
proml
The output of seqboot works for most of the tree estimation programs. Be aware that running time will increase by a factor of 100 for 100 bootstrap replicates.
- Read the documentation for the proml program.
- Rename the previous outfile as the new infile.
- Run proml on your infile.
- Set your parameters. I have used the defaults for this example, except for choosing the option S (not speedy and rough) and the M option for multiple datasets, followed, as prompted, by D for data (not weights), the number of replicates (100), a random seed, and "jumbling" only once. (While this is running, 5 minutes or so for my example, you can read about common input options, such as what "jumble" means, at http://evolution.genetics.washington.edu/phylip/doc/main.html#options.)
- The usual outfile and outtree are created. Have a look. Here are the first two trees from my outfile:
Data set # 1:
+-Mbp1_USTMA
+--------2
| | +-Mbp1_ASPNI
| +--3
| | +-Mbp1_NEUCR
| +------1
| | +--------Mbp1_SCHPO
| +------5
| +----------Mbp1_CANAL
|
4---Mbp1_SACCE
|
+-----------------------KilA_ESCCO
Ln Likelihood = -818.27365
Data set # 2:
+----Mbp1_USTMA
+--3
| | +------Mbp1_NEUCR
+--------1 +---4
| | +Mbp1_ASPNI
+---5 |
| | +-----Mbp1_SCHPO
| |
| +---Mbp1_SACCE
|
2-------Mbp1_CANAL
|
+-----------------------------------------------KilA_ESCCO
Ln Likelihood = -825.36962
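With 100 data sets, scanning the outfile by eye gets tedious. A small sketch for pulling out the log likelihood of every replicate, assuming each tree is followed by a line of the form "Ln Likelihood = ..." as in the excerpt above; `read_log_likelihoods` is a hypothetical helper name:

```python
import re

# Matches lines such as "Ln Likelihood = -818.27365".
LNL_PATTERN = re.compile(r"Ln Likelihood\s*=\s*(-?\d+(?:\.\d+)?)")


def read_log_likelihoods(text):
    """Return the log likelihoods found in the text, in file order."""
    return [float(match.group(1)) for match in LNL_PATTERN.finditer(text)]


# The two values shown above, as they would appear in the outfile.
sample = """Data set # 1:
Ln Likelihood = -818.27365
Data set # 2:
Ln Likelihood = -825.36962
"""
values = read_log_likelihoods(sample)
print(len(values), min(values), max(values))
```

For a real run, read the whole outfile into a string and pass it to the function; the number of values found should equal the number of replicates.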
consense
You can use consense to calculate a consensus tree.
- Read the documentation for the consense program.
- Rename the previous outtree as the new intree.
- Run consense on your intree.
- Set your parameters. I have used the defaults for this example.
- The usual outfile is created, and the consensus tree (outtree). Have a look.
+-------------------------------Mbp1 SCHPO
|
| +-------Mbp1 SACCE
+-------| +--61.0-|
| | +--52.0-| +-------Mbp1 CANAL
| | | |
| +--26.0-| +---------------KilA ESCCO
| |
| | +-------Mbp1 NEUCR
| +----------69.0-|
| +-------Mbp1 ASPNI
|
+---------------------------------------Mbp1 USTMA
The bootstrap values are poor overall. The reason is that the sequences are short to begin with, and eliminating 1/3 of the information by resampling makes the estimation process quite brittle. The topology of the tree is not quite right either: to get the correct species tree, the branchpoint of the (SCHPO/USTMA) clade would need to be moved up one level in the tree.
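To make clear what a bootstrap value measures, here is a sketch that counts how often each grouping of taxa occurs in a set of Newick trees. It is not consense's implementation: it only tallies split frequencies, assumes plain Newick with optional branch lengths and no quoted names, and uses four invented toy trees rather than real proml output. The function names are hypothetical.

```python
from collections import Counter


def split_trees(text):
    """Split the text of a tree file into one Newick string per tree."""
    return [piece.strip() + ";" for piece in text.split(";") if piece.strip()]


def clades(newick):
    """Return the leaf set under every parenthesised group, plus all taxa."""
    stack, found = [set()], []
    name, in_length, after_close = "", False, False
    for ch in newick:
        if ch in "(),:;":
            # A finished token is a leaf name unless it was a branch length
            # or a label written directly after a closing parenthesis.
            if name and not in_length and not after_close:
                stack[-1].add(name)
            name = ""
            in_length = (ch == ":")
            after_close = (ch == ")")
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                done = stack.pop()
                found.append(frozenset(done))
                stack[-1] |= done
        elif not ch.isspace() and not in_length:
            name += ch
    return found, frozenset(stack[0])


def split_support(newick_trees, reference):
    """Fraction of trees containing each non-trivial split.

    Each split is reported as the side that does not contain `reference`,
    so the same split is counted identically however the tree is written.
    """
    counts = Counter()
    for tree in newick_trees:
        groups, taxa = clades(tree)
        seen = set()
        for group in groups:
            side = taxa - group if reference in group else group
            if 2 <= len(side) <= len(taxa) - 2:
                seen.add(side)
        counts.update(seen)
    return {split: n / len(newick_trees) for split, n in counts.items()}


# Four invented toy trees (not real output), one per bootstrap replicate.
toy_text = """
(KilA_ESCCO:0.5,(Mbp1_NEUCR:0.1,Mbp1_ASPNI:0.1):0.2,(Mbp1_SACCE:0.1,Mbp1_CANAL:0.1):0.2);
(KilA_ESCCO:0.5,(Mbp1_NEUCR:0.1,Mbp1_ASPNI:0.1):0.2,(Mbp1_SACCE:0.1,Mbp1_CANAL:0.1):0.2);
(KilA_ESCCO:0.5,(Mbp1_NEUCR:0.1,Mbp1_SACCE:0.1):0.2,(Mbp1_ASPNI:0.1,Mbp1_CANAL:0.1):0.2);
((Mbp1_NEUCR:0.1,Mbp1_ASPNI:0.1):0.2,KilA_ESCCO:0.5,(Mbp1_CANAL:0.1,Mbp1_SACCE:0.1):0.2);
"""
trees = split_trees(toy_text)
support = split_support(trees, reference="KilA_ESCCO")
for split, fraction in sorted(support.items(), key=lambda item: -item[1]):
    print(sorted(split), fraction)
```

A value such as 69.0 on the consensus tree above has the same meaning: the grouping to the right of that branch was recovered in roughly that share of the 100 replicate trees.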
This is what the tree looks like when I use retree to redraw it with KilA-N as the outgroup. However, the bootstrap values had to be entered by hand from the data in outfile; PHYLIP can't do that for you :-(
┌─────────│KilA ESCCO
│
│ ┌───────────────────│Mbp1 NEUCR
│ ┌────────0.69─│
──│ │ └───────────────────│Mbp1 ASPNI
│ ┌──0.52───│
│ │ │ ┌───────────────────────────────────────│Mbp1 USTMA
│ │ └0.26│
└─────────│ └───────────────────│Mbp1 SCHPO
│
│ ┌───────────────────│Mbp1 SACCE
└──────0.61─│
└───────────────────│Mbp1 CANAL
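Further reading and resources
- PHYLIP seqboot documentation: http://evolution.genetics.washington.edu/phylip/doc/seqboot.html
- PHYLIP proml documentation: http://evolution.genetics.washington.edu/phylip/doc/proml.html
- PHYLIP consense documentation: http://evolution.genetics.washington.edu/phylip/doc/consense.html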