BIO bootstrapping with PHYLIP
Bootstrapping PHYLIP trees
A maximally brief overview of how to produce bootstrap results for PHYLIP trees using PROML.
Principle
- Create multiple bootstrapped copies (e.g. 100) of your input data using seqboot.
- Run your tree estimation program of choice using the M input option (analyze multiple data sets).
- Use the program consense to calculate your consensus tree.
Input data
Create a PHYLIP input file with the usual infile filename. Something like this:
7 77
KilA_ESCCO ---------R AKDGYINATS MCRTAGKLLS DYTRLLSRDM GIPISEIQSF
Mbp1_SACCE IHSTGSIMKR KKDDWVNATH ILKAANFAKA KRTRILEKEV LKE--THEKV
Mbp1_NEUCR -VNNVAVMRR RHDDWVNATH ILKAAGFDKP ARTRILEREV QKD--THEKI
Mbp1_CANAL VTSEGPIMRR KKDSWINATH ILKIAKFPKA KRTRILEKDV QTG--IHEKV
Mbp1_USTMA IINNVAVMRR RSDDWLNATQ ILKVVGLDKP QRTRVLEREI QKG--IHEKV
Mbp1_ASPNI -----SVMRR RSDDWINATH ILKVAGFDKP ARTRILEREV QKG--VHEKV
Mbp1_SCHPO -IKGVSVMRR RRDSWLNATQ ILKVADFDKP QRTRVLERQV QIG--AHEKV
KGGRPENQGT WVHPDIAINL AQ-----
QGGFGKYQGT WVPLNIAKQL AEKFSVY
QGGYGRYQGT WIPLEQAEAL ARRNNIY
QGGYGKYQGT YVPLDLGAAI ARNFGVY
QGGYGKYQGT WIPLDVAIEL AERYNI-
QGGYGKYQGT WIPLQEGRQL AERNNI-
QGGYGKYQGT WVPFQRGVDL ATKYKV-
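If the alignment lives in a script rather than in a file, an input file in the layout shown above could be produced along the following lines. This is only a sketch: the function name write_phylip_interleaved and its parameters are made up for this illustration, and it assumes the interleaved layout seen in the example (a count line, names padded to 10 characters, sequence in groups of 10, later blocks without names).

```python
def write_phylip_interleaved(seqs, width=50, group=10):
    """Format aligned sequences as an interleaved PHYLIP alignment.

    seqs:  dict mapping sequence name -> aligned sequence (equal lengths).
    width: residues per line; group: residues per space-separated group.
    Returns the file contents as one string.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()

    # First line: number of sequences and number of aligned columns.
    lines = [f"{len(seqs):>5} {n_sites:>5}"]

    for start in range(0, n_sites, width):
        if start > 0:
            lines.append("")  # blank line between interleaved blocks
        for name, seq in seqs.items():
            chunk = seq[start:start + width]
            grouped = " ".join(
                chunk[i:i + group] for i in range(0, len(chunk), group)
            )
            # Names appear in the first block only, cut or padded to 10 chars.
            prefix = f"{name[:10]:<10} " if start == 0 else ""
            lines.append(prefix + grouped)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = {"Seq_A": "ACDEFGHIKLMN", "Seq_B": "ACDEFGHIKVMN"}
    with open("infile", "w") as handle:
        handle.write(write_phylip_interleaved(demo))
```

Names longer than 10 characters are truncated here, so check that the truncated names are still unique before running any of the PHYLIP programs.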
seqboot
- Read the documentation for the seqboot program.
- Run seqboot on your infile.
- Set your parameters. I have used the defaults for this example. The random seed should be of the form 4n+1.
- The usual outfile is created. Here is the first bootstrap replicate from the run.
7 77
KilA_ESCCO ---------- -RKKGGGYIA TTMMCCRRRL SIISSEIQQQ GGRRRNQQQQ GTWVPIIIAI
Mbp1_SACCE HHSSTGSIMK KRKKDDDWVA TTIILLKRRL E----THEEE GGFFFYQQQQ GTWVLIIIAK
Mbp1_NEUCR VVNNNVAVMR RRHHDDDWVA TTIILLKRRL E----THEEE GGYYYYQQQQ GTWILQQQAE
Mbp1_CANAL TTSSEGPIMR RRKKSSSWIA TTIILLKRRL E----IHEEE GGYYYYQQQQ GTYVLLLLGA
Mbp1_USTMA IINNNVAVMR RRSSDDDWLA TTIILLKRRL E----IHEEE GGYYYYQQQQ GTWILVVVAI
Mbp1_ASPNI ------SVMR RRSSDDDWIA TTIILLKRRL E----VHEEE GGYYYYQQQQ GTWILEEEGR
Mbp1_SCHPO IIKKGVSVMR RRRRSSSWLA TTIILLKRRL E----AHEEE GGYYYYQQQQ GTWVFRRRGV
INNLLAAAQQ Q------
KQQLLAAAEE EKKSSVY
EAALLAAARR RRRNNIY
AAAIIAAARR RNNGGVY
IEELLAAAEE ERRNNI-
RQQLLAAAEE ERRNNI-
VDDLLAAATT TKKKKV-
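Note how approximately 1/3 of the columns are duplicates: seqboot samples columns with replacement, so some columns of the original alignment appear more than once and others are left out.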
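The resampling idea can be sketched in a few lines of Python. This is an illustration of column bootstrapping in general, not a reimplementation of seqboot; the function names are invented for this example, and the sampled columns are sorted only so that duplicates sit side by side, as they appear to in the replicate above.

```python
import random


def bootstrap_columns(seqs, rng):
    """Resample alignment columns with replacement.

    seqs: list of equal-length aligned sequences.
    rng:  a random.Random instance.
    Returns (resampled sequences, sorted list of sampled column indices).
    """
    n_sites = len(seqs[0])
    # Draw as many columns as the alignment has, with replacement.
    cols = sorted(rng.randrange(n_sites) for _ in range(n_sites))
    resampled = ["".join(seq[c] for c in cols) for seq in seqs]
    return resampled, cols


def duplicate_fraction(cols):
    """Fraction of sampled columns that repeat an already sampled column."""
    return 1 - len(set(cols)) / len(cols)


if __name__ == "__main__":
    rng = random.Random(5)  # 5 is of the form 4n+1
    alignment = ["ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKVMNPQRSTVWF"]
    replicate, cols = bootstrap_columns(alignment, rng)
    for row in replicate:
        print(row)
    print("duplicate fraction:", duplicate_fraction(cols))
```

For long alignments the expected duplicate fraction approaches 1/e, roughly 0.37, which is consistent with the "approximately 1/3" seen above.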
proml
The output of seqboot works for most of the tree estimation programs. Be aware that running time will increase by a factor of 100 for 100 bootstrap replicates.
- Read the documentation for the proml program.
- Rename the previous outfile as the new infile.
- Run proml on your infile.
- Set your parameters. I have used the defaults for this example, except for choosing the M option for multiple datasets and, as prompted, D for data (not weights), the number of replicates (100), a random seed, and "jumbling" only once. (While this is running, you can read about common input options, such as what "jumble" means, here.)
- The usual outfile and outtree are created. Have a look.
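The menu answers described above can also be fed to proml non-interactively. The sketch below is built on several assumptions: that the executable is on the path under the name proml, that the prompts arrive in the order listed above (M, D, number of data sets, seed, number of jumbles), and that a final Y accepts the settings. Prompt order can differ between PHYLIP versions, so compare it with an interactive run first. The function names are invented for this example, and the 4n+1 check simply applies the seed advice given for seqboot.

```python
import subprocess


def proml_keystrokes(n_replicates=100, seed=5, n_jumbles=1):
    """Build the text to send to proml's menu on standard input.

    Assumed prompt order (check against your PHYLIP version):
    M -> D -> number of data sets -> random seed -> jumbles -> Y.
    """
    if seed % 4 != 1:
        raise ValueError("seed should be of the form 4n+1")
    answers = [
        "M",                 # analyze multiple data sets
        "D",                 # multiple data sets, not weights
        str(n_replicates),   # number of bootstrap replicates in infile
        str(seed),           # random number seed
        str(n_jumbles),      # number of times to jumble
        "Y",                 # accept the settings and start
    ]
    return "\n".join(answers) + "\n"


def run_proml(executable="proml", **options):
    """Run proml in the current directory, which must contain infile."""
    return subprocess.run(
        [executable],
        input=proml_keystrokes(**options),
        text=True,
        capture_output=True,
        check=True,
    )
```

Calling run_proml() in the directory that holds the renamed infile should then leave outfile and outtree behind, just as the interactive session does. Existing outfile or outtree files may trigger an extra overwrite prompt that this sketch does not answer.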
Notes
Further reading and resources