Tools for the bioinformatics lab

From "A B C"


The contents of this page have recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be structured differently, outdated syntax, and/or broken links. Use with caution!


Summary ...



EMBOSS

EMBOSS installation

Outdated (written 2006).

Download

1. Navigate to the EMBOSS download page on SourceForge and read the information on the latest download there. As of this writing, the latest major release is version 3.0.
2. Download this compressed archive.
3. Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):
gunzip EMBOSS-3.0.0.tar.gz
tar -xvf EMBOSS-3.0.0.tar
rm EMBOSS-3.0.0.tar
cd EMBOSS-3.0.0

Before you begin, it may be a good idea to browse through some of the downloaded files to get oriented. These include:

INSTALL
KNOWN_BUGS  (this is an empty file in this release)
README
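If you prefer fewer steps, both GNU tar and BSD tar can decompress gzip archives directly with the -z option, which replaces the gunzip/tar/rm sequence above. The following is a minimal sketch. unpack_tarball is a hypothetical helper name of mine, and it assumes the archive unpacks into a directory with the same base name (true for EMBOSS-3.0.0.tar.gz, but check with tar -tzf for other packages).

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a .tar.gz source archive in one step and list
# the orientation files (INSTALL, README, KNOWN_BUGS) that it contains.
# tar -z decompresses on the fly, so no intermediate .tar file is left behind.
unpack_tarball() {
    local archive="$1"
    local dir="${archive%.tar.gz}"   # assumption: archive unpacks into a directory of the same base name
    tar -xzf "$archive" || return 1
    local f
    for f in INSTALL README KNOWN_BUGS; do
        [ -e "$dir/$f" ] && echo "$dir/$f"
    done
    return 0
}
```

Usage: unpack_tarball EMBOSS-3.0.0.tar.gz && cd EMBOSS-3.0.0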

Compile

EMBOSS requires a number of system-specific options to be set, so its makefile must be generated before it can be used, by running the configure script. Type:

./configure

Then type

make

Compilation will run for some time. Then type

sudo make install

and finally

make clean
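If you do not have sudo rights, autoconf-generated configure scripts accept the standard --prefix option, which installs everything below a directory you own instead of /usr/local. Here is a sketch, assuming EMBOSS honours --prefix like other autoconf packages. build_into_prefix is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: configure, build, install and clean a source tree into
# a user-writable prefix, so no sudo is needed.
# Assumption: the package's configure script accepts the standard --prefix option.
build_into_prefix() {
    local srcdir="$1" prefix="$2"
    (   # subshell: the cd does not change the caller's working directory
        cd "$srcdir" &&
        ./configure --prefix="$prefix" &&
        make &&
        make install &&
        make clean
    )   # the && chain stops at the first step that fails
}
```

Usage: build_into_prefix EMBOSS-3.0.0 "$HOME/local", then add "$HOME/local/bin" to your PATH. With this prefix the data files should end up under $HOME/local/share/EMBOSS rather than /usr/local/share/EMBOSS.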

Test

First see whether installation was successful in principle. Typing

ls /usr/local/share/EMBOSS/data/

should list some of the data resources that have been installed and where they are located. Now open a new shell and type

tfm needle

You should see man-like help pages for EMBOSS commands. In fact, tfm is itself an EMBOSS command: it runs a program that formats and displays help files. If the above works, it tells you two things: (i) that EMBOSS programs have been compiled and installed, and (ii) that the installation is on your PATH.
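The same check can be run for several programs at once with the shell builtin command -v, which prints where a command is found on your PATH and fails otherwise. The following is a minimal sketch. check_on_path is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are (not) on the PATH.
# Returns 0 if every command was found, 1 if at least one is missing.
check_on_path() {
    local missing=0 cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd -> $(command -v "$cmd")"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: check_on_path tfm needle water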

Next, try a simple pairwise alignment. Create two sequence files (2):

HBA.fa
>HBA_HUMAN
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKL
LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
HBB.fa
>HBB_HUMAN
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLST
PDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDP
ENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

and type the following command (note: I am using the line-continuation character "\" here to wrap the command, but it could also be typed all on one line):

needle -asequence HBA.fa -bsequence HBB.fa \
-gapopen 10.0 -gapextend 0.5 -datafile EBLOSUM62 \
-outfile test.ali

Then typing

cat test.ali

should give you the following output (the Rundate and Report_file lines will reflect your own run):

########################################
# Program: needle
# Rundate: Sun Mar 02 2006 14:32:03
# Align_format: srspair
# Report_file: test2.ali
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: HBA_HUMAN
# 2: HBB_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 148
# Identity:      63/148 (42.6%)
# Similarity:    88/148 (59.5%)
# Gaps:           9/148 ( 6.1%)
# Score: 290.5
# 
#
#=======================================

HBA_HUMAN          1 -VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DL     48
                      .|:|.:|:.|.|.||||  :..|.|.|||.|:.:.:|.|:.:|..| ||
HBB_HUMAN          1 VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDL     48

HBA_HUMAN         49 S-----HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRV     93
                     |     .|:.:||.|||||..|.::.:||:|::....:.||:||..||.|
HBB_HUMAN         49 STPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV     98

HBA_HUMAN         94 DPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR    141
                     ||.||:||.:.|:..||.|...||||.|.|:..|.:|.|:..|..||.
HBB_HUMAN         99 DPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH    146


#---------------------------------------
#---------------------------------------
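To sanity-check the summary header, note that each percentage is simply the count divided by Length. For example, 63/148 = 42.6% identity. The following awk sketch extracts these counts from an srspair report and recomputes the percentages. summarize_srspair is a hypothetical name, and the sketch assumes the header layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the Identity, Similarity and Gaps lines of an
# EMBOSS srspair report and print "name count length percent", recomputing
# percent as 100 * count / length to one decimal place.
summarize_srspair() {
    awk '/^# (Identity|Similarity|Gaps):/ {
             name = $2; sub(/:$/, "", name)   # "Identity:" -> "Identity"
             split($3, frac, "/")             # "63/148" -> frac[1]=63, frac[2]=148
             printf "%s %d %d %.1f\n", name, frac[1], frac[2], 100 * frac[1] / frac[2]
         }' "$1"
}
```

For the report above, summarize_srspair test.ali should print Identity 63 148 42.6, Similarity 88 148 59.5 and Gaps 9 148 6.1, agreeing with the percentages needle reports.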



Notes

(1) In these notes, I assume the "." directory is on your PATH - if it is not, you may have to prepend "./" to commands to tell the operating system the executable file for the command is in your current working directory.

(2) My favorite quick-and-dirty way to create a text file (e.g. file.txt) from something I can copy and paste is to type

cat > file.txt

then I simply paste the contents and close the file with <ctrl>d. Ask me if you don't understand how this works.
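If you would rather not rely on <ctrl>d, a shell "here document" does the same job and can be kept in a script. The sketch below writes HBA.fa that way and then counts residues per record with awk as a quick check. fasta_lengths is a hypothetical helper name. For HBA_HUMAN it should report 141 residues, which matches the last position in the needle alignment.

```shell
#!/usr/bin/env bash
# Write HBA.fa non-interactively: everything up to the line "EOF" goes into the file.
cat > HBA.fa <<'EOF'
>HBA_HUMAN
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKL
LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
EOF

# Hypothetical helper: print "name length" for every record in a FASTA file
# (name = first word of the header line without the ">").
fasta_lengths() {
    awk '/^>/ { if (name) print name, len; name = substr($1, 2); len = 0; next }
         { len += length($0) }
         END { if (name) print name, len }' "$1"
}

fasta_lengths HBA.fa    # HBA_HUMAN 141
```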


Phylip

Phylip installation

Outdated (written 2006).
Download
1. Navigate to the download section of the PHYLIP homepage.
2. Read the instructions ... depending on your platform, there may be an easier way than installing from source. Nevertheless, since this is the most general approach, I will compile from source here.
3. Download the compressed archive.
4. Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):
gunzip phylip-3.65.tar.gz
tar -xvf phylip-3.65.tar
cd phylip3.65/src
Compile

PHYLIP uses graphical routines in some of its programs. These have to be linked against the X11 (X Window System) libraries, and the Makefile should know where to find them on your system. On my Mac I need to type make -f Makefile.osx install; on your Linux boxes it should simply work in the standard way. Type:

make install

On my system the whole package compiles with almost no warnings. Bravo Joe Felsenstein, for understanding the benefit of writing plain, robust, portable code.
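If you build on both kinds of machines, the makefile can be chosen from the output of uname -s, which reports Darwin on Mac OS X. The following is a minimal sketch. phylip_make_cmd is a hypothetical name, and it only knows the two cases described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PHYLIP build command for this operating system.
# Darwin (Mac OS X) uses Makefile.osx; anything else uses the default Makefile.
# An OS name can be passed as $1 to override `uname -s`.
phylip_make_cmd() {
    case "${1:-$(uname -s)}" in
        Darwin) echo "make -f Makefile.osx install" ;;
        *)      echo "make install" ;;
    esac
}
```

Usage, from inside the src directory: eval "$(phylip_make_cmd)"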

The executables are placed in the exe directory of the distribution and, as usual, must be made available on your PATH: either change your PATH to include that directory or (my preferred way) copy them to /usr/local/bin.

cd ..
ls -l exe
sudo cp exe/* /usr/local/bin
Test

...

That should be all


Clustal

This is provided for reference, but you should know that CLUSTAL is an inferior algorithm for protein MSA.

Clustal installation

Outdated (written 2006).


Download
1. Navigate to the CLUSTAL homepage at the EBI.
2. At the top of the page, there are icons for Mac and Linux installations (Windows too). Clicking on the Apple folder downloads the latest precompiled Mac OS X version (clustalw1.82.mac-osx.tar.gz as of this writing). Don't do this even if you are on a Mac! (1) Clicking on the folder with the penguin icon takes you to an ftp directory that also contains sources for parallel-architecture machines. clustalw1.83.UNIX.tar.gz appears to be the current UNIX version as of this writing. Download this compressed archive.
3. Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):
gunzip clustalw1.83.UNIX.tar.gz
tar -xvf clustalw1.83.UNIX.tar
cd clustalw1.83
Compile

Type:

make

On my system this compiles with one warning about a redefined symbol, which does not appear to be of any consequence. The executable clustalw is generated. The makefile is not really up to standard, since it has no provisions for make test or make install, so we will run our own very simple test and installation. Remove the object files (they are no longer needed once they have been compiled and linked into the executable). Type

make clean

Since you no longer need the C sources either (and could download them from the server again at any time if you did), you may also type

rm *.c *.h

to clean up the directory.


Test

The directory contains a test input named globin.pep. These are globin sequences in the older PIR format. I have converted them to FASTA format below. Copy the following sequences and save them in a file named globin.mfa.

>HBB_HUMAN
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLST
PDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDP
ENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

>HBB_HORSE
VQLSGEEKAAVLALWDKVNEEEVGGEALGRLLVVYPWTQRFFDSFGDLSN
PGAVMGNPKVKAHGKKVLHSFGEGVHHLDNLKGTFAALSELHCDKLHVDP
ENFRLLGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH

>HBA_HUMAN
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKL
LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

>HBA_HORSE
VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSH
GSAQVKAHGKKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKL
LSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR

>MYG_PHYCA
VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLK
TEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIP
IKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELG
YQG

>GLB5_PETMA
PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQE
FFPKFKGLTTADQLKKSADVRWHAERIINAVNDAVASMDDTEKMSMKLRD
LSGKHAKSFQVDPQYFKVLAAVIADTVAAGDAGFEKLMSMICILLRSAY

>LGB2_LUPLU
GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGT
SEVPQNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSK
GVADAHFPVVKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMN
DAA
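If you would rather convert globin.pep yourself than copy the sequences above, note that the PIR (NBRF) layout is simple. Each record has a header line such as >P1;HBB_HUMAN, one free-text title line, and then the sequence terminated by *. The awk sketch below assumes globin.pep follows exactly that layout. pir_to_fasta is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert PIR/NBRF records to FASTA.
# Assumes each record is: ">XX;NAME" header, one title line, sequence ending in "*".
pir_to_fasta() {
    awk '/^>[A-Z][A-Z0-9];/ { print ">" substr($0, 5); skip = 1; next }  # ">P1;NAME" -> ">NAME"
         skip { skip = 0; next }                  # drop the free-text title line
         { gsub(/[* \t]/, ""); if (length($0)) print }' "$1"   # strip "*" and blanks
}
```

Usage: pir_to_fasta globin.pep > globin.mfa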

Now type

clustalw -options

to verify that the program runs in principle (this will print a list of the commandline options). Then run the following command:

clustalw -infile=globin.mfa

This should have created the two files globin.aln and globin.dnd with the following contents:

globin.aln
CLUSTAL W (1.83) multiple sequence alignment


HBB_HUMAN       --------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST
HBB_HORSE       --------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN
HBA_HUMAN       ---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-
HBA_HORSE       ---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-
GLB5_PETMA      PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT
MYG_PHYCA       ---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT
LGB2_LUPLU      --------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE
                          *:  :   :   *  .           :  .:   * :   *  :   . 

HBB_HUMAN       PDAVMGNPKVKAHGKKVLGAFSDGLAHLDN-----LKGTFATLSELHCDKLHVDPENFRL
HBB_HORSE       PGAVMGNPKVKAHGKKVLHSFGEGVHHLDN-----LKGTFAALSELHCDKLHVDPENFRL
HBA_HUMAN       ----HGSAQVKGHGKKVADALTNAVAHVDD-----MPNALSALSDLHAHKLRVDPVNFKL
HBA_HORSE       ----HGSAQVKAHGKKVGDALTLAVGHLDD-----LPGALSNLSDLHAHKLRVDPVNFKL
GLB5_PETMA      ADQLKKSADVRWHAERIINAVNDAVASMDDT--EKMSMKLRDLSGKHAKSFQVDPQYFKV
MYG_PHYCA       EAEMKASEDLKKHGVTVLTALGAILKKKGH-----HEAELKPLAQSHATKHKIPIKYLEF
LGB2_LUPLU      VP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKG-VADAHFPV
                      . .:: *.  :   .                  :  *.  *  .  :    : .

HBB_HUMAN       LGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH------
HBB_HORSE       LGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH------
HBA_HUMAN       LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR------
HBA_HORSE       LSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR------
GLB5_PETMA      LAAVIADTVAAG---------DAGFEKLMSMICILLRSAY-------
MYG_PHYCA       ISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
LGB2_LUPLU      VKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA---
                :   :  .:            ...       .   :         
globin.dnd
(
(
(
(
HBB_HUMAN:0.08080,
HBB_HORSE:0.08359)
:0.23578,
(
HBA_HUMAN:0.06516,
HBA_HORSE:0.05541)
:0.19444)
:0.07579,
GLB5_PETMA:0.37023)
:0.02699,
MYG_PHYCA:0.37220,
LGB2_LUPLU:0.47094);
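Two quick checks can be run on these files. First, an alignment program must never change the residues. Deleting the gaps from a sequence's rows in globin.aln should therefore give back exactly the sequence from globin.mfa. Second, globin.dnd is the guide tree in Newick format, where the number after each colon is a branch length. Both helpers below are hypothetical sketches. aln_ungapped assumes the rows carry no trailing residue numbers, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild one sequence from a CLUSTAL-format alignment by
# concatenating its rows from every block and deleting the gap characters.
aln_ungapped() {
    awk -v id="$2" '$1 == id { s = s $2 }
                    END { gsub(/-/, "", s); print s }' "$1"
}

# Hypothetical helper: list the leaves of a Newick tree with the branch length
# leading to each one (tab-separated). Internal branch lengths, written after ")",
# are skipped because no name precedes them. Negative lengths are allowed.
newick_leaves() {
    tr -d ' \n' < "$1" | grep -oE '[A-Za-z0-9_]+:-?[0-9.]+' | tr ':' '\t'
}
```

For the files above, aln_ungapped globin.aln HBA_HUMAN should print the HBA_HUMAN sequence on one line. newick_leaves globin.dnd should list the seven sequences, starting with HBB_HUMAN and 0.08080.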

Install

To be able to run clustalw from the command line, it needs to be in a directory on your PATH. This can be done either by putting the program into a directory that is already on the PATH, or by modifying the PATH appropriately. My preferred way is to keep such executables in /usr/local/bin, the directory in which I keep my locally installed programs, so first I copy the executable there (type echo $PATH if you are not sure that this directory is on the PATH of your own machine).

sudo cp clustalw /usr/local/bin

Finally I copy the help file into the same directory, so clustalw can find it:

sudo cp clustalw_help /usr/local/bin
That should be all
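The other option mentioned above, modifying the PATH, looks like this in bash. The following is a minimal sketch. path_prepend is a hypothetical name. It adds a directory to the front of PATH only if the directory is not already listed. Put the call in ~/.bashrc (or your shell's startup file) to make it permanent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                       # already on PATH, nothing to do
        *) PATH="$1${PATH:+:$PATH}" ;;     # avoid a trailing ":" if PATH was empty
    esac
    export PATH
}
```

Usage: path_prepend "$HOME/bin"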



Notes
(1) The Mac OS X archives contain two compiled binaries and an unintelligible readme.html. The binaries appear to run but lack any help files or documentation. This is useless pseudo-support: the only thing it saves you is the trivial task of compiling the executables, whereas when you compile from source you at least get the complete kit, with everything nicely in its place.


 

T-Coffee

T-Coffee installation

Outdated (written 2006).
Download
1. Navigate to the T-Coffee project homepage and find the link to the latest Unix or Windows version (3.79 as of this writing).
2. Download this compressed archive.
3. Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):
gunzip T-COFFEE_distribution_Version_3.79.tar.gz
tar -xvf T-COFFEE_distribution_Version_3.79.tar
cd T-COFFEE_distribution_Version_3.79
Compile

T-Coffee provides an installation script for its standard installation. Type

./install

On my system this compiles with a spate of warnings about incompatible implicit declarations of built-in functions, but the tests appear to run OK. The executable t_coffee is generated and moved to the directory bin/. Remove the object files (they are no longer needed once they have been compiled and linked into the executable). Type

cd t_coffee_source
make clean
cd .. 
Install

To be able to run T-Coffee from the command line, it needs to be in a directory on your PATH. This can be done either by putting the program into a directory that is already on the PATH, or by modifying the PATH appropriately. My preferred way is to keep such executables in /usr/local/bin, the directory in which I keep my locally installed programs, so I copy the executable there (type echo $PATH if you are not sure that this directory is on the PATH of your own machine).

sudo cp bin/t_coffee /usr/local/bin
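If you lack sudo rights, you can get the same result with a personal bin directory. Here is a sketch using the standard install(1) utility, which copies a file and sets its permissions in one step. local_install and the default $HOME/bin are my assumptions, and that directory must then be on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install one program into a personal bin directory
# (default $HOME/bin) without sudo. "install -d" creates the directory if
# needed; "install -m 755" copies the file and makes it executable.
local_install() {
    local prog="$1" dest="${2:-$HOME/bin}"
    install -d "$dest" && install -m 755 "$prog" "$dest"/
}
```

Usage: local_install bin/t_coffee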
Test

As a final test, type the following (this assumes you are still in the main directory of the distribution):

t_coffee test/test.pep -in fast_pair -outfile=test.aln -outorder=input>/dev/null
cat test.aln

should produce the following output (the CPU time in the header line will vary):

CLUSTAL FORMAT for T-COFFEE Version_3.79, CPU=0.16 sec, SCORE=24, Nseq=5, Len=87 

1aboA           ---nlf-valydfvasgdntlsitkgeklrvlg-----------ynhnge--wceaqtkn
1ycsB           --kgvi-yalwdyepqnddelpmkegdcmtiih-----------rededeiewwwarlnd
1pht            ---gyqyralydykkereedidlhlgdiltvnkgslvalgfsdgqearpeeigwlngyne
1ihvA           -nfrvy-y--------rdsrdpvwkgpakllwk-----------gegavv-----iqdns
1vie            drvrkk-s--------gaawqgqivgwyctnlt-----------pegyaves-----eah
                                         *                   :              

1aboA           gq-----------gwvpsnyitpvn--
1ycsB           ke-----------gyvprnllglyp--
1pht            ttgergdfpgtyveyigrkkisp----
1ihvA           di-----------kvvprrkakiird-
1vie            pg-----------svqiypvaalerin
                                           
That should be all
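A quick consistency check on this output: summed over all blocks, every row must have the same length, equal to Len= in the header. For example, the 1pht row has 60 + 27 = 87 residues and gaps. Here is a sketch in awk. aln_lengths is a hypothetical name, and it assumes rows of the form "name gapped-sequence" with no trailing residue numbers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each sequence name in a CLUSTAL-format file with
# the total length of its gapped rows, summed over all blocks, in input order.
# Skips the header (line 1), blank lines and conservation lines made of * : .
aln_lengths() {
    awk 'NR > 1 && NF == 2 && $1 !~ /^[*:.]+$/ {
             if (!($1 in len)) order[++n] = $1
             len[$1] += length($2)
         }
         END { for (i = 1; i <= n; i++) print order[i], len[order[i]] }' "$1"
}
```

Usage: aln_lengths test.aln should print each of the five names followed by 87.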


   

Further reading and resources