Tools for the bioinformatics lab
Summary ...
EMBOSS
EMBOSS installation
- Outdated (written 2006).
Download
- 1. Navigate to the EMBOSS download page on SourceForge and read the information on the latest download there. As of this writing, the latest major release is version 3.0.
- 2. Download this compressed archive.
- 3. Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):
gunzip EMBOSS-3.0.0.tar.gz
tar -xvf EMBOSS-3.0.0.tar
rm EMBOSS-3.0.0.tar
cd EMBOSS-3.0.0
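The gunzip and tar steps can usually be combined, since common tar implementations (GNU tar and BSD tar, for example) decompress gzip archives themselves when given the -z option. Here is a minimal sketch; the archive demo-1.0.tar.gz is made up for this illustration so that the commands can be tried without downloading anything:

```shell
# Build a small stand-in archive in a scratch directory (hypothetical names).
scratch=$(mktemp -d)
mkdir "$scratch/demo-1.0"
echo "read me first" > "$scratch/demo-1.0/README"
tar -czf "$scratch/demo-1.0.tar.gz" -C "$scratch" demo-1.0
rm -r "$scratch/demo-1.0"

# Unpack in one step: -x extract, -z gunzip on the fly, -f archive file.
# The compressed archive stays on disk, so there is no leftover .tar to remove.
tar -xzf "$scratch/demo-1.0.tar.gz" -C "$scratch"
cat "$scratch/demo-1.0/README"
```

For EMBOSS the equivalent would presumably be tar -xzf EMBOSS-3.0.0.tar.gz followed by cd EMBOSS-3.0.0.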
Before you begin, it may be a good idea to browse through some of the downloaded files to get oriented. These include:
INSTALL
KNOWN_BUGS (this is an empty file in this release)
README
Compile
EMBOSS requires a number of system-specific options to be set, so its makefile has to be generated before the package can be compiled. This is done by running the program configure (1). Type:
configure
Then type
make
Compilation will run for some time. Then type
sudo make install
and finally
make clean
Test
First see whether installation was successful in principle. Typing
ls /usr/local/share/EMBOSS/data/
should list some of the data resources that have been installed and where they are located. Now open a new shell and type
tfm needle
You should see man-like help pages for EMBOSS commands. In fact, tfm is itself an EMBOSS command: it runs a program that formats and displays help files. If the above works, it tells you two things: (i) that the EMBOSS programs have been compiled and installed, and (ii) that the installation is on your PATH.
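If tfm does not respond, it helps to find out whether the shell can locate the programs at all. The following is a small sketch using the standard command -v builtin; the function name check_on_path is my own invention for this page, not part of EMBOSS:

```shell
# Report whether each named command can be found on the current PATH.
# Returns 0 only if all of them are found.
check_on_path() {
    missing=0
    for cmd in "$@"; do
        if location=$(command -v "$cmd"); then
            echo "$cmd: $location"
        else
            echo "$cmd: not found on PATH"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the two EMBOSS programs used on this page.
check_on_path tfm needle || echo "Check the installation or adjust your PATH."
```

Remember that a shell opened before the installation may not see newly installed programs, which is why a new shell is opened above.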
Next, try a simple pairwise alignment. Create two sequence files (2):
- HBA.fa
>HBA_HUMAN
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKL
LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
- HBB.fa
>HBB_HUMAN
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLST
PDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDP
ENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
and type the following command (note: I am using the multiline command character "\" here to wrap the command, but it could also be typed all on one line):
needle -asequence HBA.fa -bsequence HBB.fa \
    -gapopen 10.0 -gapextend 0.5 -datafile EBLOSUM62 \
    -outfile test.ali
Then typing
cat test.ali
should give you the following output:
########################################
# Program: needle
# Rundate: Sun Mar 02 2006 14:32:03
# Align_format: srspair
# Report_file: test2.ali
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: HBA_HUMAN
# 2: HBB_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 148
# Identity:      63/148 (42.6%)
# Similarity:    88/148 (59.5%)
# Gaps:           9/148 ( 6.1%)
# Score: 290.5
#
#
#=======================================

HBA_HUMAN          1 -VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DL     48
                      .|:|.:|:.|.|.||||  :..|.|.|||.|:.:.:|.|:.:|..| ||
HBB_HUMAN          1 VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDL     48

HBA_HUMAN         49 S-----HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRV     93
                     |     .|:.:||.|||||..|.::.:||:|::....:.||:||..||.|
HBB_HUMAN         49 STPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV     98

HBA_HUMAN         94 DPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR    141
                     ||.||:||.:.|:..||.|...||||.|.|:..|.:|.|:..|..||.
HBB_HUMAN         99 DPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH    146


#---------------------------------------
#---------------------------------------
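To compare your own result with the report above without reading the whole file, you can pull out a single header value. This is a sketch that assumes the "# Identity:" header line has the form shown above; the function name needle_identity is made up for this page, and the stand-in report holds only the header lines needed for the demonstration:

```shell
# Print the percent identity from an EMBOSS srspair report.
# Relies on a header line of the form "# Identity:  63/148 (42.6%)".
needle_identity() {
    sed -n 's/^# Identity:.*(\([ 0-9.]*\)%).*/\1/p' "$1" | tr -d ' '
}

# Stand-in report containing a few header lines copied from the output above.
report=$(mktemp)
cat > "$report" <<'EOF'
# Length: 148
# Identity:      63/148 (42.6%)
# Similarity:    88/148 (59.5%)
# Gaps:           9/148 ( 6.1%)
EOF

needle_identity "$report"    # prints 42.6
```

On a real run you would call needle_identity test.ali instead.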
Notes
(1) In these notes, I assume the "." directory is on your PATH - if it is not, you may have to prepend "./" to commands to tell the operating system the executable file for the command is in your current working directory.
(2) My favorite quick and dirty way to create a text file (e.g. called file.txt) based on something I can copy and paste, is to type
cat > file.txt
then I simply paste the contents and close the file with <ctrl>d. Ask me if you don't understand how this works.
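The same thing can be done without interactive pasting by using a here-document, which is handy in scripts. A minimal sketch that writes HBA.fa into a scratch directory (the directory is only there so the example does not clutter your working directory) and then checks what was written:

```shell
# Everything up to the line containing only EOF is sent to cat,
# which writes it to the file; no <ctrl>d is needed.
workdir=$(mktemp -d)
cat > "$workdir/HBA.fa" <<'EOF'
>HBA_HUMAN
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKL
LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
EOF

# Sanity checks: number of FASTA records and number of residues.
records=$(grep -c '^>' "$workdir/HBA.fa")
residues=$(grep -v '^>' "$workdir/HBA.fa" | tr -d '\n' | wc -c | tr -d ' ')
echo "$records record(s), $residues residues"    # prints: 1 record(s), 141 residues
```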
Phylip
Phylip installation
- Outdated (written 2006).
- Download
- 1. Navigate to the download section of the PHYLIP homepage.
- 2. Read the instructions ... depending on your platform, there may be an easier way than installing from source. Nevertheless, since this is the most general route, here I will compile from source.
- 3. Download the compressed archive.
- 4. Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):
gunzip phylip-3.65.tar.gz
tar -xvf phylip-3.65.tar
cd phylip3.65/src
- Compile
PHYLIP uses graphical routines in some of its programs. These have to be linked against X-terminal libraries. The Makefile should know where to find them on your system. On my Mac I need to type make -f Makefile.osx install; on your Linux boxes, it should simply work in the standard way. Type:
make install
On my system the whole package compiles with almost no nag. Bravo Joe Felsenstein, for understanding the benefit of writing plain, robust, portable code.
The executables are put into the directory exe of the distribution and, as usual, have to be reachable through your PATH: either move them into a directory that is on your PATH, change your PATH, or (my preferred way) copy them to /usr/local/bin.
cd ..
ll exe
sudo cp exe/* /usr/local/bin
- Test
...
- That should be all
Clustal
This is provided for reference, but you must know that CLUSTAL is an inferior algorithm for protein MSA.
Clustal installation
- Outdated (written 2006).
- Download
- 1. Navigate to the CLUSTAL homepage at the EBI.
- 2. At the top of the page, there are icons for Mac and Linux installations (Windows too). Clicking on the Apple folder downloads the latest precompiled Mac OS X version (clustalw1.82.mac-osx.tar.gz as of this writing). Don't do this even if you are on a Mac! (1). Clicking on the folder with the penguin icon takes you to an ftp directory which also contains sources for parallel architecture machines. clustalw1.83.UNIX.tar.gz appears to be the current UNIX version as of this writing. Download this compressed archive.
- 3. Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):
gunzip clustalw1.83.UNIX.tar.gz
tar -xvf clustalw1.83.UNIX.tar
cd clustalw1.83
- Compile
Type:
make
On my system this compiles with one warning about a redefined symbol, which does not appear to be of any consequence. The executable clustalw is generated. The makefile is not really up to standard, since it has no provisions for make test or make install, so we will run our own very simple test and installation. First remove the object files (they are no longer needed after being compiled and linked into the executable). Type
make clean
Since you also do not require the C sources anymore, and could download them from the server at any time if you did, you may also type
rm *.c *.h
to clean up the directory.
- Test
The directory contains a test input by the name globin.pep. These are globin sequences in the dated PIR format. I have transformed them into Fasta format below. Copy the following sequences and save them in a file by the name globin.mfa.
>HBB_HUMAN
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLST
PDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDP
ENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>HBB_HORSE
VQLSGEEKAAVLALWDKVNEEEVGGEALGRLLVVYPWTQRFFDSFGDLSN
PGAVMGNPKVKAHGKKVLHSFGEGVHHLDNLKGTFAALSELHCDKLHVDP
ENFRLLGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH
>HBA_HUMAN
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKL
LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>HBA_HORSE
VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSH
GSAQVKAHGKKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKL
LSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR
>MYG_PHYCA
VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLK
TEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIP
IKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELG
YQG
>GLB5_PETMA
PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQE
FFPKFKGLTTADQLKKSADVRWHAERIINAVNDAVASMDDTEKMSMKLRD
LSGKHAKSFQVDPQYFKVLAAVIADTVAAGDAGFEKLMSMICILLRSAY
>LGB2_LUPLU
GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGT
SEVPQNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSK
GVADAHFPVVKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMN
DAA
Now type
clustalw -options
to verify that the program runs in principle (this will print a list of the commandline options). Then run the following command:
clustalw -infile=globin.mfa
This should have created the two files globin.aln and globin.dnd with the following contents:
- globin.aln
CLUSTAL W (1.83) multiple sequence alignment


HBB_HUMAN       --------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST
HBB_HORSE       --------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN
HBA_HUMAN       ---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-
HBA_HORSE       ---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-
GLB5_PETMA      PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT
MYG_PHYCA       ---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT
LGB2_LUPLU      --------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE
                          *:  :   :   *  .           :  .:   * :   *  :   .

HBB_HUMAN       PDAVMGNPKVKAHGKKVLGAFSDGLAHLDN-----LKGTFATLSELHCDKLHVDPENFRL
HBB_HORSE       PGAVMGNPKVKAHGKKVLHSFGEGVHHLDN-----LKGTFAALSELHCDKLHVDPENFRL
HBA_HUMAN       ----HGSAQVKGHGKKVADALTNAVAHVDD-----MPNALSALSDLHAHKLRVDPVNFKL
HBA_HORSE       ----HGSAQVKAHGKKVGDALTLAVGHLDD-----LPGALSNLSDLHAHKLRVDPVNFKL
GLB5_PETMA      ADQLKKSADVRWHAERIINAVNDAVASMDDT--EKMSMKLRDLSGKHAKSFQVDPQYFKV
MYG_PHYCA       EAEMKASEDLKKHGVTVLTALGAILKKKGH-----HEAELKPLAQSHATKHKIPIKYLEF
LGB2_LUPLU      VP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKG-VADAHFPV
                      . .:: *.  :   .                  :  *.  *  .  :    : .

HBB_HUMAN       LGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH------
HBB_HORSE       LGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH------
HBA_HUMAN       LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR------
HBA_HORSE       LSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR------
GLB5_PETMA      LAAVIADTVAAG---------DAGFEKLMSMICILLRSAY-------
MYG_PHYCA       ISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
LGB2_LUPLU      VKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA---
                :   :  .:            ...       .   :
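A quick sanity check on any alignment file is that removing the gap characters from a row gives back the input sequence. The sketch below does this with awk for one sequence name; the function name ungapped_from_aln and the tiny two-sequence alignment (seqA, seqB) are made up for illustration and are not real data:

```shell
# Concatenate all alignment rows for one sequence name and strip the gaps.
# Rows in CLUSTAL format are "name<spaces>aligned_segment".
ungapped_from_aln() {
    awk -v name="$2" '$1 == name { seq = seq $2 }
        END { gsub(/-/, "", seq); print seq }' "$1"
}

# Small made-up alignment in two blocks.
aln=$(mktemp)
cat > "$aln" <<'EOF'
CLUSTAL W (1.83) multiple sequence alignment


seqA            --MKV-LA
seqB            GSMKVTLA
                  *** **

seqA            TT--
seqB            TTGG
                **
EOF

ungapped_from_aln "$aln" seqA    # prints MKVLATT
```

With the real output, ungapped_from_aln globin.aln HBB_HUMAN should reproduce the HBB_HUMAN sequence from globin.mfa as a single line.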
- globin.dnd
(
(
(
(
HBB_HUMAN:0.08080,
HBB_HORSE:0.08359)
:0.23578,
(
HBA_HUMAN:0.06516,
HBA_HORSE:0.05541)
:0.19444)
:0.07579,
GLB5_PETMA:0.37023)
:0.02699,
MYG_PHYCA:0.37220,
LGB2_LUPLU:0.47094);
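The .dnd file is a guide tree in Newick format: nested parentheses, with each leaf written as name:branch_length. To confirm that all seven input sequences made it into the tree, you can list the leaf names. This is a rough sketch for simple trees like this one, not a general Newick parser, and the function name newick_leaves is my own:

```shell
# List the leaf (sequence) names of a Newick tree, one per line.
# Splits the text at Newick punctuation, then keeps "name:length" pieces;
# internal branch lengths such as ":0.23578" have no name and are skipped.
newick_leaves() {
    tr '(),;' '\n\n\n\n' < "$1" |
        sed -n 's/^[[:space:]]*\([A-Za-z0-9_][A-Za-z0-9_]*\):.*/\1/p'
}

# The tree from globin.dnd above, written on a single line.
tree=$(mktemp)
cat > "$tree" <<'EOF'
( ( ( ( HBB_HUMAN:0.08080, HBB_HORSE:0.08359) :0.23578, ( HBA_HUMAN:0.06516, HBA_HORSE:0.05541) :0.19444) :0.07579, GLB5_PETMA:0.37023) :0.02699, MYG_PHYCA:0.37220, LGB2_LUPLU:0.47094);
EOF

newick_leaves "$tree"
```

Newick ignores line breaks and spaces between tokens, so the same function should work on the multi-line file that clustalw writes.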
- Install
To be able to run clustal from the commandline, it needs to be in a directory on your PATH. This could either be done by putting the program into a directory on the path, or by modifying the PATH appropriately. My preferred way to do this is to keep the executables in /usr/local/bin. First I copy the executable to the directory in which I keep my locally installed programs (type echo $PATH if you are not sure that this directory exists and is on your path on your own machine).
sudo cp clustalw /usr/local/bin
Finally I copy the help file into the same directory, so clustalw can find it:
sudo cp clustalw_help /usr/local/bin
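Instead of reading the output of echo $PATH by eye, you can let the shell check whether a directory is one of the PATH entries. A minimal sketch; the function name dir_on_path is made up, and the commented-out export line is only one common way of extending PATH, which assumes a Bourne-style shell:

```shell
# Succeed if the given directory is one of the colon-separated PATH entries.
dir_on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if dir_on_path /usr/local/bin; then
    echo "/usr/local/bin is on PATH"
else
    echo "/usr/local/bin is not on PATH"
    # One possible remedy: add a line like the following to your
    # shell startup file (for example ~/.profile).
    # export PATH="$PATH:/usr/local/bin"
fi
```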
- That should be all
- Notes
- (1) The Mac OS X archives contain two compiled binaries and an unintelligible readme.html. The binaries appear to run but lack any help files or documentation. This is useless pseudosupport: the only thing you save yourself is the trivial task of compiling the executables, whereas when you compile from source at least you get the complete kit and everything is nicely in its place.
T-Coffee
T-Coffee installation
- Outdated (written 2006).
- Download
- 1. Navigate to the T-Coffee project homepage. Find the link to the latest Unix or Windows version (3.79 as of this writing).
- 2. Download this compressed archive.
- 3. Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):
gunzip T-COFFEE_distribution_Version_3.79.tar.gz
tar -xvf T-COFFEE_distribution_Version_3.79.tar
cd T-COFFEE_distribution_Version_3.79
- Compile
T-Coffee provides an installation file for its standard installation. Type
install
On my system this compiles with a spate of warnings about incompatible implicit declarations of built-in functions, but the tests appear to run OK. The executable t_coffee was generated and moved to the directory bin/. Remove the object files (they are no longer needed after being compiled and linked into the executable). Type
cd t_coffee_source
make clean
cd ..
- Install
To be able to run T-Coffee from the commandline, it needs to be in a directory on your PATH. This could either be done by putting the program into a directory on the path, or by modifying the PATH appropriately. My preferred way to do this is to keep the executables in /usr/local/bin. First I copy the executable to the directory in which I keep my locally installed programs (type echo $PATH if you are not sure that this directory exists and is on your path on your own machine).
sudo cp bin/t_coffee /usr/local/bin
- Test
As a final test, type the following (this assumes you are still in the main directory of the distribution):
t_coffee test/test.pep -in fast_pair -outfile=test.aln -outorder=input>/dev/null
cat test.aln
This should produce the following output:
CLUSTAL FORMAT for T-COFFEE Version_3.79, CPU=0.16 sec, SCORE=24, Nseq=5, Len=87

1aboA   ---nlf-valydfvasgdntlsitkgeklrvlg-----------ynhnge--wceaqtkn
1ycsB   --kgvi-yalwdyepqnddelpmkegdcmtiih-----------rededeiewwwarlnd
1pht    ---gyqyralydykkereedidlhlgdiltvnkgslvalgfsdgqearpeeigwlngyne
1ihvA   -nfrvy-y--------rdsrdpvwkgpakllwk-----------gegavv-----iqdns
1vie    drvrkk-s--------gaawqgqivgwyctnlt-----------pegyaves-----eah
                                 *                   :

1aboA   gq-----------gwvpsnyitpvn--
1ycsB   ke-----------gyvprnllglyp--
1pht    ttgergdfpgtyveyigrkkisp----
1ihvA   di-----------kvvprrkakiird-
1vie    pg-----------svqiypvaalerin
- That should be all
GBrowse
GBrowse: viewing annotations
Viewing annotations in GBrowse is actually quite straightforward.
If you study the section on third-party annotations in the GBrowse tutorial, you will notice that you can load GFF files from a remote server. So all you actually need to do is write a cgi-script that uploads a GFF formatted record. Try the following: put the following file into your /usr/local/apache/cgi-bin directory, call it annotest:
#!/usr/bin/perl -w
use strict;
print "Content-type: text/plain\n"; # MIME header
print "\n"; # Blank line: payload begins here
print "ctgA example motif 1 15000 . + . Motif mxy ; Note \"this is a test\"";
exit;
Note the special MIME type text/plain!
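It may help to see the shape of such a response in isolation: a header line, a blank line, and then one GFF record, which in the GFF format consists of nine fields separated by tab characters. The following shell sketch prints the same kind of response as the Perl script; it is only meant to make the structure visible, the function name gff_line is made up, and I have not tested it as a replacement CGI script:

```shell
# Print one GFF record: nine fields separated by tab characters
# (seqid, source, type, start, end, score, strand, phase, attributes).
gff_line() {
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$@"
}

# A CGI response is a header block, a blank line, then the payload.
printf 'Content-type: text/plain\n\n'
gff_line ctgA example motif 1 15000 . + . 'Motif mxy ; Note "this is a test"'
```

The attributes field is quoted as a single argument because it contains spaces; the other eight fields must not contain tabs.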
Now first execute this by typing into your browser:
http://localhost/cgi-bin/annotest
Then access the GBrowse tutorial volvox example and type the same URL into the URL field for "Add remote annotations"...
This shows the principle. Of course, to do something useful, we would like to send some parameters with the request. Type the following script and save it as /usr/local/apache/cgi-bin/annotate. Set the right ownership (sudo chown root annotate) and permissions (sudo chmod 755 annotate).
#!/usr/bin/perl -w
# reads input from CGI in the form
# http://localhost/cgi-bin/annotate?id=ctgA;start=1020;end=12250;accession=1XYZ:20..250
# returns an annotation in GFF format;
# cf. http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml
use strict;
use CGI;

my $input = CGI->new();

my $acc_ID = 'XXX';
my $acc_start = '000';
my $acc_end = '000';
my $accession=$input->param('accession');
if ($accession =~ m/^([^:]+):(\d+)\.\.(\d+)$/) {
    $acc_ID = $1;
    $acc_start = $2;
    $acc_end = $3;
}

my $seqid = $input->param('id');
my $source = "Annotbot";
my $type = "region"; # cf. SOFA ontology
# http://cvs.sourceforge.net/viewcvs.py/song/ontology/sofa.ontology
my $start = $input->param('start');
my $end = $input->param('end');
my $score = 0.0;
my $strand = '-';
my $phase = 0;
my @attributes;
$attributes[0]= "$type \"Test annotation\";";
$attributes[1]= "Note \"Accession No. $acc_ID from $acc_start to $acc_end\";";

print "Content-type: text/plain\n"; # MIME header
print "\n"; # Blank line: payload begins here
print "$seqid\t";
print "$source\t";
print "$type\t";
print "$start\t";
print "$end\t";
print "$score\t";
print "$strand\t";
print "$phase\t";
foreach my $att (@attributes) {
    print $att;
}
exit;
Then try this out by typing the following into your browser
http://localhost/cgi-bin/annotate?id=ctgA;start=1020;end=1250;accession=1XYZ:20..250
... and finally paste this into the "remote annotations" field of the Volvox example database. Then try changing some of the parameters.
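When changing the parameters, keep in mind that the script only accepts an accession of the form ID:start..end and silently falls back to XXX, 000 and 000 otherwise. The shell sketch below mirrors that one regular expression so you can try values at the command line before putting them into a URL; the function name parse_accession is made up, and it only imitates this part of the Perl script:

```shell
# Split an accession of the form ID:start..end into its three parts.
# Prints "ID start end", or the same defaults the Perl script uses
# ("XXX 000 000") if the string does not have the expected form.
parse_accession() {
    parsed=$(printf '%s\n' "$1" |
        sed -n 's/^\([^:][^:]*\):\([0-9][0-9]*\)\.\.\([0-9][0-9]*\)$/\1 \2 \3/p')
    if [ -n "$parsed" ]; then
        printf '%s\n' "$parsed"
    else
        printf 'XXX 000 000\n'
    fi
}

parse_accession '1XYZ:20..250'    # prints: 1XYZ 20 250
parse_accession '1XYZ:20-250'     # prints: XXX 000 000
```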
GBrowse installation
(Outdated: written 2006)
Refer to http://www.gmod.org/ to ensure the installation instructions are current.
- Download
- 1. Navigate to the GMOD download pages on SourceForge.
- 2. Find the most recent version of the Generic-Genome-Browser (1.64 as of this writing). Download this compressed archive.
- 3. Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):
gunzip Generic-Genome-Browser-1.64.tar.gz
tar -xvf Generic-Genome-Browser-1.64.tar
cd Generic-Genome-Browser-1.64
- Compile
Before you continue, read through the entire page of installation information. There is information on how to install into non-default directories and how to install without requiring root access, and this may be useful for your specific situation. If you decide to go the default way, it is simply a question of typing:
perl Makefile.PL
make
sudo make install
make clean
- Test
The installation instruction page discusses a quick test run with data that is supplied in the installation. Point your browser to http://localhost/cgi-bin/gbrowse (of course your Apache server has to be running for this to work).
More instructions and a more detailed tutorial are found at http://localhost/gbrowse/tutorial/tutorial.html .
Further reading and resources