BioPerl

Revision as of 13:01, 16 September 2012 by Boris (talk | contribs)


The contents of this page have recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be differently structured, outdated syntax, and/or broken links. Use with caution!


Summary ...


Related Pages


 

Introductory reading

Stajich (2007) An Introduction to BioPerl. Methods Mol Biol 406:535-48. (pmid: 18287711)

The BioPerl toolkit provides a library of hundreds of routines for processing sequence, annotation, alignment, and sequence analysis reports. It often serves as a bridge between different computational biology applications assisting the user to construct analysis pipelines. This chapter illustrates how BioPerl facilitates tasks such as writing scripts summarizing information from BLAST reports or extracting key annotation details from a GenBank sequence record.


 

Installation

Installing BioPerl is reasonably straightforward: all that needs to be done is to ensure that the right files (Perl modules) end up in directories where a normal Perl installation can find them.

See the notes on the Bioperl Wiki (getting Bioperl) and the installation notes for Unix or for Windows. Note that on Windows you are on your own: we will only cover Unix in the course.

1: download the appropriate release from the Bioperl site (as of this writing this should be at least 1.5.1, but may be a later version) into your home directory or another directory to which you have write access. (For me this might be /home/steipe/downloads). You might want to create a downloads directory to store original downloads for a while if you don't have one.

2: ensure you have the right permissions to work in the libraries where Perl modules are to be found (i.e. directories on the Perl path, which Perl keeps in its @INC array). It is best if you have root access; otherwise you need a personal installation (see the INSTALL notes). On my system (Mac OS X), Perl itself and its core modules are installed in /System/Library/Perl/, and additional Perl modules that I install from CPAN or elsewhere are installed below the directory /Library/Perl. Both these directories are on the Perl path. I'll use the latter directory name for these instructions, but you can of course substitute another if necessary.
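
If you are not sure which directories are on the Perl path, or whether you may write to them, a quick check along the following lines might help. This is only a sketch: report_writable is a name I made up for illustration, and the perl one-liner simply prints @INC one entry per line.

```shell
# report_writable is a hypothetical helper: for each directory given as an
# argument, say whether it exists and whether the current user can write to it.
report_writable() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "writable $dir"
        elif [ -d "$dir" ]; then
            echo "read-only $dir"
        else
            echo "missing $dir"
        fi
    done
}

# Print Perl's module search path (@INC), one directory per line, and check
# each entry. This part is skipped quietly if perl is not installed.
if command -v perl >/dev/null 2>&1; then
    perl -e 'print "$_\n" for @INC' | while IFS= read -r dir; do
        report_writable "$dir"
    done
fi
```

Directories reported as read-only are the ones that will need sudo in the install step. Note that root can write everywhere, so running this under sudo tells you nothing.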

3: uncompress and un-archive the distribution

$ gunzip current_core_unstable.tar.gz
$ tar -xvf current_core_unstable.tar
$ rm current_core_unstable.tar
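
If you would like to know what directory an archive will create before you unpack it, you can list its contents instead. The following is a sketch under the assumption that the archive has the conventional layout of one top-level directory; archive_topdir is a hypothetical helper name, not part of any distribution.

```shell
# archive_topdir is a hypothetical helper: print the distinct top-level
# entries of a gzipped tar archive without unpacking it.
archive_topdir() {
    # -t lists the contents, -z reads through gzip; sed keeps only the
    # first path component and sort -u removes the repeats.
    tar -tzf "$1" | sed 's|/.*||' | sort -u
}

# Usage (file name taken from the step above):
#   archive_topdir current_core_unstable.tar.gz
#
# tar can also uncompress and un-archive in one step. Unlike the
# gunzip/tar/rm sequence, this leaves the downloaded file in place:
#   tar -xzf current_core_unstable.tar.gz
```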

4: this should have created the directory bioperl-1.5.1/. Now prepare the modules. Creating the makefile will tell you which dependencies are not installed and which parts of Bioperl this will affect. Don't worry about this for now; we'll install more things later in the course.

$ cd bioperl-1.5.1
$ perl Makefile.PL
$ make
$ make test

5: the final command should have rattled off a large number of unit tests. Those that can't function because of missing dependencies should have been skipped and the others should mostly have passed. Finally put all modules into their right place on Perl's path. (Note that this command will probably fail if not issued under sudo i.e. with root-user privileges.)

$ sudo make install

6: Now you can navigate back to your home directory and test that the right version of BioPerl is being found and used:

$ cd ~
$ perl -MBio::Perl -le "print Bio::Perl->VERSION;"

If this prints 1.5 instead of complaining about not finding the module, you're all good to go. If you encounter any problems with the installation, please add to the discussion page here!

Installing Bundle::BioPerl

A Bundle is a collection of modules that somehow belong together. Bundle::BioPerl contains most (if not all) of the dependencies that a standard installation of BioPerl uses. See the BioPerl Install notes for details but actually all you need to do is type:

sudo perl -MCPAN -e "install Bundle::BioPerl"

Note that this installation will not be entirely successful on the first run, due to missing dependencies that currently are not obtained from CPAN and need to be compiled. expat appears to be one of these, and GD won't install before libgd has been installed. If this happens, look at the logs and figure out what appears to be missing, then search either via CPAN or via Google. Download and install the missing piece, following the Perl installation notes or the Unix installation notes. Then simply run the CPAN installation again. You may have to iterate a few times, but CPAN will keep track of what has been successfully installed and skip it.
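
Between iterations it can be handy to ask Perl directly which modules it still cannot load, rather than reading through the whole CPAN log. A minimal sketch follows; still_missing is a hypothetical helper, and the module names in the usage line (XML::Parser, which needs expat, and GD, which needs libgd) are my assumption about the modules affected by the two libraries mentioned above.

```shell
# still_missing is a hypothetical helper: print the name of every module
# given as an argument that Perl cannot currently load. It prints nothing
# if all of them load.
still_missing() {
    for mod in "$@"; do
        # -M loads the module; -e 1 runs a trivial program afterwards.
        perl -M"$mod" -e 1 >/dev/null 2>&1 || echo "$mod"
    done
}

# Usage (module names are an assumption, see above):
#   still_missing XML::Parser GD
```

When the list comes back empty, the compiled dependencies are in place and the Bundle installation has nothing left to complain about on their account.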

Installing the run modules

These modules contain installers and interfaces to important tools such as T-Coffee, EMBOSS applications, Phylogeny ...

1: download the latest stable run modules archive via the Bioperl Wiki Download page.

2: download and install minimally the following supported programs:

3: read and follow the BioPerl run installation notes:

4: test successful installation: (TBC).

BioPerl ext: the "C extension" modules

Download

Navigate to the Bioperl Wiki Download page and download the latest stable ext modules archive (if you haven't done so already).

Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):

gunzip current_ext_stable.tar.gz
tar -xvf current_ext_stable.tar
cd bioperl-ext-1.4
cat README

As you see in the README file, the ext library currently contains two modules: Bio::Ext::Align and Bio::SeqIO::staden::read. The former is a generally useful tool, but the latter is only needed to read a particular format of sequence trace files. We will skip that part of the installation and install only the alignment modules, by navigating down to the subdirectory and running only the Makefile.PL that is there:

Prepare, test and install

cd Bio/Ext/Align
perl Makefile.PL
make
make test
sudo make install

That should generate a successful sequence-alignment test and then move the modules into their right place on the Perl path. Done.

 

BioPerl run

Download

Navigate to the Bioperl Wiki Download page and download the latest stable run modules archive (if you haven't done so already).

Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):

gunzip current_run_stable.tar.gz
tar -xvf current_run_stable.tar
cd bioperl-run-1.4

Prepare

Since this is a Perl distribution, we type the usual

perl Makefile.PL

On my system this generates the following output:

[...]
External Module Algorithm::Diff, Compute intelligent differences between two files,
 is not installed on this computer.
  The TribeMCL module in bioperl-run  needs it for generating consensus protein family descriptions

Warning:

   There are some external packages and perl modules, listed above, which 
   bioperl-run uses. This only effects the functionality which is listed above:
   the rest of bioperl-run will work fine.
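
When the report is long, the module names can be pulled out of it mechanically. The sketch below assumes that missing modules are always announced on a line of the form "External Module NAME, description," as in the output above; missing_from_makefile_output is a hypothetical helper name.

```shell
# missing_from_makefile_output is a hypothetical helper: read the output of
# "perl Makefile.PL" on stdin and print the name of each external module it
# mentions. It relies on the "External Module NAME, ..." line format shown
# above, which is an assumption about how every such report is worded.
missing_from_makefile_output() {
    sed -n 's/^External Module \([A-Za-z0-9_:]*\),.*/\1/p'
}

# Usage (2>&1 because I have not checked which stream the report goes to):
#   perl Makefile.PL 2>&1 | missing_from_makefile_output
```

For the output shown above this prints the single line Algorithm::Diff.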

Thus I install the missing module(s) from CPAN ...

cd ..
sudo perl -MCPAN -e 'install Algorithm::Diff' 

This went without problems, so back to bioperl-run ...

cd bioperl-run-1.4
perl Makefile.PL

This goes without warnings now, so ...

make

Test

Most of the supported programs have not been installed by us, so there is no point in testing them. But we can list the available tests and execute only those that interest us.

make show_tests
make test_Clustalw

This runs OK. However, the following tests only mostly pass. (The BioPerl community is not overly concerned about distributing code that does not rigorously pass all tests. It seems to me that in most of these cases the problems are with the tests, not with the code.)

make test_ProtPars

... tests the protein parsimony module in the PHYLIP package

make test_TCoffee
make test_EMBOSS

Nevertheless, despite the reported errors, most tests concerning EMBOSS, the PHYLIP programs and T-Coffee should pass.
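
To run several of these test targets in one go and get a one-line verdict for each, a small loop may be convenient. This is a sketch, not part of bioperl-run: run_test_targets is a hypothetical helper, and it assumes that each target is named test_NAME as in the commands above and that make exits with a non-zero status when a test target fails.

```shell
# run_test_targets is a hypothetical helper. The first argument is the build
# command (normally "make"); the remaining arguments are test names without
# the "test_" prefix. Each target's own output is discarded and replaced by
# a one-line summary. Returns 0 only if every target succeeded.
run_test_targets() {
    runner=$1; shift
    failed=""
    for name in "$@"; do
        if "$runner" "test_$name" >/dev/null 2>&1; then
            echo "ok test_$name"
        else
            echo "FAILED test_$name"
            failed="$failed test_$name"
        fi
    done
    [ -z "$failed" ]
}

# Usage, from inside the bioperl-run directory:
#   run_test_targets make Clustalw ProtPars TCoffee EMBOSS
```

Since the summary hides the details, rerun any target that is reported as FAILED on its own to see which individual tests are the problem.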

Install

Finally type:

sudo make install

That should move the modules into their right place on the Perl path and create man files for the installed components. Done.


Programming BioPerl

Please see the following tutorials on the Web:

BioPerl Wiki - Howto:Beginners
The Beginners' instructions in the BioPerl Wiki cover the basic use of Bio::Seq and Bio::SeqIO with first steps of BLAST (however, we haven't installed BLAST yet).
BioPerl Tutorial
The excellent and comprehensive work of many BioPerl authors.
A BioPerl course
A comprehensive course at the Institut Pasteur. Contains structured chapters covering the essential aspects of BioPerl, sample data and example code, as well as references to the BioPerl tutorial.


 

BioPerl tutorials

   

Further reading and resources