BioPerl
Introductory reading
Stajich (2007) An Introduction to BioPerl. Methods Mol Biol 406:535-48. (pmid: 18287711)
The BioPerl toolkit provides a library of hundreds of routines for processing sequence, annotation, alignment, and sequence analysis reports. It often serves as a bridge between different computational biology applications assisting the user to construct analysis pipelines. This chapter illustrates how BioPerl facilitates tasks such as writing scripts summarizing information from BLAST reports or extracting key annotation details from a GenBank sequence record.
Installation
Installing BioPerl is reasonably straightforward: all that needs to be done is to ensure that the right files (Perl modules) end up in directories where a normal Perl installation can find them.
See the notes on the BioPerl Wiki (getting BioPerl) and the installation notes for Unix (or for Windows; note that on Windows you are on your own, since we will only cover Unix in the course).
1: download the appropriate release from the BioPerl site (as of this writing this should be at least 1.5.1, but may be a later version) into your home directory or another directory to which you have write access (for me this might be /home/steipe/downloads). If you don't have a downloads directory, you might want to create one to keep original downloads for a while.
2: ensure you have the right permissions to work in the libraries where Perl modules are to be found (i.e. the directories on the Perl path, stored in Perl's @INC array). It is best if you have root access; otherwise you need a personal installation (see the INSTALL notes). On my system (Mac OS X), Perl itself and its core modules are installed in /System/Library/Perl/, and additional Perl modules that I install from CPAN or elsewhere are installed below the directory /Library/Perl. Both of these directories are on the Perl path. I'll use the latter directory name for these instructions, but you can of course substitute another if necessary.
3: uncompress and un-archive the distribution
- $ gunzip current_core_unstable.tar.gz
- $ tar -xvf current_core_unstable.tar
- $ rm current_core_unstable.tar
4: this should have created the directory bioperl-1.5.1/. Now prepare the modules. Creating the makefile will tell you which dependencies are not installed and which parts of BioPerl this will affect. Don't worry about this for now; we'll install more things later in the course.
- $ cd bioperl-1.5.1
- $ perl Makefile.PL
- $ make
- $ make test
5: the final command should have rattled off a large number of unit tests. Those that can't function because of missing dependencies should have been skipped, and the others should mostly have passed. Finally, put all modules into their right place on Perl's path. (Note that this command will probably fail if not issued under sudo, i.e. with root-user privileges.)
- $ sudo make install
6: Now you can navigate back to your home directory and test that the right version of BioPerl is being found and used:
- $ cd ~
- $ perl -MBio::Perl -le "print Bio::Perl->VERSION;"
If this prints 1.5 instead of complaining about not finding the module, you're all good to go. If you encounter any problems with the installation, please add to the discussion page here!
Installing Bundle::BioPerl
A Bundle is a collection of CPAN modules that belong together and can be installed in one step. Bundle::BioPerl contains most (if not all) of the dependencies that a standard installation of BioPerl uses. See the BioPerl Install notes for details, but actually all you need to do is type:
sudo perl -MCPAN -e "install Bundle::BioPerl"
Note that this installation will probably not be entirely successful on the first run, because some dependencies are currently not obtained from CPAN and need to be compiled separately. expat appears to be one of these, and GD won't install before libgd has been installed. If this happens, look at the logs and figure out what appears to be missing, then search either via CPAN or via Google. Download and install the missing piece, following the Perl installation notes or the Unix installation notes. Then simply run the CPAN installation again. You may have to iterate a few times, but CPAN will keep track of which modules have been successfully installed and skip these.
Installing the run modules
These modules contain installers and interfaces to important tools such as T-Coffee, EMBOSS applications, Phylogeny ...
1: download the latest stable run modules archive via the Bioperl Wiki Download page.
2: download and install minimally the following supported programs:
- CLUSTAL
- the EMBOSS package
- T-Coffee
- PHYLIP
- we will add local BLAST at a later time ...
3: read and follow the BioPerl run installation notes.
4: test successful installation: (TBC).
BioPerl ext: the "C extension" modules
- Download
Navigate to the Bioperl Wiki Download page and download the latest stable ext modules archive (if you haven't done so already).
Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):
gunzip current_ext_stable.tar.gz
tar -xvf current_ext_stable.tar
cd bioperl-ext-1.4
cat README
As you see in the README file, the ext library currently contains two modules: Bio::Ext::Align and Bio::SeqIO::staden::read. The former is a generally useful tool, but the latter is only needed to read a particular format of sequence trace files. We will skip the latter and install only the alignment modules, by navigating down to their subdirectory and running only the Makefile.PL that is there:
- Prepare, test and install
cd Bio/Ext/Align
perl Makefile.PL
make
make test
sudo make install
That should generate a successful sequence-alignment test and then move the modules into their right place on the Perl path. Done.
BioPerl run
- Download
Navigate to the Bioperl Wiki Download page and download the latest stable run modules archive (if you haven't done so already).
Open a terminal session, navigate to your download directory and type the usual (remember to use the tab key for filename completion :-):
gunzip current_run_stable.tar.gz
tar -xvf current_run_stable.tar
cd bioperl-run-1.4
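Note that the archive name does not tell you the name of the directory it unpacks to (here current_run_stable.tar.gz yields bioperl-run-1.4). The following sketch, assuming a tar that supports the -z option (GNU tar and BSD tar both do), lists the top-level directory of an archive before unpacking and then unpacks it in one step; archive_top_dirs and unpack_archive are hypothetical helper names.

```shell
#!/bin/sh
# Print the top-level directory names contained in a gzipped tar archive,
# without unpacking it. archive_top_dirs is a hypothetical helper name.
archive_top_dirs() {
    tar -tzf "$1" | sed 's|^\./||' | cut -d/ -f1 | sort -u
}

# Unpack in one step: same result as gunzip followed by tar -xf, except that
# the compressed archive is left in place.
# unpack_archive is likewise a hypothetical helper name.
unpack_archive() {
    tar -xzf "$1"
}

# Example, assuming the archive has been downloaded to the current directory:
if [ -f current_run_stable.tar.gz ]; then
    archive_top_dirs current_run_stable.tar.gz
    unpack_archive current_run_stable.tar.gz
fi
```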
- Prepare
Since this is a Perl distribution, we type the usual
perl Makefile.PL
On my system this generates the following output:
[...]
External Module Algorithm::Diff, Compute intelligent differences between two files,
is not installed on this computer.
The TribeMCL module in bioperl-run needs it for generating consensus protein family descriptions
Warning: There are some external packages and perl modules, listed above, which bioperl-run uses.
This only effects the functionality which is listed above: the rest of bioperl-run will work fine.
Thus I install the missing module(s) from CPAN ...
cd ..
sudo perl -MCPAN -e 'install Algorithm::Diff'
This went without problems, so back to bioperl-run ...
cd bioperl-run-1.4
perl Makefile.PL
This goes without warnings now, so ...
make
- Test
Most of the supported programs have not been installed by us, so there is no point in testing them. But we can list the available tests and execute only those that interest us.
make show_tests
make test_Clustalw
This runs OK. However, the following tests pass only in part. (The BioPerl community is not overly concerned about distributing code that does not rigorously pass all tests. It seems to me that in most of these cases the problems are with the tests, not with the code.)
make test_ProtPars
... tests the protein parsimony module in the PHYLIP package
make test_TCoffee
make test_EMBOSS
Nevertheless, despite the reported errors, most tests concerning EMBOSS, most tests concerning the PHYLIP programs and most tests concerning T-Coffee should pass.
- Install
Finally type:
sudo make install
That should move the modules into their right place on the Perl path and create man files for the installed components. Done.
Programming BioPerl
Please see the following tutorials on the Web:
- BioPerl Wiki - Howto:Beginners
- The Beginners' instructions in the BioPerl Wiki cover the basic use of Bio::Seq and Bio::SeqIO, with first steps in BLAST (however, we haven't installed BLAST yet).
- BioPerl Tutorial
- The excellent and comprehensive work of many BioPerl authors.
- A BioPerl course
- A comprehensive course at the Institut Pasteur. Contains structured chapters covering the essential aspects of BioPerl, sample data and example code, as well as references to the BioPerl tutorial.
BioPerl tutorials
Further reading and resources