BioPerl exercise alignment
Revision as of 15:09, 15 September 2012 by Boris (talk | contribs) (→Further reading and resources)
BioPerl: alignment exercise
The contents of this page has recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be differently structured, outdated syntax, and/or broken links. Use with caution!
Summary ...
Contents
Introductory reading
Introduction
This is an exercise to produce a pairwise sequence alignment in BioPerl and to analyse the results. I includes work with sequence objects, alignment objects and a Bio::Tools factory.
Task
Write a small program that produces an optimal sequence alignment of the yeast transcription factor APSES domains Mbp1 and Phd1 (given here in Fasta format):
>Mbp1_APSES domain from alignment to PFAM|02292.11 SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY QGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG
>Phd1_APSES domain SVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREVVKIGSMHLKG VWIPFERAYILAQREQILDHLYPLFVKDIESI
References
Hints
- The BioPerl alignment extensions (ext package) must have been compiled and installed
- The location of the blosum62 scoring matrix must be known and defined. The matrix came with the BioPerl ext package, it is in bioperl-ext-1.4/Bio/Ext/Align/ but you probably should create a subdirectory for data such as this one, for example /usr/local/share/Bio/); another, similar matrix of this type came with CLUSTALW: clustalw1.83/gon90.bla
- Create two variables that hold the two sequences to be aligned
- Create a sequence object for each variable
- Create an alignment factory object
- Use the alignment factory's align_and_show() method on the two sequence objects to align and print the result
Solution(s)
Minimal solution: define sequences as literals, just exercise alignment
#!/usr/bin/perl -w use strict; use Bio::Seq; use Bio::Tools::pSW; # Define two sequences my $Mbp1 = "SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY"; $Mbp1 .= "QGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG"; my $Phd1 = "SVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREVVKIGSMHLKG"; $Phd1 .= "VWIPFERAYILAQREQILDHLYPLFVKDIESI"; # Define location of matrix my $DataDir ='/usr/local/share/Bio/'; my $matrix = $DataDir . 'blosum62.bla'; # Create two sequence objects my $seq_obj_A = Bio::Seq->new(-seq => $Mbp1); my $seq_obj_B = Bio::Seq->new(-seq => $Phd1); # Create an alignment factory object my $align_factory = new Bio::Tools::pSW('-matrix' => $matrix, '-gap' => 12, '-ext' => 2, ); # Have the two sequences aligned in the factory and # the output sent to STDOUT $align_factory->align_and_show($seq_obj_A, $seq_obj_B,\*STDOUT); exit();
More elaborate solution: begin as before, but then capture the alignment in a Bio::SimpleAlign object, analyse it, then open a Bio::AlignIO output stream and print the alignment.
#!/usr/bin/perl -w use strict; use Bio::Seq; use Bio::Tools::pSW; use Bio::SimpleAlign; use Bio::AlignIO; # Define two sequences my $Mbp1 = "SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY"; $Mbp1 .= "QGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG"; my $Phd1 = "SVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREVVKIGSMHLKG"; $Phd1 .= "VWIPFERAYILAQREQILDHLYPLFVKDIESI"; # Define location of matrix my $DataDir ='/usr/local/share/Bio/'; my $matrix = $DataDir . 'blosum62.bla'; # Create two sequence objects my $seq_obj_A = Bio::Seq->new(-seq => $Mbp1); my $seq_obj_B = Bio::Seq->new(-seq => $Phd1); # Create an alignment factory object my $align_factory = new Bio::Tools::pSW('-matrix' => $matrix, '-gap' => 12, '-ext' => 2, ); # align, and create a Bio::SimpleAlign object my $ align_obj = $align_factory->pairwise_alignment($seq_obj_A, $seq_obj_B); # print identity in percent as formatted output printf ("Identity: %4.1f %%\n", $align_obj->percentage_identity() ); # create output stream with AlignIO my $OUTSTREAM = Bio::AlignIO->newFh('-format' => 'msf', '-fh' => \*STDOUT ); print $OUTSTREAM $align_obj; exit();
Further reading and resources