BioPerl exercise alignment

From "A B C"
Revision as of 15:09, 15 September 2012 by Boris (talk | contribs) (→‎Further reading and resources)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

BioPerl: alignment exercise


The contents of this page has recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be differently structured, outdated syntax, and/or broken links. Use with caution!


Summary ...



 

Introductory reading



 

Introduction

This is an exercise to produce a pairwise sequence alignment in BioPerl and to analyse the results. I includes work with sequence objects, alignment objects and a Bio::Tools factory.

Task

Write a small program that produces an optimal sequence alignment of the yeast transcription factor APSES domains Mbp1 and Phd1 (given here in Fasta format):

>Mbp1_APSES domain from alignment to PFAM|02292.11
SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY
QGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG
>Phd1_APSES domain
SVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREVVKIGSMHLKG
VWIPFERAYILAQREQILDHLYPLFVKDIESI

References

Hints

  1. The BioPerl alignment extensions (ext package) must have been compiled and installed
  2. The location of the blosum62 scoring matrix must be known and defined. The matrix came with the BioPerl ext package, it is in bioperl-ext-1.4/Bio/Ext/Align/ but you probably should create a subdirectory for data such as this one, for example /usr/local/share/Bio/); another, similar matrix of this type came with CLUSTALW: clustalw1.83/gon90.bla
  3. Create two variables that hold the two sequences to be aligned
  4. Create a sequence object for each variable
  5. Create an alignment factory object
  6. Use the alignment factory's align_and_show() method on the two sequence objects to align and print the result

Solution(s)

Minimal solution: define sequences as literals, just exercise alignment

#!/usr/bin/perl -w
use strict;

use Bio::Seq;
use Bio::Tools::pSW;

# Define two sequences
my $Mbp1  = "SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY";
   $Mbp1 .= "QGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG";

my $Phd1  = "SVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREVVKIGSMHLKG";
   $Phd1 .= "VWIPFERAYILAQREQILDHLYPLFVKDIESI";

# Define location of matrix
my $DataDir ='/usr/local/share/Bio/';
my $matrix = $DataDir . 'blosum62.bla';

# Create two sequence objects
my $seq_obj_A = Bio::Seq->new(-seq => $Mbp1);
my $seq_obj_B = Bio::Seq->new(-seq => $Phd1);

# Create an alignment factory object
my $align_factory = new Bio::Tools::pSW('-matrix' => $matrix,
                                        '-gap' => 12,
                                        '-ext' => 2,
                                       );

# Have the two sequences aligned in the factory and
# the output sent to STDOUT
$align_factory->align_and_show($seq_obj_A, $seq_obj_B,\*STDOUT);

exit();

More elaborate solution: begin as before, but then capture the alignment in a Bio::SimpleAlign object, analyse it, then open a Bio::AlignIO output stream and print the alignment.

#!/usr/bin/perl -w
use strict;

use Bio::Seq;
use Bio::Tools::pSW;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Define two sequences
my $Mbp1  = "SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY";
   $Mbp1 .= "QGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG";

my $Phd1  = "SVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREVVKIGSMHLKG";
   $Phd1 .= "VWIPFERAYILAQREQILDHLYPLFVKDIESI";

# Define location of matrix
my $DataDir ='/usr/local/share/Bio/';
my $matrix = $DataDir . 'blosum62.bla';

# Create two sequence objects
my $seq_obj_A = Bio::Seq->new(-seq => $Mbp1);
my $seq_obj_B = Bio::Seq->new(-seq => $Phd1);

# Create an alignment factory object
my $align_factory = new Bio::Tools::pSW('-matrix' => $matrix,
                                        '-gap' => 12,
                                        '-ext' => 2,
                                       );

# align, and create a Bio::SimpleAlign object 
my $ align_obj = $align_factory->pairwise_alignment($seq_obj_A, $seq_obj_B);

# print identity in percent as formatted output
printf ("Identity: %4.1f %%\n", $align_obj->percentage_identity() );

# create output stream with AlignIO
my $OUTSTREAM = Bio::AlignIO->newFh('-format' => 'msf',
                                    '-fh'     => \*STDOUT
                                   );
print $OUTSTREAM $align_obj;

exit();


   

Further reading and resources