BioPerl exercise: signal cleavage site


The contents of this page have recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be structured differently, outdated syntax, and/or broken links. Use with caution!


Summary ...



 

Introductory reading



 

Introduction

This small task exercises creating a sequence object from a protein sequence and using that object for signal peptide prediction.


Task

Write a small program that predicts and prints the signal sequence cleavage site in the E. coli ompA protein:

MKKTAIAIAVALAGFATVAQAAPKDNTWYTGAKLGWSQYHDTGFI
NNNGPTHENQLGAGAFGGYQVNPYVGFEMGYDWLGRMPYKGSVENGAYKAQGVQLTAKL
GYPITDDLDIYTRLGGMVWRADTKSNVYGKNHDTGVSPVFAGGVEYAITPEIATRLEYQ
WTNNIGDAHTIGTRPDNGMLSLGVSYRFGQGEAAPVVAPAPAPAPEVQTKHFTLKSDVL
FNFNKATLKPEGQAALDQLYSQLSNLDPKDGSVVVLGYTDRIGSDAYNQGLSERRAQSV
VDYLISKGIPADKISARGMGESNPVTGNTCDNVKQRAALIDCLAPDRRVEIEVKGIKDV
VTQPQA

References

Hints

  1. Create a variable that holds the sequence
  2. Create a sequence object using that variable
  3. Create a Bio::Tools::Sigcleave object from the sequence object
  4. Call the Sigcleave object's pretty_print() method to run the prediction and print the formatted result

Solution(s)

Minimal solution: the sequence is hard-coded as a literal and all default settings are used. This only exercises the creation and use of the sequence and signal cleavage objects:

#!/usr/bin/perl -w

use strict;

use Bio::Seq;
use Bio::Tools::Sigcleave;

my $sequence  = "MKKTAIAIAVALAGFATVAQAAPKDNTWYTGAKLGWSQYHDTGFI";
   $sequence .= "NNNGPTHENQLGAGAFGGYQVNPYVGFEMGYDWLGRMPYKGSVENGAYKAQGVQLTAKL";
   $sequence .= "GYPITDDLDIYTRLGGMVWRADTKSNVYGKNHDTGVSPVFAGGVEYAITPEIATRLEYQ";
   $sequence .= "WTNNIGDAHTIGTRPDNGMLSLGVSYRFGQGEAAPVVAPAPAPAPEVQTKHFTLKSDVL";
   $sequence .= "FNFNKATLKPEGQAALDQLYSQLSNLDPKDGSVVVLGYTDRIGSDAYNQGLSERRAQSV";
   $sequence .= "VDYLISKGIPADKISARGMGESNPVTGNTCDNVKQRAALIDCLAPDRRVEIEVKGIKDV";
   $sequence .= "VTQPQA";

# Create a new sequence object
my $seq_obj = Bio::Seq->new(-seq => $sequence);

# Create a new Sigcleave object
my $cleave_obj = Bio::Tools::Sigcleave->new(-seq => $seq_obj);

# Carry out Sigcleave process and print formatted output
print $cleave_obj->pretty_print();

exit ();
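
pretty_print() returns a report that is already formatted for reading. When the positions and scores are needed for further processing, the raw results can be used instead. The following is a minimal sketch, assuming that the signals() method of Bio::Tools::Sigcleave returns a hash of position => score pairs for the sites that reach the threshold, as described in the module's documentation. The threshold of 3.5 and the procaryotic matrix are choices made for this example; check the module's POD for the exact meaning of the reported positions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::Seq;
use Bio::Tools::Sigcleave;

# The E. coli ompA sequence from the task, joined into one string
my $sequence = join '', qw(
    MKKTAIAIAVALAGFATVAQAAPKDNTWYTGAKLGWSQYHDTGFI
    NNNGPTHENQLGAGAFGGYQVNPYVGFEMGYDWLGRMPYKGSVENGAYKAQGVQLTAKL
    GYPITDDLDIYTRLGGMVWRADTKSNVYGKNHDTGVSPVFAGGVEYAITPEIATRLEYQ
    WTNNIGDAHTIGTRPDNGMLSLGVSYRFGQGEAAPVVAPAPAPAPEVQTKHFTLKSDVL
    FNFNKATLKPEGQAALDQLYSQLSNLDPKDGSVVVLGYTDRIGSDAYNQGLSERRAQSV
    VDYLISKGIPADKISARGMGESNPVTGNTCDNVKQRAALIDCLAPDRRVEIEVKGIKDV
    VTQPQA
);

my $seq_obj    = Bio::Seq->new(-seq => $sequence, -alphabet => 'protein');
my $cleave_obj = Bio::Tools::Sigcleave->new(-seq => $seq_obj, -threshold => 3.5);
$cleave_obj->matrix('procaryotic');    # ompA is a bacterial protein

# Assumption: signals() returns (position => score) pairs
my %signals = $cleave_obj->signals;

# List the candidate sites, highest score first
for my $position (sort { $signals{$b} <=> $signals{$a} } keys %signals) {
    printf "position %d\tscore %s\n", $position, $signals{$position};
}
print "No site reached the threshold\n" unless %signals;
```

Sorting by score puts the most likely cleavage site at the top of the list, which is convenient when only the best candidate is of interest.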

Compact usable solution: reads the matrix type from the command line and the sequence from STDIN, with a bit of sanity handling of the input sequence.

#!/usr/bin/perl -w
# Reads a sequence from STDIN and an optional argument flag
# -p or -e for (p)rokaryotic or (e)ukaryotic cleavage prediction
# (default: prokaryotic).
# Sequence can be raw amino acids or fasta. Prints results.
# usage example:  cleave.pl -e < test.fa

use strict;

use Bio::Seq;
use Bio::Tools::Sigcleave;

#read the optional commandline argument
my $type = 'procaryotic';
if (defined($ARGV[0]) && substr($ARGV[0],1,1) eq 'e') {
    $type = 'eucaryotic';
}

my $sequence = "";
# read the sequence
while (my $string = <STDIN>) {
    if ($string !~ m/^>/) {  # skip fasta header lines, which begin with '>'
        chomp($string);
        $string =~ s/[^A-Za-z]//g;       # delete every character that is not a letter
                                         # (whitespace, digits, gap symbols, '*', ...)
        $sequence .= $string;            # concatenate
    }
}

# Create a new sequence object
my $seq_obj = Bio::Seq->new(-seq => $sequence, -alphabet => 'protein');

# Create a new Sigcleave object
my $cleave_obj = Bio::Tools::Sigcleave->new(-seq => $seq_obj);

# Set the analysis matrix
$cleave_obj->matrix($type);

# Carry out Sigcleave process and print formatted output
print $cleave_obj->pretty_print();

exit ();
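
The input handling above can be tried out without BioPerl. The following is a small sketch of the same filtering logic, wrapped in a helper called clean_sequence(); the helper name is hypothetical, and converting the residues to upper case is an addition that the script above does not perform.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the residues from raw or fasta-formatted lines
sub clean_sequence {
    my @lines    = @_;
    my $sequence = '';
    for my $line (@lines) {
        next if $line =~ /^>/;                       # skip fasta header lines
        (my $residues = $line) =~ s/[^A-Za-z]//g;    # keep letters only
        $sequence .= uc $residues;                   # addition: normalise to upper case
    }
    return $sequence;
}

# Example input: a header, a line with a space, a line with stray characters
my @input = (">ompA test\n", "MKKTA IAIAV\n", "alag-FATV*\n");
print clean_sequence(@input), "\n";    # prints MKKTAIAIAVALAGFATV
```

Keeping the cleaning step in a function of its own makes it easy to check against awkward input before the sequence is passed on to Bio::Seq.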

Interactive solution: asks for sequence type and filename, reads sequence from file using Bio::SeqIO (contributed by Jamie)


#!/usr/bin/perl -w
use strict;

use Bio::Seq;
use Bio::SeqIO;
use Bio::Tools::Sigcleave;

my $seq_type;

# Introduction and obtaining filename containing sequence and source (prok or euk)

print "Welcome to the Signal Cleavage Site Analysis Tool.\n";
print "Enter Sequence Filename: ";

my $sequencefile = <STDIN>;
chomp $sequencefile;

print "Prokaryotic (P) / Eukaryotic (E): ";
my $source = <STDIN>;
chomp $source;

# Accept upper or lower case answers; stop with an error status otherwise
if ($source =~ /^P/i) {
    $seq_type = "procaryotic";
}
elsif ($source =~ /^E/i) {
    $seq_type = "eucaryotic";
}
else {
    die "Invalid Entry. Program Aborted.\n";
}

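# Creation of input object and opening of sequence file
my $seqio_obj = Bio::SeqIO->new(-file => $sequencefile, -format => "fasta");

# Read the first sequence; stop if the file does not contain one
my $seq_obj = $seqio_obj->next_seq
    or die "No sequence found in $sequencefile\n";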

# Printing out (verifying) input sequence
print "\nInput Sequence:\n";
print $seq_obj->seq;
print "\n\n";

# Create a new Sigcleave object and set the analysis matrix
my $cleave = Bio::Tools::Sigcleave->new(-seq => $seq_obj, -threshold => 3.5);
$cleave->matrix($seq_type);

# Carry out Sigcleave process and output
my $result = $cleave->pretty_print;
print $result;

exit ();
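
For use in scripts or pipelines, the prompts of the interactive solution could be replaced by command-line options. The following is a sketch using the core module Getopt::Long; the helper parse_options() and the option names --eukaryotic, --threshold and --file are hypothetical choices for this example, not part of the solutions above.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: turn command-line arguments into Sigcleave settings
sub parse_options {
    my @args = @_;

    # Defaults: prokaryotic matrix and the threshold used in the interactive solution
    my %opt = (matrix => 'procaryotic', threshold => 3.5, file => undef);
    my $eukaryotic = 0;

    GetOptionsFromArray(
        \@args,
        'eukaryotic|e'  => \$eukaryotic,        # flag, no value
        'threshold|t=f' => \$opt{threshold},    # floating point value
        'file|f=s'      => \$opt{file},         # fasta file name
    ) or die "Usage: $0 [-e] [-t threshold] [-f file]\n";

    $opt{matrix} = 'eucaryotic' if $eukaryotic;
    return %opt;
}

my %settings = parse_options(qw(-e -t 4.0 -f test.fa));
printf "matrix=%s threshold=%.1f file=%s\n", @settings{qw(matrix threshold file)};
```

The returned values can then be passed on as in the solutions above: the file name to Bio::SeqIO->new(-file => ...), the threshold to the -threshold argument of the Sigcleave constructor, and the matrix name to its matrix() method.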


   

Further reading and resources