BioPerl exercise signal cleavage
BioPerl exercise: signal cleavage site
The contents of this page have recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be structured differently, outdated syntax, and/or broken links. Use with caution!
Summary ...
Introductory reading
Introduction
This small task exercises creating a sequence object from a protein sequence and using that object for signal peptide prediction.
Task
Write a small program that predicts (and prints) the signal sequence cleavage site in the E. coli OmpA protein:
MKKTAIAIAVALAGFATVAQAAPKDNTWYTGAKLGWSQYHDTGFI
NNNGPTHENQLGAGAFGGYQVNPYVGFEMGYDWLGRMPYKGSVENGAYKAQGVQLTAKL
GYPITDDLDIYTRLGGMVWRADTKSNVYGKNHDTGVSPVFAGGVEYAITPEIATRLEYQ
WTNNIGDAHTIGTRPDNGMLSLGVSYRFGQGEAAPVVAPAPAPAPEVQTKHFTLKSDVL
FNFNKATLKPEGQAALDQLYSQLSNLDPKDGSVVVLGYTDRIGSDAYNQGLSERRAQSV
VDYLISKGIPADKISARGMGESNPVTGNTCDNVKQRAALIDCLAPDRRVEIEVKGIKDV
VTQPQA
References
Hints
- Create a variable that holds the sequence
- Create a sequence object using that variable
- Create a SigCleave object with the sequence object
- Call the SigCleave object's pretty_print() method to run the analysis and get the formatted result, then print it
Solution(s)
Minimal solution: the sequence is hard-coded as a string literal and all default settings are used; this only exercises the creation and use of the sequence and signal cleavage objects:
#!/usr/bin/perl -w
use strict;
use Bio::Seq;
use Bio::Tools::Sigcleave;

my $sequence = "MKKTAIAIAVALAGFATVAQAAPKDNTWYTGAKLGWSQYHDTGFI";
$sequence .= "NNNGPTHENQLGAGAFGGYQVNPYVGFEMGYDWLGRMPYKGSVENGAYKAQGVQLTAKL";
$sequence .= "GYPITDDLDIYTRLGGMVWRADTKSNVYGKNHDTGVSPVFAGGVEYAITPEIATRLEYQ";
$sequence .= "WTNNIGDAHTIGTRPDNGMLSLGVSYRFGQGEAAPVVAPAPAPAPEVQTKHFTLKSDVL";
$sequence .= "FNFNKATLKPEGQAALDQLYSQLSNLDPKDGSVVVLGYTDRIGSDAYNQGLSERRAQSV";
$sequence .= "VDYLISKGIPADKISARGMGESNPVTGNTCDNVKQRAALIDCLAPDRRVEIEVKGIKDV";
$sequence .= "VTQPQA";

# Create a new sequence object from the string
my $seq_obj = Bio::Seq->new(-seq => $sequence);

# Create a new Sigcleave object
my $cleave_obj = Bio::Tools::Sigcleave->new(-seq => $seq_obj);

# Carry out Sigcleave process and print formatted output
print $cleave_obj->pretty_print();

exit();
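If the scores are wanted as data rather than as formatted text, the Sigcleave object also offers a signals() method, which returns a hash of residue position to score. The following is only a sketch of how such a hash might be post-processed: best_site() is a hypothetical helper, not part of BioPerl, and the numbers in the demo hash are invented so that the snippet runs without BioPerl installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the position
# with the highest score from a position => score hash.
# Ties are broken in favour of the smaller position.
sub best_site {
    my (%scores) = @_;
    my ($best) = sort { $scores{$b} <=> $scores{$a} or $a <=> $b } keys %scores;
    return $best;
}

# Invented values for illustration only. With BioPerl one would
# instead fill the hash with:  my %scores = $cleave_obj->signals;
my %scores = (22 => 11.9, 20 => 4.2, 95 => 3.6);

my $pos = best_site(%scores);
printf "Best candidate: position %d (score %.1f)\n", $pos, $scores{$pos};
```

How a reported position relates to the last residue of the signal peptide is best checked against the pretty_print() output for the same sequence.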
Compact usable solution: reads the matrix type from the command line, takes the sequence via STDIN, and does a bit of sanity handling of the input sequence.
#!/usr/bin/perl -w
# reads a sequence from STDIN and an optional argument flag
# -p or -e for (p)rokaryotic or (e)ukaryotic cleavage prediction.
# Sequence can be raw amino acids or fasta. Prints results.
# usage example: cleave.pl -e < test.fa

use strict;
use Bio::Seq;
use Bio::Tools::Sigcleave;

# read the optional command line argument (default: prokaryotic)
my $type = 'procaryotic';
if (defined($ARGV[0]) && substr($ARGV[0],1,1) eq 'e') {
    $type = 'eucaryotic';
}

my $sequence = "";

# read the sequence
while (my $string = <STDIN>) {
    # skip lines that begin with the fasta header character
    if ($string !~ m/^>/) {
        chomp($string);
        # delete all whitespace characters
        # and all characters not in the range A-Z,a-z
        $string =~ s/\s+|[^A-Za-z]//g;
        $sequence .= $string;    # concatenate
    }
}

# Create a new sequence object
my $seq_obj = Bio::Seq->new(-seq => $sequence, -alphabet => 'protein');

# Create a new Sigcleave object
my $cleave_obj = Bio::Tools::Sigcleave->new(-seq => $seq_obj);

# Set the analysis matrix
$cleave_obj->matrix($type);

# Carry out Sigcleave process and print formatted output
print $cleave_obj->pretty_print();

exit();
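The input sanity handling in the script above can be tried in isolation, without BioPerl. The sketch below wraps the same idea (skip FASTA header lines, keep only letters) in a hypothetical helper called clean_sequence(); the name, the uppercasing step and the demo input are assumptions made for illustration and are not part of the original solution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn raw or FASTA text
# into one string containing only letters.
sub clean_sequence {
    my ($text) = @_;
    my $sequence = '';
    for my $line (split /\n/, $text) {
        next if $line =~ /^>/;       # skip FASTA header lines
        $line =~ s/[^A-Za-z]//g;     # drop whitespace, digits, punctuation
        $sequence .= uc $line;       # uppercasing is an extra assumption
    }
    return $sequence;
}

# Demo input: a header line, embedded spaces, lowercase and digits
my $fasta = ">demo header\nMKKTA IAIAV\nalagf 123\n";
print clean_sequence($fasta), "\n";    # prints MKKTAIAIAVALAGF
```

Note that this filter keeps every letter, including letters that are not valid amino acid codes; stricter validation would need an explicit list of allowed characters.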
Interactive solution: asks for sequence type and filename, reads sequence from file using Bio::SeqIO (contributed by Jamie)
#!/usr/bin/perl -w
use strict;
use Bio::Seq;
use Bio::SeqIO;
use Bio::Tools::Sigcleave;

my $seq_type;

# Introduction and obtaining filename containing sequence and source (prok or euk)
print "Welcome to the Signal Cleavage Site Analysis Tool.\n";
print "Enter Sequence Filename: ";
my $sequencefile = <STDIN>;
chomp $sequencefile;

print "Prokaryotic (P) / Eukaryotic (E): ";
my $source = <STDIN>;
chomp $source;

if ($source =~ /^P/) {
    $seq_type = "procaryotic";
}
elsif ($source =~ /^E/) {
    $seq_type = "eucaryotic";
}
else {
    print "Invalid Entry. Program Aborted. \n";
    exit();
}

# Creation of input object and opening of sequence file
my $seqio_obj = Bio::SeqIO->new(-file => $sequencefile, -format => "fasta");
my $seq_obj = $seqio_obj->next_seq;

# Printing out (verifying) input sequence
print "\nInput Sequence:\n";
print $seq_obj->seq;
print "\n\n";

# Create a new Sigcleave object
my $cleave = Bio::Tools::Sigcleave->new(-seq => $seq_obj, -threshold => '3.5');
$cleave->matrix($seq_type);

# Carry out Sigcleave process and output
my $result = $cleave->pretty_print;
print $result;

exit();
Further reading and resources