BioPerl exercise restriction

From "A B C"
Jump to navigation Jump to search

BioPerl exercise: restriction site


The contents of this page has recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be differently structured, outdated syntax, and/or broken links. Use with caution!


Summary ...



 

Introductory reading



 

Introduction

This is an exercise in BioPerl fundamentals: create a sequence object and do something with it.

Task

Write a small program that finds (and prints) the XbaI restriction site in the following nucleotide sequence:

AGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCG
GATAACAATTTCACACAGGAAACAGCTATGACCATGATTACGAATTTCTAGATAACGAGG
GCAAAAAATGAAAAAGACAGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTACCGT

References

Hints

  1. Create a variable that holds the sequence
  2. Create a sequence object using that variable
  3. Create a restriction object with the enzyme you want to use
  4. Use the restriction object's cut_seq() method on the sequence object to create a list of fragments
  5. Print the fragments

Solution(s)

Minimal solution: coding sequence and enzyme name as literals, just exercising the creation and use of sequence and restriction objects:

#!/usr/bin/perl -w

use strict;

use Bio::Seq;
use Bio::Tools::RestrictionEnzyme;

my $enzyme = "XbaI";

my $sequence  = "AGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCG";
   $sequence .= "GATAACAATTTCACACAGGAAACAGCTATGACCATGATTACGAATTTCTAGATAACGAGG";
   $sequence .= "GCAAAAAATGAAAAAGACAGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTACCGT";

# Creation of sequence object
my $seq_obj = Bio::Seq->new(-seq => $sequence, -alphabet => 'dna');

# Creation of Restriction Enzyme Object
my $rest_obj = new Bio::Tools::RestrictionEnzyme(-NAME => "XbaI");

# Print fragments
foreach ($rest_obj->cut_seq($seq_obj)) {
      print "$_\n\n";
   }

exit ();


Compact usable solution: read enzyme name from commandline, pass sequence via STDIN, a bit of sanity handling of input sequence

#!/usr/bin/perl -w
# reads a restriction enzyme from commandline and a sequence from
# STDIN. Sequence can be raw nucleotides or fasta
# usage example  restrict.pl XbaI < test.fa
use strict;

use Bio::Seq;
use Bio::Tools::RestrictionEnzyme;

my $enzyme = shift(@ARGV);

my $sequence = "";
# read the sequence
while (my $string = <STDIN>) {
    if ($string !~ m/^>/) {  # if the string does not begin with a fasta header character ...
        chomp($string);
        $string =~ s/\s+|[^A-Za-z]//g;   # delete all whitespace characters
                                         # or all characters not in the range A-Z,a-z
        $sequence .= $string;            # concatenate
    }
}

# Creation of sequence object
my $seq_obj = Bio::Seq->new(-seq => $sequence, -alphabet => 'dna');

# Creation of Restriction Enzyme Object
my $rest_obj = new Bio::Tools::RestrictionEnzyme(-NAME => $enzyme);

# Get a list of fragments
my @fragments = $rest_obj->cut_seq($seq_obj);

# Outputs enzyme name, restriction site and the fragments generated from cleavage
print "Enzyme: $enzyme \n";
print "Recognition Site: ", $rest_obj->site, "\n\n";
print "List of Fragments: \n";

foreach my $temp (@fragments) {
      print "$temp\n\n";
   }

exit ();


Interactive solution: asks for enzyme and filename, reads sequence from file using Bio::SeqIO (contributed by Jamie)


#!/usr/bin/perl -w
use strict;

use Bio::Seq;
use Bio::SeqIO;
use Bio::Tools::RestrictionEnzyme;

# Introduction and obtaining filename containing sequence/ restriction site
print "Welcome to the Restriction Analysis Tool.\n";
print "Enter Sequence Filename: ";
my $sequencefile = <STDIN>;
chomp($sequencefile);

print "Enter Query Enzyme: ";
my $enz = <STDIN>;
chomp($enz);

# Creation of input object and opening of sequence file
my $seqio_obj = Bio::SeqIO->new(-file => $sequencefile, -format => "fasta");
my $seq_obj = $seqio_obj->next_seq;

# Printing out (verifying) input sequence
print "\nInput Sequence:\n";
print $seq_obj->seq;
print "\n\n";

# Create a new Restriction Enzyme Object
my $rest = new Bio::Tools::RestrictionEnzyme(-NAME => $enz);

# Get a list of fragments resulting when a sequence is cut by the indicated enzyme
my @fragments = $rest->cut_seq($seq_obj);

# Outputs enzyme name, restriction site and the fragments generated from cleavage
print "Enzyme: $enz \n";
print "Recognition Site: ", $rest->site, "\n\n";
print "List of Fragments: \n";

foreach my $temp (@fragments) {
      print "$temp\n\n";
   }

exit ();


   

Further reading and resources