Perl basic programming


Perl: basic programming examples


The contents of this page have recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be structured differently, outdated syntax, and/or broken links. Use with caution!


Simple examples of Perl code.

Parts of this code were originally contributed by Sohrab Shah, Sanja Rogic, Wil Hsiao and others (let me know if you happen to read this and should be listed here). These parts were taken from the Canadian Bioinformatics Workshop bioinformatics course, where they were made available under a Creative Commons license.



 


Exercise 1 - First print statement

#!/usr/bin/perl
use strict;
use warnings;

print "My first Perl program\n"; #also try this with single quotes
print "First line\nsecond line and there is a tab\there\n";

exit();

Notes:

  1. I always use strict; and use warnings;, even on the shortest programs. Mighty warts from tiny programs grow.
  2. I always end a program with exit(); even though it is not necessary. Why? It immediately tells me where the program ends and that I have copied it completely from wherever I got it.

Exercise 2 - Numerical variables and operators

#!/usr/bin/perl
use strict;
use warnings;

#assign values to variables $x and $y and print them out
my $x = 4;
my $y = 5.7;
print "x is $x and y is $y\n";

#example of arithmetic expression
my $z = $x + $y**2;
$x++;
print "x is $x and z is $z\n"; 

#evaluating arithmetic expression within print command
print "add 3 to $z: $z + 3\n"; #did it work?
print "add 3 to $z: ", $z + 3, "\n";

exit();

Notes:

  1. Within double-quoted "strings", variables are interpolated, but expressions such as $z + 3 are not evaluated!
  2. Within single-quoted 'strings', however, nothing is interpolated or evaluated: even the variable name is printed literally.
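
The two notes above can be seen side by side in a small sketch. The variable names are made up for this illustration, and the last two assignments show two common ways to get an expression evaluated when building a string (concatenation and sprintf), which the exercise itself does not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $z = 10;

#double quotes: $z is interpolated, but "+ 3" stays literal text
my $double = "z plus 3: $z + 3";

#single quotes: nothing is interpolated, not even $z
my $single = 'z plus 3: $z + 3';

#evaluate the expression outside the string and concatenate the result
my $concat = "z plus 3: " . ($z + 3);

#or let sprintf insert the computed value
my $formatted = sprintf("z plus 3: %d", $z + 3);

print "$double\n";
print "$single\n";
print "$concat\n";
print "$formatted\n";
```

Only the last two lines of output contain the sum; the first two show the text of the expression instead.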

Exercise 3 - String variables and operators

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Concatenate two given sequences, 
#find the length of the new sequence and
#print out the second codon of the sequence

#assign strings to variables
my $DNA = "GATTACACAT";
my $polyA = "AAAA";

#concatenate two strings
my $modifiedDNA = $DNA . $polyA;

#calculate the length of $modifiedDNA and
#print out the value of the variable and its length
my $DNAlength = length($modifiedDNA);
print "Modified DNA: $modifiedDNA has length $DNAlength\n";

#extract the second codon in $modifiedDNA
my $codon = substr($modifiedDNA,3,3);
print "Second codon is $codon\n";

exit();

Exercise 4 - working with user input

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Ask the user for her name and age and
#calculate her age in days

#get a string from the keyboard
print "Please enter your name\n";
my $name = <STDIN>;
chomp($name); #getting rid of the new line character

#prompt the user for his/her age
#get a number from the keyboard
print "$name please enter your age\n";
my $age = <STDIN>;
chomp($age);

#calculate age in days
my $age_in_days = $age*365;
print "You are approximately $age_in_days days old\n";

exit();

Exercise 5 - Arrays

#!/usr/bin/perl
use strict;
use warnings;

#initialize an array 
my @bases = ("A","C","G","T");

#print two elements of the array
print $bases[0],$bases[2],"\n";

#print the whole array
print @bases,"\n"; #try with double quotes

#print the number of elements in the array
print scalar(@bases),"\n";

exit();
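
The comment "try with double quotes" in the exercise above hints at a difference that the following sketch makes explicit. It assumes Perl's default settings for the output and list separators; the variable names are chosen for this illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = ("A","C","G","T");

#outside quotes, print writes the elements one after another: ACGT
my $unquoted = join("", @bases);

#inside double quotes the elements are separated by a space: A C G T
my $quoted = "@bases";

#join lets you choose the separator yourself: A,C,G,T
my $joined = join(",", @bases);

print "$unquoted\n";
print "$quoted\n";
print "$joined\n";
```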

Exercise 6 - While and if statements

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Count the frequency of base G in a given DNA sequence

my $DNA = "GATTACACAT";

#initialize $countG and $currentPos
my $countG = 0;
my $currentPos = 0;
my $base; 

#calculate the length of $DNA
my $DNAlength = length($DNA);

#for each letter in the sequence check if it is the base G
#if 'yes' increment $countG
while($currentPos < $DNAlength){
	$base = substr($DNA,$currentPos,1);
	if($base eq "G"){ 
		$countG++;
	}
	$currentPos++;
} #end of while loop

#print out the number of Gs
print "There are $countG G bases\n";

exit();

Exercise 7 - For and foreach loops

#!/usr/bin/perl
use strict;
use warnings;

my @array;

#initialize a 20-element array with numbers 0,...19
for (my $i=0;$i<20;$i++){
    $array[$i] = $i;
}

#print elements one-by-one using foreach
my $element;
foreach $element (@array){
    print "$element\n";
}

exit();

Notes:

  1. a more Perl-ish way to write the first for loop would be the following (although personally I prefer the first version, often called C-style, as being more explicit).
my $i;
for $i (0..19){
    $array[$i] = $i;
}

Exercise 8 - Regular expressions

#!/usr/bin/perl
use strict;
use warnings;

#TASK: For a given DNA sequence find its RNA transcript,
#find its reverse complement and check if
#the reverse complement contains a start codon

my $DNA = "GATTACACAT";

#transcribe DNA to RNA - T changes to U
my $RNA = $DNA;
$RNA =~ s/T/U/g;
print "RNA sequence is $RNA\n";

#find the reverse complement of $DNA using substitution operator
#first - reverse the sequence
my $rcDNA = reverse($DNA);

$rcDNA =~ s/T/A/g;
$rcDNA =~ s/A/T/g;
$rcDNA =~ s/G/C/g;
$rcDNA =~ s/C/G/g;

print "Reverse complement of $DNA is $rcDNA\n"; #did it work?

#find the reverse complement of $DNA using translation operator

#first - reverse the sequence
$rcDNA = reverse($DNA);
$rcDNA =~ tr/ACGT/TGCA/;
print "Reverse complement of $DNA is $rcDNA\n";

#look for a start codon in the reverse complement
if($rcDNA =~ /ATG/){
    print "Start codon found\n";
}
else{
    print "Start codon not found\n";
}

exit();
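
The question "did it work?" in the exercise above can be answered by following the sequence through each substitution. The sketch below uses the same example sequence; $step and $viaTr are names made up for this illustration. It suggests why four separate s///g steps cannot produce the complement: each step also changes the letters that the previous step has just produced, whereas tr/// replaces all characters in a single pass.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DNA = "GATTACACAT";

#four separate substitutions, with the intermediate result of each step
my $step = reverse($DNA);   #TACACATTAG
$step =~ s/T/A/g;           #AACACAAAAG  every T is now an A
$step =~ s/A/T/g;           #TTCTCTTTTG  old AND new As all become T
$step =~ s/G/C/g;           #TTCTCTTTTC  every G is now a C
$step =~ s/C/G/g;           #TTGTGTTTTG  old AND new Cs all become G

#tr looks at each character exactly once, so nothing is changed twice
my $viaTr = reverse($DNA);
$viaTr =~ tr/ACGT/TGCA/;    #ATGTGTAATC

print "substitutions: $step\n";
print "translation:   $viaTr\n";
```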

Exercise 9 - Subroutines

#!/usr/bin/perl
use strict;
use warnings;
 
#TASK: Make a subroutine that calculates the reverse
#complement of a DNA sequence and call it from the main program

#body of the main program with the function call
my $DNA = "GATTACACAT";
my $rcDNA = revcomp($DNA); 
print "$rcDNA\n";

exit();

#definition of the function for reverse complement
sub revcomp{
    my($DNAin) = @_;
    my($DNAout) = reverse($DNAin);
    $DNAout =~ tr/ACGT/TGCA/;
    return $DNAout;
}

Notes:

  1. Parameters are passed into a subroutine via the special array @_. Accordingly, parameters must be assigned in a so-called list context: note the parentheses around $DNAin! If you omit the parentheses, $DNAin is assigned in scalar context, i.e. it is assigned the number of elements in the array, which here is 1 (one string element). This is a very common newcomer's mistake.
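
The difference described in the note can be checked with a minimal sketch. The three subroutine names are hypothetical and exist only for this comparison; the third variant uses shift, another common way to take the first parameter from @_.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub with_parentheses{
    my($DNAin) = @_;    #list context: $DNAin gets the first parameter
    return $DNAin;
}

sub without_parentheses{
    my $DNAin = @_;     #scalar context: $DNAin gets the number of parameters
    return $DNAin;
}

sub with_shift{
    my $DNAin = shift;  #shift removes and returns the first element of @_
    return $DNAin;
}

my $right   = with_parentheses("GATTACACAT");
my $wrong   = without_parentheses("GATTACACAT");
my $shifted = with_shift("GATTACACAT");

print "with parentheses:    $right\n";
print "without parentheses: $wrong\n";
print "with shift:          $shifted\n";
```

Note that neither use strict nor use warnings complains about the version without parentheses, which is part of what makes the mistake easy to overlook.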

Exercise 10 - Input and output files

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Read DNA sequences from ‘DNAseq’ input file – 
#there is one sequence per line
#For each sequence find the reverse complement and 
#print it to ‘DNAseqRC’ output file


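#open input and output files, and stop with an error message if that fails
open(IN, "<", "DNAseq") or die "Cannot open DNAseq: $!";
open(OUT, ">", "DNAseqRC") or die "Cannot open DNAseqRC: $!";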

#read the input file line-by-line
#for each line find the reverse complement
#print it in the output file
my $rcDNA;
while(<IN>){
    chomp;
    $rcDNA = revcomp($_);
    print OUT "$rcDNA\n";
}

#close input and output files
close(IN);
close(OUT);

exit();

#definition of the function for reverse complement
sub revcomp{
    my($DNAin) = @_;
    my($DNAout) = reverse($DNAin);
    $DNAout =~ tr/ACGT/TGCA/;
    return $DNAout;
}

Exercise 11 - System calls

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Print a list of all Perl programs you wrote today.
#These files can be found in your current directory and 
#they end with the file extension ‘.pl’

print "List of programs I wrote today:\n";

#system call for 'ls' function - the result goes into a string
my $listing = `ls`;  #these are back quotes

#split the string to get individual files
my @files = split(/\n/,$listing);

#use foreach to step through the array
#if a file name ends with '.pl' print it out
my $file; 
foreach $file (@files){
    if($file =~ /\.pl$/){ #regular expression: the '.' is escaped "\."
        print "$file\n";
    }
}

exit();

Exercise 12 - Putting it all together

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Find the reverse complement of a gene, its GC content 
#and the GC content of its reverse complement.
#The gene is stored in FASTA format in the file 'DNAfasta'.

#body of the main program

#open input file, and stop with an error message if that fails
open(IN, "<", "DNAfasta") or die "Cannot open DNAfasta: $!";

#get a gene name
my $name = <IN>;
chomp($name);

#concatenate all lines from fasta file in one string
my $DNA = "";
while(<IN>){ #input goes into $_
    chomp;
    $DNA = $DNA . $_;
}
close(IN);

#call functions to get the reverse complement and GC content
my $DNA_gc = gc_content($DNA);
my $DNArc = revcomp($DNA);
my $DNArc_gc = gc_content($DNArc);

#print out the results
print "$name has GC content: $DNA_gc\n";
print "reverse complement of $name has GC content: $DNArc_gc\n";

exit();

#definition of the function for reverse complement
sub revcomp{
    my($DNAin) = @_;
    my $DNAout = reverse($DNAin);
    $DNAout =~ tr/ACGT/TGCA/;
    return ($DNAout);
}

#definition of GC content function
sub gc_content{
    my($DNAin) = @_;
    my $count = 0;	
    my $DNAlength = length($DNAin);

    #explode DNA string into an array
    my @bases = split(//,$DNAin); #use the parameter, not the global $DNA

    #step through the array and count the occurrences of G and C
    for (my $i=0;$i<$DNAlength;$i++){
        if ($bases[$i] =~ /[GC]/){
            $count++;
        }
    }
    #return the fraction of GC bases (a value between 0 and 1)
    return ($count/$DNAlength);
}
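
As a possible variation on the counting loop above, the translation operator from Exercise 8 can also be used to count characters. The following is only a sketch: gc_fraction is a hypothetical name, not part of the original exercise, and the check for an empty sequence is an added assumption about how that case should be handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

#hypothetical alternative to gc_content, for illustration only
sub gc_fraction{
    my($DNAin) = @_;
    my $DNAlength = length($DNAin);

    #assumption: report 0 for an empty sequence instead of dividing by zero
    return 0 if $DNAlength == 0;

    #tr with an empty replacement list returns the number of matching
    #characters and leaves the string unchanged
    my $count = ($DNAin =~ tr/GC//);

    return $count/$DNAlength;
}

print gc_fraction("GATTACACAT"), "\n";
```

This avoids building an array of single bases, at the cost of being less explicit than the loop.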



   

Further reading and resources