Latest revision as of 01:43, 26 October 2012

Perl programming exercises 1



 

Basic Perl Programming - Syntax examples

Boris Steipe acknowledges contributions by Jennifer Tsai, Sanja Rogic and Sohrab Shah.


These pages are designed to practice elementary programming techniques and data structures. They progress from extremely trivial programs to more interesting tools. They assume that you are familiar with the basic structure of a Perl program, for example:

#!/usr/bin/perl
use warnings;
use strict;

# declarations and initializations

# main program

exit();

# subroutines


For all of your programs, please use the pragmas strict and warnings: strict forces you to declare your variables, and warnings reports likely mistakes such as the use of uninitialized values. Both make it easier to produce error-free programs that do exactly what they are intended to do.
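
As a small illustration of what strict catches, the sketch below compiles a one-line snippet at run time with eval, so that the error can be captured and printed instead of stopping the program. The snippet and the variable name $total are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The snippet is kept in a single-quoted string so that it is compiled
# only when eval runs; this lets us catch the compile-time error.
my $snippet = 'use strict; $total = 1; 1;';   # $total is never declared with "my"

my $ok    = eval $snippet;   # undef if the snippet fails to compile
my $error = $@;              # the error message, if any

if ($ok) {
    print "The snippet compiled.\n";
}
else {
    print "strict rejected the snippet:\n$error";
}
```

In a normal program the same mistake stops compilation right away, which is exactly the point: a mistyped variable name is reported before the program runs.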

This page covers a section I have called syntax examples ... they are simple tasks that ask you to write functioning, syntactically correct code.

The following section covers programming exercises, designed to help you try your hand at translating an idea or a concept into a working program.


These exercises are best done by pasting the code into your IDE and stepping through it line by line.


Syntax Exercises

Exercise 1 - First print statement

#!/usr/bin/perl
use strict;
use warnings;

print "My first Perl program\n"; #try single quotes
print "First line\nsecond line and there is a tab\there\n";
Notes
  1. I always use strict; and use warnings;, even on the shortest programs. Mighty warts from tiny programs grow.
  2. I always end a program with exit(); even though it is not necessary. Why? It immediately tells me where the program ends and that I have copied it completely from wherever I got it.

Exercise 2 - Numerical variables and operators

#!/usr/bin/perl
use strict;
use warnings;

#assign values to variables $x and $y and print them out
my $x = 4;
my $y = 5.7;
print "x is $x and y is $y\n";

#example of arithmetic expression
my $z = $x + $y**2;
$x++;
print "x is $x and z is $z\n";

#evaluating arithmetic expression within print command
print "add 3 to $z: $z + 3\n"; #did it work?
print "add 3 to $z:", $z + 3,"\n";
Notes
  1. within double-quoted "strings", variables are interpolated, but expressions such as $z + 3 are not evaluated!
  2. however, within single-quoted 'strings', variables are neither interpolated nor evaluated.
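
The two notes can be checked side by side with a short sketch. The value 7 and the variable names are chosen only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $z = 7;

# double quotes: $z is replaced by its value, but "+ 3" stays plain text
my $interpolated = "add 3 to $z: $z + 3";

# single quotes: nothing is replaced at all
my $literal = 'add 3 to $z: $z + 3';

# to evaluate the expression, keep it outside the string
my $evaluated = "add 3 to $z: " . ($z + 3);

print "$interpolated\n";   # add 3 to 7: 7 + 3
print "$literal\n";        # add 3 to $z: $z + 3
print "$evaluated\n";      # add 3 to 7: 10
```

Passing the expression as a separate item in the print list, as in the exercise, has the same effect as the concatenation used here.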

Exercise 3 - String variables and operators

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Concatenate two given sequences, 
#find the length of the new sequence and
#print out the second codon of the sequence

#assign strings to variables
my $DNA = "GATTACACAT";
my $polyA = "AAAA";

#concatenate two strings
my $modifiedDNA = $DNA.$polyA;

#calculate the length of $modifiedDNA and
#print out the value of the variable and its length
my $DNAlength = length($modifiedDNA);
print "Modified DNA: $modifiedDNA has length $DNAlength\n";

#extract the second codon in $modifiedDNA
my $codon = substr($modifiedDNA,3,3);
print "Second codon is $codon\n";
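
As a possible extension of this exercise, the same substr call can be placed in a loop to collect every complete codon. This is only a sketch; the array name @codons is introduced here and a trailing partial codon is ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $modifiedDNA = "GATTACACAT" . "AAAA";
my @codons;

#step through the sequence three letters at a time and
#stop as soon as fewer than three letters remain
for (my $pos = 0; $pos + 3 <= length($modifiedDNA); $pos += 3) {
    push(@codons, substr($modifiedDNA, $pos, 3));
}

print "Codons: @codons\n";   # Codons: GAT TAC ACA TAA
```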


Exercise 4 - Taking user input

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Ask the user for her name and age and
#calculate her age in days

#get a string from the keyboard
print "Please enter your name\n";
my $name = <STDIN>;
chomp($name); #get rid of the newline character

#prompt the user for her age
#get a number from the keyboard
print "$name, please enter your age\n";
my $age = <STDIN>;
chomp($age);

#calculate age in days
my $age_in_days = $age*365;
print "You are $age_in_days days old\n";


Exercise 5 - Arrays

#!/usr/bin/perl
use strict;
use warnings;

#initialize an array
my @bases = ("A","C","G","T");

#print two elements of the array
print $bases[0],$bases[2],"\n";

#print the whole array
print @bases,"\n"; #try with double quotes

#print the number of elements in the array
print scalar(@bases),"\n";
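
The "try with double quotes" hint can be explored with the sketch below, which also shows join and a negative index. The variable names are chosen for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = ("A","C","G","T");

#inside double quotes the elements are separated by spaces
my $spaced = "@bases";

#join lets you choose the separator yourself
my $dashed = join("-", @bases);

#a negative index counts from the end of the array
my $last = $bases[-1];

print "$spaced\n";   # A C G T
print "$dashed\n";   # A-C-G-T
print "$last\n";     # T
```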

Exercise 6 - While and if statements

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Count the frequency of base G in a given DNA sequence

my $DNA = "GATTACACAT";

#initialize $countG and $currentPos
my $countG = 0;
my $currentPos = 0;

#calculate the length of $DNA
my $DNAlength = length($DNA);

#for each letter in the sequence check if it is the base G
#if 'yes' increment $countG
while($currentPos < $DNAlength){
    my $base = substr($DNA,$currentPos,1);
    if($base eq "G"){
        $countG++;
    }
    $currentPos++;
} #end of while loop

#print out the number of Gs
print "There are $countG G bases\n";
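
A variation on this task, sketched here with a hash that the exercises above have not introduced, counts all four bases in a single pass. The hash name %count is an assumption made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DNA = "GATTACACAT";
my %count;

#split the sequence into single letters and
#increment the counter stored under each letter
foreach my $base (split(//, $DNA)) {
    $count{$base}++;
}

#print the counts in alphabetical order of the bases
foreach my $base (sort keys %count) {
    print "$base: $count{$base}\n";
}
```

For this sequence the counts are A: 4, C: 2, G: 1 and T: 3.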

Exercise 7 - For and foreach loops

#!/usr/bin/perl
use strict;
use warnings;

my @array;
#initialize a 20-element array with numbers 0,...19
for(my $i=0;$i<20;$i++){
    $array[$i] = $i;
}

#print elements one-by-one using foreach
foreach my $element (@array){
    print "$element\n";
}


Exercise 8 - Regular expressions

#!/usr/bin/perl
use strict;
use warnings;

#TASK: For a given DNA sequence find its RNA transcript,
#find its reverse complement and check if
#the reverse complement contains a start codon

my $DNA = "GATTACACAT";

#transcribe DNA to RNA - T changes to U
my $RNA = $DNA;
$RNA =~ s/T/U/g;
print "RNA sequence is $RNA\n";

#find the reverse complement of $DNA using substitution operator

#first - reverse the sequence
my $rcDNA = reverse($DNA);

$rcDNA =~ s/T/A/g;
$rcDNA =~ s/A/T/g;
$rcDNA =~ s/G/C/g;
$rcDNA =~ s/C/G/g;

print "Reverse complement of $DNA is $rcDNA\n"; #did it work?
 
#find the reverse complement of $DNA using translation operator

#first - reverse the sequence
$rcDNA = reverse($DNA);
#then - complement the sequence
$rcDNA =~ tr/ACGT/TGCA/;
#then - print the reverse complement
print "Reverse complement of $DNA is $rcDNA\n";

#look for a start codon in the reverse complement
if($rcDNA =~ /ATG/){
    print "Start codon found\n";
}
else{
    print "Start codon not found\n";
}
Note

We have a long and comprehensive page on Regular Expressions on this Wiki.
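
For readers who have already tried the exercise: the four chained substitutions fail because each one also rewrites letters produced by the one before it. The sketch below repeats that attempt and compares it with a single-pass substitution that looks each base up in a hash; the hash %complement is introduced only for this illustration, and tr/ACGT/TGCA/ remains the simpler choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DNA = "GATTACACAT";
my $reversed = reverse($DNA);

#chained substitutions: every T becomes A, then every A
#(including the new ones) becomes T, and so on
my $stepwise = $reversed;
$stepwise =~ s/T/A/g;
$stepwise =~ s/A/T/g;
$stepwise =~ s/G/C/g;
$stepwise =~ s/C/G/g;

#single pass: each base is replaced exactly once
my %complement = (A => "T", C => "G", G => "C", T => "A");
my $singlePass = $reversed;
$singlePass =~ s/([ACGT])/$complement{$1}/g;

print "chained:     $stepwise\n";     # TTGTGTTTTG
print "single pass: $singlePass\n";   # ATGTGTAATC
```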

Exercise 9 - Subroutines

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Make a subroutine that calculates the reverse
#complement of a DNA sequence and call it from the main program 

#body of the main program with the function call
my $DNA = "GATTACACAT";
my $rcDNA = revcomp($DNA); 
print "$rcDNA\n";

exit;

#definition of the function for reverse complement
sub revcomp{
    my($DNAin) = @_;
    my $DNAout = reverse($DNAin);
    $DNAout =~ tr/ACGT/TGCA/;
    return ($DNAout);
}



Exercise 10 - Input and output files

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Read DNA sequences from ‘DNAseq’ input file – 
#there is one sequence per line
#For each sequence find the reverse complement and 
#print it to ‘DNAseqRC’ output file

#open input and output files
open(IN, "<", "DNAseq") or die "Cannot open DNAseq: $!";
open(OUT, ">", "DNAseqRC") or die "Cannot open DNAseqRC: $!";

#read the input file line-by-line
#for each line find the reverse complement
#print it in the output file
while(<IN>){   # here, each line is read IMPLICITLY into $_
    chomp;
    my $rcDNA = revcomp($_);
    print OUT "$rcDNA\n";
}

#close input and output files
close(IN);
close(OUT);

exit;


#definition of the function for reverse complement
sub revcomp{
    my($DNAin) = @_;
    my $DNAout = reverse($DNAin);
    $DNAout =~ tr/ACGT/TGCA/;
    return($DNAout);
}
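
The same task can also be written with lexical filehandles and the three-argument form of open. The sketch below is self-contained: it first writes a small test file into a temporary directory (using the core module File::Temp), so the file contents and paths are assumptions of this example rather than part of the exercise.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

#create a scratch directory that is removed when the program ends
my $dir     = tempdir(CLEANUP => 1);
my $inFile  = "$dir/DNAseq";
my $outFile = "$dir/DNAseqRC";

#write two test sequences, one per line
open(my $setup, ">", $inFile) or die "Cannot write $inFile: $!";
print {$setup} "GATTACACAT\nAAAACCCC\n";
close($setup);

#read the input line by line into an explicitly named variable
open(my $in,  "<", $inFile)  or die "Cannot open $inFile: $!";
open(my $out, ">", $outFile) or die "Cannot open $outFile: $!";
while (my $line = <$in>) {
    chomp($line);
    print {$out} revcomp($line), "\n";
}
close($in);
close($out) or die "Cannot close $outFile: $!";

#read the output file back to check the result
open(my $check, "<", $outFile) or die "Cannot open $outFile: $!";
my @results = <$check>;
close($check);
chomp(@results);
print "$_\n" foreach @results;

#definition of the function for reverse complement
sub revcomp{
    my($DNAin) = @_;
    my $DNAout = reverse($DNAin);
    $DNAout =~ tr/ACGT/TGCA/;
    return($DNAout);
}
```

Lexical filehandles such as $in are closed automatically when they go out of scope, and the "or die" checks report a missing file instead of silently reading nothing.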

Exercise 11 - System calls

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Print a list of all Perl programs we did today.
#These files can be found in your current directory and 
#they start with the word ‘program’

print "List of programs we made today:\n";

#system call for 'ls' function - the result goes into a string
my $listing = `ls`; #these are back quotes

#split the string to get individual files
my @files = split(/\n/,$listing);

#use foreach to step through the array
#if a file name starts with the word 'program' print it out
foreach my $file (@files){
    if($file =~ /^program/){
        print "$file\n";
    }
}
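
Backticks depend on an external ls command being available. As a hedged alternative, the listing can be read with Perl's built-in opendir and readdir. The sketch below creates its own scratch directory and three empty files (the file names are invented for this example) so that it runs anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

#create a scratch directory with a few empty test files
my $dir = tempdir(CLEANUP => 1);
foreach my $name ("program1.pl", "program2.pl", "notes.txt") {
    open(my $fh, ">", "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

#read the directory and keep the names that start with 'program'
opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
my @programs = sort grep { /^program/ } readdir($dh);
closedir($dh);

print "List of programs we made today:\n";
foreach my $file (@programs) {
    print "$file\n";
}
```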


Exercise 12 - Putting it all together

#!/usr/bin/perl
use strict;
use warnings;

#TASK: Find the reverse complement of a gene, its GC content 
#and the GC content of its reverse complement.
#The gene is stored in a file called DNA.fasta.

#body of the main program

#open input file
open(IN, "<", "DNA.fasta") or die "Cannot open DNA.fasta: $!";

#get a gene name
my $name = <IN>;
chomp($name);

#concatenate all lines from fasta file in one string
my $DNA = "";
while(<IN>){ #input goes into $_
    chomp;
    $DNA = $DNA.$_;
}
close(IN);

#call functions to get the reverse complement and GC content
my $DNA_gc = gc_content($DNA);
my $DNArc = revcomp($DNA);
my $DNArc_gc = gc_content($DNArc);

#print out the results
print "$name has GC content: $DNA_gc\n";
print "reverse complement of $name has GC content: $DNArc_gc\n";

exit();



# ====== revcomp =============================
#definition of the function for reverse complement
sub revcomp{
    my($DNAin) = @_;
    my $DNAout = reverse($DNAin);
    $DNAout =~ tr/ACGT/TGCA/;
    return $DNAout;
}

# ====== gc_content =============================
#definition of GC content function
sub gc_content{
    my ($DNAin) = @_;
    my $count = 0;    

    my $DNAlength = length($DNAin);

    #explode DNA string into an array
    my @bases = split(//,$DNAin);

    #step through the array and count the occurrences of G and C
    for (my $i=0;$i<$DNAlength;$i++){
        if($bases[$i] =~ /[GC]/){
            $count++;
        }
    }
    #return the fraction of GC bases (multiply by 100 for a percentage)
    return $count/$DNAlength;
}
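
The counting loop in gc_content can be shortened with the tr operator, which returns the number of characters it matched. The sketch below uses a hypothetical function name, gc_fraction, accepts lower-case letters as well, and returns 0 for an empty sequence to avoid dividing by zero; none of this is part of the original exercise.

```perl
#!/usr/bin/perl
use strict;
use warnings;

#body of the main program
my $fraction = gc_fraction("GATTACACAT");
printf("GC content: %.1f%%\n", 100 * $fraction);   # GC content: 30.0%

# ====== gc_fraction =============================
#alternative definition of the GC content function
sub gc_fraction{
    my ($DNAin) = @_;
    my $DNAlength = length($DNAin);

    #an empty sequence has no GC content
    return 0 if $DNAlength == 0;

    #with an empty replacement list tr only counts and changes nothing
    my $count = ($DNAin =~ tr/GCgc//);

    return $count/$DNAlength;
}
```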


 

Further reading and resources