Perl programming exercises 2

From "A B C"
Revision as of 22:26, 25 October 2012 by Boris.


Small programming exercises for an introduction to Perl.


Basic Perl Programming - Programming exercises

Boris Steipe acknowledges contributions by Jennifer Tsai, Sanja Rogic and Sohrab Shah.

Here are programming exercises that focus on translating a concept into a working script.

The preceding section covers what I have called syntax examples: simple tasks that ask you to write syntactically correct, functioning code.

For each task you will find

  1. a description of the task your code should achieve;
  2. some hints on how to go about solving it: which functions you might use, or which strategy; and
  3. sample code for reference if you are stuck. Should you really need to look up the samples, carefully study the code, put it away and then write your own script from scratch, with different code and perhaps some variation in function. If you merely copy code, or read it with mild interest and move on, you will probably be wasting your time.
Don't be satisfied until you understand what you are doing.


Hello World


Executable program

Write a Perl program that prints out "Hello World" (or whatever you fancy) to the terminal. Make your program executable with chmod u+x, so that you don't need to invoke the Perl interpreter explicitly from the command line: typing "$ ./" followed by the file name should run it, without typing "$ perl" first.



  Simply use the print(); function.




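#!/usr/bin/perl
# The "shebang" line above tells the shell which interpreter runs this file;
# adjust the path if your perl lives elsewhere (check with: which perl).
use warnings;
use strict;

print("Hello World !\n");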






Keyboard input

Write a Perl program that prints to the terminal a single line that you type at the keyboard.



  Use the diamond operator to read from STDIN, assign this to a variable, then print the contents of the variable. Just one statement, no loop is required.




use warnings;
use strict;

my $line;

$line = <STDIN>;       # read one line, including its trailing newline

print($line);          # the newline is already part of $line






More keyboard input

Write a Perl program that reads one or many lines from STDIN, converts them to lowercase and prints them to the terminal. Use this interactively, typing input (end by typing <ctrl>D), then use this by redirecting a textfile to your program, then "pipe" the output of the Unix "ls" command into your program.



  Use a while loop with the assignment of <STDIN> to a variable as its loop condition. This way the loop runs until STDIN reaches EOF (End of File). Use the Perl lc(); function to change case. Assign the return value to a variable and print it.




use warnings;
use strict;

while (my $line = <STDIN>) {   # loop until EOF
    $line = lc($line);         # convert to lowercase
    print($line);              # $line still ends in its own newline
}
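The loop condition relies on a detail worth seeing once: Perl implicitly wraps `while (my $line = <FH>)` in a defined() test, so a final line consisting only of "0" (which is false) is still processed. The sketch below reads from an in-memory filehandle instead of STDIN purely so that it runs without typing; the sample text and the helper name lowercaseLines() are our own illustrations.

```perl
use warnings;
use strict;

# Hypothetical helper: lowercases every line read from a filehandle
# and returns the lines as a list.
sub lowercaseLines {
    my ($fh) = @_;
    my @out;
    while (my $line = <$fh>) {   # Perl adds an implicit defined() test here
        push(@out, lc($line));
    }
    return @out;
}

# Open a filehandle on a string (instead of STDIN) for the demonstration.
# The last line is "0" without a newline: false, but defined.
my $text = "Hello\nWORLD\n0";
open(my $fh, '<', \$text) or die("Cannot open in-memory file: $!");
my @lines = lowercaseLines($fh);
close($fh);

print(@lines, "\n");
```

All three lines are printed, including the final "0".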






Write a Perl program that prompts for and reads two numbers from STDIN, and outputs the larger of the two numbers to the terminal. Remember to consider the case that the numbers may be equal.



  You need an

if (condition) { do ... }

construction to print one or the other of the numbers, depending on the result of the comparison. Remember the difference between numeric (==, <, >) and string (eq, lt, gt) comparisons! chomp(); your input variables to remove the trailing newline characters before you compare and print them.
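To see why the distinction matters, compare the strings "10" and "9" both ways. A numeric comparison looks at the values; a string comparison works character by character, and "1" comes before "9". The variable names are arbitrary.

```perl
use warnings;
use strict;

# The same two inputs, compared as numbers and as strings.
my ($x, $y) = ("10", "9");

my $numeric = ($x > $y)  ? "yes" : "no";   # numeric: 10 > 9
my $string  = ($x gt $y) ? "yes" : "no";   # string: "1" sorts before "9"

print("numeric 10 > 9: $numeric\n");        # yes
print("string \"10\" gt \"9\": $string\n"); # no
```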




use warnings;
use strict;

print("Enter a number: ");            # User inputs
my $input1 = <STDIN>;

print("Enter a second number: ");
my $input2 = <STDIN>;

chomp($input1);                       # Chomp off trailing newline characters
chomp($input2);

if ($input1 > $input2) {
   print("$input1 is larger than $input2.\n");
} elsif ($input1 == $input2) {
   print("$input1 and $input2 are equal.\n");
} else {
   print("$input2 is larger than $input1.\n");
}




max (with subroutine)



Rewrite your program so the comparison is done in a subroutine: pass the two numbers as arguments into a subroutine and return the larger of the two. Such a program may be a useful framework for comparing two datasets with a non-trivial metric. Instead of simply picking the larger value, the subroutine could compare according to some sophisticated algorithm.



  Remember that Perl uses the default array "@_" to pass values into subroutines. You need to assign the contents of @_ to variables (or other arrays) in order to be able to use the values. The easiest way to do this is to assign the array to a list of variables, e.g.

my ($a) = @_; or ...
my ($a, $b) = @_;

Note that the following will not work as expected!

my $a = @_;

If you did this, you would be assigning an array ("@") to a scalar ("$"). The problem is that this is legal: the compiler neither complains nor warns, but the statement does not assign the first value in the array; it assigns the number of elements in the array! This is a fine case of a statement being syntactically correct but logically wrong. If in doubt whether you are doing the right thing, always print your values from within the subroutine as a development test, to make sure they are what you expect them to be.
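The difference is easy to see when the forms run side by side. The subroutine names below are made up for the demonstration; shift() without an argument, inside a subroutine, removes and returns the first element of @_, which is another common way to fetch a single argument.

```perl
use warnings;
use strict;

# Three ways to read "the first argument"; only two of them work.
sub firstArgList   { my ($first) = @_; return $first; }   # list assignment
sub firstArgScalar { my $first   = @_; return $first; }   # scalar context: count
sub firstArgShift  { my $first = shift; return $first; }  # shift() defaults to @_

my $viaList   = firstArgList(42, 17, 8);
my $viaScalar = firstArgScalar(42, 17, 8);
my $viaShift  = firstArgShift(42, 17, 8);

print("list assignment:   $viaList\n");    # 42
print("scalar assignment: $viaScalar\n");  # 3 (the number of arguments)
print("shift:             $viaShift\n");   # 42
```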




use warnings;
use strict;

print("Enter a number: ");                 # User inputs
my $input1 = <STDIN>;

print("Enter a second number: ");
my $input2 = <STDIN>;

chomp($input1);                            # Chomp off trailing newline
chomp($input2);                            # characters for numeric comparison

if ($input1 == $input2) {                  # ==: numeric equality
   print("Both inputs are equal.\n");
} else { 
   my $larger = compare($input1, $input2); # compare arguments in subroutine
                                           # and return larger value
   print("$larger is larger.\n");
}

# =======================================================
# Subroutine "compare" returns the larger of two inputs
# or $a if they are equal.
sub compare {

   my ($a, $b) = @_;   # Copy the argument list @_ into local variables

   if ($a >= $b) {     # numeric greater-or-equal
      return $a;
   } else {
      return $b;
   }

} # end subroutine
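To follow up on the idea of a non-trivial metric: one possible design, sketched here with names of our own (maxBy() and the absolute-value metric are illustrations, not part of the exercise), passes a reference to a scoring subroutine along with the two values. The comparison logic can then be swapped without touching the rest of the program.

```perl
use warnings;
use strict;

# Returns whichever of $x and $y scores higher under the metric $score;
# returns $x on a tie, like compare() above.
sub maxBy {
    my ($score, $x, $y) = @_;     # $score is a code reference
    return ($score->($x) >= $score->($y)) ? $x : $y;
}

# Example metric (an assumption for illustration): distance from zero.
my $byMagnitude = sub { my ($v) = @_; return abs($v); };

my $plain = maxBy(sub { $_[0] }, -7, 3);   # identity metric: plain max
my $mag   = maxBy($byMagnitude, -7, 3);    # larger magnitude wins

print("plain max: $plain\n");        # 3
print("by magnitude: $mag\n");       # -7
```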






Write a Perl program that reads a string from STDIN and prints ten random permutations of this string. This will require a number of concepts and techniques of working with arrays: defining an array, assigning values to an array or to individual fields of an array, using a variable as an index to an array in order to read from or write to specific fields, and more. First split your string into individual elements of an array. Use a subroutine that randomizes this array by looping over every position of the array, and swapping the contents of this position with a randomly chosen other position of the array, except itself. Write this in pseudocode first. The Perl functions you will need are split(); and rand(); (together with int();).



  You have to chomp(); your input in order not to shuffle the newline character (“return” character) into your randomized strings; otherwise you’ll end up with strangely shortened versions of your randomized string, split into two parts. To get the array size, use the index of the last array position plus one (remember that array positions are numbered starting at 0, not 1!). To split a string into individual elements of an array, use split(//, $input); with no delimiter, i.e. with no characters in between the slashes, not even a space. Assigning the result of split(); to an array puts every character of the string into its own array field. When randomizing the array, note that rand(); returns a random floating-point number, not an integer, so you need to use int(); to truncate the result of rand(); to its integer part. Use variables to store values from the array before the swap; otherwise the original value stored in a given array position will be lost before it can be copied over to the position that you want to swap it to. Also note that all array positions should be switched, so you need to consider the case that your random integer is the same as the position of the original value.

When you are done, comment out the chomp(); call and see what happens.
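The individual pieces from the hint can be tried on a fixed string before you assemble the whole program. The sketch below (the string "HELLO" is an arbitrary choice) shows split(//, ...), two ways of getting the array size, a random index from int(rand()), what an unchomped newline does to the array, and join() for turning the array back into a string.

```perl
use warnings;
use strict;

my $input = "HELLO";
my @chars = split(//, $input);          # empty pattern: one character per field

my $lastIndex = $#chars;                # index of the last element
my $size      = scalar(@chars);         # number of elements = $#chars + 1

# rand($size) lies in [0, $size); int() truncates it to 0 .. $size - 1,
# which is exactly the range of valid indices into @chars.
my $randIndex = int(rand($size));

# Without chomp(), the newline becomes one more field of the array.
my @unchomped = split(//, "$input\n");

print("fields: @chars\n");                                # fields: H E L L O
print("last index: $lastIndex, size: $size\n");           # last index: 4, size: 5
print("random field: $chars[$randIndex]\n");
print("size without chomp: ", scalar(@unchomped), "\n");  # 6
print("rejoined: ", join('', @chars), "\n");              # rejoined: HELLO
```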




use warnings;
use strict;

# Constants
my $COUNT = 10;    # Number of times to call randomizing subroutine

# Declare variables
my $stringInput;   # Initial input string
my @stringArray;   # Input string array after splitting into
                   # an array that stores each character in one array element

print("Enter a string to randomize: ");   # Accept user input
$stringInput = <STDIN>;

chomp($stringInput);            # Remove newline character from input string

# Split string input into an array that stores each character as a 
# separate element
@stringArray = split(//, $stringInput);

# Call printRandomized() 10 times in order to print ten random
# permutations of the input string
for (my $i = 0; $i < $COUNT; $i++) {
   printRandomized(@stringArray);   # pass string array to the subroutine
}

# ===== printRandomized () =============================
# Subroutine "printRandomized" loops over every position of the array
# passed to it, from the last position down, and swaps the contents of
# this position with a randomly chosen position at or below it (possibly
# itself). This implements the so-called Fisher-Yates shuffle, an
# efficient in-place shuffle that gives equal weight to all N! permutations.
sub printRandomized {
   my (@randArray) = @_;

   # get array size: since arrays start from index 0, the size
   # of the array is equal to the index of the last array element plus 1
   # Perl provides three ways to get the array size:
   # "$#Array" is the index of the last element.
   # You can also assign the array to a scalar, you get its size
   # as in "$N_fields = @Array;" This value is one more than "$#Array",
   # since the index of the first element is 0, not 1. 
   # For clean code I prefer the third version: "$size = scalar(@Array);"
   # since it is more explicit.
   my $arraySize = scalar(@randArray);

   # iterate through every element in the array, beginning at
   # the last element and counting down
   for (my $j = $arraySize - 1; $j > 0; $j--) {
      # assign the contents of the first element to a temporary
      # variable
      my $arrayPos1 = $randArray[$j];

      # get random array position less or equal to $j
      my $randInt = int(rand($j + 1));

      # assign the contents of the second element to a temporary
      # variable
      my $arrayPos2 = $randArray[$randInt];

      # swap the contents of the two array positions
      $randArray[$randInt] = $arrayPos1;
      $randArray[$j] = $arrayPos2;
   }  # end for (iterating through elements in the array)

   # print result of the randomization
   for (my $k = 0; $k < $arraySize; $k++) {
   }  # end for (printing randomized array)

} # end subroutine "printRandomize"

The construct

my $randInt = int(rand($j + 1));

deserves some comment. $j starts as the index of the last element in the array. rand(n) returns a random floating-point number from the half-open interval [0, n), i.e. 0 ≤ number < n. Assume our array had four elements: then $j is 3, and rand(3 + 1) returns numbers from 0.000... to 3.999... Since int() does not round the number, but just truncates its decimals and returns its integer part, we get random integers from 0 to 3, each with uniform probability. That is exactly the range of valid indices into our array.
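The claim in the code comments, that this shuffle gives every one of the N! permutations the same weight, can be checked empirically. The sketch below moves the same swap logic into a subroutine shuffled() (our name, not part of the exercise) that returns the array instead of printing it, shuffles "abc" 60000 times and counts each outcome; each of the 3! = 6 permutations should come up roughly 10000 times. It swaps with an array slice, an alternative to the temporary variables used above.

```perl
use warnings;
use strict;

# Fisher-Yates shuffle, same logic as printRandomized() but returning
# the shuffled list instead of printing it.
sub shuffled {
    my (@array) = @_;
    for (my $j = $#array; $j > 0; $j--) {
        my $randInt = int(rand($j + 1));              # 0 .. $j inclusive
        @array[$j, $randInt] = @array[$randInt, $j];  # swap via array slice
    }
    return @array;
}

my $trials = 60000;
my %count;                                   # permutation => frequency
for (my $i = 0; $i < $trials; $i++) {
    my $perm = join('', shuffled(split(//, 'abc')));
    $count{$perm}++;
}

for my $perm (sort(keys(%count))) {
    printf("%s %6d\n", $perm, $count{$perm});
}
```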





Substring calisthenics

Copy your previous program to a new file and use the Perl length(); and substr(); functions to permute the string (in place!) instead of shuffling fields of an array. Make a point of programming this incrementally, step by step, writing output as you go along to make sure you are doing it right. Of course you could also shuffle using the split() and join() functions on a string... but that would not be "in place".



  This is similar to the previous exercise, but uses substr(); on the original string instead of shuffling fields of an array. Remember that substr(); can be used to extract defined substrings as well as to replace them. As before, use variables to store the characters that you want to swap, to prevent the original character from being lost when you overwrite one of the two positions in the string. Use int(); on the result of rand(); to get a random position in the string, and think carefully about the range of numbers that this should produce. The range is obviously a function of the string length, but does it start at 0 or 1, and does it extend to the length itself, or more or less? Test whether the range you produce is correct.




use warnings;
use strict;

my $count = 10;                       # controls number of anagrams to print

my $string = <STDIN>;                 # retrieve one string
chomp($string);                       # if we did not chomp(), we would 
                                      # swap the linefeed into our anagrams
my $len = length($string);
my $pos;                              # a variable to store a random
                                      # position in the string

for (my $i=0; $i < $count; $i++) {    # for desired number of anagrams ...
    for (my $j=0; $j < $len; $j++) {  # for every character in string ...
        $pos = int(rand($len));       # calculate random position in string
        while ($len > 1 and $pos == $j) {  # same integer as $j? (a 1-character
            $pos = int(rand($len));        # string would loop forever) ... try again
        }
        my $tmp = substr($string, $j, 1);                  # store character j
        substr($string, $j, 1) = substr($string, $pos, 1); # swap pos to j
        substr($string, $pos, 1) = $tmp;                   # swap tmp to pos
    }
    print($string, "\n");             # print the randomized string
}

exit();
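Two details from the hint can be tried in isolation: substr(); as an lvalue versus its four-argument replacement form, and an empirical check of the range that int(rand($len)) actually produces. A minimal sketch; the test string and the number of draws are arbitrary choices.

```perl
use warnings;
use strict;

my $string = "abcde";
my ($i, $k) = (0, 4);

# Swap characters $i and $k in place. substr() can be used as an lvalue,
# or called with a fourth argument that replaces the selected substring.
my $tmp = substr($string, $i, 1);                 # save character $i
substr($string, $i, 1) = substr($string, $k, 1);  # lvalue form
substr($string, $k, 1, $tmp);                     # four-argument form
print("$string\n");                               # ebcda

# Empirically check the range of int(rand($len)) over many draws.
my $len = length($string);
my ($min, $max) = ($len, -1);
for (1 .. 10000) {
    my $pos = int(rand($len));
    $min = $pos if $pos < $min;
    $max = $pos if $pos > $max;
}
print("observed positions: $min .. $max (length $len)\n");
```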



anastring (with commandline input)


Commandline arguments

Modify your program so that you can pass the number of permutations to it on the command line (via @ARGV); make sure that the default is 1 if no argument is given. This tool could be part of a routine to generate random data to test statistical significance.



  The command-line arguments that you give to a Perl program (everything after the program name) are stored in the array variable named @ARGV. $ARGV[0] is the first argument, $ARGV[1] is the second, and so on. To check whether some variable is defined, use the function defined($someVariable); in an if statement. If no command-line argument has been typed, $ARGV[0] will be undefined.



  Simply change:

my $count = 10;

to:

my $count = 1;
if (defined($ARGV[0])) { $count = $ARGV[0]; }

Then use it like this (for example):

$ 100 < test.txt

assuming you have a file named test.txt with the contents you want to randomize, or

$ echo "acdefghiklmnpqrstvwy" | 100
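The sample above accepts any argument, including "abc" or "-5". A slightly more defensive variant, sketched here as a hypothetical subroutine countFromArgs() that you would call with @ARGV, keeps the default of 1 but rejects anything that is not a positive integer:

```perl
use warnings;
use strict;

# Hypothetical helper: returns the permutation count from an argument list,
# defaulting to 1 when no argument is given, and dying on non-integers.
sub countFromArgs {
    my (@args) = @_;
    my $count = 1;                          # default
    if (defined($args[0])) {
        die("Usage: expected a positive integer, got '$args[0]'\n")
            unless $args[0] =~ /^\d+\z/ and $args[0] > 0;
        $count = $args[0];
    }
    return $count;
}

# In the real program: the command-line arguments decide the count.
my $count = countFromArgs(@ARGV);
print("count = $count\n");
```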






Write a Perl program that takes in strings (e.g. names) from STDIN, stores them in an array, sorts them in alphabetical order, using the Perl sort(); function and prints them out to the terminal.



  Declare a variable to use as an array index and initialize it with the value 0. Assign the entire input string to the current array position $array[$index], then increment the index variable so it points to the next available field. (An array field holds a single scalar value: an integer, a float, a string, or a reference, for example to another array or to a hash.) Sort the array using the Perl sort(); function.




use warnings;
use strict;

my $index = 0;             # initialize a variable to use as index to array
my $currentInput;          # stores input values from STDIN
my @arrayOfStrings;        # array of strings in their original order
my @sortedArray;           # array of strings sorted in alphabetical order

while ($currentInput = <STDIN>) {               # Retrieve strings from STDIN 
   chomp($currentInput);                        # remove trailing newline
   $arrayOfStrings[$index] = $currentInput;     # Store in array
   $index++;                                    # increment index
}

@sortedArray = sort(@arrayOfStrings);           # Sort array fields

for (my $i = 0; $i < $index; $i++) {            # Print all array fields
   print($sortedArray[$i], "\n");
}
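By default sort(); compares fields as strings, character by character, so uppercase letters come before lowercase ones, and numbers are not sorted by value. Passing a comparison block changes the order. A short sketch with made-up names and numbers:

```perl
use warnings;
use strict;

my @names   = ("bob", "Alice", "carol", "Dave");
my @numbers = (10, 9, 100, 1);

my @default     = sort(@names);                        # string order
my @ignoreCase  = sort { lc($a) cmp lc($b) } @names;   # case-insensitive
my @asStrings   = sort(@numbers);                      # compared as strings
my @numerically = sort { $a <=> $b } @numbers;         # compared as numbers

print("default:     @default\n");       # Alice Dave bob carol
print("ignore case: @ignoreCase\n");    # Alice bob carol Dave
print("as strings:  @asStrings\n");     # 1 10 100 9
print("numerically: @numerically\n");   # 1 9 10 100
```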






Hash (associative array)

Write a Perl program that takes in a FASTA file of any protein as input and outputs it as three-letter amino acid code separated by spaces to the terminal. Use a hash to store the mapping between the one-letter amino acid code and the three-letter amino acid code.



  To recognize the definition line of a FASTA file, use substr(); to get the first character of each line and test whether it is ">". Read in each line of the FASTA file and split it into an array, character by character (as in the string-permutation exercises above). Loop over the contents of the array and retrieve the three-letter code for each amino acid, using a hash that maps one-letter amino acid codes to three-letter amino acid codes.

Hint about the hash: it’s similar in concept to the amino acid code hash that was used in one of the programs written in class… think about which way the amino acid code mapping was applied with that hash and try to apply the principles here.




use warnings;
use strict;

# Declare variables
my $line;            # current line being read in from STDIN
my $char;            # store first character in the line (to test for ">")
my @oneLetterLine;   # split each line of FASTA file into individual letters
my %oneToThree;      # hash that stores mappings from one-letter amino acid
                     # code to three-letter amino acid code

# Initialize the hash mapping
mapOneToThree();

while ($line = <STDIN>) {           # Read input (FASTA format) line by line
   $line = uc($line);               # Translate to uppercase characters
   $char = substr($line, 0, 1);     # Extract first character

   if ($char ne ">") {              # only if it's not the title line ...
      chomp($line);                 # chomp off the newline character

      # store each character in the line as an element in an array
      @oneLetterLine = split(//,$line);

      # get the size of the array (since arrays start at index 0,
      # the size of the array is the last array index plus 1
      my $arraySize = $#oneLetterLine + 1;

      # print three-letter amino acid code mapping
      for (my $i = 0; $i < $arraySize; $i++) {
         print($oneToThree{$oneLetterLine[$i]}, " ");

      }  # end for
   }  # end if
}  # end while

print("\n");           # Print final newline character

exit();                # Exit the program

# =================================================================
# Subroutine to generate the hash that maps one-letter amino acid
# code to three-letter amino acid code
sub mapOneToThree {

   $oneToThree{'A'} = 'Ala';
   $oneToThree{'C'} = 'Cys';
   $oneToThree{'D'} = 'Asp';
   $oneToThree{'E'} = 'Glu';
   $oneToThree{'F'} = 'Phe';
   $oneToThree{'G'} = 'Gly';
   $oneToThree{'H'} = 'His';
   $oneToThree{'I'} = 'Ile';
   $oneToThree{'K'} = 'Lys';
   $oneToThree{'L'} = 'Leu';
   $oneToThree{'M'} = 'Met';
   $oneToThree{'N'} = 'Asn';
   $oneToThree{'P'} = 'Pro';
   $oneToThree{'Q'} = 'Gln';
   $oneToThree{'R'} = 'Arg';
   $oneToThree{'S'} = 'Ser';
   $oneToThree{'T'} = 'Thr';
   $oneToThree{'V'} = 'Val';
   $oneToThree{'W'} = 'Trp';
   $oneToThree{'Y'} = 'Tyr';

}  # end sub
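The hash can also be filled in a single list assignment instead of twenty separate statements, and exists() lets you handle characters that are not in the table (for example 'X' for an unknown residue or '*' for a stop), which would otherwise print empty fields with "uninitialized value" warnings. The sketch uses a made-up FASTA record held in a string; the helper name toThreeLetter() and the '???' placeholder are our own choices.

```perl
use warnings;
use strict;

# Hash initialized in one list assignment (same content as mapOneToThree()).
my %oneToThree = (
    A => 'Ala', C => 'Cys', D => 'Asp', E => 'Glu', F => 'Phe',
    G => 'Gly', H => 'His', I => 'Ile', K => 'Lys', L => 'Leu',
    M => 'Met', N => 'Asn', P => 'Pro', Q => 'Gln', R => 'Arg',
    S => 'Ser', T => 'Thr', V => 'Val', W => 'Trp', Y => 'Tyr',
);

# Hypothetical helper: translate one sequence line; characters missing
# from the hash are reported as '???'.
sub toThreeLetter {
    my ($line) = @_;
    my @codes;
    foreach my $aa (split(//, uc($line))) {
        push(@codes, exists($oneToThree{$aa}) ? $oneToThree{$aa} : '???');
    }
    return join(' ', @codes);
}

my $fasta = ">example test sequence\nMKV\nxW*\n";   # made-up record
my @translated;
foreach my $line (split(/\n/, $fasta)) {
    next if substr($line, 0, 1) eq '>';     # skip the definition line
    push(@translated, toThreeLetter($line));
}
print(join(' ', @translated), "\n");        # Met Lys Val ??? Trp ???
```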





Iteration and recursion

Write a Perl program that takes in a number as input and calculates the factorial of that number. Note that this can be done in (at least) two ways: the first way is to use a for loop in the body of the program...



  Remember to think about all types of outcomes when designing your conditions in if/else statements: a negative factorial is undefined, and both 0! and 1! are equal to 1. Use die(); rather than exit(); to indicate that an unexpected input has been entered that the program cannot handle. Both cause the program to terminate, but die(); also prints an error message (to STDERR) and exits with a non-zero status, e.g. die("Negative factorial is undefined.");. Use a for loop to multiply out the factorial of the input number, and use a variable to store the value of the factorial during the intermediate steps of the calculation.




use warnings;
use strict;

my $number = <STDIN>;       # number is read in from STDIN
chomp($number);             # remove newline
print(fact($number),"\n");  # print statement calls subroutine fact()

# ========================================================
sub fact {

my ($n) = @_;               # receive argument via @_
my $factorial = 1;

   if ($n < 0) { 
      die("panic: fact($n) negative factorial is undefined. ");
   } elsif ($n == 0 or $n == 1) {
      return 1;
   } else {
      for (my $i = 2; $i <= $n; $i++) {
         $factorial = $factorial * $i;
      }  # end for
   }  # end if

   return $factorial;
}  # end sub
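Factorials grow quickly. On a typical 64-bit perl, integers are exact only up to about 9.2 × 10^18, so beyond 20! the product is stored as a floating-point number and printed with only about 15 significant digits. If exact results matter, the same for loop can run on Math::BigInt, a module that ships with Perl. A sketch; the subroutine name bigFact() is our own.

```perl
use warnings;
use strict;
use Math::BigInt;

# Iterative factorial on arbitrary-precision integers.
sub bigFact {
    my ($n) = @_;
    die("panic: bigFact($n) negative factorial is undefined.\n") if $n < 0;
    my $factorial = Math::BigInt->new(1);
    for (my $i = 2; $i <= $n; $i++) {
        $factorial->bmul($i);               # multiply in place
    }
    return $factorial;
}

my $f20 = bigFact(20);
my $f25 = bigFact(25);
print("20! = ", $f20->bstr(), "\n");   # 2432902008176640000
print("25! = ", $f25->bstr(), "\n");   # 15511210043330985984000000
```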




  (OPTIONAL) ...the second way is to use a subroutine recursively to yield the factorial of a number. Try programming it this way as well.



  Recursion means a function calls itself. Such a subroutine or program needs defined “base cases”, for which the subroutine can return a value without having to call itself again (allowing the program or subroutine to terminate; otherwise it would just go deeper, and deeper...). The base cases for the recursive version are exactly the same as for the iterative one: a negative factorial should return an error, and both 0! and 1! should return 1. In place of the for loop used in the iterative version, each call of the recursive subroutine performs one small step (the step that one iteration of the for loop would perform) and combines it with the result of the next subroutine call (for the factorial: $n * fact($n - 1)).




use warnings;
use strict;

my $number = <STDIN>;       # number is read in from STDIN
chomp($number);             # remove newline
print(fact($number),"\n");  # print statement calls subroutine fact()

# ========================================================
sub fact {

my ($n) = @_;

   if ($n < 0) {
      die("panic: fact($n) negative factorial is undefined. ");
   } elsif ($n == 0 or $n == 1) {
      return 1;                   # base case
   } else {
      return( $n * fact($n-1) );  # recursive: subroutine calls itself
   }
}  # end sub
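To watch the recursion descend to the base case and then unwind, it can help to print each call, indented by its depth. A sketch with our own subroutine name factTrace() and an extra depth argument that exists only for the indentation:

```perl
use warnings;
use strict;

# Recursive factorial that prints each call, indented by recursion depth.
sub factTrace {
    my ($n, $depth) = @_;
    $depth = 0 unless defined($depth);
    my $indent = '  ' x $depth;
    print("${indent}fact($n) called\n");

    die("panic: fact($n) negative factorial is undefined.\n") if $n < 0;
    my $result;
    if ($n == 0 or $n == 1) {
        $result = 1;                                   # base case
    } else {
        $result = $n * factTrace($n - 1, $depth + 1);  # recursive step
    }
    print("${indent}fact($n) returns $result\n");
    return $result;
}

my $value = factTrace(4);
```

For an input of 4 the calls go down to fact(1) and the results come back up as 1, 2, 6 and finally 24.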




Further reading and resources