Perl simulation
Perl: simulation
Summary ...
Contents
Contents
Random sequence
Create a random sequence with an iid distribution of characters. Count the frequency of each character. Print the sequence and the counts.
Programming elements:
- Scope of variables
- Use of scalars, arrays and associative arrays (hashes)
- Passing parameters into a subroutine
- The rand() function
- Using a hash to count frequencies
#!/usr/bin/perl
use strict;
use warnings;
my @Alphabet;
my @Sequence;
my $length = 100;
my %Counts;
initialize();
makeSequence($length);
analyzeSequence();
printOutput();
exit;
# ===================================================
sub initialize {
# @Alphabet = split(//, "ACDEFGHIKLMNPQRSTVWY");
$Alphabet[0] = "A";
$Alphabet[1] = "C";
$Alphabet[2] = "G";
$Alphabet[3] = "T";
return;
}
sub makeSequence {
my $l = $_[0];
for (my $i=0; $i<$l; $i++) {
$Sequence[$i] = randPick();
}
return;
}
sub randPick {
my $r = scalar(@Alphabet);
my $q = rand($r);
my $i = int($q);
my $s = $Alphabet[$i];
return($s);
# return ($Alphabet[int(rand(scalar(@Alphabet)))]);
}
sub analyzeSequence {
foreach my $c (@Sequence) {
$Counts{$c}++;
}
return();
}
sub printOutput {
# print sequence
print @Sequence, "\n";
#print Counts:
foreach my $a (@Alphabet) {
print "$a: $Counts{$a}\n";
}
}
Step through the code, try it and change things around. Try running it for amino acids. Note how the iteration loops were written to adapt to the actual size of the array - there are no "magic numbers" in the code. If the array size changes, we should not have to change the rest of the code. This is how it should be done.
Weighted random sequence
Create a random sequence with arbitrary target frequencies for each character. Only the subroutines initialize() and randPick() change from the example above. Make sure you understand how the identical distribution of numbers returned from rand() is translated into a weighted distribution of choices from our array.
#!/usr/bin/perl
use strict;
use warnings;
my @Alphabet;
my @Weights;
my @Sequence;
my $length = 100;
my %Counts;
initialize();
makeSequence($length);
analyzeSequence();
printOutput();
exit;
# ===================================================
sub initialize {
$Alphabet[0] = "A";
$Weights[0] = 0.1;
$Alphabet[1] = "C";
$Weights[1] = 0.6;
$Alphabet[2] = "G";
$Weights[2] = 0.25;
$Alphabet[3] = "T";
$Weights[3] = 0.05;
return;
}
sub makeSequence {
my $l = $_[0];
for (my $i=0; $i<$l; $i++) {
$Sequence[$i] = randPick();
}
return;
}
sub randPick {
my $r = rand;
my $s = 0;
my $i;
for ($i = 0; $i<scalar(@Alphabet);$i++) {
$s += $Weights[$i];
if ($s >= $r) {
last;
}
}
return($Alphabet[$i]);
}
sub analyzeSequence {
foreach my $c (@Sequence) {
$Counts{$c}++;
}
return();
}
sub printOutput {
# print sequence
print @Sequence, "\n";
#print Counts:
foreach my $a (@Alphabet) {
print "$a: $Counts{$a}\n";
}
}
Note how we iterate through the weights to determine which part of the weights interval our random number falls into, and which element we should thus pick from the array. This can be done slightly more efficient by doing the summation during initialization, rather than every time we go through the loop. Also: if you can, put the highest weighted elements lowest in the array, for efficiency.
Function weighted sequence
Create a random sequence with target frequencies for each character given by some function. Only the subroutine randPick() changes from the first example above. Make sure you understand how the identical distribution of numbers returned from rand() is used to first choose a value on the abscissa, then a value on the ordinate, how the function is evaluated to decide whether to accept the first value and why this process (that corresponds to numerical integration) gives a distribution of random numbers whose density matches that of the function.
#!/usr/bin/perl
use strict;
use warnings;
my @Alphabet;
my @Weights;
my @Sequence;
my $length = 100;
my %Counts;
my $F = 1 / sqrt(2 * 3.141592654);
initialize();
makeSequence($length);
analyzeSequence();
printOutput();
exit;
# ===================================================
sub initialize {
$Alphabet[0] = "A";
$Alphabet[1] = "C";
$Alphabet[2] = "G";
$Alphabet[3] = "T";
return;
}
sub makeSequence {
my $l = $_[0];
for (my $i=0; $i<$l; $i++) {
$Sequence[$i] = randPick();
}
return;
}
sub randPick {
# pick according to integral over Normal Distribution
my ($x, $y, $v);
my $sigma = 2.5;
do {
$x = rand($sigma);
my $t = -0.5 * $x * $x;
$v = $F * exp( $t );
$y = 0.4 * rand;
} while ( $y > $v ); # 0.4 ~ value of Normal Distribution at x=0
$x /= $sigma;
my $i = int(scalar(@Alphabet) * $x);
return($Alphabet[$i]);
}
sub analyzeSequence {
foreach my $c (@Sequence) {
$Counts{$c}++;
}
return();
}
sub printOutput {
# print sequence
print @Sequence, "\n";
#print Counts:
foreach my $a (@Alphabet) {
print "$a: $Counts{$a}\n";
}
}
Look at the structure of the do { } while()</loop>. Make sure you understand why $x needs to be declared outside of that loop, else it would have gone out of scope where it is needed to pick an element from @Alphabet.
Further reading and resources