Difference between revisions of "RPR-Control structures"

Latest revision as of 09:28, 25 September 2020

Control structures of R

(if, else if, ifelse, for, and while; vectorized commands as alternatives)

Abstract:

Introducing control structures: if, else if, ifelse, for, and while.

Objectives:
This unit will ...

... introduce the main control structures of R;

Outcomes:
After working through this unit you ...

... can read, analyze and write conditional expressions using if, else, and the ifelse() function;
... can read, analyze and write for loops using the range operator and the seq_along() function;
... can construct while loops with a termination condition.

Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:
This unit builds on material covered in the following prerequisite units:

RPR-Subsetting (Subsetting and filtering R objects)

Control structures

"Control structures" are parts of the syntax that execute code according to whether some condition is met. Let's look at this with some simple examples:

if and else


# Code pattern

if (<conditional expression evaluates to TRUE>) {
    <execute this block>
} else if (<expression evaluates to TRUE>) {
    <execute this block>
} else {
    <execute this block>
}

# conditional expressions
# anything that evaluates to TRUE or FALSE, or can be
# coerced to a logical.
# Obviously the operators ! == != < > <= >= can be used
# but there are also a number of in-built functions that
# are useful for this purpose:

?all
?any
?exists
?is.character
?is.factor
?is.integer
?is.null
?is.numeric
?is.unsorted
?is.vector



# Simple "if" statement:
# Rolling a die. If you get a "six", you get to roll again.

x <- sample(1:6, 1)
if (x == 6) {
    x <- c(x, sample(1:6, 1))
}
print(x)

# "if", "else if", and "else"
# Here is a popular dice game called high-low.

a <-  sample(1:6, 1)
b <-  sample(1:6, 1)
if (a + b > 7) {
    print("high")
} else if (a + b < 7) {
    print("low")
} else {
    print("seven")
}
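
The test functions listed above each return a single TRUE or FALSE, so they can be used directly as conditions. Here is a minimal sketch; the function name describe() is made up for this illustration and is not part of the unit's code:

```r
# Hypothetical helper (an assumption for this example): report what kind
# of object we were given, using some of the test functions listed above.
describe <- function(x) {
    if (is.null(x)) {
        return("NULL")
    } else if (is.character(x)) {
        return("character")
    } else if (is.factor(x)) {
        return("factor")
    } else if (is.numeric(x)) {
        return("numeric")
    } else {
        return("something else")
    }
}

describe(NULL)           # "NULL"
describe("a")            # "character"
describe(factor("a"))    # "factor"
describe(1:3)            # "numeric"
describe(TRUE)           # "something else"
```

Note that the order of the tests matters: the first condition that is TRUE decides which block is executed, all later ones are skipped.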

We need to talk about conditional expressions that test for more than one condition for a moment, things like: "if this is true OR that is true OR my birthday is a Sunday this year ...". To join logical expressions, R has two distinct sets of operators: | and ||, and & and &&. | is for "or" and & is for "and". But what about && and ||? The single operators are "vectorized" whereas the doubled operators short-circuit. This means if you apply the single operators to a vector, you get a vector of results:


x <- c(1, 3, 5, 7, 11, 13, 17)
x > 3 &  x < 17 # FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE: all comparisons
x [x > 3 &  x < 17]  #  5  7 11 13

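x[1] > 3 && x[1] < 17 # FALSE: && takes single values and stops at the first FALSE
# Note: since R 4.3.0, x > 3 && x < 17 is an error: && and || require
# operands of length one. Older versions used only the first elements.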

The vectorized version is usually what you want, e.g. for subsetting, as above. But it is not the right way in control structures: the "if ()..." condition must be unambiguously TRUE or FALSE, i.e. a single logical value. What we really mean by "or" in this context is that we should do something if _any_ element of the vector is TRUE, and "and" means that we should do something if _all_ elements of the vector are TRUE. So we could write any(A) or all(A) respectively. But it may not even be necessary to make all comparisons; you can avoid evaluating unnecessary expressions. Chained "OR" expressions can be aborted after the first TRUE is encountered, and chained "AND" expressions can be aborted after the first FALSE. This is what the doubled operators do.


x <- numeric()

if (length(x) == 0 |  is.na(x)) { print("zero") }  # throws an error, because is.na() is
                                                   # evaluated even though x has length zero.

if (length(x) == 0 || is.na(x)) { print("zero") }  # no error: length test is TRUE so is.na()
                                                   # never gets evaluated.

Bottom line: always use || and && in control structures.
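
If the condition really does depend on a whole vector, any() and all() collapse the vector of comparisons into the single TRUE or FALSE that if () needs. A small sketch, with example values chosen only for illustration:

```r
x <- c(1, 3, 5, 7, 11, 13, 17)

# A vector comparison yields one logical value per element ...
x > 10                   # FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

# ... any() and all() collapse these into a single TRUE or FALSE.
if (any(x > 10)) {
    print("at least one element is larger than 10")
}

# Single values can then be chained with the doubled operators.
if (all(x > 0) && length(x) > 5) {
    print("all elements are positive, and there are more than five")
}
```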

ifelse

The ifelse() function deserves special mention: its arguments work like an if / else construct ...

if (5 > 7) {
  x <- TRUE
} else {
  x <- FALSE
}

# equivalent to
x <- ifelse(5 > 7, TRUE, FALSE)

ifelse(runif(1) > 0.5, "pickles", "gherkins")  # randomly choose

i.e. ifelse(<condition is true>, <evaluate this>, <evaluate that> )

But the cool thing about ifelse() is that it's vectorized! You can apply it to a whole vector of conditions at once:

runif(10)
runif(10) > 0.2   # about 80% of random choices will be TRUE
ifelse(runif(10) > 0.2, "caution", " to the wind")
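
Calls to ifelse() can also be nested to choose between more than two values. As a sketch, here is the high-low classification from above applied to ten dice rolls at once; the seed is arbitrary and only there to make the example reproducible:

```r
set.seed(112358)                              # arbitrary seed
rolls <- sample(1:6, 10, replace = TRUE) +
         sample(1:6, 10, replace = TRUE)      # ten sums of two dice

# nested ifelse(): one label for each element of rolls
outcome <- ifelse(rolls > 7, "high",
                  ifelse(rolls < 7, "low", "seven"))

data.frame(rolls, outcome)
set.seed(NULL)
```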

for

"for" loops are the workhorse of innumerable R scripts. They are controlled by a sequence, and a variable. The body of the loop is executed once for each element in the sequence. Most often, the sequence is a sequence of integers, created with the colon - the range operator. The pattern is:

for (<element> in <vector>) {
   <expressions using element...>
}

# simple for loop
for (i in 1:10) {
    print(c(i, i^2, i^3))
}


# Let's stay with the high-low game for a moment:
# What are the odds of winning?
# Let's simulate some runs with a "for" loop.

N <- 25000
outcomes <- character(N)  # pre-allocate a vector of N empty strings
for (i in 1:N) {          # repeat, assign each element of 1:N to
                          # the variable "i" in turn
    a <-  sample(1:6, 1)
    b <-  sample(1:6, 1)
    if (a + b > 7) {
        outcomes[i] <- "high"
    } else if  (a + b < 7) {
        outcomes[i] <- "low"
    } else {
        outcomes[i] <- "seven"
    }
}
head(outcomes, 36)
table(outcomes)  # the table() function tabulates the elements
                 # of a vector

round((36 * table(outcomes))/N) # Can you explain this expression?

Note that there is nothing special about the expression for (i in 1:N) { ... . Any expression that generates a sequence of items will do; I write a lot of code like for (fileName in dir()) { ... or for (gene in data$name) {... , or for (column in colnames(expressionTable)) {... etc.
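
To illustrate looping over something other than integers, here is a small sketch that iterates over the names of a list. The gene names and numbers are invented example data, not results from the course:

```r
# Hypothetical example data: expression values for three genes
expr <- list(Mbp1 = c(2.1, 3.4, 1.9),
             Swi4 = c(0.5, 0.7),
             Swi6 = c(5.0, 4.2, 4.8, 5.1))

for (gene in names(expr)) {       # gene takes the value of each name in turn
    print(paste(gene, "- mean expression:", mean(expr[[gene]])))
}
```

The loop variable is an ordinary variable: after the loop has finished, gene still holds the last name it was assigned.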

Loops in R can be slow if you are not careful how you write them. The reason is usually related to dynamically managing memory. If you can, you should always pre-define objects of sufficient size to hold your results. Even better, use a vectorized approach.

# Compare execution times: one million square roots from a vector of random numbers ...

# Version 1: Naive for-loop: grow result object as required
N <- 1000000                 # Set N to a large number
x <- runif(N)                # get N uniformly distributed random numbers
y <- numeric()               # create a variable to assign to
startTime <- Sys.time()      # save start time
for (i in 1:N) {             # loop N-times
  y[i] <- sqrt(x[i])         # calculate one square root, grow y to store it
}
Sys.time() - startTime       # time it took
rm(x)                        # clean up
rm(y)

# Version 2: Define result object to be large enough
N <- 1000000                 # Set N to a large number
x <- runif(N)                # get N uniformly distributed random numbers
y <- numeric(N)              # create a variable with N slots
startTime <- Sys.time()      # save start time
for (i in 1:N) {             # loop N-times
  y[i] <- sqrt(x[i])         # calculate one square root, store in y
}
Sys.time() - startTime       # time it took
rm(x)                        # clean up
rm(y)


# Version 3: vectorized
N <- 1000000                 # Set N to a large number
x <- runif(N)                # get N uniformly distributed random numbers
startTime <- Sys.time()      # save start time
y <- sqrt(x)                 # sqrt() is vectorized!
Sys.time() - startTime       # time it took
rm(x)                        # clean up
rm(y)


# The tiny change of pre-allocating memory for the result object y, rather than
# dynamically growing the vector has made a huge difference. But using
# the vectorized version of the sqrt() function directly is the fastest
# approach.
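
As an alternative to subtracting Sys.time() values, base R's system.time() function times an expression directly. The following is only a sketch: N is smaller than above to keep it quick, and the variable names y1, y2, tLoop and tVec are chosen for this example. The actual timings depend on your machine.

```r
N <- 100000                       # smaller N than above, to keep this quick
x <- runif(N)

# pre-allocated for-loop
tLoop <- system.time({
    y1 <- numeric(N)
    for (i in 1:N) {
        y1[i] <- sqrt(x[i])
    }
})

# vectorized
tVec <- system.time(y2 <- sqrt(x))

tLoop["elapsed"]                  # elapsed seconds for the loop
tVec["elapsed"]                   # elapsed seconds for the vectorized call
identical(y1, y2)                 # both approaches compute the same values
```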

seq_along() vs. range

Consider the following carefully:

# Assume we write a loop to iterate over vectors of variable length, for example
# going from e to pi with a given number of elements:
( v5 <- seq(exp(1), pi, length.out = 5) )
( v2 <- seq(exp(1), pi, length.out = 2) )
( v1 <- seq(exp(1), pi, length.out = 1) )
( v0 <- seq(exp(1), pi, length.out = 0) )

#  The idiom we will probably find most commonly for this task uses the
#  range operator ":" ...
1:length(v5)
1:length(v2)

# etc.

for (i in 1:length(v5)) {
  print(v5[i])
}

for (i in 1:length(v2)) {
  print(v2[i])
}

for (i in 1:length(v1)) {
  print(v1[i])
}

for (i in 1:length(v0)) {
  print(v0[i])
}

# The problem with the last loop is: we probably didn't want to execute
# the loop at all if the vector has length 0. But since 1:length(v0) is the
# same as 1:0, i.e. the vector c(1, 0), the loop body is executed twice:
# once for v0[1] (which is NA) and once for v0[0] (an empty vector).

# This is why we should always use the following idiom instead, when iterating
# over a vector: the function seq_along().
#
# seq_along() builds a vector of indices over its argument.
seq_along(v5)
seq_along(v2)
seq_along(v1)
seq_along(v0)

for (i in seq_along(v5)) {
  print(v5[i])
}

for (i in seq_along(v2)) {
  print(v2[i])
}

for (i in seq_along(v1)) {
  print(v1[i])
}

for (i in seq_along(v0)) {
  print(v0[i])
}

# Now we get the expected behaviour: no output if the vector is empty.
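
A close relative of seq_along() is seq_len(), which is useful when we have only a number of iterations rather than a vector to iterate over. A brief sketch; the variable names n and count are chosen for this example:

```r
n <- 0

1:n             # 1 0 : counts down, so a loop over it would run twice
seq_len(n)      # integer(0): an empty sequence, a loop over it does not run

count <- 0
for (i in seq_len(n)) {
    count <- count + 1
}
count           # 0: the loop body was never executed
```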

loops vs. vectorized expressions

If you can achieve your result with an R vector expression, it will be faster than using a loop. But sometimes you need to do things explicitly, for example if you need to access intermediate results.
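
For instance, the high-low simulation from the "for" section does not need intermediate results and could be written without a loop. This is one possible sketch, not the only way to do it; the seed is arbitrary:

```r
N <- 25000
set.seed(2718)                            # arbitrary seed, for reproducibility
a <- sample(1:6, N, replace = TRUE)       # N rolls of the first die
b <- sample(1:6, N, replace = TRUE)       # N rolls of the second die

outcomes <- rep("seven", N)               # pre-fill with the default ...
outcomes[a + b > 7] <- "high"             # ... then overwrite elements
outcomes[a + b < 7] <- "low"              #     selected by logical vectors

table(outcomes) / N    # expected proportions: 15/36, 15/36 and 6/36
set.seed(NULL)
```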

Here is an example to play some more with loops: a password generator. Passwords are a pain. We need them everywhere, they are frustrating to type and to remember, and since cracking programs are getting smarter, they are more and more likely to be broken. Here is a simple password generator that creates random strings of alternating consonants and vowels. These are melodic and easy to memorize, but almost as strong as an 8-character, fully random password that uses all characters of the keyboard such as )He.{2jJ or #h$bB2X^ (which is pretty much unmemorizable). The former is taken from 20⁷ * 6⁷ ~ 4*10¹⁴ possibilities (the code below uses 20 consonants and 6 vowels), the latter from 94⁸ ~ 6*10¹⁵ possibilities. High-end GPU supported password crackers can test about 10⁹ passwords a second; the passwords generated by this little algorithm would thus take on the order of 10⁵ to 10⁶ seconds, i.e. several days, to crack^[1]. This is probably good enough to deter a casual attack.
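
The numbers in this estimate are easy to check in R. The sketch below assumes the 20 consonants and 6 vowels that the script below defines, and the cracking rate of 10⁹ guesses per second mentioned above; the variable names are chosen for this example:

```r
con <- "bcdfghjklmnpqrstvwxz"             # the 20 consonants of the script
vow <- "aeiouy"                           # the 6 vowels of the script

nPattern <- nchar(con)^7 * nchar(vow)^7   # seven consonant/vowel pairs
nRandom  <- 94^8                          # eight characters from 94 keyboard symbols

rate <- 1e9                               # assumed guesses per second
nPattern                                  # about 3.6e14
nRandom                                   # about 6.1e15
nPattern / rate / (60 * 60 * 24)          # days for an exhaustive search
```

With these assumptions an exhaustive search takes roughly four days; on average an attacker would need to search only half of the space.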

Task:
Copy, study and run ...

# Suggest memorizable passwords
# Below we use the functions:
?nchar
?sample
?substr
?paste
?print

# define a string of consonants ...
con <- "bcdfghjklmnpqrstvwxz"
# ... and a string of vowels
vow <- "aeiouy"

for (i in 1:10) {  # ten sample passwords to choose from ...
    pass <- rep("", 14) # make an empty character vector
    for (j in 1:7) {    # seven consonant/vowel pairs to be created ...
        k   <- sample(1:nchar(con), 1)  # pick a random index for consonants ...
        ch  <- substr(con,k,k)          # ... get the corresponding character ...
        idx <- (2*j)-1                  # ... compute the position (index) of where to put the consonant ...
        pass[idx] <- ch                 # ...  and put it in the right spot

        # same thing for the vowel, but coded with fewer intermediate assignments
        # of results to variables
        k <- sample(1:nchar(vow), 1)
        pass[(2*j)] <- substr(vow,k,k)
    }
    print( paste(pass, collapse="") )  # collapse the vector into a string and print
}

Try this a few times.
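
Once you understand the nested loops, compare them with a version in which the inner loop is replaced by vectorized calls to sample(). This is only a sketch of one alternative; the function name makePassword() and the variables conChars and vowChars are made up for this example:

```r
conChars <- strsplit("bcdfghjklmnpqrstvwxz", "")[[1]]  # vector of single consonants
vowChars <- strsplit("aeiouy", "")[[1]]                # vector of single vowels

# Hypothetical helper: build one password from nPairs consonant/vowel pairs
makePassword <- function(nPairs = 7) {
    pass <- character(2 * nPairs)
    # odd positions get consonants, even positions get vowels
    pass[seq(1, 2 * nPairs, by = 2)] <- sample(conChars, nPairs, replace = TRUE)
    pass[seq(2, 2 * nPairs, by = 2)] <- sample(vowChars, nPairs, replace = TRUE)
    paste(pass, collapse = "")
}

for (i in 1:10) {      # ten sample passwords to choose from ...
    print(makePassword())
}
```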

while

Whereas a for-loop runs for a fixed number of times, a "while" loop runs as long as a condition is true, possibly forever. Here is an example, again our high-low game: this time we simulate what happens when we play it more than once with a strategy that compensates us for losing.


# Let's assume we are playing high-low in a casino. You can bet
# high or low. You get two dollars for one if you win, nothing
# if you lose. If you bet "high", you lose if the dice-roll is "low"
# or "seven". Thus your chances of winning are 15/36 = 42%. You play
# the following strategy: start with 33 dollars. Bet one dollar.
# If you win, good. If you lose, triple your bet. Stop the game
# when your funds are gone (bad), or if you have 100 dollars or
# more (good) - i.e. you have tripled the funds you risked.
# Also stop if you've played more than 100 rounds and start
# getting bored.


funds <- 33
bet <- 1         # our first bet

nPlays <- 0      # this counts how often we've played
MAXPLAYS <- 100

set.seed(1234567)
while (funds > 0 && funds < 100 && nPlays < MAXPLAYS) {

    bet <- min(bet, funds)  # can't bet more than we have.
    funds <- funds - bet    # place the bet
    a <-  sample(1:6, 1)    # roll the dice
    b <-  sample(1:6, 1)

    # we always play "high"
    if (a + b > 7) {        # we win :-)
        result <- "Win!  "
        funds <- funds + (2 * bet)
        bet <- 1            # reset the bet to one dollar
    } else {                # we lose :-(
        result <- "Lose."
        bet <- 3 * bet      # increase the bet to 3 times previous
    }
    print(paste("Round", nPlays, result,
                "Funds now:", funds,
                "Next bet:", bet))
    nPlays <- nPlays + 1
}
set.seed(NULL)

# Now before you get carried away - try this with different seeds
# and you'll quickly figure out that the odds of beating the game
# are not all that great...
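
Rather than trying seeds one by one, we could wrap the game in a function and play it many times. The sketch below follows the rules of the game above, but the function name playHighLow(), its parameters and the number of games are assumptions made for this example:

```r
# Hypothetical wrapper: play one game, return the funds we end up with
playHighLow <- function(funds = 33, target = 100, maxPlays = 100) {
    bet <- 1
    nPlays <- 0
    while (funds > 0 && funds < target && nPlays < maxPlays) {
        bet <- min(bet, funds)            # can't bet more than we have
        funds <- funds - bet              # place the bet
        if (sum(sample(1:6, 2, replace = TRUE)) > 7) {   # "high": we win
            funds <- funds + (2 * bet)
            bet <- 1
        } else {                                          # we lose
            bet <- 3 * bet
        }
        nPlays <- nPlays + 1
    }
    return(funds)
}

nGames <- 1000
final <- numeric(nGames)                  # pre-allocate the result vector
for (i in seq_len(nGames)) {
    final[i] <- playHighLow()
}
mean(final >= 100)     # fraction of games that reached the target
mean(final == 0)       # fraction of games that ended with no funds
```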

A word of caution: if you make a programming error, and the termination condition is never reached, the program will run until the heat-death of the universe, or until your computer crashes, whichever comes first. ALWAYS include a "safety net" in your while condition. Something like:

nIter <- 0
MAXITER <- 10000

while (<whatever> && nIter < MAXITER) {
    nIter <- nIter + 1
    ... do something
}

# and you could add ...
if (nIter == MAXITER) { stop("Ooops") }
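
Here is a runnable sketch of this pattern. The task is an arbitrary choice for illustration: we apply the "halve it if even, else triple it and add one" rule to a number until we reach 1, which is a loop whose number of iterations is hard to predict in advance:

```r
n <- 27                  # arbitrary starting value
nIter <- 0
MAXITER <- 10000

while (n != 1 && nIter < MAXITER) {
    nIter <- nIter + 1
    if (n %% 2 == 0) {
        n <- n / 2           # even: halve
    } else {
        n <- (3 * n) + 1     # odd: triple and add one
    }
}

if (nIter == MAXITER) {
    stop("Ooops: MAXITER reached before the termination condition was met")
}
nIter                    # number of steps it took to reach 1
```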

Task:

Exercise

A rocket ship has to run through a countdown before launch. The countdown starts from 3. Build a character vector named txt and print it, so that the output is:

[1]  "3"          "2"          "1"          "0"          "Lift Off!"

Using what you learned above, write a while loop that gives the output above.

Sample Solution:

start <- 3
txt <- as.character(start)
countdown <- start
while (countdown > 0) {
  countdown <- countdown - 1
  txt <- c(txt, countdown)
}
txt <- c(txt, "Lift Off!")
txt
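
The exercise asks for a while loop, but in keeping with the theme of this unit it is worth noting that the same vector could also be built without any loop. A sketch of one vectorized alternative:

```r
start <- 3
txt <- c(as.character(start:0), "Lift Off!")   # start:0 counts down to zero
txt
```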

Notes

[1] That's assuming the worst case, in which the attacker knows the pattern with which the password is formed, i.e. the number of characters and the alphabet that we chose from. But note that there is an even worse case: the attacker might have access to our code and to the seed of our random number generator. If you start the random number generator e.g. with a new seed that is generated from Sys.time(), the possible space of seeds can be devastatingly small. But even if a seed is set explicitly with the set.seed(<number>) function, the <number> seed is a 32-bit integer (check this with .Machine$integer.max) and thus can take only a bit more than 4*10⁹ values, about five orders of magnitude less than the password complexity we thought we had! It turns out that the code may be a much greater vulnerability than the password itself. Keep that in mind. Keep it secret. Keep it safe.

About ...

Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2020-09-17

Version:

1.2

Version history:

1.2 Maintenance
1.0 Update set.seed() usage
1.0.1 Maintenance; clarify for-loop comparison
1.0 Completed to first live version
0.1 Material collected from previous tutorial

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.



@@ Line 1: / Line 1: @@
-<div id="BIO">
+<div id="ABC">
-  <div class="b1">
+<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;">
-Control structures of R: if, else if, ifelse, for, and while
+Control structures of R
-  </div>
+<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; ">
+(if, else if, ifelse, for, and while; vectorized commands as alternatives)
-  {{Vspace}}
+</div>
-<div class="keywords">
-<b>Keywords:</b>&nbsp;
-if, else if, ifelse, for, and while; vectorized commands as alternatives
 </div>
-{{Vspace}}
+{{Smallvspace}}
-__TOC__
+<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;">
+<div style="font-size:118%;">
-{{Vspace}}
+<b>Abstract:</b><br />
-{{LIVE}}
-{{Vspace}}
-</div>
-<div id="ABC-unit-framework">
-== Abstract ==
 <section begin=abstract />
-<!-- included from "../components/RPR-Control_structures.components.wtxt", section: "abstract" -->
+Introducing control structures: if, else if, ifelse, for, and while.
-...
 <section end=abstract />
+</div>
-{{Vspace}}
+<!-- ============================  -->
+<hr>
+<table>
-== This unit ... ==
+<tr>
-=== Prerequisites ===
+<td style="padding:10px;">
-<!-- included from "../components/RPR-Control_structures.components.wtxt", section: "prerequisites" -->
+<b>Objectives:</b><br />
-<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
-You need to complete the following units before beginning this one:
-*[[RPR-Subsetting]]
-{{Vspace}}
-=== Objectives ===
-<!-- included from "../components/RPR-Control_structures.components.wtxt", section: "objectives" -->
 This unit will ...
-* ... introduce ;
+* ... introduce the main control structures of R;
-* ... discuss ;
+</td>
-* ... teach ;
+<td style="padding:10px;">
+<b>Outcomes:</b><br />
+After working through this unit you ...
+* ... can read, analyze and write conditional expressions using if, else, and the ifelse() function;
+* ... can read, analyze and write for loops using the range operator and the seq_along() function;
+* ... can construct while loops with a termination condition.
+</td>
+</tr>
+</table>
+<!-- ============================  -->
+<hr>
+<b>Deliverables:</b><br />
+<section begin=deliverables />
+<ul>
+<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li>
+<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li>
+<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li>
+</ul>
+<section end=deliverables />
+<!-- ============================  -->
+<hr>
+<section begin=prerequisites />
+<b>Prerequisites:</b><br />
+This unit builds on material covered in the following prerequisite units:<br />
+*[[RPR-Subsetting|RPR-Subsetting (Subsetting and filtering R objects)]]
+<section end=prerequisites />
+<!-- ============================  -->
+</div>
-{{Vspace}}
+{{Smallvspace}}
-=== Outcomes ===
-<!-- included from "../components/RPR-Control_structures.components.wtxt", section: "outcomes" -->
-After working through this unit you ...
-* ... have done;
-* ... know how ;
-* ... can ;
-{{Vspace}}
+{{Smallvspace}}
-=== Deliverables ===
+__TOC__
-<!-- included from "../components/RPR-Control_structures.components.wtxt", section: "deliverables" -->
-<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
-*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
-<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
-*<b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.
-<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
-*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].
 {{Vspace}}
@@ Line 78: / Line 68: @@
 === Evaluation ===
-<!-- included from "../components/RPR-Control_structures.components.wtxt", section: "evaluation" -->
-<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 <b>Evaluation: NA</b><br />
-:This unit is not evaluated for course marks.
+<div style="margin-left: 2rem;">This unit is not evaluated for course marks.</div>
-{{Vspace}}
-</div>
-<div id="BIO">
 == Contents ==
-<!-- included from "../components/RPR-Control_structures.components.wtxt", section: "contents" -->
 ==Control structures==
@@ Line 100: / Line 81: @@
+<pre>
-<source lang="rsplus">
 # Code pattern
@@ Line 157: / Line 135: @@
 }
-</source>
+</pre>
 {{Vspace}}
@@ Line 164: / Line 142: @@
-<source lang="rsplus">
+<pre>
 x <- c(1, 3, 5, 7, 11, 13, 17)
@@ Line 172: / Line 150: @@
 x > 3 && x < 17 # FALSE: stop at the first FALSE
-</source>
+</pre>
-The vectorized version is usually what you want, e.g. for subsetting, as above. But it is usually '''not''' the right way in '''control structures''': there, you usually want never to evaluate unnecessary expressions. Chained "OR" expressions can be aborted after the first TRUE is encountered, and chained "AND" expressions can be aborted after the first FALSE. Which is what the double operators do.
+The vectorized version is usually what you want, e.g. for subsetting, as above. But it is '''not''' the right way in '''control structures''': the "if ()..." condition must be unambiguously TRUE or FALSE, i.e. a single logical value. What we really mean by "or" in this context is that we should do something if _any_ element of the vector is TRUE, and "and" means that we should do something if _all_ elements of the vector are TRUE. So we could write any(A) or all(A) respectively. But it may not even be necessary to make all comparisons; you can avoid evaluating unnecessary expressions. Chained "OR" expressions can be aborted after the first TRUE is encountered, and chained "AND" expressions can be aborted after the first FALSE. This is what the doubled operators do.
-<source lang="rsplus">
+<pre>
 x <- numeric()
@@ Line 186: / Line 164: @@
                                                     # never gets evaluated.
-</source>
+</pre>
 Bottom line: '''always use <code>||</code> and  <code>&&</code> in control structures.
+{{Vspace}}
+===ifelse===
+The <code>ifelse()</code> function deserves special mention: its arguments work like an if / else construct ...
+<pre>
+if (5 > 7) {
+  x <- TRUE
+} else {
+  x <- FALSE
+}
+# equivalent to
+x <- ifelse(5 > 7, TRUE, FALSE)
+ifelse(runif(1) > 0.5, "pickles", "gherkins")  # randomly choose
+</pre>
+i.e. <code>ifelse(&lt;condition is true&gt;, &lt;evaluate this&gt;, &lt;evaluate that&gt; )</code>
+But the cool thing about <code>ifelse()</code> is that it's vectorized! You can apply it to a whole vector of conditions at once:
+<pre>
+runif(10)
+runif(10) > 0.2   # about 80% of random choices will be TRUE
+ifelse(runif(10) > 0.2, "caution", " to the wind")
+</pre>
 {{Vspace}}
@@ Line 194: / Line 202: @@
 ===for===
-"for" loops are the workhorse of innumerable '''R''' scripts. They are controlled by a ''sequence'', and a ''variable''. The body of the loop is executed once for each element in the sequence. Most often, the sequence is a sequence of integers, created with the colon - the ''range operator''. The template is:
+"for" loops are the workhorse of innumerable '''R''' scripts. They are controlled by a ''sequence'', and a ''variable''. The body of the loop is executed once for each element in the sequence. Most often, the sequence is a sequence of integers, created with the colon - the ''range operator''. The pattern is:
-<source lang="rsplus">
+<pre>
-for (<name> in <vector>) {
+for (<element> in <vector>) {
-    <expressions ...>
+    <expressions using element...>
 }
-</source>
+</pre>
-<source lang="rsplus">
+<pre>
 # simple for loop
 for (i in 1:10) {
@@ Line 234: / Line 242: @@
 round((36 * table(outcomes))/N) # Can you explain this expression?
-</source>
+</pre>
 Note that there is nothing special about the expression <code>for (i in 1:N) { ... </code>. Any expression that generates a sequence of items will do; I write a lot of code like <code>for (fileName in dir()) { ... </code> or <code>for (gene in data$name) {... </code>, or <code>for (column in colnames(expressionTable)) {... </code> etc.
-For-loops in '''R''' can be slow. That's usually not an issue if you only need to iterate over a thousand items or so. But you should know that there is a function <code>apply()</code> with a whole family of siblings that is usually about a hundred times faster than a for-loop.
+Loops in '''R''' can be slow if you are not careful how you write them. The reason is usually related to dynamically managing memory. If you can, you should always pre-define objects of sufficient size to hold your results. Even better, use a vectorized approach.
-<source lang="rsplus">
+<pre>
-# Compare excution times: one million square roots from a vector ...
+# Compare excution times: one million square roots from a vector of random numbers ...
-n <- 1000000
-x <- 1:n
-y <- sqrt(x)
-# ... or done explicitly in a for-loop
+# Version 1: Naive for-loop: grow result object as required
-for (i in 1:n) {
+N <- 1000000                 # Set N to a large number
-   y[i] <- sqrt (x[i])
+x <- runif(N)                # get N uniformily distributed random numbers
+y <- numeric()               # create a variable to assign to
+startTime <- Sys.time()      # save start time
+for (i in 1:N) {             # loop N-times
+   y[i] <- sqrt(x[i])         # calculate one square root, grow y to store it
 }
+Sys.time() - startTime       # time it took
+rm(x)                        # clean up
+rm(y)
-</source>
+# Version 2: Define result object to be large enough
+N <- 1000000                 # Set N to a large number
+x <- runif(N)                # get N uniformily distributed random numbers
+y <- numeric(N)              # create a variable with N slots
+startTime <- Sys.time()      # save start time
+for (i in 1:N) {             # loop N-times
+  y[i] <- sqrt(x[i])         # calculate one square root, store in Y
+}
+Sys.time() - startTime       # time it took
+rm(x)                        # clean up
+rm(y)
+# Version 3: vectorized
+N <- 1000000                 # Set N to a large number
+x <- runif(N)                # get N uniformily distributed random numbers
+startTime <- Sys.time()      # save start time
+y <- sqrt(x)                 # sqrt() is vectorized!
+Sys.time() - startTime       # time it took
+rm(x)                        # clean up
+rm(y)
+# The tiny change of pre-allocating memory for the result object y, rather than
+# dynamically growing the vector has made a huge difference. But using
+# the vectorized version of the sqrt() function directly is the fastest
+# approach.
+</pre>
+{{Vspace}}
+====seq_along() vs. range====
+Consider the following carefully:
+<pre>
+# Assume we write a loop to iterate over vectors of variable length for example
+# goin from e to pi with a given number of elements:
+( v5 <- seq(exp(1), pi, length.out = 5) )
+( v2 <- seq(exp(1), pi, length.out = 2) )
+( v1 <- seq(exp(1), pi, length.out = 1) )
+( v0 <- seq(exp(1), pi, length.out = 0) )
+#  The idiom we will probably find most commonly for this task is uses the
+#  range operator ":" ...
+1:length(v5)
+1:length(v2)
+# etc.
+for (i in 1:length(v5)) {
+  print(v5[i])
+}
+for (i in 1:length(v2)) {
+  print(v2[i])
+}
+for (i in 1:length(v1)) {
+  print(v1[i])
+}
+for (i in 1:length(v0)) {
+  print(v0[i])
+}
+# The problem with the last iteration is: we probably didn't want to execute
+# the loop if the vector has length 0. But since 1:length(v0) is the same as
+# 1:0, we get an erroneous execution.
+# This is why we should always use the following idiom instead, when iterating
+# over a vector: the function seq_along().
+#
+# seq_along() builds a vector of indices over its argument.
+seq_along(v5)
+seq_along(v2)
+seq_along(v1)
+seq_along(v0)
+for (i in seq_along(v5)) {
+  print(v5[i])
+}
+for (i in seq_along(v2)) {
+  print(v2[i])
+}
+for (i in seq_along(v1)) {
+  print(v1[i])
+}
+for (i in seq_along(v0)) {
+  print(v0[i])
+}
+# Now we get the expected behaviour: no output if the vector is empty.
+</pre>
+{{Vspace}}
+===loops vs. vectorized expressions===
 ''If'' you can achieve your result with an '''R''' vector expression, it will be faster than using a loop. But sometimes you need to do things explicitly, for example if you need to access intermediate results.
-Here is an example to play some more with loops: a password generator. Passwords are a '''pain'''. We need them everywhere, they are frustrating to type, to remember and since the cracking programs are getting smarter they become more and more likely to be broken. Here is a simple password generator that creates random strings with consonant/vowel alterations. These are melodic and easy to memorize, but actually as '''strong''' as an 8-character, fully random password that uses all characters of the keyboard such as <code>)He.{2jJ</code> or <code>#h$bB2X^</code> (which is pretty much unmemorizable). The former is taken from 20<sup>7</sup> * 7<sup>7</sup> 10<sup>15</sup> possibilities, the latter is from 94<sup>8</sup> ~ 6*10<sup>15</sup> possibilities. High-end GPU supported {{WP|Password cracking|password crackers}} can test about 10<sup>9</sup> passwords a second, the passwords generated by this little algorithm would thus take on the order of 10<sup>6</sup> seconds or eleven days to crack<ref>That's assuming the worst case in that the attacker needs to know the pattern with which the password is formed, i.e. the number of characters and the alphabet that we chose from. But note that there is an even worse case: if the attacker had access to our code and the seed to our random number generator. When the random number generator starts up, a new seed is generated from system time, thus the possible space of seeds can be devastatingly small. But even if a seed is set explicitly with the <code>set.seed()</code> function, that seed is a 32-bit integer and thus can take only a bit more than 4*10<sup>9</sup> values, six orders of magnitude less than the 10<sup>15</sup> password complexity we thought we had. It turns out that the code may be a much greater vulnerability than the password itself. Keep that in mind. <small>Keep it secret. <small>Keep it safe.</small></small></ref>. This is probably good enough to deter a casual attack.
+Here is an example to play some more with loops: a password generator. Passwords are a '''pain'''. We need them everywhere, they are frustrating to type and to remember, and since cracking programs are getting smarter, they are more and more likely to be broken. Here is a simple password generator that creates random strings with consonant/vowel alternations. These are melodic and easy to memorize, but actually about as '''strong''' as an 8-character, fully random password that uses all characters of the keyboard such as <code>)He.{2jJ</code> or <code>#h$bB2X^</code> (which is pretty much unmemorizable). The former is taken from 20<sup>7</sup> * 7<sup>7</sup> ~ 10<sup>15</sup> possibilities, the latter is from 94<sup>8</sup> ~ 6*10<sup>15</sup> possibilities. High-end GPU supported {{WP|Password cracking|password crackers}} can test about 10<sup>9</sup> passwords a second; the passwords generated by this little algorithm would thus take on the order of 10<sup>6</sup> seconds or eleven days to crack<ref>That's assuming the worst case, in which the attacker knows the pattern with which the password is formed, i.e. the number of characters and the alphabet that we chose from. But note that there is an even worse case: if the attacker had access to our code and the seed to our random number generator. If you start the random number generator e.g. with a new seed that is generated from {{c|Sys.time()}}, the possible space of seeds can be devastatingly small. But even if a seed is set explicitly with the <code>set.seed(<number>)</code> function, the {{c|<number>}} seed is a 32-bit integer (check this with {{c|.Machine$integer.max}}) and thus can take only a bit more than 4*10<sup>9</sup> values, six orders of magnitude fewer than the 10<sup>15</sup> password complexity we thought we had! It turns out that the code may be a much greater vulnerability than the password itself. Keep that in mind. <small>Keep it secret. <small>Keep it safe.</small></small></ref>. 
This is probably good enough to deter a casual attack.
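 The arithmetic above can be checked directly in R. This is a minimal sketch that only restates the numbers given in the text (the alphabet sizes of 20 and 7 implied by the formula, seven pairs, 94 keyboard characters, 10<sup>9</sup> guesses per second); the variable names are illustrative and not part of the unit's code.
 <pre>
 # size of the two search spaces quoted in the text
 nPatterned <- 20^7 * 7^7     # seven consonant/vowel pairs
 nRandom    <- 94^8           # eight characters from a 94-character keyboard

 guessesPerSecond <- 1e9      # cracking rate assumed in the text

 # worst-case time to exhaust each space, in days
 daysPatterned <- nPatterned / guessesPerSecond / (60 * 60 * 24)
 daysRandom    <- nRandom    / guessesPerSecond / (60 * 60 * 24)

 print(c(patterned = nPatterned, random = nRandom))
 print(c(patterned = daysPatterned, random = daysRandom))
 </pre>
 Both spaces are on the order of 10<sup>15</sup>, which is why the pronounceable pattern gives up little strength compared to the fully random string.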
 {{task|1=
 Copy, study and run ...
-<source lang="rsplus">
+<pre>
 # Suggest memorizable passwords
 # Below we use the functions:
@@ Line 277: / Line 393: @@
      for (j in 1:7) {    # seven consonant/vowel pairs to be created ...
          k   <- sample(1:nchar(con), 1)  # pick a random index for consonants ...
-         ch  <- substr(con,k,k)          #  ... get the corresponding character ...
+         ch  <- substr(con,k,k)          # ... get the corresponding character ...
          idx <- (2*j)-1                  # ... compute the position (index) of where to put the consonant ...
          pass[idx] <- ch                 # ...  and put it in the right spot
@@ Line 288: / Line 404: @@
      print( paste(pass, collapse="") )  # collapse the vector into a string and print
 }
-</source>
+</pre>
+Try this a few times.
 }}
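 For comparison, the same kind of password can be built without an explicit inner loop. This is only a sketch of the vectorized idea: the alphabets <code>con</code> and <code>vow</code> below are assumptions (the strings used in the task code are not reproduced here), and <code>odd</code> and <code>even</code> are illustrative names.
 <pre>
 # Assumed alphabets, split into vectors of single characters
 con <- strsplit("bcdfghjklmnpqrstvwxz", "")[[1]]
 vow <- strsplit("aeiouy", "")[[1]]

 nPairs <- 7
 pass <- character(2 * nPairs)            # empty vector of 14 slots

 odd  <- seq(1, 2 * nPairs - 1, by = 2)   # positions 1, 3, ..., 13
 even <- seq(2, 2 * nPairs,     by = 2)   # positions 2, 4, ..., 14

 # sample() draws all seven characters at once; no loop needed
 pass[odd]  <- sample(con, nPairs, replace = TRUE)
 pass[even] <- sample(vow, nPairs, replace = TRUE)

 print(paste(pass, collapse = ""))
 </pre>
 The loop version makes each step visible, which is useful for learning; the vectorized version is shorter and avoids computing indices one at a time.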
@@ Line 298: / Line 417: @@
 Whereas a for-loop runs for a fixed number of times, a "while" loop runs as long as a condition is true, possibly forever. Here is an example, again our high-low game: this time we simulate what happens when we play it more than once with a strategy that compensates us for losing.
-<source lang="rsplus">
+<pre>
 # Let's assume we are playing high-low in a casino. You can bet
 # high or low. You get two dollars for one if you win, nothing
-# if you lose. If you bet "high", you lose if we roll "low"
+# if you lose. If you bet "high", you lose if the dice-roll is "low"
 # or "seven". Thus your chances of winning are 15/36 = 42%. You play
 # the following strategy: start with 33 dollars. Bet one dollar.
@@ Line 312: / Line 431: @@
-set.seed(1234567)
 funds <- 33
 bet <- 1         # our first bet
@@ Line 319: / Line 437: @@
 MAXPLAYS <- 100
+set.seed(1234567)
 while (funds > 0 && funds < 100 && nPlays < MAXPLAYS) {
@@ Line 340: / Line 459: @@
      nPlays <- nPlays + 1
 }
+set.seed(NULL)
 # Now before you get carried away - try this with different seeds
@@ Line 345: / Line 465: @@
 # are not all that great...
-</source>
+</pre>
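 The 15/36 chance of winning quoted in the comments above can be verified by enumerating all 36 outcomes of two dice. This is a minimal sketch, assuming that "high" means a sum greater than seven and "low" a sum less than seven, as the game description implies; the variable names are illustrative.
 <pre>
 # all 36 possible sums of two six-sided dice
 sums <- outer(1:6, 1:6, "+")

 pHigh  <- sum(sums > 7)  / length(sums)   # 15 of 36 outcomes
 pLow   <- sum(sums < 7)  / length(sums)   # 15 of 36 outcomes
 pSeven <- sum(sums == 7) / length(sums)   #  6 of 36 outcomes

 print(round(c(high = pHigh, low = pLow, seven = pSeven), 2))
 </pre>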
+A word of caution: if you make a programming error and the termination condition is never reached, the program will run until the heat-death of the universe, or until your computer crashes, whichever comes first. ALWAYS include a "safety net" in your while condition. Something like:
+<pre>
+nIter <- 0
+MAXITER <- 10000
+while (<whatever> && nIter < MAXITER) {
+    nIter <- nIter + 1
+    ... do something
+}
+}
+# and you could add ...
+if (nIter == MAXITER) { stop("Ooops") }
+</pre>
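 To see a safety net like this in action, here is a small runnable sketch with a deliberate bug. The variables <code>x</code> and <code>hitLimit</code> are made up for illustration, and <code>message()</code> is used instead of <code>stop()</code> so that the script continues after the check.
 <pre>
 nIter <- 0
 MAXITER <- 10000
 x <- 1

 # Deliberate bug: x only ever grows, so (x > 0) never becomes FALSE.
 # Without the second condition this loop would never end.
 while (x > 0 && nIter < MAXITER) {
     nIter <- nIter + 1
     x <- x + 1
 }

 # Check whether we left the loop because of the safety net
 hitLimit <- (nIter == MAXITER)
 if (hitLimit) {
     message("Stopped by the safety net after ", nIter, " iterations.")
 }
 </pre>
 Whether to call <code>stop()</code>, <code>warning()</code> or <code>message()</code> at this point depends on whether hitting the limit should abort the rest of the script.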
 {{Vspace}}
@@ Line 354: / Line 489: @@
 ====Exercise====
-A rocket ship has to make a countdown call for the rocket to launch.
+A rocket ship has to sequence a countdown for the rocket to launch.
-You are starting the countdown call from 3.
+You are starting the countdown from 3.
-You want to print the variable named 'call' that outputs:
+You want to print the value of a variable named txt that contains:
-<source lang="rsplus">
+<pre>
-[1]  "3"          "2"          "1"          "0"          "Blast Off!"
+[1]  "3"          "2"          "1"          "0"          "Lift Off!"
-</source>
+</pre>
-Using what you learned above, write a while loop that gives the output above when calling call.
+Using what you learned above, write a while loop that gives the output above.
 Sample Solution:
-<source lang="rsplus">
+<pre>
-call <- c(3)
+start <- 3
-countdown <- 3
+txt <- as.character(start)
+countdown <- start
 while (countdown > 0) {
    countdown <- countdown - 1
-   call <- c(call, countdown)
+   txt <- c(txt, countdown)
 }
-call <- c(call, "Blast Off!")
+txt <- c(txt, "Lift Off!")
-call
+txt
-</source>
+</pre>
 }}
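 Since this unit also treats vectorized commands as alternatives to loops, it is worth noting that the same vector can be produced without a loop. This is a sketch for comparison only; the exercise itself asks for a while loop.
 <pre>
 start <- 3

 # start:0 counts down 3, 2, 1, 0; as.character() turns the numbers
 # into strings so they can share a vector with the final text
 txt <- c(as.character(start:0), "Lift Off!")
 txt
 </pre>
 The while loop becomes the better choice when the number of steps is not known in advance, for example when the countdown has to stop as soon as some condition is met.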
@@ Line 381: / Line 517: @@
 {{Vspace}}
-{{Vspace}}
-== Further reading, links and resources ==
-<!-- {{#pmid: 19957275}} -->
-<!-- {{WWW|WWW_GMOD}} -->
-<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
-{{Vspace}}
 == Notes ==
-<!-- included from "../components/RPR-Control_structures.components.wtxt", section: "notes" -->
-<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
 <references />
 {{Vspace}}
-</div>
-<div id="ABC-unit-framework">
-== Self-evaluation ==
-<!-- included from "../components/RPR-Control_structures.components.wtxt", section: "self-evaluation" -->
-<!--
-=== Question 1===
-Question ...
-<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
-Answer ...
-<div class="mw-collapsible-content">
-Answer ...
-</div>
-  </div>
-  {{Vspace}}
--->
-{{Vspace}}
-{{Vspace}}
-<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
-----
-{{Vspace}}
-<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
-----
-{{Vspace}}
 <div class="about">
@@ Line 450: / Line 532: @@
 :2017-08-05
 <b>Modified:</b><br />
-:2017-09-10
+:2020-09-17
 <b>Version:</b><br />
-:1.0
+:1.2
 <b>Version history:</b><br />
+*1.2 Maintenance
+*1.1 Update set.seed() usage
+*1.0.1 Maintenance; clarify for-loop comparison
 *1.0 Completed to first live version
 *0.1 Material collected from previous tutorial
 </div>
-[[Category:ABC-units]]
-<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 {{CC-BY}}
+[[Category:ABC-units]]
+{{UNIT}}
+{{LIVE}}
 </div>
 <!-- [END] -->