Difference between revisions of "RPR-Control structures"
Revision as of 17:33, 7 September 2017
Control structures of R: if, else if, ifelse, for, and while
Keywords: if, else if, ifelse, for, and while; vectorized commands as alternatives
This unit is under development. There is some content here, but it is incomplete and/or may change significantly: links may lead nowhere, the content is likely to be rearranged, and objectives, deliverables etc. may be incomplete or missing. Do not work with this material until it is updated to "live" status.
Abstract
...
This unit ...
Prerequisites
You need to complete the following units before beginning this one:
Objectives
...
Outcomes
...
Deliverables
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation
Evaluation: NA
- This unit is not evaluated for course marks.
Contents
Control structures
"Control structures" are parts of the syntax that execute code according to whether some condition is met. Let's look at this with some simple examples:
if and else
# Code pattern
if (<conditional expression evaluates to TRUE>) {
    <execute this block>
} else if (<expression evaluates to TRUE>) {
    <execute this block>
} else {
    <execute this block>
}
# conditional expressions
# anything that evaluates to TRUE or FALSE, or can be
# coerced to a logical.
# Obviously the operators ! == != < > <= >= can be used
# but there are also a number of built-in functions that
# are useful for this purpose:
?all
?any
?exists
?is.character
?is.factor
?is.integer
?is.null
?is.numeric
?is.unsorted
?is.vector
# Simple "if" statement:
# Rolling a die. If you get a "six", you get to roll again.
x <- sample(1:6, 1)
if (x == 6) {
    x <- c(x, sample(1:6, 1))
}
print(x)
# "if", "else if", and "else"
# Here is a popular dice game called high-low.
a <- sample(1:6, 1)
b <- sample(1:6, 1)
if (a + b > 7) {
    print("high")
} else if (a + b < 7) {
    print("low")
} else {
    print("seven")
}
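The condition of an "if" statement has to reduce to a single TRUE or FALSE. Here is a minimal sketch of how functions such as any(), all() and is.null() can do that; the variables rolls and bonus are invented here purely for illustration.

```r
# Illustrative values, not part of the original unit
rolls <- c(2, 6, 4)
bonus <- NULL

if (any(rolls == 6)) {            # TRUE if at least one element is a six
    msg1 <- "at least one six"
} else {
    msg1 <- "no six"
}

if (all(rolls %% 2 == 0)) {       # TRUE only if every element is even
    msg2 <- "all even"
} else {
    msg2 <- "not all even"
}

if (is.null(bonus)) {             # safe test for a value that is NULL
    msg3 <- "no bonus defined"
} else {
    msg3 <- "bonus defined"
}

print(c(msg1, msg2, msg3))
```

Each test collapses a whole vector (or a possibly missing object) into one logical value, which is what "if" expects.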
We need to talk about conditional expressions that test for more than one condition for a moment, things like: "if this is true OR that is true OR my birthday is a Sunday this year ...". To join logical expressions, R has two distinct sets of operators: | and ||, and & and &&. | is for "or" and & is for "and". But what about && and ||? The single operators are "vectorized" whereas the doubled operators short-circuit. This means if you apply the single operators to a vector, you get a vector of results:
x <- c(1, 3, 5, 7, 11, 13, 17)
x > 3 & x < 17          # FALSE FALSE TRUE TRUE TRUE TRUE FALSE: all comparisons
x[x > 3 & x < 17]       # 5 7 11 13
x[1] > 3 && x[1] < 17   # FALSE: stops at the first FALSE. Note: && and || expect
                        # single values; since R 4.3.0, longer vectors are an error.
The vectorized version is usually what you want, e.g. for subsetting, as above. But it is usually not the right choice in control structures: there, you want to avoid evaluating unnecessary expressions. Chained "OR" expressions can be aborted after the first TRUE is encountered, and chained "AND" expressions can be aborted after the first FALSE, which is exactly what the doubled operators do.
x <- numeric()
if (length(x) == 0 | is.na(x)) { print("zero") }   # throws an error, because is.na() is
                                                   # evaluated even though x has length zero.
if (length(x) == 0 || is.na(x)) { print("zero") }  # no error: length test is TRUE so is.na()
                                                   # never gets evaluated.
Bottom line: always use || and && in control structures.
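To make short-circuiting visible, here is a small sketch. track() is a hypothetical helper written for this illustration: it records its label whenever it is evaluated and then returns its value.

```r
# track() is a hypothetical helper: it notes that it was called,
# then returns the value it was given
calls <- character(0)
track <- function(value, label) {
    calls <<- c(calls, label)
    value
}

r1 <- track(TRUE, "A")  || track(FALSE, "B")   # B is never evaluated
r2 <- track(FALSE, "C") && track(TRUE, "D")    # D is never evaluated
r3 <- track(TRUE, "E")  |  track(FALSE, "F")   # both sides are evaluated

print(calls)
```

Only the labels of expressions that were actually evaluated end up in calls, so the doubled operators leave out "B" and "D" while the single operator records both of its operands.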
for
"for" loops are the workhorse of innumerable R scripts. They are controlled by a sequence and a variable: the body of the loop is executed once for each element in the sequence, and the variable takes the value of that element. Most often, the sequence is a sequence of integers, created with the colon - the range operator. The template is:
for (<name> in <vector>) {
    <expressions ...>
}
# simple for loop
for (i in 1:10) {
    print(c(i, i^2, i^3))
}
# Let's stay with the high-low game for a moment:
# What are the odds of winning?
# Let's simulate some runs with a "for" loop.
N <- 25000
outcomes <- character(N)   # pre-allocate a vector of N empty strings
for (i in 1:N) {           # repeat, assign each element of 1:N to
                           # the variable "i" in turn
    a <- sample(1:6, 1)
    b <- sample(1:6, 1)
    if (a + b > 7) {
        outcomes[i] <- "high"
    } else if (a + b < 7) {
        outcomes[i] <- "low"
    } else {
        outcomes[i] <- "seven"
    }
}
head(outcomes, 36)
table(outcomes)            # the table() function tabulates the elements
                           # of a vector
round((36 * table(outcomes))/N)  # Can you explain this expression?
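The simulated frequencies can be compared with exact values. As a sketch, the following enumerates all 36 equally likely combinations of two dice with outer() and counts how many fall into each category; the variable names sums and exact are chosen here for illustration.

```r
# All 36 equally likely sums of two dice, as a 6 x 6 matrix
sums <- outer(1:6, 1:6, "+")

exact <- c(high  = sum(sums > 7),
           low   = sum(sums < 7),
           seven = sum(sums == 7))

print(exact)         # counts out of 36
print(exact / 36)    # exact probabilities
```

With N large enough, the scaled table from the simulation should come close to these counts.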
Note that there is nothing special about the expression for (i in 1:N) { ... . Any expression that generates a sequence of items will do; I write a lot of code like for (fileName in dir()) { ... or for (gene in data$name) { ... , or for (column in colnames(expressionTable)) { ... etc.
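A short sketch of looping over something other than 1:N. The named vector expr is made-up example data. The second half illustrates a common pitfall: 1:length(x) counts down from 1 to 0 when x is empty, whereas seq_along(x) yields an empty sequence.

```r
# Hypothetical example data
expr <- c(geneA = 2.5, geneB = 0.4, geneC = 7.1)

# Iterate over the names directly
labels <- character(0)
for (gene in names(expr)) {
    labels <- c(labels, paste(gene, expr[[gene]]))
}
print(labels)

# Iterating over indices: seq_along() is safe even for empty vectors
empty <- character(0)
nGood <- 0
for (i in seq_along(empty)) {     # empty sequence: the body never runs
    nGood <- nGood + 1
}
nBad <- 0
for (i in 1:length(empty)) {      # 1:0 is c(1, 0): the body runs twice!
    nBad <- nBad + 1
}
print(c(nGood, nBad))
```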
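For-loops in R can be slow. That's usually not an issue if you only need to iterate over a thousand items or so. But you should know that vectorized expressions, which operate on whole vectors at once, are often dramatically faster than an explicit for-loop. The function apply() and its whole family of siblings offer a compact alternative to writing loops, although they are not necessarily faster.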
# Compare execution times: one million square roots from a vector ...
n <- 1000000
x <- 1:n
system.time(y <- sqrt(x))
# ... or done explicitly in a for-loop
y <- numeric(n)
system.time(
    for (i in 1:n) {
        y[i] <- sqrt(x[i])
    }
)
If you can achieve your result with an R vector expression, it will be faster than using a loop. But sometimes you need to do things explicitly, for example if you need to access intermediate results.
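As an illustration, the high-low simulation can also be written without an explicit loop. This is only a sketch: it rolls all dice at once with sample(..., replace = TRUE) and classifies the sums with the vectorized ifelse() function. The seed value is arbitrary and only makes the run reproducible.

```r
set.seed(112358)                        # arbitrary seed, for reproducibility only
N <- 25000
a <- sample(1:6, N, replace = TRUE)     # N rolls of the first die
b <- sample(1:6, N, replace = TRUE)     # N rolls of the second die
s <- a + b                              # element-wise sums

# ifelse() tests every element and returns a vector of the same length
outcomes <- ifelse(s > 7, "high",
                   ifelse(s < 7, "low", "seven"))
table(outcomes)
```

Unlike "if", which needs a single TRUE or FALSE, ifelse() accepts a whole logical vector and picks a value for each element.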
Here is an example to play some more with loops: a password generator. Passwords are a pain. We need them everywhere, they are frustrating to type and to remember, and since cracking programs are getting smarter, passwords are more and more likely to be broken. Here is a simple password generator that creates random strings with consonant/vowel alternations. These are melodic and easy to memorize, yet not much weaker than an 8-character, fully random password that uses all characters of the keyboard such as )He.{2jJ or #h$bB2X^ (which is pretty much unmemorizable). The former is taken from 20^7 * 6^7 ≈ 3.6 * 10^14 possibilities (seven pairs drawn from 20 consonants and 6 vowels), the latter from 94^8 ≈ 6 * 10^15 possibilities. High-end GPU supported password crackers can test about 10^9 passwords a second; the passwords generated by this little algorithm would thus take on the order of 10^5 to 10^6 seconds, i.e. several days, to crack[1]. This is probably good enough to deter a casual attack.
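The arithmetic can be checked directly in R. This sketch counts the characters in the two strings used by the generator below and assumes the cracking rate of 10^9 guesses per second mentioned above; the variable names are chosen here for illustration.

```r
nCon   <- nchar("bcdfghjklmnpqrstvwxz")   # number of consonants in the generator
nVow   <- nchar("aeiouy")                 # number of vowels in the generator
nPairs <- 7                               # consonant/vowel pairs per password

spaceCV  <- (nCon * nVow)^nPairs   # size of the consonant/vowel password space
spaceAll <- 94^8                   # 8 characters from 94 printable symbols

guessesPerSec <- 1e9               # cracking rate assumed in the text
secPerDay     <- 60 * 60 * 24
daysCV  <- spaceCV  / guessesPerSec / secPerDay
daysAll <- spaceAll / guessesPerSec / secPerDay

print(signif(c(spaceCV, spaceAll), 3))   # sizes of the two password spaces
print(signif(c(daysCV, daysAll), 3))     # days needed to try every password
```

These are worst-case times for trying every password; on average an attacker would succeed after about half that time.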
Task:
Copy, study and run ...
# Suggest memorizable passwords
# Below we use the functions:
?nchar
?sample
?substr
?paste
?print
# define a string of consonants ...
con <- "bcdfghjklmnpqrstvwxz"
# ... and a string of vowels
vow <- "aeiouy"
for (i in 1:10) {                      # ten sample passwords to choose from ...
    pass <- rep("", 14)                # make an empty character vector
    for (j in 1:7) {                   # seven consonant/vowel pairs to be created ...
        k <- sample(1:nchar(con), 1)   # pick a random index for consonants ...
        ch <- substr(con, k, k)        # ... get the corresponding character ...
        idx <- (2*j)-1                 # ... compute the position (index) of where to put the consonant ...
        pass[idx] <- ch                # ... and put it in the right spot
        # same thing for the vowel, but coded with fewer intermediate assignments
        # of results to variables
        k <- sample(1:nchar(vow), 1)
        pass[(2*j)] <- substr(vow, k, k)
    }
    print(paste(pass, collapse=""))    # collapse the vector into a string and print
}
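For comparison, here is a more compact sketch of the same idea that replaces the inner loop with vector operations. makePassword() is a hypothetical helper, not part of the original unit; it assumes the same consonant and vowel strings as the generator above.

```r
# makePassword() is a hypothetical helper, not part of the original unit
makePassword <- function(nPairs = 7) {
    con <- strsplit("bcdfghjklmnpqrstvwxz", "")[[1]]   # vector of consonants
    vow <- strsplit("aeiouy", "")[[1]]                 # vector of vowels
    cons <- sample(con, nPairs, replace = TRUE)
    vows <- sample(vow, nPairs, replace = TRUE)
    # rbind() makes a 2 x nPairs matrix; reading it column by column
    # interleaves consonant 1, vowel 1, consonant 2, vowel 2, ...
    paste(rbind(cons, vows), collapse = "")
}

for (i in 1:10) {
    print(makePassword())
}
```

The explicit version is easier to step through when learning loops; the compact one shows how sampling several items at once removes the need for index arithmetic.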
while
Whereas a for-loop runs for a fixed number of times, a "while" loop runs as long as a condition is true, possibly forever. Here is an example, again our high-low game: this time we simulate what happens when we play it more than once with a strategy that compensates us for losing.
# Let's assume we are playing high-low in a casino. You can bet
# high or low. You get two dollars for one if you win, nothing
# if you lose. If you bet "high", you lose if we roll "low"
# or "seven". Thus your chances of winning are 15/36 = 42%. You play
# the following strategy: start with 33 dollars. Bet one dollar.
# If you win, good. If you lose, triple your bet. Stop the game
# when your funds are gone (bad), or if you have at least 100
# dollars (good) - i.e. you have tripled the funds you risked.
# Also stop once you've played 100 rounds and start
# getting bored.
set.seed(1234567)
funds <- 33
bet <- 1 # our first bet
nPlays <- 0 # this counts how often we've played
MAXPLAYS <- 100
while (funds > 0 && funds < 100 && nPlays < MAXPLAYS) {
    nPlays <- nPlays + 1         # count this round (the first round is round 1)
    bet <- min(bet, funds)       # can't bet more than we have.
    funds <- funds - bet         # place the bet
    a <- sample(1:6, 1)          # roll the dice
    b <- sample(1:6, 1)
    # we always play "high"
    if (a + b > 7) {             # we win :-)
        result <- "Win! "
        funds <- funds + (2 * bet)
        bet <- 1                 # reset the bet to one dollar
    } else {                     # we lose :-(
        result <- "Lose."
        bet <- 3 * bet           # increase the bet to 3 times previous
    }
    print(paste("Round", nPlays, result,
                "Funds now:", funds,
                "Next bet:", bet))
}
# Now before you get carried away - try this with different seeds
# and you'll quickly figure out that the odds of beating the game
# are not all that great...
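One way to try many different games is to wrap the loop in a function and call it repeatedly. The following is only a sketch: playHighLow() is a hypothetical wrapper around the loop above, with the printing removed, and the seed and number of games are arbitrary choices.

```r
# playHighLow() is a hypothetical wrapper around the while loop above;
# it returns the funds left when the game stops
playHighLow <- function(funds = 33, target = 100, maxPlays = 100) {
    bet <- 1
    nPlays <- 0
    while (funds > 0 && funds < target && nPlays < maxPlays) {
        bet <- min(bet, funds)                             # can't bet more than we have
        funds <- funds - bet                               # place the bet
        if (sum(sample(1:6, 2, replace = TRUE)) > 7) {     # "high": we win
            funds <- funds + (2 * bet)
            bet <- 1
        } else {                                           # we lose
            bet <- 3 * bet
        }
        nPlays <- nPlays + 1
    }
    funds
}

set.seed(2017)                     # arbitrary seed
nGames <- 1000
final <- numeric(nGames)
for (i in seq_len(nGames)) {
    final[i] <- playHighLow()
}
mean(final >= 100)                 # fraction of games that reached the target
mean(final == 0)                   # fraction of games that ended broke
```

Comparing the two fractions gives a rough estimate of how often the strategy pays off.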
Task:
Exercise
A rocket ship has to make a countdown call for the rocket to launch. You are starting the countdown call from 3. You want to print the variable named 'call' that outputs:
[1] "3" "2" "1" "0" "Blast Off!"
Using what you learned above, write a while loop that produces the output above when you print call.
Sample Solution:
call <- c(3)
countdown <- 3
while (countdown > 0) {
    countdown <- countdown - 1
    call <- c(call, countdown)
}
call <- c(call, "Blast Off!")
call
Further reading, links and resources
Notes
- ↑ That's assuming the worst case, in which the attacker knows the pattern with which the password is formed, i.e. the number of characters and the alphabet that we chose from. But note that there is an even worse case: if the attacker had access to our code and the seed to our random number generator. When the random number generator starts up, a new seed is generated from system time, thus the possible space of seeds can be devastatingly small. But even if a seed is set explicitly with the set.seed() function, that seed is a 32-bit integer and thus can take only a bit more than 4 * 10^9 values, about five orders of magnitude fewer than the password space we thought we had. It turns out that the code may be a much greater vulnerability than the password itself. Keep that in mind. Keep it secret. Keep it safe.
Self-evaluation
If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2017-08-05
Version:
- 0.1
Version history:
- 0.1 First stub
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.