Expected Preparations:
Keywords: Debugging with RStudio; the browser(); debug() and debugonce() commands; setting conditional breakpoints
Objectives:
This unit will …
Outcomes:
After working through this unit you …
Deliverables:
Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.
Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation: NA: This unit is not evaluated for course marks.
Working effectively with your IDE’s debugging tools is a prerequisite for efficient software development.
When something goes wrong in your code, you need to look at intermediate values as the code executes. Sprinkling print() statements all over your code to retrieve such intermediate values is almost always the least efficient way to isolate problems. Don’t even be tempted: when you add print() statements you are temporarily modifying your code, and there is a significant risk that this will create problems later.
Right from the beginning of your programming trajectory, you should make yourself familiar with R’s debug functions.
* When your code stops with an Error:, RStudio will usually show you a traceback button that can tell you where the error occurred, and an option to re-run the code in debugger mode.
* Flag a function for debugging with debug(function). When function() is next executed, R will enter the browser mode. Call undebug(function) to clear the debugging flag on your function.
* Alternatively, use debugonce(function), which will put the function into browser mode only at its next execution. Note that you don’t have to set the debug flag on your top-level function: it could just as well be set on a function that is called by a function which is called by the function you invoked when the error occurred, and so on.
* Insert if (condition) { browser() } into your code to enter the debugging mode only when your function goes into a state of interest, e.g. when a loop iteration variable reaches the value right before the error occurs. This is called a “conditional breakpoint” (or “watchpoint”). A conditional breakpoint is especially useful if the problem occurs only rarely, in a particular context, or late in a long iteration.
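As a minimal sketch of the commands listed above, the following shows how the debug flag is set, checked with isdebugged(), and cleared, and how a conditional breakpoint is placed inside a loop. The functions halve() and sumOfLogs() are hypothetical examples made up for illustration; they are not part of this unit’s code.

```r
# A tiny, hypothetical function to practise with
halve <- function(x) {
  x / 2
}

debug(halve)                   # set the debug flag: the next call opens the browser
flagged <- isdebugged(halve)   # TRUE while the flag is set
undebug(halve)                 # clear the flag again
cleared <- !isdebugged(halve)  # TRUE once the flag is gone

debugonce(halve)               # flag only the next call; R clears it afterwards
# halve(4)                     # uncomment to step through one call interactively

# Hypothetical example of a conditional breakpoint: the browser only opens
# when v[i] is not positive, i.e. right before log() would return -Inf or NaN
sumOfLogs <- function(v) {
  total <- 0
  for (i in seq_along(v)) {
    if (v[i] <= 0) { browser() }
    total <- total + log(v[i])
  }
  total
}

sumOfLogs(c(1, 10, 100))       # condition never met, so this runs straight through
```

Because the breakpoint sits behind an if(), the browser stays out of your way for all the "normal" iterations and only appears in the state you actually want to inspect.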
Here is an example: let’s write a rollDice() function, i.e. a function that returns a vector of n integers between 1 and max, the number of faces on your die. Open a new R script in RStudio, copy/paste the following code, execute it, and try it out.
rollDice <- function(n = 1, min = 1, max = 6) {
  # Simulating the roll of a fair die
  # Parameters:
  #   n    numeric  the number of rolls that are returned
  #   min  numeric  the minimum value returned
  #   max  numeric  the maximum value returned
  # Value
  #   Integer vector of length n containing the values
  v <- integer(n)
  for (i in 1:n) {
    x <- runif(1, min, max)
    x <- as.integer(x)
    v[i] <- x
  }
  return(v)
}
Let’s try running this and see whether the distribution of numbers is fair…
rollDice()
set.seed(112358)
x <- rollDice(10000)
set.seed(NULL)
table(x)
hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
Problem: our “fair” die seems to return “fair” numbers, but it only returns values from 1 to 5. Why? Let’s flag the function for debugging…
debug(rollDice)
rollDice(10)
# We switch to the browser interface. You can use the icons to go through the
# code step by step, or execute more of the code. You can also step into the
# next function, if one is being called, or step over it (by default). The
# current expression is highlighted in the code pane.
> debug(rollDice)
> rollDice(10)
debugging in: rollDice(10)
debug at #1: {
v <- integer(n)
for (i in 1:n) {
x <- runif(1, min, max)
x <- as.integer(x)
v[i] <- x
}
return(v)
}
Browse[2]>
debug at #10: v <- integer(n)
Browse[2]>
debug at #11: for (i in 1:n) {
x <- runif(1, min, max)
x <- as.integer(x)
v[i] <- x
}
Browse[2]>
debug at #12: x <- runif(1, min, max)
Browse[2]>
debug at #13: x <- as.integer(x)
Browse[2]>
# Typing a variable name allows us to examine its current value:
Browse[2]> x
[1] 4.706351
# Note that as.integer() hasn't been called yet. The Browser shows you the
# next statement or block it will execute.
Browse[2]>
debug at #6: v[i] <- x
Browse[2]>x
debug at #4: x <- runif(1, min = MIN, max = MAX)
Browse[2]> v
[1] 4 # Aha: as.integer() truncates values! So all 5.something values
# get turned into 5 and no 6 is ever returned. So, shall we round()
# instead?
Browse[2]> Q
undebug(rollDice)
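The observation from the browser session can be double-checked directly at the console. This short sketch (the variable name truncated is illustrative, and the seed is reused from above only for reproducibility) confirms that as.integer() simply drops the fractional part, so a number drawn from the open interval (1, 6) can never become 6:

```r
as.integer(4.706351)    # 4: the fractional part is simply dropped
as.integer(-1.7)        # -1: truncation goes toward zero, not down

set.seed(112358)
truncated <- as.integer(runif(10000, min = 1, max = 6))
table(truncated)        # only the faces 1 to 5 appear
set.seed(NULL)
```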
So let’s change the function to round instead…
rollDice <- function(n = 1, min = 1, max = 6) {
  # Simulating the roll of a fair die
  # Parameters:
  #   n    numeric  the number of rolls that are returned
  #   min  numeric  the minimum value returned
  #   max  numeric  the maximum value returned
  # Value
  #   Integer vector of length n containing the values
  v <- integer(n)
  for (i in 1:n) {
    x <- runif(1, min, max)
    x <- round(x)  # <<<- changed to round() from as.integer()
    v[i] <- x
  }
  return(v)
}
rollDice()
set.seed(112358)
x <- rollDice(10000)
set.seed(NULL)
table(x) # Good - now all six numbers are there ...
hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
Ooooo! Wrong thinking. That’s even worse: now all six values are there, but 1 and 6 come up only about half as often as the others, so our die is not fair!
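Why is the round() version unfair? round() maps numbers from (1, 1.5) to 1 and numbers from [5.5, 6) to 6, which are intervals of width 0.5, while each of 2 to 5 gets an interval of width 1. Spread over a total width of 5, we would expect proportions of about 0.1 for 1 and 6 and about 0.2 for the others. A quick simulation makes this visible (a sketch; the seed and the sample size are arbitrary choices):

```r
set.seed(112358)
rounded <- round(runif(100000, min = 1, max = 6))
p <- prop.table(table(rounded))
round(p, 2)    # expect roughly 0.1 for faces 1 and 6, 0.2 for faces 2 to 5
set.seed(NULL)
```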
So we actually have to think a bit.
* runif(n, min, max) gives a uniform distribution of numbers. According to the documentation, this is in the interval (min, max), i.e. the limit values themselves are not included.
* as.integer() is not safe to use in any case, because its behaviour is not explicit in the code. Does it round? Does it truncate? Does it round up? We should have used trunc(), floor(), ceiling(), or round() instead¹ for explicit, predictable behaviour.
* The key problem is that we have created values for only five intervals, not six: truncating numbers from (1, 6) can only yield 1 to 5. So what we actually need to do is change the range by adding 1 to max.
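For reference, here is how the explicit alternatives behave on a few sample values (chosen arbitrarily for illustration). Note in particular the last line: for values exactly halfway between two integers, R’s round() goes to the even neighbour, as documented on its help page.

```r
vals <- c(-1.7, 1.2, 1.7)
trunc(vals)     # -1  1  1   toward zero
floor(vals)     # -2  1  1   down
ceiling(vals)   # -1  2  2   up
round(vals)     # -2  1  2   to the nearest integer

# Exact halves are rounded to the even integer:
round(c(0.5, 1.5, 2.5))   # 0 2 2
```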
rollDice <- function(n = 1, min = 1, max = 6) {
  # Simulating the roll of a fair die
  # Parameters:
  #   n    numeric  the number of rolls that are returned
  #   min  numeric  the minimum value returned
  #   max  numeric  the maximum value returned
  # Value
  #   Integer vector of length n containing the values
  v <- integer(n)
  for (i in 1:n) {
    x <- runif(1, min, max + 1)  # <<<- increase max by one to give correct number of intervals
    x <- trunc(x)                # <<<- changed to trunc() from as.integer()
    v[i] <- x
  }
  return(v)
}
rollDice()
set.seed(112358)
x <- rollDice(10000)
set.seed(NULL)
table(x)
hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
Now the output looks correct.
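As an aside, the loop is not needed: runif() can draw all n numbers at once. The sketch below is a hypothetical vectorized variant (the name rollDiceVec() is made up for illustration). It uses floor() instead of trunc() so that it would also behave correctly for a negative min, and it wraps the result in as.integer() so that it really returns an integer vector (the loop version silently turns v into a double vector when it assigns the result of trunc()).

```r
# Hypothetical vectorized variant of rollDice(), for illustration only
rollDiceVec <- function(n = 1, min = 1, max = 6) {
  # runif() draws n numbers from the open interval (min, max + 1); floor()
  # maps each of the (max - min + 1) unit-width intervals to one face value
  as.integer(floor(runif(n, min, max + 1)))
}

set.seed(112358)
x <- rollDiceVec(10000)
set.seed(NULL)
table(x)
```

With the same seed, this should reproduce the counts of the loop version, since runif() consumes the same sequence of uniform random numbers either way.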
Compare this with R’s built-in sample() function:
set.seed(112358)
x <- sample(1:6, 10000, replace=TRUE)
set.seed(NULL)
table(x)
hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
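If you look at the table() output, you see the same kind of distribution, because sample() does the same job as our rollDice() function. (In R versions before 3.6.0, sample() turned each uniform random number into a face value the same way our function does, so with the same seed the counts were exactly identical. Since R 3.6.0 the default sampling method is "Rejection", so the exact numbers differ; RNGkind(sample.kind = "Rounding") restores the old behaviour.) So why write our own? Because we might want to simulate more complex behaviour, like a loaded die or a memory effect, and writing the function ourselves gives us detailed control over the simulation.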
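To illustrate the kind of control meant here, the following is a minimal sketch of a loaded die, assuming the bias is given as relative weights for the six faces. The function name rollLoadedDie() and its prob argument are hypothetical, made up for this example. It uses the inverse-CDF idea: split (0, 1) into six sub-intervals whose widths are the face probabilities, and report which sub-interval each uniform random number falls into.

```r
# Hypothetical loaded-die simulator (illustration only)
# prob: relative weights of faces 1..6; equal weights give a fair die
rollLoadedDie <- function(n = 1, prob = rep(1, 6)) {
  prob <- prob / sum(prob)                    # normalize weights to probabilities
  lower <- c(0, cumsum(prob)[-length(prob)])  # lower bound of each face's sub-interval
  u <- runif(n)                               # uniform numbers in (0, 1)
  findInterval(u, lower)                      # index of the sub-interval containing u
}

set.seed(112358)
x <- rollLoadedDie(10000, prob = c(1, 1, 1, 1, 1, 5))  # face 6 is five times as likely
set.seed(NULL)
table(x)
```

Only the lower bounds are passed to findInterval(), so a value just below 1 cannot fall past the last face because of floating-point error in cumsum(), and a face with weight 0 is simply never returned.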
For more on RStudio’s debugging interface, see here and here. For a deeper excursion into R debugging, see this article by Duncan Murdoch at UWO, and Roger Peng’s introduction to R debugging tools (PDF).
If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.
Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.
[END]
¹ If you think you know how to round, have a look at the help page for the round() function. I looked, I didn’t.