Difference between revisions of "RPR-Debugging"

From "A B C"
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div id="BIO">
+
<div id="ABC">
  <div class="b1">
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;">
 
Debugging R
 
Debugging R
  </div>
+
<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; ">
 
+
(Debugging with RStudio; the browser(), debug() and debugonce() commands; setting conditional breakpoints)
  {{Vspace}}
+
</div>
 
 
<div class="keywords">
 
<b>Keywords:</b>&nbsp;
 
Debugging with RStudio; the browser(), debug() and debugonce() commands; setting conditional breakpoints
 
 
</div>
 
</div>
  
{{Vspace}}
+
{{Smallvspace}}
  
  
__TOC__
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;">
 
+
<div style="font-size:118%;">
{{Vspace}}
+
<b>Abstract:</b><br />
 
 
 
 
{{DEV}}
 
 
 
{{Vspace}}
 
 
 
 
 
</div>
 
<div id="ABC-unit-framework">
 
== Abstract ==
 
 
<section begin=abstract />
 
<section begin=abstract />
<!-- included from "../components/RPR-Debugging.components.wtxt", section: "abstract" -->
+
Working effectively with your IDE's debugging tools is a prerequisite for efficient software development.
...
 
 
<section end=abstract />
 
<section end=abstract />
 
+
</div>
{{Vspace}}
+
<!-- ============================  -->
 
+
<hr>
 
+
<table>
== This unit ... ==
+
<tr>
=== Prerequisites ===
+
<td style="padding:10px;">
<!-- included from "../components/RPR-Debugging.components.wtxt", section: "prerequisites" -->
+
<b>Objectives:</b><br />
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
+
This unit will ...
You need to complete the following units before beginning this one:
+
* ... introduce the R "Browser", the in-built debugging tool;
 +
* ... demonstrate debugging a function.
 +
</td>
 +
<td style="padding:10px;">
 +
<b>Outcomes:</b><br />
 +
After working through this unit you ...
 +
* ... can invoke the debugger on a function once or multiple times;
 +
* ... can step through code line by line, and examine the values of variables as you are doing so;
 +
* ... are familiar with "conditional breakpoints" and know how to set them;
 +
* ... can confidently debug your own functions.
 +
</td>
 +
</tr>
 +
</table>
 +
<!-- ============================ -->
 +
<hr>
 +
<b>Deliverables:</b><br />
 +
<section begin=deliverables />
 +
<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li>
 +
<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li>
 +
<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li>
 +
<section end=deliverables />
 +
<!-- ============================  -->
 +
<hr>
 +
<section begin=prerequisites />
 +
<b>Prerequisites:</b><br />
 +
This unit builds on material covered in the following prerequisite units:<br />
 
*[[RPR-Introduction|RPR-Introduction (Introduction to R)]]
 
*[[RPR-Introduction|RPR-Introduction (Introduction to R)]]
 +
<section end=prerequisites />
 +
<!-- ============================  -->
 +
</div>
  
{{Vspace}}
+
{{Smallvspace}}
  
  
=== Objectives ===
 
<!-- included from "../components/RPR-Debugging.components.wtxt", section: "objectives" -->
 
...
 
  
{{Vspace}}
+
{{Smallvspace}}
  
  
=== Outcomes ===
+
__TOC__
<!-- included from "../components/RPR-Debugging.components.wtxt", section: "outcomes" -->
 
...
 
 
 
{{Vspace}}
 
 
 
 
 
=== Deliverables ===
 
<!-- included from "../components/RPR-Debugging.components.wtxt", section: "deliverables" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
 
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
 
*<b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
 
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 72: Line 68:
  
 
=== Evaluation ===
 
=== Evaluation ===
<!-- included from "../components/RPR-Debugging.components.wtxt", section: "evaluation" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 
 
<b>Evaluation: NA</b><br />
 
<b>Evaluation: NA</b><br />
:This unit is not evaluated for course marks.
+
<div style="margin-left: 2rem;">This unit is not evaluated for course marks.</div>
 
 
{{Vspace}}
 
 
 
 
 
</div>
 
<div id="BIO">
 
 
== Contents ==
 
== Contents ==
<!-- included from "../components/RPR-Debugging.components.wtxt", section: "contents" -->
+
<!-- ToDo
<!-- ... develop additional example from http://r.789695.n4.nabble.com/Bug-td4743928.html#a4743929 -->
+
... develop additional example from http://r.789695.n4.nabble.com/Bug-td4743928.html#a4743929
 
+
... develop conditional breakpoint example from anomalous sample() behaviour
 
+
-->
also see: http://steipe.biochemistry.utoronto.ca/abc/students/index.php/User:Farzan.Taj/Tasks/Debugging_With_RStudio
 
 
 
===Debugging===
 
  
When something goes wrong in your code, you need to look at intermediate values, as the code executes. Almost always sprinkling {{c|print()}} statements all over your code to retrieve such intermediate values is the least efficient way to isolate problems. But what is worse, you are temporarily modifying your code, and there is a significant risk that that this will create problems later.
+
When something goes wrong in your code, you need to look at intermediate values as the code executes. Sprinkling <code>print()</code> statements all over your code to retrieve such values is almost always the least efficient way to isolate a problem. Don't even be tempted: when you <code>print()</code> values you are temporarily modifying your code, and there is a significant risk that this will create problems later.
  
 
Right from the beginning of your programming trajectory, you should make yourself familiar with '''R''''s debug functions.
 
Right from the beginning of your programming trajectory, you should make yourself familiar with '''R''''s debug functions.
  
* At first, you may need to pin down approximately where an error occurs. Read the error message carefully, or perhaps do print out some intermediate values from a loop.
+
* At first, you may need to pin down approximately where an error occurs. Read the error message carefully, or perhaps print out some intermediate values from a loop. If the message begins with <span style="color:#CC0000;"><tt>Error:</tt></span>, RStudio will usually show you a traceback button that can tell you where the error occurred, and an option to re-run the code in debugger mode.
 +
* Make sure that it's your code that is at fault, not something else - Google for the error message to get a better idea about what is happening.
 
* Debugging is done by entering a "browser" mode that allows you to step through a function.
 
* Debugging is done by entering a "browser" mode that allows you to step through a function.
 
* To enter this browser mode ...
 
* To enter this browser mode ...
** Call {{c|debug(''function'')}}. When ''function()'' is next executed, '''R''' will enter the browser mode. Call {{c|undebug(''function'')}} to clear the debugging mode. (Or use the {{c|debugonce(''function'')}}. )
+
** Call <code>debug(''function'')</code>. When <code>''function()''</code> is next executed, '''R''' will enter the browser mode. Call <code>undebug(''function'')</code> to clear the debugging flag on your function.
** Alternatively insert {{c|browser()}} into your function code to enter the browser mode. This sets a ''breakpoint'' into your function; use  {{c|if (condition) browser()}} to insert a ''conditional breakpoint'' (or watchpoint). This is especially useful if the problem occurs only rarely, in a particular context.
+
** Alternatively, you can use <code>debugonce(''function'')</code>, which will put the function into browser mode only at its next execution. Note that you don't have to set the debug flag on your top-level function; it could just as well be set on a function that is called by a function which is called by the function that you call when the error occurs, and so on.
* It should go without saying that you need to discover that problems exist in the first place: study the [[RPR-Testing|Testing learning unit]] and test, test, and test again.
+
** Alternatively, open the function in the editor and '''click into the left margin''' (next to the line numbers). A red dot will appear there, and R will go into debug mode when that line of the code is reached. This sets a ''breakpoint'' in your function. You can also insert <code>if (''condition'') { browser() }</code> into your code to enter the debugging mode only when your function reaches a state of interest, e.g. a particular value of a loop iteration variable right before the error occurs. This is called a "conditional breakpoint" (or "watchpoint"). A conditional breakpoint is especially useful if the problem occurs only rarely, in a particular context, or late in a long iteration.
 +
* It should go without saying that you need to discover that problems exist in the first place: study the [[RPR-Unit_testing|Testing learning unit]] and test, test, and test again.
  
 
{{Vspace}}
 
{{Vspace}}
  
Here is an example: let's write a rollDice-function, i.e. a function that creates a vector of ''n'' integers between 1 and MAX - the number of faces on your die.
+
Here is an example: let's write a rollDice() function, i.e. a function that creates a vector of ''n'' integers between min and max, where max is the number of faces on your die. Open a new R script in RStudio, and copy/paste the following code. Execute it, and try it out.
 +
 
 +
 
 +
<pre>
 +
rollDice <- function(n = 1, min = 1, max = 6) {
 +
  # Simulating the roll of a fair die
 +
  # Parameters:
 +
  #    n    numeric  the number of rolls that are returned
 +
  #    min  numeric  the minimum value returned
 +
  #    max  numeric  the maximum value returned
 +
  # Value
 +
  #    Integer vector of length n containing the values
  
<source lang="rsplus">
+
  v <- integer(n)
rollDice <- function(len=1, MIN=1, MAX=6) {
+
  for (i in 1:n) {
    v <- rep(0, len)
+
    x <- runif(1, min, max)
    for (i in 1:len) {
+
    x <- as.integer(x)
        x <- runif(1, min=MIN, max=MAX)
+
    v[i] <- x
        x <- as.integer(x)
+
  }
        v[i] <- x
+
  return(v)
    }
 
    return(v)
 
 
}
 
}
</source>
+
</pre>
  
Lets try running this...
+
Let's try running this and see whether the distribution of numbers is fair...
<source lang="rsplus">
+
<pre>
 
rollDice()
 
rollDice()
table(rollDice(1000))
 
</source>
 
  
Problem: we see only values from 1 to 5. Why? Lets flag the function for debugging...
+
set.seed(112358)
<source lang="rsplus">
+
x <- rollDice(10000)
 +
set.seed(NULL)
 +
 
 +
table(x)
 +
hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
 +
</pre>
 +
 
 +
Problem: our "fair" die seems to return "fair" numbers - but it only returns values from 1 to 5. Why? Let's flag the function for debugging...
 +
<pre>
 
debug(rollDice)
 
debug(rollDice)
 
rollDice(10)
 
rollDice(10)
 +
 +
# We switch to the browser interface. You can use the icons to go through the
 +
# code step by step, or execute more of the code. You can also step into the
 +
# next function, if one is being called, or step over it (by default). The
 +
# current expression is highlighted in the code pane.
 +
 +
> debug(rollDice)
 +
> rollDice(10)
 
debugging in: rollDice(10)
 
debugging in: rollDice(10)
 
debug at #1: {
 
debug at #1: {
     v <- rep(0, len)
+
     v <- integer(n)
     for (i in 1:len) {
+
     for (i in 1:n) {
         x <- runif(1, min = MIN, max = MAX)
+
         x <- runif(1, min, max)
 
         x <- as.integer(x)
 
         x <- as.integer(x)
    v[i] <- x
+
        v[i] <- x
 
     }
 
     }
 
     return(v)
 
     return(v)
 
}
 
}
 
Browse[2]>
 
Browse[2]>
debug at #2: v <- rep(0, len)
+
debug at #10: v <- integer(n)
 
Browse[2]>
 
Browse[2]>
debug at #3: for (i in 1:len) {
+
debug at #11: for (i in 1:n) {
     x <- runif(1, min = MIN, max = MAX)
+
     x <- runif(1, min, max)
 
     x <- as.integer(x)
 
     x <- as.integer(x)
 
     v[i] <- x
 
     v[i] <- x
 
}
 
}
 
Browse[2]>
 
Browse[2]>
debug at #4: x <- runif(1, min = MIN, max = MAX)
+
debug at #12: x <- runif(1, min, max)
 +
Browse[2]>
 +
debug at #13: x <- as.integer(x)
 
Browse[2]>
 
Browse[2]>
debug at #5: x <- as.integer(x)
+
 
Browse[2]> x  # Here we examine the current value of x
+
# Typing a variable name allows us to examine its current value:
[1] 4.506351
+
 
 +
Browse[2]> x
 +
[1] 4.706351
 +
 
 +
# Note that as.integer() hasn't been called yet. The Browser shows you the
 +
# next statement or block it will execute.
 +
 
 
Browse[2]>
 
Browse[2]>
 
debug at #6: v[i] <- x
 
debug at #6: v[i] <- x
Browse[2]>
+
Browse[2]>x
 
debug at #4: x <- runif(1, min = MIN, max = MAX)
 
debug at #4: x <- runif(1, min = MIN, max = MAX)
 
Browse[2]> v
 
Browse[2]> v
[1] 4      # Aha: as.integer() truncates, but doesn't round!
+
[1] 4      # Aha: as.integer() truncates values! So all 5.something values
 +
          # get turned into 5 and no 6 is ever returned. So, shall we round()
 +
          # instead?
 
Browse[2]> Q
 
Browse[2]> Q
 
undebug(rollDice)
 
undebug(rollDice)
</source>
+
</pre>
  
  
We need to change the range of the random input values...
+
So let's change the function to round instead...
<source lang="rsplus">
+
<pre>
rollDice <- function(len=1, MIN=1, MAX=6) {
+
rollDice <- function(n = 1, min = 1, max = 6) {
    v <- rep(0, len)
+
  # Simulating the roll of a fair die
    for (i in 1:len) {
+
  # Parameters:
    x <- runif(1, min=MIN, max=MAX+1)
+
  #    n    numeric  the number of rolls that are returned
    x <- as.integer(x)
+
  #    min  numeric  the minimum value returned
    v[i] <- x
+
  #    max  numeric  the maximum value returned
    }
+
  # Value
    return(v)
+
  #    Integer vector of length n containing the values
 +
 
 +
  v <- integer(n)
 +
  for (i in 1:n) {
 +
    x <- runif(1, min, max)
 +
    x <- round(x)    # <<<- changed to round() from as.integer()
 +
    v[i] <- x
 +
  }
 +
  return(v)
 
}
 
}
table(rollDice(1000))
 
</source>
 
  
 +
rollDice()
  
Now the output looks correct.
+
set.seed(112358)
<source lang="rsplus">
+
x <- rollDice(10000)
# Disclaimer 1: this function would be better
+
set.seed(NULL)
# written as ...
 
  
rollDice <- function(len=1, MIN=1, MAX=6) {
+
table(x)  # Good - now all six numbers are there ...
return(as.integer(runif(len, min=MIN, max=MAX+1)))
+
hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
}
+
</pre>
  
# Check the output:
+
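Ooooo! '''Wrong''' thinking. That's even worse - now all the values are there, but our function is no longer fair! Rounding maps only the half-width intervals from 1 to 1.5 and from 5.5 to 6 to the values 1 and 6, so these two values appear about half as often as 2, 3, 4 and 5.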
table(rollDice(1000))
 
  
# This works, since runif() can return a vector of deviates,
+
So we actually have to think a bit.
# but if we write the function this way we can't check the value of
+
* <code>runif(n, min, max)</code> gives a uniform distribution of numbers. According to the documentation, this is in the interval (min, max), i.e. the actual limit values are not included.
# individual trials.
+
* <code>as.integer()</code> is not a good choice here, because its name does not make its behaviour explicit. Does it round? Does it truncate? Does it round up? (It truncates, as we just saw in the debugger.) We should have used <code>trunc()</code>, <code>floor()</code>, <code>ceiling()</code>, or <code>round()</code> instead<ref>If you think you know how to round, have a look at the help page to the round function. I looked, I didn't.</ref> for explicit, predictable behaviour.
 +
* The key problem is that we have created values for only five intervals, not six: truncating numbers between 1 and 6 yields only 1, 2, 3, 4 and 5. So what we actually need to do is change the range, by adding 1 to max.
  
 +
{{Smallvspace}}
  
# Disclaimer 2: the function relies on a side-effect of as.integer(), which is
+
<pre>
# to drop the digits after the comma when it converts. More explicit and
+
rollDice <- function(n = 1, min = 1, max = 6) {
# therefore clearer would be to use the function floor() instead. Here, the
+
  # Simulating the roll of a fair die
# truncation is not a side effect, but the desired behaviour. This is
+
  # Parameters:
# actually important: there is no guarantee how as.integer() constructs an
+
  #   n    numeric  the number of rolls that are returned
# integer from a float, it could e.g. round, instead of truncating. But rounding
+
  #   min  numeric  the minimum value returned
# would give a wrong distribution! An error that may be hard to spot. (You
+
  #   max  numeric  the maximum value returned
# can easily try using the round() function and think about how the result is wrong.)
+
  # Value
 +
  #   Integer vector of length n containing the values
  
# A better alternative is thus to write:
+
  v <- integer(n)
 +
  for (i in 1:n) {
 +
    x <- runif(1, min, max + 1) # <<<- increase max by one to give correct number of intervals
 +
    x <- trunc(x)              # <<<- changed to trunc() from as.integer()
 +
    v[i] <- x
 +
  }
 +
  return(v)
 +
}
  
rollDice <- function(len=1, MIN=1, MAX=6) {
+
rollDice()
return(floor(runif(len, min=MIN, max=MAX+1)))
 
}
 
  
 +
set.seed(112358)
 +
x <- rollDice(10000)
 +
set.seed(NULL)
  
 +
table(x)
 +
hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
 +
</pre>
  
# Disclaimer 3
 
# A base R function exists that already rolls dice in the required way: sample()
 
  
table(sample(1:6, 1000, replace=TRUE))
+
Now the output looks correct.
</source>
 
  
 +
{{Smallvspace}}
  
 +
;Disclaimer!
 +
:A base R function exists that does the same thing: <code>sample()</code>
  
For the helpful debugging interface that comes with with '''RStudio''', see [http://www.r-bloggers.com/visual-debugging-with-rstudio/ '''here'''] and [https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio '''here'''].
+
{{Smallvspace}}
  
For a deeper excursion into '''R''' debugging, see [http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/debug.shtml this overview by Duncan Murdoch at UWO], and {{PDFlink|[http://www.biostat.jhsph.edu/~rpeng/docs/R-debug-tools.pdf Roger Peng's introduction to R debugging tools]}}.
+
<pre>
 +
set.seed(112358)
 +
x <- sample(1:6, 10000, replace=TRUE)
 +
set.seed(NULL)
  
{{Vspace}}
+
table(x)
 +
hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
 +
</pre>
  
 +
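Now if you look at the table() output, you may see that these are the EXACT same numbers: in versions of R before 3.6.0, sample() does exactly the same as our rollDice() function. Since R 3.6.0, sample() uses a different default method ("Rejection" sampling, see ?RNGkind), so the individual numbers differ, although the distribution is equally fair. So why write our own? Because we might want to simulate more complex behaviour, like having a loaded die, or a memory effect, and writing the function ourselves gives us detailed control over the simulation.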
  
 
{{Vspace}}
 
{{Vspace}}
 
  
 
== Further reading, links and resources ==
 
== Further reading, links and resources ==
<!-- {{#pmid: 19957275}} -->
+
{{Smallvspace}}
<!-- {{WWW|WWW_GMOD}} -->
+
For more on RStudio's debugging interface, see [http://www.r-bloggers.com/visual-debugging-with-rstudio/ '''here'''] and [https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio '''here'''].
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 
 
 
{{Vspace}}
 
  
 +
For a deeper excursion into '''R''' debugging, see [https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Murdoch.pdf this article by Duncan Murdoch at UWO], and {{PDFlink|[http://www.biostat.jhsph.edu/~rpeng/docs/R-debug-tools.pdf Roger Peng's introduction to R debugging tools]}}.
  
 
== Notes ==
 
== Notes ==
<!-- included from "../components/RPR-Debugging.components.wtxt", section: "notes" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
 
 
<references />
 
<references />
  
 
{{Vspace}}
 
{{Vspace}}
  
 
</div>
 
<div id="ABC-unit-framework">
 
== Self-evaluation ==
 
<!-- included from "../components/RPR-Debugging.components.wtxt", section: "self-evaluation" -->
 
<!--
 
=== Question 1===
 
 
Question ...
 
 
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
 
Answer ...
 
<div class="mw-collapsible-content">
 
Answer ...
 
 
</div>
 
  </div>
 
 
  {{Vspace}}
 
 
-->
 
 
{{Vspace}}
 
 
 
 
{{Vspace}}
 
 
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
 
 
----
 
 
{{Vspace}}
 
 
<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
 
 
----
 
 
{{Vspace}}
 
  
 
<div class="about">
 
<div class="about">
Line 294: Line 294:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-08-05
+
:2020-09-25
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:0.1
+
:1.2
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
 +
*1.2 2020 Maintenance
 +
*1.1 Update set.seed() usage
 +
*1.0 First live version
 
*0.1 First stub
 
*0.1 First stub
 
</div>
 
</div>
[[Category:ABC-units]]
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 
  
 
{{CC-BY}}
 
{{CC-BY}}
  
 +
[[Category:ABC-units]]
 +
{{UNIT}}
 +
{{LIVE}}
 
</div>
 
</div>
 
<!-- [END] -->
 
<!-- [END] -->

Latest revision as of 12:38, 26 September 2020

Debugging R

(Debugging with RStudio; the browser(), debug() and debugonce() commands; setting conditional breakpoints)


 


Abstract:

Working effectively with your IDE's debugging tools is a prerequisite for efficient software development.


Objectives:
This unit will ...

  • ... introduce the R "Browser", the in-built debugging tool;
  • ... demonstrate debugging a function.

Outcomes:
After working through this unit you ...

  • ... can invoke the debugger on a function once or multiple times;
  • ... can step through code line by line, and examine the values of variables as you are doing so;
  • ... are familiar with "conditional breakpoints" and know how to set them;
  • ... can confidently debug your own functions.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:
This unit builds on material covered in the following prerequisite units:

  • RPR-Introduction (Introduction to R)

    Evaluation

    Evaluation: NA

    This unit is not evaluated for course marks.

    Contents

    When something goes wrong in your code, you need to look at intermediate values as the code executes. Sprinkling print() statements all over your code to retrieve such values is almost always the least efficient way to isolate a problem. Don't even be tempted: when you print() values you are temporarily modifying your code, and there is a significant risk that this will create problems later.

    Right from the beginning of your programming trajectory, you should make yourself familiar with R's debug functions.

    • At first, you may need to pin down approximately where an error occurs. Read the error message carefully, or perhaps print out some intermediate values from a loop. If the message begins with Error:, RStudio will usually show you a traceback button that can tell you where the error occurred, and an option to re-run the code in debugger mode.
    • Make sure that it's your code that is at fault, not something else - Google for the error message to get a better idea about what is happening.
    • Debugging is done by entering a "browser" mode that allows you to step through a function.
    • To enter this browser mode ...
      • Call debug(function). When function() is next executed, R will enter the browser mode. Call undebug(function) to clear the debugging flag on your function.
      • Alternatively, you can use debugonce(function), which will put the function into browser mode only at its next execution. Note that you don't have to set the debug flag on your top-level function; it could just as well be set on a function that is called by a function which is called by the function that you call when the error occurs, and so on.
      • Alternatively, open the function in the editor and click into the left margin (next to the line numbers). A red dot will appear there, and R will go into debug mode when that line of the code is reached. This sets a breakpoint in your function. You can also insert if (condition) { browser() } into your code to enter the debugging mode only when your function reaches a state of interest, e.g. a particular value of a loop iteration variable right before the error occurs. This is called a "conditional breakpoint" (or "watchpoint"). A conditional breakpoint is especially useful if the problem occurs only rarely, in a particular context, or late in a long iteration.
    • It should go without saying that you need to discover that problems exist in the first place: study the Testing learning unit and test, test, and test again.
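
As a minimal sketch of how the debug flag behaves, the following sets and clears it on a small function and checks it with isdebugged(). The function addOne() is a hypothetical helper that is not part of this unit, and the browser itself is only entered when the code runs in an interactive session.

```r
# addOne() is a hypothetical helper, used only to illustrate the debug flag.
addOne <- function(x) {
  x + 1
}

debug(addOne)                           # flag addOne() for debugging
flagWhileSet <- isdebugged(addOne)      # TRUE: the next call would enter the browser
undebug(addOne)                         # clear the flag again
flagAfterUndebug <- isdebugged(addOne)  # FALSE: calls run normally

if (interactive()) {
  debugonce(addOne)  # one-shot: browser mode for the next call only
  addOne(1)          # step with Enter (or "n"), leave with "c" or "Q"
  addOne(1)          # runs normally; no undebug() is needed
}
```

Checking isdebugged() is a quick way to find out whether a function you flagged earlier with debug() is still waiting to drop you into the browser.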


     

    Here is an example: let's write a rollDice() function, i.e. a function that creates a vector of n integers between min and max, where max is the number of faces on your die. Open a new R script in RStudio, and copy/paste the following code. Execute it, and try it out.


    rollDice <- function(n = 1, min = 1, max = 6) {
      # Simulating the roll of a fair die
      # Parameters:
      #    n    numeric  the number of rolls that are returned
      #    min  numeric  the minimum value returned
      #    max  numeric  the maximum value returned
      # Value
      #    Integer vector of length n containing the values
    
      v <- integer(n)
      for (i in 1:n) {
        x <- runif(1, min, max)
        x <- as.integer(x)
        v[i] <- x
      }
      return(v)
    }
    

    Let's try running this and see whether the distribution of numbers is fair...

    rollDice()
    
    set.seed(112358)
    x <- rollDice(10000)
    set.seed(NULL)
    
    table(x)
    hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
    

    Problem: our "fair" die seems to return "fair" numbers - but it only returns values from 1 to 5. Why? Let's flag the function for debugging...

    debug(rollDice)
    rollDice(10)
    
    # We switch to the browser interface. You can use the icons to go through the
    # code step by step, or execute more of the code. You can also step into the
    # next function, if one is being called, or step over it (by default). The
    # current expression is highlighted in the code pane.
    
    > debug(rollDice)
    > rollDice(10)
    debugging in: rollDice(10)
    debug at #1: {
        v <- integer(n)
        for (i in 1:n) {
            x <- runif(1, min, max)
            x <- as.integer(x)
            v[i] <- x
        }
        return(v)
    }
    Browse[2]>
    debug at #10: v <- integer(n)
    Browse[2]>
    debug at #11: for (i in 1:n) {
        x <- runif(1, min, max)
        x <- as.integer(x)
        v[i] <- x
    }
    Browse[2]>
    debug at #12: x <- runif(1, min, max)
    Browse[2]>
    debug at #13: x <- as.integer(x)
    Browse[2]>
    
    # Typing a variable name allows us to examine its current value:
    
    Browse[2]> x
    [1] 4.706351
    
    # Note that as.integer() hasn't been called yet. The Browser shows you the
    # next statement or block it will execute.
    
    Browse[2]>
    debug at #14: v[i] <- x
    Browse[2]>
    debug at #12: x <- runif(1, min, max)
    Browse[2]> v
     [1] 4 0 0 0 0 0 0 0 0 0
    # Aha: as.integer() truncates values! So all 5.something values
    # get turned into 5 and no 6 is ever returned. So, shall we round()
    # instead?
    Browse[2]> Q
    > undebug(rollDice)
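
Stepping through ten iterations is manageable, but for a problem that appears late in a long loop a conditional breakpoint is more practical. The sketch below is an assumption-laden variant of the function, not part of the unit: the name rollDiceWatched() and the parameter stopAt are hypothetical, and the body is still the unfixed as.integer() version. The interactive() test keeps browser() from being called when the script runs non-interactively.

```r
# rollDiceWatched() and its stopAt parameter are hypothetical illustrations.
rollDiceWatched <- function(n = 1, min = 1, max = 6, stopAt = NULL) {
  v <- integer(n)
  for (i in 1:n) {
    x <- runif(1, min, max)
    # Conditional breakpoint: enter the browser only in the iteration of
    # interest, and only if someone is there to interact with it.
    if (!is.null(stopAt) && i == stopAt && interactive()) {
      browser()
    }
    v[i] <- as.integer(x)   # still the truncating version examined above
  }
  return(v)
}

set.seed(112358)
rolls <- rollDiceWatched(10, stopAt = 7)   # pauses in iteration 7 if interactive
set.seed(NULL)
```

Inside the browser you can then inspect i, x and v exactly as in the session above, without stepping through the first six iterations.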
    


    So let's change the function to round instead...

    rollDice <- function(n = 1, min = 1, max = 6) {
      # Simulating the roll of a fair die
      # Parameters:
      #    n    numeric  the number of rolls that are returned
      #    min  numeric  the minimum value returned
      #    max  numeric  the maximum value returned
      # Value
      #    Integer vector of length n containing the values
    
      v <- integer(n)
      for (i in 1:n) {
        x <- runif(1, min, max)
        x <- round(x)     # <<<- changed to round() from as.integer()
        v[i] <- x
      }
      return(v)
    }
    
    rollDice()
    
    set.seed(112358)
    x <- rollDice(10000)
    set.seed(NULL)
    
    table(x)   # Good - now all six numbers are there ...
    hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
    

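    Ooooo! Wrong thinking. That's even worse - now all the values are there, but our function is no longer fair! Rounding maps only the half-width intervals from 1 to 1.5 and from 5.5 to 6 to the values 1 and 6, so these two values appear about half as often as 2, 3, 4 and 5.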

    So we actually have to think a bit.

    • runif(n, min, max) gives a uniform distribution of numbers. According to the documentation, this is in the interval (min, max), i.e. the actual limit values are not included.
    • as.integer() is not a good choice here, because its name does not make its behaviour explicit. Does it round? Does it truncate? Does it round up? (It truncates, as we just saw in the debugger.) We should have used trunc(), floor(), ceiling(), or round() instead[1] for explicit, predictable behaviour.
    • The key problem is that we have created values for only five intervals, not six: truncating numbers between 1 and 6 yields only 1, 2, 3, 4 and 5. So what we actually need to do is change the range, by adding 1 to max.


     
    rollDice <- function(n = 1, min = 1, max = 6) {
      # Simulating the roll of a fair die
      # Parameters:
      #    n    numeric  the number of rolls that are returned
      #    min  numeric  the minimum value returned
      #    max  numeric  the maximum value returned
      # Value
      #    Integer vector of length n containing the values
    
      v <- integer(n)
      for (i in 1:n) {
        x <- runif(1, min, max + 1) # <<<- increase max by one to give correct number of intervals
        x <- trunc(x)               # <<<- changed to trunc() from as.integer()
        v[i] <- x
      }
      return(v)
    }
    
    rollDice()
    
    set.seed(112358)
    x <- rollDice(10000)
    set.seed(NULL)
    
    table(x)
    hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
    


    Now the output looks correct.
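
To see the three attempts side by side, the following sketch tabulates the share of each face for the truncated, the rounded and the corrected version. It uses vectorized calls instead of the loop in rollDice(), and the helper faceShare() as well as the sample size of 60000 are assumptions made for this illustration only.

```r
# faceShare() is a hypothetical helper: fraction of rolls showing faces 1 to 6.
faceShare <- function(x) {
  as.vector(table(factor(x, levels = 1:6))) / length(x)
}

set.seed(112358)
u6 <- runif(60000, 1, 6)   # the original range
u7 <- runif(60000, 1, 7)   # the range extended by one
set.seed(NULL)

props <- rbind(asInteger = faceShare(as.integer(u6)),  # first attempt
               rounded   = faceShare(round(u6)),       # second attempt
               fixed     = faceShare(trunc(u7)))       # corrected version
colnames(props) <- 1:6

print(round(props, 3))
```

In theory the first row should be close to 0.2 for faces 1 to 5 and exactly 0 for face 6, the second row close to 0.1, 0.2, 0.2, 0.2, 0.2, 0.1, and the third row close to 1/6 throughout.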


     
    Disclaimer!
    A base R function exists that does the same thing: sample()


     
    set.seed(112358)
    x <- sample(1:6, 10000, replace=TRUE)
    set.seed(NULL)
    
    table(x)
    hist(x, breaks = seq(0.5, 6.5, by = 1), xlim = c(0, 7), col = "#BBEEFF")
    

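    Now if you look at the table() output, you may see that these are the EXACT same numbers: in versions of R before 3.6.0, sample() does exactly the same as our rollDice() function. Since R 3.6.0, sample() uses a different default method ("Rejection" sampling, see ?RNGkind), so the individual numbers differ, although the distribution is equally fair. So why write our own? Because we might want to simulate more complex behaviour, like having a loaded die, or a memory effect, and writing the function ourselves gives us detailed control over the simulation.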
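
As one possible illustration of such a more complex behaviour, here is a sketch of a loaded die. The function name rollLoadedDie() and the default weights are assumptions chosen for this example; it relies on the prob argument of sample(), whose weights need not sum to one.

```r
# rollLoadedDie() is a hypothetical example; p holds relative weights
# for the faces 1 to 6 (by default a six is three times as likely).
rollLoadedDie <- function(n = 1, p = c(1, 1, 1, 1, 1, 3)) {
  stopifnot(length(p) == 6, all(p >= 0), sum(p) > 0)
  sample(1:6, n, replace = TRUE, prob = p)
}

set.seed(112358)
loaded <- rollLoadedDie(10000)
set.seed(NULL)

table(loaded)
```

With these weights, a six is expected in about 3/8 of the rolls and every other face in about 1/8.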


     

    Further reading, links and resources

     

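    For more on RStudio's debugging interface, see this R-bloggers post (http://www.r-bloggers.com/visual-debugging-with-rstudio/) and this RStudio support article (https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio).

    For a deeper excursion into R debugging, see this article by Duncan Murdoch at UWO (https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Murdoch.pdf), and Roger Peng's introduction to R debugging tools (http://www.biostat.jhsph.edu/~rpeng/docs/R-debug-tools.pdf).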

    Notes

    1. If you think you know how to round, have a look at the help page to the round function. I looked, I didn't.


     


    About ...
     
    Author:

    Boris Steipe <boris.steipe@utoronto.ca>

    Created:

    2017-08-05

    Modified:

    2020-09-25

    Version:

    1.2

    Version history:

    • 1.2 2020 Maintenance
    • 1.1 Update set.seed() usage
    • 1.0 First live version
    • 0.1 First stub

    This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.