Difference between revisions of "RPR-Functions"

From "A B C"
m
m
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div id="BIO">
+
<div id="ABC">
  <div class="b1">
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;">
 
R Functions
 
R Functions
  </div>
+
<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; ">
 
+
(Anatomy of a function: arguments, parameters and values; the concept of functional programming.)
  {{Vspace}}
+
</div>
 
 
<div class="keywords">
 
<b>Keywords:</b>&nbsp;
 
Anatomy of a function, arguments, parameters and values; the concept of functional programming
 
 
</div>
 
</div>
  
{{Vspace}}
+
{{Smallvspace}}
 
 
 
 
__TOC__
 
  
{{Vspace}}
 
  
 
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;">
{{LIVE}}
+
<div style="font-size:118%;">
 
+
<b>Abstract:</b><br />
{{Vspace}}
 
 
 
 
 
</div>
 
<div id="ABC-unit-framework">
 
== Abstract ==
 
 
<section begin=abstract />
 
<section begin=abstract />
<!-- included from "../components/RPR-Functions.components.wtxt", section: "abstract" -->
+
In this unit we discuss the "anatomy" of R functions: arguments, parameters and values, and how R's treatment of functions supports "functional programming".
...
 
 
<section end=abstract />
 
<section end=abstract />
 
+
</div>
{{Vspace}}
+
<!-- ============================ -->
 
+
<hr>
 
+
<table>
== This unit ... ==
+
<tr>
=== Prerequisites ===
+
<td style="padding:10px;">
<!-- included from "../components/RPR-Functions.components.wtxt", section: "prerequisites" -->
+
<b>Objectives:</b><br />
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
 
You need to complete the following units before beginning this one:
 
*[[RPR-Control_structures]]
 
 
 
{{Vspace}}
 
 
 
 
 
=== Objectives ===
 
<!-- included from "../components/RPR-Functions.components.wtxt", section: "objectives" -->
 
 
This unit will ...
 
This unit will ...
* ... introduce ;
+
* ... introduce the basic pattern of R functions;
* ... discuss ;
+
* ... discuss arguments and parameters;
* ... teach ;
+
* ... show how to retrieve the source code of a function;
 +
* ... practice writing your own functions.
 +
</td>
 +
<td style="padding:10px;">
 +
<b>Outcomes:</b><br />
 +
After working through this unit you ...
 +
* ... know how to pass parameters into functions and assign the returned values;
 +
* ... can read, analyze, and write your own functions.
 +
</td>
 +
</tr>
 +
</table>
 +
<!-- ============================  -->
 +
<hr>
 +
<b>Deliverables:</b><br />
 +
<section begin=deliverables />
 +
<ul>
 +
<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li>
 +
<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li>
 +
<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li>
 +
</ul>
 +
<section end=deliverables />
 +
<!-- ============================  -->
 +
<hr>
 +
<section begin=prerequisites />
 +
<b>Prerequisites:</b><br />
 +
This unit builds on material covered in the following prerequisite units:<br />
 +
*[[RPR-Control_structures|RPR-Control_structures (Control structures of R)]]
 +
<section end=prerequisites />
 +
<!-- ============================  -->
 +
</div>
  
{{Vspace}}
+
{{Smallvspace}}
  
  
=== Outcomes ===
 
<!-- included from "../components/RPR-Functions.components.wtxt", section: "outcomes" -->
 
After working through this unit you ...
 
* ... have done;
 
* ... know how ;
 
* ... can ;
 
  
{{Vspace}}
+
{{Smallvspace}}
  
  
=== Deliverables ===
+
__TOC__
<!-- included from "../components/RPR-Functions.components.wtxt", section: "deliverables" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
 
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
 
*<b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
 
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 78: Line 70:
  
 
=== Evaluation ===
 
=== Evaluation ===
<!-- included from "../components/RPR-Functions.components.wtxt", section: "evaluation" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 
 
<b>Evaluation: NA</b><br />
 
<b>Evaluation: NA</b><br />
:This unit is not evaluated for course marks.
+
<div style="margin-left: 2rem;">This unit is not evaluated for course marks.</div>
 
 
{{Vspace}}
 
 
 
 
 
</div>
 
<div id="BIO">
 
 
== Contents ==
 
== Contents ==
<!-- included from "../components/RPR-Functions.components.wtxt", section: "contents" -->
 
 
===Functions===
 
===Functions===
  
 +
R is considered an (impure) {{WP|Functional_programming|functional programming language}} and thus the focus of '''R''' programs is on functions. The key advantage is that this encourages programming without side-effects, which makes it easier to write and maintain error-free code. Function parameters<ref>The terms ''parameter'' and ''argument'' have similar but distinct meanings. A ''parameter'' is an item that appears in the function definition, an ''argument'' is the actual value that is passed into the function.</ref> are instantiated for use inside a function as the function's arguments, and a single result is returned<ref>However, a function may have ''side-effects'', such as writing something to the console, plotting graphics, saving data to a file, or changing the value of variables outside the function ''scope''. But changing values outside the scope is poor practice, and should always be avoided.</ref>. The return values can either be assigned to a variable, or used directly as the argument of another function. This means functions can be nested, and intermediate assignment is not required.
  
R is considered an (impure) {{WP|Functional_programming|functional programming language}} and thus the focus of '''R''' programs is on functions. The key advantage is that this encourages programming without side-effects and this makes it easier to reason about the correctness of programs. Function parameters<ref>The terms ''parameter'' and ''argument'' have similar but distinct meanings. A ''parameter'' is an item that appears in the function definition, an ''argument'' is the actual value that is passed into the function.</ref> are instantiated for use inside functions as the function's arguments, and a single result is returned<ref>However a function may have ''side-effects'', such as writing something to console, plotting graphics, saving data to a file, or changing the value of variables outside the function ''scope''. Avoid the latter, it is fragile and poor practice.</ref>. The return values can either be assigned to a variable, or used directly as the argument of another function - intermediate assignment is not required.
+
Functions are either ''built-in'' (''i.e.'' available in the basic '''R''' installation), loaded via specific packages, or they can be defined by you (see below). In general a function is invoked through its name, followed by zero or more arguments in parentheses, separated by commas. Whenever I refer to a function, I write the parentheses to identify it as such and not as a constant or other keyword, e.g. <code>log()</code>. Here are some examples for you to try and play with:
 
 
Functions are either ''built-in'' (''i.e.'' available in the basic '''R''' installation), loaded via specific packages (see above), or they can be easily defined by you (see below). In general a function is invoked through a name, followed by one or more arguments in parentheses, separated by commas. Whenever I refer to a function, I write the parentheses to identify it as such and not a constant or other keyword eg. <code>log()</code>. Here are some examples for you to try and play with:
 
  
<source lang="rsplus">
+
<pre>
 
cos(pi) #"pi" is a predefined constant.
 
cos(pi) #"pi" is a predefined constant.
 
sin(pi) # Note the rounding error. This number is not really different from zero.
 
sin(pi) # Note the rounding error. This number is not really different from zero.
 
sin(30 * pi/180) # Trigonometric functions use radians as their argument - this conversion calculates sin(30 degrees)
 
sin(30 * pi/180) # Trigonometric functions use radians as their argument - this conversion calculates sin(30 degrees)
 
exp(1) # "e" is not predefined, but easy to calculate.
 
exp(1) # "e" is not predefined, but easy to calculate.
log(exp(1)) # functions can be arguments to functions - they are evaluated from the inside out.
+
log(exp(1)) # functions can be arguments to functions - nested functions are evaluated from the inside out.
 
log(10000) / log(10) # log() calculates natural logarithms; convert to any base by dividing by the log of the base. Here: log to base 10.
 
log(10000) / log(10) # log() calculates natural logarithms; convert to any base by dividing by the log of the base. Here: log to base 10.
exp(complex(r=0, i=pi)) #Euler's identity
+
exp(complex(r=0, i=pi)) # Euler's identity
</source>
+
utils::example("wilcox.test")    # example() is a function from the utils package
 +
                                # and runs the code in the Examples section of
 +
                                # R-help pages
 +
</pre>
 +
 
 +
There are several ways to populate the argument list for a function and '''R''' makes a reasonable guess what you want to do. Arguments can either be used in their predefined order, or assigned via an argument ''name''. Let's look at the <code>complex()</code> function to illustrate this. Consider the specification of a complex number in Euler's identity above. The function <code>complex()</code> can work with a number of arguments that are explained in the documentation (see: <code>?complex</code>). Its signature includes <code>length.out</code>, <code>real</code>, <code>imaginary</code>, and some more.
 +
 
 +
<pre>
 +
complex(length.out = 0, real = numeric(), imaginary = numeric(), modulus = 1, argument = 0)
 +
</pre>
 +
 
 +
The <code>length.out</code> argument creates a vector with one or more complex numbers. If nothing else is specified, this will be a vector of complex zero(s). If there are two, or three arguments, they will be placed in the respective slots. However, since the arguments are '''named''', we can also define which slot of the argument list they should populate.
  
There are several ways to populate the argument list for a function and '''R''' makes a reasonable guess what you want to do. Arguments can either be used in their predefined order, or assigned via an argument ''name''. Let's look at the <code>complex()</code> function to illustrate this. Consider the specification of a complex number in Euler's identity above. The function {{c|complex()}} can work with a number of arguments that are given in the documentation (see: <code>?complex</code>). These include <code>length.out</code>, <code>real</code>, <code>imaginary</code>, and some more. The <code>length.out</code> argument creates a vector with one or more complex numbers. If nothing else is specified, this will be a vector of complex zero(s). If there are two, or three arguments, they will be placed in the respective slots. However, since the arguments are '''named''', we can also define which slot of the argument list they should populate. Consider the following to illustrate this:
 
  
<source lang="rsplus">
+
Consider the following to illustrate this:
complex(1)
+
 
 +
<pre>
 +
complex(1)   # parameter is in the first slot -> length.out
 
complex(4)
 
complex(4)
complex(1, 2) # imaginary part missing: if it's missing it defaults to zero
+
complex(1, 2) # imaginary part missing
complex(1, 2, 3) # one complex number
+
complex(1, 2, 3) # one complex number with real and imaginary parts defined
 
complex(4, 2, 3) # four complex numbers
 
complex(4, 2, 3) # four complex numbers
 
complex(real = 0, imaginary = pi) # defining values via named parameters
 
complex(real = 0, imaginary = pi) # defining values via named parameters
 
complex(imaginary = pi, real = 0) # same thing - if names are used, order is not important
 
complex(imaginary = pi, real = 0) # same thing - if names are used, order is not important
 
complex(re = 0, im = pi) # names can be abbreviated ...
 
complex(re = 0, im = pi) # names can be abbreviated ...
complex(r = 0, i = pi)  # ... to the shortest string that is unique among the named parameters.
+
complex(r = 0, i = pi)  # ... to the shortest string that is unique among the named parameters,
                         # A strongly advise against this to keep your code readable for others.
+
                         # but this is _poor_ practice; I strongly advise against it.
 
complex(i = pi, 1, 0) # Think: what have I done here? Why does this work?
 
complex(i = pi, 1, 0) # Think: what have I done here? Why does this work?
 
exp(complex(i = pi, 1, 0)) # (The complex number above is the same as in Euler's identity.)
 
exp(complex(i = pi, 1, 0)) # (The complex number above is the same as in Euler's identity.)
</source>
+
</pre>
  
 
{{task|1=
 
{{task|1=
Line 134: Line 128:
  
 
}}
 
}}
 +
 +
{{Vspace}}
 +
 +
====On missing parameters====
 +
 +
If a parameter is missing, several things can happen. Let's illustrate with a little function that returns the golden-ratio pair to a number, either the smaller or the larger one.
 +
 +
<pre>
 +
goldenRatio <- function(x, smaller) {
 +
  phi <- (1 + sqrt(5)) / 2
 +
  if (smaller == TRUE) {
 +
    return(x / phi)
 +
  } else {
 +
    return(x * phi)
 +
  }
 +
}
 +
</pre>
 +
 +
* If there's no way to recover, executing the function will throw an error:
 +
 +
<pre>
 +
goldenRatio(1)
 +
# Error in goldenRatio(1) : argument "smaller" is missing, with no default
 +
</pre>
 +
 +
* If the function defines a default value for the parameter, that default is used:
 +
<pre>
 +
goldenRatio <- function(x, smaller = TRUE) {
 +
  phi <- (1 + sqrt(5)) / 2
 +
  if (smaller == TRUE) {
 +
    return(x / phi)
 +
  } else {
 +
    return(x * phi)
 +
  }
 +
}
 +
 +
goldenRatio(1)
 +
# [1] 0.618034
 +
</pre>
 +
 +
* Alternatively, the function body can check whether a parameter is missing with the <code>missing()</code> function, and then react accordingly:
 +
 +
<pre>
 +
goldenRatio <- function(x, smaller) {
 +
  if (missing(smaller)) {
 +
    smaller <- TRUE
 +
  }
 +
  phi <- (1 + sqrt(5)) / 2
 +
  if (smaller == TRUE) {
 +
    return(x / phi)
 +
  } else {
 +
    return(x * phi)
 +
  }
 +
}
 +
 +
goldenRatio(1)
 +
# [1] 0.618034
 +
 +
goldenRatio(1, smaller = FALSE)
 +
# [1] 1.618034
 +
</pre>
 +
Why is this useful, if you could just define a default? Because the parameter can then be the result of a (complex) computation, based on other parameters, done in the function body. Whereas if you pass the argument into the function, you need to know the desired value ahead of time.
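As a minimal sketch of this idea, consider the following function. The name <code>scaleToRange()</code> and its parameter <code>width</code> are hypothetical, chosen only for illustration: if <code>width</code> is missing, it is computed in the function body from the other parameter.

```r
# Hypothetical example: if "width" is not supplied, it is computed from "x".
scaleToRange <- function(x, width) {
  if (missing(width)) {
    # No width was passed in: derive one from the data itself.
    width <- max(x) - min(x)
  }
  return((x - min(x)) / width)
}

scaleToRange(c(2, 4, 6))              # width is computed: 0.0 0.5 1.0
scaleToRange(c(2, 4, 6), width = 8)   # width is supplied: 0.00 0.25 0.50
```

The caller does not need to know the range of the data ahead of time, but can still override it when needed.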
 +
 +
 +
{{Vspace}}
 +
 +
===Reading functions===
 +
 +
R is open-source; this means that you can find and study the source code of all functions - if you know where to look. In many cases this is very easy. I cover the most frequent cases below; for a more detailed discussion, see [https://stackoverflow.com/questions/19226816/how-can-i-view-the-source-code-for-a-function '''here''' (StackOverflow)].
 +
 +
{{Vspace}}
 +
 +
====Basic R====
 +
 +
If the function is a normal R function, like the ones we have defined above, you can read the function code when you type its name <b>without parentheses</b>:
 +
 +
<pre>
 +
goldenRatio
 +
 +
# function(x, smaller) {
 +
#  if (missing(smaller)) {
 +
#    smaller <- TRUE
 +
#  }
 +
#  phi <- (1 + sqrt(5)) / 2
 +
#  if (smaller == TRUE) {
 +
#    return(x / phi)
 +
#  } else {
 +
#    return(x * phi)
 +
#  }
 +
#}
 +
</pre>
 +
 +
But that strictly only works for functions which have been written in basic R code.
 +
 +
 +
====S3 methods====
 +
 +
You might also get a line saying <code>UseMethod(&lt;function name&gt;)</code>. Then you are looking at a "generic" function from R's S3 object-oriented system: it dispatches to more specific code - a "method" - depending on the class of the argument it is given. Use <code>methods()</code> to see which specific methods are defined, and then use <code>getAnywhere(&lt;function.class&gt;)</code> to get the code.
 +
 +
<pre>
 +
seq
 +
 +
# function (...)
 +
# UseMethod("seq")
 +
# <bytecode: 0x103f3f9c8>
 +
# <environment: namespace:base>
 +
 +
methods(seq)
 +
 +
# [1] seq.Date    seq.default seq.POSIXt
 +
# see '?methods' for accessing help and source code
 +
 +
getAnywhere(seq.default)
 +
 +
# Lots of code ...
 +
</pre>
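To see dispatch from the inside, here is a minimal sketch of a home-made generic. The names <code>describe()</code>, <code>describe.numeric()</code> and <code>describe.default()</code> are hypothetical and serve only as an illustration; they are not part of '''R'''.

```r
# A hypothetical generic: its body only calls UseMethod().
describe <- function(x) {
  UseMethod("describe")
}

# Method for numeric vectors.
describe.numeric <- function(x) {
  return(sprintf("a numeric vector of length %d", length(x)))
}

# Fallback method for everything else.
describe.default <- function(x) {
  return("some other kind of object")
}

describe(c(1.5, 2.5))   # dispatches to describe.numeric()
describe("abc")         # dispatches to describe.default()
```

Typing <code>describe</code> without parentheses shows only the <code>UseMethod()</code> line; the working code is in the methods.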
 +
 +
{{Vspace}}
 +
 +
 +
====Primitives====
 +
 +
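You might also get a line saying <code>.Call(C_&lt;function name&gt;, &lt;arguments&gt;)</code>. Then you are looking at a function that hands its work to compiled code, written in the C programming language for efficiency. (Strictly speaking, '''R''' reserves the term "primitive" for built-in functions such as <code>sum()</code> that print as <code>.Primitive("sum")</code>; a function that uses <code>.Call()</code> is an ordinary R function wrapped around compiled code.)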
 +
 +
<pre>
 +
runif
 +
 +
# function (n, min = 0, max = 1)
 +
# .Call(C_runif, n, min, max)
 +
# <bytecode: 0x103a5b098>
 +
# <environment: namespace:stats>
 +
 +
</pre>
 +
 +
To read the C source code, just do a Google search for the function name in the repository where the R sources are kept:
 +
 +
* [https://www.google.ca/search?q=site%3Ahttps%3A%2F%2Fsvn.r-project.org%2FR%2Ftrunk%2Fsrc+runif <code>site:https://svn.r-project.org/R/trunk/src runif</code>]
 +
: This search finds [https://svn.r-project.org/R/trunk/src/nmath/runif.c <code>runif.c</code>] (have a look).
  
 
{{Vspace}}
 
{{Vspace}}
Line 139: Line 271:
 
==Writing your own functions==
 
==Writing your own functions==
  
'''R''' is a "functional programming language" and most if not all serious work will involve writing your own functions. This is easy and gives you access to flexible, powerful and reusable solutions. You have to understand the "anatomy" of an '''R''' function however.
+
'''R''' is a "functional programming language" and working with R will involve writing your own functions. This is easy and gives you access to flexible, powerful and reusable solutions. You have to understand the "anatomy" of an '''R''' function however.
  
 
* Functions are assigned to function names. They are treated like any other '''R''' object and you can have vectors of functions, and functions that return functions etc.
 
* Functions are assigned to function names. They are treated like any other '''R''' object and you can have vectors of functions, and functions that return functions etc.
 
* Data gets '''into''' the function via the function's parameters.
 
* Data gets '''into''' the function via the function's parameters.
* Data is '''returned''' from a function via the {{c|return()}} statement<ref>Actually the return() statement is optional, if missing, the result of the last expression is returned. I consider it poor practice to omit return(), this gives rise to error-prone code.</ref>. One and only one object is returned. However the object can be a list, and thus contain values of arbitrary complexity. This is called the "value" of the function. Well-written functions have no side-effects like changing global variables.
+
* Data is '''returned''' from a function via the {{c|return()}} statement<ref>Actually the <code>return()</code> statement is optional: if it is missing, the result of the last evaluated expression is returned. You will find this frequently in other people's code, so it is something to be aware of. However, you'll surely understand that it is really poor practice to omit <code>return()</code>: it makes the code harder to read and can give rise to misunderstandings. Never use implicit behaviour where you can be explicit instead.</ref>. One and only one object is returned. However, the object can be a list, and thus contain values of arbitrary complexity. This is called the "value" of the function. Well-written functions have no side-effects like changing global variables.
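The points above can be sketched in a few lines. All names here (<code>summaries</code>, <code>makePower()</code>, <code>minMax()</code>) are hypothetical examples; note that in practice a "vector of functions" is a list.

```r
# Functions are objects: they can be stored in a list ...
summaries <- list(lo = min, hi = max)
summaries$hi(c(3, 9, 4))        # 9

# ... and a function can return another function.
makePower <- function(p) {
  f <- function(x) {
    return(x^p)
  }
  return(f)
}
square <- makePower(2)
square(7)                       # 49

# Only one object is returned, but it can be a list with several values.
minMax <- function(x) {
  return(list(min = min(x), max = max(x)))
}
r <- minMax(c(3, 9, 4))
r$min                           # 3
r$max                           # 9
```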
 +
 
  
 +
<pre>
 +
# the function definition pattern:
  
<source lang="rsplus">
+
<myName> <- function(<myArguments>) {
#defining the function:
+
  # <documentation!>
myFunction <- function(<myParameters>) {
+
result <- <do something with the parameters>
result <- <do something with my parameters>
 
 
return(result)
 
return(result)
 
}
 
}
  
</source>
+
</pre>
  
 +
In this pattern, the function is assigned to the ''name'' - any valid name in '''R'''. Once it is assigned, the function can be invoked with <code>myName()</code>. The parameter list (written in the parentheses of the function definition) can be empty, or hold a list of variable names. If variable names are present, you need to supply the corresponding arguments when you execute the function, unless a default is defined. These variables are then available inside the function, and can be used for computations. This is called "passing variables into the function".
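Here is a small, concrete instance of the pattern. The function <code>cToF()</code> is a hypothetical example, not something defined elsewhere in this unit.

```r
# Hypothetical example following the function definition pattern.
cToF <- function(celsius) {
  # Convert a temperature from degrees Celsius to degrees Fahrenheit.
  result <- celsius * 9 / 5 + 32
  return(result)
}

cToF(100)               # 212
boiling <- cToF(100)    # the returned value can be assigned ...
round(cToF(37))         # ... or used directly as the argument of another function: 99
```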
  
  
Line 162: Line 297:
  
 
This exercise is similar to the while loop exercise. The only difference is to put the code into a function.
 
This exercise is similar to the while loop exercise. The only difference is to put the code into a function.
Write a function rocketShip(n) so that you can start the countdown call from any number.
+
Write a function <code>countDown()</code> so that you can start the countdown call from any number.
For example if the rocketShip countdown from 7, the output would be:
+
For example calling <code>countDown(5)</code> should give:
<source lang="rsplus">
+
<pre>
[1]  "5"          "4"          "3"          "2"          "1"          "0"          "Blast Off!"
+
[1]  "5"          "4"          "3"          "2"          "1"          "0"          "Lift Off!"
</source>
+
</pre>
  
Solution:
+
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
<source lang="rsplus">
+
Solution ... <small>No peeking!</small>
rocketShip <- function(n) {
+
<div class="mw-collapsible-content">
 +
 
 +
<pre>
 +
countDown <- function(n) {
 
   start <- n
 
   start <- n
  call <- c(start)
 
 
   countdown <- start
 
   countdown <- start
 +
  txt <- as.character(start)
 +
 
   while (countdown > 0) {
 
   while (countdown > 0) {
 
     countdown <- countdown - 1
 
     countdown <- countdown - 1
     call <- c(call, countdown)
+
     txt <- c(txt, countdown)
 
   }
 
   }
   call <- c(call, "Blast Off!")
+
   txt <- c(txt, "Lift Off!")
   return(call)
+
   return(txt)
 
}
 
}
  
rocketShip(7)
+
# Try it ...
</source>
+
countDown(7)
}}
+
</pre>
 
 
 
 
  
  
 +
</div>
 +
</div>
  
 +
}}
  
  
Line 195: Line 335:
  
 
The '''scope''' of functions is local: this means all variables within a function are lost upon return, and global variables are not overwritten by a definition within a function. However variables that are defined outside the function are also available inside.
 
The '''scope''' of functions is local: this means all variables within a function are lost upon return, and global variables are not overwritten by a definition within a function. However variables that are defined outside the function are also available inside.
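A minimal sketch of these scope rules follows; the variable names and the function <code>scopeDemo()</code> are hypothetical.

```r
x <- 10    # defined outside the function
y <- 1     # also defined outside

scopeDemo <- function() {
  y <- 2        # creates a local y; the y outside is not overwritten
  z <- x + y    # x is not defined locally, so the x outside is used
  return(z)
}

scopeDemo()     # 12
y               # still 1
# The local variables y and z of scopeDemo() are lost once the function returns.
```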
 
Here is a simple example: a function that takes a binomial species name as input and creates a five-letter code as output:
 
 
<source lang="rsplus">
 
biCode <- function(s) {
 
substr(s, 4, 5) <- substr(strsplit(s,"\\s+")[[1]][2], 1, 2)
 
return (toupper(substr(s, 1, 5)))
 
}
 
 
biCode("Homo sapiens")              # HOMSA
 
biCode("saccharomyces cerevisiae")  # SACCE
 
</source>
 
  
 
We can use loops and control structures inside functions. For example the following creates a vector containing ''n'' Fibonacci numbers.
 
We can use loops and control structures inside functions. For example the following creates a vector containing ''n'' Fibonacci numbers.
  
<source lang="rsplus">
+
<pre>
 
fibSeq <- function(n) {
 
fibSeq <- function(n) {
 
   if (n < 1) { return( 0 ) }
 
   if (n < 1) { return( 0 ) }
Line 225: Line 353:
 
   }
 
   }
 
}
 
}
</source>
+
</pre>
  
 
{{Vspace}}
 
{{Vspace}}
  
 
+
Here is another example to play with: a function that calculates how old you are. In days. This is neat - you can celebrate your 10,000 birth'''day''' - or so.
The function template looks like:
 
 
 
<source lang="rsplus">
 
<name> <- function (<parameters>) {
 
  <statements>
 
}
 
</source>
 
 
 
In this statement, the function is assigned to the ''name'' - any valid name in '''R'''. Once it is assigned, it the function can be invoked with <code>name()</code>. The parameter list (the values we write into the parentheses followin the function name) can be empty, or hold a list of variable names. If variable names are present, you need to enter the corresponding parameters when you execute the function. These assigned variables are available inside the function, and can be used for computations. This is called "passing the variable into the function".
 
 
 
You have encountered a function to choose YFO names. In this function, your Student ID was the parameter. Here is another example to play with: a function that calculates how old you are. In days. This is neat - you can celebrate your 10,000 birth'''day''' - or so.
 
  
 
{{task|1=
 
{{task|1=
Line 250: Line 367:
 
# A lifedays calculator function
 
# A lifedays calculator function
  
myLifeDays <- function(date = NULL) { # give "date" a default value so we can test whether it has been set
+
myLifeDays <- function(birthday) {
    if (is.null(date)) {
+
  if (missing(birthday)) {
        print ("Enter your birthday as a string in \"YYYY-MM-DD\" format.")
+
    print ("Enter your birthday as a string in \"YYYY-MM-DD\" format.")
        return()
+
    return()
    }
+
  }
    x <- strptime(date, "%Y-%m-%d") # convert string to time
+
  bd <- strptime(birthday, "%Y-%m-%d") # convert string to time
    y <- format(Sys.time(), "%Y-%m-%d") # convert "now" to time
+
  now <- format(Sys.time(), "%Y-%m-%d") # today's date as a "YYYY-MM-DD" string
    diff <- round(as.numeric(difftime(y, x, unit="days")))
+
  diff <- round(as.numeric(difftime(now, bd, units = "days")))
    print(paste("This date was ", diff, " days ago."))
+
  print(sprintf("This date was %d days ago.", diff))
 
}
 
}
</source>
+
</pre>
  
 
;Use the function (example):
 
;Use the function (example):
 
<source lang = "rsplus">
 
<source lang = "rsplus">
 
   myLifeDays("1932-09-25")  # Glenn Gould's birthday
 
   myLifeDays("1932-09-25")  # Glenn Gould's birthday
</source>
+
</pre>
 
}}
 
}}
  
Here is a good opportunity to play and practice programming: modify this function to accept a second argument. When a second argument is present (e.g. 10000) the function should print the calendar date on which the input date will be that number of days ago. Then you could use it to know when to celebrate your 10,000<sup>th</sup> lifeDay, or your 777<sup>th</sup> anniversary day or whatever.
+
Here is a good opportunity to practice programming: modify this function to accept a second argument. When a second argument is present (e.g. 10000) the function should print the calendar date on which the input date will be the required number of days ago. Then you could use it to know when to celebrate your 10,000<sup>th</sup> life-day, or your 888<sup>th</sup> anniversary or whatever.
 
 
{{Vspace}}
 
 
 
 
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 280: Line 393:
 
<!-- {{#pmid: 19957275}} -->
 
<!-- {{#pmid: 19957275}} -->
 
<!-- {{WWW|WWW_GMOD}} -->
 
<!-- {{WWW|WWW_GMOD}} -->
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
+
<div class="reference-box">[https://stackoverflow.com/questions/19226816/how-can-i-view-the-source-code-for-a-function "How can I view the source code for a function?"] (On Stack Overflow)</div>
 
 
{{Vspace}}
 
 
 
 
 
 
== Notes ==
 
== Notes ==
<!-- included from "../components/RPR-Functions.components.wtxt", section: "notes" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
 
 
<references />
 
<references />
  
 
{{Vspace}}
 
{{Vspace}}
  
 
</div>
 
<div id="ABC-unit-framework">
 
== Self-evaluation ==
 
<!-- included from "../components/RPR-Functions.components.wtxt", section: "self-evaluation" -->
 
<!--
 
=== Question 1===
 
 
Question ...
 
 
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
 
Answer ...
 
<div class="mw-collapsible-content">
 
Answer ...
 
 
</div>
 
  </div>
 
 
  {{Vspace}}
 
 
-->
 
 
{{Vspace}}
 
 
 
 
{{Vspace}}
 
 
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
 
 
----
 
 
{{Vspace}}
 
 
<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
 
 
----
 
 
{{Vspace}}
 
  
 
<div class="about">
 
<div class="about">
Line 341: Line 408:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-09-10
+
:2020-09-17
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:1.0
+
:1.1
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
 +
*1.1 Maintenance
 +
*1.0.1 Maintenance
 
*1.0 Completed to first live version
 
*1.0 Completed to first live version
 
*0.1 Material collected from previous tutorial
 
*0.1 Material collected from previous tutorial
 
</div>
 
</div>
[[Category:ABC-units]]
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 
  
 
{{CC-BY}}
 
{{CC-BY}}
  
 +
[[Category:ABC-units]]
 +
{{UNIT}}
 +
{{LIVE}}
 
</div>
 
</div>
 
<!-- [END] -->
 
<!-- [END] -->

Latest revision as of 09:28, 25 September 2020

R Functions

(Anatomy of a function: arguments, parameters and values; the concept of functional programming.)


 


Abstract:

In this unit we discuss the "anatomy" of R functions: arguments, parameters and values, and how R's treatment of functions supports "functional programming".


Objectives:
This unit will ...

  • ... introduce the basic pattern of R functions;
  • ... discuss arguments and parameters;
  • ... show how to retrieve the source code of a function;
  • ... practice writing your own functions.

Outcomes:
After working through this unit you ...

  • ... know how to pass parameters into functions and assign the returned values;
  • ... can read, analyze, and write your own functions.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:
This unit builds on material covered in the following prerequisite units:

  • RPR-Control_structures (Control structures of R)

Evaluation

Evaluation: NA

This unit is not evaluated for course marks.


Functions

R is considered an (impure) functional programming language, and thus the focus of R programs is on functions. The key advantage is that this encourages programming without side-effects, which makes it easier to write error-free code and to maintain it. When a function is called, the arguments passed to it are assigned to the function's parameters[1] for use inside the function, and a single result is returned[2]. The return value can either be assigned to a variable, or used directly as the argument of another function. This means function calls can be nested, and intermediate assignment is not required.

Functions are either built-in (i.e. available in the basic R installation), loaded via specific packages, or defined by you (see below). In general a function is invoked through its name, followed by zero or more arguments in parentheses, separated by commas. Whenever I refer to a function, I write the parentheses to identify it as such and not as a constant or other keyword, e.g. log(). Here are some examples for you to try and play with:

cos(pi) #"pi" is a predefined constant.
sin(pi) # Note the rounding error. This number is not really different from zero.
sin(30 * pi/180) # Trigonometric functions use radians as their argument - this conversion calculates sin(30 degrees)
exp(1) # "e" is not predefined, but easy to calculate.
log(exp(1)) # functions can be arguments to functions - nested functions are evaluated from the inside out.
log(10000) / log(10) # log() calculates natural logarithms; convert to any base by dividing by the log of the base. Here: log to base 10.
exp(complex(r=0, i=pi)) # Euler's identity
utils::example("wilcox.test")    # example() is a function from the utils package
                                 # and runs the code in the Examples section of
                                 # R-help pages

There are several ways to populate the argument list for a function and R makes a reasonable guess what you want to do. Arguments can either be used in their predefined order, or assigned via an argument name. Let's look at the complex() function to illustrate this. Consider the specification of a complex number in Euler's identity above. The function complex() can work with a number of arguments that are explained in the documentation (see: ?complex). Its signature includes length.out, real, imaginary, and some more.

complex(length.out = 0, real = numeric(), imaginary = numeric(), modulus = 1, argument = 0)

The length.out argument determines how many complex numbers the returned vector contains. If nothing else is specified, this will be a vector of complex zero(s). If two or three unnamed arguments are given, they are placed in the respective slots in order. However, since the arguments are named, we can also define which slot of the argument list they should populate.


Consider the following to illustrate this:

complex(1)    # parameter is in the first slot -> length.out
complex(4)
complex(1, 2) # imaginary part missing
complex(1, 2, 3) # one complex number with real and imaginary parts defined
complex(4, 2, 3) # four complex numbers
complex(real = 0, imaginary = pi) # defining values via named parameters
complex(imaginary = pi, real = 0) # same thing - if names are used, order is not important
complex(re = 0, im = pi) # names can be abbreviated ...
complex(r = 0, i = pi)   # ... to the shortest string that is unique among the named parameters,
                         # but this is _poor_ practice, and strongly advised against.
complex(i = pi, 1, 0) # Think: what have I done here? Why does this work?
exp(complex(i = pi, 1, 0)) # (The complex number above is the same as in Euler's identity.)
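
The same matching rules apply to functions you define yourself. The following is only a minimal sketch: describeBox() and its parameter names are made up for this illustration and are not part of R.

```r
# describeBox() is a hypothetical function, defined here only to show
# how R matches arguments to parameters.
describeBox <- function(width, height, label = "box") {
  return(sprintf("%s: %d x %d", label, width, height))
}

a  <- describeBox(2, 3)                    # positional: width = 2, height = 3
b  <- describeBox(height = 3, width = 2)   # named: order is not important
c1 <- describeBox(label = "crate", 2, 3)   # named arguments are matched first,
                                           # the rest fill the free slots in order
print(a)
print(b)
print(c1)
```

The last call follows the same logic as complex(i = pi, 1, 0): the named argument claims its slot, and the unnamed ones are assigned to the remaining parameters from left to right.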

Task:
A frequently used function is seq().

  • Read the help page about seq()
  • Use seq() to generate a sequence of integers from -5 to 3. Pass arguments in default order, don't use argument names.
  • Use seq() to generate a sequence of numbers from -2 to 2 in intervals of 1/3. This time, use argument names.
  • Use seq() to generate a sequence of 30 numbers between 1 and 100. Pass the arguments in the following order: length.out, to, from.


 

On missing parameters

If a parameter is missing, several things can happen. Let's illustrate with a little function that returns the golden-ratio partner of a number, either the smaller or the larger one.

goldenRatio <- function(x, smaller) {
  phi <- (1 + sqrt(5)) / 2
  if (smaller == TRUE) {
    return(x / phi)
  } else {
    return(x * phi)
  }
}
  • If there's no way to recover, executing the function will throw an error:
goldenRatio(1)
# Error in goldenRatio(1) : argument "smaller" is missing, with no default
  • If the function has a default parameter defined, it is used:
goldenRatio <- function(x, smaller = TRUE) {
  phi <- (1 + sqrt(5)) / 2
  if (smaller == TRUE) {
    return(x / phi)
  } else {
    return(x * phi)
  }
}

goldenRatio(1)
# [1] 0.618034
  • Alternatively, the function body can check whether a parameter is missing with the missing() function, and then react accordingly:
goldenRatio <- function(x, smaller) {
  if (missing(smaller)) {
    smaller <- TRUE
  }
  phi <- (1 + sqrt(5)) / 2
  if (smaller == TRUE) {
    return(x / phi)
  } else {
    return(x * phi)
  }
}

goldenRatio(1)
# [1] 0.618034

goldenRatio(1, smaller = FALSE)
# [1] 1.618034

Why is this useful if you could just define a default? Because the value can then be the result of a (complex) computation that is done in the function body and based on the other parameters. If you pass the argument into the function instead, you need to know the desired value ahead of time.
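
As a minimal sketch of this idea, consider a function in which the missing value is computed from another parameter. Note that centreWindow() and its parameters are hypothetical names, invented for this example only.

```r
# centreWindow() is a hypothetical function made up for this sketch.
# It returns "width" elements from the middle of the vector x. If "width"
# is not supplied, it is computed from the length of x in the function body.
centreWindow <- function(x, width) {
  if (missing(width)) {
    width <- ceiling(length(x) / 2)   # value depends on another parameter
  }
  start <- floor((length(x) - width) / 2) + 1
  return(x[start:(start + width - 1)])
}

centreWindow(1:10)              # width is computed: 5 elements are returned
centreWindow(1:10, width = 2)   # width is supplied by the caller
```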


 

Reading functions

R is open-source; this means that you can find and study the source code of all functions, if you know where to look. In many cases this is very easy. I cover the most frequent cases below; for a more detailed discussion, see here (StackOverflow).


 

Basic R

If the function is a normal R function, like the ones we have defined above, you can read the function code when you type its name without parentheses:

goldenRatio

# function(x, smaller) {
#  if (missing(smaller)) {
#    smaller <- TRUE
#  }
#  phi <- (1 + sqrt(5)) / 2
#  if (smaller == TRUE) {
#    return(x / phi)
#  } else {
#    return(x * phi)
#  }
#}

But that strictly only works for functions which have been written in basic R code.


S3 methods

You might also get a line saying UseMethod(<function name>). Then you are looking at a function from R's S3 object-oriented system. Such a function is called a "generic", because it dispatches to more specific code (a "method"), depending on the class of the argument it is given. Use methods() to see which specific methods are defined, and then use getAnywhere(<function.class>) to get the code.

seq

# function (...)
# UseMethod("seq")
# <bytecode: 0x103f3f9c8>
# <environment: namespace:base>

methods(seq)

# [1] seq.Date    seq.default seq.POSIXt
# see '?methods' for accessing help and source code

getAnywhere(seq.default)

# Lots of code ...
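
To see how dispatch works, it can help to build a tiny generic yourself. This is only a sketch: the generic describe(), the class "gene" and the name "geneA" are hypothetical and were invented for this example.

```r
# "describe", the class "gene" and "geneA" are made-up names for this sketch.
describe <- function(x, ...) {
  UseMethod("describe")          # the generic only dispatches
}

describe.default <- function(x, ...) {
  return("some R object")        # used if no more specific method exists
}

describe.gene <- function(x, ...) {
  return(paste("gene:", x$name)) # used for objects of class "gene"
}

myGene <- structure(list(name = "geneA"), class = "gene")

describe(myGene)    # dispatches to describe.gene()
describe(42)        # no method for "numeric": falls back to describe.default()
methods(describe)   # lists the methods that are defined for this generic
```

Typing describe without parentheses then shows only the UseMethod() line, just like seq above; the actual work is done in the methods.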


 


Primitives

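You might also get a line saying .Call(C_<function name>, <arguments>). Then you are looking at a function that hands its work to code that has been compiled in the C programming language, for efficiency. (Strictly speaking, true primitives such as sum() print as .Primitive("sum") instead, but their code is compiled C as well.)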

runif

# function (n, min = 0, max = 1)
# .Call(C_runif, n, min, max)
# <bytecode: 0x103a5b098>
# <environment: namespace:stats>

To read the C source code, just do a Google search for the function name in the repository where the R sources are kept:

This search finds runif.c (have a look).


 

Writing your own functions

R is a "functional programming language" and working with R will involve writing your own functions. This is easy and gives you access to flexible, powerful and reusable solutions. However, you have to understand the "anatomy" of an R function.

  • Functions are assigned to function names. They are treated like any other R object and you can have vectors of functions, and functions that return functions etc.
  • Data gets into the function via the function's parameters.
  • Data is returned from a function via the return() statement[3]. One and only one object is returned. However the object can be a list, and thus contain values of arbitrary complexity. This is called the "value" of the function. Well-written functions have no side-effects like changing global variables.
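
The first point, that functions are objects like any other, can be sketched as follows. The names makePower(), square, cube and myFunctions are hypothetical and serve only as an illustration; the functions are collected in a list here.

```r
# makePower() is a hypothetical "function factory": a function that
# returns another function.
makePower <- function(p) {
  f <- function(x) {
    return(x^p)
  }
  return(f)
}

square <- makePower(2)
cube   <- makePower(3)

# Functions can be stored in a list like any other R object ...
myFunctions <- list(sq = square, cu = cube, root = sqrt)

square(4)                   # 16
myFunctions$cu(2)           # 8
myFunctions[["root"]](9)    # 3

# ... and passed as arguments to other functions.
results <- sapply(myFunctions, function(f) { f(4) })
results
```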


# the function definition pattern:

<myName> <- function(<myArguments>) {
  # <documentation!>
  result <- <do something with the parameters>
  return(result)
}

In this pattern, the function is assigned to a name, which can be any valid name in R. Once it is assigned, the function can be invoked with myName(). The parameter list (the names we write into the parentheses of the function definition) can be empty, or hold a list of variable names. If variable names are present, you need to supply the corresponding arguments when you execute the function. These variables are then available inside the function, and can be used for computations. This is called "passing variables into the function".
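
Filled in with concrete code, the pattern could look like the following sketch. The function toCelsius() and its parameter name are made up for this illustration.

```r
# toCelsius() is a hypothetical example that follows the definition pattern.
toCelsius <- function(fahrenheit) {
  # Convert a temperature from degrees Fahrenheit to degrees Celsius.
  result <- (fahrenheit - 32) * 5 / 9
  return(result)
}

toCelsius(212)                # use the return value directly
boiling <- toCelsius(212)     # or assign it to a variable
round(toCelsius(c(32, 98.6)), 1)   # the argument can also be a vector
```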


Task:

Quick Exercise

This exercise is similar to the while loop exercise. The only difference is to put the code into a function. Write a function countDown() so that you can start the countdown call from any number. For example calling countDown(5) should give:

[1]  "5"          "4"          "3"          "2"          "1"          "0"          "Lift Off!"

Solution ... No peeking!

countDown <- function(n) {
  start <- n
  countdown <- start
  txt <- as.character(start)

  while (countdown > 0) {
    countdown <- countdown - 1
    txt <- c(txt, countdown)
  }
  txt <- c(txt, "Lift Off!")
  return(txt)
}

# Try it ...
countDown(7)



 

The scope of variables in a function is local: this means all variables that are created within a function are lost upon return, and global variables are not overwritten by an assignment within a function. However, variables that are defined outside the function are also available inside.
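
A minimal sketch can make these scope rules concrete. The names scopeDemo() and doubled are hypothetical and used only for this illustration.

```r
x <- 10                     # defined outside the function

# scopeDemo() is a made-up function for this sketch.
scopeDemo <- function() {
  doubled <- x * 2          # x is not defined locally, so the outside x is used
  x <- 0                    # this creates a new, local x ...
  return(doubled)
}

scopeDemo()                 # 20
x                           # ... the x outside the function is still 10
exists("doubled")           # FALSE: the local variable is lost upon return
```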

We can use loops and control structures inside functions. For example the following creates a vector containing n Fibonacci numbers.

fibSeq <- function(n) {
   if (n < 1) { return( 0 ) }
   else if (n == 1) { return( 1 ) }
   else if (n == 2) { return( c(1, 1) ) }
   else {
      v <- numeric(n)
      v[1] <- 1
      v[2] <- 1
      for ( i in 3:n ) {
         v[i] <- v[i-2] + v[i-1]
      }
      return( v )
   }
}


 

Here is another example to play with: a function that calculates how old you are, in days. This is neat: you can celebrate your 10,000th life-day, or so.

Task:
Copy, explore and run ...

Define the function ...

# A lifedays calculator function

myLifeDays <- function(birthday) {

  if (missing(birthday)) {
    print("Enter your birthday as a string in \"YYYY-MM-DD\" format.")
    return()
  }
  bd <- strptime(birthday, "%Y-%m-%d")    # convert string to time
  now <- format(Sys.time(), "%Y-%m-%d")   # today's date as a string
  diff <- round(as.numeric(difftime(now, bd, units = "days")))
  print(sprintf("This date was %d days ago.", diff))

}

Use the function (example)


  myLifeDays("1932-09-25")  # Glenn Gould's birthday

Here is a good opportunity to practice programming: modify this function to accept a second argument. When a second argument is present (e.g. 10000) the function should print the calendar date on which the input date will be the required number of days ago. Then you could use it to know when to celebrate your 10,000th life-day, or your 888th anniversary or whatever.


 


Further reading, links and resources

Notes

  1. The terms parameter and argument have similar but distinct meanings. A parameter is an item that appears in the function definition, an argument is the actual value that is passed into the function.
  2. However a function may have side-effects, such as writing something to console, plotting graphics, saving data to a file, or changing the value of variables outside the function scope. But changing values outside the scope is poor practice, and should always be avoided.
  3. Actually the return() statement is optional; if it is missing, the result of the last expression is returned. You will find this frequently in other people's code, so it is something to be aware of. However, you'll surely understand that it is really poor practice to omit return(): it makes the code harder to read and can give rise to misunderstandings. Never use implicit behaviour where you can be explicit instead.


 


About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2020-09-17

Version:

1.1

Version history:

  • 1.1 Maintenance
  • 1.0.1 Maintenance
  • 1.0 Completed to first live version
  • 0.1 Material collected from previous tutorial

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.