Expected Preparations:
Keywords: Anatomy of a function: arguments; parameters and values; the concept of functional programming.
Objectives:
This unit will …
Outcomes:
After working through this unit you …
Deliverables:
Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.
Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation: NA: This unit is not evaluated for course marks.
In this unit we discuss the “anatomy” of R functions: arguments, parameters and values, and how R’s treatment of functions supports “functional programming”.
R is considered an (impure) functional programming language (W), and thus the focus of R programs is on functions. The key advantage is that this encourages programming without side-effects, which makes it easier to write error-free code and to maintain it. When a function is called, the arguments you pass are bound to the function’s parameters[1] for use inside the function, and a single result is returned[2]. The return value can either be assigned to a variable, or used directly as the argument of another function. This means functions can be nested, and intermediate assignment is not required.
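A small sketch of the last point, using only base R: the same result can be reached through intermediate variables, or by nesting the calls so that the return value of the inner function becomes the argument of the outer one.

```r
# 1. Assign each intermediate result to a variable ...
e <- exp(1)
viaVariables <- log(e)

# 2. ... or nest the calls: the inner call is evaluated first,
#    and its return value is passed directly to the outer call.
nested <- log(exp(1))

identical(viaVariables, nested)   # TRUE: both routes do the same computation
```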
Functions are either built-in (i.e. available in the basic R installation), loaded via specific packages, or defined by you (see below). In general, a function is invoked through its name, followed by zero or more arguments in parentheses, separated by commas. Whenever I refer to a function, I write the parentheses to identify it as such, and not as a constant or other keyword, e.g. log(). Here are some examples for you to try and play with:
cos(pi) # "pi" is a predefined constant.
sin(pi) # Note the rounding error. This number is not really different from zero.
sin(30 * pi/180) # Trigonometric functions use radians as their argument - this conversion calculates sin(30 degrees).
exp(1) # "e" is not predefined, but easy to calculate.
log(exp(1)) # Functions can be arguments to functions - nested functions are evaluated from the inside out.
log(10000) / log(10) # log() calculates natural logarithms; convert to any base by dividing by the log of the base. Here: log to base 10.
exp(complex(real = 0, imaginary = pi)) # Euler's identity: e^(i*pi) = -1 (up to rounding error)
utils::example("wilcox.test") # example() is a function from the utils package;
                              # it runs the code in the Examples section
                              # of an R help page.
There are several ways to populate the argument list for a function, and R makes a reasonable guess at what you want to do. Arguments can either be passed in their predefined order, or assigned via an argument name. Let’s look at the complex() function to illustrate this. Consider the specification of a complex number in Euler’s identity above. The function complex() can work with a number of arguments that are explained in the documentation (see: ?complex). Its signature includes length.out, real, imaginary, and some more:
complex(length.out = 0, real = numeric(), imaginary = numeric(), modulus = 1, argument = 0)
The length.out argument sets the length of the vector of complex numbers that is returned. If nothing else is specified, this will be a vector of complex zero(s). If there are two or three arguments, they will be placed in the respective slots: the second becomes the real part, the third the imaginary part. However, since the arguments are named, we can also define which slot of the argument list they should populate. Consider the following to illustrate this:
complex(1) # the argument is in the first slot -> length.out
complex(4)
complex(1, 2) # imaginary part missing
complex(1, 2, 3) # one complex number with real and imaginary parts defined
complex(4, 2, 3) # four complex numbers
complex(real = 0, imaginary = pi) # defining values via named arguments
complex(imaginary = pi, real = 0) # same thing - if names are used, order is not important
complex(re = 0, im = pi) # names can be abbreviated ...
complex(r = 0, i = pi) # ... to the shortest string that is unique among the parameter names,
                       # but this is _poor_ practice, and I strongly advise against it.
complex(i = pi, 1, 0) # Think: what have I done here? Why does this work?
exp(complex(i = pi, 1, 0)) # (The complex number above is the same as in Euler's identity.)
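To look up a function’s parameter names without opening its help page, base R provides args() and formals(). The second part of the sketch uses a hypothetical function showArgs(), defined only for illustration, to show the order in which R matches arguments: exact names first, then partial (abbreviated) names, and finally the remaining unnamed arguments by position.

```r
args(complex)              # print the signature of complex()
names(formals(complex))    # just the parameter names, as a character vector

# Hypothetical helper, only for illustrating argument matching
showArgs <- function(first, second, third) {
  return(c(first = first, second = second, third = third))
}

showArgs(1, 2, 3)          # all arguments matched by position
showArgs(third = 3, 1, 2)  # "third" matched by name; 1 and 2 fill the remaining slots in order
showArgs(th = 3, 1, 2)     # same, via a partial name (poor practice!)
# All three calls return c(first = 1, second = 2, third = 3)
```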
Task…
A frequently used function is seq().
- … seq()
- Use seq() to generate a sequence of integers from -5 to 3. Pass arguments in default order, don’t use argument names.
- Use seq() to generate a sequence of numbers from -2 to 2 in intervals of 1/3. This time, use argument names.
- Use seq() to generate a sequence of 30 numbers between 1 and 100. Pass the arguments in the following order: length.out, to, from.
If an argument is missing, several things can happen. Let’s illustrate with a little function that returns the golden-ratio partner of a number, either the smaller or the larger one.
goldenRatio <- function(x, smaller) {
phi <- (1 + sqrt(5)) / 2
if (smaller == TRUE) {
return(x / phi)
} else {
return(x * phi)
}
}
goldenRatio(1)
# Error in goldenRatio(1) : argument "smaller" is missing, with no default
One way to avoid this error is to give the parameter a default value in the function definition:
goldenRatio <- function(x, smaller = TRUE) {
phi <- (1 + sqrt(5)) / 2
if (smaller == TRUE) {
return(x / phi)
} else {
return(x * phi)
}
}
goldenRatio(1)
# [1] 0.618034
Alternatively, you can test whether an argument was supplied with the missing() function, and then react accordingly:
goldenRatio <- function(x, smaller) {
if (missing(smaller)) {
smaller <- TRUE
}
phi <- (1 + sqrt(5)) / 2
if (smaller == TRUE) {
return(x / phi)
} else {
return(x * phi)
}
}
goldenRatio(1)
# [1] 0.618034
goldenRatio(1, smaller = FALSE)
# [1] 1.618034
Why is this useful, if you could just define a default? Because the value can then be the result of a (possibly complex) computation in the function body, based on other parameters. If you pass the argument into the function instead, you need to know the desired value ahead of time.
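A minimal sketch of that idea, using a hypothetical function chunkSize() that is not part of R: if the caller does not say how many chunks to split the data into, the function computes a value from the length of the data inside the function body. The square-root rule is an arbitrary assumption, chosen only for illustration.

```r
# Hypothetical example: how many elements go into each chunk
# if x is split into nChunks roughly equal parts?
chunkSize <- function(x, nChunks) {
  if (missing(nChunks)) {
    # Not supplied: derive a value from another parameter,
    # here roughly sqrt(n) chunks (an arbitrary rule for illustration)
    nChunks <- max(1, round(sqrt(length(x))))
  }
  return(ceiling(length(x) / nChunks))
}

chunkSize(1:100)      # nChunks computed as 10, so 10 elements per chunk
chunkSize(1:100, 4)   # nChunks supplied, so 25 elements per chunk
```

R default values can themselves be expressions that refer to other parameters, so a simple case like this one could also be written with a default; missing() helps when the replacement takes several steps, or when the function needs to know whether the caller supplied a value at all.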
R is open-source; this means that you can find and study the source code of all functions, if you know where to look. For many cases this is very easy. I cover the most frequent cases below; for a more detailed discussion, see “How can I view the source code for a function?” on StackOverflow.
If the function is a normal R function, like the ones we have defined above, you can read the function code when you type its name without parentheses:
goldenRatio
# function(x, smaller) {
# if (missing(smaller)) {
# smaller <- TRUE
# }
# phi <- (1 + sqrt(5)) / 2
# if (smaller == TRUE) {
# return(x / phi)
# } else {
# return(x * phi)
# }
# }
But that only works for functions that have been written in R code.
You might also get a line saying UseMethod(<function name>). Then you are looking at a “generic” function from R’s S3 object-oriented system: it dispatches to more specific code (a “method”), depending on the class of the argument it is given. Use methods() to see which specific methods are defined, and then use getAnywhere(<function.class>) to get the code.
seq
# function (...)
# UseMethod("seq")
# <bytecode: 0x103f3f9c8>
# <environment: namespace:base>
methods(seq)
# [1] seq.Date seq.default seq.POSIXt
# see '?methods' for accessing help and source code
getAnywhere(seq.default)
# Lots of code ...
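To see what UseMethod() does, here is a sketch with a hypothetical generic describe(), defined only for illustration. The generic itself contains nothing but the dispatch call; R then picks the method whose name is the generic’s name plus the class of the argument, and falls back to the .default method for any other class.

```r
# Hypothetical generic, defined only to illustrate S3 dispatch
describe <- function(x) {
  UseMethod("describe")   # dispatch on the class of x
}

# Method used for any class without a more specific method
describe.default <- function(x) {
  return("some object")
}

# Method used when x has class "Date"
describe.Date <- function(x) {
  return(paste("the date", format(x)))
}

describe(42)                      # "some object"
describe(as.Date("1932-09-25"))   # "the date 1932-09-25"
methods(describe)                 # lists the methods defined above
```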
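You might also get a line saying .Call(C_<function name>, <arguments>). Then you are looking at an R function that hands its work over to compiled code, written in the C programming language for efficiency. (True “primitives”, such as sum(), are a related case: their printout ends in .Primitive("sum") instead.)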
runif
# function (n, min = 0, max = 1)
# .Call(C_runif, n, min, max)
# <bytecode: 0x103a5b098>
# <environment: namespace:stats>
To read the C source code, do a Google search for the function name in the repository where the R sources are kept, for example:
site:https://svn.r-project.org/R/trunk/src runif
and have a look at runif.c among the results.
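R itself can tell you which of these cases you are dealing with. This sketch uses only base R functions: is.primitive() distinguishes true primitives from ordinary R functions, and body() shows the R-level code of a function, which for runif() is just the call into compiled code.

```r
is.primitive(sum)            # TRUE: a true primitive
is.primitive(stats::runif)   # FALSE: an R function whose body calls compiled code
body(stats::runif)           # the R-level body, i.e. the .Call(C_runif, ...) expression
```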
R is a “functional programming language”, and working with R will involve writing your own functions. This is easy and gives you access to flexible, powerful and reusable solutions. However, you have to understand the “anatomy” of an R function.
A function returns its result with the return() statement[3]. One and only one object is returned. However, the object can be a list, and thus contain values of arbitrary complexity. This is called the “value” of the function. Well-written functions have no side-effects like changing global variables.
# the function definition pattern:
<myName> <- function(<myArguments>) {
# <documentation!>
result <- <do something with the parameters>
return(result)
}
In this pattern, the function is assigned to a name, which can be any valid name in R. Once it is assigned, the function can be invoked with myName(). The parameter list (the names we write into the parentheses following the keyword function) can be empty, or hold a list of variable names. If variable names are present, you need to supply the corresponding arguments when you call the function (unless a parameter has a default value). These variables are then available inside the function, and can be used for computations. This is called “passing variables into the function”.
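Here is one concrete instance of the pattern, as a sketch; celsiusToFahrenheit() is a hypothetical example function, not something that ships with R. It has one parameter, documents itself in comments, computes a result from the parameter and returns it explicitly.

```r
celsiusToFahrenheit <- function(celsius) {
  # Convert temperatures from degrees Celsius to degrees Fahrenheit.
  # Parameter: celsius - a numeric vector of temperatures in degrees C
  # Value:     a numeric vector of temperatures in degrees F
  result <- celsius * 9 / 5 + 32
  return(result)
}

celsiusToFahrenheit(100)         # 212
celsiusToFahrenheit(c(-40, 0))   # -40 32
```

Because the arithmetic is vectorized, the same function works for a single number or a whole vector of temperatures.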
Task…
This exercise is similar to the while loop exercise. The only difference is to put the code into a function. Write a function countDown() so that you can start the countdown from any number. For example, calling countDown(5) should give:
[1] "5" "4" "3" "2" "1" "0" "Lift Off!"
The scope of variables inside a function is local: all variables created within a function are lost when it returns, and assigning to a name with <- inside a function does not overwrite a global variable of the same name. However, variables that are defined outside the function are also available inside it.
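A short sketch of these rules; scopeDemo() is a hypothetical function written only for this demonstration. It assigns a local x that shadows the global one, creates a local y that disappears after the call, and reads z, which only exists outside the function.

```r
x <- 10                # a global variable
z <- 5                 # another global variable

scopeDemo <- function() {
  x <- 99              # creates a local x; the global x is not touched
  y <- 1               # local variable, discarded when the function returns
  return(x + y + z)    # z is not defined locally, so R finds the global z
}

scopeDemo()            # 105 (99 + 1 + 5)
x                      # still 10
```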
We can use loops and control structures inside functions. For example, the following function returns a vector containing the first n Fibonacci numbers.
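fibSeq <- function(n) {
  # Return a vector containing the first n Fibonacci numbers.
  if (n < 1) {
    return(numeric(0))   # no numbers requested: empty vector
  } else if (n == 1) {
    return(1)
  } else if (n == 2) {
    return(c(1, 1))
  } else {
    v <- numeric(n)
    v[1] <- 1
    v[2] <- 1
    for (i in 3:n) {
      v[i] <- v[i - 2] + v[i - 1]   # each element is the sum of the two before it
    }
    return(v)
  }
}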
Here is another example to play with: a function that calculates how old you are, in days. This is neat: you can celebrate your 10,000th day of life, or so.
Task…
Copy, explore and run …
Define the function …
# A lifedays calculator function
myLifeDays <- function(birthday) {
  if (missing(birthday)) {
    print("Enter your birthday as a string in \"YYYY-MM-DD\" format.")
    return(invisible(NULL))                      # nothing to compute
  }
  bd <- strptime(birthday, "%Y-%m-%d")           # convert the string to a date-time
  now <- format(Sys.time(), "%Y-%m-%d")          # today's date as a "YYYY-MM-DD" string
  diff <- round(as.numeric(difftime(now, bd, units = "days")))  # elapsed whole days
  print(sprintf("This date was %d days ago.", diff))
}
Use the function (example):
myLifeDays("1932-09-25") # Glenn Gould's birthday
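Date arithmetic can also be done with R’s Date class, which has no time-of-day or time-zone component. The sketch below is a hypothetical alternative, lifeDays(), not the function above; its today parameter is an assumption added so that the result can be checked against a fixed date.

```r
# Hypothetical alternative to myLifeDays(), using the Date class
lifeDays <- function(birthday, today = Sys.Date()) {
  # Subtracting two Dates gives a difference in days; convert it to a plain number
  return(as.numeric(as.Date(today) - as.Date(birthday)))
}

lifeDays("2000-01-01", today = "2000-01-31")   # 30
lifeDays("1932-09-25")                          # days since Glenn Gould's birthday
```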
Here is a good opportunity to practice programming: modify this function to accept a second argument. When a second argument is present (e.g. 10000), the function should print the calendar date on which the input date will be the required number of days ago. Then you could use it to know when to celebrate your 10,000th life-day, or your 888th anniversary, or whatever.
“How can I view the source code for a function?” (On Stack Overflow)
If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.
Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.
[END]
[1] The terms parameter and argument have similar but distinct meanings. A parameter is an item that appears in the function definition; an argument is the actual value that is passed into the function.
[2] However, a function may have side-effects, such as writing something to the console, plotting graphics, saving data to a file, or changing the value of variables outside the function scope. But changing values outside the scope is poor practice, and should always be avoided.
[3] Actually, the return() statement is optional; if it is missing, the result of the last evaluated expression is returned. You will find this frequently in other people’s code, something to be aware of. However, you’ll surely understand that it is really poor practice to omit return(): it makes the code harder to read and can give rise to misunderstandings. Never use implicit behaviour where you can be explicit instead.