Difference between revisions of "RPR-OBJECTS-Vectors"

<div id="BIO">
<div id="ABC">
  <div class="b1">
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;">

R scalars and vectors

  </div>
<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; ">

(Types of R objects: scalars, vectors and matrices)
  {{Vspace}}
</div>
 
 
<div class="keywords">
 
<b>Keywords:</b>&nbsp;
 
Types of R objects: scalars, vectors and matrices
 
 
</div>
 
</div>
  
{{Smallvspace}}
 
 
 
 
__TOC__
 
 
 
{{Vspace}}
 
  
  
{{LIVE}}
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;">

<div style="font-size:118%;">
{{Vspace}}
<b>Abstract:</b><br />
 
 
 
 
</div>
 
<div id="ABC-unit-framework">
 
== Abstract ==
 
 
<section begin=abstract />
<!-- included from "../components/RPR-Objects-Vectors.components.wtxt", section: "abstract" -->

Introduction to vector objects in R: what are they, how are they created, how can they be subset?

<section end=abstract />

</div>
{{Vspace}}
<!-- ============================ -->

<hr>

<table>
<tr>
<td style="padding:10px;">
<b>Objectives:</b><br />
 
 
 
This unit will ...

* ... introduce scalars, vectors and matrices;
* ... demonstrate vectorized operations;
* ... teach various ways of subsetting;
 
</td>
<td style="padding:10px;">
<b>Outcomes:</b><br />

After working through this unit you ...

* ... can create vectors by assignment from sequences or by using the <code>c()</code> function;
 
* ... can subset elements, ranges, and slices from vectors and matrices;
* ... can combine objects with <code>c()</code>, <code>rbind()</code>, or <code>cbind()</code>.
</td>
</tr>
</table>
<!-- ============================  -->
<hr>
<b>Deliverables:</b><br />
<section begin=deliverables />
<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li>
<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li>
<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li>
<section end=deliverables />
<!-- ============================  -->
<hr>
<section begin=prerequisites />
<b>Prerequisites:</b><br />
This unit builds on material covered in the following prerequisite units:<br />
*[[RPR-Syntax_basics|RPR-Syntax_basics (Basics of R syntax)]]
<section end=prerequisites />
<!-- ============================  -->
</div>
  
{{Smallvspace}}
  
  
{{Vspace}}

</div>
=== Evaluation ===
<div id="BIO">
<b>Evaluation: NA</b><br />
<div style="margin-left: 2rem;">This unit is not evaluated for course marks.</div>
 
== Contents ==
<!-- included from "../components/RPR-Objects-Vectors.components.wtxt", section: "contents" -->

'''R''' objects can be composed of different kinds of data according to the type and number of "atomic" values they contain:
 
* Vectors are ordered sequences of scalars; they must all have the same "data type" (e.g. numeric, logical, character ...);
* Matrices are vectors for which two "dimensions" (rows and columns) have been defined; arrays generalize this to any number of dimensions;
* [[RPR-Objects-Data_frames|"Data frames"]] are spreadsheet-like objects: their columns are like vectors and all columns must have the same length, but within one data frame, columns can have different data types. They are the most commonly used type of object to hold data;
* [[RPR-Objects-Lists|Lists]] are the most general collection of data items; lists can contain items of any type and kind, including vectors, matrices, functions, data frames, and other lists.

{{Vspace}}
  
  
''Scalars'' are single values, the "atomic" parts of more complex data types. Under the hood, scalars are actually vectors of length 1 (more on vectors in the next section). To create a scalar object, simply assign some value to its name.

<pre>
x <- pi      # define a scalar by assignment
x            # its value is ...
x[1]         # it is actually a vector, and its first element is ...
x[2]         # a second element does not exist: NA means "Not Available"
</pre>
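The claim that a scalar is "really" a vector of length 1 can be checked directly with base '''R''' functions. A minimal sketch:

```r
# A scalar is simply a vector of length 1
x <- pi
length(x)            # 1
is.vector(x)         # TRUE: there is no separate "scalar" type
identical(x, c(pi))  # TRUE: wrapping one value in c() changes nothing
x[2]                 # NA: a second element is "Not Available"
```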
  
  
 
Here are some remarks on the types of scalars '''R''' uses, and on ''coercion'' between types, i.e. casting one data type into another. The following scalar types are supported:

* Boolean constants: <code>TRUE</code> and <code>FALSE</code>. This type has the "mode" ''logical'';
* Integers and floats (floating point numbers, called "double" in '''R''') have the mode ''numeric''; complex numbers have the mode ''complex'';
* Strings. These have the mode ''character''.

Other modes exist, such as <code>list</code>, <code>function</code> and <code>expression</code>, all of which can be combined into complex objects.

The function <code>mode()</code> returns the mode of an object and <code>typeof()</code> returns its type; <code>class()</code> tells you which class it belongs to.
  
<pre>
typeof(TRUE)
mode(print)
</pre>
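To make the difference between these three functions concrete, here is a small sketch that applies all of them to one value of each scalar type (the names <code>values</code> and <code>info</code> are only illustrative). Note in particular that complex numbers have their own mode, ''complex'':

```r
# One value of each scalar type
values <- list(TRUE, 1L, 2.5, 1i, "a")

# Tabulate how mode(), typeof() and class() describe each value
info <- data.frame(
  value  = sapply(values, deparse),
  mode   = sapply(values, mode),
  typeof = sapply(values, typeof),
  class  = sapply(values, class),
  stringsAsFactors = FALSE
)
info
```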
  
 
I have combined these information functions into a single function, <code>objectInfo()</code>, which is loaded and defined when you execute the <code>init()</code> function of the <code>BasicSetup</code> project, so you can explore objects in more detail. We can use {{c|objectInfo()}} to explore how R objects are made up, by passing various expressions as arguments to the function. You may not yet recognize many of these ... bear with it, though:
  
  
<pre>
objectInfo( NaN )                # "Not a Number" is numeric
objectInfo( NA )                 # "Not Available" - i.e. missing value is
                                 # logical by default

# NULL
objectInfo( as.factor("M") )     # factor
objectInfo( Sys.time() )         # time
objectInfo( letters )            # inbuilt character vector
objectInfo( LETTERS )            # same
objectInfo( 1:4 )                # numeric vector
objectInfo( matrix(1:4, nrow=2)) # numeric matrix
                       roman = c("I", "II", "III"),
                       stringsAsFactors = FALSE))
objectInfo( list(arabic = 1:7,
                 roman = c("I", "II", "III"),
                 chinese = c("一", "二", "三", "四")))  # list

# Expressions:
objectInfo( 3 > 5 ) # Note: comparing two values with a relational operator
                    # (== != > < >= <=), or combining logical values with a
                    # logical operator (! | || & &&), gives a logical
                    # expression that evaluates to TRUE or FALSE (or NA).
objectInfo( 3 < 5 )
objectInfo( 1:6 > 4 ) # these are "vectorized" operators

objectInfo( a ~ b )              # a formula
objectInfo( objectInfo )         # the function itself
</pre>
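<code>objectInfo()</code> is only available after running the course's <code>init()</code>. If you want to follow along without it, a rough stand-in can be assembled from the information functions above. The sketch below uses a hypothetical name, <code>myObjectInfo()</code>; it is not the course's implementation and reports only mode, type, class, length and attribute names:

```r
# Hypothetical stand-in for objectInfo(): NOT the BasicSetup implementation.
myObjectInfo <- function(x) {
  info <- list(mode       = mode(x),
               typeof     = typeof(x),
               class      = class(x),
               length     = length(x),
               attributes = names(attributes(x)))  # NULL if there are none
  str(info)        # print a compact summary
  invisible(info)  # return the details for further use
}

myObjectInfo(NA)
myObjectInfo(matrix(1:4, nrow = 2))
myObjectInfo(a ~ b)
```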
  
 
{{Vspace}}

<pre>
a <- 7
b <- 6:7
identical(b[2], c) # TRUE
</pre>
  
 
{{Vspace}}

===Vectors===

Since we (almost) never do statistics on scalars, '''R''' obviously needs ways to handle collections of data items. In its simplest form such a collection is a '''vector''': an ordered list of items '''of the same type'''. Vectors are created from scratch with the <code>c()</code> function, which '''c'''oncatenates individual items into a vector, or with various sequencing functions. Vectors have properties, such as length; individual items in vectors can be combined in useful ways. It's worth repeating: all elements of a vector must be of the same type. If they are not, they are silently(!) coerced to the most general type (which is often {{c|character}}). (The actual hierarchy for coercion is raw < logical < integer < double < complex < character < list.)
  
<pre>
# The c() function concatenates elements into a vector
# ... returns the result of the assignment. I will use this idiom often.

( myVec <- c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89) )  # the first Fibonacci numbers

# Coercion:
# all elements of vectors must be of the same type
c(1, 2.0, "3", TRUE)  # trying to get a vector with mixed types ...
[1] "1"    "2"    "3"    "TRUE"

# ... shows that all elements are silently being coerced
# to character. The emphasis is on _silently_. This might
# be unexpected, for example if you are reading numeric data
# from a text file into a vector but someone has entered a " " for a missing
# value ... then everything is characterified. Nasty.
</pre>
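The coercion hierarchy can be verified one step at a time: combine two values of adjacent types with <code>c()</code> and ask <code>typeof()</code> which type the result has. A short sketch; the vector <code>x</code> in the second part is made-up example data mimicking the "missing value entered as a blank" scenario:

```r
# Each call mixes two adjacent types of the hierarchy; typeof() shows which wins
typeof(c(TRUE, 2L))      # logical + integer   -> "integer"
typeof(c(2L, 1.5))       # integer + double    -> "double"
typeof(c(1.5, 2i))       # double  + complex   -> "complex"
typeof(c(2i, "a"))       # complex + character -> "character"
typeof(c("a", list(1)))  # character + list    -> "list"

# Coercion is silent: one blank entry turns the numbers into text ...
x <- c("1.5", "2", " ")
typeof(x)                         # "character"
# ... and converting back gives NA for entries that are not numbers
suppressWarnings(as.numeric(x))   # 1.5 2.0 NA
```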
  
  

===Subsetting by index===
<pre>
# Extracting by index ...
myVec[1]        # In R, the first element of a vector has index 1! Not 0.
head(myVec, 1)  # same thing

myVec[length(myVec)] # length() returns the number of elements, i.e. the index of the last element.
tail(myVec, 1)       # same thing

# With a vector of indices ...
1:4 # This is the range (colon) operator
myVec[1:4] # using the range operator (it generates a sequence and returns it in a vector)
myVec[4:1] # same thing, backwards
seq(from=2, to=6, by=2) # The seq() function is a flexible, generic way to generate sequences
seq(2, 6, 2) # Same thing: arguments in default order
myVec[seq(2, 6, 2)]

# since a scalar is a vector of length 1, does this work?
                   # valid indices of the target vector.
                   # The index vector can be of any length.
myVec[a] # In this case, four elements are retrieved from myVec[]
</pre>
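A few edge cases of index vectors are worth trying out, since '''R''' does not report them as errors. A sketch on a small made-up vector <code>v</code>:

```r
v <- c(10, 20, 30)

v[c(1, 1, 3)]  # indices may repeat: 10 10 30
v[5]           # an index past the end gives NA
v[0]           # index 0 selects nothing: numeric(0)
v[2.9]         # non-integer indices are truncated toward zero: 20
```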
  
 
{{Vspace}}

====Excluding items through negative indexes====
<pre>
# Negative indices omit elements ...
# ...using an index vector with negative indices
( a <- -(1:4) ) # Note that this is NOT the same as -1:4
myVec[a] # Here, the first four elements are omitted from myVec[]
myVec[-((length(myVec)-3):length(myVec))] # Here, the last four elements are omitted
</pre>
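The warning that <code>-(1:4)</code> is not the same as <code>-1:4</code> comes down to operator precedence: unary minus binds more tightly than <code>:</code>. A sketch on a made-up vector <code>v</code>:

```r
v <- c(10, 20, 30, 40, 50, 60)

-1:4       # parsed as (-1):4, i.e. -1 0 1 2 3 4
-(1:4)     # -1 -2 -3 -4
v[-(1:4)]  # 50 60

# Mixing positive and negative indices is not allowed:
res <- tryCatch(v[-1:4], error = function(e) conditionMessage(e))
res        # the error message, captured as a string
```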
  
 
{{Vspace}}

===Subsetting by boolean vectors===
<pre>
myVec > 4         # A logical expression operating on the target vector
                  # returns a vector of logical elements. It has the
                  # same length as the target vector.
myVec[myVec > 4]  # We can use this logical vector to extract only
                  # elements for which the logical expression evaluates as TRUE.
                  # This is also called "filtering".

# Note: the logical vector is aligned with the elements of the original
# vector. You can't retrieve elements more than once, as you could
</pre>
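Filtering with a logical vector has one pitfall: a comparison with a missing value gives <code>NA</code>, and an <code>NA</code> in a logical index returns an <code>NA</code> element. A sketch with a made-up vector <code>v</code> that contains a missing value, comparing direct filtering with <code>which()</code>:

```r
v <- c(3, NA, 7, 1, 9)

v > 4                 # FALSE NA TRUE FALSE TRUE
v[v > 4]              # NA 7 9: the NA in the index produces an NA element
v[which(v > 4)]       # 7 9: which() keeps only positions that are TRUE
v[!is.na(v) & v > 4]  # 7 9: an explicit alternative
```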
  
 
{{Vspace}}

 
==="[" is an operator===
<pre>
# Some more thoughts about "["
# For example, the summary() function returns some basic statistics of a vector:
summary(myVec)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    2.50    8.00   21.09   27.50   89.00

# This is a vector of six numbers:
length(summary(myVec))

# We can extract e.g. the median like so:
summary(myVec)[3]

# ... or the boundaries of the interquartile range:
summary(myVec)[c(2, 5)]

# Note that the elements that summary() returns are "named".

# The names() function can retrieve (or set) names:
names(summary(myVec))

# ... which brings us to yet another way to extract elements from vectors:
# subsetting by name:
summary(myVec)["Median"]
</pre>

{{Vspace}}

===Subsetting by name===
<pre>
# If the vector has named elements, vectors of names can be used exactly like
# index vectors:
summary(myVec)
summary(myVec)["Median"]
summary(myVec)[c("Max", "Min")]  # Oooops - I mistyped. But you can fix the expression, right?
</pre>
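Both sections above rely on two facts: <code>[</code> is an ordinary function that can be called by name with backticks, and names can be attached to any vector, not only to the result of <code>summary()</code>. A sketch with a made-up named vector <code>v</code>:

```r
v <- c(a = 1, b = 4, c = 9)     # c() can name elements as it creates them

`[`(v, 2)                       # same as v[2]: "[" is a function
v["b"]                          # subsetting by name
v[c("c", "a")]                  # a vector of names works like an index vector
names(v)                        # "a" "b" "c"
names(v) <- toupper(names(v))   # names() can also set names
v["B"]
```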
  
 
{{Vspace}}

===Extending vectors===
<pre>
# Vectors are not immutable. They can grow and shrink as required.
# Example: extending the Fibonacci series for three steps.
# Think: How does this work? What numbers are we adding here and why does the result end up in the vector?
( myVec <- c(myVec, myVec[length(myVec)-1] + myVec[length(myVec)]) )
( myVec <- c(myVec, myVec[length(myVec)-1] + myVec[length(myVec)]) )
( myVec <- c(myVec, myVec[length(myVec)-1] + myVec[length(myVec)]) )
</pre>
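Appending with <code>c()</code> is not the only way to grow a vector: assigning to an index past the current end also extends it, and the replacement function <code>length&lt;-</code> can pad or truncate. A sketch on a made-up vector <code>v</code>:

```r
v <- c(1, 1)
v[length(v) + 1] <- v[length(v) - 1] + v[length(v)]  # append by index: 1 1 2
v[6] <- 8       # assigning past the end pads the gap with NA
v               # 1 1 2 NA NA 8
length(v) <- 3  # truncate back to three elements
v               # 1 1 2
```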
  
 
{{Vspace}}

===Vectorized operations===

Many operations on vectors are by default performed for every element of the vector, and '''R''' computes them '''very''' efficiently. These are called "vectorized" operations, and they should be used whenever possible, rather than loops or other explicit iterations.

<pre>
myVec
myVec + 1
myVec * 2
log(myVec)

# computing with two vectors of the same length
myVec # the Fibonacci numbers you have defined above
( a <- myVec[-1] )                   # like myVec, but omitting the first element
( b <- myVec[1:(length(myVec)-1)] )  # like myVec, but omitting the last element
c <- a / b # the "golden ratio", phi (~1.61803 or (1+sqrt(5))/2 ),
           # an irrational number, is approximated by the ratio of
c
abs(c - ((1+sqrt(5))/2)) # Calculating the error of the approximation, element by element
</pre>
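The examples above combine vectors of equal length, or a vector with a scalar. When the lengths differ, '''R''' "recycles" the shorter vector. A sketch:

```r
(1:6) + c(10, 20)  # the shorter vector is recycled: 11 22 13 24 15 26
(1:6) * 2          # a scalar is a length-1 vector, recycled to length 6

# If the longer length is not a multiple of the shorter one, R still
# recycles, but issues a warning (suppressed here):
res <- suppressWarnings((1:5) + c(10, 20))
res                # 11 22 13 24 15
```

Recycling is what makes <code>myVec + 1</code> work; a warning about non-multiple lengths is often a sign of a bug.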
  
  
Line 454: Line 441:
  
 
:Consider:
 
:Consider:
<source lang="rsplus">
+
<pre>
 
x <- 8; sample(6:x)
 
x <- 8; sample(6:x)
 
x <- 7; sample(6:x)
 
x <- 7; sample(6:x)
Line 463: Line 450:
 
x <- 6:7; seq(x)
 
x <- 6:7; seq(x)
 
x <- 6:6; seq(x)    # Oi vay!
 
x <- 6:6; seq(x)    # Oi vay!
</source>
+
</pre>
  
 
:Wherever this misbehaviour is a possibility, i.e. when the number of elements to sample from is variable and could be just one (for example in some simulation code), you can write a replacement function like so...

<pre>
safeSample <- function(x, size, ...) {
# Replace the sample() function to ensure sampling from a single
</pre>
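The help page of <code>sample()</code> (<code>?sample</code>) shows a compact idiom for the same problem: sample ''positions'' with <code>sample.int()</code> and index into <code>x</code>. A sketch of that idea, under the name <code>resample()</code> used in that help page's examples (it is not a base '''R''' function):

```r
# Sample from the elements of x, even when x has length 1.
# Indexing by sample.int(length(x), ...) avoids sample()'s special case,
# where a single number n is treated as "sample from 1:n".
resample <- function(x, ...) x[sample.int(length(x), ...)]

set.seed(42)
sample(6:6)       # surprise: a permutation of 1:6
resample(6:6)     # 6, as intended
resample(6:8)     # a permutation of 6, 7, 8
resample(6:8, 2)  # two elements of 6:8
```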
  
 
:Don't be discouraged though: such warts are rare in '''R'''.
  
 
The most basic way to define matrix rows and columns is to use the <code>dim()</code> function and specify the size of each dimension. Consider:
<pre>
( a <- 1:12 )
dim(a) <- c(2,6)
dim(a) <- c(2,2,3)
a
</pre>
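<code>dim()</code> does not move any data: the elements stay in their original order and are read column by column. A sketch with a made-up vector <code>mat</code> (a different name, so that <code>a</code> above is not overwritten):

```r
# Assigning dimensions only changes how the elements are indexed
mat <- 1:6
dim(mat) <- c(2, 3)            # 2 rows, 3 columns
mat
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
mat[2, 1]                      # 2: the matrix is filled column by column
prod(dim(mat)) == length(mat)  # TRUE: the dimensions must account for every element

# Dimensions that do not match the length are rejected with an error
ok <- tryCatch({ dim(mat) <- c(4, 2); TRUE }, error = function(e) FALSE)
ok                             # FALSE, because 4 * 2 is not 6
```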
  
 
<code>dim()</code> also tells you the number of rows and columns a matrix has. For example:
<pre>
dim(a)     # returns the dimensions of a in a vector
dim(a)[3]  # only the size of the third dimension of a
</pre>
  
 
If you have a two-dimensional matrix, the functions <code>nrow()</code> and <code>ncol()</code> will also give you the number of rows and columns, respectively. Obviously, <code>dim(a)[1]</code> is the same as <code>nrow(a)</code>.
 
As an alternative to <code>dim()</code>, matrices can be defined using the <code>matrix()</code> or <code>array()</code> functions (see their help pages), or "glued" together from vectors by rows or columns, using the <code>rbind()</code> or <code>cbind()</code> functions respectively:

<pre>
( a  <- 1:4 )
( b  <- 5:8 )
( m  <- cbind(m2, c = 9:12) )  # naming a column "c" while cbind()'ing it
</pre>
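A sketch of how <code>rbind()</code>, <code>cbind()</code>, <code>matrix()</code> and <code>array()</code> arrange the same data (the vectors <code>a</code> and <code>b</code> are redefined here so the block runs on its own):

```r
a <- 1:4
b <- 5:8

dim(rbind(a, b))       # 2 4: each vector becomes a row
dim(cbind(a, b))       # 4 2: each vector becomes a column
colnames(cbind(a, b))  # "a" "b": argument names become column names

matrix(1:6, nrow = 2)                # filled column by column (the default)
matrix(1:6, nrow = 2, byrow = TRUE)  # filled row by row
array(1:12, dim = c(2, 2, 3))        # same layout as setting dim(x) <- c(2, 2, 3)
```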
  
 
"Subsetting" (retrieving) individual elements or slices from matrices is simply done by specifying the appropriate indices, where a missing index indicates that the entire row or column is to be retrieved.

Explore how you extract rows or columns from a matrix by specifying them. Within the square brackets the order is '''<code>[&lt;rows&gt;, &lt;columns&gt;]</code>'''.

<pre>
m[1,]        # first row
m[, 2]       # second column
m[3, 2]      # element at row == 3, column == 2
m[3:4, 1:2]  # submatrix: rows 3 to 4 and columns 1 to 2
</pre>
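One consequence of subsetting matrices is easy to overlook: when a single row or column is selected, '''R''' "drops" the dimensions and returns a plain vector. The <code>drop = FALSE</code> argument of <code>[</code> prevents that. A sketch on a made-up 4 x 3 matrix <code>mx</code>:

```r
mx <- matrix(1:12, nrow = 4)  # example 4 x 3 matrix

mx[1, ]                     # row 1 as a plain vector: 1 5 9
dim(mx[1, ])                # NULL: the dimensions were dropped
mx[1, , drop = FALSE]       # row 1 kept as a 1 x 3 matrix
dim(mx[1, , drop = FALSE])  # 1 3
mx[mx[, 1] > 2, ]           # logical row selection: rows whose first column is > 2
```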
  
 
Note that '''R''' has numerous functions to compute with matrices, such as transposition, multiplication, inversion, calculating eigenvalues and eigenvectors, and much more.
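A few of these matrix functions in action, on a made-up symmetric 2 x 2 matrix <code>A</code>:

```r
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)

t(A)              # transposition (A is symmetric, so t(A) equals A)
A %*% A           # matrix multiplication; A * A would multiply element-wise
Ainv <- solve(A)  # inversion
A %*% Ainv        # the identity matrix, up to rounding error
ev <- eigen(A)
ev$values         # eigenvalues
ev$vectors        # eigenvectors, one per column
```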
 
{{Vspace}}
 
{{Vspace}}
  
 
{{Vspace}}
 
 
 
== Further reading, links and resources ==
 
<!-- {{#pmid: 19957275}} -->
 
<!-- {{WWW|WWW_GMOD}} -->
 
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 
 
{{Vspace}}
 
 
 
== Notes ==
 
<!-- included from "../components/RPR-Objects-Vectors.components.wtxt", section: "notes" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
 
<references />
 
 
{{Vspace}}
 
 
 
</div>
 
<div id="ABC-unit-framework">
 
 
== Self-evaluation ==
<!-- included from "../components/RPR-Objects-Vectors.components.wtxt", section: "self-evaluation" -->

 
<!--
 
<!--
 
=== Question 1===
 
=== Question 1===

=== Question 1===

Within the square brackets of a matrix the order is '''<code>[&lt;rows&gt;, &lt;columns&gt;]</code>''', but what about slices of a 3D matrix? Is it '''<code>[&lt;slices&gt;, &lt;rows&gt;, &lt;columns&gt;]</code>''' or '''<code>[&lt;rows&gt;, &lt;columns&gt;, &lt;slices&gt;]</code>'''?
  
 
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
 
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
Line 588: Line 552:
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
 
Just try:
 
Just try:
<source lang="rsplus">
+
<pre>
 
x <- 1:27
 
x <- 1:27
 
dim(x) <- c(3, 3, 3)
 
dim(x) <- c(3, 3, 3)
Line 595: Line 559:
 
x[1, 1, 2]  # 10
 
x[1, 1, 2]  # 10
  
</source>
+
</pre>
  
 
... and this means <code>[&lt;rows&gt;, &lt;columns&lt;, &lt;slices&gt;]</code> is correct.
 
... and this means <code>[&lt;rows&gt;, &lt;columns&lt;, &lt;slices&gt;]</code> is correct.
Line 606: Line 570:
 
{{Vspace}}
 
{{Vspace}}
  
 
 
{{Vspace}}
 
 
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
 
 
----
 
 
{{Vspace}}
 
 
<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
 
 
----
 
 
{{Vspace}}
 
  
 
<div class="about">
 
<div class="about">
Line 631: Line 579:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-09-10
+
:2020-09-18
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:1.0
+
:1.0.1
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
 +
*1.0.1 Maintenance
 
*1.0 Completed to first live version
 
*1.0 Completed to first live version
 
*0.1 Material collected from previous tutorial
 
*0.1 Material collected from previous tutorial
 
</div>
 
</div>
[[Category:ABC-units]]
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 
  
 
{{CC-BY}}
 
{{CC-BY}}
  
 +
[[Category:ABC-units]]
 +
{{UNIT}}
 +
{{LIVE}}
 
</div>
 
</div>
 
<!-- [END] -->
 
<!-- [END] -->

Latest revision as of 01:06, 6 September 2021

R scalars and vectors

(Types of R objects: scalars, vectors and matrices)


 


Abstract:

Introduction to vector objects in R: what are they, how are they created, how can they be subset?


Objectives:
This unit will ...

  • ... introduce scalars, vectors and matrices;
  • ... demonstrate vectorized operations;
  • ... teach various ways of subsetting;

Outcomes:
After working through this unit you ...

  • ... can create vectors by assignment from sequences or by using the c() function;
  • ... are familiar with subsetting by index, name, and boolean vectors;
  • ... can subset elements, ranges, and slices from vectors and matrices;
  • ... can combine objects with c(), rbind(), or cbind().

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

  • Prerequisites:
    This unit builds on material covered in the following prerequisite units:


     



     



     


    Evaluation

    Evaluation: NA

    This unit is not evaluated for course marks.

    Contents

    R objects can be composed of different kinds of data according to the type and number of "atomic" values they contain:

    • Scalar items are single values;
    • Vectors are ordered sequences of scalars, all of which must have the same "data type" (e.g. numeric, logical, character ...);
    • Matrices are vectors for which one or more "dimension(s)" have been defined;
    • "Data frames" are spreadsheet-like objects: their columns are like vectors, and all columns must have the same length, but within one data frame, columns can have different data types. They are the most commonly used type of object to hold data;
    • Lists are the most general collection of data items; lists can contain items of any type and kind, including vectors, matrices, functions, data frames, and other lists.


     

    Scalar data

    Scalars are single values, the "atomic" parts of more complex datatypes. Under the hood, scalars are actually vectors of length 1 (more on vectors in the next section). To create a scalar object, simply assign some value to its name.

    x <- pi       # define a scalar by assignment
    x             # its value is ...
    length(x)     # its length is ...
    x[1]          # it is actually a vector, and its first element is ...
    x[2]          # a second element does not exist: NA (Not Available)
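The snippet below quickly checks the claim that scalars are vectors of length 1, using values of several types. Note the difference between length(), which counts elements, and nchar(), which counts the characters inside a string.

```r
x <- pi
is.vector(x)      # TRUE: there is no separate scalar type in R
length(x)         # 1
length("hello")   # 1: a single string is one element ...
nchar("hello")    # 5: ... and nchar() counts the characters in it
length(TRUE)      # 1
```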
    


    Here are some remarks on the types of scalars R uses, and on coercion between types, i.e. casting one datatype into another. The following scalar types are supported:

    • Boolean constants: TRUE and FALSE. This type has the "mode" logical;
    • Integers and floats (floating point numbers). These types have the mode numeric;
    • Complex numbers. These have the mode complex;
    • Strings. These have the mode character.

    Other modes exist, such as list, function and expression, all of which can be combined into complex objects.

    The function mode() returns the mode of an object, typeof() returns its type, and class() tells you which class it belongs to.

    
    typeof(TRUE)
    class(3L)
    mode(print)
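The three functions can give different answers for the same value. The sketch below applies all three to values of different types. The helper describe() is hypothetical and written only for this illustration; it is not part of R or of the BasicSetup project.

```r
# Hypothetical helper: collect mode(), typeof() and class() for one value
describe <- function(x) {
  c(mode = mode(x), typeof = typeof(x), class = class(x)[1])
}

describe(3)        # mode "numeric", typeof "double",  class "numeric"
describe(3L)       # mode "numeric", typeof "integer", class "integer"
describe(1 + 2i)   # mode "complex"
describe(TRUE)     # "logical" throughout
describe("a")      # "character" throughout
describe(print)    # mode "function", typeof "closure"
```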
    
    

    I have combined these information functions into a single function, objectInfo(), which is defined when you execute the init() function of the BasicSetup project, so you can explore objects in more detail. We can use objectInfo() to explore how R objects are made up by passing various expressions to it as arguments. You may not recognize many of these yet ... bear with it though:

    Task:

    • Load the R-Exercise_BasicSetup project in RStudio if you don't already have it open.
    • Type init() as instructed after the project has loaded.
    • Continue below.


    
    
    #Let's have a brief look at the function itself: typing a function name without its parentheses returns the source code for the function:
    objectInfo
    
    # Various objects:
    
    #Scalars:
    objectInfo( 3.0 )    # Double precision floating point number
    objectInfo( 3.0e0 )  # Same value, exponential notation
    
    objectInfo( 3 )   # Note: whole numbers are stored as double precision floats by default.
    objectInfo( 3L )  # If we really want an integer, we must use R's
                      # special integer notation ...
    objectInfo( as.integer(3) )  # or explicitly "coerce" to type integer...
    
    # Coercions: For each of these, first think what result you would expect:
    objectInfo( as.character(3) )  # Forcing the number to be interpreted as a character.
    objectInfo( as.numeric("3") )   # character as numeric
    objectInfo( as.numeric("3.141592653") )  # string as numeric. Where do the
                                             # non-zero digits at the end come from?
    objectInfo( as.numeric(pi) )    # not a string, but a predefined constant
    objectInfo( as.numeric("pi") )  # another string as numeric ... Ooops -
                                    # why the warning?
    objectInfo( as.complex(1) )
    
    objectInfo( as.logical(0) )
    objectInfo( as.logical(1) )
    objectInfo( as.logical(-1) )
    objectInfo( as.logical(pi) )      # any non-zero number is TRUE ...
    objectInfo( as.logical("pie") )   # ... but not non-numeric types.
                                      # NA means "Not Available".
    objectInfo( as.character(pi) )    # Interesting: the conversion eats digits.
    
    objectInfo( Inf )                # Larger than the largest representable number
    objectInfo( -Inf )               # ... or smaller
    objectInfo( NaN )                # "Not a Number" is numeric
    objectInfo( NA )                 # "Not Available" - i.e. missing value is
                                     # logical by default
    
    # NULL
    objectInfo( NULL )  # NULL is nothing. Not 0, not NaN,
                        # not FALSE - nothing. NULL is the value that is returned
                        # by expressions or functions when the result is
                        # undefined and nothing more can be said about it.
    
    objectInfo( as.factor("M") )     # factor
    objectInfo( Sys.time() )         # time
    objectInfo( letters )            # inbuilt character vector
    objectInfo( LETTERS )            # same
    objectInfo( 1:4 )                # integer vector (integers are numeric)
    objectInfo( matrix(1:4, nrow=2)) # numeric matrix
    objectInfo( data.frame(arabic = 1:3,                           # data frame
                           roman = c("I", "II", "III"),
                           stringsAsFactors = FALSE))
    objectInfo( list(arabic = 1:7,
                     roman = c("I", "II", "III"),
                     chinese = c("一", "二", "三", "四")))   # list
    
    # Expressions:
    objectInfo( 3 > 5 ) # Note: comparing two values with the operators
                        # == != > < >= <= gives a logical value, TRUE or FALSE.
                        # So does combining logical values with ! | || & and &&.
    objectInfo( 3 < 5 )
    objectInfo( 1:6 > 4 ) # these are "vectorized" operators
    
    objectInfo( a ~ b )              # a formula
    objectInfo( objectInfo )         # the function itself
    


     

    Sometimes (but rarely) you may run into a distinction that R makes regarding integers and floating point numbers. By default, if you specify e.g. the number 2 in your code, it is stored as a floating point number. But if the numbers are generated e.g. from a range operator as in 1:2 they are integers! This can give rise to confusion as in the following example:


    a <- 7
    b <- 6:7
    str(a)             # num 7
    str(b)             # int [1:2] 6 7
    a == b[2]          # TRUE
    identical(b[2], a) # FALSE ! Not identical! Why?
                       # (see the str() results above.)
    
    # If you need to be sure that a number is an
    # integer, write it with an "L" after the number:
    c <- 7L
    str(c)             # int 7
    identical(b[2], c) # TRUE
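If you need to compare numbers regardless of whether they are stored as integer or double, identical() is too strict. Below is a short sketch of the alternatives. It reuses a and b from above and redefines them so that the snippet runs on its own.

```r
a <- 7      # a double
b <- 6:7    # an integer vector

a == b[2]                        # TRUE:  "==" compares values
identical(a, b[2])               # FALSE: identical() also compares the storage type
isTRUE(all.equal(a, b[2]))       # TRUE:  all.equal() compares numeric values, with a small tolerance
identical(a, as.numeric(b[2]))   # TRUE:  after explicit coercion to double
c(is.integer(a), is.integer(b))  # FALSE TRUE
```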
    
    


     

    Vectors

    Since we (almost) never do statistics on scalars, R obviously needs ways to handle collections of data items. In its simplest form such a collection is a vector: an ordered sequence of items of the same type. Vectors are created from scratch with the c() function, which concatenates individual items into a vector, or with various sequencing functions. Vectors have properties, such as their length, and their individual items can be combined in useful ways. It's worth repeating: all elements of a vector must be of the same type. If they are not, they are silently(!) coerced to the most general type (which is often character). (The actual hierarchy for coercion is raw < logical < integer < double < complex < character < list.)

    
    # The c() function concatenates elements into a vector
    c(2, 4, 6)
    
    
    #Create a vector and list its contents and length:
    f <- c(1, 1, 2, 3, 5, 8, 13, 21)
    f
    length(f)
    
    # Often, for teaching code, I want to demonstrate the contents of an object after
    # assigning it. I can simply wrap the assignment into parentheses to achieve that.
    # Parentheses return the value of whatever they enclose. So ...
    a <- 17
    # ... assigns 17 to the variable "a". But this happens silently. However ...
    ( a <- 17 )
    # ... returns the result of the assignment. I will use this idiom often.
    
    ( myVec <- c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89) )
    
    
    # Coercion:
    # all elements of vectors must be of the same type
    c(1, 2.0, "3", TRUE)  # trying to get a vector with mixed types ...
    # [1] "1"    "2"    "3"    "TRUE"
    
    # ... shows that all elements are silently being coerced
    # to character. The emphasis is on _silently_. This might
    # be unexpected, for example if you are reading numeric data
    # from a text-file into a vector but someone has entered a " " for a missing
    # value ... then everything is characterified. Nasty.
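The snippet below walks through the coercion hierarchy from the paragraph above one step at a time, as reported by typeof(). The last lines sketch the "missing value" scenario with a hypothetical vector v. Converting back with as.numeric() quietly turns the blank entry into NA, so the problem is easy to overlook.

```r
typeof(c(TRUE, FALSE))   # "logical"
typeof(c(TRUE, 2L))      # "integer":   logical -> integer
typeof(c(2L, 2.5))       # "double":    integer -> double
typeof(c(2.5, 1i))       # "complex":   double  -> complex
typeof(c(1i, "a"))       # "character": complex -> character
typeof(c(1, list(2)))    # "list":      anything -> list

# Hypothetical numeric data in which a blank was typed for a missing value:
v <- c(1, 2, " ", 4)
v                        # "1" "2" " " "4": everything became character
as.numeric(v)            # 1 2 NA 4: the blank turns into NA
```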
    


    There are various ways to subset (retrieve) specific values from a vector; this is important.

    Subsetting by index

    # Extracting by index ...
    myVec[1]         # In R, the first element of a vector has index 1! Not 0.
    head(myVec, 1)   # same thing
    
    myVec[length(myVec)] # length(myVec) is also the index of the last element.
    tail(myVec, 1)       # same thing
    
    
    # With a vector of indices ...
    1:4 # This is the range operator
    myVec[1:4] # using the range operator (it generates a sequence and returns it in a vector)
    myVec[4:1] # same thing, backwards
    seq(from=2, to=6, by=2) # The seq() function is a flexible, generic way to generate sequences
    seq(2, 6, 2) # Same thing: arguments in default order
    myVec[seq(2, 6, 2)]
    
    # since a scalar is a vector of length 1, does this work?
    5[1]
    
    
    # ...using an index vector with positive indices
    a <- c(1, 3, 4, 1) # the elements of index vectors must be
                       # valid indices of the target vector.
                       # The index vector can be of any length.
    myVec[a] # In this case, four elements are retrieved from myVec[]
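Here are a few more properties of index vectors, sketched on a hypothetical vector fib that holds the first Fibonacci numbers, so the snippet runs independently of myVec. Indices may repeat and may come in any order. An index past the end returns NA rather than an error, and zeros are ignored.

```r
fib <- c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89)   # hypothetical example vector

fib[c(5, 5, 5)]   # repeated indices return repeated elements: 5 5 5
fib[c(11, 1)]     # indices can come in any order: 89 1
fib[15]           # beyond the end: NA, not an error
fib[0]            # index 0 selects nothing: numeric(0)
fib[c(0, 3)]      # zeros in an index vector are dropped: 2
```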
    


     

    Excluding items through negative indexes

    # Negative indices omit elements ...
    # ...using an index vector with negative indices
    
    # If elements of index vectors are negative integers,
    # the corresponding elements are excluded.
    ( a <- -(1:4) ) # Note that this is NOT the same as -1:4
    
    myVec[a] # Here, the first four elements are omitted from myVec[]
    
    myVec[-((length(myVec)-3):length(myVec))] # Here, the last four elements are omitted
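The difference between -1:4 and -(1:4) comes from operator precedence: unary minus binds more tightly than the range operator. The sketch below also shows two more points. First, head() and tail() accept a negative n, which is often easier to read than computing negative index ranges by hand. Second, positive and negative indices cannot be mixed. It uses a hypothetical vector fib.

```r
fib <- c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89)   # hypothetical example vector

-1:4              # -1 0 1 2 3 4: the minus applies to 1 only
-(1:4)            # -1 -2 -3 -4

fib[-(1:4)]       # drop the first four elements ...
tail(fib, -4)     # ... same thing: a negative n drops from the front
fib[-(8:11)]      # drop the last four elements ...
head(fib, -4)     # ... same thing: a negative n drops from the end

# Mixing positive and negative indices is an error:
res <- tryCatch(fib[c(-1, 2)], error = function(e) "error")
res
```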
    
    


     

    Subsetting by boolean vectors

    myVec > 4         # A logical expression operating on the target vector
                      # returns a vector of logical elements. It has the
                      # same length as the target vector.
    myVec[myVec > 4]; # We can use this logical vector to extract only
                      # elements for which the logical expression evaluates as TRUE.
                      # This is also called "filtering".
    # Note: the logical vector is aligned with the elements of the original
    # vector. You can't retrieve elements more than once, as you could
    # with index vectors. If the logical vector is shorter than its target
    # it is "recycled" to the full length.
    
    (1:20)[c(TRUE, FALSE)]  # odd numbers, but how and why?
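The odd numbers appear because the index vector c(TRUE, FALSE) is recycled to TRUE, FALSE, TRUE, FALSE, ... across all twenty elements. The snippet below shows a few more variations. It also uses which(), which converts a logical vector into the indices of its TRUE elements. As before, fib is a hypothetical example vector.

```r
x <- 1:20
x[c(TRUE, FALSE)]          # recycled as TRUE FALSE TRUE FALSE ...: 1 3 5 ... 19
x[c(FALSE, TRUE)]          # the even numbers
x[c(FALSE, FALSE, TRUE)]   # every third element: 3 6 9 12 15 18

fib <- c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89)   # hypothetical example vector
fib[fib > 4 & fib < 50]    # conditions combined element by element with &
which(fib > 4)             # the indices of the TRUE elements: 5 6 7 8 9 10 11
```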
    
    
    


     


    "[" is an operator

    
    # Some more thoughts about "["
    # "[" is not just a special character, it is an operator. It
    # operates on whatever it is attached to on the left.
    
    ?"["   # help is available ...
    
    # We have attached "[" to vectors above,
    # but we can also attach it directly to functions or other expressions.
    # For example, the summary() function returns some basic statistics of a vector:
    
    summary(myVec)
    #    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    #    1.00    2.50    8.00   21.09   27.50   89.00
    
    # This is a vector of six numbers:
    length(summary(myVec))
    
    # We can extract e.g. the median like so:
    summary(myVec)[3]
    
    # ... or the boundaries of the interquartile range:
    summary(myVec)[c(2, 5)]
    
    # Note that the elements that summary() returns are "named".
    # "Names" are attributes.
    objectInfo(summary(f))
    
    # The names() function can retrieve (or set) names:
    names(summary(myVec))
    
    # ... which brings us to yet another way to extract elements from Vectors:
    # subsetting by name:
    summary(myVec)["Median"]
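Because "[" is an operator, it can also be called like an ordinary function under its backtick-quoted name `[`. Here is a small sketch; the vector fib and the list pairs are hypothetical.

```r
fib <- c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89)   # hypothetical example vector

fib[3]          # the usual operator syntax
`[`(fib, 3)     # the same call written in function form

# This makes "[" usable wherever a function is expected, e.g. in sapply():
pairs <- list(c(1, 2), c(3, 4), c(5, 6))
sapply(pairs, `[`, 2)   # the second element of each vector: 2 4 6
```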
    
    


     

    Subsetting by name

    
    # If the vector has named elements, vectors of names can be used exactly like
    # index vectors:
    
    summary(myVec)
    summary(myVec)["Median"]
    summary(myVec)[c("Max", "Min")]  # Oooops - I mistyped. But you can fix the expression, right?
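You can also create named vectors yourself, either by giving names inside c() or by assigning to names(). The sketch below uses a hypothetical vector ages. Note that double brackets [[ ]] return the element without its name, and that an unknown name yields NA.

```r
ages <- c(Ann = 34, Bob = 27, Cy = 41)   # hypothetical named vector
names(ages)              # "Ann" "Bob" "Cy"
ages[c("Cy", "Ann")]     # several elements, in the order requested
ages[["Bob"]]            # double brackets return the value without its name: 27
ages["Dee"]              # an unknown name gives NA

names(ages)[2] <- "Rob"  # names() can also be assigned to
ages
```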
    
    
    


     

    Extending vectors

    
    # Vectors are not immutable. They can grow and shrink as required.
    ( x <- 1:3 )
    length(x)
    x[4] <- 4; x
    length(x)
    x[7] <- 7; x
    length(x)
    ( x <- x[-(5:6)] )
    length(x)
    
    # Example: extending the Fibonacci series for three steps.
    # Think: How does this work? What numbers are we adding here and why does the result end up in the vector?
    ( myVec <- c(myVec, myVec[length(myVec)-1] + myVec[length(myVec)]) )
    ( myVec <- c(myVec, myVec[length(myVec)-1] + myVec[length(myVec)]) )
    ( myVec <- c(myVec, myVec[length(myVec)-1] + myVec[length(myVec)]) )
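The same idea can be wrapped in a loop. The function extendFib() below is a hypothetical helper, not part of the course code. The last lines show that assigning beyond the end of a vector fills any gap with NA.

```r
# Hypothetical helper: append n further terms to a Fibonacci-like vector v
extendFib <- function(v, n) {
  for (i in seq_len(n)) {
    k <- length(v)
    v <- c(v, v[k - 1] + v[k])   # the sum of the last two elements is appended
  }
  v
}

extendFib(c(1, 1, 2, 3, 5), 3)   # 1 1 2 3 5 8 13 21

# Assigning beyond the end fills the gap with NA
x <- 1:3
x[6] <- 6L
x                                # 1 2 3 NA NA 6
```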
    
    
    


     

    Vectorized operations

    
    Many operations on vectors are by default performed for every element of the vector, and R computes them very efficiently. These are called "vectorized" operations and should be used whenever possible, rather than loops or other explicit iterations.
    
    myVec
    myVec + 1
    myVec * 2
    log(myVec)
    
    # computing with two vectors of same length
    myVec  # the Fibonacci numbers you have defined above
    ( a <- myVec[-1] )  # like myVec, but omitting the first element
    ( b <- myVec[1:(length(myVec)-1)] ) # like myVec, but without the last element
    c <- a / b # the "golden ratio", phi (~1.61803 or (1+sqrt(5))/2 ),
               # an irrational number, is approximated by the ratio of
               # two consecutive Fibonacci numbers.
    c
    abs(c - ((1+sqrt(5))/2)) # Calculating the error of the approximation, element by element
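For comparison, the snippet below writes the golden-ratio computation twice: once with vectorized indexing and once with an explicit for loop. Both give the same numbers, but the vectorized version is shorter and, for long vectors, typically much faster. The vector fib is a hypothetical stand-in for myVec.

```r
# fib is a hypothetical stand-in for myVec: the first eleven Fibonacci numbers
fib <- c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89)
phi <- (1 + sqrt(5)) / 2

# Vectorized: divide every element by its predecessor in one expression
ratios <- fib[-1] / fib[-length(fib)]

# The same computation with an explicit loop
ratiosLoop <- numeric(length(fib) - 1)
for (i in seq_along(ratiosLoop)) {
  ratiosLoop[i] <- fib[i + 1] / fib[i]
}

all.equal(ratios, ratiosLoop)   # TRUE
abs(ratios - phi)               # the error shrinks with every step
```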
    


    What could possibly go wrong?...


    When a number is not a single number ...
    One of the "warts" of R is that some functions behave differently when they receive a vector of length one: given a single number n, they act as if they had received the range 1:n. Most everyone agrees this is pretty bad. The behaviour was introduced long ago, when someone thought it would be nifty to save two keystrokes. Instead, it has caused countless errors, hours of frustration and probably hundreds of undiscovered bugs. Today we wouldn't write code like that anymore (I hope), but since it has been around for so long, the community believes that changing it would probably break more things than it fixes. Two functions to watch out for are sample() and seq(); other functions with similar length-dependent behaviour include diag() and runif().
    Consider:
    x <- 8; sample(6:x)
    x <- 7; sample(6:x)
    x <- 6; sample(6:x)  # Oi!
    
    # also consider
    x <- 6:8; seq(x)
    x <- 6:7; seq(x)
    x <- 6:6; seq(x)    # Oi vay!
    
    Wherever this misbehaviour is a possibility - i.e. when the number of elements to sample from is variable and could be just one, for example in some simulation code - you can write a replacement function like so...
    safeSample <- function(x, size, ...) {
      # Replace the sample() function to ensure that sampling from a single
      # value returns that value.
      # Respect additional arguments if present.
      if (length(x) == 1 && is.numeric(x) && x > 0) {
        if (missing(size)) size <- 1
        return(rep(x, size))
      } else {
        return(sample(x, size, ...))
      }
    }
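For seq(), base R already provides safer alternatives. seq_along(x) always returns one index per element of x, and seq_len(n) always returns 1, ..., n, or nothing if n is 0. The sketch below contrasts them with seq() and with 1:length(), which has a related trap: 1:0 counts down. It also shows that sampling indices with sample.int() sidesteps the sample() wart.

```r
x <- 6:6
seq(x)               # 1 2 3 4 5 6: a single number n is read as 1:n
seq_along(x)         # 1: one index per element, whatever the length
seq_len(length(x))   # 1

y <- integer(0)      # an empty vector
seq_along(y)         # integer(0): a for loop over this runs zero times
1:length(y)          # 1 0: the range counts down, a classic trap

# Sampling indices instead of values sidesteps the sample() wart:
z <- 6
z[sample.int(length(z))]   # always 6
```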
    
    
    
    Don't be discouraged though: such warts are rare in R.


     

    Matrices and higher-dimensional objects

    If we need to operate with several vectors, or with multi-dimensional data, we make use of matrices or, more generally, k-dimensional arrays in R. Matrix operations are very similar to vector operations; in fact, a matrix is a vector for which the number of rows and columns have been defined. Thus matrices inherit the basic limitation of vectors: all elements have to be of the same type.

    The most basic way to define matrix rows and columns is to use the dim() function and specify the size of each dimension. Consider:

    ( a <- 1:12 )
    dim(a) <- c(2,6)
    a
    dim(a) <- c(4,3)
    a
    dim(a) <- c(2,2,3)
    a
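Setting dim() does not rearrange the data: R fills matrices column by column ("column-major" order). The sketch below checks this with element indices. It also shows that matrix(..., byrow = TRUE) fills the matrix row by row instead. The names mat and m3 are hypothetical.

```r
mat <- 1:12
dim(mat) <- c(3, 4)
mat[2, 1]      # 2: the first column is filled first ("column-major" order)
mat[1, 2]      # 4: the second column starts with the fourth element
dim(mat)       # 3 4

m3 <- matrix(1:12, nrow = 3, byrow = TRUE)   # byrow = TRUE fills row by row
m3[1, 2]       # 2
```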
    

    dim() also tells you the size of each dimension, e.g. the number of rows and columns of a matrix. For example:

    dim(a)    # returns the dimensions of a in a vector
    dim(a)[3]  # only the size of the third dimension of a
    

    If you have a two-dimensional matrix, the function nrow() and ncol() will also give you the number of rows and columns, respectively. Obviously, dim(a)[1] is the same as nrow(a).

    As an alternative to dim(), matrices can be defined using the matrix() or array() functions (see their help pages), or "glued" together from vectors by rows or columns, using the rbind() or cbind() functions respectively:

    ( a  <- 1:4 )
    ( b  <- 5:8 )
    ( m1 <- rbind(a, b) )
    ( m2 <- cbind(a, b) )
    ( m  <- cbind(m2, c = 9:12) )  # naming a column "c" while cbind()'ing it
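The snippet below sketches two details of rbind() and cbind(), using the vectors a and b from above. First, rbind() takes its row names from the variable names of its arguments. Second, combining a character column with numbers coerces the whole matrix to character, following the same rule as for vectors. The names m4 and m5 are hypothetical.

```r
a <- 1:4
b <- 5:8
m1 <- rbind(a, b)
rownames(m1)    # "a" "b": taken from the variable names of the arguments
dim(m1)         # 2 4

# Same type rule as for vectors: one character column coerces everything
m4 <- cbind(a, label = c("w", "x", "y", "z"))
typeof(m4)      # "character"

# matrix() builds the same shape directly, but without row names
m5 <- matrix(c(a, b), nrow = 2, byrow = TRUE)
all(m5 == m1)   # TRUE
```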
    
    

    "Subsetting" (retrieving) individual elements or slices from matrices is simply done by specifying the appropriate indices, where a missing index indicates that the entire row or column is to be retrieved.

    Explore how you extract rows or columns from a matrix by specifying them. Within the square brackets the order is [<rows>, <columns>]:

    m[1,] # first row
    m[, 2] # second column
    m[3, 2] # element at row == 3, column == 2
    m[3:4, 1:2] # submatrix: rows 3 to 4 and columns 1 to 2
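Selecting a single row or column returns a plain vector, because by default R drops dimensions of extent 1. The sketch below shows drop = FALSE, which keeps the matrix shape. It also shows selection by column name and by a logical condition. It rebuilds m with the same contents as above.

```r
m <- cbind(a = 1:4, b = 5:8, c = 9:12)   # same contents as m above

m[, 2]                  # a plain vector: the dimension is dropped
m[, 2, drop = FALSE]    # a 4 x 1 matrix that keeps its column name
m[, "c"]                # columns (and rows) can also be selected by name
m[m[, "a"] > 2, ]       # logical row selection: rows where column a exceeds 2
```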
    

    Note that R has numerous functions to compute with matrices, such as transposition, multiplication, inversion, calculating eigenvalues and eigenvectors and much more.
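Here is a minimal sketch of some of these functions, applied to a small, hypothetical symmetric matrix A. Note that %*% is matrix multiplication, while * multiplies element by element.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # hypothetical symmetric 2 x 2 matrix

t(A)              # transposition (A is symmetric, so nothing changes)
A %*% A           # matrix multiplication
A * A             # NOT matrix multiplication: elementwise products
Ainv <- solve(A)  # inversion
A %*% Ainv        # the identity matrix, up to rounding error
eigen(A)$values   # eigenvalues, largest first
```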


     

    Self-evaluation

    Question 1

    Within the square brackets of a matrix the order is [<rows>, <columns>], but what about slices of a 3D matrix? Is it [<slices>, <rows>, <columns>] or [<rows>, <columns>, <slices>]?

    Answer ...

    Just try:

    x <- 1:27
    dim(x) <- c(3, 3, 3)
    x
    x[2, 1, 1]   # 2
    x[1, 1, 2]   # 10
    
    

    ... and this means [<rows>, <columns>, <slices>] is correct.


     


     


    About ...
     
    Author:

    Boris Steipe <boris.steipe@utoronto.ca>

    Created:

    2017-08-05

    Modified:

    2020-09-18

    Version:

    1.0.1

    Version history:

    • 1.0.1 Maintenance
    • 1.0 Completed to first live version
    • 0.1 Material collected from previous tutorial

    This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.