Difference between revisions of "RPR-Subsetting"

From "A B C"
Line 19: Line 19:
  
  
{{DEV}}
+
{{LIVE}}
  
 
{{Vspace}}
 
{{Vspace}}
Line 47: Line 47:
 
=== Objectives ===
 
=== Objectives ===
 
<!-- included from "../components/RPR-Subsetting.components.wtxt", section: "objectives" -->
 
<!-- included from "../components/RPR-Subsetting.components.wtxt", section: "objectives" -->
 +
This unit will ...
 +
* ... introduce ;
 +
* ... discuss ;
 +
* ... teach ;
 +
 +
{{Vspace}}
 +
 +
 +
=== Outcomes ===
 +
<!-- included from "../components/RPR-Subsetting.components.wtxt", section: "outcomes" -->
 +
After working through this unit you ...
 +
* ... have done;
 +
* ... know how ;
 +
* ... can ;
 +
 +
{{Vspace}}
 +
 +
 +
=== Deliverables ===
 +
<!-- included from "../components/RPR-Subsetting.components.wtxt", section: "deliverables" -->
 +
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
 +
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
 +
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
 +
*<b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.
 +
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
 +
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].
 +
 +
{{Vspace}}
 +
 +
 +
=== Evaluation ===
 +
<!-- included from "../components/RPR-Subsetting.components.wtxt", section: "evaluation" -->
 +
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 +
<b>Evaluation: NA</b><br />
 +
:This unit is not evaluated for course marks.
 +
 +
{{Vspace}}
 +
 +
 +
</div>
 +
<div id="BIO">
 +
== Contents ==
 +
<!-- included from "../components/RPR-Subsetting.components.wtxt", section: "contents" -->
 
===Subsetting===
 
===Subsetting===
  
Line 130: Line 173:
 
{{Vspace}}
 
{{Vspace}}
  
 
{{Vspace}}
 
 
 
=== Outcomes ===
 
<!-- included from "../components/RPR-Subsetting.components.wtxt", section: "outcomes" -->
 
...
 
 
{{Vspace}}
 
 
 
=== Deliverables ===
 
<!-- included from "../components/RPR-Subsetting.components.wtxt", section: "deliverables" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
 
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
 
*<b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
 
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].
 
 
{{Vspace}}
 
 
 
=== Evaluation ===
 
<!-- included from "../components/RPR-Subsetting.components.wtxt", section: "evaluation" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 
<b>Evaluation: NA</b><br />
 
:This unit is not evaluated for course marks.
 
 
{{Vspace}}
 
 
 
</div>
 
<div id="BIO">
 
== Contents ==
 
<!-- included from "../components/RPR-Subsetting.components.wtxt", section: "contents" -->
 
...
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 235: Line 241:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-08-05
+
:2017-09-10
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:0.1
+
:1.0
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
*0.1 First stub
+
*1.0 Completed to first live version
 +
*0.1 Material collected from previous tutorial
 
</div>
 
</div>
 
[[Category:ABC-units]]
 
[[Category:ABC-units]]

Revision as of 15:57, 10 September 2017

Subsetting and filtering R objects


 

Keywords:  Subsetting with the [], [[]], and $ operators, filtering


 



 


 


Abstract

...


 


This unit ...

Prerequisites

You need to complete the following units before beginning this one:


 


Objectives

This unit will ...

  • ... introduce ;
  • ... discuss ;
  • ... teach ;


 


Outcomes

After working through this unit you ...

  • ... have done;
  • ... know how ;
  • ... can ;


 


Deliverables

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


 


Evaluation

Evaluation: NA

This unit is not evaluated for course marks.


 


Contents

Subsetting

We have encountered "subsetting" before, but it deserves a more detailed discussion. Subsetting is one of the most important and powerful features of R, because it is indispensable for selecting, transforming, and otherwise modifying data to prepare it for analysis. You have seen that we use square brackets to indicate individual elements in vectors and matrices. These square brackets are actually "operators", and you can find more information about them in the help pages:

> ?"["     # Note that you need quotation marks around the operator for this.

Note especially:

  • [ ] "extracts" one or more elements defined within the brackets;
  • [[ ]] "extracts" a single element defined within the brackets;
  • $ "extracts" a single named element.

"Elements" are not necessarily scalars: an element can be a row, a column, or a more complex data structure. But a "single element" can't be a range or a collection of elements.

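The difference between the three operators is easiest to see on a list. The following is a minimal sketch; the list x and its element names are hypothetical and only serve as an illustration:

```r
# A small list with hypothetical names, used only for illustration.
x <- list(id = 7, tags = c("a", "b", "c"))

sub1 <- x["tags"]      # [ ]  : returns a list of length 1 that contains the vector
sub2 <- x[["tags"]]    # [[ ]]: returns the vector itself
sub3 <- x$tags         # $    : same as x[["tags"]] for a named element

class(sub1)            # "list"
class(sub2)            # "character"
identical(sub2, sub3)  # TRUE

# [ ] accepts several indices and returns all of the selected elements
sub4 <- x[c("id", "tags")]   # a list of length 2
length(sub4)
```

In short: use [ ] when you want a subset that has the same type as the original object, and [[ ]] or $ when you want the contents of one element.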

Here are some examples of subsetting data from the plasmidData data frame we constructed earlier. For the most part, this is a review of what we have already done:

plasmidData[1, ]
plasmidData[2, ]

# we can extract more than one row by specifying
# the rows we want in a vector ...
plasmidData[c(1, 2), ]

# ... this works in any order ...
plasmidData[c(3, 1), ]

# ... and for any number of rows ...
plasmidData[c(1, 2, 1, 2, 1, 2), ]


# Same for columns
plasmidData[ , 2 ]

# We can select rows and columns by name if a name has been defined...
plasmidData[, "Name"]
plasmidData$Name      # different syntax, same thing. This is the syntax I use most frequently.


# Watch this!
plasmidData$Name[plasmidData$Ori != "ColE1"]
# What happened here?
# plasmidData$Ori != "ColE1" is a logical expression, it gives a vector of TRUE/FALSE values
plasmidData$Ori != "ColE1"

# We insert this vector into the square brackets. R then returns all elements for
# which the vector is TRUE.

# In this way we can "filter" for values
plasmidData$Size > 3000
plasmidData$Name[plasmidData$Size > 3000]

# This principle is what we use when we want to "sort" an object
# by some value. The function order() returns the indices that put
# the values into sorted order; subsetting with these indices returns
# the sorted object. Remember this: not sort() but order().
order(plasmidData$Size)
plasmidData[order(plasmidData$Size), ]

# grep() matches substrings in strings and returns a vector of indices
grep("Tet", plasmidData$Marker)
plasmidData[grep("Tet", plasmidData$Marker), ]
plasmidData[grep("Tet", plasmidData$Marker), "Ori"]
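
If plasmidData is not in your workspace, the same ideas can be tried on a small stand-in. This is only a sketch: the data frame toyPlasmids and all of its values are invented for illustration and are not the course data. Only the column names follow the examples above.

```r
# Hypothetical stand-in data; names and values are invented for illustration
# and are not the course's plasmidData.
toyPlasmids <- data.frame(
  Name   = c("pA", "pB", "pC"),
  Size   = c(2700, 4400, 4200),
  Ori    = c("ColE1", "ColE1", "p15A"),
  Marker = c("Amp", "Amp, Tet", "Tet, Cam"),
  stringsAsFactors = FALSE
)

# Logical filtering: the comparison yields one TRUE/FALSE value per row ...
isLarge <- toyPlasmids$Size > 3000          # FALSE TRUE TRUE
# ... and the logical vector selects the elements for which it is TRUE.
largeNames <- toyPlasmids$Name[isLarge]     # "pB" "pC"

# order() returns the indices that would sort the values, not the values.
ord <- order(toyPlasmids$Size)              # 1 3 2
bySize <- toyPlasmids[ord, ]                # rows rearranged by ascending Size

# sort() returns only the sorted values, so it cannot rearrange whole rows.
sortedSizes <- sort(toyPlasmids$Size)       # 2700 4200 4400

# grep() returns the indices of the matching elements.
tetRows <- grep("Tet", toyPlasmids$Marker)  # 2 3
tetOri <- toyPlasmids[tetRows, "Ori"]       # "ColE1" "p15A"
```

Comparing ord with sortedSizes shows why order() is the function to remember: its result is a vector of row indices, and row indices are what the square brackets need.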

Elements that can be extracted from an object also can be replaced. Simply assign the new value to the element.

( x <- sample(1:10) )
x[4] <- 99
x
( x <- x[order(x)] )
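
Replacement works with every kind of index, not only with a single position. A minimal sketch, using a hypothetical vector v so that the result does not depend on a random sample:

```r
v <- c(5, 12, 3, 20, 8)   # hypothetical values, for illustration only

v[v > 10] <- 10           # replace every element that is greater than 10
v                         # 5 10 3 10 8

v[c(1, 3)] <- c(0, 1)     # replace elements 1 and 3 in one step
v                         # 0 10 1 10 8
```

The first assignment combines filtering and replacement: the logical expression selects the elements, and the assignment overwrites only those.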

Try your own subsetting ideas. Play with this. I find that even seasoned investigators have problems with subsetting their data, and if you become comfortable with the many ways of subsetting, you will be ahead of the game right away.


 


 


Further reading, links and resources

 


Notes


 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-09-10

Version:

1.0

Version history:

  • 1.0 Completed to first live version
  • 0.1 Material collected from previous tutorial

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.