Difference between revisions of "RPR-OBJECTS-Data frames"

Latest revision as of 01:06, 6 September 2021

R "data frames"

(R data frames)

Abstract:

Introduction to data frames: how to create and modify them, and how to retrieve data from them.

Objectives:
This unit will ...

... introduce R data frames;
... cover a number of basic operations.

Outcomes:
After working through this unit you ...

... know how to create and manipulate data frames;
... can access and change individual elements;
... can extract rows and columns, and append new data rows.

Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:
This unit builds on material covered in the following prerequisite units:

RPR-Objects-Vectors (R scalars and vectors)
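
The operations in this unit act on a data frame named plasmidData. As a minimal sketch of how such an object can be built directly with data.frame(), the following creates a small stand-in with the column names used in this unit (Name, Size, Marker, Ori, Sites). The object name plasmidDemo and all row values are illustrative assumptions, not the course's data file, and base R's str() is used for inspection in place of the course helper objectInfo().

```r
# Hypothetical stand-in data; the values are illustrative only.
plasmidDemo <- data.frame(
  Name   = c("pUC19", "pBR322", "pACYC184"),
  Size   = c(2686L, 4361L, 4245L),
  Marker = c("Amp", "Amp, Tet", "Tet, Cam"),
  Ori    = c("pMB1", "pMB1", "p15A"),
  Sites  = c("EcoRI, HindIII", "EcoRI, HindIII", "HindIII"),
  stringsAsFactors = FALSE   # explicit, so strings stay chr in R < 4.0 too
)

rownames(plasmidDemo) <- plasmidDemo$Name   # use the Name column as row names
str(plasmidDemo)                            # one line per column, with its type
dim(plasmidDemo)                            # number of rows, number of columns
```

Each argument to data.frame() becomes one column, so all vectors must have the same length (or be recyclable to it), and each column keeps its own type.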

rownames(plasmidData) <- plasmidData[ , 1]  # assigns the contents of column 1 as rownames
nrow(plasmidData)
ncol(plasmidData)
objectInfo(plasmidData)


x <- plasmidData[2, ]  # assign one row to a variable
objectInfo(x)  # This is also a data frame! One row. It has to be, because
               # it contains elements of type chr and of type int!

plasmidData["pBR322", ]  # retrieve one row: different syntax, same thing

( s <- plasmidData["pBR322", "Size"] )  # one element
plasmidData["pBR322", "Size"] <- "???"  # change one element
plasmidData["pBR322", ]                 # Note that this is now a string, not a number
objectInfo(plasmidData)                 # In fact, the assignment has changed the
                                        # type of the whole column. Remember:
                                        # in a data.frame, all elements of one column
                                        # have the same type.

plasmidData <- plasmidData[-2, ]  # remove one row
objectInfo(plasmidData)

plasmidData <- rbind(plasmidData, x)  # add it back at the end
objectInfo(plasmidData)

# add a new row from scratch:
plasmidData <- rbind(plasmidData, data.frame(Name = "pMAL-p5x",
                                             Size = 5752,
                                             Marker = "Amp",
                                             Ori = "pMB1",
                                             Sites = "SacI, AvaI, HindIII"))
objectInfo(plasmidData)

( x <- plasmidData[ , 2] )    # retrieve one column by index
  plasmidData[ , "Size"]      # retrieve one column by name
objectInfo(x)                 # now a vector!

# That may be surprising behaviour. When you retrieve a single column from a
# data frame it is (silently) turned into a vector (unless you explicitly
# tell R not to do that - e.g. plasmidData[ , "Size", drop = FALSE]). To make the
# nature of this data as a vector more explicit, I usually use a different
# and equivalent syntax: the "$" operator

plasmidData$Size
objectInfo(plasmidData$Size)

# Note: for an ordinary column like this one, the $ operator returns a vector.
# And, the column name is _NOT_ placed in quotation marks. This is the syntax
# we will usually use throughout the course.
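
The different ways of retrieving a single column described above can be compared side by side. This is a small sketch on a hypothetical toy data frame (the name df and its contents are assumptions made for illustration); it uses only base R functions.

```r
# Hypothetical toy data frame, used only for illustration
df <- data.frame(Name = c("a", "b", "c"),
                 Size = c(10L, 20L, 30L),
                 stringsAsFactors = FALSE)

v1 <- df[ , "Size"]                 # single column: silently dropped to a vector
v2 <- df$Size                       # the same vector, via the $ operator
v3 <- df[["Size"]]                  # the same vector, via [[ ]] with a quoted name
d1 <- df[ , "Size", drop = FALSE]   # drop = FALSE keeps a one-column data frame
d2 <- df["Size"]                    # list-style subsetting also keeps a data frame

is.vector(v1)          # TRUE
identical(v1, v2)      # TRUE
identical(v2, v3)      # TRUE
is.data.frame(d1)      # TRUE
is.data.frame(d2)      # TRUE
```

The [[ ]] form is useful when the column name is stored in a variable, since $ takes the name literally.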

Task:
The rowname of the new row of plasmidData is now "1". It should be "pMAL-p5x". Fix this.
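
One possible approach to this kind of fix is sketched below on a hypothetical toy data frame rather than on plasmidData (the name toy and its contents are assumptions). It relies on the fact that rownames() can be assigned to, either for one position or for all rows at once.

```r
# Hypothetical toy data frame whose row names mirror its Name column
toy <- data.frame(Name = c("a", "b"), Size = c(1, 2),
                  stringsAsFactors = FALSE)
rownames(toy) <- toy$Name

# A row appended via data.frame() arrives with a default row name, not "c"
toy <- rbind(toy, data.frame(Name = "c", Size = 3,
                             stringsAsFactors = FALSE))
rownames(toy)

# Option 1: rename only the last row
last <- nrow(toy)
rownames(toy)[last] <- toy$Name[last]

# Option 2: reassign all row names from the Name column
rownames(toy) <- toy$Name

toy["c", ]   # the new row can now be retrieved by name
```

Option 1 touches only the row that was appended; option 2 is simpler but assumes every value in the Name column is unique, since row names may not be duplicated.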

Notes

The two most important formats for generic text-based datafiles are "tab"-separated values (.tsv) and "comma"-separated values (.csv).

About ...

Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2020-09-18

Version:

1.1

Version history:

1.1 Remove stringsAsFactors, no longer an issue
1.0.1 Maintenance
1.0 Completed to first live version
0.1 Material collected from previous tutorial

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.
