Difference between revisions of "RPR-Setup"
Line 228: | Line 228: | ||
</source> | </source> | ||
+ | |||
+ | {{Vspace}} | ||
+ | |||
+ | ====The "Workspace"==== | ||
+ | |||
+ | |||
+ | During an '''R''' session, you might define a large number of variables and data structures, load packages and scripts etc. All of this information is stored in the so-called "Workspace". When you quit '''R''' you have the option to save the Workspace; it will then be restored in your next session. However, restoring the Workspace from a previous state is potentially a bad idea: if you load data or variables in a startup script, they may be overwritten with a corrupted version that you happened to save in the workspace when you last quit. This is very hard to troubleshoot. Essentially, when you save and reload your Workspace habitually, you have overlapping and potentially conflicting behaviour of startup script and Workspace restore. | ||
+ | |||
+ | What I prefer and recommend instead is the following: | ||
+ | * Never save the Workspace. | ||
+ | * Always work from scripts. | ||
+ | * Write your scripts so that you can easily recreate all objects you need to continue your analysis. | ||
+ | * If some objects are expensive to compute, you can always <code>save()</code> and later <code>load()</code> them explicitly. In fact, restoring the Workspace does the same thing, but you have less control over whether the versions of your objects are correct, and what temporary variables may be loaded as well. | ||
+ | * In this way, you work with '''explicit''' instructions, not '''implicit''' behaviour. | ||
+ | * Explicit beats implicit. | ||
+ | |||
+ | |||
+ | {{console|List the current workspace contents: initially it is empty. (R reports an object of type "character" with a length of 0.) | ||
+ | |> ls() | ||
+ | character(0) | ||
+ | > | ||
+ | }} | ||
+ | |||
+ | {{console|Initialize three variables | ||
+ | |> a <- 3 | ||
+ | |> b <- 4 | ||
+ | |> c <- sqrt(a^2 +b^2) | ||
+ | > ls() | ||
+ | [1] "a" "b" "c" | ||
+ | > | ||
+ | }} | ||
+ | |||
+ | {{console|Save one item in an .RData file. | ||
+ | |> save(a, file = "tmp.RData") | ||
+ | }} | ||
+ | |||
+ | {{console|Remove one item from the Workspace. (Note: the argument for <code>rm()</code> is not the string "''a''", but the variable name ''a''. No quotation marks!) | ||
+ | |> rm(a) | ||
+ | > ls() | ||
+ | [1] "b" "c" | ||
+ | > | ||
+ | }} | ||
+ | |||
+ | {{console|Load what you previously saved. | ||
+ | |> load("tmp.RData") | ||
+ | > ls() | ||
+ | [1] "a" "b" "c" | ||
+ | }} | ||
+ | |||
+ | |||
+ | Note: you can <code>save()</code> more than one item in an .RData file. When you then <code>load()</code> the file, all of the objects it contains are loaded. You don't '''assign''' the objects - they are being '''restored'''. | ||
+ | |||
+ | |||
+ | <small>We can use the output of {{c|ls()}} as input to {{c|rm()}} to remove all items from the workspace. (cf. {{c|?rm}} for details)</small> | ||
+ | <source lang="rsplus"> | ||
+ | rm(list = ls()) | ||
+ | > ls() | ||
+ | character(0) | ||
+ | > | ||
+ | </source> | ||
+ | |||
+ | |||
+ | The contents of the workspace are displayed in RStudio's Environment Pane (top-right). You can see a little "brush" icon at the top that you can click to remove all items from the workspace. | ||
+ | |||
+ | {{Vspace}} | ||
+ | |||
+ | |||
{{Vspace}} | {{Vspace}} |
Revision as of 05:32, 17 August 2017
Setup R to work with it
Keywords: R projects; working with git version control via RStudio; the history mechanism and why not to use it; .Rprofile to customize startup behaviour; the working directory
This unit is under development. There is some content here but it is incomplete and/or may change significantly: links may lead nowhere, the content is likely to be rearranged, and objectives, deliverables etc. may be incomplete or missing. Do not work with this material until it is updated to "live" status.
Abstract
...
This unit ...
Prerequisites
You need to complete the following units before beginning this one:
Objectives
...
Outcomes
...
Deliverables
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your course journal.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation
Evaluation: NA
- This unit is not evaluated for course marks.
Contents
"Projects"
We will make extensive use of "projects" in class. Read more about projects in RStudio here.
Git Version control
We will also make extensive use of version control. In fact, we will now load a project via Git version control from its free, public repository on GitHub.
Task:
- Read more about Version Control in RStudio here.
- Follow the instructions to install git on your computer.
Then do the following:
- open RStudio
- Select File → New Project...
- Click on Version Control
- Click on Git
- Enter https://github.com/hyginn/R_Exercise-BasicSetup as the Repository URL.
- Type a <tab> character; the Project directory name field should then autofill to read R_Exercise-BasicSetup.
- Click on Browse... to find your project directory (the one that you have created above). Click Open.
- Click Create Project; the project files should be downloaded and the console should prompt you to type init() to begin.
- Type init() into the console pane.
An R script should load.
- Explore the script and follow its instructions.
What could possibly go wrong?...
- I get an error message
- "Git not found".
- The simplest reason is that you may have had RStudio open while installing git. Just restart RStudio.
- The executable for Git (the Git "program": "git.exe" on Windows, "git" elsewhere) needs to be on your system's path, or correctly specified in RStudio's options. The correct "path" to Git will depend on your operating system, and how git was installed. To find where git is installed:
- On Mac and Unix systems, open a Terminal window[1] and type which git. This will either print the path (Yay), or tell you that git is not found. The latter could have two reasons: either git has not been installed in the first place, or it has been installed in a non-standard location by whatever installation manager you have used. Ask Google to help you figure out how to solve your specific case.
- On Windows you can find the location of the executable by searching "git.exe" in your "programs and files". Once it's been found, right click on it and select "Open file location" from the options. It might be in C:\Program Files\Git\cmd\git.exe but the exact location depends on your operating system.
- Once you know the path to your git executable, open File → Preferences, click on the Git/SVN option, click on the Browse button, and find the correct folder. On Macs you may need to press <shift> <command> G to open the "Go to ..." dialogue, then type the top-folder of the path (e.g. /usr) and click your way down to the folder where the program lives. Find the installation directory and select the git executable (git.exe on Windows). Then click "ok".
- Then try again to create the project and let us know what happened in case it still did not work.
- I get an error message like "directory exists and is not empty".
- A directory with the name of the project already exists in the location in which you are asking RStudio to create the project. Either delete the existing directory, or install the project into a different parent directory.
- The git icon has disappeared.
- I have seen this happen when somehow the path to git has changed.
- (A) Make sure the correct path to git is set in your File → Preferences → Git/SVN.
- (B) Open Tools → Project options... → Git/SVN. Next to Version control system git must be selected, not (None). If it is (None), change this to git. If that's not an option, the path is not correct. Go back to (A).
- (C) I think you may need to restart RStudio then and reload your project via the Files → Recent projects... menu for the git icon and the version control options to reappear.
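The location of the git executable can also be checked from within R itself. The following is a minimal sketch that assumes only base R: Sys.which() looks up a program on the search path and returns an empty string if nothing is found. Note that this reports the path that R sees, which may differ from the path configured in RStudio's options.

```r
# Ask R where it finds the git executable on the search path.
gitPath <- Sys.which("git")

if (nzchar(gitPath)) {
  cat("git found at:", gitPath, "\n")
} else {
  cat("git is not on the search path that R sees.\n")
}
```

If the result is empty even though git is installed, the installation directory is probably missing from the system's path.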
Working directory
To locate a file in a computer, one has to specify the filename and the directory in which the file is stored; this is also called the path of the file. The "working directory" for R is either the directory in which the R-program has been installed, or some other directory, as initialized by a startup script. You can execute the command getwd() to list what the "Working Directory" is currently set to:
> getwd()
[1] "/Users/steipe/R"
In RStudio, the contents of the working directory are listed in the Files Pane.
It is convenient to put all your R-input and output files into a project specific directory and then define this to be the "Working Directory". The R working directory is the directory that R uses when you don't specify a path. Think of it as the default directory. Use the setwd() command to set it. setwd() requires an argument that you type between the parentheses: a string with the directory path, or a variable containing such a string. Strings in R are delimited with " or ' characters. If the directory does not exist, an Error will be reported, so make sure you have created the directory. On Mac and Unix systems, the usual shorthand notation for relative paths can be used: ~ for the home directory, . for the current directory, .. for the parent of the current directory.
If you use a Windows system, you need to know that backslashes ("\") have a special meaning for R: they work as escape characters. For example the string "\n" means newline, and "\t" means tab. Thus R gets confused when you put single backslashes into string literals, such as Windows path names. R has a simple solution: you use forward slashes instead of backslashes when you specify paths, and R will translate them correctly when it talks to your operating system. Instead of C:\documents\projectfiles you write C:/documents/projectfiles. Also note that on Windows the ~ tilde typically expands to the user's Documents folder, not to the user's home directory itself.
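The effect of escape characters can be tried out directly. This is a small sketch using only base R functions; the variable names winPath and myPath are made up for illustration.

```r
# A backslash starts an escape sequence: "\n" is one single character.
nchar("\n")

# A literal backslash has to be written as a doubled backslash ...
winPath <- "C:\\documents\\projectfiles"
cat(winPath, "\n")          # cat() shows the string as the system sees it

# ... which is why forward slashes are more convenient.
myPath <- "C:/documents/projectfiles"

# file.path() assembles a path with forward slashes on every platform.
file.path("C:", "documents", "projectfiles")
```

Building paths with file.path() avoids typing separators altogether.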
My home directory...
> setwd("~") # Note: ~ is the "tilde" - the squiggly line - not the straight hyphen
> getwd()
[1] "/Users/steipe"
Relative path (home directory, up one level, then down into chen's home directory):
> setwd("~/../chen")
> getwd()
[1] "/Users/chen"
Absolute path (specify the entire string):
> setwd("/Users/steipe/abc/R_samples")
> getwd()
[1] "/Users/steipe/abc/R_samples"
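When you change the working directory temporarily, it helps to remember where you came from. The sketch below relies on the fact that setwd() invisibly returns the previous working directory; tempdir() is used only because it is a directory that exists in every session, and the name oldWD is an arbitrary choice.

```r
# Store the previous working directory while changing to a new one.
oldWD <- setwd(tempdir())

getwd()          # now the session's temporary directory
list.files()     # files in the new working directory

# Go back to where we started.
setwd(oldWD)
```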
In RStudio you can use the Session → Set Working Directory menu. This includes the useful option to set the current project directory as the working directory.
Task:
Since you have gone through the script of the BasicSetup project, your working directory should be set to this project directory (I have configured the project to do this automatically.)
- Figure out the path to its parent directory - i.e. the course- or workshop directory you created at the beginning.
- Use setwd("<your/path/and/directory/name>") to set the working directory to the course directory.
- Confirm that this has worked by typing getwd() and list.files().
The Working Directory functions can also be accessed through the Menu, under Misc.
.Rprofile - startup commands
Often, when working on a project, you would like to start off in your working directory right away when you start up R, instead of typing the setwd() command. This is easily done in a special R-script that is executed automatically on startup[2]. The name of the script is .Rprofile and R expects to find it in the user's home directory. You can edit this file with a simple text editor like Textedit (Mac), Notepad (Windows) or Gedit (Linux) - or, of course, by opening it in R itself[3].
Besides setting the working directory, other items that might go into such a file could be
- libraries that you often use
- constants that are not automatically defined
- functions that you would like to preload.
For more details, use R's help function:
> ?Startup
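To make this concrete, here is a sketch of what such a file might contain. It is only an illustration: the directory path, the constant and the function are hypothetical examples, not part of the course materials.

```r
# Sketch of a possible ~/.Rprofile. All names and paths are hypothetical.

# Set the working directory, but only if the directory actually exists.
courseDir <- "~/R/myCourse"          # hypothetical path
if (dir.exists(courseDir)) {
  setwd(courseDir)
}

# A constant that is not automatically defined
AVOGADRO <- 6.02214076e23

# A small function to preload
deg2rad <- function(x) {
  # convert an angle from degrees to radians
  x * pi / 180
}
```

Keep such a file short: everything in it runs at every startup and becomes an implicit part of every session.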
The "Workspace"
During an R session, you might define a large number of variables and data structures, load packages and scripts etc. All of this information is stored in the so-called "Workspace". When you quit R you have the option to save the Workspace; it will then be restored in your next session. However, restoring the Workspace from a previous state is potentially a bad idea: if you load data or variables in a startup script, they may be overwritten with a corrupted version that you happened to save in the workspace when you last quit. This is very hard to troubleshoot. Essentially, when you save and reload your Workspace habitually, you have overlapping and potentially conflicting behaviour of startup script and Workspace restore.
What I prefer and recommend instead is the following:
- Never save the Workspace.
- Always work from scripts.
- Write your scripts so that you can easily recreate all objects you need to continue your analysis.
- If some objects are expensive to compute, you can always save() and later load() them explicitly. In fact, restoring the Workspace does the same thing, but you have less control over whether the versions of your objects are correct, and what temporary variables may be loaded as well.
- In this way, you work with explicit instructions, not implicit behaviour.
- Explicit beats implicit.
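One way to follow these recommendations in a script is to recompute an expensive object only when no saved copy exists. This is a minimal sketch, not a prescribed method: the file name is hypothetical, the file is placed in tempdir() only to keep the example self-contained, and cumsum() stands in for a genuinely expensive computation.

```r
# Recompute an expensive object only if no saved copy exists.
cacheFile <- file.path(tempdir(), "myResult.RData")   # hypothetical file name

if (file.exists(cacheFile)) {
  load(cacheFile)                    # restores the object "myResult"
} else {
  myResult <- cumsum(1:10)           # stand-in for an expensive computation
  save(myResult, file = cacheFile)   # save it explicitly for later use
}
```

Because the script names the file and the object, it is always clear which version is being restored.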
List the current workspace contents: initially it is empty. (R reports an object of type "character" with a length of 0.)
> ls()
character(0)
>
Initialize three variables
> a <- 3
> b <- 4
> c <- sqrt(a^2 + b^2)
> ls()
[1] "a" "b" "c"
>
Save one item in an .RData file.
> save(a, file = "tmp.RData")
Remove one item from the Workspace. (Note: the argument for rm() is not the string "a", but the variable name a. No quotation marks!)
> rm(a)
> ls()
[1] "b" "c"
>
Load what you previously saved.
> load("tmp.RData")
> ls()
[1] "a" "b" "c"
Note: you can save() more than one item in an .RData file. When you then load() the file, all of the objects it contains are loaded. You don't assign the objects - they are being restored.
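The following sketch illustrates saving and restoring several objects at once. It assumes nothing beyond base R; the object names and the temporary file are made up for the example. Since load() restores objects under their original names, it can silently overwrite existing objects, so the sketch also shows loading into a separate environment.

```r
x <- 1:3
y <- "some text"
tmpFile <- tempfile(fileext = ".RData")

save(x, y, file = tmpFile)   # two objects in one file
rm(x, y)

# load() restores both objects and invisibly returns their names.
restored <- load(tmpFile)
restored

# Loading into a separate environment avoids overwriting
# objects of the same name in the workspace.
e <- new.env()
load(tmpFile, envir = e)
ls(e)
```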
We can use the output of ls() as input to rm() to remove all items from the workspace. (cf. ?rm for details)
> rm(list = ls())
> ls()
character(0)
>
The contents of the workspace are displayed in RStudio's Environment Pane (top-right). You can see a little "brush" icon at the top that you can click to remove all items from the workspace.
- ↑ The Terminal app is in the Utilities sub-folder of your Applications folder.
- ↑ Actually, the first script that runs is Rprofile.site which is found on Linux and Windows machines in the C:\Program Files\R\R-{version}\etc directory. But not on Macs.
- ↑ Operating systems commonly hide files whose name starts with a period "." from normal directory listings. All files however are displayed in RStudio's File pane. Nevertheless, it is useful to know how to view such files by default. On Macs, you can configure the Finder to show you such "hidden files" by default. To do this: (i) Open a terminal window; (ii) Type: $ defaults write com.apple.Finder AppleShowAllFiles YES (iii) Restart the Finder by accessing Force quit (under the Apple menu), selecting the Finder and clicking Relaunch. (iv) If you ever want to revert this, just do the same thing but set the default to NO instead.