Difference between revisions of "RPR-Setup"
Revision as of 13:06, 7 May 2018
Setup R to work with it
(R projects; working with git version control via RStudio; the history mechanism and why not to use it; .Rprofile to customize startup behaviour; the working directory.)
Abstract:
This unit discusses the setup of a working session with RStudio.
Objectives:
Outcomes:
Deliverables:
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Prerequisites:
This unit builds on material covered in the following prerequisite units:
Contents
Your Course Folder
Your Course Folder should already exist.
Take note! When you write a Windows path in an R command, you have to use the "wrong" forward slash to separate directories and files. R will translate these "Unix-style" paths into Windows-style paths automatically when it negotiates with the operating system. The backslash, in contrast, is interpreted as an "escape" character that gives the character that follows it a special meaning.[1]
- Folder name and path examples
- /Users/Pierette/Documents/BCB420 ◁ Looking good on a Mac.
- C:\Users\Pulcinella\Documents\CBW ◁ Looking good on a Windows computer.
- "C:/Users/Pulcinella/Documents/CBW" ◁ Looking good inside R on a Windows computer (note the quotation marks!).
- C:\Users\Pantalone\Documents\BCH1441 (2017) ◁ Wrong. No special characters please.
- /Users/Brighella/Documents/UofT Stuffz/Courses/more/Comp Sys biol. course ◁ Wrong. Please read instructions more carefully.
- C:\Users\Tartaglia\Documents\KUWTK\<Coursecode> ◁ I can't even ...
"Projects"
We will make extensive use of "projects" in class. Read more about projects in RStudio here.
Git Version control
We will also make extensive use of version control. In fact, we will now load a project via Git version control from its free, public repository on GitHub.
Task:
- Read more about Version Control in RStudio here.
- Follow the instructions to install git on your computer.
Then do the following:
- Open RStudio.
- Select File → New Project...
- Click on Version Control.
- Click on Git.
- Enter https://github.com/hyginn/R_Exercise-BasicSetup as the Repository URL.
- Type a <tab> character; the Project directory name field should then autofill to read R_Exercise-BasicSetup.
- Click on Browse... to find your Course Folder (the one that you have already created). Click Open.
- Click Create Project; the project files should be downloaded and the console should prompt you to type init() to begin.
- Type init() into the console pane.
An R script should load.
- Explore the script and follow its instructions.
What could possibly go wrong?...
- I get an error message
- "Git not found".
- The simplest reason is that you may have had RStudio open while installing git. Just restart RStudio.
- The executable for Git (the Git "program" - "git.exe" on Windows, "git" elsewhere) needs to be on your system's path, or correctly specified in RStudio's options. The correct "path" to Git will depend on your operating system, and how git was installed. To find where git is installed:
- On Mac and Unix systems, open a Terminal window[2] and type which git. This will either print the path (Yay), or tell you that git is not found. The latter could have two reasons: either git has not been installed in the first place, or it has been installed in a non-standard location by whatever installation manager you have used. Ask Google to help you figure out how to solve your specific case.
- On Windows you can find the location of the executable by searching "git.exe" in your "programs and files". Once it's been found, right click on it and select "Open file location" from the options. It might be in C:\Program Files\Git\cmd\git.exe but the exact location depends on your operating system.
- Once you know the path to your git executable, open File → Preferences, click on the Git/SVN option, click on the Browse button, and find the correct folder. On Macs you may need to press <shift> <command> G to open the "Go to ..." dialogue, then type the top-folder of the path (e.g. /usr) and click your way down to the folder where the program lives. Find the installation directory and select the git executable (git.exe on Windows). Then click "ok".
- Then try again to create the project and let us know what happened in case it still did not work.
- I get an error message like "directory exists and is not empty".
- A directory with the name of the project already exists in the location in which you are asking RStudio to create the project (the Course Folder). Either delete the existing directory, or install the project into a different parent directory.
- The git icon has disappeared.
- I have seen this happen when somehow the path to git has changed.
- (A) Make sure the correct path to git is set in your File → Preferences → Git/SVN.
- (B) Open Tools → Project options... → Git/SVN. Next to Version control system git must be selected, not (None). If it is (None), change this to git. If that's not an option, the path is not correct. Go back to (A).
- (C) I think you may need to restart RStudio then and reload your project via the Files → Recent projects... menu for the git icon and the version control options to reappear.
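For the "Git not found" case above, the location of git can also be checked from inside R instead of a Terminal window. The following is a minimal sketch, not part of the original exercise: it uses base R's Sys.which(), which searches the same path that R (and therefore RStudio) sees, and the variable name gitPath is just an illustration.

```r
# Ask the operating system where (or whether) it finds a git executable.
# Sys.which() returns a named character vector; the value is "" if nothing is found.
gitPath <- Sys.which("git")

if (nzchar(gitPath)) {
  message("git found at: ", gitPath)
  # Ask git itself for its version as a sanity check.
  print(system2(gitPath, "--version", stdout = TRUE))
} else {
  message("git is not on the search path that R sees.")
}
```

If a path is printed, that is the path to enter in the Git/SVN options; if not, git is either not installed or installed in a location that is not on the search path.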
Working directory
To locate a file in a computer, one has to specify the filename and the directory in which the file is stored; this is also called the path of the file. However, R uses a default "working directory", which is assumed if no path is specified. This "working directory" for R is either the directory in which the R-program has been installed, or some other directory that has been defined in a startup script, or specifically defined with the command setwd("<working Directory>") at any time. You can execute the command getwd() to list what the Working Directory is currently set to:
> getwd()
[1] "/Users/steipe/R"
In RStudio, the contents of the working directory is listed in the Files Pane (lower-right).
It is convenient to put all your R-input and output files into a project specific directory and then define this to be the "Working Directory". Use the setwd() command for this. setwd() requires an argument that you type between the parentheses: a string with the directory path, or a variable containing such a string. Strings in R are delimited with " or ' characters. If the directory does not exist, an Error will be reported. Make sure you have created the directory. On Mac and Unix systems, the usual shorthand notation for relative paths can be used: ~ for the home directory, . for the current directory, .. for the parent of the current directory.
If you use a Windows system, you need to know that backslashes – "\" – have a special meaning for R: they work as escape characters. For example the string "\n" means newline, and "\t" means tab. Thus R gets confused when you put backslashes into string literals, such as Windows path names. R has a simple solution: you simply use forward slashes instead of backslashes when you specify paths, and R will translate them correctly when it talks to your operating system. Instead of C:\documents\projectfiles you write C:/documents/projectfiles. Also note that on Windows the ~ tilde does not necessarily stand for the user's home directory: R usually expands it to the user's Documents folder. Type path.expand("~") to see what it means on your computer.
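The slash rules above can be tried out in the console. This is a small sketch using only base R functions; the path C:/documents/projectfiles is the example from the text and need not exist on your computer, and the variable names p1 to p4 are made up for illustration.

```r
# Forward slashes are safe in string literals on every platform.
p1 <- "C:/documents/projectfiles"

# file.path() joins path components with "/" for you.
p2 <- file.path("C:", "documents", "projectfiles")
identical(p1, p2)                         # TRUE

# If you do type backslashes, each one must be doubled in the literal ...
p3 <- "C:\\documents\\projectfiles"
nchar(p3) == nchar(p1)                    # TRUE: "\\" is stored as ONE character

# ... and can be converted to forward slashes afterwards.
p4 <- gsub("\\", "/", p3, fixed = TRUE)
identical(p1, p4)                         # TRUE

# Show what the ~ shorthand expands to on this computer.
path.expand("~")
```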
My home directory...
> setwd("~") # Note: ~ is the "tilde" - the squiggly line - not the straight hyphen
> getwd()
[1] "/Users/steipe"
Relative path (home directory, up one level, then down into chen's home directory):
> setwd("~/../chen")
> getwd()
[1] "/Users/chen"
Absolute path (specify the entire string):
> setwd("/Users/steipe/abc/R_samples")
> getwd()
[1] "/Users/steipe/abc/R_samples"
In RStudio you can use the Session → Set Working Directory menu. This includes the useful option to set the current project directory as the working directory[3].
Task:
Since you have gone through the script of the BasicSetup project, your working directory should be set to this project directory (I have configured the project to do this automatically.)
- Figure out the path to its parent directory - i.e. the course- or workshop directory you created at the beginning.
- Use setwd("<your/path/and/directory/name>") to set the Working Directory to the Course Folder.
- Confirm that this has worked by typing getwd() and list.files().
The Working Directory functions can also be accessed through the Menu, under Misc.
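The working directory commands in this section can be combined into a short round trip. This is a sketch under the assumption that the parent of your current working directory is readable; dirname() is a base R function that strips the last component of a path, and the names oldWd and parentDir are only illustrative.

```r
oldWd <- getwd()                  # remember where we started
parentDir <- dirname(oldWd)       # the path of the parent directory

setwd(parentDir)                  # move up one level ...
getwd()                           # ... confirm where we are ...
head(list.files())                # ... and look at the first few files there

setwd(oldWd)                      # return to the original working directory
```

Storing the old working directory first makes it easy to go back, so that later commands which rely on relative paths still find their files.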
.Rprofile - startup commands
Often, when working on a project, you would like to start off in your working directory right away when you start up R, instead of typing the setwd() command. This is easily done in a special R-script that is executed automatically on startup[4]. The name of the script is .Rprofile and R expects to find it in the user's home directory. You can edit these files with a simple text editor like Textedit (Mac), Notepad (Windows) or Gedit (Linux) - or, of course, by opening it in RStudio - don't forget that a code editor is also a text editor[5].
Besides setting the working directory, other items that might go into such a file could be
- libraries that you often use
- constants that are not automatically defined
- functions that you would like to preload.
For more details, use R's help function:
> ?Startup
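To make the list above concrete, here is a sketch of what an .Rprofile could contain. It is not the .Rprofile of the course project: the folder path, the constant and the helper function wd() are hypothetical examples that you would replace with your own.

```r
# --- hypothetical contents of an .Rprofile file ---

# A library that you often use (utils is part of every R installation).
library(utils)

# Set the working directory, but only if the folder actually exists.
COURSEDIR <- "~/Documents/BCB420"    # assumed path: replace with your own
if (dir.exists(COURSEDIR)) {
  setwd(COURSEDIR)
}

# A constant that is not automatically defined.
AVOGADRO <- 6.02214076e23            # particles per mole

# A function you would like to preload.
wd <- function() {
  # Print the working directory and the number of files it contains.
  cat("Working directory:", getwd(), "\n")
  cat("Number of files:  ", length(list.files()), "\n")
  invisible(getwd())
}
```

Checking dir.exists() before calling setwd() keeps R from reporting an error at startup if the folder has been moved or renamed.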
Task:
Just for information:
- locate the .Rprofile file in the RStudio file pane;
- click on it to open it in the text-editing window.
This way you could change it and save the changes. However, don't do that now but
- Close the file again.
The "Workspace"
During an R session, you might define a large number of R-objects: variables, data structures, functions etc., and you might load packages and scripts. All of this information is stored in the so-called "Workspace". When you quit R you have the option to save the Workspace; it will then be restored in your next session. Now, you might think: how convenient - I can just stop R, and when I restart it, it will go into the same state as it was. But no. Restoring the Workspace from a previous state is actually a bad idea: if you load data or variables in a startup script, they may be overwritten with a corrupted version that you happened to save in the workspace when you last quit. This is very hard to troubleshoot. Essentially, when you save and reload your Workspace habitually, you have overlapping and potentially conflicting behaviour of startup script and Workspace restore.
What I recommend instead is the following:
- Never save the Workspace.
- Always work from scripts.
- Write your scripts so that you can easily recreate all objects you need to continue your analysis.
- If some objects are expensive to compute, you can always save() and later load() them explicitly. In fact, restoring the Workspace does the same thing, but you have less control regarding whether the versions of your objects are correct, and what temporary variables may be loaded as well.
- In this way, you work with explicit instructions, not implicit behaviour.
- Explicit beats implicit.
List the current workspace contents: initially it only contains the init() function that was loaded from the .Rprofile script on startup.
> ls()
[1] "init"
>
Initialize three variables
> a <- 3
> b <- 4
> c <- sqrt(a^2 + b^2)
> ls()
[1] "a" "b" "c" "init"
>
Save one item in an .RData file.
> save(a, file = "tmp.RData")
Remove one item from the Workspace. (Note: the argument for rm() is not the string "a", but the variable name a. No quotation marks!)
> rm(a)
> ls()
[1] "b" "c" "init"
>
Load what you previously saved.
> load("tmp.RData")
> ls()
[1] "a" "b" "c" "init"
Note: you can save() more than one item in an .RData file. When you then load() the file, all of the objects it contains are loaded. You don't assign these objects - they are being restored.
We can use the output of ls() as input to rm() to remove all items from the workspace. (cf. ?rm for details)
> rm(list = ls())
> ls()
character(0)
>
The contents of the workspace are displayed in RStudio's Environment Pane (top-right). You can see a little "broom" icon at the top that you can click to remove all items from the workspace.
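The note about saving more than one item can be tried out as follows. This is a minimal sketch: it writes to a temporary file created with tempfile() rather than into your working directory, and the variable names tmpFile and restored are only illustrative.

```r
a <- 3
b <- 4

# Save two objects into one .RData file in R's temporary directory.
tmpFile <- tempfile(fileext = ".RData")
save(a, b, file = tmpFile)

# Remove both objects, then restore them from the file.
rm(a, b)
restored <- load(tmpFile)   # load() invisibly returns the names it restored

restored                    # "a" "b"
a + b                       # 7

unlink(tmpFile)             # delete the temporary file again
```

Capturing the return value of load() tells you exactly which objects were put into the workspace, which is the kind of explicit control that restoring a whole Workspace does not give you.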
Self-evaluation
Notes
- ↑ For example C:Documents\new would be interpreted as C:Documents<linebreak>ew because \n is the linebreak character. Even though that's actually the path name on Windows, in an R command you have to write C:Documents/new
- ↑ The Terminal app is in the Utilities sub-folder of your Applications folder.
- ↑ Projects that I create for teaching are configured to use this option by default, thus once the project is loaded, the Working Directory should already be correctly set.
- ↑ Actually, the first script that runs is Rprofile.site which is found on Linux and Windows machines in the C:\Program Files\R\R-{version}\etc directory. But not on Macs.
- ↑ Operating systems commonly hide files whose name starts with a period "." from normal directory listings. All files however are displayed in RStudio's File pane. Nevertheless, it is useful to know how to view such files by default. On Macs, you can configure the Finder to show you such "hidden files" by default. To do this: (i) Open a terminal window; (ii) Type: $ defaults write com.apple.Finder AppleShowAllFiles YES (iii) Restart the Finder by accessing Force quit (under the Apple menu), selecting the Finder and clicking Relaunch. (iv) If you ever want to revert this, just do the same thing but set the default to NO instead.
Further reading, links and resources
If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2018-05-02
Version:
- 1.1.1
Version history:
- 1.1.1 Maintenance
- 1.1 Fixed display bug with "=" in template code; moved to GeSHi formatting.
- 1.0 Completed to first live version
- 0.1 Material collected from previous tutorial
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.