Difference between revisions of "R tutorial"

From "A B C"
 
(272 intermediate revisions by the same user not shown)
Line 5: Line 5:
  
  
{{dev}}
+
This is a hub for a first introduction to '''R''', for students of one of my workshops or courses. I have subdivided the material into (somewhat) independent learning units that you can work through at your own pace, but in sequence.
  
 +
The units have ''Deliverables'' and ''Prerequisites'' - please ignore these sections, they are for use in a more formal course setting.
  
This is a tutorial introduction to '''R''' for users with no previous background in the platform or the language.  
+
You need to work through these units '''before''' you come to the workshop. There are two reasons:
  
 +
* (i) Installation of software is very specific to your computer and we can't walk you through this in a room full of people. It would take so much time that we wouldn't get anything else done.
 +
* (ii) When you are working with '''R''', as with any computer language or natural language, the key is repetition, repetition, repetition. The more you prime yourself with this material, the more you will profit when we actually meet in class. I hope to see everyone radiant and elated, and not lost before we even begin. Let's do this!
  
__TOC__
 
  
  
 
 
==The environment==
 
In this section we discuss how to download and install the software, how to configure an '''R''' session, and what working with the '''R''' environment involves.
 
  
===Installation===
 
  
# Navigate to http://probability.ca/cran/ <ref>This is the CRAN mirror site at the University of Toronto; any other mirror site will do. You may choose a mirror site from the [http://r-project.org '''R'''-project homepage].</ref> and follow the link to your computer's operating system.
 
# Download a precompiled binary (or "build") of the R "framework" to your computer and follow the instructions for installing it. You don't need tools or GUI versions for now, but do make sure that the program is the correct one for your '''version''' of your operating system.
 
# Launch '''R'''.
 
  
The program should open a window&ndash;the "R console"&ndash;and greet you with its ''input prompt'', awaiting your input:
+
==The Units==
>
 
  
 +
{{Smallvspace}}
  
The samples here sometimes copy input and output from the console, and sometimes show only the actual commands. The <code>&gt;</code> character at the beginning of a line is always just '''R''''s ''input prompt''; it is shown here only to illustrate the interactive use of the program and you do not need to type it. If a line starts with <code>[1]</code> or similar, this is '''R''''s ''output'' on the console. Often, I type a <code>#</code> character into a command line: this marks the following text as a comment, which is not executed by '''R'''. In principle, you can copy commands and paste them into the console or into a script; you don't need to copy the comments. In addition, I use [http://www.mediawiki.org/wiki/Extension:SyntaxHighlight_GeSHi syntax highlighting] on '''R''' scripts to color language keywords, numbers, strings, etc. differently from other text. This improves readability, but keep in mind that the colours you see on your computer will be different. One more thing about the console: use your keyboard's ''up-arrow'' key to retrieve previous commands, then move into the line with ''left-arrow'' to edit it; hit ''enter'' to execute the modified line.
+
; Start with this:
 +
* [[FND-Biocomputing_setup| Set up your computer for biocomputing work]]
  
===User interface===
+
{{Smallvspace}}
  
'''R''' comes with a GUI<ref>Graphical User Interface</ref> to lay out common tasks. For example, there are a number of menu items, many of which are similar to those of other programs you will have worked with ("File", "Edit", "Format", "Window", "Help" ...). All of these tasks can also be accessed through the command line. In general, GUIs are useful when you are not sure what you want to do or how to go about it; the command line is much more powerful when you have more experience and know your way around in principle. '''R''' gives you both options.
+
; Install R and make sure everything works:
 +
* [[RPR-Installation| Installing R and RStudio]]
 +
* [[RPR-Setup| Setup]]
 +
* [[RPR-Console| The "Console"]]
 +
* [[RPR-Help| Getting Help]]
  
Let's begin with a glossary of some terms that '''R''' uses and how they relate to your work:
+
{{Smallvspace}}
  
;Help
+
; Explore how to get '''R''' to work with data:
:Help is available for all commands and for the R command line syntax. As well, help is available to find the names of commands when you are not sure of them.
+
* [[RPR-Syntax_basics| R Syntax]]
 +
* [[RPR-Objects-Vectors| Vectors]]
 +
* [[RPR-Objects-Data_frames| Data frames]]
 +
* [[RPR-Objects-Lists| Lists]]
  
<source lang="rc">
+
{{Smallvspace}}
> help(rnorm) # "help" is a function, arguments to a function are passed in parentheses "()"
 
> ?rnorm      # shorthand for the same thing
 
> ?binom      # what was the name of that again ... ?
 
No documentation for 'binom' in specified packages and libraries:
 
you could try '??binom'
 
> ??binom
 
> ?Binomial  # ... found it in the list of keywords
 
>
 
</source>
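
When you only half-remember a command name, the search can also be done from a script. The following is a minimal sketch, not part of the original tutorial: it assumes a default '''R''' session (with the <code>stats</code> and <code>utils</code> packages attached) and uses <code>apropos()</code> to list matching names. The variable name <code>hits</code> is made up for illustration.

```r
# List all objects on the search path whose names contain "binom".
hits <- apropos("binom")
print(hits)

# help.search() is the function behind the "??" shorthand.
# It is commented out here because it opens the help viewer.
# help.search("binomial")

# Check that a name really exists before asking for its help page.
found <- exists("rbinom")
print(found)
```

Once a name shows up in the list, <code>?name</code> opens its help page as in the console session above.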
 
 
 
;Working directory
 
To locate a file on a computer, one has to specify the ''filename'' and the directory in which the file is stored; this is sometimes called the ''path'' of the file. The "working directory" for '''R''' is either the directory in which the '''R''' program has been installed, or some other directory, as initialized by a startup script. You can execute the command <code>getwd()</code> to list what the "Working Directory" is currently set to:
 
<source lang="rc">
 
> getwd()
 
[1] "/Users/steipe/R"
 
</source>
 
  
It is convenient to put all your '''R''' input and output files into a project-specific directory and then define this to be the "Working Directory". Use the <code>setwd()</code> command for this. <code>setwd()</code> requires a parameter in its parentheses: a string with the directory path. Strings in R are delimited with <code>"</code> or <code>'</code> characters. If the directory does not exist, an error will be reported, so make sure you have created the directory first. On Mac and Unix systems, the usual shorthand notation for paths can be used: <code>~</code> for the home directory, <code>.</code> for the current directory, <code>..</code> for the parent of the current directory.
+
; The one unit that will save your ***, over and over again:
 +
* [[RPR-Subsetting| Subsetting and Filtering]]
  
<source lang="rc">
+
{{Smallvspace}}
> setwd("~")  # my home directory
 
> getwd()
 
[1] "/Users/steipe"
 
> setwd("~/../chen")  # relative path: home directory, up one level, then down into chen's home directory
 
> getwd()
 
[1] "/Users/chen"
 
> setwd("/Users/steipe/abc/R_samples")  # absolute path: specify the entire string
 
> getwd()
 
[1] "/Users/steipe/abc/R_samples"
 
</source>
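
The same steps can be written as a script that creates the directory before switching to it. This is only a sketch under stated assumptions: the directory name <code>R_samples</code> is reused from the example above, but it is placed under <code>tempdir()</code> so that running the code does not touch your real files. The variable names are hypothetical.

```r
# Build a path inside the session's temporary directory (illustration only).
project_dir <- file.path(tempdir(), "R_samples")

# setwd() fails if the directory does not exist, so create it first.
if (!dir.exists(project_dir)) {
  dir.create(project_dir, recursive = TRUE)
}

# setwd() invisibly returns the previous working directory; keep it.
old_wd <- setwd(project_dir)
in_project <- basename(getwd())
print(in_project)

# Go back to where we started.
setwd(old_wd)
```

Keeping the return value of <code>setwd()</code> makes it easy to restore the previous working directory at the end of a script.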
 
  
* Create a directory for your sample files and use <code>setwd()</code> to set the working directory.
+
; First steps towards programming:
 +
* [[RPR-Subsetting| Subsetting and Filtering]]
 +
* [[RPR-Control_structures| Control structures]]
 +
* [[RPR-Functions| Functions]]
  
The ''Working Directory'' functions can also be accessed through the Menu, under '''Misc'''.
+
{{Smallvspace}}
  
 +
; Maybe optional? Meh, just work through this anyway, as time permits. It'll be on the exam.
 +
* [[RPR-Subsetting| Subsetting and Filtering]]
 +
* [[RPR-Plotting| First Plots]]
 +
* [[RPR-Coding_style| Coding Style]]
  
;Workspace
 
During an '''R''' session, you might define a large number of variables and data structures, load packages and scripts, etc. All of this information is stored in the so-called "Workspace". When you quit '''R''' you have the option to save the Workspace; it will then be reloaded in your next session.
 
<source lang="rc">
 
> ls()  # list the current workspace contents: initially it is empty
 
character(0)
 
> a <- 1; b <- 2; eps <- 0.0001  # Initialize three variables (multiple commands on one line can be separated with a semicolon ";")
 
> ls()  # list the current workspace contents
 
[1] "a"  "b"  "eps"
 
> rm(a)  # remove one item. Note: the parameter is not a string, but a variable name
 
> ls()
 
[1] "b"  "eps"
 
> rm(list = ls())  # we can use the output of ls() as input to rm() to remove everything ... cf. ?rm for details
 
> ls()  # once again empty
 
character(0)
 
</source>
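
Saving and reloading can also be done explicitly. In base '''R''', <code>save.image()</code> writes the whole Workspace to a file, and <code>save()</code> and <code>load()</code> do the same for selected objects. The sketch below is an illustration, not the tutorial's own procedure: it works in a separate scratch environment so that your real Workspace is left alone, and the file name <code>demo_workspace.RData</code> is made up.

```r
# Use a scratch environment as a stand-in for the Workspace.
ws <- new.env()
assign("a", 1, envir = ws)
assign("eps", 0.0001, envir = ws)

# Save its objects to a file in the temporary directory (hypothetical name).
ws_file <- file.path(tempdir(), "demo_workspace.RData")
save(list = ls(ws), envir = ws, file = ws_file)

# Remove everything from the scratch environment ...
rm(list = ls(ws), envir = ws)
n_after_rm <- length(ls(ws))

# ... and restore it; load() returns the names of the objects it restored.
restored <- load(ws_file, envir = ws)
print(sort(restored))
```

For the real Workspace, the <code>envir</code> arguments are simply left out, and <code>ls()</code> and <code>rm()</code> work as in the console session above.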
 
  
===Packages===
+
{{Vspace}}
  
Standard packages included, data available
 
 
===Files===
 
 
... Loading and running scripts
 
 
 
&nbsp;
 
 
==Simple commands==
 
Including functions
 
 
 
 
&nbsp;
 
==Scalar datatypes==
 
Definition, change, operations with, functions to work on...
 
 
&nbsp;
 
==Vectors==
 
 
 
 
&nbsp;
 
==Matrices, tables, frames==
 
 
Subsetting, selecting and filtering
 
 
 
 
&nbsp;
 
==Data manipulations==
 
Transformation
 
Search
 
 
 
&nbsp;
 
==Writing functions==
 
 
 
&nbsp;
 
==Installing new functions==
 
 
 
&nbsp;
 
==Numeric output==
 
 
 
&nbsp;
 
==Graphic output==
 
 
 
 
&nbsp;
 
 
==Notes==
 
 
<references />
 
  
 +
 +
{{Vspace}}
  
  
&nbsp;
+
----
==Further reading and resources==
 
<!-- {{#pmid:21627854}} -->
 
<!-- {{WWW|WWW_UniProt}} -->
 
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 
  
 +
{{Vspace}}
  
&nbsp;
 
 
[[Category:Applied_Bioinformatics]]
 
 +
[[Category:R]]
 
</div>
 
</div>

Latest revision as of 15:52, 8 May 2018
