Difference between revisions of "RPR-Installation"

From "A B C"
m
m
 
(22 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div id="BIO">
+
<div id="ABC">
  <div class="b1">
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;">
 
Installing R and RStudio
 
Installing R and RStudio
  </div>
+
<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; ">
 
+
(Notation; installing R and RStudio; packages; first experiments.)
  {{Vspace}}
+
</div>
 
 
<div class="keywords">
 
<b>Keywords:</b>&nbsp;
 
Notation; installing R and RStudio; project directory; packages; first experiments.
 
 
</div>
 
</div>
  
{{Vspace}}
+
{{Smallvspace}}
  
  
__TOC__
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;">
 
+
<div style="font-size:118%;">
{{Vspace}}
+
<b>Abstract:</b><br />
 
 
 
 
{{LIVE}}
 
 
 
{{Vspace}}
 
 
 
 
 
</div>
 
<div id="ABC-unit-framework">
 
== Abstract ==
 
 
<section begin=abstract />
 
<section begin=abstract />
<!-- included from "../components/RPR-Installation.components.wtxt", section: "abstract" -->
+
This unit works through the installation of '''R''' and RStudio and introduces R's packages of additional functions.
This unit works through the installation of R and RStudio, discusses files and directories on your computer, and introduces R's packages of additional functions.
 
 
<section end=abstract />
 
<section end=abstract />
 
+
</div>
{{Vspace}}
+
<!-- ============================ -->
 
+
<hr>
 
+
<table>
== This unit ... ==
+
<tr>
=== Prerequisites ===
+
<td style="padding:10px;">
<!-- included from "../components/RPR-Installation.components.wtxt", section: "prerequisites" -->
+
<b>Objectives:</b><br />
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
 
You need to complete the following units before beginning this one:
 
*[[ABC-Insights]]
 
*[[FND-Biocomputing_setup]]
 
 
 
{{Vspace}}
 
 
 
 
 
=== Objectives ===
 
<!-- included from "../components/RPR-Installation.components.wtxt", section: "objectives" -->
 
 
This unit will ...
 
This unit will ...
* ... inform you about file- and folder names and paths;
+
<ul>
* ... guide you through first steps for installing R and R Studio on your own computer; and
+
<li>guide you through first steps for installing R and RStudio on your own computer; and</li>
* ... introduce the concept of "packages" to extend R's functionality;
+
<li>introduce the concept of "packages" to extend R's functionality;</li>
 +
</ul>
 +
</td>
 +
<td style="padding:10px;">
 +
<b>Outcomes:</b><br />
 +
After working through this unit you ...
 +
<ul>
 +
<li>have a working installation of R and RStudio and know how to start RStudio;</li>
 +
<li>can find and install packages.</li>
 +
</ul>
 +
</td>
 +
</tr>
 +
</table>
 +
<!-- ============================  -->
 +
<hr>
 +
<b>Deliverables:</b><br />
 +
<section begin=deliverables />
 +
<ul>
 +
<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li>
 +
<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li>
 +
<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li>
 +
</ul>
 +
<section end=deliverables />
 +
<!-- ============================  -->
 +
<hr>
 +
<section begin=prerequisites />
 +
<b>Prerequisites:</b><br />
 +
This unit builds on material covered in the following prerequisite units:<br />
 +
<ul>
 +
<li>[[ABC-Insights]]</li>
 +
<li>[[FND-Biocomputing_setup]]</li>
 +
</ul>
 +
<section end=prerequisites />
 +
<!-- ============================  -->
 +
</div>
  
{{Vspace}}
+
{{Smallvspace}}
  
  
=== Outcomes ===
 
<!-- included from "../components/RPR-Installation.components.wtxt", section: "outcomes" -->
 
After working through this unit you ...
 
* ... have created a project directory for this course or workshop on your computer;
 
* ... have a working installation of R and RStudio and know how to start RStudio;
 
* ... can search for and install packages.
 
  
 +
{{Smallvspace}}
  
{{Vspace}}
 
  
 
+
__TOC__
=== Deliverables ===
 
<!-- included from "../components/RPR-Installation.components.wtxt", section: "deliverables" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
 
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
 
*<b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
 
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 80: Line 75:
  
 
=== Evaluation ===
 
=== Evaluation ===
<!-- included from "../components/RPR-Installation.components.wtxt", section: "evaluation" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 
 
<b>Evaluation: NA</b><br />
 
<b>Evaluation: NA</b><br />
:This unit is not evaluated for course marks.
+
<div style="margin-left: 2rem;">This unit is not evaluated for course marks.</div>
 
 
{{Vspace}}
 
 
 
 
 
</div>
 
<div id="BIO">
 
 
== Contents ==
 
== Contents ==
<!-- included from "../components/RPR-Installation.components.wtxt", section: "contents" -->
 
 
==R==
 
==R==
  
Line 104: Line 90:
  
 
<div class="alert">
 
<div class="alert">
Note: you can't learn a programming language in a single day.
+
Be realistic. One can't learn a programming language in a single day.
  
Work through this material unit by unit, but when you are done, you need constant repetition to bring it into active memory. And make sure you understand every step. Taking shortcuts and/or cramming everything in a single, desperate effort is a waste of your time.
+
Work through this material unit by unit; constant repetition will bring the principles into active memory. Make sure you understand every step. Taking shortcuts and/or cramming everything into a single, desperate effort will not get you far.
 
</div>
 
</div>
  
Line 123: Line 109:
 
'''Bold emphasis''' and <u>underlining</u> are to mark words as particularly important.
 
'''Bold emphasis''' and <u>underlining</u> are to mark words as particularly important.
  
<span class="right">Examples of the <u>right</u> way to do something are highlighted green.</span>
+
<span class="right">Sometimes I highlight examples of the <u>right</u> way to do something in green.</span>
  
<span class="wrong">Examples of the <u>wrong</u> way to do something are highlighted red.</span>
+
<span class="wrong">... and examples of the <u>wrong</u> way to do something may be highlighted red.</span>
  
  
Line 145: Line 131:
 
</div>
 
</div>
  
'''"Metasyntactic variables"''': When I use notation  like <code>&lt;Year&gt;</code> in instructions, you type the year, the whole year and nothing but the year (e.g the four digits '''2017'''). You '''never''' type the angle brackets! I use the angle brackets only to indicate that you should note type '''Year''' literally, but <u>substitute</u> the correct value. You might encounter this notation as <code>&lt;path&gt;</code>, <code>&lt;filename&gt;</code>, <code>&lt;firstname lastname&gt;</code> and similar. To repeat: if I specify
+
'''"Metasyntactic variables"''': When I use notation like <code>&lt;Year&gt;</code> in instructions, you type the year, the whole year and nothing but the year (e.g. the four digits '''2020'''). You '''never''' type the angle brackets! I use the angle brackets only to indicate that you should not type '''Year''' literally, but <u>substitute</u> the correct value. You might encounter this notation as <code>&lt;path&gt;</code>, <code>&lt;filename&gt;</code>, <code>&lt;firstname lastname&gt;</code> and similar. To repeat: if I specify
<your name>
+
<code><your name></code>
... and your name is Elcid Barrett, '''You''' type
+
... and your name is <code>Elcid Barrett</code>, '''You''' type
  Elcid Barrett
+
  <span class="right">&nbsp;<tt>Elcid Barrett</tt>&nbsp;</span>
 
... and not &nbsp;<span class="wrong">&nbsp;<tt>your name</tt>&nbsp;</span>&nbsp; or &nbsp;<span class="wrong">&nbsp;<tt>&lt;Elcid Barrett&gt;</tt>&nbsp;</span>&nbsp; or similar. <small>(Oh the troubles I've seen ...)</small>
 
... and not &nbsp;<span class="wrong">&nbsp;<tt>your name</tt>&nbsp;</span>&nbsp; or &nbsp;<span class="wrong">&nbsp;<tt>&lt;Elcid Barrett&gt;</tt>&nbsp;</span>&nbsp; or similar. <small>(Oh the troubles I've seen ...)</small>
  
  
 
The sample code on this page sometimes copies text from the console, and sometimes shows the actual commands only. The <code>&gt;</code> character at the beginning of the line is always just '''R''''s ''input prompt'', it tells you that you can type something now - you never actually type <code>&gt;</code> at the beginning of a line. If you read:
 
The sample code on this page sometimes copies text from the console, and sometimes shows the actual commands only. The <code>&gt;</code> character at the beginning of the line is always just '''R''''s ''input prompt'', it tells you that you can type something now - you never actually type <code>&gt;</code> at the beginning of a line. If you read:
> getwd()
+
 
 +
<code> > getwd()</code>
 +
 
 
... you need to type:
 
... you need to type:
getwd()
 
  
 +
<code>getwd()</code>
  
If a line starts with <code>[1]</code> or similar, this is '''R''''s '''output''' on the console.<ref><code>[1]</code> means: the following is the first (often only) element of a vector.</ref> The <code>#</code> character marks the following text as a comment which is not executed by '''R'''. These are lines that you '''do not type'''. They are program output, or comments, not commands. If the code says
+
 
 +
If a line starts with <code>[1]</code> or similar, this is '''R''''s '''output''' on the console.<ref><code>[1]</code> means: the following is the first element of a vector - and this is often the only one.</ref> The <code>#</code> character marks the following text as a comment which is not executed by '''R'''. These are lines that you '''do not type'''. They are program output, or comments, not commands.
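
As a small illustration of these conventions, here is a sketch in which the variable names <code>workDir</code> and <code>x</code> are made up for the example and are not part of the course code. The commands are typed, the text after each <code>#</code> is a comment, and the line starting with <code>[1]</code> is what '''R''' prints on the console.

```r
# Type the commands; never type a leading ">" prompt.
workDir <- getwd()       # getwd() returns the current working directory
x <- c(10, 20, 30)       # assign a vector of three numbers to x
print(x)                 # on the console, typing just x does the same
# [1] 10 20 30           # <- this line is R's output, not something you type
```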
  
 
;Characters
 
;Characters
 
:Different characters mean different things for computers, and it is important to call them by their right name.
 
:Different characters mean different things for computers, and it is important to call them by their right name.
 +
 +
* <code> / </code>&nbsp;◁&nbsp;this is a '''forward-slash'''. It leans forward in the reading direction.
 +
* <code> \ </code>&nbsp;◁&nbsp;this is a '''backslash'''. It leans backward in the reading direction.
 
* <code> ( ) </code>&nbsp;◁&nbsp;these are '''parentheses'''.
 
* <code> ( ) </code>&nbsp;◁&nbsp;these are '''parentheses'''.
 
* <code> [ ] </code>&nbsp;◁&nbsp;these are (square) '''brackets'''.
 
* <code> [ ] </code>&nbsp;◁&nbsp;these are (square) '''brackets'''.
Line 168: Line 160:
 
* <code> &nbsp;" </code>&nbsp;◁&nbsp;this, and '''only''' this is a quotation mark or double quote. All of these are not: <span class="wrong"><tt> “”„«» </tt></span>. They will break your code. Especially the first two are often automatically inserted by MSWord and hard to distinguish.<ref>Never, ever edit code in MS Word. Use '''R''' or '''RStudio'''. Actually, don't use notepad or TextEdit either.</ref>
 
* <code> &nbsp;" </code>&nbsp;◁&nbsp;this, and '''only''' this is a quotation mark or double quote. All of these are not: <span class="wrong"><tt> “”„«» </tt></span>. They will break your code. Especially the first two are often automatically inserted by MSWord and hard to distinguish.<ref>Never, ever edit code in MS Word. Use '''R''' or '''RStudio'''. Actually, don't use notepad or TextEdit either.</ref>
 
* <code> &nbsp;' </code>&nbsp;◁&nbsp;this, and '''only''' this is a single quote. All of these are not: <span class="wrong"><tt> ‘’‚‹› </tt></span>. They will break your code. Especially the first two are often automatically inserted by MSWord and hard to distinguish.
 
* <code> &nbsp;' </code>&nbsp;◁&nbsp;this, and '''only''' this is a single quote. All of these are not: <span class="wrong"><tt> ‘’‚‹› </tt></span>. They will break your code. Especially the first two are often automatically inserted by MSWord and hard to distinguish.
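
The difference between straight and typographic quotes can be checked directly in '''R'''. The following is a minimal sketch, not part of the original course code: it asks '''R''' to parse one line of code with each kind of quote and records whether that raises an error. The names <code>good</code> and <code>bad</code> are chosen only for this illustration.

```r
# A straight double quote delimits a string; this parses without complaint.
good <- tryCatch(parse(text = 'x <- "abc"'), error = function(e) e)

# The same line with typographic quotes (written here as \u201c and \u201d
# escapes so that this example itself stays plain ASCII) cannot be parsed.
bad <- tryCatch(parse(text = "x <- \u201cabc\u201d"), error = function(e) e)

inherits(good, "error")   # FALSE: the straight quotes are fine
inherits(bad, "error")    # TRUE: the typographic quotes break the code
```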
 +
 +
{{Smallvspace}}
  
 
MSWord is not useful as a code editor.
 
MSWord is not useful as a code editor.
  
{{Vspace}}
+
{{Smallvspace}}
  
 
==The environment==
 
==The environment==
Line 177: Line 171:
  
 
In this section we discuss how to download and install the software, how to configure an '''R''' session and how to work in the '''R''' environment.
 
In this section we discuss how to download and install the software, how to configure an '''R''' session and how to work in the '''R''' environment.
 
{{Vspace}}
 
 
===Files, directories and paths===
 
 
 
{{task|1=
 
Create a folder (directory) on your computer in which to keep materials for this course (or workshop, as the case may be). Put it into the right place, and give it the right name:
 
 
:'''The right place''' is directly in the <code>Documents</code> folder of your account.
 
 
:'''The right name''' is simply the <code>&lt;Coursecode&gt;</code> e.g. for a CBW workshop in 2016, you call the folder <code>CBW</code>, for a BCH441 course, the name should be <code>BCH441</code>, or you could use my generic "ABC" (A Bioinformatics Course).
 
 
Do not use spaces, hyphens, or any other special characters in your folder name - we have encountered various problems with such names in the past.<ref>'''After''' the course, you can rename / move the directory to whatever, wherever you want, but during the course, I need your files in a predictable location to be able to troubleshoot problems.</ref>
 
 
{{Vspace}}
 
 
I will call this the '''course directory'''. (I use the words "folder" and "directory" synonymously and completely interchangeably.)
 
 
}}
 
 
In my experience, it is better to organize file hierarchies <b>wide, not deep</b>. This means I aim to put more things in one folder rather than create elaborate directory structures. I need to look for stuff a lot, and looking more-or-less in the same folder keeps my files more visible. Files that are tucked away in sub-directories are harder to find.
 
 
A '''filename''' is a label that identifies a file. Often filenames have two parts: the actual name, and an extension. To specify a file on the computer's command line, or in '''R''', you need to specify its full name <u>including the extension</u>. Now, the problem is that Windows lets you switch off the display of extensions; I'm afraid this is actually done by default. Then all hell breaks loose when you are trying to do "real" work. Files can't be found, or worse, can be inadvertently overwritten. '''Never allow your operating system to hide file extensions from you.''' You must be able to see the full name<ref>RStudio is actually very helpful in this regard, since it shows you the </ref>.
 
 
A '''path''' is the complete specification of where a file is located in the directory tree of your computer. Paths are simply directories strung together into a long string, separated by a forward slash "/" (on Mac or Unix) or a backslash "\" on Windows. Take note! When writing Windows paths in '''R''',  you have to use the "wrong" forward slash to specify the path. '''R''' will translate Unix-style paths into Windows-style paths automatically - but the backslash would be interpreted as an "escape" character that gives the following character a special meaning.
 
 
;Folder name and path examples
 
*<span class="right"> <tt>/Users/Pierette/Documents/BCB420</tt></span>&nbsp;&nbsp;◁&nbsp;Looking good on a Mac.
 
*<span class="right"> <tt>C:\Users\Pulcinella\Documents\CBW</tt></span>&nbsp;&nbsp;◁&nbsp;Looking good on a Windows computer.
 
*<span class="right"> <tt>"C:/Users/Pulcinella/Documents/CBW"</tt></span>&nbsp;&nbsp;◁&nbsp;Looking good inside '''R''' on a Windows computer (note the quotation marks!).
 
 
 
*<span class="wrong"> <tt>C:\Users\Pantalone\Documents\BCH1441 (2017)</tt></span>&nbsp;&nbsp;◁&nbsp;Wrong. No special characters please.
 
*<span class="wrong"> <tt>/Users/Brighella/Documents/UofT Stuffz/Courses/more/Comp Sys biol. course</tt></span>&nbsp;&nbsp;◁&nbsp;Wrong. Please read instructions more carefully.
 
*<span class="wrong"> <tt>C:\Users\Tartaglia\Documents\KUWTK\&lt;Coursecode&gt;</tt></span>&nbsp;&nbsp;◁&nbsp;I can't even ...
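
The path rules above can be tried out in '''R''' itself. This is only a sketch under stated assumptions: the user name is taken from the examples in the list, and the file name <code>mySeq.fasta</code> is hypothetical. Nothing here touches the file system; the functions only manipulate the path strings.

```r
# Build a path from its parts; file.path() joins them with forward slashes.
courseDir <- file.path("C:", "Users", "Pulcinella", "Documents", "CBW")
courseDir    # "C:/Users/Pulcinella/Documents/CBW"

# In an R string a single backslash starts an escape sequence, so a
# Windows-style path needs every backslash doubled.
winStyle <- "C:\\Users\\Pulcinella\\Documents\\CBW"

# Both spellings name the same folder.
gsub("\\", "/", winStyle, fixed = TRUE) == courseDir    # TRUE

# File name and extension of a (hypothetical) file in the course directory.
myFile <- file.path(courseDir, "mySeq.fasta")
basename(myFile)          # "mySeq.fasta"
tools::file_ext(myFile)   # "fasta"
```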
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 218: Line 176:
 
===Install '''R'''===
 
===Install '''R'''===
  
 +
{{Smallvspace}}
  
 
{{task|
 
{{task|
# Navigate to [https://cran.r-project.org/ '''CRAN''' (the Comprehensive R Archive Network)]<ref>You can also use one of the mirror sites, if CRAN is down - for example the [http://probability.ca/cran/ mirror site at the University of Toronto]. A choice of mirror sites is listed on the [https://r-project.org '''R'''-project homepage].</ref> and follow the link to '''Download R''' for your computer's operating system.
+
1. Navigate to [https://cran.r-project.org/ '''CRAN''' (the Comprehensive R Archive Network)]<ref>You can also use one of the mirror sites, if CRAN is down - for example the [http://probability.ca/cran/ mirror site at the University of Toronto]. A choice of mirror sites is listed on the [https://r-project.org '''R'''-project homepage].</ref> and follow the link to '''Download R''' for your computer's operating system.<br>
# Download a precompiled binary (or "build") of the R "framework" to your computer and follow the instructions for installing it. Make sure that the program is the correct one for your '''version''' of your operating system.
+
2. Download a precompiled binary (or "build") of the R "framework" to your computer and follow the instructions for installing it. Make sure that the program is the correct one for your '''version''' of your operating system.<br>
# Launch '''R'''.
+
3. Launch '''R'''.
 
}}
 
}}
  
The program should open a window&ndash;this window is called the "R console"&ndash;and greet you with its ''input prompt'', awaiting your input:
+
The program should open a window &ndash; this window is called the "R console" &ndash; and greet you with its ''input prompt'', awaiting your input:
 
  >
 
  >
  
Line 260: Line 219:
 
* A consistent interface across all supported platforms. (Base '''R''' GUIs are not all the same for e.g. Mac OS X and Windows.)
 
* A consistent interface across all supported platforms. (Base '''R''' GUIs are not all the same for e.g. Mac OS X and Windows.)
 
* Code autocompletion in the script editor. (Depending on your point of view this can be a help or an annoyance. I used to hate it. After using it for a while I find it useful.)
 
* Code autocompletion in the script editor. (Depending on your point of view this can be a help or an annoyance. I used to hate it. After using it for a while I find it useful.)
* "Function signaturtes" (a list of named parameters) displayed when you hover over a function name.
+
* "Function signatures" (a list of named parameters) displayed when you hover over a function name.
 
* The ability to set breakpoints for debugging in the script editor.
 
* The ability to set breakpoints for debugging in the script editor.
* Support for [http://yihui.name/knitr/ knitr], [http://www.statistik.lmu.de/~leisch/Sweave/ Sweave], [http://rmarkdown.rstudio.com/ rmarkdown]... <small>(This supports "literate programming" and is actually a big advance in software development)</small>
+
* Support for [http://yihui.name/knitr/ knitr], and [http://rmarkdown.rstudio.com/ rmarkdown]; also support for [http://rmarkdown.rstudio.com/r_notebooks.html R notebooks] ... <small>(This supports {{WP|Literate_programming|"'''literate programming'''"}} and is actually a big advance in software development)</small>
 
* Support for [http://rmarkdown.rstudio.com/r_notebooks.html '''R''' notebooks].
 
* Support for [http://rmarkdown.rstudio.com/r_notebooks.html '''R''' notebooks].
  
Line 269: Line 228:
 
* There are sometimes (rarely) situations where '''R''' functions do not behave in exactly the same way in '''RStudio'''.
 
* There are sometimes (rarely) situations where '''R''' functions do not behave in exactly the same way in '''RStudio'''.
 
* The supported '''R''' version is not always immediately the most recent release.
 
* The supported '''R''' version is not always immediately the most recent release.
 +
 +
{{Smallvspace}}
  
 
{{task|
 
{{task|
Line 284: Line 245:
  
 
===Packages===
 
===Packages===
 +
 +
{{Smallvspace}}
  
 
'''R''' has many powerful functions built in, but one of its greatest features is that it is easily extensible. Extensions have been written by legions of scientists for many years, most commonly in the '''R''' programming language itself, and made available through [http://cran.r-project.org/ '''CRAN'''&ndash;The Comprehensive R Archive Network] or through the [http://www.bioconductor.org '''Bioconductor project'''].
 
'''R''' has many powerful functions built in, but one of its greatest features is that it is easily extensible. Extensions have been written by legions of scientists for many years, most commonly in the '''R''' programming language itself, and made available through [http://cran.r-project.org/ '''CRAN'''&ndash;The Comprehensive R Archive Network] or through the [http://www.bioconductor.org '''Bioconductor project'''].
  
A package is a collection of code, documentation and (often) sample data. To use packages, you need to install them (once), and add them to your current session (for every new session). You can get an overview of installed and loaded packages by opening the '''Package Manager''' window from the '''Packages & Data''' Menu item. It gives a list of available packages you currently have ''installed'', and identifies those that have been ''loaded'' at startup, or interactively.
+
A package is a collection of code, documentation and (often) sample data. To use packages, you need to '''install''' the package (once). You can then use all of the package's functions by prefixing them with the package name and a double colon (e.g. <code>package::function()</code>); that's the preferred way. Or you can load all of the package's functions with a <code>library(package)</code> command, and then use the functions without a prefix. That's less typing, but it's also less explicit and you may end up constantly wondering where exactly a particular function came from. In the teaching code for this course, I use the <code>package::function()</code> idiom wherever possible, since it is more explicit.
 +
 
 +
To repeat:<br />
 +
<ul>
 +
<li><code>install.packages("<package-name>")</code> downloads the package files from '''CRAN''' and places them in the appropriate location on your computer.</li>
 +
<li><code>packagename::function()</code> is usually the preferred idiom to use functionality that is provided by a package.</li>
 +
<li>For some packages, or when you use a particular package '''a lot''', you can "load" the package with <code> library(packagename)</code> (note: '''no''' quotation marks in this case.) Then you can use the functions simply by typing <code>function()</code>.</li>
 +
</ul>
 +
 
 +
You can get an overview of installed and loaded packages by opening the '''Package Manager''' window from the '''Packages & Data''' Menu item. It gives a list of available packages you currently have ''installed'', and identifies those that have been ''loaded'' at startup, or interactively. But note, a package does not have to be loaded to be used.
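
The install-once, then-prefix pattern described above can be sketched as follows. This is an illustration under an assumption, not the course's own code: the package <code>tools</code> stands in for any package name because it is installed together with '''R''', so the example runs without a network connection and the <code>install.packages()</code> branch is not actually taken.

```r
pkg <- "tools"    # stand-in package name; "tools" is installed with R itself

# Install only if the package is not there yet (needs to be done once).
if (! requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Preferred: call the function with an explicit package prefix.
tools::file_path_sans_ext("mySeq.fasta")    # "mySeq"

# Alternative: attach the package, then call the function without prefix.
library(tools)
file_path_sans_ext("mySeq.fasta")           # "mySeq"
```

Note that <code>install.packages()</code> takes the name as a quoted string, whereas <code>library()</code> is commonly written with the bare package name.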
  
 
{{Vspace}}
 
{{Vspace}}
Line 294: Line 266:
 
* Navigate to http://cran.r-project.org/web/packages/ and read the page.
 
* Navigate to http://cran.r-project.org/web/packages/ and read the page.
 
* Navigate to http://cran.r-project.org/web/views/ (the '''curated''' CRAN task-views).
 
* Navigate to http://cran.r-project.org/web/views/ (the '''curated''' CRAN task-views).
* Follow the link to [http://cran.r-project.org/web/views/Genetics.html '''Genetics'''] and read the synopsis of available packages. The library {{c|seqinr}} sounds useful, but check first whether it is already installed.
+
* Follow the link to [http://cran.r-project.org/web/views/Genetics.html '''Genetics'''] and read the synopsis of available packages. The library {{c|seqinr}} sounds useful, but check first whether it has been installed.
{{console
+
 
|{{c|library()}} opens a window of installed packages in the library; {{c|search()}} shows which ones are currently loaded.
+
<small>
|> library()
+
{{c|library()}} opens a window that lists the packages that are installed on your computer; {{c|search()}} shows which ones are currently loaded.
 +
</small>
 +
<pre>
 +
> library()
 
> search()
 
> search()
 
  [1] ".GlobalEnv"        "tools:RGUI"        "package:stats"    "package:graphics"
 
  [1] ".GlobalEnv"        "tools:RGUI"        "package:stats"    "package:graphics"
 
  [5] "package:grDevices" "package:utils"    "package:datasets"  "package:methods"
 
  [5] "package:grDevices" "package:utils"    "package:datasets"  "package:methods"
 
  [9] "Autoloads"        "package:base"
 
  [9] "Autoloads"        "package:base"
}}
+
</pre>
 
 
  
 
* In the '''Packages''' tab of the lower-right pane in RStudio, confirm that  {{c|seqinr}} is not yet installed.
 
* In the '''Packages''' tab of the lower-right pane in RStudio, confirm that  {{c|seqinr}} is not yet installed.
 
* Follow the link to [http://cran.r-project.org/web/packages/seqinr/index.html seqinr] to see what standard information is available with a package. Then follow the link to the [http://cran.r-project.org/web/packages/seqinr/seqinr.pdf '''Reference manual'''] to access the documentation pdf. This is also sometimes referred to as a "vignette" and contains usage hints and sample code.
 
* Follow the link to [http://cran.r-project.org/web/packages/seqinr/index.html seqinr] to see what standard information is available with a package. Then follow the link to the [http://cran.r-project.org/web/packages/seqinr/seqinr.pdf '''Reference manual'''] to access the documentation pdf. This is also sometimes referred to as a "vignette" and contains usage hints and sample code.
{{console
 
| Read the help for  {{c|vignette}}. Note that there is a command to extract '''R''' sample code from a vignette, to experiment with it.
 
|> ?vignette
 
}}
 
  
{{console
+
<small>
| Install {{c|seqinr}} from the closest CRAN mirror and load it for this session. Explore some functions.
+
Read the help for  {{c|vignette}}. Note that there is a command to extract '''R''' sample code from a vignette, to experiment with it.
|> ??install
+
</small>
 +
<pre>
 +
> ?vignette
 +
</pre>
 +
 
 +
<small>
 +
Install {{c|seqinr}} from the closest CRAN mirror and load it for this session. Explore some functions.
 +
</small>
 +
<pre>
 +
> ??install
 
> ?install.packages
 
> ?install.packages
 
> install.packages("seqinr")  # Note: the parameter is a quoted string!
 
> install.packages("seqinr")  # Note: the parameter is a quoted string!
Line 333: Line 312:
 
/var/folders/mx/ld0hdst54jjf11hpcjh8snfr0000gn/T//Rtmpsy5GMx/downloaded_packages
 
/var/folders/mx/ld0hdst54jjf11hpcjh8snfr0000gn/T//Rtmpsy5GMx/downloaded_packages
  
> library(seqinr)    # This parameter is an R object (an installed package). No quotes here...
+
> library(help="seqinr")
> library(help{{=}}"seqinr")
 
 
> ls("package:seqinr")
 
> ls("package:seqinr")
 
   [1] "a"                      "aaa"                    "AAstat"
 
   [1] "a"                      "aaa"                    "AAstat"
Line 341: Line 319:
 
[205] "where.is.this.acc"      "words"                  "words.pos"
 
[205] "where.is.this.acc"      "words"                  "words.pos"
 
[208] "write.fasta"            "zscore"
 
[208] "write.fasta"            "zscore"
> ?a
+
> ?seqinr::a
> a("Tyr")
+
> seqinr::a("Tyr")
 
[1] "Y"
 
[1] "Y"
> choosebank()
+
> seqinr::words(3, c("A", "G", "C", "U"))
  [1] "genbank"       "embl"         "emblwgs"       "swissprot"     "ensembl"
+
  [1] "AAA" "AAG" "AAC" "AAU" "AGA" "AGG" "AGC" "AGU" "ACA" "ACG" "ACC" "ACU" "AUA" "AUG"
    [...]
+
[15] "AUC" "AUU" "GAA" "GAG" "GAC" "GAU" "GGA" "GGG" "GGC" "GGU" "GCA" "GCG" "GCC" "GCU"
[31] "refseqViruses"
+
[29] "GUA" "GUG" "GUC" "GUU" "CAA" "CAG" "CAC" "CAU" "CGA" "CGG" "CGC" "CGU" "CCA" "CCG"
 
+
[43] "CCC" "CCU" "CUA" "CUG" "CUC" "CUU" "UAA" "UAG" "UAC" "UAU" "UGA" "UGG" "UGC" "UGU"
}}
+
[57] "UCA" "UCG" "UCC" "UCU" "UUA" "UUG" "UUC" "UUU"
 +
</pre>
  
 
}}
 
}}
Line 366: Line 345:
 
::<span style="color:#EE0000;"><code>package ‘XYZ’ is not available (for R version 3.2.2)</code></span>
 
::<span style="color:#EE0000;"><code>package ‘XYZ’ is not available (for R version 3.2.2)</code></span>
 
:This can mean several things:
 
:This can mean several things:
*The package is not available on CRAN. Try Bioconductor instead or Google for the name to find it.
+
* The package is not available on CRAN. Try Bioconductor instead or Google for the name to find it.
*The package requires a newer version of '''R''' than the one you have. Upgrade, or see if a legacy version exists.
+
* The package requires a newer version of '''R''' than the one you have. Upgrade, or see if a legacy version exists.
*A comprehensive set of reasons and their resolution is [http://stackoverflow.com/questions/25721884/how-should-i-deal-with-package-xxx-is-not-available-warning '''here''' on stackoverflow].
+
* A comprehensive set of reasons and their resolution is [http://stackoverflow.com/questions/25721884/how-should-i-deal-with-package-xxx-is-not-available-warning '''here''' on stackoverflow].
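
To narrow down which of these reasons applies, it helps to know which version of '''R''' is running. The following is a minimal sketch; the version number in <code>needed</code> is a hypothetical requirement, taken here from the example error message above.

```r
R.version.string                     # full version string of the running R
myVersion <- getRversion()           # the same, as a comparable version object
needed <- "3.2.2"                    # hypothetical minimum a package asks for
myVersion >= needed                  # TRUE if the installed R is new enough
```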
  
  
Line 378: Line 357:
 
:<span style="color:#EE0000;"><code>incorrect values of 'indent' and 'width'</code></span>
 
:<span style="color:#EE0000;"><code>incorrect values of 'indent' and 'width'</code></span>
  
Anecdotally this was due to a previous installation problem with a mixup of 32-bit and 64-bit '''R''' versions, although another student told us that the problem simply went away when trying the command again. Whatever: Make sure you have the right ''''R''' version installed for your operating system. Uninstall and reinstall when in doubt. Conflicting libraries '''can''' be the source of strange misbehaviour.
+
Anecdotally this was due to a previous installation problem with a mixup of 32-bit and 64-bit '''R''' versions, although another student told us that the problem simply went away when trying the command again. Whatever: Make sure you have the right '''R''' version installed for your operating system. Uninstall and reinstall when in doubt. Conflicting libraries '''can''' be the source of strange misbehaviour.
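
Whether the running '''R''' is a 32-bit or a 64-bit build, and where its packages are installed, can be checked from the console. This is a sketch for troubleshooting, not a fix for the error above; the variable name <code>bits</code> is chosen only for this example.

```r
R.version$arch                       # architecture, e.g. "x86_64" for a 64-bit build
bits <- 8 * .Machine$sizeof.pointer  # 64 for a 64-bit R, 32 for a 32-bit R
bits
.libPaths()                          # the folders in which packages are installed
```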
  
  
Line 388: Line 367:
 
{{task|
 
{{task|
  
* The fact that these methods work, shows that the package has been downloaded, installed, the library has been loaded and its functions and data are now available in the current environment. Just like many other packages, {{c|seqinr}} comes with a number of data files. Try:
+
* The fact that these methods work, shows that the package has been downloaded, installed, its functions are now available with the package name prefix and any datasets it contains can be loaded. Just like many other packages, {{c|seqinr}} comes with a number of datafiles. Try:
  
<source lang="rsplus">
+
<pre>
 
?data
 
?data
data(package="seqinr")   # list the available data
+
data(package="seqinr")           # list the available data
data(aaindex)           # load ''aaindex''
+
data(aaindex, package="seqinr")   # load ''aaindex''
?aaindex                 # what is this?
+
?aaindex                         # what is this?
aaindex$FASG890101       # two of the indices ...
+
aaindex$FASG890101               # two of the indices ...
 
aaindex$PONJ960101
 
aaindex$PONJ960101
  
# Lets use the data: plot amino acid codes by hydrophobicity and volume
+
# Lets use the data: plot amino acid single-letter codes by hydrophobicity
 +
# and volume. The values come from the dataset. Copy and paste the commands,
 +
# we'll discuss them in detail later.
  
 
plot(aaindex$FASG890101$I,
 
plot(aaindex$FASG890101$I,
Line 407: Line 388:
 
     labels=a(names(aaindex$FASG890101$I)))
 
     labels=a(names(aaindex$FASG890101$I)))
  
</source>
+
</pre>
 +
 
 +
 
 +
* Now, just for fun, let's use functions from the seqinr package to download a sequence and calculate some statistics (however, not to digress too far, without further explanation at this point). Copy the code below and paste it into the '''R'''-console.
  
 +
<pre>
 +
seqinr::choosebank("swissprot")
 +
mySeq <- seqinr::query("mySeq", "N=MBP1_YEAST")
 +
mbp1 <- seqinr::getSequence(mySeq)
 +
seqinr::closebank()
 +
x <- seqinr::AAstat(mbp1[[1]])
 +
barplot(sort(x$Compo), cex.names = 0.6)
 +
</pre>
  
* Now, just for fun, let's use these functions to download a sequence and calculate some statistics (however, not to digress too far, without further explanation at this point). Copy the code below and paste it into the '''R'''-console
+
We could have "loaded" the package with {{c|library()}}, and then used the functions without prefix. Less typing, but also less explicit.
  
<source lang="rsplus">
+
<pre>
 +
library(seqinr)
 
choosebank("swissprot")
 
choosebank("swissprot")
 
mySeq <- query("mySeq", "N=MBP1_YEAST")
 
mySeq <- query("mySeq", "N=MBP1_YEAST")
Line 418: Line 411:
 
closebank()
 
closebank()
 
x <- AAstat(mbp1[[1]])
 
x <- AAstat(mbp1[[1]])
barplot(sort(x$Compo))
+
barplot(sort(x$Compo), cex.names = 0.6)
</source>
+
</pre>
 +
 
 +
In general we will be using the idiom '''with''' the package prefix throughout the course.
 +
 
 
}}
 
}}
  
The function {{c|require()}} is similar to {{c|library()}}, but it does not produce an error when it fails because the package has not been installed. It simply returns {{c|TRUE}} if successful or {{c|FALSE}} if not. If the library has already been loaded, it does nothing. Therefore I usually use the following code paradigm in my '''R''' scripts to avoid downloading the package every time I need to run a script<ref>The parameter <code>quietly = TRUE</code> turns off feedback from the function when a script is <code>source()</code>'d. It does not work in interactive scripts. Thus, when you execute this piece of code interactively, you will see a <code>Warning:</code> ... which you can ignore.</ref>:
 
  
<source lang="rsplus">
+
The function {{c|requireNamespace()}} is useful because it does not produce an error when a package has not been installed. It simply returns {{c|TRUE}} if successful or {{c|FALSE}} if not. Therefore one can use the following code idiom in '''R''' scripts to avoid downloading the package every time the script is called.
if (!require(seqinr, quietly = TRUE)) {
 
    install.packages("seqinr")
 
    library(seqinr)
 
}
 
</source>
 
  
Note that {{c|install.packages()}} takes a (quoted) string as its argument, but {{c|library()}} takes a variable name (without quotes). New users usually get this wrong :-)
+
<pre>
 
+
if (! requireNamespace("seqinr", quietly=TRUE)) {
One of the challenges of working with '''R''' is the overabundance of options. To find the right package that contains a particular function you might be looking for could be tricky, but there is a package to help you do that. Try this:
+
  install.packages("seqinr")
<source lang="rsplus">
 
if (!require(sos, quietly=TRUE)) {
 
    install.packages("sos")
 
    library(sos)
 
 
}
 
}
 +
# You can get package information with the following commands:
 +
library(help = seqinr)      # basic information
 +
browseVignettes("seqinr")    # available vignettes
 +
data(package = "seqinr")    # available datasets
  
findFn("moving average")
+
</pre>
 
 
  
</source>
+
'''Note''' that {{c|install.packages()}} takes a (quoted) string as its argument, but {{c|library()}} takes a variable name (without quotes). New users usually get this wrong :-)
  
 +
'''Note''' that the '''Bioconductor''' project has its own installation system, the {{c|BiocManager::install()}} function. It is explained [https://bioconductor.org/install/ '''here'''].
  
A good way to find packages in CRAN is also a keyword search on the '''Metacran''' site. Try this link:
+
'''Note''', just to mention it at this point: to install packages that are not on CRAN or Bioconductor, you need the [https://www.rstudio.com/products/rpackages/devtools/ '''devtools'''] package.
:http://www.r-pkg.org/search.html?q=regex
 
 
 
 
 
 
 
Note that the '''Bioconductor''' project has its own installation system, the {{c|bioclite()}} function. It is explained [http://www.bioconductor.org/install/ '''here'''].
 
  
 
{{Vspace}}
 
{{Vspace}}
  
 +
====Finding packages====
  
{{Vspace}}
+
One of the challenges of working with '''R''' is the overabundance of options. CRAN has over 10,000 packages and Bioconductor has over 1,300 more. How can you find ones that are useful to your work? There's actually a package to help you do that, the [https://cran.r-project.org/web/packages/sos/ '''sos''' package] on CRAN. Try this:
  
 +
<pre>
 +
if (! requireNamespace("sos", quietly=TRUE)) {
 +
    install.packages("sos")
 +
}
 +
library(help = sos)      # basic information
 +
browseVignettes("sos")    # available vignettes
  
== Further reading, links and resources ==
+
sos::findFn("moving average")
 
 
<div class="reference-box">{{WP|R (programming language)|Wikipedia article}} on the R statistics environment and programming language</div>
 
<div class="reference-box">[http://www.r-project.org/ The '''R project''' homepage]</div>
 
<div class="reference-box">[http://cran.r-project.org/ '''CRAN'''&ndash;The Comprehensive R Archive Network]</div>
 
<div class="reference-box">[http://www.bioconductor.org/ The '''Bioconductor project''' homepage]</div>
 
<div class="reference-box">[http://www.r-bloggers.com/ '''R''' bloggers]</div>
 
<div class="reference-box">[http://data-mining-tutorials.blogspot.ca/2011/08/data-mining-with-r-rattle-package.html Tutorial on the '''R Rattle''' GUI package]</div>
 
<div class="reference-box">[http://www.rstudio.com/ide/ The '''R Studio''' IDE]</div>
 
 
 
  
{{Vspace}}
+
</pre>
  
  
== Notes ==
+
;Or ...
<!-- included from "../components/RPR-Installation.components.wtxt", section: "notes" -->
+
* Read a CRAN [https://cran.r-project.org/web/views/ '''Task View'''] for your area of interest ...
<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
+
* ... or the [https://www.bioconductor.org/packages/devel/BiocViews.html '''Bioconductor Views'''];
<references />
+
* Search on "Metacran" ([http://www.r-pkg.org/search.html?q=regex "regex" example here]) ...
 +
* ... or "MRAN" ([https://mran.microsoft.com/packages/?search=regex "regex" example here], note that the results are not identical);
 +
* ... and, as always, [http://lmgtfy.com/?q=r+regex+package Google].
  
 
{{Vspace}}
 
{{Vspace}}
  
 
</div>
 
<div id="ABC-unit-framework">
 
 
== Self-evaluation ==
 
== Self-evaluation ==
<!-- included from "../components/RPR-Installation.components.wtxt", section: "self-evaluation" -->
 
 
<!--
 
<!--
 
=== Question 1===
 
=== Question 1===
Line 504: Line 485:
  
 
What is the purpose of this code?
 
What is the purpose of this code?
<source lang="rsplus">
+
<pre>
if (!require(seqinr, quietly = TRUE)) {
+
if (! requireNamespace("seqinr", quietly = TRUE)) {
 
     install.packages("seqinr")
 
     install.packages("seqinr")
    library(seqinr)
 
 
}
 
}
</source>
+
</pre>
  
  
 
Why not just use ...
 
Why not just use ...
<source lang="rsplus">
+
<pre>
 
   install.packages("seqinr")
 
   install.packages("seqinr")
  library(seqinr)
+
</pre>
</source>
+
... in your script instead?
... instead?
 
  
 
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
 
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
 
Answer ...
 
Answer ...
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
This code paradigm is useful in scripts, to ensure a package is installed before we try to load it as a library. If we would simply use <code>install.packages("seqinr")</code>, the package would be downloaded from CRAN everytime the script is run. That would make our script slow, and require available internet access for the script to run.
+
This code idiom is useful in scripts, to ensure a package is installed before we try to use its functions. If we would simply use <code>install.packages("seqinr")</code>, the package would be downloaded from CRAN every time the script is run. That would make our script slow, and require available internet access for the script to run.
  
In the code above, the package is downloaded <b>only</b> when <code>require()</code> returns <code>FALSE</code>, (and therefor <code>!require()</code> is <code>TRUE</code>), which presumably means the package has not yet ever been downloaded.
+
In the code above, the package is downloaded <b>only</b> when <code>requireNamespace()</code> returns <code>FALSE</code>, which presumably means the package has not yet been downloaded.
  
 
</div>
 
</div>
Line 530: Line 509:
  
 
   {{Vspace}}
 
   {{Vspace}}
 +
== Further reading, links and resources ==
 +
 +
<div class="two-col"><!-- BEGIN two col block -->
 +
 +
<div class="resource">
 +
<div class="name">[https://en.wikipedia.org/wiki/R_(programming_language) '''R''' on Wikipedia]</div>
 +
<div class="content">Wikipedia article on the R statistics environment and programming language.</div>
 +
</div>
 +
 +
 +
<div class="resource">
 +
<div class="name">[http://www.r-project.org/ The '''R project''']</div>
 +
<div class="content">Homepage of R for development, resources and, most importantly, download of code and documentation.</div>
 +
</div>
 +
 +
 +
<div class="resource">
 +
<div class="name">[http://www.rstudio.com/ide/ The '''R Studio''' IDE]</div>
 +
<div class="content">The IDE (Integrated Development Environment) that is the ''de facto'' standard for R programming and the development of code, projects, packages, and documentation.</div>
 +
</div>
 +
 +
 +
<div class="resource">
 +
<div class="name">[http://cran.r-project.org/ '''CRAN''']</div>
 +
<div class="content">The Comprehensive R Archive Network</div>
 +
</div>
 +
 +
 +
<div class="resource">
 +
<div class="name">[http://www.bioconductor.org/ The '''Bioconductor project''' homepage]</div>
 +
<div class="content">BioConductor is a bit like CRAN for bioinformatics and computational biology. The most important computational advances in our field are available from here. There is a special focus on high-throughput analysis, and a specific mental model of how data, code and workflows all come together.</div>
 +
</div>
 +
 +
 +
<div class="resource">
 +
<div class="name">[http://www.r-bloggers.com/ '''R''' bloggers]</div>
 +
<div class="content">A digest of new blog-posts on R - from the introductory to the highly advanced. Sent out once every day or two. Really worthwhile subscription.</div>
 +
</div>
  
{{Vspace}}
 
  
 +
<div class="resource">
 +
<div class="name">[http://blog.revolutionanalytics.com/2017/01/cran-10000.html Package finding strategies]</div>
 +
<div class="content">(Revolutions Analytics Blog)</div>
 +
</div>
  
  
{{Vspace}}
+
<div class="resource">
 +
<div class="name">[https://www.datacamp.com/community/tutorials/r-packages-guide Intro to R packages]</div>
 +
<div class="content">(at DataCamp)</div>
 +
</div>
  
  
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
+
<div class="resource">
 +
<div class="name">[https://stackoverflow.blog/2017/10/10/impressive-growth-r/ "The Impressive Growth of R"]</div>
 +
<div class="content">(Stackoverflow Data Analytics Team Blog)</div>
 +
</div>
  
----
 
  
{{Vspace}}
+
<div class="resource">
 +
<div class="name">[http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005871 '''Ten simple rules''' for biologists learning to program]</div>
 +
<div class="content">Carey and Papin advise novice biologist programmers how to begin. Much of this paper resonates well with our Introduction to R learning units. Good context for a beginning, to get a sense of where we are going with this.</div>
 +
</div>
  
<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
+
</div><!-- END two col block -->
  
----
+
== Notes ==
 +
<references />
  
 
{{Vspace}}
 
{{Vspace}}
 +
  
 
<div class="about">
 
<div class="about">
Line 558: Line 588:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-08-05
+
:2020-09-17
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:0.1
+
:1.1.1
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
*0.1 First stub
+
*1.1.1 Introduce two-column layout
 +
*1.1 Change from require() to requireNamespace() and use &lt;package&gt;::&lt;function&gt;() idiom.
 +
*1.02 Maintenance
 +
*1.0.1 Removed mention of Sweave - obsolete, and broken link. Added mention of "literate programming".
 +
*1.0 Completed to first live version
 +
*0.1 Material collected from previous tutorial
 
</div>
 
</div>
[[Category:ABC-units]]
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 
  
 
{{CC-BY}}
 
{{CC-BY}}
  
 +
[[Category:ABC-units]]
 +
{{UNIT}}
 +
{{LIVE}}
 
</div>
 
</div>
 
<!-- [END] -->
 
<!-- [END] -->

Latest revision as of 09:28, 25 September 2020

Installing R and RStudio

(Notation; installing R and RStudio; packages; first experiments.)


 


Abstract:

This unit works through the installation of R and RStudio and introduces R's packages of additional functions.


Objectives:
This unit will ...

  • guide you through first steps for installing R and R Studio on your own computer; and
  • introduce the concept of "packages" to extend R's functionality;

Outcomes:
After working through this unit you ...

  • have a working installation of R and RStudio and know how to start RStudio;
  • can find and install packages.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:
This unit builds on material covered in the following prerequisite units:


 



 



 


Evaluation

Evaluation: NA

This unit is not evaluated for course marks.

R

 

Introduction

The R statistics environment and programming language is an exceptionally well engineered, free (as in free speech) and free (as in free beer) platform for data manipulation and analysis. The number of functions that are included by default is large, and a very large number of additional, community-generated analysis modules can simply be imported from dedicated sites (e.g. the Bioconductor project for molecular biology data) or via the CRAN network. Whatever function is not available can be easily programmed. The ability to filter and manipulate data to prepare it for analysis is an absolute requirement in research-centric fields such as ours, where the strategies for analysis are constantly shifting and prepackaged solutions become obsolete almost faster than they can be developed. Besides numerical analysis, R has very powerful and flexible functions for plotting graphical output.


 

Be realistic. One can't learn a programming language in a single day.

Work through this material unit by unit; constant repetition will bring the principles into active memory. Make sure you understand every step. Taking shortcuts and/or cramming everything into a single, desperate effort will not get you far.


 

Before you begin: Notation and Formatting

In this tutorial, I use specific notation and formatting to mean different things.

If you see footnotes[1], click on the number to read more.

This is normal text for explanations. It is written in a proportionally spaced font.

Code formatting is for code examples, file- and function names, directory paths etc. Code is written in a monospaced font[2].

Bold emphasis and underlining are to mark words as particularly important.

Sometimes I highlight examples of the right way to do something in green.

... and examples of the wrong way to do something may be highlighted red.


Task:
Tasks and exercises are described in boxes with a blue background. You have to do them, they are not optional. If you have problems, you must contact your instructor, or discuss the issue on the mailing list. Don't simply continue. All material builds on previous material, and evaluation is cumulative.

What could possibly go wrong? ... Click to expand.


These sections have information about issues I encounter more frequently. They are required reading when you need to troubleshoot problems but also give background information that may be useful to avoid problems in the first place.

Click to collapse.

"Metasyntactic variables": When I use notation like <Year> in instructions, you type the year, the whole year and nothing but the year (e.g. the four digits 2020). You never type the angle brackets! I use the angle brackets only to indicate that you should not type Year literally, but substitute the correct value. You might encounter this notation as <path>, <filename>, <firstname lastname> and similar. To repeat: if I specify <your name> ... and your name is Elcid Barrett, you type

 Elcid Barrett 

... and not   your name   or   <Elcid Barrett>   or similar. (Oh the troubles I've seen ...)


The sample code on this page sometimes copies text from the console, and sometimes shows the actual commands only. The > character at the beginning of the line is always just R's input prompt, it tells you that you can type something now - you never actually type > at the beginning of a line. If you read:

> getwd()

... you need to type:

getwd()


If a line starts with [1] or similar, this is R's output on the console.[3] The # character marks the following text as a comment which is not executed by R. These are lines that you do not type. They are program output, or comments, not commands.

Characters
Different characters mean different things for computers, and it is important to call them by their right name.
  • /  ◁ this is a forward-slash. It leans forward in the reading direction.
  • \  ◁ this is a backslash. It leans backward in the reading direction.
  • ( )  ◁ these are parentheses.
  • [ ]  ◁ these are (square) brackets.
  • < >  ◁ these are angle brackets.
  • { }  ◁ these are (curly) braces.
  •  "  ◁ this, and only this is a quotation mark or double quote. All of these are not: “”„«» . They will break your code. Especially the first two are often automatically inserted by MSWord and hard to distinguish.[4]
  •  '  ◁ this, and only this is a single quote. All of these are not: ‘’‚‹› . They will break your code. Especially the first two are often automatically inserted by MSWord and hard to distinguish.
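To see why the exact character matters, here is a small sketch (an illustration added for this point, not one of the course tasks). It asks R to parse the same assignment twice: once with straight double quotes, and once with the typographic quotes that a word processor might insert. The variable names good and bad are made up for this example.

```r
# A line of code with plain double quotes parses without problems.
good <- tryCatch(parse(text = 'x <- "abc"'),
                 error = function(e) e)

# The same line with typographic ("curly") quotes, written here as the
# Unicode escapes \u201C and \u201D, cannot be parsed.
bad <- tryCatch(parse(text = "x <- \u201Cabc\u201D"),
                error = function(e) e)

inherits(good, "expression")   # the straight-quote version is valid code
inherits(bad, "error")         # the curly-quote version is a syntax error
```

If pasted code fails with an "unexpected input" message, checking the quotation marks is a good first step.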


 

MSWord is not useful as a code editor.


 

The environment

In this section we discuss how to download and install the software, how to configure an R session and how to work in the R environment.


 

Install R

 

Task:

1. Navigate to CRAN (the Comprehensive R Archive Network)[5] and follow the link to Download R for your computer's operating system.
2. Download a precompiled binary (or "build") of the R "framework" to your computer and follow the instructions for installing it. Make sure that the program is the correct one for your version of your operating system.
3. Launch R.

The program should open a window – this window is called the "R console" – and greet you with its input prompt, awaiting your input:

>

Task:

Once you see that R is running correctly, you may quit the program for now.


What could possibly go wrong?...


I can't install R.
Make sure that the version you downloaded is the right one for your operating system. Also make sure that you have the necessary permissions on your computer to install new software.


 

Install RStudio

RStudio is a free IDE (Integrated Development Environment) for R. RStudio is a wrapper[6] for R and as far as basic R is concerned, all the underlying functions are the same, only the user interface is different (and there are a few additional functions that are very useful e.g. for managing projects).

Here is a small list of differences between R and RStudio.

pros (some pretty significant ones actually)
  • Integrated version control.
  • Support for "projects" that package scripts and other assets.
  • Syntax-aware code colouring.
  • A consistent interface across all supported platforms. (Base R GUIs are not all the same for e.g. Mac OS X and Windows.)
  • Code autocompletion in the script editor. (Depending on your point of view this can be a help or an annoyance. I used to hate it. After using it for a while I find it useful.)
  • "Function signatures" (a list of named parameters) displayed when you hover over a function name.
  • The ability to set breakpoints for debugging in the script editor.
  • Support for knitr, rmarkdown and R notebooks. (This supports "literate programming" and is actually a big advance in software development.)
cons (all minor actually)
  • The tiled interface uses more desktop space than the windows of the R GUI.
  • There are sometimes (rarely) situations where R functions do not behave in exactly the same way in RStudio.
  • The supported R version is not always immediately the most recent release.


 

Task:

  • Navigate to the RStudio Website.
  • Find the right version of the RStudio Desktop installer for your computer, download it and install the software.
  • Open RStudio.
  • Focus on the bottom left pane of the window, this is the "console" pane.
  • Type getwd().

This prints out the path of the current working directory. Make a (mental) note where this is. We usually need to change this "default directory" to a project directory.
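As a minimal sketch of what changing the working directory looks like, the commands below create a directory, switch to it, and switch back. The directory name myProject is hypothetical, and tempdir() is used only so that the example leaves nothing behind on your computer; a real project directory would live somewhere of your own choosing.

```r
oldDir <- getwd()                     # remember where we started

# Hypothetical project directory inside R's temporary directory.
projectDir <- file.path(tempdir(), "myProject")
dir.create(projectDir, showWarnings = FALSE)

setwd(projectDir)                     # change the working directory
getwd()                               # confirm the change
setwd(oldDir)                         # ... and change back
```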



 

Packages

 

R has many powerful functions built in, but one of its greatest features is that it is easily extensible. Extensions have been written by legions of scientists for many years, most commonly in the R programming language itself, and made available through CRAN–The Comprehensive R Archive Network or through the Bioconductor project.

A package is a collection of code, documentation and (often) sample data. To use packages, you need to install the package (once). You can then use all of the package's functions by prefixing them with the package name and a double colon (e.g. package::function()); that's the preferred way. Or you can load all of the package's functions with a library(package) command, and then use the functions without a prefix. That's less typing, but it's also less explicit and you may end up constantly wondering where exactly a particular function came from. In the teaching code for this course, I use the package::function() idiom wherever possible, since it is more explicit.

To repeat:

  • install.packages("<package-name>") downloads the package files from CRAN and places them in the appropriate location on your computer.
  • packagename::function() is usually the preferred idiom to use functionality that is provided by a package.
  • For some packages, or when you use a particular package a lot, you can "load" the package with library(packagename) (note: no quotation marks in this case.) Then you can use the functions simply by typing function().

You can get an overview of installed and loaded packages by opening the Package Manager window from the Packages & Data Menu item. It gives a list of available packages you currently have installed, and identifies those that have been loaded at startup, or interactively. But note, a package does not have to be loaded to be used.
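The difference between "installed", "loaded" and "used with a prefix" can be sketched with a package that ships with every R installation, so nothing needs to be downloaded. The choice of the tools package and of its function toTitleCase() is only for illustration.

```r
# Is the package installed? "tools" ships with every R installation.
"tools" %in% rownames(installed.packages())

# Use one of its functions with the package prefix. This works without
# calling library(tools) first.
title <- tools::toTitleCase("installing packages")
title

# Using the prefix loads the package's namespace in the background ...
isNamespaceLoaded("tools")

# ... whereas search() lists the packages that were attached with library().
search()
```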


 

Task:

library() opens a window that lists the packages that are installed on your computer; search() shows which ones are currently loaded.

> library()
> search()
 [1] ".GlobalEnv"        "tools:RGUI"        "package:stats"     "package:graphics"
 [5] "package:grDevices" "package:utils"     "package:datasets"  "package:methods"
 [9] "Autoloads"         "package:base"
 
  • In the Packages tab of the lower-right pane in RStudio, confirm that seqinr is not yet installed.
  • Follow the link to seqinr to see what standard information is available with a package. Then follow the link to the Reference manual to access the documentation pdf. This is also sometimes referred to as a "vignette" and contains usage hints and sample code.

Read the help for vignette. Note that there is a command to extract R sample code from a vignette, to experiment with it.

> ?vignette

Install seqinr from the closest CRAN mirror and load it for this session. Explore some functions.

> ??install
> ?install.packages
> install.packages("seqinr")   # Note: the parameter is a quoted string!
also installing the dependency ‘ade4’

trying URL 'https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.2/ade4_1.7-2.tgz'
Content type 'application/x-gzip' length 3365088 bytes (3.2 MB)
==================================================
downloaded 3.2 MB

trying URL 'https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.2/seqinr_3.1-3.tgz'
Content type 'application/x-gzip' length 2462893 bytes (2.3 MB)
==================================================
downloaded 2.3 MB


The downloaded binary packages are in
	/var/folders/mx/ld0hdst54jjf11hpcjh8snfr0000gn/T//Rtmpsy5GMx/downloaded_packages

> library(help="seqinr")
> ls("package:seqinr")
  [1] "a"                       "aaa"                     "AAstat"
  [4] "acnucclose"              "acnucopen"               "al2bp"
     [...]
[205] "where.is.this.acc"       "words"                   "words.pos"
[208] "write.fasta"             "zscore"
> ?seqinr::a
> seqinr::a("Tyr")
[1] "Y"
> seqinr::words(3, c("A", "G", "C", "U"))
 [1] "AAA" "AAG" "AAC" "AAU" "AGA" "AGG" "AGC" "AGU" "ACA" "ACG" "ACC" "ACU" "AUA" "AUG"
[15] "AUC" "AUU" "GAA" "GAG" "GAC" "GAU" "GGA" "GGG" "GGC" "GGU" "GCA" "GCG" "GCC" "GCU"
[29] "GUA" "GUG" "GUC" "GUU" "CAA" "CAG" "CAC" "CAU" "CGA" "CGG" "CGC" "CGU" "CCA" "CCG"
[43] "CCC" "CCU" "CUA" "CUG" "CUC" "CUU" "UAA" "UAG" "UAC" "UAU" "UGA" "UGG" "UGC" "UGU"
[57] "UCA" "UCG" "UCC" "UCU" "UUA" "UUG" "UUC" "UUU"



What could possibly go wrong?...


The installation fails.
You might see an error message such as this:
Warning message:
package ‘XYZ’ is not available (for R version 3.2.2)
This can mean several things:
  • The package is not available on CRAN. Try Bioconductor instead or Google for the name to find it.
  • The package requires a newer version of R than the one you have. Upgrade, or see if a legacy version exists.
  • A comprehensive set of reasons and their resolution is here on stackoverflow.




We have seen the following on Windows systems when typing library(help="seqinr")
Error in formatDL(nm, txt, indent = max(nchar(nm, "w")) + 3) :
incorrect values of 'indent' and 'width'

Anecdotally this was due to a previous installation problem with a mixup of 32-bit and 64-bit R versions, although another student told us that the problem simply went away when trying the command again. Whatever: Make sure you have the right R version installed for your operating system. Uninstall and reinstall when in doubt. Conflicting libraries can be the source of strange misbehaviour.
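When troubleshooting such problems, it helps to know exactly which R you are running. The following commands are a small sketch of how to check; they only report information and change nothing.

```r
R.version.string            # the version of R that is running
R.version$arch              # the architecture R was built for
.Machine$sizeof.pointer     # 8 on a 64-bit build of R, 4 on a 32-bit build
.libPaths()                 # the directories in which packages are installed
```

Including this output when you ask for help usually makes it much easier for others to diagnose an installation problem.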



 

Task:


  • The fact that these functions work shows that the package has been downloaded and installed, that its functions are now available with the package name prefix, and that any datasets it contains can be loaded. Just like many other packages, seqinr comes with a number of data files. Try:
?data
data(package="seqinr")            # list the available data
data(aaindex, package="seqinr")   # load aaindex
?aaindex                          # what is this?
aaindex$FASG890101                # two of the indices ...
aaindex$PONJ960101

# Let's use the data: plot amino acid single-letter codes by hydrophobicity
# and volume. The values come from the dataset. Copy and paste the commands,
# we'll discuss them in detail later.

plot(aaindex$FASG890101$I,
     aaindex$PONJ960101$I,
     xlab="hydrophobicity", ylab="volume", type="n")
text(aaindex$FASG890101$I,
     aaindex$PONJ960101$I,
     labels=seqinr::a(names(aaindex$FASG890101$I)))


  • Now, just for fun, let's use functions from the seqinr package to download a sequence and calculate some statistics (however, not to digress too far, without further explanation at this point). Copy the code below and paste it into the R-console.
seqinr::choosebank("swissprot")
mySeq <- seqinr::query("mySeq", "N=MBP1_YEAST")
mbp1 <- seqinr::getSequence(mySeq)
seqinr::closebank()
x <- seqinr::AAstat(mbp1[[1]])
barplot(sort(x$Compo), cex.names = 0.6)

We could have "loaded" the package with library(), and then used the functions without prefix. Less typing, but also less explicit.

library(seqinr)
choosebank("swissprot")
mySeq <- query("mySeq", "N=MBP1_YEAST")
mbp1 <- getSequence(mySeq)
closebank()
x <- AAstat(mbp1[[1]])
barplot(sort(x$Compo), cex.names = 0.6)

In general we will be using the idiom with the package prefix throughout the course.



The function requireNamespace() is useful because it does not produce an error when a package has not been installed. It simply returns TRUE if successful or FALSE if not. Therefore one can use the following code idiom in R scripts to avoid downloading the package every time the script is called.

if (! requireNamespace("seqinr", quietly=TRUE)) {
  install.packages("seqinr")
}
# You can get package information with the following commands:
library(help = seqinr)       # basic information
browseVignettes("seqinr")    # available vignettes
data(package = "seqinr")     # available datasets

Note that install.packages() takes a (quoted) string as its argument, but library() takes a variable name (without quotes). New users usually get this wrong :-)

Note that the Bioconductor project has its own installation system, the BiocManager::install() function. It is explained here.

Note, just to mention it at this point: to install packages that are not on CRAN or Bioconductor, you need the devtools package.
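If a script needs several packages, the idiom above can be wrapped in a small function. This is only a sketch: ensurePackage() is a hypothetical helper written for this example, not a function that exists in R or in any package. It is tried out here with "stats", which is always installed, so nothing is downloaded.

```r
# ensurePackage() is a hypothetical helper, not a function of base R.
ensurePackage <- function(pkg) {
  if (! requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)          # runs only if the package is missing
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

ok <- ensurePackage("stats")       # "stats" is always installed: no download
ok

# Not run: the analogous idiom for a Bioconductor package
# (needs internet access; "Biostrings" is only an example).
# ensurePackage("BiocManager")
# BiocManager::install("Biostrings")
```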


 

Finding packages

One of the challenges of working with R is the overabundance of options. CRAN has over 10,000 packages and Bioconductor has over 1,300 more. How can you find ones that are useful to your work? There's actually a package to help you do that, the sos package on CRAN. Try this:

if (! requireNamespace("sos", quietly=TRUE)) {
    install.packages("sos")
}
library(help = sos)       # basic information
browseVignettes("sos")    # available vignettes

sos::findFn("moving average")


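Or ...

  • Read a CRAN Task View for your area of interest ...
  • ... or the Bioconductor Views;
  • Search on "Metacran" ...
  • ... or "MRAN" (note that the results are not identical);
  • ... and, as always, Google.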


 

Self-evaluation

Question 1

What is the purpose of this code?

if (! requireNamespace("seqinr", quietly = TRUE)) {
    install.packages("seqinr")
}


Why not just use ...

  install.packages("seqinr")

... in your script instead?

Answer ...

This code idiom is useful in scripts, to ensure a package is installed before we try to use its functions. If we simply used install.packages("seqinr"), the package would be downloaded from CRAN every time the script is run. That would make our script slow, and it would require internet access for the script to run.

In the code above, the package is downloaded only when requireNamespace() returns FALSE, which presumably means the package has not yet been installed.


 

Further reading, links and resources

Wikipedia article on the R statistics environment and programming language.


Homepage of R for development, resources and, most importantly, download of code and documentation.


The IDE (Integrated Development Environment) that is the de facto standard for R programming and the development of code, projects, packages, and documentation.


The Comprehensive R Archive Network


Bioconductor is a bit like CRAN for bioinformatics and computational biology. The most important computational advances in our field are available from here. There is a special focus on high-throughput analysis, and a specific mental model of how data, code and workflows all come together.


A digest of new blog-posts on R - from the introductory to the highly advanced. Sent out once every day or two. Really worthwhile subscription.


(Revolutions Analytics Blog)


(at DataCamp)


(Stackoverflow Data Analytics Team Blog)


Carey and Papin advise novice biologist programmers how to begin. Much of this paper resonates well with our Introduction to R learning units. Good context for a beginning, to get a sense of where we are going with this.

Notes

  1. ... and when you click on the arrow to the left, this will take you back to where you came from.
  2. Proportional fonts are for elegant document layout. Monospaced fonts are needed to properly align characters in columns. For code and sequences, we always use monospaced font. Code editors always use monospaced fonts, but since I need to eMail a lot of code and sequences, I have also set my eMail client to use monospaced font by default (Courier, or Monaco). I highly encourage you to do the same.
  3. [1] means: the following is the first element of a vector - and this is often the only one.
  4. Never, ever edit code in MS Word. Use R or RStudio. Actually, don't use notepad or TextEdit either.
  5. You can also use one of the mirror sites, if CRAN is down - for example the mirror site at the University of Toronto. A choice of mirror sites is listed on the R-project homepage.
  6. A "wrapper" program uses another program's functionality in its own context. RStudio is a wrapper for R: it does not duplicate R's functions; instead, it runs the actual R in the background.


 


About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2020-09-17

Version:

1.1.1

Version history:

  • 1.1.1 Introduce two-column layout
  • 1.1 Change from require() to requireNamespace() and use <package>::<function>() idiom.
  • 1.02 Maintenance
  • 1.0.1 Removed mention of Sweave - obsolete, and broken link. Added mention of "literate programming".
  • 1.0 Completed to first live version
  • 0.1 Material collected from previous tutorial

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.