Difference between revisions of "RPR-Literate programming"

From "A B C"

Latest revision as of 14:11, 26 September 2020

Literate Programming with R

((Draft) Literate programming principles; R Markdown; R Notebooks)

Abstract:

Documentation of results using R markdown and R notebooks.


Objectives:
This unit will ...

  • ... introduce the philosophy behind "Literate Programming";
  • ... teach the practice with an example that uses knitr in the RStudio environment;
  • ... point you to R notebooks.

Outcomes:
After working through this unit you ...

  • ... can produce your own "Literate Programs" with knitr or in an R notebook.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:

  • RPR-Introduction (Introduction to R)

Evaluation

Evaluation: NA

This unit is not evaluated for course marks.

Contents

Literate programming is an idea that software is best described in a natural language, focussing on the logic of the program, i.e. the why of code, not the what. The goal is to ensure that model, code, and documentation become a single unit, and that all this information is stored in one and only one location. The product should be consistent between its described goals and its implementation, seamless in capturing the process from start (data input) to end (visualization, interpretation), and reversible (between analysis, design and implementation).

In literate programming, narrative and computer code are kept in the same file. This source document is typically written in Markdown or LaTeX syntax and includes the programming code as well as text annotations, tables, formulas etc. The supporting software can weave human-readable documentation from this, or tangle executable code. Literate programming with both Markdown and LaTeX is supported by R Studio, which makes the R Studio interface a useful development environment for this paradigm. It is also easy to edit source files with a different editor and to process them in base R with the Sweave() and Stangle() functions, or with the knitr package. Here, however, we use R Studio because it conveniently integrates the functionality we need.

knitr is an R package for literate programming. It is integrated with R Studio.
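Weaving and tangling can be tried out with base R alone. The following is a minimal sketch, assuming nothing beyond R's bundled utils package: it writes a tiny, made-up LaTeX source (the name phobia.Rnw is hypothetical) to a temporary directory, weaves it into a .tex file in which the R results have been filled in, and tangles it into a plain .R script.

```r
# Weave and tangle a tiny .Rnw source using base R only (utils::Sweave, utils::Stangle).
# The file name "phobia.Rnw" and its contents are made up for illustration.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "We count the phobias we have looked up so far.",
  "<<countPhobias>>=",
  "seen <- c(3, 1, 2)",
  "total <- sum(seen)",
  "@",
  "So far we have looked up \\Sexpr{total} phobias.",
  "\\end{document}"
)

owd <- setwd(tempdir())               # Sweave() and Stangle() write into the working directory
writeLines(rnw, "phobia.Rnw")

Sweave("phobia.Rnw", quiet = TRUE)    # weave: phobia.tex, with results and \Sexpr{} filled in
Stangle("phobia.Rnw", quiet = TRUE)   # tangle: phobia.R, containing only the R code

woven   <- readLines("phobia.tex")
tangled <- readLines("phobia.R")
setwd(owd)                            # restore the original working directory

cat(tangled, sep = "\n")
```

The woven .tex file still needs a LaTeX installation to become a PDF; that step is not shown.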


RMarkdown

Markdown is an extremely simple and informal way of structuring documents that is useful if, for some reason, you feel HTML is too complicated. That's really all it does: format documents in a simple way so they can be displayed as Web pages. For Markdown documentation, see here. The concept is quite similar to Wiki markup, but the syntax is (regrettably) different, and for a number of features there are (regrettably) several different ways to achieve the same result.

RMarkdown is an R package that is integrated with R Studio and lets you combine R code with Markdown text. knitr processes the R code in such files, and the combination offers additional output options, such as PDF and MS Word documents.
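To see what knitr does with the R code in such a file, here is a minimal sketch, assuming the knitr package is installed; the phobia data are invented. Passing the source through the text argument makes knit() return the resulting Markdown as text instead of writing a file. Converting that Markdown to HTML, PDF or Word is a separate step (R Studio's Knit button hands it to the rmarkdown package and Pandoc) and is not shown here.

```r
# Knit a tiny R Markdown source to Markdown (assumes the knitr package is installed).
# The phobia data are invented for illustration.
rmd <- c(
  "## Phobias!",
  "",
  "We set up a small table of phobias.",
  "",
  "```{r definePhobias}",
  "phobias <- data.frame(name = c(\"Arachnophobia\", \"Acrophobia\"),",
  "                      fear = c(\"spiders\", \"heights\"))",
  "nrow(phobias)",
  "```",
  "",
  "Our table lists `r nrow(phobias)` phobias."
)

# With text = ..., knit() returns the woven Markdown instead of writing a file.
md <- knitr::knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```

In the output, the chunk's code and its printed result (## [1] 2) appear as fenced blocks, and the inline expression has been replaced by its value.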


Let's give it a try: we'll write and document an R function that will find us a random phobia to ponder on.

Task:

  • Open your normal R Studio session.
  • Select File → New File → RMarkdown.... When you do this the first time, R Studio will ask you whether you want to install/update a number of required packages. Click Yes.
  • Enter "Random Phobia" as the Title and your name as the Author, select to create a Document, and check HTML as the default output option.

R Studio will load some default text and markup into the script pane, which we can edit.

  • Choose Help → Cheatsheets → R Markdown Cheat Sheet and R Markdown Reference Guide to download two PDFs via your browser. Browse the contents to get an idea where you can clarify concepts as you go through this example.

Let's introduce our plan:

  • We'll create a Markdown document containing code and explanations.
  • We'll knit it into an HTML document and examine it.
  • Then we'll have a look at its structure, to learn how it works.

Creating the Document
  • First, delete everything except the header block from your new Markdown document.
  • There is a document in the data folder called data/RandomPhobiaPage.txt. Open it, copy its entire contents, and paste them under the header block.
  • Save the document as myScripts/RandomPhobias.Rmd.

Knitting the Document to HTML
  • Right above the edit window, next to the search (magnifying glass) icon, there is an icon of a ball of wool with a knitting needle, labelled Knit. Click it.
  • The HTML document will be created and opened.


Inspecting the Markdown source

Note the following Markdown elements in the code:

  • After the header block, three backticks open a block whose type is given in curly braces. In this case the block injects a bit of local CSS to create striped tables later on. This is entirely optional and can be deleted from your own code if you have no need for it.
  • Then there is another code block. This one is crucial. The first letter in the curly braces is r, so this block will be run as R code. include=FALSE ensures that neither the code nor its results appear in the output; the code is still executed, however, and here it sets a knitr option. This is what is known as a "code chunk". It is delimited by three backticks ``` and has directives and options for the chunk in its first line. It is labelled as R code, and note that after the {r we have added an (optional) label for the chunk. Labels are useful: we can rapidly navigate between chunks (click on the navigation menu at the bottom of the script pane), and we can refer to the labels to execute chunks that are written later in the document at an earlier stage. This is an important idea of literate programming: the flow of the document should not be determined by the requirements of the code, but by the logic of the narrative. TL;DR: label your chunks. It's useful.
    Other options can be added after a comma. For example, we can hide a chunk's source code in the document, if we think it is not relevant for the reader, by adding the option echo=FALSE[1]; its results are still shown.
  • Then the text begins. It is written in Markdown syntax, which is a simple way to annotate text. Note the conventions to create headers, links, etc.
  • To execute a particular chunk, simply place the cursor into the chunk and select Chunks → Run Current Chunk from the menu at the top of the script pane.
  • More text describes how we screen-scrape tables from a Wikipedia page, and the code chunks run that code.
    Some things to note here:
    • Enclosing a piece of text in backticks `Text...` formats that text as "code", typically in a fixed-width font.
    • For this chunk we have set the option cache to TRUE. This is a very useful and well-conceived mechanism that avoids recomputing code that takes a long time or should otherwise be limited. The results of a cached chunk are stored locally and retrieved when the file is woven. Only if anything within the chunk changes (or cache is set to FALSE) is the chunk evaluated again. This prevents us from excessively pounding on Websites as we develop our script, and it can save a lot of time as our projects become large and the calculations become complex.
  • As the code continues, we have more of this mix of markup code and R code. Two important options in the chunk header:
    • echo=FALSE prevents the contents of the chunk from being printed. We don't want this code in our output, we only want the result.
    • results='asis' prevents the results from being marked up. The raw HTML is sent to the output document.
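These chunk options are easiest to understand side by side. A minimal sketch, again assuming knitr is installed; the chunk labels, the data and the bare HTML table are made up for illustration (a real page would add styling):

```r
# Compare include=FALSE, echo=FALSE and results='asis' side by side
# (assumes knitr is installed; chunk labels and data are illustrative).
rmd <- c(
  "```{r defineFears, include=FALSE}",
  "fears <- c(Arachnophobia = \"spiders\", Acrophobia = \"heights\")",
  "```",
  "",
  "```{r countFears, echo=FALSE}",
  "length(fears)",
  "```",
  "",
  "```{r fearTable, echo=FALSE, results='asis'}",
  "cat(\"<table>\\n\")",
  "for (i in seq_along(fears)) {",
  "  cat(\"<tr><td>\", names(fears)[i], \"</td><td>\", fears[[i]], \"</td></tr>\\n\", sep = \"\")",
  "}",
  "cat(\"</table>\\n\")",
  "```"
)

md <- knitr::knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```

echo=FALSE hides only the code, include=FALSE hides code and results but still runs the chunk (which is why fears exists for the later chunks), and results='asis' writes the HTML without the ## prefix and code fence that knitr normally adds to output.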

But note the following: the code chunk that creates the table calls a function, randRow(M), that is defined only further down in the document. In a plain R script this would not work. In a knitr document, however, we can reference a chunk of code anywhere in (and outside of) the document by its label, and thus run the function definition before the renderPhobiaTable chunk is executed. This matters for literate programming, where the order of the text should not be constrained by the requirements of the code.
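A minimal sketch of this chunk-reference mechanism, assuming knitr is installed. The helper randRow() below is a hypothetical stand-in written for this example, not the version in data/RandomPhobiaPage.txt: it returns one random row of a data frame and optionally fixes the random seed. The empty chunk at the top inherits the code of the chunk labelled randRow through ref.label, so the function exists before pickPhobia calls it, even though its definition is placed at the end of the narrative.

```r
# A chunk placed near the top borrows the code of the chunk labelled "randRow",
# which appears only at the end (assumes knitr is installed; randRow() is a
# hypothetical helper written for this sketch).
rmd <- c(
  "```{r, ref.label=\"randRow\", echo=FALSE}",
  "```",
  "",
  "```{r pickPhobia}",
  "phobias <- data.frame(name = c(\"Arachnophobia\", \"Herpetophobia\", \"Acrophobia\"),",
  "                      fear = c(\"spiders\", \"reptiles\", \"heights\"),",
  "                      stringsAsFactors = FALSE)",
  "today <- randRow(phobias, seed = 1123581321)",
  "```",
  "",
  "Today's phobia is **`r today$name`**, the fear of `r today$fear`.",
  "",
  "The helper is defined here, at the end of the narrative:",
  "",
  "```{r randRow}",
  "randRow <- function(M, seed = NULL) {",
  "  # Return one random row of the data frame M; fix the RNG seed if one is given.",
  "  if (!is.null(seed)) set.seed(as.integer(seed))",
  "  M[sample.int(nrow(M), 1), , drop = FALSE]",
  "}",
  "```"
)

md <- knitr::knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```

Because knitr parses the whole document before it evaluates chunks, the label can point forward. Without the ref.label chunk, the call in pickPhobia would fail with "could not find function".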

You now have a working Markdown page, and that should go a long way towards helping you write your own.

R Notebooks

R Notebooks take the concept into the RStudio editor itself, rather than constructing a Webpage. On the one hand, you become dependent on the RStudio editor; on the other hand, you directly edit and comment as you are developing. This is true "Literate Programming".

Task:
Read about the concept here and follow along with the exercise.

Further reading, links and resources

Thoughts on notebooks and literate programming (Tony Hirst, via R Bloggers)


Notes

1. For a complete list of chunk options, see the documentation by knitr's author, Xie Yihui.


About ...

Author:

Boris Steipe (boris.steipe@utoronto.ca)

Created:

2017-09-17

Modified:

2020-09-25

Version:

2.0

Version history:

  • 2.0 Update after complete rewrite of sample .Rmd - don't assemble the page piecewise
  • 1.2 Change from require() to requireNamespace() and use <package>::<function>() idiom.
  • 1.1 bugfix, comment on header tags in a table, add an eval question
  • 1.0 First live version
  • 0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.