Difference between revisions of "R knitr"


Revision as of 03:42, 17 January 2015

knitr


This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


This page contains examples for the use of knitr to create documents from RMarkdown or LaTeX sources.



knitr is an R package for literate programming. It is integrated with R Studio and the exercises on this page assume you have R Studio installed.


RMarkdown

Markdown is an extremely simple and informal way of structuring documents that is useful if for some reason you feel HTML is too complicated. That's really all it does: format documents in a simple way so they can be displayed as Web pages. For Markdown documentation, see here. The concept is quite similar to Wiki markup, but the syntax is (regrettably) different, and for a number of features there are (regrettably) several different ways to achieve the same result.

RMarkdown is an R package that is integrated with R Studio and allows integrating R code with Markdown documents. knitr can work with Markdown files, and this gives additional output options, such as PDF and MSWord documents.
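Outside of the R Studio interface, the same conversion can be triggered from the console. The following is a minimal sketch, not part of the exercise: it assumes the knitr package is installed, and the file name `minimal.Rmd` and its contents are made up for illustration. It writes a tiny RMarkdown file to a temporary directory and lets `knit()` turn it into plain Markdown with the chunk's result included.

```r
library(knitr)

# Write a tiny R Markdown file to a temporary directory.
# The file name and its contents are invented for this illustration.
rmd_file <- file.path(tempdir(), "minimal.Rmd")
writeLines(c(
  "---",
  "title: \"Minimal example\"",
  "---",
  "",
  "The mean of the numbers 1 to 10 is computed below.",
  "",
  "```{r computeMean}",
  "mean(1:10)",
  "```"
), rmd_file)

# knit() runs the chunks and writes a Markdown file that contains
# both the code and its output. It returns the path of that file.
md_file <- knit(rmd_file,
                output = file.path(tempdir(), "minimal.md"),
                quiet = TRUE)

# Show the resulting Markdown.
cat(readLines(md_file), sep = "\n")
```

Going from there to Word or PDF is normally left to `rmarkdown::render()`, for example `rmarkdown::render(rmd_file, output_format = "word_document")`, which additionally needs pandoc (bundled with R Studio).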


Let's give it a try: we'll develop an R function that will give us a random phobia to ponder on.

Task:

  • Open an R Studio session.
  • Select Session → Set Working Directory → Choose Directory... and choose some project directory.
    • Note that there is a bug in R Studio that will prevent the knitr interface from working correctly if your home directory contains an .Rprofile file that issues a setwd() command to a directory other than your project directory. If you run into an error when weaving your file, remove any setwd() command you might find there.
  • Select File → New File → RMarkdown.... When you do this the first time, R Studio will ask you whether you want to install/update a number of required packages. Click Yes.
  • Enter "Random Phobia" as the Title and your name as the Author, select to create a Document, and check Word as the default output option.
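If you are not sure whether your .Rprofile contains a setwd() command, a quick check from the console can help. This is only a sketch: `find_setwd` is a hypothetical helper name introduced here, and it assumes the profile lives at `~/.Rprofile`.

```r
# Return the lines of a file that contain a call to setwd().
# `find_setwd` is a hypothetical helper used only in this sketch.
find_setwd <- function(path = "~/.Rprofile") {
  path <- path.expand(path)
  if (!file.exists(path)) {
    return(character(0))   # no profile file, so nothing to report
  }
  lines <- readLines(path, warn = FALSE)
  grep("setwd[[:space:]]*\\(", lines, value = TRUE)
}

getwd()        # the directory R is currently working in
find_setwd()   # any lines in your .Rprofile that mention setwd()
```

Note that the pattern also matches commented-out lines, so read the reported lines before deleting anything.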

R Studio will load some default text and markup into the script pane which we can edit.

Let's introduce our plan: copy/paste the following text into the document.

# Phobias!

We all have some that we might not even be aware of. How to know them all? With this code we access the Oxford English Dictionary's Website - the most authoritative source on the English language, and scrape a list of phobias. A function is supplied to retrieve a random phobia, which we can subsequently ponder on - either to delight in the fact that we don't have that fear, or to add to our daily quota of anxieties <small>(like our well-founded [fear of bad programming practice](http://xkcd.com/292/))</small>.

To load the list, we will "screenscrape" a list of Phobias from the [OED Phobia list](http://www.oxforddictionaries.com/words/phobias-list). First, we load the XML library (or install it from CRAN, if we don't have it).

Note the following Markdown elements in the code:


- a tag <small>Text...</small> to set text to a smaller font size (we could have used <span style="font-size:85%;">Text...</span> instead because Markdown respects HTML elements).
- a Web link [Text...](URL) added to text
  • Click on the green question mark of the menu of your script pane. There is a link to an overview of RMarkdown use and to a quick reference. Load the quick reference (it will appear in the Help pane) and scan it.
  • The filename in the script pane tab is red, because it contains unsaved changes. Save the file in your project directory; note that the extension .Rmd is automatically added.

Time to add our first bit of R code.

  • Copy and paste the following:
```{r loadLibrary}
# require() returns FALSE (instead of stopping) if the package is missing
if (!require(XML, quietly = TRUE)) {
  install.packages("XML")
  library(XML)
}
```


This is what is known as a "code chunk". Note that after the {r signal we have added an (optional) label for the chunk. Labels are useful: we can rapidly navigate between chunks (click on the navigation menu at the bottom of the script pane), and we can refer to a chunk by its label in order to execute it at an earlier point in the document than where its code is listed. This is an important idea of literate programming: the flow of the document should not be determined by the requirements of the code, but by the logic of the narrative. We can even suppress printing of a chunk into the document altogether, if we think it is not relevant for the document, by adding the option echo=FALSE. TL;DR: label your chunks. It's useful.
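To make the idea of referring to a chunk by its label concrete, here is a small sketch that can be run from the console. It assumes knitr is installed; the chunk labels `showAnswer` and `computeAnswer` and the toy calculation are made up. The first chunk uses knitr's `ref.label` option to run code that is only listed further down, and `echo=FALSE` hides the code at that point so that only the result appears.

```r
library(knitr)

# A miniature R Markdown document, held in a character vector.
# Labels and contents are invented for this illustration.
rmd <- c(
  "The result is reported first:",
  "",
  "```{r showAnswer, ref.label='computeAnswer', echo=FALSE}",
  "```",
  "",
  "The code that produced it is listed afterwards, but not run again:",
  "",
  "```{r computeAnswer, eval=FALSE}",
  "x <- 6 * 7",
  "print(x)",
  "```"
)

# knit(text = ...) processes the document in memory and returns
# the resulting Markdown as text.
out <- knit(text = rmd, quiet = TRUE)
cat(out, sep = "\n")
```

In the output, the printed value comes before the code listing, which is exactly the reordering of narrative and code described above.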

  • To execute a particular chunk, simply place the cursor into the chunk and select Chunks → Run Current Chunk from the menu at the top of the script pane. Check the console pane; the library should load without error.
  • Let's add more text and code:


The XML package provides a function -- `readHTMLTable()` -- that makes our life very easy: it accesses a URL, looks for all HTML formatted tables, parses them and returns them as a list of data frames.

```{r getPageData, cache=TRUE}
page <- readHTMLTable("http://www.oxforddictionaries.com/words/phobias-list")
```

Two things to note:

  • Enclosing a piece of text in "backticks" `Text...` formats that text as "code" - typically in a fixed-width font.
  • For this chunk we have set the option cache to TRUE. This is a very useful and well thought out mechanism that avoids recomputing code that takes a long time or should otherwise be run sparingly. The results of a cached chunk of code are stored locally and retrieved when the file is woven. The chunk is evaluated again only if something within it changes (or cache is set to FALSE). This prevents us from pounding on the OED again and again as we develop our script, which is a question of good manners in the context of this example, and it can also save a lot of time as our projects become large and the calculations become complex.
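To get a feeling for what `readHTMLTable()` returns without contacting the OED site at all, we can try it on a small piece of HTML held in a string. This is a sketch under stated assumptions: the table below is made up (it is not the layout or content of the OED page), and it assumes the XML package is installed.

```r
library(XML)

# A small, made-up HTML page with one table. This is NOT the OED page;
# the two rows are sample data for illustration only.
html <- paste0(
  "<html><body><table>",
  "<tr><th>Phobia</th><th>Fear of</th></tr>",
  "<tr><td>arachnophobia</td><td>spiders</td></tr>",
  "<tr><td>claustrophobia</td><td>confined spaces</td></tr>",
  "</table></body></html>"
)

# Parse the text as HTML, then extract every table as a data frame.
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

length(tables)      # number of tables found in the page
str(tables[[1]])    # structure of the first table

# Pick one row at random from the first table.
phobias <- tables[[1]]
phobias[sample(nrow(phobias), 1), ]
```

For a real page, `str()` on the returned list is a good first step to find out which list element holds the table of interest.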





 


 



 

Further reading and resources

[1] A list of chunk options by the author of knitr. Required reading.