CSB Assignment Week 4

From "A B C"

Latest revision as of 18:44, 25 February 2016

Assignments for Week 4
Setting up your local environment; Data in R

< Assignment 3 Assignment 5 >

Note! This assignment is currently active. All significant changes will be announced on the mailing list.

 
 

Assigned material - concepts, exercises and reading - will be reflected in next week's evaluation and feedback session. Please remember to contribute to self-evaluation questions by Tuesday at noon.


 


 


Warm up

Sometimes it is easy to identify gender from names, especially if the name comes from a religious tradition: Abraham comes to mind, or Eve. Say you read a novel in which, one sunny day in a schoolyard, Abraham is looking at Pat, and Pat is looking at Eve.


Can you (reasonably) know if a boy is looking at a girl? [I don't know... I mean I don't know the answer ...]

Seriously?
You're probably uncertain about Pat: Patrick? Patricia?
Hm. Do you need a hint... [Ok. A hint please...]

Does it matter? [Sorry. Of course it matters whether Pat is a girl or a boy - how could it not?]

If you think it matters, you're both right and wrong. You're right in the sense that it makes a difference. But you're wrong that it matters for the answer: if Pat is a girl, the boy Abraham looks at a girl. But if Pat is a boy, then he is looking at the girl Eve. So the answer is: yes, you can reasonably know that a boy is looking at a girl. Often, you just need to enumerate all possibilities...

Now: why "reasonably"? Well, I can't exclude that some parents named their daughter Abraham, or that they were confused about Eve ... but hey, it's just a puzzle.
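For the programmatically minded, "enumerate all possibilities" translates directly into code. Here is a minimal R sketch. Like the puzzle, it assumes that Abraham is a boy and Eve is a girl. The encoding ("m"/"f") and the function name boyLooksAtGirl are made up for illustration.

```r
# Who looks at whom: from[i] is looking at to[i].
from <- c("Abraham", "Pat")
to   <- c("Pat", "Eve")

# Hypothetical helper: given Pat's sex, does any boy look at a girl?
boyLooksAtGirl <- function(patSex) {
  # Assumption from the puzzle: Abraham is a boy, Eve is a girl.
  sex <- c(Abraham = "m", Eve = "f", Pat = patSex)
  any(sex[from] == "m" & sex[to] == "f")
}

# Enumerate both possibilities for Pat and check each one.
results <- sapply(c(boy = "m", girl = "f"), boyLooksAtGirl)
print(results)
all(results)  # TRUE in every case, so the answer does not depend on Pat
```

Pat's sex changes *which* pair satisfies the condition, but every case satisfies it.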




 


Collaborative Environment

 

The setup of folders and files for collaborative work needs some care. We want to be able to share data and code. But we don't want to mirror all our local experiments to everyone else on the team. We want version control for all files. But not for our very large, read-only datasets. And we want our environment to be predictable - for ourselves and others. Here is a common scheme for everyone to adopt[1].

[Figure: ProjectFolderAndFileLayout.png]

Project folder and file layout. R has a "home directory", commonly abbreviated with the tilde character "~". A file called .Rprofile is executed whenever R starts up, so it is useful for defining global settings. Course-related material should go into a folder called BCB420 somewhere on your computer. This folder contains three folders. Ontoscope is the shared code repository. It is mirrored on github, and you publish the assets you develop through it. dev is your local development folder, where you develop and experiment. Your intermediate stages of development live there. data contains large data files that we don't want to put under version control.


 

Task:
Go through the following steps to set up this structure.


Create Folders
Create the following folders on your computer:
  • BCB420 - this will contain all course-related files. And within that folder:
  • dev - this will contain your local code edits and experiments
  • data - this will contain large, stable assets that don't need to be changed.
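If you prefer the R console to a file browser, the two folders can also be created with base R. This is a minimal sketch. The function name makeCourseFolders and the example location in your home directory are assumptions, and the assignment only asks that BCB420 be "somewhere on your computer". Ontoscope is not created here because cloning the repository creates it.

```r
# Hypothetical helper: create BCB420/dev and BCB420/data under courseRoot.
makeCourseFolders <- function(courseRoot) {
  folders <- file.path(courseRoot, c("dev", "data"))
  for (f in folders) {
    # recursive = TRUE also creates courseRoot itself if it is missing;
    # showWarnings = FALSE stays quiet if a folder already exists.
    dir.create(f, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dir.exists(folders))
}

# Example call (adjust the location to wherever you want BCB420 to live):
# makeCourseFolders(file.path(path.expand("~"), "BCB420"))
```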


Create your .Rprofile
Many common operating systems hide files whose names begin with a period (dot), or make them awkward to create and edit. Here's what you do instead.
Open RStudio.
# First, use RStudio's Session -> Set working Directory -> Choose Directory...
# option to find your new <code>dev</code> directory and set it as the 
# "Working Directory". Then type:
getwd()

# ... to see the correct path to "dev".


# Here's how you find your home directory:
path.expand("~")

# You can use this to edit/create .Rprofile in the right place.
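# file.path() inserts the "/" between the two parts; paste(..., sep="")
# would produce something like "/Users/me.Rprofile" instead.
file.edit(file.path(path.expand("~"), ".Rprofile"))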

# Once the file is opened, 
# add the following three lines to your .Rprofile
DEVDIR <- "/path/to/your/BCB420/dev/"
setwd(DEVDIR)

# Yes, the third line is empty. Make it a habit to end the last line of your
# script with a line break. This matters sometimes. Also, please, please
# for the love of everything that is holy: don't actually type "/path/to/your.."
# I hope it's obvious that that is just a placeholder for the real path on
# _your_ computer which you have just previously printed to the console
# from where you can copy it.

# Save, and close .Rprofile. 
# Exit, and restart RStudio.
# Confirm that your .Rprofile does what it should. Type:
DEVDIR
getwd()

# ... to demonstrate that the object exists, has the right contents, and that
# the setwd() command has correctly changed the working directory. If that doesn't
# work, contact me or the list to troubleshoot and fix.
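If a mistyped path in .Rprofile bothers you, here is a more defensive variant of the two .Rprofile lines. It is a minimal sketch and not part of the required setup. The helper name setDevDir is made up. The only change in behaviour is that a missing folder triggers a warning rather than an error, so R still starts in its default directory.

```r
# Hypothetical defensive version of the .Rprofile lines.
setDevDir <- function(devDir) {
  if (dir.exists(devDir)) {
    setwd(devDir)
    return(invisible(TRUE))
  }
  # Warn instead of stopping; R then stays in its default directory.
  warning("DEVDIR not found: ", devDir, call. = FALSE)
  invisible(FALSE)
}

# Still a placeholder: replace with the real path to YOUR dev folder.
DEVDIR <- "/path/to/your/BCB420/dev/"
setDevDir(DEVDIR)
```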


Setup to use github
Create a github account and install the github desktop client on your computer.


Clone the Repository
Go through the following steps to create your local copy of the shared assets.
  • Navigate to the Ontoscope repository (https://github.com/hyginn/Ontoscope).
  • Find the button that looks like a computer monitor with an arrow. If you hover over it, it should explain that it saves the repository to your local computer. Click on that button. An "External Protocol Request" warning should appear. Click Launch Application.
  • Your github desktop application will open. Select your BCB420 folder as the location to clone the repository into, and keep the default name. Click on Clone.
  • Check that the folder has been created in the right spot and contains the same files you see in the github repository on the Web.
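You can also do the last check from the R console. This is a sketch under two assumptions: that BCB420 sits in your home directory, and that a git clone contains a hidden .git folder (it does). The helper name looksLikeClone is made up.

```r
# Hypothetical helper: does repoDir exist and look like a git clone?
looksLikeClone <- function(repoDir) {
  dir.exists(repoDir) && dir.exists(file.path(repoDir, ".git"))
}

# Example, assuming BCB420 is in your home directory:
# ontoDir <- file.path(path.expand("~"), "BCB420", "Ontoscope")
# looksLikeClone(ontoDir)
# list.files(ontoDir)   # compare with the file list on github
```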


Email me your github user name so I can add you as collaborator to the repository.


Initialize your development folder
Copy codeTemplate.R from the Ontoscope folder to your dev folder. Rename it to myCode.R to avoid confusion. You can edit and adapt this template for your own code. But you should also place your dev folder under version control, so that you can work effectively...
In your github desktop client click the (+) button (Add a repository). Choose Ontoscope-dev as the Name (don't just call it "dev", you may be working on several projects in the future...). Then Choose... your dev folder and Create Repository.... This will now appear under the Other category in the side-bar: you have full version control over the folder, but it is not mirrored to github.
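The copy-and-rename step for codeTemplate.R can also be done in one call from R. This is a minimal sketch. The helper name copyTemplate is made up, and it takes the paths to your Ontoscope and dev folders as arguments. It uses overwrite = FALSE so that it never clobbers an existing myCode.R.

```r
# Hypothetical helper: copy codeTemplate.R into devDir under a new name.
copyTemplate <- function(ontoscopeDir, devDir, newName = "myCode.R") {
  from <- file.path(ontoscopeDir, "codeTemplate.R")
  to   <- file.path(devDir, newName)
  if (!file.exists(from)) stop("Template not found: ", from)
  # overwrite = FALSE: file.copy() returns FALSE if 'to' already exists.
  ok <- file.copy(from, to, overwrite = FALSE)
  if (!ok) warning("Not copied; does ", to, " already exist?")
  invisible(ok)
}

# Example, assuming BCB420 is in your home directory:
# bcb <- file.path(path.expand("~"), "BCB420")
# copyTemplate(file.path(bcb, "Ontoscope"), file.path(bcb, "dev"))
```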


Checking in
Let's make sure this works. After I have added you as collaborator, add your name to the Readme document ...
  • Open the github desktop client.
  • Select the Ontoscope repository.
  • Click the sync button to download the most recent version of all assets.
  • Open your local copy of Readme.md in a text editor such as Notepad, or in RStudio.
  • Add your name to the collaborator list and save your changed copy.
  • Commit your change. Make sure you always add a commit message to your commits.
  • sync again, to "push" your commit to github.
  • Go to the Ontoscope repository to confirm your commit has arrived.
If you have added your name before Tuesday's class session, you have earned yourself 3 marks for the quiz.


 

A little bit of (light) reading

 

Here are some useful observations on scientific data in the lab...

Goodman et al. (2014) Ten simple rules for the care and feeding of scientific data. PLoS Comput Biol 10:e1003542. (pmid: 24763340)


Do you agree? Are there useful tools that we should know about? After all, the article is over a year old, and in this game that's a lot. Is there anything here that we should adopt? Software design needs clearly defined requirements: there are functional requirements, and there are non-functional requirements such as the ones the article discusses. Are there others we need to act on and add to our task list?


 


Advancing your R skills

Task:
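Reading and writing data is another of the truly essential R skills. This brief tutorial reviews the basics: text files, csv tables, and .Rdata objects. Load the following tutorial with its associated file as an RStudio project from github:

  • Select File → New Project ...
  • Choose Version control → Git
  • Enter the repository URL for the tutorial: https://github.com/hyginn/R_Exercise-Data
  • Click on Create Project.
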
(In case the R script source-code does not appear in the left-hand pane, click on the file name R_Exercise-Data.R in the lower-right hand pane.)
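For orientation before you open the tutorial, here is a minimal round trip through the three formats it covers, using base R functions only. It is a sketch and not the tutorial's own code. All file names and the small data frame are made up, and everything is written to a temporary folder.

```r
tmp <- tempfile("dataDemo")
dir.create(tmp)

# 1. Plain text: one string per line.
txtFile <- file.path(tmp, "notes.txt")
writeLines(c("first line", "second line"), txtFile)
txt <- readLines(txtFile)

# 2. csv table: row.names = FALSE avoids an extra column of row numbers.
df <- data.frame(gene = c("a", "b"), count = c(3L, 5L),
                 stringsAsFactors = FALSE)
csvFile <- file.path(tmp, "counts.csv")
write.csv(df, csvFile, row.names = FALSE)
df2 <- read.csv(csvFile, stringsAsFactors = FALSE)

# 3. .Rdata: save() stores objects together with their names;
#    load() restores them into the environment you specify.
rdaFile <- file.path(tmp, "objects.Rdata")
save(df, txt, file = rdaFile)
restored <- new.env()
load(rdaFile, envir = restored)
```

The key difference is that text and csv files hold only data, which you assign to a name of your choice when reading. An .Rdata file restores objects under their original names.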


 


 
That is all.


 

Footnotes and references

 
  1. You might be used to doing things differently, but for this project, do it this way. If you think this can be improved, though, let's talk.


 


 
Ask if things don't work for you!
If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.
... are required reading.


 


