Difference between revisions of "CSB Assignment Week 4"

From "A B C"
 
(6 intermediate revisions by the same user not shown)
Line 2: Line 2:
 
<div class="b1">
 
<div class="b1">
 
Assignments for Week 4<br/>
 
Assignments for Week 4<br/>
<span style="font-size: 70%">Collaboration tools, initializing our project.</span>
+
<span style="font-size: 70%">Setting up your local environment; Data in R</span>
 
</div>
 
</div>
  
Line 10: Line 10:
 
</tr></table>
 
</tr></table>
  
{{Inactive}}
+
{{Active}}
  
 
Assigned material - concepts, exercises and reading - will be reflected in next week's evaluation and feedback session. Please remember to contribute to [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/BCB420_self-evaluation_questions '''self-evaluation questions'''] by Tuesday at noon.
 
Assigned material - concepts, exercises and reading - will be reflected in next week's evaluation and feedback session. Please remember to contribute to [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/BCB420_self-evaluation_questions '''self-evaluation questions'''] by Tuesday at noon.
Line 24: Line 24:
 
<section begin=warm-up />
 
<section begin=warm-up />
 
<div class="mw-collapsible mw-collapsed" id="mw-customcollapsible-01" style="border:solid 1px #000000;padding:10px;background-color:#F9F9FF">
 
<div class="mw-collapsible mw-collapsed" id="mw-customcollapsible-01" style="border:solid 1px #000000;padding:10px;background-color:#F9F9FF">
Sometimes it is easy to identify gender with names, especially if the name is taken from a religious tradition. Abraham comes to mind, or Eve. So one day, in a schoolyard, Abraham is looking at Pat, and Pat is looking at Eve.
+
Sometimes it is easy to identify gender from names, especially if the name is taken from a religious tradition. Abraham comes to mind, or Eve. So you read a novel where one sunny day, in a schoolyard, Abraham is looking at Pat, and Pat is looking at Eve.
  
  
'''Can you (reasonably) know if a boy is looking at a girl?''' <span class="mw-customtoggle-01" style="background:#F2F2FF;padding:5px;font-size:80%;margin-left:50px">[''I&nbsp;don't&nbsp;know...&nbsp;I&nbsp;mean&nbsp;I&nbsp;don't&nbsp;know&nbsp;the&nbsp;answer&nbsp;...'']</span>
+
'''Can you <small>(reasonably)</small> know if a boy is looking at a girl?''' <span class="mw-customtoggle-01" style="background:#F2F2FF;padding:5px;font-size:80%;margin-left:50px">[''I&nbsp;don't&nbsp;know...&nbsp;I&nbsp;mean&nbsp;I&nbsp;don't&nbsp;know&nbsp;the&nbsp;answer&nbsp;...'']</span>
  
 
<div class="mw-collapsible-content" style="padding:10px;">
 
<div class="mw-collapsible-content" style="padding:10px;">
Line 43: Line 43:
 
<div class="mw-collapsible-content" style="padding:5px;border:solid 1px #000000;background-color:#E2E2FF">
 
<div class="mw-collapsible-content" style="padding:5px;border:solid 1px #000000;background-color:#E2E2FF">
  
If you think it matters, you're both right and wrong. You're right in the sense that it makes a difference. But you're wrong that it matters for the answer: if Pat is a girl, the boy Abraham looks at a girl. But if Pat is is a boy, then he is looking at the girl Eve. So the answer is: yes, you can reasonably know that a boy is looking at a girl. Often, you just need to enumerate all possibilities...  
+
If you think it matters, you're both right and wrong. You're right in the sense that it makes a difference. But you're wrong that it matters for the answer: if Pat is a girl, the boy Abraham looks at a girl. But if Pat is a boy, then he is looking at the girl Eve. So the answer is: yes, you can reasonably know that a boy is looking at a girl. Often, you just need to enumerate all possibilities...  
  
 
<small>Now: why "reasonably"? Well, I can't exclude that some parents named their daughter Abraham, or that they were confused about Eve ... but hey, it's just a puzzle.</small>
 
<small>Now: why "reasonably"? Well, I can't exclude that some parents named their daughter Abraham, or that they were confused about Eve ... but hey, it's just a puzzle.</small>
Line 62: Line 62:
 
{{Vspace}}
 
{{Vspace}}
  
...TBD
 
  
<!--
+
==Collaborative Environment==
 +
 
 +
{{Vspace}}
 +
 
 +
The setup of folders and files for collaborative work needs some care. We want to be able to share data and code. But we don't want to mirror all our local experiments to everyone else on the team. We want version control for all files. But not for our very large, read-only datasets. And we want our environment to be predictable - for ourselves and others. Here is a common scheme for everyone to adopt<ref>You might be used to doing things differently, but for this project, do it this way. But if you think this can be improved, let's talk.</ref>.
 +
 
 +
{{FullImage|ProjectFolderAndFileLayout.png|Project folder and file layout. '''R''' has a "home directory", commonly abbreviated with the tilde character "<code>~</code>". A file called <code>.Rprofile</code> is executed whenever '''R''' starts up and is therefore useful to define global settings. Course-related material should go into a folder called <code>BCB420</code> somewhere on your computer. This folder contains three folders: <code>Ontoscope</code> is the shared code repository that is mirrored on github and through which you publish the assets you develop. <code>dev</code> is your ''local'' development folder where you develop and experiment. Your intermediate stages of development will be in that folder. <code>data</code> contains large data-files that we don't want to put under version control.
 +
}}
 +
 
 +
{{Vspace}}
 +
 
 +
{{task|1=
 +
Go through the following steps to set up this structure.
 +
 
 +
 
 +
;Create Folders
 +
: Create the following folders on your computer:
 +
:* <code>BCB420</code> - this will contain all course-related files. And within that folder:
 +
::* <code>dev</code> - this will contain your local code edits and experiments
 +
::* <code>data</code> - this will contain large, stable assets that don't need to be changed.
  
==Exercises==
 
  
{{#lst:Enrichment|exercises}}
+
;Create your <code>.Rprofile</code>
 +
:Many common operating systems will not allow you to edit files whose name is prefixed with a period/dot. Here's what you do instead.
 +
:Open RStudio.
 +
<source lang="R">
 +
# First, use RStudio's Session -> Set working Directory -> Choose Directory...
 +
# option to find your new "dev" directory and set it as the
 +
# "Working Directory". Then type:
 +
getwd()
  
January 30 2014 saw the publication of what may be the most important scientific advance published in our lifetimes. In two ''nature'' papers, Haruko Obokata described the creation of so-called '''STAP''' (stimulus-triggered acquisition of pluripotency) cells. These cells can be generated by simple stress-protocols applied to leukocytes, but also to brain-, skin-, muscle-, fat-, bone marrow, lung- and liver-derived cells. Successful protocols include bathing the cells in moderately acidic medium for half an hour, mechanically perturbing the cells, or inducing plasma cell membrane pores with streptolysin. Post stress, cells shrink, stop proliferating, downregulate their differentiation gene markers and begin expressing markers of pluripotent stem cells that include our friends OCT4, Nanog, and Sox-2, and others. Cell clusters that form after this apparent transformation can be propagated. '''Strikingly, not only are these clusters able to grow into entire embryos after blastocyst injection, and further into normal, adult mice, but these mice are fertile and demonstrate germline transmission of their genetic markers into their offspring.'''
+
# ... to see the correct path to "dev".
  
{{Task|1=
 
  
'''1.''' Read ...
+
# Here's how you find your home directory:
:{{#pmid: 24476887}}
+
path.expand("~")
  
:Obvious questions arise such as:
+
# You can use this to edit/create .Rprofile in the right place.
* can these mouse-findings be transferred to humans;
+
file.edit(file.path(path.expand("~"), ".Rprofile"))
* what are the upstream signals that trigger the pluripotency response; and,  
 
* can this signalling network be controlled?
 
  
: Obokata ''et al.'' describe the transformation as the release of cells from an epigenetic differentiation state, and this may point to the class of genes that could possibly be involved. How would you find them?
+
# Once the file is opened,
 +
# add the following three lines to your .Rprofile
 +
DEVDIR <- "/path/to/your/BCB420/dev/"
 +
setwd(DEVDIR)
  
'''2.''' Think about what datasets you might need to pursue these questions and what comparisons you might want to undertake and what you would need to do to run these comparisons.
+
# Yes, the third line is empty. Make it a habit to end the last line of your
 +
# script with a carriage return. This matters sometimes. Also, please, please
 +
# for the love of everything that is holy: don't actually type "/path/to/your.."
 +
# I hope it's obvious that that is just a placeholder for the real path on
 +
# _your_ computer which you have just previously printed to the console
 +
# from where you can copy it.
  
'''3.''' Navigate to [http://www.ncbi.nlm.nih.gov/gds '''GEO'''] and search whether suitable datasets are available. Familiarize yourself with the query fields such as <code>[ti]</code> and <code>[organism]</code>, and the boolean operators <code>AND</code>, <code>OR</code>, and <code>NOT</code>, (Note: capitals!).
+
# Save, and close .Rprofile.
 +
# Exit, and restart RStudio.
 +
# Confirm that your .Rprofile does what it should. Type:
 +
DEVDIR
 +
getwd()
 +
 
 +
# ... to demonstrate that the object exists, has the right contents, and that
 +
# the setwd() command has correctly changed the working directory. If that doesn't
 +
# work, contact me or the list to troubleshoot and fix.
 +
 
 +
</source>
 +
 
 +
 
 +
;Setup to use github
 +
:Create a github account and install the github desktop client on your computer.
 +
 
 +
 
 +
;Clone the Repository
 +
:Go through the following steps to create your local copy of the shared assets.
 +
:* Navigate to the [https://github.com/hyginn/Ontoscope Ontoscope repository]
 +
:* Find the button that looks like a computer monitor with an arrow. If you hover over it, it should explain that it is used to save the repository to your local computer. Click on that button. An "External Protocol Request" warning should appear. Click to '''Launch Application'''.
 +
:* Your github desktop application will open. Find and select your <code>BCB420</code> folder to clone the repository into and keep the name. Click on '''Clone'''.
 +
:* Check that the folder has been created in the right spot and contains the same files you see in the github repository on the Web.
 +
 
 +
 
 +
;Email me your github user name so I can add you as collaborator to the repository.
 +
 
 +
 
 +
;Initialize your development folder
 +
:Copy <code>codeTemplate.R</code> from the <code>Ontoscope</code> folder to your <code>dev</code> folder. Rename it to <code>myCode.R</code> to avoid confusion. You can edit and adapt this template for your own code. But you should also place your <code>dev</code> folder under version control, so that you can work effectively...
 +
:In your github desktop client click the '''(+)''' button (Add a repository). Choose <code>Ontoscope-dev</code> as the Name (don't just call it "dev", you may be working on several projects in the future...). Then '''Choose...''' your <code>dev</code> folder and '''Create Repository...'''. This will now appear under the '''Other''' category in the side-bar: you have full version control over the folder, but it is not mirrored to github.
 +
 
 +
 
 +
;Checking in
 +
:Let's make sure this works. '''After''' I have added you as collaborator, add your name to the ''Readme'' document ...
 +
::* Open the github desktop client.
 +
::* Select the Ontoscope repository.
 +
::* Click the '''sync''' button to download the most recent version of all assets.
 +
::* Open your '''local''' copy of <code>Readme.md</code> in notepad, or RStudio.
 +
::* Add your name to the collaborator list and save your changed copy.
 +
::* Commit your change. Make sure you always add a commit message to your commits.
 +
::* '''sync''' again, to "push" your commit to github.
 +
::* Go to the [https://github.com/hyginn/Ontoscope Ontoscope repository] to confirm your commit has arrived.
 +
 
 +
;If you have added your name before Tuesday's class session, you have earned yourself 3 marks for the quiz.
  
'''4.''' Note down all datasets of interest that you find and '''put the links into your student Wiki page'''.
 
  
 
}}
 
}}
  
==(Pre-)reading==
+
{{Vspace}}
 +
 
 +
==A little bit of (light) reading==
  
Now is as good a time as any to scan two recent updates on the use of and holdings of '''GEO'''. Read briefly, we will not go over the contents in much detail but this overview may help you define how to go about task 2.
+
{{Vspace}}
  
{{#lst:Transcriptome|reading}}
+
Here are some useful observations on scientific data in the lab...
  
{{#lst:-omics|reading}}
+
{{#pmid: 24763340}}  
  
{{#lst:Proteome|reading}}
+
Do you agree? Are there useful tools that we should know about? After all, the article is over a year old and in this game that's a lot. Anything here that we should adopt? Software design needs clearly defined requirements. There are '''functional requirements''' and '''non-functional requirements''' such as the ones the article discusses. Are there others we need to act on and add to our [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/BCB420_2016_Tasks task list]?
  
-->
+
{{Vspace}}
  
 +
 +
==Advancing your '''R''' skills==
 +
 +
{{task|1=
 +
 +
Reading and writing data is another of the truly essential '''R''' skills. This brief tutorial reviews the basics: text-files, csv tables, and .Rdata objects. Load the following tutorial with its associated file as an RStudio project from github.
 +
 +
 +
* Open RStudio
 +
* Select '''File &rarr; New Project ...'''
 +
* Choose '''Version control &rarr; Git '''
 +
* Enter the repository URL for the tutorial: https://github.com/hyginn/R_Exercise-Data
 +
* Click on '''Create Project'''.
 +
 +
(In case the R script source-code does not appear in the left-hand pane, click on the file name R_Exercise-Data.R in the lower-right hand pane.)
 +
 +
}}
  
  

Latest revision as of 18:44, 25 February 2016

Assignments for Week 4
Setting up your local environment; Data in R

< Assignment 3 Assignment 5 >

Note! This assignment is currently active. All significant changes will be announced on the mailing list.

 
 

Assigned material - concepts, exercises and reading - will be reflected in next week's evaluation and feedback session. Please remember to contribute to self-evaluation questions by Tuesday at noon.


 


 


Warm up

Sometimes it is easy to identify gender from names, especially if the name is taken from a religious tradition. Abraham comes to mind, or Eve. So you read a novel where one sunny day, in a schoolyard, Abraham is looking at Pat, and Pat is looking at Eve.


Can you (reasonably) know if a boy is looking at a girl? [I don't know... I mean I don't know the answer ...]

Seriously?
You're probably uncertain about Pat: Patrick? Patricia?
Hm. Do you need a hint... [Ok. A hint please...]

Does it matter? [Sorry. Of course it matters whether Pat is a girl or a boy - how could it not?]

If you think it matters, you're both right and wrong. You're right in the sense that it makes a difference. But you're wrong that it matters for the answer: if Pat is a girl, the boy Abraham looks at a girl. But if Pat is a boy, then he is looking at the girl Eve. So the answer is: yes, you can reasonably know that a boy is looking at a girl. Often, you just need to enumerate all possibilities...

Now: why "reasonably"? Well, I can't exclude that some parents named their daughter Abraham, or that they were confused about Eve ... but hey, it's just a puzzle.
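
"Enumerate all possibilities" can be taken literally. The following is a minimal sketch in R, assuming only the facts stated in the puzzle; the helper names gender() and boyLooksAtGirl() are made up for this illustration. It tries both options for Pat and checks the claim in each case.

```r
# The two known genders are fixed; Pat's is the only unknown.
gender <- function(pat) c(Abraham = "boy", Pat = pat, Eve = "girl")

# Who looks at whom, as stated in the puzzle.
looks <- data.frame(from = c("Abraham", "Pat"),
                    to   = c("Pat", "Eve"),
                    stringsAsFactors = FALSE)

# TRUE if at least one "from" is a boy whose "to" is a girl.
boyLooksAtGirl <- function(pat) {
  g <- gender(pat)
  any(g[looks$from] == "boy" & g[looks$to] == "girl")
}

# Enumerate both possibilities for Pat.
results <- sapply(c(boy = "boy", girl = "girl"), boyLooksAtGirl)
print(results)

# The answer is "yes" only if the claim holds in every case.
all(results)
```

If all(results) is TRUE, the answer does not depend on Pat, which is exactly the argument made above.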




 


Collaborative Environment

 

The setup of folders and files for collaborative work needs some care. We want to be able to share data and code. But we don't want to mirror all our local experiments to everyone else on the team. We want version control for all files. But not for our very large, read-only datasets. And we want our environment to be predictable - for ourselves and others. Here is a common scheme for everyone to adopt[1].

Figure (ProjectFolderAndFileLayout.png): Project folder and file layout. R has a "home directory", commonly abbreviated with the tilde character "~". A file called .Rprofile is executed whenever R starts up and is therefore useful to define global settings. Course-related material should go into a folder called BCB420 somewhere on your computer. This folder contains three folders: Ontoscope is the shared code repository that is mirrored on github and through which you publish the assets you develop. dev is your local development folder where you develop and experiment. Your intermediate stages of development will be in that folder. data contains large data-files that we don't want to put under version control.


 

Task:
Go through the following steps to set up this structure.


Create Folders
Create the following folders on your computer:
  • BCB420 - this will contain all course-related files. And within that folder:
      • dev - this will contain your local code edits and experiments
      • data - this will contain large, stable assets that don't need to be changed.
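
If you prefer to create the folders from the R console rather than through your file manager, something like the following sketch should work. The function name makeCourseFolders() is made up for this example, and the trial run uses a temporary directory as a stand-in: replace it with the place on your computer where you actually want BCB420 to live.

```r
# Create BCB420 with its "dev" and "data" subfolders below a given root.
# Existing folders are left untouched.
makeCourseFolders <- function(root) {
  courseDir <- file.path(root, "BCB420")
  for (d in file.path(courseDir, c("dev", "data"))) {
    if (!dir.exists(d)) {
      dir.create(d, recursive = TRUE)  # also creates BCB420 if needed
    }
  }
  invisible(courseDir)
}

# Trial run in a temporary directory (an assumption for this example).
trialRoot <- tempdir()
courseDir <- makeCourseFolders(trialRoot)

# List what was created.
list.dirs(courseDir, recursive = FALSE)
```

The Ontoscope folder is deliberately not created here: it appears when you clone the repository.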


Create your .Rprofile
Many common operating systems will not allow you to edit files whose name is prefixed with a period/dot. Here's what you do instead.
Open RStudio.
# First, use RStudio's Session -> Set Working Directory -> Choose Directory...
# option to find your new "dev" directory and set it as the
# "Working Directory". Then type:
getwd()

# ... to see the correct path to "dev".


# Here's how you find your home directory:
path.expand("~")

# You can use this to edit/create .Rprofile in the right place.
# file.path() puts the path separator between the directory and the file name.
file.edit(file.path(path.expand("~"), ".Rprofile"))

# Once the file is opened, 
# add the following three lines to your .Rprofile
DEVDIR <- "/path/to/your/BCB420/dev/"
setwd(DEVDIR)

# Yes, the third line is empty. Make it a habit to end the last line of your
# script with a carriage return (a line break). This matters sometimes. Also,
# please, please, for the love of everything that is holy: don't actually type
# "/path/to/your..". I hope it's obvious that that is just a placeholder for
# the real path on _your_ computer, which you have just printed to the console
# and from where you can copy it.

# Save, and close .Rprofile. 
# Exit, and restart RStudio.
# Confirm that your .Rprofile does what it should. Type:
DEVDIR
getwd()

# ... to demonstrate that the object exists, has the right contents, and that
# the setwd() command has correctly changed the working directory. If that doesn't
# work, contact me or the list to troubleshoot and fix.
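
One possible refinement, offered as a sketch and not as a required part of the setup: the .Rprofile lines can check that the folder exists before changing into it, so that a mistyped path produces a readable message at startup. The path below is still only the placeholder used above.

```r
# Placeholder path: replace with the real path printed by getwd() earlier.
DEVDIR <- "/path/to/your/BCB420/dev/"

if (dir.exists(DEVDIR)) {
  # The folder is there: make it the working directory.
  setwd(DEVDIR)
} else {
  # The folder is missing: say so, and leave the working directory alone.
  message("DEVDIR does not exist: ", DEVDIR)
}
```

Both dir.exists() and message() are part of base R, so they are available while .Rprofile is being executed.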


Setup to use github
Create a github account and install the github desktop client on your computer.


Clone the Repository
Go through the following steps to create your local copy of the shared assets.
  • Navigate to the Ontoscope repository
  • Find the button that looks like a computer monitor with an arrow. If you hover over it, it should explain that it is used to save the repository to your local computer. Click on that button. An "External Protocol Request" warning should appear. Click to Launch Application.
  • Your github desktop application will open. Find and select your BCB420 folder to clone the repository into and keep the name. Click on Clone.
  • Check that the folder has been created in the right spot and contains the same files you see in the github repository on the Web.
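
The last check can also be done from the R console. This is a small sketch, assuming the folder names described above; checkLayout() is a made-up helper, not something that comes with R or with the repository.

```r
# Report which of the three expected folders exist inside the course folder.
checkLayout <- function(courseDir) {
  expected <- c("Ontoscope", "dev", "data")
  found <- dir.exists(file.path(courseDir, expected))
  names(found) <- expected
  found
}

# Usage, with the placeholder replaced by your real BCB420 path:
# checkLayout("/path/to/your/BCB420")
#
# To compare the cloned files with what you see on the Web:
# list.files(file.path("/path/to/your/BCB420", "Ontoscope"))
```

All three entries should be TRUE once the clone has finished.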


Email me your github user name so I can add you as collaborator to the repository.


Initialize your development folder
Copy codeTemplate.R from the Ontoscope folder to your dev folder. Rename it to myCode.R to avoid confusion. You can edit and adapt this template for your own code. But you should also place your dev folder under version control, so that you can work effectively...
In your github desktop client click the (+) button (Add a repository). Choose Ontoscope-dev as the Name (don't just call it "dev", you may be working on several projects in the future...). Then Choose... your dev folder and Create Repository.... This will now appear under the Other category in the side-bar: you have full version control over the folder, but it is not mirrored to github.
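
Copying and renaming the template can be done in one step from R. The sketch below assumes the folder layout described above; copyTemplate() is a hypothetical helper written for this example. It stops instead of overwriting an existing myCode.R.

```r
# Copy Ontoscope/codeTemplate.R to dev/myCode.R inside the course folder.
copyTemplate <- function(courseDir) {
  from <- file.path(courseDir, "Ontoscope", "codeTemplate.R")
  to   <- file.path(courseDir, "dev", "myCode.R")
  if (!file.exists(from)) {
    stop("Template not found: ", from)
  }
  if (file.exists(to)) {
    stop("Not overwriting existing file: ", to)
  }
  file.copy(from, to)  # returns TRUE on success
}

# Usage, with the placeholder replaced by your real BCB420 path:
# copyTemplate("/path/to/your/BCB420")
```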


Checking in
Let's make sure this works. After I have added you as collaborator, add your name to the Readme document ...
  • Open the github desktop client.
  • Select the Ontoscope repository.
  • Click the sync button to download the most recent version of all assets.
  • Open your local copy of Readme.md in notepad, or RStudio.
  • Add your name to the collaborator list and save your changed copy.
  • Commit your change. Make sure you always add a commit message to your commits.
  • sync again, to "push" your commit to github.
  • Go to the Ontoscope repository to confirm your commit has arrived.
If you have added your name before Tuesday's class session, you have earned yourself 3 marks for the quiz.


 

A little bit of (light) reading

 

Here are some useful observations on scientific data in the lab...

Goodman et al. (2014) Ten simple rules for the care and feeding of scientific data. PLoS Comput Biol 10:e1003542. (pmid: 24763340)


Do you agree? Are there useful tools that we should know about? After all, the article is over a year old and in this game that's a lot. Anything here that we should adopt? Software design needs clearly defined requirements. There are functional requirements and non-functional requirements such as the ones the article discusses. Are there others we need to act on and add to our task list?


 


Advancing your R skills

Task:
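Reading and writing data is another of the truly essential R skills. This brief tutorial reviews the basics: text-files, csv tables, and .Rdata objects. Load the following tutorial with its associated file as an RStudio project from github.

  • Open RStudio
  • Select File → New Project ...
  • Choose Version control → Git
  • Enter the repository URL for the tutorial: https://github.com/hyginn/R_Exercise-Data
  • Click on Create Project.
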
(In case the R script source-code does not appear in the left-hand pane, click on the file name R_Exercise-Data.R in the lower-right hand pane.)
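
As a preview of what the tutorial covers, here is a minimal sketch of the three kinds of files it mentions. It is not taken from the tutorial itself: the file names and the toy values are invented for this example, and everything is written to a temporary directory.

```r
tmp <- tempdir()

# 1. Text files: write lines of text, then read them back.
txtFile <- file.path(tmp, "notes.txt")
writeLines(c("first line", "second line"), txtFile)
txt <- readLines(txtFile)

# 2. csv tables: write a data frame, then read it back.
csvFile <- file.path(tmp, "toy.csv")
toy <- data.frame(id    = c("a", "b", "c"),
                  score = c(1.5, 2.0, 0.5),
                  stringsAsFactors = FALSE)
write.csv(toy, csvFile, row.names = FALSE)
toy2 <- read.csv(csvFile, stringsAsFactors = FALSE)

# 3. .RData objects: save an R object, remove it, then restore it by name.
rdFile <- file.path(tmp, "toy.RData")
save(toy, file = rdFile)
rm(toy)
load(rdFile)  # recreates the object "toy" in the workspace

# Compare the restored object with the copy read from csv.
all.equal(toy, toy2)
```

Note the difference in style: read.csv() returns a value that you assign yourself, whereas load() recreates objects under the names they had when they were saved.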


 


 
That is all.


 

Footnotes and references

 
  1. You might be used to doing things differently, but for this project, do it this way. But if you think this can be improved, let's talk.


 


 
Ask, if things don't work for you!
If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.
... are required reading.


 


