CSB Assignment Week 3

From "A B C"
Revision as of 21:01, 1 February 2016 by Boris

Assignments for Week 3
Collaboration tools, initializing our project.

< Assignment 2 Assignment 4 >

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

 
 

Assigned material - concepts, exercises and reading - will be reflected in next week's evaluation and feedback session. Please remember to contribute to self-evaluation questions by Tuesday at noon.


 


 

Warm up

In many London Underground tube stations there are two escalators going up but only one going down.


Why? [I don't know...]


Exercises

In this exercise we will attempt to extract a set of relevant genes for the pluripotency network from deposited expression data.

Task:
A recent paper has highlighted the lineage-specific roles of SOX2, OCT4 and NANOG in human cells.

Wang et al. (2012) Distinct lineage specification roles for NANOG, OCT4, and SOX2 in human embryonic stem cells. Cell Stem Cell 10:440-54. (pmid: 22482508)

[ PubMed ] [ DOI ]

First, we will access the relevant data series on GEO, the NCBI's database for expression data.
  1. Navigate to the PubMed page of the article via the link provided in the reference box above.
  2. Follow the link to the associated GEO records on the right-hand side of the PubMed page (under Related Information). The top hit is a Superseries, composed of a number of Subseries of experiments.
  3. Open its link in a new tab.
  4. Examine the samples that are included in this study by expanding the list of samples. The sample titles tell you a bit about the experiment, and the Subseries pages describe more. However, here and in general, you will need to read the actual paper for a reasonable understanding of the experimental variables.
  5. That is not needed for this first-look exercise, however. Just note: shXXX samples are knock-downs (KD) using a lentiviral short-hairpin RNA, OE is overexpression, and H1 and H9 are human embryonic stem cell lines.
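As a rough illustration of the naming convention in the last step, the following Python sketch sorts sample titles into cell line and condition. The titles used here are hypothetical and only mimic the shXXX / OE / H1 / H9 pattern; the real titles on GEO may be formatted differently, so the regular expressions would need adjusting.

```python
import re

def classify_sample(title):
    """Guess (cell_line, condition) from a sample title.

    Assumes space-separated tokens such as "H1 shSOX2 rep1";
    this layout is an assumption, not the format used on GEO.
    """
    # Cell line: H1 or H9, if it appears as a separate token.
    line_match = re.search(r"\b(H1|H9)\b", title)
    cell_line = line_match.group(1) if line_match else "unknown"

    # shXXX marks a short-hairpin knock-down of gene XXX.
    kd_match = re.search(r"\bsh([A-Za-z0-9]+)", title)
    if kd_match:
        return cell_line, kd_match.group(1).upper() + " KD"

    # "<gene> OE" marks overexpression of that gene.
    oe_match = re.search(r"\b([A-Za-z0-9]+)[ _-]OE\b", title)
    if oe_match:
        return cell_line, oe_match.group(1).upper() + " OE"

    return cell_line, "other"


# Hypothetical titles, for illustration only.
for title in ["H1 shSOX2 rep1", "H9 NANOG OE rep2", "H1 untreated"]:
    print(title, "->", classify_sample(title))
```

Anything that is neither a knock-down nor an overexpression sample ends up as "other", which is where controls and untreated samples would land in this sketch.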

We can pursue the following question: if any or all of the pluripotency-maintaining transcription factors are knocked down (presumably a surrogate for a differentiation signal), what are the downstream targets, and what do they have in common? Conversely, what complementary effects are observed when these factors are overexpressed? The first step therefore is to identify differentially expressed genes. Conveniently, GEO offers the GEO2R utility to help perform differential expression analysis.
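To make the idea concrete, here is a minimal Python sketch of what "differentially expressed" and "in common" could mean. It is not what GEO2R does internally: GEO2R runs a statistical analysis in R, whereas this sketch only compares group means by log2 fold change. All gene names, expression values, and the threshold are invented for illustration.

```python
import math
from statistics import mean

# Invented expression values (linear scale), three replicates per group.
control = {
    "GENE_A": [100, 110, 90],
    "GENE_B": [50, 55, 45],
    "GENE_C": [200, 210, 190],
}
knockdowns = {
    "SOX2 KD": {"GENE_A": [400, 420, 380], "GENE_B": [52, 48, 50], "GENE_C": [50, 55, 45]},
    "NANOG KD": {"GENE_A": [310, 300, 290], "GENE_B": [150, 160, 140], "GENE_C": [195, 205, 200]},
    "OCT4 KD": {"GENE_A": [250, 260, 240], "GENE_B": [50, 50, 50], "GENE_C": [60, 70, 50]},
}

def log2_fold_change(treated, reference):
    """log2 of the ratio of group means; positive means higher in treated."""
    return math.log2(mean(treated) / mean(reference))

def changed_genes(treated_group, control_group, threshold=1.0):
    """Return (up, down) gene sets whose |log2 fold change| >= threshold."""
    up, down = set(), set()
    for gene, values in treated_group.items():
        lfc = log2_fold_change(values, control_group[gene])
        if lfc >= threshold:
            up.add(gene)
        elif lfc <= -threshold:
            down.add(gene)
    return up, down

# Genes that go up in every knock-down: the intersection of the "up" sets.
up_sets = [changed_genes(group, control)[0] for group in knockdowns.values()]
common_up = set.intersection(*up_sets)
print("Up in all knock-downs:", sorted(common_up))
```

A fold-change cutoff alone ignores variability between replicates, which is why a proper analysis also attaches a significance estimate to each gene.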

Now proceed to apply this to the stem-cell transcription factor study:
  1. On the Superseries page, click on the Analyze with GEO2R link.
  2. Click on the Treatment column header to sort the series by experimental variable.
  3. Define meaningful groups: you could name them SOX2 KD, SOX2 OE, the same for NANOG and OCT4, and CTRL. (Note that these are just names, you could also have called the groups Capitoline, Palatine, Esquiline, Aventine, Caelian, Viminal, and Quirinal – if you remember what the names stand for.)
  4. Then associate the group names with relevant experiments, as shown in the video. For the control samples, you can combine the H1 "controls" and the H1 "untreated" samples from the BMP4 treatment series.
  5. Confirm that the value distributions are unbiased: overall, in such experiments, the bulk of the expression values should not change, and thus the means and quantiles of the expression levels should be about the same across samples. You should note that the OE samples are systematically different from the others, and that one of the NANOG samples has very low values. Remove that sample from your list and rerun the distribution to confirm that its data is no longer included.
  6. In the GEO2R tab, click on the Top 250 button to execute the analysis of significantly differentially expressed genes.
  7. By clicking on a few of the gene names in the Gene.symbol column, you can view the expression profiles that tell you why the genes were found to be differentially expressed. Can you identify a gene that increases in expression in response to all three factors?
  • Finally, review the R script for your analysis. Check whether there are any aspects of the code that you don't understand. That will give you an idea of the level to which you ought to bring your R skills. There is no need to do this right now, though, and no worries: R code analysis will not be required on Wednesday's quiz.
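The distribution check in step 5 can also be sketched outside of GEO2R. The Python example below summarizes each sample by its quartiles and flags samples whose median lies far from the median of all sample medians. The sample names, the values (imagined as log2 expression levels), and the cutoff `max_shift` are assumptions made for illustration, not data from the study.

```python
from statistics import median, quantiles

# Invented log2 expression values; a real sample has thousands of probes.
samples = {
    "ctrl_1": [7.9, 8.1, 8.0, 9.2, 6.8, 8.3],
    "sox2_kd_1": [7.8, 8.2, 8.1, 9.0, 6.9, 8.4],
    "oct4_kd_1": [8.0, 8.0, 7.9, 9.1, 7.0, 8.2],
    "nanog_kd_2": [2.1, 2.3, 1.9, 3.0, 1.5, 2.2],
}

def summarize(values):
    """Quartiles of one sample, the numbers a box plot is drawn from."""
    q1, q2, q3 = quantiles(values, n=4)
    return {"q1": q1, "median": q2, "q3": q3}

def flag_outliers(sample_values, max_shift=1.0):
    """Names of samples whose median differs from the typical median
    by more than max_shift (an arbitrary cutoff chosen for this sketch)."""
    medians = {name: median(vals) for name, vals in sample_values.items()}
    centre = median(medians.values())
    return sorted(name for name, m in medians.items() if abs(m - centre) > max_shift)

for name, values in samples.items():
    print(name, summarize(values))

suspicious = flag_outliers(samples)
print("Suspicious samples:", suspicious)

# Drop the flagged samples and rerun the check, as in step 5.
cleaned = {name: vals for name, vals in samples.items() if name not in suspicious}
print("After removal:", flag_outliers(cleaned))
```

Comparing each sample to the median of medians, rather than to the mean, keeps a single very low sample from shifting the reference point it is compared against.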


Pre-reading

No pre-reading: Open Project visions are due in class!



 
That is all.


 


 



 


 
Ask, if things don't work for you!
If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.


 


