CSB Assignment Week 3

From "A B C"
Revision as of 01:08, 28 January 2014 by Boris (talk | contribs)

Assignments for Week 3


Note! This assignment is currently active. All significant changes will be announced on the mailing list.

 
 


Exercises for this week relate to this week's lecture.
Pre-reading for this week will prepare you for next week's lecture.
Exercises and pre-reading will be topics on next week's quiz.



Exercises

In this exercise we will attempt to extract a set of relevant genes for the pluripotency network from deposited expression data.

Task:
A recent paper has highlighted the lineage-specific roles of SOX2, OCT4 and NANOG in human cells.

Wang et al. (2012) Distinct lineage specification roles for NANOG, OCT4, and SOX2 in human embryonic stem cells. Cell Stem Cell 10:440-54. (pmid: 22482508)

[ PubMed ] [ DOI ] Nanog, Oct4, and Sox2 are the core regulators of mouse (m)ESC pluripotency. Although their basic importance in human (h)ESCs has been demonstrated, the mechanistic functions are not well defined. Here, we identify general and cell-line-specific requirements for NANOG, OCT4, and SOX2 in hESCs. We show that OCT4 regulates, and interacts with, the BMP4 pathway to specify four developmental fates. High levels of OCT4 enable self-renewal in the absence of BMP4 but specify mesendoderm in the presence of BMP4. Low levels of OCT4 induce embryonic ectoderm differentiation in the absence of BMP4 but specify extraembryonic lineages in the presence of BMP4. NANOG represses embryonic ectoderm differentiation but has little effect on other lineages, whereas SOX2 and SOX3 are redundant and repress mesendoderm differentiation. Thus, instead of being panrepressors of differentiation, each factor controls specific cell fates. Our study revises the view of how self-renewal is orchestrated in hESCs.

First, we will access the relevant data series on GEO, the NCBI's database for expression data.
  1. Navigate to the PubMed page of the article via the link provided in the reference box above.
  2. Follow the link to associated GEO records on the right-hand side of the PubMed page (under Related Information). The top hit is a Superseries, composed of a number of Subseries of experiments.
  3. Open its link in a new tab.
  4. Examine the samples that are included in this study by expanding the list of samples. You will notice that the sample titles tell you a bit about the experiment, and the Subseries pages describe more. Here, as in general, a reasonable understanding of the experimental variables requires reading the actual paper.
  5. That is not necessary for this first-look exercise, however. Just note: shXXX samples are knock-downs (KD) using a lentiviral short-hairpin RNA, OE is overexpression, and H1 and H9 are human embryonic stem cell lines.
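
The naming convention in step 5 can be turned into a small labelling helper. This is only a sketch: the sample titles below are hypothetical (the real titles on the GEO page may be formatted differently), and the function name `classify` and the group labels are our own choices.

```python
import re

# Hypothetical sample titles; the real GEO titles may look different.
TITLES = [
    "H1_shSOX2_rep1",
    "H1_shNANOG_rep1",
    "H9_shOCT4_rep2",
    "H1_OCT4_OE_rep1",
    "H1_control_rep1",
    "H1_untreated_rep1",
]

FACTORS = ("SOX2", "NANOG", "OCT4")


def classify(title):
    """Return a group label such as 'SOX2 KD', 'OCT4 OE' or 'CTRL'."""
    upper = title.upper()
    for factor in FACTORS:
        # shXXX marks a short-hairpin knock-down of factor XXX.
        if "SH" + factor in upper:
            return factor + " KD"
        # OE as a separate token marks overexpression of the factor.
        if factor in upper and re.search(r"(^|[^A-Z])OE([^A-Z]|$)", upper):
            return factor + " OE"
    if "CONTROL" in upper or "UNTREATED" in upper:
        return "CTRL"
    return "UNASSIGNED"


groups = {title: classify(title) for title in TITLES}
for title, group in groups.items():
    print(f"{title:20s} -> {group}")
```

Anything the helper cannot place ends up as UNASSIGNED, so ambiguous titles are left for manual inspection rather than silently put into a group.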

We can pursue the following question: if any or all of the pluripotency-maintaining transcription factors are knocked down (presumably a surrogate for a differentiation signal), what are the downstream targets, and what do they have in common? Conversely, what complementary effects are observed when these factors are overexpressed? The first step therefore is to identify differentially expressed genes. Conveniently, GEO offers the GEO2R utility to help perform differential expression analysis.
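
To make "differentially expressed" concrete, here is a minimal sketch that compares one gene between two groups using a log2 fold change and Welch's t statistic. It is a simplified illustration, not GEO2R's procedure: GEO2R relies on the limma R package, which fits linear models with moderated statistics. The gene names and expression values are made up, and the values are assumed to be log2-transformed already.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control):
    """Difference of group means; assumes values are already log2-transformed."""
    return mean(treated) - mean(control)


def welch_t(treated, control):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return (mean(treated) - mean(control)) / standard_error


# Made-up log2 expression values for two hypothetical genes.
EXPRESSION = {
    "GENE_A": {"KD": [9.0, 9.2, 9.1], "CTRL": [7.0, 7.1, 6.9]},
    "GENE_B": {"KD": [8.0, 8.1, 7.9], "CTRL": [8.0, 7.9, 8.1]},
}

for gene, by_group in EXPRESSION.items():
    lfc = log2_fold_change(by_group["KD"], by_group["CTRL"])
    t = welch_t(by_group["KD"], by_group["CTRL"])
    print(f"{gene}: log2FC = {lfc:.2f}, t = {t:.1f}")
```

A gene like the hypothetical GENE_A, with a large fold change and small within-group spread, would rank near the top of a differential expression list, while GENE_B would not.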

Now proceed to apply this to the stem-cell transcription factor study:
  1. On the Superseries page, click on the Analyze with GEO2R link.
  2. Click on the Treatment column header to sort the series by experimental variable.
  3. Define meaningful groups: you could name them SOX2 KD, SOX2 OE, the same for NANOG and OCT4, and CTRL. (Note that these are just names, you could also have called the groups Capitoline, Palatine, Esquiline, Aventine, Caelian, Viminal, and Quirinal – if you remember what the names stand for.)
  4. Then associate the group names with relevant experiments, as shown in the video. For the control samples, you can combine the H1 "controls" and the H1 "untreated" samples from the BMP4 treatment series.
  5. Confirm that the value distributions are unbiased: in such experiments the bulk of the expression values should not change, so the means and quantiles of the expression levels should be about the same across samples. You should note that the OE samples are systematically different from the others, and that one of the NANOG samples has very low values. Remove that sample from your list and rerun the value distribution to confirm that it is no longer included.
  6. In the GEO2R tab, click on the Top 250 button to execute the analysis of significantly differentially expressed genes.
  7. By clicking on a few of the gene names in the Gene.symbol column, you can view the expression profiles that tell you why the genes were found to be differentially expressed. Can you identify a gene that increases in expression in response to all three factors?
  • Finally, review the R script for your analysis. Check if there are any aspects of the code that you don't understand. That will give you an idea of the level to which you ought to bring your R skills. But not right now – and: no worries, R code analysis will not be required on Wednesday's quiz.
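
The value-distribution check in step 5 can also be sketched in code: compare each sample's median with the median across all samples and flag those that deviate strongly. The sample names, the values, the function name `flag_outlier_samples` and the threshold of one log2 unit are all assumptions made for illustration; GEO2R shows the distributions as a plot instead.

```python
from statistics import median

# Made-up log2 expression values, one short list per hypothetical sample.
SAMPLES = {
    "CTRL_1": [7.1, 8.0, 9.2, 6.8, 7.9],
    "CTRL_2": [7.0, 8.1, 9.0, 6.9, 8.0],
    "SOX2_KD_1": [7.2, 7.9, 9.1, 7.0, 8.1],
    "NANOG_KD_1": [2.1, 3.0, 4.2, 1.9, 3.1],
}


def flag_outlier_samples(samples, max_shift=1.0):
    """Return names of samples whose median is far from the typical median.

    max_shift is an arbitrary illustrative threshold in log2 units.
    """
    medians = {name: median(values) for name, values in samples.items()}
    centre = median(medians.values())
    return sorted(
        name for name, m in medians.items() if abs(m - centre) > max_shift
    )


for name in flag_outlier_samples(SAMPLES):
    print(f"Consider removing {name}: its values are shifted relative to the other samples")
```

With these made-up numbers only the hypothetical NANOG_KD_1 sample is flagged, mirroring the sample with very low values described in step 5.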


Pre-reading

No pre-reading: project concepts are due in class!