BIO Assignment Week 10

Assignment for Week 10
Expression Analysis

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

Concepts and activities (and reading, if applicable) for this assignment will be topics on next week's quiz.

Introduction

The transcriptome is the set of a cell's mRNA molecules. Microarray technology, based on the quantitative, sequence-specific hybridization of nucleotides, was the first domain of massively parallel, high-throughput biology. Quantifying gene expression levels in a tissue-, development-, or response-specific manner has yielded detailed insight into cellular function at the molecular level. Yet, while the questions remain, high-throughput sequencing methods are rapidly supplanting microarrays as the source of the data. Moreover, we now realize that the transcriptome is not just a passive buffer of expressed information: an entire, complex, intrinsic level of regulation through hybridization of small nuclear RNAs has been discovered.

Barrett et al. (2011) NCBI GEO: archive for functional genomics data sets--10 years on. Nucleic Acids Res 39:D1005-10. (pmid: 21097893)


Barrett et al. (2013) NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res 41:D991-5. (pmid: 23193258)


The transcriptome originates, for the most part, from the genome, and it gives rise, again for the most part, to the proteome. RNA that is transcribed from the genome is not yet fit for translation but must first be processed: splicing is ubiquitous^[1] and, in addition, RNA editing has been encountered in many species. Some authors therefore refer to the exome (the set of transcribed exons) to indicate the actual coding sequence.

The dark matter of the transcriptome may just be noise^[2].

Microarray standards and databases
Working with expression data
Interpretation

http://coxpresdb.jp/cgi-bin/coex_list.cgi?gene=851503&sp=Sce

http://www.geneticsofgeneexpression.org/network/index.php?gene=CDK4

Exercises

In this exercise we will attempt to extract a set of relevant genes for the pluripotency network from deposited expression data.

Task:
A recent paper has highlighted the lineage-specific roles of SOX2, OCT4 and NANOG in human cells.

Wang et al. (2012) Distinct lineage specification roles for NANOG, OCT4, and SOX2 in human embryonic stem cells. Cell Stem Cell 10:440-54. (pmid: 22482508)


First, we will access the relevant data series on GEO, the NCBI's database for expression data.

Navigate to the PubMed page of the article via the link provided in the reference box above.
Follow the link to associated GEO records on the right-hand side of the PubMed page (under Related Information). The top hit is a Superseries, composed of a number of Subseries of experiments.
Open its link in a new tab.
Examine the samples that are included in this study by expanding the list of samples. You will notice that the sample titles tell you a bit about the experiment, and the actual Subseries page describes more. But here, and in general, you will need to read the actual paper for a reasonable understanding of the experimental variables.
Not for this first-look exercise, however – just note: shXXX samples are knock-downs (KD) using a lentiviral short-hairpin RNA, OE is overexpression, and H1 and H9 are human embryonic stem cell lines.
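The naming conventions noted above (shXXX for knock-downs, OE for overexpression) can be turned into group labels programmatically. The following is only a minimal sketch in Python: the sample titles are made up for illustration, and the real titles in the GEO record may be formatted differently, so the patterns would need to be adapted after looking at the actual sample list.

```python
import re
from collections import defaultdict

# The three pluripotency factors studied in the paper.
FACTORS = ("SOX2", "OCT4", "NANOG")


def classify_title(title):
    """Map a sample title to a group label such as 'SOX2 KD' or 'NANOG OE'.

    Assumes (hypothetically) that knock-downs contain 'sh<FACTOR>' and that
    overexpression samples contain the factor name plus a separate 'OE' token.
    """
    upper = title.upper()
    for factor in FACTORS:
        # shXXX marks a short-hairpin knock-down of factor XXX.
        if re.search(r"SH[-_ ]?" + factor, upper):
            return factor + " KD"
        # 'OE' must stand alone, i.e. not be part of a longer word.
        if factor in upper and re.search(r"(?<![A-Z0-9])OE(?![A-Z0-9])", upper):
            return factor + " OE"
    if "CONTROL" in upper or "UNTREATED" in upper:
        return "CTRL"
    # Anything unrecognized is left for manual inspection.
    return "UNASSIGNED"


def group_samples(titles):
    """Collect sample titles under their group label."""
    groups = defaultdict(list)
    for title in titles:
        groups[classify_title(title)].append(title)
    return dict(groups)


# Hypothetical titles, for illustration only.
example_titles = [
    "H1_shSOX2_rep1",
    "H1_shSOX2_rep2",
    "H9_NANOG_OE_rep1",
    "H1_control_rep1",
    "H1_untreated_rep1",
    "H1_BMP4_24h",
]

for group, members in sorted(group_samples(example_titles).items()):
    print(group, members)
```

Returning UNASSIGNED rather than guessing keeps ambiguous samples visible, so you can decide by hand (or by reading the paper) where they belong.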

We can pursue the question: if any or all of the pluripotency-maintaining transcription factors are knocked down – presumably a surrogate for a differentiation signal – what are the downstream targets and what do they have in common? Conversely, what complementary effects are observed when these factors are overexpressed? The first step, therefore, is to identify differentially expressed genes. Conveniently, GEO offers the GEO2R utility to help perform differential expression analysis.
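To make the idea of "differentially expressed" concrete, here is a minimal Python sketch that ranks genes by comparing a treated group with a control group. It is deliberately simplified: it uses a plain Welch t statistic on log2 values, whereas GEO2R itself relies on the R packages GEOquery and limma, which compute moderated statistics and adjust for multiple testing. The gene names, sample names, and values below are invented for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent groups with unequal variances.

    Requires at least two values per group and non-zero overall variance.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expr, treated, control):
    """Rank genes by the absolute t statistic, largest first.

    expr maps gene -> {sample name: log2 expression value}.
    Returns a list of (gene, log2 fold change, t statistic) tuples.
    """
    rows = []
    for gene, values in expr.items():
        t_vals = [values[s] for s in treated]
        c_vals = [values[s] for s in control]
        # A difference of log2 values is the log2 of the fold change.
        log2fc = mean(t_vals) - mean(c_vals)
        rows.append((gene, log2fc, welch_t(t_vals, c_vals)))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


# Invented toy data: geneA responds to the knock-down, geneB does not.
toy_expr = {
    "geneA": {"kd1": 10.0, "kd2": 10.2, "kd3": 9.8,
              "c1": 8.0, "c2": 8.1, "c3": 7.9},
    "geneB": {"kd1": 7.0, "kd2": 7.2, "kd3": 6.8,
              "c1": 7.1, "c2": 6.9, "c3": 7.0},
}

for gene, log2fc, t in rank_genes(toy_expr, ["kd1", "kd2", "kd3"], ["c1", "c2", "c3"]):
    print(f"{gene}\tlog2FC={log2fc:+.2f}\tt={t:+.2f}")
```

With only a handful of replicates per group, per-gene variance estimates are unstable, which is one reason why real analyses prefer moderated statistics over this plain t statistic.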

View the GEO2R video tutorial on YouTube.

Now proceed to apply this to the stem-cell transcription factor study:

On the Superseries page, click on the Analyze with GEO2R link.
Click on the Treatment column header to sort the series by experimental variable.
Define meaningful groups: you could name them SOX2 KD, SOX2 OE, the same for NANOG and OCT4, and CTRL. (Note that these are just names, you could also have called the groups Capitoline, Palatine, Esquiline, Aventine, Caelian, Viminal, and Quirinal – if you remember what the names stand for.)
Then associate the group names with relevant experiments, as shown in the video. For the control samples, you can combine the H1 "controls" and the H1 "untreated" samples from the BMP4 treatment series.
Confirm that the value distributions are unbiased: overall, in such experiments, the bulk of the expression values should not change, and thus means and quantiles of the expression levels should be about the same across samples. You should note that the OE samples are systematically different from the others, and that one of the NANOG samples has very low values. Remove that sample from your list and rerun the distribution to confirm that its data is no longer included.
In the GEO2R tab, click on the Top 250 button to execute the analysis of significantly differentially expressed genes.
By clicking on a few of the gene names in the Gene.symbol column, you can view the expression profiles that tell you why the genes were found to be differentially expressed. Can you identify a gene that increases in expression in response to all three factors?
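The distribution check in the steps above can also be expressed numerically. The sketch below, in Python, summarizes each sample by its quartiles and flags samples whose median lies far from the median of all sample medians. The sample names, values, and the tolerance of 1.0 log2 units are arbitrary assumptions chosen for illustration, not values taken from the study.

```python
from statistics import median, quantiles


def summarize(values):
    """Return (Q1, median, Q3) of one sample's expression values."""
    q1, q2, q3 = quantiles(values, n=4)
    return q1, q2, q3


def flag_outlier_samples(samples, tolerance=1.0):
    """Return names of samples whose median is unusually high or low.

    samples maps sample name -> list of log2 expression values.
    A sample is flagged if its median differs from the median of all
    sample medians by more than `tolerance` (an arbitrary cut-off).
    """
    medians = {name: median(vals) for name, vals in samples.items()}
    centre = median(medians.values())
    return sorted(name for name, m in medians.items()
                  if abs(m - centre) > tolerance)


# Invented toy data: 'low_1' mimics a sample with very low values.
toy_samples = {
    "ctrl_1": [8.0, 9.0, 10.0, 11.0, 12.0],
    "kd_1": [8.1, 9.2, 10.1, 10.9, 12.2],
    "low_1": [2.0, 3.0, 4.0, 5.0, 6.0],
}

for name, vals in toy_samples.items():
    print(name, summarize(vals))
print("flagged:", flag_outlier_samples(toy_samples))
```

Comparing medians against the median of medians, rather than the mean, keeps a single aberrant sample from shifting the reference point it is judged against.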

Finally, review the R script for your analysis. Check if there are any aspects of the code that you don't understand. That will give you an idea of the level to which you ought to bring your R skills. But not right now – and: no worries, R code analysis will not be required on Wednesday's quiz.

References

[1] Strictly speaking, splicing is a eukaryotic achievement, but many instances of splicing have been recognized in prokaryotes as well.
[2] Jarvis & Robertson (2011) The noncoding universe. BMC Biol 9:52. (pmid: 21798102)

Links and resources

Further reading

Ray et al. (2013) A compendium of RNA-binding motifs for decoding gene regulation. Nature 499:172-7. (pmid: 23846655)

Vera et al. (2013) MicroRNA-regulated networks: the perfect storm for classical molecular biology, the ideal scenario for systems biology. Adv Exp Med Biol 774:55-76. (pmid: 23377968)

Barbosa-Morais et al. (2012) The evolutionary landscape of alternative splicing in vertebrate species. Science 338:1587-93. (pmid: 23258890)

Han et al. (2011) SnapShot: High-throughput sequencing applications. Cell 146:1044, 1044.e1-2. (pmid: 21925324)

Malone & Oliver (2011) Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol 9:34. (pmid: 21627854)

Zheng & Tao (2011) Stochastic analysis of gene expression. Methods Mol Biol 734:123-51. (pmid: 21468988)

Parkinson et al. (2011) ArrayExpress update--an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucleic Acids Res 39:D1002-4. (pmid: 21071405)

Xie & Ahn (2010) Statistical methods for integrating multiple types of high-throughput data. Methods Mol Biol 620:511-29. (pmid: 20652519)

Reimers (2010) Making informed choices about microarray data analysis. PLoS Comput Biol 6:e1000786. (pmid: 20523743)

Hubble et al. (2009) Implementation of GenePattern within the Stanford Microarray Database. Nucleic Acids Res 37:D898-901. (pmid: 18953035)

Chuang et al. (2007) Network-based classification of breast cancer metastasis. Mol Syst Biol 3:140. (pmid: 17940530)

Carninci (2007) Constructing the landscape of the mammalian transcriptome. J Exp Biol 210:1497-506. (pmid: 17449815)

Barrett & Edgar (2006) Mining microarray data at NCBI's Gene Expression Omnibus (GEO)*. Methods Mol Biol 338:175-90. (pmid: 16888359)

Ask, if things don't work for you!

If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.

Do consider how to ask your questions so that a meaningful answer is possible:
- How to create a Minimal, Complete, and Verifiable example on Stack Overflow and ...
- How to make a great R reproducible example are required reading.

