Difference between revisions of "BIN-EXPR-DE"

Latest revision as of 07:08, 25 September 2020

Discovering Differentially Expressed Genes

(Discovering differentially expressed genes)

Abstract:

Discovering differentially expressed genes in a yeast cell cycle dataset.

Objectives:
This unit will ...

... introduce the GEO tools to evaluate differentially expressed genes.

Outcomes:
After working through this unit you ...

... can access GEO, search for relevant datasets and find significantly differentially expressed genes in the data.

Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.

Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.

Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:
You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:

The Central Dogma: Regulation of transcription and translation; protein biosynthesis and degradation; quality control.

This unit builds on material covered in the following prerequisite units:

BIN-EXPR-GEO (The NCBI GEO Gene Expression database)

GEO2R

Let's look at differential expression of Mbp1 and its target genes using the analysis facilities of the GEO database at the NCBI.

Task:

First, we will search for relevant data sets on GEO, the NCBI's database for expression data.

Navigate to the entry page for GEO data sets.
Enter the following query in the usual Entrez query format: "cell cycle"[ti] AND "saccharomyces cerevisiae"[organism].
There are quite a few hits, and it would take a while to sort through them. A study that has analyzed cell-cycle data in an interesting way is Pramila et al.'s cell-cycle study, a 13-sample analysis of wild-type yeast (W303a cells) across two cell cycles after release from alpha-factor arrest.
On the linked GEO DataSet Browser page, follow the link to the Accession Viewer page: the "Reference series".
Read about the experiment and samples, then follow the link to analyze with GEO2R.
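The same search can also be run from a script through NCBI's E-utilities. The sketch below is a minimal illustration that assumes the public `esearch.fcgi` endpoint and the Entrez database names `gds` (GEO DataSets) and `geoprofiles` (GEO Profiles); the helper name `esearch_url` is hypothetical. It only builds the request URL, so it runs without network access.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "gds", retmax: int = 100) -> str:
    """Build an esearch request URL (hypothetical helper).

    db="gds" searches GEO DataSets; db="geoprofiles" searches GEO Profiles.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes quotes and brackets and turns spaces into "+".
    return f"{ESEARCH}?{urlencode(params)}"


query = '"cell cycle"[ti] AND "saccharomyces cerevisiae"[organism]'
url = esearch_url(query)
print(url)
```

Passing `url` to `urllib.request.urlopen` would return an XML document whose `<IdList>` element holds the matching UIDs. NCBI asks scripted clients without an API key to stay at or below three requests per second.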

View the GEO2R video tutorial on YouTube.

Now apply what you have learned in the video tutorial to the yeast cell-cycle study.

Figure: Value distribution for the yeast cell-cycle experiment GSE3635. Experiments are grouped approximately into equivalent time-points on a cell cycle.

Define groups: the associated publication shows that one cell cycle takes almost exactly 60 minutes. Create the groups T0, T1, T2, ... T5. Then assign the 0 and 60 min samples to "T0", the 10 and 70 min samples to "T1", the 20 and 80 min samples to "T2", and so on up to "T5" (50 and 110 min). The final sample is not assigned.
Confirm that the value distributions are unbiased by opening the Value distribution tab. Overall, in such experiments the bulk of the expression values should not change, so the means and quantiles of the expression levels should be about the same in all samples.
Your distribution should look like the value-distribution figure: properly grouped into six categories, and unbiased with respect to absolute expression levels and trends.
Look for differentially expressed genes: open the GEO2R tab and click on Top 250.
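The grouping rule and the bias check above can be sketched in a few lines. This is a rough illustration under stated assumptions, not what GEO2R does internally: it assumes the 13 samples were taken every 10 minutes from 0 to 120 min, it compares only sample medians (a proxy for the full boxplot), and the function names and toy values are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional

# Assumption: 13 samples taken every 10 min from 0 to 120 min.
SAMPLE_TIMES = list(range(0, 130, 10))
PERIOD = 60  # minutes per cell cycle, as stated in the associated publication
STEP = 10    # assumed sampling interval in minutes


def cycle_group(minutes: int) -> Optional[str]:
    """Map a sample time to T0..T5 by its phase in the cycle.

    Samples taken after two full cycles stay unassigned (None).
    """
    if minutes >= 2 * PERIOD:
        return None
    return f"T{(minutes % PERIOD) // STEP}"


groups = {t: cycle_group(t) for t in SAMPLE_TIMES}


def medians_comparable(samples: Dict[str, List[float]], tolerance: float) -> bool:
    """Rough bias check: every sample median lies within `tolerance`
    of the median of all sample medians."""
    medians = [median(values) for values in samples.values()]
    centre = median(medians)
    return all(abs(m - centre) <= tolerance for m in medians)


# Hypothetical log-expression values for two samples.
toy = {"0 min": [1.0, 2.0, 3.0, 4.0, 5.0], "10 min": [1.1, 2.1, 2.9, 4.2, 5.1]}
print(groups)
print(medians_comparable(toy, tolerance=0.5))
```

The modulo on the cycle length is what pairs 0 with 60 min, 10 with 70 min, and so on; changing `PERIOD` would regroup the samples if a different cycle length were assumed.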

Analyze the results.

Examine the top hits. Click on a few of the gene names in the Gene.symbol column to view the expression profiles that tell you why the genes were found to be differentially expressed. What do you think? Is this what you would have expected for genes that respond to the cell cycle? What seems to be the algorithm's notion of what "differentially expressed" means?
Look for expected genes. Here are a few genes that are known to be differentially expressed in the cell cycle as target genes of the MBF complex: DSE1, DSE2, ERF3, HTA2, HTB2, and GAS3. But what about the MBF complex proteins themselves: Mbp1 and Swi6?
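Checking the result list for the expected genes is a simple set lookup. In this minimal sketch the helper `present` and the `top_symbols` list are hypothetical; replace `top_symbols` with the Gene.symbol column of your own results.

```python
# Known MBF targets listed in the text, plus the MBF subunits themselves.
KNOWN_MBF_TARGETS = ["DSE1", "DSE2", "ERF3", "HTA2", "HTB2", "GAS3"]
MBF_SUBUNITS = ["MBP1", "SWI6"]


def present(genes, top_symbols):
    """Return the genes (in the given order) that occur in a symbol list, ignoring case."""
    # Skip entries without a symbol; some probes may not map to a named gene.
    top = {s.upper() for s in top_symbols if s}
    return [g for g in genes if g.upper() in top]


# Hypothetical excerpt of a Gene.symbol column.
top_symbols = ["HTA2", "CLN2", "DSE2", "", "RNR1"]
print(present(KNOWN_MBF_TARGETS + MBF_SUBUNITS, top_symbols))
```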

The notions of "differential expression" and "cell-cycle-dependent expression" do not overlap completely. Significant differential expression is determined mathematically for genes that have low variance within groups and large differences between groups. The algorithm has no concept of any expectation you might have about the shape of the expression profile: all it finds are genes for which differential expression between some groups is statistically supported, and it returns the top 250 of those. Consistency within groups is therefore very important, while we intuitively might give more weight to genes that conform to our expectation of a cyclical pattern.
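To see why low within-group variance and large between-group differences drive the ranking, consider a plain one-way ANOVA F statistic. GEO2R itself relies on the limma package and its moderated statistics; this sketch swaps in the simpler classical F statistic purely for illustration, and the two example genes are hypothetical.

```python
from statistics import mean
from typing import Sequence


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic: between-group mean square / within-group mean square.

    Undefined (division by zero) if all replicates within every group are identical.
    """
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical genes, three groups of two replicates each.
consistent = [[1.0, 1.1], [3.0, 3.1], [1.0, 0.9]]  # tight replicates, clear group differences
noisy = [[1.0, 2.0], [3.0, 1.5], [1.2, 2.5]]       # similar range, inconsistent replicates
print(anova_f(consistent), anova_f(noisy))
```

Both genes span roughly the same expression range, but only the first has consistent replicates, so its F statistic is more than a thousand times larger. Nothing in the statistic refers to the order of the groups or the shape of the profile.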

Let's see if we can group our time points differently to enhance the contrast between expression levels for cyclically expressed genes. We'll define only two groups: one set of samples before and between the two cycles, and one set at the peaks; some of the intermediate time points are omitted.

Remove all of your groups and define two groups only. Call them "A" and "B".
Assign the samples for 0, 10, 60, and 70 min to group "A". Assign the samples for 30, 40, 90, and 100 min to group "B".
Recalculate the Top 250 differentially expressed genes (you might have to refresh the page to get the "Top 250" button back). Which of the "known" MBF targets are now contained in the set? What about Mbp1 and Swi6?
Next, let's compare the expression profiles for Mbp1, Swi6 and Swi4. It is not obvious that transcription factors are themselves under transcriptional control, as opposed to being expressed at a basal level and activated by phosphorylation or ligand binding. In a new page, navigate to the GEO Profiles page and enter (Mbp1 OR Swi6 OR Swi4 OR Nrm1 OR Cln1 OR Clb6 OR Act1 OR Alg9) AND GSE3635. (Nrm1, Cln1, and Clb6 are Mbp1 target genes. Act1 and Alg9 encode beta-actin and a mannosyltransferase; these are often used as "housekeeping genes", i.e. genes with condition-independent expression levels, especially in qPCR studies, although Alg9 is also an Mbp1 target. We include them here as negative controls. GSE3635 is the accession of the GEO series we have just studied.) You could also have obtained similar results in the Profile graph tab of the GEO2R page. What do you find? What does this tell you? Would this information allow you to define groups that are even better suited for finding cyclically expressed genes?
Click on the profile graph for Mbp1. Describe the evidence you find on that page that allows us to conclude whether or not Mbp1 is a cell-cycle gene. You'll probably want to think for a moment what this question really means, how a cell-cycle gene could be defined, and what can be considered "evidence".
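The two-group design can be sketched the same way. GEO2R's statistics come from the limma package; this sketch swaps in an ordinary Welch two-sample t statistic to show how the A/B split sharpens the contrast for a cyclic profile. The profile values are hypothetical, and samples outside both groups are simply dropped.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Tuple

GROUP_A = {0, 10, 60, 70}    # before and between the two cycles
GROUP_B = {30, 40, 90, 100}  # near the expression peaks


def split_ab(profile: Dict[int, float]) -> Tuple[List[float], List[float]]:
    """Split a {minutes: value} profile into groups A and B; all other time points are omitted."""
    a = [v for t, v in sorted(profile.items()) if t in GROUP_A]
    b = [v for t, v in sorted(profile.items()) if t in GROUP_B]
    return a, b


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's two-sample t statistic for mean(b) - mean(a)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


# Hypothetical cyclic profile (minutes -> log expression).
profile = {0: 1.0, 10: 1.2, 20: 2.0, 30: 3.0, 40: 3.1, 50: 2.0, 60: 1.1,
           70: 1.0, 80: 2.1, 90: 2.9, 100: 3.2, 110: 2.0, 120: 1.0}
a, b = split_ab(profile)
print(a, b, welch_t(a, b))
```

Dropping the intermediate time points (20, 50, 80, 110, 120 min) removes the samples that would otherwise inflate the within-group variance of a cyclic gene.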

Finally, note the R script for the GEO2R analysis in the R script tab. This code can be run on your own machine to reproduce the expression analysis. Once the datasets are loaded and prepared, you could, for example, perform a "real" time-series analysis, calculate correlation coefficients with an idealized sine wave, or search for genes that are co-regulated with your genes of interest. We will explore this in another unit.
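As a preview of the correlation idea, here is a minimal sketch that correlates an expression profile with idealized sine waves of a 60-minute period, scanning the phase in 10-minute steps. The phase scan, the function names, and the example profile are assumptions for illustration; a finer alternative would be to fit sine and cosine terms by least squares.

```python
from math import pi, sin, sqrt
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either profile is flat."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_sine_fit(times: Sequence[float], values: Sequence[float],
                  period: float = 60.0, step: float = 10.0) -> Tuple[float, float]:
    """Correlate a profile with sine waves of `period` minutes, shifted in `step`-minute
    increments; return (best correlation, shift in minutes)."""
    best_r, best_shift = -2.0, 0.0
    shift = 0.0
    while shift < period:
        wave = [sin(2 * pi * (t - shift) / period) for t in times]
        r = pearson(values, wave)
        if r > best_r:
            best_r, best_shift = r, shift
        shift += step
    return best_r, best_shift


times = list(range(0, 130, 10))
# Hypothetical profile: an ideal sine wave peaking 35 min into each cycle.
values = [sin(2 * pi * (t - 20) / 60) for t in times]
r, shift = best_sine_fit(times, values)
print(r, shift)
```

The same `pearson` function can be applied between the profile of a gene of interest, such as Mbp1, and every other gene's profile to rank candidates for co-regulation.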

About ...

Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2020-09-24

Version:

1.1

Version history:

1.1 2020 Maintenance
1.0 First live version
0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.


@@ Line 1: / Line 1: @@
-<div id="BIO">
+<div id="ABC">
-  <div class="b1">
+<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;">
 Discovering Differentially Expressed Genes
-  </div>
+<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; ">
+(Discovering differentially expressed genes)
-  {{Vspace}}
+</div>
-<div class="keywords">
-<b>Keywords:</b>&nbsp;
-Discovering differentially expressed genes
 </div>
-{{Vspace}}
+{{Smallvspace}}
-__TOC__
-{{Vspace}}
-{{DEV}}
-{{Vspace}}
+<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;">
+<div style="font-size:118%;">
+<b>Abstract:</b><br />
+<section begin=abstract />
+Discovering differentially expressed genes in a yeast cell cycle dataset.
+<section end=abstract />
 </div>
-<div id="ABC-unit-framework">
+<!-- ============================  -->
-== Abstract ==
+<hr>
-<!-- included from "../components/BIN-EXPR-DE.components.wtxt", section: "abstract" -->
+<table>
-...
+<tr>
+<td style="padding:10px;">
-{{Vspace}}
+<b>Objectives:</b><br />
+This unit will ...
+* ... introduce the GEO tools to evaluate differentially expressed genes.
-== This unit ... ==
+</td>
-=== Prerequisites ===
+<td style="padding:10px;">
-<!-- included from "../components/BIN-EXPR-DE.components.wtxt", section: "prerequisites" -->
+<b>Outcomes:</b><br />
-<!-- included from "ABC-unit_components.wtxt", section: "notes-external_prerequisites" -->
+After working through this unit you ...
-You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:
+* ... can access GEO, search for relevant datasets and find significantly differentially expressed genes in the data.
-<!-- included from "FND-prerequisites.wtxt", section: "central_dogma" -->
+</td>
+</tr>
+</table>
+<!-- ============================  -->
+<hr>
+<b>Deliverables:</b><br />
+<section begin=deliverables />
+<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li>
+<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li>
+<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li>
+<section end=deliverables />
+<!-- ============================  -->
+<hr>
+<section begin=prerequisites />
+<b>Prerequisites:</b><br />
+You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:<br />
 *<b>The Central Dogma</b>: Regulation of transcription and translation; protein biosynthesis and degradation; quality control.
-<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
+This unit builds on material covered in the following prerequisite units:<br />
-You need to complete the following units before beginning this one:
+*[[BIN-EXPR-GEO|BIN-EXPR-GEO (The NCBI GEO Gene Expression database)]]
-*[[BIN-EXPR-GEO]]
+<section end=prerequisites />
-*[[FND-STA-Multiple_testing]]
+<!-- ============================  -->
+</div>
-{{Vspace}}
+{{Smallvspace}}
-=== Objectives ===
-<!-- included from "../components/BIN-EXPR-DE.components.wtxt", section: "objectives" -->
-...
-{{Vspace}}
+{{Smallvspace}}
-=== Outcomes ===
-<!-- included from "../components/BIN-EXPR-DE.components.wtxt", section: "outcomes" -->
-...
-{{Vspace}}
+__TOC__
-=== Deliverables ===
-<!-- included from "../components/BIN-EXPR-DE.components.wtxt", section: "deliverables" -->
-<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
-*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
-<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
-*<b>Journal</b>: Document your progress in your [[FND-Journal|course journal]].
-<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
-*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|insights! page]].
 {{Vspace}}
@@ Line 75: / Line 66: @@
 === Evaluation ===
-<!-- included from "../components/BIN-EXPR-DE.components.wtxt", section: "evaluation" -->
-<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 <b>Evaluation: NA</b><br />
-:This unit is not evaluated for course marks.
+<div style="margin-left: 2rem;">This unit is not evaluated for course marks.</div>
-{{Vspace}}
-</div>
-<div id="BIO">
 == Contents ==
-<!-- included from "../components/BIN-EXPR-DE.components.wtxt", section: "contents" -->
+{{Smallvspace}}
+{{Task|1=
+*Read the introductory notes on {{ABC-PDF|BIN-EXPR-DE|discovering differentially expressed genes in high-throughput data}}.
-==Introduction==
+}}
-The transcriptome is the set of a cell's mRNA molecules. The transcriptome originates from the genome, mostly, that is, and it results in the proteome, again: mostly. RNA that is {{WP|Transcription (genetics)|transcribed}} from the genome is not yet fit for translation but must be processed: {{WP|RNA splicing|splicing}} is ubiquitous<ref>Strictly speaking, splicing is an {{WP|Eukaryote|eukaryotic}} achievement, however there are examples of splicing in {{WP|Prokaryote|prokaryotes}} as well.</ref> and in addition {{WP|RNA editing}} has been encountered in many species. Some authors therefore refer to the ''exome''&mdash;the set of transcribed {{WP|exons}}&mdash; to indicate the actual coding sequence.
-'''Microarray technology''' &mdash; the quantitative, sequence-specific hybridization of labelled nucleotides in chip-format &mdash; was the first domain of "high-throughput biology". Today, it has largely been replaced by {{WP|RNA-Seq|'''RNA-seq'''}}: quantification of transcribed mRNA by high-throughput sequencing and mapping reads to genes. Quantifying gene expression levels in a  tissue-, development-, or response-specific way has yielded detailed insight into cellular function at the molecular level, with recent results of single-cell sequencing experiments adding a new level of precision. But not all transcripts are mapped to genes: we increasingly realize that the transcriptome is not merely a passive buffer of expressed information on its way to be translated into proteins, but contains multiple levels of complex, regulation through hybridization of small nuclear RNAs<ref>{{#pmid: 25565024}} {{#pmid: 21798102}}</ref>.
-In this assignment, we will look at differential expression of Mbp1 and its target genes.
 {{Vspace}}
@@ Line 102: / Line 78: @@
 ==GEO2R==
+Let's look at differential expression of Mbp1 and its target genes using the analysis facilities of the GEO database at the NCBI.
-In this exercise we will use the analysis facilities of the GEO database at the NCBI.
 {{task|1=
 ;First, we will search for relevant data sets on GEO, the NCBI's database for expression data.
-#Navigate to the entry page for [http://www.ncbi.nlm.nih.gov/gds/ ''' GEO data sets]].
+# Navigate to the entry page for [http://www.ncbi.nlm.nih.gov/gds/ '''GEO data sets'''].
-#Enter the following query in the usual Entrez query format: <code>"cell cycle"[ti] AND "saccharomyces cerevisiae"[organism]</code>.
+# Enter the following query in the usual Entrez query format: <code>"cell cycle"[ti] AND "saccharomyces cerevisiae"[organism]</code>.
-#You should get two datasets among the top hits that analyze wild-type yeast (W303a cells) across two cell-cycles after release from alpha-factor arrest. Choose the [http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS2347 experiment with lower resolution] (13 samples).
+# There are quite a few hits, and it would take a while to sort through them. A study that has analyzed cell-cycle data in an interesting way is Pramila ''et al.'s'' [http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS2347 cell-cycle study], a 13-sample analysis of wild-type yeast (W303a cells) across two cell cycles after release from alpha-factor arrest.
-#On the linked GEO DataSet Browser page, follow the link to the [http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3635 Accession Viewer page: the "Reference series"].
+# On the linked GEO DataSet Browser page, follow the link to the [http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3635 Accession Viewer page: the "Reference series"].
-#Read about the experiment and samples, then follow the link to [http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE3635 '''analyze with GEO2R''']
+# Read about the experiment and samples, then follow the link to [http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE3635 '''analyze with GEO2R'''].
 * View the [http://www.youtube.com/watch?v=EUPmGWS8ik0 '''GEO2R''' video tutorial] on YouTube.
-;Now proceed to apply this to the yeast cell-cycle study:[[File:GSE3635_ValueDistribution.png|frame|right|Value distribution for the yeast cell-cycle experiment GSE3635. Experiments are grouped approximately into equivalent time-points on a cell cycle.]]
+;Now proceed to apply what you have learned in the video-tutorial to the yeast cell-cycle study: [[File:GSE3635_ValueDistribution.png|frame|right|Value distribution for the yeast cell-cycle experiment GSE3635. Experiments are grouped approximately into equivalent time-points on a cell cycle.]]
 # '''Define groups''': the associated publication shows that one cell cycle takes almost exactly 60 minutes. Create the groups T0, T1, T2, ... T5. Then assign the 0 and 60 min samples to "T0", the 10 and 70 min samples to "T1", the 20 and 80 min samples to "T2", and so on up to "T5" (50 and 110 min). The final sample is not assigned.
@@ Line 128: / Line 103: @@
 # Look for expected genes. Here are a few genes that are known to be differentially expressed in the cell cycle as target genes of the MBF complex: <code>DSE1</code>, <code>DSE2</code>, <code>ERF3</code>, <code>HTA2</code>, <code>HTB2</code>, and <code>GAS3</code>. But what about the MBF complex proteins themselves: Mbp1 and Swi6?
-The notion of "differential expression" and "cell-cycle dependent expression" do not overlap completely. Significant differential expression is mathematically determined for genes that have low variance within groups and large differences between groups. This algorithm has no notion of any expectation you might have about the shape of the expression profile. All it finds are genes for which differential expression between some groups is statistically supported. The algorithm returns the top 250 of those. Consistency within groups is very important, while we intuitively might be giving more weight to conformance to our expectations of a cyclical pattern.
+The notions of "differential expression" and "cell-cycle-dependent expression" do not overlap completely. Significant differential expression is determined mathematically for genes that have '''low variance within groups and large differences between groups'''. The algorithm has no concept of any expectation you might have about the shape of the expression profile: all it finds are genes for which differential expression between some groups is statistically supported, and it returns the top 250 of those. Consistency within groups is therefore very important, while we intuitively might give more weight to genes that conform to our expectation of a cyclical pattern.
 Let's see if we can group our time points differently to enhance the contrast between expression levels for cyclically expressed genes. We'll define only two groups: one set of samples before and between the two cycles, and one set at the peaks; some of the intermediate time points are omitted.
@@ Line 136: / Line 111: @@
 # Recalculate the '''Top 250''' differentially expressed genes (you might have to refresh the page to get the "Top 250" button back.) Which of the "known" MBF targets are now contained in the set? What about Mbp1 and Swi6?
 # Next, let's compare the expression profiles for Mbp1, Swi6 and Swi4. It is not obvious that transcription factors are themselves under '''transcriptional''' control, as opposed to being expressed at a basal level and ''activated'' by phosphorylation or ligand binding. In a new page, navigate to the [http://www.ncbi.nlm.nih.gov/geoprofiles '''GEO Profiles'''] page and enter <code>(Mbp1 OR Swi6 OR Swi4 OR Nrm1 OR Cln1 OR Clb6 OR Act1 OR Alg9) AND GSE3635</code>. (Nrm1, Cln1, and Clb6 are Mbp1 target genes. Act1 and Alg9 encode beta-actin and a mannosyltransferase; these are often used as "housekeeping genes", i.e. genes with condition-independent expression levels, especially in qPCR studies, although Alg9 is also an Mbp1 target. We include them here as negative controls. GSE3635 is the accession of the GEO series we have just studied.) You could also have obtained similar results in the '''Profile graph''' tab of the GEO2R page. What do you find? What does this tell you? Would this information allow you to define groups that are even better suited for finding cyclically expressed genes?
-# Click on the profile graph for Mbp1 and print out the page. Write your name and student number on the page. With a red pen, '''in one sentence''' describe the evidence you find '''on that page''' that allows us to conclude '''whether or not''' Mbp1 is a cell-cycle gene. You'll probably want to think for a moment what this question really means, how a cell-cycle gene could be defined, and what can be considered "evidence", before you write. I will mark your response for a maximum of four marks.
+# Click on the profile graph for Mbp1. Describe the evidence you find '''on that page''' that allows us to conclude '''whether or not''' Mbp1 is a cell-cycle gene. You'll probably want to think for a moment what this question really means, how a cell-cycle gene could be defined, and what can be considered "evidence".
- <!--
+* Finally, note the '''R''' script for the GEO2R analysis in the '''R script''' tab. This code can be run on your own machine to reproduce the expression analysis. Once the datasets are loaded and prepared, you could, for example, perform a "real" time-series analysis, calculate correlation coefficients with an idealized sine wave, or search for genes that are '''co-regulated''' with your genes of interest. We will explore this in another unit.
-* Finally, review the '''R''' script for the GEO2R analysis in the '''R script''' tab. This code will run on your machine and make the expression analysis available. Once the datasets are loaded and prepared, you could - for example - perform a "real" time series analysis, calculate correlation coefficients with an idealized sine wave, or search for genes that are '''co-regulated''' with your genes of interest.
--->
 }}
@@ Line 146: / Line 119: @@
 {{Vspace}}
-== Further reading, links and resources ==
-<!-- {{#pmid: 19957275}} -->
-<!-- {{WWW|WWW_GMOD}} -->
-<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
-{{Vspace}}
-== Notes ==
-<!-- included from "../components/BIN-EXPR-DE.components.wtxt", section: "notes" -->
-<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
-<references />
-{{Vspace}}
-</div>
-<div id="ABC-unit-framework">
-== Self-evaluation ==
-<!-- included from "../components/BIN-EXPR-DE.components.wtxt", section: "self-evaluation" -->
-<!--
-=== Question 1===
-Question ...
-<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
-Answer ...
-<div class="mw-collapsible-content">
-Answer ...
-</div>
-  </div>
-  {{Vspace}}
--->
-{{Vspace}}
-{{Vspace}}
-<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
-----
-{{Vspace}}
-<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
-----
-{{Vspace}}
 <div class="about">
@@ Line 211: / Line 128: @@
 :2017-08-05
 <b>Modified:</b><br />
-:2017-08-05
+:2020-09-24
 <b>Version:</b><br />
-:0.1
+:1.1
 <b>Version history:</b><br />
+*1.1 2020 Maintenance
+*1.0 First live version
 *0.1 First stub
 </div>
-[[Category:ABC-units]]
-<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 {{CC-BY}}
+[[Category:ABC-units]]
+{{UNIT}}
+{{LIVE}}
 </div>
 <!-- [END] -->