CSB Systems extraction
Revision as of 20:11, 16 March 2012
Mutual information
A powerful concept within the mathematical theory of information, the mutual information of two variables measures how much knowing one variable reduces the uncertainty about the other. For example, if two genes are always either both present in a genome or both absent, then knowing whether one of them is present is sufficient to know whether the other is present as well. In biology, genes with high mutual information are often components of physical complexes or collaborate functionally. Measuring mutual information in large datasets can therefore be used to infer such relationships.
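The definition above can be made concrete with the identity I(X;Y) = H(X) + H(Y) - H(X,Y), where H is the Shannon entropy. The following R sketch is a minimal plug-in estimate for discrete data, applied to made-up presence/absence profiles of genes across eight genomes. The profiles and the helper names entropy_bits and mutual_information are illustrative assumptions, not part of MINE or of the readings.

```r
# Shannon entropy (in bits) of a discrete variable, estimated from observed frequencies.
entropy_bits <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]                       # 0 * log(0) is taken as 0
  -sum(p * log2(p))
}

# Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits.
# The joint variable is formed by pasting each pair of observations together.
mutual_information <- function(x, y) {
  stopifnot(length(x) == length(y))
  entropy_bits(x) + entropy_bits(y) - entropy_bits(paste(x, y))
}

# Hypothetical presence (1) / absence (0) profiles of three genes in eight genomes.
gene_a <- c(1, 1, 0, 1, 0, 0, 1, 0)
gene_b <- gene_a                      # always present or absent together with gene_a
gene_c <- c(1, 0, 1, 0, 1, 0, 1, 0)   # unrelated pattern

mutual_information(gene_a, gene_b)    # 1 bit: knowing gene_a fully determines gene_b
mutual_information(gene_a, gene_c)    # 0 bits: knowing gene_a says nothing about gene_c
```

With real data this plug-in estimate is biased upward for small samples, and continuous measurements such as expression levels have to be discretized before it can be applied.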
Introductory reading
Here is a useful introduction to the use of information theory, in particular mutual information, for the analysis of signal transduction networks.
Waltermann & Klipp (2011) Information theory based approaches to cellular signaling. Biochim Biophys Acta 1810:924-32. (pmid: 21798319)
Mutual information is at the core of a novel approach to quantifying non-linear associations in data, the maximal information coefficient (MIC). Read the perspective on this recent work here:
Speed (2011) Mathematics. A correlation for the 21st century. Science 334:1502-3. (pmid: 22174235)
The original research paper is here; have a look, but its contents will not be covered in the quiz.
Reshef et al. (2011) Detecting novel associations in large data sets. Science 334:1518-24. (pmid: 22174245)
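Why mutual information helps with non-linear relationships can be seen in a small experiment: for y = x^2 on an interval symmetric around zero, Pearson's correlation is essentially zero even though y is completely determined by x. The R sketch below contrasts cor() with a simple binned mutual-information estimate. Note that this is not MIC: Reshef et al. (2011) search over many grid resolutions and normalize the result to lie between 0 and 1, whereas the hypothetical helper binned_mi uses a single fixed grid of equal-width bins.

```r
# Mutual information (in bits) from one fixed grid of bins x bins equal-width cells.
# A deliberately simple estimator, NOT the MIC statistic of Reshef et al. (2011).
binned_mi <- function(x, y, bins = 5) {
  pxy <- table(cut(x, bins), cut(y, bins)) / length(x)   # joint cell frequencies
  px <- rowSums(pxy)                                      # marginal of x
  py <- colSums(pxy)                                      # marginal of y
  nz <- pxy > 0                                           # skip empty cells
  sum(pxy[nz] * log2(pxy[nz] / outer(px, py)[nz]))
}

x <- seq(-1, 1, length.out = 201)
y <- x^2                          # deterministic, but neither linear nor monotonic

cor(x, y)                         # essentially 0: Pearson's r misses the dependence
binned_mi(x, y)                   # clearly above 0: y depends on x

set.seed(1)
z <- runif(201, -1, 1)            # independent of x
binned_mi(x, z)                   # small, but not exactly 0 because of estimation bias
```

The non-zero value for independent data illustrates why naive mutual-information estimates need careful normalization before they can be compared across variable pairs, one of the problems MIC was designed to address.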
Exercises
- Try out MINE:
  - Create a working folder on your computer (e.g. name it MINE).
  - Navigate to http://www.exploredata.net/ and follow the link to Downloads.
  - Follow the link to the Gene Expression Data Set in the sidebar and download Spellman.csv to your folder.
  - Edit Spellman.csv by duplicating the first row and renaming "time" in the first row to "Name". (Use a plain-text editor, not MS Word!)
  - Follow the link to MINE application in the sidebar.
  - Download MINE.jar and MINE.r to your folder.
  - Follow the link to Parameters in the sidebar and study your options.
  - Click on the link to the usage instructions and follow the instructions under How to run MINE in R:
    - Start R and set your working folder as the working directory (command: setwd(...)).
    - Use File → Open Document... to open MINE.r.
    - Run install.packages("rJava") to download the rJava package from CRAN, if it has not been installed before.
    - Use File → Source File... to execute the commands in MINE.r. This executes library("rJava") and .jinit(classpath="MINE.jar"), and defines the functions MINE and rMINE.
    - Run MINE("Spellman.csv","two.pairs",1,5) to verify that the installation is OK and that you can access the data.
    - The MCM3 gene that was discussed in Reshef et al. (2011) has the systematic name YEL032W:
      - What is its index in the table?
          genes <- read.csv("Spellman.csv")
          genes[grep("YEL032W", genes$Name), ]   # the row name printed on the left is the index
      - Plot the gene's expression profile:
          genes <- read.csv("Spellman.csv")
          time <- data.matrix(genes[1, 2:24])    # row 1 holds the sampling times
          # row 1017 holds YEL032W (the index found above)
          plot(time, data.matrix(genes[1017, 2:24]), type = "b",
               xlab = "time", ylab = "YEL032W expression")
      - Find genes with a high MIC with YEL032W:
          MINE("Spellman.csv", "master.variable", 1017)
      - Looking at the output text file, you see that YDR191W has a high MIC. Plot its expression profile as an overlay plot, then plot the expression values of one gene against the other. Is this a positive or a negative correlation? Then explore more genes. Can you find a gene that is negatively correlated with YEL032W?
- Have fun!
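The editing step for Spellman.csv (duplicating the first row and renaming "time" to "Name") can also be done from within R, which sidesteps the word-processor problem altogether. The helper below, prepare_spellman, is a hypothetical sketch: it assumes that the first field of the file's first line is the label time (possibly in quotes) and that it is run from the working folder.

```r
# Hypothetical helper: make Spellman.csv readable with read.csv() by adding a
# header line. Assumes line 1 starts with the label time, optionally quoted.
prepare_spellman <- function(path = "Spellman.csv") {
  lines <- readLines(path)
  first_field <- gsub('"', "", sub(",.*$", "", lines[1]))
  if (first_field == "Name") {               # header already added: do nothing
    message("Header already present; ", path, " left unchanged.")
    return(invisible(FALSE))
  }
  # Copy of line 1 with its first field renamed from time to Name.
  header <- sub('^("?)time("?),', "\\1Name\\2,", lines[1])
  if (identical(header, lines[1])) stop("Line 1 does not start with 'time'.")
  writeLines(c(header, lines), path)         # new header, then all original rows
  invisible(TRUE)
}
```

After prepare_spellman() has run once, read.csv("Spellman.csv") returns a data frame whose Name column holds the gene names and whose first row holds the sampling times, which is the layout the exercise code expects. Calling it a second time leaves the file unchanged.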
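MIC is non-negative and does not tell you whether an association is positive or negative, so a signed measure is needed to answer the questions about YDR191W and about negatively correlated genes. The sketch below assumes the layout of the edited Spellman.csv as read by read.csv(): a Name column, the sampling times in row 1, one gene per following row, and expression values in columns 2 to 24. The helper names correlate_with and plot_pair are hypothetical, and Pearson's correlation is used here only as one convenient signed measure.

```r
# Pearson correlation of every gene with a target gene across the time columns,
# sorted from most negative to most positive. Row 1 (the time row) is skipped.
correlate_with <- function(genes, target, cols = 2:24) {
  expr <- data.matrix(genes[-1, cols])
  rownames(expr) <- as.character(genes$Name[-1])
  ref <- expr[target, ]
  r <- apply(expr, 1, function(g) cor(g, ref, use = "pairwise.complete.obs"))
  sort(r[names(r) != target])               # sort() drops genes whose correlation is NA
}

# Overlay plot of two expression profiles, plus a scatter plot of one against the other.
plot_pair <- function(genes, a, b, cols = 2:24) {
  time <- as.numeric(data.matrix(genes[1, cols]))
  ya <- as.numeric(data.matrix(genes[genes$Name == a, cols]))
  yb <- as.numeric(data.matrix(genes[genes$Name == b, cols]))
  old <- par(mfrow = c(1, 2))
  on.exit(par(old))
  matplot(time, cbind(ya, yb), type = "b", pch = 1:2, lty = 1, col = 1:2,
          xlab = "time", ylab = "expression")
  legend("topright", legend = c(a, b), pch = 1:2, lty = 1, col = 1:2)
  plot(ya, yb, xlab = a, ylab = b)
}

# Made-up demo data in the same layout: 23 sampling times and three genes.
tt <- seq(0, 220, by = 10)
wave <- sin(tt / 40)
demo <- data.frame(Name = c("time", "GENE_A", "GENE_B", "GENE_C"),
                   unname(rbind(tt, wave, 2 * wave + 0.5, -wave)))
correlate_with(demo, "GENE_A")   # GENE_C is -1 (negative), GENE_B is +1 (positive)
```

With the real file, head(correlate_with(read.csv("Spellman.csv"), "YEL032W")) would list the most negatively correlated candidates first, and plot_pair(genes, "YEL032W", "YDR191W") would draw the overlay and the scatter plot. Pearson's r only captures linear trends, so comparing it with a rank-based variant (method = "spearman" in cor()) may be worthwhile.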
Further reading and resources
Wu et al. (2003) Identification of functional links between genes using phylogenetic profiles. Bioinformatics 19:1524-30. (pmid: 12912833)
Wu et al. (2005) Deciphering protein network organization using phylogenetic profile groups. Genome Inform 16:142-9. (pmid: 16362916)
Rao et al. (2008) Using directed information to build biologically relevant influence networks. J Bioinform Comput Biol 6:493-519. (pmid: 18574860)
Luo & Woolf (2010) Reconstructing transcriptional regulatory networks using three-way mutual information and Bayesian networks. Methods Mol Biol 674:401-18. (pmid: 20827604)
Speed (2011) Mathematics. A correlation for the 21st century. Science 334:1502-3. (pmid: 22174235)
Reshef et al. (2011) Detecting novel associations in large data sets. Science 334:1518-24. (pmid: 22174245)