CSB Assignment Week 6


Assignments for Week 6
Gene Regulatory Networks revisited

< Assignment 5 Assignment 7 >

Note! This assignment is currently active. All significant changes will be announced on the mailing list.

 
 



Context

One of the interesting parts of the Mogrify workflow is the use of a network weighting method, based on STRING and GRN networks - the network-based sphere of influence. The idea behind this is that effects of genes propagate a certain distance through networks. Such network-based analytics are systems biology methods par excellence. In our workflow, transcription factors are ranked, based upon how many differentially expressed genes they are associated with.
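To make the "sphere of influence" idea concrete, here is a minimal sketch. It assumes a toy network stored as a dictionary of neighbours and a hypothetical set of differentially expressed genes; all gene names, the `max_dist` parameter and the function names are made up for illustration. It counts every gene within the radius equally, which is a simplification and not the weighting used in the published workflow.

```python
from collections import deque

def genes_within(graph, start, max_dist):
    """Breadth-first search: all nodes reachable from start in at most max_dist steps."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the sphere of influence
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return {n for n in dist if n != start}

def rank_tfs(graph, tfs, de_genes, max_dist=2):
    """Rank TFs by the number of differentially expressed genes in their neighbourhood."""
    scores = {tf: len(genes_within(graph, tf, max_dist) & de_genes) for tf in tfs}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy data (not taken from STRING or MARA).
toy_graph = {
    "TF1": ["g1", "g2"],
    "g1": ["g3"],
    "TF2": ["g2"],
    "g2": [],
    "g3": ["g4"],
}
toy_de = {"g1", "g3", "g4"}

ranking = rank_tfs(toy_graph, ["TF1", "TF2"], toy_de, max_dist=2)
print(ranking)
```

With a radius of 2, g4 lies outside the neighbourhood of TF1 and is not counted; increasing `max_dist` to 3 would include it.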

From your iGraph tutorial, you will recall that networks can be constructed from adjacency matrices, or from edge lists. Whatever the source is: if we want to build a network, we need to define the nodes, and we need to define when to posit edges between the nodes. This seems quite straightforward for STRING - we can download the whole database as an edge-list. TFs are nodes, the neighbourhood of one node is quickly determined from the edges provided by STRING, and we can easily evaluate the DESeq results for each neighbour. But how is the MARA network constructed?
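As a reminder of how the two representations relate, the following sketch converts an edge list into an adjacency matrix and then looks up the neighbourhood of one node. The edge list, the node names and the adjusted p-values are hypothetical stand-ins for a STRING download and a DESeq result table, and the 0.05 cutoff is an assumption chosen only for the example.

```python
def edges_to_adjacency(edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1  # undirected: mirror the entry
    return nodes, matrix

def neighbours(nodes, matrix, node):
    """Read the neighbours of a node off its row in the adjacency matrix."""
    row = matrix[nodes.index(node)]
    return [nodes[j] for j, value in enumerate(row) if value]

# Hypothetical edge list and adjusted p-values.
toy_edges = [("TF_X", "GENE1"), ("TF_X", "GENE2"), ("GENE2", "GENE3")]
toy_padj = {"GENE1": 0.01, "GENE2": 0.20, "GENE3": 0.001}

nodes, adj = edges_to_adjacency(toy_edges)
nbrs = neighbours(nodes, adj, "TF_X")
n_de = sum(1 for g in nbrs if toy_padj.get(g, 1.0) < 0.05)
print(nbrs, n_de)
```

For a network the size of STRING, a dense matrix like this would be wasteful; the edge list (or a graph library) is the practical choice, and the matrix is shown only to illustrate the equivalence.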


MARA

You will recall that we had a long discussion last Tuesday about MARA. Rackham et al. state: "MARA provides protein-DNA interactions for transcription factors with known binding sites in the promoter regions of a gene." (online Methods, Step 3.)[1] Nodes are presumably genes. But what exactly are the edges? The Rackham paper does not say. The initial iteration of the Ontoscope workflow assumed that a MARA edge list was available for download. But matters are more complicated.

The MARA algorithm was described in detail in 2009[2], in the paper's Supplementary Information. Fundamentally, known TF binding sites are sought in the promoter regions of differentially expressed genes. Constructing the weight matrices that identify the binding sites is an involved procedure to begin with. But the core of the procedure is to identify "motif activities", i.e. the contribution of a single TF/motif interaction to the observed expression change in a sample. The procedure is complex and is not described in a way that would allow it to be reproduced. The end result is a z-value, which could loosely be interpreted as reflecting the probability that the expression change is actually due to a particular TF.

The core network was constructed by first selecting all predicted regulatory interactions (z-value at least 1.5) between core motifs and promoters that are associated with a gene which is a TF that in turn is associated with a core motif. This set of predicted regulatory interactions was then filtered by choosing only interactions that have independent experimental support of at least one of the following types:

  1. The regulatory interaction has been reported in the literature.
  2. There is a ChIP-chip experiment in which binding of one of the TFs associated with the motif to the promoter of the target gene has been reported.
  3. In our siRNA experiments the target promoter is observed to be perturbed in expression (B-statistic larger than zero) after knockdown of a TF associated with the motif.
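The selection and filtering steps described above can be sketched as a simple filter. This is only an illustration of the logic under stated assumptions: the record layout, field names, function names and toy values are all hypothetical, and the hard part (computing the z-values in the first place) is not shown.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PredictedInteraction:
    motif: str
    target_gene: str
    z_value: float
    in_literature: bool = False                 # support type 1
    chip_chip_binding: bool = False             # support type 2
    sirna_b_statistic: Optional[float] = None   # support type 3 (None: not tested)

def has_experimental_support(p):
    """True if at least one of the three kinds of independent support is present."""
    sirna_hit = p.sirna_b_statistic is not None and p.sirna_b_statistic > 0
    return p.in_literature or p.chip_chip_binding or sirna_hit

def core_network(predictions, core_tfs, z_min=1.5):
    """Keep edges with z >= z_min whose target is a core TF and that have support."""
    return [
        (p.motif, p.target_gene)
        for p in predictions
        if p.z_value >= z_min
        and p.target_gene in core_tfs
        and has_experimental_support(p)
    ]

# Hypothetical predictions; none of these values come from the paper.
toy_core_tfs = {"TF_A", "TF_B", "TF_C"}
toy_predictions = [
    PredictedInteraction("M1", "TF_A", 2.0, in_literature=True),
    PredictedInteraction("M1", "TF_B", 1.2, chip_chip_binding=True),  # z too low
    PredictedInteraction("M2", "TF_A", 1.5, sirna_b_statistic=0.4),
    PredictedInteraction("M2", "TF_C", 3.0),                          # no support
    PredictedInteraction("M3", "GENE_X", 2.5, in_literature=True),    # not a core TF
    PredictedInteraction("M3", "TF_B", 1.8, sirna_b_statistic=-0.2),  # B <= 0
]

core = core_network(toy_predictions, toy_core_tfs)
print(core)
```

Note that the resulting edges are unweighted: the z-value is used only as a threshold and is then discarded.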

The bottom line is that reproducing this procedure seems infeasible within the limited scope of our project. We could make use of the ISMARA server[3]. However, this requires uploading whole expression profiles to the SIB servers and extensive postprocessing of the results. By all means, we should pursue this, not least to be able to compare individual results, but it should not be on the critical path of our project.

Moreover, and I consider this a big downside of the procedure, the MARA network is constructed separately for each DESeq result set; it can't be precomputed.

Finally, you should note that there is a certain tautology in using expression data to predict a network, and then using that network to explain the expression data. These cannot be considered informationally orthogonal.

Alternatives

But is it necessary to use MARA? We note that the careful quantitative analysis of motif activities is not actually used, other than to define network edges. These edges are not considered "weighted" edges in the GRN (Gene Regulatory Network) graph. Why not work with static graphs based on ENCODE data or similar instead, and rely on the differential expression of the neighbourhood to provide the correct ranking? Is MARA really better?

Here is where you come in. We will analyze and evaluate the procedures that are currently available to build TF target lists or GRNs.

Here is a short, recent overview of the methods:

Liu (2015) Reverse Engineering of Genome-wide Gene Regulatory Networks from Gene Expression Data. Curr Genomics 16:3-22. (pmid: 25937810)


And here are recent papers in the field.

Diez et al. (2014) Systematic identification of transcriptional regulatory modules from protein-protein interaction networks. Nucleic Acids Res 42:e6. (pmid: 24137002)

(Note: Bioconductor package available.)

Jang et al. (2013) hARACNe: improving the accuracy of regulatory model reverse engineering via higher-order data processing inequality tests. Interface Focus 3:20130011. (pmid: 24511376)

(Note: source code available.)

Blatti et al. (2015) Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism. Nucleic Acids Res 43:3998-4012. (pmid: 25791631)

Medina-Rivera et al. (2015) RSAT 2015: Regulatory Sequence Analysis Tools. Nucleic Acids Res 43:W50-6. (pmid: 25904632)

Nicolle et al. (2015) CoRegNet: reconstruction and integrated analysis of co-regulatory networks. Bioinformatics 31:3066-8. (pmid: 25979476)

Jiang et al. (2015) Inference of transcriptional regulation in cancers. Proc Natl Acad Sci U.S.A 112:7731-6. (pmid: 26056275)

Han et al. (2015) TRRUST: a reference database of human transcriptional regulatory interactions. Sci Rep 5:11432. (pmid: 26066708)

Pemberton-Ross et al. (2015) ARMADA: Using motif activity dynamics to infer gene regulatory networks from gene expression data. Methods 85:62-74. (pmid: 26164700)

(Note: MARA authors)

Gitter & Bar-Joseph (2016) The SDREM Method for Reconstructing Signaling and Regulatory Response Networks: Applications for Studying Disease Progression. Methods Mol Biol 1303:493-506. (pmid: 26235087)

Narang et al. (2015) Automated Identification of Core Regulatory Genes in Human Gene Regulatory Networks. PLoS Comput Biol 11:e1004504. (pmid: 26393364)

Liu et al. (2015) RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Database (Oxford) 2015. (pmid: 26424082)

Kulakovskiy et al. (2016) HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Res 44:D116-25. (pmid: 26586801)

Affeldt et al. (2016) 3off2: A network reconstruction algorithm based on 2-point and 3-point information statistics. BMC Bioinformatics 17 Suppl 2:12. (pmid: 26823190)

Ruyssinck et al. (2016) Netter: re-ranking gene network inference predictions using structural network properties. BMC Bioinformatics 17:76. (pmid: 26862054)

Omranian et al. (2016) Gene regulatory network inference using fused LASSO on multiple data sets. Sci Rep 6:20533. (pmid: 26864687)

(Note: R code available.)

Zerbino et al. (2016) Ensembl regulation resources. Database (Oxford) 2016. (pmid: 26888907)




Analyzing GRN construction

Task:

  • Choose one of the papers cited here that provides an exact computational procedure for building a TF target list or a GRN from public data[4].
  • Email me on Monday which paper you have chosen.
  • Analyze the approach with an SPN diagram and enough annotation that you could design the algorithm.
  • Bring your diagram and annotation to class on Tuesday. Refer to the marking rubrics for Assigned Material for how to make this an excellent piece of work. Also, the "late rules" apply like last time: same day but not in class: marks * 0.5; next day: marks * 0.2; the day after: marks * 0.1; after that: 0. I will mark the diagrams for a maximum of six marks. No quiz.

In class, I would like to compare and contrast approaches. Can yours replace MARA for our purposes? Let's discuss...



 


 
That is all.


 

Footnotes and references

 
  1. Rackham et al. (2016) A predictive computational framework for direct reprogramming between human cell types. Nat Genet 48:331-5. (pmid: 26780608)

  2. Suzuki et al. (2009) The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line. Nat Genet 41:553-62. (pmid: 19377474)

  3. Balwierz et al. (2014) ISMARA: automated modeling of genomic signals as a democracy of regulatory motifs. Genome Res 24:869-84. (pmid: 24515121)

  4. You may choose a different paper if you e-mail me the reference AND I approve.


 


 
Ask, if things don't work for you!
If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.


 


