Gene regulatory networks

From "A B C"
Revision as of 19:30, 29 January 2012 by Boris



This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


The discovery and definition of gene regulatory networks is one of the big topics of systems biology, not only because of their biological importance but also because the basic data could be acquired with one of the first high-throughput assays in biology: microarray expression profiling.



 

Introductory reading

Baitaluk (2009) System biology of gene regulation. Methods Mol Biol 569:55-87. (pmid: 19623486)

A well-known joke that illustrates the traditionally awkward alliance between theory and experiment, and the differences between experimental biologists and theoretical modelers, tells of a university that sent a biologist, a mathematician, a physicist, and a computer scientist on a walking trip in an attempt to stimulate interdisciplinary research. During a break, they watched a cow in a nearby field, and the leader of the group asked, "I wonder how one could decide on the size of a cow?" Since a cow is a biological object, the biologist responded first: "I have seen many cows in this area and know it is a big cow." The mathematician argued, "The true volume is determined by integrating the mathematical function that describes the outer surface of the cow's body." The physicist suggested: "Let's assume the cow is a sphere...." Finally, the computer scientist became nervous and said that he hadn't brought his computer because there is no Internet connection up there on the hill. In this humorous but telling story, the suggestions of the theorists can be taken to reflect the view of many experimental biologists that computer scientists and theorists are too far removed from biological reality, and that their theories and approaches are therefore of little immediate use. Conversely, the biologist's statement mirrors the view of many traditional theoretical and computational scientists that biological experiments are for the most part simply descriptive and lack rigor, and that much of the resulting biological data is of questionable functional relevance. One of the goals of current biology as a multidisciplinary science is to bring people from different scientific areas together on the same "hill" and teach them to speak the same "language."
In fact, of course, when presenting their data, most experimental biologists do provide an interpretation and explanation of the results, and many theorists and computer scientists aim to answer (or at least to fully describe) questions of biological relevance. Systems biology can thus be regarded both as a socioscientific phenomenon and as a new approach to experiment and theory, defined by the strategy of integrating complex data on the interactions in biological systems from diverse experimental sources, using interdisciplinary tools and personnel.


 

Contents

  • Principles
  • Annotation of transcription factor binding sites
  • Network discovery from expression profiles

   

Further reading and resources

Principles
Harbison et al. (2004) Transcriptional regulatory code of a eukaryotic genome. Nature 431:99-104. (pmid: 15343339)

DNA-binding transcriptional regulators interpret the genome's regulatory code by binding to specific sequences to induce or repress gene expression. Comparative genomics has recently been used to identify potential cis-regulatory sequences within the yeast genome on the basis of phylogenetic conservation, but this information alone does not reveal if or when transcriptional regulators occupy these binding sites. We have constructed an initial map of yeast's transcriptional regulatory code by identifying the sequence elements that are bound by regulators under various conditions and that are conserved among Saccharomyces species. The organization of regulatory elements in promoters and the environment-dependent use of these elements by regulators are discussed. We find that environment-specific use of regulatory elements predicts mechanistic models for the function of a large population of yeast's transcriptional regulators.
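To make the idea of combining binding specificity with conservation concrete, here is a minimal sketch that reports which genes carry a match to an IUPAC motif consensus (on either strand) in the promoters of at least a given number of species. Everything in it is an illustrative assumption: the consensus `TGASTCA`, the gene and species labels, and the promoter strings are invented, and counting species with any match (rather than requiring aligned sites plus experimental binding evidence) is a deliberate simplification, not the pipeline of Harbison et al.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regex for one strand."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def has_site(promoter, pattern):
    """True if the motif matches either strand of the promoter."""
    promoter = promoter.upper()
    reverse = promoter.translate(COMPLEMENT)[::-1]
    return bool(pattern.search(promoter) or pattern.search(reverse))


def conserved_targets(orthologs, consensus, min_species):
    """orthologs: gene -> {species: promoter} (hypothetical input layout).

    Returns genes whose promoters contain a site in >= min_species species.
    """
    pattern = consensus_to_regex(consensus)
    hits = []
    for gene, promoters in sorted(orthologs.items()):
        n_species = sum(has_site(seq, pattern) for seq in promoters.values())
        if n_species >= min_species:
            hits.append(gene)
    return hits


# Invented promoters, for illustration only.
orthologs = {
    "GENE1": {"Scer": "AATGACTCAGG", "Spar": "CCTGAGTCATT", "Smik": "GGGGCCCC"},
    "GENE2": {"Scer": "ATGACTCA", "Spar": "AAAAAAAA", "Smik": "CCCCCCCC"},
}
print(conserved_targets(orthologs, "TGASTCA", min_species=2))  # ['GENE1']
```

Raising `min_species` trades sensitivity for specificity: a site seen in only one species may be real but species-specific, or a chance match.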

Vaquerizas et al. (2012) How do you find transcription factors? Computational approaches to compile and annotate repertoires of regulators for any genome. Methods Mol Biol 786:3-19. (pmid: 21938617)

Transcription factors (TFs) play an important role in regulating gene expression. The availability of complete genome sequences and associated functional genomic data offers excellent opportunities to understand the transcriptional regulatory system of an entire organism. To do so, however, it is essential to compile a reliable dataset of regulatory components. Here, we review computational methods and publicly accessible resources that help identify TF-coding genes in prokaryotic and eukaryotic genomes. Since the regulatory functions of most TFs remain unknown, we also discuss approaches for combining diverse genomic datasets that will help elucidate their chromosomal organisation, expression, and evolutionary conservation. These analysis methods provide a solid foundation for further investigations of the transcriptional regulatory system.
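A common first step in compiling a TF repertoire is to flag proteins that carry a known DNA-binding domain (DBD). The sketch below shows only that filtering step, assuming domain annotations have already been produced by some domain-search tool. The set of DBD names and the protein identifiers are illustrative assumptions, not the curated list used in the review.

```python
# Illustrative DNA-binding domain (DBD) names; a real screen would use a
# curated DBD list and HMM-based domain calls (assumption, not from the review).
DBD_FAMILIES = {"zf-C2H2", "Homeodomain", "HLH", "bZIP", "Forkhead"}


def predict_tfs(domain_annotations, dbd_families=DBD_FAMILIES):
    """Return {protein: DBDs found} for proteins with at least one DBD.

    domain_annotations: protein id -> set of domain names (hypothetical input).
    """
    predicted = {}
    for protein, domains in domain_annotations.items():
        dbds = set(domains) & dbd_families
        if dbds:
            predicted[protein] = dbds
    return predicted


annotations = {
    "P1": {"Homeodomain", "PAX"},
    "P2": {"Pkinase"},
    "P3": {"zf-C2H2", "KRAB"},
}
print(predict_tfs(annotations))  # {'P1': {'Homeodomain'}, 'P3': {'zf-C2H2'}}
```

A domain hit is evidence, not proof, of TF function, which is why the review goes on to combine such predictions with other genomic data.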

El-Samad & Weissman (2011) Genetics: Noise rules. Nature 480:188-9. (pmid: 22158239)


Knabe et al. (2010) Genetic algorithms and their application to in silico evolution of genetic regulatory networks. Methods Mol Biol 673:297-321. (pmid: 20835807)

A genetic algorithm (GA) is a procedure that mimics processes occurring in Darwinian evolution to solve computational problems. A GA introduces variation through "mutation" and "recombination" in a "population" of possible solutions to a problem, encoded as strings of characters in "genomes," and allows this population to evolve, using selection procedures that favor the gradual enrichment of the gene pool with the genomes of the "fitter" individuals. GAs are particularly suitable for optimization problems in which an effective system design or set of parameter values is sought. In nature, genetic regulatory networks (GRNs) form the basic control layer in the regulation of gene expression levels. GRNs are composed of regulatory interactions between genes and their gene products, and are, inter alia, at the basis of the development of single fertilized cells into fully grown organisms. This paper describes how GAs may be applied to find functional regulatory schemes and parameter values for models that capture the fundamental GRN characteristics. The central ideas behind evolutionary computation and GRN modeling, and the considerations in GA design and use are discussed, and illustrated with an extended example. In this example, a GRN-like controller is sought for a developmental system based on Lewis Wolpert's French flag model for positional specification, in which cells in a growing embryo secrete and detect morphogens to attain a specific spatial pattern of cellular differentiation.
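The GA loop described above (a population of genomes, mutation, recombination, selection of the fitter individuals) can be sketched in a few dozen lines. This is a generic bit-string GA, not the authors' setup: the fitness function simply counts agreement with a fixed target string, a hypothetical stand-in for scoring how well an encoded GRN reproduces a desired pattern such as the French flag. Tournament selection, one-point crossover, bit-flip mutation and elitism are design choices assumed here.

```python
import random


def fitness(genome, target):
    """Toy fitness: number of positions agreeing with a target bit string.

    Stand-in (assumption) for scoring how well an encoded GRN performs.
    """
    return sum(g == t for g, t in zip(genome, target))


def tournament(population, scores, rng, k=3):
    """Pick the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def crossover(parent_a, parent_b, rng):
    """One-point recombination of two parent genomes."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def run_ga(target, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    """Evolve bit-string genomes; returns (best genome, best score per generation)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in target] for _ in range(pop_size)]
    history = []
    for generation in range(generations + 1):
        scores = [fitness(g, target) for g in population]
        best = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best])
        if generation == generations:
            break
        # Elitism: the current best genome survives unchanged.
        offspring = [population[best][:]]
        while len(offspring) < pop_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            offspring.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = offspring
    return population[best], history


target = [1, 0] * 10
genome, history = run_ga(target)
print(history[0], history[-1])  # best score in the first vs. the last generation
```

Because of elitism the best score never decreases from one generation to the next; without it, a lucky genome can be lost to mutation.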

Pilpel (2011) Noise in biological systems: pros, cons, and mechanisms of control. Methods Mol Biol 759:407-25. (pmid: 21863500)

Genetic regulatory circuits are often regarded as precise machines that accurately determine the level of expression of each protein. Most experimental technologies used to measure gene expression levels are incapable of testing and challenging this notion, as they often measure levels averaged over entire populations of cells. Yet, when expression levels are measured at the single-cell level of even genetically identical cells, substantial cell-to-cell variation (or "noise") may be observed. Sometimes different genes in a given genome may display different levels of noise; even the same gene, expressed under different environmental conditions, may display greater cell-to-cell variability in specific conditions and more tight control in other situations. While at first glance noise may seem to be an undesired property of biological networks, it might be beneficial in some cases. For instance, noise will increase functional heterogeneity in a population of microorganisms facing variable, often unpredictable, environmental changes, increasing the probability that some cells may survive the stress. In that respect, we can speculate that the population is implementing a risk distribution strategy, long before genetic heterogeneity could be acquired. Organisms may have evolved to regulate not only the averaged gene expression levels but also the extent of allowed deviations from such an average, setting it at the desired level for every gene under each specific condition. Here we review the evolving understanding of noise, its molecular underpinnings, and its effect on phenotype and fitness (when it can be detrimental, beneficial, or neutral), and which regulatory tools eukaryotic cells may use to optimally control it.
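The abstract does not fix a particular noise metric; two widely used summaries of single-cell variation are the squared coefficient of variation (variance divided by the squared mean) and the Fano factor (variance divided by the mean). The sketch below computes both for two invented genes with the same mean expression, showing how a population average hides very different cell-to-cell variability.

```python
from statistics import mean, variance


def noise_stats(levels):
    """Noise summaries for one gene from single-cell expression levels.

    cv2  = variance / mean**2 (squared coefficient of variation)
    fano = variance / mean    (equals 1 for a Poisson process)
    """
    mu = mean(levels)
    var = variance(levels)  # sample variance, n - 1 denominator
    return {"mean": mu, "cv2": var / mu ** 2, "fano": var / mu}


# Invented single-cell measurements for two genes with the same mean.
tight = [10.0, 11.0, 9.0, 10.0, 10.0]
noisy = [2.0, 25.0, 4.0, 15.0, 4.0]
print(noise_stats(tight))
print(noise_stats(noisy))
```

Both genes average 10 units, so a population-level assay would report them as identical; only the single-cell variance separates them.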

TFBS and network discovery
Chan et al. (2009) Discovering multiple realistic TFBS motifs based on a generalized model. BMC Bioinformatics 10:321. (pmid: 19811641)

BACKGROUND: Identification of transcription factor binding sites (TFBSs) is a central problem in Bioinformatics on gene regulation. De novo motif discovery serves as a promising way to predict and better understand TFBSs for biological verifications. Real TFBSs of a motif may vary in their widths and their conservation degrees within a certain range. Deciding a single motif width by existing models may be biased and misleading. Additionally, multiple, possibly overlapping, candidate motifs are desired and necessary for biological verification in practice. However, current techniques either prohibit overlapping TFBSs or lack explicit control of different motifs. RESULTS: We propose a new generalized model to tackle the motif widths by considering and evaluating a width range of interest simultaneously, which should better address the width uncertainty. Moreover, a meta-convergence framework for genetic algorithms (GAs) is proposed to provide multiple overlapping optimal motifs simultaneously in an effective and flexible way. Users can easily specify the difference amongst expected motif kinds via similarity test. Incorporating Genetic Algorithm with Local Filtering (GALF) for searching, the new GALF-G (G for generalized) algorithm is proposed based on the generalized model and meta-convergence framework. CONCLUSION: GALF-G was tested extensively on over 970 synthetic, real and benchmark datasets, and is usually better than the state-of-the-art methods. The range model shows an increase in sensitivity compared with the single-width ones, while providing competitive precisions on the E. coli benchmark. Effectiveness can be maintained even using a very small population, exhibiting very competitive efficiency. In discovering multiple overlapping motifs in a real liver-specific dataset, GALF-G outperforms MEME by up to 73% in overall F-scores. GALF-G also helps to discover an additional motif which has probably not been annotated in the dataset.
http://www.cse.cuhk.edu.hk/%7Etmchan/GALFG/
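Motif finders such as GALF-G and MEME ultimately describe a motif as a position weight matrix (PWM) and score candidate sites against it. The sketch below shows only that fixed-width representation: it builds a log-odds PWM (in bits, against a uniform background) from aligned sites and scans one strand of a sequence. It is not GALF-G, which searches over a range of widths with a GA; the training sites, pseudocount and threshold are assumptions, and sequences are assumed to contain only uppercase A, C, G, T.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Log-odds position weight matrix (bits) from equal-width aligned sites."""
    background = background or {b: 0.25 for b in BASES}
    total = len(sites) + 4 * pseudocount
    pwm = []
    for j in range(len(sites[0])):
        column = [site[j] for site in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring >= threshold (forward strand)."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        score = sum(pwm[j][sequence[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits


sites = ["TGACTCA", "TGAGTCA", "TGACTCA"]  # invented training sites
pwm = build_pwm(sites)
print(scan("AAATGACTCAAA", pwm, threshold=5.0))  # [(3, 9.91)]
```

The pseudocount keeps unseen bases from receiving a score of minus infinity, which matters when only a handful of sites are available.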

Segal et al. (2003) Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 34:166-76. (pmid: 12740579)

Much of a cell's activity is organized as a network of interacting modules: sets of genes coregulated to respond to different conditions. We present a probabilistic method for identifying regulatory modules from gene expression data. Our procedure identifies modules of coregulated genes, their regulators and the conditions under which regulation occurs, generating testable hypotheses in the form 'regulator X regulates module Y under conditions W'. We applied the method to a Saccharomyces cerevisiae expression data set, showing its ability to identify functionally coherent modules and their correct regulators. We present microarray experiments supporting three novel predictions, suggesting regulatory roles for previously uncharacterized proteins.
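The hypothesis form "regulator X regulates module Y under conditions W" can be illustrated with a single split: choose the regulator and expression threshold that best partition the conditions so that the module's expression is homogeneous within each part. This is a simplified stand-in, assuming a squared-error criterion; the module network method itself learns full regulation trees with a Bayesian score, which is not reproduced here. The module profile and regulator names are invented.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(module_profile, regulators):
    """Find the regulator/threshold whose split of the conditions best
    explains the module's expression.

    module_profile: mean module expression per condition.
    regulators: name -> regulator expression per condition (same order).
    Returns (cost, regulator, threshold), or None if no split is possible.
    """
    best = None
    for name, expr in regulators.items():
        # Skip the smallest value so both partitions are non-empty.
        for threshold in sorted(set(expr))[1:]:
            low = [m for m, r in zip(module_profile, expr) if r < threshold]
            high = [m for m, r in zip(module_profile, expr) if r >= threshold]
            cost = sse(low) + sse(high)
            if best is None or cost < best[0]:
                best = (cost, name, threshold)
    return best


module = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]  # invented module profile
regulators = {
    "REG_A": [0.1, 0.2, 0.1, 2.0, 2.2, 2.1],
    "REG_B": [1.0, 0.5, 2.0, 1.5, 0.7, 1.8],
}
cost, regulator, threshold = best_split(module, regulators)
print(regulator, threshold)  # REG_A 2.0
```

Read as a hypothesis: the module is induced in the conditions where REG_A is at or above 2.0, which plays the role of "conditions W".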

Lee & Tzou (2009) Computational methods for discovering gene networks from expression data. Brief Bioinformatics 10:408-23. (pmid: 19505889)

Designing and conducting experiments are routine practices for modern biologists. The real challenge, especially in the post-genome era, usually comes not from acquiring data, but from subsequent activities such as data processing, analysis, knowledge generation and gaining insight into the research question of interest. The approach of inferring gene regulatory networks (GRNs) has been flourishing for many years, and new methods from mathematics, information science, engineering and social sciences have been applied. We review different kinds of computational methods biologists use to infer networks of varying levels of accuracy and complexity. The primary concern of biologists is how to translate the inferred network into hypotheses that can be tested with real-life experiments. Taking the biologists' viewpoint, we scrutinized several methods for predicting GRNs in mammalian cells, and more importantly show how the power of different knowledge databases of different types can be used to identify modules and subnetworks, thereby reducing complexity and facilitating the generation of testable hypotheses.
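One of the simplest network-inference approaches is a correlation ("relevance") network: connect two genes when the absolute Pearson correlation of their expression profiles exceeds a threshold. The sketch below is that generic method, not one singled out by the review; the profiles and the 0.9 threshold are invented. Such networks are undirected and cannot distinguish direct from indirect regulation, which is one reason the more elaborate methods exist.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def relevance_network(expression, threshold=0.9):
    """Undirected edges between genes whose |correlation| >= threshold.

    expression: gene -> expression values across the same samples.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


expression = {  # invented profiles over five samples
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 3, 4, 1, 2],
}
print(relevance_network(expression))  # [('geneA', 'geneB', 1.0)]
```

geneC is anticorrelated with both other genes (r = -0.8) and drops out at this threshold; lowering it to 0.75 would add two negative edges.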

Myers et al. (2009) Discovering biological networks from diverse functional genomic data. Methods Mol Biol 563:157-75. (pmid: 19597785)

Recent advances in biotechnology have produced a wealth of genomic data, which capture a variety of complementary cellular features. While these data promise to yield key insights into molecular biology, much of the available information remains underutilized because of the lack of scalable approaches for integrating signals across large, diverse data sets. A proper framework for capturing these numerous snapshots of complementary phenomena under a variety of conditions can provide the holistic view necessary for developing precise systems-level hypotheses. Here we describe bioPIXIE, a system for combining information from diverse genomic data sets to predict biological networks. bioPIXIE utilizes a Bayesian framework for probabilistic integration of several high-throughput genomic data types including gene expression, protein-protein interactions, genetic interactions, protein localization, and sequence data to predict biological networks. The main purpose of the system is to support user-driven exploration through the inferred functional network, which is enabled by a public, web-based interface. We describe the features and supporting methods of this integration and discovery framework and present case examples where bioPIXIE has been used to generate specific, testable hypotheses for Saccharomyces cerevisiae, many of which have been confirmed experimentally.
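The core arithmetic of Bayesian data integration can be sketched with a naive-Bayes simplification: start from the prior odds that a gene pair is functionally related, and multiply in one likelihood ratio per data set. This assumes the data sets are conditionally independent given the relationship, which real integration frameworks may relax; bioPIXIE's actual model is not reproduced here. The prior and the likelihood ratios below are invented; in practice they would be estimated against a gold standard of known related and unrelated pairs.

```python
import math


def posterior_related(prior, likelihood_ratios):
    """Posterior probability that two genes are functionally related.

    Each likelihood ratio is P(evidence | related) / P(evidence | unrelated)
    for one data set. Multiplying them assumes the data sets are conditionally
    independent given the class (a naive-Bayes simplification).
    """
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical evidence for one gene pair: co-expression, a physical
# interaction and a genetic interaction, with made-up likelihood ratios.
print(round(posterior_related(0.01, [10, 5, 2]), 3))  # 0.503
```

With a 1% prior, three moderately informative data sets are needed before the pair becomes more likely related than not; any single one of them would leave the posterior well below 0.5.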

Schultheiss (2010) Kernel-based identification of regulatory modules. Methods Mol Biol 674:213-23. (pmid: 20827594)

The challenge of identifying cis-regulatory modules (CRMs) is an important milestone for the ultimate goal of understanding transcriptional regulation in eukaryotic cells. It has been approached, among others, by motif-finding algorithms that identify overrepresented motifs in regulatory sequences. These methods succeed in finding single, well-conserved motifs, but fail to identify combinations of degenerate binding sites, like the ones often found in CRMs. We have developed a method that combines the abilities of existing motif finding with the discriminative power of a machine learning technique to model the regulation of genes (Schultheiss et al. (2009) Bioinformatics 25, 2126-2133). Our software is called KIRMES, which stands for kernel-based identification of regulatory modules in eukaryotic sequences. Starting from a set of genes thought to be co-regulated, KIRMES can identify the key CRMs responsible for this behavior and can be used to determine for any other gene not included on that list if it is also regulated by the same mechanism. Such gene sets can be derived from microarrays, chromatin immunoprecipitation experiments combined with next-generation sequencing or promoter/whole genome microarrays. The use of an established machine learning method makes the approach fast to use and robust with respect to noise. By providing easily understood visualizations for the results returned, they become interpretable and serve as a starting point for further analysis. Even for complex regulatory relationships, KIRMES can be a helpful tool in directing the design of biological experiments.
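Kernel methods compare sequences through a similarity function rather than explicit alignments. As a generic illustration of what "kernel-based" means here, the sketch below implements the k-mer spectrum kernel, the inner product of two sequences' k-mer count vectors. This is a standard string kernel chosen for simplicity, not the kernel used by KIRMES; the sequences and k = 3 are assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[kmer] for kmer, n in cx.items())


def kernel_matrix(seqs, k=3):
    """Gram matrix over a list of sequences."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


print(spectrum_kernel("ACGTACGT", "ACGT"))  # 4
```

A Gram matrix like this can be handed to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`) to separate putatively co-regulated promoters from background sequences.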

Applications
Csikász-Nagy (2009) Computational systems biology of the cell cycle. Brief Bioinformatics 10:424-34. (pmid: 19270018)

One of the early success stories of computational systems biology was the work done on cell-cycle regulation. The earliest mathematical descriptions of cell-cycle control evolved into very complex, detailed computational models that describe the regulation of cell division in many different cell types. On the way these models predicted several dynamical properties and unknown components of the system that were later experimentally verified/identified. Still, research on this field is far from over. We need to understand how the core cell-cycle machinery is controlled by internal and external signals, both in yeast cells and in the more complex regulatory networks of higher eukaryotes. Furthermore, there are many computational challenges that we face as new types of data appear thanks to continuing advances in experimental techniques. We have to deal with cell-to-cell variations, revealed by single cell measurements, as well as the tremendous amount of data flowing from high throughput machines. We need new computational concepts and tools to handle these data and develop more detailed, more precise models of cell-cycle regulation in various organisms. Here we review past and present of computational modeling of cell-cycle regulation, and discuss possible future directions of the field.
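Dynamical cell-cycle models are systems of ordinary differential equations in which negative feedback can generate oscillations. Real cell-cycle models are far larger; as a minimal, generic illustration, the sketch below integrates a three-stage Goodwin-type loop (mRNA x, protein y, repressor z, with z repressing x through a Hill function) using forward Euler. The model choice and all parameter values are assumptions; in practice an adaptive solver such as SciPy's `solve_ivp` would be preferred over a fixed-step Euler scheme.

```python
import math


def goodwin(a=1.0, b=0.1, n=10, dt=0.01, steps=50_000, record_every=100):
    """Forward-Euler simulation of a three-stage negative feedback loop.

    x: mRNA, y: protein, z: repressor; z represses x with Hill coefficient n.
    Returns (t, x, y, z) samples every `record_every` steps.
    """
    x = y = z = 0.0
    trajectory = []
    for step in range(steps):
        dx = a / (1 + z ** n) - b * x
        dy = x - b * y
        dz = y - b * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        if step % record_every == 0:
            trajectory.append(((step + 1) * dt, x, y, z))
    return trajectory


traj = goodwin()
late_z = [z for _, _, _, z in traj[len(traj) // 2:]]
print(min(late_z), max(late_z))  # a wide gap here indicates sustained oscillation
```

Steep repression (a large Hill coefficient n) is what allows a loop of this kind to oscillate rather than settle to a steady state; with a shallow Hill function the same equations simply relax to equilibrium.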

Alberghina et al. (2009) Systems biology of the cell cycle of Saccharomyces cerevisiae: From network mining to system-level properties. Biotechnol Adv 27:960-978. (pmid: 19465107)

Following a brief description of the operational procedures of systems biology (SB), the cell cycle of budding yeast is discussed as a successful example of a top-down SB analysis. After the reconstruction of the steps that have led to the identification of a sizer plus timer network in the G1 to S transition, it is shown that basic functions of the cell cycle (the setting of the critical cell size and the accuracy of DNA replication) are system-level properties, detected only by integrating molecular analysis with modelling and simulation of their underlying networks. A detailed network structure of a second relevant regulatory step of the cell cycle, the exit from mitosis, derived from extensive data mining, is constructed and discussed. To reach a quantitative understanding of how nutrients control, through signalling, metabolism and transcription, cell growth and cycle is a very relevant aim of SB. Since we know that about 900 gene products are required for cell cycle execution and control in budding yeast, it is quite clear that a purely systematic approach would require too much time. Therefore lines for a modular SB approach, which prioritises molecular and computational investigations for faster cell cycle understanding, are proposed. The relevance of the insight coming from the cell cycle SB studies in developing a new framework for tackling very complex biological processes, such as cancer and aging, is discussed.
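Why a "sizer plus timer" arrangement gives a well-defined cell size can be seen with a purely phenomenological toy: cells grow exponentially, wait until a critical size is reached (sizer), then spend a fixed time before dividing (timer). The sketch below iterates birth sizes over generations; it is an illustrative assumption, not the molecular network described by Alberghina et al. Symmetric division and all parameter values are invented for clarity.

```python
import math


def birth_sizes(m0, critical_size, growth_rate, timer, generations):
    """Birth size of successive generations under a sizer + timer rule.

    Cells grow exponentially at `growth_rate`. Sizer: growth continues until
    `critical_size` is reached (no delay if the cell is already larger).
    Timer: a fixed `timer` duration follows, then the cell divides in two.
    """
    sizes = [m0]
    m = m0
    for _ in range(generations):
        t_sizer = max(0.0, math.log(critical_size / m) / growth_rate)
        size_at_sizer = m * math.exp(growth_rate * t_sizer)
        m = size_at_sizer * math.exp(growth_rate * timer) / 2  # symmetric division
        sizes.append(m)
    return sizes


sizes = birth_sizes(m0=1.5, critical_size=1.0, growth_rate=0.01, timer=50, generations=10)
print([round(s, 3) for s in sizes[:5]])  # [1.5, 1.237, 1.019, 0.84, 0.824]
```

Whatever the starting size, once a cell is born below the critical size the sizer resets it, and birth size settles at critical_size * exp(growth_rate * timer) / 2 (about 0.824 here); the timer alone could not correct an abnormally large or small cell.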

Efroni et al. (2007) Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS ONE 2:e425. (pmid: 17487280)

Cancer is recognized to be a family of gene-based diseases whose causes are to be found in disruptions of basic biologic processes. An increasingly deep catalogue of canonical networks details the specific molecular interaction of genes and their products. However, mapping of disease phenotypes to alterations of these networks of interactions is accomplished indirectly and non-systematically. Here we objectively identify pathways associated with malignancy, staging, and outcome in cancer through application of an analytic approach that systematically evaluates differences in the activity and consistency of interactions within canonical biologic processes. Using large collections of publicly accessible genome-wide gene expression, we identify small, common sets of pathways (Trka Receptor, Apoptosis response to DNA Damage, Ceramide, Telomerase, CD40L and Calcineurin) whose differences robustly distinguish diverse tumor types from corresponding normal samples, predict tumor grade, and distinguish phenotypes such as estrogen receptor status and p53 mutation state. Pathways identified through this analysis perform as well or better than phenotypes used in the original studies in predicting cancer outcome. This approach provides a means to use genome-wide characterizations to map key biological processes to important clinical features in disease.
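A much simpler relative of pathway-level analysis is to summarize each pathway by the mean z-scored expression of its member genes in every sample, then compare sample groups. The sketch below does exactly that; it does not reproduce the activity and consistency measures of Efroni et al. The gene names, pathway membership and the assignment of samples to "normal" and "tumor" are invented.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene across samples (all zeros if constant)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def pathway_scores(expression, members):
    """Per-sample pathway score: mean z-score of the member genes present."""
    z = [zscore(expression[g]) for g in members if g in expression]
    n_samples = len(z[0])
    return [mean(gene[i] for gene in z) for i in range(n_samples)]


def group_difference(scores, group_a, group_b):
    """Difference of mean pathway score between two sets of sample indices."""
    return mean(scores[i] for i in group_a) - mean(scores[i] for i in group_b)


expression = {  # invented data: samples 0-1 normal, 2-3 tumor
    "G1": [1, 1, 5, 5],
    "G2": [2, 2, 6, 6],
    "G3": [9, 1, 9, 1],
}
scores = pathway_scores(expression, ["G1", "G2"])
print(scores)                                    # [-1.0, -1.0, 1.0, 1.0]
print(group_difference(scores, [2, 3], [0, 1]))  # 2.0
```

Averaging over member genes makes the score robust to noise in any single gene, but it also ignores interactions between genes, which is precisely what interaction-based consistency measures try to capture.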