Pathways and Networks



This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


One of the enduring contributions of 20th-century biochemistry was the concept of the biochemical pathway: the idea that the chemistry of life proceeds not through single-step reactions but through organized, multi-step transformations across numerous intermediates. The concept was productively applied to other multi-step biological phenomena; signalling pathways and developmental pathways are just two examples. However, the sobering reality is that interactions in biology are not laid out so neatly in practice. Rather, we encounter multiple cross-links between pathway components, which give rise to interconnected networks. The topic of biological networks is very large. On this page, we focus on four aspects: the principles of biological pathways and networks; the topology of networks, in particular whether biological networks are "scale-free"; clustering, to discover significant associations; and the discovery of network motifs.



Introductory reading

Zhu & Snyder (2002) "Omic" approaches for unraveling signaling networks. Curr Opin Cell Biol 14:173-9. (pmid: 11891116)

PubMed ] [ DOI ] Signaling pathways are crucial for cell differentiation and response to cellular environments. Recently, a large number of approaches for the global analysis of genes and proteins have been described. These have provided important new insights into the components of different pathways and the molecular and cellular responses of these pathways. This review covers genomic and proteomic (collectively referred to as "omic") approaches for the global analysis of cell signaling, including gene expression profiling and analysis, protein-protein interaction methods, protein microarrays, mass spectroscopy and gene-disruption and engineering approaches.

Bouveret & Brun (2012) Bacterial interactomes: from interactions to networks. Methods Mol Biol 804:15-33. (pmid: 22144146)

PubMed ] [ DOI ] In order to ensure their function(s) in the cell, proteins are organized in machineries, underlaid by a complex network of interactions. Identifying protein interactions is thus crucial to our understanding of cell functioning. Technical advances in molecular biology and genomic technology now allow for the systematic study of the interactions occurring in a given organism. This review first presents the techniques readily available to microbiologists for studying protein-protein interactions in bacteria, as well as their usability for high-throughput studies. Two types of techniques need to be considered: (1) the isolation of protein complexes from the organism of interest by affinity purification, and subsequent identification of the complex partners by mass spectrometry and (2) two-hybrid techniques, in general based on the production of two recombinant proteins whose interaction has to be tested in a reporter cell. Next, we summarize the bacterial interactomes already published. Finally, the strengths and pitfalls of the techniques are discussed, together with the potential prospect of interactome studies in bacteria.

Emmert-Streib & Glazko (2011) Pathway analysis of expression data: deciphering functional building blocks of complex diseases. PLoS Comput Biol 7:e1002053. (pmid: 21637797)



 


Principles

Systems Biology Graphical Notation - a community project for a standardized notation of entities and relationships for systems biology maps.
Charloteaux et al. (2011) Protein-protein interactions and networks: forward and reverse edgetics. Methods Mol Biol 759:197-213. (pmid: 21863489)

PubMed ] [ DOI ] Phenotypic variations of an organism may arise from alterations of cellular networks, ranging from the complete loss of a gene product to the specific perturbation of a single molecular interaction. In interactome networks that are modeled as nodes (macromolecules) connected by edges (interactions), these alterations can be thought of as node removal and edge-specific or "edgetic" perturbations, respectively. Here we present two complementary strategies, forward and reverse edgetics, to investigate the phenotypic outcomes of edgetic perturbations of binary protein-protein interaction networks. Both approaches are based on the yeast two-hybrid system (Y2H). The first allows the determination of the interaction profile of proteins encoded by alleles with known phenotypes to identify edgetic alleles. The second is used to directly isolate edgetic alleles for subsequent in vivo characterization.
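
The distinction drawn in the abstract above between node removal and edge-specific ("edgetic") perturbation is easy to state on a graph data structure. The following is a minimal sketch in Python, assuming an undirected network stored as a dictionary of neighbor sets; the toy proteins P, A, B and C and the function names are hypothetical and are not taken from the paper.

```python
def remove_node(graph, node):
    """Return a copy of the graph without `node` and all of its edges."""
    return {u: {v for v in nbrs if v != node}
            for u, nbrs in graph.items() if u != node}


def remove_edge(graph, a, b):
    """Return a copy of the graph lacking only the edge a-b (an "edgetic" change)."""
    new = {u: set(nbrs) for u, nbrs in graph.items()}
    new[a].discard(b)
    new[b].discard(a)
    return new


# Hypothetical toy interactome: protein P interacts with A, B and C.
toy = {"P": {"A", "B", "C"}, "A": {"P"}, "B": {"P"}, "C": {"P"}}

node_loss = remove_node(toy, "P")     # all three interactions are lost
edgetic = remove_edge(toy, "P", "A")  # only the P-A interaction is lost
```

Removing the node P abolishes all three interactions, whereas the edgetic change leaves P-B and P-C intact.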

Sardiu & Washburn (2011) Building protein-protein interaction networks with proteomics and informatics tools. J Biol Chem 286:23645-51. (pmid: 21566121)

PubMed ] [ DOI ] The systematic characterization of the whole interactomes of different model organisms has revealed that the eukaryotic proteome is highly interconnected. Therefore, biological research is progressively shifting away from classical approaches that focus only on a few proteins toward whole protein interaction networks to describe the relationship of proteins in biological processes. In this minireview, we survey the most common methods for the systematic identification of protein interactions and exemplify different strategies for the generation of protein interaction networks. In particular, we will focus on the recent development of protein interaction networks derived from quantitative proteomics data sets.

Sneppen et al. (2010) Simplified models of biological networks. Annu Rev Biophys 39:43-59. (pmid: 20192769)

PubMed ] [ DOI ] The function of living cells is controlled by complex regulatory networks that are built of a wide diversity of interacting molecular components. The sheer size and intricacy of molecular networks of even the simplest organisms are obstacles toward understanding network functionality. This review discusses the achievements and promise of a bottom-up approach that uses well-characterized subnetworks as model systems for understanding larger networks. It highlights the interplay between the structure, logic, and function of various types of small regulatory circuits. The bottom-up approach advocates understanding regulatory networks as a collection of entangled motifs. We therefore emphasize the potential of negative and positive feedback, as well as their combinations, to generate robust homeostasis, epigenetics, and oscillations.

Alberghina et al. (2009) Molecular networks and system-level properties. J Biotechnol 144:224-33. (pmid: 19616593)

PubMed ] [ DOI ] Molecular systems biology aims to describe the functions of complex biological processes through recursive integration of molecular analysis, modeling, simulation and theory. It focuses on networks that originate from interconnection of genes, proteins and metabolites whose dynamic interactions generate, as an emergent property of the system, the corresponding function. Although evolutionary optimized, intracellular biochemical parameters, such as the expression level of gene products or the affinity between two or more proteins, must have a permissible range that gives robustness against perturbations to the system. Using the yeast G(1)-to-S transition network as an example we show that sophisticated relations exist among network structure, emergent property and robustness. Different emergent properties are generated from the same network by changing the strength of its interactions, not only by altering expression level, but also through mono and multi-site phosphorylation/dephosphorylation. Besides, multi-site protein phosphorylation modules, widespread in cell cycle, may ensure robust and coherent timing of cell cycle transitions as it happens for the onset of DNA replication. In conclusion, the modulation of biological function/emergent property by modifying interaction strength provides an efficient, highly tunable device to regulate biological processes. Furthermore, the principles outlined herein may provide new insight to network analysis in drug discovery.

Chang (2009) Prioritizing genes for pathway impact using network analysis. Methods Mol Biol 563:141-56. (pmid: 19597784)

PubMed ] [ DOI ] Prioritization, or ranking, of gene lists is becoming increasingly important for analyzing data generated from high-throughput assays like expression profiling and RNAi-based screening. This is especially the case when specific genes in a list need to be further validated using low-throughput experiments. In addition to gene set overlap enrichment methods, a complementary approach is to examine molecular interaction networks. These can provide putative functional insights based on gene connectivity, especially when many genes contain little or no annotation. For bench and computational biologists alike, using networks requires an informed selection of interaction data for network construction and strategies for managing network complexity. Moreover, graph theory and social network analysis methods can be used to isolate critical subnetworks and quantify network properties. Here, I discuss the basic components of networks, implications of their structure for functional interpretation, and common metrics used for prioritization. Although this is still an ongoing area of research, networks are providing new ways for gauging pathway impact in large-scale data sets.

Lipshtat et al. (2009) Specification of spatial relationships in directed graphs of cell signaling networks. Ann N Y Acad Sci 1158:44-56. (pmid: 19348631)

PubMed ] [ DOI ] Graph theory provides a useful and powerful tool for the analysis of cellular signaling networks. Intracellular components such as cytoplasmic signaling proteins, transcription factors, and genes are connected by links, representing various types of chemical interactions that result in functional consequences. However, these graphs lack important information regarding the spatial distribution of cellular components. The ability of two cellular components to interact depends not only on their mutual chemical affinity but also on colocalization to the same subcellular region. Localization of components is often used as a regulatory mechanism to achieve specific effects in response to different receptor signals. Here we describe an approach for incorporating spatial distribution into graphs and for the development of mixed graphs where links are specified by mutual chemical affinity as well as colocalization. We suggest that such mixed graphs will provide more accurate descriptions of functional cellular networks and their regulatory capabilities and aid in the development of large-scale predictive models of cellular behavior.

Liu et al. (2007) Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet 3:e96. (pmid: 17571924)

PubMed ] [ DOI ] Type 2 diabetes mellitus is a complex disorder associated with multiple genetic, epigenetic, developmental, and environmental factors. Animal models of type 2 diabetes differ based on diet, drug treatment, and gene knockouts, and yet all display the clinical hallmarks of hyperglycemia and insulin resistance in peripheral tissue. The recent advances in gene-expression microarray technologies present an unprecedented opportunity to study type 2 diabetes mellitus at a genome-wide scale and across different models. To date, a key challenge has been to identify the biological processes or signaling pathways that play significant roles in the disorder. Here, using a network-based analysis methodology, we identified two sets of genes, associated with insulin signaling and a network of nuclear receptors, which are recurrent in a statistically significant number of diabetes and insulin resistance models and transcriptionally altered across diverse tissue types. We additionally identified a network of protein-protein interactions between members from the two gene sets that may facilitate signaling between them. Taken together, the results illustrate the benefits of integrating high-throughput microarray studies, together with protein-protein interaction networks, in elucidating the underlying biological processes associated with a complex disorder.


 

Topology

Albert (2005) Scale-free networks in cell biology. J Cell Sci 118:4947-57. (pmid: 16254242)

PubMed ] [ DOI ] A cell's behavior is a consequence of the complex interactions between its numerous constituents, such as DNA, RNA, proteins and small molecules. Cells use signaling pathways and regulatory mechanisms to coordinate multiple processes, allowing them to respond to and adapt to an ever-changing environment. The large number of components, the degree of interconnectivity and the complex control of cellular networks are becoming evident in the integrated genomic and proteomic analyses that are emerging. It is increasingly recognized that the understanding of properties that arise from whole-cell function require integrated, theoretical descriptions of the relationships between different cellular components. Recent theoretical advances allow us to describe cellular network structure with graph concepts and have revealed organizational features shared with numerous non-biological networks. We now have the opportunity to describe quantitatively a network of hundreds or thousands of interacting components. Moreover, the observed topologies of cellular networks give us clues about their evolution and how their organization influences their function and dynamic responses.

Lima-Mendez & van Helden (2009) The powerful law of the power law and other myths in network biology. Mol Biosyst 5:1482-93. (pmid: 20023717)

PubMed ] [ DOI ] For almost 10 years, topological analysis of different large-scale biological networks (metabolic reactions, protein interactions, transcriptional regulation) has been highlighting some recurrent properties: power law distribution of degree, scale-freeness, small world, which have been proposed to confer functional advantages such as robustness to environmental changes and tolerance to random mutations. Stochastic generative models inspired different scenarios to explain the growth of interaction networks during evolution. The power law and the associated properties appeared so ubiquitous in complex networks that they were qualified as "universal laws". However, these properties are no longer observed when the data are subjected to statistical tests: in most cases, the data do not fit the expected theoretical models, and the cases of good fitting merely result from sampling artefacts or improper data representation. The field of network biology seems to be founded on a series of myths, i.e. widely believed but false ideas. The weaknesses of these foundations should however not be considered as a failure for the entire domain. Network analysis provides a powerful frame for understanding the function and evolution of biological processes, provided it is brought to an appropriate level of description, by focussing on smaller functional modules and establishing the link between their topological properties and their dynamical behaviour.
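
A first step in any of the topological analyses discussed above is to tabulate the degree distribution P(k), the fraction of nodes with k interaction partners. The sketch below, in Python, assumes an undirected network given as a list of edges; the toy edge list and the function name are hypothetical. As the abstract above cautions, an apparently straight line on a log-log plot of P(k) is not in itself evidence for a power law; a formal goodness-of-fit test is required and is not shown here.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the fraction of nodes that have that degree.

    Nodes without any edge do not appear in an edge list and are ignored.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}


# Hypothetical toy network: one hub (H) plus a short chain.
toy_edges = [("H", "a"), ("H", "b"), ("H", "c"), ("H", "d"), ("d", "e")]
dist = degree_distribution(toy_edges)
```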

Ghoshal et al. (2013) Uncovering the role of elementary processes in network evolution. Sci Rep 3:2920. (pmid: 24108146)

PubMed ] [ DOI ] The growth and evolution of networks has elicited considerable interest from the scientific community and a number of mechanistic models have been proposed to explain their observed degree distributions. Various microscopic processes have been incorporated in these models, among them, node and edge addition, vertex fitness and the deletion of nodes and edges. The existing models, however, focus on specific combinations of these processes and parameterize them in a way that makes it difficult to elucidate the role of the individual elementary mechanisms. We therefore formulated and solved a model that incorporates the minimal processes governing network evolution. Some contribute to growth such as the formation of connections between existing pair of vertices, while others capture deletion; the removal of a node with its corresponding edges, or the removal of an edge between a pair of vertices. We distinguish between these elementary mechanisms, identifying their specific role on network evolution.

Hao & Li (2011) The dichotomy in degree correlation of biological networks. PLoS ONE 6:e28322. (pmid: 22164269)

PubMed ] [ DOI ] Most complex networks from different areas such as biology, sociology or technology, show a correlation on node degree where the possibility of a link between two nodes depends on their connectivity. It is widely believed that complex networks are either disassortative (links between hubs are systematically suppressed) or assortative (links between hubs are enhanced). In this paper, we analyze a variety of biological networks and find that they generally show a dichotomous degree correlation. We find that many properties of biological networks can be explained by this dichotomy in degree correlation, including the neighborhood connectivity, the sickle-shaped clustering coefficient distribution and the modularity structure. This dichotomy distinguishes biological networks from real disassortative networks or assortative networks such as the Internet and social networks. We suggest that the modular structure of networks accounts for the dichotomy in degree correlation and vice versa, shedding light on the source of modularity in biological networks. We further show that a robust and well connected network necessitates the dichotomy of degree correlation, suggestive of an evolutionary motivation for its existence. Finally, we suggest that a dichotomous degree correlation favors a centrally connected modular network, by which the integrity of network and specificity of modules might be reconciled.
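
Degree correlation is commonly summarized by an assortativity coefficient: the Pearson correlation between the degrees found at the two ends of each edge. Negative values indicate a disassortative network (hubs avoid hubs), positive values an assortative one. The following Python sketch computes this single coefficient for two hypothetical toy networks; it illustrates the general concept only and is not the dichotomy analysis of the paper above.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation of the degrees at either end of each edge."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var  # undefined (division by zero) if all degrees are equal


# Hypothetical toy networks: a star (one hub, three leaves) and a four-node path.
star = [("H", "a"), ("H", "b"), ("H", "c")]
path = [("a", "b"), ("b", "c"), ("c", "d")]

r_star = degree_assortativity(star)
r_path = degree_assortativity(path)
```

The star, in which the hub connects only to degree-one nodes, is the extreme disassortative case.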

Barabási (2009) Scale-free networks: a decade and beyond. Science 325:412-3. (pmid: 19628854)

PubMed ] [ DOI ] For decades, we tacitly assumed that the components of such complex systems as the cell, the society, or the Internet are randomly wired together. In the past decade, an avalanche of research has shown that many real networks, independent of their age, function, and scope, converge to similar architectures, a universality that allowed researchers from different disciplines to embrace network theory as a common paradigm. The decade-old discovery of scale-free networks was one of those events that had helped catalyze the emergence of network science, a new research field with its distinct set of challenges and accomplishments.

Sales-Pardo et al. (2007) Extracting the hierarchical organization of complex systems. Proc Natl Acad Sci USA 104:15224-9. (pmid: 17881571)

PubMed ] [ DOI ] Extracting understanding from the growing "sea" of biological and socioeconomic data is one of the most pressing scientific challenges facing us. Here, we introduce and validate an unsupervised method for extracting the hierarchical organization of complex biological, social, and technological networks. We define an ensemble of hierarchically nested random graphs, which we use to validate the method. We then apply our method to real-world networks, including the air-transportation network, an electronic circuit, an e-mail exchange network, and metabolic networks. Our analysis of model and real networks demonstrates that our method extracts an accurate multiscale representation of a complex system.

Wang & Zhang (2007) In search of the biological significance of modular structures in protein networks. PLoS Comput Biol 3:e107. (pmid: 17542644)

PubMed ] [ DOI ] Many complex networks such as computer and social networks exhibit modular structures, where links between nodes are much denser within modules than between modules. It is widely believed that cellular networks are also modular, reflecting the relative independence and coherence of different functional units in a cell. While many authors have claimed that observations from the yeast protein-protein interaction (PPI) network support the above hypothesis, the observed structural modularity may be an artifact because the current PPI data include interactions inferred from protein complexes through approaches that create modules (e.g., assigning pairwise interactions among all proteins in a complex). Here we analyze the yeast PPI network including protein complexes (PIC network) and excluding complexes (PEC network). We find that both PIC and PEC networks show a significantly greater structural modularity than that of randomly rewired networks. Nonetheless, there is little evidence that the structural modules correspond to functional units, particularly in the PEC network. More disturbingly, there is no evolutionary conservation among yeast, fly, and nematode modules at either the whole-module or protein-pair level. Neither is there a correlation between the evolutionary or phylogenetic conservation of a protein and the extent of its participation in various modules. Using computer simulation, we demonstrate that a higher-than-expected modularity can arise during network growth through a simple model of gene duplication, without natural selection for modularity. Taken together, our results suggest the intriguing possibility that the structural modules in the PPI network originated as an evolutionary byproduct without biological significance.
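
One widely used measure of structural modularity is the modularity Q of Newman and Girvan: the fraction of edges that fall within modules, minus the fraction expected if edges were placed at random while preserving node degrees. The abstract above does not specify which measure the authors used, so the Python sketch below should be read as an illustration of the general idea only. The toy network (two triangles joined by a single edge) and the function name are our assumptions.

```python
def modularity(edges, membership):
    """Newman-Girvan modularity Q of a partition of an undirected network.

    `membership` maps each node to the label of its module.
    """
    m = len(edges)
    inside = {}      # number of edges with both ends in the module
    degree_sum = {}  # summed degree of the nodes in the module
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical toy network: two triangles joined by the edge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]

q_two = modularity(toy_edges, {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2})
q_one = modularity(toy_edges, {node: 1 for node in "abcdef"})
```

Placing each triangle in its own module gives Q = 5/14, whereas lumping all nodes into one module gives Q = 0.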

Batada et al. (2006) Evolutionary and physiological importance of hub proteins. PLoS Comput Biol 2:e88. (pmid: 16839197)

PubMed ] [ DOI ] It has been claimed that proteins with more interaction partners (hubs) are both physiologically more important (i.e., less dispensable) and, owing to an assumed high density of binding sites, slow evolving. Not all analyses, however, support these results, probably because of biased and less-than reliable global protein interaction data. Here we provide the first examination of these issues using a comprehensive literature-curated dataset of well-substantiated protein interactions in Saccharomyces cerevisiae. Whereas use of less reliable yeast two-hybrid data alone can reject the possibility that local connectivity correlates with measures of dispensability, in higher quality datasets a relatively robust correlation is observed. In contrast, local connectivity does not correlate with the rate of protein evolution even in reliable datasets. This perhaps surprising lack of correlation with evolutionary rate appears in part to arise from the fact that hub proteins do not have a higher density of residues associated with binding. However, hub proteins do have at least one other set of unusual features, namely rapid turnover and regulation, as manifest in high mRNA decay rates and a large number of phosphorylation sites. This, we suggest, is an adaptation to minimize unwanted activation of pathways that might be mediated by adventitious binding to hubs, were they to actively persist longer than required at any given time point. We conclude that hub proteins are more important for cellular growth rate and under tight regulation but are not slow evolving.

Milo et al. (2004) Superfamilies of evolved and designed networks. Science 303:1538-42. (pmid: 15001784)

PubMed ] [ DOI ] Complex biological, technological, and sociological networks can be of very different sizes and connectivities, making it difficult to compare their structures. Here we present an approach to systematically study similarity in the local structure of networks, based on the significance profile (SP) of small subgraphs in the network compared to randomized networks. We find several superfamilies of previously unrelated networks with very similar SPs. One superfamily, including transcription networks of microorganisms, represents "rate-limited" information-processing networks strongly constrained by the response time of their components. A distinct superfamily includes protein signaling, developmental genetic networks, and neuronal wiring. Additional superfamilies include power grids, protein-structure networks and geometric networks, World Wide Web links and social networks, and word-adjacency networks from different languages.
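
The simplest ingredient of such a subgraph analysis is counting the occurrences of one small subgraph. The Python sketch below counts feed-forward loops (X->Y, Y->Z and X->Z) in a directed network given as a list of edges. A significance profile would additionally require counts for every three-node subgraph and a comparison against randomized networks, which is omitted here; the toy networks and the function name are hypothetical.

```python
def count_feed_forward_loops(edges):
    """Count node triples (x, y, z) with directed edges x->y, y->z and x->z."""
    out = {}
    for u, v in edges:
        out.setdefault(u, set()).add(v)
    count = 0
    for x, targets in out.items():
        for y in targets:
            if y == x:
                continue  # ignore self-loops
            for z in out.get(y, ()):
                if z != x and z != y and z in targets:
                    count += 1
    return count


# Hypothetical toy networks: one feed-forward loop with a tail, and a 3-cycle.
ffl_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
cycle_edges = [("A", "B"), ("B", "C"), ("C", "A")]

n_ffl = count_feed_forward_loops(ffl_edges)
n_cycle = count_feed_forward_loops(cycle_edges)
```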

Ravasz & Barabási (2003) Hierarchical organization in complex networks. Phys Rev E Stat Nonlin Soft Matter Phys 67:026112. (pmid: 12636753)

PubMed ] [ DOI ] Many real networks in nature and society share two generic properties: they are scale-free and they display a high degree of clustering. We show that these two features are the consequence of a hierarchical organization, implying that small groups of nodes organize in a hierarchical manner into increasingly large groups, while maintaining a scale-free topology. In hierarchical networks, the degree of clustering characterizing the different groups follows a strict scaling law, which can be used to identify the presence of a hierarchical organization in real networks. We find that several real networks, such as the Worldwideweb, actor network, the Internet at the domain level, and the semantic web obey this scaling law, indicating that hierarchy is a fundamental characteristic of many complex systems.
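
The scaling law referred to above concerns the clustering coefficient C(k), averaged over all nodes of degree k; in the hierarchical model, C(k) decreases as k increases. A minimal Python sketch of how C(k) can be tabulated follows, assuming an undirected network stored as a dictionary of neighbor sets; the toy network and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(graph):
    """Fraction of each node's neighbor pairs that are themselves linked."""
    coeff = {}
    for node, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            coeff[node] = 0.0  # convention: undefined cases are set to zero
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        coeff[node] = 2 * links / (k * (k - 1))
    return coeff


def clustering_by_degree(graph):
    """Average clustering coefficient C(k) over all nodes of degree k."""
    buckets = defaultdict(list)
    for node, c in local_clustering(graph).items():
        buckets[len(graph[node])].append(c)
    return {k: sum(cs) / len(cs) for k, cs in sorted(buckets.items())}


# Hypothetical toy network: a triangle a-b-c with a pendant node d on c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

cc = local_clustering(toy)
ck = clustering_by_degree(toy)
```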

Ravasz et al. (2002) Hierarchical organization of modularity in metabolic networks. Science 297:1551-5. (pmid: 12202830)

PubMed ] [ DOI ] Spatially or chemically isolated functional modules composed of several cellular components and carrying discrete functions are considered fundamental building blocks of cellular organization, but their presence in highly integrated biochemical networks lacks quantitative support. Here, we show that the metabolic networks of 43 distinct organisms are organized into many small, highly connected topologic modules that combine in a hierarchical manner into larger, less cohesive units, with their number and degree of clustering following a power law. Within Escherichia coli, the uncovered hierarchical modularity closely overlaps with known metabolic functions. The identified network architecture may be generic to system-level cellular organization.

 

Clustering

Overview
Brohée & van Helden (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 7:488. (pmid: 17087821)

PubMed ] [ DOI ] BACKGROUND: Protein interactions are crucial components of all cellular processes. Recently, high-throughput methods have been developed to obtain a global description of the interactome (the whole network of protein interactions for a given organism). In 2002, the yeast interactome was estimated to contain up to 80,000 potential interactions. This estimate is based on the integration of data sets obtained by various methods (mass spectrometry, two-hybrid methods, genetic studies). High-throughput methods are known, however, to yield a non-negligible rate of false positives, and to miss a fraction of existing interactions. The interactome can be represented as a graph where nodes correspond with proteins and edges with pairwise interactions. In recent years clustering methods have been developed and applied in order to extract relevant modules from such graphs. These algorithms require the specification of parameters that may drastically affect the results. In this paper we present a comparative assessment of four algorithms: Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Super Paramagnetic Clustering (SPC), and Molecular Complex Detection (MCODE). RESULTS: A test graph was built on the basis of 220 complexes annotated in the MIPS database. To evaluate the robustness to false positives and false negatives, we derived 41 altered graphs by randomly removing edges from or adding edges to the test graph in various proportions. Each clustering algorithm was applied to these graphs with various parameter settings, and the clusters were compared with the annotated complexes. We analyzed the sensitivity of the algorithms to the parameters and determined their optimal parameter values. We also evaluated their robustness to alterations of the test graph. We then applied the four algorithms to six graphs obtained from high-throughput experiments and compared the resulting clusters with the annotated complexes. 
CONCLUSION: This analysis shows that MCL is remarkably robust to graph alterations. In the tests of robustness, RNSC is more sensitive to edge deletion but less sensitive to the use of suboptimal parameter values. The other two algorithms are clearly weaker under most conditions. The analysis of high-throughput data supports the superiority of MCL for the extraction of complexes from interaction networks.
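
Of the four algorithms compared above, MCL is perhaps the simplest to sketch. It alternates two operations on a column-stochastic matrix of random-walk probabilities: expansion (squaring the matrix, which lets flow spread) and inflation (raising each entry to a power and renormalizing, which strengthens strong flows and weakens weak ones). The Python code below is a simplified, dense-matrix illustration under our own assumptions (self-loops of weight 1, inflation parameter 2, a fixed number of iterations, no pruning); it is not the reference implementation and is not suitable for networks of realistic size.

```python
def normalize_columns(m):
    """Rescale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50):
    """Simplified Markov clustering of a symmetric adjacency matrix."""
    n = len(adjacency)
    # Assumption: add a self-loop of weight 1 to every node before normalizing.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: a two-step random walk (matrix squared).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: entry-wise power followed by column renormalization.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Each row that retains flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical toy network: triangles 0-1-2 and 3-4-5 joined by the edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(adj)
```

For this toy network the two triangles are recovered as separate clusters. The inflation parameter controls granularity, which is one of the parameter choices whose influence the paper above evaluates.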

Clustering and Community Structure
  • Community structure
  • Girvan-Newman algorithm (based on edge betweenness). The resulting partitions are commonly scored by "modularity", a related but distinct quality measure rather than another name for the algorithm. Rosvall, Axelsson & Bergstrom (2009) [1] : "to analyze how networks are formed and to simplify networks for which links do not represent flows but rather pairwise relationships, modularity or other topological methods may be preferred. But if instead one is interested in the dynamics on the network, in how local interactions induce a system-wide flow, in the interdependence across the network, and in how network structure relates to system behavior, then flow-based approaches such as the map equation are preferable."
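
The quantity at the heart of the Girvan-Newman algorithm is the edge betweenness: the number of shortest paths between pairs of nodes that run along a given edge, with paths of equal length sharing the credit. The algorithm repeatedly removes the edge of highest betweenness and recomputes. A minimal sketch of the betweenness calculation in Python follows, assuming an unweighted, undirected network stored as a dictionary of neighbor sets; the toy network and the function name are hypothetical.

```python
from collections import deque


def edge_betweenness(graph):
    """Shortest-path edge betweenness for an unweighted, undirected graph."""
    score = {}
    for source in graph:
        # Breadth-first search, counting shortest paths (sigma) to each node.
        dist = {source: 0}
        sigma = {node: 0.0 for node in graph}
        sigma[source] = 1.0
        preds = {node: [] for node in graph}
        order = []
        queue = deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {node: 0.0 for node in graph}
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                edge = frozenset((u, v))
                score[edge] = score.get(edge, 0.0) + share
                delta[u] += share
    # Every pair of nodes was counted once from each endpoint.
    return {edge: s / 2 for edge, s in score.items()}


# Hypothetical toy network: triangles a-b-c and d-e-f joined by the edge c-d.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
scores = edge_betweenness(toy)
bridge = max(scores, key=scores.get)
```

All nine shortest paths between the two triangles run through the edge c-d, so it is the first edge that would be removed, splitting the network into its two triangles.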
The map equation
Rosvall & Bergstrom (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci USA 105:1118-23. (pmid: 18216267)

PubMed ] [ DOI ] To comprehend the multipartite organization of large-scale biological and social systems, we introduce an information theoretic approach that reveals community structure in weighted and directed networks. We use the probability flow of random walks on a network as a proxy for information flows in the real system and decompose the network into modules by compressing a description of the probability flow. The result is a map that both simplifies and highlights the regularities in the structure and their relationships. We illustrate the method by making a map of scientific communication as captured in the citation patterns of >6,000 journals. We discover a multicentric organization with fields that vary dramatically in size and degree of integration into the network of science. Along the backbone of the network-including physics, chemistry, molecular biology, and medicine-information flows bidirectionally, but the map reveals a directional pattern of citation from the applied fields to the basic sciences.

Rosvall & Bergstrom (2011) Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE 6:e18209. (pmid: 21494658)

PubMed ] [ DOI ] To comprehend the hierarchical organization of large integrated systems, we introduce the hierarchical map equation, which reveals multilevel structures in networks. In this information-theoretic approach, we exploit the duality between compression and pattern detection; by compressing a description of a random walker as a proxy for real flow on a network, we find regularities in the network that induce this system-wide flow. Finding the shortest multilevel description of the random walker therefore gives us the best hierarchical clustering of the network--the optimal number of levels and modular partition at each level--with respect to the dynamics on the network. With a novel search algorithm, we extract and illustrate the rich multilevel organization of several large social and biological networks. For example, from the global air traffic network we uncover countries and continents, and from the pattern of scientific communication we reveal more than 100 scientific fields organized in four major disciplines: life sciences, physical sciences, ecology and earth sciences, and social sciences. In general, we find shallow hierarchical structures in globally interconnected systems, such as neural networks, and rich multilevel organizations in systems with highly separated regions, such as road networks.
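
For an undirected, unweighted network the two-level map equation of Rosvall & Bergstrom (2008) can be evaluated directly, because the stationary visit frequency of a node is simply its degree divided by twice the number of edges. Writing q_i for the per-step probability that the random walker exits module i, p_a for the visit frequency of node a, and q for the sum of all q_i, the description length is L(M) = q log q - 2 Σ_i q_i log q_i - Σ_a p_a log p_a + Σ_i (q_i + Σ_{a in i} p_a) log (q_i + Σ_{a in i} p_a). The Python sketch below evaluates this expression for a given partition; it does not search for the best partition, it does not cover the hierarchical (multilevel) version, and the toy network and function names are hypothetical.

```python
from math import log2


def plogp(x):
    """x * log2(x), with the usual convention that 0 * log(0) = 0."""
    return x * log2(x) if x > 0 else 0.0


def map_equation(edges, membership):
    """Two-level description length (in bits) of a partition.

    Assumes an undirected, unweighted network given as a list of edges.
    """
    two_m = 2 * len(edges)
    visit = {}   # p_a: stationary visit frequency of each node
    exit_ = {}   # q_i: per-step exit probability of each module
    within = {}  # summed visit frequency of the nodes in each module
    for u, v in edges:
        for node in (u, v):
            visit[node] = visit.get(node, 0.0) + 1 / two_m
        if membership[u] != membership[v]:
            for node in (u, v):
                mod = membership[node]
                exit_[mod] = exit_.get(mod, 0.0) + 1 / two_m
    for node, p in visit.items():
        mod = membership[node]
        within[mod] = within.get(mod, 0.0) + p
    q_total = sum(exit_.values())
    return (plogp(q_total)
            - 2 * sum(plogp(q) for q in exit_.values())
            - sum(plogp(p) for p in visit.values())
            + sum(plogp(exit_.get(mod, 0.0) + p) for mod, p in within.items()))


# Hypothetical toy network: two triangles joined by the edge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]

two = map_equation(toy_edges, {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2})
one = map_equation(toy_edges, {node: 1 for node in "abcdef"})
```

On this toy network, describing the walk with one module per triangle takes fewer bits per step than describing it with a single module, so the two-module partition is the better map.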

The Map equation site contains a very nice Flash applet to demonstrate the algorithm as well as tools to analyze networks.

From network clusters to protein complexes

Bader & Hogue (2003) An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4:2. (pmid: 12525261)

BACKGROUND: Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. RESULTS: This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation. CONCLUSION: Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.
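
The two steps named in the abstract, weighting vertices by local neighborhood density and traversing outward from a dense seed, can be sketched roughly as follows. This is a simplified stand-in and not the published MCODE procedure: the weight used here (degree times the density of the closed neighborhood), the `cutoff` parameter and all function names are assumptions made for illustration, and the network is assumed to be undirected without self-loops.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def vertex_weights(adj):
    """Assumed weight: degree * density of the vertex's closed neighborhood."""
    weights = {}
    for v, nbrs in adj.items():
        group = nbrs | {v}
        # Each edge inside the group is seen from both of its ends.
        links = sum(len(adj[a] & group) for a in group) // 2
        possible = len(group) * (len(group) - 1) // 2
        weights[v] = len(nbrs) * links / possible
    return weights


def grow_cluster(adj, weights, cutoff=0.2):
    """Traverse outward from the heaviest vertex, keeping neighbors whose
    weight is within `cutoff` (a fraction) of the seed's weight."""
    seed = max(sorted(adj), key=weights.get)  # sorted() makes ties deterministic
    threshold = (1 - cutoff) * weights[seed]
    cluster, queue = {seed}, deque([seed])
    while queue:
        v = queue.popleft()
        for n in adj[v]:
            if n not in cluster and weights[n] >= threshold:
                cluster.add(n)
                queue.append(n)
    return seed, cluster


# Toy interaction network: a four-protein clique with a two-protein tail.
toy = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
       ("c", "d"), ("d", "e"), ("e", "f")]
adj = build_adjacency(toy)
seed, cluster = grow_cluster(adj, vertex_weights(adj))
print(seed, sorted(cluster))
```

Scaling the density by the degree keeps a degree-one vertex, whose two-node neighborhood is trivially "complete", from outranking the members of a genuinely dense region.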


 

Motifs

Alon (2007) Network motifs: theory and experimental approaches. Nat Rev Genet 8:450-61. (pmid: 17510665)

Transcription regulation networks control the expression of genes. The transcription networks of well-studied microorganisms appear to be made up of a small set of recurring regulation patterns, called network motifs. The same network motifs have recently been found in diverse organisms from bacteria to humans, suggesting that they serve as basic building blocks of transcription networks. Here I review network motifs and their functions, with an emphasis on experimental studies. Network motifs in other biological networks are also mentioned, including signalling and neuronal networks.


Shen-Orr et al. (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31:64-8. (pmid: 11967538)

Little is known about the design principles of transcriptional regulation networks that control gene expression in cells. Recent advances in data collection and analysis, however, are generating unprecedented amounts of information about gene regulation networks. To understand these complex wiring diagrams, we sought to break down such networks into basic building blocks. We generalize the notion of motifs, widely used for sequence analysis, to the level of networks. We define 'network motifs' as patterns of interconnections that recur in many different parts of a network at frequencies much higher than those found in randomized networks. We applied new algorithms for systematically detecting network motifs to one of the best-characterized regulation networks, that of direct transcriptional interactions in Escherichia coli. We find that much of the network is composed of repeated appearances of three highly significant motifs. Each network motif has a specific function in determining gene expression, such as generating temporal expression programs and governing the responses to fluctuating external signals. The motif structure also allows an easily interpretable view of the entire known transcriptional network of the organism. This approach may help define the basic computational elements of other biological networks.

Milo et al. (2002) Network motifs: simple building blocks of complex networks. Science 298:824-7. (pmid: 12399590)

Complex networks are studied across many fields of science. To uncover their structural design principles, we defined "network motifs," patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks. We found such motifs in networks from biochemistry, neurobiology, ecology, and engineering. The motifs shared by ecological food webs were distinct from the motifs shared by the genetic networks of Escherichia coli and Saccharomyces cerevisiae or from those found in the World Wide Web. Similar motifs were found in networks that perform information processing, even though they describe elements as different as biomolecules within a cell and synaptic connections between neurons in Caenorhabditis elegans. Motifs may thus define universal classes of networks. This approach may uncover the basic building blocks of most networks.
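
The definition of a motif as a pattern that occurs more often than in randomized networks can be illustrated with one pattern, the feed-forward loop (X regulates Y, Y regulates Z, and X also regulates Z). The sketch below counts feed-forward loops in a directed network and compares the count with an ensemble of shuffled networks that keep every node's in-degree and out-degree. The function names, the number of swaps and the ensemble size are assumptions for illustration, not the algorithms used in the papers above.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def count_feed_forward_loops(edges):
    """Count ordered triples (x, y, z) with edges x->y, y->z and x->z."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    count = 0
    for x in list(succ):
        for y in succ[x]:
            if y == x:
                continue
            for z in succ.get(y, ()):
                if z != x and z != y and z in succ[x]:
                    count += 1
    return count


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize by swapping edge targets: (a->b, c->d) becomes (a->d, c->b).
    Swaps that would create a self-loop or a duplicate edge are skipped."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        if i == j or a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def motif_z_score(edges, n_random=100, seed=0):
    """Return (observed count, mean random count, z-score)."""
    rng = random.Random(seed)
    observed = count_feed_forward_loops(edges)
    counts = [
        count_feed_forward_loops(
            degree_preserving_shuffle(edges, 10 * len(edges), rng))
        for _ in range(n_random)
    ]
    mu, sigma = mean(counts), pstdev(counts)
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    return observed, mu, z
```

Calling `motif_z_score(edges)` on a list of `(regulator, target)` pairs returns the observed count together with its z-score against the shuffled ensemble. A pattern would be called a motif only if that score is large, and the threshold is left to the analyst.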

Artzy-Randrup et al. (2004) Comment on "Network motifs: simple building blocks of complex networks" and "Superfamilies of evolved and designed networks". Science 305:1107; author reply 1107. (pmid: 15326338)


Yeger-Lotem et al. (2004) Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proc Natl Acad Sci U S A 101:5934-9. (pmid: 15079056)

Genes and proteins generate molecular circuitry that enables the cell to process information and respond to stimuli. A major challenge is to identify characteristic patterns in this network of interactions that may shed light on basic cellular mechanisms. Previous studies have analyzed aspects of this network, concentrating on either transcription-regulation or protein-protein interactions. Here we search for composite network motifs: characteristic network patterns consisting of both transcription-regulation and protein-protein interactions that recur significantly more often than in random networks. To this end we developed algorithms for detecting motifs in networks with two or more types of interactions and applied them to an integrated data set of protein-protein interactions and transcription regulation in Saccharomyces cerevisiae. We found a two-protein mixed-feedback loop motif, five types of three-protein motifs exhibiting coregulation and complex formation, and many motifs involving four proteins. Virtually all four-protein motifs consisted of combinations of smaller motifs. This study presents a basic framework for detecting the building blocks of networks with multiple types of interactions.

Shoval & Alon (2010) SnapShot: network motifs. Cell 143:326-e1. (pmid: 20946989)



Exercises

Work through the following protocol to use SBGN-ED for creating pathway maps.
Junker et al. (2012) Creating interactive, web-based and data-enriched maps with the Systems Biology Graphical Notation. Nat Protoc 7:579-93. (pmid: 22383037)

The Systems Biology Graphical Notation (SBGN) is an emerging standard for the uniform representation of biological processes and networks. By using examples from gene regulation and metabolism, this protocol shows the construction of SBGN maps by either manual drawing or automatic translation using the tool SBGN-ED. In addition, it discusses the enrichment of SBGN maps with different kinds of -omics data to bring numerical data into the context of these networks in order to facilitate the interpretation of experimental data. Finally, the export of such maps to public websites, including clickable images, supports the communication of results within the scientific community. With regard to the described functionalities, other tools partially overlap with SBGN-ED. However, currently, SBGN-ED is the only tool that combines all of these functions, including the representation in SBGN, data mapping and website export. This protocol aims to assist scientists with the step-by-step procedure, which altogether takes ∼90 min.


Further reading and resources

Tan et al. (2013) Network2Canvas: network visualization on a canvas with enrichment analysis. Bioinformatics 29:1872-8. (pmid: 23749960)

MOTIVATION: Networks are vital to computational systems biology research, but visualizing them is a challenge. For networks larger than ∼100 nodes and ∼200 links, ball-and-stick diagrams fail to convey much information. To address this, we developed Network2Canvas (N2C), a web application that provides an alternative way to view networks. N2C visualizes networks by placing nodes on a square toroidal canvas. The network nodes are clustered on the canvas using simulated annealing to maximize local connections where a node's brightness is made proportional to its local fitness. The interactive canvas is implemented in HyperText Markup Language (HTML)5 with the JavaScript library Data-Driven Documents (D3). We applied N2C to visualize 30 canvases made from human and mouse gene-set libraries and 6 canvases made from the Food and Drug Administration (FDA)-approved drug-set libraries. Given lists of genes or drugs, enriched terms are highlighted on the canvases, and their degree of clustering is computed. Because N2C produces visual patterns of enriched terms on canvases, a trained eye can detect signatures instantly. In summary, N2C provides a new flexible method to visualize large networks and can be used to perform and visualize gene-set and drug-set enrichment analyses. AVAILABILITY: N2C is freely available at http://www.maayanlab.net/N2C and is open source. CONTACT: avi.maayan@mssm.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Papadopoulos et al. develop a new formalism to explain preferential attachment, and Grandmaster Barabási comments on it ...

Barabási (2012) Network science: Luck or reason. Nature 489:507-8. (pmid: 22972190)


Papadopoulos et al. (2012) Popularity versus similarity in growing networks. Nature 489:537-40. (pmid: 22972194)

The principle that 'popularity is attractive' underlies preferential attachment, which is a common explanation for the emergence of scaling in growing networks. If new connections are made preferentially to more popular nodes, then the resulting distribution of the number of connections possessed by nodes follows power laws, as observed in many real networks. Preferential attachment has been directly validated for some real networks (including the Internet), and can be a consequence of different underlying processes based on node fitness, ranking, optimization, random walks or duplication. Here we show that popularity is just one dimension of attractiveness; another dimension is similarity. We develop a framework in which new connections optimize certain trade-offs between popularity and similarity, instead of simply preferring popular nodes. The framework has a geometric interpretation in which popularity preference emerges from local optimization. As opposed to preferential attachment, our optimization framework accurately describes the large-scale evolution of technological (the Internet), social (trust relationships between people) and biological (Escherichia coli metabolic) networks, predicting the probability of new links with high precision. The framework that we have developed can thus be used for predicting new links in evolving networks, and provides a different perspective on preferential attachment as an emergent phenomenon.
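
As a point of reference for the baseline that this abstract argues against, plain preferential attachment can be simulated in a few lines: each new node links to existing nodes with probability proportional to their current degree. The sketch below shows only that baseline, not the popularity-versus-similarity framework of the paper. The function name, the starting clique and the parameter `m` (links added per new node) are illustrative assumptions.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m, seed=0):
    """Grow an undirected network; each new node attaches to m distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Start from a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


network = preferential_attachment(n_nodes=200, m=2)
degree = Counter(node for edge in network for node in edge)
print(degree.most_common(5))  # the few best-connected nodes
```

Plotting the resulting degree distribution on log-log axes is the usual first check for the heavy tail that the abstract refers to as scaling.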


Mones et al. (2012) Hierarchy measure for complex networks. PLoS ONE 7:e33799. (pmid: 22470477)

Nature, technology and society are full of complexity arising from the intricate web of the interactions among the units of the related systems (e.g., proteins, computers, people). Consequently, one of the most successful recent approaches to capturing the fundamental features of the structure and dynamics of complex systems has been the investigation of the networks associated with the above units (nodes) together with their relations (edges). Most complex systems have an inherently hierarchical organization and, correspondingly, the networks behind them also exhibit hierarchical features. Indeed, several papers have been devoted to describing this essential aspect of networks, however, without resulting in a widely accepted, converging concept concerning the quantitative characterization of the level of their hierarchy. Here we develop an approach and propose a quantity (measure) which is simple enough to be widely applicable, reveals a number of universal features of the organization of real-world networks and, as we demonstrate, is capable of capturing the essential features of the structure and the degree of hierarchy in a complex network. The measure we introduce is based on a generalization of the m-reach centrality, which we first extend to directed/partially directed graphs. Then, we define the global reaching centrality (GRC), which is the difference between the maximum and the average value of the generalized reach centralities over the network. We investigate the behavior of the GRC considering both a synthetic model with an adjustable level of hierarchy and real networks. Results for real networks show that our hierarchy measure is related to the controllability of the given system. We also propose a visualization procedure for large complex networks that can be used to obtain an overall qualitative picture about the nature of their hierarchical structure.
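
For an unweighted directed network, the measure described in this abstract can be sketched as follows. The sketch makes two assumptions: a node's local reaching centrality is taken to be the fraction of the other nodes reachable from it along directed paths, and the summed differences from the maximum are divided by N - 1. That normalization is a common convention, not something the abstract states. The function names are hypothetical, and the network must contain at least two nodes.

```python
from collections import defaultdict, deque


def local_reaching_centrality(succ, n_nodes, source):
    """Fraction of the other nodes reachable from `source` (breadth-first)."""
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for n in succ.get(v, ()):
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return (len(seen) - 1) / (n_nodes - 1)


def global_reaching_centrality(edges):
    """Sum of (max - local) reaching centralities, divided by N - 1 (assumed)."""
    succ = defaultdict(set)
    nodes = set()
    for u, v in edges:
        succ[u].add(v)
        nodes.update((u, v))
    local = [local_reaching_centrality(succ, len(nodes), n) for n in nodes]
    c_max = max(local)
    return sum(c_max - c for c in local) / (len(nodes) - 1)


# Two extremes: a directed star is fully hierarchical, a directed cycle is flat.
star = [("root", "a"), ("root", "b"), ("root", "c")]
cycle = [("a", "b"), ("b", "c"), ("c", "a")]
print(global_reaching_centrality(star), global_reaching_centrality(cycle))
```

In the star only the root reaches every other node, so the measure takes its largest value. In the cycle every node reaches every other, so there is no hierarchy to detect.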

Cui et al. (2011) Phylogenetically informed logic relationships improve detection of biological network organization. BMC Bioinformatics 12:476. (pmid: 22172058)

BACKGROUND: A "phylogenetic profile" refers to the presence or absence of a gene across a set of organisms, and it has been proven valuable for understanding gene functional relationships and network organization. Despite this success, few studies have attempted to search beyond just pairwise relationships among genes. Here we search for logic relationships involving three genes, and explore its potential application in gene network analyses. RESULTS: Taking advantage of a phylogenetic matrix constructed from the large orthologs database Roundup, we invented a method to create balanced profiles for individual triplets of genes that guarantee equal weight on the different phylogenetic scenarios of coevolution between genes. When we applied this idea to LAPP, the method to search for logic triplets of genes, the balanced profiles resulted in significant performance improvement and the discovery of hundreds of thousands more putative triplets than unadjusted profiles. We found that logic triplets detected biological network organization and identified key proteins and their functions, ranging from neighbouring proteins in local pathways, to well separated proteins in the whole pathway, and to the interactions among different pathways at the system level. Finally, our case study suggested that the directionality in a logic relationship and the profile of a triplet could disclose the connectivity between the triplet and surrounding networks. CONCLUSION: Balanced profiles are superior to the raw profiles employed by traditional methods of phylogenetic profiling in searching for high order gene sets. Gene triplets can provide valuable information in detection of biological network organization and identification of key genes at different levels of cellular interaction.

Koyutürk (2010) Algorithmic and analytical methods in network biology. Wiley Interdiscip Rev Syst Biol Med 2:277-292. (pmid: 20836029)

During the genomic revolution, algorithmic and analytical methods for organizing, integrating, analyzing, and querying biological sequence data proved invaluable. Today, increasing availability of high-throughput data pertaining to functional states of biomolecules, as well as their interactions, enables genome-scale studies of the cell from a systems perspective. The past decade witnessed significant efforts on the development of computational infrastructure for large-scale modeling and analysis of biological systems, commonly using network models. Such efforts lead to novel insights into the complexity of living systems, through development of sophisticated abstractions, algorithms, and analytical techniques that address a broad range of problems, including the following: (1) inference and reconstruction of complex cellular networks; (2) identification of common and coherent patterns in cellular networks, with a view to understanding the organizing principles and building blocks of cellular signaling, regulation, and metabolism; and (3) characterization of cellular mechanisms that underlie the differences between living systems, in terms of evolutionary diversity, development and differentiation, and complex phenotypes, including human disease. These problems pose significant algorithmic and analytical challenges because of the inherent complexity of the systems being studied; limitations of data in terms of availability, scope, and scale; intractability of resulting computational problems; and limitations of reference models for reliable statistical inference. This article provides a broad overview of existing algorithmic and analytical approaches to these problems, highlights key biological insights provided by these approaches, and outlines emerging opportunities and challenges in computational systems biology.

Serrano et al. (2008) Self-similarity of complex networks and hidden metric spaces. Phys Rev Lett 100:078701. (pmid: 18352602)

We demonstrate that the self-similarity of some scale-free networks with respect to a simple degree-thresholding renormalization scheme finds a natural interpretation in the assumption that network nodes exist in hidden metric spaces. Clustering, i.e., cycles of length three, plays a crucial role in this framework as a topological reflection of the triangle inequality in the hidden geometry. We prove that a class of hidden variable models with underlying metric spaces are able to accurately reproduce the self-similarity properties that we measured in the real networks. Our findings indicate that hidden geometries underlying these real networks are a plausible explanation for their observed topologies and, in particular, for their self-similarity with respect to the degree-based renormalization.