Interactome
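The interactome describes the entire set of interactions in a cell. Researchers often include so-called "genetic" interactions in this description. That nomenclature is unfortunate, because the distinction between physical and genetic interactions is important: a physical interaction means that two molecules actually bind to each other, whereas a genetic interaction means only that perturbing two genes together has a stronger or weaker effect than expected, which does not imply physical contact. This page focuses primarily on the biological foundations.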
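The difference between the two kinds of interaction can be made concrete with a small sketch. It assumes the multiplicative model, a common convention that is not stated on this page, in which the expected double-mutant fitness is the product of the two single-mutant fitness values. All function names, protein names, fitness values and the threshold below are hypothetical and chosen only for illustration.

```python
# Minimal sketch, assuming the multiplicative model for genetic interactions.
# Names, numbers and the threshold are illustrative, not taken from any dataset.

def genetic_interaction_score(f_a: float, f_b: float, f_ab: float) -> float:
    """Observed double-mutant fitness minus the multiplicative expectation."""
    expected = f_a * f_b
    return f_ab - expected


def classify_interaction(epsilon: float, threshold: float = 0.1) -> str:
    """Label a score as negative, positive or none, using an arbitrary cutoff."""
    if epsilon <= -threshold:
        return "negative"   # double mutant is sicker than expected
    if epsilon >= threshold:
        return "positive"   # double mutant is fitter than expected
    return "none"


# A physical interaction, by contrast, is an observed binding event.
# It is stored here as an unordered pair of (hypothetical) protein names.
physical_edges = {frozenset(("ProtA", "ProtB"))}


def physically_interact(p: str, q: str, edges=physical_edges) -> bool:
    """True if the unordered pair (p, q) is recorded as a binding event."""
    return frozenset((p, q)) in edges
```

Two genes can score as strongly "negative" without their products ever appearing together in the set of physical edges, which is why the two kinds of data should be kept apart.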
Introductory reading
Fung et al. (2012) Visualization of the interactome: what are we looking at? Proteomics 12:1669-86. (pmid: 22610544)
Network visualization of the interactome has become routine in systems biology research. Not only does it serve as an illustration on the cellular organization of protein-protein interactions, it also serves as a biological context for gaining insights from high-throughput data. However, the challenges to produce an effective visualization have been great owing to the fact that the scale, biological context and dynamics of any given interactome are too large and complex to be captured by a single visualization. Visualization design therefore requires a pragmatic trade-off between capturing biological concept and being comprehensible. In this review, we focus on the biological interpretation of different network visualizations. We will draw on examples predominantly from our experiences but elaborate them in the context of the broader field. A rich variety of networks will be introduced including interactomes and the complexome in 2D, interactomes in 2.5D and 3D and dynamic networks.
Contents
- Principles of interaction biology
- Physical vs. genetic interactions
- Key experimental approaches
Further reading and resources
Wang et al. (2012) Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat Biotechnol 30:159-64. (pmid: 22252508)
To better understand the molecular mechanisms and genetic basis of human disease, we systematically examine relationships between 3,949 genes, 62,663 mutations and 3,453 associated disorders by generating a three-dimensional, structurally resolved human interactome. This network consists of 4,222 high-quality binary protein-protein interactions with their atomic-resolution interfaces. We find that in-frame mutations (missense point mutations and in-frame insertions and deletions) are enriched on the interaction interfaces of proteins associated with the corresponding disorders, and that the disease specificity for different mutations of the same gene can be explained by their location within an interface. We also predict 292 candidate genes for 694 unknown disease-to-gene associations with proposed molecular mechanism hypotheses. This work indicates that knowledge of how in-frame disease mutations alter specific interactions is critical to understanding pathogenesis. Structurally resolved interaction networks should be valuable tools for interpreting the wealth of data being generated by large-scale structural genomics and disease association studies.
Seebacher & Gavin (2011) SnapShot: Protein-protein interaction networks. Cell 144:1000, 1000.e1. (pmid: 21414489)
Vidal et al. (2011) Interactome networks and human disease. Cell 144:986-98. (pmid: 21414488)
Complex biological systems and cellular networks may underlie most genotype to phenotype relationships. Here, we review basic concepts in network biology, discussing different types of interactome networks and the insights that can come from analyzing them. We elaborate on why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease.
Michaut et al. (2011) Protein complexes are central in the yeast genetic landscape. PLoS Comput Biol 7:e1001092. (pmid: 21390331)
If perturbing two genes together has a stronger or weaker effect than expected, they are said to genetically interact. Genetic interactions are important because they help map gene function, and functionally related genes have similar genetic interaction patterns. Mapping quantitative (positive and negative) genetic interactions on a global scale has recently become possible. This data clearly shows groups of genes connected by predominantly positive or negative interactions, termed monochromatic groups. These groups often correspond to functional modules, like biological processes or complexes, or connections between modules. However it is not yet known how these patterns globally relate to known functional modules. Here we systematically study the monochromatic nature of known biological processes using the largest quantitative genetic interaction data set available, which includes fitness measurements for ∼5.4 million gene pairs in the yeast Saccharomyces cerevisiae. We find that only 10% of biological processes, as defined by Gene Ontology annotations, and less than 1% of inter-process connections are monochromatic. Further, we show that protein complexes are responsible for a surprisingly large fraction of these patterns. This suggests that complexes play a central role in shaping the monochromatic landscape of biological processes. Altogether this work shows that both positive and negative monochromatic patterns are found in known biological processes and in their connections and that protein complexes play an important role in these patterns. The monochromatic processes, complexes and connections we find chart a hierarchical and modular map of sensitive and redundant biological systems in the yeast cell that will be useful for gene function prediction and comparison across phenotypes and organisms. Furthermore the analysis methods we develop are applicable to other species for which genetic interactions will progressively become more available.
Jain & Bader (2010) An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology. BMC Bioinformatics 11:562. (pmid: 21078182)
BACKGROUND: Semantic similarity measures are useful to assess the physiological relevance of protein-protein interactions (PPIs). They quantify similarity between proteins based on their function using annotation systems like the Gene Ontology (GO). Proteins that interact in the cell are likely to be in similar locations or involved in similar biological processes compared to proteins that do not interact. Thus the more semantically similar the gene function annotations are among the interacting proteins, the more likely the interaction is physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not consider the unequal depth of term hierarchies in different classes of cellular location, molecular function, and biological process ontologies of GO and thus may over- or under-estimate similarity. RESULTS: We describe an improved algorithm, Topological Clustering Semantic Similarity (TCSS), to compute semantic similarity between GO terms annotated to proteins in interaction datasets. Our algorithm considers unequal depth of biological knowledge representation in different branches of the GO graph. The central idea is to divide the GO graph into sub-graphs and score PPIs higher if participating proteins belong to the same sub-graph as compared to if they belong to different sub-graphs. CONCLUSIONS: The TCSS algorithm performs better than other semantic similarity measurement techniques that we evaluated in terms of their performance on distinguishing true from false protein interactions, and correlation with gene expression and protein families. We show an average improvement of 4.6 times the F1 score over Resnik, the next best method, on our Saccharomyces cerevisiae PPI dataset and 2 times on our Homo sapiens PPI dataset using cellular component, biological process and molecular function GO annotations.
Schüler & Bornberg-Bauer (2011) The evolution of protein interaction networks. Methods Mol Biol 696:273-89. (pmid: 21063954)
The availability of high-throughput methods to detect protein interactions made construction of comprehensive protein interaction networks for several important model organisms possible. Many studies have since focused on uncovering the structural principles of these networks and relating these structures to biological processes. On a global scale, there are striking similarities in the structure of different protein interaction networks, even when distantly related species, such as the yeast Saccharomyces cerevisiae and the fruit fly Drosophila melanogaster, are compared. However, there is also considerable variance in network structures caused by the gain and loss of genes and mutations which alter the interaction behavior of the encoded proteins. Here, we focus on the current state of knowledge on the structure of protein interaction networks and the evolutionary processes that shaped these structures.
Costanzo et al. (2010) The genetic landscape of a cell. Science 327:425-31. (pmid: 20093466)
A genome-scale genetic interaction map was constructed by examining 5.4 million gene-gene pairs for synthetic genetic interactions, generating quantitative genetic interaction profiles for approximately 75% of all genes in the budding yeast, Saccharomyces cerevisiae. A network based on genetic interaction profiles reveals a functional map of the cell in which genes of similar biological processes cluster together in coherent subsets, and highly correlated profiles delineate specific pathways to define gene function. The global network identifies functional cross-connections between all bioprocesses, mapping a cellular wiring diagram of pleiotropy. Genetic interaction degree correlated with a number of different gene attributes, which may be informative about genetic network hubs in other organisms. We also demonstrate that extensive and unbiased mapping of the genetic landscape provides a key for interpretation of chemical-genetic interactions and drug target identification.
Yu et al. (2009) A novel scoring approach for protein co-purification data reveals high interaction specificity. PLoS Comput Biol 5:e1000515. (pmid: 19779545)
Large-scale protein interaction networks (PINs) have typically been discerned using affinity purification followed by mass spectrometry (AP/MS) and yeast two-hybrid (Y2H) techniques. It is generally recognized that Y2H screens detect direct binary interactions while the AP/MS method captures co-complex associations; however, the latter technique is known to yield prevalent false positives arising from a number of effects, including abundance. We describe a novel approach to compute the propensity for two proteins to co-purify in an AP/MS data set, thereby allowing us to assess the detected level of interaction specificity by analyzing the corresponding distribution of interaction scores. We find that two recent AP/MS data sets of yeast contain enrichments of specific, or high-scoring, associations as compared to commensurate random profiles, and that curated, direct physical interactions in two prominent data bases have consistently high scores. Our scored interaction data sets are generally more comprehensive than those of previous studies when compared against four diverse, high-quality reference sets. Furthermore, we find that our scored data sets are more enriched with curated, direct physical associations than Y2H sets. A high-confidence protein interaction network (PIN) derived from the AP/MS data is revealed to be highly modular, and we show that this topology is not the result of misrepresenting indirect associations as direct interactions. In fact, we propose that the modularity in Y2H data sets may be underrepresented, as they contain indirect associations that are significantly enriched with false negatives. The AP/MS PIN is also found to contain significant assortative mixing; however, in line with a previous study we confirm that Y2H interaction data show weak disassortativeness, thus revealing more clearly the distinctive natures of the interaction detection methods. We expect that our scored yeast data sets are ideal for further biological discovery and that our scoring system will prove useful for other AP/MS data sets.
Simonis et al. (2009) Empirically controlled mapping of the Caenorhabditis elegans protein-protein interactome network. Nat Methods 6:47-54. (pmid: 19123269)
To provide accurate biological hypotheses and elucidate global properties of cellular networks, systematic identification of protein-protein interactions must meet high quality standards. We present an expanded C. elegans protein-protein interaction network, or 'interactome' map, derived from testing a matrix of approximately 10,000 x approximately 10,000 proteins using a highly specific, high-throughput yeast two-hybrid system. Through a new empirical quality control framework, we show that the resulting data set (Worm Interactome 2007, or WI-2007) was similar in quality to low-throughput data curated from the literature. We filtered previous interaction data sets and integrated them with WI-2007 to generate a high-confidence consolidated map (Worm Interactome version 8, or WI8). This work allowed us to estimate the size of the worm interactome at approximately 116,000 interactions. Comparison with other types of functional genomic data shows the complementarity of distinct experimental approaches in predicting different functional relationships between genes or proteins.
Reguly et al. (2006) Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J Biol 5:11. (pmid: 16762047)
BACKGROUND: The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference. RESULTS: We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID (http://www.thebiogrid.org) and SGD (http://www.yeastgenome.org/) databases. CONCLUSION: Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks.