Interaction databases
Interaction databases face problems similar to those of sequence databases: standards for abstracting biological concepts into computable objects, data integrity, search and retrieval, and metrics of comparison. There is, however, an added complication: interactions are rarely all-or-none, and high-throughput experimental methods have large false-positive and false-negative rates. This makes it necessary to define confidence scores for interactions.
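As an illustration of what a confidence score can look like, the sketch below combines several independent observations of the same interaction with a noisy-OR rule: the more often, and the more reliably, an interaction has been reported, the higher its score. This is only one possible scheme and is not prescribed by this page; the function name and the per-method reliabilities are hypothetical, illustrative values.

```python
from math import prod

# Hypothetical per-method reliabilities: the assumed probability that a single
# report by this method reflects a true interaction. Illustrative values only.
METHOD_RELIABILITY = {
    "two hybrid": 0.3,
    "affinity purification": 0.4,
    "x-ray crystallography": 0.95,
}


def noisy_or_confidence(methods, default=0.1):
    """Combine independent evidence as 1 - product of (1 - p_i)."""
    return 1.0 - prod(1.0 - METHOD_RELIABILITY.get(m, default) for m in methods)


# One report versus two independent reports of the same interaction.
single = noisy_or_confidence(["two hybrid"])
repeated = noisy_or_confidence(["two hybrid", "affinity purification"])
print(f"{single:.2f} {repeated:.2f}")
```

The rule assumes that the observations are independent, which is rarely strictly true for interaction data; the resulting scores are therefore better suited to ranking interactions than to being read as calibrated probabilities.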
Introductory reading
Turner et al. (2010) iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database (Oxford) 2010:baq023. (pmid: 20940177)
We present iRefWeb, a web interface to protein interaction data consolidated from 10 public databases: BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI and OPHID. iRefWeb enables users to examine aggregated interactions for a protein of interest, and presents various statistical summaries of the data across databases, such as the number of organism-specific interactions, proteins and cited publications. Through links to source databases and supporting evidence, researchers may gauge the reliability of an interaction using simple criteria, such as the detection methods, the scale of the study (high- or low-throughput) or the number of cited publications. Furthermore, iRefWeb compares the information extracted from the same publication by different databases, and offers means to follow-up possible inconsistencies. We provide an overview of the consolidated protein-protein interaction landscape and show how it can be automatically cropped to aid the generation of meaningful organism-specific interactomes. iRefWeb can be accessed at: http://wodaklab.org/iRefWeb. Database URL: http://wodaklab.org/iRefWeb/
Contents
- Abstraction and standards
- Databases
- Confidence scores
Exercises
Mora & Donaldson (2011) iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database. BMC Bioinformatics 12:455. (pmid: 22115179)
BACKGROUND: The iRefIndex addresses the need to consolidate protein interaction data into a single uniform data resource. iRefR provides the user with access to this data source from an R environment. RESULTS: The iRefR package includes tools for selecting specific subsets of interest from the iRefIndex by criteria such as organism, source database, experimental method, protein accessions and publication identifier. Data may be converted between three representations (MITAB, edgeList and graph) for use with other R packages such as igraph, graph and RBGL. The user may choose between different methods for resolving redundancies in interaction data and how n-ary data is represented. In addition, we describe a function to identify binary interaction records that possibly represent protein complexes. We show that the user choice of data selection, redundancy resolution and n-ary data representation all have an impact on graphical analysis. CONCLUSIONS: The package allows the user to control how these issues are dealt with and communicate them via an R-script written using the iRefR package - this will facilitate communication of methods, reproducibility of network analyses and further modification and comparison of methods by researchers.
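The three representations named in the abstract above (MITAB, edge list and graph) can be illustrated without any special package. The following is a minimal Python sketch, not the iRefR API: the function names and the sample records are made up, and the only format assumption is that the file is tab-separated with the identifiers of interactors A and B in the first two columns, as in PSI-MITAB.

```python
import csv
import io
from collections import defaultdict

# Made-up records in a MITAB-like layout (tab-separated; columns 1 and 2 hold
# the identifiers of interactors A and B). All further columns are omitted.
SAMPLE = (
    "db:protA\tdb:protB\n"
    "db:protB\tdb:protA\n"  # same pair, reported in reverse order
    "db:protB\tdb:protC\n"
)


def mitab_to_edge_list(handle):
    """Return a sorted list of unique, undirected (A, B) identifier pairs."""
    edges = set()
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or header lines
        a, b = sorted((row[0], row[1]))
        edges.add((a, b))
    return sorted(edges)


def edge_list_to_graph(edges):
    """Return an adjacency mapping: identifier -> set of neighbours."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


edges = mitab_to_edge_list(io.StringIO(SAMPLE))
graph = edge_list_to_graph(edges)
print(edges)
print(sorted(graph["db:protB"]))
```

Collapsing the reversed pair into a single edge is itself a redundancy-resolution choice of the kind the abstract says affects downstream graph analysis; keeping one edge per record would give a multigraph with different degree statistics.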
Further reading and resources
- Standards
Orchard & Hermjakob (2011) Data standardization by the HUPO-PSI: how has the community benefitted? Methods Mol Biol 696:149-60. (pmid: 21063946)
The groundwork allowing the systematic capture of proteomics data has now largely been completed, with the design and publication of exchange formats and interchange standards by the Human Proteome Organisation Proteomics Standards Initiative (HUPO-PSI). Our focus can now shift to gathering the ever-increasing amounts of generated data, and finding novel ways to catalog and present it so that a deeper understanding of basic science, health, and disease can be gained by scientists mining these increasingly rich resources.
- Data
Razick et al. (2008) iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinformatics 9:405. (pmid: 18823568)
BACKGROUND: Interaction data for a given protein may be spread across multiple databases. We set out to create a unifying index that would facilitate searching for these data and that would group together redundant interaction data while recording the methods used to perform this grouping. RESULTS: We present a method to generate a key for a protein interaction record and a key for each participant protein. These keys may be generated by anyone using only the primary sequence of the proteins, their taxonomy identifiers and the Secure Hash Algorithm. Two interaction records will have identical keys if they refer to the same set of identical protein sequences and taxonomy identifiers. We define records with identical keys as a redundant group. Our method required that we map protein database references found in interaction records to current protein sequence records. Operations performed during this mapping are described by a mapping score that may provide valuable feedback to source interaction databases on problematic references that are malformed, deprecated, ambiguous or unfound. Keys for protein participants allow for retrieval of interaction information independent of the protein references used in the original records. CONCLUSION: We have applied our method to protein interaction records from BIND, BioGrid, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. The resulting interaction reference index is provided in PSI-MITAB 2.5 format at http://irefindex.uio.no. This index may form the basis of alternative redundant groupings based on gene identifiers or near sequence identity groupings.
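The key idea in the abstract above (hash each participant's sequence and taxonomy identifier, then hash the set of participant keys) can be sketched in a few lines. The sketch below only follows the spirit of that description: the function names are hypothetical, and the normalisation, concatenation order and hexadecimal encoding are assumptions, so the keys it produces should not be expected to match those published by iRefIndex.

```python
import hashlib


def protein_key(sequence, taxon_id):
    """Key for one participant: SHA-1 over normalised sequence plus taxon id."""
    normalised = "".join(sequence.split()).upper()
    return hashlib.sha1(f"{normalised}{taxon_id}".encode("ascii")).hexdigest()


def interaction_key(participants):
    """Key for an interaction: SHA-1 over the sorted participant keys."""
    keys = sorted(protein_key(seq, tax) for seq, tax in participants)
    return hashlib.sha1("".join(keys).encode("ascii")).hexdigest()


# Made-up toy sequences; the taxonomy identifiers are examples only.
record_1 = [("MKTAYIAK", 559292), ("MSDNE", 559292)]
record_2 = [("msdne", 559292), ("MKTAYIAK", 559292)]  # same proteins, other order
record_3 = [("MKTAYIAK", 559292), ("MSDNE", 9606)]    # one taxon differs

print(interaction_key(record_1) == interaction_key(record_2))
print(interaction_key(record_1) == interaction_key(record_3))
```

Sorting the participant keys before hashing makes the interaction key independent of the order in which a source database lists the proteins, so records such as `record_1` and `record_2` fall into the same redundant group, while `record_3` does not.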
Ooi et al. (2010) Databases of protein-protein interactions and complexes. Methods Mol Biol 609:145-59. (pmid: 20221918)
In the current understanding, translation of genomic sequences into proteins is the most important path for realization of genome information. In exercising their intended function, proteins work together through various forms of direct (physical) or indirect interaction mechanisms. For a variety of basic functions, many proteins form a large complex representing a molecular machine or a macromolecular super-structural building block. After several high-throughput techniques for detection of protein-protein interactions had matured, protein interaction data became available in a large scale and curated databases for protein-protein interactions (PPIs) are a new necessity for efficient research. Here, their scope, annotation quality, and retrieval tools are reviewed. In addition, attention is paid to portals that provide unified access to a variety of such databases with added annotation value.
Wodak et al. (2011) High-throughput analyses and curation of protein interactions in yeast. Methods Mol Biol 759:381-406. (pmid: 21863499)
The yeast Saccharomyces cerevisiae is the model organism in which protein interactions have been most extensively analyzed. The vast majority of these interactions have been characterized by a variety of sophisticated high-throughput techniques probing different aspects of protein association. This chapter summarizes the major techniques, highlights their complementary nature, discusses the data they produce, and highlights some of the biases from which they suffer. A main focus is the key role played by computational methods for processing, analyzing, and validating the large body of noisy data produced by the experimental procedures. It also describes how computational methods are used to extend the coverage and reliability of protein interaction data by integrating information from heterogeneous sources and reviews the current status of literature-curated data on yeast protein interactions stored in specialized databases.
Musso et al. (2011) Filtering and interpreting large-scale experimental protein-protein interaction data. Methods Mol Biol 781:295-309. (pmid: 21877287)
Rarely acting in isolation, it is invariably the physical associations among proteins that define their biological activity, necessitating the study of the cellular meshwork of protein-protein interactions (PPI) before a full appreciation of gene function can be achieved. The past few years have seen a marked expansion in both the sheer volume and number of organisms for which high-quality interaction data is available, with high-throughput interaction screening and detection techniques showing consistent improvement both in scale and sensitivity. Although techniques for large-scale PPI mapping are increasingly being applied to new organisms, including human, there is a corresponding need to rigorously evaluate, benchmark, and impartially filter the results. This chapter explores methods for PPI dataset evaluation, including a survey of previous techniques applied by landmark studies in the field and a discussion of promising new experimental approaches. We further outline practical suggestions and useful tools for interpreting newly generated PPI data. As the majority of large-scale experimental data has been generated for the budding yeast S. cerevisiae, most of the techniques and datasets described are from the perspective of this model unicellular eukaryote; however, extensions to other organisms including mammals are mentioned where possible.