Difference between revisions of "BIN-PPI-Databases"

Revision as of 18:03, 9 November 2017

Protein-Protein Interaction Databases


 

Keywords: IntAct, iRef


 



 


Caution!

This unit is under development. There is some content here, but it is incomplete and/or may change significantly: links may lead nowhere, the content is likely to be rearranged, and objectives, deliverables etc. may be incomplete or missing. Do not work with this material until it is updated to "live" status.


 


Abstract

Exploring IntAct and BioGRID PPI databases.


 


This unit ...

Prerequisites

You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:

  • Biomolecules: The molecules of life; nucleic acids and amino acids; the genetic code; protein folding; post-translational modifications and protein biochemistry; membrane proteins; biological function.

You need to complete the following units before beginning this one:


 


Objectives

This unit will ...

  • ... introduce issues surrounding the collection and curation of protein-protein interactions in databases;
  • ... explore the Web interfaces to IntAct and BioGRID;
  • ... discuss the limitations of interaction predictions based on homology.


 


Outcomes

After working through this unit you ...

  • ... can access IntAct and BioGRID and discover interactions with a protein of interest.


 


Deliverables

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


 


Evaluation

Evaluation: NA

This unit is not evaluated for course marks.


 


Contents

In high-throughput biology, the genome was the beginning. As Sydney Brenner has phrased it: we have now written the "white-pages" of the cell, fulfilling the "CAP-criterion" (Comprehensive, Accurate and Permanent). The next level is figuring out the way the parts work - if you will, the "Yellow Pages" - and many of us expect that substantial progress can be made by mapping their interactions. After all, physiological function can be described to a large part as the result of physical interaction.

Please note that there are different types of physical interactions. When we speak of physical interactions, we most often think of complexes, either stable or transient homo- or hetero-oligomers. But there are also interactions between substrates and products, and not all of them correspond to classical enzymatic pathways. Phosphorylation and dephosphorylation are processes of key importance in signal transduction, and acetylation/deacetylation plays a critical role in regulatory pathways. Here, the substrates are proteins, and the interaction with the modifying enzyme is of course a physical interaction.

Genetic interactions, on the other hand, are another story. Here the word interaction is used in an entirely different sense: it is not synonymous with contact; it is synonymous with influence. In fact, most proteins that display genetic interactions would not be expected to interact physically as well. (Why? Think.) It is important not to mix up the two.

To understand what genetic interactions imply, think of the following analogy. If I were to break the wrist of my right arm, my survival would probably not be affected. My left arm would provide sufficient redundancy for most tasks. What about breaking the right index finger, or spraining the elbow as well? Painful, but functionally not much worse than breaking the wrist alone. What about indigestion, or Alzheimer's, in addition to the fracture? Annoying, but really not significantly more so with or without a broken wrist. What about a broken left wrist? That would be bad. Losing the function of both hands is much, much worse than losing the function of only one hand. This is the kind of functional ordering that genetic interactions achieve: if two genes are active in the same system (like the right wrist and index finger), they will not display genetic interaction. The pathway is blocked, and it matters little whether it is blocked at one or two points. If two genes work in completely different systems, they will also not show genetic interactions (like a fracture combined with indigestion). Only if two genes affect parallel, mutually redundant pathways (left and right arm) will their joint deletion cause a critical situation for the affected organism. If the organism dies, we call this a synthetic lethal effect.
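The logic of the analogy can be written down as a toy model. The following is only a minimal sketch, not a model of any real system: the pathway and gene names are invented, and the only rules are the ones stated above (a pathway is blocked once any of its genes is deleted, and the organism survives as long as one of the redundant pathways still works).

```python
from itertools import combinations

# Hypothetical toy system: two parallel, mutually redundant pathways.
# All pathway and gene names are invented for illustration only.
PATHWAYS = {
    "left_arm": {"geneA", "geneB"},
    "right_arm": {"geneC", "geneD"},
}
UNRELATED = {"geneE"}  # works in a completely different system


def pathway_works(genes, deleted):
    """A pathway is blocked as soon as any one of its genes is deleted."""
    return not (genes & deleted)


def viable(deleted):
    """The organism survives if at least one redundant pathway still works."""
    return any(pathway_works(genes, deleted) for genes in PATHWAYS.values())


def synthetic_lethal_pairs():
    """Gene pairs whose single deletions are viable but whose joint deletion is not."""
    all_genes = set().union(*PATHWAYS.values()) | UNRELATED
    pairs = []
    for g1, g2 in combinations(sorted(all_genes), 2):
        singles_ok = viable({g1}) and viable({g2})
        if singles_ok and not viable({g1, g2}):
            pairs.append((g1, g2))
    return pairs


if __name__ == "__main__":
    for pair in synthetic_lethal_pairs():
        print(pair)
```

In this sketch, only pairs that hit both pathways are reported. Two deletions in the same pathway, or one deletion combined with the unrelated gene, leave the organism viable.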


 


Task:

  • Read
Chatr-Aryamontri et al. (2017) The BioGRID interaction database: 2017 update. Nucleic Acids Res 45:D369-D379. (pmid: 27980099)

The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset represents interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical-protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as for bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases.


 


Data Sources

Interaction databases have problems similar to those of sequence databases: the need for standards for abstracting biological concepts into computable objects, data integrity, search and retrieval, and the metrics of comparison. There is, however, an added complication: interactions are rarely all-or-none, and the high-throughput experimental methods have large false-positive and false-negative rates. This makes it necessary to define confidence scores for interactions. On top of experimental methods, there are also a variety of methods for computational interaction prediction. However, even though the "gold standard" is careful, small-scale laboratory experiments, different curation efforts on the same experimental publication usually lead to different results, with as little as 42% overlap between databases being reported.
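To make the idea of "overlap" concrete, here is a minimal sketch of one way to measure it. It assumes that an interaction is an unordered pair of identifiers and that overlap is the shared fraction of all pairs found in either set. The text above does not say how the reported figure was computed, so this definition is an assumption, and the data are invented.

```python
def normalise(pairs):
    """Treat A-B and B-A as the same interaction by storing unordered pairs."""
    return {frozenset(p) for p in pairs}


def percent_overlap(db1, db2):
    """Shared interactions as a percentage of all interactions in either set."""
    a, b = normalise(db1), normalise(db2)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


# Invented curation results for the same hypothetical publication.
curated_by_db1 = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
curated_by_db2 = [("P2", "P1"), ("P3", "P1"), ("P1", "P5")]

print(round(percent_overlap(curated_by_db1, curated_by_db2), 1))
```

Other definitions (for example, relative to the smaller set) would give different numbers for the same data, which is one reason such overlap figures need to be read with care.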

Currently, likely the best integrated protein-protein interaction database is IntAct, at the EBI, which, besides curating interactions from the literature, hosts interactions from the IMEx consortium, an extensive data-sharing agreement between a number of general and specialized source databases.


 

Task:

  • Access IntAct and enter the UniProt ID for yeast Mbp1 P39678.
  • The EBI search directly returns a table of pairwise interactions; both partners are listed as a pair and, in each pair, one of the partners should be YDL056W (the systematic name of yeast Mbp1).
  • How many different physical interaction detection methods do the IntAct records list? Follow the links and read their definitions. (Bravo to the IntAct developers, for defining their terms. In a better world, all the semantics of our databases should be similarly defined to be meaningful.)
  • Click on the "Graph" tab to load a network graph.
  • Switch "Merge edges" off to show the reported edges for this interaction individually. Which protein pair has the most interactions? Does this make sense?
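The two counting questions in this task can also be answered from a downloaded table rather than by eye. The sketch below assumes that the results were exported in the tab-separated PSI-MI TAB layout, in which, to my knowledge, the first two columns hold the partner identifiers and the seventh column holds the interaction detection method. The rows are made up, the partner identifiers are placeholders, and only the first seven columns are shown.

```python
from collections import Counter

# Made-up rows in a PSI-MI TAB-like layout (assumed column order; only the
# first seven columns are shown). Partner identifiers are placeholders and
# the rows are NOT real search results.
SAMPLE = (
    'uniprotkb:P39678\tuniprotkb:PARTNER1\t-\t-\t-\t-\tpsi-mi:"MI:0018"(two hybrid)\n'
    'uniprotkb:PARTNER1\tuniprotkb:P39678\t-\t-\t-\t-\tpsi-mi:"MI:0007"(anti tag coimmunoprecipitation)\n'
    'uniprotkb:P39678\tuniprotkb:PARTNER2\t-\t-\t-\t-\tpsi-mi:"MI:0018"(two hybrid)\n'
)


def parse(text):
    """Yield (partner A, partner B, detection method field) for each data row."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.split("\t")
        yield cols[0], cols[1], cols[6]


def summarise(text):
    """Collect the distinct detection methods and count reported edges per pair."""
    methods = set()
    edges = Counter()
    for a, b, method_field in parse(text):
        methods.update(method_field.split("|"))  # "|" separates multiple values
        edges[frozenset((a, b))] += 1  # A-B and B-A count as the same pair
    return methods, edges


methods, edges = summarise(SAMPLE)
print(len(methods), "distinct detection methods")
top_pair, n_edges = edges.most_common(1)[0]
print(sorted(top_pair), n_edges)
```

Counting edges per unordered pair corresponds to what switching "Merge edges" off shows in the graph view: the same pair can be reported several times by different experiments.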

But then what?

If you are like me, you would now like to be able to link expression profiles, information about known complexes, GO annotations, knock-out phenotypes etc. etc. Too bad.


 


Next, we explore the BioGRID interaction database. BioGRID stores physical and genetic interactions.

 

Task:

  • Access the BioGRID database at the Samuel-Lunenfeld Research Institute, Mount Sinai Hospital, Toronto. Search for interactions of the Mbp1 gene by entering the gene name into the form field.
  • Follow the correct link in BioGRID for Saccharomyces cerevisiae Mbp1 (YDL056W). All genes listed in that table have demonstrated interactions with Mbp1.
  • List what general experimental type(s) the BioGRID interactors come from. (In particular, note the difference between yellow and green boxes.)

You will note that some, but not all, physical interactions listed by BioGRID and IntAct are the same according to a restrictive interpretation: same organism, same proteins, same experiment, same publication.

  • Which of the IntAct Mbp1 interactions are the same in BioGRID?
  • Check whether all of the interactions between the regulators of the G1/S phase as per the diagram in the "Systems Concepts" PDF are present in BioGRID interactions.
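The "restrictive interpretation" can be expressed as a comparison key. This is a minimal sketch with invented records: the `Record` fields, the partner names, the method labels, and the publication identifiers are all hypothetical placeholders, not data from either database.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One reported interaction (hypothetical, simplified fields)."""
    organism: str
    protein_a: str
    protein_b: str
    method: str
    publication: str


def strict_key(r):
    """Same organism, same proteins, same experiment, same publication."""
    return (r.organism, frozenset((r.protein_a, r.protein_b)), r.method, r.publication)


def pair_key(r):
    """Permissive alternative: same organism and same proteins only."""
    return (r.organism, frozenset((r.protein_a, r.protein_b)))


def shared(db1, db2, key):
    """Keys that occur in both record lists."""
    return {key(r) for r in db1} & {key(r) for r in db2}


# Invented records; names, methods and publication labels are placeholders.
intact_like = [
    Record("yeast", "Mbp1", "PartnerX", "two hybrid", "PUB_1"),
    Record("yeast", "Mbp1", "PartnerY", "affinity capture", "PUB_2"),
]
biogrid_like = [
    Record("yeast", "PartnerX", "Mbp1", "two hybrid", "PUB_1"),
    Record("yeast", "Mbp1", "PartnerY", "two hybrid", "PUB_3"),
]

print(len(shared(intact_like, biogrid_like, strict_key)), "shared under the restrictive key")
print(len(shared(intact_like, biogrid_like, pair_key)), "shared when only the protein pair is compared")
```

With real data, the two databases are likely to name experimental methods differently, so the method labels would probably need to be mapped to a common vocabulary before a restrictive comparison is meaningful.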


 

Now, what about MYSPE? Could you infer interactions between proteins whose orthologs interact in another species? Such predictions are called interologs (interacting homologs). Unfortunately, that does not appear to be the case. Confident prediction of interologs can only be achieved in cases of >80% joint sequence identity of both pairs[1], a level of similarity that (I believe) none of our Mbp1 proteins achieves. Does this mean the pathways and interactions are not conserved? Certainly not. We expect a very high degree of conservation of the system's function, but we can't say for sure whether any two specific proteins interact in a different species the same way they interact in yeast. All we can do is to use annotation transfer for hypothesis generation. But that is a useful starting point.
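As a small illustration of how restrictive the 80% criterion is, consider the following sketch. It assumes that "joint" identity can be approximated by the weaker of the two ortholog identities, which is my simplification and not necessarily the definition used in the cited study. The function names and the example percentages are invented.

```python
THRESHOLD = 80.0  # percent identity, taken from the text above


def joint_identity(identity_a, identity_b):
    """Assumption: the joint identity is the weaker (minimum) of the two values."""
    return min(identity_a, identity_b)


def confident_interolog(identity_a, identity_b, threshold=THRESHOLD):
    """True only if the joint identity of the two ortholog pairs exceeds the threshold."""
    return joint_identity(identity_a, identity_b) > threshold


# Invented percent identities between yeast proteins and their MYSPE orthologs.
print(confident_interolog(92.0, 85.0))  # both orthologs are very similar
print(confident_interolog(92.0, 45.0))  # one ortholog has diverged
```

Under this reading, a single diverged partner is enough to make the transferred interaction a hypothesis rather than a confident prediction.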


 


 


Further reading, links and resources

Lewis et al. (2012) What evidence is there for the homology of protein-protein interactions?. PLoS Comput Biol 8:e1002645. (pmid: 23028270)

The notion that sequence homology implies functional similarity underlies much of computational biology. In the case of protein-protein interactions, an interaction can be inferred between two proteins on the basis that sequence-similar proteins have been observed to interact. The use of transferred interactions is common, but the legitimacy of such inferred interactions is not clear. Here we investigate transferred interactions and whether data incompleteness explains the lack of evidence found for them. Using definitions of homology associated with functional annotation transfer, we estimate that conservation rates of interactions are low even after taking interactome incompleteness into account. For example, at a blastp E-value threshold of 10^-70, we estimate the conservation rate to be about 11% between S. cerevisiae and H. sapiens. Our method also produces estimates of interactome sizes (which are similar to those previously proposed). Using our estimates of interaction conservation we estimate the rate at which protein-protein interactions are lost across species. To our knowledge, this is the first such study based on large-scale data. Previous work has suggested that interactions transferred within species are more reliable than interactions transferred across species. By controlling for factors that are specific to within-species interaction prediction, we propose that the transfer of interactions within species might be less reliable than transfers between species. Protein-protein interactions appear to be very rarely conserved unless very high sequence similarity is observed. Consequently, inferred interactions should be used with care.

Garcia-Garcia et al. (2012) BIPS: BIANA Interolog Prediction Server. A tool for protein-protein interaction inference. Nucleic Acids Res 40:W147-51. (pmid: 22689642)

Protein-protein interactions (PPIs) play a crucial role in biology, and high-throughput experiments have greatly increased the coverage of known interactions. Still, identification of complete inter- and intraspecies interactomes is far from being complete. Experimental data can be complemented by the prediction of PPIs within an organism or between two organisms based on the known interactions of the orthologous genes of other organisms (interologs). Here, we present the BIANA (Biologic Interactions and Network Analysis) Interolog Prediction Server (BIPS), which offers a web-based interface to facilitate PPI predictions based on interolog information. BIPS benefits from the capabilities of the framework BIANA to integrate the several PPI-related databases. Additional metadata can be used to improve the reliability of the predicted interactions. Sensitivity and specificity of the server have been calculated using known PPIs from different interactomes using a leave-one-out approach. The specificity is between 72 and 98%, whereas sensitivity varies between 1 and 59%, depending on the sequence identity cut-off used to calculate similarities between sequences. BIPS is freely accessible at http://sbi.imim.es/BIPS.php.

Keskin et al. (2016) Predicting Protein-Protein Interactions from the Molecular to the Proteome Level. Chem Rev 116:4884-909. (pmid: 27074302)

Identification of protein-protein interactions (PPIs) is at the center of molecular biology considering the unquestionable role of proteins in cells. Combinatorial interactions result in a repertoire of multiple functions; hence, knowledge of PPI and binding regions naturally serve to functional proteomics and drug discovery. Given experimental limitations to find all interactions in a proteome, computational prediction/modeling of protein interactions is a prerequisite to proceed on the way to complete interactions at the proteome level. This review aims to provide a background on PPIs and their types. Computational methods for PPI predictions can use a variety of biological data including sequence-, evolution-, expression-, and structure-based data. Physical and statistical modeling are commonly used to integrate these data and infer PPI predictions. We review and list the state-of-the-art methods, servers, databases, and tools for protein-protein interaction prediction.


 


Notes

  1. Mika & Rost (2006) Protein-protein interactions more conserved within species than across species. PLoS Comput Biol 2:e79. (pmid: 16854211)

    Experimental high-throughput studies of protein-protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. Another about 55,000 pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. Most interactions are experimentally observed in prokaryotes and simple eukaryotes; very few interactions are observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms, e.g. the experimental observation that two yeast proteins interact is transferred to infer that the two corresponding proteins in human also interact. Two pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale comprehensive analysis of such inferences, i.e. of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein-protein interaction data sets. This measure appeared to reflect the overall quality of the data and was the basis for our two surprising results from our large-scale analysis. Firstly, homology-based inferences of physical protein-protein interactions appeared far less successful than expected. In fact, such inferences were accurate only for extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between species. Our analysis underlined that the discrepancies between different datasets are large, even when using the same type of experiment on the same organism. This reality considerably constrains the power of homology-based transfer of interactions. In particular, the experimental probing of interactions in distant model organisms has to be undertaken with some caution. More comprehensive images of protein-protein networks will require the combination of many high-throughput methods, including in silico inferences and predictions. http://www.rostlab.org/results/2006/ppi_homology/


 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-09-11

Version:

1.0

Version history:

  • 1.0 First live
  • 0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.